Consider the following scenario: the online system generates 500 GB of new data every day, and this data must be summarized by day, week,
month, and other dimensions. Which kind of Hive table is suitable for handling it?
Regarding the Streaming topology (Topology), which of the following descriptions is wrong?
In a FusionInsight HD cluster, which of the following statements about the kinit command are wrong? (multiple choice)
In Hive on FusionInsight HD, a user-defined UDF can have the same name as a Hive built-in UDF; in this case,
the user-defined UDF will be used.
In Spark, assume lines is a DStream object and the filter statement filters out 80% of the data. Regarding the following two
statements, which description is correct?
X: lines.filter(…).groupByKey(…)
Y: lines.groupByKey(…).filter(…)
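The difference between the two orderings can be sketched in plain Python, without Spark. This is only an illustration under stated assumptions: `group_by_key`, `keep`, and the sample `records` are hypothetical stand-ins, not Spark APIs, and the predicate is applied to individual values in both orderings so that the results can be compared. The point is to count how many records reach the grouping step, which in a real cluster roughly corresponds to the data that has to be shuffled.

```python
from collections import defaultdict


def group_by_key(pairs):
    """Group (key, value) pairs by key and count how many records were grouped."""
    groups = defaultdict(list)
    grouped_records = 0
    for key, value in pairs:
        groups[key].append(value)
        grouped_records += 1
    return dict(groups), grouped_records


# Hypothetical input: 100 (key, value) records spread over 4 keys.
records = [(i % 4, i) for i in range(100)]


def keep(pair):
    """Hypothetical predicate that keeps 20% of the records (drops 80%)."""
    return pair[1] % 5 == 0


# Ordering X: filter first, then group only the surviving records.
x_filtered = [p for p in records if keep(p)]
x_groups, x_grouped = group_by_key(x_filtered)

# Ordering Y: group everything first, then filter inside each group.
y_all, y_grouped = group_by_key(records)
y_groups = {k: [v for v in vs if keep((k, v))] for k, vs in y_all.items()}
y_groups = {k: vs for k, vs in y_groups.items() if vs}

print(x_grouped, y_grouped)  # prints "20 100"
```

Both orderings end with the same groups in this sketch, but X passes only 20 records to the grouping step while Y passes all 100.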
When HDFS is running, the NameNode loads all file system metadata from disk into memory, so the
total number of files the file system can store is limited by the NameNode's memory capacity.
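The relationship between file count and NameNode memory can be made concrete with a back-of-the-envelope estimate. The sketch below assumes a commonly quoted rule of thumb of roughly 150 bytes of heap per namespace object (file or block); that figure, the function names, and the one-block-per-file default are assumptions for illustration, not values taken from this document or from FusionInsight HD.

```python
# Assumption: rough rule of thumb, not a measured or documented value.
BYTES_PER_OBJECT = 150


def estimate_namenode_heap_bytes(num_files, blocks_per_file=1,
                                 bytes_per_object=BYTES_PER_OBJECT):
    """Estimate heap used by metadata: one file object plus its block objects per file."""
    objects = num_files * (1 + blocks_per_file)
    return objects * bytes_per_object


def max_files_for_heap(heap_bytes, blocks_per_file=1,
                       bytes_per_object=BYTES_PER_OBJECT):
    """Invert the estimate: how many files fit into a given metadata heap budget."""
    return heap_bytes // ((1 + blocks_per_file) * bytes_per_object)


heap = estimate_namenode_heap_bytes(10_000_000)
print(heap)                      # prints 3000000000
print(max_files_for_heap(heap))  # prints 10000000
```

Under these assumptions, ten million single-block files need about 3 GB of heap for metadata alone, which is why many small files strain the NameNode more than a few large ones.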
In FusionInsight HD, which of the following parts does a complete Streaming CQL application contain at least? (multiple choice)
In a FusionInsight HD cluster, which service does Flume not support writing collected data to?