Solution:
Step 1: Import a single table
sqoop import --connect jdbc:mysql://quickstart:3306/retail_db --username=retail_dba --password=cloudera --table=orders --target-dir=p91_orders
Note: Make sure there is no space before or after the '=' sign. Sqoop uses the MapReduce framework to copy data from the RDBMS to HDFS; each mapper writes its own part-m-NNNNN file under the target directory.
Step 2: Read the data from one of the part files created by the above command:
hadoop fs -cat p91_orders/part-m-00000
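Every later step keys on line.split(",")[3]. The plain-Python sketch below shows why, assuming the orders table is stored as comma-separated text (Sqoop's default for text imports) with the columns order_id, order_date, order_customer_id, order_status. The sample lines and the order_status helper are made up for illustration.

```python
# Hypothetical sample lines mimicking p91_orders/part-m-00000.
# Assumed column order: order_id, order_date, order_customer_id, order_status
sample_lines = [
    "1,2014-01-01 00:00:00.0,100,CLOSED",
    "2,2014-01-01 00:00:00.0,200,PENDING_PAYMENT",
    "3,2014-01-02 00:00:00.0,300,COMPLETE",
]


def order_status(line):
    """Return the 4th comma-separated field (index 3), the assumed order_status column."""
    return line.split(",")[3]


statuses = [order_status(line) for line in sample_lines]
print(statuses)  # ['CLOSED', 'PENDING_PAYMENT', 'COMPLETE']
```

A plain split(",") is only safe while no field itself contains a comma, which holds for this table's columns.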
Step 3: countByKey
#Number of orders by status
allOrders = sc.textFile("p91_orders")
#Generate key and value pairs (key is the order status, value is an empty string)
keyValue = allOrders.map(lambda line: (line.split(",")[3], ""))
#Using countByKey, count the records for each status (the result is a dict returned to the driver)
output = keyValue.countByKey().items()
for line in output: print(line)
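To see what countByKey computes without a cluster, here is a plain-Python sketch. It does not use Spark: the partitions list of (status, "") pairs is hypothetical sample data standing in for the partitions of keyValue, and count_by_key is an illustrative helper, not a Spark API.

```python
from collections import Counter

# Hypothetical stand-in for keyValue: two partitions of (order_status, "") pairs.
partitions = [
    [("CLOSED", ""), ("COMPLETE", "")],
    [("CLOSED", ""), ("PENDING_PAYMENT", "")],
]


def count_by_key(partitions):
    """Mimic RDD.countByKey(): count how many pairs carry each key, ignoring the values."""
    counts = Counter()
    for part in partitions:
        counts.update(key for key, _ in part)
    # Like countByKey, hand back a plain dict rather than a distributed dataset.
    return dict(counts)


output = count_by_key(partitions)
for line in output.items():
    print(line)
```

Because countByKey is an action that brings the whole result back to the driver, it is only appropriate when the number of distinct keys (here, order statuses) is small.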
Step 4: groupByKey
#Generate key and value pairs (key is the order status, value is 1)
keyValue = allOrders.map(lambda line: (line.split(",")[3], 1))
#Using groupByKey, group the 1s for each status and sum them
output = keyValue.groupByKey().map(lambda kv: (kv[0], sum(kv[1])))
for line in output.collect(): print(line)
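A plain-Python sketch of the groupByKey approach follows. It assumes hypothetical sample partitions of (status, 1) pairs; group_by_key is an illustrative helper, not the Spark implementation.

```python
from collections import defaultdict

# Hypothetical stand-in for keyValue: two partitions of (order_status, 1) pairs.
partitions = [
    [("CLOSED", 1), ("COMPLETE", 1)],
    [("CLOSED", 1), ("PENDING_PAYMENT", 1)],
]


def group_by_key(partitions):
    """Mimic groupByKey(): collect every value for a key into one list (no pre-aggregation)."""
    groups = defaultdict(list)
    for part in partitions:
        for key, value in part:
            groups[key].append(value)
    return dict(groups)


grouped = group_by_key(partitions)
# Equivalent of .map(lambda kv: (kv[0], sum(kv[1]))): sum the grouped 1s per status.
output = [(status, sum(ones)) for status, ones in grouped.items()]
for line in output:
    print(line)
```

Every individual 1 is moved into its key's group before anything is summed. In Spark that means all values cross the shuffle, which is why reduceByKey is usually preferred for simple sums.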
Step 5: reduceByKey
#Generate key and value pairs (key is the order status, value is 1)
keyValue = allOrders.map(lambda line: (line.split(",")[3], 1))
#Using reduceByKey, add up the 1s for each status
output = keyValue.reduceByKey(lambda a, b: a + b)
for line in output.collect(): print(line)
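The sketch below models reduceByKey in plain Python, assuming hypothetical sample partitions. It reduces values inside each partition first and then merges the per-partition results with the same function; merge_into and reduce_by_key are illustrative helpers, not Spark APIs.

```python
# Hypothetical stand-in for keyValue: two partitions of (order_status, 1) pairs.
partitions = [
    [("CLOSED", 1), ("COMPLETE", 1), ("CLOSED", 1)],
    [("CLOSED", 1), ("PENDING_PAYMENT", 1)],
]


def merge_into(acc, key, value, func):
    """Fold one value into acc[key] with func, or store it if the key is new."""
    acc[key] = func(acc[key], value) if key in acc else value


def reduce_by_key(partitions, func):
    """Mimic reduceByKey(func): reduce inside each partition first, then merge the partials."""
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            merge_into(local, key, value, func)
        partials.append(local)
    merged = {}
    for local in partials:
        for key, value in local.items():
            merge_into(merged, key, value, func)
    return list(merged.items())


output = reduce_by_key(partitions, lambda a, b: a + b)
for line in output:
    print(line)
```

Because the same function runs both within and across partitions, it must be associative and commutative; a + b qualifies.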
Step 6: aggregateByKey
#Generate key and value pairs (key is the order status, value is the whole line)
keyValue = allOrders.map(lambda line: (line.split(",")[3], line))
#Start each count at 0, add 1 per record within a partition, then add the partition counts together
output = keyValue.aggregateByKey(0, lambda a, b: a + 1, lambda a, b: a + b)
for line in output.collect(): print(line)
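Here is a plain-Python sketch of the aggregateByKey call, assuming hypothetical sample partitions whose values are whole order lines. aggregate_by_key is an illustrative helper that only handles an immutable zero value such as 0.

```python
# Hypothetical stand-in for keyValue: two partitions of (order_status, whole_line) pairs.
partitions = [
    [("CLOSED", "1,2014-01-01 00:00:00.0,100,CLOSED"),
     ("CLOSED", "4,2014-01-01 00:00:00.0,400,CLOSED")],
    [("CLOSED", "7,2014-01-02 00:00:00.0,700,CLOSED"),
     ("COMPLETE", "8,2014-01-02 00:00:00.0,800,COMPLETE")],
]


def aggregate_by_key(partitions, zero_value, seq_op, comb_op):
    """Mimic aggregateByKey(zeroValue, seqFunc, combFunc) for an immutable zero_value."""
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            # Within a partition each key starts from zero_value; seq_op folds in one value.
            local[key] = seq_op(local.get(key, zero_value), value)
        partials.append(local)
    merged = {}
    for local in partials:
        for key, acc in local.items():
            # comb_op merges the per-partition accumulators for the same key.
            merged[key] = comb_op(merged[key], acc) if key in merged else acc
    return list(merged.items())


# Same functions as the Spark call: ignore the line and add 1, then add partial counts.
output = aggregate_by_key(partitions, 0, lambda a, b: a + 1, lambda a, b: a + b)
for line in output:
    print(line)
```

The accumulator (an int) has a different type from the values (strings), which reduceByKey cannot express directly; that is the reason to use aggregateByKey here.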
Step 7: combineByKey
#Generate key and value pairs (key is the order status, value is the whole line)
keyValue = allOrders.map(lambda line: (line.split(",")[3], line))
#The first record for a status becomes 1, each further record adds 1, and partition counts are added together
output = keyValue.combineByKey(lambda value: 1, lambda acc, value: acc + 1, lambda acc1, acc2: acc1 + acc2)
for line in output.collect(): print(line)
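A plain-Python sketch of the three combineByKey functions follows, assuming hypothetical sample partitions of (status, whole_line) pairs. combine_by_key is an illustrative helper, not the Spark implementation.

```python
# Hypothetical stand-in for keyValue: two partitions of (order_status, whole_line) pairs.
partitions = [
    [("CLOSED", "1,2014-01-01 00:00:00.0,100,CLOSED"),
     ("COMPLETE", "2,2014-01-01 00:00:00.0,200,COMPLETE")],
    [("CLOSED", "3,2014-01-02 00:00:00.0,300,CLOSED")],
]


def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """Mimic combineByKey(createCombiner, mergeValue, mergeCombiners)."""
    partials = []
    for part in partitions:
        local = {}
        for key, value in part:
            if key in local:
                # Later values for a key in this partition are folded in with merge_value.
                local[key] = merge_value(local[key], value)
            else:
                # The first value seen for a key in this partition creates its accumulator.
                local[key] = create_combiner(value)
        partials.append(local)
    merged = {}
    for local in partials:
        for key, acc in local.items():
            merged[key] = merge_combiners(merged[key], acc) if key in merged else acc
    return list(merged.items())


output = combine_by_key(
    partitions,
    lambda value: 1,                 # createCombiner: first order for a status counts as 1
    lambda acc, value: acc + 1,      # mergeValue: each further order adds 1
    lambda acc1, acc2: acc1 + acc2,  # mergeCombiners: add per-partition counts
)
for line in output:
    print(line)
```

Unlike aggregateByKey there is no zero value: the starting accumulator is built from the first value seen for each key, which is why createCombiner takes a value argument.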