Problem Scenario 11 : You have been given the following MySQL database details as well as other information.
user=retail_dba
password=cloudera
database=retail_db
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Please accomplish the following.
1. Import the departments table into a directory called departments.
2. Once the import is done, insert the following 5 records into the departments MySQL table.
INSERT INTO departments VALUES (10, 'physics');
INSERT INTO departments VALUES (11, 'Chemistry');
INSERT INTO departments VALUES (12, 'Maths');
INSERT INTO departments VALUES (13, 'Science');
INSERT INTO departments VALUES (14, 'Engineering');
3. Now import only the newly inserted records and append them to the existing directory that was created in the first step.
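One way to approach these steps is a plain Sqoop import followed by an incremental append import. The sketch below only assembles the two command lines in Python and prints them; it does not run Sqoop. The check column `department_id` and the last value `7` are assumptions about the table (neither is stated in the scenario), so substitute the real key column and the highest id present before the new rows were inserted.

```python
import shlex

# Connection details taken from the scenario.
CONNECT = "jdbc:mysql://quickstart:3306/retail_db"
COMMON = [
    "sqoop", "import",
    "--connect", CONNECT,
    "--username", "retail_dba",
    "--password", "cloudera",
    "--table", "departments",
    "--target-dir", "departments",
]


def full_import():
    """Step 1: import the whole table into the departments directory."""
    return list(COMMON)


def incremental_import(check_column, last_value):
    """Step 3: import only rows whose check column exceeds last_value
    and append them to the existing directory."""
    return COMMON + [
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", str(last_value),
    ]


# ASSUMPTION: department_id is the key column and 7 was the highest id
# before the new rows were inserted; neither value comes from the scenario.
step1 = full_import()
step3 = incremental_import("department_id", 7)

print(shlex.join(step1))
print(shlex.join(step3))
```

The printed lines can be pasted into a shell on a machine where Sqoop is installed. Step 2 (the inserts) is run in the MySQL client between the two imports.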
Problem Scenario 64 : You have been given the code snippet below.
val a = sc.parallelize(List("dog", "salmon", "salmon", "rat", "elephant"), 3)
val b = a.keyBy(_.length)
val c = sc.parallelize(List("dog","cat","gnu","salmon","rabbit","turkey","wolf","bear","bee"), 3)
val d = c.keyBy(_.length)
operation1
Write a correct code snippet for operation1 which will produce the desired output, shown below.
Array[(Int, (Option[String], String))] = Array((6,(Some(salmon),salmon)), (6,(Some(salmon),rabbit)), (6,(Some(salmon),turkey)), (6,(Some(salmon),salmon)), (6,(Some(salmon),rabbit)), (6,(Some(salmon),turkey)), (3,(Some(dog),dog)), (3,(Some(dog),cat)), (3,(Some(dog),gnu)), (3,(Some(dog),bee)), (3,(Some(rat),dog)), (3,(Some(rat),cat)), (3,(Some(rat),gnu)), (3,(Some(rat),bee)), (4,(None,wolf)), (4,(None,bear)))
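The `Option` on the left side of each value, together with the `None` entries for key 4, suggests that operation1 is a right outer join, i.e. `b.rightOuterJoin(d).collect`. As a sketch of those semantics that runs without Spark, the plain-Python model below keeps every right-hand pair and combines it with each left-hand value sharing its key, or with `None` when the key is missing on the left. The helper names `key_by` and `right_outer_join` are hypothetical stand-ins, and the row order differs from Spark's, which depends on partitioning.

```python
from collections import defaultdict


def key_by(values, fn):
    """Model of RDD.keyBy: pair each value with the key fn(value)."""
    return [(fn(v), v) for v in values]


def right_outer_join(left, right):
    """Model of rightOuterJoin on lists of (key, value) pairs.

    Every right-hand pair is kept. It is combined with each left-hand
    value sharing its key, or with None if the key is absent on the left.
    """
    left_by_key = defaultdict(list)
    for k, v in left:
        left_by_key[k].append(v)
    joined = []
    for k, w in right:
        matches = left_by_key.get(k)
        if matches:
            joined.extend((k, (v, w)) for v in matches)
        else:
            joined.append((k, (None, w)))
    return joined


b = key_by(["dog", "salmon", "salmon", "rat", "elephant"], len)
d = key_by(["dog", "cat", "gnu", "salmon", "rabbit", "turkey",
            "wolf", "bear", "bee"], len)

result = right_outer_join(b, d)
for row in result:
    print(row)
```

Note that "elephant" (key 8) has no partner in `d` and therefore does not appear in the result, while "wolf" and "bear" (key 4) appear with `None` on the left.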
Problem Scenario 29 : Please accomplish the following exercises using HDFS command line options.
1. Create a directory in HDFS named hdfs_commands.
2. Create a file in HDFS named data.txt in hdfs_commands.
3. Now copy this data.txt file to the local filesystem; while copying, make sure the file properties (e.g. file permissions) are not changed.
4. Now create a file in the local directory named data_local.txt and move this file to the hdfs_commands directory in HDFS.
5. Create a file data_hdfs.txt in the hdfs_commands directory and copy it to the local filesystem.
6. Create a file in the local filesystem named file1.txt and put it into HDFS.
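The six exercises map fairly directly onto `hdfs dfs` subcommands. The sketch below lists one plausible command per step and prints them; it does not execute anything. Using `-get -p` for step 3 relies on the `-p` option preserving times, ownership and permissions, and putting file1.txt into hdfs_commands in step 6 is an assumption, since the exercise does not name a target directory.

```python
import shlex

HDFS_DIR = "hdfs_commands"

# Each entry is one command line, in the order of the six exercises.
commands = [
    # 1. Create the directory in HDFS.
    ["hdfs", "dfs", "-mkdir", HDFS_DIR],
    # 2. Create an empty file in that directory.
    ["hdfs", "dfs", "-touchz", f"{HDFS_DIR}/data.txt"],
    # 3. Copy it to the local filesystem, preserving file properties.
    ["hdfs", "dfs", "-get", "-p", f"{HDFS_DIR}/data.txt", "."],
    # 4. Create a local file, then move it (the local copy is removed).
    ["touch", "data_local.txt"],
    ["hdfs", "dfs", "-moveFromLocal", "data_local.txt", f"{HDFS_DIR}/"],
    # 5. Create a file in HDFS and copy it to the local filesystem.
    ["hdfs", "dfs", "-touchz", f"{HDFS_DIR}/data_hdfs.txt"],
    ["hdfs", "dfs", "-get", f"{HDFS_DIR}/data_hdfs.txt", "."],
    # 6. ASSUMPTION: the target directory is hdfs_commands.
    ["touch", "file1.txt"],
    ["hdfs", "dfs", "-put", "file1.txt", f"{HDFS_DIR}/"],
]

for cmd in commands:
    print(shlex.join(cmd))
```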
Problem Scenario 82 : You have been given a table in Hive with the following structure (which you created in the previous exercise).
productid int
code string
name string
quantity int
price float
Using SparkSQL, accomplish the following activities.
1. Select the name and quantity of all products having quantity <= 2000.
2. Select the name and price of the products having code 'PEN'.
3. Select all the products whose name starts with PENCIL.
4. Select all products whose "name" begins with 'P', followed by any two characters, followed by a space, followed by zero or more characters.
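The four activities can be written as ordinary SQL, which could then be passed to Spark as query strings (for example through `sqlContext.sql(...)`). To keep the sketch runnable without a cluster, it uses Python's built-in `sqlite3` as a stand-in for SparkSQL. The table name `products` and all sample rows are assumptions made up for illustration. SQLite's `LIKE` is case-insensitive for ASCII by default, unlike Hive and Spark SQL, so the sample names are chosen so that case does not affect the result.

```python
import sqlite3

# Query text for the four activities. ASSUMPTION: the table is called
# "products"; the scenario does not name the Hive table.
QUERIES = {
    1: "SELECT name, quantity FROM products WHERE quantity <= 2000",
    2: "SELECT name, price FROM products WHERE code = 'PEN'",
    3: "SELECT * FROM products WHERE name LIKE 'PENCIL%'",
    # '_' matches exactly one character, '%' matches zero or more.
    4: "SELECT * FROM products WHERE name LIKE 'P__ %'",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "productid INTEGER, code TEXT, name TEXT, quantity INTEGER, price REAL)"
)
# Hypothetical sample rows, invented only to exercise the queries.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        (1, "PEN", "PEN Red", 5000, 1.25),
        (2, "PEN", "PEN Blue", 1500, 1.5),
        (3, "PEC", "PENCIL 2B", 10000, 0.5),
        (4, "PEC", "PENCIL HB", 2000, 0.25),
    ],
)

results = {n: conn.execute(sql).fetchall() for n, sql in QUERIES.items()}
for n, rows in results.items():
    print(n, rows)
```

With this sample data, query 4 matches the two "PEN ..." rows but not the "PENCIL ..." rows, because the fourth character of the name must be a space.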
Problem Scenario 84 : In continuation of the previous question, please accomplish the following activities.
1. Select all the products that have a null product code.
2. Select all the products whose name starts with Pen, with the results ordered by price in descending order.
3. Select all the products whose name starts with Pen, with the results ordered by price in descending order and quantity in ascending order.
4. Select the top 2 products by price.
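These activities add `IS NULL`, `ORDER BY` and `LIMIT` to the earlier pattern matching. The sketch below again uses Python's built-in `sqlite3` as a stand-in for SparkSQL so that it runs anywhere. The table name `products` and the sample rows are assumptions made up for illustration. Two rows share a price on purpose, so that the secondary sort on quantity in query 3 has a visible effect.

```python
import sqlite3

# ASSUMPTION: the table is called "products"; the scenario does not name it.
QUERIES = {
    1: "SELECT * FROM products WHERE code IS NULL",
    2: "SELECT * FROM products WHERE name LIKE 'Pen%' ORDER BY price DESC",
    3: ("SELECT * FROM products WHERE name LIKE 'Pen%' "
        "ORDER BY price DESC, quantity ASC"),
    4: "SELECT * FROM products ORDER BY price DESC LIMIT 2",
}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE products ("
    "productid INTEGER, code TEXT, name TEXT, quantity INTEGER, price REAL)"
)
# Hypothetical sample rows, invented only to exercise the queries.
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?, ?, ?)",
    [
        (1, "PEN", "Pen Red", 5000, 1.25),
        (2, "PEN", "Pen Blue", 8000, 1.25),
        (3, "PEC", "Pencil 2B", 10000, 0.5),
        (4, None, "Marker", 300, 2.0),
    ],
)

results = {n: conn.execute(sql).fetchall() for n, sql in QUERIES.items()}
for n, rows in results.items():
    print(n, rows)
```

Note that `code = NULL` would not work for query 1: a comparison with NULL is never true in SQL, so `IS NULL` is required.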
Problem Scenario 76 : You have been given a MySQL DB with the following details.
user=retail_dba
password=cloudera
database=retail_db
table=retail_db.orders
table=retail_db.order_items
jdbc URL = jdbc:mysql://quickstart:3306/retail_db
Columns of orders table : (orderid, order_date, ordercustomerid, order_status)
.....
Please accomplish the following activities.
1. Copy the "retail_db.orders" table to HDFS in a directory p91_orders.
2. Once the data is copied to HDFS, use pyspark to calculate the number of orders for each status.
3. Use all of the following methods to calculate the number of orders for each status. (You need to know all these functions and their behavior for the real exam.)
- countByKey()
- groupByKey()
- reduceByKey()
- aggregateByKey()
- combineByKey()
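All five methods can produce the same per-status counts from an RDD of `(order_status, 1)` pairs; they differ in how much structure the caller supplies. In pyspark the calls are `rdd.countByKey()`, `rdd.groupByKey()`, `rdd.reduceByKey(func)`, `rdd.aggregateByKey(zeroValue, seqFunc, combFunc)` and `rdd.combineByKey(createCombiner, mergeValue, mergeCombiners)`. The sketch below models their behavior in plain Python so it runs without Spark. The functions are simplified stand-ins, not the Spark implementations, and the statuses and the two "partitions" are hypothetical sample data.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical (order_status, 1) pairs split over two "partitions".
partitions = [
    [("COMPLETE", 1), ("PENDING", 1), ("COMPLETE", 1)],
    [("CLOSED", 1), ("COMPLETE", 1), ("PENDING", 1)],
]
pairs = [kv for part in partitions for kv in part]


def count_by_key(pairs):
    """countByKey: count the pairs per key, ignoring the values."""
    counts = defaultdict(int)
    for key, _ in pairs:
        counts[key] += 1
    return dict(counts)


def group_by_key(pairs):
    """groupByKey: collect every value of a key into one list."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_by_key(pairs, func):
    """reduceByKey: fold the values of each key with one binary function."""
    return {k: reduce(func, vs) for k, vs in group_by_key(pairs).items()}


def combine_by_key(partitions, create_combiner, merge_value, merge_combiners):
    """combineByKey: build a combiner per key inside each partition,
    then merge the per-partition combiners across partitions."""
    merged = {}
    for part in partitions:
        local = {}
        for key, value in part:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        for key, comb in local.items():
            if key in merged:
                merged[key] = merge_combiners(merged[key], comb)
            else:
                merged[key] = comb
    return merged


def aggregate_by_key(partitions, zero, seq_func, comb_func):
    """aggregateByKey: like combineByKey, but every key starts from the
    zero value in each partition."""
    return combine_by_key(
        partitions, lambda v: seq_func(zero, v), seq_func, comb_func
    )


def add(x, y):
    return x + y


by_count = count_by_key(pairs)
by_group = {k: len(vs) for k, vs in group_by_key(pairs).items()}
by_reduce = reduce_by_key(pairs, add)
by_aggregate = aggregate_by_key(partitions, 0, add, add)
by_combine = combine_by_key(partitions, lambda v: v, add, add)

print(by_count)
```

In this model, `group_by_key` materializes every value before counting, whereas the reduce, aggregate and combine variants keep only one running value per key and partition.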