
Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0 Exam Dumps - Databricks Certified Associate Developer for Apache Spark 3.0 Exam

Question # 17

The code block shown below should return a one-column DataFrame where the column storeId is converted to string type. Choose the answer that correctly fills the blanks in the code block to accomplish this.

transactionsDf.__1__(__2__.__3__(__4__))

A. 1. select  2. col("storeId")  3. cast  4. StringType

B. 1. select  2. col("storeId")  3. as  4. StringType

C. 1. cast  2. "storeId"  3. as  4. StringType()

D. 1. select  2. col("storeId")  3. cast  4. StringType()

E. 1. select  2. storeId  3. cast  4. StringType()
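For reference, a minimal sketch of the cast pattern this question tests, assuming a DataFrame transactionsDf with a storeId column already exists:

from pyspark.sql.functions import col
from pyspark.sql.types import StringType

# Column.cast() accepts a DataType instance such as StringType()
converted = transactionsDf.select(col("storeId").cast(StringType()))
# Equivalent shorthand: cast() also accepts a type name as a string
converted = transactionsDf.select(col("storeId").cast("string"))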

Question # 18

Which of the following code blocks returns a 2-column DataFrame that shows the distinct values in column productId and the number of rows with that productId in DataFrame transactionsDf?

A. transactionsDf.count("productId").distinct()

B. transactionsDf.groupBy("productId").agg(col("value").count())

C. transactionsDf.count("productId")

D. transactionsDf.groupBy("productId").count()

E. transactionsDf.groupBy("productId").select(count("value"))

Question # 19

Which of the following describes characteristics of the Spark UI?

A. Via the Spark UI, workloads can be manually distributed across executors.

B. Via the Spark UI, stage execution speed can be modified.

C. The Scheduler tab shows how jobs that are run in parallel by multiple users are distributed across the cluster.

D. There is a place in the Spark UI that shows the property spark.executor.memory.

E. Some of the tabs in the Spark UI are named Jobs, Stages, Storage, DAGs, Executors, and SQL.
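As an aside, the same configuration properties the UI displays can be read programmatically; a sketch, assuming an active SparkSession named spark (the property is only present if it was explicitly configured, hence the default):

# Returns the configured executor memory, or the fallback if unset
spark.conf.get("spark.executor.memory", "not set")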

Question # 20

Which of the following describes Spark's standalone deployment mode?

A. Standalone mode uses a single JVM to run Spark driver and executor processes.

B. Standalone mode means that the cluster does not contain the driver.

C. Standalone mode is how Spark runs on YARN and Mesos clusters.

D. Standalone mode uses only a single executor per worker per application.

E. Standalone mode is a viable solution for clusters that run multiple frameworks, not only Spark.
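For context, connecting to a standalone cluster from PySpark looks like this sketch; the master host name is a placeholder, and 7077 is the conventional standalone master port:

from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .master("spark://master-host:7077")  # hypothetical standalone master URL
         .appName("standalone-demo")
         .getOrCreate())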

Question # 21

Which of the following code blocks returns the number of unique values in column storeId of DataFrame transactionsDf?

A. transactionsDf.select("storeId").dropDuplicates().count()

B. transactionsDf.select(count("storeId")).dropDuplicates()

C. transactionsDf.select(distinct("storeId")).count()

D. transactionsDf.dropDuplicates().agg(count("storeId"))

E. transactionsDf.distinct().select("storeId").count()
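For reference, two equivalent sketches for counting unique values, assuming transactionsDf has a storeId column:

from pyspark.sql.functions import countDistinct

# Deduplicate the single column, then count the remaining rows
n = transactionsDf.select("storeId").distinct().count()
# Or express it as an aggregate and pull out the scalar
n = transactionsDf.select(countDistinct("storeId")).first()[0]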

Question # 22

The code block shown below should return a DataFrame with two columns, itemId and col. In this DataFrame, for each element in column attributes of DataFrame itemsDf there should be a separate row in which column itemId contains the associated itemId from DataFrame itemsDf. The new DataFrame should only contain rows derived from rows of DataFrame itemsDf whose attributes column contains the element cozy.

(The original question includes a sample of DataFrame itemsDf here; it is not reproduced.)

Code block:

itemsDf.__1__(__2__).__3__(__4__, __5__(__6__))

A. 1. filter  2. array_contains("cozy")  3. select  4. "itemId"  5. explode  6. "attributes"

B. 1. where  2. "array_contains(attributes, 'cozy')"  3. select  4. itemId  5. explode  6. attributes

C. 1. filter  2. "array_contains(attributes, 'cozy')"  3. select  4. "itemId"  5. map  6. "attributes"

D. 1. filter  2. "array_contains(attributes, cozy)"  3. select  4. "itemId"  5. explode  6. "attributes"

E. 1. filter  2. "array_contains(attributes, 'cozy')"  3. select  4. "itemId"  5. explode  6. "attributes"

Question # 23

Which of the following code blocks performs an inner join of DataFrames transactionsDf and itemsDf on columns productId and itemId, respectively, excluding columns value and storeId from DataFrame transactionsDf and column attributes from DataFrame itemsDf?

A. transactionsDf.drop('value', 'storeId').join(itemsDf.select('attributes'), transactionsDf.productId==itemsDf.itemId)

B.
1.transactionsDf.createOrReplaceTempView('transactionsDf')
2.itemsDf.createOrReplaceTempView('itemsDf')
3.
4.spark.sql("SELECT -value, -storeId FROM transactionsDf INNER JOIN itemsDf ON productId==itemId").drop("attributes")

C. transactionsDf.drop("value", "storeId").join(itemsDf.drop("attributes"), "transactionsDf.productId==itemsDf.itemId")

D.
1.transactionsDf \
2. .drop(col('value'), col('storeId')) \
3. .join(itemsDf.drop(col('attributes')), col('productId')==col('itemId'))

E.
1.transactionsDf.createOrReplaceTempView('transactionsDf')
2.itemsDf.createOrReplaceTempView('itemsDf')
3.
4.statement = """
5.SELECT * FROM transactionsDf
6.INNER JOIN itemsDf
7.ON transactionsDf.productId==itemsDf.itemId
8."""
9.spark.sql(statement).drop("value", "storeId", "attributes")
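For reference, the same join written directly with the DataFrame API; a sketch assuming the two DataFrames and column names from the question:

# Inner join on the equality condition, then drop the unwanted columns
joined = (transactionsDf
          .join(itemsDf, transactionsDf.productId == itemsDf.itemId, "inner")
          .drop("value", "storeId", "attributes"))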

Question # 24

The code block displayed below contains an error. The code block should use the Python method find_most_freq_letter to find the letter that occurs most frequently in column itemName of DataFrame itemsDf and return it in a new column most_frequent_letter. Find the error.

Code block:

1. find_most_freq_letter_udf = udf(find_most_freq_letter)

2. itemsDf.withColumn("most_frequent_letter", find_most_freq_letter("itemName"))

A. Spark is not using the UDF method correctly.

B. The UDF method is not registered correctly, since the return type is missing.

C. The "itemName" expression should be wrapped in col().

D. UDFs do not exist in PySpark.

E. Spark is not adding a column.
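For reference, a corrected sketch of the intended UDF usage, assuming find_most_freq_letter is a plain Python function that returns a string; note that line 2 of the question's code block calls the raw function instead of the registered UDF:

from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

# Register the UDF (StringType is explicit here; it is also udf's default return type)
find_most_freq_letter_udf = udf(find_most_freq_letter, StringType())
# Apply the registered UDF, not the raw Python function
itemsDf = itemsDf.withColumn("most_frequent_letter", find_most_freq_letter_udf("itemName"))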
