Winter Sale Special Limited Time 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: v4s65

Databricks-Certified-Professional-Data-Engineer Exam Dumps - Databricks Certified Data Engineer Professional Exam

Go to page:
Question # 9

Which statement regarding stream-static joins and static Delta tables is correct?

A.

Each microbatch of a stream-static join will use the most recent version of the static Delta table as of each microbatch.

B.

Each microbatch of a stream-static join will use the most recent version of the static Delta table as of the job's initialization.

C.

The checkpoint directory will be used to track state information for the unique keys present in the join.

D.

Stream-static joins cannot use static Delta tables because of consistency issues.

E.

The checkpoint directory will be used to track updates to the static Delta table.

Full Access
Question # 10

A Data engineer wants to run unit’s tests using common Python testing frameworks on python functions defined across several Databricks notebooks currently used in production.

How can the data engineer run unit tests against function that work with data in production?

A.

Run unit tests against non-production data that closely mirrors production

B.

Define and unit test functions using Files in Repos

C.

Define units test and functions within the same notebook

D.

Define and import unit test functions from a separate Databricks notebook

Full Access
Question # 11

A Delta Lake table was created with the below query:

Realizing that the original query had a typographical error, the below code was executed:

ALTER TABLE prod.sales_by_stor RENAME TO prod.sales_by_store

Which result will occur after running the second command?

A.

The table reference in the metastore is updated and no data is changed.

B.

The table name change is recorded in the Delta transaction log.

C.

All related files and metadata are dropped and recreated in a single ACID transaction.

D.

The table reference in the metastore is updated and all data files are moved.

E.

A new Delta transaction log Is created for the renamed table.

Full Access
Question # 12

Which distribution does Databricks support for installing custom Python code packages?

A.

sbt

B.

CRAN

C.

CRAM

D.

nom

E.

Wheels

F.

jars

Full Access
Question # 13

The data governance team has instituted a requirement that all tables containing Personal Identifiable Information (PH) must be clearly annotated. This includes adding column comments, table comments, and setting the custom table property "contains_pii" = true.

The following SQL DDL statement is executed to create a new table:

Which command allows manual confirmation that these three requirements have been met?

A.

DESCRIBE EXTENDED dev.pii test

B.

DESCRIBE DETAIL dev.pii test

C.

SHOW TBLPROPERTIES dev.pii test

D.

DESCRIBE HISTORY dev.pii test

E.

SHOW TABLES dev

Full Access
Question # 14

The following code has been migrated to a Databricks notebook from a legacy workload:

The code executes successfully and provides the logically correct results, however, it takes over 20 minutes to extract and load around 1 GB of data.

Which statement is a possible explanation for this behavior?

A.

%sh triggers a cluster restart to collect and install Git. Most of the latency is related to cluster startup time.

B.

Instead of cloning, the code should use %sh pip install so that the Python code can get executed in parallel across all nodes in a cluster.

C.

%sh does not distribute file moving operations; the final line of code should be updated to use %fs instead.

D.

Python will always execute slower than Scala on Databricks. The run.py script should be refactored to Scala.

E.

%sh executes shell code on the driver node. The code does not take advantage of the worker nodes or Databricks optimized Spark.

Full Access
Question # 15

A junior data engineer is working to implement logic for a Lakehouse table named silver_device_recordings. The source data contains 100 unique fields in a highly nested JSON structure.

The silver_device_recordings table will be used downstream for highly selective joins on a number of fields, and will also be leveraged by the machine learning team to filter on a handful of relevant fields, in total, 15 fields have been identified that will often be used for filter and join logic.

The data engineer is trying to determine the best approach for dealing with these nested fields before declaring the table schema.

Which of the following accurately presents information about Delta Lake and Databricks that may Impact their decision-making process?

A.

Because Delta Lake uses Parquet for data storage, Dremel encoding information for nesting can be directly referenced by the Delta transaction log.

B.

Tungsten encoding used by Databricks is optimized for storing string data: newly-added native support for querying JSON strings means that string types are always most efficient.

C.

Schema inference and evolution on Databricks ensure that inferred types will always accurately match the data types used by downstream systems.

D.

By default Delta Lake collects statistics on the first 32 columns in a table; these statistics are leveraged for data skipping when executing selective queries.

Full Access
Question # 16

The data engineer team is configuring environment for development testing, and production before beginning migration on a new data pipeline. The team requires extensive testing on both the code and data resulting from code execution, and the team want to develop and test against similar production data as possible.

A junior data engineer suggests that production data can be mounted to the development testing environments, allowing pre production code to execute against production data. Because all users have

Admin privileges in the development environment, the junior data engineer has offered to configure permissions and mount this data for the team.

Which statement captures best practices for this situation?

A.

Because access to production data will always be verified using passthrough credentials it is safe to mount data to any Databricks development environment.

B.

All developer, testing and production code and data should exist in a single unified workspace; creating separate environments for testing and development further reduces risks.

C.

In environments where interactive code will be executed, production data should only be accessible with read permissions; creating isolated databases for each environment further reduces risks.

D.

Because delta Lake versions all data and supports time travel, it is not possible for user error or malicious actors to permanently delete production data, as such it is generally safe to mount production data anywhere.

Full Access
Go to page: