Easter Sale Limited Time 70% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: scxmas70

Professional-Data-Engineer Exam Dumps - Google Professional Data Engineer Exam

Go to page:
Question # 25

Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour.

The data scientists have written the following code to read the data for a new key features in the logs.

BigQueryIO.Read

.named(“ReadLogData”)

.from(“clouddataflow-readonly:samples.log_data”)

You want to improve the performance of this data read. What should you do?

A.

Specify the TableReference object in the code.

B.

Use .fromQuery operation to read specific fields from the table.

C.

Use of both the Google BigQuery TableSchema and TableFieldSchema classes.

D.

Call a transform that returns TableRow objects, where each element in the PCollexction represents a single row in the table.

Full Access
Question # 26

You have several Spark jobs that run on a Cloud Dataproc cluster on a schedule. Some of the jobs run in sequence, and some of the jobs run concurrently. You need to automate this process. What should you do?

A.

Create a Cloud Dataproc Workflow Template

B.

Create an initialization action to execute the jobs

C.

Create a Directed Acyclic Graph in Cloud Composer

D.

Create a Bash script that uses the Cloud SDK to create a cluster, execute jobs, and then tear down the cluster

Full Access
Question # 27

You need to compose visualization for operations teams with the following requirements:

    Telemetry must include data from all 50,000 installations for the most recent 6 weeks (sampling once every minute)

    The report must not be more than 3 hours delayed from live data.

    The actionable report should only show suboptimal links.

    Most suboptimal links should be sorted to the top.

    Suboptimal links can be grouped and filtered by regional geography.

    User response time to load the report must be <5 seconds.

You create a data source to store the last 6 weeks of data, and create visualizations that allow viewers to see multiple date ranges, distinct geographic regions, and unique installation types. You always show the latest data without any changes to your visualizations. You want to avoid creating and updating new visualizations each month. What should you do?

A.

Look through the current data and compose a series of charts and tables, one for each possible

combination of criteria.

B.

Look through the current data and compose a small set of generalized charts and tables bound to criteria filters that allow value selection.

C.

Export the data to a spreadsheet, compose a series of charts and tables, one for each possible

combination of criteria, and spread them across multiple tabs.

D.

Load the data into relational database tables, write a Google App Engine application that queries all rows, summarizes the data across each criteria, and then renders results using the Google Charts and visualization API.

Full Access
Question # 28

You are managing a Cloud Dataproc cluster. You need to make a job run faster while minimizing costs, without losing work in progress on your clusters. What should you do?

A.

Increase the cluster size with more non-preemptible workers.

B.

Increase the cluster size with preemptible worker nodes, and configure them to forcefully decommission.

C.

Increase the cluster size with preemptible worker nodes, and use Cloud Stackdriver to trigger a script to preserve work.

D.

Increase the cluster size with preemptible worker nodes, and configure them to use graceful decommissioning.

Full Access
Question # 29

You are running your BigQuery project in the on-demand billing model and are executing a change data capture (CDC) process that ingests data. The CDC process loads 1 GB of data every 10 minutes into a temporary table, and then performs a merge into a 10 TB target table. This process is very scan intensive and you want to explore options to enable a predictable cost model. You need to create a BigQuery reservation based on utilization information gathered from BigQuery Monitoring and apply the reservation to the CDC process. What should you do?

A.

Create a BigQuery reservation for the job.

B.

Create a BigQuery reservation for the service account running the job.

C.

Create a BigQuery reservation for the dataset.

D.

Create a BigQuery reservation for the project.

Full Access
Question # 30

Your car factory is pushing machine measurements as messages into a Pub/Sub topic in your Google Cloud project. A Dataflow streaming job. that you wrote with the Apache Beam SDK, reads these messages, sends acknowledgment lo Pub/Sub. applies some custom business logic in a Doffs instance, and writes the result to BigQuery. You want to ensure that if your business logic fails on a message, the message will be sent to a Pub/Sub topic that you want to monitor for alerting purposes. What should you do?

A.

Use an exception handling block in your Data Flow’s Doffs code to push the messages that failed to be transformed through a side output

and to a new Pub/Sub topic. Use Cloud Monitoring to monitor the topic/num_jnacked_messages_by_region metric on this new topic.

B.

Enable retaining of acknowledged messages in your Pub/Sub pull subscription. Use Cloud Monitoring to monitor the

subscription/num_retained_acked_messages metric on this subscription.

C.

Enable dead lettering in your Pub/Sub pull subscription, and specify a new Pub/Sub topic as the dead letter topic. Use Cloud Monitoring to

monitor the subscription/dead_letter_message_count metric on your pull subscription.

D.

Create a snapshot of your Pub/Sub pull subscription. Use Cloud Monitoring to monitor the snapshot/numessages metric on this

snapshot.

Full Access
Question # 31

You need to create a near real-time inventory dashboard that reads the main inventory tables in your BigQuery data warehouse. Historical inventory data is stored as inventory balances by item and location. You have several thousand updates to inventory every hour. You want to maximize performance of the dashboard and ensure that the data is accurate. What should you do?

A.

Leverage BigQuery UPDATE statements to update the inventory balances as they are changing.

B.

Partition the inventory balance table by item to reduce the amount of data scanned with each inventory update.

C.

Use the BigQuery streaming the stream changes into a daily inventory movement table. Calculate balances in a view that joins it to the historical inventory balance table. Update the inventory balance table nightly.

D.

Use the BigQuery bulk loader to batch load inventory changes into a daily inventory movement table. Calculate balances in a view that joins it to the historical inventory balance table. Update the inventory balance table nightly.

Full Access
Question # 32

The marketing team at your organization provides regular updates of a segment of your customer dataset. The marketing team has given you a CSV with 1 million records that must be updated in BigQuery. When you use the UPDATE statement in BigQuery, you receive a quotaExceeded error. What should you do?

A.

Reduce the number of records updated each day to stay within the BigQuery UPDATE DML statement limit.

B.

Increase the BigQuery UPDATE DML statement limit in the Quota management section of the Google Cloud Platform Console.

C.

Split the source CSV file into smaller CSV files in Cloud Storage to reduce the number of BigQuery UPDATE DML statements per BigQuery job.

D.

Import the new records from the CSV file into a new BigQuery table. Create a BigQuery job that merges the new records with the existing records and writes the results to a new BigQuery table.

Full Access
Go to page: