Professional-Data-Engineer Exam Dumps - Google Professional Data Engineer Exam

Looking for a workable way to pass the Google Professional-Data-Engineer exam? You're in the right place. ExamCert offers realistic, trusted exam prep tools to help you earn the credential. ExamCert's Professional-Data-Engineer PDF Study Guide, Testing Engine, and Exam Dumps provide relevant, up-to-date study material in an easy-to-learn question-and-answer format. The study tools aim to simplify the exam's complex and confusing concepts and let you practice under realistic exam conditions with the testing engine and exam dumps.


Question # 25

You have important legal hold documents in a Cloud Storage bucket. You need to ensure that these documents are not deleted or modified. What should you do?

Set a retention policy. Lock the retention policy.

Set a retention policy. Set the default storage class to Archive for long-term digital preservation.

Enable the Object Versioning feature. Add a lifecycle rule.

Enable the Object Versioning feature. Create a copy in a bucket in a different region.


Question # 26

You've migrated a Hadoop job from an on-prem cluster to Dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffling operations, and the initial data are Parquet files (on average 200-400 MB each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. You need to keep in mind that your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload.

What should you do?

Increase the size of your Parquet files so that each is at least 1 GB.

Switch to the TFRecord format (approx. 200 MB per file) instead of Parquet files.

Switch from HDDs to SSDs, copy initial data from GCS to HDFS, run the Spark job and copy results back to GCS.

Switch from HDDs to SSDs, override the preemptible VMs configuration to increase the boot disk size.


Question # 27

You need to choose a database for a new project that has the following requirements:

Fully managed

Able to automatically scale up

Transactionally consistent

Able to scale up to 6 TB

Able to be queried using SQL

Which database do you choose?

Cloud SQL

Cloud Bigtable

Cloud Spanner

Cloud Datastore


Question # 28

You have data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy for this data that minimizes cost. How should you configure the BigQuery table?

Set the BigQuery dataset to be regional. In the event of an emergency, use a point-in-time snapshot to recover the data.

Set the BigQuery dataset to be regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.

Set the BigQuery dataset to be multi-regional. In the event of an emergency, use a point-in-time snapshot to recover the data.

Set the BigQuery dataset to be multi-regional. Create a scheduled query to make copies of the data to tables suffixed with the time of the backup. In the event of an emergency, use the backup copy of the table.


Question # 29

You work for an economic consulting firm that helps companies identify economic trends as they happen. As part of your analysis, you use Google BigQuery to correlate customer data with the average prices of the 100 most common goods sold, including bread, gasoline, milk, and others. The average prices of these goods are updated every 30 minutes. You want to make sure this data stays up to date so you can combine it with other data in BigQuery as cheaply as possible. What should you do?

Load the data every 30 minutes into a new partitioned table in BigQuery.

Store and update the data in a regional Google Cloud Storage bucket and create a federated data source in BigQuery.

Store the data in Google Cloud Datastore. Use Google Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Cloud Datastore.

Store the data in a file in a regional Google Cloud Storage bucket. Use Cloud Dataflow to query BigQuery and combine the data programmatically with the data stored in Google Cloud Storage.


Question # 30

You are preparing an organization-wide dataset. You need to preprocess customer data stored in a restricted bucket in Cloud Storage. The data will be used to create consumer analyses. You need to follow data privacy requirements, including protecting certain sensitive data elements, while also retaining all of the data for potential future use cases. What should you do?

Use Dataflow and the Cloud Data Loss Prevention API to mask sensitive data. Write the processed data in BigQuery.

Use the Cloud Data Loss Prevention API and Dataflow to detect and remove sensitive fields from the data in Cloud Storage. Write the filtered data in BigQuery.

Use Dataflow and Cloud KMS to encrypt sensitive fields and write the encrypted data in BigQuery. Share the encryption key by following the principle of least privilege.

Use customer-managed encryption keys (CMEK) to directly encrypt the data in Cloud Storage. Use federated queries from BigQuery. Share the encryption key by following the principle of least privilege.


Answer: A

Explanation:

The core requirements are to protect sensitive data elements (data privacy) while retaining all data for potential future use, and then to use this preprocessed data for consumer analyses.

Retaining All Data: This immediately makes option B (remove sensitive fields) unsuitable because it involves data loss.

Protecting Sensitive Data for Analysis & Future Use: Masking is a de-identification technique that redacts or replaces sensitive data with a substitute, allowing the data structure and usability for analysis to be maintained without exposing the original sensitive values. This aligns with protecting data while still making it usable.

Cloud Data Loss Prevention (DLP) API: This service is specifically designed to discover, classify, and protect sensitive data. It offers various de-identification techniques, including masking.

Dataflow: This is a serverless, fast, and cost-effective service for unified stream and batch data processing. It's well-suited for transforming large datasets, such as those read from Cloud Storage, and can integrate with the DLP API for de-identification.

Writing to BigQuery: BigQuery is an ideal destination for an organization-wide dataset for consumer analyses.

Therefore, using Dataflow to read the data from Cloud Storage, leveraging the Cloud DLP API to mask (a form of de-identification) the sensitive elements, and then writing the processed (masked) data to BigQuery is the most appropriate solution. This approach protects privacy for the consumer analyses dataset while the original, unaltered data can still be retained in the restricted Cloud Storage bucket for future use cases that might require access to the original sensitive information (under strict governance).
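To make the difference between masking and removing fields concrete, here is a minimal pure-Python sketch. It does not call the Cloud DLP API or Dataflow; the record layout, the `SENSITIVE_FIELDS` set, and the helper names are all hypothetical and chosen only for illustration. It simply shows that masking keeps every field (and its shape) in the output, while removal drops the field entirely.

```python
import re

# Hypothetical set of sensitive field names (an assumption for this sketch).
SENSITIVE_FIELDS = {"email", "phone"}


def mask_value(value: str, mask_char: str = "#", keep_last: int = 0) -> str:
    """Replace letters and digits with mask_char, optionally leaving
    the last keep_last characters visible. Punctuation is preserved."""
    head = value if keep_last == 0 else value[:-keep_last]
    tail = "" if keep_last == 0 else value[-keep_last:]
    return re.sub(r"[A-Za-z0-9]", mask_char, head) + tail


def mask_record(record: dict) -> dict:
    """Masking: every field is kept, sensitive values are obscured."""
    return {
        key: mask_value(val) if key in SENSITIVE_FIELDS else val
        for key, val in record.items()
    }


def remove_sensitive(record: dict) -> dict:
    """Removal: sensitive fields are dropped from the output entirely."""
    return {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}


if __name__ == "__main__":
    # Illustrative record, not real data.
    record = {"customer_id": "c-1001", "email": "ann@example.com", "phone": "555-0100"}
    print(mask_record(record))
    print(remove_sensitive(record))
```

In this sketch the masked record still has three fields, so downstream analyses keep the same schema, whereas the filtered record has only `customer_id`. In a real pipeline the masking step would be delegated to the DLP API from within Dataflow rather than hand-written regular expressions.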

Let's analyze why other options are less suitable:

Option B: "Remove sensitive fields" means data loss, which contradicts the requirement to retain all data for potential future use cases.

Option C: Encrypting sensitive fields with Cloud KMS and writing them to BigQuery is a valid way to protect data. However, for "consumer analyses," masked data is generally more directly usable than encrypted data. Analysts would typically work with de-identified (e.g., masked) data rather than directly querying encrypted fields and managing decryption keys for analytical purposes. While decryption is possible, masking often provides a better balance of privacy and utility for broad analysis. The question also implies creating a dataset for analysis, where masking makes the data ready-to-use for that purpose. The original data remains in Cloud Storage.

Option D: Using CMEK encrypts the entire object in Cloud Storage at rest. While this protects the data in Cloud Storage, federated queries from BigQuery would access the raw, unmasked data (assuming decryption occurs seamlessly). This doesn't address the preprocessing requirement of protecting certain sensitive data elements within the data itself for the consumer analyses dataset. The goal is to create a de-identified dataset for analysis, not just secure the raw data at rest.

Reference:

Google Cloud Documentation: Cloud Data Loss Prevention > De-identification overview. "De-identification is the process of removing identifying information from data. Cloud DLP uses de-identification techniques such as masking, tokenization, pseudonymization, date shifting, and more to help you protect sensitive data."

Google Cloud Documentation: Cloud Data Loss Prevention > Basic de-identification > Masking. "Masking hides parts of data by replacing characters with a symbol, such as an asterisk (*) or hash (#)."

Google Cloud Documentation: Dataflow > Overview. "Dataflow is a fully managed streaming analytics service that minimizes latency, processing time, and cost through autoscaling and batch processing."

Google Cloud Solution: Automating the de-identification of PII in large-scale datasets using Cloud DLP and Dataflow. This solution guide explicitly outlines using Dataflow and DLP API for de-identifying (including masking) data from Cloud Storage and loading it into BigQuery. "You can use Cloud DLP to scan data for sensitive elements and then apply de-identification techniques such as redaction, masking, or tokenization." and "This tutorial uses Dataflow to orchestrate the de-identification process."

Question # 31

You need to set access to BigQuery for different departments within your company. Your solution should comply with the following requirements:

Each department should have access only to their data.

Each department will have one or more leads who need to be able to create and update tables and provide them to their team.

Each department has data analysts who need to be able to query but not modify data.

How should you set access to the data in BigQuery?

Create a dataset for each department. Assign the department leads the role of OWNER, and assign the data analysts the role of WRITER on their dataset.

Create a dataset for each department. Assign the department leads the role of WRITER, and assign the data analysts the role of READER on their dataset.

Create a table for each department. Assign the department leads the role of Owner, and assign the data analysts the role of Editor on the project the table is in.

Create a table for each department. Assign the department leads the role of Editor, and assign the data analysts the role of Viewer on the project the table is in.


Question # 32

You are implementing a chatbot to help an online retailer streamline their customer service. The chatbot must be able to respond to both text and voice inquiries. You are looking for a low-code or no-code option, and you want to be able to easily train the chatbot to provide answers to keywords. What should you do?

Use the Speech-to-Text API to build a Python application in App Engine.

Use the Speech-to-Text API to build a Python application in a Compute Engine instance.

Use Dialogflow for simple queries and the Speech-to-Text API for complex queries.

Use Dialogflow to implement the chatbot, defining the intents based on the most common queries collected.
