Professional-Data-Engineer Exam Dumps - Google Professional Data Engineer Exam

Looking for a workable way to pass the Google Professional-Data-Engineer exam? You're in the right place! ExamCert offers realistic, trusted, and authentic exam prep tools to help you earn your desired credential. ExamCert's Professional-Data-Engineer PDF Study Guide, Testing Engine, and Exam Dumps follow a reliable preparation strategy, providing relevant, up-to-date study material in an easy-to-learn question-and-answer format. The study tools aim to simplify the exam's complex and confusing concepts and to familiarize you with the real exam scenario, which you can practice with the testing engine and real exam dumps.

Question # 49

You have a data analyst team member who needs to analyze data by using BigQuery. The data analyst wants to create a data pipeline that would load 200 CSV files with an average size of 15MB from a Cloud Storage bucket into BigQuery daily. The data needs to be ingested and transformed before being accessed in BigQuery for analysis. You need to recommend a fully managed, no-code solution for the data analyst. What should you do?

Create a Cloud Run function and schedule it to run daily using Cloud Scheduler to load the data into BigQuery.

Use the BigQuery Data Transfer Service to load files from Cloud Storage to BigQuery, create a BigQuery job which transforms the data using BigQuery SQL and schedule it to run daily.

Build a custom Apache Beam pipeline and run it on Dataflow to load the file from Cloud Storage to BigQuery and schedule it to run daily using Cloud Composer.

Create a pipeline by using BigQuery pipelines and schedule it to load the data into BigQuery daily.

Answer: B

Explanation:

The requirements are for a daily scheduled load, ingest, and transformation, and specifically a fully managed, no-code solution.

Ingest (Load): The BigQuery Data Transfer Service (DTS) is the fully managed, serverless, and no-code solution for batch loading files (including CSV from Cloud Storage) into BigQuery on a schedule. This is the "ingest" part.

Transform: After loading the raw data into a staging table using DTS, the transformation can be done using BigQuery SQL. This transformation query can then be automated using a Scheduled Query in BigQuery, which is also a fully managed and no-code feature that runs on a schedule.

Fully Managed & No-Code: Both DTS for Cloud Storage and Scheduled Queries are native BigQuery features that are fully managed and configured through the console without requiring code, directly meeting the constraints.
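For readers who want to see what the two console forms actually capture, the settings can be written out as plain data. This is only an illustrative sketch, not something the analyst would need to write: the bucket, dataset, table, and display names are hypothetical, and the field names reflect the Data Transfer Service parameters as commonly documented, so verify them against the current documentation before relying on them.

```python
# Illustrative only: the two configurations as plain Python data, mirroring
# what the console forms collect. All bucket, dataset, and table names are
# hypothetical; field names should be checked against the current docs.

FILES_PER_DAY = 200   # from the question
AVG_FILE_MB = 15      # from the question


def daily_volume_gb(files: int = FILES_PER_DAY, avg_mb: float = AVG_FILE_MB) -> float:
    """Approximate daily load volume in GB (decimal: 1 GB = 1000 MB)."""
    return files * avg_mb / 1000


# Step 1 (ingest): Cloud Storage transfer into a staging table.
gcs_transfer = {
    "display_name": "daily-csv-load",                 # hypothetical
    "data_source_id": "google_cloud_storage",
    "destination_dataset_id": "staging",              # hypothetical
    "schedule": "every 24 hours",
    "params": {
        "data_path_template": "gs://example-bucket/daily/*.csv",  # hypothetical
        "destination_table_name_template": "raw_events",          # hypothetical
        "file_format": "CSV",
        "skip_leading_rows": "1",
        "write_disposition": "APPEND",
    },
}

# Step 2 (transform): scheduled query that reads the staging table.
scheduled_query = {
    "display_name": "daily-transform",                # hypothetical
    "data_source_id": "scheduled_query",
    "destination_dataset_id": "analytics",            # hypothetical
    "schedule": "every 24 hours",
    "params": {
        # Hypothetical transformation: drop rows without an event_id.
        "query": "SELECT * FROM staging.raw_events WHERE event_id IS NOT NULL",
        "destination_table_name_template": "clean_events",        # hypothetical
        "write_disposition": "WRITE_TRUNCATE",
    },
}

if __name__ == "__main__":
    print(f"Approximate daily volume: {daily_volume_gb()} GB")
```

At roughly 3 GB per day, the workload is a small batch load, which is why a scheduled transfer followed by a scheduled query is sufficient here.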

Why the other options are incorrect:

A (Cloud Run + Script): A Cloud Run function requires writing custom code (for example, a Python script), which violates the no-code requirement.

C (Dataflow + Apache Beam + Cloud Composer): This is a powerful, highly scalable ETL solution, but it requires writing custom Apache Beam code and setting up a workflow orchestrator (Cloud Composer/Airflow). Dataflow itself is serverless, but the custom pipeline still has to be written and maintained, so this option fails the no-code requirement and adds operational overhead that a fully managed solution should avoid.

D (BigQuery pipelines): "BigQuery pipelines" is not a distinct, official product name in the Google Cloud documentation that fulfills a no-code scheduled ETL. The closest product is the combination of DTS and Scheduled Queries, as described in option B.

[Reference: Google Cloud documentation on the BigQuery Data Transfer Service and scheduled queries. "The BigQuery Data Transfer Service automates data movement into BigQuery on a scheduled, managed basis... The BigQuery Data Transfer Service supports loading data from Cloud Storage in one of the following formats: Comma-separated values (CSV)..." (Source: What is BigQuery Data Transfer Service? and Introduction to Cloud Storage transfers). "A scheduled query is a query that BigQuery automatically runs at regular intervals. When you configure a scheduled query, you specify the GoogleSQL SELECT statement to run, the destination table for the query results, and the frequency of the query." (Source: Scheduling queries). This combination delivers a fully managed, no-code ELT (Extract-Load-Transform) pipeline.]

Question # 50

You have several different unstructured data sources, within your on-premises data center as well as in the cloud. The data is in various formats, such as Apache Parquet and CSV. You want to centralize this data in Cloud Storage. You need to set up an object sink for your data that allows you to use your own encryption keys. You want to use a GUI-based solution. What should you do?

Use Cloud Data Fusion to move files into Cloud Storage.

Use Storage Transfer Service to move files into Cloud Storage.

Use Dataflow to move files into Cloud Storage.

Use BigQuery Data Transfer Service to move files into BigQuery.

Question # 51

Your new customer has requested daily reports that show their net consumption of Google Cloud compute resources and who used the resources. You need to quickly and efficiently generate these daily reports. What should you do?

Do daily exports of Cloud Logging data to BigQuery. Create views filtering by project, log type, resource, and user.

Filter data in Cloud Logging by project, resource, and user; then export the data in CSV format.

Filter data in Cloud Logging by project, log type, resource, and user, then import the data into BigQuery.

Export Cloud Logging data to Cloud Storage in CSV format. Cleanse the data using Dataprep, filtering by project, resource, and user.

Question # 52

You are collecting IoT sensor data from millions of devices across the world and storing the data in BigQuery. Your access pattern is based on recent data filtered by location_id and device_version with the following query:

You want to optimize your queries for cost and performance. How should you structure your data?

Partition table data by create_date, location_id and device_version

Partition table data by create_date, cluster table data by location_id and device_version

Cluster table data by create_date, location_id and device_version

Cluster table data by create_date, partition by location and device_version

Question # 53

You need to deploy additional dependencies to all nodes of a Cloud Dataproc cluster at startup using an existing initialization action. Company security policies require that Cloud Dataproc nodes do not have access to the Internet, so public initialization actions cannot fetch resources. What should you do?

Deploy the Cloud SQL Proxy on the Cloud Dataproc master

Use an SSH tunnel to give the Cloud Dataproc cluster access to the Internet

Copy all dependencies to a Cloud Storage bucket within your VPC security perimeter

Use Resource Manager to add the service account used by the Cloud Dataproc cluster to the Network User role

Question # 54

You work for a mid-sized enterprise that needs to move its operational system transaction data from an on-premises database to GCP. The database is about 20 TB in size. Which database should you choose?

Cloud SQL

Cloud Bigtable

Cloud Spanner

Cloud Datastore

Question # 55

You've migrated a Hadoop job from an on-prem cluster to Dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffling operations, and the initial data are Parquet files (on average 200-400 MB each). You see some degradation in performance after the migration to Dataproc, so you'd like to optimize for it. You need to keep in mind that your organization is very cost-sensitive, so you'd like to continue using Dataproc on preemptibles (with 2 non-preemptible workers only) for this workload.

What should you do?

Increase the size of your Parquet files so that each is at least 1 GB.

Switch to TFRecords format (approx. 200 MB per file) instead of Parquet files.

Switch from HDDs to SSDs, copy initial data from GCS to HDFS, run the Spark job and copy results back to GCS.

Switch from HDDs to SSDs, override the preemptible VMs configuration to increase the boot disk size.

Question # 56

Your team runs a complex analytical query daily that processes terabytes of data. Recently, after running for 20 minutes, the query fails with a "Resources exceeded" error. You need to resolve this issue. What should you do?

Increase your project's BigQuery API request quota.

Analyze the SQL syntax for errors.

Increase the maximum table size limit.

Move from BigQuery on-demand to slot reservations.

Answer: D

Explanation:

In BigQuery, the "Resources exceeded" error (specifically during execution) typically indicates that the query's resource demands (CPU/Memory/Shuffle) have surpassed the limits of the shared on-demand slot pool.

Slot Reservations: On-demand pricing uses a shared pool of slots with a soft cap (typically 2,000 slots). For massive, complex queries processing terabytes and running for long durations, this pool may be insufficient or subject to "load shedding" during peak times. Moving to Capacity-based pricing (Slots) allows you to reserve a dedicated number of slots (e.g., 500, 2,000, or more) that are exclusively yours, providing the sustained compute power needed for heavy analytical jobs.

Why the other options are incorrect:

A: API request quotas (e.g., queries per second) are unrelated to the internal compute resources required to execute a single massive query.

B: While query optimization (removing ORDER BY, etc.) can help, if the logic is correct but simply too large for the shared pool, syntax analysis won't fix the underlying resource constraint.

C: There is no "maximum table size limit" that causes an execution-time resource error; BigQuery supports petabyte-scale tables.

[Reference: Google Cloud documentation on BigQuery troubleshooting. "Resources exceeded during query execution: This error occurs when a query uses too many resources (CPU, memory, or shuffle). This is common for queries that are very complex or join large datasets... To resolve this, you can: 1. Optimize your query... 2. Switch to capacity-based pricing (reservations) to ensure your jobs have a dedicated number of slots and are not impacted by the shared on-demand pool's limits." (Source: Troubleshoot query issues). "On-demand pricing is subject to a default slot quota... For workloads that require more predictability or higher scale, slot commitments and reservations provide dedicated capacity." (Source: BigQuery pricing models).]
