Associate-Data-Practitioner Exam Dumps - Google Cloud Associate Data Practitioner (ADP Exam)

Question # 9

You work for a home insurance company. You are frequently asked to create and save risk reports with charts for specific areas using a publicly available storm event dataset. You want to be able to quickly create and re-run risk reports when new data becomes available. What should you do?

Export the storm event dataset as a CSV file. Import the file to Google Sheets, and use cell data in the worksheets to create charts.

Copy the storm event dataset into your BigQuery project. Use BigQuery Studio to query and visualize the data in Looker Studio.

Reference and query the storm event dataset using SQL in BigQuery Studio. Export the results to Google Sheets, and use cell data in the worksheets to create charts.

Reference and query the storm event dataset using SQL in a Colab Enterprise notebook. Display the table results and document with Markdown, and use Matplotlib to create charts.

Question # 10

Your organization has a BigQuery dataset that contains sensitive employee information such as salaries and performance reviews. The payroll specialist in the HR department needs to have continuous access to aggregated performance data, but they do not need continuous access to other sensitive data. You need to grant the payroll specialist access to the performance data without granting them access to the entire dataset using the simplest and most secure approach. What should you do?

Use authorized views to share query results with the payroll specialist.

Create row-level and column-level permissions and policies on the table that contains performance data in the dataset. Provide the payroll specialist with the appropriate permission set.

Create a table with the aggregated performance data. Use table-level permissions to grant access to the payroll specialist.

Create a SQL query with the aggregated performance data. Export the results to an Avro file in a Cloud Storage bucket. Share the bucket with the payroll specialist.

Question # 11

You created a customer support application that sends several forms of data to Google Cloud. Your application is sending:

1. Audio files from phone interactions with support agents that will be accessed during trainings.

2. CSV files of users' personally identifiable information (PII) that will be analyzed with SQL.

3. A large volume of small document files that will power other applications.

You need to select the appropriate tool for each data type given the required use case, while following Google-recommended practices. Which should you choose?

1. Cloud Storage

2. Cloud SQL for PostgreSQL

3. Bigtable

1. Filestore

2. Cloud SQL for PostgreSQL

3. Datastore

1. Cloud Storage

2. BigQuery

3. Firestore

1. Filestore

2. Bigtable

3. BigQuery

Question # 12

You want to process and load a daily sales CSV file stored in Cloud Storage into BigQuery for downstream reporting. You need to quickly build a scalable data pipeline that transforms the data while providing insights into data quality issues. What should you do?

Create a batch pipeline in Cloud Data Fusion by using a Cloud Storage source and a BigQuery sink.

Load the CSV file as a table in BigQuery, and use scheduled queries to run SQL transformation scripts.

Load the CSV file as a table in BigQuery. Create a batch pipeline in Cloud Data Fusion by using a BigQuery source and sink.

Create a batch pipeline in Dataflow by using the Cloud Storage CSV file to BigQuery batch template.

Question # 13

You are a data analyst working with sensitive customer data in BigQuery. You need to ensure that only authorized personnel within your organization can query this data, while following the principle of least privilege. What should you do?

Enable access control by using IAM roles.

Update dataset privileges by using the SQL GRANT statement.

Export the data to Cloud Storage, and use signed URLs to authorize access.

Encrypt the data by using customer-managed encryption keys (CMEK).

Question # 14

You manage an ecommerce website that has a diverse range of products. You need to forecast future product demand accurately to ensure that your company has sufficient inventory to meet customer needs and avoid stockouts. Your company's historical sales data is stored in a BigQuery table. You need to create a scalable solution that takes into account the seasonality and historical data to predict product demand. What should you do?

Use the historical sales data to train and create a BigQuery ML time series model. Use the ML.FORECAST function call to output the predictions into a new BigQuery table.

Use Colab Enterprise to create a Jupyter notebook. Use the historical sales data to train a custom prediction model in Python.

Use the historical sales data to train and create a BigQuery ML linear regression model. Use the ML.PREDICT function call to output the predictions into a new BigQuery table.

Use the historical sales data to train and create a BigQuery ML logistic regression model. Use the ML.PREDICT function call to output the predictions into a new BigQuery table.

Question # 15

You need to design a data pipeline to process large volumes of raw server log data stored in Cloud Storage. The data needs to be cleaned, transformed, and aggregated before being loaded into BigQuery for analysis. The transformation involves complex data manipulation using Spark scripts that your team developed. You need to implement a solution that leverages your team's existing skillset, processes data at scale, and minimizes cost. What should you do?

Use Dataflow with a custom template for the transformation logic.

Use Cloud Data Fusion to visually design and manage the pipeline.

Use Dataform to define the transformations in SQLX.

Use Dataproc to run the transformations on a cluster.

Question # 16

You are developing a data ingestion pipeline to load small CSV files into BigQuery from Cloud Storage. You want to load these files upon arrival to minimize data latency. You want to accomplish this with minimal cost and maintenance. What should you do?

Use the bq command-line tool within a Cloud Shell instance to load the data into BigQuery.

Create a Cloud Composer pipeline to load new files from Cloud Storage to BigQuery and schedule it to run every 10 minutes.

Create a Cloud Run function to load the data into BigQuery that is triggered when data arrives in Cloud Storage.

Create a Dataproc cluster to pull CSV files from Cloud Storage, process them using Spark, and write the results to BigQuery.

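Answer: Create a Cloud Run function to load the data into BigQuery that is triggered when data arrives in Cloud Storage.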

Explanation:

Using a Cloud Run function triggered by Cloud Storage to load the data into BigQuery is the best solution because it minimizes both cost and maintenance while providing low-latency data ingestion. Cloud Run is a serverless platform that automatically scales based on the workload, ensuring efficient use of resources without requiring a dedicated instance or cluster. It integrates with Cloud Storage event notifications, so incoming files can be processed and loaded into BigQuery as they arrive. This approach is cost-effective, scalable, and easy to manage.

The goal is to load small CSV files into BigQuery upon arrival (event-driven) with minimal latency, cost, and maintenance. Google Cloud provides serverless, event-driven options that align with this requirement. Let's evaluate each option in detail:

Cloud Composer pipeline: Cloud Composer (managed Apache Airflow) can schedule a pipeline to check Cloud Storage every 10 minutes, but this polling approach adds up to 10 minutes of latency and incurs costs for running the Composer environment even when no files arrive. Maintaining the DAGs and the environment adds overhead. This option is better suited to scheduled batch jobs than to event-driven ingestion.

Cloud Run function: A Cloud Run function triggered by a Cloud Storage event (via Eventarc or Pub/Sub) loads files into BigQuery as soon as they arrive, minimizing latency. Cloud Run is serverless, scales to zero when idle (low cost), and requires minimal maintenance. Calling the BigQuery API from the function (e.g., with the Python client library) handles small CSV loads efficiently. This aligns with Google's serverless, event-driven best practices.

Dataproc cluster: Dataproc with Spark is designed for large-scale, distributed processing, not small CSV ingestion. It requires cluster management, incurs higher costs (even with ephemeral clusters), and adds unnecessary complexity for a simple load task.

bq command-line tool in Cloud Shell: Running bq from Cloud Shell is manual rather than automated, so it fails the "upon arrival" requirement. It's a one-off tool, not a pipeline solution, and Cloud Shell isn't designed for persistent automation.

Why the Cloud Run function is best: It reacts to Cloud Storage's object creation events, so the delay between file arrival and BigQuery ingestion stays low. It's serverless, meaning there is no infrastructure to manage, and costs scale with usage because the function scales to zero when idle. For small CSVs, the BigQuery load job is lightweight, avoiding processing overhead.
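The event-driven load described above can be sketched in Python. This is a minimal illustration, not a reference implementation: the destination table name, the helper names, the header-row assumption, and the use of schema autodetection are all assumptions made for the example. When deployed, the entry point would typically be registered with the Functions Framework (`@functions_framework.cloud_event`); that decorator is left out here so the helpers can be exercised without any Google Cloud libraries installed.

```python
"""Sketch: load a newly arrived CSV object from Cloud Storage into BigQuery."""
from typing import Optional

# Hypothetical destination; replace with your own project.dataset.table.
DESTINATION_TABLE = "my-project.ingest.small_csv_files"


def gcs_uri_from_event(event_data: dict) -> str:
    """Build the gs:// URI from a Cloud Storage object event payload."""
    return f"gs://{event_data['bucket']}/{event_data['name']}"


def is_csv(event_data: dict) -> bool:
    """Return True only for objects whose name ends in .csv (case-insensitive)."""
    return event_data["name"].lower().endswith(".csv")


def load_csv(cloud_event) -> Optional[str]:
    """Entry point: start a BigQuery load job for the object in the event.

    Returns the loaded URI, or None when the object is not a CSV file.
    """
    data = cloud_event.data
    if not is_csv(data):
        return None
    uri = gcs_uri_from_event(data)

    # Imported lazily so the helpers above work without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,  # assumption: each file has a header row
        autodetect=True,      # assumption: schema inference is acceptable
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    load_job = client.load_table_from_uri(
        uri, DESTINATION_TABLE, job_config=job_config
    )
    load_job.result()  # wait, so a failed load surfaces as an exception
    return uri
```

A load job is used here rather than streaming inserts because the files are small and arrive as complete objects. Non-CSV objects are skipped before any BigQuery call is made.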

Extract from Google Documentation: From "Triggering Cloud Run with Cloud Storage Events" (https://cloud.google.com/run/docs/triggering/using-events): "You can trigger Cloud Run services in response to Cloud Storage events, such as object creation, using Eventarc. This serverless approach minimizes latency and maintenance, making it ideal for real-time data pipelines." Additionally, from "Loading Data into BigQuery" (https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv): "Programmatically load CSV files from Cloud Storage using the BigQuery API, enabling automated ingestion with minimal overhead."

References: Google Cloud Documentation, "Cloud Run Events" (https://cloud.google.com/run/docs) and "BigQuery Load Jobs" (https://cloud.google.com/bigquery/docs/loading-data).
