Note! Following DAS-C01 Exam is Retired now. Please select the alternative replacement for your Exam Certification.

DAS-C01 Exam Dumps - AWS Certified Data Analytics - Specialty

Go to page:

<< First
Prev
1
2
3
4
5
6
7
8
Next
Last >>

Question # 9

A company developed a new elections reporting website that uses Amazon Kinesis Data Firehose to deliver full logs from AWS WAF to an Amazon S3 bucket. The company is now seeking a low-cost option to perform this infrequent data analysis with visualizations of logs in a way that requires minimal development effort.

Which solution meets these requirements?

Use an AWS Glue crawler to create and update a table in the Glue data catalog from the logs. Use Athena to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.

Create a second Kinesis Data Firehose delivery stream to deliver the log files to Amazon Elasticsearch Service (Amazon ES). Use Amazon ES to perform text-based searches of the logs for ad-hoc analyses and use Kibana for data visualizations.

Create an AWS Lambda function to convert the logs into .csv format. Then add the function to the Kinesis Data Firehose transformation configuration. Use Amazon Redshift to perform ad-hoc analyses of the logs using SQL queries and use Amazon QuickSight to develop data visualizations.

Create an Amazon EMR cluster and use Amazon S3 as the data source. Create an Apache Spark job to perform ad-hoc analyses and use Amazon QuickSight to develop data visualizations.

Full Access

Question # 10

A media analytics company consumes a stream of social media posts. The posts are sent to an Amazon Kinesis data stream partitioned on user_id. An AWS Lambda function retrieves the records and validates the content before loading the posts into an Amazon Elasticsearch cluster. The validation process needs to receive the posts for a given user in the order they were received. A data analyst has noticed that, during peak hours, the social media platform posts take more than an hour to appear in the Elasticsearch cluster.

What should the data analyst do reduce this latency?

Migrate the validation process to Amazon Kinesis Data Firehose.

Migrate the Lambda consumers from standard data stream iterators to an HTTP/2 stream consumer.

Increase the number of shards in the stream.

Configure multiple Lambda functions to process the stream.

Full Access

Question # 11

An Amazon Redshift database contains sensitive user data. Logging is necessary to meet compliance requirements. The logs must contain database authentication attempts, connections, and disconnections. The logs must also contain each query run against the database and record which database user ran each query.

Which steps will create the required logs?

Enable Amazon Redshift Enhanced VPC Routing. Enable VPC Flow Logs to monitor traffic.

Allow access to the Amazon Redshift database using AWS IAM only. Log access using AWS CloudTrail.

Enable audit logging for Amazon Redshift using the AWS Management Console or the AWS CLI.

Enable and download audit reports from AWS Artifact.

Full Access

Question # 12

An operations team notices that a few AWS Glue jobs for a given ETL application are failing. The AWS Glue jobs read a large number of small JSON files from an Amazon S3 bucket and write the data to a different S3 bucket in Apache Parquet format with no major transformations. Upon initial investigation, a data engineer notices the following error message in the History tab on the AWS Glue console: â€œCommand Failed with Exit Code 1.â€

Upon further investigation, the data engineer notices that the driver memory profile of the failed jobs crosses the safe threshold of 50% usage quickly and reaches 90â€“95% soon after. The average memory usage across all executors continues to be less than 4%.

The data engineer also notices the following error while examining the related Amazon CloudWatch Logs.

What should the data engineer do to solve the failure in the MOST cost-effective way?

Change the worker type from Standard to G.2X.

Modify the AWS Glue ETL code to use the â€˜groupFilesâ€™: â€˜inPartitionâ€™ feature.

Increase the fetch size setting by using AWS Glue dynamics frame.

Modify maximum capacity to increase the total maximum data processing units (DPUs) used.

Full Access

Question # 13

An online retail company is migrating its reporting system to AWS. The companyâ€™s legacy system runs data processing on online transactions using a complex series of nested Apache Hive queries. Transactional data is exported from the online system to the reporting system several times a day. Schemas in the files are stable

between updates.

A data analyst wants to quickly migrate the data processing to AWS, so any code changes should be minimized. To keep storage costs low, the data analyst decides to store the data in Amazon S3. It is vital that the data from the reports and associated analytics is completely up to date based on the data in Amazon S3.

Which solution meets these requirements?

Create an AWS Glue Data Catalog to manage the Hive metadata. Create an AWS Glue crawler over Amazon S3 that runs when data is refreshed to ensure that data changes are updated. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.

Create an AWS Glue Data Catalog to manage the Hive metadata. Create an Amazon EMR cluster with consistent view enabled. Run emrfs sync before each analytics step to ensure data changes are updated. Create an EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.

Create an Amazon Athena table with CREATE TABLE AS SELECT (CTAS) to ensure data is refreshed from underlying queries against the rawdataset. Create an AWS Glue Data Catalog to manage the Hive metadata over the CTAS table. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.

Use an S3 Select query to ensure that the data is properly updated. Create an AWS Glue Data Catalog to manage the Hive metadata over the S3 Select table. Create an Amazon EMR cluster and use the metadata in the AWS Glue Data Catalog to run Hive processing queries in Amazon EMR.

Full Access

Question # 14

A company that produces network devices has millions of users. Data is collected from the devices on an hourly basis and stored in an Amazon S3 data lake.

The company runs analyses on the last 24 hours of data flow logs for abnormality detection and to troubleshoot and resolve user issues. The company also analyzes historical logs dating back 2 years to discover patterns and look for improvement opportunities.

The data flow logs contain many metrics, such as date, timestamp, source IP, and target IP. There are about 10 billion events every day.

How should this data be stored for optimal performance?

In Apache ORC partitioned by date and sorted by source IP

In compressed .csv partitioned by date and sorted by source IP

In Apache Parquet partitioned by source IP and sorted by date

In compressed nested JSON partitioned by source IP and sorted by date

Full Access

Question # 15

A company is migrating from an on-premises Apache Hadoop cluster to an Amazon EMR cluster. The cluster runs only during business hours. Due to a company requirement to avoid intraday cluster failures, the EMR cluster must be highly available. When the cluster is terminated at the end of each business day, the data must persist.

Which configurations would enable the EMR cluster to meet these requirements? (Choose three.)

EMR File System (EMRFS) for storage

Hadoop Distributed File System (HDFS) for storage

AWS Glue Data Catalog as the metastore for Apache Hive

MySQL database on the master node as the metastore for Apache Hive

Multiple master nodes in a single Availability Zone

Multiple master nodes in multiple Availability Zones

Full Access

Question # 16

A healthcare company ingests patient data from multiple data sources and stores it in an Amazon S3 staging bucket. An AWS Glue ETL job transforms the data, which is written to an S3-based data lake to be queried using Amazon Athena. The company wants to match patient records even when the records do not have a common unique identifier.

Which solution meets this requirement?

Use Amazon Macie pattern matching as part of the ETLjob

Train and use the AWS Glue PySpark filter class in the ETLjob

Partition tables and use the ETL job to partition the data on patient name

Train and use the AWS Glue FindMatches ML transform in the ETLjob

Full Access

Go to page: