AWS Certified Data Analytics - Specialty (DAS-C01)

The AWS Certified Data Analytics - Specialty (DAS-C01) questions were last updated today.

Disclaimers:

- The ExamTopics website is not related to, affiliated with, endorsed by, or authorized by Amazon and Azure.
- Trademarks, certification names, and product names are used for reference only and belong to Amazon and Azure.

Topic 1 - Exam A

Question #16 Topic 1

A data analyst runs a large number of data manipulation language (DML) queries by using Amazon Athena with the JDBC driver. Recently, a query failed after it ran for 30 minutes. The query returned the following message: "java.sql.SQLException: Query timeout". The data analyst does not immediately need the query results. However, the data analyst needs a long-term solution for this problem. Which solution will meet these requirements?

A Split the query into smaller queries to search smaller subsets of data.
B In the settings for Athena, adjust the DML query timeout limit.
C In the Service Quotas console, request an increase for the DML query timeout.
D Save the tables as compressed .csv files.

Suggested Answer: C
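NOTE: Athena enforces the DML query timeout as a service quota, and its default of 30 minutes matches the point at which the query failed. Requesting an increase for that quota in the Service Quotas console raises the limit for all future queries, which makes it the long-term fix. Splitting the query (A) is a per-query workaround rather than a lasting solution, and saving the tables as compressed .csv files (D) does not change the timeout.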
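
As a rough illustration of option C done through the API instead of the console, the sketch below looks up the quota by name and then requests a higher value. It assumes boto3 with configured credentials and uses the Service Quotas calls `list_service_quotas` and `request_service_quota_increase`. The quota name string "DML query timeout" and the helper names `find_quota` and `request_dml_timeout_increase` are assumptions for illustration; confirm the exact quota name in your own account first.

```python
def find_quota(quotas, name_fragment):
    """Return the first quota dict whose QuotaName contains name_fragment.

    The match is case-insensitive. Returns None when nothing matches.
    """
    needle = name_fragment.lower()
    for quota in quotas:
        if needle in quota.get("QuotaName", "").lower():
            return quota
    return None


def request_dml_timeout_increase(desired_minutes, region="us-east-1"):
    """Look up the Athena DML timeout quota and request a higher value.

    Hypothetical helper: assumes boto3 is installed and credentials exist.
    """
    import boto3  # imported lazily so find_quota works without boto3

    client = boto3.client("service-quotas", region_name=region)

    # Collect every Athena quota; the list may span several pages.
    quotas = []
    paginator = client.get_paginator("list_service_quotas")
    for page in paginator.paginate(ServiceCode="athena"):
        quotas.extend(page["Quotas"])

    quota = find_quota(quotas, "DML query timeout")  # assumed quota name
    if quota is None:
        raise LookupError("DML query timeout quota not found")

    return client.request_service_quota_increase(
        ServiceCode="athena",
        QuotaCode=quota["QuotaCode"],
        DesiredValue=float(desired_minutes),
    )
```

Looking the quota up by name avoids hard-coding a quota code, which this page does not provide.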

Question #17 Topic 1

A streaming application is reading data from Amazon Kinesis Data Streams and immediately writing the data to an Amazon S3 bucket every 10 seconds. The application is reading data from hundreds of shards. The batch interval cannot be changed due to a separate requirement. The data is being accessed by Amazon Athena. Users are seeing degradation in query performance as time progresses. Which action can help improve query performance?

A Merge the files in Amazon S3 to form larger files.
B Increase the number of shards in Kinesis Data Streams.
C Add more memory and CPU capacity to the streaming application.
D Write the files to multiple S3 buckets.

Suggested Answer: A
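NOTE: Writing from hundreds of shards every 10 seconds produces a steadily growing number of small objects in Amazon S3. Athena has to list, open, and read each of those objects, so per-file overhead grows over time and queries slow down. Merging the small files into larger ones reduces the number of objects that each query must process and improves performance. Adding shards (B), adding capacity to the application (C), or spreading files across buckets (D) does not reduce the file count.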
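
To make the compaction idea in option A concrete, here is a minimal sketch that only plans which small objects would be merged into each larger file. It does not read or write S3. The function name `plan_compaction`, the 128 MB default target, and the (key, size) input shape are illustrative assumptions, not a prescribed method.

```python
def plan_compaction(objects, target_bytes=128 * 1024 * 1024):
    """Group (key, size) pairs into batches whose total size stays within target_bytes.

    Objects are taken in the given order. A batch is closed as soon as adding
    the next object would exceed the target. An object that is larger than
    the target on its own ends up in a batch by itself.
    """
    batches = []
    current, current_size = [], 0
    for key, size in objects:
        if current and current_size + size > target_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(key)
        current_size += size
    if current:
        batches.append(current)
    return batches
```

Each resulting batch could then be rewritten as a single object by a periodic job, which leaves the 10-second write interval of the streaming application unchanged.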

Question #18 Topic 1

A financial company uses Amazon S3 as its data lake and has set up a data warehouse using a multi-node Amazon Redshift cluster. The data files in the data lake are organized in folders based on the data source of each data file. All the data files are loaded to one table in the Amazon Redshift cluster using a separate COPY command for each data file location. With this approach, loading all the data files into Amazon Redshift takes a long time to complete. Users want a faster solution with little or no increase in cost while maintaining the segregation of the data files in the S3 data lake. Which solution meets these requirements?

A Use Amazon EMR to copy all the data files into one folder and issue a COPY command to load the data into Amazon Redshift.
B Load all the data files in parallel to Amazon Aurora, and run an AWS Glue job to load the data into Amazon Redshift.
C Use an AWS Glue job to copy all the data files into one folder and issue a COPY command to load the data into Amazon Redshift.
D Create a manifest file that contains the data file locations and issue a COPY command to load the data into Amazon Redshift.

Suggested Answer: D
NOTE: A manifest file lists the data file locations across all source folders, so a single COPY command can load them in one operation instead of one COPY command per location. With a single COPY, the multi-node cluster can load the listed files in parallel, which shortens the load time without adding another service or cost. The files stay in their existing folders, so the segregation in the S3 data lake is preserved. Options A and C move the files into one folder, which breaks that segregation, and option B adds the cost of Amazon Aurora.
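
The manifest approach in option D can be sketched as follows. The manifest is a JSON document with an `entries` list in which each entry carries a `url` and a `mandatory` flag, and the COPY command points at the manifest with the MANIFEST keyword. The helper names, bucket paths, table name, and role ARN are placeholders for illustration, and data format options (for example CSV or JSON) are omitted.

```python
import json


def build_copy_manifest(file_urls, mandatory=True):
    """Return a Redshift COPY manifest (JSON text) listing the given S3 objects."""
    entries = [{"url": url, "mandatory": mandatory} for url in file_urls]
    return json.dumps({"entries": entries}, indent=2)


def build_copy_statement(table, manifest_url, iam_role_arn):
    """Return a COPY statement that loads every file listed in the manifest."""
    return (
        f"COPY {table}\n"
        f"FROM '{manifest_url}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "MANIFEST;"
    )


# Example with placeholder locations from two separate source folders.
manifest_text = build_copy_manifest([
    "s3://example-lake/source_a/part-0001.csv",
    "s3://example-lake/source_b/part-0001.csv",
])
copy_sql = build_copy_statement(
    "sales",
    "s3://example-lake/manifests/load.manifest",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
```

The manifest would be uploaded to the placeholder manifest location before the COPY statement is run.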

Question #19 Topic 1

An insurance company has raw data in JSON format that is sent without a predefined schedule through an Amazon Kinesis Data Firehose delivery stream to an Amazon S3 bucket. An AWS Glue crawler is scheduled to run every 8 hours to update the schema in the data catalog of the tables stored in the S3 bucket. Data analysts analyze the data using Apache Spark SQL on Amazon EMR set up with AWS Glue Data Catalog as the metastore. Data analysts say that, occasionally, the data they receive is stale. A data engineer needs to provide access to the most up-to-date data. Which solution meets these requirements?

A Create an external schema based on the AWS Glue Data Catalog on the existing Amazon Redshift cluster to query new data in Amazon S3 with Amazon Redshift Spectrum.
B Use Amazon CloudWatch Events with the rate (1 hour) expression to execute the AWS Glue crawler every hour.
C Using the AWS CLI, modify the execution schedule of the AWS Glue crawler from 8 hours to 1 minute.
D Run the AWS Glue crawler from an AWS Lambda function triggered by an S3:ObjectCreated:* event notification on the S3 bucket.

Suggested Answer: D
NOTE: The data arrives without a predefined schedule, so any fixed crawler schedule can leave the Data Catalog out of date until the next run. Triggering the AWS Glue crawler from an AWS Lambda function on the S3:ObjectCreated:* event notification updates the catalog as soon as new objects land in the bucket, which gives the analysts access to the latest data. An hourly schedule (B) only shortens the stale window, and option A changes the query engine without refreshing the catalog.
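
A minimal sketch of the Lambda function in option D is shown below. It assumes the boto3 Glue client, whose `start_crawler` call raises `CrawlerRunningException` when a run is already in progress. The factory name `make_handler`, the return shape, and the crawler name are hypothetical; the client is passed in so the logic can be exercised without AWS access.

```python
def make_handler(glue_client, crawler_name):
    """Build a Lambda handler that starts the given Glue crawler on S3 events."""

    def handler(event, context):
        # S3 event notifications deliver the created objects under "Records".
        records = event.get("Records", [])
        if not records:
            return {"started": False, "reason": "no records"}
        try:
            glue_client.start_crawler(Name=crawler_name)
        except glue_client.exceptions.CrawlerRunningException:
            # Several objects can arrive while one crawl is still running.
            return {"started": False, "reason": "crawler already running"}
        return {"started": True, "reason": None}

    return handler


# In a real Lambda module this could be wired up as (names are placeholders):
#   import boto3
#   lambda_handler = make_handler(boto3.client("glue"), "example-crawler")
```

Catching the already-running case matters here because a Firehose delivery stream can write several objects in quick succession, and each one triggers the function.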

Question #20 Topic 1

A company currently uses Amazon Athena to query its global datasets. The regional data is stored in Amazon S3 in the us-east-1 and us-west-2 Regions. The data is not encrypted. To simplify the query process and manage it centrally, the company wants to use Athena in us-west-2 to query data from Amazon S3 in both Regions. The solution should be as low-cost as possible. What should the company do to achieve this goal?

A Use AWS DMS to migrate the AWS Glue Data Catalog from us-east-1 to us-west-2. Run Athena queries in us-west-2.
B Run the AWS Glue crawler in us-west-2 to catalog datasets in all Regions. Once the data is crawled, run Athena queries in us-west-2.
C Enable cross-Region replication for the S3 buckets in us-east-1 to replicate data in us-west-2. Once the data is replicated in us-west-2, run the AWS Glue crawler there to update the AWS Glue Data Catalog in us-west-2 and run Athena queries.
D Update AWS Glue resource policies to provide us-east-1 AWS Glue Data Catalog access to us-west-2. Once the catalog in us-west-2 has access to the catalog in us-east-1, run Athena queries in us-west-2.

Suggested Answer: B
NOTE: Athena in us-west-2 can query data stored in Amazon S3 in another Region, and an AWS Glue crawler running in us-west-2 can catalog S3 datasets located in us-east-1. Cataloging both datasets in the us-west-2 Data Catalog therefore provides a single, central place to query from without copying any data. Option C would also work, but cross-Region replication adds replication traffic and a second stored copy of the us-east-1 data, so it is not the lowest-cost choice.