AWS Certified Data Analytics - Specialty (DAS-C01)

The AWS Certified Data Analytics - Specialty (DAS-C01) questions were last updated today.


Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon and Azure.
- Trademarks, certification & product names are used for reference only and belong to Amazon and Azure.

Topic 1 - Exam A

Question #11 Topic 1

A manufacturing company uses Amazon S3 to store its data. The company wants to use AWS Lake Formation to provide granular-level security on those data assets. The data is in Apache Parquet format. The company has set a deadline for a consultant to build a data lake. How should the consultant create the MOST cost-effective solution that meets these requirements?

A Run Lake Formation blueprints to move the data to Lake Formation. Once Lake Formation has the data, apply permissions on Lake Formation.
B To create the data catalog, run an AWS Glue crawler on the existing Parquet data. Register the Amazon S3 path and then apply permissions through Lake Formation to provide granular-level security.
C Install Apache Ranger on an Amazon EC2 instance and integrate with Amazon EMR. Using Ranger policies, create role-based access control for the existing data assets in Amazon S3.
D Create multiple IAM roles for different users and groups. Assign IAM roles to different data assets in Amazon S3 to create table-based and column-based access controls.

Suggested Answer: B
NOTE: The consultant should choose option B because it is the most cost-effective solution that meets the requirements. By running an AWS Glue crawler on the existing Parquet data, the consultant can create a data catalog and register the Amazon S3 path. Then, they can apply permissions through AWS Lake Formation to provide granular-level security.
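
The three steps in option B (crawl, register, grant) can be sketched with boto3. This is a minimal illustration, not a definitive implementation: the crawler name, database, table, column names, and ARNs are all hypothetical placeholders, and the grant step assumes the crawler has already finished and created the table.

```python
def build_lake_formation_requests(bucket_path, database, table,
                                  crawler_role_arn, analyst_arn, columns):
    """Build request payloads for the crawl, register, and grant steps."""
    crawler = {
        "Name": f"{database}-parquet-crawler",  # hypothetical naming scheme
        "Role": crawler_role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket_path}"}]},
    }
    register = {
        "ResourceArn": f"arn:aws:s3:::{bucket_path}",
        "UseServiceLinkedRole": True,
    }
    grant = {
        "Principal": {"DataLakePrincipalIdentifier": analyst_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),  # column-level (granular) access
            }
        },
        "Permissions": ["SELECT"],
    }
    return crawler, register, grant


def apply_requests(crawler, register, grant):
    """Send the payloads to AWS. Needs credentials; not run in this sketch."""
    import boto3  # imported lazily so the payloads can be built offline

    glue = boto3.client("glue")
    lakeformation = boto3.client("lakeformation")
    glue.create_crawler(**crawler)
    glue.start_crawler(Name=crawler["Name"])
    # In practice, wait for the crawler to finish before granting on the table.
    lakeformation.register_resource(**register)
    lakeformation.grant_permissions(**grant)
```

The data stays in place in Amazon S3, which is why this route avoids the cost of copying data with blueprints or running an EMR cluster.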

Question #12 Topic 1

An energy company collects voltage data in real time from sensors that are attached to buildings. The company wants to receive notifications when a sequence of two voltage drops is detected within 10 minutes of a sudden voltage increase at the same building. All notifications must be delivered as quickly as possible. The system must be highly available. The company needs a solution that will automatically scale when this monitoring feature is implemented in other cities. The notification system is subscribed to an Amazon Simple Notification Service (Amazon SNS) topic for remediation. Which solution will meet these requirements?

A Create an Amazon Managed Streaming for Apache Kafka cluster to ingest the data. Use an Apache Spark Streaming with Apache Kafka consumer API in an automatically scaled Amazon EMR cluster to process the incoming data. Use the Spark Streaming application to detect the known event sequence and send the SNS message.
B Create a REST-based web service by using Amazon API Gateway in front of an AWS Lambda function. Create an Amazon RDS for PostgreSQL database with sufficient Provisioned IOPS to meet current demand. Configure the Lambda function to store incoming events in the RDS for PostgreSQL database, query the latest data to detect the known event sequence, and send the SNS message.
C Create an Amazon Kinesis Data Firehose delivery stream to capture the incoming sensor data. Use an AWS Lambda transformation function to detect the known event sequence and send the SNS message.
D Create an Amazon Kinesis data stream to capture the incoming sensor data. Create another stream for notifications. Set up AWS Application Auto Scaling on both streams. Create an Amazon Kinesis Data Analytics for Java application to detect the known event sequence, and add a message to the message stream. Configure an AWS Lambda function to poll the message stream and publish to the SNS topic.

Suggested Answer: D
NOTE: The solution in choice D is the most suitable for the given requirements. It involves using Amazon Kinesis data stream and notifications stream to capture and process the sensor data. AWS Application Auto Scaling ensures that the system automatically scales when implemented in other cities. Amazon Kinesis Data Analytics is used to detect the known event sequence, and an AWS Lambda function is configured to publish the message to the SNS topic.
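
The event-sequence detection itself can be sketched in plain Python. This is only an illustration of the per-building state that a stream-processing application would keep, not Kinesis Data Analytics code. It assumes (hypothetically) that readings have already been classified upstream as "spike" or "drop" and arrive in time order for each building.

```python
WINDOW_SECONDS = 10 * 60  # two drops must follow the spike within 10 minutes


def detect_sequences(events):
    """Return (building_id, spike_ts, second_drop_ts) for each match.

    events: iterable of (building_id, ts_seconds, kind) tuples, where kind
    is "spike" or "drop" (an assumed upstream classification).
    """
    state = {}  # building_id -> [spike_ts, drops_seen]
    alerts = []
    for building, ts, kind in events:
        if kind == "spike":
            # A new spike restarts the pattern for this building.
            state[building] = [ts, 0]
        elif kind == "drop" and building in state:
            spike_ts, drops = state[building]
            if ts - spike_ts > WINDOW_SECONDS:
                # Window expired; forget the spike.
                del state[building]
                continue
            drops += 1
            if drops == 2:
                alerts.append((building, spike_ts, ts))
                del state[building]
            else:
                state[building][1] = drops
    return alerts
```

Each alert would then be written to the notification stream, from which the Lambda function publishes to the SNS topic. Keeping state keyed by building is what lets the workload be split across shards as more cities are added.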

Question #13 Topic 1

A company wants to use an automatic machine learning (ML) Random Cut Forest (RCF) algorithm to visualize complex real-world scenarios, such as detecting seasonality and trends, excluding outliers, and imputing missing values. The team working on this project is non-technical and is looking for an out-of-the-box solution that will require the LEAST amount of management overhead. Which solution will meet these requirements?

A Use an AWS Glue ML transform to create a forecast and then use Amazon QuickSight to visualize the data.
B Use Amazon QuickSight to visualize the data and then use ML-powered forecasting to forecast the key business metrics.
C Use a pre-built ML AMI from the AWS Marketplace to create forecasts and then use Amazon QuickSight to visualize the data.
D Use calculated fields to create a new forecast and then use Amazon QuickSight to visualize the data.

Suggested Answer: B
NOTE: Option B meets the requirements with the least management overhead. Amazon QuickSight's ML-powered forecasting uses a built-in version of the Random Cut Forest algorithm and automatically handles seasonality and trends, excludes outliers, and imputes missing values, so a non-technical team can use it without building or managing any ML infrastructure. A Marketplace ML AMI (option C) would have to be deployed and maintained on Amazon EC2.

Question #14 Topic 1

A data architect is building an Amazon S3 data lake for a bank. The goal is to provide a single data repository for customer data needs, such as personalized recommendations. The bank uses Amazon Kinesis Data Firehose to ingest customers' personal information, bank accounts, and transactions in near-real time from a transactional relational database. The bank requires all personally identifiable information (PII) that is stored in the AWS Cloud to be masked. Which solution will meet these requirements?

A Invoke an AWS Lambda function from Kinesis Data Firehose to mask PII before delivering the data into Amazon S3.
B Use Amazon Macie, and configure it to discover and mask PII.
C Enable server-side encryption (SSE) in Amazon S3.
D Invoke Amazon Comprehend from Kinesis Data Firehose to detect and mask PII before delivering the data into Amazon S3.

Suggested Answer: A
NOTE: The solution that will meet the requirements is to invoke an AWS Lambda function from Kinesis Data Firehose to mask PII before delivering the data into Amazon S3. This allows for real-time masking of personally identifiable information and ensures that the data stored in the S3 data lake is protected.
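
A Firehose transformation Lambda of the kind described in option A might look like the sketch below. It assumes the records are JSON objects and that the PII lives in the hypothetical fields `name`, `ssn`, and `account_number`; the masking rule (keep the last four characters) is also just an illustrative choice.

```python
import base64
import json

# Hypothetical field names; a real job would use the bank's own schema.
PII_FIELDS = ("name", "ssn", "account_number")


def mask_value(value):
    """Replace all but the last four characters with asterisks."""
    text = str(value)
    if len(text) <= 4:
        return "*" * len(text)
    return "*" * (len(text) - 4) + text[-4:]


def lambda_handler(event, context):
    """Mask PII fields in each Firehose record before delivery to S3."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            for field in PII_FIELDS:
                if field in payload:
                    payload[field] = mask_value(payload[field])
            line = (json.dumps(payload) + "\n").encode("utf-8")
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line).decode("utf-8"),
            })
        except (ValueError, TypeError):
            # Undecodable or non-object payloads are flagged as failed
            # rather than being delivered unmasked.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every record is returned with its original `recordId` and a `result` status, which is the response shape Firehose expects from a transformation function.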

Question #15 Topic 1

A company has a business unit uploading .csv files to an Amazon S3 bucket. The company's data platform team has set up an AWS Glue crawler to do discovery, and create tables and schemas. An AWS Glue job writes processed data from the created tables to an Amazon Redshift database. The AWS Glue job handles column mapping and creating the Amazon Redshift table appropriately. When the AWS Glue job is rerun for any reason in a day, duplicate records are introduced into the Amazon Redshift table. Which solution will update the Redshift table without duplicates when jobs are rerun?

A Modify the AWS Glue job to copy the rows into a staging table. Add SQL commands to replace the existing rows in the main table as postactions in the DynamicFrameWriter class.
B Load the previously inserted data into a MySQL database in the AWS Glue job. Perform an upsert operation in MySQL, and copy the results to the Amazon Redshift table.
C Use Apache Spark's DataFrame dropDuplicates() API to eliminate duplicates and then write the data to Amazon Redshift.
D Use the AWS Glue ResolveChoice built-in transform to select the most recent value of the column.

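Suggested Answer: A
NOTE: Option A is correct. Loading the rows into a staging table and then using postactions SQL to replace the matching rows in the main table gives an upsert-style merge entirely within Amazon Redshift, so rerunning the job does not add duplicates. Option B adds an unnecessary MySQL database, dropDuplicates() (option C) only deduplicates within the incoming data and not against rows already in Redshift, and ResolveChoice (option D) resolves ambiguous column types rather than duplicate rows.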
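
For reference, the staging-table pattern described in option A can be sketched as a helper that builds the `preactions` and `postactions` SQL for the Glue Redshift writer. The table names, key column, and database name are hypothetical, and the sketch assumes the main table already exists and has a single key column that identifies a row.

```python
def build_upsert_options(database, target_table, key_column):
    """Build Glue connection_options for a staging-table merge into Redshift."""
    staging = f"{target_table}_staging"
    # Before the load: create an empty staging table shaped like the target.
    preactions = (
        f"DROP TABLE IF EXISTS {staging}; "
        f"CREATE TABLE {staging} AS SELECT * FROM {target_table} WHERE 1=2;"
    )
    # After the load: replace matching rows in the target in one transaction.
    postactions = (
        "BEGIN; "
        f"DELETE FROM {target_table} USING {staging} "
        f"WHERE {staging}.{key_column} = {target_table}.{key_column}; "
        f"INSERT INTO {target_table} SELECT * FROM {staging}; "
        f"DROP TABLE {staging}; "
        "END;"
    )
    return {
        "database": database,
        "dbtable": staging,  # the Glue job writes to staging, not the target
        "preactions": preactions,
        "postactions": postactions,
    }


# Inside a Glue job (not runnable outside Glue), the options would be passed as:
#   glueContext.write_dynamic_frame.from_jdbc_conf(
#       frame=processed_frame,                 # hypothetical DynamicFrame
#       catalog_connection="redshift-conn",    # hypothetical connection name
#       connection_options=build_upsert_options("dev", "sales", "id"),
#       redshift_tmp_dir="s3://example-bucket/tmp/",
#   )
```

Because the delete and insert run in a single transaction, a rerun replaces the rows it loaded earlier instead of appending them again.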