AWS Certified Machine Learning Specialty (MLS-C01)

The AWS Certified Machine Learning Specialty (MLS-C01) were last updated on today.

Viewing page 7 out of 57 pages.
Viewing questions 31-35 out of 285 questions

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.and Azure
- Trademarks, certification & product names are used for reference only and belong to Amazon.and Azure

Topic 1 - Exam A

Question #31 Topic 1

A Machine Learning Specialist is using an Amazon SageMaker notebook instance in a private subnet of a corporate VPC. The ML Specialist has important data stored on the Amazon SageMaker notebook instance's Amazon EBS volume, and needs to take a snapshot of that EBS volume. However, the ML Specialist cannot find the Amazon SageMaker notebook instance's EBS volume or Amazon EC2 instance within the VPC. Why is the ML Specialist not seeing the instance visible in the VPC?

A Amazon SageMaker notebook instances are based on the EC2 instances within the customer account, but they run outside of VPCs.
B Amazon SageMaker notebook instances are based on the Amazon ECS service within customer accounts.
C Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts.
D Amazon SageMaker notebook instances are based on AWS ECS instances running within AWS service accounts.

Suggested Answer: C
NOTE: Reference: https://docs.aws.amazon.com/sagemaker/latest/dg/gs-setup-working-env.html

Question #32 Topic 1

A web-based company wants to improve its conversion rate on its landing page. Using a large historical dataset of customer visits, the company has repeatedly trained a multi-class deep learning network algorithm on Amazon SageMaker. However, there is an overfitting problem: training data shows 90% accuracy in predictions, while test data shows 70% accuracy only. The company needs to boost the generalization of its model before deploying it into production to maximize conversions of visits to purchases. Which action is recommended to provide the HIGHEST accuracy model for the company's test and validation data?

A Increase the randomization of training data in the mini-batches used in training
B Allocate a higher proportion of the overall data to the training dataset
C Apply L1 or L2 regularization and dropouts to the training
D Reduce the number of layers and units (or neurons) from the deep learning network

Suggested Answer: D
NOTE:

Question #33 Topic 1

A Machine Learning Specialist must build out a process to query a dataset on Amazon S3 using Amazon Athena. The dataset contains more than 800,000 records stored as plaintext CSV files. Each record contains 200 columns and is approximately 1.5 MB in size. Most queries will span 5 to 10 columns only. How should the Machine Learning Specialist transform the dataset to minimize query runtime?

A Convert the records to Apache Parquet format.
B Convert the records to JSON format.
C Convert the records to GZIP CSV format.
D Convert the records to XML format.

Suggested Answer: A
NOTE: Using compressions will reduce the amount of data scanned by Amazon Athena, and also reduce your S3 bucket storage. It's a Win-Win for your AWS bill. Supported formats: GZIP, LZO, SNAPPY (Parquet) and ZLIB. Reference: https://www.cloudforecast.io/blog/using-parquet-on-athena-to-save-money-on-aws/

Question #34 Topic 1

A company uses a long short-term memory (LSTM) model to evaluate the risk factors of a particular energy sector. The model reviews multi-page text documents to analyze each sentence of the text and categorize it as either a potential risk or no risk. The model is not performing well, even though the Data Scientist has experimented with many different network structures and tuned the corresponding hyperparameters. Which approach will provide the MAXIMUM performance boost?

A Initialize the words by term frequency-inverse document frequency (TF-IDF) vectors pretrained on a large collection of news articles related to the energy sector.
B Use gated recurrent units (GRUs) instead of LSTM and run the training process until the validation loss stops decreasing.
C Reduce the learning rate and run the training process until the training loss stops decreasing.
D Initialize the words by word2vec embeddings pretrained on a large collection of news articles related to the energy sector.

Suggested Answer: C
NOTE:

Question #35 Topic 1

A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL jobs. The workflow consists of the following processes: * Start the workflow as soon as data is uploaded to Amazon S3. * When all the datasets are available in Amazon S3, start an ETL job to join the uploaded datasets with multiple terabyte-sized datasets already stored in Amazon S3. * Store the results of joining datasets in Amazon S3. * If one of the jobs fails, send a notification to the Administrator. Which configuration will meet these requirements?

A Use AWS Lambda to trigger an AWS Step Functions workflow to wait for dataset uploads to complete in Amazon S3. Use AWS Glue to join the datasets. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
B Develop the ETL workflow using AWS Lambda to start an Amazon SageMaker notebook instance. Use a lifecycle configuration script to join the datasets and persist the results in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
C Develop the ETL workflow using AWS Batch to trigger the start of ETL jobs when data is uploaded to Amazon S3. Use AWS Glue to join the datasets in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
D Use AWS Lambda to chain other Lambda functions to read and join the datasets in Amazon S3 as soon as the data is uploaded to Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.

Suggested Answer: A
NOTE: Reference: https://aws.amazon.com/step-functions/use-cases/