AWS Certified Machine Learning Specialty (MLS-C01)

The AWS Certified Machine Learning Specialty (MLS-C01) were last updated on today.

Viewing page 6 out of 57 pages.
Viewing questions 26-30 out of 285 questions

Disclaimers:

- ExamTopics website is not related to, affiliated with, endorsed or authorized by Amazon.and Azure
- Trademarks, certification & product names are used for reference only and belong to Amazon.and Azure

Topic 1 - Exam A

Question #26 Topic 1

A Machine Learning Specialist is applying a linear least squares regression model to a dataset with 1,000 records and 50 features. Prior to training, the ML Specialist notices that two features are perfectly linearly dependent. Why could this be an issue for the linear least squares regression model?

A It could cause the backpropagation algorithm to fail during training
B It could create a singular matrix during optimization, which fails to define a unique solution
C It could modify the loss function during optimization, causing it to fail during training
D It could introduce non-linear dependencies within the data, which could invalidate the linear assumptions of the model

Suggested Answer: C
NOTE:

Question #27 Topic 1

A Machine Learning Specialist is working for an online retailer that wants to run analytics on every customer visit, processed through a machine learning pipeline. The data needs to be ingested by Amazon Kinesis Data Streams at up to 100 transactions per second, and the JSON data blob is 100 KB in size. What is the MINIMUM number of shards in Kinesis Data Streams the Specialist should use to successfully ingest this data?

A 1 shards
B 10 shards
C 100 shards
D 1,000 shards

Suggested Answer: B
NOTE:

Question #28 Topic 1

A Data Scientist is developing a binary classifier to predict whether a patient has a particular disease on a series of test results. The Data Scientist has data on 400 patients randomly selected from the population. The disease is seen in 3% of the population. Which cross-validation strategy should the Data Scientist adopt?

A A k-fold cross-validation strategy with k=5
B A stratified k-fold cross-validation strategy with k=5
C A k-fold cross-validation strategy with k=5 and 3 repeats
D An 80/20 stratified split between training and validation

Suggested Answer: B
NOTE:

Question #29 Topic 1

A Data Scientist needs to migrate an existing on-premises ETL process to the cloud. The current process runs at regular time intervals and uses PySpark to combine and format multiple large data sources into a single consolidated output for downstream processing. The Data Scientist has been given the following requirements to the cloud solution: ✑ Combine multiple data sources. ✑ Reuse existing PySpark logic. ✑ Run the solution on the existing schedule. ✑ Minimize the number of servers that will need to be managed. Which architecture should the Data Scientist use to build this solution?

A Write the raw data to Amazon S3. Schedule an AWS Lambda function to submit a Spark step to a persistent Amazon EMR cluster based on the existing schedule. Use the existing PySpark logic to run the ETL job on the EMR cluster. Output the results to a ג€processedג€ location in Amazon S3 that is accessible for downstream use.
B Write the raw data to Amazon S3. Create an AWS Glue ETL job to perform the ETL processing against the input data. Write the ETL job in PySpark to leverage the existing logic. Create a new AWS Glue trigger to trigger the ETL job based on the existing schedule. Configure the output target of the ETL job to write to a ג€processedג€ location in Amazon S3 that is accessible for downstream use.
C Write the raw data to Amazon S3. Schedule an AWS Lambda function to run on the existing schedule and process the input data from Amazon S3. Write the Lambda logic in Python and implement the existing PySpark logic to perform the ETL process. Have the Lambda function output the results to a ג€processedג€ location in Amazon S3 that is accessible for downstream use.
D Use Amazon Kinesis Data Analytics to stream the input data and perform real-time SQL queries against the stream to carry out the required transformations within the stream. Deliver the output results to a ג€processedג€ location in Amazon S3 that is accessible for downstream use.

Suggested Answer: D
NOTE:

Question #30 Topic 1

A Data Scientist is building a model to predict customer churn using a dataset of 100 continuous numerical features. The Marketing team has not provided any insight about which features are relevant for churn prediction. The Marketing team wants to interpret the model and see the direct impact of relevant features on the model outcome. While training a logistic regression model, the Data Scientist observes that there is a wide gap between the training and validation set accuracy. Which methods can the Data Scientist use to improve the model performance and satisfy the Marketing team's needs? (Choose two.)

A Add L1 regularization to the classifier
B Add features to the dataset
C Perform recursive feature elimination
D Perform t-distributed stochastic neighbor embedding (t-SNE)
E Perform linear discriminant analysis

Suggested Answer: BE
NOTE: