A data scientist needs to migrate an existing on-premises ETL process to the cloud. The current process runs on a regular schedule and uses PySpark to combine and format multiple large data sources into a single consolidated output for downstream processing. The data scientist has been given the following requirements for the cloud solution: ✑ Combine multiple data sources. ✑ Reuse existing PySpark logic. ✑ Run the solution on the existing schedule. ✑ Minimize the number of servers that will have to be managed. Which architecture should the data scientist use to build this solution?
-
A
Write the raw data to Amazon S3. Schedule an AWS Lambda function to submit a Spark step to a persistent Amazon EMR cluster based on the existing schedule. Use the existing PySpark logic to run the ETL job on the EMR cluster. Output the results to a "processed" location in Amazon S3 that is accessible for downstream use.
-
B
Write the raw data to Amazon S3. Create an AWS Glue ETL job to perform the ETL processing against the input data. Write the ETL job in PySpark to leverage the existing logic. Create a new AWS Glue trigger to trigger the ETL job based on the existing schedule. Configure the output target of the ETL job to write to a "processed" location in Amazon S3 that is accessible for downstream use.
-
C
Write the raw data to Amazon S3. Schedule an AWS Lambda function to run on the existing schedule and process the input data from Amazon S3. Write the Lambda logic in Python and implement the existing PySpark logic to perform the ETL process. Have the Lambda function output the results to a "processed" location in Amazon S3 that is accessible for downstream use.
-
D
Use Amazon Kinesis Data Analytics to stream the input data and perform real-time SQL queries against the stream to carry out the required transformations within the stream. Deliver the output results to a "processed" location in Amazon S3 that is accessible for downstream use.
Correct answer:
B
Explanation:
Option B satisfies all four requirements: AWS Glue is serverless (no servers to manage), Glue ETL jobs are written in PySpark so the existing logic can be reused, a single job can merge multiple data sources, and a Glue trigger can run the job on the existing schedule. Option A requires managing a persistent EMR cluster. Option C cannot run PySpark inside Lambda, and Lambda's runtime and memory limits are unsuited to large batch merges. Option D (originally listed as the answer) is incorrect: Kinesis Data Analytics is for real-time SQL over streams, not scheduled batch PySpark jobs, and does not reuse the existing logic.
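A minimal sketch of the Glue PySpark job described in option B. This is illustrative only: the bucket names, paths, join key, and column names are hypothetical stand-ins for the "existing PySpark logic", and the script runs only inside the AWS Glue job runtime (which provides the `awsglue` library), not locally.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext

# Standard Glue job boilerplate: wrap a SparkContext in a GlueContext
# and initialize the job so Glue can track bookmarks and job state.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext())
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Existing PySpark logic, reused as-is: read the raw sources from S3
# and merge them into a single consolidated output.
# (Bucket, prefixes, and "customer_id" are hypothetical examples.)
orders = spark.read.parquet("s3://example-bucket/raw/orders/")
customers = spark.read.parquet("s3://example-bucket/raw/customers/")
merged = orders.join(customers, on="customer_id", how="left")

# Write to the "processed" location that downstream consumers read.
merged.write.mode("overwrite").parquet("s3://example-bucket/processed/")

job.commit()
```

A Glue trigger of type `SCHEDULED` with a cron expression (or the existing scheduler calling the `StartJobRun` API) then runs this job on the current schedule with no servers to manage.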