[Jul-2021] Pass Amazon AWS-Certified-Machine-Learning-Specialty Exam in First Attempt Guaranteed! [Q21-Q46]

NEW QUESTION 21
A Machine Learning Specialist is implementing a full Bayesian network on a dataset that describes public transit in New York City. One of the random variables is discrete, and represents the number of minutes New Yorkers wait for a bus given that the buses cycle every 10 minutes, with a mean of 3 minutes.
Which prior probability distribution should the ML Specialist use for this variable?

A. Uniform distribution
B. Poisson distribution
C. Binomial distribution
D. Normal distribution

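Answer: B

Explanation:
The variable is a discrete, non-negative count with a known mean (3 minutes), which is what a Poisson distribution models. A binomial distribution would need a fixed number of trials and a success probability, neither of which the question supplies.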
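To make the Poisson option concrete, here is a minimal sketch of a Poisson probability mass function evaluated with the 3-minute mean from the question. The function name `poisson_pmf` and the choice to tabulate waits of 0 to 10 minutes are assumptions made for illustration only.

```python
import math

def poisson_pmf(k: int, mean: float) -> float:
    """Probability of observing exactly k under a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

MEAN_WAIT = 3.0  # mean wait in minutes, taken from the question

# Probability of each whole-minute wait from 0 to 10 minutes (illustrative range).
wait_probs = {k: poisson_pmf(k, MEAN_WAIT) for k in range(11)}

for minutes, prob in wait_probs.items():
    print(f"{minutes:2d} min: {prob:.4f}")
```

The distribution is defined only on whole, non-negative numbers and is fully specified by its mean, which matches the description of the variable in the question.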

NEW QUESTION 22
A Machine Learning Specialist is preparing data for training on Amazon SageMaker. The Specialist has transformed the data into a numpy.array, which appears to be negatively affecting the speed of the training. What should the Specialist do to optimize the data for training on SageMaker?

A. Use the SageMaker batch transform feature to transform the training data into a DataFrame
B. Use AWS Glue to compress the data into the Apache Parquet format
C. Transform the dataset into the RecordIO protobuf format
D. Use the SageMaker hyperparameter optimization feature to automatically optimize the data

Answer: C

NEW QUESTION 23
An insurance company is developing a new device for vehicles that uses a camera to observe drivers' behavior and alert them when they appear distracted. The company created approximately 10,000 training images in a controlled environment that a Machine Learning Specialist will use to train and evaluate machine learning models. During the model evaluation, the Specialist notices that the training error rate diminishes faster as the number of epochs increases and the model is not accurately inferring on the unseen test images. Which of the following should be used to resolve this issue? (Select TWO.)

A. Perform data augmentation on the training data
B. Add L2 regularization to the model
C. Make the neural network architecture complex.
D. Use gradient checking in the model
E. Add vanishing gradient to the model

Answer: A,B
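Data augmentation and L2 regularization are the two standard remedies for overfitting among the options. The sketch below shows one simple form of each in plain Python: a horizontal flip for augmentation and an L2 penalty added to a loss and its gradient step. The function names and the list-of-rows image layout are assumptions for illustration, not part of any particular framework.

```python
def flip_horizontal(image):
    """Return a left-right mirrored copy of an image stored as a list of pixel rows."""
    return [list(reversed(row)) for row in image]

def l2_penalized_loss(data_loss, weights, lam):
    """Add an L2 penalty, lam * sum(w^2), to an already computed data loss."""
    return data_loss + lam * sum(w * w for w in weights)

def sgd_step_with_l2(weights, grads, lr, lam):
    """One gradient step on the penalized loss; the penalty contributes 2 * lam * w."""
    return [w - lr * (g + 2.0 * lam * w) for w, g in zip(weights, grads)]

# Augmentation: each training image yields an extra mirrored example.
image = [[1, 2, 3],
         [4, 5, 6]]
augmented = [image, flip_horizontal(image)]

# Regularization: with a zero data gradient, the weights still shrink toward zero.
print(sgd_step_with_l2([1.0, -2.0], [0.0, 0.0], lr=0.1, lam=0.5))
```

Augmentation enlarges the effective training set, while the penalty discourages large weights; both reduce the gap between training and test error.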

NEW QUESTION 24
A Machine Learning Specialist uploads a dataset to an Amazon S3 bucket protected with server-side encryption using AWS KMS.
How should the ML Specialist define the Amazon SageMaker notebook instance so it can read the same dataset from Amazon S3?

A. Assign an IAM role to the Amazon SageMaker notebook with S3 read access to the dataset. Grant permission in the KMS key policy to that role.
B. Define security group(s) to allow all HTTP inbound/outbound traffic and assign those security group(s) to the Amazon SageMaker notebook instance.
C. Assign the same KMS key used to encrypt data in Amazon S3 to the Amazon SageMaker notebook instance.
D. Configure the Amazon SageMaker notebook instance to have access to the VPC. Grant permission in the KMS key policy to the notebook's KMS role.

Answer: A

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/encryption-at-rest.html
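Reading an SSE-KMS encrypted object needs two permissions: S3 read access on the notebook's IAM role, and a key policy statement that lets the same role use the KMS key. The sketch below builds both documents as Python dictionaries. The role ARN, account ID, bucket name, and `Sid` are hypothetical placeholders, and a real key policy would contain further statements.

```python
import json

# Hypothetical identifiers, used only for illustration.
ROLE_ARN = "arn:aws:iam::111122223333:role/ExampleSageMakerNotebookRole"
BUCKET = "example-encrypted-dataset-bucket"

# Identity policy attached to the notebook's IAM role: read access to the dataset.
s3_read_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:ListBucket"],
         "Resource": f"arn:aws:s3:::{BUCKET}"},
        {"Effect": "Allow", "Action": ["s3:GetObject"],
         "Resource": f"arn:aws:s3:::{BUCKET}/*"},
    ],
}

# Statement added to the KMS key policy so the same role can decrypt the objects.
kms_key_policy_statement = {
    "Sid": "AllowNotebookRoleToDecrypt",
    "Effect": "Allow",
    "Principal": {"AWS": ROLE_ARN},
    "Action": ["kms:Decrypt", "kms:DescribeKey"],
    "Resource": "*",
}

print(json.dumps(s3_read_policy, indent=2))
print(json.dumps(kms_key_policy_statement, indent=2))
```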

NEW QUESTION 25
A company is interested in building a fraud detection model. Currently, the Data Scientist does not have a sufficient amount of information due to the low number of fraud cases.
Which method is MOST likely to detect the GREATEST number of valid fraud cases?

A. Undersampling
B. Class weight adjustment
C. Oversampling using bootstrapping
D. Oversampling using SMOTE

Answer: D

Explanation:
With datasets that are not fully populated, the Synthetic Minority Over-sampling Technique (SMOTE) adds new information by adding synthetic data points to the minority class. This technique would be the most effective in this scenario. Refer to Section 4.2 at this link for supporting information.
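The core idea of SMOTE, interpolating between a minority sample and a nearby minority sample, can be sketched in a few lines. This is a simplified illustration, not the full algorithm: it always uses the single nearest neighbor, whereas SMOTE picks randomly among the k nearest. The function name and the sample points are hypothetical.

```python
import random

def smote_like_samples(minority, n_new, seed=0):
    """Create n_new synthetic points on segments between minority points and their nearest neighbor."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Nearest other minority point by squared Euclidean distance (k = 1 simplification).
        neighbor = min(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic

# Hypothetical fraud cases described by two numeric features.
fraud_cases = [(1.0, 1.0), (2.0, 1.5), (1.5, 2.0)]
new_cases = smote_like_samples(fraud_cases, n_new=5)
print(new_cases)
```

Unlike bootstrapped oversampling, which repeats existing rows, every generated point here is new but stays inside the region already occupied by the minority class.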

NEW QUESTION 26
A Machine Learning Specialist needs to be able to ingest streaming data and store it in Apache Parquet files for exploration and analysis. Which of the following services would both ingest and store this data in the correct format?

A. Amazon Kinesis Data Firehose
B. Amazon Kinesis Data Streams
C. AWS DMS
D. Amazon Kinesis Data Analytics

Answer: A

NEW QUESTION 27
A Data Science team within a large company uses Amazon SageMaker notebooks to access data stored in Amazon S3 buckets. The IT Security team is concerned that internet-enabled notebook instances create a security vulnerability where malicious code running on the instances could compromise data privacy. The company mandates that all instances stay within a secured VPC with no internet access, and data communication traffic must stay within the AWS network.
How should the Data Science team configure the notebook instance placement to meet these requirements?

A. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has a NAT gateway and an associated security group allowing only outbound connections to Amazon S3 and Amazon SageMaker.
B. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Ensure the VPC has S3 VPC endpoints and Amazon SageMaker VPC endpoints attached to it.
C. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Use IAM policies to grant access to Amazon S3 and Amazon SageMaker.
D. Associate the Amazon SageMaker notebook with a private subnet in a VPC. Place the Amazon SageMaker endpoint and S3 buckets within the same VPC.

Answer: B

NEW QUESTION 28
A Mobile Network Operator is building an analytics platform to analyze and optimize a company's operations using Amazon Athena and Amazon S3. The source systems send data in CSV format in real time. The Data Engineering team wants to transform the data to the Apache Parquet format before storing it on Amazon S3. Which solution takes the LEAST effort to implement?

A. Ingest CSV data from Amazon Kinesis Data Streams and use AWS Glue to convert data into Parquet.
B. Ingest CSV data using Apache Spark Structured Streaming in an Amazon EMR cluster and use Apache Spark to convert data into Parquet.
C. Ingest CSV data using Apache Kafka Streams on Amazon EC2 instances and use Kafka Connect S3 to serialize data as Parquet.
D. Ingest CSV data from Amazon Kinesis Data Streams and use Amazon Kinesis Data Firehose to convert data into Parquet.

Answer: D

NEW QUESTION 29
An agency collects census information within a country to determine healthcare and social program needs by province and city. The census form collects responses for approximately 500 questions from each citizen.
Which combination of algorithms would provide the appropriate insights? (Select TWO.)

A. The factorization machines (FM) algorithm
B. The k-means algorithm
C. The Random Cut Forest (RCF) algorithm
D. The Latent Dirichlet Allocation (LDA) algorithm
E. The principal component analysis (PCA) algorithm

Answer: B,E

Explanation:
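PCA reduces the roughly 500 responses collected from each citizen to a smaller set of components, and k-means then groups citizens with similar response profiles so that needs can be compared by province and city.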

NEW QUESTION 30
A company wants to classify user behavior as either fraudulent or normal. Based on internal research, a Machine Learning Specialist would like to build a binary classifier based on two features: age of account and transaction month. The class distribution for these features is illustrated in the figure provided.

Based on this information, which model would have the HIGHEST recall with respect to the fraudulent class?

A. Naive Bayesian classifier
B. Single Perceptron with sigmoidal activation function
C. Linear support vector machine (SVM)
D. Decision tree

Answer: A

NEW QUESTION 31
A Data Scientist needs to analyze employment data. The dataset contains approximately 10 million observations on people across 10 different features. During the preliminary analysis, the Data Scientist notices that income and age distributions are not normal. While income levels show a right skew as expected, with fewer individuals having a higher income, the age distribution also shows a right skew, with fewer older individuals participating in the workforce.
Which feature transformations can the Data Scientist apply to fix the incorrectly skewed data? (Choose two.)

A. Logarithmic transformation
B. Cross-validation
C. One hot encoding
D. High-degree polynomial transformation
E. Numerical value binning

Answer: A,E
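A logarithmic transformation and numerical binning are the two options that actually transform a skewed numeric feature. The sketch below applies each in plain Python. The sample incomes, ages, and bin edges are hypothetical values chosen for illustration.

```python
import math

def log_transform(values):
    """Compress a right-skewed, non-negative feature with log(1 + x)."""
    return [math.log1p(v) for v in values]

def bin_value(value, edges):
    """Return the index of the bin a value falls into; edges are ascending upper bounds."""
    for index, edge in enumerate(edges):
        if value < edge:
            return index
    return len(edges)  # last, open-ended bin

# Hypothetical right-skewed incomes: the outlier dominates before the transform.
incomes = [20_000, 35_000, 50_000, 75_000, 1_000_000]
print(log_transform(incomes))

# Hypothetical age bins: under 25, 25-34, 35-49, 50-64, 65 and over.
AGE_EDGES = [25, 35, 50, 65]
ages = [19, 28, 41, 58, 72]
print([bin_value(age, AGE_EDGES) for age in ages])
```

After the log transform the largest income is about 1.4 times the smallest instead of 50 times, and binning replaces the long tail of ages with a handful of ordered categories.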

NEW QUESTION 32
A large mobile network operating company is building a machine learning model to predict customers who are likely to unsubscribe from the service. The company plans to offer an incentive for these customers as the cost of churn is far greater than the cost of the incentive.
The model produces the following confusion matrix after evaluating on a test dataset of 100 customers:

Based on the model evaluation results, why is this a viable model for production?

A. The precision of the model is 86%, which is less than the accuracy of the model.
B. The precision of the model is 86%, which is greater than the accuracy of the model.
C. The model is 86% accurate and the cost incurred by the company as a result of false positives is less than the false negatives.
D. The model is 86% accurate and the cost incurred by the company as a result of false negatives is less than the false positives.

Answer: C
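The metrics named in the options can be computed directly from confusion-matrix counts. Because the matrix itself is not reproduced here, the counts below are hypothetical values chosen to give 86% accuracy on 100 customers, and the unit costs are likewise assumptions used only to illustrate the cost comparison.

```python
# Hypothetical counts for a 100-customer test set (positive class = churn).
TP, FN, FP, TN = 10, 4, 10, 76

accuracy = (TP + TN) / (TP + FN + FP + TN)
precision = TP / (TP + FP)   # share of predicted churners who really churn
recall = TP / (TP + FN)      # share of real churners the model catches

# Hypothetical unit costs: losing a customer costs far more than an incentive.
COST_MISSED_CHURNER = 100.0  # cost of one false negative
COST_INCENTIVE = 10.0        # cost of one false positive

cost_false_negatives = FN * COST_MISSED_CHURNER
cost_false_positives = FP * COST_INCENTIVE

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
print(f"FN cost={cost_false_negatives:.0f} FP cost={cost_false_positives:.0f}")
```

With churn costing far more than the incentive, a model that produces some false positives is acceptable as long as it keeps false negatives low.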

NEW QUESTION 33
A Data Scientist wants to gain real-time insights into a data stream of GZIP files. Which solution would allow the use of SQL to query the stream with the LEAST latency?

A. Amazon Kinesis Data Firehose to transform the data and put it into an Amazon S3 bucket.
B. Amazon Kinesis Data Analytics with an AWS Lambda function to transform the data.
C. AWS Glue with a custom ETL script to transform the data.
D. An Amazon Kinesis Client Library to transform the data and save it to an Amazon ES cluster.

Answer: B

NEW QUESTION 34
A Data Scientist wants to gain real-time insights into a data stream of GZIP files. Which solution would allow the use of SQL to query the stream with the LEAST latency?

A. Amazon Kinesis Data Analytics with an AWS Lambda function to transform the data.
B. AWS Glue with a custom ETL script to transform the data.
C. An Amazon Kinesis Client Library to transform the data and save it to an Amazon ES cluster.
D. Amazon Kinesis Data Firehose to transform the data and put it into an Amazon S3 bucket.

Answer: A

NEW QUESTION 35
A Machine Learning Specialist is building a model that will perform time series forecasting using Amazon SageMaker. The Specialist has finished training the model and is now planning to perform load testing on the endpoint so they can configure Auto Scaling for the model variant.
Which approach will allow the Specialist to review the latency, memory utilization, and CPU utilization during the load test?

A. Generate an Amazon CloudWatch dashboard to create a single view for the latency, memory utilization, and CPU utilization metrics that are outputted by Amazon SageMaker.
B. Send Amazon CloudWatch Logs that were generated by Amazon SageMaker to Amazon ES and use Kibana to query and visualize the log data.
C. Build custom Amazon CloudWatch Logs and then leverage Amazon ES and Kibana to query and visualize the log data as it is generated by Amazon SageMaker.
D. Review SageMaker logs that have been written to Amazon S3 by leveraging Amazon Athena and Amazon QuickSight to visualize logs as they are being produced.

Answer: A

Explanation:
https://docs.aws.amazon.com/sagemaker/latest/dg/monitoring-cloudwatch.html

NEW QUESTION 36
An online reseller has a large, multi-column dataset with one column missing 30% of its data. A Machine Learning Specialist believes that certain columns in the dataset could be used to reconstruct the missing data.
Which reconstruction approach should the Specialist use to preserve the integrity of the dataset?

A. Mean substitution
B. Listwise deletion
C. Last observation carried forward
D. Multiple imputation

Answer: D

Explanation:
https://worldwidescience.org/topicpages/i/imputing+missing+values.html

NEW QUESTION 37
A manufacturing company has a large set of labeled historical sales data. The manufacturer would like to predict how many units of a particular part should be produced each quarter.
Which machine learning approach should be used to solve this problem?

A. Random Cut Forest (RCF)
B. Logistic regression
C. Principal component analysis (PCA)
D. Linear regression

Answer: D

Explanation:
https://docs.aws.amazon.com/zh_tw/machine-learning/latest/dg/regression-model-insights.html
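Predicting a quantity such as units per quarter is a regression problem. As a minimal sketch, the code below fits a one-variable least-squares line and uses it to forecast the next quarter. The quarterly figures are hypothetical, and a real model would use more features than the quarter index.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical labeled history: quarter index and units sold.
quarters = [1, 2, 3, 4]
units_sold = [100, 120, 140, 160]

slope, intercept = fit_line(quarters, units_sold)
forecast_q5 = slope * 5 + intercept
print(f"slope={slope:.1f} intercept={intercept:.1f} forecast for quarter 5={forecast_q5:.1f}")
```

The output is a continuous number of units, which is what separates this task from logistic regression, where the target is a class.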

NEW QUESTION 38
A Machine Learning Specialist is developing a daily ETL workflow containing multiple ETL jobs. The workflow consists of the following processes:
* Start the workflow as soon as data is uploaded to Amazon S3
* When all the datasets are available in Amazon S3, start an ETL job to join the uploaded datasets with multiple terabyte-sized datasets already stored in Amazon S3
* Store the results of joining datasets in Amazon S3
* If one of the jobs fails, send a notification to the Administrator
Which configuration will meet these requirements?

A. Use AWS Lambda to chain other Lambda functions to read and join the datasets in Amazon S3 as soon as the data is uploaded to Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
B. Develop the ETL workflow using AWS Batch to trigger the start of ETL jobs when data is uploaded to Amazon S3. Use AWS Glue to join the datasets in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
C. Use AWS Lambda to trigger an AWS Step Functions workflow to wait for dataset uploads to complete in Amazon S3. Use AWS Glue to join the datasets. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.
D. Develop the ETL workflow using AWS Lambda to start an Amazon SageMaker notebook instance. Use a lifecycle configuration script to join the datasets and persist the results in Amazon S3. Use an Amazon CloudWatch alarm to send an SNS notification to the Administrator in the case of a failure.

Answer: C

NEW QUESTION 39
A Machine Learning Specialist has completed a proof of concept for a company using a small data sample, and now the Specialist is ready to implement an end-to-end solution in AWS using Amazon SageMaker. The historical training data is stored in Amazon RDS.
Which approach should the Specialist use for training a model using that data?

A. Move the data to Amazon DynamoDB and set up a connection to DynamoDB within the notebook to pull data in.
B. Move the data to Amazon ElastiCache using AWS DMS and set up a connection within the notebook to pull data in for fast access.
C. Write a direct connection to the SQL database within the notebook and pull data in
D. Push the data from Microsoft SQL Server to Amazon S3 using an AWS Data Pipeline and provide the S3 location within the notebook.

Answer: D

NEW QUESTION 40
A Machine Learning Specialist is training a model to identify the make and model of vehicles in images. The Specialist wants to use transfer learning and an existing model trained on images of general objects. The Specialist collated a large custom dataset of pictures containing different vehicle makes and models.
What should the Specialist do to initialize the model to re-train it with the custom data?

A. Initialize the model with pre-trained weights in all layers including the last fully connected layer.
B. Initialize the model with pre-trained weights in all layers and replace the last fully connected layer.
C. Initialize the model with random weights in all layers including the last fully connected layer.
D. Initialize the model with random weights in all layers and replace the last fully connected layer.

Answer: B

NEW QUESTION 41
An interactive online dictionary wants to add a widget that displays words used in similar contexts. A Machine Learning Specialist is asked to provide word features for the downstream nearest neighbor model powering the widget.
What should the Specialist do to meet these requirements?

A. Create word embedding factors that store edit distance with every other word.
B. Create one-hot word encoding vectors.
C. Download word embeddings pre-trained on a large corpus.
D. Produce a set of synonyms for every word using Amazon Mechanical Turk.

Answer: C
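A nearest neighbor model needs features in which related words sit close together. The sketch below compares one-hot vectors with dense embeddings using cosine similarity. The three-dimensional embedding values are made up for illustration and do not come from any real pre-trained model.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def nearest(word, vectors):
    """Return the other word whose vector is most similar to the given word's vector."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine(vectors[word], vectors[w]),
    )

# One-hot encoding: every pair of distinct words is orthogonal.
one_hot = {"king": (1, 0, 0), "queen": (0, 1, 0), "apple": (0, 0, 1)}

# Hypothetical dense embeddings standing in for pre-trained vectors.
toy_embeddings = {
    "king": (0.90, 0.80, 0.10),
    "queen": (0.85, 0.90, 0.15),
    "apple": (0.10, 0.20, 0.95),
}

print(cosine(one_hot["king"], one_hot["queen"]))  # no similarity signal
print(nearest("king", toy_embeddings))
```

With one-hot vectors every word is equally far from every other word, so a nearest neighbor search has nothing to rank; embeddings learned from context supply that signal.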

NEW QUESTION 42
A Machine Learning Specialist is working with a large company to leverage machine learning within its products. The company wants to group its customers into categories based on which customers will and will not churn within the next 6 months. The company has labeled the data available to the Specialist.
Which machine learning model type should the Specialist use to accomplish this task?

A. Clustering
B. Linear regression
C. Classification
D. Reinforcement learning

Answer: C

Explanation:
The goal of classification is to determine to which class or category a data point (customer in our case) belongs to. For classification problems, data scientists would use historical data with predefined target variables AKA labels (churner/non-churner) - answers that need to be predicted - to train an algorithm. With classification, businesses can answer the following questions:
* Will this customer churn or not?
* Will a customer renew their subscription?
* Will a user downgrade a pricing plan?
* Are there any signs of unusual customer behavior?

NEW QUESTION 43
A Machine Learning Specialist works for a credit card processing company and needs to predict which transactions may be fraudulent in near-real time. Specifically, the Specialist must train a model that returns the probability that a given transaction may be fraudulent.
How should the Specialist frame this business problem?

A. Binary classification
B. Streaming classification
C. Regression classification
D. Multi-category classification

Answer: A
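A model that returns the probability of one of two outcomes is a binary classifier. As a minimal sketch, the logistic function below maps a weighted sum of transaction features to a probability between 0 and 1. The feature values, weights, and bias are hypothetical and not trained on any data.

```python
import math

def fraud_probability(features, weights, bias):
    """Logistic model: squash a weighted sum of features into a probability."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical, untrained parameters for two scaled features.
WEIGHTS = [1.5, 2.0]
BIAS = -3.0

transaction = [0.2, 0.1]   # hypothetical low-risk feature values
suspicious = [1.8, 1.5]    # hypothetical high-risk feature values

print(f"{fraud_probability(transaction, WEIGHTS, BIAS):.3f}")
print(f"{fraud_probability(suspicious, WEIGHTS, BIAS):.3f}")
```

The two classes are fraudulent and not fraudulent; the probability can be thresholded to produce the final label.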

NEW QUESTION 44
A Machine Learning Specialist is using an Amazon SageMaker notebook instance in a private subnet of a corporate VPC. The ML Specialist has important data stored on the Amazon SageMaker notebook instance's Amazon EBS volume, and needs to take a snapshot of that EBS volume. However the ML Specialist cannot find the Amazon SageMaker notebook instance's EBS volume or Amazon EC2 instance within the VPC.
Why is the ML Specialist not seeing the instance in the VPC?

A. Amazon SageMaker notebook instances are based on AWS ECS instances running within AWS service accounts.
B. Amazon SageMaker notebook instances are based on the EC2 instances within the customer account, but they run outside of VPCs.
C. Amazon SageMaker notebook instances are based on the Amazon ECS service within customer accounts.
D. Amazon SageMaker notebook instances are based on EC2 instances running within AWS service accounts.

Answer: D

NEW QUESTION 45
A Machine Learning Specialist needs to create a data repository to hold a large amount of time-based training data for a new model. In the source system, new files are added every hour. Throughout a single 24-hour period, the volume of hourly updates will change significantly. The Specialist always wants to train on the last 24 hours of the data.
Which type of data repository is the MOST cost-effective solution?

A. An Amazon EMR cluster with hourly hive partitions on Amazon EBS volumes
B. An Amazon RDS database with hourly table partitions
C. An Amazon EBS-backed Amazon EC2 instance with hourly directories
D. An Amazon S3 data lake with hourly object prefixes

Answer: D
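With hourly object prefixes, selecting the last 24 hours of data comes down to generating 24 prefix strings. The sketch below does that with the standard library. The `training-data/YYYY/MM/DD/HH/` layout and the function name are assumptions for illustration, not a layout the question prescribes.

```python
from datetime import datetime, timedelta, timezone

def last_24_hour_prefixes(now, root="training-data"):
    """Return prefixes for the 24 most recent complete hours, newest first."""
    top_of_hour = now.replace(minute=0, second=0, microsecond=0)
    return [
        (top_of_hour - timedelta(hours=h)).strftime(f"{root}/%Y/%m/%d/%H/")
        for h in range(1, 25)
    ]

# Fixed timestamp so the output is reproducible.
now = datetime(2021, 7, 15, 10, 30, tzinfo=timezone.utc)
prefixes = last_24_hour_prefixes(now)
print(prefixes[0])
print(prefixes[-1])
```

A training job can list objects under each prefix, so storage scales with the hourly volume and no cluster, database, or instance has to stay running between updates.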

NEW QUESTION 46
......
