Professional-Machine-Learning-Engineer Exam Dumps - Google Professional Machine Learning Engineer

Question # 17

You work for a bank and are building a random forest model for fraud detection. You have a dataset that includes transactions, of which 1% are identified as fraudulent. Which data transformation strategy would likely improve the performance of your classifier?

Write your data in TFRecords.

Z-normalize all the numeric features.

Oversample the fraudulent transactions 10 times.

Use one-hot encoding on all categorical features.
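
For readers unfamiliar with the oversampling option listed above, here is a minimal sketch of what repeating the minority class 10 times does to the class balance. The function name `oversample_minority`, the synthetic rows, and the `amount` field are assumptions made for illustration; they do not come from the question.

```python
import random

def oversample_minority(rows, labels, minority_label=1, factor=10, seed=0):
    """Return shuffled copies of rows/labels with each minority row repeated `factor` times."""
    rng = random.Random(seed)
    resampled = []
    for row, label in zip(rows, labels):
        # Minority rows are duplicated; majority rows are kept once.
        copies = factor if label == minority_label else 1
        resampled.extend([(row, label)] * copies)
    rng.shuffle(resampled)
    new_rows = [row for row, _ in resampled]
    new_labels = [label for _, label in resampled]
    return new_rows, new_labels

# Synthetic stand-in for the transactions table: 1,000 rows, 1% labeled as fraud.
rows = [{"amount": float(i)} for i in range(1000)]
labels = [1 if i < 10 else 0 for i in range(1000)]

new_rows, new_labels = oversample_minority(rows, labels)
fraud_share = sum(new_labels) / len(new_labels)
print(len(new_labels), round(fraud_share, 3))  # 1090 0.092
```

In practice the duplication is usually applied to the training split only, so that evaluation still reflects the original 1% fraud rate.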

Question # 18

You are an AI architect at a popular photo-sharing social media platform. Your organization's content moderation team currently scans images uploaded by users and removes explicit images manually. You want to implement an AI service to automatically prevent users from uploading explicit images. What should you do?

Develop a custom TensorFlow model in a Vertex AI Workbench instance. Train the model on a dataset of manually labeled images. Deploy the model to a Vertex AI endpoint. Run periodic batch inference to identify inappropriate uploads and report them to the content moderation team.

Train an image clustering model using TensorFlow in a Vertex AI Workbench instance. Deploy this model to a Vertex AI endpoint and configure it for online inference. Run this model each time a new image is uploaded to identify and block inappropriate uploads.

Create a dataset using manually labeled images. Ingest this dataset into AutoML. Train an image classification model and deploy it to a Vertex AI endpoint. Integrate this endpoint with the image upload process to identify and block inappropriate uploads. Monitor predictions and periodically retrain the model.

Send a copy of every user-uploaded image to a Cloud Storage bucket. Configure a Cloud Run function that triggers the Cloud Vision API to detect explicit content each time a new image is uploaded. Report the classifications to the content moderation team for review.

Question # 19

You developed a custom model by using Vertex AI to predict your application's user churn rate. You are using Vertex AI Model Monitoring for skew detection. The training data stored in BigQuery contains two sets of features: demographic and behavioral. You later discover that two separate models trained on each set perform better than the original model.

You need to configure a new model monitoring pipeline that splits traffic among the two models. You want to use the same prediction-sampling-rate and monitoring-frequency for each model. You also want to minimize management effort. What should you do?

Keep the training dataset as is. Deploy the models to two separate endpoints and submit two Vertex AI Model Monitoring jobs with appropriately selected feature-thresholds parameters.

Keep the training dataset as is. Deploy both models to the same endpoint and submit a Vertex AI Model Monitoring job with a monitoring-config-from parameter that accounts for the model IDs and feature selections.

Separate the training dataset into two tables based on demographic and behavioral features. Deploy the models to two separate endpoints, and submit two Vertex AI Model Monitoring jobs.

Separate the training dataset into two tables based on demographic and behavioral features. Deploy both models to the same endpoint and submit a Vertex AI Model Monitoring job with a monitoring-config-from parameter that accounts for the model IDs and training datasets.

Question # 20

You recently developed a wide and deep model in TensorFlow. You generated training datasets using a SQL script that preprocessed raw data in BigQuery by performing instance-level transformations of the data. You need to create a training pipeline to retrain the model on a weekly basis. The trained model will be used to generate daily recommendations. You want to minimize model development and training time. How should you develop the training pipeline?

Use the Kubeflow Pipelines SDK to implement the pipeline. Use the BigQueryJobOp component to run the preprocessing script and the CustomTrainingJobOp component to launch a Vertex AI training job.

Use the Kubeflow Pipelines SDK to implement the pipeline. Use the DataflowPythonJobOp component to preprocess the data and the CustomTrainingJobOp component to launch a Vertex AI training job.

Use the TensorFlow Extended SDK to implement the pipeline. Use the ExampleGen component with the BigQuery executor to ingest the data, the Transform component to preprocess the data, and the Trainer component to launch a Vertex AI training job.

Use the TensorFlow Extended SDK to implement the pipeline. Implement the preprocessing steps as part of the input_fn of the model. Use the ExampleGen component with the BigQuery executor to ingest the data and the Trainer component to launch a Vertex AI training job.

Question # 21

You work at a leading healthcare firm developing state-of-the-art algorithms for various use cases. You have unstructured textual data with custom labels. You need to extract and classify various medical phrases with these labels. What should you do?

Use the Healthcare Natural Language API to extract medical entities.

Use a BERT-based model to fine-tune a medical entity extraction model.

Use AutoML Entity Extraction to train a medical entity extraction model.

Use TensorFlow to build a custom medical entity extraction model.

Answer:

Explanation:

Medical entity extraction is a task that involves identifying and classifying medical terms or concepts from unstructured textual data, such as electronic health records, clinical notes, or research papers. Medical entity extraction can help with various use cases, such as information retrieval, knowledge discovery, decision support, and data analysis [1].

One possible approach to perform medical entity extraction is to use a BERT-based model to fine-tune a medical entity extraction model. BERT (Bidirectional Encoder Representations from Transformers) is a pre-trained language model that can capture the contextual information from both left and right sides of a given token [2]. BERT can be fine-tuned on a specific downstream task, such as medical entity extraction, by adding a task-specific layer on top of the pre-trained model and updating the model parameters with a small amount of labeled data [3].

A BERT-based model can achieve high performance on medical entity extraction by leveraging the large-scale pre-training on general-domain corpora and the fine-tuning on domain-specific data. For example, Nesterov and Umerenkov [4] proposed a novel method of doing medical entity extraction from electronic health records as a single-step multi-label classification task by fine-tuning a transformer model pre-trained on a large EHR dataset. They showed that their model can achieve human-level quality for most frequent entities.
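
The "task-specific layer on top of the pre-trained model" described above can be pictured with a small NumPy sketch. This is an illustration under stated assumptions, not a real fine-tuning setup: the random `hidden_states` array stands in for encoder output, the label set and the sizes are hypothetical, and the weights are untrained, so the predicted labels are meaningless.

```python
import numpy as np

rng = np.random.default_rng(0)

seq_len, hidden_size = 6, 8            # hypothetical sizes chosen to keep the example small
labels = ["O", "DRUG", "SYMPTOM"]      # hypothetical custom label set

# Stand-in for the encoder output: one hidden vector per input token.
hidden_states = rng.normal(size=(seq_len, hidden_size))

# Task-specific layer: a single linear projection from hidden size to label count.
W = rng.normal(scale=0.02, size=(hidden_size, len(labels)))
b = np.zeros(len(labels))

def softmax(z):
    """Row-wise softmax; the row maximum is subtracted for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

logits = hidden_states @ W + b          # shape: (seq_len, num_labels)
probs = softmax(logits)                 # one label distribution per token
predicted = [labels[i] for i in probs.argmax(axis=-1)]
print(probs.shape, predicted)
```

During fine-tuning, both this projection and the encoder parameters would be updated against the labeled tokens; here nothing is trained.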

References:

1: Medical Named Entity Recognition from Un-labelled Medical Records based on Pre-trained Language Models and Domain Dictionary | Data Intelligence | MIT Press

2: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

3: Fine-tuning BERT for Medical Entity Extraction

4: Distantly supervised end-to-end medical entity extraction from electronic health records with human-level quality

Question # 22

You work for an online retailer. Your company has a few thousand short lifecycle products. Your company has five years of sales data stored in BigQuery. You have been asked to build a model that will make monthly sales predictions for each product. You want to use a solution that can be implemented quickly with minimal effort. What should you do?

Use Prophet on Vertex AI Training to build a custom model.

Use Vertex AI Forecast to build a NN-based model.

Use BigQuery ML to build a statistical ARIMA_PLUS model.

Use TensorFlow on Vertex AI Training to build a custom model.

Question # 23

You are working on a binary classification ML algorithm that detects whether an image of a classified scanned document contains a company's logo. In the dataset, 96% of examples don't have the logo, so the dataset is very skewed. Which metrics would give you the most confidence in your model?

F-score where recall is weighed more than precision

RMSE

F1 score

F-score where precision is weighed more than recall

Answer:

Explanation:

Option A is correct because an F-score in which recall is weighted more than precision is a suitable metric for binary classification with imbalanced data. The F-score is a weighted harmonic mean of precision and recall, two metrics that measure the accuracy and completeness of predictions for the positive class [1]. Precision is the fraction of true positives among all predicted positives, while recall is the fraction of true positives among all actual positives [1]. When the data is imbalanced, the positive class is the minority class, which is usually the class of interest. In this case, the positive class is the images that contain the company's logo, which are rare but important to detect. By weighting recall more than precision, we emphasize finding all the positive examples, even if some false positives are included [2].

Option B is incorrect because RMSE (root mean squared error) is not a valid metric for binary classification with imbalanced data. RMSE measures the average magnitude of the errors between the predicted and actual values [3]. It is suitable for regression problems, where the target variable is continuous, not for classification problems, where the target variable is discrete [4].

Option C is incorrect because the F1 score is not the best metric for binary classification with imbalanced data. The F1 score is a special case of the F-score where precision and recall are equally weighted [1]. It is suitable for balanced data, where the positive and negative classes are equally important and frequent [5]. However, for imbalanced data, the positive class is more important and less frequent than the negative class, so the F1 score may not reflect the performance of the model well [2].

Option D is incorrect because an F-score in which precision is weighted more than recall is not a good metric for binary classification with imbalanced data. By weighting precision more than recall, we emphasize minimizing the false positives, even if some true positives are missed [2]. However, for this imbalanced data, missing a rare positive example is treated as costlier than raising a false positive, so this metric may not reflect the performance of the model well [2].
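
The weighting discussed in options A, C, and D can be made concrete with the standard F-beta formula, F_beta = (1 + beta^2) * P * R / (beta^2 * P + R), where beta > 1 favors recall and beta < 1 favors precision. The sketch below uses hypothetical confusion counts (`tp`, `fp`, `fn`) that are not taken from the question.

```python
def f_beta(precision, recall, beta):
    """Weighted harmonic mean of precision and recall; beta > 1 weights recall more."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts for the positive (logo) class.
tp, fp, fn = 30, 30, 10
precision = tp / (tp + fp)   # 0.5
recall = tp / (tp + fn)      # 0.75

f1 = f_beta(precision, recall, 1.0)       # precision and recall weighted equally
f2 = f_beta(precision, recall, 2.0)       # recall weighted more (option A)
f_half = f_beta(precision, recall, 0.5)   # precision weighted more (option D)
print(round(f_half, 3), round(f1, 3), round(f2, 3))
```

Because recall exceeds precision in this example, the score rises as beta grows: the recall-weighted F2 is the highest of the three and the precision-weighted F0.5 the lowest.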

References:

1: Precision, recall, and F-measure

2: F-score for imbalanced data

3: RMSE

4: Regression vs classification

5: F1 score

6: Imbalanced classification

7: Binary classification

Question # 24

You work for a gaming company that manages a popular online multiplayer game where teams with 6 players play against each other in 5-minute battles. There are many new players every day. You need to build a model that automatically assigns available players to teams in real time. User research indicates that the game is more enjoyable when battles have players with similar skill levels. Which business metrics should you track to measure your model's performance? (Choose One Correct Answer)

Average time players wait before being assigned to a team

Precision and recall of assigning players to teams based on their predicted versus actual ability

User engagement as measured by the number of battles played daily per user

Rate of return as measured by additional revenue generated minus the cost of developing a new model

Answer:

Explanation:

The best business metric to track to measure the model's performance is user engagement as measured by the number of battles played daily per user. This metric reflects the main goal of the model, which is to enhance the user experience and satisfaction by creating balanced and fair battles. If the model is successful, it should increase user retention and loyalty, as well as word-of-mouth and referrals. This metric is also easy to measure and interpret, as it can be directly obtained from the user activity data.

The other options are not optimal for the following reasons:

A. Average time players wait before being assigned to a team is not a good metric, as it does not capture the quality or outcome of the battles. It only measures the efficiency of the model, which is not the primary objective. Moreover, this metric can be influenced by external factors, such as the availability and demand of players, the network latency, and the server capacity.

B. Precision and recall of assigning players to teams based on their predicted versus actual ability is not a good metric, as it is difficult to measure and interpret. It requires having a reliable and consistent way of estimating the player's ability, which can be subjective and dynamic. It also requires having a ground truth label for each assignment, which can be costly and impractical to obtain. Moreover, this metric does not reflect the user feedback or satisfaction, which is the ultimate goal of the model.

D. Rate of return as measured by additional revenue generated minus the cost of developing a new model is not a good metric, as it is not directly related to the model's performance. It measures the profitability of the model, which is a secondary objective. Moreover, this metric can be affected by many other factors, such as the market conditions, the pricing strategy, the marketing campaigns, and the competition.
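
As a rough illustration of how the engagement metric discussed above could be derived from activity data, the sketch below computes battles per active user for each day. The log layout (`user_id`, battle date) and the choice to divide by users active that day, rather than by all registered users, are assumptions made for this example.

```python
from collections import Counter
from datetime import date

# Hypothetical activity log: one (user_id, battle_date) entry per battle played.
battle_log = [
    ("u1", date(2024, 1, 1)), ("u1", date(2024, 1, 1)), ("u1", date(2024, 1, 2)),
    ("u2", date(2024, 1, 1)), ("u2", date(2024, 1, 2)),
    ("u2", date(2024, 1, 2)), ("u2", date(2024, 1, 2)),
]

def battles_per_active_user(log):
    """Map each day to (battles played that day) / (distinct users who played that day)."""
    battles = Counter(day for _, day in log)
    active_users = {}
    for user, day in log:
        active_users.setdefault(day, set()).add(user)
    return {day: battles[day] / len(active_users[day]) for day in sorted(battles)}

daily_engagement = battles_per_active_user(battle_log)
for day, value in daily_engagement.items():
    print(day.isoformat(), value)
```

Tracking this series before and after the matchmaking model is rolled out gives a simple view of whether engagement is moving.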

