Databricks-Machine-Learning-Professional Exam Dumps - Databricks Certified Machine Learning Professional

Question # 4

A data scientist has developed a scikit-learn random forest model named model, but they have not yet logged it with MLflow. They want to obtain the input schema and the output schema of the model so they can document what type of data is expected as input.

Which of the following MLflow operations can be used to perform this task?

mlflow.models.schema.infer_schema

mlflow.models.signature.infer_signature

mlflow.models.Model.get_input_schema

mlflow.models.Model.signature

There is no way to obtain the input schema and the output schema of an unlogged model.

Question # 5

A data scientist is utilizing MLflow to track their machine learning experiments. After completing a series of runs for the experiment with experiment ID exp_id, the data scientist wants to programmatically work with the experiment run data in a Spark DataFrame. They have an active MLflow Client client and an active Spark session spark.

Which of the following lines of code can be used to obtain run-level results for exp_id in a Spark DataFrame?

client.list_run_infos(exp_id)

spark.read.format("delta").load(exp_id)

There is no way to programmatically return row-level results from an MLflow Experiment.

mlflow.search_runs(exp_id)

spark.read.format("mlflow-experiment").load(exp_id)

Question # 6

A machine learning engineer is attempting to create a webhook that will trigger a Databricks Job job_id when a model version for model model transitions into any MLflow Model Registry stage.

They have the following incomplete code block:

Which of the following lines of code can be used to fill in the blank so that the code block accomplishes the task?

"MODEL_VERSION_CREATED"

"MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"

"MODEL_VERSION_TRANSITIONED_TO_STAGING"

"MODEL_VERSION_TRANSITIONED_STAGE"

"MODEL_VERSION_TRANSITIONED_TO_STAGING", "MODEL_VERSION_TRANSITIONED_TO_PRODUCTION"

Question # 7

A machine learning engineer wants to log and deploy a model as an MLflow pyfunc model. They have custom preprocessing that needs to be completed on feature variables prior to fitting the model or computing predictions using that model. They decide to wrap this preprocessing in a custom model class ModelWithPreprocess, where the preprocessing is performed when calling fit and when calling predict. They then log the fitted model of the ModelWithPreprocess class as a pyfunc model.

Which of the following is a benefit of this approach when loading the logged pyfunc model for downstream deployment?

The pyfunc model can be used to deploy models in a parallelizable fashion

The same preprocessing logic will automatically be applied when calling fit

The same preprocessing logic will automatically be applied when calling predict

This approach has no impact when loading the logged pyfunc model for downstream deployment

There is no longer a need for pipeline-like machine learning objects

Question # 8

Which of the following statements describes streaming with Spark as a model deployment strategy?

The inference of batch processed records as soon as a trigger is hit

The inference of all types of records in real-time

The inference of batch processed records as soon as a Spark job is run

The inference of incrementally processed records as soon as a trigger is hit

The inference of incrementally processed records as soon as a Spark job is run

Answer: D

Explanation:

Streaming with Spark as a model deployment strategy means applying a machine learning model to data streams that are processed incrementally and continuously by Spark Structured Streaming. Spark Structured Streaming is a scalable and fault-tolerant stream processing engine that enables complex analytics on live data streams using the Dataset/DataFrame API [1]. Spark Structured Streaming supports various sources and sinks for streaming data, such as Kafka, Kinesis, TCP sockets, and Delta tables [2]. Spark Structured Streaming also supports various types of operations on streaming data, such as aggregations, windowing, joins, and stateful transformations [3]. To deploy a machine learning model on streaming data, you can use the MLflow Model Registry to manage the model lifecycle and versioning [4]. You can also use the MLflow model serving feature to serve the model as a REST API endpoint that can be invoked by Spark Structured Streaming [5]. Alternatively, you can use the UDF (user-defined function) feature to apply the model to streaming data within Spark Structured Streaming [6].
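The idea of incremental, trigger-driven inference can be illustrated with a toy simulation in plain Python. This is only a sketch and does not use the Spark API: the class MicroBatchStream, its on_trigger method, and the double function standing in for a model are all hypothetical names invented for this illustration. The point it shows is that each trigger scores only the records that arrived since the previous trigger, rather than rescoring the whole dataset.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MicroBatchStream:
    """Toy stand-in for a streaming query (hypothetical, not a Spark class)."""
    source: List[float]                   # append-only input, like a stream
    predict: Callable[[float], float]     # hypothetical model function
    offset: int = 0                       # index of the next unread record
    sink: List[float] = field(default_factory=list)

    def on_trigger(self) -> int:
        """Score only the records that arrived since the previous trigger."""
        new_records = self.source[self.offset:]
        self.sink.extend(self.predict(x) for x in new_records)
        self.offset = len(self.source)    # remember progress for next time
        return len(new_records)


def double(x: float) -> float:
    # Hypothetical "model": any function from a feature to a prediction.
    return 2 * x


source: List[float] = [1.0, 2.0]
stream = MicroBatchStream(source=source, predict=double)

first = stream.on_trigger()    # scores the two records present so far
source.append(3.0)             # a new record arrives on the stream
second = stream.on_trigger()   # scores only the newly arrived record
third = stream.on_trigger()    # nothing new arrived, so nothing is rescored
```

In an actual Structured Streaming job, the engine tracks this progress itself and the trigger cadence is configured on the stream writer, for example with trigger(processingTime="10 seconds") in PySpark.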

The inference of incrementally processed records as soon as a trigger is hit describes streaming with Spark as a model deployment strategy. A trigger defines when a streaming query checks for new data and processes it. In Spark Structured Streaming, a trigger can be a fixed processing-time interval, a one-time run over the data available so far, or an experimental continuous mode; by default, each micro-batch starts as soon as the previous one finishes. Each time the trigger fires, the query processes only the records that have arrived since the last run, so model inference is applied incrementally to the latest available data. The other options are incorrect because:

Option A: The inference of batch processed records as soon as a trigger is hit does not describe streaming with Spark, but rather batch processing with Spark. Batch processing means applying a machine learning model to a finite set of data that is processed as a single job. Batch processing does not require a trigger, as the results are written to the output sink when the job is completed.
Option B: The inference of all types of records in real-time does not describe streaming with Spark, but rather a generic definition of real-time processing. Real-time processing means applying a machine learning model to data streams that are processed as soon as they arrive, with minimal latency. Real-time processing does not necessarily use Spark Structured Streaming, as there are other frameworks and tools that can support it, such as Apache Flink, Apache Storm, etc.
Option C: The inference of batch processed records as soon as a Spark job is run does not describe streaming with Spark, but rather batch processing with Spark. Batch processing means applying a machine learning model to a finite set of data that is processed as a single job. Scoring happens once, when the job is launched, and records that arrive afterwards are not scored until the next job is run.
Option E: The inference of incrementally processed records as soon as a Spark job is run does not describe streaming with Spark. In a streaming deployment, new records are scored each time a trigger fires on a long-running streaming query, not each time a Spark job is launched. Scoring only when a job is run is the batch pattern, so the option mixes the two paradigms.

References: [1] Structured Streaming Programming Guide, [2] Input Sources and Output Sinks, [3] Operations on streaming DataFrames/Datasets, [4] MLflow Model Registry, [5] MLflow Model Serving, [6] Apply machine learning models, Triggers, Trigger Types, Batch Processing, Real-time Processing, Real-time Data Processing Frameworks, Deploy machine learning models, Batch vs Streaming Processing
