(serve-ml-models-tutorial)=

# Serve ML Models (TensorFlow, PyTorch, Scikit-Learn, others)

This guide shows how to train or load models from various machine learning frameworks and deploy them to Ray Serve.

See Key Concepts for more general information about Ray Serve.

:::::{tab-set}

::::{tab-item} Keras and TensorFlow

This example trains and deploys a simple TensorFlow neural net. In particular, it shows:

- How to train a TensorFlow model and load it from your file system in your Ray Serve deployment.
- How to parse the JSON request and make a prediction.

Ray Serve is framework-agnostic, so you can use any version of TensorFlow. This tutorial uses TensorFlow 2 and Keras. You also need `requests` to send HTTP requests to your model deployment. If you haven't already, install TensorFlow 2, `requests`, and Ray Serve by running:

```console
$ pip install "tensorflow>=2.0" requests "ray[serve]"
```

Open a new Python file called `tutorial_tensorflow.py`. First, import Ray Serve and some other helpers.

```{literalinclude}
:start-after: __doc_import_begin__
:end-before: __doc_import_end__
```

Next, train a simple MNIST model using Keras.

```{literalinclude}
:start-after: __doc_train_model_begin__
:end-before: __doc_train_model_end__
```

Next, define a `TFMnistModel` class that accepts HTTP requests and runs the MNIST model that you trained. The `@serve.deployment` decorator makes it a deployment object that you can deploy onto Ray Serve. Note that Ray Serve exposes the deployment over an HTTP route. By default, when the deployment receives a request over HTTP, Ray Serve invokes the `__call__` method.

```{literalinclude}
:start-after: __doc_define_servable_begin__
:end-before: __doc_define_servable_end__
```

:::{note}
When you deploy and instantiate the `TFMnistModel` class, Ray Serve loads the TensorFlow model from your file system so that it's ready to run inference and serve requests later.
:::
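The deployment's `__call__` method must turn the JSON body into an array that the model accepts. The helper below is a minimal sketch of that step. It isn't the code included from the tutorial file. `parse_mnist_payload` is a hypothetical name, and the `"array"` key matches the client request later in this tab. The `(1, 28, 28)` output shape assumes a Keras model whose input layer takes 28x28 images.

```python
import numpy as np

MNIST_SIDE = 28  # MNIST images are 28x28 grayscale


def parse_mnist_payload(payload: dict) -> np.ndarray:
    """Convert a JSON body like {"array": [784 floats]} into a batch of one image.

    Hypothetical helper. Assumes the Keras model's input layer expects
    arrays of shape (batch, 28, 28).
    """
    if "array" not in payload:
        raise ValueError('request body must contain an "array" field')
    values = np.asarray(payload["array"], dtype=np.float32)
    if values.size != MNIST_SIDE * MNIST_SIDE:
        raise ValueError(f"expected {MNIST_SIDE * MNIST_SIDE} values, got {values.size}")
    # Reshape the flat list into one 28x28 image with a leading batch axis.
    return values.reshape(1, MNIST_SIDE, MNIST_SIDE)
```

Inside `__call__`, you would then pass the returned batch to the loaded Keras model, for example through its `predict` method. Validating the length first turns a malformed request into a clear error message instead of a reshape failure deep inside the model.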

Now that you've defined the Serve deployment, prepare it so that you can deploy it.

```{literalinclude}
:start-after: __doc_deploy_begin__
:end-before: __doc_deploy_end__
```

:::{note}
`TFMnistModel.bind(TRAINED_MODEL_PATH)` passes `TRAINED_MODEL_PATH` as an argument to the `TFMnistModel` constructor and returns a `DeploymentNode` object. This object wraps the `TFMnistModel` deployment, and you can connect it with other `DeploymentNode`s to form a more complex deployment graph.
:::

Finally, deploy the model to Ray Serve through the terminal.

```console
$ serve run tutorial_tensorflow:mnist_model
```

Next, query the model. While Serve is running, open a separate terminal window, and run the following in an interactive Python shell or a separate Python script:

```python
import requests
import numpy as np

# Send a random 28x28 "image", flattened to 784 floats, as the JSON body.
resp = requests.get(
    "http://localhost:8000/", json={"array": np.random.randn(28 * 28).tolist()}
)
print(resp.json())
```

You should get an output like the following, although the exact prediction may vary:

```text
{
 "prediction": [[-1.504277229309082, ..., -6.793371200561523]],
 "file": "/tmp/mnist_model.h5"
}
```
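The `prediction` field holds one row of raw scores. If those scores are unnormalized logits for the ten digit classes, a softmax followed by an argmax turns them into a predicted digit. The negative values suggest logits, but that's an assumption, so check your model's final layer. The client-side sketch below uses `logits_to_digit`, which is a hypothetical helper:

```python
import numpy as np


def logits_to_digit(prediction: list) -> tuple:
    """Return (predicted digit, its softmax probability) for one row of scores."""
    scores = np.asarray(prediction[0], dtype=np.float64)
    # Softmax, shifted by the max score for numerical stability.
    exp = np.exp(scores - scores.max())
    probs = exp / exp.sum()
    digit = int(np.argmax(probs))
    return digit, float(probs[digit])
```

For example, `digit, confidence = logits_to_digit(resp.json()["prediction"])`. With random input, expect a low-confidence guess.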

::::

::::{tab-item} PyTorch

This example loads and deploys a PyTorch ResNet model. In particular, it shows:

- How to load the model from PyTorch's pretrained model zoo.
- How to read the image bytes from the request, transform them, and make a prediction.

This tutorial requires PyTorch and torchvision. Ray Serve is framework-agnostic and works with any version of PyTorch. You also need `requests` to send HTTP requests to your model deployment. If you haven't already, install them by running:

```console
$ pip install torch torchvision requests "ray[serve]"
```

Open a new Python file called `tutorial_pytorch.py`. First, import Ray Serve and some other helpers.

```{literalinclude}
:start-after: __doc_import_begin__
:end-before: __doc_import_end__
```

Define an `ImageModel` class that parses the input data, transforms the images, and runs the ResNet18 model loaded from torchvision. The `@serve.deployment` decorator makes it a deployment object that you can deploy onto Ray Serve. Note that Ray Serve exposes the deployment over an HTTP route. By default, when the deployment receives a request over HTTP, Ray Serve invokes the `__call__` method.

```{literalinclude}
:start-after: __doc_define_servable_begin__
:end-before: __doc_define_servable_end__
```

:::{note}
When you deploy and instantiate the `ImageModel` class, Ray Serve loads the ResNet18 model from torchvision so that it's ready to run inference and serve requests later.
:::
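The `ImageModel` deployment must turn the uploaded image into a normalized tensor and the model output into a class index. The NumPy sketch below shows roughly what such a pipeline does. It makes two assumptions. First, the image is already decoded (for example with Pillow) and resized to the model's input size. Second, normalization uses the standard ImageNet mean and standard deviation that torchvision's pretrained models expect. `to_model_input` and `to_class_index` are hypothetical names, and the tutorial's deployment may use torchvision's `transforms` module for these steps instead.

```python
import numpy as np

# Standard ImageNet channel statistics commonly used with torchvision's
# pretrained weights (an assumption about this tutorial's transform).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)


def to_model_input(image: np.ndarray) -> np.ndarray:
    """Convert an HxWxC uint8 image (C = 3 or 4) into a 1x3xHxW float32 batch."""
    if image.ndim != 3 or image.shape[2] not in (3, 4):
        raise ValueError("expected an HxWx3 or HxWx4 image")
    rgb = image[:, :, :3].astype(np.float32) / 255.0   # drop alpha, scale to [0, 1]
    normalized = (rgb - IMAGENET_MEAN) / IMAGENET_STD  # per-channel normalization
    chw = normalized.transpose(2, 0, 1)                 # HWC -> CHW channel order
    return chw[np.newaxis, ...]                          # add a batch dimension


def to_class_index(logits: np.ndarray) -> dict:
    """Pick the highest-scoring class, matching the {'class_index': ...} response."""
    return {"class_index": int(np.argmax(logits))}
```

Dropping the fourth channel matters for PNG inputs such as the Ray logo used below, because PNG files can carry an alpha channel that a three-channel ResNet can't accept.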

Now that you've defined the Serve deployment, prepare it so that you can deploy it.

```{literalinclude}
:start-after: __doc_deploy_begin__
:end-before: __doc_deploy_end__
```

:::{note}
`ImageModel.bind()` returns a `DeploymentNode` object. This object wraps the `ImageModel` deployment, and you can connect it with other `DeploymentNode`s to form a more complex deployment graph.
:::

Finally, deploy the model to Ray Serve through the terminal.

```console
$ serve run tutorial_pytorch:image_model
```

Next, query the model. While Serve is running, open a separate terminal window, and run the following in an interactive Python shell or a separate Python script:

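```python
import requests

# Download a sample PNG image (the Ray logo) to use as input.
ray_logo_bytes = requests.get(
    "https://raw.githubusercontent.com/ray-project/"
    "ray/master/doc/source/images/ray_header_logo.png"
).content

# POST the raw image bytes to the deployment and print the JSON response.
resp = requests.post("http://localhost:8000/", data=ray_logo_bytes)
print(resp.json())
```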

You should get an output like the following, although the exact number may vary:

```text
{'class_index': 919}
```

::::

::::{tab-item} Scikit-learn

This example trains and deploys a simple scikit-learn classifier. In particular, it shows:

- How to load the scikit-learn model from the file system in your Ray Serve deployment.
- How to parse the JSON request and make a prediction.

Ray Serve is framework-agnostic, so you can use any version of scikit-learn. You also need `requests` to send HTTP requests to your model deployment. If you haven't already, install scikit-learn and `requests` by running:

```console
$ pip install scikit-learn requests "ray[serve]"
```

Open a new Python file called `tutorial_sklearn.py`. Import Ray Serve and some other helpers.

```{literalinclude}
:start-after: __doc_import_begin__
:end-before: __doc_import_end__
```

Train a Classifier

Next, train a classifier with the Iris dataset.

First, instantiate a `GradientBoostingClassifier` from scikit-learn.

```{literalinclude}
:start-after: __doc_instantiate_model_begin__
:end-before: __doc_instantiate_model_end__
```

Next, load the Iris dataset and split the data into training and validation sets.

```{literalinclude}
:start-after: __doc_data_begin__
:end-before: __doc_data_end__
```

Then, train the model and save it to a file.

```{literalinclude}
:start-after: __doc_train_model_begin__
:end-before: __doc_train_model_end__
```
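The sketch below approximates the same three steps in one self-contained script: instantiate, split, then train and save. It isn't the tutorial's included code. It uses `train_test_split` for the split, `pickle` for the model, and `json` for the label names. The output paths are hypothetical temporary files, so substitute the `MODEL_PATH` and `LABEL_PATH` that your deployment reads.

```python
import json
import os
import pickle
import tempfile

from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Hypothetical output locations; the tutorial's MODEL_PATH and LABEL_PATH may differ.
OUTPUT_DIR = tempfile.mkdtemp()
MODEL_PATH = os.path.join(OUTPUT_DIR, "iris_model.pkl")
LABEL_PATH = os.path.join(OUTPUT_DIR, "iris_labels.json")

# Load Iris and hold out 30% of the rows for validation, keeping class balance.
iris = load_iris()
train_x, val_x, train_y, val_y = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Fit the classifier and report accuracy on the held-out rows.
model = GradientBoostingClassifier(random_state=0)
model.fit(train_x, train_y)
val_accuracy = model.score(val_x, val_y)
print(f"Validation accuracy: {val_accuracy:.3f}")

# Persist the fitted model and the class-index -> species-name mapping.
with open(MODEL_PATH, "wb") as f:
    pickle.dump(model, f)
with open(LABEL_PATH, "w") as f:
    json.dump(iris.target_names.tolist(), f)
```

Saving the label names next to the model lets the deployment return a species name such as `"versicolor"` rather than a bare class index.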

Deploy with Ray Serve

Finally, you're ready to deploy the classifier using Ray Serve.

Define a `BoostingModel` class that runs inference on the `GradientBoostingClassifier` model you trained and returns the resulting label. The `@serve.deployment` decorator makes it a deployment object so you can deploy it onto Ray Serve. Note that Ray Serve exposes the deployment over an HTTP route. By default, when the deployment receives a request over HTTP, Ray Serve invokes the `__call__` method.

```{literalinclude}
:start-after: __doc_define_servable_begin__
:end-before: __doc_define_servable_end__
```

:::{note}
When you deploy and instantiate the `BoostingModel` class, Ray Serve loads the classifier that you trained from the file system so that it's ready to run inference and serve requests later.
:::
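Inside `__call__`, the deployment must do two things. It maps the JSON fields onto the feature order that the classifier was trained on, and it translates the predicted index back into a species name. The framework-free sketch below shows that logic. `payload_to_features` and `predict_label` are hypothetical helpers. The feature order assumes scikit-learn's Iris column order (sepal length, sepal width, petal length, petal width), and the keys match the client request in this tab.

```python
# Fixed feature order; must match the column order used during training.
FEATURE_KEYS = ["sepal length", "sepal width", "petal length", "petal width"]


def payload_to_features(payload: dict) -> list:
    """Build a single-row feature matrix from the JSON request body."""
    missing = [key for key in FEATURE_KEYS if key not in payload]
    if missing:
        raise KeyError(f"missing features: {missing}")
    return [[float(payload[key]) for key in FEATURE_KEYS]]


def predict_label(model, label_list: list, payload: dict) -> dict:
    """Run the classifier and translate the class index into a species name."""
    class_index = int(model.predict(payload_to_features(payload))[0])
    return {"result": label_list[class_index]}
```

With a real model unpickled from `MODEL_PATH` and labels loaded from `LABEL_PATH`, `predict_label(model, labels, payload)` returns a dict with the same `{"result": ...}` shape as the response shown at the end of this tab.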

After you've defined the Serve deployment, prepare it so that you can deploy it.

```{literalinclude}
:start-after: __doc_deploy_begin__
:end-before: __doc_deploy_end__
```

:::{note}
`BoostingModel.bind(MODEL_PATH, LABEL_PATH)` passes `MODEL_PATH` and `LABEL_PATH` as arguments to the `BoostingModel` constructor and returns a `DeploymentNode` object. This object wraps the `BoostingModel` deployment, and you can connect it with other `DeploymentNode`s to form a more complex deployment graph.
:::

Finally, deploy the model to Ray Serve through the terminal.

```console
$ serve run tutorial_sklearn:boosting_model
```

Next, query the model. While Serve is running, open a separate terminal window, and run the following in an interactive Python shell or a separate Python script:

```python
import requests

# Feature values keyed by the names the deployment reads from the JSON body.
sample_request_input = {
    "sepal length": 1.2,
    "sepal width": 1.0,
    "petal length": 1.1,
    "petal width": 0.9,
}
response = requests.get("http://localhost:8000/", json=sample_request_input)
print(response.text)
```

You should get an output like the following, although the exact prediction may vary:

```json
{"result": "versicolor"}
```

::::

:::::