(serve-set-up-grpc-service)=
# Set up a gRPC service

This section helps you understand how to:

- Define a gRPC service and compile its protobufs
- Start Serve with a gRPC proxy
- Deploy gRPC applications
- Send gRPC requests to Serve deployments
- Check proxy health
- Use gRPC metadata, streaming, and model composition
- Handle errors and use a custom gRPC context

(custom-serve-grpc-service)=
## Define a gRPC service

Running a gRPC server starts with defining gRPC services, RPC methods, and protobufs similar to the one below.
:start-after: __begin_proto__
:end-before: __end_proto__
:language: proto
This example creates a file named user_defined_protos.proto with two
gRPC services: UserDefinedService and ImageClassificationService.
UserDefinedService has three RPC methods: __call__, Multiplexing, and Streaming.
ImageClassificationService has one RPC method: Predict. Their corresponding input
and output types are also defined specifically for each RPC method.
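For illustration, such a `.proto` file could look like the following sketch. The message names and RPC methods match the description above; the field names and types are hypothetical placeholders, so adjust them to what your service actually needs.

```proto
// user_defined_protos.proto -- illustrative sketch; message fields are placeholders.
syntax = "proto3";

package userdefinedprotos;

message UserDefinedMessage {
  string name = 1;
  string origin = 2;
  int64 num = 3;
}

message UserDefinedResponse {
  string greeting = 1;
  int64 num = 2;
}

message ImageData {
  string url = 1;
  string filename = 2;
}

message ImageClass {
  repeated string classes = 1;
  repeated float probabilities = 2;
}

service UserDefinedService {
  rpc __call__(UserDefinedMessage) returns (UserDefinedResponse);
  rpc Multiplexing(UserDefinedMessage) returns (UserDefinedResponse);
  rpc Streaming(UserDefinedMessage) returns (stream UserDefinedResponse);
}

service ImageClassificationService {
  rpc Predict(ImageData) returns (ImageClass);
}
```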
Once you define the `.proto` services, use `grpcio-tools` to compile Python code for those services. An example command looks like the following:

```bash
python -m grpc_tools.protoc -I=. --python_out=. --grpc_python_out=. ./user_defined_protos.proto
```
This command generates two files: `user_defined_protos_pb2.py` and `user_defined_protos_pb2_grpc.py`.
For more details on `grpcio-tools`, see https://grpc.io/docs/languages/python/basics/#generating-client-and-server-code.
:::{note}
Ensure that the generated files are in the same directory as where the Ray cluster is running so that Serve can import them when starting the proxies.
:::
(start-serve-with-grpc-proxy)=
## Start Serve with a gRPC proxy

The `serve start` CLI, the `ray.serve.start` API, and Serve config files all support starting Serve with a gRPC proxy. Two options control Serve's gRPC proxy: `grpc_port` and `grpc_servicer_functions`. `grpc_port` is the port the gRPC proxies listen on; it defaults to 9000. `grpc_servicer_functions` is a list of import paths for the gRPC `add_servicer_to_server` functions to add to a gRPC proxy. It also acts as the flag that determines whether to start a gRPC server: the default is an empty list, meaning no gRPC server is started.
::::{tab-set}
:::{tab-item} CLI
```bash
ray start --head

serve start \
  --grpc-port 9000 \
  --grpc-servicer-functions user_defined_protos_pb2_grpc.add_UserDefinedServiceServicer_to_server \
  --grpc-servicer-functions user_defined_protos_pb2_grpc.add_ImageClassificationServiceServicer_to_server
```
:::
:::{tab-item} Python API
:start-after: __begin_start_grpc_proxy__
:end-before: __end_start_grpc_proxy__
:language: python
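As a hedged sketch, starting Serve from Python with gRPC options could look like the following. It assumes Ray is installed, the generated `user_defined_protos_pb2_grpc` module is importable, and the `gRPCOptions` class lives in `ray.serve.config`; check your Ray version's API reference.

```python
from ray import serve
from ray.serve.config import gRPCOptions

# Start Serve with a gRPC proxy on port 9000, registering both servicer functions.
serve.start(
    grpc_options=gRPCOptions(
        port=9000,
        grpc_servicer_functions=[
            "user_defined_protos_pb2_grpc.add_UserDefinedServiceServicer_to_server",
            "user_defined_protos_pb2_grpc.add_ImageClassificationServiceServicer_to_server",
        ],
    )
)
```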
:::
:::{tab-item} Serve config file
```yaml
# config.yaml
grpc_options:
  port: 9000
  grpc_servicer_functions:
    - user_defined_protos_pb2_grpc.add_UserDefinedServiceServicer_to_server
    - user_defined_protos_pb2_grpc.add_ImageClassificationServiceServicer_to_server

applications:
  - name: app1
    route_prefix: /app1
    import_path: test_deployment_v2:g
    runtime_env: {}

  - name: app2
    route_prefix: /app2
    import_path: test_deployment_v2:g2
    runtime_env: {}
```

Start Serve with the above config file:

```bash
serve run config.yaml
```
:::
::::
:::{note}
The default max gRPC message size is ~2 GB. To adjust it, set the `RAY_SERVE_GRPC_MAX_MESSAGE_SIZE` environment variable (in bytes) before starting Ray. For example, `export RAY_SERVE_GRPC_MAX_MESSAGE_SIZE=104857600` sets a 100 MB limit.
:::
(deploy-serve-grpc-applications)=
## Deploy a gRPC application

gRPC applications in Serve work similarly to HTTP applications. The only differences are that the input and output of the methods need to match what's defined in the `.proto` file, and that the method name of the application needs to exactly match (case sensitive) the predefined RPC methods. For example, to deploy `UserDefinedService` with the `__call__` method, the method name needs to be `__call__`, the input type needs to be `UserDefinedMessage`, and the output type needs to be `UserDefinedResponse`. Serve passes the protobuf object into the method and expects the protobuf object back from the method.
Example deployment:
:start-after: __begin_grpc_deployment__
:end-before: __end_grpc_deployment__
:language: python
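For illustration, a deployment implementing `UserDefinedService.__call__` might look like the following sketch. It assumes the generated `user_defined_protos_pb2` module is importable and uses the hypothetical message fields (`name`, `origin`, `num`) from earlier; the response logic is a placeholder.

```python
from ray import serve
from user_defined_protos_pb2 import UserDefinedMessage, UserDefinedResponse

@serve.deployment
class GrpcDeployment:
    def __call__(self, user_message: UserDefinedMessage) -> UserDefinedResponse:
        # Read fields from the request protobuf and build the response protobuf.
        greeting = f"Hello {user_message.name} from {user_message.origin}"
        return UserDefinedResponse(greeting=greeting, num=user_message.num * 2)

g = GrpcDeployment.bind()
```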
Deploy the application:
:start-after: __begin_deploy_grpc_app__
:end-before: __end_deploy_grpc_app__
:language: python
:::{note}
`route_prefix` is still a required field as of Ray 2.7.0 due to a shared code path with HTTP. Future releases will make it optional for gRPC.
:::
(send-serve-grpc-proxy-request)=
## Send gRPC requests
Sending a gRPC request to a Serve deployment is similar to sending a gRPC request to any other gRPC server. Create a gRPC channel and stub, then call the RPC method on the stub with the appropriate input. The output is the protobuf object that your Serve application returns.
Sending a gRPC request:
:start-after: __begin_send_grpc_requests__
:end-before: __end_send_grpc_requests__
:language: python
Read more about gRPC clients in Python: https://grpc.io/docs/languages/python/basics/#client
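As a hedged sketch, a client for the `__call__` RPC could look like the following, assuming the generated modules from earlier and a gRPC proxy listening on the default port 9000 (it needs a running Serve application to actually succeed):

```python
import grpc
from user_defined_protos_pb2 import UserDefinedMessage
from user_defined_protos_pb2_grpc import UserDefinedServiceStub

# Create a channel to the gRPC proxy and a stub for the service.
channel = grpc.insecure_channel("localhost:9000")
stub = UserDefinedServiceStub(channel)

# Field values here are placeholders matching the hypothetical proto sketch.
request = UserDefinedMessage(name="foo", origin="bar", num=30)
response, call = stub.__call__.with_call(request=request)
print(f"greeting: {response.greeting}")
```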
(serve-grpc-proxy-health-checks)=
## Check proxy health

Similar to the HTTP `/-/routes` and `/-/healthz` endpoints, Serve also provides gRPC service methods for health checks:

- `/ray.serve.RayServeAPIService/ListApplications` lists all applications deployed in Serve.
- `/ray.serve.RayServeAPIService/Healthz` checks the health of the proxy. It returns an `OK` status and a "success" message if the proxy is healthy.

The service methods and protobufs are defined as below:
```proto
message ListApplicationsRequest {}

message ListApplicationsResponse {
  repeated string application_names = 1;
}

message HealthzRequest {}

message HealthzResponse {
  string message = 1;
}

service RayServeAPIService {
  rpc ListApplications(ListApplicationsRequest) returns (ListApplicationsResponse);
  rpc Healthz(HealthzRequest) returns (HealthzResponse);
}
```
You can call the service method with the following code:
:start-after: __begin_health_check__
:end-before: __end_health_check__
:language: python
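As a sketch, a health-check client could look like the following. It assumes the request protobufs and stub ship under `ray.serve.generated`, as noted below, and needs a running gRPC proxy to actually respond:

```python
import grpc
from ray.serve.generated.serve_pb2 import HealthzRequest, ListApplicationsRequest
from ray.serve.generated.serve_pb2_grpc import RayServeAPIServiceStub

channel = grpc.insecure_channel("localhost:9000")
stub = RayServeAPIServiceStub(channel)

# List all applications deployed in Serve.
apps = stub.ListApplications(ListApplicationsRequest())
print(f"applications: {apps.application_names}")

# Check the health of the proxy.
health = stub.Healthz(HealthzRequest())
print(f"health: {health.message}")
```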
:::{note}
Serve provides the `RayServeAPIServiceStub` stub and the `HealthzRequest` and
`ListApplicationsRequest` protobufs for you to use, so you don't need to generate them
from the proto file.
:::
(serve-grpc-metadata)=
## Use gRPC metadata

Just like HTTP headers, gRPC supports metadata for passing request-related information. You can pass metadata to Serve's gRPC proxy, and Serve knows how to parse and use it. Serve also passes trailing metadata back to the client.

Metadata keys Serve accepts:

- `application`: The name of the Serve application to route to. If it's not passed and only one application is deployed, Serve routes to the only deployed application automatically.
- `request_id`: The request ID to track the request.
- `multiplexed_model_id`: The model ID for model multiplexing.

Trailing metadata keys Serve returns:

- `request_id`: The request ID to track the request.

Example of using metadata:
:start-after: __begin_metadata__
:end-before: __end_metadata__
:language: python
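In Python gRPC clients, invocation metadata is just a sequence of key/value string pairs passed via the `metadata` argument of a stub call. A minimal self-contained sketch of building it and reading trailing metadata back (the keys are the ones Serve recognizes; the values are example placeholders):

```python
# Invocation metadata for a Serve gRPC request; values are example placeholders.
metadata = (
    ("application", "app1"),              # route to the "app1" application
    ("request_id", "123e4567"),           # propagate a client-chosen request ID
    ("multiplexed_model_id", "model_1"),  # select a model when multiplexing
)
# Pass it like: stub.Multiplexing(request, metadata=metadata)

# Trailing metadata comes back as key/value pairs too; a dict makes lookup easy.
trailing = (("request_id", "123e4567"),)
trailing_dict = dict(trailing)
print(trailing_dict["request_id"])
```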
(serve-grpc-proxy-more-examples)=
## Streaming and model composition

The gRPC proxy supports all four gRPC streaming types: unary-unary, server streaming, client streaming, and bidirectional streaming.

### Server streaming

The `Streaming` method is deployed with the app named "app1" above. The following code gets a streaming response.
:start-after: __begin_streaming__
:end-before: __end_streaming__
:language: python
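Conceptually, a server-streaming handler is a generator: Serve streams each yielded protobuf back to the client as a separate message. The self-contained sketch below uses plain strings in place of the generated `UserDefinedResponse` messages:

```python
def streaming_handler(name: str, num_replies: int):
    # In a real deployment this would yield UserDefinedResponse protobufs.
    for i in range(num_replies):
        yield f"{i}: Hello {name}!"

# Consuming the generator is analogous to iterating over the streaming RPC response.
replies = list(streaming_handler("foo", 3))
print(replies)
```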
### Client streaming

Client streaming allows clients to send a stream of requests and receive a single response. Use the `gRPCInputStream` class to iterate over incoming request messages.

Define a proto file with a client-streaming RPC:
```proto
service UserDefinedService {
  rpc ClientStreaming(stream UserDefinedMessage) returns (UserDefinedResponse);
}
```
Deployment example:
:start-after: __begin_client_streaming_deployment__
:end-before: __end_client_streaming_deployment__
:language: python
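The core of a client-streaming handler is an async iteration over the incoming request stream that reduces it to a single response. This self-contained sketch uses plain integers in place of protobuf request messages:

```python
import asyncio

async def handle_client_stream(requests) -> str:
    # Consume every message from the request stream, then return one response.
    total = 0
    async for num in requests:
        total += num
    return f"sum={total}"

async def _demo() -> str:
    # Stand-in for the incoming gRPC request stream.
    async def fake_stream():
        for n in (1, 2, 3):
            yield n
    return await handle_client_stream(fake_stream())

result = asyncio.run(_demo())
print(result)  # sum=6
```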
Client code:
:start-after: __begin_client_streaming_client__
:end-before: __end_client_streaming_client__
:language: python
### Bidirectional streaming

Bidirectional streaming allows clients to send and receive streams of messages simultaneously.

Define a proto file with a bidirectional streaming RPC:
```proto
service UserDefinedService {
  rpc BidiStreaming(stream UserDefinedMessage) returns (stream UserDefinedResponse);
}
```
Deployment example:
:start-after: __begin_bidi_streaming_deployment__
:end-before: __end_bidi_streaming_deployment__
:language: python
Client code:
:start-after: __begin_bidi_streaming_client__
:end-before: __end_bidi_streaming_client__
:language: python
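A bidirectional handler both consumes and produces a stream: conceptually it's an async generator that yields one response per incoming message. Again, plain integers stand in for protobuf messages in this self-contained sketch:

```python
import asyncio

async def handle_bidi_stream(requests):
    # Yield one response per incoming request as it arrives.
    async for num in requests:
        yield num * 2

async def _demo():
    # Stand-in for the incoming gRPC request stream.
    async def fake_stream():
        for n in (1, 2, 3):
            yield n
    return [reply async for reply in handle_bidi_stream(fake_stream())]

replies = asyncio.run(_demo())
print(replies)  # [2, 4, 6]
```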
You can access the gRPC context in streaming methods by adding a `grpc_context` parameter:
:start-after: __begin_streaming_with_context__
:end-before: __end_streaming_with_context__
:language: python
### Model composition

Assume you have the deployments below. `ImageDownloader` and `DataPreprocessor` are two separate steps that download and process the image before PyTorch can run inference. The `ImageClassifier` deployment initializes the model, calls both `ImageDownloader` and `DataPreprocessor`, and feeds the result into the ResNet model to get the classes and probabilities of the given image.
:start-after: __begin_model_composition_deployment__
:end-before: __end_model_composition_deployment__
:language: python
We can deploy the application with the following code:
:start-after: __begin_model_composition_deploy__
:end-before: __end_model_composition_deploy__
:language: python
The client code to call the application looks like the following:
:start-after: __begin_model_composition_client__
:end-before: __end_model_composition_client__
:language: python
:::{note}
At this point, two applications are running on Serve: "app1" and "app2". If more than one application is running, you need to pass `application` in the metadata so Serve knows which application to route to.
:::
(serve-grpc-proxy-error-handling)=
## Handle errors

Similar to any other gRPC server, a request raises a `grpc.RpcError` when the response code is not "OK". Put your request code in a try-except block and handle the error accordingly.
:start-after: __begin_error_handle__
:end-before: __end_error_handle__
:language: python
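As a client-side sketch, assuming the generated modules from earlier and a running proxy to exercise it against:

```python
import grpc
from user_defined_protos_pb2 import UserDefinedMessage
from user_defined_protos_pb2_grpc import UserDefinedServiceStub

channel = grpc.insecure_channel("localhost:9000")
stub = UserDefinedServiceStub(channel)

try:
    # Routing to a nonexistent application should surface a gRPC error.
    metadata = (("application", "does_not_exist"),)
    stub.__call__(UserDefinedMessage(name="foo"), metadata=metadata)
except grpc.RpcError as rpc_error:
    # The error carries the status code and a human-readable detail string.
    print(f"code: {rpc_error.code()}")
    print(f"details: {rpc_error.details()}")
```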
Serve uses the following gRPC error codes:

- `NOT_FOUND`: Multiple applications are deployed to Serve and either the application isn't passed in the metadata or no deployed application matches the one passed.
- `UNAVAILABLE`: Only on the health-check methods when the proxy is in a draining state. When a health check returns `UNAVAILABLE`, the health check failed on this node and you should no longer route to this node.
- `DEADLINE_EXCEEDED`: The request took longer than the timeout setting and was cancelled.
- `INTERNAL`: Other unhandled errors during the request.

(serve-grpc-proxy-grpc-context)=
## Use a custom gRPC context
Serve provides a gRPC context object to the deployment replica for getting information about the request as well as setting response metadata such as the code and details. If the handler function is defined with a `grpc_context` argument, Serve passes a `RayServegRPCContext` object in for each request. Below is an example of how to set a custom status code, details, and trailing metadata. You can also set a status code before raising an exception, and Serve preserves that status code in the error response. This is useful for returning meaningful status codes like `RESOURCE_EXHAUSTED` (retryable) or `INVALID_ARGUMENT` (not retryable) instead of the generic `INTERNAL` error.
:start-after: __begin_grpc_context_define_app__
:end-before: __end_grpc_context_define_app__
:language: python
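For illustration, a handler using the context might look like the following sketch. It assumes the hypothetical message types from earlier, that `RayServegRPCContext` is importable from `ray.serve.grpc_util`, and that its `set_code`, `set_details`, and `set_trailing_metadata` methods mirror the standard `grpc.ServicerContext` API; check your Ray version's reference:

```python
import grpc
from ray import serve
from ray.serve.grpc_util import RayServegRPCContext
from user_defined_protos_pb2 import UserDefinedMessage, UserDefinedResponse

@serve.deployment
class GrpcContextDeployment:
    def __call__(
        self, request: UserDefinedMessage, grpc_context: RayServegRPCContext
    ) -> UserDefinedResponse:
        if not request.name:
            # Set a meaningful status code before raising; Serve preserves it
            # in the error response instead of returning a generic INTERNAL.
            grpc_context.set_code(grpc.StatusCode.INVALID_ARGUMENT)
            grpc_context.set_details("'name' must not be empty.")
            raise ValueError("'name' must not be empty.")
        # On success, attach custom trailing metadata to the response.
        grpc_context.set_trailing_metadata([("custom-key", "custom-value")])
        return UserDefinedResponse(greeting=f"Hello {request.name}")

g = GrpcContextDeployment.bind()
```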
The client code to read those attributes looks like the following:
:start-after: __begin_grpc_context_client__
:end-before: __end_grpc_context_client__
:language: python
:::{note}
If the handler raises an unhandled exception without setting a status code on the
`RayServegRPCContext` object, Serve returns an `INTERNAL` error code with the
exception message in the details. However, if you set a status code on the context
before raising the exception, Serve preserves that status code in the response.
This lets you return meaningful status codes like `INVALID_ARGUMENT` or
`RESOURCE_EXHAUSTED` even when raising exceptions.
:::