doc/source/serve/advanced-guides/inplace-updates.md
(serve-inplace-updates)=
You can update your Serve applications once they're in production by updating the settings in your config file and redeploying it using the serve deploy command. In the redeployed config file, you can add new deployment settings or remove old deployment settings. This is because serve deploy is idempotent, meaning your Serve application's config always matches (or honors) the latest config you deployed successfully – regardless of what config files you deployed before that.
(serve-in-production-lightweight-update)=
Lightweight config updates modify running deployment replicas without tearing them down and restarting them, so there's less downtime as the deployments update. For each deployment, modifying the following values is considered a lightweight config update, and won't tear down the replicas for that deployment:
num_replicasautoscaling_configuser_configmax_ongoing_requestsgraceful_shutdown_timeout_sgraceful_shutdown_wait_loop_shealth_check_period_shealth_check_timeout_s(serve-updating-user-config)=
This example uses the text summarization and translation application from the production guide. Both of the individual deployments contain a reconfigure() method. This method allows you to issue lightweight updates to the deployments by updating the user_config.
First let's deploy the graph. Make sure to stop any previous Ray cluster using the CLI command ray stop for this example:
$ ray start --head
$ serve deploy serve_config.yaml
Then send a request to the application:
:language: python
:start-after: __start_client__
:end-before: __end_client__
Change the language that the text is translated into from French to German by changing the language attribute in the Translator user config:
...
applications:
- name: default
route_prefix: /
import_path: text_ml:app
runtime_env:
pip:
- torch
- transformers
deployments:
- name: Translator
num_replicas: 1
user_config:
language: german
...
Without stopping the Ray cluster, redeploy the app using serve deploy:
$ serve deploy serve_config.yaml
...
We can inspect our deployments with serve status. Once the application's status returns to RUNNING, we can try our request one more time:
$ serve status
proxies:
cef533a072b0f03bf92a6b98cb4eb9153b7b7c7b7f15954feb2f38ec: HEALTHY
applications:
default:
status: RUNNING
message: ''
last_deployed_time_s: 1694041157.2211847
deployments:
Translator:
status: HEALTHY
replica_states:
RUNNING: 1
message: ''
Summarizer:
status: HEALTHY
replica_states:
RUNNING: 1
message: ''
The language has updated. Now the returned text is in German instead of French.
:language: python
:start-after: __start_second_client__
:end-before: __end_second_client__
Changing the following values in a deployment's config will trigger redeployment and restart all the deployment's replicas.
ray_actor_optionsplacement_group_bundlesplacement_group_strategyChanging the following application-level config values is also considered a code update, and all deployments in the application will be restarted.
import_pathruntime_env:::{warning}
Although you can update your Serve application by deploying an entirely new deployment graph using a different import_path and a different runtime_env, this is NOT recommended in production.
The best practice for large-scale code updates is to start a new Ray cluster, deploy the updated code to it using serve deploy, and then switch traffic from your old cluster to the new one.
:::