docs/worker-versioning.md
Please join the #safe-deploys channel in the community Slack for further information.
Note 1: In this iteration of Worker Versioning we deprecated the Version Set concept and APIs related to it. If you are using old Worker Versioning APIs please migrate to the new APIs using the process outlined in a later section.
Note 2: Worker Versioning is still in the Pre-Release stage and is not recommended for production use. Breaking changes may still be made if deemed necessary. We love feedback! Please reach out in the Temporal Slack or at community.temporal.io with any thoughts.
Worker Versioning simplifies the process of deploying changes to Worker Programs. It does this by letting you specify a Build ID for your Worker. Temporal Server uses the Build ID to route each Workflow and/or Activity to a Worker instance that can process it.
Worker Versioning guarantees that Workflow Executions started on a particular Build ID will only be processed by Workers of the same Build ID, unless instructed otherwise (via Redirect Rules). With this guarantee, you can make any (non-deterministic) change to your Worker Programs freely and instruct Temporal Server to send new executions to the new Build ID and let old executions run on their old Build IDs to completion.
When using Worker Versioning, you may need to run multiple versions of your Worker for some time, until Workers of old Build IDs are no longer needed (i.e. they are not reachable by any current or future Workflow Execution). Temporal provides a Reachability API to help determine when a Worker version can be decommissioned.
Worker versioning is currently optimized for short-running workflows.
The main reason to use this feature is that it frees you from having to worry about nondeterministic changes. This is a significant gain; however, it comes at the cost of running multiple versions of Workers simultaneously.
For this reason, Worker Versioning is best suited for Workers that run only short-lived Workflows, because Workers with the old Build ID need to be kept only for a short window during and after a deployment, until all open Workflows belonging to the old Build ID close.
For long-running Workflows, you can still use Worker Versioning using one of the following approaches:
Use the UseAssignmentRules VersioningIntent so each new Continue-as-New (CaN) execution starts on the latest Build ID.

When juggling multiple Worker versions, you need each version to be able to handle the load
placed upon it. This can be difficult to calculate. Therefore, the safest option is to operate
each version at the normal capacity during your deployment.
For this reason, we recommend a blue-green deploy strategy.
Temporal Server version v1.24.0 or higher is needed to use the new Worker Versioning API. In addition, you need to use Temporal CLI or Go SDK for updating versioning rules (typically needed only when you do a new deployment). You can use the SDK of your choice, as listed below, to create and run workers that can opt into Worker Versioning.
In order to run versioned workers and update the versioning rules for each task queue, Worker Versioning needs to be enabled in the server.
To start a dev cluster with versioning enabled:
temporal server start-dev \
--dynamic-config-value frontend.workerVersioningWorkflowAPIs=true \
--dynamic-config-value frontend.workerVersioningRuleAPIs=true
To enable Worker Versioning in a self-hosted server, the following dynamic config fields must be set to true.
They can be set globally, per Namespace, or per Task Queue:
frontend.workerVersioningRuleAPIs
frontend.workerVersioningWorkflowAPIs

In Temporal Cloud, open a ticket with the support team to enable the feature and be included in the pre-release.
This walkthrough assumes that there is some constant load regularly starting short-running workflow executions and an existing worker processing them. The workflow/worker could be versioned or unversioned.
In the worker options, pass in the Build ID your worker will poll on, and opt in to use the Build ID for versioning. The best practice is to provide a new Build ID generated by your build pipeline whenever a new build of your code is made. Below is an example written in Go.
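// c is a Temporal client created earlier (for example with client.Dial).
w := worker.New(
	c, "my-tq", worker.Options{
		// Opt in to using the Build ID for Worker Versioning.
		UseBuildIDForVersioning: true,
		// Build ID generated by your build pipeline for this build.
		BuildID: os.Getenv("BUILD_ID"),
	},
)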
Build and deploy your worker with the tool of your choice (e.g. Docker). The worker will begin polling for tasks scheduled to the Build ID you specified, but it won't receive any tasks until the versioning rules are updated.
Add an assignment rule targeting your Build ID to send 1% of new tasks to your versioned worker.
temporal task-queue versioning insert-assignment-rule --task-queue my-tq --build-id $BUILD_ID --percentage 1
Once this rule is created, 1% of new workflow executions will be sent to the versioned worker that you started above. The remaining 99% of new workflow executions will be assigned to the default Build ID, which is the first Build ID in the assignment rule list that has a ramp of 100%, or unversioned if no such rule exists.
You can check the progress of workflows filtered by Build ID like this:
temporal workflow list --query "BuildIds = 'assigned:$BUILD_ID'"
Committing the Build ID will atomically delete the partially ramped rule you added in step 4 and replace it with a rule with a 100% ramp.
temporal task-queue versioning commit-build-id --task-queue my-tq --build-id $BUILD_ID
See the --help output for more details:
% temporal task-queue versioning commit-build-id --help
Completes the rollout of a BuildID and cleans up unnecessary rules possibly created during a gradual rollout. Specifically, this command will make the following changes atomically:
1. Adds an unconditional assignment rule for the target Build ID at the end of the list.
2. Removes all previously added assignment rules to the given target Build ID.
3. Removes any unconditional assignment rules for other Build IDs.
To prevent committing invalid Build IDs, we reject the request if no pollers have been seen recently for this Build ID. Use the force option to disable this validation.
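The three atomic changes described in the help output can be modelled on a plain in-memory rule list. The following Go code is only a sketch of that description, not the server's implementation: the `AssignmentRule` type and the `CommitBuildID` function are hypothetical names introduced for illustration, a ramp of 100 is assumed to mean "unconditional", and the recent-poller validation is left out.

```go
package main

import "fmt"

// AssignmentRule is a hypothetical, simplified model of an assignment rule:
// a target Build ID and a ramp percentage (100 means unconditional).
type AssignmentRule struct {
	TargetBuildID  string
	RampPercentage float64
}

// CommitBuildID returns the rule list that results from committing target,
// following the three changes listed in the help output.
func CommitBuildID(rules []AssignmentRule, target string) []AssignmentRule {
	committed := make([]AssignmentRule, 0, len(rules)+1)
	for _, r := range rules {
		if r.TargetBuildID == target {
			continue // change 2: drop rules previously added for the target
		}
		if r.RampPercentage >= 100 {
			continue // change 3: drop unconditional rules for other Build IDs
		}
		committed = append(committed, r)
	}
	// change 1: add an unconditional rule for the target at the end of the list
	return append(committed, AssignmentRule{TargetBuildID: target, RampPercentage: 100})
}

func main() {
	rules := []AssignmentRule{
		{TargetBuildID: "new-build", RampPercentage: 1},   // partial ramp from the rollout
		{TargetBuildID: "old-build", RampPercentage: 100}, // previous default
	}
	fmt.Println(CommitBuildID(rules, "new-build")) // prints [{new-build 100}]
}
```

Partially ramped rules for other Build IDs are kept in this model, because the help text only mentions removing unconditional rules for other Build IDs.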
Check the reachability of your old version. Now that all new workflow executions are assigned to the new Build ID,
your old worker (that was running in step 0) will not receive any tasks from new workflow executions. Running workflow
executions will send their outstanding tasks to workers with the version that the workflow execution is assigned to
(barring other instructions via Redirect Rules). Unless your application performs queries to closed workflows, a worker is no longer needed after the reachability status transitions from REACHABLE to CLOSED_WORKFLOWS_ONLY.
temporal task-queue describe --task-queue my-tq --select-build-id $OLD_BUILD_ID --report-reachability
Note: If your previous default was unversioned, replace --select-build-id $OLD_BUILD_ID with --select-unversioned in the above command.
If you omit the --select-* flags, results for the current default Build ID will be returned.
After the reachability status for your old version and task queue is CLOSED_WORKFLOWS_ONLY you can safely
decommission your old worker(s). See Build ID Reachability for more details.
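The decommissioning decision described above can be written down as a small check. This is a minimal Go sketch under stated assumptions: `CanDecommission` is a hypothetical helper, the status strings are the two named in this walkthrough, and the map of Task Queue name to status is assumed to have been collected by you (for example from the `describe` command output) for every Task Queue the old Build ID serves. Any status other than those two is treated conservatively as "not safe".

```go
package main

import "fmt"

// Reachability status names as they appear in this walkthrough.
const (
	Reachable           = "REACHABLE"
	ClosedWorkflowsOnly = "CLOSED_WORKFLOWS_ONLY"
)

// CanDecommission reports whether Workers of an old Build ID can be shut down,
// given the reachability status observed on each Task Queue they poll.
// queriesClosedWorkflows says whether the application queries closed Workflows.
func CanDecommission(statusByTaskQueue map[string]string, queriesClosedWorkflows bool) bool {
	if len(statusByTaskQueue) == 0 {
		return false // nothing was checked, so stay conservative
	}
	for _, status := range statusByTaskQueue {
		switch status {
		case ClosedWorkflowsOnly:
			if queriesClosedWorkflows {
				return false // closed Workflows may still need this Worker for queries
			}
		default:
			return false // REACHABLE or any status this sketch does not model
		}
	}
	return true
}

func main() {
	statuses := map[string]string{"my-tq": ClosedWorkflowsOnly}
	fmt.Println(CanDecommission(statuses, false)) // prints true
	fmt.Println(CanDecommission(statuses, true))  // prints false

	statuses["other-tq"] = Reachable // hypothetical second Task Queue
	fmt.Println(CanDecommission(statuses, false)) // prints false
}
```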
You can use temporal task-queue versioning commands to update and read the Versioning
rules of a given Task Queue. Here are a few examples:
Add Assignment rule: send 10% of new executions to Build ID "abc-123".
temporal task-queue versioning insert-assignment-rule --task-queue MY_TQ --build-id abc-123 --percentage 10
Commit Build ID to complete the rollout of "abc-123" and clean up unnecessary rules.
temporal task-queue versioning commit-build-id --task-queue MY_TQ --build-id abc-123
Add Redirect rule from "abc-123" to "xyz-789":
temporal task-queue versioning add-redirect-rule --task-queue MY_TQ --source-build-id abc-123 --target-build-id xyz-789
List Versioning rules:
temporal task-queue versioning get-rules --task-queue MY_TQ
You can use the temporal task-queue describe command to get the reachability status of a Build ID.
Example: report reachability of Build ID "abc-123" on Task Queue MY_TQ:
temporal task-queue describe --task-queue MY_TQ --select-build-id abc-123 --report-reachability
Temporal Server allows you to manage routing of tasks to Build IDs via Worker Versioning Rules.
Worker Versioning rules are added to a given Task Queue. If your Worker Program contains multiple Workers polling multiple Task Queues, you need to update the rules of each Task Queue separately.
There are two types of rules: Build ID Assignment rules and Build ID Redirect rules.
Assignment rules are used to assign a Build ID for a new execution when it starts. Their primary use case is to specify the latest Build ID, but they have powerful features for gradual rollout of a new Build ID.
Once a Workflow Execution is assigned to a Build ID and completes its first Workflow Task, the Workflow stays on that Build ID regardless of changes in Assignment rules. This eliminates the need for compatibility between versions when you only care about using the new version for new Workflows and let existing Workflows finish on their own version.
Activities, Child Workflows and Continue-as-New executions have the option to inherit the Build ID of their parent/previous Workflow or use the latest Assignment rules to independently select a Build ID. This is specified by the parent/previous Workflow using VersioningIntent. We recommend that you allow Continued-as-New Workflows to be assigned new versions; otherwise, your Workflow effectively becomes a long-running Workflow that is stuck on the old build.
Unless there's a redirect rule for it, the task will be dispatched to Workers of the Build ID determined by the Assignment rules (or inherited).
When using Worker Versioning on a Task Queue, in the steady state,
there should typically be a single assignment rule to send all new executions
to the latest Build ID. Existence of at least one such "unconditional"
rule at all times is enforced by the system, unless the force flag is used
by the user when replacing/deleting these rules (for exceptional cases).
During a deployment, one or more additional rules can be added to assign a subset of the tasks to a new Build ID based on a "ramp percentage".
When there are multiple assignment rules for a Task Queue, the rules are evaluated in order, starting from index 0. The first applicable rule will be applied and the rest will be ignored.
In the event that no assignment rule is applicable on a task (or the Task Queue is simply not versioned), the tasks will be dispatched to an unversioned Worker.
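The evaluation order described above can be sketched in a few lines of Go. This is an illustrative model, not the server's code: `AssignmentRule`, `AssignBuildID` and the `Unversioned` placeholder are hypothetical names, and how a ramped rule is sampled is an assumption (here the caller supplies a `roll` function that returns a number in [0, 100) each time a ramped rule is considered).

```go
package main

import "fmt"

// AssignmentRule is a hypothetical, simplified model of a Build ID Assignment
// rule: a target Build ID and a ramp percentage (100 means unconditional).
type AssignmentRule struct {
	TargetBuildID  string
	RampPercentage float64
}

// Unversioned is a placeholder meaning "dispatch to an unversioned Worker".
const Unversioned = ""

// AssignBuildID evaluates the rules in order, starting from index 0, and
// returns the target of the first applicable rule. An unconditional rule
// always applies; a ramped rule applies when roll() falls below its ramp
// (an assumption of this sketch). If no rule applies, or there are no rules,
// the task goes to an unversioned Worker.
func AssignBuildID(rules []AssignmentRule, roll func() float64) string {
	for _, r := range rules {
		if r.RampPercentage >= 100 || roll() < r.RampPercentage {
			return r.TargetBuildID
		}
	}
	return Unversioned
}

func main() {
	rules := []AssignmentRule{
		{TargetBuildID: "new-build", RampPercentage: 1},   // ramped rule added during a deployment
		{TargetBuildID: "old-build", RampPercentage: 100}, // unconditional rule
	}
	inRamp := func() float64 { return 0.5 }   // falls inside the 1% ramp
	outOfRamp := func() float64 { return 42 } // falls outside the 1% ramp

	fmt.Println(AssignBuildID(rules, inRamp))       // prints new-build
	fmt.Println(AssignBuildID(rules, outOfRamp))    // prints old-build
	fmt.Printf("%q\n", AssignBuildID(nil, inRamp))  // prints ""
}
```

Because the unconditional rule comes last, it acts as the default for every execution that the ramped rule does not pick up.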
Redirect rules should only be used when you want to move workflows and activities assigned to one Build ID (source) to another compatible Build ID (target). You are responsible for making sure the target Build ID of a redirect rule is able to process event histories made by the source Build ID, by using Patching or other means.
Here are situations in which you might need a redirect rule:
Redirect rules can be chained.
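To illustrate what chaining means, here is a minimal Go sketch that follows redirect rules from a source Build ID until it reaches a Build ID with no outgoing rule. The map representation, the `ResolveRedirects` name, the cycle check and the Build ID "def-456" are assumptions made for this example; the source does not describe how the server stores or validates chains.

```go
package main

import (
	"errors"
	"fmt"
)

// ResolveRedirects follows redirect rules (source Build ID -> target Build ID)
// until it reaches a Build ID that is not the source of any rule.
// It returns an error if the rules loop back to a Build ID already visited.
func ResolveRedirects(redirects map[string]string, buildID string) (string, error) {
	seen := map[string]bool{buildID: true}
	for {
		next, ok := redirects[buildID]
		if !ok {
			return buildID, nil // end of the chain
		}
		if seen[next] {
			return "", errors.New("redirect rules form a cycle at " + next)
		}
		seen[next] = true
		buildID = next
	}
}

func main() {
	redirects := map[string]string{
		"abc-123": "xyz-789",
		"xyz-789": "def-456", // hypothetical second hop
	}
	final, err := ResolveRedirects(redirects, "abc-123")
	fmt.Println(final, err) // prints def-456 <nil>
}
```

Every Build ID along a chain has to be able to process histories produced by the Build IDs before it, so the compatibility responsibility described above applies to each hop.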
Temporal Server can help you decide when to safely decommission workers of an old Build ID by providing Task Reachability status for that Build ID.
The reachability status can be one of the following:
It's safe to rely on Reachability status for shutting down old workers. However, it may take a bit longer to converge due to possible delays, usually no more than a few minutes, in the status being updated.
Future Activities that inherit their Workflow's Build ID but not its Task Queue are not accounted for in reachability: because they do not use the assignment rules of their own Task Queue, the server cannot know whether they will happen. The same goes for Child Workflows or Continue-As-New Workflows that inherit the parent/previous Workflow's Build ID but not its Task Queue. In those cases, make sure to query reachability for the parent/previous Workflow's Task Queue as well.
Worker Versioning is currently in Pre-Release. We will continue to improve this feature before public preview. We plan to respond to your feedback, so please reach out! In addition, here are likely improvements and behavior changes being considered for public preview:
To migrate your existing Task Queue, follow the normal procedure as if you were upgrading from one Build ID to another.
The only limitation is that, as of now, redirect rules with an unversioned source are not supported. Hence, redirecting your long-running unversioned Workflows to a Build ID is not possible. (This may change in the future.)
If you are using the old Versioning API (i.e. Version Sets), you can easily migrate to the new API.
You don't need to change your worker code or Task Queue name. For your next
deployment, simply add an Assignment rule instead of a Version Set (incompatible Build ID).
The assignment rule can generally be added using the insert-assignment-rule command;
however, commit-build-id provides an idempotent replacement for both the (now-deprecated)
promote-set and add-new-default operations.
The Version Sets added previously will be present and be used for the old Workflow executions. They will be cleaned up automatically once all their workflows pass their retention time.
Note that:
temporal task-queue get-build-ids