docs/v3/concepts/global-concurrency-limits.mdx
Global concurrency limits provide a mechanism to control the number of concurrent operations in your workflows, enabling precise resource management and system stability. They work by allocating a fixed number of "slots" that must be acquired before an operation can proceed.
Global concurrency limits allow you to manage execution efficiently by controlling how many tasks, flows, or other operations can run simultaneously. Unlike other concurrency controls in Prefect that are scoped to specific objects (like deployments or work pools), global concurrency limits can be applied to any Python-based operation in your codebase.
They are ideal for controlling access to shared resources, such as limiting concurrent database connections, calls to external APIs, or access to shared file systems.
While both global concurrency limits and rate limits control execution flow, they serve different purposes and work differently:
Concurrency limits control how many operations can run at the same time. When you use the `concurrency` context manager, a slot is occupied for the entire duration of the operation and released when the operation completes.
Rate limits control how frequently operations can start. When you use the `rate_limit` function, a slot is occupied briefly and then released automatically at a controlled rate determined by `slot_decay_per_second`.
The core difference is when slots are released: a concurrency limit holds its slot until the operation completes, however long that takes, while a rate limit releases slots automatically at the rate set by `slot_decay_per_second`, independent of the operation's duration.
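These two release behaviors can be sketched with a small, self-contained model (plain Python, not the Prefect API; the class and names here are illustrative only):

```python
import time

class SlotPool:
    """Toy model of a limit with a fixed number of slots."""
    def __init__(self, limit, slot_decay_per_second=0.0):
        self.limit = limit
        self.decay = slot_decay_per_second
        self.occupied = []  # timestamps at which each slot was taken

    def _expire(self):
        # With decay configured, an occupied slot frees itself
        # after roughly 1 / decay seconds.
        if self.decay > 0:
            ttl = 1.0 / self.decay
            now = time.monotonic()
            self.occupied = [t for t in self.occupied if now - t < ttl]

    def acquire(self):
        self._expire()
        if len(self.occupied) >= self.limit:
            return False  # no slot available; caller must wait
        self.occupied.append(time.monotonic())
        return True

    def release(self):
        # Concurrency-limit semantics: slot freed when the operation completes.
        if self.occupied:
            self.occupied.pop()

# Concurrency-limit style: the slot stays occupied until explicitly released.
pool = SlotPool(limit=1)
assert pool.acquire() is True
assert pool.acquire() is False   # slot still held by the running operation
pool.release()
assert pool.acquire() is True    # available again after completion

# Rate-limit style: with decay, the slot frees itself over time.
rate = SlotPool(limit=1, slot_decay_per_second=10.0)  # one slot per 0.1 s
assert rate.acquire() is True
assert rate.acquire() is False
time.sleep(0.15)                 # wait longer than 1 / decay seconds
assert rate.acquire() is True    # slot decayed and is available again
```

The model compresses both behaviors into one class: `release()` captures completion-based release, while `_expire()` captures time-based decay.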
Choose concurrency limits when you need to cap how many operations run simultaneously, such as limiting concurrent database connections or long-running jobs that hold a resource for their full duration.
Choose rate limits when you need to control how frequently operations start, such as pacing requests to an external API that enforces a requests-per-second quota.
Global concurrency limits use a slot-based system: each limit defines a fixed number of slots, an operation must occupy a slot before it can proceed, and operations that cannot obtain a slot wait until one becomes available.
Each time a concurrency slot is occupied, a countdown begins on the server. The length of this countdown is known as the concurrency slot's lease duration. While a concurrency slot is occupied, the Prefect client periodically notifies the server that the slot is still in use and restarts the countdown.
If the countdown concludes before the lease has been renewed, the concurrency slot is released.
Lease expiration typically occurs when a process occupying a slot exits unexpectedly and is unable to notify the server that the slot should be released. This system exists to ensure that all concurrency slots are eventually released to prevent concurrency-related deadlocks.
The default lease duration is 5 minutes, but custom durations with a minimum of 1 minute can be supplied to the concurrency context manager.
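The countdown-and-renewal behavior can be sketched as a toy model (plain Python, not Prefect internals; times are in seconds and the class is illustrative only):

```python
class Lease:
    """Toy model of a concurrency slot lease with a countdown."""
    def __init__(self, duration, now):
        self.duration = duration
        self.expires_at = now + duration

    def renew(self, now):
        # The client periodically renews, restarting the countdown.
        self.expires_at = now + self.duration

    def expired(self, now):
        # If the countdown concludes, the server releases the slot.
        return now >= self.expires_at

lease = Lease(duration=300, now=0)  # default lease: 5 minutes
assert not lease.expired(now=200)
lease.renew(now=200)                # renewal restarts the countdown
assert not lease.expired(now=400)   # would have expired at 300 without it
assert lease.expired(now=600)       # no renewal after t=200, so 200 + 300 = 500 has passed
```

A process that exits abruptly simply stops calling `renew`, so its slot is released once the countdown runs out, which is the deadlock-prevention property described above.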
Lease renewal failures and strict mode
If the Prefect client is unable to renew a lease (due to network issues, server unavailability, or other connectivity problems), the behavior depends on the parameters passed to the concurrency context manager:
- `strict=False, raise_on_lease_renewal_failure=None` (the default): if lease renewal fails, a warning is logged but execution continues. This provides resilience against temporary connectivity issues.
- `strict=True`: if lease renewal fails, execution stops immediately with an error. This ensures that operations only proceed when concurrency enforcement can be guaranteed.
- `raise_on_lease_renewal_failure`: controls lease renewal failure behavior independently of the `strict` parameter. Set to `True` to terminate on renewal failure, or `False` to continue despite renewal failures. When `None` (the default), the `strict` parameter value is used for backward compatibility.

Use `strict=True` when you need absolute certainty that concurrency limits are being enforced. Use `raise_on_lease_renewal_failure=False` with `strict=True` when you want slot acquisition to be strict but long-running tasks to tolerate transient lease renewal errors.
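The precedence between the two parameters can be summarized in a toy decision helper (illustrative only, not Prefect's implementation):

```python
def terminate_on_renewal_failure(strict, raise_on_lease_renewal_failure=None):
    """Return True if a failed lease renewal should stop execution."""
    # raise_on_lease_renewal_failure, when set, takes precedence;
    # otherwise fall back to strict (backward compatibility).
    if raise_on_lease_renewal_failure is not None:
        return raise_on_lease_renewal_failure
    return strict

assert terminate_on_renewal_failure(strict=False) is False  # warn and continue
assert terminate_on_renewal_failure(strict=True) is True    # stop with an error
# Strict slot acquisition, but tolerate transient renewal failures:
assert terminate_on_renewal_failure(strict=True,
                                    raise_on_lease_renewal_failure=False) is False
```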
Global concurrency limits can be in an active or inactive state: an active limit enforces its configured slot count, while an inactive limit exists but does not block any operations.
You can toggle a limit between active and inactive states to enable or disable concurrency enforcement without changing your code.
Slot decay is the mechanism that enables rate limiting functionality. When you configure a concurrency limit with `slot_decay_per_second`, slots are automatically released over time rather than waiting for an operation to complete.
How slot decay works: once a slot is occupied, it decays at the configured rate, becoming available again after roughly `1 / slot_decay_per_second` seconds rather than when the operation completes.
Configuring decay rates: higher values of `slot_decay_per_second` release slots faster and allow more frequent execution, while lower values enforce a slower pace. For example, a value of `1.0` permits roughly one execution per second, while `0.1` permits roughly one execution every 10 seconds.
Choose a decay rate that balances your required frequency of execution with acceptable system load.
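As a rough rule of thumb, the minimum interval between starts is the reciprocal of the decay rate (a simplified model; real timing also depends on polling and load):

```python
def seconds_between_executions(slot_decay_per_second):
    # With decay, one slot frees itself roughly every 1 / decay seconds,
    # so that is the approximate minimum interval between operation starts.
    return 1.0 / slot_decay_per_second

assert seconds_between_executions(1.0) == 1.0    # about one start per second
assert seconds_between_executions(0.1) == 10.0   # about one start every 10 seconds
assert seconds_between_executions(10.0) == 0.1   # up to ~10 starts per second
```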
<Note> When using the `rate_limit` function, the concurrency limit must have a slot decay configured. Attempting to use `rate_limit` with a limit that has no slot decay will result in an error. </Note>

Prefect provides several mechanisms to control concurrency, each suited for different use cases:
| Concurrency Type | Scope | Use Case |
|---|---|---|
| Global concurrency limits | Any Python operation | General-purpose concurrency control for database connections, API calls, or any resource |
| Work pool flow run limits | Flows in a work pool | Limit concurrent flows on specific infrastructure |
| Work queue flow run limits | Flows in a work queue | Priority-based flow execution control |
| Deployment flow run limits | Specific deployment | Prevent concurrent runs of a specific deployment |
| Tag-based task concurrency limits | Prefect tasks with tags | Limit concurrent Prefect task runs with specific tags |
Key distinction: Global concurrency limits are the most flexible option because they can be applied to any Python-based operation, not just Prefect-specific objects. This makes them ideal for controlling access to external resources like databases, APIs, or file systems.
Use global concurrency limits to prevent resource exhaustion, for example by capping the number of simultaneous connections to a database or downstream service.
Use rate limits to maintain system stability, for example by smoothing bursts of outbound requests so an external API is not overwhelmed.
Use global concurrency limits for fine-grained control over any Python-based operation, independent of how your flows, tasks, and deployments are structured.
For practical implementation examples, see how to apply global concurrency and rate limits.