.. _arch_overview_threading:

Threading model
===============

Envoy uses a "single process, multiple threads" architecture. This model is designed to be highly concurrent and non-blocking, allowing a single Envoy process to efficiently handle a very large number of active connections and requests.
.. note::

   Matt Klein wrote a detailed `blog post <https://blog.envoyproxy.io/envoy-threading-model-a8d44b922310>`_
   on the Envoy threading model. Though it is a bit old, it is still accurate and a great companion to this
   document.
At a high level, the threading model consists of three main components:

* **Main thread**: handles configuration updates, administration, and general process lifecycle
  (detailed below).
* **Worker threads**: a set of threads (sized via the ``--concurrency`` flag)
  that handle the actual listening, filtering, and forwarding of network traffic.
* **File flusher thread**: an independent thread that writes buffered log data to disk so that
  workers never block on file I/O.

.. tip::

   For most workloads, we recommend configuring the number of worker threads to be equal to the
   number of hardware threads on the machine. This maximizes CPU utilization without incurring
   excessive context switching overhead.
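For example, the worker count can be set directly on the command line (the config file name here is illustrative):

.. code-block:: console

   # Run Envoy with four worker threads, e.g. on a four-core machine.
   envoy -c envoy.yaml --concurrency 4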
Listener connection balancing
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When a connection is accepted by a listener, it is bound to a single worker thread for its entire lifetime. This design means that Envoy's "hot path" is highly parallelized and it avoids complex locking for the vast majority of request processing. In general, Envoy is written to be non-blocking.
By default, there is no coordination between worker threads. This means that all worker threads independently attempt to accept connections on each listener and rely on the kernel to perform the balancing between threads.
For most workloads, the kernel does a very good job of balancing incoming connections. However, for
some workloads with a small number of very long-lived connections (e.g., service mesh HTTP/2 or gRPC
egress), it might be desirable to have Envoy forcibly balance connections between worker threads.
To support this behavior, Envoy allows for different types of
:ref:`connection balancing <envoy_v3_api_field_config.listener.v3.Listener.connection_balance_config>` to be configured on each
:ref:`listener <arch_overview_listeners>`.
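For example, a listener can opt into exact balancing via ``connection_balance_config`` (a minimal sketch; the listener name and address are illustrative):

.. code-block:: yaml

   static_resources:
     listeners:
     - name: ingress_listener
       address:
         socket_address:
           address: 0.0.0.0
           port_value: 10000
       # Force accepted connections to be balanced evenly across worker
       # threads instead of relying on the kernel's accept distribution.
       connection_balance_config:
         exact_balance: {}
       filter_chains:
       - filters: []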
.. note::

   On Windows, the kernel is not able to balance connections properly with the async I/O model that
   Envoy uses. Until this is fixed by the platform, Envoy enforces listener connection balancing on
   Windows. This allows connections to be balanced between worker threads, but it comes with a
   performance penalty.
Dispatcher
^^^^^^^^^^
At the core of Envoy's threading model is the `Event::Dispatcher <https://github.com/envoyproxy/envoy/blob/main/envoy/event/dispatcher.h>`_. Each thread (main and worker)
runs a loop rooted in a Dispatcher. This is a wrapper around libevent (or other event loops
in the future) that manages:

* Network (file) events.
* Timers.
* Signal handling.
* Deferred deletion of objects.
* Callbacks posted from other threads.
The Dispatcher allows code to be written in a single-threaded, non-blocking style. Instead of blocking on I/O, you register a callback that triggers when the I/O is ready.
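This callback style can be illustrated with a toy, Envoy-independent sketch: callbacks are queued (possibly from other threads) and later executed on the loop's own thread. ``MiniDispatcher`` and ``runDemo`` are invented names for illustration, not Envoy APIs:

.. code-block:: cpp

   #include <cassert>
   #include <functional>
   #include <iostream>
   #include <mutex>
   #include <queue>
   #include <utility>

   // Toy sketch of the dispatcher style: units of work are queued and
   // executed single-threaded on the loop's own thread.
   class MiniDispatcher {
   public:
     // Thread-safe: queue a unit of work for the loop thread.
     void post(std::function<void()> callback) {
       std::lock_guard<std::mutex> lock(mutex_);
       queue_.push(std::move(callback));
     }

     // Drain and run all pending callbacks on the calling thread.
     void run() {
       while (true) {
         std::function<void()> callback;
         {
           std::lock_guard<std::mutex> lock(mutex_);
           if (queue_.empty()) {
             return;
           }
           callback = std::move(queue_.front());
           queue_.pop();
         }
         callback(); // Run outside the lock, like a real event loop turn.
       }
     }

   private:
     std::mutex mutex_;
     std::queue<std::function<void()>> queue_;
   };

   // Queue two callbacks instead of blocking, then turn the loop once.
   int runDemo() {
     MiniDispatcher dispatcher;
     int counter = 0;
     dispatcher.post([&counter] { counter += 1; });
     dispatcher.post([&counter] { counter += 2; });
     dispatcher.run();
     return counter;
   }

   int main() {
     assert(runDemo() == 3); // Both callbacks ran on the loop thread.
     std::cout << "done\n";
     return 0;
   }

A real Dispatcher additionally blocks in the kernel (via libevent) until some event is ready, rather than returning when the queue is empty.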
When running the dispatcher, you can specify how it should run using ``Event::Dispatcher::RunType``:

* ``Block``: runs the event loop until there are no more pending events.
* ``NonBlock``: processes only the events that are already ready, then returns without waiting.
* ``RunUntilExit``: runs the event loop until ``dispatcher.exit()`` is called. This is the standard
  mode for long-running threads like the main thread or workers.

io_uring
^^^^^^^^
`io_uring <https://man7.org/linux/man-pages/man7/io_uring.7.html>`_ is an asynchronous I/O interface for Linux kernels that
can offer significant performance improvements over standard syscalls. Envoy supports using
io_uring for specific I/O operations.
In Envoy's threading model, io_uring is integrated directly into the event loop of each worker
thread.
* Each worker thread creates its own io_uring instance
  (submission and completion queues). This adheres to Envoy's shared-nothing architecture and
  avoids locking contention between workers.
* Each worker's io_uring completion queue is monitored via an ``eventfd`` which
  is registered with the thread's ``Event::Dispatcher``. When completions are ready, the kernel
  signals the ``eventfd``. The Dispatcher wakes up, executes the callback, and processes the I/O
  completion just like any other file event.

This allows io_uring to coexist transparently with other event-driven mechanisms in Envoy.
Thread Local Storage (TLS)
^^^^^^^^^^^^^^^^^^^^^^^^^^
Envoy relies heavily on Thread Local Storage (TLS) to avoid locking contention. The
`ThreadLocal::Instance <https://github.com/envoyproxy/envoy/blob/main/envoy/thread_local/thread_local.h>`_ interface provides a mechanism to store data that is local to each thread
but accessible via a global slot index.
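The slot idea can be illustrated with a simplified, Envoy-independent sketch: every thread owns its own vector of objects, and a process-wide slot index names the same logical object on each thread. ``allocateSlot``, ``setSlot``, and ``getSlot`` are invented for illustration, not Envoy's API:

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <memory>
   #include <utility>
   #include <vector>

   // Each thread's private copy of the slot table. No lock is ever needed
   // because only the owning thread touches its own vector.
   thread_local std::vector<std::shared_ptr<int>> tls_slots;

   // Allocate the next free slot index (in Envoy this happens on the main
   // thread, so the non-atomic counter is safe in that model).
   size_t allocateSlot() {
     static size_t next_index = 0;
     return next_index++;
   }

   // Store the calling thread's copy of the data for `index`.
   void setSlot(size_t index, std::shared_ptr<int> value) {
     if (index >= tls_slots.size()) {
       tls_slots.resize(index + 1);
     }
     tls_slots[index] = std::move(value);
   }

   // Read the calling thread's copy for `index`, lock-free.
   std::shared_ptr<int> getSlot(size_t index) {
     if (index >= tls_slots.size()) {
       tls_slots.resize(index + 1);
     }
     return tls_slots[index];
   }

   int main() {
     const size_t slot = allocateSlot();
     setSlot(slot, std::make_shared<int>(42));
     assert(*getSlot(slot) == 42); // This thread sees its own copy.
     return 0;
   }

In Envoy, the main thread additionally posts updates to every worker's Dispatcher so that all per-thread copies eventually converge on new data.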
Main Thread Responsibilities
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The Main Thread receives special treatment. Its responsibilities include:
* Server startup and shutdown.
* xDS/API handling (configuration updates).
* Runtime configuration.
* Stat flushing.
* ``/admin`` endpoint logic.
* General process management.

Exceptions and Extensions
^^^^^^^^^^^^^^^^^^^^^^^^^
While the core model is strict, some extensions may need their own threads to manage their memory or perform complex operations.
* `AsyncFileManagerThreadPool <https://github.com/envoyproxy/envoy/blob/main/source/extensions/common/async_files/async_file_manager_thread_pool.h>`_ manages a pool of threads to perform
  blocking file operations (like opening files or reading large bodies) without blocking the
  non-blocking worker threads.
* `CacheEvictionThread <https://github.com/envoyproxy/envoy/blob/main/source/extensions/http/cache/file_system_http_cache/cache_eviction_thread.h>`_ runs in the background to enforce cache size
  limits and evict old entries, preventing these expensive iterations from stalling the data path.
* `GeoipProvider <https://github.com/envoyproxy/envoy/blob/main/source/extensions/geoip_providers/maxmind/geoip_provider.cc>`_ (MaxMind) uses a background thread to reload the database file when it changes.
* `GetAddrInfoDnsResolver <https://github.com/envoyproxy/envoy/blob/main/source/extensions/network/dns_resolver/getaddrinfo/getaddrinfo.cc>`_ uses a pool of dedicated threads to perform blocking
  ``getaddrinfo`` calls, ensuring that the main event loop is never blocked by OS-level DNS resolution.

If you are writing an Envoy extension, follow these patterns to ensure thread safety and performance.
Spawning Threads
^^^^^^^^^^^^^^^^
Developers might want to spawn threads for tasks that involve heavy processing which should not interfere with the request's "hot path", such as blocking I/O, complex computations, or background maintenance tasks (e.g. database reloading).
Do not use ``std::thread`` directly. Instead, use the `Thread::ThreadFactory <https://github.com/envoyproxy/envoy/blob/main/envoy/thread/thread.h>`_ available in the
`Server::Configuration::FactoryContext <https://github.com/envoyproxy/envoy/blob/main/envoy/server/factory_context.h>`_.
.. code-block:: cpp

   // Good: uses Envoy's tracked and instrumented thread factory.
   Thread::ThreadPtr my_thread = thread_factory.createThread([this]() { doStuff(); });

   // Bad: bypasses Envoy's thread tracking.
   std::thread my_thread([this]() { doStuff(); });
Using the factory ensures that:

* Threads are created in a platform-independent way.
* Threads are tracked via Envoy's ``Thread::ThreadId`` system.

Dispatcher Access & Usage
^^^^^^^^^^^^^^^^^^^^^^^^^
You will often need to execute code on a specific thread.
* To run a closure on every worker, use ``ThreadLocal::Instance::runOnAllThreads``.
* To run code on one particular thread, use that thread's ``Event::Dispatcher::post()``. This is
  thread-safe and allows you to queue a unit of work to be executed in the target thread's loop.

.. code-block:: cpp

   // Example: posting a task to the main thread's dispatcher from a worker.
   main_thread_dispatcher_->post([this]() {
     // This code runs on the main thread.
     updateGlobalStats();
   });
Threading in Tests
^^^^^^^^^^^^^^^^^^
Envoy's testing philosophy prioritizes determinism, but threaded code poses a challenge to that goal.
Unit Tests
""""""""""
For unit tests, the goal is to verify logic without the non-determinism of real threads.
* Use `Thread::MockThreadFactory <https://github.com/envoyproxy/envoy/blob/main/test/mocks/thread/mocks.h>`_ instead of the real
  factory. This allows you to inspect what runnable was passed to the thread without spawning a
  system thread.
* Use `Event::SimulatedTimeSystem <https://github.com/envoyproxy/envoy/blob/main/test/test_common/simulated_time_system.h>`_ to
  control the flow of time and timer firing explicitly.

.. code-block:: cpp

   // Do this in your test fixture.
   NiceMock<Thread::MockThreadFactory> thread_factory_;
   EXPECT_CALL(thread_factory_, createThread(_))
       .WillOnce(Invoke([](std::function<void()> cb) {
         // Execute the callback immediately, or store it for later to
         // simulate thread timing.
         cb();
         return nullptr;
       }));
Integration Tests
"""""""""""""""""
Integration tests use real worker threads. To test them reliably, you must synchronize steps using
objects like ``absl::Notification``.
.. code-block:: cpp

   // Signal from the background thread.
   absl::Notification done;
   dispatcher_->post([&done]() {
     doWork();
     done.Notify();
   });

   // Wait on the main test thread.
   done.WaitForNotification();
This is an extremely powerful way to reproduce (or prevent) race conditions.
Threading in Integration Tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Integration tests in Envoy are designed to test the interaction between components using real threads. Understanding the threading model of the test framework is crucial for writing reliable tests.
Component Threading
"""""""""""""""""""

* **Server**: the test server runs the real ``run()`` loop of the server, including real worker
  threads.
* **Client**: the test client creates a ``ClientConnection`` that uses a Dispatcher. It delegates
  I/O events to the dispatcher, which might be running on the same or a different thread depending
  on the setup.

Thread Synchronization
""""""""""""""""""""""
To test race conditions or specific sequences of events across these threads, Envoy uses the
`ThreadSynchronizer <https://github.com/envoyproxy/envoy/blob/main/source/common/common/thread_synchronizer.h>`_.

* Production code declares a named point with ``thread_synchronizer.syncPoint("event_name")``.
* Tests call ``thread_synchronizer.waitOn("event_name")`` to block the test execution until the
  production code reaches that point, or ``thread_synchronizer.signal("event_name")`` to unblock a
  thread waiting at a sync point.

This allows for deterministic testing of concurrent behaviors that would otherwise be flaky.
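The wait/signal semantics can be sketched with a simplified, Envoy-independent synchronizer built on standard primitives (``MiniSynchronizer`` and ``runRace`` are invented names, not the real ``ThreadSynchronizer``):

.. code-block:: cpp

   #include <condition_variable>
   #include <mutex>
   #include <string>
   #include <thread>
   #include <unordered_map>
   #include <utility>

   // Toy sketch of sync-point semantics: production code parks at a named
   // point; the test waits for it to arrive and later signals it onward.
   class MiniSynchronizer {
   public:
     // Called from production code: announce arrival, then block until signaled.
     void syncPoint(const std::string& name) {
       std::unique_lock<std::mutex> lock(mutex_);
       reached_[name] = true;
       cv_.notify_all();
       cv_.wait(lock, [&] { return signaled_[name]; });
     }

     // Called from the test: block until production code reaches the point.
     void waitOn(const std::string& name) {
       std::unique_lock<std::mutex> lock(mutex_);
       cv_.wait(lock, [&] { return reached_[name]; });
     }

     // Called from the test: release the thread parked at the point.
     void signal(const std::string& name) {
       std::lock_guard<std::mutex> lock(mutex_);
       signaled_[name] = true;
       cv_.notify_all();
     }

   private:
     std::mutex mutex_;
     std::condition_variable cv_;
     std::unordered_map<std::string, bool> reached_;
     std::unordered_map<std::string, bool> signaled_;
   };

   // Returns {step observed at the sync point, final step}.
   std::pair<int, int> runRace() {
     MiniSynchronizer sync;
     int step = 0;
     std::thread worker([&] {
       step = 1;
       sync.syncPoint("after_step1"); // Parks here until the test signals.
       step = 2;
     });
     sync.waitOn("after_step1"); // Deterministic: the worker is parked at step 1.
     const int observed = step;
     sync.signal("after_step1");
     worker.join();
     return {observed, step};
   }

   int main() { return runRace() == std::make_pair(1, 2) ? 0 : 1; }

The test can therefore assert on intermediate state (``observed == 1``) without any sleeps or timing assumptions.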
Custom Threads in Tests
"""""""""""""""""""""""
Sometimes tests effectively act as a client or a concurrent actor. Use Thread::ThreadFactory to spawn threads in tests to simulate concurrent events.
.. code-block:: cpp

   // Example from TrackedWatermarkBufferTest.
   auto thread1 = Thread::threadFactoryForTest().createThread([&]() { buffer1->add("a"); });
   auto thread2 = Thread::threadFactoryForTest().createThread([&]() { buffer2->add("b"); });
   // ...
   thread1->join();
   thread2->join();
Simulated Time in Integration Tests
"""""""""""""""""""""""""""""""""""
When using TestUsingSimulatedTime (or SimulatedTimeSystem) in integration tests, time advances are broadcast to all schedulers.
* Instances of ``Event::Dispatcher`` register their schedulers with the simulated time system.
* When a test calls ``timeSystem().advanceTimeWait(duration)``, it advances the simulated time and
  wakes up all registered schedulers. The schedulers then process any timers that have expired due
  to the time advance.

Envoy provides several tools to help debug threading issues.
Watchdog
^^^^^^^^
Envoy includes a configurable "Watchdog" system. It spawns a separate thread that monitors the liveness of the Main Thread and all Worker Threads.
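For example, the watchdog thresholds can be tuned in the bootstrap configuration (a sketch using the v3 ``watchdogs`` bootstrap field; the timeout values are illustrative):

.. code-block:: yaml

   watchdogs:
     main_thread_watchdog:
       miss_timeout: 0.2s     # Log a "miss" if the thread is unresponsive this long.
       megamiss_timeout: 1s   # Log a more severe "megamiss".
       kill_timeout: 0s       # 0 disables killing the process on a stuck main thread.
     worker_watchdog:
       miss_timeout: 0.2s
       megamiss_timeout: 1s
       multikill_timeout: 0s  # 0 disables killing when multiple workers are stuck.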
Debug Assertions
^^^^^^^^^^^^^^^^
In debug builds (or when configured), Envoy employs thread-safety assertions.
* ``ASSERT_IS_MAIN_OR_TEST_THREAD()``: ensures the current code is running on the main thread (or a
  test thread).
* ``ASSERT_IS_WORKER_THREAD()``: (if defined) ensures code is running on the expected worker thread.

Mutex Tracing
^^^^^^^^^^^^^
For performance debugging, Envoy supports mutex tracing (``--enable-mutex-tracing``). This allows
you to identify lock contention hotspots by recording hold times and wait times for contended
mutexes, viewable via the admin interface.
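When mutex tracing is enabled, the collected contention data can be fetched from the admin interface (the admin address and port here are illustrative):

.. code-block:: console

   # Dump current mutex contention statistics.
   curl http://127.0.0.1:9901/contention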