# GPU Command Buffer
Authors: [email protected]
Last updated: April 10, 2023
The GPU Command Buffer is used to create and manage high throughput communication paths using a combination of Mojo IPC and shared memory regions. It is currently used by Chrome to support three distinct methods of GPU-accelerated graphics: WebGL2, Skia rasterization, and WebGPU. Mojo IPC messages are used to communicate the metadata necessary to establish new shared memory regions and coordinate state among endpoints producing and consuming from the memory region. Communication paths are between two endpoints: one client and one service. Typically the service endpoint resides in the GPU process and clients reside in the renderer process, but there is support for exceptions such as when GPU functionality is forced to run as a thread in the browser process, or unit tests where everything runs in a single process. A single renderer process may host multiple clients, such as when web content utilizes both WebGL and WebGPU features, where each context has an independent dedicated Command Buffer.
The structure of Command Buffer data can be thought of as a layering of protocols. Common functionality is shared by all client-service types and specialized application-specific functionality is built on top. Core functionality provides the mechanisms to establish and manage the communication path, such as creating new shared memory regions and providing synchronization primitives. Specialized functionality pertains to application-specific features, such as shader configuration and execution.
Isolating untrustworthy content within a sandboxed process is a cornerstone of the Chrome security model. Communication paths that bridge the sandbox to more trustworthy processes are a key part of the inter-process security boundary. The separation of the signaling mechanism (i.e. Mojo IPC) from the data transfer mechanism (i.e. shared memory) creates the potential for state inconsistencies. The layered nature of communication creates the potential for nuanced, cross-protocol dependencies. The use of shared memory creates the potential for time-of-check-time-of-use issues and other state inconsistencies.
The Command Buffer investigation was part of a broader effort to identify areas of interest to attackers within the Chrome GPU acceleration stack. After narrowing our focus to code supporting the WebGPU subsystem, we still found the stack to be far larger and more complex than we could comprehensively audit in one pass. To further narrow scope we identified discrete components of functionality underpinning the WebGPU stack. The Command Buffer is one such component. Investigation of the Command Buffer proceeded in parallel to analysis of other GPU subsystems.
Command Buffer communication is structured in layers. The foundational or common layer is shared by all higher layers. We chose to start analysis with the common features because of their applicability to all higher layers. The Command Buffer also supports application-specific features that behave like higher level protocols. Examples of higher level layers include support for WebGL and WebGPU. It is these higher level features - specifically the implications of cross-protocol interactions with the low level features - that we believe represent the bulk of complexity and attack surface. However, they have not yet been explored, which motivates revisiting this subsystem.
The Command Buffer is used in four different scenarios:
Each client type implements at least one of two additional classes that each provide a means of IPC signaling:
The Proxy use case is most interesting because it is the attack surface available from a sandboxed renderer process as deployed in real Chrome instances. However, we intended to pursue fuzzing, and Chrome's primary fuzzer framework is based on single-process unit tests, so it was necessary to instead target the in-process implementation supported by unit tests even though it stubs out or emulates interesting features.
Existing fuzzers for the Command Buffer were developed by the GPU team and predate our analysis. We studied these fuzzers - in particular how they bootstrap the graphics subsystem for testing - and developed new fuzzers with different generation strategies, finding one new high-severity security bug in the same narrow feature set already covered by existing fuzzers. Much like our fuzzers from this first pass, the existing fuzzers targeted specific portions of functionality. An emerging theme for improving fuzzing of Chrome at large applies here as well: layering and integration of fuzzers is expected to increase reachability of complex state and therefore increase aggregate fuzzer effectiveness.
In other words, the coverage resulting from a combination of individual fuzzers is greater than the sum of its parts. Consequently, we intend to incrementally extend and integrate fuzzers in order to exercise cross-feature complexity during future work in the graphics subsystem. The next step is integration with a new and complementary fuzzer targeting the WebGPU Dawn Wire protocol.
We developed one new fuzzer tailored for the Command Buffer. Its design is similar to an existing fuzzer, but the new fuzzer differs in a few ways:
It so happens that most of the CommandBuffer features targeted by the new fuzzer also involve Mojo IPC.
Three types of CommandBuffer clients make direct use of its features: WebGL2, Skia Raster, and WebGPU. This section characterizes how each client type makes use of the CommandBuffer.
One renderer may act as many clients. For example, if web content makes use of both WebGL2 and WebGPU, the renderer would have at least two separate CommandBuffer sessions.
CommandBuffer commands all start with a header containing two fields: an 11-bit command identifier and a 21-bit size. The header is followed by zero or more data fields.
The header structure allows for up to 2^11 (2048) unique command identifiers, each with a customizable data payload. This is the mechanism used to implement the common commands.
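As a rough illustration of this layout, the following standalone sketch packs an 11-bit command identifier and a 21-bit size into one 32-bit word. The struct name `CmdHeader` and the field order are illustrative assumptions, not the Chromium definition.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical sketch of a command header: an 11-bit command identifier and a
// 21-bit size packed into one 32-bit word. Field order and exact semantics are
// assumptions, not the Chromium definition.
struct CmdHeader {
  uint32_t command : 11;  // up to 2048 distinct command identifiers
  uint32_t size : 21;     // total command length, header included
};

static_assert(sizeof(CmdHeader) == sizeof(uint32_t),
              "both fields pack into a single 32-bit word");

int main() {
  CmdHeader header{/*command=*/42u, /*size=*/4u};
  std::printf("command=%u size=%u\n",
              static_cast<unsigned>(header.command),
              static_cast<unsigned>(header.size));
  return 0;
}
```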
Just like the common commands, Skia Raster commands and WebGL2/GLES2 commands are also implemented as native CommandBuffer commands; each new command is assigned an identifier and its parameters are defined using CommandBuffer conventions. All native CommandBuffer commands are validated at the CommandBuffer level.
WebGPU takes a different approach: a few native CommandBuffer commands support many new Dawn APIs. In particular, a single CommandBuffer command called DawnCommands (code pointer: client / service) implements the bulk of the Dawn Wire protocol.
Unlike native CommandBuffer commands, which each have a unique id at the
CommandBuffer level, Dawn Wire commands are nested inside the data portion of
the DawnCommands CommandBuffer command. In the GPU process, the CommandBuffer
command handler reads the command data, which includes an identifier for the
Dawn command, then uses a switch statement to determine which Dawn command to
execute and how to further decode the data. Consequently, Dawn commands use an
entirely discrete set of serialization, deserialization and validation logic.
Notably, the command handler is platform specific, auto-generated, and its design allows individual handlers for Dawn commands (i.e. function pointers) to be overridden when the WebGPU session is established.
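The following self-contained sketch illustrates the nesting pattern described above: an outer command's payload carries inner command identifiers that are decoded and dispatched by a separate switch, independent of the outer protocol's validation. All names here (e.g. `HandleNestedCommands`, `InnerCmd`) are hypothetical stand-ins rather than the generated Dawn Wire code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Illustrative identifiers for inner (Dawn-style) commands nested inside the
// payload of a single outer command.
enum class InnerCmd : uint32_t { kCreateBuffer = 1, kWriteBuffer = 2 };

// Hypothetical outer handler: the outer protocol only sees "N bytes of nested
// commands"; decoding, dispatching, and validating each inner command happens
// in this entirely separate layer.
bool HandleNestedCommands(const uint8_t* data, size_t size) {
  size_t offset = 0;
  while (offset + sizeof(uint32_t) <= size) {
    uint32_t id = 0;
    std::memcpy(&id, data + offset, sizeof(id));
    offset += sizeof(id);
    switch (static_cast<InnerCmd>(id)) {
      case InnerCmd::kCreateBuffer:
        std::printf("inner: create buffer\n");
        break;
      case InnerCmd::kWriteBuffer:
        std::printf("inner: write buffer\n");
        break;
      default:
        return false;  // unknown nested command: reject the whole batch
    }
  }
  return true;
}

int main() {
  // Two nested command ids packed back to back, as an outer payload might be.
  const uint32_t ids[] = {1, 2};
  std::vector<uint8_t> payload(sizeof(ids));
  std::memcpy(payload.data(), ids, sizeof(ids));
  return HandleNestedCommands(payload.data(), payload.size()) ? 0 : 1;
}
```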
Specific Dawn commands of interest are described in a dedicated document.
This section describes high level feature operation to introduce the concepts. Later documents go into more detail about specific use cases.
## TransferBuffer: Efficient Bulk Data Transfer

Enabling efficient bulk data transfer is a core CommandBuffer feature. GPU
clients - typically in a renderer process - create a gpu::TransferBuffer that
includes a pointer to a memory region the client intends to share with the GPU
process. However, a pointer is only meaningful within a single process's address
space and the goal is cross-process sharing, so gpu::TransferBuffer also
contains a unique numeric identifier, buffer_id_, that will be consistent
across processes.
When a gpu::TransferBuffer is allocated and initialized, an important step is
"registration" with the GPU process. Registration is a Mojo IPC message
containing the buffer_id_ of the newly created gpu::TransferBuffer, which
allows the GPU process to record an association between the buffer_id_ and a
pointer to the shared buffer within its own address space. After registration
both client and service can refer to the same memory by its ID rather than a
pointer.
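A minimal sketch of the service-side bookkeeping this implies follows, assuming a simple id-to-memory table. The class name `TransferBufferRegistry` and the heap-backed buffer are illustrative; Chromium uses real shared memory regions and its own buffer abstractions.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical service-side registry: after the registration IPC arrives, the
// service records its own view of the shared region under the client-chosen
// id, so later commands can name the memory by id instead of by pointer.
class TransferBufferRegistry {
 public:
  // A plain heap allocation stands in for the shared memory region here.
  void Register(int32_t buffer_id, size_t byte_size) {
    buffers_[buffer_id] = std::make_unique<std::vector<uint8_t>>(byte_size);
  }

  // Commands reference memory as (id, offset, size); the service must check
  // that the id exists and the range fits before touching any bytes.
  uint8_t* Resolve(int32_t buffer_id, size_t offset, size_t size) {
    auto it = buffers_.find(buffer_id);
    if (it == buffers_.end()) return nullptr;
    std::vector<uint8_t>& mem = *it->second;
    if (offset > mem.size() || size > mem.size() - offset) return nullptr;
    return mem.data() + offset;
  }

 private:
  std::map<int32_t, std::unique_ptr<std::vector<uint8_t>>> buffers_;
};

int main() {
  TransferBufferRegistry registry;
  registry.Register(/*buffer_id=*/7, /*byte_size=*/4096);
  return registry.Resolve(7, /*offset=*/0, /*size=*/128) != nullptr ? 0 : 1;
}
```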
The gpu::TransferBuffer includes data structures that allow the client and
service to treat the shared memory region as a ring buffer. These data
structures include alignments and offsets within the shared buffer, as well as
features to let client and service indicate their respective positions of
production and consumption within the buffer. These structures must be kept in
sync between client and service to achieve the intended operation, which
requires cooperation from both endpoints. Notably, since the client
creates the shared memory it may also unilaterally reorganize or altogether
de-allocate the memory; this means the service must guard against a compromised
renderer that has many opportunities to manipulate state and shared resources.
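One way to reason about this hazard is sketched below: the service snapshots the shared offsets into local variables once and validates them before use, so that concurrent client writes cannot invalidate an already-performed check. The `RingControl` layout and `ConsumeAvailable` routine are hypothetical simplifications, not the Chromium ring buffer implementation.

```cpp
#include <atomic>
#include <cstdint>
#include <cstdio>

// Hypothetical control block visible to both client and service. Because the
// client can rewrite these fields at any moment, the service reads them into
// locals once and validates before use.
struct RingControl {
  std::atomic<uint32_t> put;  // client's production offset, in entries
  std::atomic<uint32_t> get;  // service's consumption offset, in entries
};

constexpr uint32_t kNumEntries = 1024;

// Sketch of service-side consumption: snapshot `put` with a single read,
// range-check it, and only then walk entries between `get` and the snapshot.
bool ConsumeAvailable(RingControl* ctrl) {
  const uint32_t put = ctrl->put.load(std::memory_order_acquire);
  if (put >= kNumEntries) {
    return false;  // reject out-of-range offsets before touching any entry
  }
  uint32_t get = ctrl->get.load(std::memory_order_relaxed);
  while (get != put) {
    // ... process the entry at index `get`, using only validated locals ...
    get = (get + 1) % kNumEntries;
  }
  ctrl->get.store(get, std::memory_order_release);
  return true;
}

int main() {
  RingControl ctrl{};  // zero-initialized control block
  ctrl.put.store(10);  // pretend the client produced ten entries
  std::printf("consumed ok: %d\n", ConsumeAvailable(&ctrl) ? 1 : 0);
  return 0;
}
```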
## SyncToken: Foundation for Synchronization

The SyncToken is a CommandBuffer synchronization primitive. A token is
inserted into the command stream as a shared point of reference for both
producers and consumers. The shared point of reference allows building higher
level synchronization features. For example, client and service can communicate
expectations about the ordering of operations in the stream in relation to a
token.
Several higher level synchronization features, such as CHROMIUM_sync_point and
CHROMIUM_ordering_barrier, build on the SyncToken and are implemented as
extensions to the CommandBuffer protocol.
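To make the concept concrete, the following standalone sketch models a token as a monotonically increasing release count that a consumer compares against the producing stream's progress. The `SyncTokenSketch` and `CommandStreamSketch` names are invented for illustration and omit the real command buffer plumbing.

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical model of the core idea: a token marks a position (a release
// count) in the producing stream, and a consumer only proceeds once the
// stream has been processed past that position.
struct SyncTokenSketch {
  uint64_t release_count;  // stream position this token marks
};

class CommandStreamSketch {
 public:
  SyncTokenSketch InsertToken() { return SyncTokenSketch{++inserted_}; }
  void Process() { processed_ = inserted_; }  // pretend all inserted work ran
  bool HasReleased(const SyncTokenSketch& token) const {
    return processed_ >= token.release_count;
  }

 private:
  uint64_t inserted_ = 0;   // release count of the most recently inserted token
  uint64_t processed_ = 0;  // release count the stream has executed up to
};

int main() {
  CommandStreamSketch stream;
  const SyncTokenSketch token = stream.InsertToken();
  std::printf("released before processing: %d\n", stream.HasReleased(token) ? 1 : 0);
  stream.Process();
  std::printf("released after processing:  %d\n", stream.HasReleased(token) ? 1 : 0);
  return 0;
}
```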
Dedicated documentation goes into more detail about Chromium GPU synchronization features.
## Mailbox: A cross-process identity for shared resources

A gpu::Mailbox is a 16-byte random value that can be shared across GPU
contexts and processes. The gpu::Mailbox name is a single shared identifier
used to refer to the same object. Similar to a gpu::TransferBuffer ID, the
common identifier gives GPU clients and services in different processes a name
to refer to data across address spaces.
The key feature of the gpu::Mailbox is allowing producers and consumers to
associate multiple resources with a single name. After establishing a mailbox,
communicating endpoints can reference a collection of related resources using a
common name rather than managing many separate resources.
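A minimal sketch of the idea, assuming a 16-byte random name used as a map key for a bundle of resources. The helper `GenerateMailboxName` and its use of `std::random_device` are illustrative only; the real implementation has stronger requirements on how names are generated.

```cpp
#include <array>
#include <cstdint>
#include <map>
#include <random>
#include <string>
#include <vector>

// Hypothetical sketch: a mailbox is a 16-byte random name that both endpoints
// can use to refer to the same bundle of resources, independent of any
// process-local pointer values.
using MailboxName = std::array<uint8_t, 16>;

// Illustrative generator; not how Chromium produces mailbox names.
MailboxName GenerateMailboxName() {
  std::random_device rd;
  MailboxName name;
  for (auto& byte : name) {
    byte = static_cast<uint8_t>(rd());
  }
  return name;
}

int main() {
  // A mailbox name can key a collection of related resources, so endpoints
  // refer to the whole bundle by one identifier rather than many.
  std::map<MailboxName, std::vector<std::string>> resources;
  const MailboxName name = GenerateMailboxName();
  resources[name] = {"texture", "fence", "metadata"};
  return resources.count(name) == 1 ? 0 : 1;
}
```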
## SharedImage: Allowing many endpoints to operate on shared binary data

SharedImage is a complex collection of features to pass binary data between
producers and consumers. Its key feature is allowing multiple contexts of
potentially different types - e.g. Skia Raster, WebGL, and WebGPU - to operate
on the same object.
Layering and integration of fuzzers is expected to increase reachable state and therefore increase aggregate fuzzer effectiveness. Consequently, we intend to incrementally extend and integrate fuzzers in order to exercise cross-protocol complexity during future work in the graphics subsystem. Modest changes to existing fuzzers can yield new bugs.
Layering Dawn on top of an independent communication mechanism has been a source of security bugs (e.g. 1, 2, 3, 4) because operations at the lower CommandBuffer level can violate assumptions made at the higher level.