doc/developer/design/20220511_command_and_response_binary_encoding.md
The contents of this design doc are a cleaned-up version of a subset of the "Protobuf Support for ComputeCommand" Google Doc, which was the primary design doc while working on #11735.
As part of the Platform initiative, materialized will be broken down into different processes that communicate through a set of well-defined APIs composed of commands[^1][^2] (a.k.a. requests) and responses[^3][^4].
API messages therefore need to be serialized into a backwards-compatible binary format in order to facilitate inter-process communication and replay-based failure recovery.
This document proposes an encoding to be adopted for these purposes based on Google's Protocol Buffers[^5] (a.k.a. Protobuf).
First, other parts of the codebase (such as persist) already use Protobuf, and there is a large overlap between the sets of types used by persist and by the current API (for example, everything under Row).
Second, Protobuf integrates nicely with adjacent technologies (such as gRPC) that might be of interest in the near future.

We delegate the heavy lifting of serialization and deserialization, and large chunks of the backwards-compatibility story, to Protobuf.
Following the rest of the codebase, the Protobuf integration is handled by the prost[^6] library.
Since we want to retroactively add Protobuf support for all Rust types used in the current API, we have the following strategies:

1. Replace the existing Rust types with types derived from *.proto files.
2. Annotate the existing Rust types with prost attributes.
3. For each Rust type $T, create a mirroring type Proto$T and define a pair of conversion functions that mediate between the two types.

The design proposed here is based on (3) because this strategy offers the highest degree of flexibility with respect to backwards compatibility. The pros and cons of (1) and (2) are discussed in the Alternatives section.
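A minimal sketch of what strategy (3) could look like for a single type. The names Window, ProtoWindow, and ConversionError are hypothetical and not taken from the codebase. In the proposed design the mirror type would be generated by prost-build from a *.proto file; it is written by hand here only so that the sketch has no external dependencies.

```rust
use std::convert::TryFrom;

/// Client-facing type, playing the role of `$T` (hypothetical example).
#[derive(Debug, Clone, PartialEq)]
struct Window {
    offset: usize,
    limit: Option<usize>,
}

/// Mirror type, playing the role of `Proto$T`. Hand-written stand-in for
/// what prost-build would generate from a message definition.
#[derive(Debug, Clone, PartialEq, Default)]
struct ProtoWindow {
    offset: u64,
    limit: Option<u64>,
}

/// Hypothetical error for values that do not fit the Rust type.
#[derive(Debug, PartialEq)]
enum ConversionError {
    OutOfRange(u64),
}

/// `$T => Proto$T`: infallible, because usize is at most 64 bits wide
/// on the platforms Rust supports.
impl From<&Window> for ProtoWindow {
    fn from(w: &Window) -> Self {
        ProtoWindow {
            offset: w.offset as u64,
            limit: w.limit.map(|l| l as u64),
        }
    }
}

/// `Proto$T => $T`: fallible, since decoded data may not be valid for `$T`.
/// Backwards-compatibility handling would also live in this function.
impl TryFrom<ProtoWindow> for Window {
    type Error = ConversionError;

    fn try_from(p: ProtoWindow) -> Result<Self, Self::Error> {
        let to_usize =
            |v: u64| usize::try_from(v).map_err(|_| ConversionError::OutOfRange(v));
        Ok(Window {
            offset: to_usize(p.offset)?,
            limit: p.limit.map(to_usize).transpose()?,
        })
    }
}
```

Client code keeps working with Window, and only the boundary code that serializes or deserializes messages needs to touch ProtoWindow.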
Pros
The selected strategy offers flexibility due to the separation of the serializable type (Proto$T) from the client-facing type ($T). In particular:

- Client code can keep using $T as usual and stay simple.
- The technical complexity caused by backwards-compatibility guarantees is accumulated in Proto$T and the associated Proto$T ⇒ $T conversion function.

The strategy also sidesteps the restrictions on the Rust types generated from *.proto defs by prost-build (see the rejected alternatives for details).
The existing Rust type $T can remain unchanged, while Proto$T can deviate in a predictable and consistent way based on the limitations of prost-build.
Moreover, we can offer library functions that mediate between $T and Proto$T in a consistent way across the codebase.
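One way such shared helpers could be organized is a single conversion trait that is implemented once per primitive and container type. The sketch below is an assumption for illustration only: the trait name ProtoRepr, its methods, and the ProtoError type are hypothetical and do not describe an existing API.

```rust
use std::convert::TryFrom;

/// Hypothetical error type for failed Proto -> Rust conversions.
#[derive(Debug, PartialEq)]
enum ProtoError {
    IntOutOfRange(u64),
}

/// Hypothetical trait tying a Rust type to its Protobuf representation.
trait ProtoRepr: Sized {
    /// The Rust-side representation of the Protobuf type.
    type Proto;

    fn to_proto(&self) -> Self::Proto;
    fn from_proto(proto: Self::Proto) -> Result<Self, ProtoError>;
}

/// A single place that fixes the mapping of usize to a 64-bit unsigned
/// integer (the Rust-side type of a Protobuf uint64 field).
impl ProtoRepr for usize {
    type Proto = u64;

    fn to_proto(&self) -> u64 {
        *self as u64
    }

    fn from_proto(proto: u64) -> Result<Self, ProtoError> {
        usize::try_from(proto).map_err(|_| ProtoError::IntOutOfRange(proto))
    }
}

/// Containers reuse the element mapping, so the choice made for usize
/// automatically applies to Vec<usize>, Vec<Vec<usize>>, and so on.
impl<T: ProtoRepr> ProtoRepr for Vec<T> {
    type Proto = Vec<T::Proto>;

    fn to_proto(&self) -> Self::Proto {
        self.iter().map(T::to_proto).collect()
    }

    fn from_proto(proto: Self::Proto) -> Result<Self, ProtoError> {
        proto.into_iter().map(T::from_proto).collect()
    }
}
```

Because every conversion goes through the same trait, a mapping decision is made once and cannot silently differ between call sites.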
For example, we can enforce that the Rust type usize is always represented by the Protobuf type uint64.

Cons
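- We need to:
  - define Proto$T in a *.proto file,
  - use prost-build to generate the Proto$T Rust type, and
  - implement the conversions $T ⇔ Proto$T for each $T.
- This work has to be repeated for every type $T used by the API.
- The conversion $T ⇔ Proto$T for complex types is going to be recursive and therefore susceptible to stack overflow issues (see #9000).
- Values have to be converted between $T and Proto$T, possibly in the hot paths of some processes.

Alternatives

Derive the Rust types from *.proto files

With this strategy, we will: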
- map $T to a corresponding Protobuf message type,
- generate $T from this message using prost-build, and
- replace $T with its derived version in client code.

Pros

- We can use the *.proto files to derive messages in other languages.

Cons

- There is a mismatch between the Rust types generated from *.proto message definitions by prost and the existing Rust types, so we will need to touch client code. For example:
  - Some Rust types have no direct Protobuf counterpart (e.g. usize, chrono types, tuples).

Annotate the existing Rust types with prost attributes

With this strategy, we will add Protobuf serialization and deserialization support directly to the existing types by annotating them with prost-derive macros (e.g. ::prost::Message).
Pros
Cons
Same as for the other rejected alternative.
Open questions

- Are we concerned about the cost of the $T ⇔ Proto$T step at the moment? If yes, we need to run some benchmarks to quantify this.

References

- #eng-storage regarding protobuf representation for Row
- #eng-persist regarding adopting the Codec trait
- #team-status thread prior to the meeting on 2022/03/29
- #help-rust thread regarding usize handling
- #prost Discord channel Q1: cross-crate imports
- #prost Discord channel Q2: blanket implementations for tuples
- #prost Discord channel Q3: FileDescriptorSet handling
- #prost Discord channel Q4: Optional<T> handling in proto3