doc/developer/command-and-response-binary-encoding.md
This guide is intended for developers who want to extend or modify the set of command and response types that make up the APIs used between `materialized` and `clusterd`. As part of this process, one also needs to:
This guide currently focuses primarily on (1). Details for (2) will be added as we accumulate more knowledge.
The process of adding Protobuf-based serialization support for a new Rust type `$T` consists of the following <a name="implementation-steps"></a>implementation steps:

1. Determine the structure of `$T`.
2. Define `Proto$T` (a.k.a. the Protobuf representation of `$T`) and compile it to Rust with `prost`.
3. Implement the mappings between `$T` and `Proto$T`.

If `$T` needs to be added to `mz_expr::foo::bar`, the source code of the `mz_expr` crate needs to be adapted as follows.
- `expr` - crate root folder.
  - `build.rs` - contains `prost_build` instructions for compiling all `*.proto` files in the crate into `*.rs` source code.
  - `src` - crate sources folder.
    - `foo/bar.proto` - contains Protobuf definitions `Proto$T` for types `$T` located in `foo/bar/mod.rs`.
    - `foo/bar/mod.rs` - contains Rust definitions `$T` and `Proto$T` and associated traits.
The following sections contain details for each of the above action items.
## Structure of `$T`

We consider two main cases for `$T`: structs and enums.
Here are the two definitions from `expr/src/foo/bar/mod.rs` that serve as a running example.

```rust
use std::collections::HashMap;

use chrono::NaiveDate;
use mz_repr::adt::char::CharLength;
use mz_repr::GlobalId;

// `$T` is a struct
pub struct MyStruct {
    pub field_1: u64,
    pub field_2: usize,
    pub field_3: CharLength,
    pub field_4: NaiveDate,
    pub field_5: Vec<CharLength>,
    pub field_6: Vec<Vec<CharLength>>,
    pub field_7: HashMap<GlobalId, NaiveDate>,
    pub field_8: Vec<u64>,
}

// `$T` is an enum
#[derive(Debug)]
pub enum MyEnum {
    Var1(u64),
    Var2(usize),
    Var3(CharLength),
    Var4(NaiveDate),
}
```
The above examples also illustrate the <a name="type-classes"></a>classes of nested Rust types that one may encounter:

<ol type="a">
<li>Primitive types that have a Protobuf counterpart (such as <code>u64</code>).</li>
<li>Primitive types that don't have a Protobuf counterpart (such as <code>usize</code>).</li>
<li>Complex types that are defined by us (such as <code>CharLength</code>).</li>
<li>Complex types that are not defined by us (such as <code>NaiveDate</code>).</li>
</ol>

In addition, `MyStruct` has several fields whose types are containers of primitive or complex types (`Vec<_>`, `Vec<Vec<_>>`, `HashMap<_, _>`).

The problem of encoding `$T` in a Protobuf-based binary format thereby decomposes into the problem of encoding instances of each of the above four classes.
The following rules apply in general:
## Define a Protobuf message for `Proto$T`

This step is only needed if `$T` is a complex type (classes (c) or (d)).
The initial message definition of Proto$T can be derived schematically from the shape of $T (see Appendix A for details).
Here are the example contents of expr/src/foo/bar.proto for the running examples from the previous section.
```protobuf
syntax = "proto3";

import "repr/src/adt/char.proto";
import "repr/src/chrono.proto";
import "repr/src/global_id.proto";

package mz_expr.foo.bar;

// `$T` is a struct
message ProtoMyStruct {
    message ProtoField7Entry {
        mz_repr.global_id.ProtoGlobalId key = 1;
        mz_repr.chrono.ProtoNaiveDate value = 2;
    }

    uint64 field_1 = 1;
    uint64 field_2 = 2;
    mz_repr.adt.char.ProtoCharLength field_3 = 3;
    mz_repr.chrono.ProtoNaiveDate field_4 = 4;
    repeated mz_repr.adt.char.ProtoCharLength field_5 = 5;
    repeated mz_repr.adt.char.VecProtoCharLength field_6 = 6;
    repeated ProtoField7Entry field_7 = 7;
    repeated uint64 field_8 = 8;
}

// `$T` is an enum
message ProtoMyEnum {
    oneof kind {
        uint64 var1 = 1;
        uint64 var2 = 2;
        mz_repr.adt.char.ProtoCharLength var3 = 3;
        mz_repr.chrono.ProtoNaiveDate var4 = 4;
    }
}
```
## Update `build.rs`

This step is only needed if `$T` is a complex type (classes (c) or (d)). Register the new `.proto` file, together with the external packages it references, in the crate's `build.rs`:

```rust
use std::env;

fn main() {
    env::set_var("PROTOC", mz_build_tools::protoc());

    prost_build::Config::new()
        // list paths to external types used in the compiled files
        .extern_path(".mz_repr.adt.char", "::mz_repr::adt::char")
        .extern_path(".mz_repr.chrono", "::mz_repr::chrono")
        // snip (...)
        // make the docstring linter happy
        .type_attribute(".", "#[allow(missing_docs)]")
        // list paths to `*.proto` files to be compiled
        .compile_protos(
            &[
                "expr/src/foo/bar.proto",
                // snip (...)
            ],
            &[".."],
        )
        .unwrap();
}
```
## Include the `prost`-generated code

Add the following line right after the `use` section at the top of `expr/src/foo/bar/mod.rs`:

```rust
include!(concat!(env!("OUT_DIR"), "/mz_expr.foo.bar.rs"));
```
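For orientation, the included file defines plain Rust types for each message. The block below is a hand-written approximation of what `prost` generates for `ProtoMyEnum` and part of `ProtoMyStruct`, not real `prost` output: the generated code also derives `prost::Message` and carries field attributes, and the `ProtoCharLength` and `ProtoNaiveDate` structs are placeholders with made-up fields standing in for the imported `mz_repr` messages. It shows the shape that the mappings in the next step rely on: message-typed fields and `oneof` groups become `Option`s, repeated fields become `Vec`s, and `oneof` variants live in an enum inside a snake_case submodule (`proto_my_enum::Kind`).

```rust
// Placeholders for the imported `mz_repr` messages (fields are made up).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ProtoCharLength {
    pub value: u32,
}

#[derive(Clone, Debug, Default, PartialEq)]
pub struct ProtoNaiveDate {
    pub days: i32,
}

// Excerpt of the struct generated for `message ProtoMyStruct` (3 of 8 fields).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ProtoMyStruct {
    // Scalar field: a plain value that defaults to 0 when absent.
    pub field_1: u64,
    // Message-typed field: wrapped in `Option`, `None` when absent.
    pub field_3: Option<ProtoCharLength>,
    // Repeated field: a `Vec`, empty when absent.
    pub field_5: Vec<ProtoCharLength>,
}

// Approximation of the struct generated for `message ProtoMyEnum`.
#[derive(Clone, Debug, Default, PartialEq)]
pub struct ProtoMyEnum {
    // The `oneof kind` group: `None` when no variant was set.
    pub kind: Option<proto_my_enum::Kind>,
}

// Nested types of `ProtoMyEnum` live in a snake_case module named after it.
pub mod proto_my_enum {
    #[derive(Clone, Debug, PartialEq)]
    pub enum Kind {
        Var1(u64),
        Var2(u64),
        Var3(super::ProtoCharLength),
        Var4(super::ProtoNaiveDate),
    }
}
```

This shape is why `into_proto` wraps message-typed values in `Some(...)`, and why `from_proto` has to handle a missing `kind`.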
## Implement the `$T ⇔ Proto$T` mappings

For types from classes (b), (c), and (d), we need to implement the `RustType` trait.
Here is the implementation for usize for example.
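As a self-contained illustration of class (b), the sketch below maps `usize` to `u64` (the `uint64` used for `field_2`). It is not the `mz_proto` source: `RustType`, `ProtoType`, and `TryFromProtoError` are simplified, hypothetical stand-ins that only mirror the method names used in this guide (`into_proto`, `from_proto`, `into_rust`).

```rust
/// Hypothetical, simplified error type.
#[derive(Debug, PartialEq)]
pub enum TryFromProtoError {
    /// A decoded `u64` did not fit into the target integer type.
    IntConversion(u64),
}

/// Hypothetical stand-in for `RustType<Proto>`: maps a Rust type to and from its Protobuf representation.
pub trait RustType<Proto>: Sized {
    fn into_proto(&self) -> Proto;
    fn from_proto(proto: Proto) -> Result<Self, TryFromProtoError>;
}

/// Hypothetical companion trait that enables `proto.into_rust()?` on the Protobuf side.
pub trait ProtoType<Rust>: Sized {
    fn into_rust(self) -> Result<Rust, TryFromProtoError>;
}

// Every Protobuf type gets `into_rust()` for each Rust type that maps onto it.
impl<P, R: RustType<P>> ProtoType<R> for P {
    fn into_rust(self) -> Result<R, TryFromProtoError> {
        R::from_proto(self)
    }
}

// Class (b): `usize` has no Protobuf counterpart, so it is encoded as `uint64`.
impl RustType<u64> for usize {
    fn into_proto(&self) -> u64 {
        // `usize` is at most 64 bits wide on the targets Rust currently supports, so this is lossless.
        *self as u64
    }

    fn from_proto(proto: u64) -> Result<Self, TryFromProtoError> {
        // Reject values that do not fit (possible on 32-bit targets) instead of truncating them.
        usize::try_from(proto).map_err(|_| TryFromProtoError::IntConversion(proto))
    }
}
```

Decoding uses `usize::try_from` rather than an `as` cast so that an out-of-range value becomes an error instead of being silently truncated.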
For example, here are the implementations for MyStruct
```rust
impl RustType<ProtoMyStruct> for MyStruct {
    fn into_proto(&self) -> ProtoMyStruct {
        ProtoMyStruct {
            field_1: self.field_1,
            field_2: self.field_2.into_proto(),
            field_3: Some(self.field_3.into_proto()),
            field_4: Some(self.field_4.into_proto()),
            field_5: self.field_5.into_proto(),
            field_6: self.field_6.into_proto(),
            field_7: self.field_7.into_proto(),
            field_8: self.field_8.into_proto(),
        }
    }

    fn from_proto(proto: ProtoMyStruct) -> Result<Self, TryFromProtoError> {
        Ok(MyStruct {
            field_1: proto.field_1,
            field_2: proto.field_2.into_rust()?,
            field_3: proto.field_3.into_rust_if_some("ProtoMyStruct::field_3")?,
            field_4: proto.field_4.into_rust_if_some("ProtoMyStruct::field_4")?,
            field_5: proto.field_5.into_rust()?,
            field_6: proto.field_6.into_rust()?,
            field_7: proto.field_7.into_rust()?,
            field_8: proto.field_8.into_rust()?,
        })
    }
}

impl ProtoMapEntry<GlobalId, NaiveDate> for proto_my_struct::ProtoField7Entry {
    fn from_rust<'a>(entry: (&'a GlobalId, &'a NaiveDate)) -> Self {
        Self {
            key: Some(entry.0.into_proto()),
            value: Some(entry.1.into_proto()),
        }
    }

    fn into_rust(self) -> Result<(GlobalId, NaiveDate), TryFromProtoError> {
        let key = self.key.into_rust_if_some("ProtoField7Entry::key")?;
        let value = self.value.into_rust_if_some("ProtoField7Entry::value")?;
        Ok((key, value))
    }
}
```
and MyEnum.
```rust
impl RustType<ProtoMyEnum> for MyEnum {
    fn into_proto(&self) -> ProtoMyEnum {
        use proto_my_enum::Kind::*;
        ProtoMyEnum {
            kind: Some(match self {
                MyEnum::Var1(x) => Var1(*x),
                MyEnum::Var2(x) => Var2(x.into_proto()),
                MyEnum::Var3(x) => Var3(x.into_proto()),
                MyEnum::Var4(x) => Var4(x.into_proto()),
            }),
        }
    }

    fn from_proto(proto: ProtoMyEnum) -> Result<Self, TryFromProtoError> {
        use proto_my_enum::Kind::*;
        let kind = proto
            .kind
            .ok_or_else(|| TryFromProtoError::missing_field("ProtoMyEnum::kind"))?;
        Ok(match kind {
            Var1(x) => MyEnum::Var1(x),
            Var2(x) => MyEnum::Var2(x.into_rust()?),
            Var3(x) => MyEnum::Var3(x.into_rust()?),
            Var4(x) => MyEnum::Var4(x.into_rust()?),
        })
    }
}
```
Note that the trait needs to be implemented for all nested types as well, and that the `ProtoMapEntry` trait needs to be implemented for the types that represent encoded map entries (such as `proto_my_struct::ProtoField7Entry`).
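To illustrate how an entry type such as `ProtoField7Entry` plugs into a map-valued field, here is a minimal sketch, assuming a blanket `RustType<Vec<E>>` implementation for `HashMap<K, V>` whenever `E: ProtoMapEntry<K, V>`. `RustType`, `ProtoMapEntry`, `TryFromProtoError`, and `ProtoExampleEntry` are simplified, hypothetical stand-ins; the real `mz_proto` definitions may differ.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical, simplified error type.
#[derive(Debug, PartialEq)]
pub enum TryFromProtoError {
    MissingField(&'static str),
}

/// Hypothetical stand-in for `RustType<Proto>`.
pub trait RustType<Proto>: Sized {
    fn into_proto(&self) -> Proto;
    fn from_proto(proto: Proto) -> Result<Self, TryFromProtoError>;
}

/// Hypothetical stand-in for `ProtoMapEntry<K, V>`, using the signatures shown for `ProtoField7Entry`.
pub trait ProtoMapEntry<K, V>: Sized {
    fn from_rust<'a>(entry: (&'a K, &'a V)) -> Self;
    fn into_rust(self) -> Result<(K, V), TryFromProtoError>;
}

// A map is encoded as a repeated list of entry messages.
impl<K, V, E> RustType<Vec<E>> for HashMap<K, V>
where
    K: Eq + Hash,
    E: ProtoMapEntry<K, V>,
{
    fn into_proto(&self) -> Vec<E> {
        self.iter().map(<E as ProtoMapEntry<K, V>>::from_rust).collect()
    }

    fn from_proto(proto: Vec<E>) -> Result<Self, TryFromProtoError> {
        // Collecting `Result`s stops at the first entry that fails to decode.
        proto.into_iter().map(<E as ProtoMapEntry<K, V>>::into_rust).collect()
    }
}

/// Hypothetical entry message for a `HashMap<u32, String>` field. Both fields are
/// `Option`s to mirror the message-typed `key`/`value` of `ProtoField7Entry`.
#[derive(Debug, PartialEq)]
pub struct ProtoExampleEntry {
    pub key: Option<u32>,
    pub value: Option<String>,
}

impl ProtoMapEntry<u32, String> for ProtoExampleEntry {
    fn from_rust<'a>(entry: (&'a u32, &'a String)) -> Self {
        Self { key: Some(*entry.0), value: Some(entry.1.clone()) }
    }

    fn into_rust(self) -> Result<(u32, String), TryFromProtoError> {
        let key = self.key.ok_or(TryFromProtoError::MissingField("ProtoExampleEntry::key"))?;
        let value = self.value.ok_or(TryFromProtoError::MissingField("ProtoExampleEntry::value"))?;
        Ok((key, value))
    }
}
```

With such a blanket implementation in place, only the per-entry conversions have to be written by hand, as in the `ProtoField7Entry` implementation above.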
Note also the pre-existing implementations of `RustType`, which include blanket implementations for container types. These allow seamless use of the `into_proto()` and `into_rust()?` syntax for (possibly nested) container types, as long as the element type implements `RustType`.
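A blanket implementation for `Vec`, together with the `into_rust_if_some` helper used for `Option`-wrapped message fields, can be sketched as follows. This is a simplified model, not the `mz_proto` source: `RustType`, `ProtoType`, `IntoRustIfSome`, and `TryFromProtoError` are hypothetical stand-ins whose method names mirror the calls in the `MyStruct` mapping above.

```rust
/// Hypothetical, simplified error type.
#[derive(Debug, PartialEq)]
pub enum TryFromProtoError {
    MissingField(String),
    IntConversion(u64),
}

/// Hypothetical stand-in for `RustType<Proto>`.
pub trait RustType<Proto>: Sized {
    fn into_proto(&self) -> Proto;
    fn from_proto(proto: Proto) -> Result<Self, TryFromProtoError>;
}

/// Hypothetical stand-in that provides `proto.into_rust()`.
pub trait ProtoType<Rust>: Sized {
    fn into_rust(self) -> Result<Rust, TryFromProtoError>;
}

impl<P, R: RustType<P>> ProtoType<R> for P {
    fn into_rust(self) -> Result<R, TryFromProtoError> {
        R::from_proto(self)
    }
}

// Element type used in the usage example.
impl RustType<u64> for usize {
    fn into_proto(&self) -> u64 {
        *self as u64
    }

    fn from_proto(proto: u64) -> Result<Self, TryFromProtoError> {
        usize::try_from(proto).map_err(|_| TryFromProtoError::IntConversion(proto))
    }
}

// Blanket impl: `Vec<R>` maps to `Vec<P>` whenever `R` maps to `P`.
impl<R: RustType<P>, P> RustType<Vec<P>> for Vec<R> {
    fn into_proto(&self) -> Vec<P> {
        self.iter().map(<R as RustType<P>>::into_proto).collect()
    }

    fn from_proto(proto: Vec<P>) -> Result<Self, TryFromProtoError> {
        proto.into_iter().map(<R as RustType<P>>::from_proto).collect()
    }
}

/// Hypothetical helper for message-typed fields that `prost` wraps in `Option`.
pub trait IntoRustIfSome<R> {
    fn into_rust_if_some(self, field: &str) -> Result<R, TryFromProtoError>;
}

impl<P, R: RustType<P>> IntoRustIfSome<R> for Option<P> {
    fn into_rust_if_some(self, field: &str) -> Result<R, TryFromProtoError> {
        // A missing message field is an error that names the field.
        let proto = self.ok_or_else(|| TryFromProtoError::MissingField(field.to_string()))?;
        R::from_proto(proto)
    }
}
```

Because the `Vec` implementation only requires `R: RustType<P>`, it applies recursively, so `Vec<Vec<usize>>` maps to `Vec<Vec<u64>>` at the Rust level without extra code. In the `.proto` file, a nested list still needs a wrapper message, as `field_6` shows.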
## Test the Protobuf encoding of `$T`

Unit tests for Protobuf encoding support rely on the `proptest` library. To add a test for a new type, follow these steps.
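### Implement `proptest::Arbitrary` for `$T`

Implement `proptest::Arbitrary` for your Rust type `$T`:

- Where possible, use the `proptest_derive::Arbitrary` derive macro. Fields that need a custom strategy (such as `NaiveDate` in the example below) can be annotated with `#[proptest(strategy = "...")]`.
- Otherwise, a manual `Arbitrary` implementation is required.

Note that derived `Arbitrary` implementations occasionally suffer from stack overflow errors, as the `ValueTree` lives entirely on the stack.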
This most often (but not exclusively) affects recursive and unbalanced structures.
See the relevant issues filed in AltSysrq/proptest/issues/152 and AltSysrq/proptest/issues/249.
As a consequence of that limitation, you might see errors like the following:

```text
thread 'protocol::client::tests::storage_command_protobuf_roundtrip' has overflowed its stack
fatal runtime error: stack overflow
```
The current workaround in that case is to implement Arbitrary manually and to box the children of the current node using the .boxed() method. See 3ab46c5d for an example.
We are currently investigating a fix in a private fork of `proptest` so that this workaround is no longer needed.
This section will be removed if we succeed.
Here are the derive-based Arbitrary implementations for MyStruct and MyEnum.
```rust
use std::collections::HashMap;

use chrono::NaiveDate;
use proptest::prelude::*;
use proptest_derive::Arbitrary;
use mz_repr::adt::char::CharLength;
use mz_repr::chrono::any_naive_date;
use mz_repr::GlobalId;
use mz_proto::*;

// `$T` is a struct
#[derive(Arbitrary, Debug, PartialEq, Eq)]
pub struct MyStruct {
    pub field_1: u64,
    pub field_2: usize,
    pub field_3: CharLength,
    #[proptest(strategy = "any_naive_date()")]
    pub field_4: NaiveDate,
    #[proptest(strategy = "tiny_char_length_vec()")]
    pub field_5: Vec<CharLength>,
    #[proptest(strategy = "prop::collection::vec(tiny_char_length_vec(), 0..3)")]
    pub field_6: Vec<Vec<CharLength>>,
    #[proptest(strategy = "tiny_id_to_naive_date_map()")]
    pub field_7: HashMap<GlobalId, NaiveDate>,
    #[proptest(strategy = "prop::collection::vec(any::<u64>(), 0..20).boxed()")]
    pub field_8: Vec<u64>,
}

// Keep generated collections small to bound test runtime.
fn tiny_char_length_vec() -> prop::strategy::BoxedStrategy<Vec<CharLength>> {
    prop::collection::vec(any::<CharLength>(), 0..3).boxed()
}

fn tiny_id_to_naive_date_map() -> prop::strategy::BoxedStrategy<HashMap<GlobalId, NaiveDate>> {
    prop::collection::hash_map(any::<GlobalId>(), any_naive_date(), 0..3).boxed()
}

// `$T` is an enum
#[derive(Arbitrary, Debug, PartialEq, Eq, Hash)]
pub enum MyEnum {
    Var1(u64),
    Var2(usize),
    Var3(CharLength),
    Var4(#[proptest(strategy = "any_naive_date()")] NaiveDate),
}
```
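### Add a `protobuf_roundtrip` test

Instantiate the following test function template in the `tests` submodule of the module containing `$T`, replacing `$t` with the snake_case name of `$T`. The `expect in any::<$T>()` argument syntax is only valid inside the `proptest!` macro.

```rust
proptest! {
    #[test]
    fn $t_protobuf_roundtrip(expect in any::<$T>()) {
        let actual = protobuf_roundtrip::<_, Proto$T>(&expect);
        assert!(actual.is_ok());
        assert_eq!(actual.unwrap(), expect);
    }
}
```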
Note that you might need to reduce the number of test cases with a custom ProptestConfig in order to keep the test runtime under control.
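Conceptually, these tests check that decoding an encoded value gives back the original value. A mapping-level version of that check can be sketched without `prost` as below. `mapping_roundtrip`, `Interval`, `LossyInterval`, `ProtoInterval`, and the trimmed-down `RustType` and `TryFromProtoError` are hypothetical; the real `protobuf_roundtrip` helper presumably also serializes to and parses from bytes.

```rust
/// Hypothetical, simplified error type.
#[derive(Debug, PartialEq)]
pub struct TryFromProtoError(pub String);

/// Hypothetical stand-in for `RustType<Proto>`.
pub trait RustType<Proto>: Sized {
    fn into_proto(&self) -> Proto;
    fn from_proto(proto: Proto) -> Result<Self, TryFromProtoError>;
}

/// Hypothetical helper: Rust -> Proto -> Rust, without byte-level encoding.
pub fn mapping_roundtrip<R, P>(value: &R) -> Result<R, TryFromProtoError>
where
    R: RustType<P>,
{
    <R as RustType<P>>::from_proto(<R as RustType<P>>::into_proto(value))
}

/// Stand-in for a generated message; proto3 scalars default to 0.
#[derive(Debug, Default, PartialEq)]
pub struct ProtoInterval {
    pub lo: u64,
    pub hi: u64,
}

#[derive(Clone, Debug, PartialEq)]
pub struct Interval {
    pub lo: u64,
    pub hi: u64,
}

impl RustType<ProtoInterval> for Interval {
    fn into_proto(&self) -> ProtoInterval {
        ProtoInterval { lo: self.lo, hi: self.hi }
    }

    fn from_proto(proto: ProtoInterval) -> Result<Self, TryFromProtoError> {
        // Decoding can also validate invariants that the wire format cannot express.
        if proto.lo > proto.hi {
            return Err(TryFromProtoError(format!("invalid interval {}..{}", proto.lo, proto.hi)));
        }
        Ok(Interval { lo: proto.lo, hi: proto.hi })
    }
}

/// A deliberately buggy mapping that forgets to encode `hi`.
#[derive(Clone, Debug, PartialEq)]
pub struct LossyInterval {
    pub lo: u64,
    pub hi: u64,
}

impl RustType<ProtoInterval> for LossyInterval {
    fn into_proto(&self) -> ProtoInterval {
        ProtoInterval { lo: self.lo, ..Default::default() } // bug: `hi` is dropped
    }

    fn from_proto(proto: ProtoInterval) -> Result<Self, TryFromProtoError> {
        Ok(LossyInterval { lo: proto.lo, hi: proto.hi })
    }
}
```

The `LossyInterval` case shows the kind of bug a round-trip test catches: a field that `into_proto` forgets to set silently decodes to its default value.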
Here are the tests for MyStruct and MyEnum.
```rust
#[cfg(test)]
mod tests {
    use proptest::prelude::*;
    use mz_proto::protobuf_roundtrip;

    use super::*;

    // snip

    proptest! {
        // use 64 instead of the default (256) cases for these tests
        #![proptest_config(ProptestConfig::with_cases(64))]

        #[test]
        fn my_struct_protobuf_roundtrip(expect in any::<MyStruct>()) {
            let actual = protobuf_roundtrip::<_, ProtoMyStruct>(&expect);
            assert!(actual.is_ok());
            assert_eq!(actual.unwrap(), expect);
        }

        #[test]
        fn my_enum_protobuf_roundtrip(expect in any::<MyEnum>()) {
            let actual = protobuf_roundtrip::<_, ProtoMyEnum>(&expect);
            assert!(actual.is_ok());
            assert_eq!(actual.unwrap(), expect);
        }
    }
}
```
## Appendix A: Deriving `Proto$T` from `$T`

The following table summarizes rules for deriving the message definition for `Proto$T` based on the structure of `$T`.
We use double square brackets 〚$T〛 to denote the Protobuf type derived from $T.
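The rules that can be read off the running example (`MyStruct`, `MyEnum`, and their `ProtoMyStruct`/`ProtoMyEnum` messages) are summarized below in the same notation, written as $[\![\cdot]\!]$. This is a partial summary derived only from that example, not a reconstruction of the full table: $v_i$ denotes the snake_case form of the variant name $V_i$, field tags are numbered in declaration order, and nested containers such as `field_6` (which need an intermediate wrapper message) are omitted.

$$
\begin{aligned}
[\![\texttt{u64}]\!] &= \texttt{uint64} \\
[\![\texttt{usize}]\!] &= \texttt{uint64} \\
[\![T]\!] &= \texttt{Proto}T \quad \text{for } T \text{ of class (c) or (d), e.g. } [\![\texttt{CharLength}]\!] = \texttt{ProtoCharLength} \\
[\![\texttt{Vec}\langle T \rangle]\!] &= \texttt{repeated}\ [\![T]\!] \\
[\![\texttt{HashMap}\langle K, V \rangle]\!] &= \texttt{repeated}\ M, \quad M = \texttt{message}\ \{\, [\![K]\!]\ \texttt{key} = 1;\ [\![V]\!]\ \texttt{value} = 2 \,\} \\
[\![\texttt{struct}\ S\ \{ f_1 : T_1, \dots, f_n : T_n \}]\!] &= \texttt{message Proto}S\ \{\, [\![T_i]\!]\ f_i = i \,\}_{i=1}^{n} \\
[\![\texttt{enum}\ E\ \{ V_1(T_1), \dots, V_n(T_n) \}]\!] &= \texttt{message Proto}E\ \{\, \texttt{oneof kind}\ \{\, [\![T_i]\!]\ v_i = i \,\}_{i=1}^{n} \,\}
\end{aligned}
$$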