mojo/docs/wire_format_spec.md
This document serves as a mostly rigorous specification of the wire format used by Mojo messages. For the original designs, see the documents on the header and archive formats. The document on the validation testing format is also helpful.
This document is descriptive; if it deviates from the in-practice format, we should update the doc instead of the format. This document is also incomplete; footnotes mark locations where there may be unknown details.
Mojom messages are serialized into a series of bytes, comprised of a Mojom header, the serialized data (encoded as a struct), and optionally an array of associated interface IDs. Each section follows the previous one at an 8-byte alignment.
The Mojom format is not self-describing; the type of the serialized data is required to decode the message. Furthermore, the data in the body of each struct is packed, so it may not appear in the order it is defined in the Mojom file. Packing does not cross struct boundaries.
See the relevant section of the docs for descriptions of (most of) Mojom’s types. It is important to be familiar with how Mojo uses ordinal values (discussed in this section), since they sometimes determine ordering of values within the message.
IMPORTANT: Mojo (and Mojom) are designed for inter-process communication. They are not meant for communication between hosts. To be safe, you should never try to interpret an encoded message on a different host. Instead, you should use a network-safe protocol (e.g. protobuf) from the beginning.
This section lays out the high-level ideas that underpin the Mojom wire format.
The wire format draws a distinction between two types of data: leaf (or “simple”) values and structured data. The difference is that leaf values do not contain other values inside themselves, whereas structured data does. There are three types of structured data: structs, arrays, and unions.
The two types of data have different properties when serialized. Note that all data values appear inside the body of an enclosing structured data type, except for a single top-level struct containing all the arguments to the message. This struct is generated from the interface definition, and is not visible to the user.
A leaf data value:
A structured data value:
Note that these are an implementation detail of the Mojom format, and do not represent C/C++ pointers.
uint64[1] pointing at a later part of the message,
typically the tail of a structured data value.
Note: as a corollary of (2), the minimum value for a non-null pointer is 8, if
it immediately precedes the data it points to (since the pointer itself is
always 8 bytes).
Note: As a corollary of (2) and the alignment rules below, pointer values must always be divisible by 8 (since the pointer and pointee are both 8-byte-aligned).
[1]: It's possible these are pointer-sized instead of always 64-bit.
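The two corollaries above can be turned into a small validity check. This is a minimal sketch, not the bindings' actual validation code. It assumes that a pointer value is a byte offset measured from the pointer's own position (which is what the "minimum value is 8" note implies) and that `0` encodes a null pointer. The function name `resolve_pointer` is hypothetical.

```python
def resolve_pointer(pointer_offset, pointer_value):
    """Return the absolute offset of the pointee, or None for a null pointer.

    pointer_offset: where the pointer itself sits in the message (bytes).
    pointer_value:  the uint64 stored there, assumed to be relative to
                    pointer_offset (an assumption of this sketch).
    """
    if pointer_value == 0:
        return None  # assumed null encoding
    if pointer_value < 8:
        # The pointer occupies 8 bytes, so its data cannot start earlier.
        raise ValueError("non-null pointer must be at least 8")
    if pointer_value % 8 != 0:
        # Pointer and pointee are both 8-byte-aligned.
        raise ValueError("pointer must be divisible by 8")
    return pointer_offset + pointer_value
```

For example, a pointer stored at offset 16 with value 8 refers to data starting at offset 24, directly after the pointer.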
Data must be aligned inside the message, with respect to the beginning of the message. Each value or pointer is prefixed by 0 or more padding bytes in order to ensure that it begins at an appropriate location.
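The padding in front of a value can be computed from its offset and its required alignment. The sketch below is illustrative only: the helper names are hypothetical, and the alignment of each type is passed in as a parameter rather than looked up.

```python
def padding_for(offset, alignment):
    """Number of padding bytes needed so that offset becomes a multiple of alignment.

    offset is measured from the beginning of the message.
    """
    return (-offset) % alignment


def align_up(offset, alignment):
    """First offset at or after `offset` that satisfies `alignment`."""
    return offset + padding_for(offset, alignment)
```

A value that needs 8-byte alignment and would otherwise start at offset 5 is therefore preceded by 3 padding bytes, while one that already sits at offset 16 gets none.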
All values are host-endian[2].
[2]: There may be some parts of the bindings or related code that assume little-endian.
The ordinal of a field of a struct is a number indicating where in the struct it was declared. The first field has ordinal 0, the second has ordinal 1, and so on.
Since fields in the body of a struct are packed, the order they appear on the wire is not necessarily ordinal order.
Mojom allows most types to be declared as nullable. Nullability is represented
in different ways for different types. Note that a type cannot be
multiply-nullable (e.g. `int??` is not a valid mojom type).
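Leaf Data:
* A nullable leaf value is encoded as a pair of a `bool` tag and the value itself. The tag is guaranteed to appear before the value (it has a lower ordinal value).
  * If the value is present, the tag is `true`.
  * If the value is null, the tag is `false`. The value is still serialized, but its bytes are undefined and considered meaningless.
* Nullable handle types use the reserved value `0xffffffff` instead of using a tag bit.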
Structured Data:
0 to
represent null.0) in the body of
the enclosing data. If a pointer is null, no corresponding data is emitted
in the enclosing data’s tail.
Note: Java (and maybe JavaScript) don't support unsigned integers, so they will interpret all integers as signed. Sending unsigned integers cross-language is therefore unsafe.
[3]: It's possible this is platform-dependent.
* The `string` type is an alias for `array<uint8>`, except that the language bindings map it to a different source type (e.g. `std::string` in C++).
* `map<K, V>` is encoded precisely as if it were defined as `struct Foo { array<K> keys, array<V> values }`.
  * The key at `keys[i]` maps to the value at `values[i]`.
* Enums are encoded as an `int32` containing their underlying discriminant value.
Note: If an enum is nested inside of a struct/interface, then the C++ bindings
generate an enum at the top level with the name StructName_EnumName, etc.
Note: Recursive unions (which contain themselves as a possible field) are not allowed, but mutually recursive unions are.
Mojom handles do not appear directly in the body of the encoded message. Instead, they are passed via a separate vector of handles alongside the message body. They are referred to from the message body via indices into that vector.
* The index `0xffffffff` is reserved for nullable handle types only (and indicates the `None` value).
* Handles that are `None` are skipped when assigning indices.

Note that since handle values should be globally unique, no index should be re-used within a message. Similarly, each handle in the separate vector should be referenced exactly once.
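The index rules above can be illustrated with a short sketch. It is not the real serializer: the function name `assign_handle_indices` is hypothetical, handles are stood in for by arbitrary Python objects, and it assumes indices are handed out in the order the handles are encountered.

```python
NULL_HANDLE_INDEX = 0xFFFFFFFF  # reserved index meaning "None"


def assign_handle_indices(handles):
    """Split handle fields into wire indices and the separate handle vector.

    handles: handle objects (or None for a null nullable handle), in the
             order they are encountered (an assumption of this sketch).
    Returns (indices, vector): indices go into the message body, vector is
    passed alongside it.
    """
    indices = []
    vector = []
    for handle in handles:
        if handle is None:
            # Null handles take the reserved index and no slot in the vector.
            indices.append(NULL_HANDLE_INDEX)
        else:
            # Each real handle gets the next unused index, so none is re-used.
            indices.append(len(vector))
            vector.append(handle)
    return indices, vector
```

Because a null handle consumes no slot, the handle after it receives the next consecutive index, and every entry in the vector is referenced exactly once.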
The pending_receiver type is represented on the wire as its underlying
message pipe handle value.
The `pending_remote` type is represented as a 64-bit pair of the underlying handle value and a 32-bit version field.
* The first 32 bits of a `pending_remote` value are the handle value.
* A null `pending_remote` is encoded with `0xffffffff` in the first 32 bits. The remaining 32 bits may still contain a valid version number.

The `pending_associated_remote` and `pending_associated_receiver` types are
encoded like the `pending_remote` and `pending_receiver` types, but
rather than indexing into an external array of handle values, they index into
a separate array of interface IDs.
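As a rough illustration of the 64-bit `pending_remote` layout, the sketch below packs and unpacks the pair with Python's `struct` module. It assumes a little-endian host and reads "the first 32 bits" as the first four bytes. The function names are hypothetical. For the associated variants, the same layout would carry an index into the interface ID array instead.

```python
import struct

NULL_HANDLE_INDEX = 0xFFFFFFFF  # marks a null pending_remote


def encode_pending_remote(handle_index, version):
    """Pack (handle value, version) into 8 bytes; None encodes a null remote."""
    if handle_index is None:
        handle_index = NULL_HANDLE_INDEX
    # "<II": two little-endian uint32 values (little-endian is an assumption).
    return struct.pack("<II", handle_index, version)


def decode_pending_remote(data):
    """Inverse of encode_pending_remote; returns (handle_index or None, version)."""
    handle_index, version = struct.unpack("<II", data)
    if handle_index == NULL_HANDLE_INDEX:
        # The version field is still returned: it may hold a valid number.
        return None, version
    return handle_index, version
```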
If (and only if) a message contains an associated remote or receiver, the
payload will be immediately followed by an array<uint32>, and the
payload_interface_ids header field will contain a pointer to that array. The
array is encoded like any other array. It is not considered part of
the payload. Interface IDs are 32-bit integers.
Associated remotes and receivers are otherwise encoded identically to their non-associated equivalents. Since they index into a different array, their indices are independent of the ones used by remotes, receivers, and raw handles.
Note that interface IDs are generated and attached to associated endpoints as part of (de)serialization. That is, encoding an associated remote or receiver is a stateful operation and has side-effects on other parts of the system. That process is independent of the wire format, however.
Fields of a struct are packed in order to take up less space. For the full, canonical algorithm, see the implementation in Python. Field packing does not cross struct boundaries.
At a high level, the algorithm operates as follows:
For example, the following struct:
struct Foo {
  uint8 n8;
  uint64 n64;
  uint16 n16_1;
  bool b1;
  uint16 n16_2;
  uint32 n32;
  bool b2;
}
Would be packed as:
n8 [bitfield: 0 0 0 0 0 0 b2 b1] n16_1 n16_2 [pad 16] n64 n32
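The layout above can be reproduced with a simplified first-fit sketch. This is not the canonical packing implementation. It is a bit-level model written for this example, in which each field is placed, in ordinal order, at the earliest free position that satisfies its alignment, and a `bool` is modeled as a single bit. The names `FIELDS` and `pack_first_fit` are hypothetical.

```python
# (name, size_in_bits, alignment_in_bits), listed in ordinal order.
FIELDS = [
    ("n8", 8, 8),
    ("n64", 64, 64),
    ("n16_1", 16, 16),
    ("b1", 1, 1),
    ("n16_2", 16, 16),
    ("n32", 32, 32),
    ("b2", 1, 1),
]


def pack_first_fit(fields):
    """Return {name: starting bit offset} using a first-fit placement."""
    used = set()  # bit positions already occupied by earlier fields
    layout = {}
    for name, size, align in fields:
        start = 0
        # Step through aligned positions until the whole field fits.
        while any(bit in used for bit in range(start, start + size)):
            start += align
        used.update(range(start, start + size))
        layout[name] = start
    return layout


layout = pack_first_fit(FIELDS)
for name, bit in sorted(layout.items(), key=lambda item: item[1]):
    print(f"{name}: byte {bit // 8}, bit {bit % 8}")
```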
The fields end up in these positions because:
* `n8` and `n64` have nowhere earlier to move.
* `n16_1` fits between them, but is 2-byte-aligned and so must be placed one byte after `n8`.
* `b1` fits between `n8` and `n16_1` and starts a new bitfield in the 1s place.
* `n16_2` fits exactly between `n16_1` and `n64`.
* `n32` doesn't fit in the remaining 16 bits between `n16_2` and `n64`, so it stays where it is.
* `b2` fits into the existing bitfield in the 2s place.

Headers are serialized as if they were a Mojom struct with the following definition:
struct MessageHeader {
  // Interface ID for identifying multiple interfaces running on the same
  // message pipe.
  uint32 interface_id;
  // Message name, which is scoped to the interface that the message belongs to.
  uint32 name;
  // A combination of zero or more of the flag constants defined within the
  // Message class.
  uint32 flags;
  // A unique (hopefully) value for a message. Used in tracing, forming the
  // lower part of the 64-bit trace id, which is used to match trace events for
  // sending and receiving a message (`name` forms the upper part).
  uint32 trace_nonce;
  // Only used if either kFlagExpectsResponse or kFlagIsResponse is set in
  // order to match responses with corresponding requests.
  [MinVersion = 1] uint64 request_id;
  // Stores a Mojom Pointer (see #Pointers above) to the payload struct which
  // appears after this header. Note: The `mojo_ptr_t` type can't be written
  // down in real mojom. It represents a pointer-sized integer.
  [MinVersion = 2] mojo_ptr_t payload;
  // Stores a Mojom Pointer (see #Pointers above) to the list of interface IDs
  // that optionally appears after the payload.
  [MinVersion = 2] mojo_ptr_t payload_interface_ids;
  // Stores the timestamp when this message was created.
  [MinVersion = 3] int64_t creation_timeticks_us;
}
See this file for the C++ definition of the header type.
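To make the header definition more concrete, here is a hedged sketch of reading the first few fields from raw bytes. It rests on assumptions that are not spelled out in this section: that a serialized struct begins with an 8-byte struct header holding a `uint32` byte count and a `uint32` version, that the host is little-endian, and that the four `uint32` fields and `request_id` are laid out in declaration order. Only versions 0 and 1 are handled, and `parse_message_header` is a hypothetical name.

```python
import struct


def parse_message_header(data):
    """Decode the version-0 fields, plus request_id when version >= 1."""
    # Assumed struct header: total size in bytes, then struct version.
    num_bytes, version = struct.unpack_from("<II", data, 0)
    interface_id, name, flags, trace_nonce = struct.unpack_from("<IIII", data, 8)
    header = {
        "num_bytes": num_bytes,
        "version": version,
        "interface_id": interface_id,
        "name": name,
        "flags": flags,
        "trace_nonce": trace_nonce,
    }
    if version >= 1:
        # request_id is a uint64 and is already 8-byte-aligned at offset 24.
        (header["request_id"],) = struct.unpack_from("<Q", data, 24)
    return header


# Hand-built sample for illustration only: a 32-byte version-1 header.
sample = struct.pack("<IIIIIIQ", 32, 1, 0, 7, 1, 0, 42)
parsed = parse_message_header(sample)
```

The later fields (`payload`, `payload_interface_ids`, `creation_timeticks_us`) are left out deliberately, since decoding them depends on the pointer width discussed in the footnotes.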