docs/idl.md
Interface Definition Language (IDL) is a custom Domain Specific Language (DSL) originally designed to generate code that meets MongoDB's needs for handling BSON. Support for server parameters and configuration options was added later. It uses YAML 1.1 to define a custom IDL and generates C++ code. The IDL compiler is primarily written in Python 3 (originally Python 2) in the [buildscripts/idl/] directory. It has C++ support code for the generated code in the [src/mongo/idl] directory. The config option parsing support code is in [src/mongo/util/options_parser]. It is inspired by other IDLs like XDR, ASN.1, MIDL, and Google's Protocol Buffers.
- struct in IDL) using BSONObj and BSONObjBuilder
- struct but understand the unique requirements of commands. Also, can parse OpMsg's document sequences.
- Status/StatusWith

IDL is like a giant script that prints C++ code. For each invocation of the IDL compiler idlc.py, there are two files generated, a header and a source file. For instance,
python buildscripts/idl/idlc.py src/mongo/idl/unittest.idl
generates two files when invoked:
build/opt/mongo/idl/unittest_gen.h
build/opt/mongo/idl/unittest_gen.cpp
The generated files always have a suffix of either _gen.h or _gen.cpp. These files are human
readable and the output tries to match MongoDB's C++ style.
Important: At the top of each file is a warning about modifying the generated file by hand. Any modifications to the files are lost when the build is rerun since the build regenerates the files. Also, the command used to regenerate the file is at the top, in case one wants to generate a file without running the build system.
IDL is widespread across the code base. Existing IDL files are good examples of how to use IDL. A
good reference is src/mongo/idl/unittest.idl which tests all IDL features.
IDL automates the tedious work of writing BSON parsers. Before IDL, a developer would need to write code to read the document and then add test cases to validate that the parser worked as designed. IDL eliminates the need for hand-written parsers and the test burden they incur.
Example Document:
{
"intField": 42,
"stringField": "question"
}
To represent this in IDL, write the following file:
src\mongo\example\example.idl:
global:
    cpp_namespace: "mongo"
imports:
    - "mongo/db/basic_types.idl"
structs:
    example:
        description: An example struct
        fields:
            intField:
                type: int
            stringField:
                type: string
The next step is to actually generate code from the YAML description. To do that, add the following
to a BUILD.bazel file:
src\mongo\example\BUILD.bazel:
mongo_idl_library(
    name='example',
    src=[
        'example.idl',
    ],
    deps=[
        '//src/mongo/idl:idl_parser',
    ],
)
Bazel knows how to invoke the IDL compiler and generate files in the build directory with the C++
code. The code can also be generated by passing --build_tag_filters=gen_source to Bazel, which is
useful for code navigation.
The generated IDL code looks something like the simplified code below.
build\<variant_directory>\mongo\example\example_gen.h:
/**
 * An example struct
 */
class Example {
public:
    Example(std::int32_t intField, std::string stringField,
            boost::optional<SerializationContext> serializationContext = boost::none);
    void serialize(BSONObjBuilder* builder) const;
    BSONObj toBSON() const;
    static Example parse(const IDLParserContext& ctxt, const BSONObj& bsonObject);
    std::int32_t getIntField() const;
    void setIntField(std::int32_t value);
    StringData getStringField() const;
    void setStringField(StringData value);
private:
    std::int32_t _intField;
    std::string _stringField;
};
IDL generates 5 sets of key methods.
- constructor - a C++ constructor with only the required fields as arguments
- parse - a static function that parses a BSON document to the C++ class
- serialize/toBSON - a method that serializes the C++ class to BSON
- get* - methods to get the value of a field after parsing
- set* - methods to set a field in the C++ class before serialization

To use this class in a C++ file, write the following code:
src\mongo\example\example.cpp:
#include "mongo/example/example_gen.h"
bool is42(const BSONObj& doc) {
    // Parse the document into the generated class, then read the field from it.
    Example example = Example::parse(IDLParserContext("root"), doc);
    return example.getIntField() == 42;
}
If there are any problems parsing, the generated parser throws an exception. More details on the various features of IDL are described in the sections below.
Commands are a subset of structs. All commands are structs but not all structs are commands.
Commands are part of the MongoDB RPC protocol. As such, commands have special rules like the first
field of the command must be its name. IDL supports the unique needs of commands with additional
fields on the commands object.
The special features/requirements of commands:
- namespace field.
- OP_MSG, $db must be present or defaults to admin
- struct as a reply
- is_generic_cmd_list: "arg" that are in imported IDL files will automatically be chained to all commands. The IDL compiler imports generic_argument.idl by default, so any generic argument struct defined in that file will be chained to all commands by default.
- $clusterTime, ok, etc during parsing. The list of these fields is in generic_argument.idl.

Example Command:
{
"hasEncryptedFields": "testCollection",
"encryptionType": "queryableEncryption",
"comment": "Example command",
"$db": "testDB"
}
which has a reply
{
"answer": "yes",
"ok": 1
}
To represent this in IDL, write the following file:
src\mongo\example\example_command.idl:
global:
    cpp_namespace: "mongo"
imports:
    - "mongo/db/basic_types.idl"
structs:
    hasEncryptedFieldReply:
        is_command_reply: true
        fields:
            answer:
                type: string
commands:
    hasEncryptedFields:
        description: An example command
        namespace: concatenate_with_db
        fields:
            encryptionType:
                type: string
To see how to integrate a command IDL file in Bazel, see the example above for structs.
An IDL file consists of a series of top-level sections (i.e. YAML maps).
- global - Global settings that affect code generation
- imports - List of other IDL files that contain enums, types and structs this file refers to
- enums - List of enums to generate code for
- types - List of types which instruct IDL how to deserialize/serialize primitives
- structs - List of BSON documents to deserialize/serialize to C++ classes
- commands - List of BSON commands used by MongoDB RPC to deserialize/serialize to C++ classes
- server_parameters - See docs/server_parameters.md
- configs - TODO SERVER-79135
- feature_flags - TODO SERVER-79135

- cpp_namespace - string - The C++ namespace for all generated classes and enums to belong to. Must start with mongo.
- cpp_includes - sequence - A list of C++ headers to include in the generated .h file. You should not list generated IDL headers here as includes for them are automatically generated from imports.
- configs - map - A section that defines global settings for configuration options
    - source - sequence - a subset of [yaml, cli, ini]
        - cli - configuration option handled by command line
        - yaml - configuration option handled by yaml config file
        - ini - configuration option handled by deprecated ini file format. Do not use for new flags.
    - section - string - Name of displayed section in --help
    - initializer - map
        - register - string - Name of generated function to add configuration options.
            - If not provided, an anonymous MONGO_MODULE_STARTUP_OPTIONS_REGISTER initializer will be declared which will automatically register the config settings named in this file at startup time. This initializer will be named "idl" followed by a string of hex digits. Currently this string is the SHA1 hash of the header's filename, but this should not be used in dependency rules since it may change at a later time.
            - If provided, all registration logic will be implemented in a public function of the form Status registerName(optionenvironment::OptionSection* options_ptr). It is up to additional code to decide how and when this registration function is called.
        - store - string - Name of generated function to store configuration options. This behaves like register, but using a MONGO_STARTUP_OPTIONS_STORE initializer in the not-provided case, and declaring Status storeName(const optionenvironment::Environment& params) in the provided case.
An example for a typical global section is:
global:
    cpp_namespace: "mongo"
    cpp_includes:
        - "mongo/idl/idl_test_types.h"
mongo is the C++ namespace for the generated code. One header is listed because the IDL types
depend on it in this imaginary example.
The imports section is a list of other IDL files to include. If your IDL file references enums,
types, or structs defined elsewhere, the imports section must list the IDL files that define them, or IDL throws an error.
Note: The IDL compiler does not generate code for imported things, it generates code for the file
listed on the command line. For instance, if your IDL file imports a struct named ImportedStruct,
the generated code calls its ImportedStruct::parse function but does not generate the
ImportedStruct::parse definition or declaration.
The imports are transitive. The IDL compiler will recursively import all IDL files imported by
other IDL files. IDL will also implicitly de-duplicate imports and only process each file once. The
de-duplication is similar to how #pragma once works in C++.
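The transitive, de-duplicated import behavior described above can be sketched in a few lines of C++. This is only an illustration of the idea, not how idlc.py is implemented: the `ImportGraph` alias, the function names, and the file names used with it are all hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for "which files does each IDL file import".
using ImportGraph = std::map<std::string, std::vector<std::string>>;

// Depth-first walk that records each file the first time it is seen and
// skips it on every later encounter.
void resolveImports(const ImportGraph& graph,
                    const std::string& file,
                    std::set<std::string>& seen,
                    std::vector<std::string>& order) {
    if (!seen.insert(file).second) {
        return;  // already processed, much like a header guarded by #pragma once
    }
    order.push_back(file);
    auto it = graph.find(file);
    if (it == graph.end()) {
        return;  // this file imports nothing
    }
    for (const auto& imported : it->second) {
        resolveImports(graph, imported, seen, order);
    }
}

// Returns every file reachable from `root`, with `root` first.
std::vector<std::string> allImports(const ImportGraph& graph, const std::string& root) {
    std::set<std::string> seen;
    std::vector<std::string> order;
    resolveImports(graph, root, seen, order);
    return order;
}
```

Because each file is recorded in `seen` before its own imports are visited, a file that is imported along several paths, or even a cycle of imports, is still processed only once in this sketch.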
IDL generates a C++ include for the generated headers of each IDL file in the generated code.
An example for a typical imports section is:
imports:
    - "mongo/db/basic_types.idl"
Note: src/mongo/db/basic_types.idl is a foundational file for IDL. This file defines the standard types of IDL. Without this file, IDL does not know how to read and write a string or integer for instance.
The enums section is a YAML map that allows integer and string enumerations. These both map to
C++ enums, but differ in whether they parse integers or strings in a BSON document.
Used to map a string value to a C++ enum value. In this case, the values of the enums themselves are
not important. Use string enums when strings are persisted, not integers. For string enums, the
values map is a map of enum value names to strings.
StringEnum:
    description: "An example string enum"
    type: string
    values:
        s0: "zero"
        s1: "one"
        s2: "two"
It generates an enum and ADL hooks to serialize and deserialize the enum:
enum class StringEnumEnum : std::int32_t {
    s0,
    s1,
    s2,
};
void idlDeserialize(StringEnumEnum& en, ::mongo::StringData value, const IDLParserContext& ctxt);
::mongo::StringData idlSerialize(StringEnumEnum value);
constexpr ::mongo::StringData idlGetDefaultParserFieldName(StringEnumEnum) { return "StringEnumEnum"; }
These ADL hooks are not intended to be used directly by user code. See Serialization/Deserialization API.
Used to map an integer value to a C++ enum value. In this case, the values of the enums themselves
are important, unlike string enums. Use integer enums when integers are persisted. For integer enums,
the values map is a map of enum value names to integers.
IntEnum:
    description: "An example int enum"
    type: int
    values:
        s0: 0
        s1: 2
        s2: 4
It generates an enum and ADL hooks to serialize and deserialize the enum:
enum class IntEnum : std::int32_t {
    kS0 = 0,
    kS1 = 2,
    kS2 = 4,
};
void idlDeserialize(IntEnum& en, std::int32_t value, const IDLParserContext& ctxt);
std::int32_t idlSerialize(IntEnum value);
constexpr ::mongo::StringData idlGetDefaultParserFieldName(IntEnum) { return "IntEnum"; }
These ADL hooks are not intended to be used directly by user code. See Serialization/Deserialization API.
The public API to serialize and deserialize IDL-generated enums is defined in idl_parser.h and can be used like this:
auto serialized = idl::serialize(enumToSerialize);
auto parsedEnum = idl::deserialize<IdlEnum>(value);
The definitions of idl::serialize() and idl::deserialize() rely on the autogenerated ADL hooks to
find the serializer/deserializer implementations for each enum. User code should use this public API
and not the ADL hooks directly.
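To show how a generic entry point can find per-enum hooks through argument-dependent lookup (ADL), here is a minimal, self-contained sketch. It is not the code in idl_parser.h: the `example` and `idl_sketch` namespaces are hypothetical, and the hooks are simplified (the real deserialize hook also takes an IDLParserContext).

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

namespace example {
// Stand-in for an IDL-generated integer enum.
enum class IntEnum : std::int32_t { kS0 = 0, kS1 = 2, kS2 = 4 };

// Hooks live in the same namespace as the enum so that ADL can find them.
inline std::int32_t idlSerialize(IntEnum value) {
    return static_cast<std::int32_t>(value);
}

inline void idlDeserialize(IntEnum& out, std::int32_t value) {
    switch (value) {
        case 0: out = IntEnum::kS0; return;
        case 2: out = IntEnum::kS1; return;
        case 4: out = IntEnum::kS2; return;
    }
    throw std::invalid_argument("not a valid IntEnum value: " + std::to_string(value));
}
}  // namespace example

namespace idl_sketch {
// The calls below are unqualified, so the compiler looks up the hooks in the
// namespace of the enum type at instantiation time.
template <typename Enum>
auto serialize(Enum value) {
    return idlSerialize(value);
}

template <typename Enum, typename Wire>
Enum deserialize(Wire value) {
    Enum out{};
    idlDeserialize(out, value);
    return out;
}
}  // namespace idl_sketch
```

With this shape, calling code only names the generic functions, for example `idl_sketch::deserialize<example::IntEnum>(4)`, and never needs to know where each enum's hooks are declared.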
Each enum can have the following pieces:
- description - string - A comment to add to the generated C++
- type - string - can be either string or int
- values - map - a map of enum value name -> enum value

Like struct.fields[], enum.values[] may be given as a simple mapping name: value, and indeed
most are, but they may also map names to a dictionary of information:
IntEnum:
    description: "An example int enum"
    type: int
    values:
        s0:
            description: Nothing, nada, zip.
            value: 0
        s1:
            description: 2 to the first power
            value: 2
        s2:
            description: 2 squared!
            value: 4
This is not needed in a lot of cases, but in some places it provides good documentation (which will be surfaced in the generated files as well) for future readers.
There's also a third, very rarely used property for enum values called extra_data. You can see an
example of this in src/mongo/db/auth/action_type.idl where the
ResourcePattern enum correlates itself to the ActionType enums permitted in Serverless. This
data gets used in
src/mongo/db/auth/authorization_session_impl.cpp.
A type declares all the information that IDL needs to know to read and write a C++ type from/to
BSON. Types are typically string values but can be anything such as documents. They are the main
extensibility point into IDL for C++ code. They allow users to incrementally adopt IDL in their
parsing. This means that not all structs have to be defined in IDL for IDL to be useful. Finally,
types allow users to customize IDL parsing for their own unique needs.
A field in a struct or command can be defined as a type but a field can also be an array, enum,
struct or variant. Declaring a field as something other than a type is preferred to using types since
it allows more type information to be represented in IDL rather than in C++. See type in the field
reference for more information.
Type supports builtin BSON types like int32, int64, and string. These are types built into
BSONElement/BSONObjBuilder. It also supports custom types to give the code full control of
parsing and serialization. Note: IDL has no builtin types. The
src/mongo/db/basic_types.idl file declares all common BSON types and must
be manually imported into every file. This separation makes unit testing easier and allows IDL to be
extendable by separating most type concerns from the python code.
The declaration of a type does not generate any code. The code for a type is generated once it is instantiated in a struct or command.
Kinds of types:
- BSONElement/BSONObjBuilder. The src/mongo/db/basic_types.idl file declares all common BSON types
- any as the bson_serialization_type.

Here is a basic type definition for the string type.
string:
    bson_serialization_type: string
    description: "A BSON UTF-8 string"
    cpp_type: "std::string"
    deserializer: "mongo::BSONElement::str"
    is_view: false
The five key things to note in this example:
- bson_serialization_type - a list of BSON types that the generated code checks for before calling the deserializer. In this case, IDL generated code checks if the BSON type is string.
- cpp_type - The C++ type to store the deserialized value as. This is the type of the member variable in the generated C++ class when this type is instantiated in a struct.
- deserializer - a method to call to deserialize the type. Typically this is a function that takes BSONElement as a parameter. The IDL generator has custom rules for BSONElement.
- serializer - omitted in this example because BSONObjBuilder has builtin support for std::string
- is_view - indicates whether the type is a view or not. If the type is a view, then it's possible that objects of the type will not own all of their members. If the type is not a view, then objects of the type are guaranteed to own all of their members. This field is optional and defaults to True. To reduce the size of the C++ representation of structs including this type, you can specify this field as False if the type is not a view type.

Here is a more interesting example for mongo::NamespaceString. A NamespaceString is a BSON string
but has custom serialization rules.
namespacestring:
    bson_serialization_type: string
    description: "A MongoDB NamespaceString"
    cpp_type: "mongo::NamespaceString"
    serializer: ::mongo::NamespaceStringUtil::serialize
    deserializer: ::mongo::NamespaceStringUtil::deserialize
    deserialize_with_tenant: true
    is_view: false
The key thing to note is that this example specifies both a deserializer and a serializer. They are
both prefixed with :: which tells IDL these are global static functions, not members of the C++
type mongo::NamespaceString. This also impacts what is passed to the function. Global static (or
free) serializer functions get the instance as the first arg, while member methods do not (because
they have access to this).
any types are the escape hatch of the IDL type system. Use any types when custom types are not
flexible enough. This is often used to deal with pre-IDL fields/structs. IDL any types are responsible
for their own BSON type checking. They are also responsible for serializing the field name itself in
the BSON. IDL provides any type serializers with the field name but any types are responsible for
actually writing it to the BSONObjBuilder.
IDLAnyType:
    bson_serialization_type: any
    description: "Holds a BSONElement of any type."
    cpp_type: "mongo::IDLAnyType"
    serializer: mongo::IDLAnyType::serializeToBSON
    deserializer: mongo::IDLAnyType::parseFromBSON
    is_view: true
- description - string - A comment to add to the generated C++
- bson_serialization_type - string or sequence - a list of BSON types that the generated code checks for before calling the deserializer. Can also be any. buildscripts/idl/idl/bson.py lists the supported types.
- bindata_subtype - string - if bson_serialization_type is bindata, this is the required bindata subtype. buildscripts/idl/idl/bson.py lists the supported bindata subtypes.
- cpp_type - The C++ type to store the deserialized value as. This is the type of the member variable in the generated C++ class when a struct/command uses this type.
    - std::string - When using std::string, the getters/setters use mongo::StringData instead
    - std::vector<_> - When using std::vector<_>, the getters/setters use mongo::ConstDataRange instead
- deserializer - string - a method name to call to deserialize the type. Typically this is a function that takes BSONElement as a parameter. The IDL generator has custom rules for BSONElement.
    - By default, IDL assumes it is an instance method of cpp_type.
    - If prefixed with ::, assumes the function is a global static function
    - By default, the deserializer's function signature is <function_name>(<cpp_type>).
    - For object types, the deserializer's function signature is <function_name>(const BSONObj& obj)
    - For any types, the deserializer's function signature is <function_name>(BSONElement element).
- serializer - string - a method name to call to serialize the type.
    - By default, IDL assumes it is an instance method of cpp_type.
    - If prefixed with ::, assumes the function is a global static function
    - By default, the serializer's function signature is <type_append> <function_name>(const <cpp_type>&) where type_append is a type BSONObjBuilder understands.
    - For object types, the serializer's function signature is <function_name>(const BSONObj& obj)
    - For any types that are not in an array, the serializer's function signature is <function_name>(StringData fieldName, BSONObjBuilder* builder).
    - For any types that are in an array, the serializer's function signature is <function_name>(BSONArrayBuilder* builder).
- deserialize_with_tenant - bool - if set, adds TenantId as the first parameter to deserializer
- internal_only - bool - undocumented, DO NOT USE
- default - string - default value for a type. A field in a struct inherits this value if a field does not set a default. See struct's default rules for more information.
- is_view - indicates whether the type is a view or not. If the type is a view, then it's possible that objects of the type will not own all of their members. If the type is not a view, then objects of the type are guaranteed to own all of their members.

Structs are the main IDL feature. They are used to serialize and deserialize a BSON document to C++.
A struct consists of a description, a set of optional flags and a sequence of fields. Commands are a
separate feature of IDL designed to handle the unique needs of commands (such as the first field of
a command being its name, and common fields across commands). See commands below.
The generated C++ parsers for structs are strict by default. This means that they throw an error on
fields that they do not know about. Use strict: false to change this behavior. Mark persisted structs
with strict: false for future backwards compatibility needs.
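The difference between a strict and a non-strict parser can be illustrated with a small stand-alone sketch. It does not use BSON or any generated code: the `Document` alias and `parseKnownFields` function are hypothetical, and a document is modeled as a plain map from field name to integer.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a BSON document: field name -> integer value.
using Document = std::map<std::string, int>;

// Copies the known fields out of `doc`. When `strict` is true, any field that
// is not listed in `knownFields` is an error; otherwise it is silently skipped.
Document parseKnownFields(const Document& doc,
                          const std::set<std::string>& knownFields,
                          bool strict) {
    Document parsed;
    for (const auto& [name, value] : doc) {
        if (knownFields.count(name) != 0) {
            parsed[name] = value;
        } else if (strict) {
            throw std::invalid_argument("unknown field: " + name);
        }
    }
    return parsed;
}
```

In this model, a document written by a newer version that added a field would be rejected by the strict variant and accepted by the non-strict one, which is the motivation for marking persisted structs as non-strict.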
A struct consists of one or more fields. All fields are required by default. The generated parser
errors if a field is missing from the BSON document. On serialization, if a field has not been set,
the serializer calls invariant.
Fields can optionally be marked as optional and stored as boost::optional. These fields are
optional in the BSON document and the parser does not throw an error if they are missing. Also, they are not
required to be set before serialization. Non-optional fields can also have a default value. If a
field has a default, then it does not need to be present in the BSON document or set with a setter.
exampleStruct:
    description: An example struct
    fields:
        requiredField: int
        optionalField:
            description: Provide it if you want to.
            type: bool
            optional: true
        defaultedField:
            description: >-
                Most callers should rely on 42
                as it is the answer to the question
                of life the universe and everything.
            type: long
            validator:
                gt: 0
                lt: 50
            default: 42
This generates a C++ class with methods to parse and serialize the struct. Note: This code has been simplified from the full code IDL generates.
class ExampleStruct {
public:
    static constexpr auto kRequiredFieldFieldName = "requiredField"_sd;
    static constexpr auto kOptionalFieldFieldName = "optionalField"_sd;
    static constexpr auto kDefaultedFieldFieldName = "defaultedField"_sd;
    ExampleStruct(boost::optional<SerializationContext> serializationContext = boost::none);
    ExampleStruct(std::int32_t requiredField, boost::optional<SerializationContext> serializationContext = boost::none);
    void serialize(BSONObjBuilder* builder) const;
    BSONObj toBSON() const;
    static ExampleStruct parse(const IDLParserContext& ctxt, const BSONObj& bsonObject);
    std::int32_t getRequiredField() const { return _requiredField; }
    void setRequiredField(std::int32_t value) { _requiredField = std::move(value); }
    boost::optional<bool> getOptionalField() const { return _optionalField; }
    void setOptionalField(boost::optional<bool> value) { _optionalField = std::move(value); }
    std::int64_t getDefaultedField() const { return _defaultedField; }
    void setDefaultedField(std::int64_t value) { validateDefaultedField(value); _defaultedField = std::move(value); }
    const mongo::SerializationContext& getSerializationContext() const { return _serializationContext; }
    void setSerializationContext(mongo::SerializationContext value) { _serializationContext = std::move(value); }
protected:
    void parseProtected(const IDLParserContext& ctxt, const BSONObj& bsonObject);
    BSONObj _anchorObj;
private:
    // One member per field; the defaulted field starts at its IDL default.
    std::int32_t _requiredField;
    boost::optional<bool> _optionalField;
    std::int64_t _defaultedField{42};
    mongo::SerializationContext _serializationContext;
};
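The required, optional, and defaulted field rules, together with the `gt`/`lt` validator, can be mimicked without any MongoDB headers. The sketch below is an assumption-laden model rather than generated code: `Document`, `ExampleStructSketch`, and `parseSketch` are hypothetical names, `std::optional` stands in for `boost::optional`, and every field is an integer to keep the document type simple (the real `optionalField` is a bool).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a BSON document: field name -> integer value.
using Document = std::map<std::string, std::int64_t>;

struct ExampleStructSketch {
    std::int64_t requiredField = 0;
    std::optional<std::int64_t> optionalField;  // stays empty unless present
    std::int64_t defaultedField = 42;           // default taken from the IDL
};

// Mirrors the validator `gt: 0` and `lt: 50` on defaultedField.
void validateDefaultedField(std::int64_t value) {
    if (!(value > 0 && value < 50)) {
        throw std::out_of_range("defaultedField must be > 0 and < 50");
    }
}

ExampleStructSketch parseSketch(const Document& doc) {
    ExampleStructSketch out;

    // Required field: its absence is a parse error.
    auto required = doc.find("requiredField");
    if (required == doc.end()) {
        throw std::invalid_argument("missing required field: requiredField");
    }
    out.requiredField = required->second;

    // Optional field: absence is fine and leaves the optional empty.
    if (auto it = doc.find("optionalField"); it != doc.end()) {
        out.optionalField = it->second;
    }

    // Defaulted field: absence keeps the default, presence is validated.
    if (auto it = doc.find("defaultedField"); it != doc.end()) {
        validateDefaultedField(it->second);
        out.defaultedField = it->second;
    }
    return out;
}
```

Parsing a document that contains only `requiredField` succeeds in this sketch and yields an empty `optionalField` and a `defaultedField` of 42, while a `defaultedField` of 50 is rejected by the validator.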
The IDL serializers take a mongo::SerializationContext, a class that is passed to the functions
that serialize mongo::NamespaceString and mongo::DatabaseName. For more details see
src/mongo/util/serialization_context.h.
By default IDL parsers do not hold a reference to the BSONObj they parse. In the typical case, of
parsing a command from the network, this is fine since the network buffer outlives the generated
parser. But in other cases, you may want to anchor the BSONObj to the IDL generated parser. To do
this call parseOwned instead of parse.
BSONObj behaves as either a view type (like StringData) or an owned type (like std::string). In
the first case, a view type, it is a const char* pointer to a block of memory. It does not control
the lifetime of the memory. In the second case, the owned case, it is a block of memory with the first
8 bytes being a pair of [uint32, uint32]. The first member is a reference count using
boost::intrusive_ptr and the second is the length of the BSON document. The rest of the BSON
document is adjacent to the second uint32. In this second case, every copy of the BSONObj
increments the reference count and when the reference count drops to zero, the BSONObj deletes the
memory block.
An unowned BSONObj can be converted to an owned one with the method getOwned(). This performs a
memory copy. This method is a no-op if the object is already owned.
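The view-versus-owned distinction can be modeled with a small class. This is a deliberately simplified sketch and not BSONObj itself: `BufferSketch` is a hypothetical name, and `std::shared_ptr` is swapped in for the intrusive reference count and in-buffer header described above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>

// Hypothetical model of a buffer that is either a view of memory owned
// elsewhere or a shared, reference-counted copy.
class BufferSketch {
public:
    // Constructs a view: remembers the pointer but does not own the memory.
    BufferSketch(const char* data, std::size_t size) : _data(data), _size(size) {}

    bool isOwned() const { return static_cast<bool>(_holder); }
    const char* data() const { return _data; }
    std::size_t size() const { return _size; }

    // Returns an owned buffer. A view copies its bytes once; an already owned
    // buffer just shares its storage, so no bytes are copied.
    BufferSketch getOwned() const {
        if (isOwned()) {
            return *this;
        }
        auto holder = std::make_shared<std::string>(_data, _size);
        BufferSketch owned(holder->data(), holder->size());
        owned._holder = std::move(holder);
        return owned;
    }

private:
    const char* _data;
    std::size_t _size;
    std::shared_ptr<const std::string> _holder;  // null for views
};
```

A view created over a local string becomes dangling once that string is destroyed, whereas the result of `getOwned()` keeps its bytes alive for as long as any copy of it exists.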
It can be advantageous to use parseOwned instead of parse since your IDL struct can use object
for fields instead of object_owned which create copies. The parseOwned method only affects the
lifetime of view types like object. IDL deep-copies all other types like string and binary
today.
Chained Structs is IDL's mechanism of IDL reuse by composition. Chained structs allow re-use of common struct definitions across IDL structs and commands.
For instance, the write commands insert, delete, and update all take
bypassDocumentValidation as an optional field. These write commands share
WriteCommandRequestBase as a chained struct that defines bypassDocumentValidation. By using
chained structs, IDL structs share the definition of the fields. This allows users to write a set of
field definitions once and reuse them across structs.
When IDL generates the classes, the chained structs are available as getters/setters on the generated class. This allows code that works with them to treat the shared IDL struct as a shared C++ class. Code can be written once to work with the shared struct without having to resort to C++ templates. The fields of a chained struct are not stored in the parent class; they remain in the child chained struct. Also, chaining does not affect the code generation of any chained structs, only that of the type that declares it wants to include chained structs.
If inline_chained_structs is true, then the members of the chained struct are also available on
the struct including them. This means that instead of having to call
obj.getChainedStruct().getCommonField(), users can call obj.getCommonField() instead. Field storage
is not affected as this option is only syntactic sugar.
There can be multiple levels of chained structs. Be wary of circular chaining when choosing to use multi level chained structs.
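The composition that chained structs provide can be pictured with two plain C++ classes. This is a hand-written sketch under stated assumptions, not IDL output: `WriteBaseSketch`, `InsertSketch`, and `bypassesValidation` are hypothetical names loosely modeled on the WriteCommandRequestBase example, and `std::optional` stands in for `boost::optional`.

```cpp
#include <cassert>
#include <optional>
#include <utility>

// Stand-in for a struct that several commands chain.
class WriteBaseSketch {
public:
    std::optional<bool> getBypassDocumentValidation() const { return _bypass; }
    void setBypassDocumentValidation(std::optional<bool> value) { _bypass = value; }

private:
    std::optional<bool> _bypass;
};

// Stand-in for a command that chains WriteBaseSketch.
class InsertSketch {
public:
    // Getter/setter for the chained struct as a whole.
    const WriteBaseSketch& getWriteBase() const { return _writeBase; }
    void setWriteBase(WriteBaseSketch value) { _writeBase = std::move(value); }

    // The kind of forwarding getter that inlining adds. It is only sugar:
    // the value still lives inside _writeBase.
    std::optional<bool> getBypassDocumentValidation() const {
        return _writeBase.getBypassDocumentValidation();
    }

private:
    WriteBaseSketch _writeBase;
};

// Written once against the shared struct, usable by every command that chains it.
bool bypassesValidation(const WriteBaseSketch& base) {
    return base.getBypassDocumentValidation().value_or(false);
}
```

Any other command class that holds a `WriteBaseSketch` could pass it to `bypassesValidation` unchanged, which is the reuse-without-templates point made above.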
- description - string - A comment to add to the generated C++
- fields - sequence - see fields attributes reference below
- strict - bool - defaults to true, a strict parser errors if an unknown field is encountered by the generated parser. Persisted structs should set this to false to allow them to encounter documents from future versions of MongoDB without throwing an error.
- chained_structs - mapping - a list of structs to include in this struct. IDL adds the chained structs as member variables in the generated C++ class. IDL also adds a getter for each chained struct.
- inline_chained_structs - bool - if true, exposes chained struct getters as members of this struct in generated code.
- immutable - bool - if true, does not generate mutable getters for structs
- generate_comparison_operators - bool - if true, generates support for C++ operators: ==, !=, <, >, <=, >=
- non_const_getter - bool - if true, generates mutable getters for non-struct fields
- cpp_validator_func - string - name of a C++ function to call after a BSON document has been deserialized. Function has signature of void <function_name>(<struct_name>* obj). Method is expected to throw a C++ exception (i.e. uassert) if validation fails.
- is_command_reply - bool - if true, marks the struct as a command reply. A struct marked as is_command_reply generates a parser that ignores known generic or common fields across all replies when parsing replies (i.e. ok, errmsg, etc)
- is_generic_cmd_list - string - choice [arg, reply], if set, generates functions bool hasField(StringData) and bool shouldForwardToShards(StringData) for each field in the struct. If set to arg, the struct will automatically be chained to every command.
- query_shape_component - bool - true indicates that special serialization code will be generated to serialize as a query shape
- unsafe_dangerous_disable_extra_field_duplicate_checks - bool - undocumented, DO NOT USE

- description - string - A comment to add to the generated C++
- cpp_name - string - Optional name to use for member variable and getters/setters. Defaults to camelCase of field name.
- type - string or mapping - supports a single type, array<type>, or variant. Can also be arrays.
    - enum, type, or struct that is defined in an IDL file or imported
    - array<type> where type must be an enum, type, struct, or variant. The C++ type will be std::vector<type> in this case
    - bulkWrite for an example.
- ignore - bool - true means field generates no code but is ignored by the generated deserializer. Used to deprecate fields that no longer have an effect but allow strict parsers to ignore them.
- optional - bool - true means the field is optional. Generated C++ type is boost::optional<type>.
- default - string - the default value of type. Types with default values are not required to be found in the original document or set before serialization
- supports_doc_sequence - bool - true indicates the field can be found in an OpMsg's document sequence. Must use the generated <struct>::parse(OpMsgRequest) parser to use this
- comparison_order - sequence - comparison order for fields
- validator - see validator reference
- non_const_getter - bool - true indicates it generates a mutable getter
- unstable - bool - deprecated, prefer stability = unstable instead
- stability - string - choice [unstable, stable] - if unstable, parsing the field throws an error if strict api checking is enabled
- always_serialize - bool - whether to always serialize optional fields even if none
- forward_to_shards - bool - used by generic arg code to generate shouldForwardToShards, no effect on BSON deserialization/serialization
- forward_from_shards - bool - used by generic arg code to generate shouldForwardFromShards, no effect on BSON deserialization/serialization
- query_shape - choice of [anonymize, literal, parameter, custom] - see [src/mongo/db/query/query_shape.h]

Validators generate functions that ensure a value is valid when it is parsed or set in a setter. Comparisons are generated with the corresponding C++ operators.
- gt - string - Validates field is greater than string
- lt - string - Validates field is less than string
- gte - string - Validates field is greater than or equal to string
- lte - string - Validates field is less than or equal to string
- callback - string - A static function to call of the shape Status <function_name>(const <cpp_type> value). For non-simple types, value is passed by const-reference.

Commands are a customized version of structs designed for MongoDB RPC. All commands are structs but
not all structs are commands. IDL supports the unique needs of commands with additional fields on
the command object when compared to struct.
The special features:
- namespace field.
- OP_MSG, $db must be present or defaults to admin
- struct as a reply
- is_generic_cmd_list: "arg" that are in imported IDL files will automatically be chained to all commands. The IDL compiler imports generic_argument.idl by default, so any generic argument struct defined in that file will be chained to all commands by default.
- $clusterTime, ok, etc during parsing. The list of these fields is in generic_argument.idl.

The namespace field is the field that describes one kind of parameter a command takes.
- concatenate_with_db - takes a collection name. Generates a method const NamespaceString getNamespace(). Examples: insert, update, delete
- concatenate_with_db_or_uuid - takes a collection name or UUID. Generates a method const NamespaceStringOrUUID& getNamespaceOrUUID(). Examples: find, count
- ignored - ignores the first argument entirely. Examples: hello, setParameter, ping
- type - takes a struct as the first argument. Examples: getLog, clearLog, renameCollection

Commands can also specify the replies that they return. Replies are regular structs with
is_command_reply = true.
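As a small illustration of what concatenate_with_db means for the example command shown earlier, the sketch below joins the $db value and the command's first field into a single namespace string. The helper name `concatenateWithDb` is hypothetical, and the real generated code produces a NamespaceString object rather than a plain std::string.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: joins a database name and a collection name into the
// "db.collection" form of a namespace.
std::string concatenateWithDb(const std::string& db, const std::string& collection) {
    return db + "." + collection;
}
```

For the example command, `concatenateWithDb("testDB", "testCollection")` gives `testDB.testCollection`.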
- description - see structs
- chained_structs - see structs
- fields - see structs
- cpp_name - see structs
- strict - see structs
- generate_comparison_operators - see structs
- inline_chained_structs - see structs
- immutable - see structs
- non_const_getter - see structs
- namespace - string - choice of a string [concatenate_with_db, concatenate_with_db_or_uuid,
  ignored, type]. Instructs how the value of the command field should be parsed.
  - concatenate_with_db - Indicates the command field is a string and should be treated as a
    collection name. Typically used by commands that deal with collections. Automatically
    concatenated with $db by the IDL parser. Adds a method const NamespaceString getNamespace()
    to the generated class.
  - concatenate_with_db_or_uuid - Indicates the command field is a string or UUID, and should be
    treated as a collection name. Typically used by commands that deal with collections.
    Automatically concatenated with $db by the IDL parser. Adds a method const NamespaceStringOrUUID& getNamespaceOrUUID() to the generated class.
  - ignored - Ignores the value of the command field. Used by commands that ignore their command
    argument entirely.
  - type - Indicates the command takes a custom type for the first field. The type field must be
    set.
- type - string - name of the IDL type or struct to parse the command field as
- command_name - string - The IDL generated parser expects the command to be named after its YAML
  map key. This can be overridden with command_name. Commands should be camelCase.
- command_alias - string - allows commands to have multiple names. DO NOT USE. Some older commands
  have both lowercase and camelCase names.
- reply_type - string - IDL struct that this command replies with. The reply struct must have
  is_command_reply set.
- api_version - string - Typically set to the empty string "". Only set to a non-empty string if
  the command is part of the stable API. Generates a class named
  <command_name>CommandNameCmdVersion1Gen, derived from TypedCommand, that commands should
  derive from.
- is_deprecated - bool - indicates the command is deprecated
- allow_global_collection_name - bool - if true, the command can accept both collection names and
  non-collection names. Used by the aggregate command.
- access_check - mapping - see access check reference

A list of privileges the command checks. Only applicable for commands that are a part of API Version 1. Checked at runtime when test commands are enabled.
- none - bool - No privileges required
- simple - mapping - single check or privilege
- complex - sequence - list of check and/or privilege
- check - string - checks a part of the access control system like is_authenticated. See
  src/mongo/db/auth/access_checks.idl for a complete list.
- privilege - mapping
  - resource_pattern - string - a resource pattern to check for a given set of privileges. See
    MatchType enum in src/mongo/db/auth/action_type.idl for
    complete list.
  - action_type - sequence - list of action types the command may check. See ActionType enum in
    src/mongo/db/auth/action_type.idl for complete list.
  - agg_stage - string - aggregation only. Name of aggregation stage. Used to appease the IDL
    compatibility checker.

The IDL compiler is organized as a traditional compiler written in Python 3 (originally Python 2)
and is located in buildscripts/idl. It has 3 passes and two
different tree representations that are handed from one pass to the next. Having multiple passes reduces the
complexity of each pass by separating tasks across different files.
Here is an example of how IDL processes a file example.idl.
sequenceDiagram
title: IDL Flow for example.idl
participant Compiler
participant Parser
participant Binder
participant Generator
Compiler->>Parser: parser.parse("example.idl")
Parser->>Compiler: syntax.IDLSpec
Compiler->>Binder: binder.bind()
Binder->>Compiler: ast.IDLBoundSpec
Compiler->>Generator: generator.generate_code
Generator-->Generator: generator._generate_header
Generator-->Generator: generator._generate_source
Generator->>Compiler: Ok
compiler.py orchestrates the 3 passes by calling each
one in sequence. For instance, it calls the parser and passes the syntax tree it returns to the
binder. It also fixes up the include files for the generated code.
The two trees (syntax and ast) share just one type, common.SourceLocation. While this
means there is some duplication between the trees, it improves code readability. If types were
shared between passes, with some fields read or written only in certain passes, reasoning
about the code would be more difficult.
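
To make the pass structure concrete, here is a minimal Python sketch of a three-pass pipeline with two separate tree representations that share only a source location type. All class and function names below (SyntaxStruct, BoundStruct, parse, bind, generate, compile_idl) are hypothetical stand-ins and do not mirror the real modules in buildscripts/idl; the parser also takes an already-loaded dictionary rather than reading YAML.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class SourceLocation:
    """The one type shared by both trees."""
    file_name: str
    line: int
    column: int


@dataclass
class SyntaxStruct:
    """Syntax tree node: mirrors what the user wrote."""
    name: str
    field_names: List[str]
    location: SourceLocation


@dataclass
class BoundStruct:
    """Bound tree node: holds what the generator needs."""
    cpp_name: str
    field_names: List[str]
    location: SourceLocation


def parse(file_name: str, raw: Dict[str, List[str]]) -> List[SyntaxStruct]:
    # Pass 1: turn the raw document into syntax tree nodes.
    return [
        SyntaxStruct(name, list(fields), SourceLocation(file_name, index + 1, 1))
        for index, (name, fields) in enumerate(raw.items())
    ]


def bind(syntax_structs: List[SyntaxStruct]) -> List[BoundStruct]:
    # Pass 2: resolve names and build the second tree.
    return [
        BoundStruct(s.name[0].upper() + s.name[1:], s.field_names, s.location)
        for s in syntax_structs
    ]


def generate(bound_structs: List[BoundStruct]) -> Tuple[str, str]:
    # Pass 3: emit header and source text; no error checking happens here.
    header = "\n".join(f"class {b.cpp_name};" for b in bound_structs)
    source = "\n".join(
        f"// parser for {b.cpp_name}: {', '.join(b.field_names)}" for b in bound_structs
    )
    return header, source


def compile_idl(file_name: str, raw: Dict[str, List[str]]) -> Tuple[str, str]:
    # Orchestrates the passes in sequence, feeding each tree to the next pass.
    return generate(bind(parse(file_name, raw)))
```

Keeping SyntaxStruct and BoundStruct as distinct types duplicates a few fields, but each pass only ever sees the fields that are meaningful to it.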
The three passes are:
- Parser - parses the IDL file into the syntax tree (syntax.IDLSpec).
- Binder - binds the syntax tree into the ast tree (ast.IDLBoundSpec).
- Generator - generates the header file _gen.h and source
  file _gen.cpp. The generator does no error checks (it does have a few asserts though) as error
  checking is the responsibility of earlier passes.

The IDL compiler does not throw exceptions, though the generated C++ code does. The
compiler adds all errors to the errors.ParserContext in
errors.py. This allows IDL to capture more than one
error from the user's IDL file and report them to the user. All error codes start with ID and are
of the format IDNNNN where N is a digit. The Python unit tests assert these error codes in
negative tests by using the string constants ERROR_ID_... defined for each error.
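
The error-collection approach can be sketched in Python as follows. This is only an illustration of accumulating errors instead of raising on the first one; ErrorCollector, check_struct, and the two error IDs are hypothetical and are not the real errors.ParserContext API or real IDL error codes.

```python
# Hypothetical error IDs in the IDNNNN format; not real IDL error codes.
ERROR_ID_EXAMPLE_DUPLICATE_FIELD = "ID9001"
ERROR_ID_EXAMPLE_UNKNOWN_TYPE = "ID9002"


class ErrorCollector:
    """Accumulates errors so that several can be reported in one run."""

    def __init__(self):
        self.errors = []

    def add_error(self, error_id, location, message):
        self.errors.append((error_id, location, message))

    def has_errors(self):
        return bool(self.errors)

    def contains(self, error_id):
        return any(entry[0] == error_id for entry in self.errors)


def check_struct(ctxt, struct_name, fields, known_types):
    """Record every problem found in a struct rather than stopping at the first."""
    seen = set()
    for field_name, type_name in fields:
        if field_name in seen:
            ctxt.add_error(
                ERROR_ID_EXAMPLE_DUPLICATE_FIELD,
                struct_name,
                f"duplicate field '{field_name}'",
            )
        seen.add(field_name)
        if type_name not in known_types:
            ctxt.add_error(
                ERROR_ID_EXAMPLE_UNKNOWN_TYPE,
                struct_name,
                f"unknown type '{type_name}'",
            )
```

A negative test in this style would run the check on a deliberately bad input and then assert that the collector contains the expected error ID constant, rather than matching on message text.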
IDL has two sets of tests:
- Python unit tests in buildscripts/idl/tests
- C++ unit tests in idl_test.cpp

Since IDL is a Python script, it is quick to iterate on because it does not need to be compiled. When making changes to IDL, it is recommended to call the IDL compiler directly instead of through the build system. Changing the IDL scripts often triggers all the IDL generated files to be regenerated and then recompiled, so it can be faster to invoke the scripts manually and then invoke the compiler by hand as well. Every IDL generated file has the Python invocation that generates it printed at the top of the file.
When extending IDL, add tests to both the Python unit tests and the C++ unit tests. With few exceptions, the unit tests exercise all features and combinations IDL can handle.
The parsing method a struct is initialized with determines what type of ownership the constructed
object has over the BSONObj parameter. An internal BSONObj anchor ensures that the lifetime of
the BSONObj matches the lifetime of the object in the cases that the BSONObj parameter is
owned or shared.
If the struct is a view, then it's possible that objects of the type will not own all of their
members. If the struct is not a view, then objects of the type are guaranteed to own all of their
members. This is determined by recursively checking the fields of a struct. This info is used
during generation to determine whether or not a struct will need a BSONObj anchor.
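
The recursive check can be pictured with the following Python sketch. It is a simplified model under stated assumptions: structs are represented as a plain mapping from struct name to a list of field type names, VIEW_TYPES is a hypothetical set of non-owning types (here just object, which this document describes as not owning its BSONObj), and is_view is not the name of any function in the IDL compiler.

```python
# Assumption: field types that do not own their data. Only "object" is listed
# because this document describes it as non-owning; the real set may differ.
VIEW_TYPES = {"object"}


def is_view(struct_name, structs, _seen=None):
    """Return True if the struct, or any struct nested in it, has a non-owning field."""
    seen = _seen if _seen is not None else set()
    if struct_name in seen:
        # Already visited on this walk; avoids infinite recursion on cycles.
        return False
    seen.add(struct_name)
    for type_name in structs[struct_name]:
        if type_name in VIEW_TYPES:
            return True
        # Recurse into nested structs; other type names are treated as owning.
        if type_name in structs and is_view(type_name, structs, seen):
            return True
    return False
```

With structs = {"inner": ["object"], "outer": ["int", "inner"]}, the call is_view("outer", structs) reports a view because the nested struct holds a non-owning field, which is the case where an anchor would be needed.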
IDL has been in use since 2017. In that time, here are a few best practices:
- Persisted structs should set strict: false.
  It's better for upgrade/downgrade. Commands should set strict: true or omit it as strict: true is the default.
  1. For persistence: For upgrade/downgrade, if a persisted document with a strict parser has a
     field added in new version N+1 and then the user downgrades to old version N, the strict
     parser will throw an exception and reject the document. If this document was part of the
     storage catalog for instance, the server would fail to start.
  2. For commands: Using strict parsers gives the server the ability to add fields without
     the risk of clients accidentally sending fields with the same name that had been ignored.
- Consider object_owned instead of object. If your IDL uses object, the generated class does not own
  the BSONObj that is returned from its getter. This means that once the BSONObj that was
  passed to parse() goes out of scope, the object will point to freed memory. Use object_owned
  if this is not desired. object_owned incurs extra memory allocations though.
- Alternatively, use the parseSharingOwnership or parseOwned methods. These
  methods will ensure the IDL generated class has an anchor to the BSONObj. See comments in
  the generated class. It is not advisable though to use these methods during normal command
  request processing. The network buffer that holds the inbound request is available during the
  lifetime of the request even though IDL does not anchor the network buffer.

Adding new functionality to IDL should be accompanied by adding tests to idl_test.cpp, adding
tests to buildscripts/idl/tests, and adding the necessary schema to idl_schema.json.
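
The difference between strict and non-strict parsing described in the best practices above can be sketched in Python. This is a deliberately small model, not the generated C++ parser: parse_document and UnknownFieldError are hypothetical names, and documents are plain dictionaries.

```python
class UnknownFieldError(Exception):
    """Raised by the strict parser in this sketch when a field is not recognized."""


def parse_document(doc, known_fields, strict=True):
    """Keep known fields; reject or skip unknown ones depending on strictness."""
    parsed = {}
    for name, value in doc.items():
        if name in known_fields:
            parsed[name] = value
        elif strict:
            # A strict parser refuses documents carrying fields it does not know.
            raise UnknownFieldError(f"unknown field '{name}'")
        # A non-strict parser silently skips the unknown field.
    return parsed
```

In the downgrade scenario, a document written by version N+1 with an extra field is read by version N, which only knows the older field set. The non-strict parser returns the fields it knows, while the strict parser raises, which is why strict parsing is risky for persisted documents and useful for commands.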