third_party/blink/renderer/bindings/scripts/bind_gen/README.md
[TOC]
The Python package bind_gen is the core part of the Blink-V8 bindings code
generator. generate_bindings.py is the driver script, which takes a Web IDL
database (web_idl_database.pickle, generated by the web_idl_database GN target)
as input and produces a set of C++ source files of Blink-V8 bindings
(v8_*.h, v8_*.cc).
The bindings code generator is implemented as a tree builder of CodeNode
which is a fundamental building block. The following sub sections describe
what CodeNode is and how the code generator builds a tree of CodeNode.
CodeNode

The code generator produces C++ source files (text files) but the content of each file is not represented as a single giant string nor a list of strings. The content of each file is represented as a CodeNode tree.
CodeNode is a fundamental building block that represents a text fragment in
the tree structure. A text file is represented as a tree of CodeNodes, each of
which represents a corresponding text fragment. The code generator is the
CodeNode tree builder.
Here is a simple example to build a CodeNode tree.
# SequenceNode and TextNode are subclasses of CodeNode.

def make_prologue():
    return SequenceNode([
        TextNode("// Prologue"),
        TextNode("SetUp();"),
    ])

def make_epilogue():
    return SequenceNode([
        TextNode("// Epilogue"),
        TextNode("CleanUp();"),
    ])

def main():
    root_node = SequenceNode([
        make_prologue(),
        TextNode("LOG(INFO) << \"hello, world\";"),
        make_epilogue(),
    ])
The root_node above represents the following text.
// Prologue
SetUp();
LOG(INFO) << "hello, world";
// Epilogue
CleanUp();
The basic features of CodeNode are implemented in code_node.py. Just for convenience, CodeNode subclasses corresponding to C++ constructs are provided in code_node_cxx.py.
CodeNode has an object-oriented design and has internal states (not only the
parent / child nodes but also more states to support advanced features).
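The tree-of-fragments idea can be tried out without the real package. The following is a minimal sketch, assuming nothing about the actual implementation in code_node.py: ToyTextNode and ToySequenceNode are hypothetical stand-ins that only know how to render themselves, and they rebuild the prologue/epilogue example above.

```python
# Toy stand-ins for illustration only. The real classes live in code_node.py
# and carry much more state (parent links, symbol scopes, and so on).
class ToyTextNode:
    def __init__(self, text):
        self.text = text

    def render(self):
        return self.text


class ToySequenceNode:
    def __init__(self, children=None, separator="\n"):
        self.children = list(children or [])
        self.separator = separator

    def append(self, child):
        self.children.append(child)

    def render(self):
        # Rendering is a depth-first walk that joins the children's text.
        return self.separator.join(child.render() for child in self.children)


def make_prologue():
    return ToySequenceNode([ToyTextNode("// Prologue"), ToyTextNode("SetUp();")])


def make_epilogue():
    return ToySequenceNode([ToyTextNode("// Epilogue"), ToyTextNode("CleanUp();")])


root_node = ToySequenceNode([
    make_prologue(),
    ToyTextNode('LOG(INFO) << "hello, world";'),
    make_epilogue(),
])
rendered = root_node.render()
print(rendered)
```

Because each subtree renders itself, make_prologue and make_epilogue can be reused or reordered without touching any string concatenation code.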
The bindings code generator consists of multiple sub code generators. For
example, interface.py is a sub code generator of Web IDL interface and
enumeration.py is a sub code generator of Web IDL enumeration. Each Web IDL
definition has its own sub code generator.
This subsection describes how a sub code generator builds a CodeNode tree and
produces C++ source files by looking at enumeration.py as an example. The
example code snippet below is simplified for explanation.
def generate_enumerations(task_queue):
    for enumeration in web_idl_database.enumerations:
        task_queue.post_task(generate_enumeration, enumeration.identifier)
generate_enumerations is the entry point to this sub code generator. task_queue
is used to allow parallel processing. generate_enumeration (singular form)
actually produces a pair of C++ source files (*.h and *.cc).
def generate_enumeration(enumeration_identifier):
    # Filepaths
    header_path = path_manager.api_path(ext="h")
    source_path = path_manager.api_path(ext="cc")

    # Root nodes
    header_node = ListNode(tail="\n")
    source_node = ListNode(tail="\n")

    # ... fill the contents of `header_node` and `source_node` ...

    # Write down to the files.
    write_code_node_to_file(header_node, path_manager.gen_path_to(header_path))
    write_code_node_to_file(source_node, path_manager.gen_path_to(source_path))
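As a side note on the task_queue used by generate_enumerations, the post-then-run pattern can be sketched as follows. ToyTaskQueue, its run method, the thread pool, and the file-name scheme are all assumptions made for illustration. How the real task queue schedules work is not shown in this document.

```python
from concurrent.futures import ThreadPoolExecutor


class ToyTaskQueue:
    """Hypothetical stand-in: collects tasks, then runs them on a pool."""

    def __init__(self, max_workers=4):
        self._tasks = []
        self._max_workers = max_workers

    def post_task(self, func, *args):
        # Only record the work; nothing runs until run() is called.
        self._tasks.append((func, args))

    def run(self):
        with ThreadPoolExecutor(max_workers=self._max_workers) as pool:
            futures = [pool.submit(func, *args) for func, args in self._tasks]
            # Results are gathered in posting order, regardless of which
            # task finished first.
            return [future.result() for future in futures]


def generate_enumeration(identifier):
    # Stand-in for the real work: report which files would be written.
    # The naming scheme here is made up for the sketch.
    stem = "v8_" + identifier.lower()
    return (stem + ".h", stem + ".cc")


def generate_enumerations(task_queue, identifiers):
    for identifier in identifiers:
        task_queue.post_task(generate_enumeration, identifier)


queue = ToyTaskQueue()
generate_enumerations(queue, ["Alpha", "Beta"])
outputs = queue.run()
```

Posting one task per definition keeps each task independent, which is what makes it safe to run them concurrently.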
The main task of generate_enumeration is to build CodeNode trees and write them
down to files. A key point here is to build two trees in parallel; one for *.h
and the other for *.cc. We can add a function declaration to the header file
while adding the corresponding function definition to the source file. The
following code snippet is an example to add constructors into the header file
and the source file.
# Namespaces
header_blink_ns = CxxNamespaceNode(name_style.namespace("blink"))
source_blink_ns = CxxNamespaceNode(name_style.namespace("blink"))
# {header,source}_blink_ns are added to {header,source}_node (the root
# nodes) respectively.

# Class definition
class_def = CxxClassDefNode(cg_context.class_name,
                            base_class_names=["bindings::EnumerationBase"],
                            final=True,
                            export=component_export(
                                api_component, for_testing))
ctor_decls, ctor_defs = make_constructors(cg_context)

# Define the class in 'blink' namespace.
header_blink_ns.body.append(class_def)

# Add constructors to public: section of the class.
class_def.public_section.append(ctor_decls)
# Add constructors (function definitions) into 'blink' namespace in the
# source file.
source_blink_ns.body.append(ctor_defs)
In the above code snippet, make_constructors creates and returns a CodeNode
tree for the header file and another CodeNode tree for the source file. In most
cases, functions named make_xxx create and return a pair of CodeNode trees.
These functions are subtree builders of the CodeNode trees.
These subtree builders are implemented in a functional programming style (whereas CodeNodes themselves are implemented in an object-oriented style). Each call creates a new pair of CodeNode trees (the returned code node instances differ per call, so their internal states are separate), but the contents are determined solely by the input arguments. This property is very important when we use closures in advanced use cases.
So far, the typical code structure of the sub code generators is covered.
enumeration.py consists of several make_xxx functions (subtree builders) +
generate_enumeration (the top-level tree builder + file writer).
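The pair-returning builder pattern and its functional property can be sketched with a toy node type. ToyNode, the V8Example class name, and the constructor signature below are hypothetical and only serve the illustration. They are not the real make_constructors or CodeNode API.

```python
class ToyNode:
    """Hypothetical minimal node: a text line or a list of child nodes."""

    def __init__(self, text=None):
        self.text = text
        self.children = []

    def append(self, child):
        self.children.append(child)

    def render(self):
        if self.text is not None:
            return self.text
        return "\n".join(child.render() for child in self.children)


def make_constructors(class_name):
    # Subtree builder: returns a fresh (declaration, definition) pair whose
    # content depends only on the argument.
    decl = ToyNode("explicit {0}(int value);".format(class_name))
    defn = ToyNode("{0}::{0}(int value) : value_(value) {{}}".format(class_name))
    return decl, defn


header_node = ToyNode()
source_node = ToyNode()

ctor_decl, ctor_def = make_constructors("V8Example")
header_node.append(ctor_decl)  # the declaration goes to the *.h tree
source_node.append(ctor_def)   # the definition goes to the *.cc tree

# A second call yields new node objects with identical content.
other_decl, other_def = make_constructors("V8Example")
```

Returning both halves from one function keeps a declaration and its definition next to each other in the generator, even though they end up in different files.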
Bindings code generation has the following typical problems. Suppose we have the following simple code generator.
# Example of simple code generation

def make_foo():
    return SequenceNode([
        TextNode("HeavyResource* res = HeavyFunc();"),
        TextNode("Foo(res);"),
    ])

def make_bar():
    return SequenceNode([
        TextNode("HeavyResource* res = HeavyFunc();"),
        TextNode("Bar(res);"),
    ])

def main():
    root_node = SequenceNode([
        make_foo(),
        make_bar(),
    ])
This produces the following C++ code, where we have two major problems. The
first problem is a symbol conflict: res is defined twice. Even if we gave
different names like res1 and res2, we have the second problem: the
produced code calls HeavyFunc twice, which is not efficient.
// Output of simple code generation example
HeavyResource* res = HeavyFunc();
Foo(res);
HeavyResource* res = HeavyFunc();
Bar(res);
Ideally we'd like to have the following code, without introducing tight coupling
between make_foo and make_bar.
// Ideal generated code
HeavyResource* res = HeavyFunc();
Foo(res);
Bar(res);
To resolve the above problems, the bindings code generator supports two-step code generation. This approach resembles declarative programming.
# Example of two-step code generation

def bind_vars(code_node):
    local_vars = [
        SymbolNode("heavy_resource",
                   "HeavyResource* ${heavy_resource} = HeavyFunc(${address}, ${phone_number});"),
        SymbolNode("address",
                   "String ${address} = GetAddress();"),
        # The symbol name must match the ${phone_number} references.
        SymbolNode("phone_number",
                   "String ${phone_number} = GetPhoneNumber();"),
    ]
    for symbol_node in local_vars:
        code_node.register_code_symbol(symbol_node)

def make_foo():
    return SequenceNode([
        TextNode("Foo(${heavy_resource});"),
    ])

def make_bar():
    return SequenceNode([
        TextNode("Bar(${heavy_resource});"),
    ])

def main():
    root_node = SymbolScopeNode()
    bind_vars(root_node)
    root_node.extend([
        make_foo(),
        make_bar(),
    ])
The above code generator has two kinds of code generation. One kind is
make_foo and make_bar, which are almost the same as before except for the use
of a template variable (${heavy_resource}). The other kind is bind_vars,
which provides a catalogue of symbol definitions. Using the catalogue keeps the
definitions of make_foo and make_bar simple. This code generator produces the
following C++ code without producing duplicated function calls.
// Output of two-step code generation example
String address = GetAddress();
String phone_number = GetPhoneNumber();
HeavyResource* heavy_resource = HeavyFunc(address, phone_number);
Foo(heavy_resource);
Bar(heavy_resource);
The mechanism of two-step code generation is simple.
SymbolNode(name, definition) consists of a symbol name and code fragment that
defines the symbol. When a symbol name is referenced as ${symbol_name}, it's
simply replaced with symbol_name, plus it triggers insertion of the symbol
definition into a surrounding SequenceNode. This step happens recursively.
So not only heavy_resource's definition but also address and
phone_number's definitions are inserted, too.
With the two-step code generation, it's possible (and expected) to write code generators in the declarative programming style, which works better in general than the imperative programming style.
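The reference-triggers-definition mechanism described above can be imitated in a few lines. This is a toy sketch, not the real algorithm: it works on plain strings instead of CodeNodes, the SYMBOLS table and helper names are hypothetical, and each definition is simply placed right before its first use rather than at a heuristically chosen position.

```python
import re

# Hypothetical symbol catalogue: name -> definition template.
SYMBOLS = {
    "heavy_resource":
    "HeavyResource* ${heavy_resource} = HeavyFunc(${address}, ${phone_number});",
    "address": "String ${address} = GetAddress();",
    "phone_number": "String ${phone_number} = GetPhoneNumber();",
}

_REFERENCE = re.compile(r"\$\{(\w+)\}")


def expand(template):
    # A reference ${name} is replaced with the plain symbol name.
    return _REFERENCE.sub(lambda match: match.group(1), template)


def collect_definitions(template, emitted, out):
    # Every referenced symbol pulls in its definition, dependencies first,
    # and each definition is emitted at most once.
    for name in _REFERENCE.findall(template):
        if name in emitted or name not in SYMBOLS:
            continue
        emitted.add(name)
        definition = SYMBOLS[name]
        collect_definitions(definition, emitted, out)
        out.append(expand(definition))


def render(statements):
    emitted, out = set(), []
    for statement in statements:
        collect_definitions(statement, emitted, out)
        out.append(expand(statement))
    return "\n".join(out)


code = render(["Foo(${heavy_resource});", "Bar(${heavy_resource});"])
print(code)
```

Note that the statements passed to render never mention address or phone_number. Those definitions appear only because heavy_resource's definition references them.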
SymbolNode consists of a symbol name and its definition. You can reference a
symbol as ${symbol_name} in TextNode and FormatNode. It's okay that you
never reference a symbol. The symbol definition will be automatically inserted
only when you reference the symbol.
For simple use cases, a SymbolNode can be constructed from a pair of a symbol
name and a plain text (which can contain references in the form of ${...}) as
the definition.
# Example of simple use cases
addr_symbol = SymbolNode("address",
                         "void* ${address} = ${base} + ${offset};")
For more complicated use cases, SymbolNode's definition can be a callable that returns a SymbolDefinitionNode instead. This is useful when the definition has a complex structure of code node tree, since a plain text definition cannot represent a code node tree structure.
# Example of complicated use cases
def create_address(symbol_node):
    node = SymbolDefinitionNode(symbol_node)
    node.extend([
        TextNode("void* ${address} = ${base} + ${offset};"),
        CxxUnlikelyIfNode(
            cond="!${address}",
            attribute=None,
            body=[
                TextNode("${exception_state}.ThrowRangeError(\"...\");"),
                TextNode("return;"),
            ]),
    ])
    return node

addr_symbol = SymbolNode("address",
                         definition_constructor=create_address)
where CxxUnlikelyIfNode represents a C++ if statement with an unlikely condition (defined in code_node_cxx.py). This definition is better than a plain text definition because it inserts the definition of ${exception_state} at the best position depending on how likely ${exception_state} is to actually be used.
// Output of the example of complicated use cases
void* base = ...;  // ${base}'s definition is automatically inserted.
void* offset = ...;  // ${offset}'s definition is automatically inserted.
// ${exception_state}'s definition may be inserted here if it's used often or
// outside of the following if statement.
// ExceptionState exception_state(...);
void* address = base + offset;
if (!address) {
  // ${exception_state}'s definition may be inserted here if it's used neither
  // often nor outside of this if statement.
  ExceptionState exception_state(...);
  exception_state.ThrowRangeError("...");
  return;
}
SymbolDefinitionNode represents the code fragment that defines a symbol. The code generator automatically inserts symbol definitions at the best positions heuristically. However, it's hard to determine the best position in a single pass, so the code generator iterates symbol definition insertions/relocations until it finds the heuristically best positions. SymbolDefinitionNode is used to identify a subtree of code nodes that defines its symbol (i.e. used to distinguish automatically inserted code nodes from the original code node tree).
SequenceNode represents not only a list of CodeNodes but also insertion points of SymbolDefinitionNode. SymbolDefinitionNodes will be inserted between elements within a SequenceNode.
Compared to SequenceNode, ListNode represents just a list of CodeNodes that does not support automatic insertion of symbol definitions, i.e. ListNode is indivisible. SequenceNode should be used when your code nodes represent a series of C++ statements, otherwise ListNode is preferred over SequenceNode so that nothing will be inserted in between. See the following example.
# Example of SequenceNode vs ListNode
int_array = ListNode([
    TextNode("int int_array[] = {"),
    ListNode([
        TextNode("${foo}"),
        TextNode("${bar}"),
    ], separator=","),
    TextNode("};"),
])

node = SequenceNode([
    int_array,
    TextNode("PrintIntArray(int_array);"),
])
This example produces the following C++ code. Since symbol definitions are
inserted only between elements of SequenceNode, ${foo} and ${bar}'s definitions
won't be inserted within int_array's definition.
// Output of SequenceNode vs ListNode example
int foo = ...;  // ${foo}'s definition is automatically inserted here.
int bar = ...;  // ${bar}'s definition is automatically inserted here.
int int_array[] = {
  // ${foo}'s definition is _not_ inserted here.
  foo,
  // ${bar}'s definition is _not_ inserted here.
  bar
};
PrintIntArray(int_array);
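The difference between a divisible sequence and an indivisible list can be sketched with plain strings. Everything below is a hypothetical toy (ToyListNode, render_sequence, and the DEFINITIONS table are made up): definitions are emitted only at the boundaries between top-level elements, never inside a ToyListNode.

```python
import re

_REFERENCE = re.compile(r"\$\{(\w+)\}")

# Hypothetical definitions for the two symbols used below.
DEFINITIONS = {"foo": "int foo = 1;", "bar": "int bar = 2;"}


class ToyListNode:
    """Indivisible group of lines: nothing is ever inserted inside it."""

    def __init__(self, lines):
        self.lines = list(lines)


def render_sequence(elements):
    emitted, out = set(), []
    for element in elements:
        lines = element.lines if isinstance(element, ToyListNode) else [element]
        # Definitions go in front of the whole element, i.e. only at the
        # boundaries between elements of the sequence.
        for line in lines:
            for name in _REFERENCE.findall(line):
                if name in DEFINITIONS and name not in emitted:
                    emitted.add(name)
                    out.append(DEFINITIONS[name])
        out.extend(_REFERENCE.sub(r"\1", line) for line in lines)
    return out


int_array = ToyListNode([
    "int int_array[] = {",
    "${foo},",
    "${bar}",
    "};",
])
output = render_sequence([int_array, "PrintIntArray(int_array);"])
print("\n".join(output))
```

If the array initializer were instead passed as four separate top-level strings, the toy would place bar's definition between the lines of the initializer, which is exactly the kind of breakage ListNode is meant to prevent.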
You can register SymbolNodes only into a SymbolScopeNode. Registered symbols
are effective only inside the SymbolScopeNode. This behavior reflects that
C++ variables are effective only inside the closest containing C++ block
({...}).
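The scoping rule can be pictured as a chain of lookup tables. The sketch below is a toy model under the assumption that lookups fall back to enclosing scopes, as C++ block scoping does. ToySymbolScope, its register and find methods, and the two symbols are hypothetical and not part of bind_gen.

```python
class ToySymbolScope:
    """Hypothetical scope: symbols registered here are visible in this scope
    and in scopes nested inside it, but not outside."""

    def __init__(self, parent=None):
        self._parent = parent
        self._symbols = {}

    def register(self, name, definition):
        self._symbols[name] = definition

    def find(self, name):
        # Walk outwards, like C++ name lookup through enclosing blocks.
        scope = self
        while scope is not None:
            if name in scope._symbols:
                return scope._symbols[name]
            scope = scope._parent
        return None


outer = ToySymbolScope()
inner = ToySymbolScope(parent=outer)
outer.register("limit", "int ${limit} = ComputeLimit();")
inner.register("index", "int ${index} = 0;")

visible_from_inner = inner.find("limit")  # found via the enclosing scope
visible_from_outer = outer.find("index")  # None: registered in the inner scope
```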
The driver script generate_bindings.py supports two useful command line flags:
--format_generated_files and --enable_code_generation_tracing.
--format_generated_files runs clang-format for the generated files so that
they are easy for developers to read.
--enable_code_generation_tracing outputs code comments (e.g.
/* make_wrapper_type_info:6304 */) in addition to the regular output in order
to clarify which line of the code generator produced which line of
generated code.
This is useful to understand the correspondence between the code generator and
generated code.
When the tracing comments show functions which are too common and uninteresting
to you (e.g. make_blink_to_v8_value), you can exclude such functions on a
module-by-module basis by using CodeGenTracing.add_modules_to_be_ignored.
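To get a feel for where a comment such as /* make_wrapper_type_info:6304 */ could come from, here is a small sketch that derives a function-name-and-line tag from the Python call stack. It only illustrates the idea: trace_comment and make_example_node are hypothetical, and how CodeGenTracing actually records this information is not described in this document.

```python
import inspect


def trace_comment():
    # Look one frame up to find out which generator function asked for the
    # comment and on which line.
    caller = inspect.stack()[1]
    return "/* {}:{} */".format(caller.function, caller.lineno)


def make_example_node():
    # Hypothetical generator function that tags its output with its origin.
    return trace_comment() + " DoSomething();"


line = make_example_node()
print(line)
```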
Here is an example command line to run the script with the options (working fine as of 2024 May).
# Run generate_bindings.py with --format_generated_files and
# --enable_code_generation_tracing.
#
# web_idl_database.pickle must have already been generated and updated.
# Or, run 'autoninja -C out/Default web_idl_database' in advance.
$ cd out/Default
$ python3 ../../third_party/blink/renderer/bindings/scripts/generate_bindings.py \
async_iterator callback_function callback_interface dictionary enumeration interface namespace observable_array sync_iterator typedef union \
--web_idl_database gen/third_party/blink/renderer/bindings/web_idl_database.pickle \
--root_src_dir=../.. \
--root_gen_dir=gen \
--output_reldir=core=third_party/blink/renderer/bindings/core/v8/ \
--output_reldir=modules=third_party/blink/renderer/bindings/modules/v8/ \
--output_reldir=extensions_chromeos=third_party/blink/renderer/bindings/extensions_chromeos/v8/ \
--format_generated_files \
--enable_code_generation_tracing