g3doc/proposals/nvidia_driver_differ.md
Status as of 2024-08-14: Completed. To get an overview of what was ultimately implemented, check out the presentation here.
This tool is intended to make it easier to support new Nvidia driver versions within nvproxy. Any new version of an Nvidia driver can come with changes to its ioctl structs, and if nvproxy supports those structs, it will have to copy those changes as well. Prior to the implementation of this proposal, however, finding those changes is both difficult and tedious. This tool attempts to automate the bulk of this work.
This document goes over some design proposals of how this tool should be built, especially how it should interface with nvproxy.
Let's say we want to add support for a new driver version B. At a high level, the work of this tool can be broken into the following steps:

1. Determine the list of structs that nvproxy depends on for version B.
2. Find the definitions of those structs in the driver source code.
3. Diff the definitions against the previous supported version and report any changes.
The biggest roadblock to implementing this is that the only immediate information reported by nvproxy is the ioctl calls it supports. Some ioctl calls have corresponding structs defined, but not all of them do; many ioctls are handled by simply copying size bytes. Most control commands are like this.

Additionally, nested structs are also a concern. For example, some structs may be defined like so:
```go
type IOCTL_FOO struct {
    Foo  uint32
    Bars [MAX_BARS]IOCTL_BAR
}

type IOCTL_BAR struct {
    Bar uint32
}
```
This means we not only have to map ioctl calls to their corresponding structs, but also (recursively) parse their fields to see if there are nested structs.
We can split this tool into two parts. The first part can be a Go tool that is built against nvproxy and finds the list of structs for version B. Once we have a specific list of structs to look up, we can pass that to a C++ tool that uses Clang's C++ AST Matcher API to find the corresponding struct definitions in the driver source code. These definitions are then passed back to the Go tool, which does the necessary diffing and reporting back to the user.
The primary problem to tackle on the Go side is how to get the list of struct
names nvproxy depends on for a given version B. Since this system should allow
for versioning of these struct names, we can extend the existing driverABI
struct to include this information.
However, almost every normal use case of driverABI will not need to use these
names, and they should not be sitting around wasting memory. Thus, we can add a
getStructNames function to driverABI that will construct and return the list
of relevant names only when they are needed. It should look like this:
```go
type driverABI struct {
    frontendIoctl   map[uint32]frontendIoctlHandler
    uvmIoctl        map[uint32]uvmIoctlHandler
    controlCmd      map[uint32]controlCmdHandler
    allocationClass map[nvgpu.ClassID]allocationClassHandler

    getStructNames driverStructNamesFunc
}

type driverStructNamesFunc func() *driverStructNames

type driverStructNames struct {
    frontendNames   map[uint32][]string
    uvmNames        map[uint32][]string
    controlNames    map[uint32][]string
    allocationNames map[nvgpu.ClassID][]string
}
```
The fields in driverStructNames map every ioctl to the list of struct names that it depends on (a list, to support the case of nested structs). Explicitly mapping each struct name to its corresponding ioctl should keep this list easy to maintain: we can compare against the ioctls included in the ABI map to ensure every ioctl call is accounted for in each version, and it becomes easier to modify the definitions for a specific ioctl number when a version changes them.
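The comparison against the ABI maps could be sketched as a generic helper; the handler and name map types here are simplified stand-ins for nvproxy's actual maps:

```go
package main

import "fmt"

// missingKeys returns the keys present in handlers but absent from names,
// i.e. ioctls whose struct names have not yet been recorded.
func missingKeys[K comparable, V1, V2 any](handlers map[K]V1, names map[K]V2) []K {
	var missing []K
	for k := range handlers {
		if _, ok := names[k]; !ok {
			missing = append(missing, k)
		}
	}
	return missing
}

func main() {
	handlers := map[uint32]string{1: "h1", 2: "h2"}
	names := map[uint32][]string{1: {"SOME_PARAMS"}}
	fmt.Println(missingKeys(handlers, names)) // [2]
}
```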
There are a few cases to consider when generating the list of names for an ioctl:

- For ioctls with structs defined in nvproxy, we can use a helper getStructName(any) that takes a struct and returns its corresponding driver name in a []string. How this should be done is discussed further below.
- For simple ioctls without defined structs, we can use a helper simpleIoctl(name) that simply returns a []string with one element, to make it more explicit.
- There are some ioctls (such as NV_ESC_RM_ALLOC) that allow multiple types of parameters. In this case, the corresponding lists for each parameter type can be merged.

Concretely, this would look something like this:
```go
driverStructNames{
    frontendNames: map[uint32][]string{
        NV_ESC_RM_ALLOC_MEMORY: append(getStructName(NVOS21Parameters{}), getStructName(NVOS64Parameters{})...),
    },
    uvmNames: map[uint32][]string{
        UVM_ALLOC_SEMAPHORE_POOL: getStructName(UVM_ALLOC_SEMAPHORE_POOL_PARAMS{}),
    },
    controlNames: map[uint32][]string{
        NV2080_CTRL_CMD_GPU_GET_NAME_STRING: simpleIoctl("NV2080_CTRL_GPU_GET_NAME_STRING_PARAMS"),
    },
    allocationNames: map[nvgpu.ClassID][]string{
        NV01_MEMORY_SYSTEM: getStructName(NV_MEMORY_ALLOCATION_PARAMS{}),
    },
}
```
Looking specifically now at getStructName, there are a few ways in which it
can be implemented:
Idea 1: We can require that struct names in nvproxy are exactly the same as their counterparts in the Nvidia driver. This way, Go's reflect package can be used to simply read the name of the struct being passed in.
To handle versioning changes, we can agree on some suffix format. For
example, everything after a double underscore is ignored. This way, both
PARAMS and PARAMS__V550 can be defined.
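A minimal sketch of this suffix convention, assuming the double-underscore separator described above (driverName is a hypothetical helper):

```go
package main

import (
	"fmt"
	"strings"
)

// driverName strips the agreed-upon version suffix: everything after a
// double underscore is ignored.
func driverName(structName string) string {
	if i := strings.Index(structName, "__"); i >= 0 {
		return structName[:i]
	}
	return structName
}

func main() {
	fmt.Println(driverName("PARAMS__V550")) // PARAMS
	fmt.Println(driverName("PARAMS"))       // PARAMS
}
```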
Idea 2: We can introduce struct tags that specify the name of the corresponding struct in the Nvidia driver code, which would always sit on the first field. This could look something like this:
```go
type IOCTL_FOO_V550 struct {
    Foo  uint32 `nvproxy:"ioctl_foo"`
    Bars [MAX_BARS]IOCTL_BAR
    Baz  uint64
}
```
This struct tag can be read using reflect. For structs that are named the
same between nvproxy and the Nvidia driver, we can also have a convenient
nvproxy:"same" case that simply uses the struct’s name.
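A sketch of how such a tag could be read with reflect; note that getStructName here returns a single name rather than the []string used above, and the type names are hypothetical:

```go
package main

import (
	"fmt"
	"reflect"
)

type IOCTL_FOO_V550 struct {
	Foo  uint32 `nvproxy:"ioctl_foo"`
	Bars [4]uint32
	Baz  uint64
}

type IOCTL_SAME struct {
	Val uint32 `nvproxy:"same"`
}

// getStructName reads the nvproxy tag on the first field; the special
// value "same" means the Go name matches the driver name.
func getStructName(s any) string {
	t := reflect.TypeOf(s)
	tag := t.Field(0).Tag.Get("nvproxy")
	if tag == "same" {
		return t.Name()
	}
	return tag
}

func main() {
	fmt.Println(getStructName(IOCTL_FOO_V550{})) // ioctl_foo
	fmt.Println(getStructName(IOCTL_SAME{}))     // IOCTL_SAME
}
```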
Idea 3: Instead of using a struct tag, we can use a struct comment similar to // +marshal or // +stateify. An external tool would then run on the nvproxy package, find each struct with the struct comment, and implement an interface that reports back the corresponding driver name.
```go
type NvidiaDriverStruct interface {
    GetDriverName() string
}

// +nvproxy ioctl_foo
type IOCTL_FOO_V550 struct {
    Foo  uint32
    Bars [MAX_BARS]IOCTL_BAR
    Baz  uint64
}

// Auto-generated
func (s IOCTL_FOO_V550) GetDriverName() string {
    return "ioctl_foo"
}
```
The use of an external tool makes this method a lot more involved, and
potentially expensive to maintain. The main benefit is that it is a better
convention than requiring tags on the first field. The code will also be
similar to go_marshal or go_stateify, so a lot could be copied over.
Specifically, the code generation step and the code to collect all annotated
types in Generator.collectMarshallableTypes can be the same.
Comparing these three ideas, numbers 1 and 2 are definitely the easiest to implement. Idea 2 will be more robust as well, since we don't have to worry about Nvidia driver structs potentially containing double underscores or whatever separator we decide on. In the end, idea 2 was implemented; if maintaining convention proves important, idea 3 can still be implemented afterwards.
There is also the problem of nested structs that needs to be addressed. Although the Go side can try and tackle this problem, it would be hard to maintain for the simple structs that are not defined in nvproxy, as we would have to manually check if they have nested structs and write down what they are. Thus, it would be easier to make the C++ Clang tool do this, and simply have the Go tool find the list of all top-level structs.
After gathering a list of struct names to verify, this tool can locally clone the code for both versions A and B. From here, Clang's C++ AST Matcher API can be used to generate an AST and find the struct definitions given the name.
The Clang API includes the ability to quickly set up command line tools to run
the AST matcher; this
tutorial in the
documentation covers everything this tool needs to do. Out of the box, it takes
in a source file, and allows you to run any set of matchers on that source file.
This means we can create a small C++ file that #includes all the header files
that contain struct definitions, similar to what
Geohot does with his sniffer.
Clang will automatically expand these #includes, so any struct defined in
there will be matchable.
In the driver source code, all structs are named via a typedef. This means the tool should try to match against a typedef with a given struct name, and then look at the struct type it aliases. This is done with the following Clang matcher expression:
```cpp
typedefDecl(
    allOf(
        hasName(struct_name),
        // Match and bind to the struct declaration.
        hasType(
            // Need to specify elaboratedType, otherwise hasType
            // will complain that the type is ambiguous.
            elaboratedType(
                hasDeclaration(recordDecl().bind("struct_decl"))
            )
        )
    )
).bind("typedef_decl");
```
A few structs in the driver share the same definition, so they are defined via
typedefs to each other. These structs will not get matched by the expression
above; instead, the tool should check that the typedefDecl is mapped to another
typedefDecl rather than a recordDecl, like so:
```cpp
// Matches definitions like
// typedef NV906F_CTRL_GET_CLASS_ENGINEID_PARAMS NVC36F_CTRL_GET_CLASS_ENGINEID_PARAMS;
typedefDecl(
    allOf(
        hasName(struct_name),
        // Match the underlying typedef instead of a record declaration.
        hasType(
            // Need to specify elaboratedType, otherwise hasType
            // will complain that the type is ambiguous.
            elaboratedType(
                hasDeclaration(typedefDecl())
            )
        )
    )
).bind("typedef_decl");
```
These cases can be recorded as type aliases in the JSON output, described in more detail below.
Running this matcher will provide a binding to a clang::RecordDecl node corresponding to the struct definition. From here, we can iterate through the fields and get their names and types using clang::FieldDecl::getNameAsString() and clang::FieldDecl::getType().getAsString().
One edge case is if the field type is an anonymous struct or union, like so:

```c
typedef struct {
    int foo;
    union {
        int bar;
        int baz;
    } data;
} IOCTL_WITH_UNION;
```
Trying to get the type name directly will yield an auto-generated name that includes the absolute file path, which is not easy to compare. Instead, the tool should check whether a type is anonymous using clang::Type::hasUnnamedOrLocalType(), and create a standardized name if so. The standardized name can be of the form PARENT_RECORD::FIELD_t; for example, IOCTL_WITH_UNION::data_t for the example above.
The Clang tool should also recurse into any nested structs. Since it already has the clang::QualType of each field, there are two possible cases to consider:

- The field type is itself a record type, which resolves directly to a clang::RecordDecl node.
- The field type is an array of records, in which case the element type resolves to a clang::RecordDecl node.

Along the way, the tool can also record the true type of any other field types it finds using clang::QualType::getCanonicalType(), in case these simple types ever change. For example, the tool might record that NvHandle is an unsigned int.
Finally, the Go side needs some way to interface with the C++ Clang side. To make things simple, the inputs and outputs can be encoded in JSON. Overall, interfacing with the parser would go something like this:
```shell
./driver_ast_parser --input=input.json source_file.cc
```
Input:
```json
{
    "structs": ["STRUCT", "NAMES", "HERE", ...],
    "constants": ["CONSTANT", "NAMES", "HERE", ...]
}
```
Output:
```json
{
    // Named "records" since this captures both structs and unions found.
    "records": {
        "STRUCT_NAME": {
            "fields": [
                {"name": "field1", "type": "int"},
                {"name": "field2", "type": "NvHandle"}
            ],
            "source": "/path/to/source/file.cc:line_number"
        },
        ...
    },
    // All the typedefs found.
    "aliases": {
        "NvHandle": "unsigned int"
    },
    "constants": {
        "CONSTANT_NAME": UINT_VALUE
    }
}
```
Beyond the nvproxy changes and C++ Clang parser, there are a few other details to work out.
The first is actually getting the driver source code locally for Clang to parse through. This can be done by cloning from the NVIDIA driver's GitHub repo:
```shell
git clone -b $VERSION --depth 1 https://github.com/NVIDIA/open-gpu-kernel-modules.git $SAVE_PATH
```
Next, the parser needs some source file to analyze and parse through. As
mentioned above, the easiest way to make this would be to create a single C++
file that #includes every relevant driver header file with struct definitions.
Finding these relevant header files does require hard-coding some paths;
however, the driver file structure seems very stable for now. Currently, the
list of header files is:
- src/common/sdk/nvidia/inc/nvos.h
- src/nvidia/arch/nvalloc/unix/include/nv-ioctl.h
- src/nvidia/arch/nvalloc/unix/include/nv-unix-nvos-params-wrappers.h
- kernel-open/nvidia-uvm/uvm_ioctl.h
- kernel-open/nvidia-uvm/uvm_linux_ioctl.h
- src/common/sdk/nvidia/inc/ctrl/*.h
- src/common/sdk/nvidia/inc/ctrl/*/*.h
- src/common/sdk/nvidia/inc/class/*.h

These header files also #include from other header files. The include paths for these files are as follows:

- src/common/sdk/nvidia/inc
- src/common/shared/inc
- src/nvidia/arch/nvalloc/unix/include
- kernel-open/common/inc

Unfortunately, there are many duplicate definitions between non-UVM and UVM files. This means that the C++ parser should be run twice per driver version, for the non-UVM and UVM sources respectively.
To let Clang know about these include paths, a compile_commands.json file is
needed. The format of this file is documented
here, but for the
use case of this tool, the structure will always look as follows:
```json
[
    {
        "directory": "source/file/directory",
        "arguments": ["clang", "-I", "include/path/1", "-I", "include/path/2", ..., "non_uvm_source_file.cc"],
        "file": "non_uvm_source_file.cc"
    },
    // repeated for UVM source file
]
```
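A sketch of how the differ could generate this file; makeCompileCommands is a hypothetical helper, and the paths are placeholders:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// compileCommand mirrors one entry of compile_commands.json.
type compileCommand struct {
	Directory string   `json:"directory"`
	Arguments []string `json:"arguments"`
	File      string   `json:"file"`
}

// makeCompileCommands builds one entry per source file, passing each
// include path to clang with -I.
func makeCompileCommands(dir string, includes, sources []string) []compileCommand {
	var cmds []compileCommand
	for _, src := range sources {
		args := []string{"clang"}
		for _, inc := range includes {
			args = append(args, "-I", inc)
		}
		args = append(args, src)
		cmds = append(cmds, compileCommand{Directory: dir, Arguments: args, File: src})
	}
	return cmds
}

func main() {
	cmds := makeCompileCommands("/tmp/differ",
		[]string{"src/common/sdk/nvidia/inc"},
		[]string{"non_uvm_source_file.cc", "uvm_source_file.cc"})
	b, _ := json.MarshalIndent(cmds, "", "  ")
	fmt.Println(string(b))
}
```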
Clang requires that the file is called compile_commands.json, and it
assumes that it exists in the same directory as the file being parsed. As such,
the differ will likely need to create a temporary directory when running, with
the following format:
```
temp_dir
  \ driver_source_dir
  \ compile_commands.json
  \ non_uvm_source_file.cc
  \ uvm_source_file.cc
```
Altogether, the differ will behave as follows:

1. Clone the driver source code for versions A and B.
2. Create a temporary directory containing non_uvm_source_file.cc and uvm_source_file.cc.
3. Create the corresponding compile_commands.json.
4. Run the parser on both source files for each version, and diff the resulting JSON outputs.

Do you love tests? Well luckily for you, there are a few tests that should be built around this diffing tool.
First, a few continuous tests should be made to ensure the list of struct names
is kept up to date. For every version covered by nvproxy’s ABI tree, one test
can check whether there are any supported ioctls that are missing in
driverStructNames, and another can run the parser to verify that every struct
name reported in driverStructNames actually exists in the driver source code.
There should also be a continuous test that uses this tool to verify that
nvproxy is correct. Rather than trying to use the differ, however, it might be
easier to just use the C++ Clang parser and verify individual versions of the
ABI. This test should take the driverStructNames for a given version, find the
corresponding driver struct definitions, and then match it against the nvproxy
equivalent struct.
This would require augmenting the driverABI mapping to also return struct instances, which can be read using Go's reflect library. Specifically, instead of mapping ioctls to []string, they can be mapped to slices of name/instance pairs, like so:
```go
type DriverStruct struct {
    Name     string
    Instance any
}

type driverStructNames struct {
    frontendNames   map[uint32][]DriverStruct
    uvmNames        map[uint32][]DriverStruct
    controlNames    map[uint32][]DriverStruct
    allocationNames map[nvgpu.ClassID][]DriverStruct
}
```
This allows for comparisons of struct definitions within nvproxy and the NVIDIA driver.
When verifying a struct, there are a few cases that can happen. The first case is when nvproxy treats an ioctl as simple (DriverStruct.Instance == nil). The test should look for a few signs in the driver definition, to verify that the struct is actually simple:

- If the struct contains a pointer field (NvP64), the struct is not simple.
- If the struct contains a field whose name ends in "fd", the struct is not simple.

Another case is when nvproxy defines a struct for a parameter, but the Nvidia driver uses a simple type alias. NvHandle seems to be the only example of this:
```go
// nvproxy definition
type Handle struct {
    Val uint32 `nvproxy:"NvHandle"`
}
```

```c
// Driver definition
typedef NvU32 NvHandle;
```
To verify this, the test can compare the sizes of the two types and ensure they remain identical.
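A sketch of such a size check using reflect; sizeMatches is a hypothetical helper, and in practice the driver side's size would come from the type info reported by the parser:

```go
package main

import (
	"fmt"
	"reflect"
)

// Handle mirrors nvproxy's wrapper around the driver's NvHandle.
type Handle struct {
	Val uint32
}

// sizeMatches checks that the nvproxy type and the driver alias
// (whose size is given in bytes) remain identical in size.
func sizeMatches(instance any, driverSize uintptr) bool {
	return reflect.TypeOf(instance).Size() == driverSize
}

func main() {
	// NvHandle is a typedef of NvU32, i.e. 4 bytes.
	fmt.Println(sizeMatches(Handle{}, 4)) // true
}
```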
The last case is when both nvproxy and the driver have struct definitions. When thinking about how this can be done, there are a few complications to keep in mind:
Sometimes nvproxy flattens structs or unions. For example:
```go
// nvproxy definition
type IOCTL_WITH_NESTED_STRUCT struct {
    Foo int32
    Bar int32
    Baz int32
}
```

```c
// Driver definition
typedef struct {
    int foo;
    struct {
        int bar;
        int baz;
    } data;
} IOCTL_WITH_NESTED_STRUCT;
```
Some unions are simply represented by [n]byte fields.
Some nvproxy structs use struct embedding, which should be accounted for
when looking through the fields using reflect.
```go
type NV_MEMORY_ALLOCATION_PARAMS_V545 struct {
    NV_MEMORY_ALLOCATION_PARAMS `nvproxy:"NV_MEMORY_ALLOCATION_PARAMS"`
    NumaNode int32
    _        uint32
}
```
nvproxy structs can have additional fields added for padding.
To alleviate the problem of nested or flattened structs, all struct definitions can be pre-flattened before comparing them. This will yield an array of fields for both sides. For example, this definition
```c
typedef struct {
    int a1;
    int a2;
    IOCTL_B b;
} IOCTL_A;

typedef struct {
    bool b1;
    bool b2;
    IOCTL_C c;
    bool b3;
} IOCTL_B;

typedef struct {
    unsigned int c;
} IOCTL_C;
```
would be flattened into

```
[
    int a1,
    int a2,
    bool b1,
    bool b2,
    unsigned int c,
    bool b3,
]
```
Next, fields that have the same offset should be compared. Due to padding and union types, multiple nvproxy fields may correspond to a single driver field; however, as long as each driver field has a corresponding nvproxy field at the same offset, the extraneous fields do not matter. The following pseudo-code accomplishes all of this:
```
doStructsMatch(nvproxyType, driverType) -> bool:
    if nvproxyType.Size != driverType.Size:
        return false
    nvproxyFields = Flatten(nvproxyType)
    driverFields = Flatten(driverType)
    for each ith field in driverFields:
        find the jth field in nvproxyFields with the same offset
        if such a field doesn't exist:
            return false
        if !doTypesMatch(nvproxyFields[j].Type, driverFields[i].Type):
            return false
    return true
```
```
doTypesMatch(nvproxyType, driverType) -> bool:
    if driverType is an array:
        if nvproxyType is not an array of the same length:
            return false
        recurse on the base type of each array

    // These are special types that nvproxy has type definitions for.
    Check the following mappings from driverType -> nvproxyType:
        NvHandle -> Handle
        NvP64 -> P64
        NvProcessorUuid -> NvUUID

    // E.g. NvU32 aliases unsigned int.
    if driverType has an alias:
        driverType = alias

    Check the following mappings from driverType -> nvproxyType:
        char -> byte
        unsigned char -> uint8
        short -> int16
        unsigned short -> uint16
        int -> int32
        unsigned int -> uint32
        long long -> int64
        unsigned long long -> uint64
        enum _ -> uint32
        union -> [n]byte
        struct -> doStructsMatch(nvproxyType, driverType)
```
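The primitive mappings in the pseudocode above could be stored as a simple lookup table on the Go side; a sketch with hypothetical names:

```go
package main

import "fmt"

// primitiveTypes records the expected nvproxy type for each canonical
// driver primitive type.
var primitiveTypes = map[string]string{
	"char":               "byte",
	"unsigned char":      "uint8",
	"short":              "int16",
	"unsigned short":     "uint16",
	"int":                "int32",
	"unsigned int":       "uint32",
	"long long":          "int64",
	"unsigned long long": "uint64",
}

// primitivesMatch reports whether a canonical driver primitive type
// maps to the given nvproxy type.
func primitivesMatch(driverType, nvproxyType string) bool {
	want, ok := primitiveTypes[driverType]
	return ok && want == nvproxyType
}

func main() {
	fmt.Println(primitivesMatch("unsigned int", "uint32")) // true
	fmt.Println(primitivesMatch("int", "uint32"))          // false
}
```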
This all requires some changes on the C++ parser side as well. Namely, it should report sizes for records and aliases, whether a record is a union type, and offsets for each record field. This can be done with clang::ASTContext::getTypeInfo, clang::TagDecl::isUnion, and clang::ASTContext::getFieldOffset respectively.
Occasionally, driver structs might change not by introducing a new field, but by
changing the purpose of an existing field. For example, a previously reserved
integer field might now be used as a file descriptor field, meaning that nvproxy
would need to add special handling for it. Although the differ reports changes
in field names, it could also report any code changes it thinks are necessary.
This could behave similarly to the verification test, which looks at simple
clues such as NvP64 types or fields ending in "fd".
Currently, nvproxy only supports specific versions of the Nvidia driver. However, many intermediate versions likely do not have any breaking changes, and it is detrimental to users if they are forced to use only those specific driver versions. This differ tool could be used to find ranges of ABI versions that have no changes, and nvproxy could support any version within such a range.
In the future, nvproxy can record additional information using the
nvproxy:"..." tags. For example, any NvP64 field could be tagged with the
struct type that the pointer represents, allowing tests to recurse on these
hidden dependencies.