src/coreclr/debug/ee/amd64/gen_amd64InstrDecode/README.md
amd64InstrDecode.h
The following process was executed on an amd64 Linux host in this directory.
# Create the program createOpcodes
gcc createOpcodes.cpp -o createOpcodes
# Execute the program to create opcodes.cpp
./createOpcodes > opcodes.cpp
# Compile opcodes.cpp to opcodes
gcc -g opcodes.cpp -o opcodes
# Disassemble opcodes
gdb opcodes -batch -ex "set disassembly-flavor intel" -ex "disass /r opcodes" > opcodes.intel
# Parse disassembly and generate code
# Build as a separate step so it will display build errors, if any.
../../../../../../dotnet.sh build
cat opcodes.intel | ../../../../../../dotnet.sh run > new_amd64InstrDecode.h
After checking it, copy the generated new_amd64InstrDecode.h to ../amd64InstrDecode.h.
This process can be run using the createTables.sh script in this directory.
The primary purpose of amd64InstrDecode.h is to provide a reliable
and accurate mechanism to implement the amd64
NativeWalker::DecodeInstructionForPatchSkip(..) function.
This function needs to be able to decode an arbitrary amd64
instruction. The decoder currently must be able to identify:
Getting this right is complicated because the amd64 instruction set is complicated.
A high level view of the amd64 instruction set can be seen by looking at:
AMD64 Architecture Programmer's Manual Volume 3: General-Purpose and System Instructions
Section 1.1 Instruction Encoding Overview
Figure 1-1. Instruction Encoding Syntax
also:
Intel(R) 64 and IA-32 Architectures Software Developer's Manual
Volume 2: Instruction Set Reference, A-Z
Chapter 2 Instruction Format
Also useful in the manuals:
AMD: Appendix A: Opcode and Operand Encodings
Intel: Volume 2, Appendix A: Opcode Map
The general behavior of each instruction can be modified by many of the bytes in the 1-15 byte instruction (15 is the maximum byte length of an instruction).
This set of files generates a metadata table by extracting the data from sample instruction disassembly.
The process entails:
What set of possible instruction encodings is needed to extract the information for the tables?
So with modrm.mod = 0, modrm.rm = 0x5 (RIP relative memory access) we need all combinations of:
- opcodemap
- opcode
- modrm.reg
- pp, W, L, L'L
- vvvv
- repe, repne, opSize

We will iterate through the entire necessary set. Many of these combinations will lead to invalid/undefined encodings. This will cause the disassembler to give up and mark the disassembly as bad.
The disassembly will then resume trying to disassemble at the next boundary.
To make sure the disassembler attempts to disassemble every instruction, we need to make sure the preceding instruction is always valid and terminates at our desired instruction boundary.
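The RIP-relative form described above (modrm.mod = 0, modrm.rm = 0x5) can be sketched as a ModRM byte constructor plus a loop over modrm.reg. This is only an illustration; `makeModRM` and `ripRelativeModRMs` are hypothetical names, not functions taken from createOpcodes.cpp.

```cpp
#include <cstdint>
#include <vector>

// Pack the three ModRM fields: mod (2 bits), reg (3 bits), rm (3 bits).
// Hypothetical helper for illustration only.
constexpr uint8_t makeModRM(unsigned mod, unsigned reg, unsigned rm)
{
    return static_cast<uint8_t>(((mod & 0x3) << 6) | ((reg & 0x7) << 3) | (rm & 0x7));
}

// The eight ModRM bytes that select RIP-relative addressing
// (mod = 0, rm = 5), one per modrm.reg value.
std::vector<uint8_t> ripRelativeModRMs()
{
    std::vector<uint8_t> result;
    for (unsigned reg = 0; reg < 8; ++reg)
        result.push_back(makeModRM(0, reg, 5));
    return result;
}
```

The remaining dimensions (opcode map, opcode, prefixes, and the VEX fields) would be further loops nested around this one.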
Through examination of the Primary opcode map, it is observed that
0x50-0x5f are all 1 byte instructions. These become convenient padding.
After each necessary instruction we insert enough padding bytes to fill the maximum instruction length and leave at least one additional one byte instruction.
Using a fixed suffix makes disassembly parsing simpler.
After the modrm byte, the generated instructions always include a postamble,
const char* postamble = "0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59,\n";
This meets the padding consistency needs.
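As a rough sketch of how one sample might be emitted as a line of a byte-array initializer, the function below writes the sample's bytes followed by the fixed postamble. `formatSample` and its parameter names are assumptions made for illustration; the real generator may be structured differently.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Fixed suffix of one-byte instructions, as quoted above.
const char* postamble = "0x50, 0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59,\n";

// Format one array-initializer line: prefix/opcode bytes, the modrm byte,
// then the postamble. Hypothetical helper, not the actual generator code.
std::string formatSample(const std::vector<uint8_t>& prefixAndOpcode, uint8_t modrm)
{
    std::string line;
    char buf[8];
    for (uint8_t b : prefixAndOpcode)
    {
        std::snprintf(buf, sizeof(buf), "0x%02x, ", b);
        line += buf;
    }
    std::snprintf(buf, sizeof(buf), "0x%02x, ", modrm);
    line += buf;
    return line + postamble;
}
```

For example, `formatSample({0x00}, 0x05)` encodes `add` with a RIP-relative byte destination. The instruction consumes 0x50 through 0x53 as its 32-bit displacement, and the remaining 0x54 through 0x59 disassemble as one-byte push/pop instructions.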
As a convenience to the parser the encoded instructions are logically ordered. The ordering is generally as follows, but can vary slightly depending on the needs of the particular opcode map:
This is to keep related instructions grouped together.
The simplest way to get these instructions into an object file for disassembly is to place them into a C++ BYTE array.
The file createOpcodes.cpp is the source for a program which will
generate opcodes.cpp
# Create the program createOpcodes
gcc createOpcodes.cpp -o createOpcodes
# Execute the program to create opcodes.cpp
./createOpcodes > opcodes.cpp
opcodes.cpp will now be a C++ source file with uint8_t opcodes[]
initialized with our set of necessary instructions and padding.
We need to compile this to an executable to prepare for disassembly.
# Compile opcodes.cpp to opcodes
gcc -g opcodes.cpp -o opcodes
In investigating the various disassembly formats, the intel
disassembly format is superior to the att format. This is because the
intel format clearly marks the instruction relative accesses and
their sizes. For instance:
It is also important to have all the raw bytes in the disassembly. This allows the instruction length to be determined accurately.
It also helps identify which instructions are from our needed set.
I happened to have used gdb as a disassembler.
# Disassemble opcodes
gdb opcodes -batch -ex "set disassembly-flavor intel" -ex "disass /r opcodes" > opcodes.intel
It seems objdump could provide similar results. This is untested. The parser may need to
be modified for subtle differences.
objdump -D -M intel --insn-width=15 -j .data opcodes
lldb aborts disassembly when it encounters a bad instruction. It might be usable with additional Python scripts.
A Windows disassembler may also work. It has not been tried.
# Parse disassembly and generate code
cat opcodes.intel | dotnet run > ../amd64InstrDecode.h
We are not interested in all lines in the disassembly. The disassembler's stray comments, its error recovery, and our padding introduce lines we need to ignore.
We filter out and ignore non-disassembly lines using a Regex for a
disassembly line.
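The filtering step can be sketched with `std::regex`. The pattern below assumes a gdb `disass /r` line has the shape address, `<+offset>:`, raw bytes in hex, then the disassembly text. That shape is an assumption about the gdb output, and `DisasmLine` and `parseDisasmLine` are hypothetical names; the actual parser is a separate dotnet program.

```cpp
#include <cstdint>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

struct DisasmLine
{
    std::vector<uint8_t> bytes; // raw encoding; bytes.size() is the instruction length
    std::string text;           // mnemonic and operands
};

// Returns true and fills `out` when `line` looks like a disassembly line.
// Assumed shape (gdb `disass /r`):
//   "   0x... <+0>:\t00 05 50 51 52 53\tadd    BYTE PTR [rip+0x53525150],al"
bool parseDisasmLine(const std::string& line, DisasmLine& out)
{
    static const std::regex pattern(
        R"(\s*0x[0-9a-f]+\s+<\+\d+>:\s+((?:[0-9a-f]{2} )*[0-9a-f]{2})\s+(\S.*))");
    std::smatch m;
    if (!std::regex_match(line, m, pattern))
        return false; // comment, banner, or other non-disassembly line

    out.bytes.clear();
    std::istringstream hex(m[1].str());
    std::string token;
    while (hex >> token)
        out.bytes.push_back(static_cast<uint8_t>(std::stoul(token, nullptr, 16)));
    out.text = m[2].str();
    return true;
}
```

Because the raw bytes are captured, the instruction length falls out as `bytes.size()` without any further decoding.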
We expect the generated instruction samples to be in a group. The first instruction in the group is the only one we are interested in.
The group is terminated by a pair of instructions. The first terminal
instruction must have 0x58 as the last byte in its encoding. The final
terminal instruction must be a 0x59\tpop.
We continue parsing the first line of each group.
Many encodings are not valid. For gdb, these instructions are marked
(bad). We filter and ignore these.
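The grouping and (bad) filtering described above might look roughly like this. The `Instr` struct and `selectSamples` function are hypothetical, and the sketch assumes each disassembly line has already been split into raw bytes and text.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct Instr
{
    std::vector<uint8_t> bytes; // raw encoding
    std::string text;           // disassembly text
};

// Returns the indices of the group-leading samples worth keeping.
// A group ends with an instruction whose last byte is 0x58, followed by
// the one-byte 0x59 pop. Hypothetical helper for illustration only.
std::vector<std::size_t> selectSamples(const std::vector<Instr>& lines)
{
    std::vector<std::size_t> samples;
    std::size_t groupStart = 0;
    for (std::size_t i = 0; i + 1 < lines.size(); ++i)
    {
        const Instr& a = lines[i];
        const Instr& b = lines[i + 1];
        const bool endsWith58 = !a.bytes.empty() && a.bytes.back() == 0x58;
        const bool isPop59 = b.bytes.size() == 1 && b.bytes[0] == 0x59 &&
                             b.text.rfind("pop", 0) == 0;
        if (!(endsWith58 && isPop59))
            continue;

        // lines[groupStart] is the sample; drop it if gdb marked it (bad).
        if (lines[groupStart].text.find("(bad)") == std::string::npos)
            samples.push_back(groupStart);
        groupStart = i + 2; // the next group begins after the 0x59 pop
    }
    return samples;
}
```

Only the first line of each group is recorded, so the padding instructions and any disassembler recovery output inside a group are never treated as samples.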
For each sample, we need to calculate the important properties:
SuffixFlags
In a few cases the disassembly of some memory operations did not include a size. These were researched manually. Those with reasonable sizes were added to a table that manually overrides the unknown sizes.
opCodeExt
To facilitate identifying sets of instructions, the tool creates an opCodeExt.
For the Primary map this is simply the encoded opcode from the instruction
shifted left by 4 bits.
For the Secondary, F38, and F3A maps this is the encoded opcode from
the instruction shifted left by 4 bits or'ed with a synthetic pp. The
synthetic pp is constructed to match the rules of
Table 1-22. VEX/XOP.pp Encoding from the
AMD64 Architecture Programmer’s Manual Volume 3: General-Purpose and System Instructions.
For the case where the opSize 0x66 prefix is present with a rep* prefix, the rep* prefix is used
to encode pp.
For the VEX* maps this is the encoded opcode from
the instruction shifted left by 4 bits or'ed with pp.
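A minimal sketch of the opCodeExt computation follows. The pp values use the standard VEX.pp assignment (0 = none, 1 = 0x66, 2 = 0xF3, 3 = 0xF2). The function names are hypothetical, and the choice of 0xF3 over 0xF2 when both are present is an arbitrary assumption of this sketch, since the text above does not cover that case.

```cpp
#include <cstdint>

// VEX/XOP.pp values: 0 = no prefix, 1 = 0x66, 2 = 0xF3, 3 = 0xF2.
enum PP : unsigned { PP_None = 0, PP_66 = 1, PP_F3 = 2, PP_F2 = 3 };

// Synthetic pp for the legacy-prefixed maps: a rep* prefix (0xF3 or 0xF2)
// takes priority over the 0x66 opSize prefix.
// Assumption: if both 0xF3 and 0xF2 were present, 0xF3 is chosen arbitrarily.
constexpr unsigned syntheticPP(bool has66, bool hasF3, bool hasF2)
{
    return hasF3 ? PP_F3 : hasF2 ? PP_F2 : has66 ? PP_66 : PP_None;
}

// opCodeExt = opcode shifted left by 4 bits, or'ed with pp
// (pp is always 0 for the Primary map).
constexpr unsigned makeOpCodeExt(uint8_t opcode, unsigned pp)
{
    return (static_cast<unsigned>(opcode) << 4) | pp;
}
```

Shifting by 4 bits rather than 2 keeps the opcode and pp visible as separate hex digits: opcode 0x58 with a 0xF3 prefix becomes 0x582.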
For most instructions, the opCodeExt will uniquely identify the instruction.
For many instructions, modrm.reg is used to help uniquely identify the instruction.
These instructions typically change mnemonic and behavior as modrm.reg
changes. They become problematic when the form of these instructions varies.
For a few other instructions the L, W, vvvv values may change the instruction
behavior. Usually these do not change mnemonic.
The set of instructions is therefore usually grouped by the opcode map and
opCodeExt generated above. For these a change in opCodeExt or map
will start a new group.
For select problematic groups of modrm.reg sensitive instructions, a
change in modrm.reg will start a new group.
- Compute the intersection and union of the SuffixFlags for the set.
- Flags in the intersection are common to all instructions in the set.
- Flags in the union, but not in the intersection, vary within the set based on the encoding flags. These are the sometimesFlags.
- Determine the rule for the sometimesFlags. For each combination of sometimesFlags, check each rule by calling TestHypothesis. This determines if the rule corresponds to the set of observations.

Encode the rule as a string. The rule might encode that the W bit or L
bit causes a different memory/immediate behavior for the particular
<map, opCodeExt> entry.
Add the rule to the set of all observed rules. Add the set's rule with comment to a dictionary.
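The intersection/union step above reduces to a few bitwise operations. In this sketch the flag constants are invented for illustration and do not match the real SuffixFlags values; `FlagSummary` and `summarize` are likewise hypothetical.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical flag bits for illustration; the real SuffixFlags differ.
enum : uint32_t
{
    kModRM = 1u << 0, // instruction has a modrm byte
    kMem4  = 1u << 1, // 4-byte memory operand
    kMem8  = 1u << 2, // 8-byte memory operand
};

struct FlagSummary
{
    uint32_t intersection; // flags present in every observation
    uint32_t unionFlags;   // flags present in at least one observation
    uint32_t sometimes;    // flags that vary within the set
};

FlagSummary summarize(const std::vector<uint32_t>& observed)
{
    FlagSummary s{observed.empty() ? 0u : ~0u, 0u, 0u};
    for (uint32_t flags : observed)
    {
        s.intersection &= flags;
        s.unionFlags |= flags;
    }
    s.sometimes = s.unionFlags & ~s.intersection;
    return s;
}
```

For a group observed as `{kModRM | kMem4, kModRM | kMem8}`, the intersection is `kModRM` and the sometimes flags are `kMem4 | kMem8`. Those varying bits are the ones a rule (for example, one keyed on the W bit) would need to explain.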
At this point generating the code is rather simple.
Iterate through the set of rules to create an enumeration of InstrForm.
For each map iterate through the dictionary, filling missing instructions with an appropriate pattern for undefined instructions.
The design uses a simple, fully populated direct lookup table to keep lookups simple. This direct map approach is expected to consume ~10K bytes.
Other approaches like a sparse list may reduce total memory usage. The added complexity did not seem worth it.
This approach is intended to reduce the human error introduced by manually parsing and encoding the various instruction forms from their respective descriptions.
The approach of using a single object file as the source of disassembly samples is restricted by the maximum compilation/link unit size. Early drafts generated more instructions and couldn't be compiled.
However, there is no restriction that all the samples must come from a single executable. These could easily be separated by opcode map.
This design is for existing instruction sets. New instruction sets will require more work.
Further, this methodology uses the disassembler to generate the tables. Until a reasonably featured disassembler is created, a new instruction set cannot be supported by this methodology.
The previous methodology of manually encoding a new instruction set would still be possible.
This design presumes the disassembler is correct. The specific version of the disassembler may have disassembly bugs. Using newer disassemblers would mitigate this to some extent.
Add a patch to the parser to work around the bug and regenerate the table.
Regenerate and compare.
Add new feature code, regenerate.