asmcomp/power/NOTES.md
IBM POWER processors, 64-bit, little-endian, ELF ABI v2.
(Called ppc64el in Debian.)
No longer supported:
powerpc)powerpc)Instruction set architecture: Power ISA User Instruction Set Architecture, book 1 of Power Instruction Set (https://openpowerfoundation.org/specifications/isa/)
ELF ABI 64 bits version 2: OpenPOWER ABI for Linux Supplement for the Power Architecture 64-bit ELF V2 ABI (https://openpowerfoundation.org/specifications/64bitelfabi/)
The PowerPC Compiler Writer's Guide, Warthman Associates, 1996. (PDF available from various sources on the Web.)
The stack pointer SP (register 1) is always 16-byte aligned.
Every stack frame contains a 32-byte reserved area at SP to SP + 31.
A typical function prologue that allocates N bytes of stack and saves LR:
addi 1, 1, -N
mflr 0
std 0, (N + 16)(1) ; in caller's frame!
If the back pointer is to be maintained as well:
stdu 1, -N(1)
mflr 0
std 0, (N + 16)(1) ; in caller's frame!
If the function needs to stack-allocate M extra bytes, e.g. for an exception handler, it should leave the bottom 32 bytes reserved. So, it decrements SP by M and uses bytes SP + 32 ... SP + 32 + M - 1 as the stack-allocated space:
addi 1, 1, -32 ; allocate exception handler
std 29, 40(1) ; save previous trap ptr at handler + 8
Because the bottom 32 bytes are used by linker-generated call veeners, we need to reserve them both in C stack frames and in OCaml stack frames.
To speed up stack switching between OCaml and C, the stack pointers saved in struct stack_info and struct c_stack_link point to the bottom of the stack frame and include the reserved 32 bytes area.
Register r2 points to a table of contents (TOC) that contains absolute addresses for symbols. Different compilation units can have different TOCs. Hence, register r2 must be properly initialized at the beginning of functions that access the TOC, and properly saved and restored around function calls.
The general protocol is as follows:
When a function is called, its address (the address of the first instruction of the function) must be in register r12.
Every function that makes TOC references starts with two instructions that initialize r2 to a fixed offset from r12, offset determined by the linker:
0: addis 2, 12, (.TOC. - 0b)@ha
addi 2, 2, (.TOC. - 0b)@l
For example, here is a call to a function pointer found at offset 0 from register r3:
std 2, 24(1) ; save r2
ld 12, 0(3) ; address of function in r12
mtctr 12
bctrl ; call the function
ld 2, 24(1) ; restore r2
If the caller does not make TOC references, the std 2 and ld 2 can
be omitted. Likewise for a tail call:
ld 12, 0(3) ; address of function in r12
mtctr 12
bctr ; tail call the function
For calls to known functions bl function_name, a special protocol is
implemented by the linker. The object file contains the bl
instruction followed by a nop instruction:
bl function_name
nop
If the linker determines that function_name and the caller function
share the same TOC, it rewrites the code as
bl function_name+8 ; skip the prologue that initializes r2
nop
So, the callee does not recompute r2 from r12, it just uses the same r2 value as the caller. For this reason, r12 need not be initialized by the caller.
If function_name uses a different TOC than the caller function, the
linker generates a branch-and-link to a veener function. The veener
function saves r2, recomputes r12, and branches to function_name.
The nop after the branch-and-link becomes a reload of the caller's
TOC pointer in r2.
bl veener_function
ld 2, 24(1) ; restore r2
veener_function:
std 2, 24(1) ; save caller's r2
<load r12 with absolute address of function_name>
mtctr r12
bctr
However, this doesn't quite work for tail calls. Consider:
Because f1 and f2 have the same TOC, the linker inserted no code in f1 to save and restore r2 around the call to f2.
Because f2 tailcalls f3, r2 will not be restored to f2's TOC when f3 returns.
So, we're back into f1, with the wrong TOC in r2.
To avoid this problem, ocamlopt always reloads r2 after a direct call. One possibility would be to save r2 and reload r2 around the direct call, using the canonical slot in the current frame:
std 2, 24(1)
bl function_name
nop
ld 2, 24(1)
However, this adds 2 instructions to every direct call. We can avoid generating the store if the current function saves its r2 value somewhere on the stack at the beginning. Then, after each direct or indirect call, we can reload r2 from there.
What is "there"? Which location is used for saving r2 at the beginning of the function? A natural choice would be the word at SP + 24, i.e. the TOC-saving word for the current function. However, SP can vary inside the function, e.g. to install and remove exception handlers, so additional r2 saves would need to be generated.
The code currently generated by ocamlopt saves its r2 at SP + (N + 8) where N is the size of the stack frame, that is, at the word reserved by the callee for the caller to save the CR register, which ocamlopt-generated code does not use otherwise. This avoids allocating one more word in the current frame just to save r2.