docs/slides/encoding.md
At some point, someones calls a contract:
let contract = abi(TestContract, CONTRACT_ID);
contract.some_method(0);
Compiler will desugar this into
std::codec::contract_call(
CONTRACT_ID,
"some_method",
(0,),
)
which will be expanded into:
let first_parameter = encode("some_method");
let second_parameter = encode((0,));
let params = encode((
CONTRACT_ID,
first_parameter.ptr(),
second_parameter.ptr(),
));
let (ptr, len) = __contract_call(params.ptr(), coins, asset_id, gas);
let mut buffer = BufferReader::from_parts(ptr, len);
T::abi_decode(buffer)
and, in the end, will be compiled into a fuelVM call instruction.
$hp
│
│ ┌────────────────────────────┐
▼ │ │
┌──────────────┬──────┴──────┬──────────────┬──────▼────────┬───────────────┐
│ │ │ │ │ │
HEAP │ CONTRACT_ID │ method name │ method args │ encoded bytes │ encoded bytes │
│ 32 bytes │ param1 │ param2 │ │ │
└──────▲───────┴─────────────┴──────┬───────┴───────────────┴───────▲───────┘
│ │ │
│ └───────────────────────────────┘
│
... │
call $ra:ptr $rb:u64 $rc:ptr $rd:u64
... coins │ gas
│
┌▼─────────────┐
│ │
STACK .................... │ ASSET_ID │
│ 32 bytes │
└──────────────┘
▲ ▲
│ │
│ │
$ssp $sp
The example above was calling a contract implemented like:
impl TestContract for Contract {
fn some_method(qty: u64) {
...
}
}
Compiler will desugar this into:
pub fn __entry() {
let method_name = std::codec::decode_first_param::<str>();
if method_name == "some_method" {
let mut buffer = std::codec::BufferReader::from_second_parameter();
let args: (u64,) = buffer.decode::<(u64,)>();
let result: () = __contract_entry_some_method(args.0);
let result: raw_slice = encode::<()>(result);
__contract_ret(result.ptr(), result.len::<u8>());
}
__revert(123);
}
__contract_entry_some_method is the original some_method as is.
__contract_ret is a special function that immediately returns the current context, that is why there is no return in the generated code.
This is the memory layout before the contract method is actually called
$hp
|
| +----------------------------+
v | |
+--------------+------+------+--------------+------v--------+---------------+
| | | | | |
HEAP | CONTRACT_ID | method name | method args | encoded bytes | encoded bytes |
| 32 bytes | param1 | param2 | | |
+--------------+-------------+------+-------+------^--------+-------^--^----+
| | | |
+--------------+----------------+ |
| |
+-------------------------+ |
| |
| +-------------------------------------+
| |
+--------------+--------+-------+--------+
| | param1 param2 |
STACK | ASSET_ID | @ 73 words @ 74 words |
| 32 bytes | offset offset |
+--------------+-------------------------+
^ call frame metadata ^
| |
| |
$fp $ssp/$sp
ABI instabilityAPIs define the types that are exchanged:
abi MyContract {
fn some_method(arg: Vec<u64>);
}
but they do not define how these types are passed around. In the example above, the only safe way to call a contract, would be to use the exactly same version of the stdlib the contract implementation is using.
Why?
Any internal change to Vec could make calling a contract impossible.
Example: Suppose that caller and callee were compiled using std-lib v1.
struct Vec<T> {
pointer: raw_ptr,
cap: u64,
len: u64,
}
But the contract implementation decide to update to std-lib v2, that for some reason was changed to:
struct Vec<T> {
len: u64,
pointer: raw_ptr,
cap: u64,
}
Now caller will never be able to call the contract. Welcome to ABI Hell.
|
BEFORE | AFTER
|
+-----> { 0x...., 16, 4 } <----+ | +-----> { 0x...., 16, 4 } <---+
| | | | |
| | | | |
{ pointer, capacity, len } { pointer, capacity, len } | { pointer, capacity, len } { len, pointer, capacity }
|
What What | What What
CALLER CALLEE | CALLER CALLEE
sends sees | sends sees
Vec @ std-lib v1 Vec @ std-lib v1 | Vec @ std-lib v1 Vec @ std-lib v2
Now imagine everything that can affect byte layouts: lib versions, compiler versions, compiler flags, compiler optimizations etc....
All this can break contract calls without anyone noticing. All this, actually, also breaks whoever is consuming these bytes: indexers, SDKs, receipts parsers etc...
So our encoding scheme avoid this instability being an issue by forcing types to specify how they are encoded, decoded:
impl<T> AbiEncode for Vec<T> where T: AbiEncode
{
fn abi_encode(self, buffer: Buffer) -> Buffer {
...
}
}
impl<T> AbiDecode for Vec<T> where T: AbiDecode
{
fn abi_decode(ref mut buffer: BufferReader) -> Vec<T> {
...
}
}
And to improve dev experience, we automatically implement AbiEncode/AbiDecode for all types that do not contain pointers.
Return Value DemotionConsider a simple contract method as:
impl TestContract for Contract {
fn some_method() -> Vec<u64> {
Vec::new()
}
}
In reality this will be compiled into something like the code below. This is called "Return Value Demotion".
fn __contract_entry_some_method(return_value: &mut Vec<u64>) {
*return_value = Vec::new();
}
The problems for contracts is that return_value would point to memory region of the callee. And contracts cannot write into it.
That would demand return_value pointing to somewhere allocated by the contract, and being copied over by the callee.
AliasingIf we bypass encoding, and allow caller to pass pointers to contracts, that immediately creates a security issues, because malicious callers could craft aliased data structures and trick contracts.
For example:
impl TestContract for Contract {
fn some_method(v: Vec<u64>) {
let some_value1 = do_something(&v);
// do some_thing that allocate memory
let some_value2 = do_something(&v);
}
}
A malicious caller can craft a Vec which its pointer points to the memory the contract uses.
In the case above, would be possible to have some_value1 be different from some_value2, although this 100% not intuitive.
Allowing a caller to attack the contract somehow.
To avoid this, our encoding scheme, do not allow pointers, references etc... to be passed. You must always pass only the data.
scripts and predicates can have arguments on their main function.
fn main(v: u64) -> bool {
...
}
In both cases, the compiler will desugar into something like:
pub fn __entry() -> raw_slice {
let args: (u64,) = decode_script_data::<(u64,)>(); // or decode_predicate_data
let result: u64 = main(args.0);
encode::<u64>(result)
}
When the std::log is called we also encode its argument. So
log(1);
will be desugared into
__log(encode(1));
Configurables are a little bit more complex. What happens with configurables is that their initialization is evaluated at compile time.
configurable {
SOMETHING: u64 = 1,
}
In the example above, it will evaluate to 1. The compiler will then call encode(1) and append the result to the end of the binary.
And to allow SDKs to change this value just before deployment, the compiler generates an entry in the "ABI json", with the buffer offset in the binary.
{
"name": "SOMETHING",
"configurableType": { ... },
"offset": 7104
},
The last piece for configurables is how are they "decoded"?
This is done automatically by the compiler.
For example
configurable { SOMETHING: u64 = 1 }
fn main() -> u64 { SOMETHING }
will be desugared into something like
const SOMETHING: u64;
fn __entry() -> raw_slice {
std::codec::abi_decode_in_place(&mut SOMETHING, 7104, 8);
encode(main())
}
fn main() -> u64 {
SOMETHING
}
This is not valid sway, but gives the idea of what is happening.
end