s3heap-service

rust/s3heap-service/README.md


The s3heap-service integrates with the function manager to trigger functions no faster than a particular cadence, with reasonable guarantees that writing data will cause a function to run.

This document refines the design of the heap-tender and heap service until it can be implemented safely.

Abstract: A heap and a sysdb.

At the most abstract level, we have a heap and the sysdb. An item is either in the heap or not in the heap. For the sysdb, an item is in one of three states: not in the sysdb, in the sysdb and should be scheduled, or in the sysdb and waiting for writes to trigger the next scheduled run.

That gives this chart:

| Heap State | Sysdb State |
|---|---|
| Not in heap | Not in sysdb |
| Not in heap | In sysdb, should be scheduled |
| Not in heap | In sysdb, waiting for writes |
| In heap | Not in sysdb |
| In heap | In sysdb, should be scheduled |
| In heap | In sysdb, waiting for writes |

And then one must take into account whether there's a function template.

More abstractly, view it like this:

| | | On Heap | Not On Heap |
|---|---|---|---|
| Has no function template | Not in sysdb | A_1 | A_2 |
| | In sysdb, scheduled | B_1 | B_2 |
| | In sysdb, waiting | C_1 | C_2 |
| Has function template | Not in sysdb | D_1 | D_2 |
| | In sysdb, scheduled | E_1 | E_2 |
| | In sysdb, waiting | F_1 | F_2 |

When viewed like this, we can establish rules for state transitions in our system. Each operation acts on either the sysdb or the heap, never both, because there is no transactionality between S3 and databases. Thus a single transition can jump to any row within the same column, or to another column within the same row.
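The labelling and the one-axis-at-a-time rule can be sketched in Rust. This is a minimal illustration, not the service's actual types: `State`, `SysdbState`, and the method names are hypothetical and chosen only to mirror the table above.

```rust
/// The three sysdb states named in the text (hypothetical type name).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SysdbState {
    NotInSysdb,
    Scheduled,
    Waiting,
}

/// One cell of the state table (hypothetical type, for illustration only).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct State {
    has_template: bool,
    sysdb: SysdbState,
    on_heap: bool,
}

impl State {
    /// Label as used in the table: rows A..F, column 1 = on heap, 2 = not on heap.
    fn label(&self) -> String {
        let row = match (self.has_template, self.sysdb) {
            (false, SysdbState::NotInSysdb) => 'A',
            (false, SysdbState::Scheduled) => 'B',
            (false, SysdbState::Waiting) => 'C',
            (true, SysdbState::NotInSysdb) => 'D',
            (true, SysdbState::Scheduled) => 'E',
            (true, SysdbState::Waiting) => 'F',
        };
        let col = if self.on_heap { 1 } else { 2 };
        format!("{}_{}", row, col)
    }

    /// Number of axes (template, sysdb, heap) on which two states differ.
    fn axes_changed(&self, to: &State) -> u8 {
        (self.has_template != to.has_template) as u8
            + (self.sysdb != to.sysdb) as u8
            + (self.on_heap != to.on_heap) as u8
    }

    /// A single operation may change at most one axis.
    fn reachable_in_one_step(&self, to: &State) -> bool {
        self.axes_changed(to) <= 1
    }
}
```

Under this sketch, A_2 to B_2 is a single step (only the sysdb axis changes), while A_2 to B_1 is not, because it would have to change the sysdb and the heap together.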

State space diagram

Note that there are six base cases. Reasoning through all 36 cases and getting them right will be difficult. Instead, we aim to exploit symmetry: if there is a function template and something is in sysdb, it is as if there is no function template. As before, we can mark as trivially impossible anything that changes along two axes simultaneously. A cell marked INVn is prohibited by invariant n.
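To get a feel for how much the two-axis rule prunes, the sketch below enumerates every ordered pair of the 12 labelled states and counts how many axes each pair changes. The index encoding (0 = A_1, 1 = A_2, ..., 11 = F_2) and the function names are assumptions made for this illustration.

```rust
/// Decompose a state index 0..12 (A_1, A_2, B_1, ..., F_2) into three axes:
/// (template family 0..2, sysdb state 0..3, heap column 0..2).
fn axes(index: usize) -> (usize, usize, usize) {
    (index / 6, (index % 6) / 2, index % 2)
}

/// Count how many of the three axes differ between two states.
fn axes_changed(from: usize, to: usize) -> usize {
    let (t1, s1, h1) = axes(from);
    let (t2, s2, h2) = axes(to);
    (t1 != t2) as usize + (s1 != s2) as usize + (h1 != h2) as usize
}

/// Tally all ordered (from, to) pairs as (identity, single-axis, multi-axis).
fn tally() -> (usize, usize, usize) {
    let mut counts = (0, 0, 0);
    for from in 0..12 {
        for to in 0..12 {
            match axes_changed(from, to) {
                0 => counts.0 += 1,
                1 => counts.1 += 1,
                _ => counts.2 += 1,
            }
        }
    }
    counts
}
```

With this encoding `tally()` returns `(12, 48, 84)`: 12 identity pairs, 48 single-axis transitions that need real reasoning, and 84 pairs that change two or more axes at once.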

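Columns are the From state; rows are the To state.

| To \ From | A_1 | A_2 | B_1 | B_2 | C_1 | C_2 | D_1 | D_2 | E_1 | E_2 | F_1 | F_2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A_1 | - | INV2 | DEL1 | X | DEL1 | X | TT2 | X | X | X | X | X |
| A_2 | STOP | - | X | GC | X | DEL1 | X | TT2 | X | X | X | X |
| B_1 | INV1 | X | - | R1 | R1 | X | X | X | T2 | X | X | X |
| B_2 | X | ADD1 | INV4 | - | X | WT1 | X | X | X | T2 | X | X |
| C_1 | INV1 | X | DO1 | X | - | INV4 | X | X | X | X | T2 | X |
| C_2 | X | INV3 | X | DO2 | INV4 | - | X | X | X | X | X | T2 |
| D_1 | TT1 | X | X | X | X | X | - | INV2 | INV6 | X | INV6 | X |
| D_2 | X | TT1 | X | X | X | X | INV5 | - | X | INV6 | X | INV6 |
| E_1 | X | X | TT1 | X | X | X | X | X | - | R1 | WT1 | X |
| E_2 | X | X | X | TT1 | X | X | X | TT3 | INV4 | - | X | WT1 |
| F_1 | X | X | X | X | TT1 | X | TT3 | X | DO1 | X | - | X |
| F_2 | X | X | X | X | X | TT1 | X | HOLE2 | X | DO2 | INV4 | - |
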
  • -: Identity function. Always permitted.
  • X: The transition hops rows, columns, or column families in the 2x6 table.
  • STOP: Transition to the quiescent state.
  • TT1: Add function template.
  • TT2: Task template deleted.
  • TT3: Task template instantiated.
  • ADD1: Attach function.
  • DEL1: Delete function.
  • DO1: The function ran once and now is waiting for more log records.
  • DO2: Same as DO1, but technically not possible to happen.
  • WT1: Write-triggered state change.
  • GC: Garbage collection kicks in.
  • INV1: Task UUIDs are not reused. Therefore the function lifetime has the progression not used -> used -> never used again.
  • INV2: A function will only be added to the heap after it has been witnessed to exist as a template or sysdb entry. By INV1, if it is on the heap and no longer witnessed, it will never be used again. Therefore it cannot be resurrected and re-added to the heap.
  • INV3: A function is always added in a non-waiting state. This is necessary to guarantee that functions don't get dropped. It is either existing and on the heap or quiescent and waiting for additional writes. The latter should never be the starting condition.
  • INV4: A two-phase commit with the heap makes it possible to transition the schedule to keep the function scheduled, commit the heap change, and then commit the change to sysdb. Therefore the signal will never leave the heap as long as the sysdb has a scheduled function.
  • INV5: By INV2 the function template was witnessed in sysdb before the function was added to the heap. By INV1, this means the function was deleted. An impossibility arises.
  • INV6: A function cannot be deleted if it descends from a template.
  • R1: Corollary to INV4: On start, any outstanding 2PC is reconciled and converged to push the function to the heap.
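The matrix and legend above can also be held as a lookup table, which makes individual cells easy to query in tests. The sketch below transcribes the matrix verbatim (including the `T2` entries, which appear in the matrix but not in the legend); `STATES`, `MATRIX`, `lookup`, `classify`, and `Verdict` are hypothetical names introduced for this example.

```rust
/// State labels in matrix order.
const STATES: [&str; 12] = [
    "A_1", "A_2", "B_1", "B_2", "C_1", "C_2", "D_1", "D_2", "E_1", "E_2", "F_1", "F_2",
];

/// MATRIX[to][from], transcribed from the transition matrix above.
const MATRIX: [[&str; 12]; 12] = [
    ["-", "INV2", "DEL1", "X", "DEL1", "X", "TT2", "X", "X", "X", "X", "X"],
    ["STOP", "-", "X", "GC", "X", "DEL1", "X", "TT2", "X", "X", "X", "X"],
    ["INV1", "X", "-", "R1", "R1", "X", "X", "X", "T2", "X", "X", "X"],
    ["X", "ADD1", "INV4", "-", "X", "WT1", "X", "X", "X", "T2", "X", "X"],
    ["INV1", "X", "DO1", "X", "-", "INV4", "X", "X", "X", "X", "T2", "X"],
    ["X", "INV3", "X", "DO2", "INV4", "-", "X", "X", "X", "X", "X", "T2"],
    ["TT1", "X", "X", "X", "X", "X", "-", "INV2", "INV6", "X", "INV6", "X"],
    ["X", "TT1", "X", "X", "X", "X", "INV5", "-", "X", "INV6", "X", "INV6"],
    ["X", "X", "TT1", "X", "X", "X", "X", "X", "-", "R1", "WT1", "X"],
    ["X", "X", "X", "TT1", "X", "X", "X", "TT3", "INV4", "-", "X", "WT1"],
    ["X", "X", "X", "X", "TT1", "X", "TT3", "X", "DO1", "X", "-", "X"],
    ["X", "X", "X", "X", "X", "TT1", "X", "HOLE2", "X", "DO2", "INV4", "-"],
];

/// Coarse reading of a cell, following the legend.
#[derive(Debug, PartialEq, Eq)]
enum Verdict {
    Identity,   // "-"
    Hop,        // "X"
    Prohibited, // "INVn": ruled out by invariant n
    Open,       // "HOLEn": unresolved question
    Operation,  // everything else (STOP, TT*, ADD1, DEL1, DO*, WT1, GC, R1, T2)
}

/// Return the matrix cell for a transition, or None for an unknown label.
fn lookup(from: &str, to: &str) -> Option<&'static str> {
    let f = STATES.iter().position(|s| *s == from)?;
    let t = STATES.iter().position(|s| *s == to)?;
    Some(MATRIX[t][f])
}

/// Map a cell code onto the coarse categories of the legend.
fn classify(cell: &str) -> Verdict {
    match cell {
        "-" => Verdict::Identity,
        "X" => Verdict::Hop,
        c if c.starts_with("INV") => Verdict::Prohibited,
        c if c.starts_with("HOLE") => Verdict::Open,
        _ => Verdict::Operation,
    }
}
```

For example, `lookup("A_2", "B_2")` yields `ADD1` (attach function), and `lookup("B_1", "B_2")` yields `INV4`, which `classify` reports as prohibited.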

Holes to overcome/Uncertainties:

  • HOLE2: What would compel a process to instantiate a function template if not in heap?