eden/fs/docs/Takeover.md
The takeover directory holds the logic for the Takeover Client (the new EdenFS process) and Server (the old EdenFS process) which are used during a graceful restart process.
Takeover is currently supported for NFS and FUSE mounts. Takeover does not support PrjFS mounts.
The takeover process allows a new EdenFS daemon to seamlessly take over mount points from an existing running daemon without unmounting the filesystems. This enables graceful restarts where the user's experience is minimally disrupted.
There are five main components in the takeover directory: the Thrift serialization library, the client, the server, the data class, and the handler.

Thrift Serialization Library (takeover.thrift)

The thrift file defines the message types exchanged over the takeover socket:

- struct TakeoverVersionQuery - Sent from the client to the server to inform
  the server what features of the takeover protocol the client supports. This
  struct contains two fields:
  - versions - A legacy field containing a set of supported protocol version
    numbers. Modern clients send the singleton set containing version 7. After
    version 7, we use capabilities for new features. This field can be removed
    in favor of capabilities.
  - capabilities - A 64-bit bitmask indicating which features the client
    supports. This is the preferred method for protocol negotiation as it allows
    for more granular feature matching.
- Empty "ready" ping - Introduced in version 4, this ping message is sent by the server to confirm that the client is still alive and ready to receive takeover data before the data is actually sent. This prevents the server from attempting to transfer mounts to a disconnected client.
- Chunked message markers - For large takeover data (e.g., 3+ million inodes), the data is split into chunks:
  - FIRST_CHUNK - Signals the start of chunked data transfer
  - LAST_CHUNK - Signals the end of chunked data transfer
  - FLAGS_maximumChunkSize defaults to 512 MB
- union SerializedTakeoverResult - The modern format for takeover data. This
  is either:
  - SerializedTakeoverInfo - Contains the takeover data on success
  - string errorReason - Contains an error message on failure
- struct SerializedTakeoverInfo - Contains:
  - mounts - A list of SerializedMountInfo for each mount point
  - fileDescriptors - A list of FileDescriptorType indicating which file
    descriptors are being transferred and in what order
- struct SerializedMountInfo - Contains mount-specific data:
  - mountPath - The path where the filesystem is mounted
  - stateDirectory - The directory containing EdenFS state for this mount
  - bindMountPaths - Legacy field, no longer used
  - connInfo - For FUSE mounts, a binary blob containing the fuse_init_out
    structure (left empty for NFS mounts)
  - inodeMap - A SerializedInodeMap containing unloaded inode information
  - mountProtocol - The type of mount (FUSE, NFS, or UNKNOWN)
- struct SerializedInodeMap - Contains:
  - unloadedInodes - A list of SerializedInodeMapEntry
- struct SerializedInodeMapEntry - Contains inode metadata:
  - inodeNumber - The inode number
  - parentInode - The parent inode number
  - name - The entry name
  - isUnlinked - Whether the inode has been unlinked
  - numFsReferences - Number of filesystem references
  - hash - Optional object hash (unset means materialized)
  - mode - The inode mode bits
- enum FileDescriptorType - Types of file descriptors transferred:
  - LOCK_FILE - The EdenFS lock file
  - THRIFT_SOCKET - The thrift server socket
  - MOUNTD_SOCKET - The NFS mountd socket (optional, only for NFS mounts)
- enum TakeoverMountProtocol - Mount protocol types:
  - UNKNOWN - Unknown/unspecified (legacy)
  - FUSE - FUSE mount
  - NFS - NFS mount
- union SerializedTakeoverData - Deprecated. Legacy format used by older
  versions. Modern versions use SerializedTakeoverResult instead.

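
To illustrate how a receiver might treat the success-or-error shape of SerializedTakeoverResult, the sketch below models the union in plain C++ with std::variant. The type names ending in Sketch are simplified stand-ins invented for this example, not the Thrift-generated classes, and describeResult is a hypothetical helper.

```cpp
#include <string>
#include <variant>
#include <vector>

// Simplified stand-ins for the Thrift types (assumption: the real generated
// classes carry more fields and use different accessors).
struct MountInfoSketch {
  std::string mountPath;
  std::string stateDirectory;
};

struct TakeoverInfoSketch {
  std::vector<MountInfoSketch> mounts;
};

// Either the takeover data on success or an error reason on failure.
using TakeoverResultSketch = std::variant<TakeoverInfoSketch, std::string>;

// Hypothetical helper: summarize a result, e.g. for logging.
std::string describeResult(const TakeoverResultSketch& result) {
  if (const auto* info = std::get_if<TakeoverInfoSketch>(&result)) {
    return "ok: " + std::to_string(info->mounts.size()) + " mount(s)";
  }
  return "error: " + std::get<std::string>(result);
}
```

Modeling the result as a variant forces the receiver to check which alternative is present before touching mount data, which matches the either/or description of the union.
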

Client (TakeoverClient.cpp)

The client provides the takeoverMounts function, which requests to take over
mount points from an existing edenfs process. On success, it returns a
TakeoverData object; on error, it throws an exception.

Parameters:

- socketPath - Path to the takeover unix socket
- takeoverReceiveTimeout - Timeout for receiving takeover data
- shouldThrowDuringTakeover - For testing: simulate an error during takeover
- shouldPing - For testing: whether to respond to the ready ping
- supportedVersions - Set of supported protocol versions
- supportedTakeoverCapabilities - Bitmask of supported capabilities

Protocol Flow:

1. Connects to the takeover socket
2. Sends a TakeoverVersionQuery containing supported versions and capabilities
3. Responds to the server's ready ping
4. Receives the takeover data, possibly in chunks
5. Deserializes the message and returns the TakeoverData

Server (TakeoverServer.cpp)

A helper class that listens on a unix domain socket for clients that wish to
perform graceful takeover of this EdenServer's mount points. Uses the
EdenServer's main EventBase for driving I/O.
Public Interface:

- TakeoverServer(eventBase, socketPath, handler, faultInjector, supportedVersions, supportedCapabilities) -
  Constructor that initializes and starts the server
- start() - Begins listening on the takeover socket

Internal Connection Handling (ConnHandler):

When a client connects, the server:

1. Validates credentials - Checks that the connecting process has the same UID as the server process (security check)
2. Receives version query - Waits up to 5 seconds for the client to send its supported versions and capabilities
3. Negotiates protocol - Computes the compatible version and capabilities between client and server. Capabilities are computed as the intersection of what both sides support.
4. Initiates shutdown - Calls handler->startTakeoverShutdown() to begin
   the graceful shutdown process. This returns a Future that completes with
   TakeoverData when the server is ready to transfer.
5. Pings the client (if PING capability is supported) - Sends a ping message
   to verify the client is still connected. Waits up to 5 seconds (configurable
   via pingReceiveTimeout flag) for a response. If the ping fails, the server
   recovers and resumes normal operation.
6. Closes storage - Calls handler->closeStorage() to release locks on
   local and backing stores so the new process can acquire them.
7. Sends takeover data - Serializes and sends the TakeoverData. For large
   datasets, uses chunked transfer, bracketing the data chunks with the
   FIRST_CHUNK and LAST_CHUNK markers.
8. Signals completion - Fulfills the takeoverComplete promise to notify
   the EdenServer that takeover is finished.

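
The chunked send in the last steps can be sketched as follows. This is a minimal illustration, not the server's code: FrameKind, Frame, and frameTakeoverData are hypothetical names, the real markers are serialized protocol messages instead of an enum, and the rule that data fitting in one chunk is sent without markers is an assumption made for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Kinds of frames in this sketch (hypothetical representation).
enum class FrameKind { FirstChunk, Data, LastChunk };

struct Frame {
  FrameKind kind;
  std::string payload; // empty for marker frames
};

// Split `data` into frames of at most `maxChunkSize` bytes each.
// Assumption: data that fits in one chunk (or a zero chunk size) is sent as
// a single Data frame; larger data is bracketed by FirstChunk/LastChunk.
std::vector<Frame> frameTakeoverData(const std::string& data,
                                     std::size_t maxChunkSize) {
  std::vector<Frame> frames;
  if (maxChunkSize == 0 || data.size() <= maxChunkSize) {
    frames.push_back({FrameKind::Data, data});
    return frames;
  }
  frames.push_back({FrameKind::FirstChunk, ""});
  for (std::size_t off = 0; off < data.size(); off += maxChunkSize) {
    // substr clamps the length, so the final chunk may be shorter.
    frames.push_back({FrameKind::Data, data.substr(off, maxChunkSize)});
  }
  frames.push_back({FrameKind::LastChunk, ""});
  return frames;
}
```

For example, ten bytes with a chunk size of four would produce a first-chunk marker, three data frames, and a last-chunk marker.
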

Error Handling:

- If takeover fails after shutdown has started, the server returns the TakeoverData through the promise
  so EdenServer can recover and resume serving

Data (TakeoverData.h / TakeoverData.cpp)

The TakeoverData class contains all information needed for takeover:

Capability Flags (TakeoverCapabilities):
| Flag | Value | Description |
|---|---|---|
| CUSTOM_SERIALIZATION | 1 << 0 | Deprecated custom format, no longer supported |
| FUSE | 1 << 1 | Supports FUSE mount serialization |
| THRIFT_SERIALIZATION | 1 << 2 | Uses Thrift for serialization (required) |
| PING | 1 << 3 | Server pings client before sending data |
| MOUNT_TYPES | 1 << 4 | Protocol includes mount type information |
| NFS | 1 << 5 | Supports NFS mount serialization |
| RESULT_TYPE_SERIALIZATION | 1 << 6 | Uses SerializedTakeoverResult format |
| ORDERED_FDS | 1 << 7 | File descriptor order is specified in message |
| OPTIONAL_MOUNTD | 1 << 8 | Mountd socket is optional (requires ORDERED_FDS) |
| CAPABILITY_MATCHING | 1 << 9 | Uses capability-based protocol negotiation |
| INCLUDE_HEADER_SIZE | 1 << 10 | Header includes its size for future extensibility |
| CHUNKED_MESSAGE | 1 << 11 | Supports chunked message transfer for large data |
Supported Capabilities (current build):
FUSE | MOUNT_TYPES | PING | THRIFT_SERIALIZATION | NFS |
RESULT_TYPE_SERIALIZATION | ORDERED_FDS | OPTIONAL_MOUNTD |
CAPABILITY_MATCHING | INCLUDE_HEADER_SIZE | CHUNKED_MESSAGE
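
As a rough illustration of capability negotiation, the sketch below encodes the flags from the table as 64-bit constants and computes the shared set as a bitwise intersection. The bit values and the supported set come from this document; the names kSupportedSketch, intersectCapabilities, hasCapability, and dependenciesSatisfied are hypothetical and do not reflect the actual signatures in TakeoverData.h.

```cpp
#include <cstdint>

namespace caps {
// Bit values copied from the capability table above.
constexpr std::uint64_t CUSTOM_SERIALIZATION = 1ull << 0;
constexpr std::uint64_t FUSE = 1ull << 1;
constexpr std::uint64_t THRIFT_SERIALIZATION = 1ull << 2;
constexpr std::uint64_t PING = 1ull << 3;
constexpr std::uint64_t MOUNT_TYPES = 1ull << 4;
constexpr std::uint64_t NFS = 1ull << 5;
constexpr std::uint64_t RESULT_TYPE_SERIALIZATION = 1ull << 6;
constexpr std::uint64_t ORDERED_FDS = 1ull << 7;
constexpr std::uint64_t OPTIONAL_MOUNTD = 1ull << 8;
constexpr std::uint64_t CAPABILITY_MATCHING = 1ull << 9;
constexpr std::uint64_t INCLUDE_HEADER_SIZE = 1ull << 10;
constexpr std::uint64_t CHUNKED_MESSAGE = 1ull << 11;
} // namespace caps

// The supported set listed above for the current build.
constexpr std::uint64_t kSupportedSketch = caps::FUSE | caps::MOUNT_TYPES |
    caps::PING | caps::THRIFT_SERIALIZATION | caps::NFS |
    caps::RESULT_TYPE_SERIALIZATION | caps::ORDERED_FDS |
    caps::OPTIONAL_MOUNTD | caps::CAPABILITY_MATCHING |
    caps::INCLUDE_HEADER_SIZE | caps::CHUNKED_MESSAGE;

// Shared capabilities are the intersection of both sides' bitmasks.
constexpr std::uint64_t intersectCapabilities(std::uint64_t client,
                                              std::uint64_t server) {
  return client & server;
}

// True if every bit of `flag` is present in `set`.
constexpr bool hasCapability(std::uint64_t set, std::uint64_t flag) {
  return (set & flag) == flag;
}

// The table notes that OPTIONAL_MOUNTD requires ORDERED_FDS.
constexpr bool dependenciesSatisfied(std::uint64_t set) {
  return !hasCapability(set, caps::OPTIONAL_MOUNTD) ||
      hasCapability(set, caps::ORDERED_FDS);
}
```

With this representation, a client that advertises only a subset of flags ends up with exactly that subset after negotiation, and features such as CHUNKED_MESSAGE are simply absent from the shared mask.
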
Protocol Versions:
| Version | Description |
|---|---|
| 0 | Never supported (used for testing) |
| 1 | Deprecated: original protocol |
| 3 | Introduced Thrift serialization |
| 4 | Added ping handshake |
| 5 | Added NFS mount support |
| 6 | Added generic serialization and optional file descriptors |
| 7 | Capability-based negotiation, header size, chunked messages |
Note: Version numbers are being phased out in favor of capability-based negotiation. Version 7 should be the last numbered version. After this version, server and client negotiate and choose the capabilities that both support.
Data Members:

- lockFile - The main eden lock file preventing multiple processes
- thriftSocket - The thrift server socket
- mountdServerSocket - Optional socket for NFS mountd
- generalFDOrder - Order of file descriptors in the message
- mountPoints - Vector of MountInfo for each mount
- takeoverComplete - Promise fulfilled when takeover data is sent

MountInfo Structure:

- mountPath - Absolute path where filesystem is mounted
- stateDirectory - Path to EdenFS state directory for this mount
- channelInfo - Variant containing either FuseChannelData, NfsChannelData,
  or ProjFsChannelData
- inodeMap - Serialized inode map data

Serialization Format:

The serialized message format is:
<32-bit version><32-bit header size><64-bit capabilities><thrift-serialized data>

The thrift-serialized data is a SerializedTakeoverResult.

For chunked transfers, data is split into chunks of up to 512 MB (configurable).

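
The message layout above can be sketched as a small encoder and decoder. Several details here are assumptions that the text does not confirm: big-endian byte order, and a header size field that counts the header bytes following it (8, for the capabilities field). HeaderSketch, encodeMessage, and decodeHeader are hypothetical names, and the payload is treated as opaque bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>

struct HeaderSketch {
  std::uint32_t version;
  std::uint32_t headerSize;
  std::uint64_t capabilities;
};

// Append the low `bytes` bytes of `value` in big-endian order (assumption).
void appendBigEndian(std::string& out, std::uint64_t value, int bytes) {
  for (int shift = (bytes - 1) * 8; shift >= 0; shift -= 8) {
    out.push_back(static_cast<char>((value >> shift) & 0xff));
  }
}

// Read `bytes` big-endian bytes starting at `offset`; caller checks bounds.
std::uint64_t readBigEndian(const std::string& in, std::size_t offset,
                            int bytes) {
  std::uint64_t value = 0;
  for (int i = 0; i < bytes; ++i) {
    value = (value << 8) | static_cast<unsigned char>(in[offset + i]);
  }
  return value;
}

// <32-bit version><32-bit header size><64-bit capabilities><payload>
std::string encodeMessage(std::uint32_t version, std::uint64_t capabilities,
                          const std::string& payload) {
  std::string out;
  appendBigEndian(out, version, 4);
  appendBigEndian(out, sizeof(std::uint64_t), 4); // assumed meaning of size
  appendBigEndian(out, capabilities, 8);
  out += payload;
  return out;
}

// Returns the parsed header, or nullopt if the message is too short.
std::optional<HeaderSketch> decodeHeader(const std::string& msg) {
  constexpr std::size_t kFixedSize = 16; // 4 + 4 + 8 bytes
  if (msg.size() < kFixedSize) {
    return std::nullopt;
  }
  HeaderSketch header;
  header.version = static_cast<std::uint32_t>(readBigEndian(msg, 0, 4));
  header.headerSize = static_cast<std::uint32_t>(readBigEndian(msg, 4, 4));
  header.capabilities = readBigEndian(msg, 8, 8);
  return header;
}
```

Carrying an explicit header size lets a reader skip header fields it does not understand, which fits the "future extensibility" purpose given for INCLUDE_HEADER_SIZE.
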
Key Functions:

- serialize(capabilities, msg) - Serialize takeover data into a UnixSocket
  message
- deserialize(msg) - Deserialize a UnixSocket message into TakeoverData
- serializePing() / isPing(buf) - Create/detect ping messages
- serializeFirstChunk() / isFirstChunk(buf) - Create/detect chunk markers
- serializeLastChunk() / isLastChunk(buf) - Create/detect chunk markers
- computeCompatibleVersion(versions, supported) - Find best compatible version
- computeCompatibleCapabilities(capabilities, supported) - Compute shared
  capabilities
- versionToCapabilities(version) - Convert version number to capability set
- capabilitiesToVersion(capabilities) - Convert capabilities to version number

Handler (TakeoverHandler.h)

TakeoverHandler is a pure virtual interface for classes that want to implement
graceful takeover functionality. This is primarily implemented by the
EdenServer class. Alternative implementations exist for unit testing.
Virtual Functions:

- startTakeoverShutdown() - Called when a graceful shutdown has been
  requested, with a remote process attempting to take over the currently running
  mount points. Returns a Future<TakeoverData> that will produce the takeover
  data once the edenfs process is ready to transfer its mounts.
- closeStorage() - Called before sending the TakeoverData to the client,
  after a successful ready handshake (if applicable). This function should close
  storage used by the server (local stores, backing stores) to release locks so
  the new process can acquire them.

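
The ordering of the two handler calls can be shown with a small test double. This sketch is deliberately simplified: startTakeoverShutdown returns its data synchronously here, whereas the interface described above returns a Future<TakeoverData>. All type and function names (TakeoverDataSketch, TakeoverHandlerSketch, RecordingHandler, runTakeover) are hypothetical, and the ping result is passed in as a flag.

```cpp
#include <string>
#include <utility>
#include <vector>

struct TakeoverDataSketch {
  std::vector<std::string> mountPaths;
};

// Simplified, synchronous stand-in for the handler interface.
class TakeoverHandlerSketch {
 public:
  virtual ~TakeoverHandlerSketch() = default;
  virtual TakeoverDataSketch startTakeoverShutdown() = 0;
  virtual void closeStorage() = 0;
};

// Test double that records the order in which it is called.
class RecordingHandler : public TakeoverHandlerSketch {
 public:
  TakeoverDataSketch startTakeoverShutdown() override {
    calls.push_back("startTakeoverShutdown");
    return TakeoverDataSketch{{"/mnt/repo"}};
  }
  void closeStorage() override {
    calls.push_back("closeStorage");
  }
  std::vector<std::string> calls;
};

enum class Outcome { Sent, RecoveredAfterPingFailure };

// Shut down first, close storage only after the ping succeeded, then send.
// On ping failure the data is handed back through `recovered` so the old
// server can resume serving; storage was never closed in that case.
Outcome runTakeover(TakeoverHandlerSketch& handler, bool pingSucceeded,
                    TakeoverDataSketch& recovered) {
  TakeoverDataSketch data = handler.startTakeoverShutdown();
  if (!pingSucceeded) {
    recovered = std::move(data);
    return Outcome::RecoveredAfterPingFailure;
  }
  handler.closeStorage();
  // A real server would serialize and send `data` to the client here.
  return Outcome::Sent;
}
```

Deferring closeStorage until after the handshake keeps the recovery path cheap: if the client has gone away, the old process still holds its store locks and can continue.
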

Takeover Flow:

    Client (New EdenFS)                     Server (Old EdenFS)
             |                                       |
             |-------- Connect to socket ----------->|
             |                                       |
             |---- TakeoverVersionQuery ------------>|
             |     (versions, capabilities)          |
             |          timeout 5 seconds            |
             |                           [Validate UID matches]
             |                           [Negotiate protocol]
             |                           [startTakeoverShutdown()]
             |                                       |
             |<-------------- Ping ------------------|
             |                                       |
             |------------- Ping Response ---------->|
             |          timeout 5 seconds            |
             |                           [closeStorage()]
             |                                       |
             |<-------- FIRST_CHUNK (if chunked) ----|
             |                                       |
             |<-------- Data + File Descriptors -----|
             |<-------- More Data Chunks ... --------|
             |                                       |
             |<-------- LAST_CHUNK (if chunked) -----|
             |                                       |
    [Deserialize TakeoverData]                       |
    [Take over mounts]                               |
             |                                       |
                                         [Fulfill takeoverComplete promise]
                                         [Exit gracefully]

Note: takeover-receive-timeout is a configurable flag (through EdenConfig.h)
that defaults to 2.5 minutes. It is the time the client will wait for the
server to send the takeover data. When data is sent in chunks, this timeout
applies to each chunk.
The takeover system includes several mechanisms for handling errors:

- UID Validation - Prevents unauthorized processes from taking over mounts
- Timeout Handling - Various timeouts prevent hanging: the server waits up to 5 seconds for the client's version query and up to 5 seconds (pingReceiveTimeout) for the ping response, and the client waits up to takeover-receive-timeout (2.5 minutes by default) for takeover data
- Ping Verification - Before sending takeover data, the server pings the client to ensure it's still responsive. If the client doesn't respond, the server can recover and continue running.
- Fault Injection - The server supports fault injection points for testing:
  - takeover.ping_receive - Inject faults during ping handling
  - takeover.error during send - Simulate errors during data transfer
- Recovery Path - If takeover fails after shutdown has started but before
  data is sent, the server can recover by using the TakeoverData returned
  through the takeoverComplete promise.