Internode Transport Buffer Contract

crates/ecstore/docs/internode-transport/transport-buffer-contract.md

Status: design note only. This document defines a backend-neutral buffer ownership and lifecycle contract for the InternodeDataTransport adapter. It does not implement a new backend and does not change production behavior.

Open-source Scope

The open-source RustFS path keeps tcp-http as the default internode data transport. This document defines adapter contracts only:

  • no additional production backend is introduced;
  • no dependency is added;
  • no new accepted production backend value is added;
  • RustFS core data-plane logic remains independent of the concrete transport implementation.

Current Adapter Surface

The current data-plane surface is byte-stream based:

| Current path | Current API shape | Current ownership |
| --- | --- | --- |
| Remote read stream | `InternodeDataTransport::open_read(...) -> FileReader` | Backend returns boxed `AsyncRead`; callers provide temporary `ReadBuf` storage per poll. |
| Remote write stream | `InternodeDataTransport::open_write(...) -> FileWriter` | Callers pass borrowed `&[u8]` slices into boxed `AsyncWrite`; the backend owns any async body staging. |
| Walk-dir stream | `InternodeDataTransport::open_walk_dir(...) -> FileReader` | Same boxed stream model as read, with a small serialized request body. |

This API is correct for the current TCP/HTTP backend. The adapter contract describes current ownership boundaries without assuming implementation details outside TcpHttpInternodeDataTransport.
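
The read-side ownership in the table above can be illustrated with a small synchronous analogue. This is only a sketch: it uses `std::io::Read` in place of the boxed `AsyncRead`, and `ChunkedReader` and `drain` are hypothetical names that do not exist in RustFS. It shows a backend that owns its response chunks and copies them into scratch storage the caller lends for each call.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a remote read stream: it owns response chunks
/// and copies them into whatever storage the caller lends it.
struct ChunkedReader {
    chunks: Vec<Vec<u8>>, // stands in for HTTP response body chunks
    chunk: usize,         // index of the chunk being drained
    offset: usize,        // read position inside that chunk
}

impl ChunkedReader {
    fn new(chunks: Vec<Vec<u8>>) -> Self {
        Self { chunks, chunk: 0, offset: 0 }
    }
}

impl Read for ChunkedReader {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        // Skip chunks that are fully consumed (or empty).
        while self.chunk < self.chunks.len() && self.offset == self.chunks[self.chunk].len() {
            self.chunk += 1;
            self.offset = 0;
        }
        if self.chunk == self.chunks.len() || buf.is_empty() {
            return Ok(0); // end of stream, or the caller lent no space
        }
        // Copy as much of the current chunk as fits into caller storage.
        let src = &self.chunks[self.chunk][self.offset..];
        let n = src.len().min(buf.len());
        buf[..n].copy_from_slice(&src[..n]);
        self.offset += n;
        Ok(n)
    }
}

/// Drains a boxed reader through a caller-owned scratch buffer.
/// The scratch buffer belongs to the caller before and after every `read`.
fn drain(mut reader: Box<dyn Read + Send>, scratch_len: usize) -> io::Result<Vec<u8>> {
    let mut scratch = vec![0u8; scratch_len];
    let mut out = Vec::new();
    loop {
        let n = reader.read(&mut scratch)?;
        if n == 0 {
            return Ok(out);
        }
        out.extend_from_slice(&scratch[..n]);
    }
}
```

Calling `drain(Box::new(ChunkedReader::new(chunks)), 2)` reassembles the chunks in order regardless of how chunk boundaries line up with the scratch size. Each byte is copied once from backend-owned storage into caller storage, which mirrors the copy boundary described for the receive path.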

Buffer Ownership Model

| Buffer role | Allocator | Lifetime owner | Transport state | TCP/HTTP behavior |
| --- | --- | --- | --- | --- |
| Send buffer | Caller or RustFS-owned pool | Caller until the writer copies or accepts bytes for the HTTP body | Existing HTTP body staging | Copy into the existing `AsyncWrite` path when the writer cannot use the borrowed slice directly. |
| Receive buffer | Caller-provided storage | Reader while filling; caller after `poll_read` returns | Existing HTTP response body chunks | Copy from `AsyncRead` into caller storage as today. |
| Control metadata | RustFS caller | Caller/request object | Not buffer-managed by the data-plane backend | Serialize into HTTP/gRPC/control-plane messages. |
| Fallback staging | TCP/HTTP backend | TCP/HTTP backend | Existing `HttpReader`/`HttpWriter` buffers | Existing buffering semantics. |

The current writer must not retain borrowed caller slices beyond the write call. When bytes must outlive the call, they are copied into owned HTTP body chunks.

This contract does not claim zero-copy behavior. It only documents where the current TCP/HTTP path copies bytes.
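
The send-side rule can be sketched in the same spirit. The example below is an assumption-laden illustration, not the `HttpWriter` implementation: it uses `std::io::Write` instead of `AsyncWrite`, plain `Vec<u8>` chunks instead of `Bytes`, and the hypothetical names `StagingWriter` and `send_with_reused_buffer`. It shows a writer that copies each borrowed slice into an owned chunk before returning, so the caller may reuse its buffer immediately.

```rust
use std::io::{self, Write};

/// Hypothetical staging writer: every accepted slice is copied into an
/// owned chunk, so no borrowed caller memory is retained after `write`.
#[derive(Default)]
struct StagingWriter {
    chunks: Vec<Vec<u8>>, // owned body chunks, kept in write order
    copied_bytes: usize,  // running total, usable for copy accounting
}

impl Write for StagingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        if !buf.is_empty() {
            // The copy happens here; `buf` is not referenced afterwards.
            self.chunks.push(buf.to_vec());
            self.copied_bytes += buf.len();
        }
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Sends two payloads through one caller-owned buffer that is overwritten
/// between writes, which is only safe because the writer copied the bytes.
fn send_with_reused_buffer(writer: &mut StagingWriter) -> io::Result<()> {
    let mut scratch = [0u8; 4];
    scratch.copy_from_slice(b"head");
    writer.write_all(&scratch)?;
    // The caller owns `scratch` again and may overwrite it.
    scratch.copy_from_slice(b"tail");
    writer.write_all(&scratch)?;
    Ok(())
}
```

After `send_with_reused_buffer` runs, the writer holds both payloads intact even though the caller overwrote its buffer in between. Because the `&[u8]` parameter carries no lifetime that the struct could store, the Rust compiler already rejects a writer that tries to retain the borrowed slice. The `copied_bytes` counter is one simple way to make a copy boundary measurable.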

Compatibility Contract

The current stream API remains the OSS compatibility contract:

```rust
#[async_trait::async_trait]
pub trait InternodeDataTransport {
    async fn open_read(&self, request: ReadStreamRequest) -> Result<FileReader>;
    async fn open_write(&self, request: WriteStreamRequest) -> Result<FileWriter>;
    async fn open_walk_dir(&self, request: WalkDirStreamRequest) -> Result<FileReader>;
}
```

Current Adapter Contract

| Area | Required contract |
| --- | --- |
| Ownership | Define when caller-owned bytes are copied or accepted by the transport. |
| Completion | Return from stream operations only when bytes are accepted or an error is reported. |
| Staging | Keep staging behavior inside the TCP/HTTP implementation. |
| Size limits | Report any RustFS-visible `max_transfer_size`; TCP/HTTP currently reports none. |
| Ordering | Preserve ordered byte-stream semantics. |
| Copy accounting | Document known copy boundaries and avoid unmeasured zero-copy claims. |
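
The size-limit and ordering rows could interact as in the following sketch. Only the name `max_transfer_size` comes from this note. The `TransferLimits` trait, its `Option<usize>` return type, the `CappedLimits` backend, and `plan_transfers` are assumptions made for illustration.

```rust
/// Hypothetical capability report for a transport backend.
trait TransferLimits {
    /// `None` means the backend reports no RustFS-visible limit.
    fn max_transfer_size(&self) -> Option<usize>;
}

/// Mirrors the statement that TCP/HTTP currently reports no limit.
struct TcpHttpLimits;

impl TransferLimits for TcpHttpLimits {
    fn max_transfer_size(&self) -> Option<usize> {
        None
    }
}

/// Imaginary backend with a fixed per-transfer cap, for comparison only.
struct CappedLimits(usize);

impl TransferLimits for CappedLimits {
    fn max_transfer_size(&self) -> Option<usize> {
        Some(self.0)
    }
}

/// Splits a payload into borrowed pieces that respect the reported limit.
/// Pieces are returned in stream order, so writing them one after another
/// preserves ordered byte-stream semantics.
fn plan_transfers<'a>(limits: &dyn TransferLimits, payload: &'a [u8]) -> Vec<&'a [u8]> {
    if payload.is_empty() {
        return Vec::new();
    }
    match limits.max_transfer_size() {
        // A reported limit of zero is treated as "no limit" in this sketch.
        Some(max) if max > 0 => payload.chunks(max).collect(),
        _ => vec![payload],
    }
}
```

With `TcpHttpLimits` the payload is passed through as a single piece. With `CappedLimits(4)` a six-byte payload is planned as a four-byte piece followed by a two-byte piece, and concatenating the pieces yields the original bytes. The planner borrows from the payload and copies nothing, so any copy still happens at the writer boundary.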

Current API Limitations

| Current API | Current limitation |
| --- | --- |
| `FileReader = Box<dyn AsyncRead + Send + Sync + Unpin>` | `AsyncRead` exposes temporary caller `ReadBuf` storage. |
| `FileWriter = Box<dyn AsyncWrite + Send + Sync + Unpin>` | `AsyncWrite::poll_write` receives borrowed `&[u8]` that cannot outlive the poll. |
| `HttpWriter` | The async HTTP body must own `Bytes`, so borrowed write buffers are copied into `BytesMut` or `Bytes`. |
| `write_body_chunks_to_writer` | Server-side HTTP body chunks are copied into `BytesMut` before local disk write. |
| Erasure encode output | Encoded shards are represented as `Vec<Bytes>` and written through `AsyncWrite`. |
| Erasure decode input | Shard reads allocate `Vec<u8>` buffers before decode. |

These limitations do not block the current tcp-http backend.

Adapter Stability

InternodeDataTransport should keep RustFS core data-plane logic separate from the concrete transport implementation. The trait and tcp-http backend remain inside ecstore.

This PR does not perform a crate split, add runtime loading, introduce a plugin system, add a backend value, or implement a new transport backend.