S3Stream is a shared streaming storage library that provides a unified interface for reading and writing streaming data to cloud object storage services such as Amazon S3, Google Cloud Storage, and Azure Blob Storage. EBS is used alongside object storage for its low latency, for example as storage for the Write-Ahead Log. S3Stream is designed to serve as the storage layer for distributed systems such as Apache Kafka and Apache RocketMQ.
S3Stream provides a small set of APIs for reading and writing streaming data to cloud object storage services. The core Stream interface is:
public interface Stream {

    /**
     * Get stream id.
     */
    long streamId();

    /**
     * Get stream start offset.
     */
    long startOffset();

    /**
     * Get stream next append record offset.
     */
    long nextOffset();

    /**
     * Append RecordBatch to stream.
     */
    CompletableFuture<AppendResult> append(RecordBatch recordBatch);

    /**
     * Fetch RecordBatch list from a stream.
     */
    CompletableFuture<FetchResult> fetch(long startOffset, long endOffset, int maxBytesHint);

    /**
     * Trim stream.
     */
    CompletableFuture<Void> trim(long newStartOffset);
}
For the latest API details, please refer to the S3Stream API.
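To make the offset semantics of the interface above more concrete, here is a minimal in-memory sketch. It is not S3Stream's implementation: the class name `InMemoryStreamSketch` is hypothetical, records are plain byte arrays instead of `RecordBatch`, results are simplified to `Long` and `List<byte[]>` instead of `AppendResult` and `FetchResult`, each record is assumed to occupy exactly one offset, and the fetch range is assumed to be `[startOffset, endOffset)`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified stand-in for the Stream interface; for illustration only.
final class InMemoryStreamSketch {
    private final long streamId;
    // records.get(i) holds the record at offset startOffset + i.
    private final List<byte[]> records = new ArrayList<>();
    private long startOffset = 0;

    InMemoryStreamSketch(long streamId) { this.streamId = streamId; }

    long streamId() { return streamId; }
    synchronized long startOffset() { return startOffset; }
    synchronized long nextOffset() { return startOffset + records.size(); }

    // Appends one record and completes with the offset assigned to it.
    synchronized CompletableFuture<Long> append(byte[] record) {
        long offset = nextOffset();
        records.add(record.clone());
        return CompletableFuture.completedFuture(offset);
    }

    // Returns records in [from, to). Assumption: maxBytesHint is checked after each
    // record is added, so a non-empty range always yields at least one record.
    synchronized CompletableFuture<List<byte[]>> fetch(long from, long to, int maxBytesHint) {
        if (from < startOffset || to > nextOffset() || from > to) {
            CompletableFuture<List<byte[]>> failed = new CompletableFuture<>();
            failed.completeExceptionally(new IllegalArgumentException("range outside the stream"));
            return failed;
        }
        List<byte[]> out = new ArrayList<>();
        int bytes = 0;
        for (long offset = from; offset < to; offset++) {
            byte[] record = records.get((int) (offset - startOffset));
            out.add(record);
            bytes += record.length;
            if (bytes >= maxBytesHint) break;
        }
        return CompletableFuture.completedFuture(out);
    }

    // Drops records below newStartOffset; nextOffset is unaffected.
    synchronized CompletableFuture<Void> trim(long newStartOffset) {
        long target = Math.min(Math.max(newStartOffset, startOffset), nextOffset());
        records.subList(0, (int) (target - startOffset)).clear();
        startOffset = target;
        return CompletableFuture.completedFuture(null);
    }

    public static void main(String[] args) {
        InMemoryStreamSketch stream = new InMemoryStreamSketch(1L);
        stream.append("a".getBytes(StandardCharsets.UTF_8)).join();
        stream.append("b".getBytes(StandardCharsets.UTF_8)).join();
        stream.trim(1).join();
        System.out.println("start=" + stream.startOffset() + " next=" + stream.nextOffset());
    }
}
```

In this sketch, trimming only moves the start offset forward; offsets assigned by earlier appends stay stable, which is what lets readers keep their positions across trims.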
In S3Stream's core architecture, data is first written durably to the Write-Ahead Log (WAL) and then uploaded to S3 in near real time. To serve its two read patterns efficiently, Tailing Read and Catch-up Read, S3Stream includes a built-in Message Cache that speeds up reads.
S3Stream supports various WAL storage options, including EBS, Regional EBS, S3, and other cloud storage services.
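The write and read paths described above can be sketched with a toy model. Everything here is an assumption made for illustration: the class `WriteReadPathSketch` and its methods are hypothetical, the WAL, Message Cache, and object store are plain in-memory collections, and the sketch assumes that tailing reads are served from a cache of recent writes while catch-up reads fall back to object storage. It does not reflect S3Stream's actual classes or cache policy.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the WAL -> object storage write path plus a cache-first read path.
// All names are hypothetical; this is not S3Stream code.
final class WriteReadPathSketch {
    private final List<String> wal = new ArrayList<>();                 // stands in for the durable WAL
    private final TreeMap<Long, String> cache = new TreeMap<>();        // stands in for the Message Cache
    private final TreeMap<Long, String> objectStore = new TreeMap<>();  // stands in for S3
    private final int cacheCapacity;
    private long nextOffset = 0;

    WriteReadPathSketch(int cacheCapacity) { this.cacheCapacity = cacheCapacity; }

    // Write path: persist to the WAL first, then keep the record cached for tailing readers.
    long append(String record) {
        long offset = nextOffset++;
        wal.add(record);
        cache.put(offset, record);
        return offset;
    }

    // Simulates the near-real-time upload: copy pending WAL records to the object
    // store, clear the WAL, then evict the oldest cache entries beyond capacity.
    void uploadPending() {
        long offset = nextOffset - wal.size(); // offset of the first pending record
        for (String record : wal) {
            objectStore.put(offset++, record);
        }
        wal.clear();
        while (cache.size() > cacheCapacity) {
            cache.pollFirstEntry();
        }
    }

    int pendingWalRecords() { return wal.size(); }

    // Reports which tier would serve the given offset.
    String readSource(long offset) {
        if (cache.containsKey(offset)) return "cache";
        if (objectStore.containsKey(offset)) return "object-store";
        throw new IllegalArgumentException("unknown offset " + offset);
    }

    // Read path: try the cache first (tailing read), else the object store (catch-up read).
    String read(long offset) {
        String cached = cache.get(offset);
        if (cached != null) return cached;
        String stored = objectStore.get(offset);
        if (stored == null) throw new IllegalArgumentException("unknown offset " + offset);
        return stored;
    }

    public static void main(String[] args) {
        WriteReadPathSketch path = new WriteReadPathSketch(2);
        for (int i = 0; i < 4; i++) path.append("r" + i);
        path.uploadPending();
        System.out.println(path.readSource(3) + " / " + path.readSource(0));
    }
}
```

Because eviction in this sketch only happens after an upload, every record is always readable from either the cache or the object store, so the WAL is needed only for durability until the upload completes.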