Back to Openviking

RAGFS Cache

docs/en/guides/13-ragfs-cache.md

0.4.312.5 KB
Original Source

RAGFS Cache

RAGFS cache is an optional read-cache layer for OpenViking. It speeds up full file reads and directory reads. It is only an acceleration layer, not the source of truth; backend filesystem data remains authoritative.

Assumptions:

  • Only one OpenViking / RAGFS process writes to the same namespace.
  • File and directory changes go through RAGFS.
  • The backend is not modified externally by bypassing RAGFS.
  • After a cache Provider successfully writes or deletes one key, later reads of that key do not return the old value.

Quick Start

For first-time setup, complete the base configuration first:

bash
openviking-server init
openviking-server doctor

Then enable the cache under storage.agfs.cache in ~/.openviking/ov.conf. The following Redis example is a good quick validation setup:

json
{
  "storage": {
    "workspace": "./data",
    "agfs": {
      "backend": "local",
      "cache": {
        "enabled": true,
        "provider": "redis",
        "namespace": "openviking",
        "max_file_size_bytes": 1048576,
        "bypass_prefixes": ["/queue", "/tmp"],
        "redis": {
          "mode": "standalone",
          "endpoints": ["redis://127.0.0.1:6379"],
          "pool_size": 32,
          "connect_timeout_ms": 1000,
          "command_timeout_ms": 20,
          "key_prefix": "ragfs-cache",
          "default_ttl_seconds": 3600,
          "read_from_replica": false
        }
      }
    }
  }
}

Start Redis and OpenViking:

bash
redis-server
openviking-server --config ~/.openviking/ov.conf

If the configuration file is at the default path ~/.openviking/ov.conf, you can also run:

bash
openviking-server

Available Providers:

ProviderBest forNotes
memoryLocal validation and testsIn-process cache; lost after restart
redisFast rollout on standard networksCurrently supports standalone; read from primary only
yuanrongNear-compute cache, shared memory, or heterogeneous multi-tier cacheRequires Yuanrong worker and native feature
mooncakeRemote memory pool, RDMA/TCP data planeRequires Mooncake services and native feature

If the runtime package was not compiled with the selected Provider, startup returns an error similar to "requires the ... feature".

Native Provider Builds

The standard OpenViking wheel is suitable for the memory and redis Providers. The yuanrong and mooncake Providers depend on platform-specific native SDKs and must be built for the target deployment environment.

Install the wheel builder first:

bash
python -m pip install "maturin[patchelf]"

Yuanrong

Install the Yuanrong DataSystem C++ SDK and export its header and library locations:

bash
export YUANRONG_SDK_INCLUDE=/path/to/yuanrong/include
export YUANRONG_SDK_LIB_DIR=/path/to/yuanrong/lib
# Optional; defaults to "datasystem".
export YUANRONG_SDK_LIB_NAME=datasystem
export LD_LIBRARY_PATH="$YUANRONG_SDK_LIB_DIR:${LD_LIBRARY_PATH:-}"

Build and install the wheel:

bash
maturin build --release \
  --manifest-path crates/ragfs-python-native/Cargo.toml \
  --features yuanrong-native

python -m pip install --force-reinstall target/wheels/ragfs_python-*.whl

The Yuanrong worker configured by storage.agfs.cache.yuanrong must be available when OpenViking starts.

Mooncake

Check out the Mooncake revision used by crates/ragfs-cache-mooncake/Cargo.toml, then build Mooncake Store with Rust support:

bash
cmake -S /path/to/Mooncake -B /path/to/Mooncake/build \
  -DWITH_STORE=ON \
  -DWITH_STORE_RUST=ON \
  -DCMAKE_BUILD_TYPE=Release

cmake --build /path/to/Mooncake/build \
  --target build_mooncake_store_rust mooncake_master -j

Export the paths required by the official Mooncake Rust binding:

bash
export MOONCAKE_BUILD_DIR=/path/to/Mooncake/build
export MOONCAKE_STORE_LIB_DIR="$MOONCAKE_BUILD_DIR/mooncake-store/src"
export MOONCAKE_STORE_INCLUDE_DIR=/path/to/Mooncake/mooncake-store/include
export LD_LIBRARY_PATH="$MOONCAKE_BUILD_DIR/mooncake-common:\
$MOONCAKE_BUILD_DIR/mooncake-common/src:\
$MOONCAKE_BUILD_DIR/mooncake-store/src:\
$MOONCAKE_BUILD_DIR/mooncake-store/src/cachelib_memory_allocator:\
$MOONCAKE_BUILD_DIR/mooncake-transfer-engine/src:\
$MOONCAKE_BUILD_DIR/mooncake-transfer-engine/src/common/base:\
${LD_LIBRARY_PATH:-}"

Build and install the wheel:

bash
maturin build --release \
  --manifest-path crates/ragfs-python-native/Cargo.toml \
  --features mooncake-native

python -m pip install --force-reinstall target/wheels/ragfs_python-*.whl

The Mooncake metadata service and Master configured by storage.agfs.cache.mooncake must be available when OpenViking starts. Native wheels are platform-specific and should be built on a system compatible with the target deployment environment.

For production wheels, use a Mooncake revision whose Rust build.rs links libasan only when ASan is explicitly enabled. Verify that the release wheel does not contain or depend on libasan:

bash
rm -rf /tmp/ragfs-python-wheel
python -m zipfile -e target/wheels/ragfs_python-*.whl /tmp/ragfs-python-wheel
readelf -d /tmp/ragfs-python-wheel/ragfs_python/ragfs_python.abi3.so \
  | grep libasan
find /tmp/ragfs-python-wheel -name 'libasan*'

Both checks should produce no output.

Configuration

storage.agfs.cache supports these common options:

OptionTypeDefaultDescription
enabledboolfalseEnable the RAGFS cache
providerstr"memory"memory, redis, yuanrong, or mooncake
namespacestr"openviking"Cache namespace for isolating deployments or tenants
max_file_size_bytesint1048576Maximum full-file object size admitted to cache
bypass_prefixeslist[str][]Path prefixes that always bypass cache

Redis configuration:

OptionDefaultDescription
mode"standalone"Redis deployment mode
endpoints["redis://127.0.0.1:6379"]Redis connection URLs
username""Redis ACL username
password_env""Environment variable that stores the Redis password
pool_size32Command concurrency
connect_timeout_ms1000Connection timeout
command_timeout_ms20Command timeout
key_prefix"ragfs-cache"Redis-side key prefix
default_ttl_seconds3600Default TTL; 0 means no TTL
read_from_replicafalseMust be false in standalone mode

Yuanrong configuration:

json
{
  "storage": {
    "agfs": {
      "cache": {
        "enabled": true,
        "provider": "yuanrong",
        "yuanrong": {
          "host": "127.0.0.1",
          "port": 31501,
          "connect_timeout_ms": 5000,
          "request_timeout_ms": 5000,
          "sdk_concurrency": 4
        }
      }
    }
  }
}

Mooncake configuration:

json
{
  "storage": {
    "agfs": {
      "cache": {
        "enabled": true,
        "provider": "mooncake",
        "mooncake": {
          "local_hostname": "127.0.0.1",
          "metadata_server": "http://127.0.0.1:8080/metadata",
          "master_server_addr": "127.0.0.1:50051",
          "protocol": "tcp",
          "device_name": "",
          "global_segment_size": 536870912,
          "local_buffer_size": 134217728,
          "replica_num": 2,
          "sdk_concurrency": 4,
          "operation_timeout_ms": 5000
        }
      }
    }
  }
}

Architecture

RAGFS splits caching into two layers:

  • CachedFileSystem: implements filesystem semantics, including cache hit/miss handling, backend fallback, cache fill, invalidation, generation checks, and metrics.
  • CacheProvider: only stores cache objects through get, put, delete, batch reads/writes, and close operations.

Call flow:

text
OpenViking
  -> RAGFS / MountableFS
  -> CachedFileSystem
       |-> CacheProvider -> Memory / Redis / Yuanrong / Mooncake
       `-> Backend FileSystem

With this boundary, file, directory, rename, recursive delete, and write-after-invalidation logic live only in the common layer. A Provider does not need to understand path semantics; it only needs to store stable key-value objects.

Cache Objects

RAGFS mainly caches three object types.

File Cache

File keys use a stable namespace and path hash:

text
ragfs:v1:{namespace}:file:{hash(path)}

The file value is a CacheEnvelope containing file content, object kind, path, and generation snapshots. After a full-read cache hit, RAGFS validates the envelope and generation before returning the content.

The default policy prefers summary files such as .abstract.md and .overview.md. Files larger than max_file_size_bytes are not admitted to cache. Non-full range reads also bypass the cache.

Directory Cache

Directory key:

text
ragfs:v1:{namespace}:dir:{hash(path)}

The directory cache stores raw backend read_dir entries, not permission-filtered final results. Permission, role, and agent-context filtering still happens in the OpenViking upper layer at request time.

This lets one directory cache object serve ls, tree, glob, the file-collection phase of grep, and path collection before delete or move operations.

Subtree Generation

Subtree generation key:

text
ragfs:v1:{namespace}:subtree:{hash(scope)}

remove_all and directory rename can leave descendant keys behind in the Provider. RAGFS bumps the subtree generation so old envelopes fail their generation snapshot check. Later real reads fall back to the backend and rebuild the cache.

Consistency and Invalidation

In the single-writer scenario, RAGFS does not need a distributed write lock. The important part is maintaining three invalidation classes according to filesystem semantics:

  • File changes: delete or update file_key(path) and delete dir_key(parent).
  • Directory changes: delete the directory's own dir_key and the parent directory's dir_key.
  • Subtree changes: bump subtree generation for recursive delete and directory rename.

Typical write order:

text
Acquire the in-process operation lock
-> Apply backend change
-> Update or delete related cache keys
-> Bump subtree generation when needed
-> Return result

If a Provider fails, RAGFS treats the backend as authoritative and puts the affected path into short-term bypass, avoiding reads from potentially stale cache.

Request Coalescing

When multiple requests read the same uncached small file or directory at the same time, CachedFileSystem uses an in-process inflight table to coalesce them:

text
The first miss becomes the leader and performs backend fallback and cache fill.
Later requests for the same key become followers and wait for the leader result.
The inflight entry is removed after the request completes.

This only reduces duplicate backend access within one OpenViking process. It does not change the Provider consistency boundary.

Cache Policy

RAGFS automatically bypasses paths that are not suitable for caching:

  • Lock files: .path.ovlock, *.lock, *.lck
  • Control files: enqueue, dequeue, peek, ack
  • Transient state: heartbeat, lease, cursor, offset, pid
  • Path prefixes configured through bypass_prefixes

Add permission-sensitive directories to bypass_prefixes. If raw directory entries themselves depend on the caller's permissions, that directory should not be cached.

Failure and Observability

The cache layer must not affect filesystem correctness:

  • get failure: fall back to backend.
  • put failure: record the error and put the path into bypass.
  • delete failure: record the error and put the path or scope into bypass.
  • Provider unavailable: do not return old cache; use backend results as authoritative.

Recommended signals to watch:

  • cache hit / miss / bypass
  • stale generation
  • provider get / put / delete latency
  • cache set / delete failures
  • inflight leader / follower / backend saved
  • backend fallback bytes
  1. Use memory locally to validate the configuration shape.
  2. Use redis to validate real remote-cache benefits.
  3. Move to yuanrong or mooncake for high-performance environments.
  4. Cache summary files and raw read_dir first, then expand to more regular small files.
  5. Add lock, control-plane, and permission-sensitive paths to bypass_prefixes.

In short: RAGFS cache is responsible for correct invalidation according to filesystem semantics, while the Provider is responsible for where cache objects live. As long as the backend remains the source of truth, every cache hit must pass envelope and generation validation before it is returned.