RAGFS Cache

The RAGFS cache is an optional read-cache layer for OpenViking that speeds up full-file reads and directory reads. It is purely an acceleration layer, not the source of truth: the data in the backend filesystem remains authoritative.

Assumptions:

Only one OpenViking / RAGFS process writes to the same namespace.
File and directory changes go through RAGFS.
The backend is not modified externally by bypassing RAGFS.
After a cache Provider successfully writes or deletes a key, subsequent reads of that key never return the old value.

Quick Start

For first-time setup, complete the base configuration first:

bash

openviking-server init
openviking-server doctor

Then enable the cache under storage.agfs.cache in ~/.openviking/ov.conf. The following Redis example is a quick way to validate the setup:

json

{
  "storage": {
    "workspace": "./data",
    "agfs": {
      "backend": "local",
      "cache": {
        "enabled": true,
        "provider": "redis",
        "namespace": "openviking",
        "max_file_size_bytes": 1048576,
        "bypass_prefixes": ["/queue", "/tmp"],
        "redis": {
          "mode": "standalone",
          "endpoints": ["redis://127.0.0.1:6379"],
          "pool_size": 32,
          "connect_timeout_ms": 1000,
          "command_timeout_ms": 20,
          "key_prefix": "ragfs-cache",
          "default_ttl_seconds": 3600,
          "read_from_replica": false
        }
      }
    }
  }
}
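Before starting the server, the configuration shape can be sanity-checked with a few lines of Python. The following is a minimal sketch, not part of OpenViking: `check_cache_config` is a hypothetical helper, it assumes the file is plain JSON as in the example above, and it only checks rules stated in this document (the list of Providers and the `read_from_replica` restriction in standalone mode).

```python
import json

# Provider names listed in this document.
KNOWN_PROVIDERS = {"memory", "redis", "yuanrong", "mooncake"}


def check_cache_config(conf: dict) -> list[str]:
    """Return human-readable problems found under storage.agfs.cache."""
    cache = conf.get("storage", {}).get("agfs", {}).get("cache", {})
    problems = []
    provider = cache.get("provider", "memory")
    if provider not in KNOWN_PROVIDERS:
        problems.append(f"unknown provider: {provider!r}")
    redis = cache.get("redis", {})
    if provider == "redis" and redis.get("mode", "standalone") == "standalone":
        if redis.get("read_from_replica", False):
            problems.append("read_from_replica must be false in standalone mode")
    return problems


# A deliberately broken fragment: standalone mode with replica reads enabled.
sample = json.loads("""
{"storage": {"agfs": {"cache": {
  "enabled": true, "provider": "redis",
  "redis": {"mode": "standalone", "read_from_replica": true}
}}}}
""")
print(check_cache_config(sample))
# prints ['read_from_replica must be false in standalone mode']
```

To check a real file, load ~/.openviking/ov.conf with `json.load` and pass the resulting dict to the same function.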

Start Redis and OpenViking:

bash

redis-server
openviking-server --config ~/.openviking/ov.conf

If the configuration file is at the default path ~/.openviking/ov.conf, you can also run:

bash

openviking-server

Available Providers:

| Provider | Best for | Notes |
| --- | --- | --- |
| `memory` | Local validation and tests | In-process cache; contents are lost on restart |
| `redis` | Fast rollout on standard networks | Currently supports standalone mode only; reads go to the primary |
| `yuanrong` | Near-compute cache, shared memory, or heterogeneous multi-tier cache | Requires a Yuanrong worker and the native feature |
| `mooncake` | Remote memory pool, RDMA/TCP data plane | Requires Mooncake services and the native feature |

If the runtime package was not compiled with the selected Provider, startup returns an error similar to "requires the ... feature".

Native Provider Builds

The standard OpenViking wheel is suitable for the memory and redis Providers. The yuanrong and mooncake Providers depend on platform-specific native SDKs and must be built for the target deployment environment.

Install the wheel builder first:

bash

python -m pip install "maturin[patchelf]"

Yuanrong

Install the Yuanrong DataSystem C++ SDK and export its header and library locations:

bash

export YUANRONG_SDK_INCLUDE=/path/to/yuanrong/include
export YUANRONG_SDK_LIB_DIR=/path/to/yuanrong/lib
# Optional; defaults to "datasystem".
export YUANRONG_SDK_LIB_NAME=datasystem
export LD_LIBRARY_PATH="$YUANRONG_SDK_LIB_DIR:${LD_LIBRARY_PATH:-}"

Build and install the wheel:

bash

maturin build --release \
  --manifest-path crates/ragfs-python-native/Cargo.toml \
  --features yuanrong-native

python -m pip install --force-reinstall target/wheels/ragfs_python-*.whl

The Yuanrong worker configured by storage.agfs.cache.yuanrong must be available when OpenViking starts.

Mooncake

Check out the Mooncake revision used by crates/ragfs-cache-mooncake/Cargo.toml, then build Mooncake Store with Rust support:

bash

cmake -S /path/to/Mooncake -B /path/to/Mooncake/build \
  -DWITH_STORE=ON \
  -DWITH_STORE_RUST=ON \
  -DCMAKE_BUILD_TYPE=Release

cmake --build /path/to/Mooncake/build \
  --target build_mooncake_store_rust mooncake_master -j

Export the paths required by the official Mooncake Rust binding:

bash

export MOONCAKE_BUILD_DIR=/path/to/Mooncake/build
export MOONCAKE_STORE_LIB_DIR="$MOONCAKE_BUILD_DIR/mooncake-store/src"
export MOONCAKE_STORE_INCLUDE_DIR=/path/to/Mooncake/mooncake-store/include
export LD_LIBRARY_PATH="$MOONCAKE_BUILD_DIR/mooncake-common:\
$MOONCAKE_BUILD_DIR/mooncake-common/src:\
$MOONCAKE_BUILD_DIR/mooncake-store/src:\
$MOONCAKE_BUILD_DIR/mooncake-store/src/cachelib_memory_allocator:\
$MOONCAKE_BUILD_DIR/mooncake-transfer-engine/src:\
$MOONCAKE_BUILD_DIR/mooncake-transfer-engine/src/common/base:\
${LD_LIBRARY_PATH:-}"

Build and install the wheel:

bash

maturin build --release \
  --manifest-path crates/ragfs-python-native/Cargo.toml \
  --features mooncake-native

python -m pip install --force-reinstall target/wheels/ragfs_python-*.whl

The Mooncake metadata service and Master configured by storage.agfs.cache.mooncake must be available when OpenViking starts. Native wheels are platform-specific and should be built on a system compatible with the target deployment environment.

For production wheels, use a Mooncake revision whose Rust build.rs links libasan only when ASan is explicitly enabled. Verify that the release wheel does not contain or depend on libasan:

bash

rm -rf /tmp/ragfs-python-wheel
python -m zipfile -e target/wheels/ragfs_python-*.whl /tmp/ragfs-python-wheel
readelf -d /tmp/ragfs-python-wheel/ragfs_python/ragfs_python.abi3.so \
  | grep libasan
find /tmp/ragfs-python-wheel -name 'libasan*'

Both checks should produce no output.
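In CI, the second check (no bundled libasan files) can also be expressed in Python without unpacking the wheel. This is a minimal sketch under stated assumptions: `bundled_asan_libs` is a hypothetical helper name, and it only inspects archive member names, so it does not replace the `readelf` check for dynamic dependencies.

```python
import zipfile


def bundled_asan_libs(wheel) -> list[str]:
    """List archive members whose file name starts with 'libasan'.

    `wheel` may be a path to a .whl file or an open binary file object.
    """
    with zipfile.ZipFile(wheel) as archive:
        return [
            name
            for name in archive.namelist()
            if name.rsplit("/", 1)[-1].startswith("libasan")
        ]
```

An empty list corresponds to the `find` command producing no output.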

Configuration

storage.agfs.cache supports these common options:

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `enabled` | bool | `false` | Enable the RAGFS cache |
| `provider` | str | `"memory"` | `memory`, `redis`, `yuanrong`, or `mooncake` |
| `namespace` | str | `"openviking"` | Cache namespace for isolating deployments or tenants |
| `max_file_size_bytes` | int | `1048576` | Maximum full-file object size admitted to cache |
| `bypass_prefixes` | list[str] | `[]` | Path prefixes that always bypass cache |

Redis configuration:

| Option | Default | Description |
| --- | --- | --- |
| `mode` | `"standalone"` | Redis deployment mode |
| `endpoints` | `["redis://127.0.0.1:6379"]` | Redis connection URLs |
| `username` | `""` | Redis ACL username |
| `password_env` | `""` | Name of the environment variable that stores the Redis password |
| `pool_size` | `32` | Command concurrency |
| `connect_timeout_ms` | `1000` | Connection timeout, in milliseconds |
| `command_timeout_ms` | `20` | Command timeout, in milliseconds |
| `key_prefix` | `"ragfs-cache"` | Redis-side key prefix |
| `default_ttl_seconds` | `3600` | Default TTL, in seconds; `0` means no TTL |
| `read_from_replica` | `false` | Must be `false` in standalone mode |
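The `password_env` option holds the name of an environment variable, not the password itself, so the secret stays out of ov.conf. The sketch below illustrates that indirection in Python. It is not the OpenViking implementation: `resolve_redis_password` is a hypothetical helper, and raising an error when the variable is missing is an assumption, since the real behavior is not described here.

```python
import os
from typing import Mapping, Optional


def resolve_redis_password(
    redis_conf: Mapping, environ: Mapping[str, str] = os.environ
) -> Optional[str]:
    """Look up the Redis password through the variable named by password_env."""
    var_name = redis_conf.get("password_env", "")
    if not var_name:
        return None  # empty default: no password configured
    if var_name not in environ:
        # Assumed behavior: fail loudly instead of connecting without a password.
        raise KeyError(f"environment variable {var_name} is not set")
    return environ[var_name]
```

With `"password_env": "REDIS_PASSWORD"` (a hypothetical variable name), the password would be exported in the shell that starts the server.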

Yuanrong configuration:

json

{
  "storage": {
    "agfs": {
      "cache": {
        "enabled": true,
        "provider": "yuanrong",
        "yuanrong": {
          "host": "127.0.0.1",
          "port": 31501,
          "connect_timeout_ms": 5000,
          "request_timeout_ms": 5000,
          "sdk_concurrency": 4
        }
      }
    }
  }
}

Mooncake configuration:

json

{
  "storage": {
    "agfs": {
      "cache": {
        "enabled": true,
        "provider": "mooncake",
        "mooncake": {
          "local_hostname": "127.0.0.1",
          "metadata_server": "http://127.0.0.1:8080/metadata",
          "master_server_addr": "127.0.0.1:50051",
          "protocol": "tcp",
          "device_name": "",
          "global_segment_size": 536870912,
          "local_buffer_size": 134217728,
          "replica_num": 2,
          "sdk_concurrency": 4,
          "operation_timeout_ms": 5000
        }
      }
    }
  }
}

Architecture

RAGFS splits caching into two layers:

CachedFileSystem: implements filesystem semantics, including cache hit/miss handling, backend fallback, cache fill, invalidation, generation checks, and metrics.
CacheProvider: only stores cache objects through get, put, delete, batch reads/writes, and close operations.

Call flow:

text

OpenViking
  -> RAGFS / MountableFS
  -> CachedFileSystem
       |-> CacheProvider -> Memory / Redis / Yuanrong / Mooncake
       `-> Backend FileSystem

With this boundary, file, directory, rename, recursive delete, and write-after-invalidation logic live only in the common layer. A Provider does not need to understand path semantics; it only needs to store stable key-value objects.
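To make the boundary concrete, here is a minimal Python sketch of what a Provider has to offer. The names and signatures are illustrative assumptions, not the project's actual interface, and batch operations are omitted for brevity. Note that nothing in it knows about paths, directories, or generations.

```python
from typing import Optional, Protocol


class CacheProvider(Protocol):
    """Key-value storage only; no filesystem semantics (illustrative)."""

    def get(self, key: str) -> Optional[bytes]: ...
    def put(self, key: str, value: bytes) -> None: ...
    def delete(self, key: str) -> None: ...
    def close(self) -> None: ...


class MemoryProvider:
    """In-process Provider backed by a dict; contents are lost on restart."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: str) -> None:
        self._data.pop(key, None)  # deleting a missing key is not an error

    def close(self) -> None:
        self._data.clear()
```

Because the interface is this small, swapping Memory for Redis, Yuanrong, or Mooncake should not require changes to invalidation logic in the common layer.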

Cache Objects

RAGFS caches three main object types.

File Cache

File keys use a stable namespace and path hash:

text

ragfs:v1:{namespace}:file:{hash(path)}

The file value is a CacheEnvelope containing file content, object kind, path, and generation snapshots. After a full-read cache hit, RAGFS validates the envelope and generation before returning the content.

The default policy prefers summary files such as .abstract.md and .overview.md. Files larger than max_file_size_bytes are not admitted to the cache. Partial (range) reads also bypass the cache.
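The key layout and the admission rules can be sketched in a few lines of Python. This is an illustration only: the document does not specify the hash function, so SHA-256 is an assumption here, and `file_key` and `admit_file` are hypothetical helper names.

```python
import hashlib


def file_key(namespace: str, path: str) -> str:
    """Build a file cache key; SHA-256 is an assumed stand-in for hash(path)."""
    digest = hashlib.sha256(path.encode("utf-8")).hexdigest()
    return f"ragfs:v1:{namespace}:file:{digest}"


def admit_file(
    size_bytes: int, is_full_read: bool, max_file_size_bytes: int = 1048576
) -> bool:
    """Admit only full reads of files no larger than the configured limit."""
    return is_full_read and size_bytes <= max_file_size_bytes


print(file_key("openviking", "/docs/.abstract.md"))
print(admit_file(4096, is_full_read=True))      # small full read: admitted
print(admit_file(4096, is_full_read=False))     # range read: bypasses the cache
print(admit_file(2_000_000, is_full_read=True)) # above the 1 MiB default: rejected
```

Hashing the path keeps keys at a fixed length regardless of how deep or long the path is.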

Directory Cache

Directory key:

text

ragfs:v1:{namespace}:dir:{hash(path)}

The directory cache stores raw backend read_dir entries, not permission-filtered final results. Permission, role, and agent-context filtering still happens in the OpenViking upper layer at request time.

This lets one directory cache object serve ls, tree, glob, the file-collection phase of grep, and path collection before delete or move operations.
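The split between cached raw entries and request-time filtering can be sketched as follows. This is a simplified Python illustration with hypothetical names (`raw_dir_cache`, `list_dir`, `can_see`); it ignores envelopes and generations and uses a plain dict in place of a Provider.

```python
from typing import Callable

# Unfiltered directory listings, keyed by path (stand-in for a Provider).
raw_dir_cache: dict[str, list[str]] = {}


def list_dir(
    path: str,
    read_dir: Callable[[str], list[str]],
    can_see: Callable[[str], bool],
) -> list[str]:
    """Serve a listing from the raw cache, filtering per caller at request time."""
    entries = raw_dir_cache.get(path)
    if entries is None:
        entries = read_dir(path)        # backend read on a miss
        raw_dir_cache[path] = entries   # cache the unfiltered result
    # Permission filtering happens on every request and is never cached.
    return [name for name in entries if can_see(name)]
```

Two callers with different permissions share one cached object but can receive different listings, which is why the filtered result must not be what gets stored.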

Subtree Generation

Subtree generation key:

text

ragfs:v1:{namespace}:subtree:{hash(scope)}

remove_all and directory rename can leave descendant keys behind in the Provider. Instead of enumerating and deleting them, RAGFS bumps the subtree generation so that old envelopes fail their generation snapshot check. Subsequent reads fall back to the backend and rebuild the cache.
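A minimal Python sketch of the generation check follows. It rests on assumptions that are not confirmed by this document: that an envelope snapshots one counter per ancestor directory, and that counters start at 0. All names are hypothetical.

```python
import posixpath

# Current generation per scope (directory path); missing scopes count as 0.
generations: dict[str, int] = {}


def scopes_for(path: str) -> list[str]:
    """Ancestor directories of path, from its parent up to the root."""
    scopes = []
    current = posixpath.dirname(path)
    while True:
        scopes.append(current)
        if current in ("/", ""):
            return scopes
        current = posixpath.dirname(current)


def snapshot(path: str) -> dict[str, int]:
    """Generation snapshot stored in the envelope at cache-fill time."""
    return {scope: generations.get(scope, 0) for scope in scopes_for(path)}


def is_fresh(snap: dict[str, int]) -> bool:
    """An envelope is valid only if no covering scope was bumped since fill."""
    return all(generations.get(scope, 0) == gen for scope, gen in snap.items())


def bump(scope: str) -> None:
    """Called for recursive delete or directory rename of `scope`."""
    generations[scope] = generations.get(scope, 0) + 1


snap = snapshot("/docs/guide/intro.md")
bump("/docs")          # e.g. remove_all("/docs")
print(is_fresh(snap))  # prints False
```

One counter update invalidates every descendant envelope, however many keys remain in the Provider.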

Consistency and Invalidation

In the single-writer scenario, RAGFS does not need a distributed write lock. The important part is maintaining three invalidation classes according to filesystem semantics:

File changes: delete or update file_key(path) and delete dir_key(parent).
Directory changes: delete the directory's own dir_key and the parent directory's dir_key.
Subtree changes: bump subtree generation for recursive delete and directory rename.

Typical write order:

text

Acquire the in-process operation lock
-> Apply backend change
-> Update or delete related cache keys
-> Bump subtree generation when needed
-> Return result

If a Provider fails, RAGFS treats the backend as authoritative and puts the affected path into short-term bypass, avoiding reads from potentially stale cache.
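The write order and the bypass-on-failure rule can be sketched together. The following Python is an illustration under stated assumptions: `CachedWriter` is a hypothetical class, the backend is a dict, cache keys are simplified strings, and the 5-second bypass window is an invented value because the real duration is not documented here.

```python
import posixpath
import threading
import time

BYPASS_SECONDS = 5.0  # assumed bypass window, for illustration only


class CachedWriter:
    def __init__(self, backend: dict, provider) -> None:
        self.backend = backend
        self.provider = provider  # any object with delete(key)
        self._op_lock = threading.Lock()
        self._bypass_until: dict[str, float] = {}

    def write(self, path: str, data: bytes) -> None:
        with self._op_lock:                 # in-process operation lock
            self.backend[path] = data       # 1. apply the backend change
            try:
                # 2. invalidate the file key and the parent directory key
                self.provider.delete(f"file:{path}")
                self.provider.delete(f"dir:{posixpath.dirname(path)}")
            except Exception:
                # Provider failed: the backend is still correct, so skip the
                # cache for this path until the window expires.
                self._bypass_until[path] = time.monotonic() + BYPASS_SECONDS

    def is_bypassed(self, path: str) -> bool:
        return time.monotonic() < self._bypass_until.get(path, float("-inf"))
```

The backend write happens before any cache operation, so a Provider outage can cost cache hits but never the write itself.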

Request Coalescing

When multiple requests read the same uncached small file or directory at the same time, CachedFileSystem uses an in-process inflight table to coalesce them:

text

The first miss becomes the leader and performs backend fallback and cache fill.
Later requests for the same key become followers and wait for the leader result.
The inflight entry is removed after the request completes.

This only reduces duplicate backend access within one OpenViking process. It does not change the Provider consistency boundary.
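The leader/follower pattern described above is often called single-flight. Here is a minimal thread-based sketch in Python; the `Inflight` class and its `load` method are hypothetical names, and the actual CachedFileSystem may implement the table differently.

```python
import threading
from concurrent.futures import Future
from typing import Callable


class Inflight:
    """Coalesce concurrent loads of the same key within one process."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._table: dict[str, Future] = {}

    def load(self, key: str, loader: Callable[[], bytes]) -> bytes:
        with self._lock:
            future = self._table.get(key)
            is_leader = future is None
            if is_leader:
                future = Future()
                self._table[key] = future
        if not is_leader:
            return future.result()  # follower: wait for the leader's result
        try:
            future.set_result(loader())  # leader: backend fallback + cache fill
        except BaseException as exc:
            future.set_exception(exc)    # followers see the same failure
        finally:
            with self._lock:
                del self._table[key]     # remove the entry once complete
        return future.result()
```

A request that arrives after the entry is removed starts a new load, so this table is not a cache; it only merges requests that overlap in time.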

Cache Policy

RAGFS automatically bypasses paths that are not suitable for caching:

Lock files: .path.ovlock, *.lock, *.lck
Control files: enqueue, dequeue, peek, ack
Transient state: heartbeat, lease, cursor, offset, pid
Path prefixes configured through bypass_prefixes

Add permission-sensitive directories to bypass_prefixes. If raw directory entries themselves depend on the caller's permissions, that directory should not be cached.
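As a rough illustration of the policy above, the following Python sketch classifies a path. The matching rules are assumptions: it treats the listed lock, control, and transient names as matches against the final path component, and treats `bypass_prefixes` as whole-component prefixes. `should_bypass` is a hypothetical helper.

```python
import fnmatch
import posixpath

LOCK_PATTERNS = (".path.ovlock", "*.lock", "*.lck")
CONTROL_NAMES = {"enqueue", "dequeue", "peek", "ack"}
TRANSIENT_NAMES = {"heartbeat", "lease", "cursor", "offset", "pid"}


def should_bypass(path: str, bypass_prefixes: tuple[str, ...] = ()) -> bool:
    """Return True if the path should never be served from or stored in cache."""
    name = posixpath.basename(path)
    if any(fnmatch.fnmatchcase(name, pattern) for pattern in LOCK_PATTERNS):
        return True
    if name in CONTROL_NAMES or name in TRANSIENT_NAMES:
        return True
    for prefix in bypass_prefixes:
        prefix = prefix.rstrip("/")
        # Match the prefix itself or anything below it, but not "/queued".
        if path == prefix or path.startswith(prefix + "/"):
            return True
    return False


print(should_bypass("/queue/jobs/1", ("/queue", "/tmp")))   # prints True
print(should_bypass("/docs/.abstract.md", ("/queue", "/tmp")))  # prints False
```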

Failure and Observability

The cache layer must not affect filesystem correctness:

get failure: fall back to backend.
put failure: record the error and put the path into bypass.
delete failure: record the error and put the path or scope into bypass.
Provider unavailable: do not return old cache; use backend results as authoritative.
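On the read side, these rules mean a Provider error should never surface to the caller. The sketch below shows one way to express that in Python. It is a simplified illustration with hypothetical names (`read_through`, `bypassed`, the logger name), and it uses the path directly as the cache key, without envelope or generation checks.

```python
import logging
from typing import Callable

log = logging.getLogger("ragfs.cache")  # hypothetical logger name


def read_through(
    path: str,
    provider,                          # any object with get(key) and put(key, value)
    backend_read: Callable[[str], bytes],
    bypassed: set[str],
) -> bytes:
    """Read via the cache, treating every Provider failure as a miss."""
    if path in bypassed:
        return backend_read(path)
    try:
        cached = provider.get(path)
    except Exception as exc:
        log.warning("cache get failed for %s: %s", path, exc)
        cached = None                  # get failure: fall back to backend
    if cached is not None:
        return cached
    data = backend_read(path)
    try:
        provider.put(path, data)
    except Exception as exc:
        log.warning("cache put failed for %s: %s", path, exc)
        bypassed.add(path)             # put failure: record and bypass
    return data
```

In every branch the returned bytes either passed through the cache successfully or came straight from the backend.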

Recommended signals to watch:

cache hit / miss / bypass
stale generation
provider get / put / delete latency
cache set / delete failures
inflight leader / follower / backend saved
backend fallback bytes

Recommended Rollout

Use memory locally to validate the configuration shape.
Use redis to validate real remote-cache benefits.
Move to yuanrong or mooncake for high-performance environments.
Cache summary files and raw read_dir first, then expand to more regular small files.
Add lock, control-plane, and permission-sensitive paths to bypass_prefixes.

In short: the RAGFS cache is responsible for correct invalidation according to filesystem semantics, while the Provider only determines where cache objects live. The backend remains the source of truth, so every cache hit must pass envelope and generation validation before it is returned.