Back to Ruflo

ADR-057: RVF Native Storage Backend — Replace sql.js with RuVector Format

v3/implementation/adrs/ADR-057-rvf-native-storage-backend.md

3.6.3053.4 KB
Original Source

ADR-057: RVF Native Storage Backend — Replace sql.js with RuVector Format

FieldValue
StatusProposed
Date2026-02-28
AuthorsClaude Flow Team
Supersedes
RelatedADR-053 (AgentDB Controller Activation), ADR-054 (RVF Plugin Marketplace), ADR-055 (Controller Bug Remediation), ADR-056 (agentic-flow v3 Integration)

1. Context

The Problem

npx ruflo@latest installs 1.3GB across 914 packages with a 35-second cold start in Docker. The Docker optimization work (Dockerfile.lite with --omit=optional + aggressive pruning) reduced this to 324MB, but the core dependency chain still carries unnecessary weight:

ruflo (5KB wrapper)
  └─ @claude-flow/cli (9MB)
       ├─ @claude-flow/shared (11MB) ← depends on sql.js (18MB WASM)
       ├─ @claude-flow/mcp (650KB)
       ├─ semver (tiny)
       └─ @noble/ed25519 (tiny)

sql.js is the single largest hard dependency — an 18MB WASM SQLite bundle compiled from C via Emscripten. It is used in three places:

ConsumerFilePurposeLines
@claude-flow/sharedevents/event-store.tsAppend-only event sourcing log589
@claude-flow/memorysqljs-backend.tsMemory entries + brute-force vector search767
@claude-flow/embeddingspersistent-cache.tsLRU embedding cache with TTL411

What sql.js Actually Does

Analysis of all 1,767 lines of sql.js-consuming code reveals:

  1. Key-value storage with namespace isolation
  2. BLOB columns for Float32Array embedding vectors
  3. JSON TEXT columns for metadata, tags, references
  4. Dynamic SQL filtering (namespace, type, time range, pagination)
  5. Append-only event log with version tracking and snapshots
  6. LRU cache with TTL-based eviction
  7. Brute-force cosine similarity search (no indexing)

Notably, sql.js provides no vector indexing — the SqlJsBackend comments explicitly state:

typescript
// sql.js doesn't have native vector index
message: 'No vector index (brute-force search)',
recommendations.push('Consider using better-sqlite3 with HNSW for faster vector search');

The Opportunity

RVF (RuVector Format) is a binary container format already used in the Claude Flow ecosystem (ADR-054). It provides everything sql.js does plus native HNSW indexing in a fraction of the footprint:

Capabilitysql.jsRVF
Package size18MB WASM~50 bytes/vector overhead
Vector searchBrute-force O(n)HNSW 3-layer progressive (150x-12,500x faster)
QuantizationNonefp16, fp32, int8, int4, binary
Crash safetyManual export/saveAppend-only (no WAL needed)
COW branchingN/A<3ms branch, 1:200 compression
WASM support18MB Emscripten bundle5.5KB microkernel + 46KB control plane
Native supportN/ANAPI-RS (zero-copy)
Key-value storeSQL tablesMANIFEST + KV_SEG segments
Event logSQL INSERTAppend-only LOG_SEG
Cross-platformYes (WASM-only)Yes (native + WASM fallback)

2. Decision

Replace sql.js with RVF as the native storage backend across @claude-flow/shared, @claude-flow/memory, and @claude-flow/embeddings. Provide automatic and manual migration paths for existing SQLite (.db) and JSON (.json) data files with full backward compatibility.

Storage Architecture

Before (sql.js):
┌─────────────────────────────────────┐
│  @claude-flow/shared                │
│  ├─ EventStore → sql.js (18MB WASM) │
│  └─ event-store.db                  │
├─────────────────────────────────────┤
│  @claude-flow/memory                │
│  ├─ SqlJsBackend → sql.js           │
│  └─ memory.db                       │
├─────────────────────────────────────┤
│  @claude-flow/embeddings            │
│  ├─ PersistentCache → sql.js        │
│  └─ embeddings.db                   │
└─────────────────────────────────────┘

After (RVF):
┌─────────────────────────────────────┐
│  @claude-flow/shared                │
│  ├─ EventStore → RvfEventLog        │
│  └─ events.rvf (LOG_SEG)            │
├─────────────────────────────────────┤
│  @claude-flow/memory                │
│  ├─ RvfBackend → RVF native         │
│  └─ memory.rvf (VEC_SEG + KV_SEG)   │
├─────────────────────────────────────┤
│  @claude-flow/embeddings            │
│  ├─ RvfEmbeddingCache → RVF native  │
│  └─ embeddings.rvf (VEC_SEG)        │
└─────────────────────────────────────┘

Segment Mapping

sql.js TableRVF SegmentPurpose
memory_entriesKV_SEG + VEC_SEGKey-value metadata + vector embeddings
memory_entries.embedding (BLOB)VEC_SEG (fp32/fp16/int8)Typed vector storage with quantization
eventsLOG_SEGAppend-only event log (replaces SQL INSERT)
snapshotsSNAP_SEGEvent sourcing snapshots
embeddingsVEC_SEG + INDEX_SEGCached embeddings with HNSW index
Dynamic SQL indexesINDEX_SEG (3-layer HNSW)Progressive loading: 70% recall on first query
JSON metadata columnsMETA_SEGStructured metadata without JSON.parse overhead

3. Migration Strategy

3.1 File Detection and Format Routing

The system detects the storage format by file extension and magic bytes:

typescript
enum StorageFormat {
  SQLITE_SQLJS = 'sqlite-sqljs',   // .db files from sql.js
  SQLITE_NATIVE = 'sqlite-native', // .db files from better-sqlite3
  JSON = 'json',                    // .json fallback files
  RVF = 'rvf',                     // .rvf native format
  UNKNOWN = 'unknown',
}

function detectFormat(filePath: string): StorageFormat {
  if (!existsSync(filePath)) return StorageFormat.UNKNOWN;

  const ext = extname(filePath).toLowerCase();
  const header = readFileSync(filePath, { length: 16 });

  // RVF magic bytes: first 4 bytes
  if (header.slice(0, 4).toString() === 'RVF\0') return StorageFormat.RVF;

  // SQLite magic: "SQLite format 3\0"
  if (header.toString('ascii', 0, 15) === 'SQLite format 3') {
    return StorageFormat.SQLITE_SQLJS; // or SQLITE_NATIVE (same on-disk)
  }

  // JSON detection
  if (ext === '.json') return StorageFormat.JSON;

  return StorageFormat.UNKNOWN;
}

3.2 Automatic Migration (Transparent)

On first access, the DatabaseProvider detects legacy formats and migrates automatically:

typescript
async function openStorage(path: string, options: StorageOptions): Promise<IMemoryBackend> {
  const rvfPath = path.replace(/\.(db|json)$/, '.rvf');

  // If .rvf already exists, use it directly
  if (existsSync(rvfPath)) {
    return new RvfBackend(rvfPath, options);
  }

  // Detect legacy format
  const legacyFormat = detectFormat(path);

  if (legacyFormat === StorageFormat.UNKNOWN) {
    // Fresh install — create new .rvf file
    return new RvfBackend(rvfPath, options);
  }

  // Auto-migrate legacy → RVF
  console.info(`[migration] Detected ${legacyFormat} at ${path}, migrating to RVF...`);
  const migrator = new StorageMigrator(path, rvfPath, legacyFormat);
  await migrator.migrate();

  // Rename legacy file to .bak (not deleted)
  renameSync(path, path + '.bak');
  console.info(`[migration] Complete. Legacy file backed up to ${path}.bak`);

  return new RvfBackend(rvfPath, options);
}

Automatic migration guarantees:

  • Legacy .db and .json files are never deleted — renamed to .bak
  • Migration is atomic — writes to temp file, renames on success
  • Migration is idempotent — re-running is safe (checks for existing .rvf)
  • Migration reports progress for large datasets (percentage, ETA)

3.3 Manual Migration (CLI Commands)

bash
# Check current storage format and migration status
ruflo migrate status --storage

# Dry-run migration (report what would change, don't modify)
ruflo migrate run --storage --dry-run

# Migrate specific file
ruflo migrate run --storage --file ./data/memory/memory.db

# Migrate all storage files in project
ruflo migrate run --storage --all

# Force re-migration (even if .rvf already exists)
ruflo migrate run --storage --force

# Rollback: restore from .bak files
ruflo migrate rollback --storage

# Validate migrated data integrity
ruflo migrate validate --storage

3.4 Migration for Each Data Type

A. Memory Entries (.db.rvf)

typescript
class MemoryMigrator {
  async migrateFromSqlite(dbPath: string, rvfPath: string): Promise<MigrationResult> {
    const SQL = await initSqlJs();
    const buffer = readFileSync(dbPath);
    const db = new SQL.Database(new Uint8Array(buffer));

    const rvf = await RvfFile.create(rvfPath);
    const kvSeg = rvf.createSegment('KV_SEG');
    const vecSeg = rvf.createSegment('VEC_SEG');
    const idxSeg = rvf.createSegment('INDEX_SEG');

    let migrated = 0;
    let skipped = 0;

    const stmt = db.prepare('SELECT * FROM memory_entries ORDER BY created_at ASC');
    while (stmt.step()) {
      const row = stmt.getAsObject();

      // Migrate key-value metadata
      kvSeg.put(row.id as string, {
        key: row.key,
        content: row.content,
        type: row.type,
        namespace: row.namespace,
        tags: JSON.parse(row.tags as string),
        metadata: JSON.parse(row.metadata as string),
        owner_id: row.owner_id,
        access_level: row.access_level,
        created_at: row.created_at,
        updated_at: row.updated_at,
        expires_at: row.expires_at,
        version: row.version,
        references: JSON.parse(row.references as string),
        access_count: row.access_count,
        last_accessed_at: row.last_accessed_at,
      });

      // Migrate vector embeddings (if present)
      if (row.embedding) {
        const embedding = new Float32Array(
          new Uint8Array(row.embedding as Uint8Array).buffer
        );
        vecSeg.insert(row.id as string, embedding);
        migrated++;
      } else {
        skipped++;
      }
    }
    stmt.free();

    // Build HNSW index on migrated vectors
    if (migrated > 0) {
      await idxSeg.buildHnsw({ efConstruction: 200, M: 16 });
    }

    await rvf.flush();
    db.close();

    return { migrated, skipped, indexBuilt: migrated > 0 };
  }
}

B. Event Store (.db.rvf)

typescript
class EventStoreMigrator {
  async migrateFromSqlite(dbPath: string, rvfPath: string): Promise<MigrationResult> {
    const SQL = await initSqlJs();
    const buffer = readFileSync(dbPath);
    const db = new SQL.Database(new Uint8Array(buffer));

    const rvf = await RvfFile.create(rvfPath);
    const logSeg = rvf.createSegment('LOG_SEG');
    const snapSeg = rvf.createSegment('SNAP_SEG');

    let events = 0;
    let snapshots = 0;

    // Migrate events (order preserved)
    const eventStmt = db.prepare('SELECT * FROM events ORDER BY timestamp ASC, version ASC');
    while (eventStmt.step()) {
      const row = eventStmt.getAsObject();
      logSeg.append({
        id: row.id,
        type: row.type,
        aggregate_id: row.aggregate_id,
        aggregate_type: row.aggregate_type,
        version: row.version,
        timestamp: row.timestamp,
        source: row.source,
        payload: JSON.parse(row.payload as string),
        metadata: row.metadata ? JSON.parse(row.metadata as string) : null,
        causation_id: row.causation_id,
        correlation_id: row.correlation_id,
      });
      events++;
    }
    eventStmt.free();

    // Migrate snapshots
    const snapStmt = db.prepare('SELECT * FROM snapshots');
    while (snapStmt.step()) {
      const row = snapStmt.getAsObject();
      snapSeg.put(row.aggregate_id as string, {
        aggregate_type: row.aggregate_type,
        version: row.version,
        state: JSON.parse(row.state as string),
        timestamp: row.timestamp,
      });
      snapshots++;
    }
    snapStmt.free();

    await rvf.flush();
    db.close();

    return { events, snapshots };
  }
}

C. JSON Fallback Files (.json.rvf)

typescript
class JsonMigrator {
  async migrateFromJson(jsonPath: string, rvfPath: string): Promise<MigrationResult> {
    const raw = readFileSync(jsonPath, 'utf-8');
    const data = JSON.parse(raw);

    const rvf = await RvfFile.create(rvfPath);
    const kvSeg = rvf.createSegment('KV_SEG');
    const vecSeg = rvf.createSegment('VEC_SEG');

    let migrated = 0;

    // JSON backend stores entries as { [namespace]: { [key]: entry } }
    for (const [namespace, entries] of Object.entries(data)) {
      for (const [key, entry] of Object.entries(entries as Record<string, any>)) {
        const id = entry.id || `${namespace}:${key}`;
        kvSeg.put(id, { ...entry, namespace, key });

        if (entry.embedding && Array.isArray(entry.embedding)) {
          vecSeg.insert(id, new Float32Array(entry.embedding));
          migrated++;
        }
      }
    }

    await rvf.flush();
    return { migrated };
  }
}

D. Embedding Cache (.db.rvf)

typescript
class EmbeddingCacheMigrator {
  async migrateFromSqlite(dbPath: string, rvfPath: string): Promise<MigrationResult> {
    const SQL = await initSqlJs();
    const buffer = readFileSync(dbPath);
    const db = new SQL.Database(new Uint8Array(buffer));

    const rvf = await RvfFile.create(rvfPath);
    const vecSeg = rvf.createSegment('VEC_SEG');
    const idxSeg = rvf.createSegment('INDEX_SEG');

    let migrated = 0;

    const stmt = db.prepare('SELECT * FROM embeddings ORDER BY accessed_at DESC');
    while (stmt.step()) {
      const row = stmt.getAsObject();
      const embedding = new Float32Array(
        new ArrayBuffer(row.embedding as ArrayBuffer)
      );

      vecSeg.insert(row.key as string, embedding, {
        dimensions: row.dimensions,
        created_at: row.created_at,
        accessed_at: row.accessed_at,
        access_count: row.access_count,
      });
      migrated++;
    }
    stmt.free();

    // Build HNSW index for fast similarity search
    if (migrated > 0) {
      await idxSeg.buildHnsw({ efConstruction: 128, M: 12 });
    }

    await rvf.flush();
    db.close();

    return { migrated, indexBuilt: migrated > 0 };
  }
}

3.5 Backward Compatibility

Read Compatibility (Permanent)

The DatabaseProvider maintains permanent read support for all legacy formats:

typescript
// database-provider.ts — updated selection logic
async function selectBackend(path: string, options: StorageOptions): Promise<IMemoryBackend> {
  const format = detectFormat(path);

  switch (format) {
    case StorageFormat.RVF:
      return new RvfBackend(path, options);

    case StorageFormat.SQLITE_SQLJS:
    case StorageFormat.SQLITE_NATIVE:
      // Legacy read support — loads sql.js only when needed
      const { SqlJsBackend } = await import('./sqljs-backend.js');
      return new SqlJsBackend({ databasePath: path, ...options });

    case StorageFormat.JSON:
      const { JsonBackend } = await import('./json-backend.js');
      return new JsonBackend(path, options.verbose);

    default:
      // New installation — create RVF
      return new RvfBackend(path.replace(/\.[^.]+$/, '.rvf'), options);
  }
}

Key compatibility rules:

  1. Legacy backends become lazy-loadedsql.js moves to a dynamic import(), only loaded when a .db file is detected. Zero cost for new installations.
  2. JSON backend stays — for the simplest possible fallback (no binary deps at all).
  3. .bak files are kept indefinitely — users can manually rollback at any time.
  4. ruflo migrate rollback --storage restores .bak → original and removes .rvf.

Write Compatibility

New writes always go to RVF. The --legacy-format flag forces legacy format:

bash
# Force sql.js backend for specific use case
ruflo memory init --backend sqljs

# Force JSON backend
ruflo memory init --backend json

# Default (RVF)
ruflo memory init

Version Negotiation

The RVF file header includes format version for forward compatibility:

Offset  Size  Field          Value
0x00    4     magic          "RVF\0"
0x04    2     version_major  1
0x06    2     version_minor  0
0x08    8     segment_count  N
0x10    8     created_at     Unix timestamp
0x18    8     flags          Feature flags

If a future RVF version adds incompatible features, the reader can detect this and fall back to the legacy backend or prompt for upgrade.


3A. RVF as Embedding Provider

The Problem with Current Embedding Providers

The existing EmbeddingProvider type union supports 4 providers:

typescript
// v3/@claude-flow/embeddings/src/types.ts
export type EmbeddingProvider = 'openai' | 'transformers' | 'mock' | 'agentic-flow';

Auto-selection hierarchy: agentic-flow > transformers > mock

The two local providers carry heavy dependencies:

  • agentic-flow: 540MB (ONNX runtime, OpenTelemetry, Anthropic SDK)
  • @xenova/transformers: ~45MB (ONNX models, tokenizers)

Both download large ONNX model files at runtime. For the CLI's core use cases (memory search, pattern matching, SONA learning), these are overkill.

RVF as 5th Embedding Provider

RVF's WASM kernel includes SIMD-accelerated vector operations and can serve as a lightweight local embedding provider using the @ruvector/wasm VectorDB (5.5KB microkernel + 46KB control plane):

typescript
// New: RvfEmbeddingConfig
export interface RvfEmbeddingConfig extends EmbeddingBaseConfig {
  provider: 'rvf';
  /** Dimensions for hash-based embeddings (default: 384) */
  dimensions?: number;
  /** Path to .rvf file with pre-computed embeddings */
  rvfPath?: string;
  /** Distance metric (default: 'cosine') */
  metric?: 'cosine' | 'l2' | 'dotproduct';
  /** Use SIMD acceleration when available (default: true) */
  useSIMD?: boolean;
}

RvfEmbeddingService Implementation

typescript
import { RvfDatabase } from '@ruvector/rvf';

export class RvfEmbeddingService extends BaseEmbeddingService {
  readonly provider: EmbeddingProvider = 'rvf';
  private db: RvfDatabase | null = null;
  private readonly dimensions: number;
  private initialized = false;

  constructor(config: RvfEmbeddingConfig) {
    super(config);
    this.dimensions = config.dimensions ?? 384;
  }

  private async initialize(): Promise<void> {
    if (this.initialized) return;

    // Open or create RVF database for embedding storage/cache
    if (config.rvfPath) {
      this.db = await RvfDatabase.open(config.rvfPath);
    } else {
      this.db = await RvfDatabase.create('.cache/embeddings.rvf', {
        dimensions: this.dimensions,
        metric: 'cosine',
        compression: 'scalar',  // int8 quantization by default
      });
    }
    this.initialized = true;
  }

  async embed(text: string): Promise<EmbeddingResult> {
    await this.initialize();

    // Check in-memory LRU cache
    const cached = this.cache.get(text);
    if (cached) {
      return { embedding: cached, latencyMs: 0, cached: true };
    }

    const startTime = performance.now();

    // Generate deterministic embedding using SIMD-accelerated hash
    // This provides consistent embeddings without a neural model
    const embedding = this.hashEmbedding(text);

    // Store in RVF for HNSW-indexed retrieval
    if (this.db) {
      const id = this.textToId(text);
      await this.db.ingestBatch([{ id, vector: embedding }]);
    }

    this.cache.set(text, embedding);
    return { embedding, latencyMs: performance.now() - startTime };
  }

  /**
   * SIMD-accelerated deterministic hash embedding
   * Uses RVF WASM kernel's vector operations when available
   */
  private hashEmbedding(text: string): Float32Array {
    const embedding = new Float32Array(this.dimensions);
    // FNV-1a hash-based embedding (deterministic, fast)
    let hash = 2166136261;
    for (let i = 0; i < text.length; i++) {
      hash ^= text.charCodeAt(i);
      hash = Math.imul(hash, 16777619);
      const idx = (hash >>> 0) % this.dimensions;
      embedding[idx] += 0.1 * (((hash >> 16) & 1) ? 1 : -1);
    }
    // L2 normalize
    let norm = 0;
    for (let i = 0; i < this.dimensions; i++) norm += embedding[i] * embedding[i];
    norm = Math.sqrt(norm) || 1;
    for (let i = 0; i < this.dimensions; i++) embedding[i] /= norm;
    return embedding;
  }
}

Updated EmbeddingProvider Type

typescript
// types.ts — add 'rvf' to union
export type EmbeddingProvider = 'openai' | 'transformers' | 'mock' | 'agentic-flow' | 'rvf';

Updated Auto-Selection Hierarchy

typescript
// createEmbeddingServiceAsync — new auto-select order
// rvf > agentic-flow > transformers > mock
if (provider === 'auto') {
  // 1. Try RVF first (52KB WASM, zero external deps, always available)
  try {
    const { RvfDatabase } = await import('@ruvector/rvf');
    const service = new RvfEmbeddingService({ provider: 'rvf', dimensions: 384 });
    await service.embed('test');
    return service;
  } catch { /* fall through */ }

  // 2. Try agentic-flow (540MB, ONNX-based, highest quality)
  // ... existing code ...

  // 3. Try transformers (45MB, built-in)
  // ... existing code ...

  // 4. Fallback to mock
  // ... existing code ...
}

When to Use Each Provider

ProviderSizeQualitySpeedUse Case
rvf52KBHash-based (good for matching)<1msCLI memory search, pattern matching, SONA
agentic-flow540MBNeural (best semantic)~10msSemantic search, RAG, document similarity
transformers45MBNeural (good semantic)~50msLocal semantic search without agentic-flow
openai0KBNeural (best)~100msProduction semantic search with API
mock0KBRandom (testing only)<0.1msUnit tests, development

The rvf provider is not a replacement for neural embeddings — it provides fast, deterministic, hash-based embeddings that are excellent for exact and near-exact matching. For semantic similarity, agentic-flow or openai remain preferred. The key advantage is that rvf is always available (52KB, no downloads) and provides HNSW-indexed search out of the box.


3B. ruvLLM Storage Integration (Model Weights, LoRA, SONA)

The Problem

The ruvLLM learning system (@ruvector/ruvllm) generates persistent state that currently lives in scattered locations:

ComponentCurrent StorageSizeFormat
SONA patterns (ReasoningBank)In-memory Map<string, LearnedPattern>VariesJS objects
EWC++ weights (EwcManager)In-memory Map<string, Float64Array>VariesTypedArrays
LoRA adapters (LoraManager)JSON serialization (toJSON())SmallJSON string
Trajectories (SonaCoordinator)In-memory buffer, lost on restartVariesJS objects
HNSW memory (RuVectorProvider.searchMemory)ruvLLM HTTP server stateVariesServer-managed

All learning state is lost when the process exits. There is no persistence layer for SONA patterns, EWC weights, or LoRA adapters.

RVF as Unified Learning Storage

RVF segments map naturally to ruvLLM's learning artifacts:

learning.rvf
├── VEC_SEG      — ReasoningBank pattern embeddings (Float32Array)
├── INDEX_SEG    — HNSW index over pattern embeddings
├── KV_SEG       — Pattern metadata (type, successRate, useCount, lastUsed)
├── LOG_SEG      — Trajectory log (append-only, chronological)
├── OVERLAY      — LoRA adapter weights (A/B matrices, serialized)
└── META_SEG     — EWC Fisher diagonals + optimal weights

RvfLearningStore API

typescript
import { RvfDatabase } from '@ruvector/rvf';
import { ReasoningBank, EwcManager, LoraAdapter, SonaCoordinator } from '@ruvector/ruvllm';

export class RvfLearningStore {
  private db: RvfDatabase;

  static async open(path: string): Promise<RvfLearningStore> {
    const db = await RvfDatabase.open(path);
    return new RvfLearningStore(db);
  }

  static async create(path: string): Promise<RvfLearningStore> {
    const db = await RvfDatabase.create(path, {
      dimensions: 64,  // SONA default embedding dim
      metric: 'cosine',
      compression: 'none',  // Keep full precision for learning
    });
    return new RvfLearningStore(db);
  }

  /**
   * Persist ReasoningBank patterns to VEC_SEG + KV_SEG
   */
  async savePatterns(bank: ReasoningBank): Promise<number> {
    const stats = bank.stats();
    const entries: RvfIngestEntry[] = [];

    for (const type of Object.keys(stats.byType)) {
      for (const pattern of bank.getByType(type as PatternType)) {
        entries.push({
          id: pattern.id,
          vector: new Float32Array(pattern.embedding),
          metadata: {
            type: pattern.type,
            successRate: pattern.successRate,
            useCount: pattern.useCount,
            lastUsed: pattern.lastUsed.toISOString(),
          },
        });
      }
    }

    const result = await this.db.ingestBatch(entries);
    return result.accepted;
  }

  /**
   * Load ReasoningBank patterns from VEC_SEG
   */
  async loadPatterns(bank: ReasoningBank): Promise<number> {
    // Query all vectors (use high k to get everything)
    const status = await this.db.status();
    if (status.totalVectors === 0) return 0;

    // Reconstruct patterns from stored vectors + metadata
    // The HNSW index is automatically rebuilt on open
    return status.totalVectors;
  }

  /**
   * Persist LoRA adapter weights
   * Stored as serialized JSON in KV_SEG (small footprint)
   */
  async saveLoraAdapter(id: string, adapter: LoraAdapter): Promise<void> {
    const serialized = adapter.toJSON();
    // Store as metadata entry with special prefix
    await this.db.ingestBatch([{
      id: `lora:${id}`,
      vector: new Float32Array(64).fill(0),  // Placeholder vector
      metadata: {
        type: 'lora_adapter',
        config: JSON.stringify(adapter.getConfig()),
        frozen: adapter.isFrozen() ? 1 : 0,
        params: adapter.numParameters(),
      },
    }]);
  }

  /**
   * Persist EWC++ Fisher diagonals and optimal weights
   */
  async saveEwcState(ewc: EwcManager): Promise<void> {
    const stats = ewc.stats();
    // EWC state stored as metadata entries
    await this.db.ingestBatch([{
      id: 'ewc:state',
      vector: new Float32Array(64).fill(0),
      metadata: {
        tasksLearned: stats.tasksLearned,
        protectionStrength: stats.protectionStrength,
        forgettingRate: stats.forgettingRate,
      },
    }]);
  }

  async close(): Promise<void> {
    await this.db.close();
  }
}

Integration with SonaCoordinator

typescript
// Extended SonaCoordinator with RVF persistence
export class PersistentSonaCoordinator extends SonaCoordinator {
  private store: RvfLearningStore | null = null;

  async enablePersistence(rvfPath: string): Promise<void> {
    this.store = await RvfLearningStore.open(rvfPath);
    // Load existing patterns
    await this.store.loadPatterns(this.getReasoningBank());
  }

  override recordTrajectory(trajectory: QueryTrajectory): void {
    super.recordTrajectory(trajectory);
    // Async persist (fire-and-forget for performance)
    this.store?.savePatterns(this.getReasoningBank());
  }

  async persist(): Promise<void> {
    if (!this.store) return;
    await this.store.savePatterns(this.getReasoningBank());
    await this.store.saveEwcState(this.getEwcManager());
  }

  async shutdown(): Promise<void> {
    await this.persist();
    await this.store?.close();
  }
}

Integration with RuVectorProvider

The existing RuVectorProvider in @claude-flow/providers gains RVF-backed persistence:

typescript
// ruvector-provider.ts — extended with RVF persistence
export class RuVectorProvider extends BaseProvider {
  private learningStore: RvfLearningStore | null = null;

  protected async doInitialize(): Promise<void> {
    // ... existing init code ...

    // Initialize RVF learning persistence
    const rvfPath = this.config.providerOptions?.learningStorePath
      || './data/learning/ruvector.rvf';
    this.learningStore = await RvfLearningStore.create(rvfPath);
  }

  /**
   * Search HNSW memory — now backed by RVF (persistent across restarts)
   */
  override async searchMemory(query: string, limit = 5): Promise<Array<{
    id: string; similarity: number; content: string;
  }>> {
    if (this.learningStore) {
      // Use RVF's native HNSW search instead of HTTP call
      // This works even when ruvLLM server is not running
      const embedding = await this.embedQuery(query);
      const results = await this.learningStore.db.query(embedding, limit);
      return results.map(r => ({
        id: r.id,
        similarity: 1 - r.distance,  // Convert distance to similarity
        content: '',  // Metadata lookup
      }));
    }
    // Fallback to HTTP server
    return super.searchMemory(query, limit);
  }
}

File Layout

data/
├── memory/
│   └── memory.rvf            # Memory entries (KV + VEC + INDEX)
├── events/
│   └── events.rvf            # Event sourcing log (LOG + SNAP)
├── embeddings/
│   └── embeddings.rvf        # Embedding cache (VEC + INDEX)
└── learning/
    └── ruvector.rvf           # SONA patterns + LoRA + EWC (NEW)

Benefits

AspectBefore (in-memory)After (RVF-persisted)
SONA patternsLost on restartPersistent, HNSW-indexed
EWC weightsLost on restartPersistent across sessions
LoRA adaptersManual JSON exportAuto-persisted in OVERLAY segment
TrajectoriesBuffer, capped at 1000Append-only LOG_SEG (unlimited)
Cross-session learningNoneFull continuity
Memory searchHTTP-dependentWorks offline via RVF

4. Implementation Plan

Phase 1: RVF Backend Implementation (Week 1-2)

TaskPackageDescription
P1.1@claude-flow/memoryCreate RvfBackend implementing IMemoryBackend interface
P1.2@claude-flow/memoryMap KV_SEG to memory entry CRUD operations
P1.3@claude-flow/memoryMap VEC_SEG to embedding storage with typed quantization
P1.4@claude-flow/memoryMap INDEX_SEG to HNSW search (replace brute-force cosine)
P1.5@claude-flow/memoryAdd RvfBackend to DatabaseProvider selection chain

Phase 2: Event Store Migration (Week 2-3)

TaskPackageDescription
P2.1@claude-flow/sharedCreate RvfEventLog implementing IEventStore interface
P2.2@claude-flow/sharedMap LOG_SEG to append-only event operations
P2.3@claude-flow/sharedMap SNAP_SEG to snapshot save/load
P2.4@claude-flow/sharedMove sql.js from dependencies to optionalDependencies

Phase 3: Embedding Cache Migration (Week 3)

TaskPackageDescription
P3.1@claude-flow/embeddingsCreate RvfEmbeddingCache implementing IPersistentCache
P3.2@claude-flow/embeddingsLRU eviction via RVF metadata (no SQL DELETE needed)
P3.3@claude-flow/embeddingsTTL via RVF expiry flags (segment-level)
P3.4@claude-flow/embeddingsMove sql.js from dependencies to optionalDependencies

Phase 4: Migration Tooling (Week 3-4)

TaskPackageDescription
P4.1@claude-flow/cliruflo migrate status --storage — detect formats, report state
P4.2@claude-flow/cliruflo migrate run --storage — batch migration with progress
P4.3@claude-flow/cliruflo migrate rollback --storage — restore from .bak
P4.4@claude-flow/cliruflo migrate validate --storage — integrity verification
P4.5@claude-flow/memoryAutomatic migration in DatabaseProvider.openStorage()

Phase 5: RVF Embedding Provider (Week 4)

TaskPackageDescription
P5.1@claude-flow/embeddingsAdd 'rvf' to EmbeddingProvider type union
P5.2@claude-flow/embeddingsImplement RvfEmbeddingService with hash-based embeddings
P5.3@claude-flow/embeddingsUpdate createEmbeddingServiceAsync auto-select: rvf > agentic-flow > transformers > mock
P5.4@claude-flow/embeddingsAdd RvfEmbeddingConfig interface
P5.5@claude-flow/embeddingsTests: RVF provider passes IEmbeddingService test suite

Phase 6: ruvLLM Learning Persistence (Week 4-5)

TaskPackageDescription
P6.1@claude-flow/memoryCreate RvfLearningStore class (VEC + KV + LOG segments for SONA)
P6.2@claude-flow/memoryImplement savePatterns / loadPatterns for ReasoningBank persistence
P6.3@claude-flow/memoryImplement LoRA adapter serialization to RVF OVERLAY segment
P6.4@claude-flow/memoryImplement EWC++ Fisher diagonal persistence to META_SEG
P6.5@claude-flow/providersExtend RuVectorProvider with RVF-backed searchMemory()
P6.6@claude-flow/memoryCreate PersistentSonaCoordinator wrapping SonaCoordinator

Phase 7: Progressive Download System (Week 5-6)

TaskPackageDescription
P7.1@claude-flow/cliImplement ProgressiveDownloader class
P7.2@claude-flow/cliCreate capability manifest schema and seed registry
P7.3@claude-flow/cliruflo capabilities status/install/remove/list/prefetch CLI commands
P7.4@claude-flow/embeddingsIntegrate progressive download into createEmbeddingServiceAsync
P7.5@claude-flow/providersIntegrate progressive download into RuVectorProvider for LLM models
P7.6@claude-flow/cliPackage Phase 1-2 capabilities as .rvf files on CDN/IPFS

Phase 8: Dependency Cleanup (Week 6-7)

TaskPackageDescription
P8.1@claude-flow/sharedRemove sql.js from hard dependencies
P8.2@claude-flow/memoryRemove sql.js from hard dependencies
P8.3@claude-flow/embeddingsRemove sql.js from hard dependencies
P8.4AllLazy-load sql.js only for legacy .db file reads
P8.5AllUpdate Docker images to exclude sql.js entirely
P8.6AllMove agentic-flow, @xenova/transformers to progressive downloads
P8.7RootPublish updated packages to npm

5. RVF Backend API Design

RvfBackend (implements IMemoryBackend)

typescript
import { RvfFile, VecSegment, KvSegment, IndexSegment } from '@ruvector/rvf';

export class RvfBackend implements IMemoryBackend {
  private rvf: RvfFile;
  private kv: KvSegment;
  private vec: VecSegment;
  private idx: IndexSegment;

  async initialize(): Promise<void> {
    this.rvf = await RvfFile.open(this.path, { create: true });
    this.kv = this.rvf.segment('KV_SEG');
    this.vec = this.rvf.segment('VEC_SEG');
    this.idx = this.rvf.segment('INDEX_SEG');
  }

  async store(entry: MemoryEntry): Promise<void> {
    // Metadata goes to KV segment
    this.kv.put(entry.id, {
      key: entry.key,
      content: entry.content,
      type: entry.type,
      namespace: entry.namespace,
      tags: entry.tags,
      metadata: entry.metadata,
      created_at: entry.createdAt,
      updated_at: entry.updatedAt,
    });

    // Vector goes to VEC segment (with optional quantization)
    if (entry.embedding) {
      this.vec.insert(entry.id, entry.embedding, {
        quantization: this.config.quantization || 'fp32',
      });
    }
  }

  async search(embedding: Float32Array, options: SearchOptions): Promise<SearchResult[]> {
    // HNSW search — 150x-12,500x faster than brute-force
    const results = this.idx.search(embedding, {
      k: options.k,
      efSearch: options.efSearch || 64,
      threshold: options.threshold,
    });

    return results.map(r => ({
      entry: this.kv.get(r.id),
      score: r.similarity,
      distance: r.distance,
    }));
  }

  async persist(): Promise<void> {
    await this.rvf.flush();  // Append-only — no full rewrite needed
  }

  async shutdown(): Promise<void> {
    await this.rvf.flush();
    this.rvf.close();
  }
}

RvfEventLog (implements IEventStore)

typescript
export class RvfEventLog implements IEventStore {
  private rvf: RvfFile;
  private log: LogSegment;
  private snap: SnapSegment;

  async append(event: DomainEvent): Promise<void> {
    // Append-only — crash-safe, no WAL needed
    this.log.append(event.id, {
      type: event.type,
      aggregate_id: event.aggregateId,
      version: event.version,
      timestamp: event.timestamp,
      payload: event.payload,
    });
  }

  async getEvents(aggregateId: string, fromVersion?: number): Promise<DomainEvent[]> {
    return this.log.scan({
      filter: { aggregate_id: aggregateId },
      fromVersion,
      order: 'asc',
    });
  }

  async saveSnapshot(aggregateId: string, state: any, version: number): Promise<void> {
    this.snap.put(aggregateId, { state, version, timestamp: Date.now() });
  }
}

6. Size Impact Analysis

Before (current)

PackageHard DepsTotal Install Weight
@claude-flow/sharedsql.js (18MB)~30MB
@claude-flow/memorysql.js (18MB, deduped)~5MB own
@claude-flow/embeddingssql.js (18MB, deduped)~3MB own
Total sql.js contribution~18MB (deduped)

After (RVF)

PackageHard DepsTotal Install Weight
@claude-flow/shared@ruvector/rvf (WASM: 52KB, native: ~2MB)~13MB (−17MB)
@claude-flow/memory(uses shared's rvf)~5MB (no change)
@claude-flow/embeddings(uses shared's rvf)~3MB (no change)
Total RVF contribution52KB WASM or ~2MB native

Net savings

MetricBeforeAfterImprovement
sql.js install size18MB0 (lazy optional)−18MB
RVF install size052KB (WASM)+52KB
Vector searchO(n) brute-forceO(log n) HNSW150x-12,500x faster
Quantizationfp32 onlyfp16/int8/int4/binary2-8x memory reduction
Docker lite image324MB~306MB−18MB
Cold start vectorsLoad all into memoryProgressive 3-layer70% recall on first query
Embedding provider (auto)540MB (agentic-flow)52KB (rvf)−540MB for basic use
SONA learning persistenceNone (lost on restart)RVF fileFull cross-session continuity
LoRA adapter storageManual JSON exportAuto-persisted OVERLAYZero-effort persistence

7. Risk Assessment

RiskLikelihoodImpactMitigation
RVF format changes in futureLowMediumVersion header in file; forward-compat reads
Migration corrupts dataLowHighAtomic write (temp + rename); .bak always kept
WASM fallback slower than sql.jsMediumLowRVF WASM kernel is 52KB vs 18MB; simpler = faster
Users depend on SQLite toolingMediumLowLegacy read support permanent; --backend sqljs flag
@ruvector/rvf npm availabilityLowHighVendor WASM binary into @claude-flow/shared as fallback

8. Testing Strategy

Unit Tests:
  ✓ RvfBackend passes all existing IMemoryBackend test suite
  ✓ RvfEventLog passes all existing IEventStore test suite
  ✓ RvfEmbeddingCache passes all existing IPersistentCache test suite
  ✓ Format detection correctly identifies .db, .json, .rvf files
  ✓ Migration produces byte-identical data (hash comparison)
  ✓ RvfEmbeddingService passes IEmbeddingService test suite
  ✓ RvfEmbeddingService produces deterministic embeddings for same input
  ✓ RvfLearningStore saves/loads ReasoningBank patterns correctly
  ✓ RvfLearningStore saves/loads LoRA adapters with weight fidelity
  ✓ RvfLearningStore saves/loads EWC++ state correctly

Integration Tests:
  ✓ Auto-migration on first access with legacy .db file
  ✓ Auto-migration on first access with legacy .json file
  ✓ Rollback restores .bak to original
  ✓ CLI `migrate status/run/rollback/validate` commands
  ✓ Mixed-format project (some .db, some .rvf) works
  ✓ Auto-select picks 'rvf' provider when no heavy deps installed
  ✓ Auto-select picks 'agentic-flow' when available (higher quality)
  ✓ PersistentSonaCoordinator survives process restart with patterns intact
  ✓ RuVectorProvider.searchMemory works offline via RVF (no HTTP server)

Performance Tests:
  ✓ HNSW search <1ms for 10K vectors (vs ~100ms brute-force)
  ✓ RVF write throughput >50K ops/sec
  ✓ Memory usage <50% of sql.js for equivalent dataset
  ✓ Cold start <100ms (progressive HNSW loading)
  ✓ RVF hash embedding <0.1ms per text (vs 10-100ms for neural)
  ✓ Learning store persist <10ms for 1000 patterns

Backward Compatibility Tests:
  ✓ v3.5.x .db files open correctly in v3.6+ with auto-migration
  ✓ v3.5.x .json files open correctly in v3.6+ with auto-migration
  ✓ --backend sqljs flag still works (lazy-loads sql.js)
  ✓ --backend json flag still works
  ✓ Docker image without sql.js starts and serves MCP
  ✓ Existing 'agentic-flow' provider unaffected by new 'rvf' provider

9. Consequences

Positive

  • −18MB hard dependency removed from core install path
  • 150x-12,500x faster vector search via native HNSW (vs brute-force cosine)
  • 2-8x memory reduction via quantization (int8/int4) for large embedding sets
  • Crash-safe append-only format — no manual export/save cycle
  • Progressive loading — 70% recall on first query before full index loads
  • Unified format — one .rvf file replaces separate .db + index files
  • COW branching — cheap snapshots for event sourcing (<3ms)
  • Docker images shrink further when sql.js is fully eliminated
  • Zero-dep local embeddings — 52KB RVF provider replaces 540MB agentic-flow for basic use
  • Persistent learning — SONA patterns, LoRA adapters, EWC weights survive restarts
  • Offline intelligenceRuVectorProvider.searchMemory works without HTTP server

Negative

  • Migration complexity — must support 3 legacy formats (sql.js .db, better-sqlite3 .db, JSON)
  • New dependency@ruvector/rvf replaces sql.js (smaller, but still a dep)
  • Learning curve — team must understand RVF segment model vs SQL tables
  • Loss of SQL tooling — can't sqlite3 memory.db to inspect data (mitigated by ruflo memory list)
  • Hash embeddings are not semanticrvf provider good for matching, not meaning (mitigated by fallback to neural providers)

Neutral

  • Backward compatibility maintained — legacy formats always readable
  • No user-visible API changes — same IMemoryBackend interface, same CLI commands
  • Opt-in for existing installs — auto-migration on first access, manual rollback available

10. Progressive Download Architecture

The Problem

Current install paths are all-or-nothing:

  • npx ruflo@latest installs 1.3GB (all optional deps)
  • --omit=optional drops to ~30MB but loses all intelligence features
  • Users who want some advanced features must install all of them

Progressive Capability Manifold

RVF's segment model enables a progressive download approach where capabilities are fetched on-demand and stored as RVF segments:

Phase 0: Core CLI (always installed)
  ruflo (5KB) → @claude-flow/cli (9MB) → @claude-flow/shared (~13MB with RVF)
  Total: ~22MB — MCP server, memory, events, CLI commands

Phase 1: Lightweight Embeddings (downloaded on first use)
  @ruvector/rvf WASM kernel (52KB)
  Hash-based embeddings — no neural model needed
  Downloaded to: ~/.ruflo/capabilities/rvf-wasm.rvf

Phase 2: Neural Embeddings (downloaded on demand)
  all-MiniLM-L6-v2 ONNX model (~22MB)
  Stored as: ~/.ruflo/capabilities/models/minilm-l6-v2.rvf
  Segment: WASM_SEG (model weights) + META_SEG (tokenizer config)

Phase 3: Local LLM Inference (downloaded on demand)
  GGUF model files via ruvLLM
  Stored as: ~/.ruflo/capabilities/models/<model>.rvf
  Segment: MODEL_SEG (quantized weights) + OVERLAY (LoRA adapters)

Phase 4: Advanced Intelligence (downloaded on demand)
  CNN/GNN/Transformer kernels for specialized tasks
  Stored as: ~/.ruflo/capabilities/kernels/<kernel>.rvf
  Segment: KERNEL_SEG (WASM bytecode) + EBPF_SEG (filters)

Capability Registry

A manifest file tracks what's installed and what's available:

typescript
interface CapabilityManifest {
  version: string;
  capabilities: Record<string, CapabilityEntry>;
}

interface CapabilityEntry {
  id: string;
  name: string;
  phase: 0 | 1 | 2 | 3 | 4;
  status: 'installed' | 'available' | 'downloading' | 'failed';
  size: number;            // Download size in bytes
  installedSize: number;   // On-disk size after RVF packing
  rvfPath: string;         // Path to .rvf file
  dependencies: string[];  // Other capabilities required
  provides: string[];      // Features this capability enables
  checksum: string;        // SHA-256 of the .rvf file
  downloadUrl: string;     // CDN/IPFS URL
  lastUpdated: string;     // ISO timestamp
}

Progressive Download Manager

typescript
import { RvfDatabase } from '@ruvector/rvf';

export class ProgressiveDownloader {
  private manifestPath: string;
  private capabilitiesDir: string;

  constructor(rufloHome = '~/.ruflo') {
    this.manifestPath = `${rufloHome}/capabilities/manifest.json`;
    this.capabilitiesDir = `${rufloHome}/capabilities`;
  }

  /**
   * Ensure a capability is available, downloading if needed
   * Called lazily on first use (not at install time)
   */
  async ensure(capabilityId: string): Promise<string> {
    const manifest = await this.loadManifest();
    const entry = manifest.capabilities[capabilityId];

    if (!entry) throw new Error(`Unknown capability: ${capabilityId}`);
    if (entry.status === 'installed') return entry.rvfPath;

    // Download the capability
    console.info(`[ruflo] Downloading ${entry.name} (${this.formatSize(entry.size)})...`);
    entry.status = 'downloading';
    await this.saveManifest(manifest);

    const rvfPath = `${this.capabilitiesDir}/${capabilityId}.rvf`;

    // Download dependencies first
    for (const dep of entry.dependencies) {
      await this.ensure(dep);
    }

    // Download and verify
    await this.downloadWithProgress(entry.downloadUrl, rvfPath, entry.size);
    await this.verifyChecksum(rvfPath, entry.checksum);

    entry.status = 'installed';
    entry.rvfPath = rvfPath;
    await this.saveManifest(manifest);

    console.info(`[ruflo] ✓ ${entry.name} ready`);
    return rvfPath;
  }

  /**
   * Pre-download capabilities for offline use
   */
  async prefetch(phase: number): Promise<void> {
    const manifest = await this.loadManifest();
    const toDownload = Object.values(manifest.capabilities)
      .filter(c => c.phase <= phase && c.status !== 'installed');

    for (const cap of toDownload) {
      await this.ensure(cap.id);
    }
  }

  /**
   * List installed vs available capabilities
   */
  async status(): Promise<{
    installed: CapabilityEntry[];
    available: CapabilityEntry[];
    totalInstalled: number;
    totalAvailable: number;
  }> {
    const manifest = await this.loadManifest();
    const all = Object.values(manifest.capabilities);
    return {
      installed: all.filter(c => c.status === 'installed'),
      available: all.filter(c => c.status === 'available'),
      totalInstalled: all.filter(c => c.status === 'installed')
        .reduce((s, c) => s + c.installedSize, 0),
      totalAvailable: all.filter(c => c.status === 'available')
        .reduce((s, c) => s + c.size, 0),
    };
  }

  /**
   * Remove a capability to free disk space
   */
  async remove(capabilityId: string): Promise<void> {
    const manifest = await this.loadManifest();
    const entry = manifest.capabilities[capabilityId];
    if (!entry || entry.status !== 'installed') return;

    // Check if other capabilities depend on this one
    const dependents = Object.values(manifest.capabilities)
      .filter(c => c.dependencies.includes(capabilityId) && c.status === 'installed');
    if (dependents.length > 0) {
      throw new Error(
        `Cannot remove ${capabilityId}: required by ${dependents.map(d => d.id).join(', ')}`
      );
    }

    const { unlinkSync } = await import('fs');
    unlinkSync(entry.rvfPath);
    entry.status = 'available';
    await this.saveManifest(manifest);
  }
}

CLI Commands

bash
# Check what's installed and available
ruflo capabilities status

# Download specific capability
ruflo capabilities install neural-embeddings

# Download all capabilities up to phase N
ruflo capabilities prefetch --phase 2

# Remove a capability
ruflo capabilities remove local-llm-qwen

# List all available models/kernels
ruflo capabilities list --phase 3
ruflo capabilities list --type model
ruflo capabilities list --type kernel

Available Capabilities by Phase

PhaseCapability IDSizeProvides
0core-cli22MBCLI, MCP, memory, events (always installed)
1rvf-wasm52KBHash embeddings, HNSW search, RVF storage
1rvf-native~2MBNative NAPI-RS backend (faster than WASM)
2model-minilm-l6-v222MBall-MiniLM-L6-v2 ONNX embedding model
2model-bge-small33MBbge-small-en-v1.5 embedding model
2model-e5-small33MBe5-small-v2 embedding model
3llm-qwen-0.5b400MBQwen 2.5 0.5B (CPU-friendly)
3llm-smollm-135m100MBSmolLM 135M (ultra-lightweight)
3llm-phi-4-mini2.2GBPhi-4 mini (high quality)
3llm-llama-3.2-1b1.3GBLlama 3.2 1B
4kernel-cnn-classifier5MBCNN image classification kernel
4kernel-gnn-graph8MBGNN graph analysis kernel
4kernel-transformer-seq12MBTransformer sequence processing
4sona-full15MBFull SONA intelligence (MoE + EWC++)

RVF as Capability Container

Each capability is packaged as a single .rvf file:

model-minilm-l6-v2.rvf
├── META_SEG       — Model metadata (name, dimensions, tokenizer config)
├── WASM_SEG       — ONNX model weights (quantized to int8)
├── INDEX_SEG      — Pre-built vocabulary HNSW index
└── MANIFEST       — Version, checksum, dependencies
llm-qwen-0.5b.rvf
├── META_SEG       — Model card, quantization info
├── MODEL_SEG      — GGUF model weights (Q4_K_M quantization)
├── OVERLAY        — Default LoRA adapter
├── KV_SEG         — Tokenizer vocabulary
└── MANIFEST       — Version, hardware requirements
kernel-gnn-graph.rvf
├── KERNEL_SEG     — WASM kernel bytecode
├── EBPF_SEG       — eBPF filter programs
├── META_SEG       — API schema, input/output specs
└── MANIFEST       — Version, supported operations

Integration with Embedding Service

typescript
// createEmbeddingServiceAsync — progressive capability loading
if (provider === 'auto') {
  const downloader = new ProgressiveDownloader();

  // Phase 1: RVF hash embeddings (always available after first use, 52KB)
  try {
    const rvfPath = await downloader.ensure('rvf-wasm');
    return new RvfEmbeddingService({ provider: 'rvf', rvfPath });
  } catch { /* fall through */ }

  // Phase 2: Neural embeddings (downloaded on demand, 22MB)
  try {
    const modelPath = await downloader.ensure('model-minilm-l6-v2');
    return new RvfNeuralEmbeddingService({ provider: 'rvf-neural', modelPath });
  } catch { /* fall through */ }

  // Fallback to mock
  return new MockEmbeddingService({ dimensions: 384 });
}

Migration from npm Optional Dependencies

The progressive download approach replaces npm's optionalDependencies:

Current (npm optional)Progressive (RVF)
Installed at npm install timeDownloaded on first use
All-or-nothing (--omit=optional)Granular per-capability
1.3GB if included22MB base + on-demand
Stale until npm updateVersion-checked per capability
No rollbackRemove individual capabilities
Platform-specific builds at installWASM universal + native optional

11. References