Back to Woods

Backend Adaptability Matrix

docs/BACKEND_MATRIX.md

1.2.030.3 KB
Original Source

Backend Adaptability Matrix

Purpose

This document provides deep analysis of every backend option Woods supports, with tradeoffs, performance characteristics, and guidance for selecting the right combination for a given environment.

The goal is that an agent or developer reading this document can make an informed backend selection without external research.


Vector Stores

pgvector (PostgreSQL Extension)

What it is: PostgreSQL extension that adds vector similarity search directly to Postgres.

Best for: Teams already on PostgreSQL who want to minimize infrastructure. Rails 8 apps with the Solid suite. Codebases under ~5000 units.

Strengths:

  • Zero additional infrastructure if you're on PostgreSQL
  • Transactional consistency with metadata (same database)
  • Familiar SQL interface, works with ActiveRecord
  • Supports HNSW and IVFFlat indexing
  • Filtered search via standard WHERE clauses
  • Backed by strong open-source community

Weaknesses:

  • Search performance degrades at high scale (>100K vectors) without careful tuning
  • IVFFlat requires periodic reindexing after large batch inserts
  • HNSW index builds are memory-intensive
  • Competes for resources with your application database
  • No built-in sharding for vectors

Configuration:

ruby
config.vector_store = :pgvector
config.vector_store_connection = ENV["DATABASE_URL"]
# Or separate database:
config.vector_store_connection = ENV["VECTOR_DATABASE_URL"]

Schema:

sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE woods_embeddings (
  id TEXT PRIMARY KEY,
  embedding vector(1536),
  metadata JSONB NOT NULL DEFAULT '{}',
  created_at TIMESTAMPTZ DEFAULT NOW(),
  updated_at TIMESTAMPTZ DEFAULT NOW()
);

-- HNSW index (preferred for < 1M vectors)
CREATE INDEX ON woods_embeddings
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- GIN index on metadata for filtered queries
CREATE INDEX ON woods_embeddings
  USING gin (metadata jsonb_path_ops);

Performance notes:

  • HNSW: ~5ms search at 10K vectors, ~20ms at 100K. Memory: ~1.5x vector size.
  • IVFFlat: Faster builds, slower search. Better for bulk insert then query patterns.
  • For codebase indexing (~1000-5000 units, potentially 5000-20000 chunks), HNSW is appropriate.
  • Recommend a separate database from your application if running in production.

When to use: PostgreSQL is your primary database, you value simplicity, and scale is under ~50K vectors.

When to avoid: MySQL is your primary database (can't use pgvector), you need sub-millisecond search, or you're indexing multiple large codebases.


Qdrant

What it is: Purpose-built vector database with native filtering, written in Rust.

Best for: Teams with Docker-based infrastructure who want dedicated vector search. Self-hosted environments. Multi-codebase indexing.

Strengths:

  • Purpose-built for vector search (consistently fast)
  • Native payload filtering (no joins needed)
  • Built-in quantization for memory efficiency
  • Excellent Docker support, trivial to add to docker-compose
  • gRPC and REST APIs
  • Snapshot and backup support
  • Handles batch operations well

Weaknesses:

  • Additional infrastructure to manage
  • Separate from your application database (no transactional consistency)
  • Ruby client library is less mature than PostgreSQL tooling
  • Overkill for small codebases

Configuration:

ruby
config.vector_store = :qdrant
config.vector_store_url = ENV.fetch("QDRANT_URL", "http://localhost:6333")
config.vector_store_collection = "woods"

Docker Compose:

yaml
services:
  qdrant:
    image: qdrant/qdrant:v1.12.1
    ports:
      - "6333:6333"
      - "6334:6334"
    volumes:
      - qdrant-data:/qdrant/storage
    environment:
      QDRANT__SERVICE__GRPC_PORT: 6334
    deploy:
      resources:
        limits:
          memory: 512M  # Sufficient for ~50K vectors

volumes:
  qdrant-data:

Performance notes:

  • ~2ms search at 10K vectors, ~5ms at 100K
  • Memory: ~100MB for 10K 1536-dim vectors with HNSW
  • Quantization can reduce memory by 4x with minimal quality loss
  • Batch upsert: ~1000 vectors/second

When to use: Docker infrastructure, self-hosted, need for filtered search, multi-codebase, or want separation of concerns between app DB and vector search.

When to avoid: Minimal infrastructure footprint is a priority, or team doesn't want another service to manage.


Pinecone

What it is: Fully managed cloud vector database.

Best for: Teams that prefer managed services and don't want to operate vector infrastructure.

Strengths:

  • Zero operational overhead
  • Scales automatically
  • Good SDK and documentation
  • Metadata filtering
  • Serverless pricing option (pay per query)

Weaknesses:

  • Vendor lock-in
  • Data leaves your infrastructure
  • Latency depends on region
  • Cost scales with usage
  • No self-hosted option
  • Free tier is limited

Configuration:

ruby
config.vector_store = :pinecone
config.vector_store_api_key = ENV["PINECONE_API_KEY"]
config.vector_store_environment = "us-east-1"
config.vector_store_index = "woods"

When to use: Cloud-native teams, no ops capacity for self-hosting, data sensitivity is not a concern.

When to avoid: Self-hosted requirements, cost sensitivity at scale, data must stay on-premise.


SQLite-vss / FAISS (Local)

What it is: File-based vector search using SQLite for metadata and FAISS for vector operations.

Best for: Local development, zero-dependency setups, evaluation, single-developer use.

Strengths:

  • Zero external dependencies
  • No network latency
  • Works offline
  • Trivial setup
  • Good for testing and development

Weaknesses:

  • Single-process access (no concurrent writes)
  • No network access (can't share across services)
  • FAISS index must fit in memory
  • Limited filtering capabilities
  • No built-in persistence management for FAISS

Configuration:

ruby
config.vector_store = :sqlite_faiss
# Automatically uses output_dir for storage

When to use: Getting started, local development, evaluation, CI testing.

When to avoid: Multi-user access, production workloads, CI pipelines that need shared state.


Chroma

What it is: Open-source embedding database with a focus on developer experience.

Best for: Prototyping, Python-heavy teams (Ruby client exists but is third-party).

Strengths:

  • Simple API
  • Built-in embedding functions
  • Document-oriented (stores text alongside vectors)
  • Good developer experience

Weaknesses:

  • Ruby client is community-maintained
  • Less mature than Qdrant for production use
  • Limited filtering compared to Qdrant/pgvector
  • Performance characteristics less documented

When to use: Prototyping, if you're already using Chroma elsewhere.

When to avoid: Production Rails apps, when Ruby-native tooling matters.


Milvus

What it is: Open-source vector database designed for massive scale.

Best for: Very large deployments, multi-tenant indexing across many codebases.

Strengths:

  • Handles billions of vectors
  • GPU-accelerated search
  • Multi-tenancy support
  • Rich indexing options

Weaknesses:

  • Complex to deploy (requires etcd, MinIO, Pulsar)
  • Overkill for single-codebase use
  • Operational complexity
  • Ruby client is third-party

When to use: Indexing dozens of large codebases, enterprise deployment.

When to avoid: Single codebase, minimal infrastructure preference.


Embedding Providers

OpenAI text-embedding-3-small

Dimensions: 1536 Max tokens: 8191 Cost: ~$0.02 per 1M tokens Latency: ~100ms single, ~500ms batch of 100

Strengths: Good quality/cost ratio, fast, well-documented, reliable API. Weaknesses: Data sent to OpenAI, API dependency, not code-optimized.

Best for: General use, getting started, teams already using OpenAI.

OpenAI text-embedding-3-large

Dimensions: 3072 Max tokens: 8191 Cost: ~$0.13 per 1M tokens Latency: ~150ms single, ~800ms batch of 100

Strengths: Higher quality than small, supports dimension reduction (can use 1536 dims for compatibility). Weaknesses: 6.5x cost of small, marginal quality improvement for code.

Best for: When retrieval quality is paramount and cost is not a concern.

Voyage Code 3 / Code 2

Dimensions: 1024 (Code 3) / 1536 (Code 2) Max tokens: 32000 (Code 3) / 16000 (Code 2) Cost: ~$0.06 per 1M tokens (verify current pricing for Code 3) Latency: ~120ms single

Strengths: Specifically trained on code. Code 3's 32K context window means even very large units can be embedded without truncation. Code 3 uses 1024 dimensions (smaller vectors, less storage) while maintaining strong retrieval quality. Benchmarks show strong performance on code retrieval tasks. Weaknesses: Smaller community than OpenAI. API availability/reliability less battle-tested.

Best for: Code-specific retrieval where embedding quality matters. Code 3's 32K token window is significant — many extracted units exceed 8K tokens, especially with inlined concerns. The lower dimensionality (1024 vs 1536) also reduces vector storage costs.

Ollama / Nomic-embed-text (Self-hosted)

Dimensions: 768 (nomic-embed-text) / varies by model Max tokens: 8192 (nomic) Cost: Hardware only Latency: ~200ms single (GPU), ~2s single (CPU)

Strengths: Fully self-hosted, no data leaves infrastructure, no API costs, works offline. Weaknesses: Requires GPU for reasonable performance (CPU is 10x slower), lower quality than commercial models, smaller dimensions may reduce retrieval precision.

Best for: Security-sensitive environments, air-gapped networks, cost-sensitive at scale.

Anthropic Embeddings

Note: Anthropic does not currently offer a standalone embedding API. If this changes, it would be a natural fit given the system's agentic focus. Monitor for availability.

Embedding Selection Guidance

PriorityRecommendation
Best quality for codeVoyage Code 3
Best general-purposeOpenAI text-embedding-3-small
Best for large unitsVoyage Code 3 (32K context)
Lowest costOllama + nomic-embed-text
No external dependenciesOllama + nomic-embed-text
Maximum qualityOpenAI text-embedding-3-large

Critical consideration: Embedding dimensions must match across your entire index. Changing embedding providers requires a full re-index. Choose carefully at the start.


Metadata Stores

PostgreSQL

Best for: Most production deployments. JSON operators for metadata queries, full-text search for keywords, recursive CTEs for graph traversal (can double as graph store).

Key features:

  • JSONB columns with GIN indexes for fast metadata filtering
  • ts_vector / ts_query for full-text keyword search
  • Recursive CTEs for graph operations (can serve as graph store too)
  • Familiar to Rails developers
  • Works with ActiveRecord

Dual use as graph store: PostgreSQL can handle both metadata and graph storage with recursive CTEs, eliminating the need for a separate graph backend at moderate scale (~5000 nodes).

sql
-- Recursive CTE for dependency traversal
WITH RECURSIVE deps AS (
  SELECT target_id, 1 as depth
  FROM edges
  WHERE source_id = 'Order'
  UNION ALL
  SELECT e.target_id, d.depth + 1
  FROM edges e
  JOIN deps d ON e.source_id = d.target_id
  WHERE d.depth < 3
)
SELECT * FROM deps;

MySQL

Best for: Teams already on MySQL who don't want to add PostgreSQL. This is common — MySQL (including Percona, MariaDB, Aurora) remains the most prevalent Rails database in production.

Key features:

  • JSON functions for metadata (JSON_EXTRACT, JSON_CONTAINS, JSON_OVERLAPS)
  • Generated columns for indexable JSON extraction
  • Full-text indexes with MATCH ... AGAINST for keyword search
  • Recursive CTEs in 8.0+ for graph traversal (dual-use as graph store)
  • Familiar to Rails developers, works with ActiveRecord

Schema:

sql
CREATE TABLE woods_units (
  id VARCHAR(255) PRIMARY KEY,
  unit_type VARCHAR(50) NOT NULL,
  namespace VARCHAR(255),
  file_path VARCHAR(500),
  source_code MEDIUMTEXT,
  metadata JSON NOT NULL,
  
  -- Generated columns for indexable fields extracted from JSON
  change_frequency VARCHAR(20) GENERATED ALWAYS AS (
    JSON_UNQUOTE(JSON_EXTRACT(metadata, '$.git.change_frequency'))
  ) STORED,
  importance VARCHAR(20) GENERATED ALWAYS AS (
    JSON_UNQUOTE(JSON_EXTRACT(metadata, '$.importance'))
  ) STORED,
  association_count INT GENERATED ALWAYS AS (
    JSON_LENGTH(JSON_EXTRACT(metadata, '$.associations'))
  ) STORED,
  
  created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  
  INDEX idx_type (unit_type),
  INDEX idx_namespace (namespace),
  INDEX idx_change_freq (change_frequency),
  INDEX idx_importance (importance),
  FULLTEXT idx_fulltext (id, file_path, source_code)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;

Keyword search:

sql
-- Full-text search across identifiers and source
SELECT id, MATCH(id, file_path, source_code) AGAINST('checkout payment' IN BOOLEAN MODE) AS relevance
FROM woods_units
WHERE MATCH(id, file_path, source_code) AGAINST('checkout payment' IN BOOLEAN MODE)
ORDER BY relevance DESC
LIMIT 20;

-- JSON-based method name search
SELECT id FROM woods_units
WHERE JSON_CONTAINS(
  JSON_EXTRACT(metadata, '$.method_names'),
  '"process_payment"'
);

-- Multi-value filter on metadata
SELECT id FROM woods_units
WHERE unit_type IN ('model', 'service')
  AND change_frequency IN ('hot', 'active')
  AND JSON_OVERLAPS(
    JSON_EXTRACT(metadata, '$.dependencies'),
    JSON_ARRAY('Order', 'Payment')
  );

Graph traversal (MySQL 8.0+ recursive CTEs):

sql
CREATE TABLE woods_edges (
  source_id VARCHAR(255) NOT NULL,
  target_id VARCHAR(255) NOT NULL,
  relationship VARCHAR(50) NOT NULL,
  PRIMARY KEY (source_id, target_id, relationship),
  INDEX idx_target (target_id),
  INDEX idx_relationship (relationship)
) ENGINE=InnoDB;

-- Forward dependency traversal
WITH RECURSIVE deps AS (
  SELECT target_id, relationship, 1 AS depth
  FROM woods_edges
  WHERE source_id = 'Order'
  UNION ALL
  SELECT e.target_id, e.relationship, d.depth + 1
  FROM woods_edges e
  INNER JOIN deps d ON e.source_id = d.target_id
  WHERE d.depth < 3
)
SELECT DISTINCT target_id, MIN(depth) as min_depth
FROM deps
GROUP BY target_id
ORDER BY min_depth;

-- Reverse: who depends on Order?
WITH RECURSIVE dependents AS (
  SELECT source_id, relationship, 1 AS depth
  FROM woods_edges
  WHERE target_id = 'Order'
  UNION ALL
  SELECT e.source_id, e.relationship, d.depth + 1
  FROM woods_edges e
  INNER JOIN dependents d ON e.target_id = d.source_id
  WHERE d.depth < 3
)
SELECT DISTINCT source_id, MIN(depth) as min_depth
FROM dependents
GROUP BY source_id
ORDER BY min_depth;

Dual use as graph store: MySQL 8.0+ supports recursive CTEs, so it can serve as both metadata and graph store — same pattern as PostgreSQL. Performance is adequate for ~5000 nodes. For Percona Cluster / Group Replication setups, the graph table should use InnoDB with appropriate row-level locking.

MariaDB note: MariaDB 10.2+ supports recursive CTEs and JSON functions but the syntax diverges slightly from MySQL 8.0 (e.g., JSON_VALUE vs JSON_EXTRACT). Generated columns are supported. If targeting MariaDB, test JSON function compatibility.

Vector search with MySQL: MySQL has no native vector extension equivalent to pgvector. For MySQL-primary stacks, vector search must use a separate backend:

  • Qdrant — Best fit for Docker/self-hosted MySQL environments
  • Pinecone — Best fit for managed/cloud MySQL environments
  • FAISS — Best fit for local development alongside MySQL

This is the primary architectural difference from PostgreSQL: MySQL handles metadata and graph, but vectors live elsewhere.

Configuration:

ruby
config.metadata_store = :mysql
config.metadata_store_connection = ENV["DATABASE_URL"]
# Or use the application's existing ActiveRecord connection:
config.metadata_store_connection = :active_record

# Vector store must be separate
config.vector_store = :qdrant  # or :pinecone, :sqlite_faiss

Performance notes:

  • Generated columns + B-tree indexes make filtered queries fast without parsing JSON at query time
  • JSON_CONTAINS and JSON_OVERLAPS on unindexed JSON paths do full scans — use generated columns for frequently filtered fields
  • Full-text search with MATCH ... AGAINST is fast but requires InnoDB full-text indexes (available since 5.6)
  • Recursive CTEs in MySQL 8.0 are ~20-40% slower than PostgreSQL for deep traversals but adequate for codebase-scale graphs
  • For Percona XtraDB Cluster, writes to codebase tables should use a dedicated connection to avoid certification conflicts with application writes

When to use: MySQL/Percona/Aurora is your primary database, team has MySQL expertise, you don't want to introduce PostgreSQL.

When to avoid: You need vector search in the same database (use PostgreSQL + pgvector instead), or you're on MySQL < 8.0 (no recursive CTEs, limited JSON support).

SQLite

Best for: Local development, zero-dependency setups, testing.

Key features:

  • JSON1 extension for metadata queries
  • FTS5 for full-text search
  • Zero setup
  • Single-file database

Limitations:

  • Single writer at a time
  • No network access
  • Limited JSON querying compared to PostgreSQL

In-Memory

Best for: Testing, evaluation, small codebases.

Loads from extracted JSON files on startup. All queries run against in-memory hash maps. Fast but ephemeral.


Graph Stores

In-Memory (Default)

Loads dependency_graph.json into a Ruby hash structure. Supports BFS traversal with visited set, PageRank scoring, and structural analysis via GraphAnalyzer (orphan detection, dead-end detection, hub identification, cycle detection, bridge detection). Suitable for up to ~5000 nodes.

Memory: ~10MB for 2000 nodes with average 5 edges each. Traversal: < 1ms for depth-2 BFS. Analysis: GraphAnalyzer provides orphans, dead_ends, hubs(limit:), cycles, bridges(limit:, sample_size:), and a combined analyze method.

MySQL 8.0+ (Recursive CTEs)

Same pattern as PostgreSQL — stores edges in a table, traverses with recursive queries. MySQL 8.0 introduced recursive CTEs (WITH RECURSIVE), making this viable for MySQL-primary stacks.

sql
CREATE TABLE woods_edges (
  source_id VARCHAR(255) NOT NULL,
  target_id VARCHAR(255) NOT NULL,
  relationship VARCHAR(50) NOT NULL,
  PRIMARY KEY (source_id, target_id, relationship),
  INDEX idx_target (target_id),
  INDEX idx_relationship (relationship)
) ENGINE=InnoDB;

-- Forward traversal
WITH RECURSIVE deps AS (
  SELECT target_id, 1 AS depth
  FROM woods_edges WHERE source_id = 'Order'
  UNION ALL
  SELECT e.target_id, d.depth + 1
  FROM woods_edges e
  INNER JOIN deps d ON e.source_id = d.target_id
  WHERE d.depth < 3
)
SELECT DISTINCT target_id, MIN(depth) as min_depth
FROM deps GROUP BY target_id ORDER BY min_depth;

Performance: MySQL's CTE optimizer is less mature than PostgreSQL's. Expect ~20-40% slower deep traversals. Adequate for single-codebase graphs up to ~5000 nodes. For graphs larger than that, consider in-memory with MySQL as fallback, or PostgreSQL.

Percona/MariaDB: Percona Server 8.0 uses the same MySQL CTE implementation. MariaDB 10.2+ has its own CTE implementation with slightly different optimization characteristics — test with your distribution.

When to use: MySQL 8.0+ is your primary database and you want graph storage without adding infrastructure.

PostgreSQL (Recursive CTEs)

Stores edges in a table, traverses with recursive queries. PostgreSQL's CTE optimizer is more mature, with better query planning for deep recursive traversals. Suitable for up to ~50000 nodes.

sql
CREATE TABLE woods_edges (
  source_id TEXT NOT NULL,
  target_id TEXT NOT NULL,
  relationship TEXT NOT NULL,
  PRIMARY KEY (source_id, target_id, relationship)
);

CREATE INDEX ON woods_edges (target_id);  -- For reverse traversal

Neo4j (Advanced)

Full graph database. Only needed for cross-repo analysis or very large monoliths where traversal patterns are complex (paths with conditions, weighted shortest path, community detection).

When to use: > 50000 nodes, cross-repository tracing, need for graph algorithms beyond what GraphAnalyzer provides in-memory. Note that basic graph analytics (betweenness centrality via bridge detection, cycle detection, hub identification, PageRank) are now available in-memory via GraphAnalyzer and DependencyGraph#pagerank, so Neo4j is only warranted for very large graphs or advanced algorithms like community detection and weighted shortest path.


Background Job Integration

Indexing can be triggered synchronously (rake task, inline) or asynchronously (background job). The system supports:

Sidekiq

ruby
class WoodsJob
  include Sidekiq::Job
  sidekiq_options queue: :low, retry: 2

  def perform(mode = "full")
    case mode
    when "full" then Woods.index!
    when "incremental" then Woods.index_incremental!
    end
  end
end

Solid Queue (Rails 8)

ruby
class WoodsJob < ApplicationJob
  queue_as :low_priority
  limits_concurrency to: 1, key: "woods"

  def perform(mode = "full")
    case mode
    when "full" then Woods.index!
    when "incremental" then Woods.index_incremental!
    end
  end
end

GoodJob

ruby
class WoodsJob < ApplicationJob
  queue_as :utility
  retry_on StandardError, wait: :polynomially_longer, attempts: 3

  def perform(mode = "full")
    # Same interface
  end
end

Inline (Development/CI)

ruby
# No job system needed
Woods.index!

The key point: the indexing pipeline itself is agnostic to job system. It's a synchronous Ruby operation. The job wrapper is just scheduling and concurrency control.


Starter (Zero Dependencies)

ruby
Woods.configure_with_preset(:local)
# Vector: InMemory VectorStore
# Metadata: SQLite
# Graph: In-memory
# Embedding: Ollama (nomic-embed-text)
# Jobs: Inline

Setup: brew install ollama && ollama pull nomic-embed-text Tradeoff: Lower retrieval quality, CPU-bound embedding, single-user.

Rails 8 Standard

ruby
Woods.configure_with_preset(:postgresql)
# Vector: pgvector
# Metadata: SQLite
# Graph: In-memory
# Embedding: OpenAI
# Jobs: Solid Queue

Setup: bundle add pgvector + enable extension Tradeoff: All-in-one database, good quality, API dependency for embeddings.

MySQL + Qdrant (Classic Rails)

ruby
# Note: :mysql preset is not yet implemented. Use :production preset + manual config
Woods.configure_with_preset(:production)
# Vector: Qdrant
# Metadata: SQLite
# Graph: In-memory
# Embedding: OpenAI
# Jobs: Sidekiq

Setup: Add Qdrant to docker-compose, create codebase tables in existing MySQL Tradeoff: Leverages existing MySQL infrastructure for metadata + graph. Qdrant handles vector search (which MySQL can't do natively). Most natural fit for established Rails apps on MySQL/Percona with Docker and Sidekiq.

Docker-Native

ruby
# Note: :docker preset is not yet implemented. Use :production preset (Qdrant + SQLite + OpenAI)
Woods.configure_with_preset(:production)
# Vector: Qdrant
# Metadata: SQLite
# Graph: In-memory
# Embedding: OpenAI
# Jobs: Sidekiq or Solid Queue

Setup: Add Qdrant to docker-compose Tradeoff: Additional service, but purpose-built vector search. Best performance/flexibility ratio.

Fully Self-Hosted

ruby
# Note: :self_hosted preset is not yet implemented.
# Use :production preset with Ollama as embedding provider via manual config.
Woods.configure_with_preset(:production)
# Vector: Qdrant
# Metadata: SQLite
# Graph: In-memory
# Embedding: Ollama (nomic-embed-text or custom)
# Jobs: Any

Setup: Qdrant + Ollama in docker-compose Tradeoff: No external API calls, all data stays on-premise. Works with either database. Embedding quality depends on model choice.

Enterprise / Multi-Repo

ruby
# Note: :enterprise preset is not yet implemented. Multi-repo / Milvus / Neo4j adapters are aspirational.
# Vector: Milvus or Weaviate (not yet implemented)
# Metadata: PostgreSQL (not yet implemented — current adapter: SQLite)
# Graph: Neo4j (not yet implemented — current adapter: InMemory)
# Embedding: OpenAI or Azure OpenAI (Azure not yet implemented)
# Jobs: Any

Setup: Significant infrastructure Tradeoff: Maximum capability, maximum operational cost.


Cost Modeling

Different stack combinations have very different cost profiles. This section provides rough estimates so teams can budget before committing to an architecture. All prices are approximate as of early 2025 and will change.

Variables That Drive Cost

VariableRangeImpact
Codebase size (models + services + controllers + GraphQL types + jobs)50–2000+ unitsLinear on embedding cost, linear on storage
Chunk multiplier (chunks per unit, avg)1.5–4xMultiplies embedding cost
Embedding dimensions256–3072Multiplies vector storage
Re-index frequencyWeekly / per-merge / per-commitMultiplies embedding cost over time
Retrieval volume10–1000 queries/dayMultiplies query-time embedding cost
Infrastructure modelShared app DB / dedicated / managed cloudDetermines hosting cost

Embedding Cost per Full Index

Assumptions: average unit produces ~400 tokens of embeddable text. Hierarchical chunking produces ~2.5 chunks per unit on average (1 summary + 1–3 semantic chunks). Each chunk has a ~50-token context prefix.

Codebase SizeTotal ChunksEmbedding TokensOpenAI 3-small ($0.02/1M)OpenAI 3-large ($0.13/1M)Voyage Code 3 ($0.06/1M)Ollama (local)
50 units~125~56K$0.001$0.007$0.003$0
200 units~500~225K$0.005$0.029$0.014$0
500 units~1,250~562K$0.011$0.073$0.034$0
1000 units~2,500~1.1M$0.022$0.146$0.068$0

A full index of a 1000-unit codebase costs about two cents with OpenAI's small model. A production Rails app may have ~993 models and 2000+ total units when including controllers, services, jobs, GraphQL types, and mailers. Even at that scale, embedding cost is not the bottleneck.

Incremental Re-embedding Cost

With chunk checksumming (only re-embed changed chunks), a typical merge touches 2–10 units. Assuming 5 changed units, ~12 chunks:

ScenarioTokensOpenAI 3-smallOpenAI 3-largeVoyage Code 3
Single merge (5 units)~5.4K$0.0001$0.0007$0.0003
Daily (10 merges)~54K$0.001$0.007$0.003
Monthly (200 merges)~1.1M$0.022$0.143$0.066
Yearly (2400 merges)~13M$0.26$1.69$0.78

Even with aggressive per-merge re-indexing, the yearly embedding cost for a 1000-unit codebase is under $2 with OpenAI's small model. At the target scale of 2000+ units, multiply these figures by ~2x — still well under $5/year.

Query-Time Embedding Cost

Each retrieval query must be embedded to perform vector search. One embedding call per query.

Daily QueriesMonthly Tokens (~100 tokens/query)OpenAI 3-smallOpenAI 3-largeVoyage Code 3
10~30K$0.001$0.004$0.002
100~300K$0.006$0.039$0.018
1000~3M$0.060$0.390$0.180

Query volume matters more than index size for ongoing costs, but even 1000 queries/day is cheap.

Vector Storage Cost

Storage cost depends on embedding dimensions × number of vectors.

Bytes per vector: dimensions × 4 (float32). With metadata overhead, estimate dimensions × 4 × 1.3.

Codebase (chunks)256-dim1024-dim1536-dim3072-dim
1250.16 MB0.65 MB0.97 MB1.9 MB
5000.64 MB2.6 MB3.8 MB7.7 MB
1,2501.6 MB6.4 MB9.6 MB19 MB
2,5003.2 MB13 MB19 MB38 MB

pgvector / MySQL: Uses your existing database storage. No additional cost beyond disk. At these sizes, vector storage is negligible.

Qdrant (self-hosted): Docker container, ~200MB base RAM. For codebases under 5,000 chunks, memory usage is under 100MB for vectors. Total container footprint: ~300MB RAM.

Qdrant Cloud: Free tier covers up to 1M vectors. A codebase would need to be enormous to exceed this.

Pinecone: Free tier: 1 index, 100K vectors. Starter ($8/mo): more indexes. No codebase will exceed free tier vector counts, but you pay for the pod.

Infrastructure Hosting Cost

ComponentShared (use app infra)Dedicated (self-hosted)Managed Cloud
Vector store$0 (pgvector in app DB)$5–15/mo (Qdrant VPS)$0–70/mo (free tiers exist)
Metadata store$0 (app database)$0 (app database)$0 (app database)
Embedding$5–20/mo (GPU for Ollama)$0.02–2/yr (API)
Background jobs$0 (app job system)$0 (app job system)$0 (app job system)

Cost per Preset

PresetSetup CostMonthly OngoingNotes
local (InMemory VectorStore + SQLite + Ollama)$0$0Requires local GPU for decent embedding speed. CPU works but slow.
postgresql (pgvector + SQLite + OpenAI)$0$0.05–0.50pgvector for vectors, SQLite for metadata. Good quality, API dependency for embeddings.
production (Qdrant + SQLite + OpenAI)$0$0.05–0.50Qdrant for vectors, SQLite for metadata. Slightly better vector search, more infra.
MySQL / Self-hosted / Enterprise (not yet implemented)Aspirational presets. Use :production preset with manual configuration overrides.

The Real Cost: Developer Time

Infrastructure and API costs are noise for single-codebase use. The actual cost is:

  • Setup time: 1–4 hours depending on preset complexity
  • Tuning time: 4–20 hours to evaluate embedding models, chunk strategies, and ranking weights
  • Maintenance time: 1–2 hours/month for monitoring, occasional re-tuning

Choose the preset that minimizes setup and maintenance time for your team's existing infrastructure. The cheapest option is always the one that uses what you already run.