docs/design/PERSISTENCE_AND_BOOTSTRAP.md
Status: Implemented — PRs #73, #74, #75, #76, #77, and the current status-surface PR.
Scope: Fix multi-process (Shape 2) index serving end-to-end. Covers control-flow decomposition of MCP::Bootstrapper, on-disk persistence format for the in-memory stores, and the query-path allocation bug that shipped alongside them.
Non-goals: Alternate durable backends beyond what already exists (pgvector, Qdrant, SQLite), distributed/sharded embedding, booting Rails inside woods-mcp, native C extensions.
Two plan items were explicitly scope-cut during implementation:
- Indexer#persist_snapshot.
- spec/integration/persistence_round_trip_spec.rb proves the dump/load contract. A testbed-hosted end-to-end that boots woods-mcp and issues a semantic query against a real Ollama is a separate effort that needs the testbed repo wired in.

Woods supports three deployment shapes:

woods:embed, separate long-lived woods-mcp server reads from the same output_dir and answers queries. Broken.

Shape 2 is what the admin-style host wants: a Rails app extracts and embeds on its own schedule, and an MCP sidecar serves queries without booting Rails. Three independent defects compose into a broken experience:
| Defect | Location | Effect |
|---|---|---|
| In-memory vector store has no persistence | lib/woods/storage/vector_store.rb InMemory | Vectors live in a process-local Hash; MCP sidecar starts empty. |
| In-memory metadata store has no persistence | lib/woods/storage/metadata_store.rb InMemory | Same story — metadata dies with embed process. |
| MCP bootstrapper ignores host config | lib/woods/mcp/bootstrapper.rb build_retriever | exe/woods-mcp does not boot Rails; config/initializers/woods.rb never runs; bootstrapper re-derives provider from env vars, mutates Woods.configuration in place, and funnels every failure through rescue StandardError into a single nil + "pattern search only" warning. |
| Cosine similarity allocates transiently per pair | lib/woods/storage/vector_store.rb:150–163 | vec.zip(other).sum { |x, y| x * y } allocates ~770 objects per similarity call; ~9.8M allocations per 12 k-vector search; ~1017 ms/query on admin-size corpora. |
The first three are what users experience. The fourth is a performance defect adjacent to the same code — independent of persistence but by far the largest user-visible win available.
- config/initializers/woods.rb must reach every woods process without the host re-declaring values at launcher boundaries. We capture resolved config (host URL, dimension, model, gem version) at embed time, not declared config. Same reason Rails 6 moved to DatabaseConfigurations: the resolved value is the contract.
- Snapshotter pair rather than bolting #persistent? / #dump_to / .load_from onto VectorStore::Interface and MetadataStore::Interface. Persistent backends (pgvector, Qdrant, SQLite) never touch the Snapshotter and don't grow no-op methods.
- Bootstrapper#build_retriever today plays four roles (resolve config, probe network, construct stores, log status) and catches every exception. We split those roles and raise typed errors at each boundary. build_retriever(config) becomes a confident four-line method: given config, build retriever.
- woods_status. Silent fallback is the failure mode we're eliminating, not a thing to preserve.
- output_dir (config snapshot, vector dump, metadata dump) carries an explicit monotonic schema version independent of gem semver. A server reading a newer-than-supported artifact refuses loudly. A server reading an older artifact upgrades in place through a compat shim or emits a structured deprecation and reads through it.
- latest pointer file: consumers never read a half-written dump because they only look at the pointer, and the pointer is flipped last.

output_dir/
├── manifest.json # existing — extracted unit manifest
├── dependency_graph.json # existing
├── checkpoint.json # existing — embed resume state
├── woods.json # NEW — resolved config snapshot (versioned)
├── dumps/
│ ├── 2026-04-23T03-42-17Z/ # per-run directory (UTC timestamp)
│ │ ├── vectors.bin # packed-float32 blob with header
│ │ ├── vectors.idx # id-order index (id strings + offsets)
│ │ └── metadata.msgpack # metadata store dump (versioned)
│ └── latest # pointer file → "2026-04-23T03-42-17Z"
└── units/ # existing — per-unit extraction output
latest is a one-line text file containing the directory name of the newest complete dump. The embed run writes everything into dumps/<ts>/, fsyncs the directory, then atomically rewrites latest via tmp + rename. Readers always resolve the pointer first (File.read("latest").strip) and only then open the dump it names. A crash mid-write leaves the previous latest intact, still pointing at the last complete dump.
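The pointer flip described above can be sketched in a few lines. This is a minimal illustration, not the shipped implementation: the helper names `promote_latest` and `latest_dump_dir` are hypothetical, and the atomicity of `File.rename` is assumed to hold because the temp file and the pointer live in the same directory on a POSIX filesystem.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: point `latest` at a dump directory that is already complete.
def promote_latest(dumps_root, dump_name)
  tmp_path = File.join(dumps_root, "latest.tmp.#{Process.pid}")
  File.open(tmp_path, "w") do |f|
    f.write("#{dump_name}\n")
    f.fsync # flush the new pointer contents before it becomes visible
  end
  # Readers see either the old pointer or the new one, never a partial write.
  File.rename(tmp_path, File.join(dumps_root, "latest"))
end

# Hypothetical reader: resolve the newest complete dump, or nil if none exists yet.
def latest_dump_dir(dumps_root)
  pointer = File.join(dumps_root, "latest")
  return nil unless File.exist?(pointer)

  File.join(dumps_root, File.read(pointer).strip)
end
```

A caller would create `dumps/<ts>/`, write every file into it, and call `promote_latest` only as the final step, so a crash at any earlier point leaves the old pointer untouched.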
vectors.bin)

Binary format chosen for the pack("e*") path (IEEE-754 LE float32). The header begins with fixed-width fields (magic, schema version, dimension, vector count), so a corrupt file is detectable on first read.
offset length field
0 4 bytes magic "WVF1" (Woods Vector File v1)
4 4 bytes schema_version (u32, LE)
8 4 bytes dimension (u32, LE) e.g. 768
12 8 bytes vector_count (u64, LE)
20 4 bytes gem_version_length (u32, LE)
24 N bytes gem_version (UTF-8)
24+N 4 bytes model_name_length (u32, LE)
28+N M bytes model_name (UTF-8)
... — packed float32 data (vector_count × dimension × 4 bytes)
vectors.idx holds the id-ordering sidecar: one record per vector, length-prefixed-id-string + u64 offset into vectors.bin. Keeps vectors.bin purely numeric and mmap-friendly.
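As a rough sketch of how the header layout above maps onto Ruby's pack directives (`V` for u32 LE, `Q<` for u64 LE, `e*` for float32 LE): the function names `write_vector_file` and `read_vector_file` are hypothetical, the sketch ignores vectors.idx, and error handling is reduced to a magic check.

```ruby
require "stringio"

MAGIC = "WVF1"

# Hypothetical writer following the documented field order.
def write_vector_file(io, schema_version:, dimension:, gem_version:, model_name:, flat_vectors:)
  count = flat_vectors.length / dimension
  io.write(MAGIC)
  io.write([schema_version, dimension].pack("VV")) # two u32 LE fields
  io.write([count].pack("Q<"))                     # u64 LE vector count
  io.write([gem_version.bytesize].pack("V"))
  io.write(gem_version)
  io.write([model_name.bytesize].pack("V"))
  io.write(model_name)
  io.write(flat_vectors.pack("e*"))                # packed float32 payload
end

# Hypothetical reader: validates the magic, then walks the same fields.
def read_vector_file(io)
  raise ArgumentError, "bad magic" unless io.read(4) == MAGIC

  schema_version, dimension = io.read(8).unpack("VV")
  count = io.read(8).unpack1("Q<")
  gem_version = io.read(io.read(4).unpack1("V")).force_encoding("UTF-8")
  model_name = io.read(io.read(4).unpack1("V")).force_encoding("UTF-8")
  floats = io.read(count * dimension * 4).unpack("e*")
  { schema_version: schema_version, dimension: dimension, vector_count: count,
    gem_version: gem_version, model_name: model_name, vectors: floats }
end
```

Because the first 24 bytes never vary in width, a reader can reject a wrong magic or an unsupported schema version before touching the variable-length fields.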
woods.json)

Captures resolved config: what the embed run actually used, not what the initializer declared.
{
"schema_version": 1,
"gem_version": "1.2.0",
"created_at": "2026-04-23T03:42:17Z",
"embedding_provider": {
"class": "Woods::Embedding::Provider::Ollama",
"model": "nomic-embed-text",
"host": "http://host.docker.internal:11434",
"num_ctx": 2048,
"read_timeout": 120,
"dimension": 768
},
"stores": {
"vector_store": "in_memory",
"metadata_store": "in_memory",
"graph_store": "in_memory"
},
"dumps_dir": "dumps",
"latest_dump": "2026-04-23T03-42-17Z"
}
Indexer#store_vectors writes an append-only vectors.log in the active dump directory as each batch is embedded, using length-prefixed records (u32 id-length + id + u32 dim + packed floats). On index_all completion, the log is compacted into vectors.bin + vectors.idx via tmp + rename. Crash mid-embed leaves a resumable log. For admin's 6315-unit corpus the compact step is < 500 ms; for 10× corpora it stays bounded by disk bandwidth, not by Ruby heap pressure.
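The record layout for vectors.log can be illustrated with a small encoder and a tolerant decoder. This is a sketch under assumptions: `append_record`, `read_exact`, and `each_record` are hypothetical names, and treating a short trailing read as "torn record, stop here" is one possible resume policy rather than the documented one.

```ruby
require "stringio"

# Append one record: u32 id-length + id + u32 dim + packed float32 values.
def append_record(io, id, vector)
  io.write([id.bytesize].pack("V"))
  io.write(id)
  io.write([vector.length].pack("V"))
  io.write(vector.pack("e*"))
end

# Read exactly n bytes, or return nil if the log ends early.
def read_exact(io, n)
  data = io.read(n)
  data if data && data.bytesize == n
end

# Yield every complete record; stop quietly at a torn trailing record.
def each_record(io)
  return enum_for(:each_record, io) unless block_given?

  loop do
    len_raw = read_exact(io, 4) or break
    id = read_exact(io, len_raw.unpack1("V")) or break
    dim_raw = read_exact(io, 4) or break
    floats = read_exact(io, dim_raw.unpack1("V") * 4) or break
    yield id.force_encoding("UTF-8"), floats.unpack("e*")
  end
end
```

With this shape, a compact step can replay `each_record` over the log and ignore whatever partial record a crash left at the tail.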
Before:
Bootstrapper.build_retriever
├─ mutates Woods.configuration per-branch
├─ probes network inline
├─ rescue StandardError → nil + warn
└─ returns a Retriever or nil
After:
ConfigResolver.resolve(config, artifact:, env: ENV)
├─ reads artifact.config_snapshot (woods.json) if present
├─ applies env overrides (explicit + logged)
├─ validates (raises MissingCredential / ConfigMismatch / DimensionMismatch)
└─ returns an immutable ResolvedConfig
ProviderProbe.reachable!(provider)
└─ raises ProviderUnreachable(url:, reason:) or returns provider
Snapshotter::Vector.load_or_empty(artifact) → VectorStore
Snapshotter::Metadata.load_or_empty(artifact) → MetadataStore
├─ read artifact.latest_dump
├─ validate header schema_version
└─ return hydrated store or empty store (never raises for "no dump yet")
Bootstrapper.build_retriever(config, artifact:)
├─ config = ConfigResolver.resolve(config, artifact:)
├─ provider = build_provider(config) # no probe here
├─ state = BootstrapState.new
├─ vector_store = Snapshotter::Vector.load_or_empty(artifact)
├─ metadata_store = Snapshotter::Metadata.load_or_empty(artifact)
├─ retriever = Builder.new(config).build_retriever(vector_store:, metadata_store:, provider:)
├─ begin ProviderProbe.reachable!(provider); state.mark(:hydrated)
│ rescue ProviderUnreachable => e; state.mark(:degraded, reason: e); retriever.degrade!
│ end
└─ retriever
build_retriever does not mutate Woods.configuration. rescue StandardError is gone. ConfigResolver raises typed errors; ProviderProbe.reachable! raises typed errors; Snapshotter.load_or_empty confines the nil case (no dump yet) to its own boundary and returns an empty store instead, so the interior never sees nil and the store object is its single source of truth.
The graph store is rebuilt from dependency_graph.json on boot because extraction owns the write path — embed never touches graph edges. This only works when the graph store is ephemeral. If a backend reports durable? => true, the hydration path raises InapplicableBackend at boot: rebuilding a durable store from the extraction artifact would stomp state it's supposed to preserve, and the contributor adding that adapter is the one who needs to wire an extraction-time write path (mirroring what Snapshotter::Vector.dump already enforces for pgvector / Qdrant).
Woods::Error # existing
Woods::ConfigurationError # existing (PR #72) — for config-shape errors
Woods::MCP::BootstrapError # NEW — sibling to ConfigurationError, not child
Woods::MCP::MissingCredential # config-invalid
Woods::MCP::ConfigMismatch # stored config contradicts host config
Woods::MCP::DimensionMismatch # provider dim ≠ stored vectors dim
Woods::MCP::UnsupportedArtifact # woods.json schema_version newer than gem
Woods::MCP::ProviderUnreachable # sibling; recoverable, NOT a BootstrapError
BootstrapError is a sibling of ConfigurationError, not a child. UnsupportedArtifact and DimensionMismatch describe artifact/runtime state, not declared-configuration problems; grouping them under ConfigurationError would mislead host apps that rescue it. BootstrapError therefore inherits directly from Woods::Error.
ProviderUnreachable sits outside BootstrapError deliberately — it's recoverable at the MCP layer and signals "start degraded, retry later," not "fail startup." Bootstrapper catches it internally; nothing upstream should.
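The sibling relationships described above could look roughly like this in Ruby. The class names come from the hierarchy listing; the keyword-argument constructors and the message formats are assumptions made for illustration only.

```ruby
module Woods
  class Error < StandardError; end
  class ConfigurationError < Error; end

  module MCP
    # Sibling of ConfigurationError: both inherit from Woods::Error.
    class BootstrapError < Woods::Error; end
    class MissingCredential < BootstrapError; end
    class ConfigMismatch < BootstrapError; end
    class UnsupportedArtifact < BootstrapError; end

    class DimensionMismatch < BootstrapError
      attr_reader :expected, :actual

      # Assumed constructor shape; carries structured details for the operator message.
      def initialize(expected:, actual:)
        @expected = expected
        @actual = actual
        super("provider dimension #{actual} does not match stored dimension #{expected}")
      end
    end

    # Recoverable: deliberately NOT a BootstrapError.
    class ProviderUnreachable < Woods::Error
      attr_reader :url, :reason

      def initialize(url:, reason:)
        @url = url
        @reason = reason
        super("#{url} #{reason}")
      end
    end
  end
end
```

With this layout, a top-level `rescue Woods::MCP::BootstrapError` catches every fatal boot problem while letting ProviderUnreachable be handled separately as a degraded start.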
The gem already ships Woods::Resilience::RetryableProvider and Woods::Resilience::CircuitBreaker. The MCP boot path reuses these rather than reinventing retry logic. Bootstrapper wraps the configured provider with RetryableProvider once, and the first-query retry from degraded state goes through the same breaker state machine that runtime calls use. No new retry primitives.
BootstrapState and woods_status

BootstrapState is a small value object: status ∈ {initializing, hydrating, hydrated, degraded, failed} plus reason: (exception or nil), hydrated_at:, degraded_since:. The woods_status MCP tool reads from it:
{
"state": "degraded",
"reason": "Woods::MCP::ProviderUnreachable: http://host.docker.internal:11434/api/tags refused connection",
"degraded_since": "2026-04-23T03:42:18Z",
"provider": { "class": "Ollama", "host": "...", "model": "nomic-embed-text", "reachable": false },
"vector_store": { "type": "in_memory", "loaded_from": "dumps/2026-04-23T03-42-17Z/vectors.bin", "vector_count": 12771, "schema_version": 1 },
"metadata_store": { "type": "in_memory", "record_count": 6315, "schema_version": 1 },
"config_source": "output_dir/woods.json",
"staleness": { "embedded_at": "2026-04-23T03:41:47Z", "manifest_matches_vectors": true }
}
An operator SSH'd into a container at 2 am answers "why is semantic search broken" by reading this blob.
InMemory::VectorStore switches its backing representation from @entries = { id => { vector: [Float]*N, metadata: {...} } } to:
- @ids = [] (String)
- @vectors = [] (one flat Array<Float> of length count × dim; vector i occupies indices i*dim...(i+1)*dim)
- @metadata = {} (id → Hash)

cosine_similarity becomes a kernel that takes two offsets and a dimension, walks the buffer with a while loop, and makes no temporary allocations. Expected: ~1017 ms → ~50–100 ms per query on 12 k vectors; 9.8M allocations → ~0 per query.
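A minimal sketch of such a kernel over the flat buffer, assuming the layout above. The names `cosine_kernel` and `search_flat` are hypothetical, the sketch skips filters and tombstones, and it does not claim to reproduce the measured allocation numbers.

```ruby
# Cosine similarity between a[a_off, dim] and b[b_off, dim].
# Indexes straight into the flat arrays, so no slices or pairs are built.
def cosine_kernel(a, a_off, b, b_off, dim)
  dot = 0.0
  norm_a = 0.0
  norm_b = 0.0
  i = 0
  while i < dim
    x = a[a_off + i]
    y = b[b_off + i]
    dot += x * y
    norm_a += x * x
    norm_b += y * y
    i += 1
  end
  return 0.0 if norm_a.zero? || norm_b.zero?

  dot / Math.sqrt(norm_a * norm_b)
end

# Score every stored vector against the query and keep the best `limit`.
def search_flat(ids, vectors, dim, query, limit:)
  scored = []
  idx = 0
  while idx < ids.length
    scored << [ids[idx], cosine_kernel(vectors, idx * dim, query, 0, dim)]
    idx += 1
  end
  scored.max_by(limit) { |pair| pair[1] }
end
```

One way to check the allocation claim on a real corpus is to compare `GC.stat(:total_allocated_objects)` immediately before and after a `search_flat` call; the per-result pairs in `scored` are the only objects this sketch creates on purpose.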
Four focused PRs. Each independently reviewable, testable, and shippable. Each leaves the tree working for existing SQLite / pgvector / Qdrant users.
Write a standalone script that measures, on 12 771 × 768 Float arrays:
- pack("e*"), Marshal.dump/load, MessagePack, JSON.generate/parse. Measure both cold (freshly-opened file, page cache dropped via posix_fadvise DONTNEED or a separate process) and warm (re-read): operators feel cold, not warm.
- GC.stat(:total_allocated_objects) delta. RSS hides churn; allocation count predicts GC pressure during the first query.
- zip/sum kernel vs the while-loop kernel.
- { type: "model", namespace_prefix: "Admin::" }, which rejects ~80% of candidates on admin-shape corpora). The filter must run before the kernel; verify.

Output a short table. Decision gates (revised post-measurement, Ruby 3.3 arm64-darwin23, 12,771 × 768 unit-normalized vectors):

- pack("e*") cold load must be ≥ 3× faster than the next best alternative. Actual (see tmp/bench_results/phase0.json): pack(e*) 33 ms, MessagePack 110 ms, a 3.3× ratio. The original 8× gate was speculative; the 3× gate reflects measurement.
- pack("e*") on-disk size must be ≤ half the next-best format. Actual: 39 MB vs 88 MB MessagePack vs 216 MB Marshal/JSON.
- GC.stat(:total_allocated_objects) delta around a 12 k-entry search ≤ 50). Actual: 2 allocations; the zip/sum baseline is 9,833,673. This is the load-bearing gate; wall-clock is secondary.

If any gate fails, the corresponding phase is re-opened. Measurement 2026-04-23 met the revised gates.
Scope: Replace InMemory::VectorStore#cosine_similarity and the backing representation. No on-disk changes. No Bootstrapper changes.
Changes:
- InMemory::VectorStore switches to flat-buffer backing (@ids, @vectors, @metadata).
- #store(id, vector, metadata) appends to all three.
- #delete(id) marks a tombstone (sparse; compact at next full-embed run). Simplest correct approach; can optimize later.
- #search(query_vector, limit:, filters:) iterates indices, computes cosine inline via a strided while-loop kernel, no per-pair allocations.
- #each_entry { |id, vector, metadata| ... } and #bulk_load(ids:, vectors_flat:, metadata:) are the Snapshotter seams. PR 1 adds them with straightforward implementations on the new flat buffer; PR 3 consumes them.

Specs:

- zip/sum implementation on fixed vectors.
- ObjectSpace.count_objects_size or GC.stat(:total_allocated_objects) delta around a search call drops by > 1000×.

Ships first, independent of everything else. ~100–150 LoC including specs.
Scope: Extract ConfigResolver, ProviderProbe, IndexArtifact, Snapshotter (with no-op implementations on the persistence side for now — Phase 3 fills them in). Define the full exception hierarchy. Introduce BootstrapState. Rewrite Bootstrapper#build_retriever.
Changes:
- lib/woods/index_artifact.rb: a Whole Value for output_dir: config_path, dumps_root, latest_dump_path, fresh?, schema_version.
- lib/woods/mcp/config_resolver.rb: resolve(config, artifact:, env: ENV) → ResolvedConfig. Raises Woods::MCP::MissingCredential, ConfigMismatch, DimensionMismatch, UnsupportedArtifact.
- lib/woods/mcp/provider_probe.rb: reachable!(provider) → provider or raise ProviderUnreachable(url:, reason:).
- lib/woods/storage/snapshotter/vector.rb and snapshotter/metadata.rb: load_or_empty(artifact) → Store, dump(store, artifact) → void. Phase 2 implementations are stubs that always return empty-store + no-op dump; Phase 3 wires in the real pack/unpack.
- lib/woods/mcp/bootstrap_state.rb: value object.
- lib/woods/mcp/bootstrapper.rb: rewritten build_retriever(config, artifact: IndexArtifact.new(config.output_dir)). No mutation of Woods.configuration. No bare rescue StandardError. Four-line narrative body.
- exe/woods-mcp top-level gains one rescue Woods::MCP::BootstrapError => e that prints class + message + remediation hint and exits nonzero.
- woods.json is absent, Bootstrapper raises Woods::MCP::MissingArtifact unless WOODS_ALLOW_AUTODETECT=1 is set. If the env flag is set, the existing env-var auto-detect path runs with a deprecated_autodetect structured warning. Hosts that never ran an embed see a clear failure; hosts that want the old behavior opt in explicitly.

Specs:

- ConfigResolver: happy path, missing-credential raise, dimension-mismatch raise, schema-version raise.
- ProviderProbe: reachable path, refused path, timeout path.
- IndexArtifact: path semantics, fresh? semantics, handling of missing latest.
- Snapshotter stubs: load_or_empty always returns empty; dump is a no-op but records that it was called (for Phase 3 replacement).
- Bootstrapper.build_retriever: returns retriever in happy path, BootstrapState.degraded on ProviderUnreachable, raises typed error on config-invalid.
- exe/woods-mcp smoke: missing credential raises with a one-line operator-readable message.

Scope: Fill in the Snapshotter implementations, wire the Indexer to stream-append during embed, compact at end, flip latest. Write woods.json at embed completion.
Changes:
- Snapshotter::Vector: real load_or_empty reads latest pointer → loads vectors.bin header → validates schema version + dimension → unpack("e*") into a flat Array<Float> → returns populated InMemory::VectorStore. Real dump writes header + float blob + vectors.idx.
- Snapshotter::Metadata: real implementation. Format is MessagePack here because metadata is heterogeneous hash-shaped data, not dense numeric arrays; MessagePack's type tags matter for the hash case. Schema-versioned header.
- Indexer#store_vectors: streaming append to vectors.log in the active dump dir. On index_all completion, compact to vectors.bin + vectors.idx atomically.
- Indexer writes woods.json to output_dir/ on completion with resolved config.
- latest pointer flip at the end of a successful run; previous latest dump is kept (not deleted). Retention is a separate operational concern.
- Tempfile + File.rename everywhere.

Specs:

- UnsupportedArtifact; v1-server reads v1-dump cleanly.
- vectors.log exists but compact never ran → subsequent embed can resume from the log.
- latest pointer: write-in-progress dump directory is never pointed to.
- woods-testbed: run embed in subprocess, stop it, start woods-mcp in another subprocess, issue a semantic query, assert results.

Scope: Wire BootstrapState into the woods_status MCP tool. Add :shared_filesystem preset to Builder::PRESETS. Docs.
Changes:
- woods_status tool returns the structured health block from §3.7.
- Builder::PRESETS[:shared_filesystem] = { vector_store: :in_memory, metadata_store: :in_memory, graph_store: :in_memory, embedding_provider: :ollama } with the requirement that output_dir be set and readable by both processes.
- docs/CONFIGURATION_REFERENCE.md: new "Deployment Shapes" section (single-process, shared filesystem, distributed) and a "Shape 2 setup" subsection referencing the :shared_filesystem preset.
- docs/BACKEND_MATRIX.md: new "Persistence Story" column for every adapter; explicit rows for Shape 1 / Shape 2 / Shape 3.
- docs/design/PERSISTENCE_AND_BOOTSTRAP.md (this file): updated to "Implemented" once PRs 1–4 land.

IndexArtifact API

artifact = Woods::IndexArtifact.new(output_dir)
artifact.fresh? # → true if no woods.json or no dumps/latest
artifact.config_path # → Pathname("output_dir/woods.json")
artifact.latest_dump_path # → Pathname or nil
artifact.dumps_root # → Pathname("output_dir/dumps")
artifact.new_dump_dir(now: Time.now.utc) # → Pathname with atomic-create semantics
artifact.promote(dump_dir) # → flips `latest` via tmp + rename
artifact.read_config # → Hash or nil (with schema-version validation)
artifact.write_config(resolved) # → atomic write
ResolvedConfig (immutable Whole Value, not a struct)

A Ruby 3.2+ Data class that carries behavior, not just fields. A bag-of-fields is insufficient; dimension-mismatch and config-comparison logic belong on the value itself so callers tell-don't-ask.
config = Woods::ResolvedConfig.from_hash(woods_json)
config.dimension # → 768
config.provider_signature # → "Ollama/nomic-embed-text@http://host..."
config.matches?(other) # → Boolean — true if provider + dimension + model agree
config.assert_compatible!(stored_config)
# → raises DimensionMismatch / ConfigMismatch if anything critical differs
config.to_snapshot_json # → String for woods.json serialization
ConfigResolver.resolve returns a ResolvedConfig; callers compare via matches? / assert_compatible!. Snapshotter takes a ResolvedConfig, never a loose dim: kwarg — two sources of truth for dimension is how the next DimensionMismatch bug ships.
Snapshotter::Vector API

Woods::Storage::Snapshotter::Vector.load_or_empty(artifact, resolved_config:) # → InMemory::VectorStore
Woods::Storage::Snapshotter::Vector.dump(store, artifact, dump_dir)
The Snapshotter takes resolved_config:, not a loose dim: kwarg — dimension is derived from the config, eliminating two sources of truth.
The Snapshotter knows nothing about the adapter's internals beyond the new #each_entry(&block) / #bulk_load(entries) methods — adapters stay in key/value vocabulary. Dispatch logic lives in Builder, not in the Snapshotter: Builder constructs a Snapshotter only for :in_memory stores; persistent adapters never meet one. This keeps the Snapshotter's silhouette clean and avoids a persistent-adapter interface pollution creeping back in through the "refuse if persistent" door.
If a persistent adapter is ever wired into the Snapshotter path by a bug, it raises Woods::Storage::InapplicableBackend (a named class, not a bare Woods::Error) so tests can assert it.
BootstrapState

state = Woods::MCP::BootstrapState.new
state.status # → :initializing | :hydrating | :hydrated | :degraded | :failed
state.reason # → Exception or nil
state.hydrated_at
state.degraded_since
state.mark(:hydrated)
state.mark(:degraded, reason: ProviderUnreachable.new(...))
state.to_h # for woods_status
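The API above could be backed by something as small as the following. It is a sketch only: the injectable `clock:` argument, the string keys in `to_h`, and the choice to clear `degraded_since` on recovery are assumptions, and the real `woods_status` payload carries more fields than shown here.

```ruby
require "time"

module Woods
  module MCP
    class BootstrapState
      STATUSES = %i[initializing hydrating hydrated degraded failed].freeze

      attr_reader :status, :reason, :hydrated_at, :degraded_since

      # clock: is a hypothetical seam so specs can pin timestamps.
      def initialize(clock: -> { Time.now.utc })
        @clock = clock
        @status = :initializing
      end

      def mark(status, reason: nil)
        raise ArgumentError, "unknown status: #{status.inspect}" unless STATUSES.include?(status)

        now = @clock.call
        @status = status
        @reason = reason
        @hydrated_at = now if status == :hydrated
        @degraded_since = status == :degraded ? now : nil
        self
      end

      # Subset of the woods_status block that this object can answer alone.
      def to_h
        {
          "state" => status.to_s,
          "reason" => reason && "#{reason.class}: #{reason.message}",
          "hydrated_at" => hydrated_at&.iso8601,
          "degraded_since" => degraded_since&.iso8601
        }
      end
    end
  end
end
```

Keeping the formatting of `reason` inside `to_h` means the MCP tool never has to inspect exception objects itself.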
Woods::Error # existing
Woods::ConfigurationError # existing — declared-config shape errors only
Woods::Storage::InapplicableBackend # Snapshotter misuse on durable backend
Woods::MCP::BootstrapError # sibling of ConfigurationError
Woods::MCP::MissingCredential # config-invalid
Woods::MCP::ConfigMismatch # stored config contradicts host config
Woods::MCP::DimensionMismatch # provider dim ≠ stored vectors dim
Woods::MCP::UnsupportedArtifact # artifact schema_version newer than gem, or corrupted
Woods::MCP::MissingArtifact # no woods.json and WOODS_ALLOW_AUTODETECT unset
Woods::MCP::ProviderUnreachable # recoverable sibling; caught internally for degraded start
Each exception carries structured details (URL, expected vs actual dimension, artifact schema version) that exe/woods-mcp's top-level rescue formats into a one-line operator message.
| Scope | Lives in | Coverage |
|---|---|---|
| Kernel correctness + allocation bound | spec/storage/vector_store_spec.rb | Bit-equal results vs reference; GC.stat delta below threshold. |
| Kernel latency | spec/performance/query_kernel_spec.rb (new, tagged :perf, opt-in) | Wall-clock bound for 12 771 × 768; fails loud on regression. |
| ConfigResolver | spec/mcp/config_resolver_spec.rb | Every raise path, every env-override path, happy path. |
| ProviderProbe | spec/mcp/provider_probe_spec.rb | Reachable, refused, timeout, DNS failure. |
| IndexArtifact | spec/index_artifact_spec.rb | Paths, fresh?, atomic promote. |
| Snapshotter round-trip | spec/storage/snapshotter_spec.rb | Dump in process A (forked), load in process B, assert equality. |
| Schema-version rejection | same | v1-server rejects v2-dump with UnsupportedArtifact. |
| Degraded start | spec/mcp/bootstrapper_spec.rb | Provider unreachable → retriever returned + state :degraded. |
| exe/woods-mcp top-level | spec/exe/woods_mcp_spec.rb | Typed exception prints actionable one-liner and exits 2. |
| End-to-end in woods-testbed | testbed repo smoke script | Full shape-2 cycle: embed, MCP boot, query, result match. |
| User class | Change visible? | Notes |
|---|---|---|
| Existing :sqlite metadata users | No | Snapshotter never touches SQLite backend. |
| Existing pgvector / Qdrant users | No | Same story. |
| Existing :in_memory single-process users | No | Get persistence for free if they set output_dir; otherwise behavior identical. |
| Existing env-var-driven MCP launchers | Only without woods.json | Env overrides always win over snapshot. No woods.json → boot raises MissingArtifact unless WOODS_ALLOW_AUTODETECT=1 is set, in which case the current auto-detect path runs with a deprecation warning. |
| Shape 2 hosts (admin) | Major fix | MCP semantic search works end-to-end once Phase 3 lands. No manual config beyond existing initializer. |
| Silent-fallback behavior (current) | Removed | Config errors now raise at boot. Provider unreachable starts degraded. Operators can diagnose. |
- InMemory#search's current filter application is cheap, the kernel win holds; if filters are expensive, the bench needs to show end-to-end win, not just the kernel.
- pack("e*"), because metadata is heterogeneous hash-shaped data. This is intentional and the doc should say so; rejecting MessagePack for vectors does not mean rejecting it universally.
- keep_last_n_dumps knob eventually. Out of scope for these four PRs, but call it out in Phase 4 docs.
- File.open("vectors.bin", "rb").then { |f| IO::Buffer.map(f) } lets the serve process skip the in-memory unpack entirely. Not worth the complexity at admin's 12 k scale (read + unpack("e*") is < 500 ms there), but the packed-float32 format chosen in §3.2 was picked partly to keep this door open. Revisit at a real workload that needs it.
- vectors.log is consumed by the compact step on successful embed finish. If the embed crashes, checkpoint.json already tracks per-unit progress, so the semantic resume works: the new vectors.log is either (a) replayed into vectors.bin if the embed was otherwise complete, or (b) discarded and rewritten from the next run. Phase 3 implementation must pick one; the spec captures it.
- Builder#build_metadata_store / build_vector_store still construct empty stores. The Snapshotter does not replace these methods; it consumes them. PR 3's Snapshotter::Vector.load_or_empty calls Builder to get an empty InMemory::VectorStore, then bulk_loads the hydrated data. Keeps the construction site single.
- woods_status. Phase 4 polish.
- ConfigurationError vs BootstrapError. BootstrapError is a sibling of ConfigurationError and inherits directly from Woods::Error (§3.6), so existing rescue Woods::ConfigurationError handlers in host apps will not catch it. Double-check that those handlers still behave as expected after we extend the hierarchy.
- woods-mcp. Rejected for operational reasons: inherits host boot time and initializer failure modes on every MCP restart; loses the ability to run the MCP on a different host from the Rails app.
- #persistent? / #dump_to / .load_from on storage Interface.
Rejected as interface pollution: three methods that most implementers would answer with a no-op pollute the contract. Replaced by the Snapshotter pattern.
- :mysql metadata adapter. Long-term option; not this plan.

| Decision | Rationale |
|---|---|
| pack("e*") over MessagePack for vectors | Type-tagged variant per float is wrong for dense numeric; 10× heap cost at scale; non-mmap-friendly. Metadata is a separate case (heterogeneous hash shape) and keeps MessagePack. |
| Snapshotter pair, not persistence methods on Interface | Interface pollution — most adapters would answer persistent? / dump_to / load_from with "no-op." persistent? is also a tell-don't-ask smell. Separate role, separate object. |
| Resolved config in woods.json, not declared | Env resolution differs between embed process and serve process; resolved values are the contract. Same reason Rails 6 moved to DatabaseConfigurations. |
| Typed exception hierarchy | Silent rescue StandardError hides three distinct failure modes; operators need class-based grep. Five BootstrapError subclasses (including MissingArtifact) + ProviderUnreachable as a recoverable sibling. |
| Start degraded on provider unreachable, not fail | MCP is a long-lived sidecar; dependency may come up after the server. Config-invalid still raises; dependency-unreachable starts degraded and retries on first query. |
| Kernel fix ships first | 10–20× query latency win for ~150 LoC, no coupling to other work; admin benefits immediately. |
| Schema version on every artifact from PR 3 | Retrofit requires migration; upfront is free. |
| latest pointer for cross-artifact atomicity | Eliminates half-written-dump race cleanly. |
| Streaming append during embed | Crash at hour N of an N+1 hour embed leaves a resumable artifact, not garbage. |
| No new required infrastructure | Backend agnosticism per CLAUDE.md; Shape 2 must work for MySQL hosts without new services. |
Size estimate, rough:
| Phase | LoC (code + specs + docs) | Depends on |
|---|---|---|
| Phase 0 bench | ~100 (throwaway) | — |
| PR 1 kernel fix | ~150 | — |
| PR 2 decomposition | ~600 | PR 1 (for overlap safety) |
| PR 3 persistence | ~700 | PR 2 |
| PR 4 polish | ~200 | PR 3 |
| Total | ~1650 (PRs 1–4; excludes the throwaway Phase 0 bench) | sequential after PR 1 |