# Woods FAQ
No — Woods requires a booted Rails environment for extraction. It uses runtime introspection APIs (ActiveRecord::Base.descendants, Rails.application.routes, reflection APIs) that only exist inside a running Rails application. Static analysis of source files alone cannot produce the accurate, inlined output that Woods generates. The MCP Index Server does not require Rails — it reads pre-extracted JSON from disk — but the extraction step itself always does.
Woods supports Rails 6.1 and newer, with Ruby 3.0 or newer. It is tested against Rails 7.x and 8.x. Rails 6.0 and earlier are not supported because the gem relies on Zeitwerk autoloading and several reflection APIs introduced in 6.1.
Yes — MySQL, PostgreSQL, and SQLite are all supported equally as application databases. Woods extraction uses ActiveRecord's database-agnostic reflection APIs and never issues raw SQL during extraction. The only backend-specific requirement is pgvector, which is PostgreSQL-only and optional. All other storage backends (SQLite metadata store, Qdrant, in-memory) work identically with MySQL and PostgreSQL. See BACKEND_MATRIX.md for the full compatibility matrix.
Woods has been tested on applications with 200+ models and 500+ extractable units. Extraction time scales roughly linearly with codebase size — a mid-size app (50-100 models) takes 10-30 seconds. Very large applications benefit from disabling include_framework_sources and using incremental mode for subsequent runs.
No. Extraction is entirely read-only. It uses ActiveRecord reflection APIs (columns, reflect_on_all_associations, _validators, etc.) rather than running queries against application data. No records are created, modified, or deleted during extraction.
Extraction is designed for development and CI environments — it requires a fully booted Rails environment and takes 10-30 seconds. The MCP servers are read-only development tools. Running extraction in production is technically possible but not recommended. The common pattern is to extract in CI and publish the JSON output as a build artifact.
Add the gem to your Gemfile and run the install generator:
```ruby
# Gemfile
group :development do
  gem 'woods'
end
```

```bash
bundle install
bundle exec rails generate woods:install
```
The generator creates config/initializers/woods.rb with default configuration. For Docker projects, run these commands through docker compose exec app. See GETTING_STARTED.md for the full setup walkthrough.
The only required option is output_dir, which has a sensible default:
```ruby
Woods.configure do |config|
  config.output_dir = Rails.root.join('tmp/woods') # default
end
```
With just this, you can run rake woods:extract and get full extraction output. Embedding and vector storage require additional configuration — see CONFIGURATION_REFERENCE.md.
Use the woods-mcp-start wrapper, which validates the index and restarts on failure:
```json
{
  "mcpServers": {
    "codebase": {
      "command": "woods-mcp-start",
      "args": ["/path/to/your-rails-app/tmp/woods"]
    }
  }
}
```
Add this to .mcp.json in your Rails app root (for project-scoped config) or to claude_desktop_config.json (for global config). Run rake woods:extract first to generate the index. See MCP_SERVERS.md for the full setup guide.
Use woods-mcp (without the -start wrapper, which is Claude Code-specific):
```json
{
  "mcpServers": {
    "codebase": {
      "command": "woods-mcp",
      "args": ["/path/to/your-rails-app/tmp/woods"]
    }
  }
}
```
Add this to .cursor/mcp.json in your project. See MCP_SERVERS.md for details.
The setup is the same as Cursor — use woods-mcp (not the -start wrapper):
```json
{
  "mcpServers": {
    "codebase": {
      "command": "woods-mcp",
      "args": ["/path/to/your-rails-app/tmp/woods"]
    }
  }
}
```
Add this to your Windsurf MCP configuration file. The Index Server is transport-agnostic and works with any MCP-compliant client.
Woods extracts 34 types of units from a Rails application. The default extraction set includes models (with inlined concerns and schema), controllers, services, view components, jobs, mailers, GraphQL types/mutations/queries, serializers, managers, policies, validators, and Rails framework source. Additional extractors are available for state machines, events, decorators, database views, rake tasks, Action Cable channels, and more. See CONFIGURATION_REFERENCE.md for the full extractor list.
When a model includes a concern, the behavior defined in that concern is part of the model's effective API — callbacks fire, validations run, scopes are available. A tool that reports only what's in app/models/user.rb misses everything defined in included concerns. Woods inlines concern source directly into each unit's source_code field so the full behavioral picture is in one place. This is the key differentiator from file-level tools.
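A minimal plain-Ruby sketch of why this matters (the `Archivable` module and `User` class here are hypothetical, not from Woods):

```ruby
# Hypothetical example: a concern's behavior becomes part of the
# including model's effective API, so reading user.rb alone misses it.
module Archivable
  def archive!
    @archived = true
  end

  def archived?
    !!@archived
  end
end

class User
  include Archivable # archive!/archived? now belong to User's API
end

user = User.new
user.archive!
user.archived? # => true
```

A file-level tool looking only at the `User` class body would report no `archive!` method; Woods inlines the concern source so the method is visible where it is actually callable.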
Use incremental mode, which re-extracts only files that have changed since the last run:
```bash
bundle exec rake woods:incremental

# Docker:
docker compose exec app bundle exec rake woods:incremental
```
Incremental mode is ideal for CI pipelines and local development workflows. It is typically 5-10× faster than a full extraction. Note that some unit types (routes, middleware, engines) require full extraction to update — see TROUBLESHOOTING.md for details.
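Conceptually, incremental mode filters the changed-file list down to extractable app files. The sketch below is illustrative only — the `EXTRACTABLE` pattern and directory list are assumptions, not Woods' actual logic:

```ruby
# Illustrative only: keep just the changed paths that map to extractable
# unit types. The directory list here is an assumption.
EXTRACTABLE = %r{\Aapp/(models|controllers|services|jobs|mailers)/.+\.rb\z}

def extractable_changes(changed_paths)
  changed_paths.grep(EXTRACTABLE)
end

changed = ['app/models/user.rb', 'config/routes.rb', 'README.md']
extractable_changes(changed) # => ["app/models/user.rb"]
```

Paths outside the extractable set (like `config/routes.rb` above) are exactly the ones that fall back to full extraction.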
Configure an embedding provider, then run the embed task:
```ruby
# config/initializers/woods.rb
Woods.configure do |config|
  # OpenAI (cloud)
  config.embedding_provider = :openai
  config.embedding_model = 'text-embedding-3-small'
  config.embedding_options = { api_key: ENV['OPENAI_API_KEY'] }

  # Ollama (local, no API key needed)
  # config.embedding_provider = :ollama
  # config.embedding_model = 'nomic-embed-text'
end
```

```bash
bundle exec rake woods:embed
```
After embedding, the codebase_retrieve MCP tool supports natural-language queries ranked by semantic similarity. See CONFIGURATION_REFERENCE.md for vector storage options.
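Under the hood, similarity ranking reduces to cosine similarity between the query vector and each stored unit vector — a minimal sketch:

```ruby
# Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal.
def cosine(a, b)
  dot  = a.zip(b).sum { |x, y| x * y }
  norm = ->(v) { Math.sqrt(v.sum { |x| x * x }) }
  dot / (norm.call(a) * norm.call(b))
end

cosine([1.0, 0.0], [1.0, 0.0]) # => 1.0
cosine([1.0, 0.0], [0.0, 1.0]) # => 0.0
```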
Unit types that don't map to individual files — routes, middleware, engines, scheduled jobs, state machines, events, and factories — are extracted by introspecting the entire application at once rather than a single file. There's no way to incrementally update them by watching one file change. When any of these types change, run a full extraction:
```bash
bundle exec rake woods:extract
```
A mid-size Rails app (50-100 models, typical controller and service layer) takes 10-30 seconds for a full extraction. Larger apps (200+ models) may take 1-2 minutes. Framework source extraction (Rails, gem internals) adds overhead and can be disabled with config.include_framework_sources = false if you don't need it. Incremental extraction for changed files is much faster — typically under 5 seconds.
The Index Server reads pre-extracted JSON from disk and does not require Rails. It provides 27 tools for querying extracted codebase structure, dependency graphs, semantic search, and temporal snapshots. The Console Server connects to a live Rails application and provides 31 tools for querying real database records, running diagnostics, and monitoring job queues. Use the Index Server for structural/architectural questions; use the Console Server for live data and runtime diagnostics.
You're using the embedded console mode (launched via rake woods:console or docker compose exec ... rake woods:console). Embedded mode intentionally exposes only the 9 Tier 1 read-only tools (count, sample, find, pluck, aggregate, association_count, schema, recent, status). To access all 31 tools across all 4 tiers, use the bridge architecture. See CONSOLE_MCP_SETUP.md Option D for bridge setup.
The Console Server implements multiple safety layers. Every query runs inside a database transaction that is always rolled back, so writes are silently discarded. SqlValidator rejects DML and DDL at the string level before any database interaction. Model names are validated against ActiveRecord::Base.descendants to prevent arbitrary class instantiation. Tier 4 tools (eval, raw SQL) require explicit human confirmation. The Console Server is designed for development environments — treat it accordingly and avoid exposing it publicly.
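A hedged sketch of a string-level guard in the spirit of SqlValidator — this is not the actual implementation, and real validation is more thorough:

```ruby
# Hypothetical string-level check: accept only statements that start with
# SELECT and contain no DML/DDL keywords. Real SqlValidator logic may differ.
FORBIDDEN = /\b(INSERT|UPDATE|DELETE|DROP|ALTER|TRUNCATE|CREATE|GRANT)\b/i

def read_only_sql?(sql)
  sql.strip.match?(/\ASELECT\b/i) && !sql.match?(FORBIDDEN)
end

read_only_sql?('SELECT * FROM users LIMIT 5') # => true
read_only_sql?('DELETE FROM users')           # => false
read_only_sql?('SELECT 1; DROP TABLE users')  # => false
```

Even when a string check is bypassed, the always-rolled-back transaction is the backstop layer.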
Switch from the embedded mode (Tier 1 only) to the bridge architecture (all 4 tiers). The bridge runs woods-console-mcp on the host and connects to a bridge process inside the Rails environment.
`~/.woods/console.yml`:

```yaml
connection:
  mode: docker
  service: app
  compose_file: docker-compose.yml
```

`.mcp.json`:

```json
{
  "mcpServers": {
    "codebase-console": {
      "command": "woods-console-mcp"
    }
  }
}
```
See CONSOLE_MCP_SETUP.md for the full bridge setup guide.
This is an MCP client behavior, not a server bug. Some clients batch parallel tool calls into a single protocol request. If one call in the batch fails (e.g., a typo in an identifier), the transport layer may reject the entire response. There is no server-side fix — the workaround is to validate arguments first (use search to confirm identifiers exist) or send calls sequentially when any call is unreliable. See the Troubleshooting guide for details.
Extraction runs inside the container — it requires Rails to be booted. The Index Server runs outside the container on the host — it only reads static JSON files. The Console Server connects to a process inside the container through docker exec -i. This split architecture means you only need Docker for operations that require Rails.
```
HOST                                       CONTAINER
─────────────────                          ──────────────────
Index Server (reads JSON) ◀── volume mount ─── rake extract (writes JSON)
woods-console-mcp ──────────── docker exec ──▶ rake console (queries Rails)
```
See DOCKER_SETUP.md for the full Docker architecture guide.
The Index Server is looking at the wrong path — specifically the container-internal path rather than the host-side path. The Index Server runs on the host and reads from the volume-mounted output directory.
```json
{
  "mcpServers": {
    "codebase": {
      "command": "woods-mcp-start",
      "args": ["./tmp/woods"]
    }
  }
}
```

Note that `"./tmp/woods"` is the host-side path.
Do not use /app/tmp/woods (the container path) — the host process cannot access it. Verify with ls ./tmp/woods/manifest.json on the host.
For the embedded mode (9 Tier 1 tools), point the MCP client at docker compose exec -i:
```json
{
  "mcpServers": {
    "codebase-console": {
      "command": "docker",
      "args": ["compose", "exec", "-i", "app",
               "bundle", "exec", "rake", "woods:console"]
    }
  }
}
```
The -i flag is required to keep stdin attached for MCP protocol communication. For all 31 tools, use the bridge architecture instead. See DOCKER_SETUP.md for both configurations with complete examples.
Woods supports three vector storage backends and two metadata backends:
| Backend | Type | Use case |
|---|---|---|
| `in_memory` | Vector + Metadata | Local dev, no persistence needed |
| `sqlite` | Metadata | Persistent metadata, simple setup |
| `pgvector` | Vector | PostgreSQL apps wanting unified storage |
| `qdrant` | Vector | Production-scale semantic search |
All backends work with both MySQL and PostgreSQL application databases. pgvector requires PostgreSQL for the vector store, but your application database can still be MySQL. See BACKEND_MATRIX.md for the full compatibility matrix.
Two embedding providers are supported:
- **OpenAI** — `text-embedding-3-small` (1536 dimensions, default) or `text-embedding-3-large`. Requires an `OPENAI_API_KEY`. Billed per token.
- **Ollama** — local models (`nomic-embed-text`, `mxbai-embed-large`). Runs locally, no API key or cost. Requires Ollama to be running at `localhost:11434`.

```ruby
# OpenAI
config.embedding_provider = :openai
config.embedding_model = 'text-embedding-3-small'
config.embedding_options = { api_key: ENV['OPENAI_API_KEY'] }

# Ollama
config.embedding_provider = :ollama
config.embedding_model = 'nomic-embed-text'
```
Presets configure storage and embedding together with a single call:
```ruby
# No external services — in-memory vectors, SQLite metadata, Ollama embeddings
Woods.configure_with_preset(:local)

# PostgreSQL + OpenAI — pgvector vectors, SQLite metadata, OpenAI embeddings
Woods.configure_with_preset(:postgresql)

# Production scale — Qdrant vectors, SQLite metadata, OpenAI embeddings
Woods.configure_with_preset(:production)
```
Presets can be overridden with a block:
```ruby
Woods.configure_with_preset(:local) do |config|
  config.max_context_tokens = 16000
  config.embedding_model = 'mxbai-embed-large'
end
```
Start with :local for zero-dependency development and upgrade to :postgresql or :production when you need persistence or scale. See CONFIGURATION_REFERENCE.md for what each preset configures.
Switching embedding models requires a full re-index. The new model produces vectors with different dimensions or a different embedding space, making old and new vectors incompatible for similarity search. IndexValidator detects dimension mismatches before queries fail and logs a warning. Re-index with:
```bash
bundle exec rake woods:extract
bundle exec rake woods:embed
```
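The incompatibility is easy to see from output dimensions alone. The sketch below hard-codes common model dimensions — the `MODEL_DIMS` table is an assumption for illustration, not IndexValidator's actual data:

```ruby
# Hypothetical dimension check: vectors from different embedding models
# cannot be compared, and a dimension mismatch makes that concrete.
MODEL_DIMS = {
  'text-embedding-3-small' => 1536,
  'nomic-embed-text'       => 768
}.freeze

def compatible?(stored_dim, model)
  stored_dim == MODEL_DIMS.fetch(model)
end

compatible?(1536, 'text-embedding-3-small') # => true
compatible?(1536, 'nomic-embed-text')       # => false
```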
When you run rake woods:embed, Woods generates embedding vectors for each extracted unit and stores them in your configured vector store. The codebase_retrieve MCP tool accepts a natural-language query, embeds the query using the same provider, and finds the most semantically similar units using cosine similarity. Results are re-ranked using Reciprocal Rank Fusion (RRF) that combines semantic similarity with PageRank importance scores, then assembled into a formatted context block within your configured token budget.
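The fusion step can be sketched in a few lines — `k = 60` is the conventional RRF constant, and Woods' actual weighting may differ:

```ruby
# Reciprocal Rank Fusion: each ranked list contributes 1/(k + rank) per
# item; sum the contributions and re-rank by total score.
def rrf(rankings, k: 60)
  scores = Hash.new(0.0)
  rankings.each do |ranked_ids|
    ranked_ids.each_with_index do |id, idx|
      scores[id] += 1.0 / (k + idx + 1)
    end
  end
  scores.sort_by { |_, score| -score }.map(&:first)
end

semantic = %w[UserService User PaymentJob] # by cosine similarity
pagerank = %w[User PaymentJob UserService] # by importance score
rrf([semantic, pagerank]) # => ["User", "UserService", "PaymentJob"]
```

An item ranked well in both lists ("User" above) beats one ranked first in only one list, which is the point of fusing the two signals.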
codebase_retrieve is the primary semantic search tool on the Index Server. It accepts a natural-language description of what you're looking for ("find where user email validation happens", "which services send Stripe API calls") and returns the most relevant extracted units as formatted context. It requires embedding configuration — without an embedding provider, the tool is available but returns no results. The token budget is controlled by config.max_context_tokens (default: 8000).
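The token-budget step can be sketched as greedy assembly — word count stands in for real tokenization here, and the unit shape is hypothetical:

```ruby
# Greedy context assembly under a token budget. Word count is a crude
# stand-in for a real tokenizer.
def assemble(units, max_tokens)
  chosen, used = [], 0
  units.each do |unit|
    cost = unit[:source].split.size
    break if used + cost > max_tokens
    chosen << unit[:name]
    used += cost
  end
  chosen
end

units = [
  { name: 'User',    source: 'class User end' },
  { name: 'Account', source: 'class Account end' }
]
assemble(units, 5) # => ["User"] (second unit would exceed the budget)
```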
Several options are available for tuning retrieval:

- Increase `max_context_tokens` to include more units per query (at the cost of larger LLM context).
- Lower `similarity_threshold` (default 0.7) to include less similar results.
- Enable framework source extraction (`include_framework_sources: true`) if Rails internals are relevant to your queries.
- Use the feedback tools (`retrieval_rate`, `retrieval_report_gap`) to record quality ratings — `retrieval_suggest` analyzes the feedback to recommend configuration changes.

Temporal snapshots capture the full extraction state at a point in time, tied to a git SHA. They let you compare how the codebase has changed between snapshots — which units were added, modified, or deleted. Snapshots are opt-in and disabled by default.
Enable them in your initializer:
```ruby
config.enable_snapshots = true
```
Snapshots require database migrations 004 and 005 to be run first (bundle exec rails db:migrate). The list_snapshots, snapshot_diff, unit_history, and snapshot_detail MCP tools become available after enabling.
The session tracer is middleware that records which Rails actions are invoked during a browser session, assembles the relevant extracted units, and makes that context available via the session_trace MCP tool. It is useful for giving an AI tool accurate context about what code path was active during a specific user interaction.
Session tracing is disabled by default. To enable it:
```ruby
config.session_tracer_enabled = true
config.session_store = Woods::SessionTracer::FileStore.new(
  Rails.root.join('tmp/session_traces')
)
```
The session_store option is required — there is no default store.
Use incremental extraction in your CI pipeline. Fetch enough git history for the incremental diff to work:
```yaml
# .github/workflows/index.yml
jobs:
  index:
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2
      - name: Update index
        run: bundle exec rake woods:incremental
        env:
          GITHUB_BASE_REF: ${{ github.base_ref }}
```
For Docker-based CI:
```yaml
- name: Update index
  run: docker compose exec -T app bundle exec rake woods:incremental
```
Two rake tasks validate index integrity:
```bash
# Check integrity (no Rails required)
bundle exec rake woods:validate

# Show unit counts and extraction stats
bundle exec rake woods:stats
```
The pipeline_status MCP tool also reports the last extraction time, unit counts, and whether the index is stale relative to the current git HEAD.
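Staleness detection amounts to comparing the SHA recorded at extraction time with the current git HEAD. A sketch — the `git_sha` manifest field name is an assumption, not the documented schema:

```ruby
require 'json'

# Hypothetical staleness check: the index is stale when the manifest's
# recorded SHA no longer matches git HEAD.
def stale?(manifest_json, head_sha)
  JSON.parse(manifest_json)['git_sha'] != head_sha
end

manifest = '{"git_sha":"abc123","unit_count":512}'
stale?(manifest, 'abc123') # => false (index is current)
stale?(manifest, 'def456') # => true  (re-extract needed)
```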
Yes. Implement the extractor interface and register it:
```ruby
class MyExtractor
  def initialize; end

  def extract_all
    # Return Array<ExtractedUnit>
  end
end
```
Then add it to the extractors list:
```ruby
config.extractors += [:my_extractor]
```
The extractor must be accessible at boot time. See the existing extractors in lib/woods/extractors/ for the interface and conventions.
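For concreteness, here is a hypothetical extractor that indexes SQL view definitions — the `Struct` is a stand-in for `ExtractedUnit`, whose real constructor may differ:

```ruby
# Hypothetical custom extractor. Unit is a stand-in for ExtractedUnit;
# check lib/woods/extractors/ for the real interface.
class SqlViewExtractor
  Unit = Struct.new(:name, :unit_type, :source_code)

  def extract_all
    Dir.glob('db/views/*.sql').map do |path|
      Unit.new(File.basename(path, '.sql'), :sql_view, File.read(path))
    end
  end
end

SqlViewExtractor.new.extract_all # returns one Unit per .sql file found
```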
Use config.extractors to remove specific extractor types, or exclude directories from eager loading:
```ruby
# Exclude specific extractor types
config.extractors -= %i[factories test_mappings]
```

```ruby
# config/application.rb
# Exclude a directory from eager loading (prevents that dir from being indexed)
config.eager_load_paths -= [Rails.root.join('app/internal')]
```
Use console_redacted_columns to redact sensitive column values from Console Server results without excluding extraction:
```ruby
config.console_redacted_columns = %w[password_digest api_key ssn token]
```
For detailed problem-specific guidance, see TROUBLESHOOTING.md.