docs/EMBEDDING_MODELS.md
Woods supports OpenAI's embedding API and any model served by a local Ollama instance. This doc covers the Ollama side — which model to pick, why the default is conservative, and how to add a new one.
| Model | Native context | Dimensions | Size on disk | Recommended for |
|---|---|---|---|---|
| nomic-embed-text (default) | 2048 | 768 | 274 MB | General use, small footprint |
| bge-m3 | 8192 | 1024 | 1.2 GB | Large Rails units, fewer chunks |
| snowflake-arctic-embed2 | 8192 | 1024 | 1.2 GB | Multilingual projects |
| mxbai-embed-large | 512 | 1024 | 670 MB | Short text (tweets, commit msgs) |
| all-minilm | 256 | 384 | 46 MB | Tight-memory environments |
The default is nomic-embed-text because it's small, fast, and ships with every
fresh Ollama install. If you're indexing a large Rails codebase and don't mind
pulling a bigger model, switching to bge-m3 usually gives you:
- bge-m3 benchmarks meaningfully better than nomic-embed-text on code-search
  MTEB tasks.

The tradeoffs are download size (1.2 GB vs 274 MB), RAM usage during inference,
and a dimension change (768 → 1024) that requires re-indexing when you switch.
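
To make the "fewer chunks" entry in the table concrete, here is a small arithmetic sketch. The 6000-token unit size and the `chunks_needed` helper are illustrative assumptions, not part of Woods, and the calculation ignores chunk overlap and safety margins.

```ruby
# Illustrative only: how many chunks a unit needs when every chunk must fit
# inside the model's native context. Overlap and safety margins are ignored.
def chunks_needed(unit_tokens, context_tokens)
  (unit_tokens.to_f / context_tokens).ceil
end

# Native context values copied from the model table above.
NATIVE_CONTEXT = { 'nomic-embed-text' => 2048, 'bge-m3' => 8192 }.freeze

unit_tokens = 6000 # hypothetical token length of one large Rails unit
NATIVE_CONTEXT.each do |model, context|
  puts "#{model}: #{chunks_needed(unit_tokens, context)} chunk(s)"
end
```

Under these assumptions the same unit needs three chunks with nomic-embed-text and one with bge-m3.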

    Woods.configure do |config|
      config.embedding_provider = :ollama
      config.embedding_options = {
        model: 'bge-m3',
        host: 'http://localhost:11434'
      }
      # Everything else (chunker sizing, token counting, num_ctx) is picked
      # up automatically from the model name.
    end

Pull the model first:

    ollama pull bge-m3

If you switch models on an existing install, drop your vector index before
re-indexing — the embedding dimension change is incompatible with existing
vectors. Woods::Resilience::IndexValidator will catch this and raise at
startup if you miss it.
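
The kind of check involved can be sketched as follows. This is not the code of Woods::Resilience::IndexValidator; `MODEL_DIMENSIONS`, `DimensionMismatch`, and `ensure_compatible!` are hypothetical names, and the dimensions are taken from the model table above.

```ruby
# Hypothetical dimension registry, copied from the model table in this doc.
MODEL_DIMENSIONS = {
  'nomic-embed-text' => 768,
  'bge-m3' => 1024,
  'snowflake-arctic-embed2' => 1024,
  'mxbai-embed-large' => 1024,
  'all-minilm' => 384
}.freeze

# Hypothetical error class for this sketch.
class DimensionMismatch < StandardError; end

# Raises when the vectors already stored in the index cannot have been
# produced by the configured model; returns true otherwise.
def ensure_compatible!(stored_dimensions, model)
  expected = MODEL_DIMENSIONS.fetch(model) do
    raise ArgumentError, "unknown model: #{model}"
  end
  return true if stored_dimensions == expected

  raise DimensionMismatch,
        "index holds #{stored_dimensions}-dim vectors but #{model} produces " \
        "#{expected}-dim vectors; drop the index and re-index"
end
```

Calling `ensure_compatible!(768, 'bge-m3')` raises in this sketch, which is the situation you hit after switching from nomic-embed-text without dropping the index.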
num_ctx isn't enough

Ollama has a long-running regression (upstream issue
ollama/ollama#14186) where
options.num_ctx does not lift the effective context ceiling on
/api/embed for models whose native context is smaller than the requested
override. For nomic-embed-text (native 2048) the server rejects inputs above
that with a 400 "the input length exceeds the context length" regardless of
num_ctx.
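
For reference, a request that carries the override might be built like this. It is a minimal sketch using Ruby's standard library, not the code Woods uses; `build_embed_request` is a hypothetical helper, and the body shape (`model`, `input`, `options.num_ctx`) follows Ollama's documented /api/embed endpoint.

```ruby
require 'json'
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a POST to /api/embed that
# carries a num_ctx override in the options hash.
def build_embed_request(host:, model:, input:, num_ctx:)
  uri = URI.join(host, '/api/embed')
  request = Net::HTTP::Post.new(uri, 'Content-Type' => 'application/json')
  request.body = JSON.generate(
    model: model,
    input: input,
    options: { num_ctx: num_ctx }
  )
  [uri, request]
end

uri, request = build_embed_request(
  host: 'http://localhost:11434',
  model: 'nomic-embed-text',
  input: 'class User < ApplicationRecord; end',
  num_ctx: 8192
)

# Sending requires a running Ollama server, so it is left commented out:
# response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
# puts response.code
```

Per the regression described above, an input longer than the native context would still be rejected with a 400 even though the body asks for 8192.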
Woods works around this two ways:
1. Provider::Ollama keeps a model → native-context registry
   (MODEL_CONTEXT_LENGTHS) so the chunker sizes inputs to what Ollama will
   actually accept. num_ctx is still passed through the request body in case
   the regression is fixed upstream, but nothing relies on it.
2. When the tokenizers gem is installed, Embedding::TokenCounter loads the
   bert-base-uncased WordPiece tokenizer (the one every BERT-family embedding
   model is built on) and the chunker re-verifies every slice. WordPiece
   fragments CamelCase and :: separators differently than character-based
   estimation suggests; this verification is what catches the 10–20% gap
   between our estimate and Ollama's internal count.

If you want Woods to auto-pick num_ctx for a model we don't ship support for:
1. Pull the model with ollama pull your-model.
2. Compare the context length the model advertises with the ceiling the
   server actually enforces:

        curl -s http://localhost:11434/api/show -d '{"model":"your-model"}' \
          | jq '.model_info | to_entries[] | select(.key | contains("context_length"))'
        # Probe script in this repo: scripts/probes/ollama_context_probe.rb
        ruby scripts/probes/ollama_context_probe.rb your-model 8192

3. Add the model to MODEL_CONTEXT_LENGTHS in lib/woods/embedding/provider.rb
   (keep the value at the enforced ceiling, not the advertised one).

Stick with nomic-embed-text if disk and memory are tight (the 1.2 GB of bge-m3
weights vs 274 MB for nomic-embed-text can matter on a shared laptop).

Pick bge-m3 or snowflake-arctic-embed2 if you can accept the tokenizers gem
install cost and want the tighter coupling between client-side verification
and server-side enforcement.
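
As a rough illustration of the registry approach described earlier, and of what adding a model to it amounts to, the sketch below sizes slices from a context lookup. It is not the Woods implementation: `CONTEXT_LENGTHS_SKETCH`, `FALLBACK_CONTEXT`, `CHARS_PER_TOKEN`, `SAFETY_MARGIN`, and the helper methods are all assumptions made for this example, and the character-based estimate stands in for real token counting.

```ruby
# Hypothetical stand-in for MODEL_CONTEXT_LENGTHS; values are the native
# contexts from the model table in this doc.
CONTEXT_LENGTHS_SKETCH = {
  'nomic-embed-text' => 2048,
  'bge-m3' => 8192,
  'snowflake-arctic-embed2' => 8192,
  'mxbai-embed-large' => 512,
  'all-minilm' => 256
}.freeze

FALLBACK_CONTEXT = 256 # assumption: smallest context in the table
CHARS_PER_TOKEN = 4    # assumption: crude character-based estimate
SAFETY_MARGIN = 0.8    # assumption: headroom for the estimate gap

# Looks up the context for a model name, ignoring any ":tag" suffix.
def context_for(model)
  CONTEXT_LENGTHS_SKETCH.fetch(model.sub(/:.*\z/, ''), FALLBACK_CONTEXT)
end

# Largest slice, in characters, that should stay under the context ceiling.
def max_chunk_chars(model)
  (context_for(model) * CHARS_PER_TOKEN * SAFETY_MARGIN).floor
end

# Splits text into consecutive slices no longer than max_chunk_chars.
def slice_for(model, text)
  text.chars.each_slice(max_chunk_chars(model)).map(&:join)
end

puts max_chunk_chars('nomic-embed-text')
puts slice_for('all-minilm', 'a' * 2000).map(&:length).inspect
```

In this sketch, supporting a new model is one more hash entry holding the enforced ceiling. A tokenizer-based pass would then re-check each slice, since the character estimate alone can undercount.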