Code Embedding (Rust)

examples/rust/code_embedding/README.md

Rust port of the Python code_embedding example.

Pipeline: walk → detect language → tree-sitter chunk → embed → store in pgvector, with a vector-similarity query mode.

How it maps to the Python example

| Step | Python | Rust (this example) |
| --- | --- | --- |
| Walk files | `localfs.walk_dir` | `cocoindex::walk` |
| Per-file incremental skip | `@coco.fn(memo=True)` | `#[cocoindex::function(memo)]` |
| Language detection | `detect_code_language` | `cocoindex_ops_text::prog_langs::detect_language` |
| Chunking | `RecursiveSplitter` | `cocoindex_ops_text::split::RecursiveChunker` |
| Embeddings | `SentenceTransformerEmbedder` (all-MiniLM-L6-v2) | fastembed `AllMiniLML6V2` (the same model, local ONNX) |
| Embedder change-detection | `ContextKey(..., detect_change=True)` | `ContextKey::new_with_state("embedder", \|e\| e.model_name)` |
| Vector store | `postgres.TableTarget` + `declare_vector_index` | `cocoindex::postgres` `TableTarget` + `declare_vector_index` |
| Stable row ids | `IdGenerator.next_id(chunk.text)` | `IdGenerator::next_id(ctx, chunk_text)` |
| Query | pgvector `<=>` | pgvector `<=>` |

Chunking and language detection exist in the engine (cocoindex_ops_text) but are not re-exported by the cocoindex SDK crate, so this example depends on cocoindex_ops_text directly.
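
For the Query row in the table above, pgvector's `<=>` operator returns cosine distance (1 minus cosine similarity), so smaller values mean closer matches. The sketch below is a std-only illustration, not this example's actual query code: it computes the same distance in Rust, formats an embedding as a pgvector text literal, and builds a top-k SQL string. The table name `code_embeddings` and the columns `filename`, `text`, and `embedding` are assumptions chosen for illustration.

```rust
/// Cosine distance, the quantity pgvector's `<=>` operator returns:
/// 1 - (a . b) / (|a| * |b|). Returns None when the inputs are empty,
/// differ in length, or either vector has zero norm.
fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.is_empty() || a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}

/// Formats an embedding as a pgvector text literal such as "[1,0.5,-0.25]".
fn to_pgvector_literal(v: &[f32]) -> String {
    let parts: Vec<String> = v.iter().map(|x| x.to_string()).collect();
    format!("[{}]", parts.join(","))
}

/// Builds a top-k nearest-neighbour query. The query vector is bound as $1.
/// The column names are hypothetical; `table` must be a trusted identifier,
/// because it is interpolated into the SQL text.
fn top_k_query(table: &str, k: usize) -> String {
    format!(
        "SELECT filename, text, embedding <=> $1::vector AS distance \
         FROM {table} ORDER BY embedding <=> $1::vector LIMIT {k}"
    )
}

fn main() {
    let query = [1.0_f32, 0.0];
    let chunk = [0.0_f32, 1.0];
    println!("distance: {:?}", cosine_distance(&query, &chunk));
    println!("literal:  {}", to_pgvector_literal(&query));
    println!("sql:      {}", top_k_query("code_embeddings", 5));
}
```

Binding the vector as a parameter rather than splicing the literal into the SQL avoids quoting problems; the literal form is still useful for ad hoc queries in psql.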

Prerequisites

  • Postgres with the pgvector extension. Quick start:
    ```sh
    docker run -d --name cocoindex-pg -p 5432:5432 \
      -e POSTGRES_USER=cocoindex -e POSTGRES_PASSWORD=cocoindex -e POSTGRES_DB=cocoindex \
      pgvector/pgvector:pg16
    ```
    Override the connection with POSTGRES_URL (default postgres://cocoindex:cocoindex@localhost/cocoindex).
  • The embedding model is downloaded automatically by fastembed on first run (cached afterwards).

Usage

```sh
# Index (incremental: unchanged files are skipped). Defaults to the repo root.
cargo run -- index            # or: cargo run -- index /path/to/code

# Query
cargo run -- query "how is memoization implemented"
```

Re-running index after editing a file re-embeds only that file. Deleting a file removes its rows on the next index run, through target reconciliation.
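
The incremental behaviour described above can be pictured as a diff between the previous run's per-file fingerprints and the current file set. The following is a conceptual, std-only sketch and not how cocoindex implements memoization or target reconciliation; `fingerprint`, `Plan`, `plan_update`, and `files` are hypothetical names introduced here. It uses `DefaultHasher` for brevity, whose algorithm is not guaranteed to stay stable across Rust releases, so a real persisted state would need a stable content hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::{BTreeMap, BTreeSet};
use std::hash::{Hash, Hasher};

/// Content fingerprint for change detection (illustration only).
fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// What one indexing run has to do.
#[derive(Debug, Default, PartialEq)]
struct Plan {
    /// New or edited files: chunk and embed again.
    reprocess: BTreeSet<String>,
    /// Files that disappeared: their rows should be removed from the target.
    delete: BTreeSet<String>,
}

/// Compares the previous run's fingerprints with the current files and
/// returns the work plan plus the fingerprints to remember for the next run.
fn plan_update(
    previous: &BTreeMap<String, u64>,
    current_files: &BTreeMap<String, String>,
) -> (Plan, BTreeMap<String, u64>) {
    let mut plan = Plan::default();
    let mut next_state = BTreeMap::new();
    for (path, content) in current_files {
        let fp = fingerprint(content);
        // Unchanged files keep their fingerprint and are skipped.
        if previous.get(path) != Some(&fp) {
            plan.reprocess.insert(path.clone());
        }
        next_state.insert(path.clone(), fp);
    }
    for path in previous.keys() {
        if !current_files.contains_key(path) {
            plan.delete.insert(path.clone());
        }
    }
    (plan, next_state)
}

/// Small helper to build a path -> content map for the demo.
fn files(entries: &[(&str, &str)]) -> BTreeMap<String, String> {
    entries
        .iter()
        .map(|(path, content)| (path.to_string(), content.to_string()))
        .collect()
}

fn main() {
    let before = files(&[("a.rs", "fn a() {}"), ("b.rs", "fn b() {}"), ("c.rs", "fn c() {}")]);
    let (first, state) = plan_update(&BTreeMap::new(), &before);
    println!("first run:  {first:?}");

    // a.rs is untouched, b.rs is edited, c.rs is deleted.
    let after = files(&[("a.rs", "fn a() {}"), ("b.rs", "fn b() { todo!() }")]);
    let (second, _) = plan_update(&state, &after);
    println!("second run: {second:?}");
}
```

In this sketch the second run reprocesses only the edited file and schedules the deleted file's rows for removal, which mirrors the behaviour described for the index command.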