Text Embedding (Rust)

examples/rust/text_embedding/README.md


Rust port of the Python text_embedding example.

The example walks local markdown files, splits each file with a markdown-aware chunker, embeds the chunks, and stores them in Postgres/pgvector. It then serves similarity search over the stored chunks.
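To make "markdown-aware" concrete, here is a minimal sketch of the idea in plain Rust. It is a simplified stand-in and not the `cocoindex_ops_text` `RecursiveChunker`: the function name `chunk_markdown` and the `max_chars` parameter are hypothetical. It starts a new chunk at each heading, ignores `#` lines inside fenced code, and splits oversized sections on blank lines.

```rust
/// Split markdown into chunks at heading boundaries, then cap each chunk at
/// roughly `max_chars` characters by packing paragraphs greedily.
/// Hypothetical helper for illustration; the real chunker is more thorough.
fn chunk_markdown(text: &str, max_chars: usize) -> Vec<String> {
    // Pass 1: start a new section at every heading line outside code fences.
    let mut sections: Vec<String> = Vec::new();
    let mut current = String::new();
    let mut in_fence = false;
    for line in text.lines() {
        if line.trim_start().starts_with("```") {
            in_fence = !in_fence;
        }
        let is_heading = !in_fence && line.starts_with('#');
        if is_heading && !current.trim().is_empty() {
            sections.push(std::mem::take(&mut current));
        }
        current.push_str(line);
        current.push('\n');
    }
    if !current.trim().is_empty() {
        sections.push(current);
    }

    // Pass 2: keep small sections whole; split large ones on blank lines.
    // Simplifications: a single paragraph longer than `max_chars` is kept
    // whole, and blank lines inside a code fence may split it.
    let mut chunks = Vec::new();
    for section in sections {
        if section.chars().count() <= max_chars {
            chunks.push(section.trim().to_string());
            continue;
        }
        let mut buf = String::new();
        for para in section.split("\n\n") {
            let para = para.trim();
            if para.is_empty() {
                continue;
            }
            let joined = buf.chars().count() + 2 + para.chars().count();
            if !buf.is_empty() && joined > max_chars {
                chunks.push(std::mem::take(&mut buf));
            }
            if !buf.is_empty() {
                buf.push_str("\n\n");
            }
            buf.push_str(para);
        }
        if !buf.is_empty() {
            chunks.push(buf);
        }
    }
    chunks
}
```

Splitting at headings first keeps each chunk on a single topic, which tends to give more useful embeddings than fixed-size windows that cut across sections.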

Parallel to the Python example

| Concern | Python | Rust (this example) |
| --- | --- | --- |
| Source | `localfs.walk_dir` | `cocoindex::fs::walk` |
| Per-file compute | `@coco.fn(memo=True) process_file` | `#[cocoindex::function(memo)] process_file` |
| Chunking | `RecursiveSplitter` (markdown) | `cocoindex_ops_text` `RecursiveChunker` (markdown) |
| Embeddings | `sentence-transformers/all-MiniLM-L6-v2` | `fastembed` `AllMiniLML6V2` (same model, 384-dim) |
| Target | `postgres.TableTarget` + pgvector index | `postgres::TableTarget` + `declare_vector_index` |

Incrementality: unchanged files are skipped via memoization; when a file is removed or edited, its stale chunks are reconciled away (the managed TableTarget deletes orphaned rows).
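The bookkeeping behind this can be sketched in plain Rust. This is an illustration only and not how cocoindex implements memoization: `Plan`, `fingerprint`, and `plan_run` are hypothetical names, and the sketch assumes the previous run left behind a map from file path to content fingerprint.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// What an incremental run has to do, given the previous run's state.
#[derive(Debug, Default, PartialEq)]
struct Plan {
    reprocess: Vec<String>, // new or edited files: re-chunk, re-embed, replace old rows
    skip: Vec<String>,      // unchanged files: reuse the memoized result
    delete: Vec<String>,    // files gone from disk: their rows are orphaned
}

/// Fingerprint file content. DefaultHasher is enough for a sketch; a real
/// tool would want a hash that is stable across Rust versions.
fn fingerprint(content: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    content.hash(&mut hasher);
    hasher.finish()
}

/// Compare the previous fingerprints (path -> hash) with the files currently
/// on disk (path -> content) and decide what to do with each path.
fn plan_run(previous: &HashMap<String, u64>, current: &HashMap<String, String>) -> Plan {
    let mut plan = Plan::default();
    for (path, content) in current {
        match previous.get(path) {
            Some(old) if *old == fingerprint(content) => plan.skip.push(path.clone()),
            _ => plan.reprocess.push(path.clone()),
        }
    }
    for path in previous.keys() {
        if !current.contains_key(path) {
            plan.delete.push(path.clone());
        }
    }
    // Sort so the plan is deterministic regardless of HashMap iteration order.
    plan.reprocess.sort();
    plan.skip.sort();
    plan.delete.sort();
    plan
}
```

With this shape, an unchanged file costs one hash comparison, while an edited or removed file is the only thing that triggers embedding work or row deletion.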

Run

```bash
export POSTGRES_URL=postgres://cocoindex:cocoindex@localhost/cocoindex   # pgvector-enabled

cargo run -- index                 # walk ./markdown_files -> chunk -> embed -> pgvector
cargo run -- query "your query"    # cosine similarity search
```
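For readers new to vector search, the ranking that `query` performs can be illustrated in plain Rust. This is a sketch under assumptions: the example itself presumably lets pgvector do the ranking inside Postgres (pgvector's `<=>` operator returns cosine distance, which is 1 minus cosine similarity), and the helper names `cosine_similarity` and `top_k` are hypothetical.

```rust
/// Cosine similarity of two vectors. Returns None when the lengths differ
/// or either vector has zero norm (the similarity is undefined there).
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Return (index, score) for the `k` stored vectors most similar to `query`,
/// best match first. Vectors that cannot be compared are skipped.
fn top_k(query: &[f32], stored: &[Vec<f32>], k: usize) -> Vec<(usize, f32)> {
    let mut scored: Vec<(usize, f32)> = stored
        .iter()
        .enumerate()
        .filter_map(|(i, v)| cosine_similarity(query, v).map(|s| (i, s)))
        .collect();
    // Highest similarity first; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1));
    scored.truncate(k);
    scored
}
```

In the real pipeline the query string is embedded with the same 384-dimensional model as the chunks, so the query vector and the stored vectors are directly comparable. A brute-force scan like this is fine for small corpora; a vector index avoids scanning every row as the table grows.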