Back to Chroma

Search

docs/mintlify/reference/search.mdx

1.5.94.6 KB
Original Source

Search dictionaries define filtering, ranking, grouping, pagination, and field selection for Chroma queries. Each SDK provides a DSL, but they compile to the same JSON format that you can construct directly.

For example, SDK code like this:

<CodeGroup> ```python Python from chromadb import Search, K, Knn, GroupBy, MinK

search = Search( where=K("status") == "active", rank=Knn(query="machine learning research", limit=100), group_by=GroupBy(keys=K("category"), aggregate=MinK(keys=K.SCORE, k=2)), limit=10, select=[K.DOCUMENT, K.SCORE, "category"] )


```typescript TypeScript
import { Search, K, Knn, GroupBy, MinK } from 'chromadb';

const search = new Search({
  where: K("status").eq("active"),
  rank: Knn({ query: "machine learning research", limit: 100 }),
  groupBy: new GroupBy([K("category")], new MinK([K.SCORE], 2)),
  limit: 10,
  select: [K.DOCUMENT, K.SCORE, "category"]
});
rust
use chroma::types::{Aggregate, GroupBy, Key, QueryVector, RankExpr, SearchPayload};

let search = SearchPayload::default()
    .r#where(Key::field("status").eq("active"))
    .rank(RankExpr::Knn {
        query: QueryVector::Dense(vec![0.1, 0.2, 0.3]),
        key: Key::Embedding,
        limit: 100,
        default: None,
        return_rank: false,
    })
    .group_by(GroupBy {
        keys: vec![Key::field("category")],
        aggregate: Some(Aggregate::MinK {
            keys: vec![Key::Score],
            k: 2,
        }),
    })
    .limit(Some(10), 0)
    .select([Key::Document, Key::Score, Key::field("category")]);
</CodeGroup>

Gets compiled to this JSON:

json
{
  "where": {"status": {"$eq": "active"}},
  "rank": {"$knn": {"query": "machine learning research", "limit": 100}},
  "group_by": {
    "keys": ["category"],
    "aggregate": {"$min_k": {"keys": ["#score"], "k": 2}}
  },
  "limit": {"limit": 10, "offset": 0},
  "select": {"keys": ["#document", "#score", "category"]}
}

This reference describes the Search dictionary format and rules. For related dictionary references, see Where Filters.

JSON Format

Basic Structure

A Search dictionary is an object with optional keys:

json
{
  "where": { /* where filter dictionary */ },
  "rank": { /* rank expression dictionary */ },
  "group_by": { /* group by dictionary */ },
  "limit": {"limit": 10, "offset": 0},
  "select": {"keys": ["#document", "#score"]}
}

All keys are optional. Omitted keys use Search defaults.

Component Schemas

where

where uses the Where Filter dictionary schema.

json
{
  "where": ...
}

See Where Filters for full operator and rule definitions.

rank

rank must be a dictionary with exactly one top-level operator.

json
{
  "rank": RankExpr
}
json
{
  "RankExpr": {"$val": "number"}
}
json
{
  "RankExpr": {
    "$knn": {
      "query": "string | number[] | SparseVector",
      "key": "string (optional)",
      "limit": "positive integer (optional)",
      "default": "number (optional)",
      "return_rank": "boolean (optional)"
    }
  }
}
json
{
  "RankExpr": {
    "$op": ...
  }
}
OperatorFormat
$sum["RankExpr", "RankExpr", "... (min 2)"]
$mul["RankExpr", "RankExpr", "... (min 2)"]
$max["RankExpr", "RankExpr", "... (min 2)"]
$min["RankExpr", "RankExpr", "... (min 2)"]
$sub (l-r){ "left": "RankExpr", "right": "RankExpr" }
$div (l/r){ "left": "RankExpr", "right": "RankExpr" }
$abs"RankExpr"
$exp (e<sup>x</sup>)"RankExpr"
$log (Natural logarithm)"RankExpr"

group_by

group_by can be omitted or provided as a dictionary with both keys and aggregate.

json
{
  "group_by": {
    "keys": ["metadata_field", "... (min 1)"],
    "aggregate": {
      "$min_k": { // Or $max_k
        "keys": ["metadata_field_or_#score", "... (min 1)"],
        "k": "positive integer"
      }
    }
  }
}

limit

Controls pagination.

json
{
  "limit": {
    "limit": 10, (optional, default 0)
    "offset": 20 (optional)
  }
}

select

Controls returned fields. Use built-ins (#id, #document, #embedding, #metadata, #score) and/or metadata field names.

json
{
  "select": {
    "keys": ["#id", "#document", "#metadata", "#score", "author"]
  }
}