Full Text Search

<Tabs> <Tab title="Python" icon="python">

The where_document argument in get and query is used to filter records based on their document content.

We support full-text search with the $contains and $not_contains operators. We also support regular expression pattern matching with the $regex and $not_regex operators.

For example, here we get all records whose document contains a search string:

python
collection.get(
    where_document={"$contains": "search string"}
)

Note: Full-text search is case-sensitive.
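
To make the case-sensitivity concrete, the sketch below models `$contains` as a plain substring test in pure Python. It only illustrates the matching rule and is not Chroma's implementation; the `contains` helper and the sample documents are hypothetical.

```python
# Local model of $contains: a case-sensitive substring test.
# `contains` and `docs` are illustrative names, not part of the Chroma API.
def contains(document: str, needle: str) -> bool:
    return needle in document


docs = ["Search string here", "a search string", "SEARCH STRING"]

# Only the document with the exact same casing is kept.
case_hits = [d for d in docs if contains(d, "search string")]
print(case_hits)
```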

Here we get all records whose documents match the regex pattern for an email address:

python
collection.get(
    where_document={
        "$regex": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
    }
)
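
Because this pattern is anchored with `^` and `$`, it presumably matches only documents that consist entirely of an email address. The sketch below checks that behavior locally with Python's `re` module. Using `re` is an assumption made for illustration: Chroma evaluates `$regex` with its own engine, whose syntax may differ in details. The unanchored variant and the sample documents are hypothetical.

```python
import re

# The same pattern as above, written as a raw string so that `\.` reaches
# the regex engine unchanged.
EMAIL = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
# Hypothetical unanchored variant for finding an address inside longer text.
EMAIL_ANYWHERE = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

docs = ["jane@example.com", "Contact jane@example.com for details"]

# Anchored: the whole document must be an address.
anchored = [d for d in docs if re.search(EMAIL, d)]
# Unanchored: an address anywhere in the document is enough.
unanchored = [d for d in docs if re.search(EMAIL_ANYWHERE, d)]
```

If the goal is to find documents that merely mention an address, dropping the anchors is the likely fix.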

Using Logical Operators

You can also use the logical operators $and and $or to combine multiple filters.

An $and operator will return results that match all the filters in the list:

python
collection.query(
    query_texts=["query1", "query2"],
    where_document={
        "$and": [
            {"$contains": "search_string_1"},
            {"$regex": "[a-z]+"},
        ]
    }
)
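
When the list of required terms is built at runtime, a small helper can assemble the `$and` filter. The sketch below is a hypothetical helper, not part of the Chroma API. It assumes that `$and` needs at least two clauses, so a single term is returned as a bare `$contains` filter; check this against the validation in your Chroma version.

```python
from typing import Any


def all_of(terms: list[str]) -> dict[str, Any]:
    """Build a where_document filter requiring every term (hypothetical helper)."""
    if not terms:
        raise ValueError("at least one term is required")
    clauses = [{"$contains": t} for t in terms]
    # Assumption: $and expects two or more clauses, so a single term is
    # returned as a bare $contains filter instead of a one-element $and.
    return clauses[0] if len(clauses) == 1 else {"$and": clauses}


single = all_of(["search_string_1"])
double = all_of(["search_string_1", "search_string_2"])
```

The result can then be passed as `where_document=all_of([...])` to `get` or `query`.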

An $or operator will return results that match any of the filters in the list:

python
collection.query(
    query_texts=["query1", "query2"],
    where_document={
        "$or": [
            {"$contains": "search_string_1"},
            {"$not_contains": "search_string_2"},
        ]
    }
)
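
The semantics of `$and` and `$or` can be sketched with a small local evaluator. This is a pure-Python model for reasoning about filters, not how Chroma evaluates them; the `matches` function and the sample documents are hypothetical, and `re` stands in for Chroma's regex engine.

```python
import re
from typing import Any


def matches(document: str, where_document: dict[str, Any]) -> bool:
    """Evaluate a where_document filter against one document (local model)."""
    # Each filter is modeled as a single-key dict: {operator: operand}.
    ((op, operand),) = where_document.items()
    if op == "$contains":
        return operand in document
    if op == "$not_contains":
        return operand not in document
    if op == "$regex":
        return re.search(operand, document) is not None
    if op == "$not_regex":
        return re.search(operand, document) is None
    if op == "$and":
        return all(matches(document, f) for f in operand)
    if op == "$or":
        return any(matches(document, f) for f in operand)
    raise ValueError(f"unsupported operator: {op}")


docs = [
    "search_string_1 only",
    "search_string_1 and search_string_2",
    "neither",
    "search_string_2 only",
]
and_filter = {"$and": [{"$contains": "search_string_1"}, {"$regex": "[a-z]+"}]}
or_filter = {"$or": [{"$contains": "search_string_1"}, {"$not_contains": "search_string_2"}]}

and_hits = [d for d in docs if matches(d, and_filter)]
or_hits = [d for d in docs if matches(d, or_filter)]
```

Note that the `$or` filter also keeps the document containing neither string, because `$not_contains` alone is enough to satisfy it.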

Combining with Metadata Filtering

.get and .query can handle where_document search combined with metadata filtering:

python
collection.query(
    query_texts=["doc10", "thus spake zarathustra", ...],
    n_results=10,
    where={"metadata_field": "is_equal_to_this"},
    where_document={"$contains": "search_string"}
)
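
When both arguments are supplied, a record has to satisfy the metadata filter and the document filter to be returned. The sketch below models that conjunction in pure Python for a simple equality `where` and a `$contains` document filter. The `passes` function and the sample records are hypothetical and only illustrate the rule.

```python
from typing import Any


def passes(record: dict[str, Any], where: dict[str, Any], needle: str) -> bool:
    """Keep a record only if the metadata AND the document filter both match."""
    metadata_ok = all(record["metadata"].get(k) == v for k, v in where.items())
    document_ok = needle in record["document"]
    return metadata_ok and document_ok


records = [
    {"id": "a", "document": "has search_string", "metadata": {"metadata_field": "is_equal_to_this"}},
    {"id": "b", "document": "has search_string", "metadata": {"metadata_field": "other"}},
    {"id": "c", "document": "no match", "metadata": {"metadata_field": "is_equal_to_this"}},
]

# Record "b" fails the metadata filter and "c" fails the document filter.
kept = [
    r["id"]
    for r in records
    if passes(r, {"metadata_field": "is_equal_to_this"}, "search_string")
]
```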
</Tab> <Tab title="TypeScript" icon="js">

The whereDocument argument in get and query is used to filter records based on their document content.

We support full-text search with the $contains and $not_contains operators. We also support regular expression pattern matching with the $regex and $not_regex operators.

For example, here we get all records whose document contains a search string:

typescript
await collection.get({
  whereDocument: { $contains: "search string" },
});

Here we get all records whose documents match the regex pattern for an email address:

typescript
await collection.get({
  whereDocument: {
    $regex: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
  },
});

Using Logical Operators

You can also use the logical operators $and and $or to combine multiple filters.

An $and operator will return results that match all the filters in the list:

typescript
await collection.query({
  queryTexts: ["query1", "query2"],
  whereDocument: {
    $and: [{ $contains: "search_string_1" }, { $regex: "[a-z]+" }],
  },
});

An $or operator will return results that match any of the filters in the list:

typescript
await collection.query({
  queryTexts: ["query1", "query2"],
  whereDocument: {
    $or: [
      { $contains: "search_string_1" },
      { $not_contains: "search_string_2" },
    ],
  },
});

Combining with Metadata Filtering

.get and .query can handle whereDocument search combined with metadata filtering:

typescript
await collection.query({
  queryTexts: ["doc10", "thus spake zarathustra", ...],
  nResults: 10,
  where: { metadata_field: "is_equal_to_this" },
  whereDocument: { $contains: "search_string" },
});
</Tab> <Tab title="Rust" icon="rust">

The r#where argument in get and query is used to filter records based on their document content.

We support full-text search with the Contains and NotContains operators. We also support regular expression pattern matching with the Regex and NotRegex operators.

For example, here we get all records whose document contains a search string:

rust
use chroma::types::{DocumentExpression, DocumentOperator, Where};

let where_clause = Where::Document(DocumentExpression {
    operator: DocumentOperator::Contains,
    pattern: "search string".to_string(),
});

let results = collection
    .get(None, Some(where_clause), None, None, None)
    .await?;

Here we get all records whose documents match the regex pattern for an email address:

rust
let where_clause = Where::Document(DocumentExpression {
    operator: DocumentOperator::Regex,
    pattern: r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$".to_string(),
});

let results = collection
    .get(None, Some(where_clause), None, None, None)
    .await?;

Using Logical Operators

You can also combine multiple filters with the logical operators And and Or by wrapping them in a CompositeExpression.

An And operator will return results that match all the filters in the list:

rust
use chroma::types::{
    BooleanOperator, CompositeExpression, DocumentExpression, DocumentOperator, Where,
};

let where_clause = Where::Composite(CompositeExpression {
    operator: BooleanOperator::And,
    children: vec![
        Where::Document(DocumentExpression {
            operator: DocumentOperator::Contains,
            pattern: "search_string_1".to_string(),
        }),
        Where::Document(DocumentExpression {
            operator: DocumentOperator::Regex,
            pattern: "[a-z]+".to_string(),
        }),
    ],
});

let results = collection
    .query(vec![vec![0.1, 0.2, 0.3]], Some(10), Some(where_clause), None, None)
    .await?;

An Or operator will return results that match any of the filters in the list:

rust
let where_clause = Where::Composite(CompositeExpression {
    operator: BooleanOperator::Or,
    children: vec![
        Where::Document(DocumentExpression {
            operator: DocumentOperator::Contains,
            pattern: "search_string_1".to_string(),
        }),
        Where::Document(DocumentExpression {
            operator: DocumentOperator::NotContains,
            pattern: "search_string_2".to_string(),
        }),
    ],
});

let results = collection
    .query(vec![vec![0.1, 0.2, 0.3]], Some(10), Some(where_clause), None, None)
    .await?;

Combining with Metadata Filtering

get and query can handle document search combined with metadata filtering using a composite where clause:

rust
use chroma::types::{
    BooleanOperator, CompositeExpression, DocumentExpression, DocumentOperator,
    MetadataComparison, MetadataExpression, MetadataValue, PrimitiveOperator, Where,
};

let where_clause = Where::Composite(CompositeExpression {
    operator: BooleanOperator::And,
    children: vec![
        Where::Metadata(MetadataExpression {
            key: "metadata_field".to_string(),
            comparison: MetadataComparison::Primitive(
                PrimitiveOperator::Equal,
                MetadataValue::Str("is_equal_to_this".to_string()),
            ),
        }),
        Where::Document(DocumentExpression {
            operator: DocumentOperator::Contains,
            pattern: "search_string".to_string(),
        }),
    ],
});

let results = collection
    .query(vec![vec![0.1, 0.2, 0.3]], Some(10), Some(where_clause), None, None)
    .await?;
</Tab> </Tabs>