docs/mintlify/docs/querying-collections/full-text-search.mdx
The where_document argument in get and query is used to filter records based on their document content.
We support full-text search with the $contains and $not_contains operators. We also support regular expression pattern matching with the $regex and $not_regex operators.
For example, here we get all records whose document contains a search string:
collection.get(
where_document={"$contains": "search string"}
)
Note: Full-text search is case-sensitive.
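To predict which documents a filter will keep, it can help to model the two full-text operators in plain Python. `doc_contains` and `doc_not_contains` are hypothetical helpers, not part of chromadb. They assume plain substring matching, which fits the case-sensitivity note above but is not Chroma's actual implementation.

```python
def doc_contains(document: str, needle: str) -> bool:
    """Model of {"$contains": needle}: case-sensitive substring test."""
    return needle in document


def doc_not_contains(document: str, needle: str) -> bool:
    """Model of {"$not_contains": needle}: the negation of $contains."""
    return needle not in document


docs = ["Search string at the start", "a search string in lower case"]
# Only the second document matches, because "Search" differs from "search".
print([d for d in docs if doc_contains(d, "search string")])
# prints ['a search string in lower case']
```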
Here we get all records whose documents match the regex pattern for an email address:
collection.get(
where_document={
"$regex": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
}
)
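Python's `re` module is a convenient way to reason about the regex operators. The sketch below assumes a pattern may match anywhere in the document, as with `re.search`. That is why the email pattern uses `^` and `$`: they require the whole document to be an address. `doc_regex` and `doc_not_regex` are hypothetical stand-ins, and Chroma's regex engine may differ from Python's `re` in edge cases. Writing the pattern as a raw string (`r"..."`) keeps `\.` as a literal-dot escape.

```python
import re

EMAIL = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"


def doc_regex(document: str, pattern: str) -> bool:
    """Model of {"$regex": pattern}: true if the pattern matches somewhere."""
    return re.search(pattern, document) is not None


def doc_not_regex(document: str, pattern: str) -> bool:
    """Model of {"$not_regex": pattern}: true if the pattern never matches."""
    return not doc_regex(document, pattern)


# The anchors reject text before the address; the final "\." requires a dot.
for doc in ["alice@example.com", "contact alice@example.com", "alice@example"]:
    print(doc, doc_regex(doc, EMAIL))
```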
You can also use the logical operators $and and $or to combine multiple filters.
An $and operator will return results that match all the filters in the list:
collection.query(
query_texts=["query1", "query2"],
where_document={
"$and": [
{"$contains": "search_string_1"},
{"$regex": "[a-z]+"},
]
}
)
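Because `$and` and `$or` take lists of filters, a `where_document` filter forms a small tree. The recursive sketch below evaluates such a tree against a single document string, so you can check a filter locally before sending it. `matches_where_document` is hypothetical. It assumes substring semantics for `$contains` and `re.search` semantics for `$regex`, and it does not reflect how Chroma evaluates filters internally.

```python
import re
from typing import Any, Dict


def matches_where_document(document: str, where_document: Dict[str, Any]) -> bool:
    """Evaluate a where_document-style filter against a single document."""
    # Each node must hold exactly one operator key (ValueError otherwise).
    ((op, operand),) = where_document.items()
    if op == "$contains":
        return operand in document
    if op == "$not_contains":
        return operand not in document
    if op == "$regex":
        return re.search(operand, document) is not None
    if op == "$not_regex":
        return re.search(operand, document) is None
    if op == "$and":
        return all(matches_where_document(document, f) for f in operand)
    if op == "$or":
        return any(matches_where_document(document, f) for f in operand)
    raise ValueError(f"unsupported operator: {op}")


flt = {"$and": [{"$contains": "search_string_1"}, {"$regex": "[a-z]+"}]}
print(matches_where_document("doc with search_string_1", flt))  # True
print(matches_where_document("SEARCH_STRING_1", flt))  # False: case differs
```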
An $or operator will return results that match any of the filters in the list:
collection.query(
query_texts=["query1", "query2"],
where_document={
"$or": [
{"$contains": "search_string_1"},
{"$not_contains": "search_string_2"},
]
}
)
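The second branch is a negation, so this `$or` filter keeps every document that lacks `search_string_2`, whether or not it contains `search_string_1`. The truth-table sketch below makes this explicit. `or_filter` is a hypothetical helper that models `$contains` and `$not_contains` as plain substring tests.

```python
from itertools import product


def or_filter(document: str) -> bool:
    """Model of {"$or": [{"$contains": "search_string_1"},
    {"$not_contains": "search_string_2"}]}."""
    return "search_string_1" in document or "search_string_2" not in document


# Build one document per combination of the two search strings.
for has_1, has_2 in product([False, True], repeat=2):
    pairs = (("search_string_1", has_1), ("search_string_2", has_2))
    doc = " ".join(s for s, keep in pairs if keep) or "unrelated text"
    print(f"has_1={has_1!s:5} has_2={has_2!s:5} kept={or_filter(doc)}")
```

Only documents that contain `search_string_2` without `search_string_1` are excluded.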
Both .get and .query accept where_document together with a where metadata filter. A record must satisfy both to be returned:
collection.query(
query_texts=["doc10", "thus spake zarathustra", ...],
n_results=10,
where={"metadata_field": "is_equal_to_this"},
where_document={"$contains":"search_string"}
)
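When both arguments are given, the filters combine with AND: a record must match the metadata filter and the document filter. The sketch below models that conjunction over an in-memory list of records. `Record` and `filter_records` are hypothetical, and only equality metadata filters like the one above are modeled. In a real `query` call, the surviving records would then be ranked by similarity to each query text and truncated to `n_results`.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Record:
    record_id: str
    document: str
    metadata: Dict[str, str] = field(default_factory=dict)


def filter_records(records: List[Record], where: Dict[str, str], contains: str) -> List[str]:
    """Keep records whose metadata equals every `where` entry AND whose
    document contains `contains` (case-sensitive)."""
    return [
        r.record_id
        for r in records
        if all(r.metadata.get(k) == v for k, v in where.items())
        and contains in r.document
    ]


records = [
    Record("a", "has search_string", {"metadata_field": "is_equal_to_this"}),
    Record("b", "has search_string", {"metadata_field": "something_else"}),
    Record("c", "no match here", {"metadata_field": "is_equal_to_this"}),
]
print(filter_records(records, {"metadata_field": "is_equal_to_this"}, "search_string"))
# prints ['a']
```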
The whereDocument argument in get and query is used to filter records based on their document content.
We support full-text search with the $contains and $not_contains operators. We also support regular expression pattern matching with the $regex and $not_regex operators.
For example, here we get all records whose document contains a search string:
await collection.get({
whereDocument: { $contains: "search string" },
});
Here we get all records whose documents match the regex pattern for an email address:
await collection.get({
whereDocument: {
$regex: "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
},
});
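A note on escaping: in a JavaScript or TypeScript string literal, `"\."` evaluates to just `"."`. The backslash never reaches the regex engine, so the dot matches any character. Writing `"\\."` (or using a ``String.raw`...` `` template) preserves it. The Python sketch below builds both resulting patterns and compares them. It assumes Chroma's regex engine treats `.` and `\.` the way Python's `re` does, which holds for standard regex syntax.

```python
import re

# Pattern as intended: "\." is a literal dot.
escaped = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
# The same pattern after the backslash is lost: "." matches any character.
unescaped = escaped.replace(r"\.", ".")

for doc in ["alice@example.com", "alice@examplecom"]:
    print(doc, bool(re.search(escaped, doc)), bool(re.search(unescaped, doc)))
```

The unescaped pattern wrongly accepts `alice@examplecom`, which has no dot in its domain.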
You can also use the logical operators $and and $or to combine multiple filters.
An $and operator will return results that match all the filters in the list:
await collection.query({
queryTexts: ["query1", "query2"],
whereDocument: {
$and: [{ $contains: "search_string_1" }, { $regex: "[a-z]+" }],
},
});
An $or operator will return results that match any of the filters in the list:
await collection.query({
queryTexts: ["query1", "query2"],
whereDocument: {
$or: [
{ $contains: "search_string_1" },
{ $not_contains: "search_string_2" },
],
},
});
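When the list of terms comes from user input, you can build the filter programmatically. The Python sketch below produces the same dictionary shape the TypeScript client takes as `whereDocument`. `build_where_document` is hypothetical. It returns a single filter unwrapped rather than as a one-element `$and`/`$or` list, assuming that logical operators should receive at least two filters (some chromadb versions reject shorter lists).

```python
from typing import Any, Dict, List


def build_where_document(
    terms: List[str], combine: str = "$and", negate: bool = False
) -> Dict[str, Any]:
    """Build a where_document / whereDocument filter from a list of terms."""
    if combine not in ("$and", "$or"):
        raise ValueError("combine must be '$and' or '$or'")
    if not terms:
        raise ValueError("at least one term is required")
    op = "$not_contains" if negate else "$contains"
    filters = [{op: t} for t in terms]
    # Return a lone filter as-is instead of a one-element logical list.
    return filters[0] if len(filters) == 1 else {combine: filters}


print(build_where_document(["search_string_1", "search_string_2"], "$or"))
# prints {'$or': [{'$contains': 'search_string_1'}, {'$contains': 'search_string_2'}]}
```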
Both .get and .query accept whereDocument together with a where metadata filter. A record must satisfy both to be returned:
await collection.query({
queryTexts: ["doc10", "thus spake zarathustra", ...],
nResults: 10,
where: { metadata_field: "is_equal_to_this" },
whereDocument: { $contains: "search_string" },
});
In the Rust client, get and query take an r#where argument. Passing it a Where::Document expression filters records based on their document content.
We support full-text search with the Contains and NotContains operators. We also support regular expression pattern matching with the Regex and NotRegex operators.
For example, here we get all records whose document contains a search string:
use chroma::types::{DocumentExpression, DocumentOperator, Where};
let where_clause = Where::Document(DocumentExpression {
operator: DocumentOperator::Contains,
pattern: "search string".to_string(),
});
let results = collection
.get(None, Some(where_clause), None, None, None)
.await?;
Here we get all records whose documents match the regex pattern for an email address:
let where_clause = Where::Document(DocumentExpression {
operator: DocumentOperator::Regex,
pattern: r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$".to_string(),
});
let results = collection
.get(None, Some(where_clause), None, None, None)
.await?;
You can also combine multiple filters with the logical operators And and Or by wrapping them in a CompositeExpression.
An And operator will return results that match all the filters in the list:
use chroma::types::{
BooleanOperator, CompositeExpression, DocumentExpression, DocumentOperator, Where,
};
let where_clause = Where::Composite(CompositeExpression {
operator: BooleanOperator::And,
children: vec![
Where::Document(DocumentExpression {
operator: DocumentOperator::Contains,
pattern: "search_string_1".to_string(),
}),
Where::Document(DocumentExpression {
operator: DocumentOperator::Regex,
pattern: "[a-z]+".to_string(),
}),
],
});
let results = collection
.query(vec![vec![0.1, 0.2, 0.3]], Some(10), Some(where_clause), None, None)
.await?;
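The Rust `Where` tree and the `$`-prefixed dictionaries used by the Python and TypeScript clients encode the same structure. The sketch below converts the dictionary form into nested tuples named after the Rust variants used on this page (`And`, `Or`, `Contains`, `NotContains`, `Regex`, `NotRegex`), which can help when porting a filter between clients. `to_tree` and its tuple layout are hypothetical and do not correspond to any Chroma API.

```python
from typing import Any, Dict, Tuple

DOC_OPS = {
    "$contains": "Contains",
    "$not_contains": "NotContains",
    "$regex": "Regex",
    "$not_regex": "NotRegex",
}
BOOL_OPS = {"$and": "And", "$or": "Or"}


def to_tree(where_document: Dict[str, Any]) -> Tuple[str, Any]:
    """Map a where_document dict to (operator, pattern) or (operator, [children])."""
    # Each node must hold exactly one operator key (ValueError otherwise).
    ((op, operand),) = where_document.items()
    if op in DOC_OPS:
        return (DOC_OPS[op], operand)
    if op in BOOL_OPS:
        return (BOOL_OPS[op], [to_tree(child) for child in operand])
    raise ValueError(f"unsupported operator: {op}")


print(to_tree({"$and": [{"$contains": "search_string_1"}, {"$regex": "[a-z]+"}]}))
# prints ('And', [('Contains', 'search_string_1'), ('Regex', '[a-z]+')])
```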
An Or operator will return results that match any of the filters in the list:
let where_clause = Where::Composite(CompositeExpression {
operator: BooleanOperator::Or,
children: vec![
Where::Document(DocumentExpression {
operator: DocumentOperator::Contains,
pattern: "search_string_1".to_string(),
}),
Where::Document(DocumentExpression {
operator: DocumentOperator::NotContains,
pattern: "search_string_2".to_string(),
}),
],
});
let results = collection
.query(vec![vec![0.1, 0.2, 0.3]], Some(10), Some(where_clause), None, None)
.await?;
get and query can also combine document search with metadata filtering. Place a Where::Metadata and a Where::Document expression under an And composite clause:
use chroma::types::{
BooleanOperator, CompositeExpression, DocumentExpression, DocumentOperator,
MetadataComparison, MetadataExpression, MetadataValue, PrimitiveOperator, Where,
};
let where_clause = Where::Composite(CompositeExpression {
operator: BooleanOperator::And,
children: vec![
Where::Metadata(MetadataExpression {
key: "metadata_field".to_string(),
comparison: MetadataComparison::Primitive(
PrimitiveOperator::Equal,
MetadataValue::Str("is_equal_to_this".to_string()),
),
}),
Where::Document(DocumentExpression {
operator: DocumentOperator::Contains,
pattern: "search_string".to_string(),
}),
],
});
let results = collection
.query(vec![vec![0.1, 0.2, 0.3]], Some(10), Some(where_clause), None, None)
.await?;
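The composite clause evaluates as follows: `And` requires every child to match, `Or` requires at least one, a metadata node compares one key for equality, and a document node applies its operator to the document text. The Python model below mirrors that tree. Its dataclasses are hypothetical stand-ins loosely named after the Rust types above. It models only equality metadata comparisons and assumes substring and `re.search` semantics for the document operators.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class MetadataEq:
    """Equality test on one metadata key (stand-in for Where::Metadata)."""
    key: str
    value: str


@dataclass
class DocumentFilter:
    """Stand-in for Where::Document: Contains, NotContains, Regex or NotRegex."""
    operator: str
    pattern: str


@dataclass
class Composite:
    """Stand-in for Where::Composite: And or Or over child nodes."""
    operator: str
    children: List["Node"]


Node = Union[MetadataEq, DocumentFilter, Composite]


def evaluate(node: Node, document: str, metadata: Dict[str, str]) -> bool:
    """Return True if a record with this document and metadata satisfies node."""
    if isinstance(node, MetadataEq):
        return metadata.get(node.key) == node.value
    if isinstance(node, DocumentFilter):
        if node.operator == "Contains":
            return node.pattern in document
        if node.operator == "NotContains":
            return node.pattern not in document
        if node.operator in ("Regex", "NotRegex"):
            found = re.search(node.pattern, document) is not None
            return found if node.operator == "Regex" else not found
        raise ValueError(f"unknown document operator: {node.operator}")
    results = [evaluate(child, document, metadata) for child in node.children]
    if node.operator == "And":
        return all(results)
    if node.operator == "Or":
        return any(results)
    raise ValueError(f"unknown boolean operator: {node.operator}")


clause = Composite("And", [
    MetadataEq("metadata_field", "is_equal_to_this"),
    DocumentFilter("Contains", "search_string"),
])
print(evaluate(clause, "has search_string", {"metadata_field": "is_equal_to_this"}))  # True
print(evaluate(clause, "has search_string", {"metadata_field": "other"}))  # False
```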