Collection

Collection Methods

count

Return the number of records in the collection.

add

Add records to the collection.

<ParamField path="ids" type="Union[str, IDs]" required> Record IDs to add. </ParamField>
<ParamField path="embeddings" type="Optional[Embeddings]"> Embeddings to add. If None, embeddings are computed with the collection's embedding function. </ParamField>
<ParamField path="metadatas" type="Union[Optional[Metadatas], List[Optional[Metadatas]], None]"> Optional metadata for each record. </ParamField>
<ParamField path="documents" type="Union[str, IDs, None]"> Optional documents for each record. </ParamField>
<ParamField path="images" type="Optional[Embeddings]"> Optional images for each record. </ParamField>
<ParamField path="uris" type="Union[str, IDs, None]"> Optional URIs for loading images. </ParamField>

Raises:

  • ValueError: If embeddings and documents are both missing.
  • ValueError: If embeddings and documents are both provided.
  • ValueError: If lengths of provided fields do not match.
  • ValueError: If an ID already exists.
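The four conditions above can be expressed as a small standalone check. This is a minimal sketch, not Chroma's own validation code: check_add_args and its existing_ids argument are hypothetical, and the sketch only looks at ids, embeddings, metadatas, and documents (images and uris are ignored).

```python
from typing import Optional, Sequence, Set


def check_add_args(
    ids: Sequence[str],
    existing_ids: Set[str],
    embeddings: Optional[Sequence[Sequence[float]]] = None,
    metadatas: Optional[Sequence[Optional[dict]]] = None,
    documents: Optional[Sequence[str]] = None,
) -> None:
    """Raise ValueError for the conditions documented for add().

    Hypothetical helper: it mirrors the documented rules and is not
    part of the chromadb package.
    """
    if embeddings is None and documents is None:
        raise ValueError("Provide either embeddings or documents")
    if embeddings is not None and documents is not None:
        raise ValueError("Provide embeddings or documents, not both")
    # Every provided column must have exactly one entry per ID.
    columns = (("embeddings", embeddings), ("metadatas", metadatas), ("documents", documents))
    for name, column in columns:
        if column is not None and len(column) != len(ids):
            raise ValueError(f"{name} has {len(column)} entries, expected {len(ids)}")
    # Adding an ID that is already stored is rejected.
    duplicates = set(ids) & existing_ids
    if duplicates:
        raise ValueError(f"IDs already exist: {sorted(duplicates)}")


try:
    check_add_args(["a"], existing_ids={"a"}, documents=["hello"])
except ValueError as err:
    print(err)  # IDs already exist: ['a']
```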

get

Retrieve records from the collection.

If no filters are provided, returns records up to limit starting at offset.

<ParamField path="ids" type="Union[str, IDs, None]"> If provided, only return records with these IDs. </ParamField>
<ParamField path="where" type="Optional[Dict[Union[str, Literal[$and], Literal[$or]], Where]]"> A Where filter used to filter based on metadata values. </ParamField>
<ParamField path="limit" type="Optional[int]"> Maximum number of results to return. </ParamField>
<ParamField path="offset" type="Optional[int]"> Number of results to skip before returning. </ParamField>
<ParamField path="where_document" type="Optional[Dict[Where, Union[str, List[Dict[Where, Union[str, List[WhereDocument]]]]]]]"> A WhereDocument filter used to filter based on document contents (K.DOCUMENT). </ParamField>
<ParamField path="include" type="List[Literal[documents, embeddings, metadatas, distances, uris, data]]"> Fields to include in results. Can contain "embeddings", "metadatas", "documents", "uris". Defaults to "metadatas" and "documents". </ParamField>

Returns: Retrieved records and requested fields as a GetResult object.
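To make the where, limit, and offset semantics concrete, here is a pure-Python model over an in-memory list of records. It is a hedged sketch, not how Chroma evaluates filters: matches and get are hypothetical helpers, the record layout is an assumption, and only a subset of the where syntax is modelled ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $and, $or, plus the {"field": value} equality shorthand).

```python
from typing import Any, Callable, Dict, List, Optional

# Comparison operators modelled by this sketch (a subset of the where syntax).
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "$eq": lambda a, b: a == b,
    "$ne": lambda a, b: a != b,
    "$gt": lambda a, b: a > b,
    "$gte": lambda a, b: a >= b,
    "$lt": lambda a, b: a < b,
    "$lte": lambda a, b: a <= b,
    "$in": lambda a, b: a in b,
    "$nin": lambda a, b: a not in b,
}


def matches(metadata: Dict[str, Any], where: Dict[str, Any]) -> bool:
    """Return True if a record's metadata satisfies a where filter."""
    for key, condition in where.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        elif key not in metadata:
            # In this sketch, a record without the key never matches.
            return False
        elif isinstance(condition, dict):
            # Operator form, e.g. {"rating": {"$gte": 5}}
            op, operand = next(iter(condition.items()))
            if not OPS[op](metadata[key], operand):
                return False
        elif metadata[key] != condition:
            # Shorthand form: {"field": value} means {"field": {"$eq": value}}
            return False
    return True


def get(
    records: List[Dict[str, Any]],
    where: Optional[Dict[str, Any]] = None,
    limit: Optional[int] = None,
    offset: int = 0,
) -> Dict[str, List[Any]]:
    """Filter records, skip `offset` matches, keep at most `limit`, return columns."""
    hits = [r for r in records if where is None or matches(r["metadata"], where)]
    end = None if limit is None else offset + limit
    page = hits[offset:end]
    return {
        "ids": [r["id"] for r in page],
        "documents": [r["document"] for r in page],
        "metadatas": [r["metadata"] for r in page],
    }


records = [
    {"id": "id1", "document": "apple pie", "metadata": {"kind": "dessert", "rating": 4}},
    {"id": "id2", "document": "caesar salad", "metadata": {"kind": "salad", "rating": 3}},
    {"id": "id3", "document": "lemon tart", "metadata": {"kind": "dessert", "rating": 5}},
]
result = get(records, where={"$and": [{"kind": "dessert"}, {"rating": {"$gte": 5}}]})
print(result["ids"])  # ['id3']
```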

peek

Return up to limit records from the start of the collection.

<ParamField path="limit" type="int"> Maximum number of records to return. Defaults to 10. </ParamField>

Returns: The retrieved records as a GetResult object.

query

Query the collection for the n_results nearest neighbors of each query.

This is a batch query API. Multiple queries can be performed at once by providing multiple embeddings, texts, or images.

```python
# Two 3-dimensional query embeddings searched in a single call
query_1 = [0.1, 0.2, 0.3]
query_2 = [0.4, 0.5, 0.6]
results = collection.query(
    query_embeddings=[query_1, query_2],
    n_results=10,
)
```

If query_texts, query_images, or query_uris are provided, the collection's embedding function will be used to create embeddings before querying the API.

The ids, where, where_document, and include parameters are applied to all queries.
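Conceptually, a batch query runs one nearest-neighbor search per query embedding and returns one result list per query. The brute-force sketch below models that output shape; it is an illustration under stated assumptions, not Chroma's index. batch_query and squared_l2 are hypothetical helpers, and the sketch uses squared Euclidean distance, which we assume corresponds to Chroma's default l2 space.

```python
from typing import Dict, List, Sequence


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batch_query(
    ids: List[str],
    embeddings: List[List[float]],
    query_embeddings: List[List[float]],
    n_results: int,
) -> Dict[str, List[list]]:
    """Brute-force k-nearest-neighbor search with one result list per query."""
    out: Dict[str, List[list]] = {"ids": [], "distances": []}
    for query in query_embeddings:
        # Score every stored record, then keep the n_results closest.
        scored = sorted(
            (squared_l2(query, emb), record_id)
            for record_id, emb in zip(ids, embeddings)
        )[:n_results]
        out["ids"].append([record_id for _, record_id in scored])
        out["distances"].append([dist for dist, _ in scored])
    return out


ids = ["a", "b", "c"]
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
results = batch_query(ids, embeddings, query_embeddings=[[0.9, 0.1], [0.0, 1.8]], n_results=2)
print(results["ids"])  # [['b', 'a'], ['c', 'a']]
```

The nested lists mirror the batch API: results["ids"][0] answers the first query and results["ids"][1] the second.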

<ParamField path="query_embeddings" type="Optional[Embeddings]"> Raw embeddings to query for. </ParamField>
<ParamField path="query_texts" type="Union[str, IDs, None]"> Documents to embed and query against. </ParamField>
<ParamField path="query_images" type="Optional[Embeddings]"> Images to embed and query against. </ParamField>
<ParamField path="query_uris" type="Union[str, IDs, None]"> URIs to be loaded and embedded. </ParamField>
<ParamField path="ids" type="Union[str, IDs, None]"> Optional subset of IDs to search within. </ParamField>
<ParamField path="n_results" type="int"> Number of neighbors to return per query. </ParamField>
<ParamField path="where" type="Optional[Dict[Union[str, Literal[$and], Literal[$or]], Where]]"> Metadata filter. </ParamField>
<ParamField path="where_document" type="Optional[Dict[Where, Union[str, List[Dict[Where, Union[str, List[WhereDocument]]]]]]]"> Document content filter. </ParamField>
<ParamField path="include" type="List[Literal[documents, embeddings, metadatas, distances, uris, data]]"> Fields to include in results. Can contain "embeddings", "metadatas", "documents", "uris", "distances". Defaults to "metadatas", "documents", "distances". </ParamField>

Returns: Nearest-neighbor results as a QueryResult object.

Raises:

  • ValueError: If no query input is provided.
  • ValueError: If multiple query input types are provided.

modify

Update collection name, metadata, or configuration.

<ParamField path="name" type="Optional[str]"> New collection name. </ParamField>
<ParamField path="metadata" type="Optional[Dict[str, Any]]"> New metadata for the collection. </ParamField>
<ParamField path="configuration" type="Optional[UpdateCollectionConfiguration]"> New configuration for the collection. </ParamField>

update

Update existing records by ID.

Records are provided in columnar format. If provided, the embeddings, metadatas, documents, and uris lists must have the same length as ids, and the entries at a given position in each list belong to the same record.

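```python
ids = ["id1", "id2", "id3"]
embeddings = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]
metadatas = [{"key": "value"}, {"key": "value"}, {"key": "value"}]
documents = ["document1", "document2", "document3"]
uris = ["uri1", "uri2", "uri3"]
# Use keyword arguments: images comes between documents and uris in the
# signature, so a fifth positional argument would be treated as images.
collection.update(ids=ids, embeddings=embeddings, metadatas=metadatas,
                  documents=documents, uris=uris)
```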

If embeddings are not provided, the embeddings will be computed based on documents using the collection's embedding function.
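Because the arguments are columnar, entry i of every list belongs to ids[i]. The helper below is a hypothetical illustration, not part of chromadb: it enforces the same-length rule and pivots the columns into one dict per record. In this sketch, a column passed as None is simply left out of each record's update.

```python
from typing import Any, Dict, List, Optional


def columns_to_records(ids: List[str], **columns: Optional[List[Any]]) -> List[Dict[str, Any]]:
    """Turn columnar update arguments into one dict per record (hypothetical helper)."""
    # Ignore columns that were not provided.
    provided = {name: values for name, values in columns.items() if values is not None}
    # Each provided column needs exactly one entry per ID.
    for name, values in provided.items():
        if len(values) != len(ids):
            raise ValueError(f"{name} has {len(values)} entries but there are {len(ids)} ids")
    return [
        {"id": record_id, **{name: values[i] for name, values in provided.items()}}
        for i, record_id in enumerate(ids)
    ]


updates = columns_to_records(
    ["id1", "id2"],
    documents=["new text 1", "new text 2"],
    metadatas=[{"key": "a"}, {"key": "b"}],
    embeddings=None,
)
print(updates[0])  # {'id': 'id1', 'documents': 'new text 1', 'metadatas': {'key': 'a'}}
```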

<ParamField path="ids" type="Union[str, IDs]" required> Record IDs to update. </ParamField>
<ParamField path="embeddings" type="Optional[Embeddings]"> Updated embeddings. If None, embeddings are computed. </ParamField>
<ParamField path="metadatas" type="Union[Optional[Metadatas], List[Optional[Metadatas]], None]"> Updated metadata. </ParamField>
<ParamField path="documents" type="Union[str, IDs, None]"> Updated documents. </ParamField>
<ParamField path="images" type="Optional[Embeddings]"> Updated images. </ParamField>
<ParamField path="uris" type="Union[str, IDs, None]"> Updated URIs for loading images. </ParamField>

upsert

Create or update records by ID. Records whose IDs already exist are updated; records with new IDs are added.

<ParamField path="ids" type="Union[str, IDs]" required> Record IDs to upsert. </ParamField>
<ParamField path="embeddings" type="Optional[Embeddings]"> Embeddings to add or update. If None, embeddings are computed. </ParamField>
<ParamField path="metadatas" type="Union[Optional[Metadatas], List[Optional[Metadatas]], None]"> Metadata to add or update. </ParamField>
<ParamField path="documents" type="Union[str, IDs, None]"> Documents to add or update. </ParamField>
<ParamField path="images" type="Optional[Embeddings]"> Images to add or update. </ParamField>
<ParamField path="uris" type="Union[str, IDs, None]"> URIs for loading images. </ParamField>
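The create-or-update rule can be modelled with a plain dict keyed by ID. This is a hedged sketch, not Chroma's storage: upsert here is a hypothetical function, and the choice that fields not passed keep their stored values is an assumption of the sketch (Chroma's own handling of partial metadata may differ).

```python
from typing import Any, Dict, List, Optional


def upsert(
    store: Dict[str, Dict[str, Any]],
    ids: List[str],
    documents: Optional[List[str]] = None,
    metadatas: Optional[List[Dict[str, Any]]] = None,
) -> None:
    """Insert new IDs and overwrite the provided fields of existing IDs, in place."""
    for i, record_id in enumerate(ids):
        # setdefault creates an empty record for a new ID, or returns the existing one.
        record = store.setdefault(record_id, {})
        # Assumption: only fields that were passed are overwritten.
        if documents is not None:
            record["document"] = documents[i]
        if metadatas is not None:
            record["metadata"] = metadatas[i]


store = {"id1": {"document": "old", "metadata": {"v": 1}}}
upsert(store, ["id1", "id2"], documents=["updated", "brand new"])
print(store)
# {'id1': {'document': 'updated', 'metadata': {'v': 1}}, 'id2': {'document': 'brand new'}}
```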

delete

Delete records by ID or filters.

All records that match the ids or the where and where_document filters are deleted.

<ParamField path="ids" type="Optional[IDs]"> Record IDs to delete. </ParamField>
<ParamField path="where" type="Optional[Dict[Union[str, Literal[$and], Literal[$or]], Where]]"> Metadata filter. </ParamField>
<ParamField path="where_document" type="Optional[Dict[Where, Union[str, List[Dict[Where, Union[str, List[WhereDocument]]]]]]]"> Document content filter. </ParamField>

Raises:

  • ValueError: If no IDs or filters are provided.
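The raise-if-empty rule and deletion by IDs or by a metadata filter can be sketched over an in-memory dict. This is a hypothetical model, not Chroma's implementation: it accepts only one selector per call and only equality filters of the form {"key": value}, so it deliberately sidesteps how ids and filters combine when both are given.

```python
from typing import Any, Dict, List, Optional


def delete(
    store: Dict[str, Dict[str, Any]],
    ids: Optional[List[str]] = None,
    where: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Delete by IDs or by an equality-only metadata filter; return the deleted IDs."""
    if ids is None and where is None:
        # Mirrors the documented ValueError when nothing selects records.
        raise ValueError("Provide ids or a where filter")
    if ids is not None and where is not None:
        raise NotImplementedError("this sketch handles one selector per call")
    if ids is not None:
        # Unknown IDs are skipped in this sketch.
        targets = [record_id for record_id in ids if record_id in store]
    else:
        targets = [
            record_id
            for record_id, record in store.items()
            if all(record.get("metadata", {}).get(k) == v for k, v in where.items())
        ]
    for record_id in targets:
        del store[record_id]
    return targets


store = {
    "id1": {"metadata": {"source": "web"}},
    "id2": {"metadata": {"source": "pdf"}},
    "id3": {"metadata": {"source": "web"}},
}
print(delete(store, where={"source": "web"}))  # ['id1', 'id3']
print(list(store))  # ['id2']
```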

Types

GetResult

Result payload for collection.get() operations.

The returned records are in columnar form: entries at the same position in each list belong to the same record.

```python
results = collection.get(ids=["id1", "id2", "id3"])
# Each list is one column; zip rebuilds one tuple per record.
records = zip(results["ids"], results["documents"], results["metadatas"])
for record_id, document, metadata in records:
    print(record_id, document, metadata)
```

A GetResult contains ids plus only the fields named in the include parameter of the get() call.
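When code must work with any include setting, it can pivot only the fields that are present. This is a hedged sketch: to_records is a hypothetical helper, the sample payload is hand-written rather than produced by Chroma, and it assumes fields that were not requested are None or absent.

```python
from typing import Any, Dict, List

# Columnar fields of a GetResult that hold one entry per record.
COLUMNS = ("embeddings", "documents", "uris", "data", "metadatas")


def to_records(result: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pivot a columnar get() result into one dict per record.

    Fields that are missing or None (not requested via include) are skipped.
    `is not None` is used instead of truthiness because embeddings may be
    returned as a NumPy array.
    """
    present = [name for name in COLUMNS if result.get(name) is not None]
    return [
        {"id": record_id, **{name: result[name][i] for name in present}}
        for i, record_id in enumerate(result["ids"])
    ]


# Hand-written payload shaped like get(ids=["id1", "id2"], include=["documents"]).
sample = {
    "ids": ["id1", "id2"],
    "documents": ["first doc", "second doc"],
    "metadatas": None,
    "embeddings": None,
    "included": ["documents"],
}
print(to_records(sample))
# [{'id': 'id1', 'documents': 'first doc'}, {'id': 'id2', 'documents': 'second doc'}]
```

The included key is deliberately not in COLUMNS: it lists field names rather than one value per record.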

<span class="text-sm">Properties</span>

<ParamField path="ids" type="IDs" />
<ParamField path="embeddings" type="Optional[Embeddings]" />
<ParamField path="documents" type="Optional[IDs]" />
<ParamField path="uris" type="Optional[IDs]" />
<ParamField path="data" type="Optional[Optional[Embeddings]]" />
<ParamField path="metadatas" type="Optional[List[Optional[Metadatas]]]" />
<ParamField path="included" type="List[Literal[documents, embeddings, metadatas, distances, uris, data]]" />

QueryResult

Result payload for collection.query() operations.

The returned records are grouped into batches, one per query, and each field holds one list per batch.

```python
query_1 = [0.1, 0.2, 0.3]
query_2 = [0.4, 0.5, 0.6]
results = collection.query(query_embeddings=[query_1, query_2])
batches = zip(results["ids"], results["documents"], results["metadatas"])
```

Each batch holds the ids, documents, and metadatas lists for one query, in columnar form.

```python
for ids, documents, metadatas in batches:
    records = zip(ids, documents, metadatas)
    for record_id, document, metadata in records:
        print(record_id, document, metadata)
```

A QueryResult contains ids plus only the fields named in the include parameter of the query() call.
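Since every QueryResult field is nested one level deeper than in a GetResult (one inner list per query), turning it into records takes a loop per query. This is a hedged sketch: per_query_records is a hypothetical helper, and the sample payload is hand-written to resemble two queries with n_results=2 and include=["documents", "distances"], assuming unrequested fields are None.

```python
from typing import Any, Dict, List

# Fields of a QueryResult that hold one inner list per query.
COLUMNS = ("embeddings", "documents", "uris", "data", "metadatas", "distances")


def per_query_records(result: Dict[str, Any]) -> List[List[Dict[str, Any]]]:
    """Return one list of record dicts per query, keeping only present fields."""
    present = [name for name in COLUMNS if result.get(name) is not None]
    batches = []
    for q, batch_ids in enumerate(result["ids"]):
        # result[name][q] is the column for query q; [i] picks its i-th hit.
        batches.append([
            {"id": record_id, **{name: result[name][q][i] for name in present}}
            for i, record_id in enumerate(batch_ids)
        ])
    return batches


sample = {
    "ids": [["a", "b"], ["c", "a"]],
    "documents": [["doc a", "doc b"], ["doc c", "doc a"]],
    "distances": [[0.02, 0.82], [0.04, 3.24]],
    "metadatas": None,
    "included": ["documents", "distances"],
}
for q, records in enumerate(per_query_records(sample)):
    print(q, [(r["id"], r["distances"]) for r in records])
# 0 [('a', 0.02), ('b', 0.82)]
# 1 [('c', 0.04), ('a', 3.24)]
```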

<span class="text-sm">Properties</span>

<ParamField path="ids" type="List[IDs]" />
<ParamField path="embeddings" type="Optional[Embeddings]" />
<ParamField path="documents" type="Optional[List[IDs]]" />
<ParamField path="uris" type="Optional[List[IDs]]" />
<ParamField path="data" type="Optional[List[Optional[Embeddings]]]" />
<ParamField path="metadatas" type="Optional[List[List[Optional[Metadatas]]]]" />
<ParamField path="distances" type="Optional[List[List[float]]]" />
<ParamField path="included" type="List[Literal[documents, embeddings, metadatas, distances, uris, data]]" />