Manage Collections

import { Danger } from '/snippets/callout.mdx';

Chroma lets you manage collections of embeddings using the collection primitive. Collections are the fundamental unit of storage and querying in Chroma.

Creating Collections

Chroma collections are created with a name. Collection names are used in the URL, so there are a few restrictions on them:

  • The length of the name must be between 3 and 512 characters.
  • The name must start and end with a lowercase letter or a digit, and it can contain dots, dashes, and underscores in between.
  • The name must not contain two consecutive dots.
  • The name must not be a valid IP address.
<CodeGroup>
```python Python
collection = client.create_collection(name="my_collection")
```

```typescript TypeScript
const collection = await client.createCollection({
  name: "my_collection",
});
```

```rust Rust
let collection = client
    .create_collection("my_collection", None, None)
    .await?;
```
</CodeGroup>

Note that collection names must be unique within a Chroma database. If you try to create a collection with the name of an existing collection, the call fails with an exception.
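The naming rules listed above can be checked locally before sending a request. The following is a minimal sketch, not Chroma's own validator: the helper name `is_valid_collection_name` is hypothetical, and it conservatively assumes that interior characters are limited to lowercase letters, digits, dots, dashes, and underscores. The server remains the source of truth.

```python
import ipaddress
import re

# Assumption: interior characters are lowercase letters, digits, '.', '-', '_'.
# The first and last characters must be a lowercase letter or a digit.
_NAME_PATTERN = re.compile(r"[a-z0-9][a-z0-9._-]*[a-z0-9]")


def is_valid_collection_name(name: str) -> bool:
    """Hypothetical local check mirroring the documented naming rules."""
    # Rule 1: length between 3 and 512 characters.
    if not 3 <= len(name) <= 512:
        return False
    # Rule 2: allowed characters, with restricted first and last characters.
    if not _NAME_PATTERN.fullmatch(name):
        return False
    # Rule 3: no two consecutive dots.
    if ".." in name:
        return False
    # Rule 4: the name must not parse as an IP address.
    try:
        ipaddress.ip_address(name)
    except ValueError:
        return True
    return False
```

A check like this only catches malformed names early. It cannot detect a name that is already taken, so the create call can still fail on uniqueness.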

Embedding Functions

When you add documents to a collection, Chroma will embed them for you by using the collection's embedding function. By default, Chroma uses the sentence transformer embedding function.

Chroma also offers various embedding functions, which you can provide when creating a collection. For example, you can create a collection using the OpenAIEmbeddingFunction:

<Tabs> <Tab title="Python" icon="python">

Install the openai package:

<CodeGroup>
```bash pip
pip install openai
```

```bash poetry
poetry add openai
```

```bash uv
uv pip install openai
```
</CodeGroup>

Create your collection with the OpenAIEmbeddingFunction:

```python
import os
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction

collection = client.create_collection(
    name="my_collection",
    embedding_function=OpenAIEmbeddingFunction(
        api_key=os.getenv("OPENAI_API_KEY"),
        model_name="text-embedding-3-small"
    )
)
```

Instead of having Chroma embed documents, you can provide embeddings yourself when adding data to a collection. In this case, the collection has no embedding function set, and you are responsible for supplying embeddings both when adding data and when querying.

```python
collection = client.create_collection(
    name="my_collection",
    embedding_function=None
)
```
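When a collection has no embedding function, every write and every query has to carry its own vectors. The sketch below illustrates that flow under stated assumptions: `toy_embed` is a hypothetical stand-in for a real embedding model, and the two helper functions are illustrative wrappers rather than part of Chroma. They rely only on the `ids`, `embeddings`, and `documents` arguments of `collection.add` and the `query_embeddings` and `n_results` arguments of `collection.query`.

```python
from typing import Any, Callable, List, Sequence


def toy_embed(text: str, dim: int = 4) -> List[float]:
    """Hypothetical stand-in for a real embedding model (deterministic, not semantic)."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def add_with_embeddings(
    collection: Any,
    ids: Sequence[str],
    documents: Sequence[str],
    embed: Callable[[str], List[float]] = toy_embed,
) -> None:
    """Embed each document locally, then store the ids, vectors, and text together."""
    embeddings = [embed(doc) for doc in documents]
    collection.add(ids=list(ids), embeddings=embeddings, documents=list(documents))


def query_with_embedding(
    collection: Any,
    text: str,
    n_results: int = 3,
    embed: Callable[[str], List[float]] = toy_embed,
) -> Any:
    """Embed the query text with the same function used for the stored documents."""
    return collection.query(query_embeddings=[embed(text)], n_results=n_results)
```

The important design point is that the same embedding function is used for both adding and querying. Vectors produced by different models are generally not comparable.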
</Tab> <Tab title="TypeScript" icon="js">

Install the @chroma-core/openai package to get access to the OpenAIEmbeddingFunction:

<CodeGroup>
```bash npm
npm install @chroma-core/openai
```

```bash pnpm
pnpm add @chroma-core/openai
```

```bash bun
bun add @chroma-core/openai
```

```bash yarn
yarn add @chroma-core/openai
```
</CodeGroup>

Create your collection with the OpenAIEmbeddingFunction:

```typescript
import { OpenAIEmbeddingFunction } from "@chroma-core/openai";

const collection = await client.createCollection({
  name: "my_collection",
  embeddingFunction: new OpenAIEmbeddingFunction({
    apiKey: process.env.OPENAI_API_KEY,
    modelName: "text-embedding-3-small",
  }),
});
```

Instead of having Chroma embed documents, you can provide embeddings yourself when adding data to a collection. In this case, the collection has no embedding function set, and you are responsible for supplying embeddings both when adding data and when querying.

```typescript
const collection = await client.createCollection({
  name: "my_collection",
  embeddingFunction: null,
});
```
</Tab>
<Tab title="Rust" icon="rust">

The Rust client expects embeddings to be provided directly when using `add`, `get`, `search` and other functions. Use your provider SDK to generate embeddings, then pass them to Chroma.

```rust
collection.add(
    vec!["id1".to_string(), "id2".to_string(), "id3".to_string()],
    vec![
        vec![1.1, 2.3, 3.2],
        vec![4.5, 6.9, 4.4],
        vec![1.1, 2.3, 3.2],
    ],
    Some(vec![
        Some("lorem ipsum...".to_string()),
        Some("doc2".to_string()),
        Some("doc3".to_string()),
    ]),
    None,
    None,
).await?;
```
</Tab> </Tabs>

Collection Metadata

When creating collections, you can pass the optional metadata argument to add a mapping of metadata key-value pairs to your collections. This can be useful for adding general information about the collection like creation time, description of the data stored in the collection, and more.

<CodeGroup>
```python Python
from datetime import datetime

collection = client.create_collection(
    name="my_collection",
    embedding_function=emb_fn,
    metadata={
        "description": "my first Chroma collection",
        "created": str(datetime.now())
    }
)
```

```typescript TypeScript
let collection = await client.createCollection({
  name: "my_collection",
  embeddingFunction: emb_fn,
  metadata: {
    description: "my first Chroma collection",
    created: new Date().toString(),
  },
});
```

```rust Rust
use chroma::types::Metadata;

let mut metadata = Metadata::new();
metadata.insert("description".to_string(), "my first Chroma collection".into());
metadata.insert("created".to_string(), "2024-01-01T00:00:00Z".into());

let collection = client
    .create_collection("my_collection", None, Some(metadata))
    .await?;
```
</CodeGroup>

Getting Collections

<Tabs> <Tab title="Python" icon="python">

There are several ways to get a collection after it was created.

The get_collection function will get a collection from Chroma by name. It returns a Collection object with name, metadata, configuration, and embedding_function.

```python
collection = client.get_collection(name="my-collection")
```

The get_or_create_collection function behaves similarly, but will create the collection if it doesn't exist. You can pass to it the same arguments create_collection expects, and the client will ignore them if the collection already exists.

```python
collection = client.get_or_create_collection(
    name="my-collection",
    metadata={"description": "..."}
)
```

The list_collections function returns the collections you have in your Chroma database. The collections will be ordered by creation time from oldest to newest.

```python
collections = client.list_collections()
```

By default, list_collections returns up to 100 collections. If you have more than 100 collections, or need to get only a subset of your collections, you can use the limit and offset arguments:

```python
first_collections_batch = client.list_collections(limit=100) # get the first 100 collections
second_collections_batch = client.list_collections(limit=100, offset=100) # get the next 100 collections
collections_subset = client.list_collections(limit=20, offset=50) # get 20 collections starting from the 50th
```
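To walk through every collection rather than a fixed batch, the `limit` and `offset` arguments can be combined in a loop. The helper below is a minimal sketch: `iter_all_collections` is a hypothetical name, not a Chroma API, and it assumes only that `client.list_collections(limit=..., offset=...)` returns a sequence that is shorter than `limit` (or empty) once the last page is reached.

```python
from typing import Any, Iterator


def iter_all_collections(client: Any, page_size: int = 100) -> Iterator[Any]:
    """Yield every collection by requesting successive pages of `page_size`."""
    offset = 0
    while True:
        page = client.list_collections(limit=page_size, offset=offset)
        if not page:
            return  # no more results
        yield from page
        if len(page) < page_size:
            return  # a short page means this was the last one
        offset += len(page)
```

With a real client, `list(iter_all_collections(client))` would gather all collections. Because results are ordered by creation time, collections created while the loop is running appear at the end rather than shifting earlier pages.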

Current versions of Chroma store the embedding function you used to create a collection on the server, so the client can resolve it for you on subsequent "get" operations. If you are running an older version of the Chroma client or server (earlier than 1.1.13), you will need to provide the same embedding function you used to create a collection when using get_collection:

```python
collection = client.get_collection(
    name='my-collection',
    embedding_function=ef
)
```
</Tab> <Tab title="TypeScript" icon="js">

There are several ways to get a collection after it was created.

The getCollection function will get a collection from Chroma by name. It returns a collection object with name, metadata, configuration, and embeddingFunction. If you did not provide an embedding function to createCollection, you can provide it to getCollection.

```typescript
const collection = await client.getCollection({ name: "my-collection" });
```

The getOrCreateCollection function behaves similarly, but will create the collection if it doesn't exist. You can pass to it the same arguments createCollection expects, and the client will ignore them if the collection already exists.

```typescript
const collection = await client.getOrCreateCollection({
  name: "my-collection",
  metadata: { description: "..." },
});
```

If you need to get multiple collections at once, you can use getCollections():

```typescript
const [col1, col2] = await client.getCollections(["col1", "col2"]);
```

The listCollections function returns all the collections you have in your Chroma database. The collections will be ordered by creation time from oldest to newest.

```typescript
const collections = await client.listCollections();
```

By default, listCollections returns up to 100 collections. If you have more than 100 collections, or need to get only a subset of your collections, you can use the limit and offset arguments:

```typescript
const firstCollectionsBatch = await client.listCollections({ limit: 100 }); // get the first 100 collections
const secondCollectionsBatch = await client.listCollections({
  limit: 100,
  offset: 100,
}); // get the next 100 collections
const collectionsSubset = await client.listCollections({
  limit: 20,
  offset: 50,
}); // get 20 collections starting from the 50th
```

Current versions of Chroma store the embedding function you used to create a collection on the server, so the client can resolve it for you on subsequent "get" operations. If you are running an older version of the Chroma JS/TS client (earlier than 3.04) or server (earlier than 1.1.13), you will need to provide the same embedding function you used to create a collection when using getCollection and getCollections:

```typescript
const collection = await client.getCollection({
  name: "my-collection",
  embeddingFunction: ef,
});

const [col1, col2] = await client.getCollections([
  { name: "col1", embeddingFunction: openaiEF },
  { name: "col2", embeddingFunction: defaultEF },
]);
```
</Tab>
<Tab title="Rust" icon="rust">

Use the client to get collections or list them with pagination.

```rust
let collection = client.get_collection("my-collection").await?;

let collection = client
    .get_or_create_collection("my-collection", None, None)
    .await?;

let collections = client.list_collections(100, Some(0)).await?;
```
</Tab> </Tabs>

Modifying Collections

After a collection is created, you can modify its name, its metadata, and elements of its index configuration with the modify method:

<CodeGroup>
```python Python
collection.modify(
    name="new-name",
    metadata={"description": "new description"}
)
```

```typescript TypeScript
await collection.modify({
  name: "new-name",
  metadata: { description: "new description" },
});
```
</CodeGroup>

Deleting Collections

You can delete a collection by name. This action deletes the collection along with all of its embeddings, documents, and record metadata.

<Danger>Deleting collections is destructive and not reversible</Danger>

<CodeGroup>
```python Python
client.delete_collection(name="my-collection")
```

```typescript TypeScript
await client.deleteCollection({ name: "my-collection" });
```
</CodeGroup>

Convenience Methods

Collections also offer a few useful convenience methods:

  • count - returns the number of records in the collection.
  • peek - returns the first 10 records in the collection.

<CodeGroup>
```python Python
collection.count()
collection.peek()
```

```typescript TypeScript
await collection.count();
await collection.peek();
```
</CodeGroup>