docs/mintlify/docs/collections/manage-collections.mdx
import { Danger } from '/snippets/callout.mdx';
Chroma lets you manage collections of embeddings, using the collection primitive. Collections are the fundamental unit of storage and querying in Chroma.
Chroma collections are created with a name. Collection names are used in the URL, so there are a few restrictions on them.
const collection = await client.createCollection({
name: "my_collection",
});
let collection = client
.create_collection("my_collection", None, None)
.await?;
Note that collection names must be unique inside a Chroma database. If you try to create a collection with the name of an existing collection, you will see an exception.
When you add documents to a collection, Chroma embeds them for you using the collection's embedding function. By default, Chroma uses a sentence transformer embedding function.
Chroma also offers various embedding functions, which you can provide when creating a collection. For example, you can create a collection using the OpenAIEmbeddingFunction:
Install the openai package:
poetry add openai
uv pip install openai
Create your collection with the OpenAIEmbeddingFunction:
import os
from chromadb.utils.embedding_functions import OpenAIEmbeddingFunction
collection = client.create_collection(
name="my_collection",
embedding_function=OpenAIEmbeddingFunction(
api_key=os.getenv("OPENAI_API_KEY"),
model_name="text-embedding-3-small"
)
)
Instead of having Chroma embed documents, you can provide embeddings directly when adding data to a collection. In this case, the collection has no embedding function set, and you are responsible for supplying embeddings both when adding data and when querying.
collection = client.create_collection(
name="my_collection",
embedding_function=None
)
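As a minimal sketch of what "providing embeddings directly" involves, the helper below adds precomputed vectors and then queries with a precomputed vector. It assumes `collection` is a Chroma collection created with `embedding_function=None`; the helper name `add_and_query_with_embeddings` and its parameters are hypothetical, and only the `add` and `query` calls belong to the collection itself.

```python
from typing import Any, Sequence


def add_and_query_with_embeddings(
    collection: Any,
    ids: Sequence[str],
    embeddings: Sequence[Sequence[float]],
    documents: Sequence[str],
    query_embedding: Sequence[float],
    n_results: int = 2,
) -> Any:
    """Add precomputed vectors, then query with a precomputed vector.

    Assumption: `collection` was created with embedding_function=None,
    so Chroma does not embed any text on its own.
    """
    if not (len(ids) == len(embeddings) == len(documents)):
        raise ValueError("ids, embeddings, and documents must have equal length")

    # Supply the vectors explicitly; the documents are stored alongside them.
    collection.add(
        ids=list(ids),
        embeddings=[list(vector) for vector in embeddings],
        documents=list(documents),
    )

    # The query also has to be a vector, since no function can embed query text.
    return collection.query(
        query_embeddings=[list(query_embedding)],
        n_results=n_results,
    )
```

The length check is a design choice of this sketch rather than something the text above requires; it simply fails early when the inputs are misaligned.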
Install the @chroma-core/openai package to get access to the OpenAIEmbeddingFunction:
pnpm add @chroma-core/openai
bun add @chroma-core/openai
yarn add @chroma-core/openai
Create your collection with the OpenAIEmbeddingFunction:
import { OpenAIEmbeddingFunction } from "@chroma-core/openai";
const collection = await client.createCollection({
name: "my_collection",
embeddingFunction: new OpenAIEmbeddingFunction({
apiKey: process.env.OPENAI_API_KEY,
modelName: "text-embedding-3-small",
}),
});
Instead of having Chroma embed documents, you can provide embeddings directly when adding data to a collection. In this case, the collection has no embedding function set, and you are responsible for supplying embeddings both when adding data and when querying.
const collection = await client.createCollection({
name: "my_collection",
embeddingFunction: null,
});
collection.add(
vec!["id1".to_string(), "id2".to_string(), "id3".to_string()],
vec![
vec![1.1, 2.3, 3.2],
vec![4.5, 6.9, 4.4],
vec![1.1, 2.3, 3.2],
],
Some(vec![
Some("lorem ipsum...".to_string()),
Some("doc2".to_string()),
Some("doc3".to_string()),
]),
None,
None,
).await?;
When creating collections, you can pass the optional metadata argument to add a mapping of metadata key-value pairs to your collections. This can be useful for adding general information about the collection like creation time, description of the data stored in the collection, and more.
from datetime import datetime

collection = client.create_collection(
    name="my_collection",
    embedding_function=emb_fn,
    metadata={
        "description": "my first Chroma collection",
        "created": str(datetime.now())
    }
)
```typescript TypeScript
let collection = await client.createCollection({
name: "my_collection",
embeddingFunction: emb_fn,
metadata: {
description: "my first Chroma collection",
created: new Date().toString(),
},
});
```
use chroma::types::Metadata;
let mut metadata = Metadata::new();
metadata.insert("description".to_string(), "my first Chroma collection".into());
metadata.insert("created".to_string(), "2024-01-01T00:00:00Z".into());
let collection = client
.create_collection("my_collection", None, Some(metadata))
.await?;
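The metadata mapping can also be assembled separately before it is passed to `create_collection`. The following is a minimal sketch in Python; `build_collection_metadata` is a hypothetical helper, and the restriction to string, number, and boolean values is an assumption of this sketch, not something stated above.

```python
from datetime import datetime, timezone
from typing import Dict, Union

# Assumption: metadata values are limited to simple scalar types.
MetadataValue = Union[str, int, float, bool]


def build_collection_metadata(
    description: str, **extra: MetadataValue
) -> Dict[str, MetadataValue]:
    """Return a metadata mapping with a description and a UTC creation time."""
    metadata: Dict[str, MetadataValue] = {
        "description": description,
        # An ISO 8601 timestamp in UTC is unambiguous and sorts lexicographically.
        "created": datetime.now(timezone.utc).isoformat(),
    }
    for key, value in extra.items():
        if not isinstance(value, (str, int, float, bool)):
            raise TypeError(f"unsupported metadata value for {key!r}: {value!r}")
        metadata[key] = value
    return metadata
```

The result would be passed as the `metadata` argument, for example `client.create_collection(name="my_collection", metadata=build_collection_metadata("my first Chroma collection"))`.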
There are several ways to get a collection after it has been created.
The get_collection function will get a collection from Chroma by name. It returns a Collection object with name, metadata, configuration, and embedding_function.
collection = client.get_collection(name="my-collection")
The get_or_create_collection function behaves similarly, but will create the collection if it doesn't exist. You can pass it the same arguments that create_collection expects; the client ignores them if the collection already exists.
collection = client.get_or_create_collection(
name="my-collection",
metadata={"description": "..."}
)
The list_collections function returns the collections you have in your Chroma database. The collections will be ordered by creation time from oldest to newest.
collections = client.list_collections()
By default, list_collections returns up to 100 collections. If you have more than 100 collections, or need to get only a subset of your collections, you can use the limit and offset arguments:
first_collections_batch = client.list_collections(limit=100) # get the first 100 collections
second_collections_batch = client.list_collections(limit=100, offset=100) # get the next 100 collections
collections_subset = client.list_collections(limit=20, offset=50) # skip the first 50 collections, then get the next 20
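When the number of collections is not known in advance, the `limit` and `offset` arguments can be combined into a loop. The sketch below is one way to do that; `iter_collections` is a hypothetical helper, and it assumes only that `client.list_collections(limit=..., offset=...)` returns a sequence in a stable order, as described above.

```python
from typing import Any, Iterator


def iter_collections(client: Any, page_size: int = 100) -> Iterator[Any]:
    """Yield every collection by requesting one page at a time."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")

    offset = 0
    while True:
        page = list(client.list_collections(limit=page_size, offset=offset))
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < page_size:
            break
        offset += page_size
```

Because it is a generator, callers can stop early without fetching the remaining pages, for example `for collection in iter_collections(client): ...`.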
Current versions of Chroma store the embedding function you used to create a collection on the server, so the client can resolve it for you on subsequent "get" operations. If you are running an older version of the Chroma client or server (earlier than 1.1.13), you will need to provide the same embedding function you used to create a collection when using get_collection:
collection = client.get_collection(
name='my-collection',
embedding_function=ef
)
There are several ways to get a collection after it has been created.
The getCollection function will get a collection from Chroma by name. It returns a collection object with name, metadata, configuration, and embeddingFunction. If you did not provide an embedding function to createCollection, you can provide it to getCollection.
const collection = await client.getCollection({ name: "my-collection" });
The getOrCreateCollection function behaves similarly, but will create the collection if it doesn't exist. You can pass it the same arguments that createCollection expects; the client ignores them if the collection already exists.
const collection = await client.getOrCreateCollection({
name: "my-collection",
metadata: { description: "..." },
});
If you need to get multiple collections at once, you can use getCollections():
const [col1, col2] = await client.getCollections(["col1", "col2"]);
The listCollections function returns all the collections you have in your Chroma database. The collections will be ordered by creation time from oldest to newest.
const collections = await client.listCollections();
By default, listCollections returns up to 100 collections. If you have more than 100 collections, or need to get only a subset of your collections, you can use the limit and offset arguments:
const firstCollectionsBatch = await client.listCollections({ limit: 100 }); // get the first 100 collections
const secondCollectionsBatch = await client.listCollections({
limit: 100,
offset: 100,
}); // get the next 100 collections
const collectionsSubset = await client.listCollections({
limit: 20,
offset: 50,
}); // skip the first 50 collections, then get the next 20
Current versions of Chroma store the embedding function you used to create a collection on the server, so the client can resolve it for you on subsequent "get" operations. If you are running an older version of the Chroma JS/TS client (earlier than 3.04) or server (earlier than 1.1.13), you will need to provide the same embedding function you used to create a collection when using getCollection and getCollections:
const collection = await client.getCollection({
name: "my-collection",
embeddingFunction: ef,
});
const [col1, col2] = await client.getCollections([
{ name: "col1", embeddingFunction: openaiEF },
{ name: "col2", embeddingFunction: defaultEF },
]);
let collection = client.get_collection("my-collection").await?;
let collection = client
.get_or_create_collection("my-collection", None, None)
.await?;
let collections = client.list_collections(100, Some(0)).await?;
After a collection is created, you can modify its name, metadata and elements of its index configuration with the modify method:
await collection.modify({
name: "new-name",
metadata: { description: "new description" },
});
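In Python, the equivalent call is `collection.modify(name=..., metadata=...)`. The sketch below wraps it in a hypothetical helper, `update_collection_description`. Since the text above does not say whether `modify` merges or replaces existing metadata, the helper sends the full merged mapping, which gives the same result under either behavior.

```python
from typing import Any, Dict, Optional


def update_collection_description(
    collection: Any, description: str, new_name: Optional[str] = None
) -> Dict[str, Any]:
    """Update the description, optionally rename, and return the metadata sent."""
    # Start from the current metadata so existing keys are not lost.
    merged: Dict[str, Any] = dict(collection.metadata or {})
    merged["description"] = description

    changes: Dict[str, Any] = {"metadata": merged}
    if new_name is not None:
        changes["name"] = new_name

    collection.modify(**changes)
    return merged
```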
You can delete a collection by name. This action deletes the collection along with all of its embeddings, documents, and record metadata.
<Danger>
Deleting collections is destructive and not reversible.
</Danger>

<CodeGroup>

```python Python
client.delete_collection(name="my-collection")
```

```typescript TypeScript
await client.deleteCollection({ name: "my-collection" });
```

</CodeGroup>
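Because deletion cannot be undone, scripts may want an explicit safeguard in front of it. The following is a minimal sketch of one such guard, not a Chroma feature: `delete_collection_confirmed` and its `confirm_name` parameter are hypothetical, and only `client.delete_collection(name=...)` is the client call.

```python
from typing import Any


def delete_collection_confirmed(client: Any, name: str, confirm_name: str) -> None:
    """Delete a collection only when the caller repeats its exact name."""
    if confirm_name != name:
        raise ValueError(
            f"confirmation {confirm_name!r} does not match collection {name!r}"
        )
    # Irreversible: removes the collection and everything stored in it.
    client.delete_collection(name=name)
```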
Collections also offer a few useful convenience methods:
- count: returns the number of records in the collection.
- peek: returns the first 10 records in the collection.

await collection.count();
await collection.peek();
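The Python client exposes the same two methods as `collection.count()` and `collection.peek()`. As a small sketch, the hypothetical helper below combines them into a quick summary; it assumes that `peek()` returns a mapping with an `"ids"` key.

```python
from typing import Any, Dict


def summarize_collection(collection: Any) -> Dict[str, Any]:
    """Return the record count and the IDs of the first few records."""
    total = collection.count()
    sample = collection.peek()  # assumed to be a mapping containing "ids"
    return {"count": total, "sample_ids": list(sample["ids"])}
```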