docs/components/vectordbs/dbs/cassandra.mdx
Apache Cassandra is a highly scalable, distributed NoSQL database designed for handling large amounts of data across many commodity servers with no single point of failure. It supports vector storage for semantic search capabilities in AI applications and can scale to massive datasets with linear performance improvements.
import os
from mem0 import Memory
os.environ["OPENAI_API_KEY"] = "sk-xx"
config = {
"vector_store": {
"provider": "cassandra",
"config": {
"contact_points": ["127.0.0.1"],
"port": 9042,
"username": "cassandra",
"password": "cassandra",
"keyspace": "mem0",
"collection_name": "memories",
}
}
}
m = Memory.from_config(config)
messages = [
{"role": "user", "content": "I'm planning to watch a movie tonight. Any recommendations?"},
{"role": "assistant", "content": "How about thriller movies? They can be quite engaging."},
{"role": "user", "content": "I'm not a big fan of thriller movies but I love sci-fi movies."},
{"role": "assistant", "content": "Got it! I'll avoid thriller recommendations and suggest sci-fi movies in the future."}
]
m.add(messages, user_id="alice", metadata={"category": "movies"})
For managed Cassandra with DataStax Astra DB:
config = {
"vector_store": {
"provider": "cassandra",
"config": {
"contact_points": ["dummy"], # Not used with secure connect bundle
"username": "token",
"password": "AstraCS:...", # Your Astra DB application token
"keyspace": "mem0",
"collection_name": "memories",
"secure_connect_bundle": "/path/to/secure-connect-bundle.zip"
}
}
}
Here are the parameters available for configuring Apache Cassandra:
| Parameter | Description | Default Value |
|---|---|---|
contact_points | List of contact point IP addresses | Required |
port | Cassandra port | 9042 |
username | Database username | None |
password | Database password | None |
keyspace | Keyspace name | "mem0" |
collection_name | Table name for storing vectors | "memories" |
embedding_model_dims | Dimensions of embedding vectors | 1536 |
secure_connect_bundle | Path to Astra DB secure connect bundle | None |
protocol_version | CQL protocol version | 4 |
load_balancing_policy | Custom load balancing policy | None |
# Pull and run Cassandra container
docker run --name mem0-cassandra \
-p 9042:9042 \
-e CASSANDRA_CLUSTER_NAME="Mem0Cluster" \
-d cassandra:latest
# Wait for Cassandra to start (may take 1-2 minutes)
docker exec -it mem0-cassandra cqlsh
# Create keyspace
CREATE KEYSPACE IF NOT EXISTS mem0
WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
Ubuntu/Debian:
# Add Apache Cassandra repository
echo "deb https://downloads.apache.org/cassandra/debian 40x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
curl https://downloads.apache.org/cassandra/KEYS | sudo apt-key add -
# Install Cassandra
sudo apt-get update
sudo apt-get install cassandra
# Start Cassandra
sudo systemctl start cassandra
# Verify installation
nodetool status
macOS:
# Using Homebrew
brew install cassandra
# Start Cassandra
brew services start cassandra
# Connect to CQL shell
cqlsh
Install the required Python package:
pip install cassandra-driver
from cassandra.policies import DCAwareRoundRobinPolicy
config = {
"vector_store": {
"provider": "cassandra",
"config": {
"contact_points": ["node1.example.com", "node2.example.com", "node3.example.com"],
"port": 9042,
"username": "mem0_user",
"password": "secure_password",
"keyspace": "mem0_prod",
"collection_name": "memories",
"protocol_version": 4,
"load_balancing_policy": DCAwareRoundRobinPolicy(local_dc='DC1')
}
}
}