This example demonstrates how to use OceanBaseStore for vector storage and semantic search in AgentScope. It includes CRUD operations, metadata filtering, document chunking, and distance metric tests.
Install dependencies (including pyobvector):
pip install -e .[full]
Start seekdb (a minimal OceanBase-compatible instance):
docker run -d -p 2881:2881 oceanbase/seekdb
Run the example script:
python main.py
Note: The script defaults to `127.0.0.1:2881`, user `root`, database `test`. If you use a multi-tenant OceanBase account (e.g., `root@test`), override these defaults via environment variables.
from agentscope.rag import OceanBaseStore
store = OceanBaseStore(
collection_name="test_collection",
dimensions=768,
distance="COSINE",
uri="127.0.0.1:2881",
user="root",
password="",
db_name="test",
)
from agentscope.rag import Document, DocMetadata
from agentscope.message import TextBlock
doc = Document(
metadata=DocMetadata(
content=TextBlock(type="text", text="Your document text"),
doc_id="doc_1",
chunk_id=0,
total_chunks=1,
),
embedding=[0.1, 0.2, 0.3],  # shortened for readability; a real vector needs 768 values to match `dimensions`
)
await store.add([doc])
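The store above is created with `dimensions=768`, so every embedding passed to `add` should have that length. A minimal sketch of a guard you could run before inserting is shown here; `check_dimension` is a hypothetical helper written for this README, not part of AgentScope or pyobvector.

```python
from typing import Sequence


def check_dimension(embedding: Sequence[float], expected: int) -> None:
    """Raise ValueError if the embedding length differs from the store dimension."""
    if len(embedding) != expected:
        raise ValueError(
            f"embedding has {len(embedding)} values, expected {expected}"
        )


# A vector of the right length passes silently.
check_dimension([0.0] * 768, expected=768)
```

Calling the helper with the three-element placeholder vector used in this README and `expected=768` raises `ValueError`, which surfaces the mismatch before the database is involved.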
results = await store.search(
query_embedding=[0.1, 0.2, 0.3],
limit=5,
score_threshold=0.9,
)
client = store.get_client()
table = client.load_table(collection_name="test_collection")
results = await store.search(
query_embedding=[0.1, 0.2, 0.3],
limit=5,
flter=[table.c["doc_id"].like("doc%")],
)
Note: The parameter name is `flter` (missing the "i") to avoid clashing with Python's built-in `filter`, and it follows the underlying library's convention.
client = store.get_client()
table = client.load_table(collection_name="test_collection")
await store.delete(where=[table.c["doc_id"] == "doc_1"])
| Metric | Description | Best For |
|---|---|---|
| COSINE | Cosine similarity | Text embeddings (recommended) |
| L2 | Euclidean distance | Spatial data |
| IP | Inner product | Recommendation systems |
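To make the three metrics in the table concrete, the following sketch computes each one in plain Python for small vectors. It only illustrates the mathematical definitions; how OceanBase or `OceanBaseStore` turns these values into the scores returned by `search` is not covered here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """IP: sum of element-wise products; grows with vector magnitude."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """L2: straight-line (Euclidean) distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """COSINE: inner product of the vectors after normalizing their lengths."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


query = [1.0, 0.0]
same_direction = [2.0, 0.0]  # same direction as query, twice the length
orthogonal = [0.0, 3.0]      # at a right angle to query

print(cosine_similarity(query, same_direction))  # direction only
print(l2_distance(query, same_direction))        # sensitive to length
print(inner_product(query, orthogonal))          # zero for orthogonal vectors
```

Cosine similarity treats `query` and `same_direction` as identical because it ignores magnitude, while L2 and inner product both change when a vector is rescaled. That is one reason cosine is a common default for text embeddings.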
Build filters using SQLAlchemy expressions and pass them via `flter`:
table = store.get_client().load_table("test_collection")
filters = [
table.c["doc_id"] == "doc_1",
table.c["doc_id"].like("prefix%"),
table.c["chunk_id"] >= 0,
]
client = store.get_client()
stats = client.get_collection_stats(collection_name="test_collection")
- `content`: Text content (`TextBlock`)
- `doc_id`: Unique document identifier
- `chunk_id`: Chunk position (0-indexed)
- `total_chunks`: Total chunks in the document

What embedding dimension should I use? Match your embedding model's output dimension (e.g., 768 for BERT-base, 1536 for OpenAI text-embedding-ada-002).
Can I change the distance metric after creation? No, create a new collection with the desired metric.
How do I clean up test data? Drop the collection via the underlying client or remove the seekdb container volume.
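The metadata fields `doc_id`, `chunk_id`, and `total_chunks` relate to each other as sketched below when one document is split into several chunks. This is an illustration only: it uses plain dictionaries instead of `DocMetadata`, a fixed-size character split that is an assumption rather than the chunking used by the example script, and hypothetical helper names.

```python
from typing import Dict, List


def chunk_text(text: str, chunk_size: int) -> List[str]:
    """Split text into consecutive pieces of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def build_chunk_metadata(doc_id: str, text: str, chunk_size: int) -> List[Dict]:
    """Return one metadata record per chunk, sharing doc_id and total_chunks."""
    chunks = chunk_text(text, chunk_size)
    return [
        {
            "doc_id": doc_id,               # same for every chunk of the document
            "chunk_id": index,              # 0-indexed position of this chunk
            "total_chunks": len(chunks),    # number of chunks in the document
            "content": chunk,
        }
        for index, chunk in enumerate(chunks)
    ]


records = build_chunk_metadata("doc_1", "abcdefghij", chunk_size=4)
for record in records:
    print(record)
```

Every chunk of a document shares its `doc_id`, so an equality filter on `doc_id` selects all chunks of that document at once.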
The script supports the following environment variables to override connection settings:
export OCEANBASE_URI="127.0.0.1:2881"
export OCEANBASE_USER="root"
export OCEANBASE_PASSWORD=""
export OCEANBASE_DB="test"
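As a rough sketch of how a script can pick up these variables and fall back to the defaults listed above, consider the snippet below. The variable names and default values come from this README; the helper `load_connection_settings` and the returned key names are hypothetical and may differ from what `main.py` actually does.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults mirror the values documented in this README.
DEFAULTS = {
    "OCEANBASE_URI": "127.0.0.1:2881",
    "OCEANBASE_USER": "root",
    "OCEANBASE_PASSWORD": "",
    "OCEANBASE_DB": "test",
}


def load_connection_settings(
    env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Read connection settings from env, using the defaults for unset variables."""
    source = os.environ if env is None else env
    return {
        "uri": source.get("OCEANBASE_URI", DEFAULTS["OCEANBASE_URI"]),
        "user": source.get("OCEANBASE_USER", DEFAULTS["OCEANBASE_USER"]),
        "password": source.get("OCEANBASE_PASSWORD", DEFAULTS["OCEANBASE_PASSWORD"]),
        "db_name": source.get("OCEANBASE_DB", DEFAULTS["OCEANBASE_DB"]),
    }


# Passing an explicit mapping makes the behaviour easy to inspect.
settings = load_connection_settings({"OCEANBASE_USER": "root@test"})
print(settings)
```

The resulting values correspond to the `uri`, `user`, `password`, and `db_name` arguments of the `OceanBaseStore` constructor shown earlier.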