Back to Agentscope

OceanBase Vector Store

examples/functionality/vector_store/oceanbase/README.md

1.0.193.6 KB
Original Source

OceanBase Vector Store

This example demonstrates how to use OceanBaseStore for vector storage and semantic search in AgentScope. It includes CRUD operations, metadata filtering, document chunking, and distance metric tests.

Quick Start

Install dependencies (including pyobvector):

bash
pip install -e .[full]

Start seekdb (a minimal OceanBase-compatible instance):

bash
docker run -d -p 2881:2881 oceanbase/seekdb

Run the example script:

bash
python main.py

Note: The script defaults to 127.0.0.1:2881, user root, database test. If you use a multi-tenant OceanBase account (e.g., root@test), override via environment variables.

Usage

Initialize Store

python
from agentscope.rag import OceanBaseStore

store = OceanBaseStore(
    collection_name="test_collection",
    dimensions=768,
    distance="COSINE",
    uri="127.0.0.1:2881",
    user="root",
    password="",
    db_name="test",
)

Add Documents

python
from agentscope.rag import Document, DocMetadata
from agentscope.message import TextBlock

doc = Document(
    metadata=DocMetadata(
        content=TextBlock(type="text", text="Your document text"),
        doc_id="doc_1",
        chunk_id=0,
        total_chunks=1,
    ),
    embedding=[0.1, 0.2, 0.3],
)

await store.add([doc])
python
results = await store.search(
    query_embedding=[0.1, 0.2, 0.3],
    limit=5,
    score_threshold=0.9,
)
python
client = store.get_client()
table = client.load_table(collection_name="test_collection")

results = await store.search(
    query_embedding=[0.1, 0.2, 0.3],
    limit=5,
    flter=[table.c["doc_id"].like("doc%")],
)

Note: The parameter name is flter (missing the "i") to avoid clashing with Python's built-in filter and follows the underlying library's convention.

Delete

python
client = store.get_client()
table = client.load_table(collection_name="test_collection")

await store.delete(where=[table.c["doc_id"] == "doc_1"])

Distance Metrics

MetricDescriptionBest For
COSINECosine similarityText embeddings (recommended)
L2Euclidean distanceSpatial data
IPInner productRecommendation systems

Filter Expressions

Build filters using SQLAlchemy expressions and pass them via flter:

python
table = store.get_client().load_table("test_collection")

filters = [
    table.c["doc_id"] == "doc_1",
    table.c["doc_id"].like("prefix%"),
    table.c["chunk_id"] >= 0,
]

Advanced Usage

Access Underlying Client

python
client = store.get_client()
stats = client.get_collection_stats(collection_name="test_collection")

Document Metadata

  • content: Text content (TextBlock)
  • doc_id: Unique document identifier
  • chunk_id: Chunk position (0-indexed)
  • total_chunks: Total chunks in document

FAQ

What embedding dimension should I use? Match your embedding model's output dimension (e.g., 768 for BERT, 1536 for OpenAI ada-002).

Can I change the distance metric after creation? No, create a new collection with the desired metric.

How do I clean up test data? Drop the collection via the underlying client or remove the seekdb container volume.

Environment Variables

The script supports the following environment variables to override connection settings:

bash
export OCEANBASE_URI="127.0.0.1:2881"
export OCEANBASE_USER="root"
export OCEANBASE_PASSWORD=""
export OCEANBASE_DB="test"

References