Back to Langbot

SeekDB Vector Database Integration

docs/SEEKDB_INTEGRATION.md

4.9.68.1 KB
Original Source

SeekDB Vector Database Integration

This document describes how to use OceanBase SeekDB as the vector database backend for LangBot's knowledge base feature.

What is SeekDB?

OceanBase SeekDB is an AI-native search database that unifies relational, vector, text, JSON and GIS in a single engine, enabling hybrid search and in-database AI workflows. It's developed by OceanBase and released under Apache 2.0 license.

Key Features

  • Hybrid Search: Combine vector search, full-text search and relational query in a single statement
  • Multi-Model Support: Support relational, vector, text, JSON and GIS in a single engine
  • Lightweight: Requires as little as 1 CPU core and 2 GB of memory
  • Multiple Deployment Modes: Supports both embedded mode and client/server mode
  • MySQL Compatible: Powered by OceanBase engine with full ACID compliance and MySQL compatibility

Installation

SeekDB support is automatically included when you install LangBot. The required dependency pyseekdb is listed in pyproject.toml.

If you need to install it manually:

bash
pip install pyseekdb

⚠️ Platform Compatibility

Embedded Mode

PlatformStatusNotes
Linux✅ SupportedFull embedded mode support via pylibseekdb
macOS❌ Not Supportedpylibseekdb is Linux-only; use server mode instead
Windows❌ Not Supportedpylibseekdb is Linux-only; use server mode instead

Important: Embedded mode requires the pylibseekdb library, which is only available on Linux. If you're on macOS or Windows, you must use server mode.

Server Mode (Docker)

PlatformStatusNotes
Linux✅ SupportedFull Docker support
macOS⚠️ Known IssueDocker container initialization failure - See Issue #36
Windows⚠️ UntestedShould work but not yet tested

macOS Users: Currently, SeekDB Docker containers have an initialization issue on macOS (oceanbase/seekdb#36). Until this is resolved, we recommend:

  • Using ChromaDB or Qdrant as alternatives
  • Connecting to a remote SeekDB server on Linux if available

Server Mode (Remote Connection)

PlatformStatusNotes
All Platforms✅ SupportedConnect to SeekDB running on a remote Linux server

Recommendation for macOS/Windows users: Deploy SeekDB on a Linux server and connect via server mode configuration.

Configuration

Embedded mode runs SeekDB directly within the LangBot process, storing data locally. This is the simplest setup and requires no external services.

Edit your config.yaml:

yaml
vdb:
  use: seekdb
  seekdb:
    mode: embedded
    path: './data/seekdb'  # Path to store SeekDB data
    database: 'langbot'    # Database name

Server Mode (For Production)

Server mode connects to a remote SeekDB server or OceanBase server. This is recommended for production deployments.

SeekDB Server

yaml
vdb:
  use: seekdb
  seekdb:
    mode: server
    host: 'localhost'
    port: 2881
    database: 'langbot'
    user: 'root'
    password: ''  # Can also use SEEKDB_PASSWORD env var

OceanBase Server

If you're using OceanBase with seekdb capabilities:

yaml
vdb:
  use: seekdb
  seekdb:
    mode: server
    host: 'localhost'
    port: 2881
    tenant: 'sys'        # OceanBase tenant name
    database: 'langbot'
    user: 'root'
    password: ''

Configuration Parameters

ParameterRequiredDefaultDescription
modeNoembeddedDeployment mode: embedded or server
pathNo./data/seekdbData directory for embedded mode
databaseNolangbotDatabase name
hostNolocalhostServer host (server mode only)
portNo2881Server port (server mode only)
userNorootUsername (server mode only)
passwordNo''Password (server mode only)
tenantNoNoneOceanBase tenant (optional, server mode only)

Usage

Once configured, SeekDB will be used automatically for all knowledge base operations in LangBot:

  1. Creating Knowledge Bases: Vectors will be stored in SeekDB collections
  2. Adding Documents: Document embeddings will be indexed in SeekDB
  3. Searching: Vector similarity search will use SeekDB's efficient indexing
  4. Deleting: Document removal will delete vectors from SeekDB

No code changes are required - just update your configuration!

Architecture Details

Implementation

The SeekDB adapter is implemented in src/langbot/pkg/vector/vdbs/seekdb.py and follows the same VectorDatabase interface as Chroma and Qdrant adapters.

Key methods:

  • add_embeddings(): Add vectors with metadata to a collection
  • search(): Perform vector similarity search
  • delete_by_file_id(): Delete vectors by file ID metadata
  • get_or_create_collection(): Manage collections
  • delete_collection(): Remove entire collections

Vector Storage

  • Collections are created with HNSW (Hierarchical Navigable Small World) index
  • Default distance metric: Cosine similarity
  • Default vector dimension: 384 (adjusts automatically based on embeddings)
  • Metadata is stored alongside vectors for filtering

Advantages Over Other Vector Databases

vs. ChromaDB

  • ✅ Better MySQL compatibility
  • ✅ Hybrid search capabilities (vector + full-text + SQL)
  • ✅ Production-grade distributed mode support
  • ✅ Lightweight embedded mode

vs. Qdrant

  • ✅ SQL query support
  • ✅ MySQL ecosystem integration
  • ✅ Simpler deployment (no Docker required for embedded mode)
  • ✅ Multi-model data support (not just vectors)

Troubleshooting

Import Error

If you see: ImportError: pyseekdb is not installed

Solution:

bash
pip install pyseekdb

Embedded Mode Error on macOS/Windows

Error:

RuntimeError: Embedded Client is not available because pylibseekdb is not available.
Please install pylibseekdb (Linux only) or use RemoteServerClient (host/port) instead.

Cause: pylibseekdb is only available on Linux platforms.

Solution: Use server mode instead:

  1. Deploy SeekDB on a Linux server or VM
  2. Configure LangBot to use server mode:
yaml
vdb:
  use: seekdb
  seekdb:
    mode: server
    host: 'your-seekdb-server-ip'
    port: 2881
    database: 'langbot'
    user: 'root'
    password: ''

Alternative: Use ChromaDB or Qdrant, which work on all platforms:

yaml
vdb:
  use: chroma  # or qdrant

Docker Container Fails on macOS

Symptoms:

bash
docker run -d -p 2881:2881 oceanbase/seekdb:latest
# Container exits immediately with code 30

Error in logs:

[ERROR] Code: Agent.SeekDB.Not.Exists
Message: initialize failed: init agent failed: SeekDB not exists in current directory.

Cause: This is a known issue with SeekDB Docker containers on macOS. See oceanbase/seekdb#36.

Status: Under investigation by OceanBase team.

Workaround Options:

  1. Use alternatives: ChromaDB or Qdrant work perfectly on macOS
  2. Remote server: Deploy SeekDB on a Linux server and connect remotely
  3. Wait for fix: Monitor the GitHub issue for updates

Connection Error (Server Mode)

If SeekDB server is not reachable, check:

  1. Server is running: ps aux | grep observer
  2. Port is accessible: nc -zv localhost 2881
  3. Credentials are correct in config
  4. Firewall allows connections on port 2881

Performance Issues

For large datasets:

  • Use server mode instead of embedded mode
  • Ensure adequate memory allocation
  • Consider using OceanBase distributed mode for very large scale
  • Adjust HNSW index parameters if needed

Resources

License

SeekDB is licensed under Apache License 2.0.