docs/src/content/docs/connectors/sqlite.mdx
The sqlite connector provides utilities for writing rows to SQLite databases, with optional vector support via the sqlite-vec extension.
from cocoindex.connectors import sqlite
:::note[Vector Support] For vector operations, install the sqlite-vec extension:
pip install cocoindex[sqlite]
Note: The default SQLite library bundled with macOS does not support extensions. Use Homebrew Python (brew install python) or build SQLite with extension support.
:::
connect() creates a managed SQLite connection with sensible defaults, including automatic sqlite-vec loading and thread-safe access.
def connect(
database: str | Path,
*,
timeout: float = 5.0,
load_vec: bool | Literal["auto"] = "auto",
**kwargs: Any,
) -> ManagedConnection
Parameters:
database — Path to the SQLite database file, or ":memory:" for an in-memory database.timeout — How long to wait for locks before raising an error.load_vec — Whether to load the sqlite-vec extension for vector support:
"auto" (default): Try to load, silently ignore if unavailable.True: Load and raise an error if unavailable.False: Don't attempt to load.**kwargs — Additional arguments passed directly to sqlite3.connect().Returns: A ManagedConnection with thread-safe access and extension tracking.
Example:
managed_conn = sqlite.connect("mydb.sqlite") # Auto-loads sqlite-vec if available
# Or for in-memory:
managed_conn = sqlite.connect(":memory:")
# Or explicitly require vector support:
managed_conn = sqlite.connect("mydb.sqlite", load_vec=True)
# Or disable auto-loading:
managed_conn = sqlite.connect("mydb.sqlite", load_vec=False)
A wrapper around sqlite3.Connection that provides thread-safe access and tracks loaded extensions. The connection uses autocommit mode internally.
Methods:
transaction() — Context manager that acquires a lock and executes within a transaction (BEGIN...COMMIT/ROLLBACK). Use for write operations that should be atomic.readonly() — Context manager that acquires a lock for read-only operations. No transaction is started since the connection uses autocommit mode.close() — Closes the underlying connection.Properties:
loaded_extensions — A read-only Set[str] of loaded extension names (e.g., "sqlite-vec").The sqlite connector provides target state APIs for writing rows to tables. With it, CocoIndex tracks what rows should exist and automatically handles upserts and deletions.
Create a ContextKey[sqlite.ManagedConnection] to identify your SQLite connection, then provide it in your lifespan using sqlite.managed_connection():
:::note The key name is load-bearing across runs — it's the stable identity CocoIndex uses to track managed rows. See ContextKey as stable identity before renaming. :::
import cocoindex as coco
SQLITE_DB = coco.ContextKey[sqlite.ManagedConnection]("main_db")
@coco.lifespan
def coco_lifespan(builder: coco.EnvironmentBuilder) -> Iterator[None]:
with sqlite.managed_connection("mydb.sqlite") as conn:
builder.provide(SQLITE_DB, conn)
yield
Declares a table as a target state. Returns a TableTarget for declaring rows.
def declare_table_target(
db: ContextKey[ManagedConnection],
table_name: str,
table_schema: TableSchema[RowT],
*,
managed_by: Literal["system", "user"] = "system",
virtual_table_def: Vec0TableDef | None = None,
) -> TableTarget[RowT, coco.PendingS]
Parameters:
db — A ContextKey[ManagedConnection] identifying the connection to use.table_name — Name of the table.table_schema — Schema definition including columns and primary key (see Table Schema).managed_by — Whether CocoIndex manages the table lifecycle ("system") or assumes it exists ("user").virtual_table_def — Optional Vec0TableDef to create a vec0 virtual table instead of a regular table.Returns: A pending TableTarget. Use the convenience wrapper await sqlite.mount_table_target(SQLITE_DB, table_name, table_schema) to resolve.
Once a TableTarget is resolved, declare rows to be upserted:
def TableTarget.declare_row(
self,
*,
row: RowT,
) -> None
Parameters:
row — A row object (dict, dataclass, NamedTuple, or Pydantic model). Must include all primary key columns.Define the table structure using a Python class (dataclass, NamedTuple, or Pydantic model):
@classmethod
async def TableSchema.from_class(
cls,
record_type: type[RowT],
primary_key: list[str],
*,
column_overrides: dict[str, SqliteType | VectorSchemaProvider] | None = None,
) -> TableSchema[RowT]
Parameters:
record_type — A record type whose fields define table columns.primary_key — List of column names forming the primary key.column_overrides — Optional per-column overrides for type mapping or vector configuration.Example:
@dataclass
class OutputProduct:
category: str
name: str
price: float
embedding: Annotated[NDArray, embedder]
schema = await sqlite.TableSchema.from_class(
OutputProduct,
primary_key=["category", "name"],
)
Python types are automatically mapped to SQLite type affinities:
| Python Type | SQLite Type |
|---|---|
bool | INTEGER (0/1) |
int | INTEGER |
float | REAL |
decimal.Decimal | TEXT |
str | TEXT |
bytes | BLOB |
uuid.UUID | TEXT |
datetime.date | TEXT (ISO format) |
datetime.time | TEXT (ISO format) |
datetime.datetime | TEXT (ISO format) |
datetime.timedelta | REAL (total seconds) |
list, dict, nested structs | TEXT (JSON) |
NDArray (with vector schema) | float[N] (sqlite-vec type, e.g., float[384]) |
To override the default mapping, provide a SqliteType or VectorSchemaProvider via:
typing.Annotated on the fieldcolumn_overrides — passing overrides when constructing TableSchemaUse SqliteType to specify a custom SQLite type and optional encoder:
from typing import Annotated
from cocoindex.connectors.sqlite import SqliteType
@dataclass
class MyRow:
id: int
value: Annotated[float, SqliteType("REAL")]
data: Annotated[dict, SqliteType("TEXT", encoder=lambda v: json.dumps(v))]
Or via column_overrides:
schema = sqlite.TableSchema(
MyRow,
primary_key=["id"],
column_overrides={
"data": sqlite.SqliteType("TEXT", encoder=lambda v: json.dumps(v)),
},
)
For NDArray fields, a VectorSchemaProvider annotation specifies the vector dimension and dtype. Vectors are stored as BLOBs in sqlite-vec compatible float32 format. See Vector Schema for the full list of annotation options (ContextKey, embedder instance, or explicit VectorSchema).
Define columns directly using ColumnDef:
def TableSchema.__init__(
self,
columns: dict[str, ColumnDef],
primary_key: list[str],
) -> None
Example:
schema = sqlite.TableSchema(
{
"category": sqlite.ColumnDef(type="TEXT", nullable=False),
"name": sqlite.ColumnDef(type="TEXT", nullable=False),
"price": sqlite.ColumnDef(type="REAL"),
"embedding": sqlite.ColumnDef(type="float[384]"), # sqlite-vec vector type
},
primary_key=["category", "name"],
)
SQLite virtual tables allow custom storage backends and specialized functionality. The sqlite connector supports creating virtual tables through the same declare_table_target() API used for regular tables.
The vec0 module from sqlite-vec provides optimized vector storage for similarity search. Use vec0 virtual tables when:
Requirements:
INTEGER primary key columnfloat[N] vector columnload_vec=True)Configure vec0-specific features using Vec0TableDef:
from cocoindex.connectors.sqlite import Vec0TableDef
virtual_table_def = Vec0TableDef(
partition_key_columns=["category"], # Optional: partition index by these columns
auxiliary_columns=["metadata"], # Optional: columns excluded from KNN filters
)
Parameters:
partition_key_columns — List of column names used to partition the vector index. Queries can filter by partition keys efficiently. Multiple partition keys create a composite partition.auxiliary_columns — List of column names to mark as auxiliary (stored but not usable in KNN filters). Useful for metadata that doesn't need to participate in similarity search.Pass virtual_table_def to declare_table_target():
@dataclass
class VectorDocument:
id: int
category: str
content: str
embedding: Annotated[NDArray, embedder] # e.g., float[384]
metadata: str
# Create vec0 virtual table with partition key and auxiliary column
table = await sqlite.mount_table_target(
SQLITE_DB,
"documents",
await sqlite.TableSchema.from_class(
VectorDocument,
primary_key=["id"],
),
virtual_table_def=sqlite.Vec0TableDef(
partition_key_columns=["category"],
auxiliary_columns=["metadata"],
),
)
:::warning[Current Limitations] Vec0 virtual tables have the following limitations:
virtual_table_def settings), the table will be recreated and existing data will be lost.A future schema versioning mechanism will allow preserving row data across table recreations. :::
import cocoindex as coco
from cocoindex.connectors import sqlite
DATABASE_PATH = "mydb.sqlite"
SQLITE_DB = coco.ContextKey[sqlite.ManagedConnection]("main_db")
@dataclass
class OutputProduct:
category: str
name: str
description: str
embedding: Annotated[NDArray, embedder]
@coco.lifespan
def coco_lifespan(builder: coco.EnvironmentBuilder) -> Iterator[None]:
with sqlite.managed_connection(DATABASE_PATH, load_vec=True) as conn: # Enable vector support
builder.provide(SQLITE_DB, conn)
yield
@coco.fn
async def app_main() -> None:
# Declare table target state
table = await sqlite.mount_table_target(
SQLITE_DB,
"products",
await sqlite.TableSchema.from_class(
OutputProduct,
primary_key=["category", "name"],
),
)
# Declare rows
for product in products:
table.declare_row(row=product)
import cocoindex as coco
from cocoindex.connectors import sqlite
from cocoindex.ops.sentence_transformers import SentenceTransformerEmbedder
from dataclasses import dataclass
from typing import Annotated
from numpy.typing import NDArray
DATABASE_PATH = "vectors.sqlite"
SQLITE_DB = coco.ContextKey[sqlite.ManagedConnection]("vec_db")
embedder = SentenceTransformerEmbedder("sentence-transformers/all-MiniLM-L6-v2")
@dataclass
class VectorDocument:
id: int
category: str
title: str
content: str
embedding: Annotated[NDArray, embedder] # float[384]
metadata: str # Will be marked as auxiliary
@coco.lifespan
def coco_lifespan(builder: coco.EnvironmentBuilder) -> Iterator[None]:
with sqlite.managed_connection(DATABASE_PATH, load_vec=True) as conn:
builder.provide(SQLITE_DB, conn)
yield
@coco.fn
async def app_main() -> None:
# Create vec0 virtual table with partition key and auxiliary column
table = await sqlite.mount_table_target(
SQLITE_DB,
"documents",
await sqlite.TableSchema.from_class(
VectorDocument,
primary_key=["id"],
),
virtual_table_def=sqlite.Vec0TableDef(
partition_key_columns=["category"], # Partition index by category
auxiliary_columns=["metadata"], # Store but don't index for KNN
),
)
# Declare document rows
docs = [
VectorDocument(
id=1,
category="tech",
title="Introduction to AI",
content="Artificial intelligence is...",
embedding=await embedder.embed("Artificial intelligence is..."),
metadata='{"source": "blog", "date": "2025-01-15"}',
),
# ... more documents
]
for doc in docs:
table.declare_row(row=doc)