contrib/starrocks-python-client/README.md
A StarRocks client for the Python programming language, including a SQLAlchemy dialect and an Alembic extension.
StarRocks is a next-generation data platform designed for fast, real-time analytics. This package allows developers to interact with StarRocks using Python, leveraging SQLAlchemy's powerful ORM and expression language, and managing database schema migrations with Alembic.
```shell
pip install starrocks
```
Supported Python versions: >= 3.10, <= 3.14.
It is highly recommended to install starrocks in a virtual environment to avoid conflicts with system-wide packages.
Mac/Linux:

```shell
pip install virtualenv
virtualenv <your-env-name>
source <your-env-name>/bin/activate
<your-env-name>/bin/pip install starrocks
```
Windows:

```shell
pip install virtualenv
virtualenv <your-env-name>
<your-env-name>\Scripts\activate
<your-env-name>\Scripts\pip.exe install starrocks
```
To connect to StarRocks, use the SQLAlchemy connection string format:
```
starrocks://<User>:<Password>@<Host>:<Port>/[<Catalog>.]<Database>
```
Or, for an asynchronous connection, use the asyncmy driver:
```
starrocks+asyncmy://<User>:<Password>@<Host>:<Port>/[<Catalog>.]<Database>
```
Note: The `Catalog` part can be omitted; catalogs are managed by StarRocks, and the default is `default_catalog`.
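As an illustration, the connection string can be assembled programmatically. Note that `make_url` is a hypothetical helper for this sketch, not part of the starrocks package:

```python
from urllib.parse import quote_plus


def make_url(user, password, host, port, database, catalog=None):
    # Catalog is optional; when omitted, StarRocks uses `default_catalog`.
    db = f"{catalog}.{database}" if catalog else database
    auth = quote_plus(user)
    if password:
        auth += f":{quote_plus(password)}"
    return f"starrocks://{auth}@{host}:{port}/{db}"


print(make_url("root", "", "localhost", 9030, "mydatabase"))
# starrocks://root@localhost:9030/mydatabase
```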
Connect to your database and run a query.
```python
from sqlalchemy import create_engine, text

engine = create_engine('starrocks://root@localhost:9030/mydatabase')

# make sure you have created the table `mytable` in `mydatabase`.
with engine.connect() as connection:
    print("Connection successful!")
    rows = connection.execute(text("SELECT * FROM mytable LIMIT 2")).fetchall()
    print(rows)
```
```python
# async version
import asyncio

from sqlalchemy import text
from sqlalchemy.ext.asyncio import create_async_engine

engine = create_async_engine('starrocks+asyncmy://root@localhost:9030/mydatabase')


async def async_query():
    async with engine.connect() as connection:
        print("Connection successful!")
        result = await connection.execute(text("SELECT * FROM mytable LIMIT 2"))
        print(result.fetchall())
    await engine.dispose()


asyncio.run(async_query())
```
You can define a table with StarRocks-specific attributes using SQLAlchemy's ORM declarative style (see the SQLAlchemy ORM quickstart for background):
```python
from sqlalchemy.orm import Mapped, declarative_base, mapped_column

from starrocks import INTEGER, STRING

Base = declarative_base()


class MyTable(Base):
    __tablename__ = 'my_orm_table'

    id: Mapped[int] = mapped_column(INTEGER, primary_key=True)
    name: Mapped[str] = mapped_column(STRING)

    __table_args__ = {
        'comment': 'table comment',
        'starrocks_primary_key': 'id',
        'starrocks_distributed_by': 'HASH(id) BUCKETS 10',
        'starrocks_properties': {'replication_num': '1'}
    }


# Create the table in the database
Base.metadata.create_all(engine)
```
Alternatively, you can use SQLAlchemy Core to define tables programmatically.
```python
from sqlalchemy import Column, MetaData, Table

from starrocks import INTEGER, VARCHAR

metadata = MetaData()

my_core_table = Table(
    'my_core_table',
    metadata,
    Column('id', INTEGER, primary_key=True),
    Column('name', VARCHAR(50)),
    # StarRocks-specific arguments
    starrocks_primary_key='id',
    starrocks_distributed_by='HASH(id) BUCKETS 10',
    starrocks_properties={"replication_num": "1"}
)

# Create the table in the database
metadata.create_all(engine)
```
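For reference, a Core definition like the one above should compile to StarRocks DDL roughly of this shape (an illustrative sketch; the exact statement the dialect emits may differ by version):

```sql
CREATE TABLE my_core_table (
    id INT NOT NULL,
    name VARCHAR(50)
)
PRIMARY KEY (id)
DISTRIBUTED BY HASH(id) BUCKETS 10
PROPERTIES ("replication_num" = "1");
```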
For a complete guide on defining tables (Core and ORM), executing queries, and using advanced features, please see the SQLAlchemy Usage Guide.
For a detailed reference on all StarRocks-specific table attributes and data types, please see the Table Definition Reference.
Create a View and a Materialized View using the StarRocks helpers. These behave like SQLAlchemy Table objects and are created with metadata.create_all(engine).
```python
from sqlalchemy import MetaData, text

from starrocks.sql.schema import View, MaterializedView

metadata = MetaData()

# Create a simple View (columns inferred from SELECT)
user_view = View(
    'user_view',
    metadata,
    definition='SELECT id, name FROM my_core_table WHERE name IS NOT NULL',
    comment='Active users'
)

# Create a simple Materialized View (asynchronous refresh)
user_stats_mv = MaterializedView(
    'user_stats_mv',
    metadata,
    definition='SELECT id, COUNT(*) AS cnt FROM my_core_table GROUP BY id',
    starrocks_refresh='ASYNC'
)

# Create the view and MV in the database
metadata.create_all(engine)
```
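The definitions above correspond to DDL roughly like the following (illustrative; actual output may differ):

```sql
CREATE VIEW user_view COMMENT "Active users"
AS SELECT id, name FROM my_core_table WHERE name IS NOT NULL;

CREATE MATERIALIZED VIEW user_stats_mv
REFRESH ASYNC
AS SELECT id, COUNT(*) AS cnt FROM my_core_table GROUP BY id;
```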
```python
# Query the view or MV like normal tables
with engine.connect() as conn:
    rows = conn.execute(text("SELECT * FROM user_view LIMIT 5")).fetchall()
    print(rows)
```
Refer to the View Definition Reference and the Materialized View Definition Reference for more detailed information.
This dialect integrates with Alembic to support automated schema migrations. Here’s a quick-start guide to get you up and running.
If you already have tables/views/materialized views in your StarRocks database, you can generate models.py (or a consolidated models file) using sqlacodegen.
```shell
sqlacodegen --options include_dialect_options,keep_dialect_types \
    starrocks://root@localhost:9030 > models.py
```
Refer to generating models and sqlacodegen for more options and features.
```shell
pip install "alembic>=1.16"
alembic init alembic
```
In alembic.ini, set the sqlalchemy.url to your StarRocks connection string.
```ini
# alembic.ini
sqlalchemy.url = starrocks://root@localhost:9030/mydatabase
```
It is helpful to enable logging from this package when running Alembic commands. You can add the following logging configuration to the alembic.ini file.
```ini
# alembic.ini
[loggers]
# Append starrocks to the following line
# keys = root,sqlalchemy,alembic
keys = root,sqlalchemy,alembic,starrocks

# Add the following lines after the `[logger_alembic]` section
[logger_starrocks]
level = INFO
handlers =
qualname = starrocks
```
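In Python terms, that ini section is roughly equivalent to configuring a `starrocks` logger like this (a minimal sketch):

```python
import logging

# Equivalent of the [logger_starrocks] section: an INFO-level logger named
# "starrocks" with no handlers of its own, so records propagate to the root
# handlers configured elsewhere in alembic.ini.
logger = logging.getLogger("starrocks")
logger.setLevel(logging.INFO)

logger.info("starrocks dialect logging enabled")
```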
In alembic/env.py, import your models' metadata and assign it to target_metadata.
```python
# alembic/env.py
# Add these imports
from myapp.models import Base  # Adjust to your models' location
from starrocks.alembic import render_column_type, include_object_for_view_mv

# ...

# And set the target_metadata
target_metadata = Base.metadata


def run_migrations_online() -> None:
    # ... inside this function
    context.configure(
        # ...
        render_item=render_column_type,  # Add this line (required for column comparison)
        include_object=include_object_for_view_mv  # Add this line (required for View/MV support)
    )
    # ...
```
Note: For advanced filtering options (e.g., excluding temporary tables), see the Alembic Integration Guide.
With your models defined (as shown in the SQLAlchemy examples above), you can now generate and apply a migration.
```shell
# Generate the migration script
alembic revision --autogenerate -m "Create initial tables"

# Apply the migration to the database
alembic upgrade head
```
For a full tutorial on advanced topics like data migrations, handling complex types, and managing views, please refer to the Alembic Integration Guide.
To run tests for the StarRocks SQLAlchemy dialect, first install the package in editable mode along with its testing dependencies:
```shell
pip install -e .
pip install pytest mock
```
Then, you can run the test suite using pytest:
```shell
pytest
```
This will run the standard SQLAlchemy dialect test suite as well as StarRocks-specific tests. For more details, please check SQLAlchemy's guide for dialect development.
To see the raw SQL that the dialect compiles and executes during tests, you can modify the [tool.pytest.ini_options] section in the pyproject.toml file in your project with the following content:
```toml
[tool.pytest.ini_options]
log_cli = true
log_cli_level = "DEBUG"
log_cli_format = "%(levelname)-5.5s [%(name)s] %(message)s"
```
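To see what that format yields, here is a small stdlib-only sketch that renders a sample record with the same format string:

```python
import logging

# Render one sample record with the log_cli_format string from above.
fmt = logging.Formatter("%(levelname)-5.5s [%(name)s] %(message)s")
record = logging.LogRecord(
    name="starrocks", level=logging.DEBUG, pathname=__file__, lineno=1,
    msg="compiled SQL: SELECT 1", args=None, exc_info=None,
)
print(fmt.format(record))  # DEBUG [starrocks] compiled SQL: SELECT 1
```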
To run the integration and system tests, you must have a running StarRocks cluster. The tests require a connection URL to be provided via the STARROCKS_URL environment variable.
Set up your StarRocks database:
Ensure your StarRocks instance is running and you have a database available for testing (e.g., test_sqla).
```sql
CREATE DATABASE IF NOT EXISTS test_sqla;

-- set it if you're testing on small shared-nothing clusters
ADMIN SET FRONTEND CONFIG ("default_replication_num" = "1");
```
Configure the connection URL:
Set the STARROCKS_URL environment variable in your shell. The format should be:
```shell
export STARROCKS_URL="starrocks://<User>:<Password>@<Host>:<Port>/<Database>"
```
For example (this is also the default URL used when the variable is not set):
export STARROCKS_URL="starrocks://[email protected]:9030/test_sqla"
Run the tests:
With the development environment set up and the environment variable configured, you can run the tests using pytest:
```shell
pytest
```
In addition to the StarRocks-specific tests, you can run the comprehensive test suite provided by SQLAlchemy to ensure full compatibility. To do this, you need to enable the SQLAlchemy test plugin in test/conftest.py.
Enable the SQLAlchemy test plugin:
Open the file test/conftest.py and uncomment the line that imports from sqlalchemy.testing.plugin.pytestplugin.
```python
# test/conftest.py
# ... other imports

# To run test_suite.py and the full SQLAlchemy test suite, uncomment the following line:
from sqlalchemy.testing.plugin.pytestplugin import *
```
NOTE: If you enable this line, StarRocks-specific tests will not run. We plan to port the StarRocks-specific tests to SQLAlchemy's test framework in the future.
You can enable SQLAlchemy's tests in test/test_suite.py by uncommenting the following line:
```python
# To run the full SQLAlchemy test suite, uncomment the following line:
from sqlalchemy.testing.suite import *
```
Run the tests:
```shell
pytest test/test_suite.py
```
This will run the standard SQLAlchemy dialect compliance suite. For more details, please check SQLAlchemy's guide for dialect development.
Note: After running the full suite, it's good practice to re-comment the line in test/conftest.py to keep standard test runs focused on the dialect-specific tests.
To build and deploy the package, you'll need the build and twine tools. You can install them using pip:
```shell
pip install build twine
```
Once installed, follow these steps to build and release a new version:
Build the package:
This command packages your project into distribution files (.tar.gz for source and .whl for a built distribution) and places them in the dist/ directory.
```shell
python3 -m build
```
The main outputs are dist/starrocks-$version.tar.gz and dist/starrocks-$version-py3-none-any.whl.
Upload to the test environment (TestPyPI):
Before publishing to the official PyPI, it's a good practice to upload to TestPyPI to ensure everything works as expected.
```shell
python3 -m twine upload --repository testpypi dist/*
```
Test the package from TestPyPI:
Install the package from TestPyPI to verify that it was uploaded correctly and can be installed by others.
```shell
python3 -m pip install --index-url https://test.pypi.org/simple/ starrocks
```
Upload to the production environment (PyPI):
After verifying the package on TestPyPI, upload it to the official Python Package Index (PyPI) to make it publicly available.
```shell
twine upload dist/*
```