docs/en/integrations/smt/starrocks_sqlalchemy.md
This guide introduces how to manage StarRocks schemas using the Python ecosystem — including SQLAlchemy, Alembic, and sqlacodegen — through the starrocks SQLAlchemy dialect. It is designed to help you understand why schema migration is useful and how to use it effectively with StarRocks.
Many users manage StarRocks tables, views, and materialized views using SQL DDL directly. However, as projects grow, manually maintaining ALTER TABLE statements becomes error-prone and hard to track.
The StarRocks SQLAlchemy dialect (starrocks) provides:

- DDL support for StarRocks tables, views, and materialized views, including StarRocks-specific table attributes
- Alembic integration for automatic schema migration
- sqlacodegen support for reverse-generating models from an existing schema
This allows Python users to maintain StarRocks schemas in a declarative, version-controlled, and automated way.
Although schema migration is traditionally associated with OLTP databases, it is also valuable in data warehousing systems such as StarRocks. Teams use Alembic together with the StarRocks dialect because of the benefits listed below.
Once you define schema in Python ORM models or SQLAlchemy core style, writing ALTER TABLE statements manually is no longer required.
Alembic compares current StarRocks schema with your SQLAlchemy models, and generates migration scripts automatically (CREATE/DROP/ALTER).
Each schema change becomes a migration file (Python), so users can track changes and roll back if needed.
Schema changes can be applied to development, staging, and production with the same process.
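Conceptually, autogenerate works by diffing the desired schema (your models) against the live schema and emitting one operation per difference. The following is a simplified, hypothetical sketch of that comparison for columns only; it is not the dialect's or Alembic's actual implementation:

```python
# Conceptual sketch only: how a migration tool derives column-level
# operations by comparing desired model columns with the live schema.
# This is NOT the actual Alembic/StarRocks dialect code.

def diff_columns(model_cols: dict, live_cols: dict) -> list:
    """Return a list of (operation, column_name) tuples."""
    ops = []
    for name in model_cols:
        if name not in live_cols:
            ops.append(("add_column", name))
        elif model_cols[name] != live_cols[name]:
            ops.append(("alter_column", name))
    for name in live_cols:
        if name not in model_cols:
            ops.append(("drop_column", name))
    return ops

# Example: the model widens `name` and adds `age`.
model = {"id": "INT", "name": "VARCHAR(100)", "age": "INT"}
live = {"id": "INT", "name": "VARCHAR(50)"}
print(diff_columns(model, live))
# [('alter_column', 'name'), ('add_column', 'age')]
```

Alembic performs this comparison for tables, columns, and (through the StarRocks dialect) views, materialized views, and StarRocks table options, then renders the result as a migration script.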
- SQLAlchemy: 1.4 or later (SQLAlchemy 2.0 is recommended and is required to use sqlacodegen)
- Alembic: 1.16 or later

Run the following command to install the StarRocks Python client:
pip install starrocks
Connect to your StarRocks cluster using the following URL.
starrocks://<user>:<password>@<FE_host>:<query_port>/[<catalog>.]<database>
- user: Username used to connect to the cluster.
- password: User password.
- FE_host: FE IP address.
- query_port: FE query port (Default: 9030).
- catalog: The name of the catalog where your database is located.
- database: The name of the database you want to connect to.

After installation, you can quickly validate connectivity using the following code example:
from sqlalchemy import create_engine, text
# you need to create `mydatabase` first
engine = create_engine("starrocks://root@localhost:9030/mydatabase")
with engine.connect() as conn:
    conn.execute(text("SELECT 1")).fetchall()
    print("Connection successful!")
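The URL follows standard SQLAlchemy conventions, so you can sanity-check its components with the Python standard library before connecting; a quick sketch (example host, port, and database values, no StarRocks connection required):

```python
from urllib.parse import urlparse

# Parse a StarRocks SQLAlchemy URL to inspect its components.
# The credentials and host below are placeholders for illustration.
url = urlparse("starrocks://root:secret@127.0.0.1:9030/mydatabase")
print(url.username)           # root
print(url.hostname)           # 127.0.0.1
print(url.port)               # 9030
print(url.path.lstrip("/"))   # mydatabase
```

If the password contains special characters such as `@` or `/`, percent-encode them in the URL so the components parse correctly.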
The StarRocks dialect supports DDL for tables, views, and materialized views.
It also supports StarRocks-specific table attributes such as:
- ENGINE (OLAP)
- Key types (DUPLICATE KEY, PRIMARY KEY, UNIQUE KEY, AGGREGATE KEY)
- PARTITION BY variants (RANGE / LIST / expression partitioning)
- DISTRIBUTED BY variants (HASH / RANDOM)
- ORDER BY
- Properties (replication_num, storage_medium)

:::important
All StarRocks-specific parameters use the starrocks_ prefix. The starrocks_ prefix must be lowercase; the suffix is accepted in either case (for example, PRIMARY_KEY and primary_key). When you define a primary key via table options (for example, starrocks_primary_key="id"), the involved columns must also be marked with primary_key=True in Column(...), so that SQLAlchemy metadata and Alembic autogenerate can behave correctly.
:::

Examples below reflect the real public API and parameter names.
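To illustrate the prefix rule, here is a conceptual sketch (not code from the dialect itself) of how such dialect-prefixed keyword arguments might be normalized, showing that the `starrocks_` prefix stays lowercase while the option suffix is case-insensitive:

```python
# Conceptual sketch: the "starrocks_" prefix is lowercase, while the
# option suffix is accepted in either case. NOT the dialect's real code.
def normalize_starrocks_kwargs(kwargs: dict) -> dict:
    normalized = {}
    for key, value in kwargs.items():
        prefix, sep, suffix = key.partition("_")
        if prefix == "starrocks" and sep and suffix:
            # Suffix case is normalized; the prefix must already match.
            normalized["starrocks_" + suffix.lower()] = value
    return normalized

print(normalize_starrocks_kwargs({"starrocks_PRIMARY_KEY": "id"}))
# {'starrocks_primary_key': 'id'}
```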
StarRocks table options can be specified in both ORM (via __table_args__) and Core (via Table(..., starrocks_...=...)) styles.
from sqlalchemy import create_engine
from sqlalchemy.orm import Mapped, declarative_base, mapped_column
from starrocks import INTEGER, STRING
# with the same engine as the quick test
engine = create_engine("starrocks://root@localhost:9030/mydatabase")
Base = declarative_base()
class MyTable(Base):
    __tablename__ = 'my_orm_table'
    id: Mapped[int] = mapped_column(INTEGER, primary_key=True)
    name: Mapped[str] = mapped_column(STRING)
    __table_args__ = {
        'comment': 'table comment',
        'starrocks_primary_key': 'id',
        'starrocks_distributed_by': 'HASH(id) BUCKETS 10',
        'starrocks_properties': {'replication_num': '1'}
    }
# Create the table in the database
Base.metadata.create_all(engine)
from sqlalchemy import Column, MetaData, Table, create_engine
from starrocks import INTEGER, VARCHAR
# with the same engine as the quick test
engine = create_engine("starrocks://root@localhost:9030/mydatabase")
metadata = MetaData()
my_core_table = Table(
    'my_core_table',
    metadata,
    Column('id', INTEGER, primary_key=True),
    Column('name', VARCHAR(50)),
    # StarRocks-specific arguments
    starrocks_primary_key='id',
    starrocks_distributed_by='HASH(id) BUCKETS 10',
    starrocks_properties={"replication_num": "1"}
)
# Create the table in the database
metadata.create_all(engine)
:::note For a comprehensive reference of table attributes and data types, see Reference [4]. :::
Below is the recommended view definition style, using columns as a list of dicts (name/comment). This example is based on an existing table my_core_table.
from starrocks.schema import View
# Reuse the metadata from the Core table example above
metadata = my_core_table.metadata
user_view = View(
    "user_view",
    metadata,
    definition="SELECT id, name FROM my_core_table WHERE name IS NOT NULL",
    columns=[
        {"name": "id", "comment": "ID"},
        {"name": "name", "comment": "Name"},
    ],
    comment="Active users",
)
:::note For more View options and limitations, see Reference [5]. :::
Materialized views are defined similarly. The starrocks_refresh property is a syntax string that indicates the refresh strategy.
from starrocks.schema import MaterializedView
# Reuse the metadata from the Core table example above
metadata = my_core_table.metadata
# Create a simple Materialized View (asynchronous refresh)
user_stats_mv = MaterializedView(
    'user_stats_mv',
    metadata,
    definition='SELECT id, COUNT(*) AS cnt FROM my_core_table GROUP BY id',
    starrocks_refresh='ASYNC'
)
:::note For more options and ALTER limitations, see Reference [6]. :::
The StarRocks SQLAlchemy dialect provides full support for reflecting StarRocks tables, views, and materialized views. This enables Alembic’s autogenerate to work properly.
Initialize Alembic:
alembic init migrations
Configure your database URL in alembic.ini:
# alembic.ini
sqlalchemy.url = starrocks://<user>:<password>@<FE_host>:<query_port>/[<catalog>.]<database>
Enable StarRocks dialect logging (optional):
You can enable the starrocks logger in alembic.ini to observe the detected changes of a table via logs. For details, see Reference [2].
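For example, assuming the standard Python fileConfig-style logging sections that `alembic init` generates in alembic.ini, adding a starrocks logger might look like the sketch below (adjust the level to taste):

```ini
# alembic.ini (sketch): extend the generated logging sections
[loggers]
keys = root, sqlalchemy, alembic, starrocks

[logger_starrocks]
level = INFO
handlers =
qualname = starrocks
```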
Edit env.py (configure both offline and online paths):
from alembic import context
from starrocks.alembic import render_column_type, include_object_for_view_mv
from starrocks.alembic.starrocks import StarRocksImpl  # noqa: F401 (ensure impl registered)
from myapp.models import Base  # adjust to your project

target_metadata = Base.metadata

def run_migrations_offline() -> None:
    url = context.config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        render_item=render_column_type,
        include_object=include_object_for_view_mv,
    )
    with context.begin_transaction():
        context.run_migrations()

def run_migrations_online() -> None:
    # ... create engine and connect as in alembic default env.py ...
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            render_item=render_column_type,
            include_object=include_object_for_view_mv,
        )
        with context.begin_transaction():
            context.run_migrations()
alembic revision --autogenerate -m "initial schema"
Alembic will compare SQLAlchemy models with the actual StarRocks schema, and output the correct DDL.
alembic upgrade head
Downgrade is also supported (where reversible).
:::important StarRocks DDL is not transactional across multiple statements. If an upgrade fails midway, you may need to inspect what has already been applied and perform manual remediation (for example, write a compensating migration or run manual DDL) before re-running. :::
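Because each DDL statement is applied independently, a failed upgrade can leave the schema partially migrated. The sketch below is purely illustrative (not Alembic internals, with a stand-in `execute` callable) and shows why recording which statements succeeded matters for remediation:

```python
# Illustrative sketch: applying multi-statement DDL without transactions.
# `execute` is a hypothetical stand-in for a real database call.
def apply_statements(statements, execute):
    applied = []
    for stmt in statements:
        try:
            execute(stmt)
        except Exception as exc:
            # Statements in `applied` are already in effect and will NOT
            # be rolled back; inspect them before re-running the upgrade.
            return applied, f"failed at: {stmt} ({exc})"
        applied.append(stmt)
    return applied, None

def fake_execute(stmt):
    if "BROKEN" in stmt:
        raise RuntimeError("syntax error")

done, err = apply_statements(
    ["ALTER TABLE t ADD COLUMN a INT", "BROKEN DDL", "ALTER TABLE t COMMENT 'x'"],
    fake_execute,
)
print(done)                 # ['ALTER TABLE t ADD COLUMN a INT']
print(err is not None)      # True
```

In practice, the remediation is a compensating migration or manual DDL that accounts for the statements already applied, after which the upgrade can be re-run.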
The dialect supports Alembic autogenerate for:

- Tables, views, and materialized views
- StarRocks-specific table options (starrocks_*, within StarRocks ALTER support)

Some StarRocks DDL changes are not reversible or not alterable. You can only make these changes by dropping and recreating the table/view/materialized view. If you specify these changes in the dialect, autogenerate will warn or raise.
This section shows a runnable end-to-end workflow, including where to pause and review generated files.
mkdir my_sr_alembic_project
cd my_sr_alembic_project
alembic init alembic
Edit the URL in alembic.ini:
sqlalchemy.url = starrocks://root@localhost:9030/mydatabase
Create a package for your models:
mkdir -p myapp
touch myapp/__init__.py
Create myapp/models.py and put your table/view/materialized view definitions in the package:
:::note
When using Alembic migrations, do not call metadata.create_all(engine) in your models module.
:::
from sqlalchemy import Column, Table
from sqlalchemy.orm import Mapped, declarative_base, mapped_column
from starrocks import INTEGER, STRING, VARCHAR
from starrocks.schema import MaterializedView, View
Base = declarative_base()
# --- ORM table ---
class MyOrmTable(Base):
    __tablename__ = "my_orm_table"
    id: Mapped[int] = mapped_column(INTEGER, primary_key=True)
    name: Mapped[str] = mapped_column(STRING)
    __table_args__ = {
        "comment": "table comment",
        "starrocks_primary_key": "id",
        "starrocks_distributed_by": "HASH(id) BUCKETS 10",
        "starrocks_properties": {"replication_num": "1"},
    }
# --- Core table on the same metadata (important for Alembic target_metadata) ---
my_core_table = Table(
    "my_core_table",
    Base.metadata,
    Column("id", INTEGER, primary_key=True),
    Column("name", VARCHAR(50)),
    comment="core table comment",
    starrocks_primary_key="id",
    starrocks_distributed_by="HASH(id) BUCKETS 10",
    starrocks_properties={"replication_num": "1"},
)
# --- View ---
user_view = View(
    "user_view",
    Base.metadata,
    definition="SELECT id, name FROM my_core_table WHERE name IS NOT NULL",
    columns=[
        {"name": "id", "comment": "ID"},
        {"name": "name", "comment": "Name"},
    ],
    comment="Active users",
)
# --- Materialized View ---
user_stats_mv = MaterializedView(
    "user_stats_mv",
    Base.metadata,
    definition="SELECT id, COUNT(*) AS cnt FROM my_core_table GROUP BY id",
    starrocks_refresh="ASYNC",
)
Edit alembic/env.py for autogenerate:

- Import Base from myapp.models to set the target_metadata.
- Import render_column_type and include_object_for_view_mv, and set them in both run_migrations_offline() and run_migrations_online() to properly handle views and MVs, and to properly render StarRocks column types.

:::note
You need to add or modify these lines in env.py, rather than replace the generated env.py file.
:::
from alembic import context
from starrocks.alembic import render_column_type, include_object_for_view_mv
from starrocks.alembic.starrocks import StarRocksImpl # noqa: F401
from myapp.models import Base
target_metadata = Base.metadata
# Optional: set version table replication for single-BE dev clusters
version_table_kwargs = {"starrocks_properties": {"replication_num": "1"}}
# In both run_migrations_offline() and run_migrations_online(), ensure:
def run_migrations_offline() -> None:
    url = context.config.get_main_option("sqlalchemy.url")
    context.configure(
        url=url,
        target_metadata=target_metadata,
        literal_binds=True,
        render_item=render_column_type,
        include_object=include_object_for_view_mv,
        version_table_kwargs=version_table_kwargs,
    )
    with context.begin_transaction():
        context.run_migrations()

def run_migrations_online() -> None:
    # ... create engine and connect as in alembic default env.py ...
    with connectable.connect() as connection:
        context.configure(
            connection=connection,
            target_metadata=target_metadata,
            render_item=render_column_type,
            include_object=include_object_for_view_mv,
            version_table_kwargs=version_table_kwargs,
        )
        with context.begin_transaction():
            context.run_migrations()
alembic revision --autogenerate -m "create initial schema"
Pause and review:
- Open the new migration script under alembic/versions/.
- Confirm it contains the expected operations (create_table, create_view, create_materialized_view).

Preview SQL:
alembic upgrade head --sql
Pause and review the generated SQL. Then apply:
alembic upgrade head
:::important StarRocks DDL is not transactional across multiple statements. If an upgrade fails midway, you may need to inspect what has already been applied and perform manual remediation before re-running. :::
Update myapp/models.py to:
- Modify an existing table (my_core_table): add a column, update the table comment, or change one table property.
- Add a new table (my_new_table).

:::note Adding a column can be a time-consuming schema change. StarRocks allows only one running schema change job per table at a time. In practice, it is recommended to keep “add/drop/modify columns” changes separate from other heavy changes (for example, additional add/drop columns or mass property changes), and split them into multiple Alembic revisions if needed. :::
from sqlalchemy import Column, Table
from starrocks import INTEGER, VARCHAR
# Modify an existing table (add a column)
# (Update the existing my_core_table definition in-place.)
my_core_table = Table(
    "my_core_table",
    Base.metadata,
    Column("id", INTEGER, primary_key=True),
    Column("name", VARCHAR(50)),
    Column("age", INTEGER),  # added column only
    starrocks_primary_key="id",
    starrocks_distributed_by="HASH(id) BUCKETS 10",
    starrocks_properties={"replication_num": "1"},
)

my_new_table = Table(
    "my_new_table",
    Base.metadata,
    Column("id", INTEGER, primary_key=True),
    Column("name", VARCHAR(50)),
    starrocks_primary_key="id",
    starrocks_distributed_by="HASH(id) BUCKETS 10",
    starrocks_properties={"replication_num": "1"},
)
alembic revision --autogenerate -m "add a new table, change an old table"
Pause and review:
Make sure the new migration contains:
- create_table(...) for my_new_table, and
- my_core_table changes (for example, add column / set comment / set properties).

Preview SQL and apply:
alembic upgrade head --sql
alembic upgrade head
sqlacodegen can reverse-generate SQLAlchemy models directly from StarRocks:
sqlacodegen --options include_dialect_options,keep_dialect_types \
--generator tables \
starrocks://<user>:<password>@<FE_host>:<query_port>/[catalog.]<database> > models.py
Supported objects:
This is useful when onboarding an existing StarRocks schema into Alembic.
You can directly use the above command to generate Python models for the tables/views/materialized views defined in the End-to-End Example section.
:::note
- Use --generator tables when generating Core-style models (ORM generators may reorder columns according to the NOT NULL / NULL attribute).
- Some columns may be generated as NOT NULL. If you want them nullable, adjust the generated model manually.
:::

Known limitations:

- Some schema changes cannot be expressed with ALTER TABLE; use an explicit plan (usually dropping and recreating with backfill).
- If you omit the BUCKETS clause, StarRocks may auto-assign the bucket count; the dialect is designed to avoid noisy diffs in that case.

With the StarRocks SQLAlchemy dialect and Alembic integration, you can:

- Define StarRocks tables, views, and materialized views declaratively in Python
- Autogenerate and apply schema migrations with Alembic
- Reverse-generate models from an existing schema with sqlacodegen
This brings StarRocks schema management into the modern Python data engineering ecosystem and significantly simplifies cross-environment schema consistency.
[1]: starrocks-python-client README
[2]: Alembic Integration
[3]: SQLAlchemy details
[4]: Table Support
[5]: View Support
[6]: Materialized View Support