import Experimental from '../_assets/commonMarkdown/_experimental.mdx'
dbt-starrocks enables you to use dbt to transform data in StarRocks using dbt's modeling patterns and best practices. For source code and issue tracking, see the dbt-starrocks GitHub repo.
| StarRocks >= 3.1 | StarRocks >= 3.4 | Feature |
|---|---|---|
| ✅ | ✅ | Table materialization |
| ✅ | ✅ | View materialization |
| ✅ | ✅ | Materialized View materialization |
| ✅ | ✅ | Incremental materialization |
| ✅ | ✅ | Primary Key Model |
| ✅ | ✅ | Sources |
| ✅ | ✅ | Custom data tests |
| ✅ | ✅ | Docs generate |
| ✅ | ✅ | Expression Partition |
| ❌ | ❌ | Kafka |
| ❌ | ✅ | Dynamic Overwrite |
| ✅ \* | ✅ | Submit task |
| ✅ | ✅ | Microbatch (Insert Overwrite) |
| ❌ | ✅ | Microbatch (Dynamic Overwrite) |
\* Verify the SUBMIT TASK support for your specific version; see SUBMIT TASK.
Install the StarRocks dbt adapter using pip:

```shell
pip install dbt-starrocks
```

Verify the installation by checking the version:

```shell
dbt --version
```

This should list `starrocks` under plugins.
Create or update `profiles.yml` with StarRocks-specific settings:

```yaml
starrocks_project:
  target: dev
  outputs:
    dev:
      type: starrocks
      host: your-starrocks-host.com
      port: 9030
      schema: your_database
      username: your_username
      password: your_password
      catalog: test_catalog
```
| Option | Description | Required? | Example |
|---|---|---|---|
| `type` | The specific adapter to use; this must be set to `starrocks` | Required | `starrocks` |
| `host` | The hostname to connect to | Required | `192.168.100.28` |
| `port` | The port to use | Required | `9030` |
| `catalog` | Specify the catalog to build models into | Optional | `default_catalog` |
| `schema` | Specify the schema (database in StarRocks) to build models into | Required | `analytics` |
| `username` | The username to use to connect to the server | Required | `dbt_admin` |
| `password` | The password to use for authenticating to the server | Required | `correct-horse-battery-staple` |
| `version` | Let the plugin target a compatible StarRocks version | Optional | `3.1.0` |
| `use_pure` | Set to `"true"` to use C extensions | Optional | `true` |
| `is_async` | Set to `"true"` to submit suitable tasks as ETL tasks | Optional | `true` |
| `async_query_timeout` | Sets the `query_timeout` value when submitting a task to StarRocks | Optional | `300` |
Create or update `sources.yml`:

```yaml
sources:
  - name: your_source
    database: your_sr_catalog
    schema: your_sr_database
    tables:
      - name: your_table
```

If the catalog is not specified in the source definition, it defaults to the catalog defined in the profile. Using the profile from earlier, if `database` is not defined, the model will assume the source is located at `test_catalog.your_sr_database`.
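As a sketch (the model body is illustrative), a model can then select from this source with the `source()` macro, which dbt resolves to the fully qualified catalog, database, and table name at compile time:

```sql
-- Hypothetical staging model: reads from the source declared above.
SELECT *
FROM {{ source('your_source', 'your_table') }}
```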
## Basic Table Configuration

```sql
{{ config(
    materialized='table',
    engine='OLAP',
    keys=['id', 'name', 'created_date'],
    table_type='PRIMARY',
    distributed_by=['id'],
    buckets=3,
    partition_by=['created_date'],
    properties=[
        {"replication_num": "1"}
    ]
) }}

SELECT
    id,
    name,
    email,
    created_date,
    last_modified_date
FROM {{ source('your_source', 'users') }}
```
- `engine`: Table engine (defaults to `OLAP`)
- `table_type`: Table model:
  - `PRIMARY`: Primary key model (supports upserts and deletes)
  - `DUPLICATE`: Duplicate key model (allows duplicate rows)
  - `UNIQUE`: Unique key model (enforces uniqueness)
- `distributed_by`: Columns for hash distribution
- `buckets`: Number of buckets for data distribution (leave empty for auto bucketing)
- `partition_by`: Columns for table partitioning
- `partition_by_init`: Initial partition definitions
- `properties`: Additional StarRocks table properties

The next example creates a materialized table in StarRocks containing aggregated data from an external Hive catalog.
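For contrast with the primary key example above, a minimal duplicate key model might look like the following sketch (the source, table, and column names are hypothetical):

```sql
-- Hypothetical append-only event log: duplicate rows are allowed,
-- and `keys` defines the duplicate (sort) key columns.
{{ config(
    materialized='table',
    table_type='DUPLICATE',
    keys=['event_timestamp'],
    distributed_by=['user_id']
) }}

SELECT
    event_timestamp,
    user_id,
    event_name
FROM {{ source('raw', 'events') }}
```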
:::tip
Configure the external catalog if it does not already exist:

```sql
CREATE EXTERNAL CATALOG `hive_external`
PROPERTIES (
    "hive.metastore.uris" = "thrift://127.0.0.1:8087",
    "type" = "hive"
);
```
:::
```sql
{{ config(
    materialized='table',
    keys=['product_id', 'order_date'],
    distributed_by=['product_id'],
    partition_by=['order_date']
) }}

-- Aggregate data from Hive external catalog into StarRocks table
SELECT
    h.product_id,
    h.order_date,
    COUNT(*) AS order_count,
    SUM(h.amount) AS total_amount,
    MAX(h.last_updated) AS last_updated
FROM {{ source('hive_external', 'orders') }} h
GROUP BY
    h.product_id,
    h.order_date
```
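The `source('hive_external', 'orders')` reference above assumes a matching entry in `sources.yml`. Following the catalog-to-`database` mapping shown earlier, a sketch might look like this (the `schema` value is a hypothetical Hive database name):

```yaml
sources:
  - name: hive_external
    database: hive_external   # the external catalog created above
    schema: sales             # hypothetical Hive database
    tables:
      - name: orders
```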
```sql
{{
    config(
        materialized='table',
        on_table_exists='replace',
        partition_by=['order_date'],
        properties={},
        catalog='external_catalog',
        database='test_db'
    )
}}

SELECT * FROM {{ source('iceberg_external', 'orders') }}
```
Materializations to external catalogs support fewer configuration options: only `on_table_exists`, `partition_by`, and `properties` are supported. If `catalog` and `database` are not set, the defaults from the profile are used.
Incremental materializations are supported in StarRocks as well:

```sql
{{ config(
    materialized='incremental',
    unique_key='id',
    table_type='PRIMARY',
    keys=['id'],
    distributed_by=['id'],
    incremental_strategy='default'
) }}

SELECT
    id,
    user_id,
    event_name,
    event_timestamp,
    properties
FROM {{ source('raw', 'events') }}

{% if is_incremental() %}
WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM {{ this }})
{% endif %}
```
dbt-starrocks supports multiple incremental strategies:

- `append` (default): Simply appends new records without deduplication
- `insert_overwrite`: Overwrites table partitions with insertion
- `dynamic_overwrite`: Overwrites, creates, and writes table partitions

For more information about which overwrite strategy to use, see the INSERT documentation.
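As a sketch of switching strategies (the model, source, and partition column are hypothetical), an overwrite strategy is selected on a partitioned model through `incremental_strategy`:

```sql
-- Hypothetical partitioned incremental model: each run replaces only
-- the partitions touched by the newly selected rows.
{{ config(
    materialized='incremental',
    incremental_strategy='insert_overwrite',
    partition_by=['event_date'],
    distributed_by=['user_id']
) }}

SELECT
    event_date,
    user_id,
    COUNT(*) AS event_count
FROM {{ source('raw', 'events') }}
{% if is_incremental() %}
-- Reprocess only recent partitions on incremental runs.
WHERE event_date >= CURRENT_DATE - INTERVAL 1 DAY
{% endif %}
GROUP BY
    event_date,
    user_id
```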
:::note
Currently, incremental merge is not supported.
:::
- Reference external sources with the `{{ source('external_source_name', 'table_name') }}` macro.
- `dbt seed` was not tested for external catalogs and is not currently supported.
- For dbt to create models in external databases that do not already exist, the location of the models must be set through `properties`.