docs/en/best_practices/bucketing.md
A concise field guide to choosing between Hash Bucketing and Random Bucketing in StarRocks, including their mechanics, trade‑offs, and recommended use cases.
| Aspect | Hash Bucketing | Random Bucketing |
|---|---|---|
| Example | DISTRIBUTED BY HASH(id) BUCKETS 16 | DISTRIBUTED BY RANDOM |
| Key declaration | Required HASH(col1, …) | None – rows assigned round‑robin |
| Initial bucket count when omitted | Auto‑chosen at CREATE, then fixed | Auto‑chosen at CREATE; can grow if bucket_size set |
| Tablet split / shrink | Manual ALTER … BUCKETS | Automatic split ⇢ growth only (≥ v3.2) |
| Skew resistance | Depends on key cardinality | High – uniform by design |
| Bucket pruning | ✅ (filters, joins) | 🚫 (full tablet scan) |
| Colocate joins | ✅ | 🚫 |
| Local aggregation / bucket-shuffle joins | ✅ | 🚫 |
| Supported table types | All | Duplicate Key tables only |
Rows are assigned to tablets by hashing one or more columns. Tablet count is fixed after creation unless manually altered.
-- Fact table partitioned and hash‑bucketed by (customer_id)
CREATE TABLE sales (
sale_id bigint,
customer_id int,
sale_date date,
amount decimal(10,2)
) ENGINE = OLAP
DISTRIBUTED BY HASH(customer_id) BUCKETS 48
PARTITION BY date_trunc('DAY', sale_date)
PROPERTIES ("colocate_with" = "group1");
-- Dimension table hash‑bucketed on the same key and bucket count colocated with the sales table
CREATE TABLE customers (
customer_id int,
region varchar(32),
status tinyint
) ENGINE = OLAP
DISTRIBUTED BY HASH(customer_id) BUCKETS 48
PROPERTIES ("colocate_with" = "group1");
-- StarRocks can do tablet pruning
SELECT sum(amount)
FROM sales
WHERE customer_id = 123
-- StarRocks can do local aggregation
SELECT customer_id, sum(amount) AS total_amount
FROM sales
GROUP BY customer_id
ORDER BY total_amount DESC LIMIT 100;
-- StarRocks can do colocate join
SELECT c.region, sum(s.amount)
FROM sales s JOIN customers c USING (customer_id)
WHERE s.sale_date BETWEEN '2025-01-01' AND '2025-01-31'
GROUP BY c.region;
WHERE customer_id = 123 enables bucket pruning, allowing the query to access only a single tablet, which lowers latency & CPU cycles, especially for point-lookups.Rows are assigned round‑robin; no key specified. With PROPERTIES ("bucket_size"="<bytes>"), StarRocks dynamically splits tablets as partitions grow (v3.2+).