apps/docs/content/guides/database/partitions.mdx
Table partitioning is a technique that allows you to divide a large table into smaller, more manageable parts called “partitions”.
<Image alt="multi database" src={{ light: '/docs/img/database/partitions-light.png', dark: '/docs/img/database/partitions-dark.png', }} className="max-h-[400px] !mx-auto" width={1600} height={1075} />
Each partition contains a subset of the data based on a specified criteria, such as a range of values or a specific condition. Partitioning can significantly improve query performance and simplify data management for large datasets.
Postgres supports various partitioning methods based on how you want to partition your data. The commonly used methods are:
Let's consider an example of range partitioning for a sales table based on the order date. We'll create monthly partitions to store data for each month:
create table sales (
id bigint generated by default as identity,
order_date date not null,
customer_id bigint,
amount bigint,
-- We need to include all the
-- partitioning columns in constraints:
primary key (order_date, id)
)
partition by range (order_date);
create table sales_2000_01
partition of sales
for values from ('2000-01-01') to ('2000-02-01');
create table sales_2000_02
partition of sales
for values from ('2000-02-01') to ('2000-03-01');
To create a partitioned table you append partition by range (<column_name>) to the table creation statement. The column that you are partitioning with must be included in any unique index, which is the reason why we specify a composite primary key here (primary key (order_date, id)).
To query a partitioned table, you have two options:
When you query the parent table, Postgres automatically routes the query to the relevant partitions based on the conditions specified in the query. This allows you to retrieve data from all partitions simultaneously.
Example:
select *
from sales
where order_date >= '2000-01-01' and order_date < '2000-03-01';
This query will retrieve data from both the sales_2000_01 and sales_2000_02 partitions.
If you only need to retrieve data from a specific partition, you can directly query that partition instead of the parent table. This approach is useful when you want to target a specific range or condition within a partition.
select *
from sales_2000_02;
This query will retrieve data only from the sales_2000_02 partition.
There is no real threshold to determine when you should use partitions. Partitions introduce complexity, and complexity should be avoided until it's needed. A few guidelines:
Here are simple examples for each of the partitioning types in Postgres.
Let's consider a range partitioning example for a table that stores sales data based on the order date. We'll create monthly partitions to store data for each month.
In this example, the sales table is partitioned into two partitions: sales_january and sales_february. The data in these partitions is based on the specified range of order dates:
create table sales (
id bigint generated by default as identity,
order_date date not null,
customer_id bigint,
amount bigint,
-- We need to include all the
-- partitioning columns in constraints:
primary key (order_date, id)
)
partition by range (order_date);
create table sales_2000_01
partition of sales
for values from ('2000-01-01') to ('2000-02-01');
create table sales_2000_02
partition of sales
for values from ('2000-02-01') to ('2000-03-01');
Let's consider a list partitioning example for a table that stores customer data based on their region. We'll create partitions to store customers from different regions.
In this example, the customers table is partitioned into two partitions: customers_americas and customers_asia. The data in these partitions is based on the specified list of regions:
-- Create the partitioned table
create table customers (
id bigint generated by default as identity,
name text,
country text,
-- We need to include all the
-- partitioning columns in constraints:
primary key (country, id)
)
partition by list(country);
create table customers_americas
partition of customers
for values in ('US', 'CANADA');
create table customers_asia
partition of customers
for values in ('INDIA', 'CHINA', 'JAPAN');
You can use hash partitioning to evenly distribute data.
In this example, the products table is partitioned into two partitions: products_one and products_two. The data is distributed across these partitions using a hash function:
create table products (
id bigint generated by default as identity,
name text,
category text,
price bigint
)
partition by hash (id);
create table products_one
partition of products
for values with (modulus 2, remainder 1);
create table products_two
partition of products
for values with (modulus 2, remainder 0);
There are several other tools available for Postgres partitioning, most notably pg_partman. Native partitioning was introduced in Postgres 10 and is generally thought to have better performance.