Back to Polars

Set Operations

py-polars/docs/source/reference/sql/set_operations.rst

latest7.6 KB
Original Source

Set Operations

.. list-table:: :header-rows: 1 :widths: 20 60

    • Function
    • Description
    • :ref:EXCEPT <except>
    • Combine the result sets of two SELECT statements, returning only the rows that appear in the first result set but not in the second.
    • :ref:EXCEPT BY NAME <except_by_name>
    • Combine the result sets of two SELECT statements, returning only the rows that appear in the first result set but not in the second, aligning columns by name instead of by ordinal position.
    • :ref:INTERSECT <intersect>
    • Combine the result sets of two SELECT statements, returning only the rows that appear in both result sets.
    • :ref:INTERSECT BY NAME <intersect_by_name>
    • Combine the result sets of two SELECT statements, returning only the rows that appear in both result sets, aligning columns by name instead of by ordinal position.
    • :ref:UNION <union>
    • Combine the distinct result sets of two or more SELECT statements. The final result set will have no duplicate rows.
    • :ref:UNION ALL <union_all>
    • Combine the complete result sets of two or more SELECT statements. The final result set will be composed of all rows from each query.
    • :ref:UNION [ALL] BY NAME <union_by_name>
    • Combine the result sets of two or more SELECT statements, aligning columns by name instead of by ordinal position; if ALL is omitted the final result will have no duplicate rows. This also combines columns from both datasets.

.. _except:

EXCEPT

Combine the result sets of two SELECT statements, returning only the rows that appear in the first result set but not in the second.

Example:

.. code-block:: python

lf1 = pl.LazyFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})
lf2 = pl.LazyFrame({
    "id": [2, 3, 4],
    "age": [30, 25, 45],
    "name": ["Bob", "Charlie", "David"],
})
pl.sql("""
    SELECT id, name FROM lf1
    EXCEPT
    SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (1, 2)
# ┌─────┬───────┐
# │ id  ┆ name  │
# │ --- ┆ ---   │
# │ i64 ┆ str   │
# ╞═════╪═══════╡
# │ 1   ┆ Alice │
# └─────┴───────┘

.. _except_by_name:

EXCEPT BY NAME

Combine the result sets of two SELECT statements, returning only the rows that appear in the first result set but not in the second, aligning columns by name instead of by ordinal position. This allows the two queries to have different column orders or different numbers of columns (only commonly-named columns are used for the comparison).

Example:

.. code-block:: python

lf1 = pl.LazyFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})
lf2 = pl.LazyFrame({
    "name": ["Bob", "Charlie", "David"],
    "id": [2, 3, 4],
})
pl.sql("""
    SELECT * FROM lf1
    EXCEPT BY NAME
    SELECT * FROM lf2
""").sort(by="id").collect()
# shape: (1, 2)
# ┌─────┬───────┐
# │ id  ┆ name  │
# │ --- ┆ ---   │
# │ i64 ┆ str   │
# ╞═════╪═══════╡
# │ 1   ┆ Alice │
# └─────┴───────┘

.. _intersect:

INTERSECT

Combine the result sets of two SELECT statements, returning only the rows that appear in both result sets.

Example:

.. code-block:: python

pl.sql("""
    SELECT id, name FROM lf1
    INTERSECT
    SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (2, 2)
# ┌─────┬─────────┐
# │ id  ┆ name    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 2   ┆ Bob     │
# │ 3   ┆ Charlie │
# └─────┴─────────┘

.. _intersect_by_name:

INTERSECT BY NAME

Combine the result sets of two SELECT statements, returning only the rows that appear in both result sets, aligning columns by name instead of by ordinal position. This allows the two queries to have different column orders or different numbers of columns (only commonly-named columns are used for the comparison).

Example:

.. code-block:: python

lf1 = pl.LazyFrame({
    "id": [1, 2, 3],
    "name": ["Alice", "Bob", "Charlie"],
})
lf2 = pl.LazyFrame({
    "name": ["Bob", "Charlie", "David"],
    "id": [2, 3, 4],
})
pl.sql("""
    SELECT * FROM lf1
    INTERSECT BY NAME
    SELECT * FROM lf2
""").sort(by="id").collect()
# shape: (2, 2)
# ┌─────┬─────────┐
# │ id  ┆ name    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 2   ┆ Bob     │
# │ 3   ┆ Charlie │
# └─────┴─────────┘

.. _union:

UNION

Combine the distinct result sets of two or more SELECT statements. The final result set will have no duplicate rows.

Example:

.. code-block:: python

pl.sql("""
    SELECT id, name FROM lf1
    UNION
    SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (4, 2)
# ┌─────┬─────────┐
# │ id  ┆ name    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ Alice   │
# │ 2   ┆ Bob     │
# │ 3   ┆ Charlie │
# │ 4   ┆ David   │
# └─────┴─────────┘

.. _union_all:

UNION ALL

Combine the complete result sets of two or more SELECT statements. The final result set will be composed of all rows from each query.

Example:

.. code-block:: python

pl.sql("""
    SELECT id, name FROM lf1
    UNION ALL
    SELECT id, name FROM lf2
""").sort(by="id").collect()
# shape: (6, 2)
# ┌─────┬─────────┐
# │ id  ┆ name    │
# │ --- ┆ ---     │
# │ i64 ┆ str     │
# ╞═════╪═════════╡
# │ 1   ┆ Alice   │
# │ 2   ┆ Bob     │
# │ 2   ┆ Bob     │
# │ 3   ┆ Charlie │
# │ 3   ┆ Charlie │
# │ 4   ┆ David   │
# └─────┴─────────┘

.. _union_by_name:

UNION BY NAME

Combine the result sets of two or more SELECT statements, aligning columns by name instead of by ordinal position; if ALL is omitted the final result will have no duplicate rows. This also combines columns from both datasets.

Example:

.. code-block:: python

pl.sql("""
    SELECT * FROM lf1
    UNION BY NAME
    SELECT * FROM lf2
""").sort(by="id").collect()
# shape: (6, 3)
# ┌─────┬─────────┬──────┐
# │ id  ┆ name    ┆ age  │
# │ --- ┆ ---     ┆ ---  │
# │ i64 ┆ str     ┆ i64  │
# ╞═════╪═════════╪══════╡
# │ 1   ┆ Alice   ┆ null │
# │ 2   ┆ Bob     ┆ null │
# │ 2   ┆ Bob     ┆ 30   │
# │ 3   ┆ Charlie ┆ 25   │
# │ 3   ┆ Charlie ┆ null │
# │ 4   ┆ David   ┆ 45   │
# └─────┴─────────┴──────┘