Back to Materialize

JOIN

doc/user/content/sql/select/join.md

1235.7 KB
Original Source

JOIN lets you combine two or more table expressions into a single table expression.

Conceptual framework

Much like an RDBMS, Materialize can join together any two table expressions (in our case, either sources or views) into a single table expression.

Materialize has much broader support for JOIN than most streaming platforms, i.e. we support all types of SQL joins in all of the conditions you would expect.

Syntax

{{% include-syntax file="examples/select_join" example="syntax" %}}

Note: It's possible to join together table expressions as inner joins without using this clause whatsoever, e.g. SELECT cols... FROM t1, t2 WHERE t1.x = t2.x GROUP BY cols...

Details

Unlike most other streaming platforms, JOINs in Materialize have very few, if any, restrictions. For example, Materialize:

  • Does not require time windows when joining streams.
  • Does not require any kind of partitioning.

Instead, JOINs work over the available history of both streams, which ultimately provides an experience more similar to an RDBMS than other streaming platforms.

LATERAL subqueries

To permit subqueries on the right-hand side of a JOIN to access the columns defined by the left-hand side, declare the subquery as LATERAL. Normally, a subquery only has access to the columns within its own context.

Table function invocations always have implicit access to the columns defined by the left-hand side of the join, so declaring them as LATERAL is a permitted no-op.

When a join contains a LATERAL cross-reference, the right-hand relation is recomputed for each row in the left-hand relation, then joined to the left-hand row according to the usual rules of the selected join type.

{{< warning >}} LATERAL subqueries can be very expensive to compute. For best results, do not materialize a view containing a LATERAL subquery without first inspecting the plan via the EXPLAIN PLAN statement. In many common patterns involving LATERAL joins, Materialize can optimize away the join entirely. {{< /warning >}}

As a simple example, the following query uses LATERAL to count from 1 to x for all the values of x in xs.

mzsql
SELECT * FROM
  (VALUES (1), (3)) xs (x)
  CROSS JOIN LATERAL generate_series(1, x) y;
nofmt
 x | y
---+---
 1 | 1
 3 | 1
 3 | 2
 3 | 3

For a real-world example of a LATERAL subquery, see the Top-K by group idiom.

Examples

For these examples, we'll use a small data set:

Employees

nofmt
 id |  name
----+--------
  1 | Frank
  2 | Arjun
  3 | Nikhil
  4 | Cuong

Managers

nofmt
 id | name  | manages
----+-------+---------
  1 | Arjun |       4
  2 | Cuong |       3
  3 | Frank |

In this table:

  • Arjun and Frank do not have managers.
  • Frank is a manager but has no reports.

Inner join

Inner joins return all tuples from both tables where the join condition is valid.

mzsql
SELECT
  employees."name" AS employee,
  managers."name" AS manager
FROM employees
INNER JOIN managers ON employees.id = managers.manages;
nofmt
 employee | manager
----------+---------
 Cuong    | Arjun
 Nikhil   | Cuong

Left outer join

Left outer joins (also known as left joins) return all tuples from the left-hand-side table, and all tuples from the right-hand-side table that match the join condition. Tuples on from the left-hand table that are not joined with a tuple from the right-hand table contain NULL wherever the right-hand table is referenced.

mzsql
SELECT
  employees."name" AS employee,
  managers."name" AS manager
FROM employees
LEFT OUTER JOIN managers ON employees.id = managers.manages;
nofmt
 employee | manager
----------+---------
 Cuong    | Arjun
 Nikhil   | Cuong
 Arjun    |
 Frank    |

Right outer join

Right outer joins (also known as right joins) are simply the right-hand-side equivalent of left outer joins.

Right outer joins return all tuples from the right-hand-side table, and all tuples from the left-hand-side table that match the join condition. Tuples on from the right-hand table that are not joined with a tuple from the left-hand table contain NULL wherever the left-hand table is referenced.

mzsql
SELECT
  employees."name" AS employee,
  managers."name" AS manager
FROM employees
RIGHT OUTER JOIN managers ON employees.id = managers.manages;
nofmt
 employee | manager
----------+---------
 Cuong    | Arjun
 Nikhil   | Cuong
          | Frank

Full outer join

Full outer joins perform both a left outer join and a right outer join. They return all tuples from both tables, and join them together where the join conditions are met.

Tuples that are not joined with the other table contain NULL wherever the other table is referenced.

mzsql
SELECT
  employees."name" AS employee,
  managers."name" AS manager
FROM employees
FULL OUTER JOIN managers ON employees.id = managers.manages;
nofmt
 employee | manager
----------+---------
 Cuong    | Arjun
 Nikhil   | Cuong
          | Frank
 Arjun    |
 Frank    |

Cross join

Cross joins return the Cartesian product of the two tables, i.e. all combinations of tuples from the left-hand table combined with tuples from the right-hand table.

Our example dataset doesn't have a meaningful cross-join query, but the above diagram shows how cross joins form the Cartesian product.