Back to Influxdb

Perform an inner join

content/flux/v0/join-data/inner.md

latest7.9 KB
Original Source

Use join.inner() to perform an inner join of two streams of data. Inner joins drop any rows from both input streams that do not have a matching row in the other stream.

{{< svg svg="static/svgs/join-diagram.svg" class="inner" >}}

{{< expand-wrapper >}} {{% expand "View table illustration of an inner join" %}} {{< flex >}} {{% flex-content "third" %}}

left

r1<span style="color:#9b2aff"></span><span style="color:#9b2aff"></span>
r2<span style="color:#9b2aff"></span><span style="color:#9b2aff"></span>
{{% /flex-content %}}
{{% flex-content "third" %}}

right

r1<span style="color:#d30971"></span><span style="color:#d30971"></span>
r3<span style="color:#d30971"></span><span style="color:#d30971"></span>
r4<span style="color:#d30971"></span><span style="color:#d30971"></span>
{{% /flex-content %}}
{{% flex-content "third" %}}

Inner join result

r1<span style="color:#9b2aff"></span><span style="color:#9b2aff"></span><span style="color:#d30971"></span><span style="color:#d30971"></span>
{{% /flex-content %}}
{{< /flex >}}
{{% /expand %}}
{{< /expand-wrapper >}}

Use join.inner to join your data

  1. Import the join package.

  2. Define the left and right data streams to join:

    • Each stream must have one or more columns with common values. Column labels do not need to match, but column values do.
    • Each stream should have identical group keys.

    For more information, see join data requirements.

  3. Use join.inner() to join the two streams together. Provide the following required parameters:

    • left: Stream of data representing the left side of the join.
    • right: Stream of data representing the right side of the join.
    • on: Join predicate. For example: (l, r) => l.column == r.column.
    • as: Join output function that returns a record with values from each input stream. For example: (l, r) => ({l with column1: r.column1, column2: r.column2}).

The following example uses a filtered selection from the machineProduction sample data set as the left data stream and an ad-hoc table created with array.from() as the right data stream.

{{% note %}}

Example data grouping

The example below ungroups the left stream to match the grouping of the right stream. After the two streams are joined together, the joined data is grouped by stationID. {{% /note %}}

js
import "array"
import "influxdata/influxdb/sample"
import "join"

left =
    sample.data(set: "machineProduction")
        |> filter(fn: (r) => r.stationID == "g1" or r.stationID == "g2" or r.stationID == "g3")
        |> filter(fn: (r) => r._field == "oil_temp")
        |> limit(n: 5)

right =
    array.from(
        rows: [
            {station: "g1", opType: "auto", last_maintained: 2021-07-15T00:00:00Z},
            {station: "g2", opType: "manned", last_maintained: 2021-07-02T00:00:00Z},
        ],
    )

join.inner(
    left: left |> group(),
    right: right,
    on: (l, r) => l.stationID == r.station,
    as: (l, r) => ({l with opType: r.opType, maintained: r.last_maintained}),
)
    |> group(columns: ["stationID"])

{{< expand-wrapper >}} {{% expand "View example input and output data" %}}

{{% note %}} _start and _stop columns have been omitted from example input and output. {{% /note %}}

Input

left {#left-input}

_time_measurementstationID_field_value
2021-08-01T00:00:00Zmachineryg1oil_temp39.1
2021-08-01T00:00:11.51Zmachineryg1oil_temp40.3
2021-08-01T00:00:19.53Zmachineryg1oil_temp40.6
2021-08-01T00:00:25.1Zmachineryg1oil_temp40.72
2021-08-01T00:00:36.88Zmachineryg1oil_temp40.8
_time_measurementstationID_field_value
2021-08-01T00:00:00Zmachineryg2oil_temp40.6
2021-08-01T00:00:27.93Zmachineryg2oil_temp40.6
2021-08-01T00:00:54.96Zmachineryg2oil_temp40.6
2021-08-01T00:01:17.27Zmachineryg2oil_temp40.6
2021-08-01T00:01:41.84Zmachineryg2oil_temp40.6
_time_measurementstationID_field_value
2021-08-01T00:00:00Zmachineryg3oil_temp41.4
2021-08-01T00:00:14.46Zmachineryg3oil_temp41.36
2021-08-01T00:00:25.29Zmachineryg3oil_temp41.4
2021-08-01T00:00:38.77Zmachineryg3oil_temp41.4
2021-08-01T00:00:51.2Zmachineryg3oil_temp41.4

right {#right-input}

stationopTypelast_maintained
g1auto2021-07-15T00:00:00Z
g2manned2021-07-02T00:00:00Z

Output {#example-output}

_time_measurementstationID_field_valueopTypemaintained
2021-08-01T00:00:00Zmachineryg1oil_temp39.1auto2021-07-15T00:00:00Z
2021-08-01T00:00:11.51Zmachineryg1oil_temp40.3auto2021-07-15T00:00:00Z
2021-08-01T00:00:19.53Zmachineryg1oil_temp40.6auto2021-07-15T00:00:00Z
2021-08-01T00:00:25.1Zmachineryg1oil_temp40.72auto2021-07-15T00:00:00Z
2021-08-01T00:00:36.88Zmachineryg1oil_temp40.8auto2021-07-15T00:00:00Z
_time_measurementstationID_field_valueopTypemaintained
2021-08-01T00:00:00Zmachineryg2oil_temp40.6manned2021-07-02T00:00:00Z
2021-08-01T00:00:27.93Zmachineryg2oil_temp40.6manned2021-07-02T00:00:00Z
2021-08-01T00:00:54.96Zmachineryg2oil_temp40.6manned2021-07-02T00:00:00Z
2021-08-01T00:01:17.27Zmachineryg2oil_temp40.6manned2021-07-02T00:00:00Z
2021-08-01T00:01:41.84Zmachineryg2oil_temp40.6manned2021-07-02T00:00:00Z

Things to note about the join output

  • Because the right stream does not have a row with the g3 station tag, the joined output drops all rows with the g3 stationID tag from the left stream. join.inner() drops any rows that do not have a matching row in the other data stream.

{{% /expand %}} {{< /expand-wrapper >}}