Back to Influxdb

join.tables() function

content/flux/v0/stdlib/join/tables.md

latest9.1 KB
Original Source
<!------------------------------------------------------------------------------ IMPORTANT: This page was generated from comments in the Flux source code. Any edits made directly to this page will be overwritten the next time the documentation is generated. To make updates to this documentation, update the function comments above the function definition in the Flux source code: https://github.com/influxdata/flux/blob/master/stdlib/join/join.flux#L216-L226 Contributing to Flux: https://github.com/influxdata/flux#contributing Fluxdoc syntax: https://github.com/influxdata/flux/blob/master/docs/fluxdoc.md ------------------------------------------------------------------------------->

join.tables() joins two input streams together using a specified method, predicate, and a function to join two corresponding records, one from each input stream.

join.tables() only compares records with the same group key. Output tables have the same grouping as the input tables.

Function type signature
js
(
    <-left: stream[A],
    as: (l: A, r: B) => C,
    method: string,
    on: (l: A, r: B) => bool,
    right: stream[B],
) => stream[C] where A: Record, B: Record, C: Record

{{% caption %}} For more information, see Function type signatures. {{% /caption %}}

Parameters

left

Left input stream. Default is piped-forward data (<-).

({{< req >}}) Right input stream.

on

({{< req >}}) Function that takes a left and right record (l, and r respectively), and returns a boolean.

The body of the function must be a single boolean expression, consisting of one or more equality comparisons between a property of l and a property of r, each chained together by the and operator.

as

({{< req >}}) Function that takes a left and a right record (l and r respectively), and returns a record. The returned record is included in the final output.

method

({{< req >}}) String that specifies the join method.

Supported methods:

  • inner
  • left
  • right
  • full

Examples

Perform an inner join

js
import "sampledata"
import "join"

ints = sampledata.int()
strings = sampledata.string()

join.tables(
    method: "inner",
    left: ints,
    right: strings,
    on: (l, r) => l._time == r._time,
    as: (l, r) => ({l with label: r._value}),
)

{{< expand-wrapper >}} {{% expand "View example output" %}}

Output data

_time_valuelabel*tag
2021-01-01T00:00:00Z-2smpl_g9qczst1
2021-01-01T00:00:10Z10smpl_0mgv9nt1
2021-01-01T00:00:20Z7smpl_phw664t1
2021-01-01T00:00:30Z17smpl_guvzy4t1
2021-01-01T00:00:40Z15smpl_5v3ccet1
2021-01-01T00:00:50Z4smpl_s9fmgyt1
_time_valuelabel*tag
2021-01-01T00:00:00Z19smpl_b5eidat2
2021-01-01T00:00:10Z4smpl_eu4oxpt2
2021-01-01T00:00:20Z-3smpl_5g7tz4t2
2021-01-01T00:00:30Z19smpl_sox1utt2
2021-01-01T00:00:40Z13smpl_wfm757t2
2021-01-01T00:00:50Z1smpl_dtn2bvt2

{{% /expand %}} {{< /expand-wrapper >}}

Perform a left outer join

If the join method is anything other than inner, pay special attention to how the output record is constructed in the as function.

Because of how flux handles outer joins, it's possible for either l or r to be a default record. This means any value in a non-group-key column could be null.

For more information about the behavior of outer joins, see the Outer joins section in the join package documentation.

In the case of a left outer join, l is guaranteed to not be a default record. To ensure that the output record has non-null values for any columns that aren't part of the group key, use values from l. Using a non-group-key value from r risks that value being null.

The example below constructs the output record almost entirely from properties of l. The only exception is the v_right column which gets its value from r._value. In this case, understand and expect that v_right will sometimes be null.

js
import "array"
import "join"

left =
    array.from(
        rows: [
            {_time: 2022-01-01T00:00:00Z, _value: 1, label: "a"},
            {_time: 2022-01-01T00:00:00Z, _value: 2, label: "b"},
            {_time: 2022-01-01T00:00:00Z, _value: 3, label: "d"},
        ],
    )
right =
    array.from(
        rows: [
            {_time: 2022-01-01T00:00:00Z, _value: 0.4, id: "a"},
            {_time: 2022-01-01T00:00:00Z, _value: 0.5, id: "c"},
            {_time: 2022-01-01T00:00:00Z, _value: 0.6, id: "d"},
        ],
    )

join.tables(
    method: "left",
    left: left,
    right: right,
    on: (l, r) => l.label == r.id and l._time == r._time,
    as: (l, r) => ({_time: l._time, label: l.label, v_left: l._value, v_right: r._value}),
)

{{< expand-wrapper >}} {{% expand "View example output" %}}

Output data

_timelabelv_leftv_right
2022-01-01T00:00:00Za10.4
2022-01-01T00:00:00Zb2
2022-01-01T00:00:00Zd30.6

{{% /expand %}} {{< /expand-wrapper >}}

Perform a right outer join

The next example is nearly identical to the previous example, but uses the right join method. With this method, r is guaranteed to not be a default record, but l may be a default record. Because l is more likely to contain null values, the output record is built almost entirely from properties of r, with the exception of v_left, which we expect to sometimes be null.

js
import "array"
import "join"

left =
    array.from(
        rows: [
            {_time: 2022-01-01T00:00:00Z, _value: 1, label: "a"},
            {_time: 2022-01-01T00:00:00Z, _value: 2, label: "b"},
            {_time: 2022-01-01T00:00:00Z, _value: 3, label: "d"},
        ],
    )
right =
    array.from(
        rows: [
            {_time: 2022-01-01T00:00:00Z, _value: 0.4, id: "a"},
            {_time: 2022-01-01T00:00:00Z, _value: 0.5, id: "c"},
            {_time: 2022-01-01T00:00:00Z, _value: 0.6, id: "d"},
        ],
    )

join.tables(
    method: "right",
    left: left,
    right: right,
    on: (l, r) => l.label == r.id and l._time == r._time,
    as: (l, r) => ({_time: r._time, label: r.id, v_left: l._value, v_right: r._value}),
)

{{< expand-wrapper >}} {{% expand "View example output" %}}

Output data

_timelabelv_leftv_right
2022-01-01T00:00:00Za10.4
2022-01-01T00:00:00Zc0.5
2022-01-01T00:00:00Zd30.6

{{% /expand %}} {{< /expand-wrapper >}}

Perform a full outer join

In a full outer join, there are no guarantees about l or r. Either one of them could be a default record, but they will never both be a default record at the same time.

To get non-null values for the output record, check both l and r to see which contains the desired values.

The example below defines a function for the as parameter that appropriately handles the uncertainty of a full outer join.

v_left and v_right still use values from l and r directly, because we expect them to sometimes be null in the output table.

js
import "array"
import "join"

left =
    array.from(
        rows: [
            {_time: 2022-01-01T00:00:00Z, _value: 1, label: "a"},
            {_time: 2022-01-01T00:00:00Z, _value: 2, label: "b"},
            {_time: 2022-01-01T00:00:00Z, _value: 3, label: "d"},
        ],
    )
right =
    array.from(
        rows: [
            {_time: 2022-01-01T00:00:00Z, _value: 0.4, id: "a"},
            {_time: 2022-01-01T00:00:00Z, _value: 0.5, id: "c"},
            {_time: 2022-01-01T00:00:00Z, _value: 0.6, id: "d"},
        ],
    )

join.tables(
    method: "full",
    left: left,
    right: right,
    on: (l, r) => l.label == r.id and l._time == r._time,
    as: (l, r) => {
        time = if exists l._time then l._time else r._time
        label = if exists l.label then l.label else r.id

        return {_time: time, label: label, v_left: l._value, v_right: r._value}
    },
)

{{< expand-wrapper >}} {{% expand "View example output" %}}

Output data

_timelabelv_leftv_right
2022-01-01T00:00:00Za10.4
2022-01-01T00:00:00Zb2
2022-01-01T00:00:00Zc0.5
2022-01-01T00:00:00Zd30.6

{{% /expand %}} {{< /expand-wrapper >}}