Back to Pathway

Table Operations Overview

docs/2.developers/4.user-guide/30.data-transformation/10.table-operations.md

0.30.120.2 KB
Original Source

Table Operations: An Overview

This section describes the basic transformations you can do with Pathway.

Assignment and renaming

You can create a column in a table using the select and assignment (=) operators:

  • t.select(new_col=t.colA + t.colB)
  • t.select(new_col="default value")

To rename a column, you can do the same (use select) or use rename:

  • t.select(new_col=t.old_col)
  • t.rename(new_col=t.old_col)

Selection and indexing

<table class="w-full"> <thead> <tr> <th class="text-left">Description</th> <th class="text-left">Operators</th> <th class="text-left">Example</th> </tr> </thead> <tbody> <tr> <th class="text-left">Select a column</th> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/pathway-table#pathway.Table.select">select</a> and dot/bracket notations</span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(t.colA, t['colB'], pw.this.colC)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Select all columns</th> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/pathway-table#pathway.Table.select">select</a> and star notation</span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(*pw.this)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Removing columns</th> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/pathway-table#pathway.Table.without">without</a> and dot/bracket notations</span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.without(t.colA, t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Referring to the current table</th> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/pathway#pathway.this">pw.this</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t1.select(new_col=pw.this.colA + pw.this.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Referring to a table in a join/window</th> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/pathway#pathway.left">pw.left</a> and <a href="/developers/api-docs/pathway#pathway.right">pw.right</a> notations</span> </td> <td class="text-left !align-left"> <ProseCodeInline>t1.join(t2, pw.left.colA == pw.right.colB).reduce(*pw.left, pw.right.colC)</ProseCodeInline> </td> </tr> <tr> <th class="text-left"><a href="/developers/user-guide/data-transformation/indexing-grouped-tables">Reference indexing</a></th> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/pathway-table#pathway.Table.ix_ref">ix_ref</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t_selected_ids.select(selected=t.ix_ref(column).name)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Reindexing</th> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/pathway-table#pathway.Table.with_id_from">with_id_from</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.with_id_from(t_new_ids.new_id_source)</ProseCodeInline> </td> </tr> </tbody> </table>

Arithmetic operators

<table class="w-full"> <thead> <tr> <th class="text-left">Description</th> <th class="text-left">Operators</th> <th class="text-left">Example</th> </tr> </thead> <tbody> <tr> <th class="text-left">Addition</th> <td class="text-left !align-left"> <ProseCodeInline>+</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA + t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Subtraction</th> <td class="text-left !align-left"> <ProseCodeInline>-</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA - t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Multiplication</th> <td class="text-left !align-left"> <ProseCodeInline>*</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA * t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Division</th> <td class="text-left !align-left"> <ProseCodeInline>/</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA / t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Floor division</th> <td class="text-left !align-left"> <ProseCodeInline>//</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA // t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Modulus</th> <td class="text-left !align-left"> <ProseCodeInline>%</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA % t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Exponentiation</th> <td class="text-left !align-left"> <ProseCodeInline>**</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA ** t.colB)</ProseCodeInline> </td> </tr> </tbody> </table>

Comparisons operators

<table class="w-full"> <thead> <tr> <th class="text-left">Description</th> <th class="text-left">Operators</th> <th class="text-left">Example</th> </tr> </thead> <tbody> <tr> <th class="text-left">Equal</th> <td class="text-left !align-left"> <ProseCodeInline>==</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA == t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Not equal</th> <td class="text-left !align-left"> <ProseCodeInline>!=</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA != t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Greater than</th> <td class="text-left !align-left"> <ProseCodeInline>></ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA > t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Less than</th> <td class="text-left !align-left"> <ProseCodeInline><</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA < t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Greater than or equal to</th> <td class="text-left !align-left"> <ProseCodeInline>>=</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA >= t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Less than or equal to</th> <td class="text-left !align-left"> <ProseCodeInline><=</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA <= t.colB)</ProseCodeInline> </td> </tr> </tbody> </table>

Boolean operators

<table class="w-full"> <thead> <tr> <th class="text-left">Description</th> <th class="text-left">Operators</th> <th class="text-left">Example</th> </tr> </thead> <tbody> <tr> <th class="text-left">And</th> <td class="text-left !align-left"> <ProseCodeInline>&</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA & t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Or</th> <td class="text-left !align-left"> <ProseCodeInline>|</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA | t.colB)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Not</th> <td class="text-left !align-left"> <ProseCodeInline>~</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=~t.colA)</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Exclusive or (XOR)</th> <td class="text-left !align-left"> <ProseCodeInline>^</ProseCodeInline> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=t.colA ^ t.colB)</ProseCodeInline> </td> </tr> </tbody> </table>

Filtering

You can filter rows using the filter operator:

  • t.filter(~pw.this.column)
  • t.filter(pw.this.column > value)

Missing data

The pw.coalesce operator returns the first not-None value from the given columns:

  • t.select(new_col=pw.coalesce(t.colA, t.colB))
  • t.select(new_col=pw.coalesce(t.colA, 10)

Aggregation

You can aggregate data across the rows of the table using the groupby and reduce operators:

t.groupby(pw.this.column).reduce(sum=pw.reducers.sum(pw.this.value))

You can read our dedicated tutorial to learn more about it.

Reducers

Pathway provides several reducers to use on the aggregated values:

<table class="w-full"> <thead> <tr> <th class="text-left">Reducer</th> <th class="text-left">Example</th> </tr> </thead> <tbody> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.any">any</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_any=pw.reducers.any(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.argmax">argmax</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_argmax=pw.reducers.argmax(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.argmin">argmin</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_argmin=pw.reducers.argmin(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.avg">avg</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_avg=pw.reducers.avg(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.earliest">earliest</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_min=pw.reducers.earliest(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.latest">latest</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_max=pw.reducers.latest(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.max">max</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_max=pw.reducers.max(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.min">min</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_min=pw.reducers.min(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.ndarray">ndarray</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_array=pw.reducers.ndarray(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.sorted_tuple">sorted_tuple</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_tuple=pw.reducers.sorted_tuple(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.sum">sum</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_sum=pw.reducers.sum(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.tuple">tuple</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_tuple=pw.reducers.tuple(t.colB))</ProseCodeInline> </td> </tr> <tr> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/reducers#pathway.reducers.unique">unique</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.groupby(t.colA).reduce(col_unique=pw.reducers.unique(t.colB))</ProseCodeInline> </td> </tr> </tbody> </table>

You can also create your own stateful reducers.

Joins

You can use a join to combine columns from two different tables by associating rows from both tables which are matching on some given values:

t1.join(t2, pw.left.column == pw.right.column).select(...)

Read our tutorial about joins to learn more about how to do joins in Pathway.

Union and Concatenation

<table class="w-full"> <thead> <tr> <th class="text-left">Description</th> <th class="text-left">Operators</th> <th class="text-left">Example</th> </tr> </thead> <tbody> <tr> <th class="text-left">Union</th> <td class="text-left !align-left"> <span class="block">+ or +=</span> </td> <td class="text-left !align-left"> <ProseCodeInline class="block">t1 + t2</ProseCodeInline> <div class="block"><ProseCodeInline>t1 += t2 </ProseCodeInline> modifies <ProseCodeInline>t1</ProseCodeInline></div> </td> </tr> <tr> <th class="text-left">Concatenation</th> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/pathway-table#pathway.Table.concat_reindex">concat_reindex</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>pw.Table.concat_reindex(t1, t2)</ProseCodeInline> </td> </tr> </tbody> </table>

Updating cell values

You can update the cells of a table using the content of another table using the update_cells operator (<<):

  • t.update_cells(t_new)
  • t << t_new

Flattening a column

You can transform a column containing iterables or JSON arrays into multiple rows using the flatten operator:

t.flatten(t.col_to_flatten)

Column operations

<table class="w-full"> <thead> <tr> <th class="text-left">Description</th> <th class="text-left">Operators</th> <th class="text-left">Example</th> </tr> </thead> <tbody> <tr> <th class="text-left">Applying a function to each cell of a column.</th> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/pathway#pathway.apply">pw.apply</a> in a <a href="/developers/api-docs/pathway-table#pathway.Table.select">select</a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=pw.apply(func, pw.this.col))</ProseCodeInline> </td> </tr> <tr> <th class="text-left">Folding columns into a single one.</th> <td class="text-left !align-left"> <span class="block"><a href="/developers/api-docs/pathway#pathway.make_tuple"><ProseCodeInline>pw.make_tuple</ProseCodeInline></a></span> </td> <td class="text-left !align-left"> <ProseCodeInline>t.select(new_col=pw.make_tuple(t.a, t.b, t.c))</ProseCodeInline> </td> </tr> </tbody> </table>

User-defined functions (UDF)

Pathway allows you to define your own User-defined functions. See our tutorial to learn more about it.