Column selections


Let's create a dataset to use in this section:

{{code_block('user-guide/expressions/column-selections','selectors_df',['DataFrame'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:setup"
--8<-- "python/user-guide/expressions/column-selections.py:selectors_df"

Expression expansion

As we saw in the previous section, we can select specific columns using the pl.col function. It can also select multiple columns at once, both as a convenience and as a way to expand a single expression over several columns.

This convenience is more than syntactic sugar. It allows a powerful application of DRY principles in your code: a single expression that specifies multiple columns expands into a list of expressions (depending on the DataFrame schema), so we can select multiple columns and run the same computation on each of them.

Select all, or all but some

We can select all columns in the DataFrame object by passing the wildcard "*" to pl.col, or equivalently by calling pl.all:

{{code_block('user-guide/expressions/column-selections', 'all',['all'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:all"

Often we want every column except a few. This can be done with exclude:

{{code_block('user-guide/expressions/column-selections','exclude',['exclude'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:exclude"

By multiple strings

Passing multiple column names expands the expression to every named column:

{{code_block('user-guide/expressions/column-selections','expansion_by_names',['dt_to_string'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:expansion_by_names"

By regular expressions

Multiple columns can also be selected with a regular expression. Wrap the pattern in ^ and $ so that pl.col knows a regex selection is expected:

{{code_block('user-guide/expressions/column-selections','expansion_by_regex',[''])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:expansion_by_regex"

By data type

pl.col can select multiple columns using Polars data types:

{{code_block('user-guide/expressions/column-selections','expansion_by_dtype',['n_unique'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:expansion_by_dtype"

Using selectors

Polars also offers selectors: intuitive helpers that select columns by name, dtype, or other properties. They build on the pl.col functionality described above. We recommend importing polars.selectors under the alias cs.

By dtype

To select just the integer and string columns, we can do:

{{code_block('user-guide/expressions/column-selections','selectors_intro',['selectors'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:selectors_intro"

Applying set operations

Selectors also support set-based operations. For instance, to select the numeric columns except the first column, which holds the row numbers:

{{code_block('user-guide/expressions/column-selections','selectors_diff',['cs_first', 'cs_numeric'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:selectors_diff"

We can also select the row number by name and any non-numeric columns:

{{code_block('user-guide/expressions/column-selections','selectors_union',['cs_by_name', 'cs_numeric'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:selectors_union"

By patterns and substrings

Selectors can also match column names by substring or by regex pattern:

{{code_block('user-guide/expressions/column-selections','selectors_by_name',['cs_contains', 'cs_matches'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:selectors_by_name"

Converting to expressions

What if we want to apply a specific operation to the selected columns, that is, treat them as ordinary expressions again? We can convert a selector with as_expr and then proceed as normal:

{{code_block('user-guide/expressions/column-selections','selectors_to_expr',['cs_temporal'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:selectors_to_expr"

Debugging selectors

Polars also provides two helpful utility functions to aid with using selectors: is_selector and selector_column_names:

{{code_block('user-guide/expressions/column-selections','selectors_is_selector_utility',['is_selector'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:selectors_is_selector_utility"

To determine in advance which column names a selector will pick, which is especially useful for a LazyFrame object, use selector_column_names:

{{code_block('user-guide/expressions/column-selections','selectors_colnames_utility',['selector_column_names'])}}

python
--8<-- "python/user-guide/expressions/column-selections.py:selectors_colnames_utility"