There are several ways to concatenate data from separate DataFrames: vertically, horizontally, or diagonally.
In a vertical concatenation you combine all of the rows from a list of DataFrames into a single
longer DataFrame.
{{code_block('user-guide/transformations/concatenation','vertical',['concat'])}}
--8<-- "python/user-guide/transformations/concatenation.py:setup"
--8<-- "python/user-guide/transformations/concatenation.py:vertical"
Vertical concatenation fails when the dataframes do not have the same column names.
In a horizontal concatenation you combine all of the columns from a list of DataFrames into a
single wider DataFrame.
{{code_block('user-guide/transformations/concatenation','horizontal',['concat'])}}
--8<-- "python/user-guide/transformations/concatenation.py:horizontal"
Horizontal concatenation fails when dataframes have overlapping columns.
When dataframes have different numbers of rows, columns will be padded with null values at the end
up to the maximum length.
{{code_block('user-guide/transformations/concatenation','horizontal_different_lengths',['concat'])}}
--8<-- "python/user-guide/transformations/concatenation.py:horizontal_different_lengths"
In a diagonal concatenation you combine all of the rows and columns from a list of DataFrames into
a single longer and/or wider DataFrame.
{{code_block('user-guide/transformations/concatenation','cross',['concat'])}}
--8<-- "python/user-guide/transformations/concatenation.py:cross"
Diagonal concatenation generates nulls when the column names do not overlap.
When the dataframe shapes do not match and we have an overlapping semantic key, we can join the dataframes instead of concatenating them.
Before a concatenation we have two dataframes `df1` and `df2`. Each column in `df1` and `df2` is stored in
one or more chunks in memory. By default, the chunks in each column are not made contiguous during
concatenation. This makes the concat operation faster and reduces its memory use, but it may slow down
future operations that would benefit from having the data in contiguous memory. The process of
copying the fragmented chunks into a single new chunk is known as rechunking. Rechunking is an
expensive operation. Prior to version 0.20.26, the default was to perform a rechunk, but in newer
versions the default is not to. If you do want Polars to rechunk the concatenated DataFrame, you
specify `rechunk=True` when doing the concatenation.