Nullable Boolean data type

doc/source/user_guide/boolean.rst

.. currentmodule:: pandas

.. ipython:: python
   :suppress:

   import pandas as pd
   import numpy as np

.. _boolean:


**************************
Nullable Boolean data type
**************************


.. note::

   BooleanArray is currently experimental. Its API or implementation may change
   without warning.

.. _boolean.indexing:

Indexing with NA values
-----------------------

pandas allows indexing with a boolean array that contains ``NA`` values. The ``NA`` entries are treated as ``False``.

.. ipython:: python
   :okexcept:

   s = pd.Series([1, 2, 3])
   mask = pd.array([True, False, pd.NA], dtype="boolean")
   s[mask]

If you would prefer to keep the ``NA`` values, you can fill them manually with ``fillna(True)``.

.. ipython:: python

   s[mask.fillna(True)]
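In practice, masks containing ``NA`` often come from comparisons on nullable columns rather than being built by hand. The following is a minimal sketch of that situation. The ``Int64`` series and the threshold are illustrative assumptions, not part of the example above.

```python
import pandas as pd

# Hypothetical data: a nullable integer series with one missing entry.
values = pd.Series([1, pd.NA, 3], dtype="Int64")

# Comparing a nullable series propagates NA into the resulting boolean mask.
mask = values > 1

# NA in the mask is treated as False, so the row with the missing value is dropped.
dropped = values[mask]

# Filling NA with True keeps the rows whose comparison result is unknown.
kept = values[mask.fillna(True)]

print(dropped)
print(kept)
```

Whether unknown rows should be kept or dropped depends on the analysis, so the choice between ``mask`` and ``mask.fillna(True)`` is best made explicitly.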

If you create a column of ``NA`` values (for example, to fill them in later) with ``df['new_col'] = pd.NA``, the new column gets ``object`` dtype, and operations on it will be slower than on a column with an appropriate dtype. It is better to use ``df['new_col'] = pd.Series(pd.NA, dtype="boolean")`` (or another dtype that supports ``NA``).

.. ipython:: python

   df = pd.DataFrame()
   df['objects'] = pd.NA
   df.dtypes
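For comparison, the sketch below creates both kinds of placeholder column on a frame that already has rows and checks the resulting dtypes. The frame, the column names, and the use of ``pd.array`` to build a full-length ``NA`` column are illustrative assumptions.

```python
import pandas as pd

# Hypothetical frame with three rows.
df = pd.DataFrame({"id": [1, 2, 3]})

# Assigning the bare scalar produces an object-dtype column.
df["as_object"] = pd.NA

# Building the placeholder with an explicit nullable dtype keeps it typed.
df["as_boolean"] = pd.array([pd.NA] * len(df), dtype="boolean")

print(df.dtypes)

# The typed column can be filled in later without leaving the boolean dtype.
df.loc[0, "as_boolean"] = True
print(df["as_boolean"])
```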

.. _boolean.kleene:

Kleene logical operations
-------------------------

:class:`arrays.BooleanArray` implements `Kleene Logic`_ (sometimes called three-value logic) for logical operations like ``&`` (and), ``|`` (or) and ``^`` (exclusive-or).

This table demonstrates the results for every combination. These operations are symmetrical, so flipping the left- and right-hand side makes no difference in the result.

================= =========
Expression        Result
================= =========
``True & True``   ``True``
``True & False``  ``False``
``True & NA``     ``NA``
``False & False`` ``False``
``False & NA``    ``False``
``NA & NA``       ``NA``
``True | True``   ``True``
``True | False``  ``True``
``True | NA``     ``True``
``False | False`` ``False``
``False | NA``    ``NA``
``NA | NA``       ``NA``
``True ^ True``   ``False``
``True ^ False``  ``True``
``True ^ NA``     ``NA``
``False ^ False`` ``False``
``False ^ NA``    ``NA``
``NA ^ NA``       ``NA``
================= =========
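The table can be reproduced directly with two ``"boolean"`` arrays. This is a small sketch. The variable names and the choice to collect the results in a ``DataFrame`` are illustrative.

```python
import pandas as pd

# Every unordered pair of True, False and NA, in the same order as the table rows.
left = pd.array([True, True, True, False, False, pd.NA], dtype="boolean")
right = pd.array([True, False, pd.NA, False, pd.NA, pd.NA], dtype="boolean")

table = pd.DataFrame(
    {
        "left": left,
        "right": right,
        "left & right": left & right,
        "left | right": left | right,
        "left ^ right": left ^ right,
    }
)
print(table)

# Swapping the operands leaves every result unchanged.
symmetric = all(
    pd.Series(a).equals(pd.Series(b))
    for a, b in [
        (left & right, right & left),
        (left | right, right | left),
        (left ^ right, right ^ left),
    ]
)
print(symmetric)
```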

When an ``NA`` is present in an operation, the output value is ``NA`` only if the result cannot be determined solely based on the other input. For example, ``True | NA`` is ``True``, because both ``True | True`` and ``True | False`` are ``True``. In that case, we don't actually need to consider the value of the ``NA``.

On the other hand, ``True & NA`` is ``NA``. The result depends on whether the ``NA`` really is ``True`` or ``False``, since ``True & True`` is ``True``, but ``True & False`` is ``False``, so we can't determine the output.
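The reasoning above can be written out as a brute-force check: replace each ``NA`` with both possible values and see whether the outcome is the same either way. The helper ``kleene`` below is a hypothetical illustration of that idea, not how pandas implements these operations. It is compared against the results of the ``pd.NA`` scalar.

```python
import operator
from itertools import product

import pandas as pd


def kleene(op, a, b):
    """Illustrative helper: try both truth values for each NA operand."""
    choices_a = [True, False] if a is pd.NA else [a]
    choices_b = [True, False] if b is pd.NA else [b]
    outcomes = {op(x, y) for x, y in product(choices_a, choices_b)}
    # A single possible outcome means the NA did not matter.
    return outcomes.pop() if len(outcomes) == 1 else pd.NA


def same(x, y):
    """Compare two results without ever calling bool() on pd.NA."""
    if x is pd.NA or y is pd.NA:
        return x is y
    return x == y


mismatches = []
for op in (operator.and_, operator.or_, operator.xor):
    for a, b in product([True, False, pd.NA], repeat=2):
        if not same(kleene(op, a, b), op(a, b)):
            mismatches.append((op.__name__, a, b))

print(mismatches)
```

An empty list means the enumeration agrees with the ``pd.NA`` scalar for all nine operand pairs of each operator.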

This differs from how ``np.nan`` behaves in logical operations: pandas treats ``np.nan`` as *always false in the output*.

In ``or``:

.. ipython:: python

   pd.Series([True, False, np.nan], dtype="object") | True
   pd.Series([True, False, np.nan], dtype="boolean") | True

In ``and``:

.. ipython:: python

   pd.Series([True, False, np.nan], dtype="object") & True
   pd.Series([True, False, np.nan], dtype="boolean") & True
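If existing data holds ``np.nan`` in an ``object`` column, one way to opt in to Kleene semantics is to convert the column to the ``"boolean"`` dtype first. This is a sketch under the assumption that the column contains only ``True``, ``False`` and missing values. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical legacy column that stores missing values as np.nan.
legacy = pd.Series([True, False, np.nan], dtype="object")

# Converting to the nullable dtype turns np.nan into pd.NA.
nullable = legacy.astype("boolean")

# Side-by-side results of the same operations on both dtypes.
comparison = pd.DataFrame(
    {
        "object | True": legacy | True,
        "boolean | True": nullable | True,
        "object & True": legacy & True,
        "boolean & True": nullable & True,
    }
)
print(comparison)
```

Only the last row differs between the two dtypes, which is where the missing value sits.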

.. _Kleene Logic: https://en.wikipedia.org/wiki/Three-valued_logic#Kleene_and_Priest_logics