
.. _whatsnew_0200:

Version 0.20.1 (May 5, 2017)
----------------------------

{{ header }}

This is a major release from 0.19.2 and includes a number of API changes, deprecations, new features, enhancements, and performance improvements along with a large number of bug fixes. We recommend that all users upgrade to this version.

Highlights include:

- New ``.agg()`` API for Series/DataFrame similar to the groupby-rolling-resample APIs, see :ref:`here <whatsnew_0200.enhancements.agg>`
- Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`
- The ``.ix`` indexer has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_ix>`
- ``Panel`` has been deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_panel>`
- Addition of an ``IntervalIndex`` and ``Interval`` scalar type, see :ref:`here <whatsnew_0200.enhancements.intervalindex>`
- Improved user API when grouping by index levels in ``.groupby()``, see :ref:`here <whatsnew_0200.enhancements.groupby_access>`
- Improved support for ``UInt64`` dtypes, see :ref:`here <whatsnew_0200.enhancements.uint64_support>`
- A new orient for JSON serialization, ``orient='table'``, that uses the Table Schema spec and that gives the possibility for a more interactive repr in the Jupyter Notebook, see :ref:`here <whatsnew_0200.enhancements.table_schema>`
- Experimental support for exporting styled DataFrames (``DataFrame.style``) to Excel, see :ref:`here <whatsnew_0200.enhancements.style_excel>`
- Window binary corr/cov operations now return a MultiIndexed ``DataFrame`` rather than a ``Panel``, as ``Panel`` is now deprecated, see :ref:`here <whatsnew_0200.api_breaking.rolling_pairwise>`
- Support for S3 handling now uses ``s3fs``, see :ref:`here <whatsnew_0200.api_breaking.s3>`
- Google BigQuery support now uses the ``pandas-gbq`` library, see :ref:`here <whatsnew_0200.api_breaking.gbq>`

.. warning::

   pandas has changed the internal structure and layout of the code base.
   This can affect imports that are not from the top-level ``pandas.*`` namespace, please see the changes :ref:`here <whatsnew_0200.privacy>`.

Check the :ref:`API Changes <whatsnew_0200.api_breaking>` and :ref:`deprecations <whatsnew_0200.deprecations>` before updating.

.. note::

   This is a combined release for 0.20.0 and 0.20.1.
   Version 0.20.1 contains one additional change for backwards-compatibility with downstream projects using pandas' ``utils`` routines. (:issue:`16250`)

.. contents:: What's new in v0.20.0
    :local:
    :backlinks: none

.. _whatsnew_0200.enhancements:

New features
~~~~~~~~~~~~


.. _whatsnew_0200.enhancements.agg:

Method ``agg`` API for DataFrame/Series
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Series & DataFrame have been enhanced to support the aggregation API. This is a familiar API
from groupby, window operations, and resampling. This allows aggregation operations in a concise way
by using :meth:`~DataFrame.agg` and :meth:`~DataFrame.transform`. The full documentation
is :ref:`here <basics.aggregate>` (:issue:`1623`).

Here is a sample

.. ipython:: python

   df = pd.DataFrame(np.random.randn(10, 3), columns=['A', 'B', 'C'],
                     index=pd.date_range('1/1/2000', periods=10))
   df.iloc[3:7] = np.nan
   df

One can operate using string function names, callables, lists, or dictionaries of these.

Using a single function is equivalent to ``.apply``.

.. ipython:: python

   df.agg('sum')

Multiple aggregations with a list of functions.

.. ipython:: python

   df.agg(['sum', 'min'])

Using a dict provides the ability to apply specific aggregations per column.
You will get a matrix-like output of all of the aggregators. The output will consist of all
unique functions. Those that are not applied to a particular column will be ``NaN``:

.. ipython:: python

   df.agg({'A': ['sum', 'min'], 'B': ['min', 'max']})

The API also supports a ``.transform()`` function for broadcasting results.

.. ipython:: python
   :okwarning:

   df.transform(['abs', lambda x: x - x.min()])

When presented with mixed dtypes that cannot be aggregated, ``.agg()`` will only take the valid
aggregations. This is similar to how groupby ``.agg()`` works. (:issue:`15015`)

.. ipython:: python

   df = pd.DataFrame({'A': [1, 2, 3],
                      'B': [1., 2., 3.],
                      'C': ['foo', 'bar', 'baz'],
                      'D': pd.date_range('20130101', periods=3)})
   df.dtypes

.. code-block:: python

   In [10]: df.agg(['min', 'sum'])
   Out[10]:
        A    B          C          D
   min  1  1.0        bar 2013-01-01
   sum  6  6.0  foobarbaz        NaT

.. _whatsnew_0200.enhancements.dataio_dtype:

Keyword argument ``dtype`` for data IO
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``'python'`` engine for :func:`read_csv`, as well as the :func:`read_fwf` function for parsing
fixed-width text files and :func:`read_excel` for parsing Excel files, now accept the ``dtype`` keyword argument for specifying the types of specific columns (:issue:`14295`). See the :ref:`io docs <io.dtypes>` for more information.

.. ipython:: python
   :suppress:

   from io import StringIO

.. ipython:: python

   data = "a  b\n1  2\n3  4"
   pd.read_fwf(StringIO(data)).dtypes
   pd.read_fwf(StringIO(data), dtype={'a': 'float64', 'b': 'object'}).dtypes

.. _whatsnew_0200.enhancements.datetime_origin:

Method ``.to_datetime()`` has gained an ``origin`` parameter
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`to_datetime` has gained a new parameter, ``origin``, to define a reference date
from where to compute the resulting timestamps when parsing numerical values with a specific ``unit`` specified. (:issue:`11276`, :issue:`11745`)

For example, with 1960-01-01 as the starting date:

.. ipython:: python

   pd.to_datetime([1, 2, 3], unit='D', origin=pd.Timestamp('1960-01-01'))

The default is set at ``origin='unix'``, which defaults to ``1970-01-01 00:00:00``, which is
commonly called 'unix epoch' or POSIX time. This was the previous default, so this is a backward compatible change.

.. ipython:: python

   pd.to_datetime([1, 2, 3], unit='D')


.. _whatsnew_0200.enhancements.groupby_access:

GroupBy enhancements
^^^^^^^^^^^^^^^^^^^^

Strings passed to ``DataFrame.groupby()`` as the ``by`` parameter may now reference either column names or index level names. Previously, only column names could be referenced. This allows you to easily group by a column and an index level at the same time. (:issue:`5677`)

.. ipython:: python

   arrays = [['bar', 'bar', 'baz', 'baz', 'foo', 'foo', 'qux', 'qux'],
             ['one', 'two', 'one', 'two', 'one', 'two', 'one', 'two']]

   index = pd.MultiIndex.from_arrays(arrays, names=['first', 'second'])

   df = pd.DataFrame({'A': [1, 1, 1, 1, 2, 2, 3, 3],
                      'B': np.arange(8)},
                     index=index)
   df

   df.groupby(['second', 'A']).sum()


.. _whatsnew_0200.enhancements.compressed_urls:

Better support for compressed URLs in ``read_csv``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The compression code was refactored (:issue:`12688`). As a result, reading
dataframes from URLs in :func:`read_csv` or :func:`read_table` now supports
additional compression methods: ``xz``, ``bz2``, and ``zip`` (:issue:`14570`).
Previously, only ``gzip`` compression was supported. By default, compression of
URLs and paths is now inferred from their file extensions. Additionally,
support for bz2 compression in the Python 2 C-engine was improved (:issue:`14874`).

.. ipython:: python

   url = ('https://github.com/{repo}/raw/{branch}/{path}'
          .format(repo='pandas-dev/pandas',
                  branch='main',
                  path='pandas/tests/io/parser/data/salaries.csv.bz2'))
   # default, infer compression
   df = pd.read_csv(url, sep='\t', compression='infer')
   # explicitly specify compression
   df = pd.read_csv(url, sep='\t', compression='bz2')
   df.head(2)

.. _whatsnew_0200.enhancements.pickle_compression:

Pickle file IO now supports compression
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:func:`read_pickle`, :meth:`DataFrame.to_pickle` and :meth:`Series.to_pickle`
can now read from and write to compressed pickle files. Compression methods
can be an explicit parameter or be inferred from the file extension.
See :ref:`the docs here. <io.pickle.compression>`

.. ipython:: python

   df = pd.DataFrame({'A': np.random.randn(1000),
                      'B': 'foo',
                      'C': pd.date_range('20130101', periods=1000, freq='s')})

Using an explicit compression type

.. ipython:: python

   df.to_pickle("data.pkl.compress", compression="gzip")
   rt = pd.read_pickle("data.pkl.compress", compression="gzip")
   rt.head()

The default is to infer the compression type from the extension (``compression='infer'``):

.. ipython:: python

   df.to_pickle("data.pkl.gz")
   rt = pd.read_pickle("data.pkl.gz")
   rt.head()
   df["A"].to_pickle("s1.pkl.bz2")
   rt = pd.read_pickle("s1.pkl.bz2")
   rt.head()

.. ipython:: python
   :suppress:

   import os
   os.remove("data.pkl.compress")
   os.remove("data.pkl.gz")
   os.remove("s1.pkl.bz2")

.. _whatsnew_0200.enhancements.uint64_support:

UInt64 support improved
^^^^^^^^^^^^^^^^^^^^^^^

pandas has significantly improved support for operations involving unsigned,
or purely non-negative, integers. Previously, handling these integers would
result in improper rounding or data-type casting, leading to incorrect results.
Notably, a new numerical index, ``UInt64Index``, has been created (:issue:`14937`)

.. code-block:: ipython

   In [1]: idx = pd.UInt64Index([1, 2, 3])
   In [2]: df = pd.DataFrame({'A': ['a', 'b', 'c']}, index=idx)
   In [3]: df.index
   Out[3]: UInt64Index([1, 2, 3], dtype='uint64')

- Bug in converting object elements of array-like objects to unsigned 64-bit integers (:issue:`4471`, :issue:`14982`)
- Bug in ``Series.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14721`)
- Bug in ``DataFrame`` construction in which unsigned 64-bit integer elements were being converted to objects (:issue:`14881`)
- Bug in ``pd.read_csv()`` in which unsigned 64-bit integer elements were being improperly converted to the wrong data types (:issue:`14983`)
- Bug in ``pd.unique()`` in which unsigned 64-bit integers were causing overflow (:issue:`14915`)
- Bug in ``pd.value_counts()`` in which unsigned 64-bit integers were being erroneously truncated in the output (:issue:`14934`)

.. _whatsnew_0200.enhancements.groupy_categorical:

GroupBy on categoricals
^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, ``.groupby(..., sort=False)`` would fail with a ``ValueError`` when grouping on a categorical series with some categories not appearing in the data. (:issue:`13179`)

.. ipython:: python

   chromosomes = np.r_[np.arange(1, 23).astype(str), ['X', 'Y']]
   df = pd.DataFrame({
       'A': np.random.randint(100),
       'B': np.random.randint(100),
       'C': np.random.randint(100),
       'chromosomes': pd.Categorical(np.random.choice(chromosomes, 100),
                                     categories=chromosomes,
                                     ordered=True)})
   df

**Previous behavior**:

.. code-block:: ipython

   In [3]: df[df.chromosomes != '1'].groupby('chromosomes', observed=False, sort=False).sum()
   ---------------------------------------------------------------------------
   ValueError: items in new_categories are not the same as in old categories

**New behavior**:

.. ipython:: python

   df[df.chromosomes != '1'].groupby('chromosomes', observed=False, sort=False).sum()

.. _whatsnew_0200.enhancements.table_schema:

Table schema output
^^^^^^^^^^^^^^^^^^^

The new orient ``'table'`` for :meth:`DataFrame.to_json`
will generate a `Table Schema`_ compatible string representation of
the data.

.. code-block:: ipython

   In [38]: df = pd.DataFrame(
      ....: {'A': [1, 2, 3],
      ....:  'B': ['a', 'b', 'c'],
      ....:  'C': pd.date_range('2016-01-01', freq='d', periods=3)},
      ....: index=pd.Index(range(3), name='idx'))
   In [39]: df
   Out[39]:
        A  B          C
   idx
   0    1  a 2016-01-01
   1    2  b 2016-01-02
   2    3  c 2016-01-03

   [3 rows x 3 columns]

   In [40]: df.to_json(orient='table')
   Out[40]:
   '{"schema":{"fields":[{"name":"idx","type":"integer"},{"name":"A","type":"integer"},{"name":"B","type":"string"},{"name":"C","type":"datetime"}],"primaryKey":["idx"],"pandas_version":"1.4.0"},"data":[{"idx":0,"A":1,"B":"a","C":"2016-01-01T00:00:00.000"},{"idx":1,"A":2,"B":"b","C":"2016-01-02T00:00:00.000"},{"idx":2,"A":3,"B":"c","C":"2016-01-03T00:00:00.000"}]}'


See :ref:`IO: Table Schema for more information <io.table_schema>`.

Additionally, the repr for ``DataFrame`` and ``Series`` can now publish
this JSON Table schema representation of the Series or DataFrame if you are
using IPython (or another frontend like `nteract`_ using the Jupyter messaging
protocol).
This gives frontends like the Jupyter notebook and `nteract`_
more flexibility in how they display pandas objects, since they have
more information about the data.
You must enable this by setting the ``display.html.table_schema`` option to ``True``.
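
For example, enabling the option in an interactive session is a one-liner (a minimal sketch; the option is off by default):

.. code-block:: python

   import pandas as pd

   # Opt in to publishing the Table Schema repr to Jupyter/nteract frontends
   pd.set_option('display.html.table_schema', True)

   df = pd.DataFrame({'A': [1, 2, 3]})
   df  # frontends that understand the schema can now render this more richly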

.. _Table Schema: http://specs.frictionlessdata.io/json-table-schema/
.. _nteract: https://nteract.io/

.. _whatsnew_0200.enhancements.scipy_sparse:

SciPy sparse matrix from/to SparseDataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pandas now supports creating sparse dataframes directly from ``scipy.sparse.spmatrix`` instances.
See the :ref:`documentation <sparse.scipysparse>` for more information. (:issue:`4343`)

All sparse formats are supported, but matrices that are not in :mod:`COOrdinate <scipy.sparse>` format will be converted, copying data as needed.

.. code-block:: python

   from scipy.sparse import csr_matrix
   arr = np.random.random(size=(1000, 5))
   arr[arr < .9] = 0
   sp_arr = csr_matrix(arr)
   sp_arr
   sdf = pd.SparseDataFrame(sp_arr)
   sdf

To convert a ``SparseDataFrame`` back to sparse SciPy matrix in COO format, you can use:

.. code-block:: python

   sdf.to_coo()

.. _whatsnew_0200.enhancements.style_excel:

Excel output for styled DataFrames
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Experimental support has been added to export ``DataFrame.style`` formats to Excel using the ``openpyxl`` engine. (:issue:`15530`)

For example, after running the following, ``styled.xlsx`` renders as below:

.. ipython:: python

   np.random.seed(24)
   df = pd.DataFrame({'A': np.linspace(1, 10, 10)})
   df = pd.concat([df, pd.DataFrame(np.random.RandomState(24).randn(10, 4),
                                    columns=list('BCDE'))],
                  axis=1)
   df.iloc[0, 2] = np.nan
   df
   styled = (df.style
             .map(lambda val: 'color:red;' if val < 0 else 'color:black;')
             .highlight_max())
   styled.to_excel('styled.xlsx', engine='openpyxl')

.. image:: ../_static/style-excel.png

.. ipython:: python
   :suppress:

   import os
   os.remove('styled.xlsx')

See the :ref:`Style documentation </user_guide/style.ipynb#Export-to-Excel>` for more detail.

.. _whatsnew_0200.enhancements.intervalindex:

IntervalIndex
^^^^^^^^^^^^^

pandas has gained an ``IntervalIndex`` with its own dtype, ``interval`` as well as the ``Interval`` scalar type. These allow first-class support for interval
notation, specifically as a return type for the categories in :func:`cut` and :func:`qcut`. The ``IntervalIndex`` allows some unique indexing, see the
:ref:`docs <advanced.intervalindex>`. (:issue:`7640`, :issue:`8625`)

.. warning::

   These indexing behaviors of the IntervalIndex are provisional and may change in a future version of pandas. Feedback on usage is welcome.


Previous behavior:

The returned categories were strings, representing Intervals

.. code-block:: ipython

   In [1]: c = pd.cut(range(4), bins=2)

   In [2]: c
   Out[2]:
   [(-0.003, 1.5], (-0.003, 1.5], (1.5, 3], (1.5, 3]]
   Categories (2, object): [(-0.003, 1.5] < (1.5, 3]]

   In [3]: c.categories
   Out[3]: Index(['(-0.003, 1.5]', '(1.5, 3]'], dtype='object')

New behavior:

.. ipython:: python

   c = pd.cut(range(4), bins=2)
   c
   c.categories

Furthermore, this allows one to bin *other* data with these same bins, with ``NaN`` representing a missing
value similar to other dtypes.

.. ipython:: python

   pd.cut([0, 3, 5, 1], bins=c.categories)

An ``IntervalIndex`` can also be used in ``Series`` and ``DataFrame`` as the index.

.. ipython:: python

   df = pd.DataFrame({'A': range(4),
                      'B': pd.cut([0, 3, 1, 1], bins=c.categories)
                      }).set_index('B')
   df

Selecting via a specific interval:

.. ipython:: python

   df.loc[pd.Interval(1.5, 3.0)]

Selecting via a scalar value that is contained *in* the intervals.

.. ipython:: python

   df.loc[0]

.. _whatsnew_0200.enhancements.other:

Other enhancements
^^^^^^^^^^^^^^^^^^

- ``DataFrame.rolling()`` now accepts the parameter ``closed='right'|'left'|'both'|'neither'`` to choose the rolling window-endpoint closedness. See the :ref:`documentation <window.endpoints>` (:issue:`13965`)
- Integration with the ``feather-format``, including a new top-level ``pd.read_feather()`` and ``DataFrame.to_feather()`` method, see :ref:`here <io.feather>`.
- ``Series.str.replace()`` now accepts a callable, as replacement, which is passed to ``re.sub`` (:issue:`15055`); see the sketch after this list
- ``Series.str.replace()`` now accepts a compiled regular expression as a pattern (:issue:`15446`)
- ``Series.sort_index`` accepts parameters ``kind`` and ``na_position`` (:issue:`13589`, :issue:`14444`)
- ``DataFrame`` and ``DataFrame.groupby()``  have gained a ``nunique()`` method to count the distinct values over an axis (:issue:`14336`, :issue:`15197`).
- ``DataFrame`` has gained a ``melt()`` method, equivalent to ``pd.melt()``, for unpivoting from a wide to long format (:issue:`12640`).
- ``pd.read_excel()`` now preserves sheet order when using ``sheetname=None`` (:issue:`9930`)
- Multiple offset aliases with decimal points are now supported (e.g. ``0.5min`` is parsed as ``30s``) (:issue:`8419`)
- ``.isnull()`` and ``.notnull()`` have been added to ``Index`` object to make them more consistent with the ``Series`` API (:issue:`15300`)
- New ``UnsortedIndexError`` (subclass of ``KeyError``) raised when indexing/slicing into an
  unsorted MultiIndex (:issue:`11897`). This allows differentiation between errors due to lack
  of sorting or an incorrect key. See :ref:`here <advanced.unsorted>`
- ``MultiIndex`` has gained a ``.to_frame()`` method to convert to a ``DataFrame`` (:issue:`12397`)
- ``pd.cut`` and ``pd.qcut`` now support datetime64 and timedelta64 dtypes (:issue:`14714`, :issue:`14798`)
- ``pd.qcut`` has gained the ``duplicates='raise'|'drop'`` option to control whether to raise on duplicated edges (:issue:`7751`)
- ``Series`` provides a ``to_excel`` method to output Excel files (:issue:`8825`)
- The ``usecols`` argument in ``pd.read_csv()`` now accepts a callable function as a value  (:issue:`14154`)
- The ``skiprows`` argument in ``pd.read_csv()`` now accepts a callable function as a value  (:issue:`10882`)
- The ``nrows`` and ``chunksize`` arguments in ``pd.read_csv()`` are supported if both are passed (:issue:`6774`, :issue:`15755`)
- ``DataFrame.plot`` now prints a title above each subplot if ``subplots=True`` and ``title`` is a list of strings (:issue:`14753`)
- ``DataFrame.plot`` can pass the matplotlib 2.0 default color cycle as a single string as color parameter, see `here <http://matplotlib.org/2.0.0/users/colors.html#cn-color-selection>`__. (:issue:`15516`)
- ``Series.interpolate()`` now supports timedelta as an index type with ``method='time'`` (:issue:`6424`)
- Addition of a ``level`` keyword to ``DataFrame/Series.rename`` to rename
  labels in the specified level of a MultiIndex (:issue:`4160`).
- ``DataFrame.reset_index()`` will now interpret a tuple ``index.name`` as a key spanning across levels of ``columns``, if this is a ``MultiIndex`` (:issue:`16164`)
- ``Timedelta.isoformat`` method added for formatting Timedeltas as an `ISO 8601 duration`_. See the :ref:`Timedelta docs <timedeltas.isoformat>` (:issue:`15136`)
- ``.select_dtypes()`` now allows the string ``datetimetz`` to generically select datetimes with tz (:issue:`14910`)
- The ``.to_latex()`` method will now accept ``multicolumn`` and ``multirow`` arguments to use the accompanying LaTeX enhancements
- ``pd.merge_asof()`` gained the option ``direction='backward'|'forward'|'nearest'`` (:issue:`14887`)
- ``Series/DataFrame.asfreq()`` have gained a ``fill_value`` parameter, to fill missing values (:issue:`3715`).
- ``Series/DataFrame.resample.asfreq`` have gained a ``fill_value`` parameter, to fill missing values during resampling (:issue:`3715`).
- :func:`pandas.util.hash_pandas_object` has gained the ability to hash a ``MultiIndex`` (:issue:`15224`)
- ``Series/DataFrame.squeeze()`` have gained the ``axis`` parameter. (:issue:`15339`)
- ``DataFrame.to_excel()`` has a new ``freeze_panes`` parameter to turn on Freeze Panes when exporting to Excel (:issue:`15160`)
- ``pd.read_html()`` will parse multiple header rows, creating a MultiIndex header. (:issue:`13434`).
- HTML table output skips ``colspan`` or ``rowspan`` attribute if equal to 1. (:issue:`15403`)
- :class:`pandas.io.formats.style.Styler` template now has blocks for easier extension, see the :ref:`example notebook </user_guide/style.ipynb#Subclassing>` (:issue:`15649`)
- :meth:`Styler.render() <pandas.io.formats.style.Styler.render>` now accepts ``**kwargs`` to allow user-defined variables in the template (:issue:`15649`)
- Compatibility with Jupyter notebook 5.0; MultiIndex column labels are left-aligned and MultiIndex row-labels are top-aligned (:issue:`15379`)
- ``TimedeltaIndex`` now has a custom date-tick formatter specifically designed for nanosecond level precision (:issue:`8711`)
- ``pd.api.types.union_categoricals`` gained the ``ignore_order`` argument to allow ignoring the ordered attribute of unioned categoricals (:issue:`13410`). See the :ref:`categorical union docs <categorical.union>` for more information.
- ``DataFrame.to_latex()`` and ``DataFrame.to_string()`` now allow optional header aliases. (:issue:`15536`)
- Re-enable the ``parse_dates`` keyword of ``pd.read_excel()`` to parse string columns as dates (:issue:`14326`)
- Added ``.empty`` property to subclasses of ``Index``. (:issue:`15270`)
- Enabled floor division for ``Timedelta`` and ``TimedeltaIndex`` (:issue:`15828`)
- ``pandas.io.json.json_normalize()`` gained the option ``errors='ignore'|'raise'``; the default is ``errors='raise'`` which is backward compatible. (:issue:`14583`)
- ``pandas.io.json.json_normalize()`` with an empty ``list`` will return an empty ``DataFrame`` (:issue:`15534`)
- ``pandas.io.json.json_normalize()`` has gained a ``sep`` option that accepts ``str`` to separate joined fields; the default is ".", which is backward compatible. (:issue:`14883`)
- :meth:`MultiIndex.remove_unused_levels` has been added to facilitate :ref:`removing unused levels <advanced.shown_levels>`. (:issue:`15694`)
- ``pd.read_csv()`` will now raise a ``ParserError`` error whenever any parsing error occurs (:issue:`15913`, :issue:`15925`)
- ``pd.read_csv()`` now supports the ``error_bad_lines`` and ``warn_bad_lines`` arguments for the Python parser (:issue:`15925`)
- The ``display.show_dimensions`` option can now also be used to specify
  whether the length of a ``Series`` should be shown in its repr (:issue:`7117`).
- ``parallel_coordinates()`` has gained a ``sort_labels`` keyword argument that sorts class labels and the colors assigned to them (:issue:`15908`)
- Options added to allow one to turn on/off using ``bottleneck`` and ``numexpr``, see :ref:`here <basics.accelerate>` (:issue:`16157`)
- ``DataFrame.style.bar()`` now accepts two more options to further customize the bar chart. Bar alignment is set with ``align='left'|'mid'|'zero'``, the default is "left", which is backward compatible; You can now pass a list of ``color=[color_negative, color_positive]``. (:issue:`14757`)
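
As a minimal sketch of the new ``Series.str.replace()`` capabilities noted above (callable replacement and a pre-compiled pattern), using the 0.20-era call signature in which patterns are treated as regular expressions by default:

.. code-block:: python

   import re

   import pandas as pd

   s = pd.Series(['foo 123', 'bar 45', 'baz'])

   # The replacement can be a callable; it receives each match object,
   # just like re.sub, and returns the replacement string.
   s.str.replace(r'\d+', lambda m: str(int(m.group(0)) * 2))

   # The pattern can also be a pre-compiled regular expression.
   pat = re.compile(r'\s+\d+$')
   s.str.replace(pat, '')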

.. _ISO 8601 duration: https://en.wikipedia.org/wiki/ISO_8601#Durations


.. _whatsnew_0200.api_breaking:

Backwards incompatible API changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. _whatsnew.api_breaking.io_compat:

Possible incompatibility for HDF5 formats created with pandas < 0.13.0
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``pd.TimeSeries`` was deprecated officially in 0.17.0, though it had been an alias since 0.13.0. It has been dropped in favor of ``pd.Series``. (:issue:`15098`)

This *may* cause HDF5 files that were created in prior versions to become unreadable if ``pd.TimeSeries`` was used. This is most likely to be for pandas < 0.13.0. If you find yourself in this situation, you can use a recent prior version of pandas to read in your HDF5 files, then write them out again after applying the procedure below.

.. code-block:: ipython

   In [2]: s = pd.TimeSeries([1, 2, 3], index=pd.date_range('20130101', periods=3))

   In [3]: s
   Out[3]:
   2013-01-01    1
   2013-01-02    2
   2013-01-03    3
   Freq: D, dtype: int64

   In [4]: type(s)
   Out[4]: pandas.core.series.TimeSeries

   In [5]: s = pd.Series(s)

   In [6]: s
   Out[6]:
   2013-01-01    1
   2013-01-02    2
   2013-01-03    3
   Freq: D, dtype: int64

   In [7]: type(s)
   Out[7]: pandas.core.series.Series

.. _whatsnew_0200.api_breaking.index_map:

Map on Index types now return other Index types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``map`` on an ``Index`` now returns an ``Index``, not a numpy array (:issue:`12766`)

.. ipython:: python

   idx = pd.Index([1, 2])
   idx
   mi = pd.MultiIndex.from_tuples([(1, 2), (2, 4)])
   mi

Previous behavior:

.. code-block:: ipython

   In [5]: idx.map(lambda x: x * 2)
   Out[5]: array([2, 4])

   In [6]: idx.map(lambda x: (x, x * 2))
   Out[6]: array([(1, 2), (2, 4)], dtype=object)

   In [7]: mi.map(lambda x: x)
   Out[7]: array([(1, 2), (2, 4)], dtype=object)

   In [8]: mi.map(lambda x: x[0])
   Out[8]: array([1, 2])

New behavior:

.. ipython:: python

   idx.map(lambda x: x * 2)
   idx.map(lambda x: (x, x * 2))

   mi.map(lambda x: x)

   mi.map(lambda x: x[0])

``map`` on a ``Series`` with ``datetime64`` values may return ``int64`` dtypes rather than ``int32``

.. code-block:: ipython

   In [64]: s = pd.Series(pd.date_range('2011-01-02T00:00', '2011-01-02T02:00', freq='H')
      ....:               .tz_localize('Asia/Tokyo'))
      ....:

   In [65]: s
   Out[65]:
   0   2011-01-02 00:00:00+09:00
   1   2011-01-02 01:00:00+09:00
   2   2011-01-02 02:00:00+09:00
   Length: 3, dtype: datetime64[ns, Asia/Tokyo]

Previous behavior:

.. code-block:: ipython

   In [9]: s.map(lambda x: x.hour)
   Out[9]:
   0    0
   1    1
   2    2
   dtype: int32

New behavior:

.. code-block:: ipython

   In [66]: s.map(lambda x: x.hour)
   Out[66]:
   0    0
   1    1
   2    2
   Length: 3, dtype: int64

.. _whatsnew_0200.api_breaking.index_dt_field:

Accessing datetime fields of Index now return Index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The datetime-related attributes (see :ref:`here <timeseries.components>` for an overview) of ``DatetimeIndex``, ``PeriodIndex`` and ``TimedeltaIndex`` previously returned numpy arrays. They will now return a new ``Index`` object, except in the case of a boolean field, where the result will still be a boolean ndarray. (:issue:`15022`)

Previous behavior:

.. code-block:: ipython

   In [1]: idx = pd.date_range("2015-01-01", periods=5, freq='10H')

   In [2]: idx.hour
   Out[2]: array([ 0, 10, 20,  6, 16], dtype=int32)

New behavior:

.. code-block:: ipython

   In [67]: idx = pd.date_range("2015-01-01", periods=5, freq='10H')

   In [68]: idx.hour
   Out[68]: Index([0, 10, 20, 6, 16], dtype='int32')

This has the advantage that specific ``Index`` methods are still available on the result. On the other hand, this might have backward incompatibilities: e.g. compared to numpy arrays, ``Index`` objects are not mutable. To get the original ndarray, you can always convert explicitly using ``np.asarray(idx.hour)``.

.. _whatsnew_0200.api_breaking.unique:

pd.unique will now be consistent with extension types
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In prior versions, using :meth:`Series.unique` and :func:`pandas.unique` on ``Categorical`` and tz-aware data-types would yield different return types. These are now made consistent. (:issue:`15903`)

- Datetime tz-aware

  Previous behavior:

  .. code-block:: ipython

     # Series
     In [5]: pd.Series([pd.Timestamp('20160101', tz='US/Eastern'),
        ...:            pd.Timestamp('20160101', tz='US/Eastern')]).unique()
     Out[5]: array([Timestamp('2016-01-01 00:00:00-0500', tz='US/Eastern')], dtype=object)

     In [6]: pd.unique(pd.Series([pd.Timestamp('20160101', tz='US/Eastern'),
        ...:                      pd.Timestamp('20160101', tz='US/Eastern')]))
     Out[6]: array(['2016-01-01T05:00:00.000000000'], dtype='datetime64[ns]')

     # Index
     In [7]: pd.Index([pd.Timestamp('20160101', tz='US/Eastern'),
        ...:           pd.Timestamp('20160101', tz='US/Eastern')]).unique()
     Out[7]: DatetimeIndex(['2016-01-01 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)

     In [8]: pd.unique([pd.Timestamp('20160101', tz='US/Eastern'),
        ...:            pd.Timestamp('20160101', tz='US/Eastern')])
     Out[8]: array(['2016-01-01T05:00:00.000000000'], dtype='datetime64[ns]')

  New behavior:

  .. ipython:: python

     # Series, returns an array of Timestamp tz-aware
     pd.Series([pd.Timestamp('20160101', tz='US/Eastern'),
                pd.Timestamp('20160101', tz='US/Eastern')]).unique()
     pd.unique(pd.Series([pd.Timestamp('20160101', tz='US/Eastern'),
                          pd.Timestamp('20160101', tz='US/Eastern')]))

     # Index, returns a DatetimeIndex
     pd.Index([pd.Timestamp('20160101', tz='US/Eastern'),
               pd.Timestamp('20160101', tz='US/Eastern')]).unique()
     pd.unique(pd.Index([pd.Timestamp('20160101', tz='US/Eastern'),
                         pd.Timestamp('20160101', tz='US/Eastern')]))

- Categoricals

  Previous behavior:

  .. code-block:: ipython

     In [1]: pd.Series(list('baabc'), dtype='category').unique()
     Out[1]:
     [b, a, c]
     Categories (3, object): [b, a, c]

     In [2]: pd.unique(pd.Series(list('baabc'), dtype='category'))
     Out[2]: array(['b', 'a', 'c'], dtype=object)

  New behavior:

  .. ipython:: python

     # returns a Categorical
     pd.Series(list('baabc'), dtype='category').unique()
     pd.unique(pd.Series(list('baabc'), dtype='category'))

.. _whatsnew_0200.api_breaking.s3:

S3 file handling
^^^^^^^^^^^^^^^^

pandas now uses `s3fs <http://s3fs.readthedocs.io/>`__ for handling S3 connections. This shouldn't break any code. However, since ``s3fs`` is not a required dependency, you will need to install it separately, like ``boto`` in prior versions of pandas. (:issue:`11915`)
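
As a minimal illustration (the bucket and key below are hypothetical placeholders), reading directly from S3 only requires that ``s3fs`` be installed:

.. code-block:: python

   import pandas as pd

   # Requires the optional s3fs dependency; the URL below is a placeholder.
   df = pd.read_csv('s3://my-bucket/path/to/data.csv')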

.. _whatsnew_0200.api_breaking.partial_string_indexing:

Partial string indexing changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:ref:`DatetimeIndex Partial String Indexing <timeseries.partialindexing>` now works as an exact match, provided that string resolution coincides with index resolution, including a case when both are seconds (:issue:`14826`). See :ref:`Slice vs. Exact Match <timeseries.slice_vs_exact_match>` for details.

.. ipython:: python

   df = pd.DataFrame({'a': [1, 2, 3]},
                     pd.DatetimeIndex(['2011-12-31 23:59:59',
                                       '2012-01-01 00:00:00',
                                       '2012-01-01 00:00:01']))

Previous behavior:

.. code-block:: ipython

   In [4]: df['2011-12-31 23:59:59']
   Out[4]:
                        a
   2011-12-31 23:59:59  1

   In [5]: df['a']['2011-12-31 23:59:59']
   Out[5]:
   2011-12-31 23:59:59    1
   Name: a, dtype: int64

New behavior:

.. code-block:: ipython

   In [4]: df['2011-12-31 23:59:59']
   KeyError: '2011-12-31 23:59:59'

   In [5]: df['a']['2011-12-31 23:59:59']
   Out[5]: 1

.. _whatsnew_0200.api_breaking.concat_dtypes:

Concat of different float dtypes will not automatically upcast
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Previously, concat of multiple objects with different float dtypes would automatically upcast results to a dtype of ``float64``. Now the smallest acceptable dtype will be used (:issue:`13247`)

.. ipython:: python

   df1 = pd.DataFrame(np.array([1.0], dtype=np.float32, ndmin=2))
   df1.dtypes

   df2 = pd.DataFrame(np.array([np.nan], dtype=np.float32, ndmin=2))
   df2.dtypes

Previous behavior:

.. code-block:: ipython

   In [7]: pd.concat([df1, df2]).dtypes
   Out[7]:
   0    float64
   dtype: object

New behavior:

.. ipython:: python

   pd.concat([df1, df2]).dtypes

.. _whatsnew_0200.api_breaking.gbq:

pandas Google BigQuery support has moved
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

pandas has split off Google BigQuery support into the separate package ``pandas-gbq``. You can ``conda install pandas-gbq -c conda-forge`` or ``pip install pandas-gbq`` to get it. The functionality of :func:`read_gbq` and :meth:`DataFrame.to_gbq` remains the same with the currently released version of ``pandas-gbq=0.1.4``. Documentation is now hosted `here <https://pandas-gbq.readthedocs.io/>`__ (:issue:`15347`)

.. _whatsnew_0200.api_breaking.memory_usage:

Memory usage for Index is more accurate
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions, showing ``.memory_usage()`` on a pandas structure that has an index would only include the actual index values and not include structures that facilitated fast indexing. This will generally be different for ``Index`` and ``MultiIndex`` and less-so for other index types. (:issue:`15237`)

Previous behavior:

.. code-block:: ipython

   In [8]: index = pd.Index(['foo', 'bar', 'baz'])

   In [9]: index.memory_usage(deep=True)
   Out[9]: 180

   In [10]: index.get_loc('foo')
   Out[10]: 0

   In [11]: index.memory_usage(deep=True)
   Out[11]: 180

New behavior:

.. code-block:: ipython

   In [8]: index = pd.Index(['foo', 'bar', 'baz'])

   In [9]: index.memory_usage(deep=True)
   Out[9]: 180

   In [10]: index.get_loc('foo')
   Out[10]: 0

   In [11]: index.memory_usage(deep=True)
   Out[11]: 260

.. _whatsnew_0200.api_breaking.sort_index:

DataFrame.sort_index changes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In certain cases, calling ``.sort_index()`` on a MultiIndexed ``DataFrame`` would return the *same* ``DataFrame`` without seeming to sort. This would happen with lexsorted, but non-monotonic levels. (:issue:`15622`, :issue:`15687`, :issue:`14015`, :issue:`13431`, :issue:`15797`)

This is unchanged from prior versions, but shown for illustration purposes:

.. code-block:: python

   In [81]: df = pd.DataFrame(np.arange(6), columns=['value'],
      ....:                   index=pd.MultiIndex.from_product([list('BA'), range(3)]))
      ....:

   In [82]: df
   Out[82]:
        value
   B 0      0
     1      1
     2      2
   A 0      3
     1      4
     2      5

   [6 rows x 1 columns]

.. code-block:: python

   In [87]: df.index.is_lexsorted()
   Out[87]: False

   In [88]: df.index.is_monotonic
   Out[88]: False

Sorting works as expected

.. ipython:: python

   df.sort_index()

.. code-block:: python

   In [90]: df.sort_index().index.is_lexsorted()
   Out[90]: True

   In [91]: df.sort_index().index.is_monotonic
   Out[91]: True

However, this example, which has a non-monotonic 2nd level, doesn't behave as desired.

.. ipython:: python

   df = pd.DataFrame({'value': [1, 2, 3, 4]},
                     index=pd.MultiIndex([['a', 'b'], ['bb', 'aa']],
                                         [[0, 0, 1, 1], [0, 1, 0, 1]]))
   df

Previous behavior:

.. code-block:: python

   In [11]: df.sort_index()
   Out[11]:
          value
   a bb       1
     aa       2
   b bb       3
     aa       4

   In [14]: df.sort_index().index.is_lexsorted()
   Out[14]: True

   In [15]: df.sort_index().index.is_monotonic
   Out[15]: False

New behavior:

.. code-block:: python

   In [94]: df.sort_index()
   Out[94]:
          value
   a aa       2
     bb       1
   b aa       4
     bb       3

   [4 rows x 1 columns]

   In [95]: df.sort_index().index.is_lexsorted()
   Out[95]: True

   In [96]: df.sort_index().index.is_monotonic
   Out[96]: True

.. _whatsnew_0200.api_breaking.groupby_describe:

GroupBy describe formatting
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The output formatting of ``groupby.describe()`` now labels the ``describe()`` metrics in the columns instead of the index. This format is consistent with ``groupby.agg()`` when applying multiple functions at once. (:issue:`4792`)

Previous behavior:

.. code-block:: ipython

   In [1]: df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

   In [2]: df.groupby('A').describe()
   Out[2]:
                   B
   A
   1 count  2.000000
     mean   1.500000
     std    0.707107
     min    1.000000
     25%    1.250000
     50%    1.500000
     75%    1.750000
     max    2.000000
   2 count  2.000000
     mean   3.500000
     std    0.707107
     min    3.000000
     25%    3.250000
     50%    3.500000
     75%    3.750000
     max    4.000000

   In [3]: df.groupby('A').agg(["mean", "std", "min", "max"])
   Out[3]:
        B
     mean       std amin amax
   A
   1  1.5  0.707107    1    2
   2  3.5  0.707107    3    4

New behavior:

.. ipython:: python

   df = pd.DataFrame({'A': [1, 1, 2, 2], 'B': [1, 2, 3, 4]})

   df.groupby('A').describe()

   df.groupby('A').agg(["mean", "std", "min", "max"])

.. _whatsnew_0200.api_breaking.rolling_pairwise:

Window binary corr/cov operations return a MultiIndex DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A binary window operation, like ``.corr()`` or ``.cov()``, when operating on a ``.rolling(..)``, ``.expanding(..)``, or ``.ewm(..)`` object, will now return a 2-level ``MultiIndexed DataFrame`` rather than a ``Panel``, as ``Panel`` is now deprecated, see :ref:`here <whatsnew_0200.api_breaking.deprecate_panel>`. These are equivalent in function, but a MultiIndexed ``DataFrame`` enjoys more support in pandas. See the section on :ref:`Windowed Binary Operations <window.cov_corr>` for more information. (:issue:`15677`)

.. ipython:: python

   np.random.seed(1234)
   df = pd.DataFrame(np.random.rand(100, 2),
                     columns=pd.Index(['A', 'B'], name='bar'),
                     index=pd.date_range('20160101', periods=100, freq='D', name='foo'))
   df.tail()

Previous behavior:

.. code-block:: ipython

   In [2]: df.rolling(12).corr()
   Out[2]:
   <class 'pandas.core.panel.Panel'>
   Dimensions: 100 (items) x 2 (major_axis) x 2 (minor_axis)
   Items axis: 2016-01-01 00:00:00 to 2016-04-09 00:00:00
   Major_axis axis: A to B
   Minor_axis axis: A to B

New behavior:

.. ipython:: python

   res = df.rolling(12).corr()
   res.tail()

Retrieving a correlation matrix for a cross-section

.. ipython:: python

   df.rolling(12).corr().loc['2016-04-07']

.. _whatsnew_0200.api_breaking.hdfstore_where:

HDFStore where string comparison
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In previous versions most types could be compared to a string column in a ``HDFStore``, usually resulting in an invalid comparison and returning an empty result frame. These comparisons will now raise a ``TypeError`` (:issue:`15492`)

.. ipython:: python

   df = pd.DataFrame({'unparsed_date': ['2014-01-01', '2014-01-01']})
   df.to_hdf('store.h5', key='key', format='table', data_columns=True)
   df.dtypes

Previous behavior:

.. code-block:: ipython

   In [4]: pd.read_hdf('store.h5', 'key', where='unparsed_date > ts')
   File "<string>", line 1
     (unparsed_date > 1970-01-01 00:00:01.388552400)
                           ^
   SyntaxError: invalid token

New behavior:

.. code-block:: ipython

   In [18]: ts = pd.Timestamp('2014-01-01')

   In [19]: pd.read_hdf('store.h5', 'key', where='unparsed_date > ts')
   TypeError: Cannot compare 2014-01-01 00:00:00 of
              type <class 'pandas.tslib.Timestamp'> to string column

.. ipython:: python
   :suppress:

   import os
   os.remove('store.h5')

.. _whatsnew_0200.api_breaking.index_order:

Index.intersection and inner join now preserve the order of the left Index
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

:meth:`Index.intersection` now preserves the order of the calling ``Index`` (left) instead of the other ``Index`` (right) (:issue:`15582`). This affects inner joins, :meth:`DataFrame.join` and :func:`merge`, and the ``.align`` method.

- ``Index.intersection``

  .. ipython:: python

     left = pd.Index([2, 1, 0])
     left
     right = pd.Index([1, 2, 3])
     right

  Previous behavior:

  .. code-block:: ipython

     In [4]: left.intersection(right)
     Out[4]: Int64Index([1, 2], dtype='int64')

  New behavior:

  .. ipython:: python

     left.intersection(right)

- ``DataFrame.join`` and ``pd.merge``

  .. ipython:: python

     left = pd.DataFrame({'a': [20, 10, 0]}, index=[2, 1, 0])
     left
     right = pd.DataFrame({'b': [100, 200, 300]}, index=[1, 2, 3])
     right

  Previous behavior:

  .. code-block:: ipython

     In [4]: left.join(right, how='inner')
     Out[4]:
         a    b
     1  10  100
     2  20  200

  New behavior:

  .. ipython:: python

     left.join(right, how='inner')

.. _whatsnew_0200.api_breaking.pivot_table:

Pivot table always returns a DataFrame
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The documentation for :meth:`pivot_table` states that a ``DataFrame`` is always returned. Here a bug is fixed that allowed this to return a ``Series`` under certain circumstances. (:issue:`4386`)

.. ipython:: python

   df = pd.DataFrame({'col1': [3, 4, 5],
                      'col2': ['C', 'D', 'E'],
                      'col3': [1, 3, 9]})
   df

Previous behavior:

.. code-block:: ipython

   In [2]: df.pivot_table('col1', index=['col3', 'col2'], aggfunc="sum")
   Out[2]:
   col3  col2
   1     C       3
   3     D       4
   9     E       5
   Name: col1, dtype: int64

New behavior:

.. ipython:: python

   df.pivot_table('col1', index=['col3', 'col2'], aggfunc="sum")

.. _whatsnew_0200.api:

Other API changes
^^^^^^^^^^^^^^^^^

- ``numexpr`` version is now required to be >= 2.4.6 and it will not be used at all if this requisite is not fulfilled (:issue:`15213`).
- ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv()`` and will be removed in the future (:issue:`12665`)
- ``SparseArray.cumsum()`` and ``SparseSeries.cumsum()`` will now always return ``SparseArray`` and ``SparseSeries`` respectively (:issue:`12855`)
- ``DataFrame.applymap()`` with an empty ``DataFrame`` will return a copy of the empty ``DataFrame`` instead of a ``Series`` (:issue:`8222`)
- ``Series.map()`` now respects default values of dictionary subclasses with a ``__missing__`` method, such as ``collections.Counter`` (:issue:`15999`); see the sketch after this list
- ``.loc`` has compat with ``.ix`` for accepting iterators, and NamedTuples (:issue:`15120`)
- ``interpolate()`` and ``fillna()`` will raise a ``ValueError`` if the ``limit`` keyword argument is not greater than 0. (:issue:`9217`)
- ``pd.read_csv()`` will now issue a ``ParserWarning`` whenever there are conflicting values provided by the ``dialect`` parameter and the user (:issue:`14898`)
- ``pd.read_csv()`` will now raise a ``ValueError`` for the C engine if the quote character is larger than one byte (:issue:`11592`)
- ``inplace`` arguments now require a boolean value, else a ``ValueError`` is thrown (:issue:`14189`)
- ``pandas.api.types.is_datetime64_ns_dtype`` will now report ``True`` on a tz-aware dtype, similar to ``pandas.api.types.is_datetime64_any_dtype``
- ``DataFrame.asof()`` will return a null filled ``Series`` instead of the scalar ``NaN`` if a match is not found (:issue:`15118`)
- Specific support for ``copy.copy()`` and ``copy.deepcopy()`` functions on NDFrame objects (:issue:`15444`)
- ``Series.sort_values()`` accepts a one element list of bool for consistency with the behavior of ``DataFrame.sort_values()`` (:issue:`15604`)
- ``.merge()`` and ``.join()`` on ``category`` dtype columns will now preserve the category dtype when possible (:issue:`10409`)
- ``SparseDataFrame.default_fill_value`` will be 0, previously was ``nan`` in the return from ``pd.get_dummies(..., sparse=True)`` (:issue:`15594`)
- The default behavior of ``Series.str.match`` has changed from extracting groups to matching the pattern. The extracting behavior was deprecated since pandas version 0.13.0 and can be done with the ``Series.str.extract`` method (:issue:`5224`). As a consequence, the ``as_indexer`` keyword is ignored (no longer needed to specify the new behavior) and is deprecated.
- ``NaT`` will now correctly report ``False`` for datetimelike boolean operations such as ``is_month_start`` (:issue:`15781`)
- ``NaT`` will now correctly return ``np.nan`` for ``Timedelta`` and ``Period`` accessors such as ``days`` and ``quarter`` (:issue:`15782`)
- ``NaT`` will now return ``NaT`` for ``tz_localize`` and ``tz_convert`` methods (:issue:`15830`)
- ``DataFrame`` and ``Panel`` constructors with invalid input will now raise ``ValueError`` rather than ``PandasError``, if called with scalar inputs and not axes (:issue:`15541`)
- ``DataFrame`` and ``Panel`` constructors with invalid input will now raise ``ValueError`` rather than ``pandas.core.common.PandasError``, if called with scalar inputs and not axes; the exception ``PandasError`` is removed as well. (:issue:`15541`)
- The exception ``pandas.core.common.AmbiguousIndexError`` is removed as it is not referenced (:issue:`15541`)
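
For instance, here is a minimal sketch of the ``Series.map()`` change with a ``collections.Counter``, whose ``__missing__`` hook returns 0 for unseen keys:

.. code-block:: python

   import collections

   import pandas as pd

   counter = collections.Counter(['cat', 'cat', 'dog'])
   s = pd.Series(['cat', 'dog', 'mouse'])

   # 'mouse' is not in the Counter, so its __missing__ default of 0 is used
   s.map(counter)  # -> [2, 1, 0]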

.. _whatsnew_0200.privacy:

Reorganization of the library: privacy changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


.. _whatsnew_0200.privacy.extensions:

Modules privacy has changed
^^^^^^^^^^^^^^^^^^^^^^^^^^^

Some formerly public python/c/c++/cython extension modules have been moved and/or renamed. These are all removed from the public API.
Furthermore, the ``pandas.core``, ``pandas.compat``, and ``pandas.util`` top-level modules are now considered to be PRIVATE.
If indicated, a deprecation warning will be issued if you reference these modules. (:issue:`12588`)

.. csv-table::
    :header: "Previous Location", "New Location", "Deprecated"
    :widths: 30, 30, 4

    "pandas.lib", "pandas._libs.lib", "X"
    "pandas.tslib", "pandas._libs.tslib", "X"
    "pandas.computation", "pandas.core.computation", "X"
    "pandas.msgpack", "pandas.io.msgpack", ""
    "pandas.index", "pandas._libs.index", ""
    "pandas.algos", "pandas._libs.algos", ""
    "pandas.hashtable", "pandas._libs.hashtable", ""
    "pandas.indexes", "pandas.core.indexes", ""
    "pandas.json", "pandas._libs.json / pandas.io.json", "X"
    "pandas.parser", "pandas._libs.parsers", "X"
    "pandas.formats", "pandas.io.formats", ""
    "pandas.sparse", "pandas.core.sparse", ""
    "pandas.tools", "pandas.core.reshape", "X"
    "pandas.types", "pandas.core.dtypes", "X"
    "pandas.io.sas.saslib", "pandas.io.sas._sas", ""
    "pandas._join", "pandas._libs.join", ""
    "pandas._hash", "pandas._libs.hashing", ""
    "pandas._period", "pandas._libs.period", ""
    "pandas._sparse", "pandas._libs.sparse", ""
    "pandas._testing", "pandas._libs.testing", ""
    "pandas._window", "pandas._libs.window", ""


Some new subpackages are created with public functionality that is not directly
exposed in the top-level namespace: ``pandas.errors``, ``pandas.plotting`` and
``pandas.testing`` (more details below). Together with ``pandas.api.types`` and
certain functions in the ``pandas.io`` and ``pandas.tseries`` submodules,
these are now the public subpackages.

Further changes:

- The function :func:`~pandas.api.types.union_categoricals` is now importable from ``pandas.api.types``, formerly from ``pandas.types.concat`` (:issue:`15998`)
- The type import ``pandas.tslib.NaTType`` is deprecated and can be replaced by using ``type(pandas.NaT)`` (:issue:`16146`)
- The public functions in ``pandas.tools.hashing`` are deprecated from that location, but are now importable from ``pandas.util`` (:issue:`16223`)
- The modules in ``pandas.util``: ``decorators``, ``print_versions``, ``doctools``, ``validators``, ``depr_module`` are now private. Only the functions exposed in ``pandas.util`` itself are public (:issue:`16223`)

.. _whatsnew_0200.privacy.errors:

``pandas.errors``
^^^^^^^^^^^^^^^^^

We are adding a standard public module for all pandas exceptions & warnings ``pandas.errors``. (:issue:`14800`). Previously
these exceptions & warnings could be imported from ``pandas.core.common`` or ``pandas.io.common``. These exceptions and warnings
will be removed from the ``*.common`` locations in a future release. (:issue:`15541`)

The following are now part of this API:

.. code-block:: python

   ['DtypeWarning',
    'EmptyDataError',
    'OutOfBoundsDatetime',
    'ParserError',
    'ParserWarning',
    'PerformanceWarning',
    'UnsortedIndexError',
    'UnsupportedFunctionCall']

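A minimal sketch of importing one of these from the new public location (the malformed CSV text below is purely illustrative):

.. code-block:: python

   from io import StringIO

   import pandas as pd
   from pandas.errors import ParserError

   bad_csv = StringIO("a,b\n1,2\n3,4,5\n")  # ragged row triggers a parse error

   try:
       pd.read_csv(bad_csv)
   except ParserError as err:
       print("parsing failed:", err)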

.. _whatsnew_0200.privacy.testing:

``pandas.testing``
^^^^^^^^^^^^^^^^^^

We are adding a standard module that exposes the public testing functions in ``pandas.testing`` (:issue:`9895`). Those functions can be used when writing tests for functionality using pandas objects.

The following testing functions are now part of this API:

- :func:`testing.assert_frame_equal`
- :func:`testing.assert_series_equal`
- :func:`testing.assert_index_equal`
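
A minimal usage sketch, e.g. inside a test:

.. code-block:: python

   import pandas as pd
   import pandas.testing as tm

   left = pd.DataFrame({'A': [1, 2, 3]})
   right = pd.DataFrame({'A': [1, 2, 3]})

   # Raises an AssertionError with a detailed message if the frames differ
   tm.assert_frame_equal(left, right)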


.. _whatsnew_0200.privacy.plotting:

``pandas.plotting``
^^^^^^^^^^^^^^^^^^^

A new public ``pandas.plotting`` module has been added that holds plotting functionality that was previously in either ``pandas.tools.plotting`` or in the top-level namespace. See the :ref:`deprecations sections <whatsnew_0200.privacy.deprecate_plotting>` for more details.

.. _whatsnew_0200.privacy.development:

Other development changes
^^^^^^^^^^^^^^^^^^^^^^^^^

- Building pandas for development now requires ``cython >= 0.23`` (:issue:`14831`)
- Require at least 0.23 version of cython to avoid problems with character encodings (:issue:`14699`)
- Switched the test framework to use `pytest <http://doc.pytest.org/en/latest>`__ (:issue:`13097`)
- Reorganization of tests directory layout (:issue:`14854`, :issue:`15707`).


.. _whatsnew_0200.deprecations:

Deprecations
~~~~~~~~~~~~

.. _whatsnew_0200.api_breaking.deprecate_ix:

Deprecate ``.ix``
^^^^^^^^^^^^^^^^^

The ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc`` and ``.loc`` indexers. ``.ix`` offers a lot of magic on the inference of what the user wants to do. More specifically, ``.ix`` can decide to index *positionally* OR via *labels*, depending on the data type of the index. This has caused quite a bit of user confusion over the years. The full indexing documentation is :ref:`here <indexing>`. (:issue:`14218`)

The recommended methods of indexing are:

- ``.loc`` if you want to *label* index
- ``.iloc`` if you want to *positionally* index.

Using ``.ix`` will now show a ``DeprecationWarning`` with a link to some examples of how to convert code `here <https://pandas.pydata.org/pandas-docs/version/1.0/user_guide/indexing.html#ix-indexer-is-deprecated>`__.


.. ipython:: python

   df = pd.DataFrame({'A': [1, 2, 3],
                      'B': [4, 5, 6]},
                     index=list('abc'))

   df

Previous behavior, where you wish to get the 0th and the 2nd elements from the index in the 'A' column.

.. code-block:: ipython

   In [3]: df.ix[[0, 2], 'A']
   Out[3]:
   a    1
   c    3
   Name: A, dtype: int64

Using ``.loc``. Here we will select the appropriate indexes from the index, then use *label* indexing.

.. ipython:: python

   df.loc[df.index[[0, 2]], 'A']

Using ``.iloc``. Here we will get the location of the 'A' column, then use *positional* indexing to select things.

.. ipython:: python

   df.iloc[[0, 2], df.columns.get_loc('A')]


.. _whatsnew_0200.api_breaking.deprecate_panel:

Deprecate Panel
^^^^^^^^^^^^^^^

``Panel`` is deprecated and will be removed in a future version. The recommended way to represent 3-D data is
with a ``MultiIndex`` on a ``DataFrame`` via the :meth:`~Panel.to_frame` method or with the `xarray package <http://xarray.pydata.org/en/stable/>`__. pandas
provides a :meth:`~Panel.to_xarray` method to automate this conversion (:issue:`13563`).

.. code-block:: ipython

    In [133]: import pandas._testing as tm

    In [134]: p = tm.makePanel()

    In [135]: p
    Out[135]:
    <class 'pandas.core.panel.Panel'>
    Dimensions: 3 (items) x 3 (major_axis) x 4 (minor_axis)
    Items axis: ItemA to ItemC
    Major_axis axis: 2000-01-03 00:00:00 to 2000-01-05 00:00:00
    Minor_axis axis: A to D

Convert to a MultiIndex DataFrame

.. code-block:: ipython

    In [136]: p.to_frame()
    Out[136]:
                         ItemA     ItemB     ItemC
    major      minor
    2000-01-03 A      0.628776 -1.409432  0.209395
               B      0.988138 -1.347533 -0.896581
               C     -0.938153  1.272395 -0.161137
               D     -0.223019 -0.591863 -1.051539
    2000-01-04 A      0.186494  1.422986 -0.592886
               B     -0.072608  0.363565  1.104352
               C     -1.239072 -1.449567  0.889157
               D      2.123692 -0.414505 -0.319561
    2000-01-05 A      0.952478 -2.147855 -1.473116
               B     -0.550603 -0.014752 -0.431550
               C      0.139683 -1.195524  0.288377
               D      0.122273 -1.425795 -0.619993

    [12 rows x 3 columns]

Convert to an xarray DataArray

.. code-block:: ipython

    In [137]: p.to_xarray()
    Out[137]:
    <xarray.DataArray (items: 3, major_axis: 3, minor_axis: 4)>
    array([[[ 0.628776,  0.988138, -0.938153, -0.223019],
            [ 0.186494, -0.072608, -1.239072,  2.123692],
            [ 0.952478, -0.550603,  0.139683,  0.122273]],

           [[-1.409432, -1.347533,  1.272395, -0.591863],
            [ 1.422986,  0.363565, -1.449567, -0.414505],
            [-2.147855, -0.014752, -1.195524, -1.425795]],

           [[ 0.209395, -0.896581, -0.161137, -1.051539],
            [-0.592886,  1.104352,  0.889157, -0.319561],
            [-1.473116, -0.43155 ,  0.288377, -0.619993]]])
    Coordinates:
      * items       (items) object 'ItemA' 'ItemB' 'ItemC'
      * major_axis  (major_axis) datetime64[ns] 2000-01-03 2000-01-04 2000-01-05
      * minor_axis  (minor_axis) object 'A' 'B' 'C' 'D'

.. _whatsnew_0200.api_breaking.deprecate_group_agg_dict:

Deprecate groupby.agg() with a dictionary when renaming
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``.groupby(..).agg(..)``, ``.rolling(..).agg(..)``, and ``.resample(..).agg(..)`` syntax can accept a variety of inputs, including scalars,
lists, and a dict of column names to scalars or lists. This provides a useful syntax for constructing multiple
(potentially different) aggregations.

However, ``.agg(..)`` can *also* accept a dict that allows 'renaming' of the result columns. This is a complicated and confusing syntax, as well as being inconsistent
between ``Series`` and ``DataFrame``. We are deprecating this 'renaming' functionality.

- We are deprecating passing a dict to a grouped/rolled/resampled ``Series``. This allowed
  one to ``rename`` the resulting aggregation, but this had a completely different
  meaning than passing a dictionary to a grouped ``DataFrame``, which accepts column-to-aggregations.
- We are deprecating passing a dict-of-dicts to a grouped/rolled/resampled ``DataFrame`` in a similar manner.

This is an illustrative example:

.. ipython:: python

   df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                      'B': range(5),
                      'C': range(5)})
   df

Here is a typical useful syntax for computing different aggregations for different columns. This
is a natural, and useful syntax. We aggregate from the dict-to-list by taking the specified
columns and applying the list of functions. This returns a ``MultiIndex`` for the columns (this is *not* deprecated).

.. ipython:: python

   df.groupby('A').agg({'B': 'sum', 'C': 'min'})

Here's an example of the first deprecation, passing a dict to a grouped ``Series``. This
is a combination aggregation & renaming:

.. code-block:: ipython

   In [6]: df.groupby('A').B.agg({'foo': 'count'})
   FutureWarning: using a dict on a Series for aggregation
   is deprecated and will be removed in a future version

   Out[6]:
      foo
   A
   1    3
   2    2

You can accomplish the same operation more idiomatically by:

.. ipython:: python

   df.groupby('A').B.agg(['count']).rename(columns={'count': 'foo'})


Here's an example of the second deprecation, passing a dict-of-dict to a grouped ``DataFrame``:

.. code-block:: python

   In [23]: (df.groupby('A')
       ...:    .agg({'B': {'foo': 'sum'}, 'C': {'bar': 'min'}})
       ...:  )
   FutureWarning: using a dict with renaming is deprecated and
   will be removed in a future version

   Out[23]:
        B   C
      foo bar
   A
   1   3   0
   2   7   3


You can accomplish nearly the same by:

.. ipython:: python

   (df.groupby('A')
      .agg({'B': 'sum', 'C': 'min'})
      .rename(columns={'B': 'foo', 'C': 'bar'})
    )



.. _whatsnew_0200.privacy.deprecate_plotting:

Deprecate .plotting
^^^^^^^^^^^^^^^^^^^

The ``pandas.tools.plotting`` module has been deprecated,  in favor of the top level ``pandas.plotting`` module. All the public plotting functions are now available
from ``pandas.plotting`` (:issue:`12548`).

Furthermore, the top-level ``pandas.scatter_matrix`` and ``pandas.plot_params`` are deprecated.
Users can import these from ``pandas.plotting`` as well.

Previous script:

.. code-block:: python

   pd.tools.plotting.scatter_matrix(df)
   pd.scatter_matrix(df)

Should be changed to:

.. code-block:: python

   pd.plotting.scatter_matrix(df)



.. _whatsnew_0200.deprecations.other:

Other deprecations
^^^^^^^^^^^^^^^^^^

- ``SparseArray.to_dense()`` has deprecated the ``fill`` parameter, as that parameter was not being respected (:issue:`14647`)
- ``SparseSeries.to_dense()`` has deprecated the ``sparse_only`` parameter (:issue:`14647`)
- ``Series.repeat()`` has deprecated the ``reps`` parameter in favor of ``repeats`` (:issue:`12662`)
- The ``Series`` constructor and ``.astype`` method have deprecated accepting timestamp dtypes without a frequency (e.g. ``np.datetime64``) for the ``dtype`` parameter (:issue:`15524`)
- ``Index.repeat()`` and ``MultiIndex.repeat()`` have deprecated the ``n`` parameter in favor of ``repeats`` (:issue:`12662`)
- ``Categorical.searchsorted()`` and ``Series.searchsorted()`` have deprecated the ``v`` parameter in favor of ``value`` (:issue:`12662`)
- ``TimedeltaIndex.searchsorted()``, ``DatetimeIndex.searchsorted()``, and ``PeriodIndex.searchsorted()`` have deprecated the ``key`` parameter in favor of ``value`` (:issue:`12662`)
- ``DataFrame.astype()`` has deprecated the ``raise_on_error`` parameter in favor of ``errors`` (:issue:`14878`)
- ``Series.sortlevel`` and ``DataFrame.sortlevel`` have been deprecated in favor of ``Series.sort_index`` and ``DataFrame.sort_index`` (:issue:`15099`)
- importing ``concat`` from ``pandas.tools.merge`` has been deprecated in favor of imports from the ``pandas`` namespace. This should only affect explicit imports (:issue:`15358`)
- ``Series/DataFrame/Panel.consolidate()`` has been deprecated as a public method (:issue:`15483`)
- The ``as_indexer`` keyword of ``Series.str.match()`` has been deprecated (ignored keyword) (:issue:`15257`).
- The following top-level pandas functions have been deprecated and will be removed in a future version (:issue:`13790`, :issue:`15940`)

  * ``pd.pnow()``, replaced by ``Period.now()``
  * ``pd.Term`` is removed, as it is not applicable to user code. Instead, use in-line string expressions in the ``where`` clause when searching in ``HDFStore``
  * ``pd.Expr`` is removed, as it is not applicable to user code.
  * ``pd.match()`` is removed.
  * ``pd.groupby()``, replaced by using the ``.groupby()`` method directly on a ``Series/DataFrame``
  * ``pd.get_store()``, replaced by a direct call to ``pd.HDFStore(...)``
- ``is_any_int_dtype``, ``is_floating_dtype``, and ``is_sequence`` are deprecated from ``pandas.api.types`` (:issue:`16042`)
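
Several of the deprecations above are plain keyword renames, so migrating usually just means
switching to the new keyword. A minimal sketch for a few of them (the example ``Series`` is
only illustrative):

.. code-block:: python

   import pandas as pd

   s = pd.Series([1, 2, 3])

   # previously: s.repeat(reps=2)
   s.repeat(repeats=2)

   # previously: s.searchsorted(v=2)
   s.searchsorted(value=2)

   # previously: s.to_frame('A').astype(float, raise_on_error=False)
   s.to_frame('A').astype(float, errors='ignore')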

.. _whatsnew_0200.prior_deprecations:

Removal of prior version deprecations/changes
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- The ``pandas.rpy`` module is removed. Similar functionality can be accessed
  through the `rpy2 <https://rpy2.readthedocs.io/>`__ project.
  See the `R interfacing docs <https://pandas.pydata.org/pandas-docs/version/0.20/r_interface.html>`__ for more details.
- The ``pandas.io.ga`` module with a ``google-analytics`` interface is removed (:issue:`11308`).
  Similar functionality can be found in the `Google2Pandas <https://github.com/panalysis/Google2Pandas>`__ package.
- ``pd.to_datetime`` and ``pd.to_timedelta`` have dropped the ``coerce`` parameter in favor of ``errors`` (:issue:`13602`); see the sketch after this list
- ``pandas.stats.fama_macbeth``, ``pandas.stats.ols``, ``pandas.stats.plm`` and ``pandas.stats.var``, as well as the top-level ``pandas.fama_macbeth`` and ``pandas.ols`` routines are removed. Similar functionality can be found in the `statsmodels <https://www.statsmodels.org/dev/>`__ package. (:issue:`11898`)
- The ``TimeSeries`` and ``SparseTimeSeries`` classes, aliases of ``Series``
  and ``SparseSeries``, are removed (:issue:`10890`, :issue:`15098`).
- ``Series.is_time_series`` is dropped in favor of ``Series.index.is_all_dates`` (:issue:`15098`)
- The deprecated ``irow``, ``icol``, ``iget`` and ``iget_value`` methods are removed
  in favor of ``iloc`` and ``iat`` as explained :ref:`here <whatsnew_0170.deprecations>` (:issue:`10711`).
- The deprecated ``DataFrame.iterkv()`` has been removed in favor of ``DataFrame.iteritems()`` (:issue:`10711`)
- The ``Categorical`` constructor has dropped the ``name`` parameter (:issue:`10632`)
- ``Categorical`` has dropped support for ``NaN`` categories (:issue:`10748`)
- The ``take_last`` parameter has been dropped from ``duplicated()``, ``drop_duplicates()``, ``nlargest()``, and ``nsmallest()`` methods (:issue:`10236`, :issue:`10792`, :issue:`10920`)
- ``Series``, ``Index``, and ``DataFrame`` have dropped the ``sort`` and ``order`` methods (:issue:`10726`)
- Where clauses in ``pytables`` are only accepted as strings and expression types, not other data types (:issue:`12027`)
- ``DataFrame`` has dropped the ``combineAdd`` and ``combineMult`` methods in favor of ``add`` and ``mul`` respectively (:issue:`10735`)
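
For the dropped ``coerce`` argument of ``pd.to_datetime`` and ``pd.to_timedelta``, the equivalent
spelling is now ``errors='coerce'``. A minimal sketch (the input values are only illustrative):

.. code-block:: python

   import pandas as pd

   # previously: pd.to_datetime(['2000-01-01', 'not a date'], coerce=True)
   pd.to_datetime(['2000-01-01', 'not a date'], errors='coerce')

   # previously: pd.to_timedelta(['1 day', 'not a delta'], coerce=True)
   pd.to_timedelta(['1 day', 'not a delta'], errors='coerce')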

.. _whatsnew_0200.performance:

Performance improvements
~~~~~~~~~~~~~~~~~~~~~~~~

- Improved performance of ``pd.wide_to_long()`` (:issue:`14779`)
- Improved performance of ``pd.factorize()`` by releasing the GIL with ``object`` dtype when inferred as strings (:issue:`14859`, :issue:`16057`)
- Improved performance of timeseries plotting with an irregular DatetimeIndex
  (or with ``compat_x=True``) (:issue:`15073`).
- Improved performance of ``groupby().cummin()`` and ``groupby().cummax()`` (:issue:`15048`, :issue:`15109`, :issue:`15561`, :issue:`15635`)
- Improved performance and reduced memory when indexing with a ``MultiIndex`` (:issue:`15245`)
- When reading a buffer object in ``read_sas()`` without a specified format, the filepath string is inferred rather than the buffer object (:issue:`14947`)
- Improved performance of ``.rank()`` for categorical data (:issue:`15498`)
- Improved performance when using ``.unstack()`` (:issue:`15503`)
- Improved performance of merge/join on ``category`` columns (:issue:`10409`)
- Improved performance of ``drop_duplicates()`` on ``bool`` columns (:issue:`12963`)
- Improved performance of ``pd.core.groupby.GroupBy.apply`` when the applied
  function used the ``.name`` attribute of the group DataFrame (:issue:`15062`).
- Improved performance of ``iloc`` indexing with a list or array (:issue:`15504`).
- Improved performance of ``Series.sort_index()`` with a monotonic index (:issue:`15694`)
- Improved performance in ``pd.read_csv()`` on some platforms with buffered reads (:issue:`16039`)

.. _whatsnew_0200.bug_fixes:

Bug fixes
~~~~~~~~~

Conversion
^^^^^^^^^^

- Bug in ``Timestamp.replace`` now raises ``TypeError`` when incorrect argument names are given; previously this raised ``ValueError`` (:issue:`15240`)
- Bug in ``Timestamp.replace`` with compat for passing long integers (:issue:`15030`)
- Bug in ``Timestamp`` returning UTC based time/date attributes when a timezone was provided (:issue:`13303`, :issue:`6538`)
- Bug in ``Timestamp`` incorrectly localizing timezones during construction (:issue:`11481`, :issue:`15777`)
- Bug in ``TimedeltaIndex`` addition where overflow was being allowed without error (:issue:`14816`)
- Bug in ``TimedeltaIndex`` raising a ``ValueError`` when boolean indexing with ``loc`` (:issue:`14946`)
- Bug in catching an overflow in ``Timestamp`` + ``Timedelta/Offset`` operations (:issue:`15126`)
- Bug in ``DatetimeIndex.round()`` and ``Timestamp.round()`` floating point accuracy when rounding by milliseconds or less (:issue:`14440`, :issue:`15578`)
- Bug in ``astype()`` where ``inf`` values were incorrectly converted to integers. Now raises an error with ``astype()`` for Series and DataFrames (:issue:`14265`)
- Bug in ``DataFrame(..).apply(to_numeric)`` when values are of type ``decimal.Decimal`` (:issue:`14827`)
- Bug in ``describe()`` when passing a numpy array which does not contain the median to the ``percentiles`` keyword argument (:issue:`14908`)
- Cleaned up ``PeriodIndex`` constructor, including raising on floats more consistently (:issue:`13277`)
- Bug in using ``__deepcopy__`` on empty NDFrame objects (:issue:`15370`)
- Bug in ``.replace()`` may result in incorrect dtypes. (:issue:`12747`, :issue:`15765`)
- Bug in ``Series.replace`` and ``DataFrame.replace`` which failed on empty replacement dicts (:issue:`15289`)
- Bug in ``Series.replace`` which replaced a numeric by string (:issue:`15743`)
- Bug in ``Index`` construction with ``NaN`` elements and integer dtype specified (:issue:`15187`)
- Bug in ``Series`` construction with a datetimetz (:issue:`14928`)
- Bug in ``Series.dt.round()`` inconsistent behaviour on ``NaT`` 's with different arguments (:issue:`14940`)
- Bug in ``Series`` constructor when both ``copy=True`` and ``dtype`` arguments are provided (:issue:`15125`)
- An incorrectly dtyped ``Series`` was returned by comparison methods (e.g., ``lt``, ``gt``, ...) against a constant for an empty ``DataFrame`` (:issue:`15077`)
- Bug in ``Series.ffill()`` with mixed dtypes containing tz-aware datetimes. (:issue:`14956`)
- Bug in ``DataFrame.fillna()`` where the argument ``downcast`` was ignored when fillna value was of type ``dict`` (:issue:`15277`)
- Bug in ``.asfreq()``, where frequency was not set for empty ``Series`` (:issue:`14320`)
- Bug in ``DataFrame`` construction with nulls and datetimes in a list-like (:issue:`15869`)
- Bug in ``DataFrame.fillna()`` with tz-aware datetimes (:issue:`15855`)
- Bug in ``is_string_dtype``, ``is_timedelta64_ns_dtype``, and ``is_string_like_dtype`` in which an error was raised when ``None`` was passed in (:issue:`15941`)
- Bug in the return type of ``pd.unique`` on a ``Categorical``, which was returning an ndarray and not a ``Categorical`` (:issue:`15903`)
- Bug in ``Index.to_series()`` where the index was not copied (and so mutating later would change the original) (:issue:`15949`)
- Bug in partial string indexing with a length-1 DataFrame (:issue:`16071`)
- Bug in ``Series`` construction where passing an invalid dtype didn't raise an error (:issue:`15520`)

Indexing
^^^^^^^^

- Bug in ``Index`` power operations with reversed operands (:issue:`14973`)
- Bug in ``DataFrame.sort_values()`` when sorting by multiple columns where one column is of type ``int64`` and contains ``NaT`` (:issue:`14922`)
- Bug in ``DataFrame.reindex()`` in which ``method`` was ignored when passing ``columns`` (:issue:`14992`)
- Bug in ``DataFrame.loc`` with indexing a ``MultiIndex`` with a ``Series`` indexer (:issue:`14730`, :issue:`15424`)
- Bug in ``DataFrame.loc`` with indexing a ``MultiIndex`` with a numpy array (:issue:`15434`)
- Bug in ``Series.asof`` which raised if the series contained all ``np.nan`` (:issue:`15713`)
- Bug in ``.at`` when selecting from a tz-aware column (:issue:`15822`)
- Bug in ``Series.where()`` and ``DataFrame.where()`` where array-like conditionals were being rejected (:issue:`15414`)
- Bug in ``Series.where()`` where TZ-aware data was converted to float representation (:issue:`15701`)
- Bug in ``.loc`` that would not return the correct dtype for scalar access for a DataFrame (:issue:`11617`)
- Bug in output formatting of a ``MultiIndex`` when names are integers (:issue:`12223`, :issue:`15262`)
- Bug in ``Categorical.searchsorted()`` where alphabetical instead of the provided categorical order was used (:issue:`14522`)
- Bug in ``Series.iloc`` where a ``Categorical`` object was returned for list-like index input when a ``Series`` was expected (:issue:`14580`)
- Bug in ``DataFrame.isin`` comparing datetimelike to empty frame (:issue:`15473`)
- Bug in ``.reset_index()`` which would fail when a ``MultiIndex`` level was all ``NaN`` (:issue:`6322`)
- Bug in ``.reset_index()`` when raising an error for an index name already present in the ``MultiIndex`` columns (:issue:`16120`)
- Bug in creating a ``MultiIndex`` with tuples and not passing a list of names; this will now raise ``ValueError`` (:issue:`15110`)
- Bug in the HTML display with a ``MultiIndex`` and truncation (:issue:`14882`)
- Bug in the display of ``.info()`` where a qualifier (+) would always be displayed with a ``MultiIndex`` that contains only non-strings (:issue:`15245`)
- Bug in ``pd.concat()`` where the names of the ``MultiIndex`` of the resulting ``DataFrame`` were not handled correctly when ``None`` was present in the names of the ``MultiIndex`` of the input ``DataFrame`` (:issue:`15787`)
- Bug in ``DataFrame.sort_index()`` and ``Series.sort_index()`` where ``na_position`` doesn't work with a ``MultiIndex`` (:issue:`14784`, :issue:`16604`)
- Bug in ``pd.concat()`` when combining objects with a ``CategoricalIndex`` (:issue:`16111`)
- Bug in indexing with a scalar and a ``CategoricalIndex`` (:issue:`16123`)

IO
^^

- Bug in ``pd.to_numeric()`` in which float and unsigned integer elements were being improperly cast (:issue:`14941`, :issue:`15005`)
- Bug in ``pd.read_fwf()`` where the ``skiprows`` parameter was not being respected during column width inference (:issue:`11256`)
- Bug in ``pd.read_csv()`` in which the ``dialect`` parameter was not being verified before processing (:issue:`14898`)
- Bug in ``pd.read_csv()`` in which missing data was being improperly handled with ``usecols`` (:issue:`6710`)
- Bug in ``pd.read_csv()`` in which a file containing a row with many columns followed by rows with fewer columns would cause a crash (:issue:`14125`)
- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)
- Bug in ``pd.read_csv()`` with ``parse_dates`` when multi-line headers are specified (:issue:`15376`)
- Bug in ``pd.read_csv()`` with ``float_precision='round_trip'`` which caused a segfault when a text entry was parsed (:issue:`15140`)
- Bug in ``pd.read_csv()`` when an index was specified and no values were specified as null values (:issue:`15835`)
- Bug in ``pd.read_csv()`` in which certain invalid file objects caused the Python interpreter to crash (:issue:`15337`)
- Bug in ``pd.read_csv()`` in which invalid values for ``nrows`` and ``chunksize`` were allowed (:issue:`15767`)
- Bug in ``pd.read_csv()`` for the Python engine in which unhelpful error messages were being raised when parsing errors occurred (:issue:`15910`)
- Bug in ``pd.read_csv()`` in which the ``skipfooter`` parameter was not being properly validated (:issue:`15925`)
- Bug in ``pd.to_csv()`` in which there was numeric overflow when a timestamp index was being written (:issue:`15982`)
- Bug in ``pd.util.hashing.hash_pandas_object()`` in which hashing of categoricals depended on the ordering of categories, instead of just their values. (:issue:`15143`)
- Bug in ``.to_json()`` where ``lines=True`` and contents (keys or values) contain escaped characters (:issue:`15096`)
- Bug in ``.to_json()`` causing single byte ascii characters to be expanded to four byte unicode (:issue:`15344`)
- Bug in ``.to_json()`` for the C engine where rollover was not correctly handled for the case where the fractional part is odd and the difference is exactly 0.5 (:issue:`15716`, :issue:`15864`)
- Bug in ``pd.read_json()`` for Python 2 where ``lines=True`` and contents contain non-ascii unicode characters (:issue:`15132`)
- Bug in ``pd.read_msgpack()`` in which ``Series`` categoricals were being improperly processed (:issue:`14901`)
- Bug in ``pd.read_msgpack()`` which did not allow loading of a dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
- Bug in ``pd.read_msgpack()`` when deserializing a ``CategoricalIndex`` (:issue:`15487`)
- Bug in ``DataFrame.to_records()`` with converting a ``DatetimeIndex`` with a timezone (:issue:`13937`)
- Bug in ``DataFrame.to_records()`` which failed with unicode characters in column names (:issue:`11879`)
- Bug in ``.to_sql()`` when writing a DataFrame with numeric index names (:issue:`15404`).
- Bug in ``DataFrame.to_html()`` with ``index=False`` and ``max_rows`` raising an ``IndexError`` (:issue:`14998`)
- Bug in ``pd.read_hdf()`` passing a ``Timestamp`` to the ``where`` parameter with a non date column (:issue:`15492`)
- Bug in ``DataFrame.to_stata()`` and ``StataWriter`` which produced incorrectly formatted files for some locales (:issue:`13856`)
- Bug in ``StataReader`` and ``StataWriter`` which allowed invalid encodings (:issue:`15723`)
- Bug in the ``Series`` repr not showing the length when the output was truncated (:issue:`15962`).

Plotting
^^^^^^^^

- Bug in ``DataFrame.hist`` where ``plt.tight_layout`` caused an ``AttributeError`` (use ``matplotlib >= 2.0.1``) (:issue:`9351`)
- Bug in ``DataFrame.boxplot`` where ``fontsize`` was not applied to the tick labels on both axes (:issue:`15108`)
- Bug in the date and time converters pandas registers with matplotlib not handling multiple dimensions (:issue:`16026`)
- Bug in ``pd.scatter_matrix()`` which could accept either ``color`` or ``c``, but not both (:issue:`14855`)

GroupBy/resample/rolling
^^^^^^^^^^^^^^^^^^^^^^^^

- Bug in ``.groupby(..).resample()`` when passed the ``on=`` kwarg. (:issue:`15021`)
- Properly set ``__name__`` and ``__qualname__`` for ``Groupby.*`` functions (:issue:`14620`)
- Bug in ``GroupBy.get_group()`` failing with a categorical grouper (:issue:`15155`)
- Bug in ``.groupby(...).rolling(...)`` when ``on`` is specified and using a ``DatetimeIndex`` (:issue:`15130`, :issue:`13966`)
- Bug in groupby operations with ``timedelta64`` when passing ``numeric_only=False`` (:issue:`5724`)
- Bug in ``groupby.apply()`` coercing ``object`` dtypes to numeric types, when not all values were numeric (:issue:`14423`, :issue:`15421`, :issue:`15670`)
- Bug in ``resample``, where a non-string ``loffset`` argument would not be applied when resampling a timeseries (:issue:`13218`)
- Bug in ``DataFrame.groupby().describe()`` when grouping on ``Index`` containing tuples (:issue:`14848`)
- Bug in ``groupby().nunique()`` with a datetimelike grouper where bin counts were incorrect (:issue:`13453`)
- Bug in ``groupby.transform()`` that would coerce the resultant dtypes back to the original (:issue:`10972`, :issue:`11444`)
- Bug in ``groupby.agg()`` incorrectly localizing timezone on ``datetime`` (:issue:`15426`, :issue:`10668`, :issue:`13046`)
- Bug in ``.rolling/expanding()`` functions where ``count()`` was not counting ``np.Inf``, nor handling ``object`` dtypes (:issue:`12541`)
- Bug in ``.rolling()`` where ``pd.Timedelta`` or ``datetime.timedelta`` was not accepted as a ``window`` argument (:issue:`15440`)
- Bug in ``Rolling.quantile`` function that caused a segmentation fault when called with a quantile value outside of the range [0, 1] (:issue:`15463`)
- Bug in ``DataFrame.resample().median()`` if duplicate column names are present (:issue:`14233`)

Sparse
^^^^^^

- Bug in ``SparseSeries.reindex`` on single level with list of length 1 (:issue:`15447`)
- Bug in repr-formatting a ``SparseDataFrame`` after a value was set on (a copy of) one of its series (:issue:`15488`)
- Bug in ``SparseDataFrame`` construction with lists not coercing to dtype (:issue:`15682`)
- Bug in sparse array indexing in which indices were not being validated (:issue:`15863`)

Reshaping
^^^^^^^^^

- Bug in ``pd.merge_asof()`` where ``left_index`` or ``right_index`` caused a failure when multiple ``by`` was specified (:issue:`15676`)
- Bug in ``pd.merge_asof()`` where ``left_index``/``right_index`` together caused a failure when ``tolerance`` was specified (:issue:`15135`)
- Bug in ``DataFrame.pivot_table()`` where ``dropna=True`` would not drop all-NaN columns when the columns were of ``category`` dtype (:issue:`15193`)
- Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
- Bug in ``pd.pivot_table()`` where no error was raised when values argument was not in the columns (:issue:`14938`)
- Bug in ``pd.concat()`` in which concatenating with an empty dataframe with ``join='inner'`` was being improperly handled (:issue:`15328`)
- Bug with ``sort=True`` in ``DataFrame.join`` and ``pd.merge`` when joining on indexes (:issue:`15582`)
- Bug in ``DataFrame.nsmallest`` and ``DataFrame.nlargest`` where identical values resulted in duplicated rows (:issue:`15297`)
- Bug in :func:`pandas.pivot_table` incorrectly raising ``UnicodeError`` when passing unicode input for ``margins`` keyword (:issue:`13292`)

Numeric
^^^^^^^

- Bug in ``.rank()`` which incorrectly ranks ordered categories (:issue:`15420`)
- Bug in ``.corr()`` and ``.cov()`` where the column and index were the same object (:issue:`14617`)
- Bug in ``.mode()`` where the mode was not returned if there was only a single value (:issue:`15714`)
- Bug in ``pd.cut()`` with a single bin on an all 0s array (:issue:`15428`)
- Bug in ``pd.qcut()`` with a single quantile and an array with identical values (:issue:`15431`)
- Bug in ``pandas.tools.utils.cartesian_product()`` where large input could cause an overflow on Windows (:issue:`15265`)
- Bug in ``.eval()`` which caused multi-line evals to fail with local variables not on the first line (:issue:`15342`)

Other
^^^^^

- Compat with SciPy 0.19.0 for testing on ``.interpolate()`` (:issue:`15662`)
- Compat for 32-bit platforms for ``.qcut/cut``; bins will now be ``int64`` dtype (:issue:`14866`)
- Bug in interactions with ``Qt`` when a ``QtApplication`` already exists (:issue:`14372`)
- Use of ``np.finfo()`` during ``import pandas`` was removed to mitigate a deadlock from Python GIL misuse (:issue:`14641`)


.. _whatsnew_0.20.0.contributors:

Contributors
~~~~~~~~~~~~

.. contributors:: v0.19.2..v0.20.0