Version 0.17.1 (November 21, 2015)

.. _whatsnew_0171:

.. note::

We are proud to announce that pandas has become a sponsored project of the (NumFOCUS organization_). This will help ensure the success of development of pandas as a world-class open-source project.

.. _numfocus organization: http://www.numfocus.org/blog/numfocus-announces-new-fiscally-sponsored-project-pandas

This is a minor bug-fix release from 0.17.0 and includes a large number of bug fixes along several new features, enhancements, and performance improvements. We recommend that all users upgrade to this version.

Highlights include:

Support for Conditional HTML Formatting, see :ref:here <whatsnew_0171.style>
Releasing the GIL on the csv reader & other ops, see :ref:here <whatsnew_0171.performance>
Fixed regression in DataFrame.drop_duplicates from 0.16.2, causing incorrect results on integer values (:issue:11376)

.. contents:: What's new in v0.17.1 :local: :backlinks: none

New features


.. _whatsnew_0171.style:

Conditional HTML formatting
^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. warning::
    This is a new feature and is under active development.
    We'll be adding features an  possibly making breaking changes in future
    releases. Feedback is welcome in :issue:`11610`

We've added *experimental* support for conditional HTML formatting:
the visual styling of a DataFrame based on the data.
The styling is accomplished with HTML and CSS.
Accesses the styler class with the :attr:`pandas.DataFrame.style`, attribute,
an instance of :class:`.Styler` with your data attached.

Here's a quick example:

  .. ipython:: python

    np.random.seed(123)
    df = pd.DataFrame(np.random.randn(10, 5), columns=list("abcde"))
    html = df.style.background_gradient(cmap="viridis", low=0.5)

We can render the HTML to get the following table.

.. raw:: html
   :file: whatsnew_0171_html_table.html

:class:`.Styler` interacts nicely with the Jupyter Notebook.
See the :ref:`documentation </user_guide/style.ipynb>` for more.

.. _whatsnew_0171.enhancements:

Enhancements

DatetimeIndex now supports conversion to strings with astype(str) (:issue:10442)
Support for compression (gzip/bz2) in :meth:pandas.DataFrame.to_csv (:issue:7615)
pd.read_* functions can now also accept :class:python:pathlib.Path, or :class:py:py._path.local.LocalPath objects for the filepath_or_buffer argument. (:issue:11033)
- The DataFrame and Series functions .to_csv(), .to_html() and .to_latex() can now handle paths beginning with tildes (e.g. ~/Documents/) (:issue:11438)
DataFrame now uses the fields of a namedtuple as columns, if columns are not supplied (:issue:11181)
DataFrame.itertuples() now returns namedtuple objects, when possible. (:issue:11269, :issue:11625)
Added axvlines_kwds to parallel coordinates plot (:issue:10709)
Option to .info() and .memory_usage() to provide for deep introspection of memory consumption. Note that this can be expensive to compute and therefore is an optional parameter. (:issue:11595)

.. ipython:: python

df = pd.DataFrame({"A": ["foo"] * 1000}) # noqa: F821 df["B"] = df["A"].astype("category")

shows the '+' as we have object dtypes

df.info()

we have an accurate memory assessment (but can be expensive to compute this)

df.info(memory_usage="deep")
Index now has a fillna method (:issue:10089)

.. ipython:: python

pd.Index([1, np.nan, 3]).fillna(2)
Series of type category now make .str.<...> and .dt.<...> accessor methods / properties available, if the categories are of that type. (:issue:10661)

.. ipython:: python

s = pd.Series(list("aabb")).astype("category") s s.str.contains("a")

date = pd.Series(pd.date_range("1/1/2015", periods=5)).astype("category") date date.dt.day
pivot_table now has a margins_name argument so you can use something other than the default of 'All' (:issue:3335)
Implement export of datetime64[ns, tz] dtypes with a fixed HDF5 store (:issue:11411)
Pretty printing sets (e.g. in DataFrame cells) now uses set literal syntax ({x, y}) instead of Legacy Python syntax (set([x, y])) (:issue:11215)
Improve the error message in :func:pandas.io.gbq.to_gbq when a streaming insert fails (:issue:11285) and when the DataFrame does not match the schema of the destination table (:issue:11359)

.. _whatsnew_0171.api:

API changes


- raise ``NotImplementedError`` in ``Index.shift`` for non-supported index types (:issue:`8038`)
- ``min`` and ``max`` reductions on ``datetime64`` and ``timedelta64`` dtyped series now
  result in ``NaT`` and not ``nan`` (:issue:`11245`).
- Indexing with a null key will raise a ``TypeError``, instead of a ``ValueError`` (:issue:`11356`)
- ``Series.ptp`` will now ignore missing values by default (:issue:`11163`)

.. _whatsnew_0171.deprecations:

Deprecations
^^^^^^^^^^^^

- The ``pandas.io.ga`` module which implements ``google-analytics`` support is deprecated and will be removed in a future version (:issue:`11308`)
- Deprecate the ``engine`` keyword in ``.to_csv()``, which will be removed in a future version (:issue:`11274`)

.. _whatsnew_0171.performance:

Performance improvements

Checking monotonic-ness before sorting on an index (:issue:11080)
Series.dropna performance improvement when its dtype can't contain NaN (:issue:11159)
Release the GIL on most datetime field operations (e.g. DatetimeIndex.year, Series.dt.year), normalization, and conversion to and from Period, DatetimeIndex.to_period and PeriodIndex.to_timestamp (:issue:11263)
Release the GIL on some rolling algos: rolling_median, rolling_mean, rolling_max, rolling_min, rolling_var, rolling_kurt, rolling_skew (:issue:11450)
Release the GIL when reading and parsing text files in read_csv, read_table (:issue:11272)
Improved performance of rolling_median (:issue:11450)
Improved performance of to_excel (:issue:11352)
Performance bug in repr of Categorical categories, which was rendering the strings before chopping them for display (:issue:11305)
Performance improvement in Categorical.remove_unused_categories, (:issue:11643).
Improved performance of Series constructor with no data and DatetimeIndex (:issue:11433)
Improved performance of shift, cumprod, and cumsum with groupby (:issue:4095)

.. _whatsnew_0171.bug_fixes:

Bug fixes


- ``SparseArray.__iter__()`` now does not cause ``PendingDeprecationWarning`` in Python 3.5 (:issue:`11622`)
- Regression from 0.16.2 for output formatting of long floats/nan, restored in (:issue:`11302`)
- ``Series.sort_index()`` now correctly handles the ``inplace`` option (:issue:`11402`)
- Incorrectly distributed .c file in the build on ``PyPi`` when reading a csv of floats and passing ``na_values=<a scalar>`` would show an exception (:issue:`11374`)
- Bug in ``.to_latex()`` output broken when the index has a name (:issue:`10660`)
- Bug in ``HDFStore.append`` with strings whose encoded length exceeded the max unencoded length (:issue:`11234`)
- Bug in merging ``datetime64[ns, tz]`` dtypes (:issue:`11405`)
- Bug in ``HDFStore.select`` when comparing with a numpy scalar in a where clause (:issue:`11283`)
- Bug in using ``DataFrame.ix`` with a MultiIndex indexer (:issue:`11372`)
- Bug in ``date_range`` with ambiguous endpoints (:issue:`11626`)
- Prevent adding new attributes to the accessors ``.str``, ``.dt`` and ``.cat``. Retrieving such
  a value was not possible, so error out on setting it. (:issue:`10673`)
- Bug in tz-conversions with an ambiguous time and ``.dt`` accessors (:issue:`11295`)
- Bug in output formatting when using an index of ambiguous times (:issue:`11619`)
- Bug in comparisons of Series vs list-likes (:issue:`11339`)
- Bug in ``DataFrame.replace`` with a ``datetime64[ns, tz]`` and a non-compat to_replace (:issue:`11326`, :issue:`11153`)
- Bug in ``isnull`` where ``numpy.datetime64('NaT')`` in a ``numpy.array`` was not determined to be null(:issue:`11206`)
- Bug in list-like indexing with a mixed-integer Index (:issue:`11320`)
- Bug in ``pivot_table`` with ``margins=True`` when indexes are of ``Categorical`` dtype (:issue:`10993`)
- Bug in ``DataFrame.plot`` cannot use hex strings colors (:issue:`10299`)
- Regression in ``DataFrame.drop_duplicates`` from 0.16.2, causing incorrect results on integer values (:issue:`11376`)
- Bug in ``pd.eval`` where unary ops in a list error (:issue:`11235`)
- Bug in ``squeeze()`` with zero length arrays (:issue:`11230`, :issue:`8999`)
- Bug in ``describe()`` dropping column names for hierarchical indexes (:issue:`11517`)
- Bug in ``DataFrame.pct_change()`` not propagating ``axis`` keyword on ``.fillna`` method (:issue:`11150`)
- Bug in ``.to_csv()`` when a mix of integer and string column names are passed as the ``columns`` parameter (:issue:`11637`)
- Bug in indexing with a ``range``, (:issue:`11652`)
- Bug in inference of numpy scalars and preserving dtype when setting columns (:issue:`11638`)
- Bug in ``to_sql`` using unicode column names giving UnicodeEncodeError with (:issue:`11431`).
- Fix regression in setting of ``xticks`` in ``plot`` (:issue:`11529`).
- Bug in ``holiday.dates`` where observance rules could not be applied to holiday and doc enhancement (:issue:`11477`, :issue:`11533`)
- Fix plotting issues when having plain ``Axes`` instances instead of ``SubplotAxes`` (:issue:`11520`, :issue:`11556`).
- Bug in ``DataFrame.to_latex()`` produces an extra rule when ``header=False`` (:issue:`7124`)
- Bug in ``df.groupby(...).apply(func)`` when a func returns a ``Series`` containing a new datetimelike column (:issue:`11324`)
- Bug in ``pandas.json`` when file to load is big (:issue:`11344`)
- Bugs in ``to_excel`` with duplicate columns (:issue:`11007`, :issue:`10982`, :issue:`10970`)
- Fixed a bug that prevented the construction of an empty series of dtype ``datetime64[ns, tz]`` (:issue:`11245`).
- Bug in ``read_excel`` with MultiIndex containing integers (:issue:`11317`)
- Bug in ``to_excel`` with openpyxl 2.2+ and merging (:issue:`11408`)
- Bug in ``DataFrame.to_dict()`` produces a ``np.datetime64`` object instead of ``Timestamp`` when only datetime is present in data (:issue:`11327`)
- Bug in ``DataFrame.corr()`` raises exception when computes Kendall correlation for DataFrames with boolean and not boolean columns (:issue:`11560`)
- Bug in the link-time error caused by C ``inline`` functions on FreeBSD 10+ (with ``clang``) (:issue:`10510`)
- Bug in ``DataFrame.to_csv`` in passing through arguments for formatting ``MultiIndexes``, including ``date_format`` (:issue:`7791`)
- Bug in ``DataFrame.join()`` with ``how='right'`` producing a ``TypeError`` (:issue:`11519`)
- Bug in ``Series.quantile`` with empty list results has ``Index`` with ``object`` dtype (:issue:`11588`)
- Bug in ``pd.merge`` results in empty ``Int64Index`` rather than ``Index(dtype=object)`` when the merge result is empty (:issue:`11588`)
- Bug in ``Categorical.remove_unused_categories`` when having ``NaN`` values (:issue:`11599`)
- Bug in ``DataFrame.to_sparse()`` loses column names for MultiIndexes (:issue:`11600`)
- Bug in ``DataFrame.round()`` with non-unique column index producing a Fatal Python error (:issue:`11611`)
- Bug in ``DataFrame.round()`` with ``decimals`` being a non-unique indexed Series producing extra columns (:issue:`11618`)


.. _whatsnew_0.17.1.contributors:

Contributors

.. contributors:: v0.17.0..v0.17.1

Version 0.17.1 (November 21, 2015)

Version 0.17.1 (November 21, 2015)

shows the '+' as we have object dtypes

we have an accurate memory assessment (but can be expensive to compute this)