Changelog
=========

.. note::

    This is not exhaustive. For an exhaustive list of changes, see the git log.

.. _v2026.3.0:

2026.3.0
--------

Highlights
^^^^^^^^^^

- Preliminary Python 3.14t support (:pr:`12223`) `Guido Imperiale`_
- Bokeh 3.9.0 compatibility (:pr-distributed:`9205`) `Dimitri Papadopoulos Orfanos`_

.. dropdown:: Additional changes

    - docs: document approximate algorithm and Dask-specific params in describe() (:pr:`12300`) `Maxime Grenu`_
    - docs: clarify coarsen reduction function contract (:pr:`12314`) `monkeyjack123`_
    - Fix misleading TypeError for scalar overflow in dask.array elemwise (:pr:`12301`) `Maxime Grenu`_
    - Stricter warnings filter (:pr:`12274`) `Guido Imperiale`_
    - Clean up obsolete PANDAS_GE markers (:pr:`12279`) `Guido Imperiale`_
    - Bump actions/upload-artifact from 6 to 7 (:pr:`12311`) `dependabot[bot]`_
    - Remove mention of obsolete default value for 'boundary' parameter (:pr:`12304`) `Marianne Corvellec`_
    - Pandas in 3.14t CI (:pr:`12284`) `Guido Imperiale`_
    - Quadratic definition time in xarray.DataArray.to_zarr(compute=False) (:pr:`12299`) `Guido Imperiale`_
    - Bump scientific-python/issue-from-pytest-log-action from 1.4.0 to 1.5.0 (:pr:`12294`) `dependabot[bot]`_
    - test_tokenize_range_index fails if cityhash is not installed (:pr:`12286`) `Guido Imperiale`_
    - Bump minimum version of scipy (:pr:`12271`) `Guido Imperiale`_
    - Fix flaky categorical concat test (:pr:`12276`) `Harshith J`_
    - Doc: document Zarr compression options for to_zarr (:pr:`12269`) `Harshith J`_
    - Disable the GIL on 3.14t Windows CI (:pr:`12280`) `Guido Imperiale`_
    - Update obsolete pandas URLs (:pr:`12278`) `Guido Imperiale`_
    - Suppress warning: Consolidated metadata is not part of Zarr 3 (:pr:`12273`) `Guido Imperiale`_
    - Pandas4Warning: Copy-on-Write is always enabled with pandas >= 3.0 (:pr:`12272`) `Guido Imperiale`_
    - Disable the GIL in 3.14t CI (:pr:`12270`) `Guido Imperiale`_
    - Propagate contextvars to worker threads; catch warnings in 3.14t (:pr:`12224`) `Guido Imperiale`_
    - Fix bugs in env.yaml / pytest.xml upload (:pr:`12266`) `Guido Imperiale`_
    - Added full_matrices parameter to dask.array.linalg.svd (:pr:`12292`) `Ayan Bag`_
    - fix: zarr.create_array for better backward compatibility (:pr:`12291`) `Wouter-Michiel Vierdag`_
    - Silence deprecations in global config if local config overrides them (:pr:`12315`) `Guido Imperiale`_
    - Fix Total CPU % on /workers tab to normalize by total nthreads (:pr-distributed:`9195`) `Ernest Provo`_
    - setproctitle: avoid being caught by dask.config; add to test envs (:pr-distributed:`9202`) `Guido Imperiale`_
    - Add return type annotation for Client.register_plugin (:pr-distributed:`9201`) `Simon-Martin Schröder`_
    - Bump actions/upload-artifact from 6 to 7 (:pr-distributed:`9199`) `dependabot[bot]`_
    - docs: fix Scheduler.close docstring (:pr-distributed:`9198`) `Chase Naples`_
    - XFAIL test_handle_null_partitions_2 (:pr-distributed:`9191`) `Guido Imperiale`_
    - Type hints for Future.status (:pr-distributed:`9188`) `Navid`_
    - Pin sphinx=8 (:pr-distributed:`9190`) `Guido Imperiale`_

.. _v2026.2.0:

2026.2.0
--------

Highlights
^^^^^^^^^^

.. dropdown:: Additional changes

    - Minimum version of optional dependency scipy bumped to 1.10.0 (was 1.7.2)

.. _v2026.1.2:

2026.1.2
--------

Highlights
^^^^^^^^^^

- ``dask.dataframe`` now requires PyArrow 16 or greater (was 14)
- Have ``**kwargs`` in ``to_zarr`` follow zarr-python API and add ``mode`` argument (:pr:`12205`) `Wouter-Michiel Vierdag`_

.. note::

    Passing I/O-related arguments through ``**kwargs`` in ``to_zarr`` will be deprecated,
    and the ``read_kwargs`` and ``zarr_array_kwargs`` (dict) arguments introduced in 2025.12.0
    have been removed.
    If you passed either ``mode`` or ``read_only`` through ``**kwargs`` or ``read_kwargs`` in
    ``to_zarr``, please use the new ``mode`` argument instead. The ``read_only`` argument can still
    be passed, but it emits a warning and has no effect (since ``to_zarr`` is meant to write,
    this should not be an issue). For now, no error is raised.
    ``**kwargs`` in ``to_zarr`` has been renamed to ``**zarr_array_kwargs`` to indicate
    that it directly follows the ``zarr-python`` API of ``Group.create_array``
    when ``zarr>v3.0.0`` and of ``zarr.create`` when ``zarr<v3.0.0``. See
    :func:`dask.array.to_zarr` for more details.

.. dropdown:: Additional changes

    - Minimum version of optional dependency h5py bumped to 3.7.0 (was 3.4.0)
    - Minimum version of optional dependency python-snappy bumped to 0.7.1 (was 0.6.0)
    - Minimum version of optional dependency tiledb bumped to 0.27.0 (was 0.12.0)

.. _v2026.1.1:

2026.1.1
--------

Highlights
^^^^^^^^^^

- Fix XSS vulnerability `CVE-2026-23528 <https://github.com/dask/distributed/security/advisories/GHSA-c336-7962-wfj2>`_ `Jacob Tomlinson`_
- Support duck-typed Futures in task graph processing (:pr:`12213`) `Matthew Rocklin`_

.. dropdown:: Additional changes

    - Remove the Python 2 Comment (:pr:`12229`) `Vipin Kataria`_
    - Fix changelog: distributed-pr -> pr-distributed (:pr:`12227`) `Matthew Plough`_
    - Support duck-typed Futures in task graph processing (:pr:`12213`) `Matthew Rocklin`_
    - Relax test_serialization (:pr:`12226`) `Guido Imperiale`_
    - [cosmetic] Reorganise dependency groups in CI environment files (:pr:`12222`) `Guido Imperiale`_
    - Review _array_expr_enabled() (:pr:`12217`) `Guido Imperiale`_
    - Increase coverage; lower codecov threshold to pass (:pr:`12214`) `Guido Imperiale`_
    - Test array expr on mindeps (:pr:`12216`) `Guido Imperiale`_
    - Disable some Mac builds (:pr:`12218`) `Guido Imperiale`_
    - Typing tweaks (:pr:`12215`) `Guido Imperiale`_
    - [CI] unbreak codecov (:pr:`12211`) `Guido Imperiale`_
    - Test array expr on Python 3.14 (:pr:`12212`) `Guido Imperiale`_
    - Fix pickle compatibility for Python 3.14 (:pr:`12206`) `Matthew Rocklin`_
    - Remove deprecated dask._compatibility.entry_points (:pr:`12202`) `Guido Imperiale`_
    - Tweak MacOS CI (:pr:`12200`) `Guido Imperiale`_
    - Remove obsolete CI pins (:pr:`12199`) `Guido Imperiale`_
    - Fix XSS vulnerability `CVE-2026-23528 <https://github.com/dask/distributed/security/advisories/GHSA-c336-7962-wfj2>`_ `Jacob Tomlinson`_
    - Clean up obsolete pins in CI (:pr-distributed:`9172`) `Guido Imperiale`_
    - Fix incompatibility of pyparsing vs. packaging in mindeps CI (:pr-distributed:`9170`) `Guido Imperiale`_
    - Bump mypy; fix mypy failure (:pr-distributed:`9171`) `Guido Imperiale`_

.. _v2026.1.0:

2026.1.0
--------

This release was broken and has been yanked; please ignore it.

.. _v2025.12.0:

2025.12.0
---------

Highlights
^^^^^^^^^^

- More improvements for pandas 3.x `Tom Augspurger`_
- Support zarr sharding through create_array (:pr:`12153`) `Wouter-Michiel Vierdag`_
- Various improvements for project linting and type hinting `Dimitri Papadopoulos Orfanos`_
- Add new "optimization.tune.active" configuration option to disable partition fusion (:pr:`12194`) `Richard (Rick) Zamora`_
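
The snippet below is a minimal sketch of toggling the new option for a single block of code with ``dask.config.set``. It is an assumption, based only on the option name and the entry above, that ``False`` turns partition fusion off. Check the configuration reference for your Dask version before relying on it.

.. code-block:: python

    import dask

    # Assumption: False disables partition fusion for collections computed
    # inside this block; dask.config.set restores the previous value on exit.
    with dask.config.set({"optimization.tune.active": False}):
        fusion_active = dask.config.get("optimization.tune.active")
        # ... build and compute DataFrame collections here ...

    print(fusion_active)  # False

Using the context-manager form keeps the change local, so other computations in the same process keep the default fusion behavior.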

.. dropdown:: Additional changes

    - Stable sort in Series.value_counts for pandas 3.x (:pr:`12191`) `Tom Augspurger`_
    - Add new "optimization.tune.active" configuration option to disable partition fusion (:pr:`12194`) `Richard (Rick) Zamora`_
    - Build llms.txt files in Sphinx documentation (:pr:`12192`) `Jacob Tomlinson`_
    - Support zarr sharding through create_array (:pr:`12153`) `Wouter-Michiel Vierdag`_
    - Support min/max of datetime (:pr:`12183`) `Julia Signell`_
    - pandas 3.x compatibility (:pr:`12180`) `Tom Augspurger`_
    - Minimal version of setuptools-scm (:pr:`12184`) `Dimitri Papadopoulos Orfanos`_
    - Update test_ufunc_meta for upstream-dev failure (:pr:`12170`) `Tom Augspurger`_
    - Upstream compat (:pr:`12165`) `Tom Augspurger`_
    - Enforce a few more ruff rules (:pr:`12157`) `Dimitri Papadopoulos Orfanos`_
    - Enforce ruff/refurb rules (FURB) (:pr:`12144`) `Dimitri Papadopoulos Orfanos`_
    - DEP: bump minimal requirement on toolz (0.10.0 -> 0.12.0) (:pr:`12163`) `Clément Robert`_
    - Fix execution stop in da.to_zarr due to (misleading) PerformanceWarning raised as exception (:pr:`12161`) `Marvin Albert`_
    - Use f-string interpolation where possible (:pr:`12140`) `Dimitri Papadopoulos Orfanos`_
    - pre-commit black hook: use implicit defaults (:pr:`12156`) `Dimitri Papadopoulos Orfanos`_
    - Enforce ruff/pygrep-hooks rules (PGH) (:pr:`12143`) `Dimitri Papadopoulos Orfanos`_
    - Apply Repo-Review rules (:pr:`12148`) `Dimitri Papadopoulos Orfanos`_
    - Document groupby: split_every, split_out (:pr:`12135`) `Jayesh Manani`_
    - isort → ruff (:pr:`12149`) `Dimitri Papadopoulos Orfanos`_
    - Enforce ruff/pyupgrade rule UP031 (:pr:`12137`) `Dimitri Papadopoulos Orfanos`_
    - Replace pre-commit hook with ruff rule (:pr:`12142`) `Dimitri Papadopoulos Orfanos`_
    - Fix reify to handle sparse arrays and other objects without len (:pr:`12103`) `Gautham Hullikunte`_
    - Ruff supersedes absolufy-imports (:pr:`12141`) `Dimitri Papadopoulos Orfanos`_
    - Enforce ruff/pyupgrade rule UP032 (:pr:`12136`) `Dimitri Papadopoulos Orfanos`_
    - Typing fixes (:pr-distributed:`9159`) `Jacob Tomlinson`_
    - Explicit setuptools-scm minimum version (:pr-distributed:`9160`) `Jacob Tomlinson`_
    - Enforce ruff rules (RUF) (:pr-distributed:`9153`) `Dimitri Papadopoulos Orfanos`_
    - Clean up MANIFEST.in (:pr-distributed:`9149`) `Dimitri Papadopoulos Orfanos`_
    - isort → ruff (:pr-distributed:`9152`) `Dimitri Papadopoulos Orfanos`_
    - Ruff supersedes absolufy-imports (:pr-distributed:`9154`) `Dimitri Papadopoulos Orfanos`_
    - Bump minimum supported toolz to 0.12.0 (:pr-distributed:`9151`) `James Bourbeau`_
    - flake8, bugbear, pyupgrade → ruff (:pr-distributed:`9147`) `Dimitri Papadopoulos Orfanos`_
    - Fix typos found by codespell (:pr-distributed:`9145`) `Dimitri Papadopoulos Orfanos`_
    - Clean up setuptools-specific configuration (:pr-distributed:`9150`) `Dimitri Papadopoulos Orfanos`_
    - PEP 639 compliance (:pr-distributed:`9146`) `Dimitri Papadopoulos Orfanos`_
    - Update black (:pr-distributed:`9148`) `Dimitri Papadopoulos Orfanos`_
    - Fix empty progress bar (:pr-distributed:`9144`) `Jacob Tomlinson`_
    - Exclude broken tblib versions in CI (:pr-distributed:`9141`) `Jacob Tomlinson`_

.. _v2025.11.0:

2025.11.0
---------

Highlights
^^^^^^^^^^

- Use shard shape when available in to_zarr (:pr:`12105`) `Davis Bennett`_
- Improve worker and nanny support for ipv6 (:pr-distributed:`9133`) `Jianyu Sun`_
- Linting and type hinting improvements across the codebase

.. dropdown:: Additional changes

    - Replace versioneer with setuptools-scm (:pr:`12133`) `Jacob Tomlinson`_
    - Apply ruff/Pylint Refactor rules (PLR) (:pr:`12010`) `Dimitri Papadopoulos Orfanos`_
    - Remove files from MANIFEST.in (:pr:`12041`) `Dimitri Papadopoulos Orfanos`_
    - Stabilize test_filter_nonpartition_columns (:pr:`12131`) `DongWon`_
    - Enforce ruff/pyupgrade rules UP007 and UP033 (:pr:`12125`) `Dimitri Papadopoulos Orfanos`_
    - Update np.accumulate workaround comment (:pr:`12129`) `Jacob Tomlinson`_
    - flake8, bugbear, pyupgrade → ruff (:pr:`12002`) `Dimitri Papadopoulos Orfanos`_
    - Adjust pyarrow version skip in test_parquet (:pr:`12124`) `Tom Augspurger`_
    - Fix ufunc in dask.array.cumreduction (:pr:`12119`) `Tony Ding`_
    - Fix docs footer (:pr:`12120`) `Jacob Tomlinson`_
    - Use integer multiple of shard shape when rechunking in to_zarr (:pr:`12106`) `Davis Bennett`_
    - Ensure that the shard shape is used as the default chunk shape for sharded Zarr arrays (:pr:`12104`) `Davis Bennett`_
    - Skip test_parquet for pyarrow==22.0 (:pr:`12116`) `Tom Augspurger`_
    - Clean up setuptools-specific configuration (:pr:`12040`) `Dimitri Papadopoulos Orfanos`_
    - PEP 639 compliance (:pr:`12024`) `Dimitri Papadopoulos Orfanos`_
    - Fix deprecated quantile interpolation being passed to numpy (:pr:`12108`) `David Hoese`_
    - Add uv.lock to .gitignore (:pr:`12110`) `Jacob Tomlinson`_
    - Use shard shape when available in to_zarr (:pr:`12105`) `Davis Bennett`_
    - Add more optional dependencies to Python 3.13 CI builds (:pr:`12100`) `James Bourbeau`_
    - Remove pip pin for docs (:pr:`12102`) `James Bourbeau`_
    - Address collection-based meta arguments in GroupByApply (:pr:`12099`) `Richard (Rick) Zamora`_
    - Replace versioneer with setuptools-scm (:pr-distributed:`9137`) `Jacob Tomlinson`_
    - Improve worker and nanny support for ipv6 (:pr-distributed:`9133`) `Jianyu Sun`_
    - Fix CI Multiple aliased keys in file /Users/runner/.condarc (:pr-distributed:`9136`) `Jacob Tomlinson`_
    - Remove pip pin for docs (:pr-distributed:`9132`) `James Bourbeau`_
    - Remove UCX configuration schema (:pr-distributed:`9127`) `Peter Andreas Entschev`_
    - Add generic type support to Future and Client methods (:pr-distributed:`9123`) `Simon-Martin Schröder`_

.. _v2025.10.0:

2025.10.0
---------

Highlights
^^^^^^^^^^

Several Dask Array bug fixes, including :pr:`12097`, :pr:`12089`, :pr:`12088`, and :pr:`12090`.

.. dropdown:: Additional changes

    - Use updated docs theme (:pr:`12093`) `Jacob Tomlinson`_
    - Fix: dask.array.cumprod does not deal with dtype (:pr:`12097`) `Tony Ding`_
    - CuPy compatibility for percentile (:pr:`12098`) `Tom Augspurger`_
    - Avoid using methods.concat on empty lists (:pr:`12096`) `Tony Ding`_
    - Add distribution check for optional dependencies (:pr:`12087`) `James Bourbeau`_
    - Fix percentile inconsistencies (:pr:`12088`) `Oisin-M`_
    - Fix warning in test_ufunc_where_no_out (:pr:`12094`) `Tom Augspurger`_
    - Fix/choose trivial case (:pr:`12090`) `Oisin-M`_
    - Add input validation on dask.dataframe.read_sql_query() (:pr:`12091`) `Jacob Tomlinson`_
    - Numpy 2.2 updates for cov function with tests (:pr:`12079`) `Mike McCarty`_
    - Fix nanvar (:pr:`12089`) `Oisin-M`_
    - Document manually triggering the conda-forge bots (:pr:`12083`) `Jacob Tomlinson`_
    - Fix mixed HLG/Expr handling in _ExprSequence._simplify_down (:pr:`12081`) `Richard (Rick) Zamora`_
    - Add dask.tokenize to API docs (:pr:`12080`) `Username46786`_
    - CreateOverlappingPartitions: Add before and after to prepend name (:pr:`11965`) `Fabien Aulaire`_
    - Fix scipy.sparse.csc_matrix scalar declaration in _array_like_safe (:pr:`12078`) `Ilan Gold`_
    - Update docs theme and remove docs env pins (:pr-distributed:`9125`) `Jacob Tomlinson`_
    - Add worker name as prefix to ThreadPoolExecutor name (:pr-distributed:`9120`) `Maneesh Sutar`_
    - Skip hanging SSH tests on Windows (:pr-distributed:`9115`) `Jacob Tomlinson`_
    - Fix macOS CI failure during job startup (:pr-distributed:`9113`) `Jacob Tomlinson`_
    - Prevent task stream dashboard showing 1970 date (:pr-distributed:`9109`) `Guillaume Eynard-Bontemps`_

.. _v2025.9.2:

2025.9.2
--------

This is a backport security release only.

See `CVE-2026-23528 <https://github.com/dask/distributed/security/advisories/GHSA-c336-7962-wfj2>`_ for more details.

.. _v2025.9.1:

2025.9.1
--------

Highlights
^^^^^^^^^^

- Avoid unconditional pyarrow dependency in dataframe.backends (:pr:`12075`) `Tom Augspurger`_
- pandas 3.x compatibility for .groups (:pr:`12071`) `Tom Augspurger`_

.. dropdown:: Additional changes

    - Avoid unconditional pyarrow dependency in dataframe.backends (:pr:`12075`) `Tom Augspurger`_
    - pandas 3.x compatibility for .groups (:pr:`12071`) `Tom Augspurger`_
    - Expose details about worker start timeout in the exception message (:pr-distributed:`9092`) `Taylor Braun-Jones`_
    - pynvml => nvidia-ml-py in CI (:pr-distributed:`9111`) `Jacob Tomlinson`_

.. _v2025.9.0:

2025.9.0
--------

Highlights
^^^^^^^^^^

- pandas 3.x compatibility (:pr:`12025`) `Tom Augspurger`_
- Remove protocol="ucx" support in favor of distributed-ucxx (:pr-distributed:`9105`) `Peter Andreas Entschev`_

.. dropdown:: Additional changes

    - Fix 0 scalar setting for scipy.sparse (:pr:`12027`) `Ilan Gold`_
    - Workaround failing upstream-dev tests (:pr:`12061`) `Tom Augspurger`_
    - avoid instantiating a potentially very large arange in take (:pr:`11998`) `Justus Magin`_
    - MAINT: address NumPy deprecation in np.minimum (:pr:`12059`) `Marco Edward Gorelli`_
    - CI fixes (:pr:`12058`) `Tom Augspurger`_
    - MAINT: Address NumPy DeprecationWarning (:pr:`12056`) `Marco Edward Gorelli`_
    - Fix test_enforce_columns on Python 3.14 (:pr:`12047`) `Elliott Sales de Andrade`_
    - Fix "th" --> "the" typo in DataFrame SQL docs (:pr:`12038`) `Peter A. Jonsson`_
    - Advance rng state in permutation (:pr:`12031`) `James Bourbeau`_
    - Fix pyarrow chunked array conversion (:pr:`12034`) `James Bourbeau`_
    - Fix xfail condition for pyarrow large_string issue (:pr:`12032`) `James Bourbeau`_
    - pandas 3.x compatibility (:pr:`12025`) `Tom Augspurger`_
    - Fix name not propagated correctly in map_blocks (:pr:`11952`) `Ilan Gold`_
    - Clean tuples dict keys from workers_info in /api/v1/retire_workers. (:pr-distributed:`8996`) `Florian Courtial`_
    - Remove protocol="ucx" support in favor of distributed-ucxx (:pr-distributed:`9105`) `Peter Andreas Entschev`_

.. _v2025.7.0:

2025.7.0
--------

Highlights
^^^^^^^^^^

- Account for ``__main__`` in pickle normalization (:pr:`11970`) `James Bourbeau`_
- Enable column projection in MapPartitions (:pr:`11875`) `Richard (Rick) Zamora`_
- Add config option for direct-to-workers (:pr-distributed:`9097`) `James Bourbeau`_

.. dropdown:: Additional changes

    - CI: update actions location (:pr:`12019`) `Brigitta Sipőcz`_
    - Apply ruff/flake8-comprehensions rules (C4) (:pr:`12004`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/flake8-pie rules (PIE) (:pr:`12006`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/Pylint Error rules (PLE) (:pr:`12013`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/Pylint Convention rules (PLC) (:pr:`12012`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/flake8-pyi rules (PYI) (:pr:`12007`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/flake8-simplify rules (SIM) (:pr:`12008`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/Pylint Warning rules (PLW) (:pr:`12011`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/flake8-implicit-str-concat rules (ISC) (:pr:`12005`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/pycodestyle rule E714 (:pr:`12000`) `Dimitri Papadopoulos Orfanos`_
    - Fix typos found by codespell (:pr:`12001`) `Dimitri Papadopoulos Orfanos`_
    - Update PyPI URL for official nightly pyarrow repository (:pr:`11996`) `Raúl Cumplido`_
    - Fall-back to textual repr in case jinja2 is not installed (:pr:`11987`) `Lukas Bindreiter`_
    - Prevent builtins.any from being shadowed in dask.array.reductions (:pr:`11988`) `Marvin Albert`_
    - Bump conda-incubator/setup-miniconda from 3.1.1 to 3.2.0 (:pr:`11982`)
    - Skip groupby cov test for pandas 3.x (:pr:`11977`) `Tom Augspurger`_
    - Fix upstream CI installation (:pr:`11976`) `James Bourbeau`_
    - Make module name logic more resilient in Dispatch (:pr:`11974`) `James Bourbeau`_
    - Ensure memray profiler runs on all workers (:pr-distributed:`9095`) `James Bourbeau`_
    - Update def to class typo in actors docs (:pr-distributed:`9091`) `Peter Fackeldey`_
    - Bump conda-incubator/setup-miniconda from 3.1.1 to 3.2.0 (:pr-distributed:`9090`)
    - Update persist in tests for async clients (:pr-distributed:`9089`) `Tom Augspurger`_
    - Fix pyarrow FileInfo import (:pr-distributed:`9078`) `James Bourbeau`_
    - Make module name logic more resilient in _always_use_pickle_for (:pr-distributed:`9086`) `James Bourbeau`_
    - Temporarily pin pytest in CI to avoid coverage error (:pr-distributed:`9088`) `James Bourbeau`_
    - Remove s3fs from testing CI environment (:pr-distributed:`9087`) `James Bourbeau`_
    - Reuse Comm objects in Scheduler.broadcast (:pr-distributed:`9083`) `Tom Augspurger`_
    - Fix test_resubmit_nondeterministic_task_different_deps (:pr-distributed:`9085`) `James Bourbeau`_

.. _v2025.5.1:

2025.5.1
--------

Highlights
^^^^^^^^^^

Fixed a Dask Array slicing regression introduced in the 2025.5.0 release. See :pr:`11947` from `Florian Jetter`_ for more details.

.. dropdown:: Additional changes

    - Speed up slicing graph generation (:pr:`11945`) `Florian Jetter`_
    - Revert "Don't handle tuple in task_spec.parse_input" (:pr:`11953`) `Florian Jetter`_
    - Optimize slicing graph generation (:pr:`11946`) `Florian Jetter`_
    - Fix xarray slicing regression (:pr:`11947`) `Florian Jetter`_
    - Don't handle tuple in task_spec.parse_input (:pr:`11948`) `Florian Jetter`_

.. _v2025.5.0:

2025.5.0
--------

Highlights
^^^^^^^^^^

- Fixed Array setitem when both the array and the indexer have unknown shape. See :pr:`11753` from `Tom Augspurger`_ for more details.
- Fixed several delayed graph handling issues introduced in the 2025.4.0 release. See :pr:`11917`, :pr:`11907`, and :pr-distributed:`9071` from `Florian Jetter`_ for more details.

.. dropdown:: Additional changes

    - Speed up slicing graph generation (:pr:`11945`) `Florian Jetter`_
    - Optimize dask order for worst case of get_target (:pr:`11935`) `Florian Jetter`_
    - Raise on local executor if tasks are missing dependency (:pr:`11944`) `Florian Jetter`_
    - Fix to_dask_array for single partition (:pr:`11931`) `James Bourbeau`_
    - Ensure parquet plan is fully cached during optimization (:pr:`11933`) `Florian Jetter`_
    - Better documentation for expression system (:pr:`11915`) `Florian Jetter`_
    - Simplify (and speed up) culling (:pr:`11899`) `Florian Jetter`_
    - Update pre-commit (:pr:`11926`) `Florian Jetter`_
    - Don't run post setup-miniconda step in CI (:pr:`11925`) `James Bourbeau`_
    - Try to pin pip for readthedocs (:pr:`11923`) `Florian Jetter`_
    - Fix windows CI (:pr:`11919`) `Florian Jetter`_
    - Use stable crick for py310 (:pr-distributed:`9072`) `Florian Jetter`_
    - Remove internal dependencies mapping in update_graph (:pr-distributed:`9036`) `Florian Jetter`_
    - Partially forgotten dependencies (:pr-distributed:`9068`) `Florian Jetter`_
    - Replace filesystem-spec in CI environment with fsspec (:pr-distributed:`9069`) `James Bourbeau`_
    - Ensure actors set erred state properly in case of worker failure (:pr-distributed:`9067`) `Florian Jetter`_
    - Refactor timeouts in start cluster (:pr-distributed:`9062`) `Florian Jetter`_
    - Fix workers / threads / memory displayed in client repr (:pr-distributed:`9066`) `James Bourbeau`_
    - Pin pip for readthedocs (:pr-distributed:`9063`) `Florian Jetter`_
    - Skip TLS functional tests (:pr-distributed:`9061`) `Florian Jetter`_
    - Ensure client submit does not serialize unnecessarily (:pr-distributed:`9057`) `Florian Jetter`_

.. _v2025.4.1:

2025.4.1
--------

Highlights
^^^^^^^^^^

This release contains several graph optimization fixes for issues introduced in the 2025.4.0 release.

See :pr:`11906`, :pr:`11898`, :pr:`11903`, and :pr:`11904` by `Florian Jetter`_ for more details.

.. dropdown:: Additional changes

    - Implement ufuncs and gufunc for array-expr (:pr:`11818`) `Patrick Hoefler`_
    - Implement map_overlap for array-expr (:pr:`11822`) `Patrick Hoefler`_

.. _v2025.4.0:

2025.4.0
--------

Highlights
^^^^^^^^^^

- When multiple Dask-Expr-backed collections, such as DataFrames, are computed together, they are now optimized together instead of individually.
- Graph materialization and low-level optimization are now performed on the scheduler of a distributed cluster (if available).
- New ``force`` keyword argument for ``DataFrame.shuffle``, which signals the optimizer not to drop the shuffle during optimization.
- Collections that are passed to Dask methods as arguments are now properly optimized. If multiple collections are passed as arguments, they are optimized together. Collections passed this way are prohibited from being reused: if the same collection is used again in another function call, it will be computed again. This pattern avoids pipeline breakers, which typically drive memory usage. Avoiding them should reduce memory pressure on the cluster but can cause runtime regressions.
- As a special case of the previous point, collections passed to ``Delayed`` objects are now optimized automatically.

Breaking changes
^^^^^^^^^^^^^^^^

- Support for custom low-level optimizers has been removed.
- The top-level ``dask.optimize`` now always triggers graph materialization; previously this was not always the case. As a consequence, any low-level HLG annotations are dropped.
- DataFrame and Array compute results are now always concatenated on the cluster. Previously, the behavior depended on the API used to call compute (``dask.compute``, ``DaskCollection.compute``, or ``Client.compute``).
- ``dask.base.collections_to_dsk`` has been renamed to ``collections_to_expr``. It no longer returns a ``HighLevelGraph`` or ``dict`` object but instead guarantees a ``dask._expr.Expr`` object. Further, it no longer performs low-level optimization immediately but delays it until the ``Expr`` instance is materialized. The returned object is therefore no longer a mapping, so converting it to a ``dict`` or iterating over it is no longer possible.

.. dropdown:: Additional changes

    - Ensure Future value is in da.from_delayed task graph (:pr:`11896`) `Tom Augspurger`_
    - Fix annotations passed to delayed (:pr:`11893`) `Florian Jetter`_
    - Migrate delayed unpack_collections (:pr:`11881`) `Florian Jetter`_
    - Remove Pub / Sub references from docs (:pr:`11891`) `James Bourbeau`_
    - Ensure only classes without custom init are singletons (:pr:`11886`) `Florian Jetter`_
    - Remove custom initializers for delayed expressions (:pr:`11888`) `Florian Jetter`_
    - Fix persisting multiple DFs at the same time (:pr:`11887`) `Florian Jetter`_
    - Avoid always parsing list inputs to DataFrame.isin as object type numpy arrays (:pr:`11869`) `Matthew Roeschke`_
    - Unskip pandas-dev cov / corr tests (:pr:`11873`) `Tom Augspurger`_
    - HLG blockwise fix (:pr:`11871`) `Florian Jetter`_
    - Ensure annotations for HLG objects are properly generated (:pr:`11866`) `Florian Jetter`_
    - Factor out singleton logic from base Expr class (:pr:`11868`) `Florian Jetter`_
    - Ensure HLGs are using dependencies properly in optimization (:pr:`11859`) `Florian Jetter`_
    - Ensure dictionaries tokenize deterministically (:pr:`11867`) `Florian Jetter`_
    - Ensure default dask scheduler only computes what's needed (:pr:`11861`) `Florian Jetter`_
    - Faster tokenization of pd.RangeIndex (:pr:`11863`) `Florian Jetter`_
    - Update link to Quansight in community doc (:pr:`11860`) `Pavithra Eswaramoorthy`_
    - Relax tolerance in autocorr test (:pr:`11857`) `Tom Augspurger`_
    - Use map_blocks in array.store to avoid materialization and dropping of annotations (:pr:`11844`) `Florian Jetter`_
    - Ensure repartition does not trigger memory size computation during lowering (i.e. on the scheduler) (:pr:`11855`) `Florian Jetter`_
    - Support args and kwargs for rolling aggregations (:pr:`11856`) `Florian Jetter`_
    - Remove nightly h5py from upstream CI job (:pr:`11847`) `James Bourbeau`_
    - Ensure HLGExpr tokenize uniquely (:pr:`11849`) `Florian Jetter`_
    - Do not inject median in describe for pandas 3 (:pr:`11846`) `Florian Jetter`_
    - Fixed ``Expr.__setattr__`` for subclasses (:pr:`11845`) `Tom Augspurger`_
    - Wrap HLGs in an Expr to avoid Client side materialization (:pr:`11736`) `Florian Jetter`_
    - Improve error when submitting work from a closed client (:pr-distributed:`9049`) `James Bourbeau`_
    - Return a default value if address resolution fails (:pr-distributed:`9051`) `Sandro`_
    - Avoid deepcopy when submitting graph (:pr-distributed:`8633`) `Florian Jetter`_
    - Dynamically scale heartbeat and scheduler_info intervals (:pr-distributed:`9046`) `Florian Jetter`_
    - Speed up process startup time by avoiding importing packages on version check (:pr-distributed:`9048`) `Florian Jetter`_
    - Reduce size of scheduler_info (:pr-distributed:`9045`) `Florian Jetter`_
    - Cache WorkerState host property (:pr-distributed:`9044`) `Florian Jetter`_
    - Clear ci env cache (:pr-distributed:`9047`) `Florian Jetter`_
    - Remove deprecated Pub / Sub (:pr-distributed:`9039`) `Florian Jetter`_
    - Perform explicit culling step only if LLG is submitted (:pr-distributed:`9040`) `Florian Jetter`_
    - Do not fully materialize global annotations by type (:pr-distributed:`9035`) `Florian Jetter`_
    - Allow nested worker_client calls (:pr-distributed:`9038`) `George Sakkis`_
    - Dump ci cache (:pr-distributed:`9037`) `Florian Jetter`_
    - Scheduler type annotations (:pr-distributed:`9030`) `Florian Jetter`_
    - Reduce dask.order overhead by removing stripped_dep computation (:pr-distributed:`9031`) `Florian Jetter`_
    - Use Expr instead of HLG (:pr-distributed:`9008`) `Florian Jetter`_

.. _v2025.3.0:

2025.3.0
--------

Highlights
^^^^^^^^^^

Automatically adjust chunksizes in xarray.apply_ufunc
"""""""""""""""""""""""""""""""""""""""""""""""""""""

``apply_ufunc`` requires the core dimension to have ``chunksize=-1``. The underlying rechunking operation automatically adjusts the chunksize of the core dimension but keeps the other dimensions the same, which can cause chunksizes to explode under the hood.

This release adds an intermediate step that resizes the non-core dimensions by the same factor by which the core dimension grows, keeping the maximum chunksize under control. This behavior is enabled automatically when ``allow_rechunk=True`` is set.

.. code-block:: python

    import xarray as xr
    import dask.array as da

    arr = xr.DataArray(
        da.random.random((1, 750, 45910), chunks=(1, "auto", -1)),
        dims=["band", "y", "x"],
    )

    result = arr.interp(
        y=arr.coords["y"],
        method="linear",
    )

.. grid:: 2

    .. grid-item:: **Previously**

        Individual chunks are exploding to 25 GiB, likely causing out-of-memory errors.

        .. image:: images/changelog/gufunc_chunksizes_exploding.png
            :width: 100%
            :align: center
            :alt: Individual chunks are exploding to 25 GiB, likely causing out-of-memory errors.

    .. grid-item:: **Now**

        Dask now automatically splits individual chunks so that the resulting chunks
        keep the same chunksize, within a small tolerance.

        .. image:: images/changelog/gufunc_chunksizes_constant.png
            :width: 100%
            :align: center
            :alt: Individual chunks are now roughly the same size
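
The resizing idea can be illustrated with plain chunk-shape arithmetic. The sketch below is a simplified model and not Dask's implementation. The helper ``rebalance_chunks`` is hypothetical: it treats chunk shapes as tuples of integers and spreads the growth of the core dimension evenly over the non-core axes whose chunks are larger than one element.

.. code-block:: python

    import math


    def rebalance_chunks(chunks, shape, core_axis):
        """Hypothetical helper (not part of Dask): put ``core_axis`` into a
        single chunk and shrink the other axes so that each chunk holds about
        as many elements as before."""
        # Factor by which the core-dimension chunk grows when it spans the full axis.
        growth = shape[core_axis] / chunks[core_axis]
        # Only non-core axes with chunks larger than one element can be shrunk.
        shrinkable = [ax for ax, c in enumerate(chunks) if ax != core_axis and c > 1]
        # Spread the growth evenly so the per-axis shrink factors multiply to `growth`.
        per_axis = growth ** (1 / len(shrinkable)) if shrinkable else 1.0

        new = list(chunks)
        new[core_axis] = shape[core_axis]
        for ax in shrinkable:
            new[ax] = max(1, math.floor(chunks[ax] / per_axis))
        return tuple(new)


    shape = (1, 8000, 4000)
    chunks = (1, 1000, 500)
    naive = (1, 1000, 4000)  # only the core axis (axis 2) is rechunked
    balanced = rebalance_chunks(chunks, shape, core_axis=2)

    print(balanced)  # (1, 125, 4000)
    print(math.prod(naive) // math.prod(chunks))  # 8: naive chunks are 8x larger
    print(math.prod(balanced) == math.prod(chunks))  # True

Rounding down can make the rebalanced chunks slightly smaller than the originals. When no non-core axis can be shrunk, the chunk necessarily grows.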

.. dropdown:: Additional changes

    - Fix dataset info cache assignment (:pr:`11840`) `Florian Jetter`_
    - Expr setattr (:pr:`11836`) `Florian Jetter`_
    - Follow up to expression tokenization caching (:pr:`11837`) `Florian Jetter`_
    - Consolidate getattr for expr classes (:pr:`11835`) `Florian Jetter`_
    - Reduce pickle size of ReadParquet expression (:pr:`11797`) `Florian Jetter`_
    - arange loses precision on ``~2**63`` (:pr:`11801`) `Guido Imperiale`_
    - Remove numbagg from upstream build (:pr:`11821`) `Patrick Hoefler`_
    - Dispatch to numbagg for nanmedian and nanquantile (:pr:`11817`) `Patrick Hoefler`_
    - Make missing meta warning more ergonomic (:pr:`11814`) `Patrick Hoefler`_
    - Remove name doc from from_pandas (:pr:`11812`) `Patrick Hoefler`_
    - Implement an Array Scalar (:pr:`11810`) `Patrick Hoefler`_
    - Added to_orc to DataFrame API (:pr:`11807`) `Tom Augspurger`_
    - Implement reverse indexing for DataFrames (:pr:`11803`) `Patrick Hoefler`_
    - Add lazy to_pandas_dispatch registration for cudf (:pr:`11799`) `Richard (Rick) Zamora`_
    - Fix missing imports in array-expr (:pr:`11796`) `Florian Jetter`_
    - Cache tokens on expressions and restore after pickle roundtrip (:pr:`11791`) `Florian Jetter`_
    - Use random dashboard ports for LocalCluster in distributed tests (:pr:`11795`) `Florian Jetter`_
    - Implement slicing for array-expr (:pr:`11783`) `Patrick Hoefler`_
    - Never use an asynchronous Client when calling top level compute function (:pr:`11790`) `Florian Jetter`_
    - Refactor import tests (:pr:`11794`) `Florian Jetter`_
    - Migrate base.unpack_collections to Task class (:pr:`11793`) `Florian Jetter`_
    - Ensure map_blocks generates unique tokens (:pr:`11792`) `Florian Jetter`_
    - Speed up normalize_pickle by 50 percent (:pr:`11788`) `Florian Jetter`_
    - Fix divisions calculation with duplicates (:pr:`11787`) `Patrick Hoefler`_
    - Fix assign align for duplicated divisions (:pr:`11786`) `Patrick Hoefler`_
    - Ensure concat optimize project does not raise (:pr:`11784`) `Florian Jetter`_
    - Add array-expr from_array (:pr:`11772`) `Patrick Hoefler`_
    - Keep chunksizes consistent in apply_gufunc (:pr:`11683`) `Patrick Hoefler`_
    - Test ``dask.dataframe.__all__`` (:pr:`11782`) `Philipp A.`_
    - Add ``__all__`` to dask.bag (:pr:`11781`) `Philipp A.`_
    - Add test for ``dask.array.__all__`` (:pr:`11780`) `Philipp A.`_
    - Bump JamesIves/github-pages-deploy-action from 4.7.2 to 4.7.3 (:pr:`11777`)
    - Export dask.array members (:pr:`11779`) `Philipp A.`_
    - Fix sorted_divisions_locations with duplicates (:pr:`11773`) `Tom Augspurger`_
    - Fix small typo in best-practices.rst (:pr:`11775`) `Sergey Kolesnikov`_
    - Allow unknown chunks in blockwise adjust_chunks (:pr:`11769`) `Lindsey Gray`_
    - Fix crash in asarray(..., like=...) vs. scipy.sparse objects (:pr:`11755`) `Guido Imperiale`_
    - Remove flaky optional dependency (:pr:`11771`) `Tom Augspurger`_
    - Add support for scipy sparray (:pr:`11750`) `Philipp A.`_
    - Added flaky to tests extra (:pr:`11770`) `Tom Augspurger`_
    - Ensure divisions are plain scalars (:pr:`11767`) `Tom Augspurger`_
    - Remove divisions code duplication (:pr:`11764`) `Florian Jetter`_
    - Ensure divisions not diverging from npartitions in Merge (:pr:`11762`) `Florian Jetter`_
    - Skip test_visualize_int_overflow on windows (:pr:`11761`) `Florian Jetter`_
    - Reduce pickle size for tasks (:pr:`11687`) `Florian Jetter`_
    - Implement unify_chunks and Rechunk (:pr:`11692`) `Patrick Hoefler`_
    - Fix expression getitem to avoid alignment (:pr:`11760`) `Patrick Hoefler`_
    - arange(..., like=x) embeds the graph of x (:pr:`11754`) `Guido Imperiale`_
    - Simplify assert_divisions (:pr:`11745`) `Florian Jetter`_
    - Fix Projection logic for Series objects (:pr:`11747`) `Patrick Hoefler`_
    - Remove bytes as keys (:pr:`11757`) `Florian Jetter`_
    - Ensure map_partitions returns Series object if function returns scalar (:pr:`11756`) `Florian Jetter`_
    - Don't upload env twice (:pr:`11748`) `Patrick Hoefler`_
    - Fix badges in readme (:pr-distributed:`9029`) `Florian Jetter`_
    - Properly forward cancellation reason (:pr-distributed:`9028`) `Florian Jetter`_
    - Fix bokeh circle (:pr-distributed:`9026`) `Florian Jetter`_
    - Ensure FileInfo can be serialized (:pr-distributed:`9025`) `Florian Jetter`_
    - Add ipykernel to skipped modules in code sampling (:pr-distributed:`9022`) `Matthew Rocklin`_
    - SpecCluster: add option to not shut down the scheduler when the cluster is closed (:pr-distributed:`9021`) `Taylor Braun-Jones`_
    - Fix CI by using client.persist(collection) instead of collection.persist() (:pr-distributed:`9020`) `Hendrik Makait`_
    - Add redirect from prefix root to status (:pr-distributed:`9015`) `Isaac`_
    - Bump JamesIves/github-pages-deploy-action from 4.7.2 to 4.7.3 (:pr-distributed:`9018`)
    - Remove bytes keys from tests (:pr-distributed:`9017`) `Jacob Tomlinson`_

.. _v2025.2.0:

2025.2.0

Highlights
^^^^^^^^^^

This release includes a critical fix for a deadlock that can arise when seceded tasks are rescheduled, or cancelled and resubmitted, e.g. due to a worker being lost.

See :pr-distributed:8991 by Hendrik Makait_ for more details.

.. dropdown:: Additional changes

Add big array example (:pr:11744) James Bourbeau_
Fix exploding chunksizes in pad for constant padding (:pr:11743) Patrick Hoefler_
Move optimize method to base class (:pr:11742) Florian Jetter_
Add changelog entry for fixed deadlock (:pr:11741) Hendrik Makait_
Fix graph creation in dask-expr to_delayed (:pr:11739) Patrick Hoefler_
Remove culling from delayed optimisation (:pr:11737) Patrick Hoefler_
Compute meta for from_map on the cluster (:pr:11738) Patrick Hoefler_
Bugs in ``__setitem__`` with dask bool mask (:pr:11728) Guido Imperiale_
Implement infrastructure, random, blockwise and Elemwise (:pr:11689) Patrick Hoefler_
array / asarray with both like= and dtype= (:pr:11733) Guido Imperiale_
Fix annotations warnings test (:pr:11734) Patrick Hoefler_
Catch warnings when writing to remote storage with to_parquet (:pr:11731) Patrick Hoefler_
Remove LocalCluster from tests (:pr:11729) Patrick Hoefler_
Fix partition pruning when using from_array (:pr:11725) Patrick Hoefler_
Fix concatenation with mixed dtype columns (:pr:11727) Patrick Hoefler_
arange: fix extreme values (:pr:11707) Guido Imperiale_
Graph corruption on scalar getitem -> setitem (:pr:11723) Guido Imperiale_
Never share buffers after compute() (:pr:11697) Guido Imperiale_
Extract Dask Array from xarray DataArray in from_array (:pr:11712) Patrick Hoefler_
arange: support kwargs (:pr:11710) Guido Imperiale_
Ensure normalize_token is threadsafe (:pr:11709) Florian Jetter_
Expand advice for instance types and processes (:pr:11705) Florian Jetter_
Drop legacy timeseries implementation (:pr:11704) Florian Jetter_
Update Dask Cloud Provider documentation to include Nebius as a supported cloud option (:pr:11703) Alexander_
Fix normalize_chunks when squashing into a single chunk (:pr:11702) Patrick Hoefler_
Fix positional indexing with newaxis (:pr:11699) Patrick Hoefler_
Set array backend in scipy-sparse-indexing (:pr:11700) Tom Augspurger_
Fix value_counts shuffling strategy (:pr:11698) Patrick Hoefler_
Disentangle core expression class from dataframe specific code (:pr:11688) Patrick Hoefler_
Bump conda-incubator/setup-miniconda from 3.1.0 to 3.1.1 (:pr:11685)
Fixup dataframe conversion from array methods (:pr:11684) Patrick Hoefler_
Remove remaining artifacts of fastparquet (:pr:11682) Patrick Hoefler_
Remove traceback from sizeof failure warning (:pr-distributed:9006) Jacob Tomlinson_
Hotfix: Ignore negative occupancy (:pr-distributed:9012) Hendrik Makait_
Remove expensive tokenization for key uniqueness check (:pr-distributed:9009) Patrick Hoefler_
Fix CI for changes in from_map (:pr-distributed:9011) Patrick Hoefler_
Avoid handling stale long-running messages on scheduler (:pr-distributed:8991) Hendrik Makait_
Bump test_stress timeout (:pr-distributed:9002) Tom Augspurger_
Poll in test_rmm_metrics test (:pr-distributed:9004) Tom Augspurger_
Cache occupancy in WorkStealing.balance() (:pr-distributed:9005) Hendrik Makait_
Homogeneous balancing by accounting for in-flight requests (:pr-distributed:9003) Hendrik Makait_
Consistent estimation of task duration between stealing, adaptive and occupancy calculation (:pr-distributed:9000) Hendrik Makait_
Increase default work-stealing interval by 10x (:pr-distributed:8997) Hendrik Makait_
Remove occupancy plot from status dashboard (:pr-distributed:8995) Hendrik Makait_
Bump conda-incubator/setup-miniconda from 3.1.0 to 3.1.1 (:pr-distributed:8990)

.. _v2025.1.0:

2025.1.0

Highlights
^^^^^^^^^^

Legacy Dask DataFrame Implementation removed
""""""""""""""""""""""""""""""""""""""""""""

This release drops the legacy Dask DataFrame implementation. The API with query planning is now the only available Dask DataFrame implementation.

This enforces the deprecation of the configuration:

.. code-block::

    import dask

    dask.config.set({"dataframe.query-planning": False})

Dask-Expr has been merged into both the dask package and the dask/dask repository. It is no longer necessary to install dask-expr separately.

Reducing Memory Pressure for Xarray Workloads
"""""""""""""""""""""""""""""""""""""""""""""

In 2022, Dask introduced a mechanism called `root task queuing <https://distributed.dask.org/en/stable/scheduling-policies.html#queuing>`_. This mechanism allows Dask to detect tasks that read data from storage and to schedule them defensively, so that overproduction of these tasks does not create memory pressure on the cluster. The underlying detection was very fragile and failed for specific types of computations, such as opening multiple Zarr stores or loading a large number of netCDF files.

Recent changes to Dask's task graph representation allow for more robust detection of root tasks. The detection is now independent of the workload being run, which is especially beneficial for Xarray workloads.

This significantly improves memory stability and reduces the memory footprint of workloads where root task detection previously failed. It also makes the expected memory profile deterministic and independent of the topology of the task graph.

.. _v2024.12.1:

2024.12.1

Highlights
^^^^^^^^^^

Improved scheduler responsiveness for large task graphs
"""""""""""""""""""""""""""""""""""""""""""""""""""""""

This release reduces the number of Python object references the Dask scheduler uses to track tasks. This increases scheduler responsiveness by reducing the time the scheduler spends on garbage collection.

See :issue:8958, :pr:11608, :pr:11600, :pr:11598, :pr:11597, and :pr-distributed:8963 from Hendrik Makait_ for more details.

.. dropdown:: Additional changes

Fix map_overlap bug where rechunking and trim=False caused inconsistent chunkings (:pr:11605) Patrick Hoefler_
Avoid legacy implementation in read-csv (:pr:11603) Patrick Hoefler_
Remove legacy DataFrame import (:pr:11604) Patrick Hoefler_
asarray ignores dtype for array inputs (:pr:11586) crusaderky_
Add back LLM chatbot to Dask docs (:pr:11594) dchudz_
Bump JamesIves/github-pages-deploy-action from 4.6.9 to 4.7.2 (:pr:11593)
Migrate dask array creation routines to task spec (:pr:11582) James Bourbeau_
Migrate most of dask array random to task spec (:pr:11581) James Bourbeau_
Do not use local function in array.push (:pr:11576) Florian Jetter_
Bump conda-incubator/setup-miniconda from 3.0.3 to 3.1.0 (:pr-distributed:8922)
Pick random dashboard port in tests (:pr-distributed:8965) Hendrik Makait_
Fix formatting for NoValidWorkerException message (:pr-distributed:8967) Hendrik Makait_
Support pynvml>=11.5 in WSL (:pr-distributed:8962) Richard (Rick) Zamora_
Bump JamesIves/github-pages-deploy-action from 4.6.9 to 4.7.2 (:pr-distributed:8960)

.. _v2024.12.0:

2024.12.0

Highlights
^^^^^^^^^^

Python 3.13 Support
"""""""""""""""""""

This release adds support for Python 3.13. Dask now supports Python 3.10-3.13.

See :pr:11456 and :pr-distributed:8904 from Patrick Hoefler_ and James Bourbeau_ for more details.

.. dropdown:: Additional changes

Revert "Add LLM chatbot to Dask docs (:pr:11556)" (:pr:11577) dchudz_
Automatically rechunk if array in to_zarr has irregular chunks (:pr:11553) Patrick Hoefler_
Blockwise uses Task class (:pr:11568) Florian Jetter_
Migrate rechunk and reshape to task spec (:pr:11555) Patrick Hoefler_
Cache svg-representation for arrays (:pr:11560) Deepak Cherian_
Fix empty input for containers (:pr:11571) Florian Jetter_
Convert Bag graphs to TaskSpec graphs during optimization (:pr:11569) Florian Jetter_
Add LLM chatbot to Dask docs (:pr:11556) dchudz_
Fuse data nodes in linear fusion too (:pr:11549) Patrick Hoefler_
Migrate slicing code to task spec (:pr:11548) Patrick Hoefler_
Speed up ArraySliceDep tokenization (:pr:11551) Patrick Hoefler_
Fix fusing of p2p barrier tasks (:pr:11543) Patrick Hoefler_
Remove infra/mentions of GPU CI (:pr:11546) Charles Blackmon-Luca_
Temporarily disable gpuCI update CI job (:pr:11545) James Bourbeau_
Use BlockwiseDep to implement map_blocks keywords (:pr:11542) Patrick Hoefler_
Remove optimize_slices (:pr:11538) Patrick Hoefler_
Make reshape_blockwise a noop if shape is the same (:pr:11541) Patrick Hoefler_
Remove read-only flag from open_array in open_zarr (:pr:11539) Patrick Hoefler_
Implement linear_fusion for task spec class (:pr:11525) Patrick Hoefler_
Remove recursion from TaskSpec (:pr:11477) Florian Jetter_
Fixup test after dask-expr change (:pr:11536) Patrick Hoefler_
Bump codecov/codecov-action from 3 to 5 (:pr:11532)
Create dask-expr frame directly without roundtripping (:pr:11529) Patrick Hoefler_
Add scikit-image nightly back to upstream CI (:pr:11530) James Bourbeau_
Remove from_dask_dataframe import (:pr:11528) Patrick Hoefler_
Ensure that from_array creates a copy (:pr:11524) Patrick Hoefler_
Simplify and improve performance of normalize chunks (:pr:11521) Patrick Hoefler_
Fix flaky nanquantile test (:pr:11518) Patrick Hoefler_
Fix tests for new read_only kwarg in zarr=3 (:pr:11516) Patrick Hoefler_
Fix test_jupyter.py::test_shutsdown_cleanly (:pr-distributed:8954) Hendrik Makait_
Install tornado from conda-forge in Python 3.13 CI (:pr-distributed:8951) James Bourbeau_
Restore retire workers API (:pr-distributed:8939) Florian Jetter_
Properly convert finalize dependencies to references (:pr-distributed:8949) Hendrik Makait_
Block fusion for barrier tasks (:pr-distributed:8944) Patrick Hoefler_
Remove infra/mentions of GPUCI (:pr-distributed:8946) Charles Blackmon-Luca_
Temporarily disable gpuCI update CI job (:pr-distributed:8945) James Bourbeau_
Remove recursion in task spec (:pr-distributed:8920) Florian Jetter_
Less verbose log messages for remove and register worker (:pr-distributed:8938) Florian Jetter_
Do not log full worker info in retire_workers (:pr-distributed:8935) Florian Jetter_

.. _v2024.11.2:

2024.11.2

.. note:: Versions 2024.11.0 and 2024.11.1 included a critical performance regression and should be skipped by every user.

Highlights
^^^^^^^^^^

Legacy Dask DataFrame Deprecated
""""""""""""""""""""""""""""""""

This release deprecates the legacy Dask DataFrame implementation. The old implementation will be removed completely in a future release. Users are encouraged to switch to the new implementation now and to report any issues they encounter.

Users are also encouraged to check that they only import functions from dask.dataframe and not from any of its submodules.

New quantile methods for Dask Array API
"""""""""""""""""""""""""""""""""""""""

Dask Array added new quantile and nanquantile methods. Previously, Dask dispatched to the NumPy implementation, which held the GIL for long stretches. This caused large slowdowns on workers with more than one thread and could lead to runtimes of over 200s per chunk.

The new quantile implementation avoids many of these problems and reduces the runtime to around 1s per chunk, independent of the number of threads.

Consistent chunksize in Xarray rolling-construct
""""""""""""""""""""""""""""""""""""""""""""""""

Using Xarray's rolling(...).construct(...) with Dask Arrays led to very large chunksizes that rarely fit into memory on a single worker.

The underlying operation returns a view on the smaller NumPy array, but any operation that triggers a copy of the data leads to very large memory usage.

.. code-block::

    import xarray as xr
    import dask.array as da

    arr = xr.DataArray(
        da.ones((93504, 721, 1440), chunks=("auto", -1, -1)),
        dims=["time", "lat", "longitude"],
    )   # Initial chunks are ~128 MiB
    arr.rolling(time=30).construct("window_dim")

.. grid:: 2

    .. grid-item:: **Previously**

        Individual chunks explode to 10 GiB, likely causing out-of-memory errors.

        .. image:: images/changelog/rolling-construct-exploding-chunks.png
          :width: 100%
          :align: center
          :alt: Individual chunks are exploding to 10 GiB, likely causing out of memory errors.

    .. grid-item:: **Now**

        Dask now automatically splits the output into chunks that keep the original
        chunksize, minus a small tolerance.

        .. image:: images/changelog/rolling-construct-constant-chunks.png
          :width: 100%
          :align: center
          :alt: Individual chunks are now roughly the same size
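
The windowing behind rolling(...).construct(...) can be explored with Dask alone. The sketch below assumes that, for Dask-backed data, the construct step corresponds to a sliding-window view, and it calls dask.array.lib.stride_tricks.sliding_window_view directly on a small, illustrative array rather than going through xarray. Printing the chunks lets you inspect how the windowed output is split on your Dask version.

.. code-block:: python

    import numpy as np
    import dask.array as da
    from dask.array.lib.stride_tricks import sliding_window_view

    # Illustrative input: 40 elements split into chunks of 10.
    x = da.arange(40, chunks=10)

    # Each output row holds 5 consecutive elements of x; windows that
    # straddle a chunk boundary are assembled from neighbouring chunks.
    windows = sliding_window_view(x, window_shape=5)

    print(windows.shape)   # (36, 5)
    print(windows.chunks)  # inspect how the windowed output is chunked

    # The values match NumPy's sliding-window view of the in-memory array.
    expected = np.lib.stride_tricks.sliding_window_view(np.arange(40), 5)
    print(np.array_equal(windows.compute(), expected))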

Improved efficiency of map overlap
""""""""""""""""""""""""""""""""""

map_overlap now creates smaller and more efficient task graphs.

The previous version injected many unnecessary tasks, increasing the number of tasks by a factor of 2-10 compared to what was actually needed. This put a lot of stress on the scheduler.
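
As a reminder of what map_overlap does: it extends each block with a halo of neighbouring elements (depth), applies a function to the extended block, and then trims the halo off again. The minimal sketch below uses an illustrative 3-point moving sum and small array sizes; it shows the call pattern only, not the graph-size reduction described above.

.. code-block:: python

    import numpy as np
    import dask.array as da

    # Illustrative input: 20 elements in chunks of 5.
    x = da.arange(20, chunks=5)


    def moving_sum(block):
        # 3-point moving sum; mode="same" keeps the block length unchanged
        # and treats values outside the block as zero.
        return np.convolve(block, np.ones(3), mode="same")


    # depth=1 extends every block by one element from each neighbour,
    # boundary="none" adds no padding at the outer edges of the array, and
    # the default trim=True removes the halo after moving_sum has run.
    y = x.map_overlap(moving_sum, depth=1, boundary="none", dtype=float)

    expected = np.convolve(np.arange(20), np.ones(3), mode="same")
    print(np.allclose(y.compute(), expected))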

Consistent chunksizes for Einstein summation
""""""""""""""""""""""""""""""""""""""""""""

Einstein summation historically led to very large chunksizes if applied to more than one Dask Array. This behavior is inherited from NumPy but led to out of memory errors on workers:

.. code-block::

    import dask.array as da

    arr = da.random.random((1024, 64, 64, 64, 64), chunks=(256, 16, 16, 16, 16))  # Initial chunks are 128 MiB
    result = da.einsum("aijkl,amnop->ijklmnop", arr, arr)

.. grid:: 2

    .. grid-item:: **Previously**

        Individual chunks explode to 32 GiB, very likely causing out-of-memory errors.

        .. image:: images/changelog/einstein-exploding-chunks.png
          :width: 100%
          :align: center
          :alt: Individual chunks are exploding to 32 GiB, very likely causing out of memory errors

    .. grid-item:: **Now**

        The operation keeps individual chunksizes the same.

        .. image:: images/changelog/einstein-constant-chunks.png
          :width: 100%
          :align: center
          :alt: Individual chunks are now roughly the same size
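
A scaled-down version of the summation above can be checked against NumPy in memory. The shapes, chunks and the subscript string "aij,akl->ijkl" below are illustrative choices, not taken from the PR; printing chunksize shows the input and output block shapes, which may differ between Dask versions.

.. code-block:: python

    import numpy as np
    import dask.array as da

    rng = np.random.default_rng(0)
    a = rng.random((8, 4, 4))
    x = da.from_array(a, chunks=(4, 2, 2))

    # Contract over the shared leading axis and take an outer product over
    # the remaining axes, a small analogue of "aijkl,amnop->ijklmnop".
    result = da.einsum("aij,akl->ijkl", x, x)

    print(x.chunksize, result.chunksize)
    print(np.allclose(result.compute(), np.einsum("aij,akl->ijkl", a, a)))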

.. dropdown:: Additional changes

Add changelog for Dask release (:pr:11502) Patrick Hoefler_
Minor updates to optional dependencies table (:pr:11503) James Bourbeau_
Add push for ffill like operations (:pr:11501) Patrick Hoefler_
Remove func packing for TaskSpec (:pr:11496) Florian Jetter_
Make tokenization for vindex more efficient (:pr:11493) Patrick Hoefler_
Cut down runtime of einstein summation test (:pr:11499) Patrick Hoefler_
Improve test runtime for test_rot90 (:pr:11498) Florian Jetter_
Disable low level optimization for TaskSpec in Bags (:pr:11495) Florian Jetter_
Add automatic rechunking to sliding-window-view (:pr:11479) Patrick Hoefler_
Add load_stored kwarg to dask.array.store (:pr:11465) Deepak Cherian_
Fix quantile error in two dimensions (:pr:11489) Patrick Hoefler_
Bump conda-incubator/setup-miniconda from 3.0.4 to 3.1.0 (:pr:11490)
Update map_blocks docstring (:pr:11491) Patrick Hoefler_
Fix einsum with empty arrays (:pr:11488) Patrick Hoefler_
Implement non gil-blocking quantile method (:pr:11473) Patrick Hoefler_
Use internal keyword for trimming in map_overlap to reduce graph size (:pr:11486) Patrick Hoefler_
Minor dask order refactor (:pr:11467) Florian Jetter_
Remove empty tasks from map_overlap (:pr:11483) Patrick Hoefler_
Fixup auto chunks calculation if single chunk goes below 1 (:pr:11485) Patrick Hoefler_
Fix CI after pandas upstream changes (:pr:11482) Patrick Hoefler_
Make sure that block_id and block_info don't create extra tasks (:pr:11484) Patrick Hoefler_
Use repeat to build nearest boundary (:pr:9666) Jean-Baptiste Bayle_
Remove dead code from make_blockwise (:pr:11478) Florian Jetter_
Patch auto-chunks calculation for rioxarray (:pr:11480) Patrick Hoefler_
Skip legacy test because of flaky warning (:pr:11475) Patrick Hoefler_
Unskip a few dask-expr tests (:pr:11474) Patrick Hoefler_
Keep chunk sizes consistent in einsum (:pr:11464) Patrick Hoefler_
Improve how normalize_chunks squashes together chunks when "auto" is set (:pr:11468) Patrick Hoefler_
Fix resolve_aliases when multiple aliases are in graph (:pr:11469) Patrick Hoefler_
Avoid cyclic import in dask.array (:pr:11472) Hendrik Makait_
Unskip dataframe test (:pr:11471) Patrick Hoefler_
Improve dask.order performance for large graphs (:pr:11466) Florian Jetter_
Ensure that slice(None) just maps the keys (:pr:11450) Patrick Hoefler_
Fix ``Task.__repr__()`` of unpickled object (:pr:11463) Peter Andreas Entschev_
Use TaskSpec in local dask execution (:pr:11378) Florian Jetter_
Adjust accuracy in test_solve_triangular_vector (:pr:11461) Florian Jetter_
Update Aggregation docstring (:pr:11459) Guillaume Eynard-Bontemps_
Implement fuse option for delayed objects (:pr:11441) Patrick Hoefler_
Deprecate legacy dask dataframe implementation (:pr:11437) Patrick Hoefler_
Fix na casting behavior for groupby.agg with arrow dtypes (:pr:11118) Patrick Hoefler_
Fix behavior of keys_in_tasks for TaskSpec nodes (:pr:11445) Florian Jetter_
Convert dtype to int instead of np.uint8 for visualizing large task graphs (:pr:11440) Patrick Hoefler_
Ensure dependencies are not mutated (:pr:11438) Florian Jetter_
Full support for task spec in dask.order (:pr:11347) Florian Jetter_
Remove redundant methods in P2PBarrierTask (:pr-distributed:8924) Florian Jetter_
Fix skipif condition for test_tell_workers_when_peers_have_left (:pr-distributed:8929) Florian Jetter_
Ensure ConnectionPool is closed even if network stack swallows CancelledErrors (:pr-distributed:8928) Florian Jetter_
Fix flaky test_server_comms_mark_active_handlers (:pr-distributed:8927) Florian Jetter_
Make assumption in P2P's barrier mechanism explicit (:pr-distributed:8926) Hendrik Makait_
Adjust timeouts in Jupyter cli test (:pr-distributed:8925) Florian Jetter_
Add stimulus_id to update_graph plugin hook (:pr-distributed:8923) Hendrik Makait_
Reduce P2P transfer task overhead (:pr-distributed:8912) Hendrik Makait_
Disable profiler on Python 3.11 (:pr-distributed:8916) Florian Jetter_
Fix test_restarting_does_not_deadlock (:pr-distributed:8849) Florian Jetter_
Adjust popen timeouts for testing (:pr-distributed:8848) Florian Jetter_
Add retry to shuffle broadcast (:pr-distributed:8900) Florian Jetter_
Fix test_shuffle_with_array_conversion (:pr-distributed:8909) Florian Jetter_
Refactor some tests (:pr-distributed:8908) Florian Jetter_
Graduate dask-expr from contrib to core project (:pr-distributed:8911) Hendrik Makait_
Skip test_tell_workers_when_peers_have_left on py10 (:pr-distributed:8910) Florian Jetter_
Internal cleanup of P2P code (:pr-distributed:8907) Hendrik Makait_
Use Task class instead of tuple (:pr-distributed:8797) Florian Jetter_
Increase connect timeout for test_tell_workers_when_peers_have_left (:pr-distributed:8906) Florian Jetter_
Remove dispatching in TaskCollection (:pr-distributed:8903) Florian Jetter_
Deduplicate requests to scheduler in P2P (:pr-distributed:8899) Hendrik Makait_
Add configurations for rootish taskgroup threshold (:pr-distributed:8898) Patrick Hoefler_

.. _v2024.10.0:

2024.10.0

Notable Changes
^^^^^^^^^^^^^^^

Zarr-Python 3 compatibility (:pr:11388)
Avoid exponentially increasing taskgraph in overlap (:pr:11423)
Ensure numba tokenization does not use slow pickle path (:pr:11419)

.. dropdown:: Additional changes

Ensure broadcast_shapes() returns integers, not NumPy scalars. (:pr:11434) Martin Yeo_
(fix): sparse indexing (:pr:11430) Ilan Gold_
Ensure that recursively calling tokenize respects ensure_deterministic (:pr:11431) Florian Jetter_
Make P2P more configurable (:pr-distributed:8469) Hendrik Makait_
Fit Dashboard worker table to page width (:pr-distributed:8897) Jacob Tomlinson_
Raise helpful error when using the wrong plugin base classes (:pr-distributed:8893) Jacob Tomlinson_
Fix url escaping on exceptions dashboard for non-string keys (:pr-distributed:8891) Patrick Hoefler_
Add meaningful error for out of disk exception during write (:pr-distributed:8886) Hendrik Makait_
Fix binary operations with scalar on the left (:pr-expr:1150) Patrick Hoefler_
Raise exception when calculating divisions (:pr-expr:1149) Patrick Hoefler_
Fix merge_asof for single partition (:pr-expr:1145) Patrick Hoefler_
Improve handling of optional dependencies in analyze and explain (:pr-expr:1146) Hendrik Makait_
Fix alignment issue with groupby index accessors (:pr-expr:1142) Patrick Hoefler_
Fix displaying timestamp scalar (:pr-expr:1141) Patrick Hoefler_

.. _v2024.9.1:

2024.9.1

Highlights
^^^^^^^^^^

Improved adaptive scaling resilience
""""""""""""""""""""""""""""""""""""

Adaptive scaling clusters now recover from spurious errors during scaling.

See :pr-distributed:8871 by Hendrik Makait_ for more details.

.. dropdown:: Additional changes

Improve error message for incorrect columns order in meta information (:pr:11393) Dmitry Balabka_
Update gpuCI RAPIDS_VER to 24.12 (:pr:11407)
Bump jacobtomlinson/gha-anaconda-package-version from 0.1.3 to 0.1.4 (:pr:11405)
Switch to using zarr.open_array instead of using the zarr.Array constructor (:pr:11387) Joe Hamman_
Update gpuCI RAPIDS_VER to 24.12 (:pr-distributed:8879)
Don't consider scheduler idle while executing Scheduler.update_graph (:pr-distributed:8877) Hendrik Makait_
Bump jacobtomlinson/gha-anaconda-package-version from 0.1.3 to 0.1.4 (:pr-distributed:8878)
Support P2P rechunking datetime arrays (:pr-distributed:8875) James Bourbeau_

.. _v2024.9.0:

2024.9.0

Highlights
^^^^^^^^^^

Bump Bokeh minimum version to 3.1.0
"""""""""""""""""""""""""""""""""""

bokeh>=3.1.0 is now required for diagnostics and the distributed cluster dashboard.

See :pr:11375 and :pr-distributed:8861 by James Bourbeau_ for more details.

Introduce new Task class
""""""""""""""""""""""""

Add a Task class to replace tuples for task specification.

See :pr:11248 by Florian Jetter_ for more details.
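
For context, the tuple-based task specification that the Task class is meant to replace looks like the sketch below. The inc helper and the keys are hypothetical, and the example deliberately uses the long-standing tuple format with dask.threaded.get rather than the Task class itself, whose import location this changelog does not document.

.. code-block:: python

    from operator import add

    from dask.threaded import get


    def inc(i):
        # Hypothetical helper used only for this illustration.
        return i + 1


    # Legacy tuple format: a tuple whose first element is callable is a task,
    # the remaining elements are its arguments, and strings that match other
    # keys are replaced by the results stored under those keys.
    dsk = {
        "x": 1,
        "y": (inc, "x"),
        "z": (add, "y", 10),
    }

    print(get(dsk, "z"))  # 12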

.. dropdown:: Additional changes

Bump peter-evans/create-pull-request from 6 to 7 (:pr:11380)
Reduce overhead in tokenize (:pr:11373) Florian Jetter_
Move tokenize to dedicated submodule (:pr:11371) Florian Jetter_
Ensure process_runnables is not too eager in the presence of multiple splits (:pr:11367) Florian Jetter_
Use np.min_scalar_type in shuffle (:pr:11369) James Bourbeau_
Write indexing arrays into dask graph to reduce size for multiple xarray variables (:pr:11362) Patrick Hoefler_
Cast indexer to minimal dtype in shuffle (:pr:11364) Patrick Hoefler_
Reduce memory usage of dask.order (:pr:11361) Florian Jetter_
Bump JamesIves/github-pages-deploy-action from 4.6.3 to 4.6.4 (:pr:11366)
precommit autoupdate (:pr:11360) Florian Jetter_
Homogeneously schedule P2P's unpack tasks (:pr-distributed:8873) Hendrik Makait_
Work/fix firewall for localhost (:pr-distributed:8868) Mario Linker_
Use new tokenize module (:pr-distributed:8858) James Bourbeau_
Point to user code with idempotent plugin warning (:pr-distributed:8856) James Bourbeau_
Fix test nanny timeout (:pr-distributed:8847) Florian Jetter_
Bump JamesIves/github-pages-deploy-action from 4.5.0 to 4.6.4 (:pr-distributed:8853)
Speed up Client.map by computing token only once for func and kwargs (:pr-distributed:8855) Florian Jetter_
Update pre-commit (:pr-distributed:8852) Florian Jetter_

.. _v2024.8.2:

2024.8.2

Highlights
^^^^^^^^^^

Automatic selection of rechunking method
""""""""""""""""""""""""""""""""""""""""

To enable users to rechunk data at larger scales than before, Dask now automatically chooses an appropriate rechunking method when rechunking on a cluster. This requires no additional configuration and is enabled by default.

Specifically, Dask chooses between task-based and P2P rechunking. While task-based rechunking has been the previous default, P2P rechunking is beneficial when rechunking requires almost all-to-all communication between the old and new chunks, e.g., when changing between spatial and temporal chunking. In these cases, P2P rechunking offers constant memory usage and creates smaller task graphs. As a result, it works for cases where task-based rechunking would have previously failed.

To disable automatic selection, users can choose their preferred method via the configuration

.. code-block::

    import dask.config

    # Choose either "tasks" or "p2p"
    dask.config.set({"array.rechunk.method": "tasks"})

or when calling rechunk

.. code-block::

    import dask.array as da

    arr = da.random.random(size=(1000, 1000, 365), chunks=(-1, -1, "auto"))
    # Choose either "tasks" or "p2p"
    arr = arr.rechunk(("auto", "auto", -1), method="tasks")

See :pr:11337 by Hendrik Makait_ for more details.
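
The configuration option can also be scoped with a context manager so that it only affects rechunk operations created inside the block. A minimal sketch, assuming a small in-memory array and the "tasks" method, which does not require a distributed cluster; the shapes are illustrative.

.. code-block:: python

    import numpy as np
    import dask
    import dask.array as da

    a = np.arange(10_000).reshape(100, 100)
    x = da.from_array(a, chunks=(10, 100))  # row-oriented chunks

    # Only rechunks created inside this block use task-based rechunking.
    with dask.config.set({"array.rechunk.method": "tasks"}):
        y = x.rechunk((100, 10))  # column-oriented chunks

    print(y.chunks)
    print(np.array_equal(y.compute(), a))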

New shuffle API for Dask Arrays
"""""""""""""""""""""""""""""""

Dask added a shuffle API to Dask Arrays. This API shuffles the data along a single dimension and ensures that every group of elements along this dimension ends up in exactly one chunk. This is a very useful operation for GroupBy-Map patterns in Xarray. See :py:func:~dask.array.Array.shuffle for more information and the API signature.

See :pr:11267, :pr:11311 and :pr:11326 by Patrick Hoefler_ for more details.
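
A minimal sketch of the shuffle API on a one-dimensional array. The grouping is hypothetical (positions with the same remainder modulo 3 belong together, like "all Januaries" in a GroupBy-Map workflow), and the sketch assumes the output is ordered as the concatenated indexer. Per the description above, no group should be split across chunks.

.. code-block:: python

    import numpy as np
    import dask.array as da

    x = da.arange(12, chunks=4)

    # Hypothetical grouping: each inner list is one group of positions.
    indexer = [[0, 3, 6, 9], [1, 4, 7, 10], [2, 5, 8, 11]]

    y = x.shuffle(indexer, axis=0)

    print(y.chunks)
    print(y.compute())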

New reshape_blockwise API for Dask Arrays
"""""""""""""""""""""""""""""""""""""""""

The new :py:func:~dask.array.reshape_blockwise enables an embarrassingly parallel reshaping operation for cases where you don't care about the order of the underlying array. It doesn't trigger a rechunking operation under the hood. This is useful when you don't care about the order of the resulting Array, e.g. if a reduction is applied to the array or if the reshaping is only temporary.

.. code-block::

    import dask.array as da
    from dask.array import reshape_blockwise

    arr = da.random.random(size=(100, 100, 48_000), chunks=(100, 100, 83))
    result = reshape_blockwise(arr, (10_000, 48_000))
    result.sum()

    # or: do something that preserves the shape of each chunk

    result = reshape_blockwise(result, (100, 100, 48_000), chunks=arr.chunks)

Dask will automatically calculate the resulting chunks if the number of dimensions is reduced, but you have to specify the resulting chunks if the number of dimensions is increased.

Reshaping a Dask Array often creates very complicated computations with rechunk operations in between, because Dask respects the C ordering of the Array by default. This ensures that the resulting Dask Array is returned in the same order as the corresponding NumPy Array. However, this can lead to very inefficient computations. reshape_blockwise is a lot more efficient than the default implementation if you don't care about the order.

.. warning::

    Blockwise reshape operations are more efficient than the default, but they
    return an Array that is ordered differently. Use with care!

See :pr:11328 by Patrick Hoefler_ for more details.
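
A small, runnable version of the round trip shown above, assuming dask.array.reshape_blockwise as used in that example. The array and chunk sizes are illustrative. The collapsed array holds the same values in a different order, and reshaping back with the original chunks restores the input.

.. code-block:: python

    import numpy as np
    import dask.array as da
    from dask.array import reshape_blockwise

    x = da.from_array(np.arange(240).reshape(4, 6, 10), chunks=(2, 3, 5))

    # Collapse the first two dimensions block by block; the output chunks
    # are inferred because the number of dimensions decreases.
    flat = reshape_blockwise(x, (24, 10))
    print(flat.chunks)

    # Going back to three dimensions requires explicit chunks.
    restored = reshape_blockwise(flat, (4, 6, 10), chunks=x.chunks)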

Multidimensional positional indexing keeping chunksizes consistent
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Indexing a Dask Array with :py:func:~dask.array.vindex previously created a single output chunk along the dimensions that were indexed. vindex is commonly used in Xarray when indexing multiple dimensions in a single step, e.g.:

.. code-block::

    import dask.array as da
    import xarray as xr

    arr = xr.DataArray(
        da.random.random((100, 100, 100), chunks=(5, 5, 50)),
        dims=["a", "b", "c"],
    )

Previously, this put the indexed dimensions into a single chunk:

.. image:: images/changelog/vindex-memory-increase.png
   :width: 75%
   :align: center
   :alt: Size of each individual chunk increases to over 1GB

Dask now uses an improved algorithm that ensures that the chunksizes are kept consistent:

.. image:: images/changelog/vindex-memory-constant.png
   :width: 75%
   :align: center
   :alt: Size of each individual chunk stays the same

See :pr:11330 by Patrick Hoefler_ for more details.
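
The pointwise selection that Xarray delegates to vindex can be tried on a plain Dask Array. In the sketch below, the array, chunks and index lists are illustrative; the result matches NumPy's advanced indexing, and printing chunks shows how the output is chunked on your Dask version.

.. code-block:: python

    import numpy as np
    import dask.array as da

    a = np.arange(1000).reshape(10, 10, 10)
    x = da.from_array(a, chunks=(5, 5, 5))

    # Pointwise indexing: selects the (row, col) pairs (0, 1), (3, 4), (7, 4)
    # and (9, 8), keeping the full last axis for each pair.
    rows = [0, 3, 7, 9]
    cols = [1, 4, 4, 8]
    points = x.vindex[rows, cols]

    print(points.shape)   # (4, 10)
    print(points.chunks)
    print(np.array_equal(points.compute(), a[rows, cols]))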

.. dropdown:: Additional changes

Add changelog entries for shuffle, vindex and blockwise_reshape (:pr:11350) Patrick Hoefler_
Ensure persisted collections are released without GC (:pr:11348) Florian Jetter_
Update zoom link for dask meeting (:pr:11357) Sarah Charlotte Johnson_
Add more docstring examples for normalize_chunks (:pr:11271) Illviljan_
Choose automatically between tasks-based and p2p rechunking (:pr:11337) Hendrik Makait_
Implement blockwise reshaping API for arrays (:pr:11328) Patrick Hoefler_
Make rechunking in shuffle more intelligent to distribute unevenly if necessary (:pr:11326) Patrick Hoefler_
Increase visibility of GPU CI updates (:pr:11345) Charles Blackmon-Luca_
Update numpy and pyarrow versions in install docs (:pr:11340) James Bourbeau_
Fixup dask and distributed dependencies (:pr:11338) Patrick Hoefler_
Bump numpy>=1.24 and pyarrow>=14.0.1 minimum versions (:pr:11331) James Bourbeau_
Add crick back to Python 3.11+ CI builds (:pr:11335) James Bourbeau_
Preserve chunksizes in vindex (:pr:11330) Patrick Hoefler_
Fix dask.array.fft mismatch with Numpy's interface (add support for norm argument) (:pr:10665) joanrue_
Pass additional parameters to rechunk_p2p (:pr:11319) Hendrik Makait_
Fix docstring formatting for map_overlap (:pr:11332) Tao Xin_
Fix NumPy overflowing for prod on 2.0 (:pr:11327) Patrick Hoefler_
Ensure axes are positive / add tests for negative axes (:pr:10812) joanrue_
Fix map_overlap with new_axis (:pr:11128) David Stansby_
Avoid capturing code of xdist (:pr-distributed:8846) Florian Jetter_
Reduce memory footprint of culling P2P rechunking (:pr-distributed:8845) Hendrik Makait_
Add tests for choosing default rechunking method (:pr-distributed:8843) Hendrik Makait_
Increase visibility of GPU CI updates (:pr-distributed:8841) Charles Blackmon-Luca_
Bump test_pause_while_idle timeout (:pr-distributed:8844) Florian Jetter_
Concatenate small input chunks before P2P rechunking (:pr-distributed:8832) Hendrik Makait_
Remove dump cluster from gen_cluster (:pr-distributed:8823) Florian Jetter_
Bump numpy>=1.24 and pyarrow>=14.0.1 minimum versions (:pr-distributed:8837) James Bourbeau_
Fix PipInstall plugin on Worker (:pr-distributed:8839) Hendrik Makait_
Remove more Python 3.10 compatibility code (:pr-distributed:8824) James Bourbeau_
Use task-based rechunking to prechunk along partial boundaries (:pr-distributed:8831) Hendrik Makait_
Ensure client_desires_keys does not corrupt Scheduler state (:pr-distributed:8827) Florian Jetter_
Bump minimum cloudpickle to 3 (:pr-distributed:8836) James Bourbeau_

.. _v2024.8.1:

2024.8.1

Highlights
^^^^^^^^^^

Improve output chunksizes for reshaping Dask Arrays
"""""""""""""""""""""""""""""""""""""""""""""""""""

Reshaping a Dask Array often squashed the reshaped dimensions into a single chunk. This caused very large output chunks and, as a result, many out-of-memory errors and performance issues.

.. code-block::

    import dask.array as da

    arr = da.ones(shape=(1000, 100, 48_000), chunks=(1000, 100, 83))
    arr.reshape(1000, 100, 4, 12_000)

Previously, this put the last dimension into a single chunk of size 12_000.

.. image:: images/changelog/reshape-memory-increase.png
   :width: 75%
   :align: center
   :alt: Size of each individual chunk increases to over 1GB

The new algorithm keeps the chunksize the same between input and output. This avoids large increases in chunksize as well as fragmentation of chunks.

.. image:: images/changelog/reshape-constant-memory.png
   :width: 75%
   :align: center
   :alt: Size of each individual chunk stays the same
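
A scaled-down version of the reshape above, small enough to compute in memory. The shapes and chunks are illustrative choices; printing the chunks before and after lets you compare them on your Dask version, while the values always match NumPy's C-ordered reshape.

.. code-block:: python

    import numpy as np
    import dask.array as da

    a = np.arange(2 * 3 * 48).reshape(2, 3, 48)
    x = da.from_array(a, chunks=(2, 3, 8))

    # Split the last axis (48) into two axes (4, 12).
    y = x.reshape(2, 3, 4, 12)

    print(x.chunks)
    print(y.chunks)
    print(np.array_equal(y.compute(), a.reshape(2, 3, 4, 12)))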

Improve scheduling efficiency for Xarray Rechunk-GroupBy-Reduce patterns
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

The scheduler previously created an inefficient execution graph for Xarray GroupBy-Reduction patterns that use the cohorts strategy:

.. code-block:: python

    import xarray as xr
    from xarray.groupers import TimeResampler

    arr = xr.open_zarr(...)
    arr.chunk(time=TimeResampler("ME")).groupby("time.month").mean()

An issue in the algorithm that determines the execution order of the task graph led to an inefficient execution strategy that accumulated a lot of unnecessary memory on the cluster. The improvement is very similar to :ref:the previous ordering improvement in 2024.08.0 <label.xarray_groupby_ordering>.

Drop support for Python 3.9
"""""""""""""""""""""""""""

This release drops support for Python 3.9 in accordance with NEP 29. Python 3.10 is now the required minimum version to run Dask.

See :pr:11245 and :pr-distributed:8793 by Patrick Hoefler_ for more details.

.. dropdown:: Additional changes

Ensure pickle does not change tokens (:pr:11320) Florian Jetter_
Add changelog entry for reshape and ordering improvements (:pr:11324) Patrick Hoefler_
Rename chunksize-tolerance option (:pr:11317) Patrick Hoefler_
Upgrade gpuCI and fix Dask Array failures with "cupy" backend (:pr:11309) Richard (Rick) Zamora_
Implement automatic rechunking for shuffle (:pr:11311) Patrick Hoefler_
Ensure we test against numpy 2 in CI (:pr:11182) James Bourbeau_
Revert "Test ordering on distributed scheduler (:pr:11310)" (:pr:11321) Florian Jetter_
Test ordering on distributed scheduler (:pr:11310) Florian Jetter_
Add tests to cover more cases of new reshape implementation (:pr:11313) Patrick Hoefler_
Order: Choose better target for branches with multiple leaf nodes (:pr:11303) Patrick Hoefler_
Order: Ensure runnable tasks are certainly runnable (:pr:11305) Florian Jetter_
Fix upstream numpy build (:pr:11304) Patrick Hoefler_
Make shuffle a no-op if possible (:pr:11291) Patrick Hoefler_
Keep chunksize consistent in reshape (:pr:11273) Patrick Hoefler_
Enable slicing with only one unknown chunk (:pr:11301) Patrick Hoefler_
Link to dask vs spark benchmarks on Dask docs (:pr:11289) Sarah Charlotte Johnson_
Fix slicing for masked arrays (:pr:11300) Patrick Hoefler_
Array: fix asarray for array input with dtype (:pr:11288) Lucas Colley_
Add numpy constants to array api (:pr:11287) Lucas Colley_
Ignore typing of return value (:pr:11286) Patrick Hoefler_
Remove automatic resizing in reshape (:pr:11269) Patrick Hoefler_
API: expose np dtypes in dask.array namespace (:pr:11178) Lucas Colley_
Reduce frequency of unmanaged memory use warning (:pr-distributed:8834) Patrick Hoefler_
Update gpuCI RAPIDS_VER to 24.10 (:pr-distributed:8786)
Avoid RuntimeError: dictionary changed size during iteration in Server._shift_counters() (:pr-distributed:8828) Hendrik Makait_
Improve concurrent close for scheduler (:pr-distributed:8829) Hendrik Makait_
MINOR: Extract truncation logic out of partial concatenation in P2P rechunking (:pr-distributed:8826) Hendrik Makait_
avoid excessive attribute access overhead for remove_from_task_prefix_count (:pr-distributed:8821) Florian Jetter_
Avoid key validation if validation is disabled (:pr-distributed:8822) Florian Jetter_
Log worker_client event (:pr-distributed:8819) James Bourbeau_

.. _v2024.8.0:

2024.8.0

Highlights
^^^^^^^^^^

Improve efficiency and performance of slicing with positional indexers
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Slicing a Dask Array with a positional indexer is now faster. Random access patterns are more stable and produce results that are easier to work with.

.. code-block:: python

   x[slice(None), [1, 1, 3, 6, 3, 4, 5]]

Positional indexing was previously prone to drastically increasing the number of output chunks and generating a very large task graph. A more efficient algorithm fixes this.

The new algorithm keeps chunk sizes along the indexed axis unchanged, which avoids both fragmentation of chunks and large increases in chunk size.

See :pr:11262 and :pr:11267 by Patrick Hoefler_ for more details and performance benchmarks.
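
The difference in output chunking can be illustrated with a simplified pure-Python model that only counts output chunks along the indexed axis, for a 1-D indexer and uniform input chunks. Both helpers are hypothetical: ``run_based_chunks`` is an assumed model of the old behaviour (a new chunk whenever the indexer moves to another input block), and ``size_preserving_chunks`` mirrors the stated goal of keeping chunk sizes unchanged. Neither is Dask's implementation, and the data movement between blocks is ignored.

.. code-block:: python

   def run_based_chunks(indexer, chunk_size):
       """Start a new output chunk whenever the indexer moves to another input block."""
       chunks = []
       prev_block = None
       for i in indexer:
           block = i // chunk_size
           if block == prev_block:
               chunks[-1] += 1
           else:
               chunks.append(1)
               prev_block = block
       return tuple(chunks)


   def size_preserving_chunks(indexer, chunk_size):
       """Output chunks along the indexed axis no larger than the input chunk size."""
       full, remainder = divmod(len(indexer), chunk_size)
       return (chunk_size,) * full + ((remainder,) if remainder else ())


   indexer = [1, 1, 3, 6, 3, 4, 5]
   print(run_based_chunks(indexer, chunk_size=2))        # (2, 1, 1, 1, 2)
   print(size_preserving_chunks(indexer, chunk_size=2))  # (2, 2, 2, 1)

For a scattered indexer of length ``n`` the run-based model degrades towards ``n`` chunks of size 1, while the size-preserving layout stays at about ``n / chunk_size`` chunks.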

.. _label.xarray_groupby_ordering:

Improve scheduling efficiency for Xarray GroupBy-Reduce patterns
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

The scheduler previously created an inefficient execution graph for Xarray GroupBy-Reduction patterns like:

.. code-block:: python

   import xarray as xr

   arr = xr.open_zarr(...)
   arr.groupby("time.month").mean()

An issue in the algorithm that determines the execution order of the task graph led to an inefficient execution strategy that accumulated a lot of unnecessary memory on the cluster.

.. image:: images/changelog/dask-order-growing-memory.png
   :width: 75%
   :align: center
   :alt: Memory keeps accumulating on the cluster when running an embarrassingly parallel operation.

The operation itself is embarrassingly parallel. With the proper execution strategy, the scheduler can now run it with constant memory, avoiding spilling and allowing us to scale to larger datasets.

.. image:: images/changelog/dask-order-constant-memory.png
   :width: 75%
   :align: center
   :alt: Same operation is running with constant memory usage for the whole computation and can scale for bigger datasets.

See :pr-distributed:8818 by Patrick Hoefler_ for more details and examples.

.. dropdown:: Additional changes

Add changelog for dask order patch (:pr:11278) Patrick Hoefler_
Add regression test for xarray map reduce (:pr:11277) Florian Jetter_
Add changelog entry for take (:pr:11274) Patrick Hoefler_
Revert "order: remove data task graph normalization" (:pr:11276) Patrick Hoefler_
Use the shuffle algorithm for take (:pr:11267) Patrick Hoefler_
Implement task-based array shuffle (:pr:11262) Patrick Hoefler_
Remove data task graph normalization (:pr:11263) Florian Jetter_
Update zoom link for monthly meeting (:pr:11265) Sarah Charlotte Johnson_
Update data loading section of best practices (:pr:11247) Patrick Hoefler_
Match default chunksize in docstring to actual default set in code (:pr:11254) Bernhard Raml_
Fixup casting error in pandas 3 (:pr:11250) Patrick Hoefler_
Skip new warning from pandas (:pr:11249) Patrick Hoefler_
Fix pandas nightly bugs (:pr:11244) Patrick Hoefler_
Run graph normalisation after dask order (:pr-distributed:8818) Patrick Hoefler_
Update large graph size warning to remove scatter recommendation (:pr-distributed:8815) Patrick Hoefler_
Fail tasks exceeding no-workers-timeout (:pr-distributed:8806) Hendrik Makait_
Fix exception handling for NannyPlugin.setup and NannyPlugin.teardown (:pr-distributed:8811) Hendrik Makait_
Fix exception handling for WorkerPlugin.setup and WorkerPlugin.teardown (:pr-distributed:8810) Hendrik Makait_
typo fix (:pr-distributed:8812) alex-rakowski_
Fix if / else for send_recv_from_rpc (:pr-distributed:8809) Patrick Hoefler_
Ensure that adaptive only stops once (:pr-distributed:8807) Hendrik Makait_
Reduce noise from GC-related logging (:pr-distributed:8804) Hendrik Makait_
Remove unused delete_interval and synchronize_worker_interval from Scheduler (:pr-distributed:8801) Hendrik Makait_
Change log level for Compute Failed log message (:pr-distributed:8802) Patrick Hoefler_
Add Prometheus metric for time spent on GC (:pr-distributed:8803) Hendrik Makait_
Add Prometheus metrics for dask_worker_{added|removed}_total (:pr-distributed:8798) Hendrik Makait_
Add log event for worker-ttl-timed-out (:pr-distributed:8800) Hendrik Makait_
Add Prometheus metrics for dask_client_connections_{added|removed}_total (:pr-distributed:8799) Hendrik Makait_
Fix PackageInstall plugin (:pr-distributed:8794) Hendrik Makait_
Make stealing more robust (:pr-distributed:8788) Hendrik Makait_
Leave a warning about future instantiation (:pr-distributed:8782) Florian Jetter_

.. _v2024.7.1:

2024.7.1

Highlights
^^^^^^^^^^

More resilient distributed lock
"""""""""""""""""""""""""""""""

:py:class:distributed.Lock is now resilient to worker failures. Previously, deadlocks were possible when a lock-holding worker was lost or failed to release the lock due to an error.

See :pr-distributed:8770 by Florian Jetter_ for more details.

.. dropdown:: Additional changes

Remove and warn of persist usage (:pr:11237) Patrick Hoefler_
Preserve timestamp unit during meta creation (:pr:11233) Patrick Hoefler_
Ensure that dask-expr DataFrames are optimized when put into delayed (:pr:11231) Patrick Hoefler_
Fixes for d freq deprecation in pandas=3 (:pr:11228) James Bourbeau_
bump approx threshold for test_quantile (:pr:10720) Florian Jetter_
Bump xarray-contrib/issue-from-pytest-log from 1.2.8 to 1.3.0 (:pr:11221)
Bump JamesIves/github-pages-deploy-action from 4.6.1 to 4.6.3 (:pr:11222)
Ensure Lock always register with scheduler (:pr-distributed:8781) Florian Jetter_
Temporarily pin setuptools < 71 (:pr-distributed:8785) James Bourbeau_
Restore len() on TaskPrefix (:pr-distributed:8783) Hendrik Makait_
Avoid false positives for p2p-failed log event (:pr-distributed:8777) Hendrik Makait_
Expose paused and retired workers separately in prometheus (:pr-distributed:8613) Patrick Hoefler_
Creating transitions-failures log event (:pr-distributed:8776) alex-rakowski_
Implement HLG layer for P2P rechunking (:pr-distributed:8751) Hendrik Makait_
Add another test for a possible deadlock scenario caused by (:pr-distributed:8703) (:pr-distributed:8769) Hendrik Makait_
Raise an error if compute on persisted collection with released futures (:pr-distributed:8764) Florian Jetter_
Re-raise P2PConsistencyError from failed P2P tasks (:pr-distributed:8748) Hendrik Makait_
Robuster faster tests memory sampler (:pr-distributed:8758) Florian Jetter_
Fix scheduler_bokeh::test_shuffling (:pr-distributed:8766) Florian Jetter_
Increase timeouts for pubsub::test_client_worker (:pr-distributed:8765) Florian Jetter_
Factor out async taskgroup (:pr-distributed:8756) Florian Jetter_
Don't sort keys lexicographically in worker table (:pr-distributed:8753) Florian Jetter_
Use functools.cache instead of functools.lru_cache for extremely often called functions (:pr-distributed:8762) Jonas Dedden_
Robuster deeply nested structures (:pr-distributed:8730) Florian Jetter_
Adding HLG to MAP (:pr-distributed:8740) alex-rakowski_
Add close worker button to worker info page (:pr-distributed:8742) James Bourbeau_

.. _v2024.7.0:

2024.7.0

Highlights
^^^^^^^^^^

Drop support for pandas 1.x
"""""""""""""""""""""""""""

This release drops support for pandas<2. pandas 2.0 is now the required minimum version to run Dask DataFrame.

The minimum version of partd was also raised to 1.4.0. Versions before 1.4 are not compatible with pandas 2.

See :pr:11199 by Patrick Hoefler_ for more details.

Publish-subscribe APIs deprecated
"""""""""""""""""""""""""""""""""

:py:class:distributed.Pub and :py:class:distributed.Sub have been deprecated and will be removed in a future release. Please switch to :py:func:distributed.Client.log_event and :py:func:distributed.Worker.log_event instead.

See :pr-distributed:8724 by Hendrik Makait_ for more details.

.. dropdown:: Additional changes

Only count data that is in memory for xarray sizeof (:pr:11206) Florian Jetter_
Fix botocore re-raising error (:pr:11209) Patrick Hoefler_
Update Coiled links in documentation (:pr:11211) Sarah Charlotte Johnson_
Add some array-expr methods (:pr:11210) Patrick Hoefler_
Fix quantile for arrow dtypes (:pr:11202) Patrick Hoefler_
Add utility to verify optional dependencies (:pr:11205) Patrick Hoefler_
Implement array expression switch (:pr:11203) Patrick Hoefler_
Remove no longer supported ipython reference (:pr:11196) Patrick Hoefler_
Remove from_delayed references (:pr:11195) Patrick Hoefler_
Add other IO connectors to docs (:pr:11189) Patrick Hoefler_
Fix assert_eq import from cudf (:pr-distributed:8747) James Bourbeau_
Log traceback upon task error (:pr-distributed:8746) Hendrik Makait_
Update system monitor when polling Prometheus metrics (:pr-distributed:8745) Hendrik Makait_
Bump pandas to 2.0 in mindeps build (:pr-distributed:8743) James Bourbeau_
Refactor event logging functionality into broker (:pr-distributed:8731) Hendrik Makait_
Drop support for pandas 1.X (:pr-distributed:8741) Hendrik Makait_
Remove is_python_shutting_down (:pr-distributed:8492) Hendrik Makait_
Fix test_task_state_instance_are_garbage_collected (:pr-distributed:8735) Hendrik Makait_
Fix floating-point inaccuracy (:pr-distributed:8736) Hendrik Makait_
Fix pynvml handles (:pr-distributed:8693) Benjamin Zaitlen_
get_ip: handle getting 0.0.0.0 (:pr-distributed:8712) Adam Williamson_
Remove FutureWarning in test_task_state_instance_are_garbage_collected (:pr-distributed:8734) Hendrik Makait_
Fix mindeps-testing on CI (:pr-distributed:8728) Hendrik Makait_
Extract tests related to event-logging into separate file (:pr-distributed:8733) Hendrik Makait_
Use safer context for ProcessPoolExecutor (:pr-distributed:8715) Elliott Sales de Andrade_
Cache URL encoding of worker addresses in dashboard (:pr-distributed:8725) Florian Jetter_
More robust bokeh test_shuffling (:pr-distributed:8727) Florian Jetter_
Fix type in actor docs (:pr-distributed:8711) Sultan Orazbayev_
More useful warning if a plugin type is provided instead of instance (:pr-distributed:8689) Florian Jetter_
Improve error on cancelled tasks due to disconnect (:pr-distributed:8705) Hendrik Makait_
Fix wait condition on test_forget_errors (:pr-distributed:8714) Elliott Sales de Andrade_
Skip test_deadlock_dependency_of_queued_released (:pr-distributed:8723) Hendrik Makait_
Fix test_quiet_client_close (:pr-distributed:8722) Hendrik Makait_
Fix cleanup iteration in save_sys_modules (:pr-distributed:8713) Elliott Sales de Andrade_
Add quotes to missing bokeh installation commands (:pr-distributed:8717) James Bourbeau_

.. _v2024.6.2:

2024.6.2

This is a patch release to fix an issue with dask and distributed version pinning in the 2024.6.1 release.

.. dropdown:: Additional changes

Get docs build passing (:pr:11184) James Bourbeau_
profile._f_lineno: handle next_line being None in Python 3.13 (:pr-distributed:8710) Adam Williamson_

.. _v2024.6.1:

2024.6.1

Highlights
^^^^^^^^^^

This release includes a critical fix for a deadlock that can arise when dependencies of root-ish tasks are rescheduled, e.g. due to a worker being lost.

See :pr-distributed:8703 by Hendrik Makait_ for more details.

.. dropdown:: Additional changes

Cache global query-planning config (:pr:11183) Richard (Rick) Zamora_
Python 3.13 fixes (:pr:11185) Adam Williamson_
Fix test_map_freq_to_period_start for pandas=3 (:pr:11181) James Bourbeau_
Bump release-drafter/release-drafter from 5 to 6 (:pr-distributed:8699)

.. _v2024.6.0:

2024.6.0

Highlights
^^^^^^^^^^

memmap array tokenization
"""""""""""""""""""""""""

Tokenizing memmap arrays now avoids materializing the array in memory.

See :pr:11161 by Florian Jetter_ for more details.
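
The idea can be sketched by hashing where a memmap's data lives (file path, modification time, offset, shape, strides, dtype) instead of its bytes. This is a simplified illustration, not Dask's tokenization code: ``tokenize_memmap`` is a hypothetical helper, it assumes a file-backed ``np.memmap``, it treats the file's modification time as a proxy for unchanged contents, and it does not handle views of a memmap, which inherit the parent's ``offset`` attribute.

.. code-block:: python

   import hashlib
   import os
   import tempfile

   import numpy as np


   def tokenize_memmap(mm):
       """Hash a file-backed memmap by its location and layout, not its contents."""
       path = os.path.abspath(mm.filename)
       parts = (path, os.path.getmtime(path), mm.offset, mm.shape, mm.strides, str(mm.dtype))
       # Only metadata is hashed, so no array data is read from disk.
       return hashlib.sha256(repr(parts).encode()).hexdigest()


   with tempfile.TemporaryDirectory() as tmp:
       fname = os.path.join(tmp, "data.bin")
       writer = np.memmap(fname, dtype="float64", mode="w+", shape=(1000,))
       writer.flush()
       full = np.memmap(fname, dtype="float64", mode="r", shape=(1000,))
       half = np.memmap(fname, dtype="float64", mode="r", shape=(500,))

       same_layout = tokenize_memmap(full) == tokenize_memmap(writer)
       different_shape = tokenize_memmap(full) != tokenize_memmap(half)

       # Release the mappings before the directory is removed (required on Windows).
       del writer, full, half

Two mappings of the same file with the same layout share a token, while a different shape yields a different one.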

.. dropdown:: Additional changes

Fix test_dt_accessor with query planning disabled (:pr:11177) James Bourbeau_
Use packaging.version.Version (:pr:11171) James Bourbeau_
Remove deprecated dask.compatibility module (:pr:11172) James Bourbeau_
Ensure compatibility for xarray.NamedArray (:pr:11168) Hendrik Makait_
Estimate sizes of xarray collections (:pr:11166) Florian Jetter_
Add section about futures and variables (:pr:11164) Florian Jetter_
Update docs for combined Dask community meeting info (:pr:11159) Sarah Charlotte Johnson_
Avoid rounding error in test_prometheus_collect_count_total_by_cost_multipliers (:pr-distributed:8687) Hendrik Makait_
Log key collision count in update_graph log event (:pr-distributed:8692) Hendrik Makait_
Automate GitHub Releases when new tags are pushed (:pr-distributed:8626) Jacob Tomlinson_
Fix log event with multiple topics (:pr-distributed:8691) Hendrik Makait_
Rename safe to expected in Scheduler.remove_worker (:pr-distributed:8686) Hendrik Makait_
Log event during failure (:pr-distributed:8663) Hendrik Makait_
Eagerly update aggregate statistics for TaskPrefix instead of calculating them on-demand (:pr-distributed:8681) Hendrik Makait_
Improve graph submission time for P2P rechunking by avoiding unpack recursion into indices (:pr-distributed:8672) Florian Jetter_
Add safe keyword to remove-worker event (:pr-distributed:8647) alex-rakowski_
Improved errors and reduced logging for P2P RPC calls (:pr-distributed:8666) Hendrik Makait_
Adjust P2P tests for dask-expr (:pr-distributed:8662) Hendrik Makait_
Iterate over copy of Server.digests_total_since_heartbeat to avoid RuntimeError (:pr-distributed:8670) Hendrik Makait_
Log task state in Compute Failed (:pr-distributed:8668) Hendrik Makait_
Add Prometheus gauge for task groups (:pr-distributed:8661) Hendrik Makait_
Fix too strict assertion in shuffle code for pandas subclasses (:pr-distributed:8667) Joris Van den Bossche_
Reduce noise from erring tasks that are not supposed to be running (:pr-distributed:8664) Hendrik Makait_

.. _v2024.5.2:

2024.5.2

This release primarily contains minor bug fixes.

.. dropdown:: Additional changes

Fix nightly Zarr installation in CI (:pr:11151) James Bourbeau_
Add python 3.11 build to GPU CI (:pr:11135) Charles Blackmon-Luca_
Update gpuCI RAPIDS_VER to 24.08 (:pr:11141)
Update test_groupby_grouper_dispatch (:pr:11144) Richard (Rick) Zamora_
Bump JamesIves/github-pages-deploy-action from 4.6.0 to 4.6.1 (:pr:11136)
Unskip test_array_function_sparse with new sparse release (:pr:11139) James Bourbeau_
Fix test_parse_dates_multi_column on pandas=3 (:pr:11132) James Bourbeau_
Don't draft release notes for tagged commits (:pr:11138) Jacob Tomlinson_
Reduce task group count for partial P2P rechunks (:pr-distributed:8655) Hendrik Makait_
Update gpuCI RAPIDS_VER to 24.08 (:pr-distributed:8652)
Submit collections metadata to scheduler (:pr-distributed:8612) Florian Jetter_
Fix indent in code example in task-launch.rst (:pr-distributed:8650) Ray Bell_
Avoid multiple WorkerState sphinx error (:pr-distributed:8643) James Bourbeau_

.. _v2024.5.1:

2024.5.1

Highlights
^^^^^^^^^^

NumPy 2.0 support
"""""""""""""""""

This release contains compatibility updates for the upcoming NumPy 2.0 release.

See :pr:11096 by Benjamin Zaitlen_ and :pr:11106 by James Bourbeau_ for more details.

Increased Zarr store support
""""""""""""""""""""""""""""

This release adds support for MutableMapping-backed Zarr stores such as :py:class:zarr.storage.DirectoryStore.

See :pr:10422 by Greg M. Fleishman_ for more details.

.. dropdown:: Additional changes

Minor updates to ML page (:pr:11129) James Bourbeau_
Skip failing sparse test on 0.15.2 (:pr:11131) James Bourbeau_
Make sure nightly pyarrow is installed in upstream CI build (:pr:11121) James Bourbeau_
Add initial draft of ML overview document (:pr:11114) Matthew Rocklin_
Test query-planning in gpuCI (:pr:11060) Richard (Rick) Zamora_
Avoid pytest error when skipping NumPy 2.0 tests (:pr:11110) James Bourbeau_
Use nightly h5py in upstream CI build (:pr:11108) James Bourbeau_
Use nightly scikit-image in upstream CI build (:pr:11107) James Bourbeau_
Bump actions/checkout from 4.1.4 to 4.1.5 (:pr:11105)
Enable parquet append tests after fix (:pr:11104) Patrick Hoefler_
Skip fastparquet tests for numpy 2 (:pr:11103) Patrick Hoefler_
Fix misspelling found by codespell (:pr:11097) Dimitri Papadopoulos Orfanos_
Fix doc build (:pr:11099) Patrick Hoefler_
Clean up percentiles_summary logic (:pr:11094) Richard (Rick) Zamora_
Apply ruff/flake8-implicit-str-concat rule ISC001 (:pr:11098) Dimitri Papadopoulos Orfanos_
Fix clocks on Windows with Python 3.13 (:pr-distributed:8642) Victor Stinner_
Fix "Print host info" CI step on Mac OS (arm64) (:pr-distributed:8638) Hendrik Makait_

.. _v2024.5.0:

2024.5.0

Highlights
^^^^^^^^^^

This release primarily contains minor bugfixes.

.. dropdown:: Additional changes

Don't link to click intersphinx dev version (:pr:11091) M Bussonnier_
Fix API doc links for some dask-expr expressions (:pr:11092) Patrick Hoefler_
Add dask-expr to upstream build (:pr:11086) Patrick Hoefler_
Add melt support when query-planning is enabled (:pr:11088) Richard (Rick) Zamora_
Skip dataframe/product when in numpy 2 envs (:pr:11089) Benjamin Zaitlen_
Add plots to illustrate what the optimizer does (:pr:11072) Patrick Hoefler_
Fixup pandas upstream tests (:pr:11085) Patrick Hoefler_
Bump conda-incubator/setup-miniconda from 3.0.3 to 3.0.4 (:pr:11084)
Bump actions/checkout from 4.1.3 to 4.1.4 (:pr:11083)
Fix CI after pytest changes (:pr:11082) Patrick Hoefler_
Fixup tests for more efficient dask-expr implementation (:pr:11071) Patrick Hoefler_
Generalize clear_known_categories utility (:pr:11059) Richard (Rick) Zamora_
Bump JamesIves/github-pages-deploy-action from 4.5.0 to 4.6.0 (:pr:11062)
Bump release-drafter/release-drafter from 5 to 6 (:pr:11063)
Bump actions/checkout from 4.1.2 to 4.1.3 (:pr:11061)
Update GPU CI RAPIDS_VER to 24.06, disable query planning (:pr:11045) Charles Blackmon-Luca_
Move tests (:pr-distributed:8631) Hendrik Makait_
Bump actions/checkout from 4.1.2 to 4.1.3 (:pr-distributed:8628)

.. _v2024.4.2:

2024.4.2

Highlights
^^^^^^^^^^

Trivial Merge Implementation
""""""""""""""""""""""""""""

The Query Optimizer inspects queries to determine whether a merge(...) or groupby(...).apply(...) requires a shuffle. A shuffle can be avoided if the DataFrame was shuffled on the same columns in a previous step, without any operations in between that change the partitioning layout or the relevant values in each partition.

.. code-block:: python

   >>> result = df.merge(df2, on="a")
   >>> result = result.merge(df3, on="a")

The Query Optimizer identifies that result was already shuffled on "a" and therefore only shuffles df3 in the second merge before performing a blockwise merge.
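
The decision can be modelled with a small planning sketch that tracks which column each input was last shuffled on and only shuffles inputs that are not already partitioned on the join key. ``Frame`` and ``plan_merge`` are hypothetical stand-ins, not dask-expr classes; the sketch ignores partition counts and assumes no intervening operation has changed the partitioning or the relevant values.

.. code-block:: python

   from dataclasses import dataclass
   from typing import List, Optional, Tuple


   @dataclass(frozen=True)
   class Frame:
       """Hypothetical stand-in for a DataFrame expression in a query plan."""
       name: str
       partitioned_on: Optional[str] = None  # column the data was last shuffled on


   def plan_merge(left: Frame, right: Frame, on: str) -> Tuple[Frame, List[str]]:
       """Shuffle only the inputs that are not already partitioned on the join key."""
       steps = [f"shuffle {f.name} on {on!r}" for f in (left, right) if f.partitioned_on != on]
       steps.append("blockwise merge")
       return Frame(f"merge({left.name}, {right.name})", partitioned_on=on), steps


   df, df2, df3 = Frame("df"), Frame("df2"), Frame("df3")
   result, first = plan_merge(df, df2, on="a")
   result, second = plan_merge(result, df3, on="a")
   print(first)   # ["shuffle df on 'a'", "shuffle df2 on 'a'", 'blockwise merge']
   print(second)  # ["shuffle df3 on 'a'", 'blockwise merge']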

Auto-partitioning in read_parquet
"""""""""""""""""""""""""""""""""""""

The Query Optimizer automatically repartitions datasets read from Parquet files if individual partitions are too small. This reduces the number of partitions and consequently also the size of the task graph.

The Optimizer aims to produce partitions of at least 75MB and combines multiple files if necessary to reach this threshold. The threshold can be configured with:

.. code-block:: python

   >>> import dask
   >>> dask.config.set({"dataframe.parquet.minimum-partition-size": 100_000_000})

The value is given in bytes. The default threshold is relatively conservative to avoid memory issues on worker nodes with a relatively small amount of memory per thread.
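
Combining small files can be pictured as a greedy pass over file sizes that closes a partition once it reaches the threshold. This is a simplified sketch under stated assumptions, not the optimizer's actual algorithm: ``combine_files`` is hypothetical, it uses on-disk file sizes as a proxy for partition size, and it leaves a trailing group below the threshold as-is.

.. code-block:: python

   def combine_files(file_sizes, minimum_partition_size=75_000_000):
       """Greedily group consecutive files until each group reaches the minimum size.

       Returns lists of file indices; a trailing group may stay below the threshold.
       """
       groups, current, current_size = [], [], 0
       for index, size in enumerate(file_sizes):
           current.append(index)
           current_size += size
           if current_size >= minimum_partition_size:
               groups.append(current)
               current, current_size = [], 0
       if current:
           groups.append(current)
       return groups


   sizes = [30_000_000, 30_000_000, 30_000_000, 100_000_000, 10_000_000]
   print(combine_files(sizes))  # [[0, 1, 2], [3], [4]]
   print(combine_files(sizes, minimum_partition_size=100_000_000))  # [[0, 1, 2, 3], [4]]

Raising the threshold, as in the configuration example above, yields fewer and larger partitions.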

.. dropdown:: Additional changes

Add GitHub Releases automation (:pr:11057) Jacob Tomlinson_
Add changelog entries for new release (:pr:11058) Patrick Hoefler_
Reinstate try/except block in _bind_property (:pr:11049) Lawrence Mitchell_
Fix link for query planning docs (:pr:11054) Patrick Hoefler_
Add config parameter for parquet file size (:pr:11052) Patrick Hoefler_
Update percentile docstring (:pr:11053) Abel Aoun_
Add docs for query optimizer (:pr:11043) Patrick Hoefler_
Assignment of np.ma.masked to object-type Array (:pr:9627) David Hassell_
Don't error if dask_expr is not installed (:pr:11048) Simon Høxbro Hansen_
Adjust test_set_index for "cudf" backend (:pr:11029) Richard (Rick) Zamora_
Use to/from_legacy_dataframe instead of to/from_dask_dataframe (:pr:11025) Richard (Rick) Zamora_
Tokenize bag groupby keys (:pr:10734) Charles Stern_
Add lazy "cudf" registration for p2p-related dispatch functions (:pr:11040) Richard (Rick) Zamora_
Collect memray profiles on exception (:pr-distributed:8625) Florian Jetter_
Ensure inproc properly emulates serialization protocol (:pr-distributed:8622) Florian Jetter_
Relax test stats profiling2 (:pr-distributed:8621) Florian Jetter_
Restart workers when worker-ttl expires (:pr-distributed:8538) crusaderky_
Use monotonic for deadline test (:pr-distributed:8620) Florian Jetter_
Fix race condition for published futures with annotations (:pr-distributed:8577) Florian Jetter_
Scatter by worker instead of worker -> nthreads (:pr-distributed:8590) Miles_
Send log-event if worker is restarted because of memory pressure (:pr-distributed:8617) Patrick Hoefler_
Do not print xfailed tests in CI (:pr-distributed:8619) Florian Jetter_
ensure workers are not downscaled when participating in p2p (:pr-distributed:8610) Florian Jetter_
Run against stable fsspec (:pr-distributed:8615) Florian Jetter_

.. _v2024.4.1:

2024.4.1

This is a minor bugfix release that fixes an error when importing dask.dataframe with Python 3.11.9.

See :pr:11035 and :pr:11039 from Richard (Rick) Zamora_ for details.

.. dropdown:: Additional changes

Remove skips for named aggregations (:pr:11036) Patrick Hoefler_
Don't deep-copy read-only buffers on unpickle (:pr-distributed:8609) crusaderky_
Add dask-expr to dask conda recipe (:pr-distributed:8601) Charles Blackmon-Luca_

.. _v2024.4.0:

2024.4.0

Highlights
^^^^^^^^^^

Query planning fixes
""""""""""""""""""""

This release contains a variety of bugfixes in Dask DataFrame's new query planner.

GPU metric dashboard fixes
""""""""""""""""""""""""""

GPU memory and utilization dashboard functionality has been restored. Previously, these plots were unintentionally left blank.

See :pr-distributed:8572 from Benjamin Zaitlen_ for details.

.. dropdown:: Additional changes

Build nightlies on tag releases (:pr:11014) Charles Blackmon-Luca_
Remove xfail tracebacks from test suite (:pr:11028) Patrick Hoefler_
Fix CI for upstream pandas changes (:pr:11027) Patrick Hoefler_
Fix value_counts raising if branch exists of nans only (:pr:11023) Patrick Hoefler_
Enable custom expressions in dask_cudf (:pr:11013) Richard (Rick) Zamora_
Raise ImportError instead of ValueError when dask-expr cannot be imported (:pr:11007) James Lamb_
Add HyperSpy to ecosystem.rst (:pr:11008) Jonas Lähnemann_
Add Hugging Face hf:// to the list of fsspec compatible remote services (:pr:11012) Quentin Lhoest_
Bump actions/checkout from 4.1.1 to 4.1.2 (:pr:11009)
Refresh documentation for annotations and spans (:pr-distributed:8593) crusaderky_
Fixup deprecation warning from pandas (:pr-distributed:8564) Patrick Hoefler_
Add Python 3.11 to GPU CI matrix (:pr-distributed:8598) Charles Blackmon-Luca_
Deadline to use a monotonic timer (:pr-distributed:8597) crusaderky_
Update gpuCI RAPIDS_VER to 24.06 (:pr-distributed:8588)
Refactor restart() and restart_workers() (:pr-distributed:8550) crusaderky_
Bump actions/checkout from 4.1.1 to 4.1.2 (:pr-distributed:8587)
Fix bokeh deprecations (:pr-distributed:8594) Miles_
Fix flaky test: test_shutsdown_cleanly (:pr-distributed:8582) Miles_
Include type in failed sizeof warning (:pr-distributed:8580) James Bourbeau_

.. _v2024.3.1:

2024.3.1

This is a minor release that primarily demotes an exception to a warning if dask-expr is not installed when upgrading.

.. dropdown:: Additional changes

Only warn if dask-expr is not installed (:pr:11003) Florian Jetter_
Fix typos found by codespell (:pr:10993) Dimitri Papadopoulos Orfanos_
Extra CI job with dask-expr disabled (:pr-distributed:8583) crusaderky_
Fix worker dashboard proxy (:pr-distributed:8528) Miles_
Fix flaky test_restart_waits_for_new_workers (:pr-distributed:8573) crusaderky_
Fix flaky test_raise_on_incompatible_partitions (:pr-distributed:8571) crusaderky_

.. _v2024.3.0:

2024.3.0

Released on March 11, 2024

Highlights
^^^^^^^^^^

Query planning
""""""""""""""

This release enables query planning by default for all users of dask.dataframe.

The query planning functionality is a rewrite of the DataFrame using dask-expr. It is a drop-in replacement, and we expect that most users will not have to adjust any of their code. Feedback can be reported on the Dask issue tracker <https://github.com/dask/dask/issues>_ or on the query planning feedback issue <https://github.com/dask/dask/issues/10995>_.

If you encounter any issues, you can still opt out by setting:

.. code-block:: python

   >>> import dask
   >>> dask.config.set({'dataframe.query-planning': False})

Sunset of Pandas 1.X support
""""""""""""""""""""""""""""

The new query planning backend requires at least pandas 2.0. This pandas version is installed automatically if you install from conda, or if you install dask[complete] or dask[dataframe] from pip.

The legacy DataFrame implementation still supports pandas 1.X if you install dask without extras.

.. dropdown:: Additional changes

Update tests for pandas nightlies with dask-expr (:pr:10989) Patrick Hoefler_
Use dask-expr docs as main reference docs for DataFrames (:pr:10990) Patrick Hoefler_
Adjust from_array test for dask-expr (:pr:10988) Patrick Hoefler_
Unskip to_delayed test (:pr:10985) Patrick Hoefler_
Bump conda-incubator/setup-miniconda from 3.0.1 to 3.0.3 (:pr:10978)
Fix bug when enabling dask-expr (:pr:10977) Patrick Hoefler_
Update docs and requirements for dask-expr and remove warning (:pr:10976) Patrick Hoefler_
Fix numpy 2 compatibility with ogrid usage (:pr:10929) David Hoese_
Turn on dask-expr switch (:pr:10967) Patrick Hoefler_
Force initializing the random seed with the same byte order interpret… (:pr:10970) Elliott Sales de Andrade_
Use correct encoding for line terminator when reading CSV (:pr:10972) Elliott Sales de Andrade_
perf: do not unnecessarily recalculate input/output indices in optimize_blockwise (:pr:10966) Lindsey Gray
Adjust tests for string option in dask-expr (:pr:10968) Patrick Hoefler_
Adjust tests for array conversion in dask-expr (:pr:10973) Patrick Hoefler_
TST: Fix sizeof tests on 32bit (:pr:10971) Elliott Sales de Andrade_
TST: Add missing skip for pyarrow (:pr:10969) Elliott Sales de Andrade_
Implement dask-expr conversion for bag.to_dataframe (:pr:10963) Patrick Hoefler_
Fix dask-expr import errors (:pr:10964) Miles_
Clean up Sphinx documentation for dask.config (:pr:10959) crusaderky_
Use stdlib importlib.metadata on Python 3.12+ (:pr:10955) wim glenn_
Cast partitioning_index to smaller size (:pr:10953) Florian Jetter_
Reuse dask/dask groupby Aggregation (:pr:10952) Patrick Hoefler_
ensure tokens on futures are unique (:pr-distributed:8569) Florian Jetter_
Don't obfuscate fine performance metrics failures (:pr-distributed:8568) crusaderky_
Mark shuffle fast tasks in dask-expr (:pr-distributed:8563) crusaderky_
Weigh gilknocker Prometheus metric by duration (:pr-distributed:8558) crusaderky_
Fix scheduler transition error on memory->erred (:pr-distributed:8549) Hendrik Makait_
Make CI happy again (:pr-distributed:8560) Miles_
Fix flaky test_Future_release_sync (:pr-distributed:8562) crusaderky_
Fix flaky test_flaky_connect_recover_with_retry (:pr-distributed:8556) Hendrik Makait_
typing tweaks in scheduler.py (:pr-distributed:8551) crusaderky_
Bump conda-incubator/setup-miniconda from 3.0.2 to 3.0.3 (:pr-distributed:8553)
Install dask-expr on CI (:pr-distributed:8552) Hendrik Makait_
P2P shuffle can drop partition column before writing to disk (:pr-distributed:8531) Hendrik Makait_
Better logging for worker removal (:pr-distributed:8517) crusaderky_
Add indicator support to merge (:pr-distributed:8539) Patrick Hoefler_
Bump conda-incubator/setup-miniconda from 3.0.1 to 3.0.2 (:pr-distributed:8535)
Avoid iteration error when getting module path (:pr-distributed:8533) James Bourbeau_
Ignore stdlib threading module in code collection (:pr-distributed:8532) James Bourbeau_
Fix excessive logging on P2P retry (:pr-distributed:8511) Hendrik Makait_
Prevent typos in retire_workers parameters (:pr-distributed:8524) crusaderky_
Cosmetic cleanup of test_steal (backport from #8185) (:pr-distributed:8509) crusaderky_
Fix flaky test_compute_per_key (:pr-distributed:8521) crusaderky_
Fix flaky test_no_workers_timeout_queued (:pr-distributed:8523) crusaderky_

.. _v2024.2.1:

2024.2.1

Released on February 23, 2024

Highlights
^^^^^^^^^^

Allow silencing dask.DataFrame deprecation warning
""""""""""""""""""""""""""""""""""""""""""""""""""

The previous release contained a DeprecationWarning that alerts users to an upcoming switch of dask.dataframe to a new backend with support for query planning (see also :issue:10934).

This DeprecationWarning is triggered on import of the dask.dataframe module, and the community raised concerns about it being too verbose.

It is now possible to silence this warning:

.. code::

   # via Python
   >>> import dask
   >>> dask.config.set({'dataframe.query-planning-warning': False})

   # via CLI
   dask config set dataframe.query-planning-warning False

See :pr:10936 and :pr:10925 from Miles_ for details.

More robust distributed scheduler for rare key collisions
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Blockwise fusion optimization can cause a task key collision that is not being handled properly by the distributed scheduler (see :issue:9888). Users will typically notice this by seeing one of various internal exceptions that cause a system deadlock or critical failure. While this issue could not be fixed, the scheduler now implements a mechanism that should mitigate most occurences and issues a warning if the issue is detected.

See :pr-distributed:8185 from crusaderky_ and Florian Jetter_ for details.

Over the course of this, various improvements to tokenization have been implemented. See :pr:10913, :pr:10884, :pr:10919, :pr:10896 and primarily :pr:10883 from crusaderky_ for more details.
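To make the failure mode concrete, here is a toy model of what detecting a key collision means: the same key arrives a second time with a different task definition. This is not the distributed scheduler's code; ``update_graph`` and the warning text are hypothetical. The sketch keeps the original definition and warns instead of silently overwriting it.

```python
import warnings


def update_graph(tasks, new_tasks):
    """Merge ``new_tasks`` into ``tasks`` (both map key -> task definition).

    Hypothetical toy model: a collision is an existing key arriving with a
    different definition. The original definition is kept and a warning is
    issued, rather than silently replacing it.
    """
    for key, spec in new_tasks.items():
        if key not in tasks:
            tasks[key] = spec
        elif tasks[key] != spec:
            warnings.warn(
                f"Key {key!r} was submitted again with a different definition; "
                "keeping the original",
                RuntimeWarning,
            )
    return tasks


graph = {"x": ("inc", 1)}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # "x" collides (different definition), "y" is new.
    update_graph(graph, {"x": ("inc", 2), "y": ("inc", 3)})
```

A real scheduler also has to reconcile dependencies and running state for the colliding key, which this sketch ignores.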

More robust adaptive scaling on large clusters
""""""""""""""""""""""""""""""""""""""""""""""

Adaptive scaling could previously lose data during downscaling if many tasks had to be moved. This typically, but not exclusively, occurred on large clusters; it manifested as recomputation of tasks and could cause clusters to oscillate between up- and downscaling without ever finishing.

See :pr-distributed:`8522` from `crusaderky`_ for more details.

.. dropdown:: Additional changes

- Remove flaky fastparquet test (:pr:`10948`) `Patrick Hoefler`_
- Enable ``Aggregation`` from ``dask-expr`` (:pr:`10947`) `Patrick Hoefler`_
- Update tests for ``assign`` change in ``dask-expr`` (:pr:`10944`) `Patrick Hoefler`_
- Adjust for pandas large string change (:pr:`10942`) `Patrick Hoefler`_
- Fix flaky ``test_describe_empty`` (:pr:`10943`) `crusaderky`_
- Use Python 3.12 as reference environment (:pr:`10939`) `crusaderky`_
- [Cosmetic] Clean up temp paths in ``test_config.py`` (:pr:`10938`) `crusaderky`_
- [CLI] ``dask config set`` and ``dask config find`` updates. (:pr:`10930`) `Miles`_
- ``combine_first`` when a chunk is full of NaNs (:pr:`10932`) `crusaderky`_
- Correctly parse lowercase ``true``/``false`` config from CLI (:pr:`10926`) `crusaderky`_
- ``dask config get`` fix when printing ``None`` values (:pr:`10927`) `crusaderky`_
- ``query-planning`` can't be ``None`` (:pr:`10928`) `crusaderky`_
- Add ``dask config set`` (:pr:`10921`) `Miles`_
- Make ``nunique`` faster again (:pr:`10922`) `Patrick Hoefler`_
- Clean up some Cython warnings handling (:pr:`10924`) `crusaderky`_
- Bump ``pre-commit/action`` from 3.0.0 to 3.0.1 (:pr:`10920`)
- Raise and avoid data loss if meta provided to P2P shuffle is wrong (:pr-distributed:`8520`) `Florian Jetter`_
- Fix gpuci: ``np.product`` is deprecated (:pr-distributed:`8518`) `crusaderky`_
- Update gpuCI ``RAPIDS_VER`` to 24.04 (:pr-distributed:`8471`)
- Unpin ``ipywidgets`` on Python 3.12 (:pr-distributed:`8516`) `crusaderky`_
- Keep old dependencies on ``run_spec`` collision (:pr-distributed:`8512`) `crusaderky`_
- Trivial mypy fix (:pr-distributed:`8513`) `crusaderky`_
- Ensure large payload can be serialized and sent over comms (:pr-distributed:`8507`) `Florian Jetter`_
- Allow large graph warning threshold to be configured (:pr-distributed:`8508`) `Florian Jetter`_
- Tokenization-related test tweaks (backport from #8185) (:pr-distributed:`8499`) `crusaderky`_
- Tweaks to ``update_graph`` (backport from #8185) (:pr-distributed:`8498`) `crusaderky`_
- AMM: test incremental retirements (:pr-distributed:`8501`) `crusaderky`_
- Suppress ``dask-expr`` warning in CI (:pr-distributed:`8505`) `crusaderky`_
- Ignore ``dask-expr`` warning in CI (:pr-distributed:`8504`) `James Bourbeau`_
- Improve tests for P2P stable ordering (:pr-distributed:`8458`) `Hendrik Makait`_
- Bump ``pre-commit/action`` from 3.0.0 to 3.0.1 (:pr-distributed:`8503`)

.. _v2024.2.0:

2024.2.0
--------

Released on February 9, 2024

Highlights
^^^^^^^^^^

Deprecate Dask DataFrame implementation
"""""""""""""""""""""""""""""""""""""""

The current Dask DataFrame implementation is deprecated. In a future release, Dask DataFrame will use a new implementation that contains several improvements, including logical query planning. The user-facing DataFrame API will remain unchanged.

The new implementation is already available and can be enabled by installing the dask-expr library:

.. code-block:: bash

    $ pip install dask-expr

and turning the query planning option on:

.. code-block:: python

    >>> import dask
    >>> dask.config.set({'dataframe.query-planning': True})
    >>> import dask.dataframe as dd

API documentation for the new implementation is available at https://docs.dask.org/en/stable/dataframe-api.html

Any feedback can be reported on the Dask issue tracker https://github.com/dask/dask/issues

See :pr:`10912` from `Patrick Hoefler`_ for details.

Improved tokenization
"""""""""""""""""""""

This release contains several improvements to Dask's object tokenization logic. More objects now produce deterministic tokens, which can lead to improved performance through caching of intermediate results.

See :pr:`10898`, :pr:`10904`, :pr:`10876`, :pr:`10874`, and :pr:`10865` from `crusaderky`_ for details.
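As a minimal illustration of what deterministic tokens mean, the snippet below calls ``dask.base.tokenize`` on NumPy and pandas objects: equal data produces equal tokens even across distinct objects, while different data produces a different token. The specific objects are arbitrary examples.

```python
import numpy as np
import pandas as pd
from dask.base import tokenize

x = np.arange(10)
df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

# Equal contents -> equal tokens, even for separate objects.
same_array = tokenize(x) == tokenize(np.arange(10))
same_frame = tokenize(df) == tokenize(df.copy())

# Different contents -> different tokens.
different_array = tokenize(x) != tokenize(np.arange(11))
```

Because the token depends only on the contents, a recomputation with equal inputs can map to the same task keys, which is what makes caching of intermediate results possible.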

.. dropdown:: Additional changes

- Fix inplace modification on read-only arrays for string conversion (:pr:`10886`) `Patrick Hoefler`_
- Add changelog entry for ``dask-expr`` (:pr:`10915`) `Patrick Hoefler`_
- Fix ``leftsemi`` merge for ``cudf`` (:pr:`10914`) `Patrick Hoefler`_
- Slight update to ``dask-expr`` warning (:pr:`10916`) `James Bourbeau`_
- Improve performance for ``groupby.nunique`` (:pr:`10910`) `Patrick Hoefler`_
- Add configuration for ``leftsemi`` merges in ``dask-expr`` (:pr:`10908`) `Patrick Hoefler`_
- Adjust ``assign`` test for ``dask-expr`` (:pr:`10907`) `Patrick Hoefler`_
- Avoid ``pytest.warns`` in ``test_to_datetime`` for GPU CI (:pr:`10902`) `Richard (Rick) Zamora`_
- Update deployment options in docs homepage (:pr:`10901`) `James Bourbeau`_
- Fix typo in dataframe docs (:pr:`10900`) `Matthew Rocklin`_
- Bump ``peter-evans/create-pull-request`` from 5 to 6 (:pr:`10894`)
- Fix mimesis API ``>=13.1.0`` - use ``random.randint`` (:pr:`10888`) `Miles`_
- Adjust invalid test (:pr:`10897`) `Patrick Hoefler`_
- Pickle ``da.argwhere`` and ``da.count_nonzero`` (:pr:`10885`) `crusaderky`_
- Fix ``dask-expr`` tests after singleton pr (:pr:`10892`) `Patrick Hoefler`_
- Set lower bound version for ``s3fs`` (:pr:`10889`) `Miles`_
- Add a couple of ``dask-expr`` fixes for new parquet cache (:pr:`10880`) `Florian Jetter`_
- Update deployment documentation (:pr:`10882`) `Matthew Rocklin`_
- Start with ``dask-expr`` doc build (:pr:`10879`) `Patrick Hoefler`_
- Test tokenization of static and class methods (:pr:`10872`) `crusaderky`_
- Add ``distributed.print`` and ``distributed.warn`` to API docs (:pr:`10878`) `James Bourbeau`_
- Run macos ci on M1 architecture (:pr:`10877`) `Patrick Hoefler`_
- Update tests for ``dask-expr`` (:pr:`10838`) `Patrick Hoefler`_
- Update parquet tests to align with ``dask-expr`` fixes (:pr:`10851`) `Richard (Rick) Zamora`_
- Fix regression in ``test_graph_manipulation`` (:pr:`10873`) `crusaderky`_
- Adjust pytest errors for ``dask-expr`` ci (:pr:`10871`) `Patrick Hoefler`_
- Set upper bound version for ``numba`` when ``pandas<2.1`` (:pr:`10890`) `Miles`_
- Deprecate ``method`` parameter in ``DataFrame.fillna`` (:pr:`10846`) `Miles`_
- Remove warning filter from ``pyproject.toml`` (:pr:`10867`) `Patrick Hoefler`_
- Skip ``test_append_with_partition`` for fastparquet (:pr:`10828`) `Patrick Hoefler`_
- Fix pytest 8 issues (:pr:`10868`) `Patrick Hoefler`_
- Adjust test for support of ``median`` in ``Groupby.aggregate`` in ``dask-expr`` (2/2) (:pr:`10870`) `Hendrik Makait`_
- Allow length of ``ascending`` to be larger than one in ``sort_values`` (:pr:`10864`) `Florian Jetter`_
- Allow other message raised in Python 3.9 (:pr:`10862`) `Hendrik Makait`_
- Don't crash when getting computation code in pathological cases (:pr-distributed:`8502`) `James Bourbeau`_
- Bump ``peter-evans/create-pull-request`` from 5 to 6 (:pr-distributed:`8494`)
- Fix test of cudf spilling metrics (:pr-distributed:`8478`) `Mads R. B. Kristensen`_
- Upgrade to pytest 8 (:pr-distributed:`8482`) `crusaderky`_
- Fix ``test_two_consecutive_clients_share_results`` (:pr-distributed:`8484`) `crusaderky`_
- ``Client`` word mix-up (:pr-distributed:`8481`) `templiert`_

.. _v2024.1.1:

2024.1.1
--------

Released on January 26, 2024

Highlights
^^^^^^^^^^

Pandas 2.2 and Scipy 1.12 support
"""""""""""""""""""""""""""""""""

This release contains compatibility updates for the latest pandas and scipy releases.

See :pr:`10834`, :pr:`10849`, :pr:`10845`, and :pr-distributed:`8474` from `crusaderky`_ for details.

Deprecations
""""""""""""

- Deprecate ``convert_dtype`` in ``apply`` (:pr:`10827`) `Miles`_
- Deprecate ``axis`` in ``DataFrame.rolling`` (:pr:`10803`) `Miles`_
- Deprecate ``out=`` and ``dtype=`` parameter in most DataFrame methods (:pr:`10800`) `crusaderky`_
- Deprecate ``axis`` in ``groupby`` cumulative transformers (:pr:`10796`) `Miles`_
- Rename ``shuffle`` to ``shuffle_method`` in remaining methods (:pr:`10797`) `Miles`_

.. dropdown:: Additional changes

- Add recommended deployment options to deployment docs (:pr:`10866`) `James Bourbeau`_
- Improve ``_agg_finalize`` to conform to output expectation (:pr:`10835`) `Hendrik Makait`_
- Implement deterministic tokenization for hlg (:pr:`10817`) `Patrick Hoefler`_
- Refactor: move tests for ``tokenize()`` to its own module (:pr:`10863`) `crusaderky`_
- Update DataFrame examples section (:pr:`10856`) `James Bourbeau`_
- Temporarily pin ``mimesis<13.1.0`` (:pr:`10860`) `James Bourbeau`_
- Trivial cosmetic tweaks to ``_testing.py`` (:pr:`10857`) `crusaderky`_
- Unskip and adjust tests for groupby-aggregate with median using ``dask-expr`` (:pr:`10832`) `Hendrik Makait`_
- Fix test for ``sizeof(pd.MultiIndex)`` in upstream CI (:pr:`10850`) `crusaderky`_
- numpy 2.0: fix slicing by ``uint64`` array (:pr:`10854`) `crusaderky`_
- Rename numpy version constants to match pandas (:pr:`10843`) `crusaderky`_
- Bump ``actions/cache`` from 3 to 4 (:pr:`10852`)
- Update gpuCI ``RAPIDS_VER`` to 24.04 (:pr:`10841`)
- Fix deprecations in doctest (:pr:`10844`) `crusaderky`_
- Changed dtype arithmetics in numpy 2.x (:pr:`10831`) `crusaderky`_
- Adjust tests for median support in ``dask-expr`` (:pr:`10839`) `Patrick Hoefler`_
- Adjust tests for median support in groupby-aggregate in ``dask-expr`` (:pr:`10840`) `Hendrik Makait`_
- numpy 2.x: fix ``std()`` on ``MaskedArray`` (:pr:`10837`) `crusaderky`_
- Fail ``dask-expr`` ci if tests fail (:pr:`10829`) `Patrick Hoefler`_
- Activate ``query_planning`` when exporting tests (:pr:`10833`) `Patrick Hoefler`_
- Expose dataframe tests (:pr:`10830`) `Patrick Hoefler`_
- numpy 2: deprecations in n-dimensional ``fft`` functions (:pr:`10821`) `crusaderky`_
- Generalize ``CreationDispatch`` for ``dask-expr`` (:pr:`10794`) `Richard (Rick) Zamora`_
- Remove circular import when ``dask-expr`` enabled (:pr:`10824`) `Miles`_
- Minor[CI]: ``publish-test-results`` not marked as failed (:pr:`10825`) `Miles`_
- Fix more tests to use ``pytest.warns()`` (:pr:`10818`) `Michał Górny`_
- ``np.unique()``: inverse is shaped in numpy 2 (:pr:`10819`) `crusaderky`_
- Pin ``test_split_adaptive_files`` to ``pyarrow`` engine (:pr:`10820`) `Patrick Hoefler`_
- Adjust remaining tests in ``dask/dask`` (:pr:`10813`) `Patrick Hoefler`_
- Restrict test to Arrow only (:pr:`10814`) `Patrick Hoefler`_
- Filter warnings from ``std`` test (:pr:`10815`) `Patrick Hoefler`_
- Adjust mostly indexing tests (:pr:`10790`) `Patrick Hoefler`_
- Updates to deployment docs (:pr:`10778`) `Sarah Charlotte Johnson`_
- Unblock documentation build (:pr:`10807`) `Miles`_
- Adjust ``test_to_datetime`` for ``dask-expr`` compatibility `Hendrik Makait`_
- Upstream CI tweaks (:pr:`10806`) `crusaderky`_
- Improve tests for ``to_numeric`` (:pr:`10804`) `Hendrik Makait`_
- Fix test-report cache key indent (:pr:`10798`) `Miles`_
- Add test-report workflow (:pr:`10783`) `Miles`_
- Handle matrix subclass serialization (:pr-distributed:`8480`) `Florian Jetter`_
- Use smallest data type for partition column in P2P (:pr-distributed:`8479`) `Florian Jetter`_
- pandas 2.2: fix ``test_dataframe_groupby_tasks`` (:pr-distributed:`8475`) `crusaderky`_
- Bump ``actions/cache`` from 3 to 4 (:pr-distributed:`8477`)
- pandas 2.2 vs. pyarrow 14: deprecated ``DatetimeTZBlock`` (:pr-distributed:`8476`) `crusaderky`_
- pandas 2.2.0: Deprecated frequency alias ``M`` in favor of ``ME`` (:pr-distributed:`8473`) `Hendrik Makait`_
- Fix docs build (:pr-distributed:`8472`) `Hendrik Makait`_
- Fix P2P-based joins with explicit ``npartitions`` (:pr-distributed:`8470`) `Hendrik Makait`_
- Ignore ``dask-expr`` in ``test_report.py`` script (:pr-distributed:`8464`) `Miles`_
- Nit: hardcode Python version in test report environment (:pr-distributed:`8462`) `crusaderky`_
- Change ``test_report.py`` - skip bad artifacts in ``dask/dask`` (:pr-distributed:`8461`) `Miles`_
- Replace all occurrences of ``sys.is_finalizing`` (:pr-distributed:`8449`) `Florian Jetter`_

.. _v2024.1.0:

2024.1.0
--------

Released on January 12, 2024

Highlights
^^^^^^^^^^

Partial rechunks within P2P
"""""""""""""""""""""""""""

P2P rechunking now utilizes the relationships between input and output chunks. For situations that do not require all-to-all data transfer, this may significantly reduce the runtime and memory/disk footprint. It also enables task culling.

See :pr-distributed:`8330` from `Hendrik Makait`_ for details.
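The idea behind partial rechunks can be illustrated with a small, self-contained sketch. This is plain Python, not the actual P2P implementation, and the helper names are hypothetical. Along one axis, each output chunk only needs the input chunks whose index ranges overlap it. When these dependency sets are small, the rechunk splits into independent sub-problems instead of one all-to-all transfer.

```python
from itertools import accumulate


def chunk_bounds(chunks):
    """Return the (start, stop) offsets of each chunk along one axis."""
    stops = list(accumulate(chunks))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


def input_dependencies(old_chunks, new_chunks):
    """For each output chunk, list the indices of the input chunks it overlaps."""
    old = chunk_bounds(old_chunks)
    deps = []
    for new_start, new_stop in chunk_bounds(new_chunks):
        deps.append(
            [i for i, (start, stop) in enumerate(old) if start < new_stop and new_start < stop]
        )
    return deps


# Regrouping neighbouring chunks: each output chunk depends on only two inputs.
partial = input_dependencies((4, 4, 4), (6, 6))
# Collapsing the axis into a single chunk: the output depends on every input.
all_to_all = input_dependencies((4, 4, 4), (12,))
```

For N-dimensional arrays, the same check can be applied independently per axis; an output block then depends on the Cartesian product of the per-axis results.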

Fastparquet engine deprecated
"""""""""""""""""""""""""""""

The ``fastparquet`` Parquet engine has been deprecated. Users should migrate to the ``pyarrow`` engine by installing `PyArrow <https://arrow.apache.org/docs/python/install.html>`_ and removing ``engine="fastparquet"`` from ``read_parquet`` or ``to_parquet`` calls.

See :pr:`10743` from `crusaderky`_ for details.

Improved serialization for arbitrary data
"""""""""""""""""""""""""""""""""""""""""

This release improves serialization robustness for arbitrary data. Previously, serialization could fail in some cases for data that is not msgpack-serializable. In those cases, we now fall back to using pickle.

See :pr-distributed:`8447` from `Hendrik Makait`_ for details.
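The fallback pattern itself is simple. The sketch below uses only the standard library, with ``json`` standing in for msgpack; this substitution only keeps the example dependency-free, and Dask's actual serialization layer is more involved. It tries the restricted format first and falls back to pickle when that format rejects the object. A one-byte tag records which format was used.

```python
import json
import pickle


def dumps_with_fallback(obj):
    """Serialize with a restricted format first, falling back to pickle."""
    try:
        return b"J" + json.dumps(obj).encode()
    except (TypeError, ValueError):
        # json rejects e.g. sets and arbitrary class instances.
        return b"P" + pickle.dumps(obj)


def loads_with_fallback(payload):
    """Inverse of ``dumps_with_fallback``, dispatching on the one-byte tag."""
    tag, body = payload[:1], payload[1:]
    if tag == b"J":
        return json.loads(body)
    # Only unpickle data from trusted sources.
    return pickle.loads(body)


simple = dumps_with_fallback({"a": [1, 2]})  # handled by json
arbitrary = dumps_with_fallback({1, 2, 3})  # a set: falls back to pickle
```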

Additional deprecations
"""""""""""""""""""""""

- Deprecate ``shuffle`` keyword in favour of ``shuffle_method`` for DataFrame methods (:pr:`10738`) `Hendrik Makait`_
- Deprecate automatic argument inference in ``repartition`` (:pr:`10691`) `Patrick Hoefler`_
- Deprecate ``compute`` parameter in ``set_index`` (:pr:`10784`) `Miles`_
- Deprecate ``inplace`` in ``eval`` (:pr:`10785`) `Miles`_
- Deprecate ``Series.view`` (:pr:`10754`) `Miles`_
- Deprecate ``npartitions="auto"`` for ``set_index`` & ``sort_values`` (:pr:`10750`) `Miles`_

.. dropdown:: Additional changes

- Avoid shortcut in tasks shuffle that led to data loss (:pr:`10763`) `Patrick Hoefler`_
- Ignore data tasks when ordering (:pr:`10706`) `Florian Jetter`_
- Add ``get_dummies`` from ``dask-expr`` (:pr:`10791`) `Patrick Hoefler`_
- Adjust IO tests for ``dask-expr`` migration (:pr:`10776`) `Patrick Hoefler`_
- Remove deprecation warning about ``sort`` and ``split_out`` in ``groupby`` (:pr:`10788`) `Patrick Hoefler`_
- Address pandas deprecations (:pr:`10789`) `Patrick Hoefler`_
- Import distributed only once in ``get_scheduler`` (:pr:`10771`) `Florian Jetter`_
- Simplify GitHub actions (:pr:`10781`) `crusaderky`_
- Add unit test overview (:pr:`10769`) `Miles`_
- Clean up redundant bits in CI (:pr:`10768`) `crusaderky`_
- Update tests for ``ufunc`` (:pr:`10773`) `Patrick Hoefler`_
- Use ``pytest.mark.skipif(DASK_EXPR_ENABLED)`` (:pr:`10774`) `crusaderky`_
- Adjust shuffle tests for ``dask-expr`` (:pr:`10759`) `Patrick Hoefler`_
- Fix some deprecation warnings from pandas (:pr:`10749`) `Patrick Hoefler`_
- Adjust shuffle tests for ``dask-expr`` (:pr:`10762`) `Patrick Hoefler`_
- Update pre-commit (:pr:`10767`) `Hendrik Makait`_
- Clean up config switches in CI (:pr:`10766`) `crusaderky`_
- Improve exception for ``validate_key`` (:pr:`10765`) `Hendrik Makait`_
- Handle datetimeindexes in ``set_index`` with unknown divisions (:pr:`10757`) `Patrick Hoefler`_
- Add hashing for decimals (:pr:`10758`) `Patrick Hoefler`_
- Review tests for ``is_monotonic`` (:pr:`10756`) `crusaderky`_
- Change argument order in ``value_counts_aggregate`` (:pr:`10751`) `Patrick Hoefler`_
- Adjust some groupby tests for ``dask-expr`` (:pr:`10752`) `Patrick Hoefler`_
- Restrict mimesis to ``< 12`` for 3.9 build (:pr:`10755`) `Patrick Hoefler`_
- Don't evaluate config in skip condition (:pr:`10753`) `Patrick Hoefler`_
- Adjust some tests to be compatible with ``dask-expr`` (:pr:`10714`) `Patrick Hoefler`_
- Make ``dask.array.utils`` functions more generic to other Dask Arrays (:pr:`10676`) `Matthew Rocklin`_
- Remove duplicate "single machine" section (:pr:`10747`) `Matthew Rocklin`_
- Tweak ORC ``engine=`` parameter (:pr:`10746`) `crusaderky`_
- Add pandas 3.0 deprecations and migration prep for ``dask-expr`` (:pr:`10723`) `Miles`_
- Add task graph animation to docs homepage (:pr:`10730`) `Sarah Charlotte Johnson`_
- Use new Xarray logo (:pr:`10729`) `James Bourbeau`_
- Update tab styling on "10 Minutes to Dask" page (:pr:`10728`) `James Bourbeau`_
- Update environment file upload step in CI (:pr:`10726`) `James Bourbeau`_
- Don't duplicate unobserved categories in ``GroupBy.nunique`` if ``split_out>1`` (:pr:`10716`) `Patrick Hoefler`_
- Changelog entry for ``dask.order`` update (:pr:`10715`) `Florian Jetter`_
- Relax redundant-key check in ``_check_dsk`` (:pr:`10701`) `Richard (Rick) Zamora`_
- Fix ``test_report.py`` (:pr-distributed:`8459`) `Miles`_
- Revert pickle change (:pr-distributed:`8456`) `Florian Jetter`_
- Adapt ``test_report.py`` to support ``dask/dask`` repository (:pr-distributed:`8450`) `Miles`_
- Maintain stable ordering for P2P shuffling (:pr-distributed:`8453`) `Hendrik Makait`_
- Add no worker timeout for scheduler (:pr-distributed:`8371`) `FTang21`_
- Allow tests workflow to be dispatched manually by maintainers (:pr-distributed:`8445`) `Erik Sundell`_
- Make scheduler-related transition functionality private (:pr-distributed:`8448`) `Hendrik Makait`_
- Update pre-commit hooks (:pr-distributed:`8444`) `Hendrik Makait`_
- Do not always check if ``__main__`` is in result when pickling (:pr-distributed:`8443`) `Florian Jetter`_
- Delegate ``wait_for_workers`` to cluster instances only when implemented (:pr-distributed:`8441`) `Erik Sundell`_
- Extend sleep in ``test_pandas`` (:pr-distributed:`8440`) `Julian Gilbey`_
- Avoid deprecated ``shuffle`` keyword (:pr-distributed:`8439`) `Hendrik Makait`_
- Shuffle metrics 4/4: Remove bespoke diagnostics (:pr-distributed:`8367`) `crusaderky`_
- Do not run gilknocker in testsuite (:pr-distributed:`8423`) `Florian Jetter`_
- Tweak abstractmethods (:pr-distributed:`8427`) `crusaderky`_
- Shuffle metrics 3/4: Capture background metrics (:pr-distributed:`8366`) `crusaderky`_
- Shuffle metrics 2/4: Add background metrics (:pr-distributed:`8365`) `crusaderky`_
- Shuffle metrics 1/4: Add foreground metrics (:pr-distributed:`8364`) `crusaderky`_
- Bump ``actions/upload-artifact`` from 3 to 4 (:pr-distributed:`8420`)
- Fix ``test_merge_p2p_shuffle_reused_dataframe_with_different_parameters`` (:pr-distributed:`8422`) `Hendrik Makait`_
- Expand ``Client.upload_file`` docs example (:pr-distributed:`8313`) `Miles`_
- Improve logging in P2P's scheduler plugin (:pr-distributed:`8410`) `Hendrik Makait`_
- Re-enable ``test_decide_worker_coschedule_order_neighbors`` (:pr-distributed:`8402`) `Florian Jetter`_
- Add cuDF spilling statistics to RMM/GPU memory plot (:pr-distributed:`8148`) `Charles Blackmon-Luca`_
- Fix inconsistent hashing for Nanny-spawned workers (:pr-distributed:`8400`) `Charles Stern`_
- Do not allow workers to downscale if they are running long-running tasks (e.g. ``worker_client``) (:pr-distributed:`7481`) `Florian Jetter`_
- Fix flaky ``test_subprocess_cluster_does_not_depend_on_logging`` (:pr-distributed:`8417`) `crusaderky`_

.. _v2023.12.1:

2023.12.1
---------

Released on December 15, 2023

Highlights
^^^^^^^^^^

Logical Query Planning now available for Dask DataFrames
""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Dask DataFrames can now be much more performant by using a logical query planner. This feature is currently off by default, but can be turned on with:

.. code:: python

    import dask

    dask.config.set({"dataframe.query-planning": True})

You also need to have dask-expr installed:

.. code:: bash

    pip install dask-expr

We've seen promising performance improvements so far; see `this blog post <https://blog.coiled.io/blog/dask-expr-tpch-dask.html>`__ and `these regularly updated benchmarks <https://tpch.coiled.io>`__ for more information. A more detailed explanation of how the query optimizer works can be found in `this blog post <https://blog.coiled.io/blog/dask-expr-introduction.html>`__.

This feature is still under active development and the `API <https://github.com/dask-contrib/dask-expr#api-coverage>`__ isn't stable yet, so breaking changes can occur. We expect to make the query optimizer the default early next year.

See :pr:`10634` from `Patrick Hoefler`_ for details.

Dtype inference in ``read_parquet``
"""""""""""""""""""""""""""""""""""

``read_parquet`` will now infer the Arrow types ``pa.date32()``, ``pa.date64()`` and ``pa.decimal()`` as an ``ArrowDtype`` in pandas. These dtypes are backed by the original Arrow array, and thus avoid the conversion to NumPy ``object`` dtype. Additionally, ``read_parquet`` will no longer infer nested and binary types as strings; they will be stored in NumPy ``object`` arrays.

See :pr:`10698` and :pr:`10705` from `Patrick Hoefler`_ for details.

Scheduling improvements to reduce memory usage
""""""""""""""""""""""""""""""""""""""""""""""

This release includes a major rewrite to a core part of our scheduling logic. It includes a new approach to the topological sorting algorithm in ``dask.order``, which determines the order in which tasks are run. Improper ordering is known to be a major contributor to excessive cluster memory pressure.

Updates in this release fix a couple of performance regressions that were introduced in release 2023.10.0 (see :pr:`10535`). Generally, computations should now be much more eager to release data once it is no longer required in memory.

See :pr:`10660` and :pr:`10697` from `Florian Jetter`_ for details.
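Why ordering matters for memory can be seen in a toy model. The ``peak_memory`` helper is hypothetical and is not ``dask.order``'s critical-path algorithm. The example runs the same small tree reduction in two valid topological orders and counts how many results are held at once. It assumes each result is released as soon as every task that depends on it has run.

```python
def peak_memory(deps, order):
    """Largest number of task outputs held at once when running ``order``.

    ``deps`` maps each key to the keys it depends on. An output is released
    as soon as all of its dependents have run; the final result is kept.
    """
    dependents = {key: set() for key in deps}
    for key, parents in deps.items():
        for parent in parents:
            dependents[parent].add(key)
    remaining = {key: len(children) for key, children in dependents.items()}
    held, peak = set(), 0
    for key in order:
        held.add(key)
        for parent in deps[key]:
            remaining[parent] -= 1
            if remaining[parent] == 0:
                held.discard(parent)
        peak = max(peak, len(held))
    return peak


# Four leaves reduced pairwise into c0 and c1, then into d.
deps = {
    "a0": [], "a1": [], "a2": [], "a3": [],
    "c0": ["a0", "a1"], "c1": ["a2", "a3"],
    "d": ["c0", "c1"],
}
breadth_first = ["a0", "a1", "a2", "a3", "c0", "c1", "d"]
depth_first = ["a0", "a1", "c0", "a2", "a3", "c1", "d"]
print(peak_memory(deps, breadth_first))  # 4
print(peak_memory(deps, depth_first))  # 3
```

With wider trees, the gap grows: in this model, a breadth-first order holds every leaf at once, while a depth-first order holds roughly one partial result per tree level.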

Improved P2P-based merging robustness and performance
"""""""""""""""""""""""""""""""""""""""""""""""""""""

This release contains several updates that fix a possible deadlock introduced in 2023.9.2 and improve the robustness of P2P-based merging when the cluster is dynamically scaling up.

See :pr-distributed:`8415`, :pr-distributed:`8416`, and :pr-distributed:`8414` from `Hendrik Makait`_ for details.

Removed disabling pickle option
"""""""""""""""""""""""""""""""

The ``distributed.scheduler.pickle`` configuration option is no longer supported. As of the 2023.4.0 release, pickle is used to transmit task graphs, so it can no longer be disabled. We now raise an informative error when ``distributed.scheduler.pickle`` is set to ``False``.

See :pr-distributed:`8401` from `Florian Jetter`_ for details.

.. dropdown:: Additional changes

- Add changelog entry for recent P2P merge fixes (:pr:`10712`) `Hendrik Makait`_
- Update DataFrame page (:pr:`10710`) `Matthew Rocklin`_
- Add changelog entry for ``dask-expr`` switch (:pr:`10704`) `Patrick Hoefler`_
- Improve changelog entry for ``PipInstall`` changes (:pr:`10711`) `Hendrik Makait`_
- Remove PR labeler (:pr:`10709`) `James Bourbeau`_
- Add ``.__wrapped__`` to ``Delayed`` object (:pr:`10695`) `Andrew S. Rosen`_
- Bump ``actions/labeler`` from 4.3.0 to 5.0.0 (:pr:`10689`)
- Bump ``actions/stale`` from 8 to 9 (:pr:`10690`)
- [Dask.order] Remove non-runnable leaf nodes from ordering (:pr:`10697`) `Florian Jetter`_
- Update installation docs (:pr:`10699`) `Matthew Rocklin`_
- Fix software environment link in docs (:pr:`10700`) `James Bourbeau`_
- Avoid converting non-strings to arrow strings for ``read_parquet`` (:pr:`10692`) `Patrick Hoefler`_
- Bump ``xarray-contrib/issue-from-pytest-log`` from 1.2.7 to 1.2.8 (:pr:`10687`)
- Fix ``tokenize`` for ``pd.DateOffset`` (:pr:`10664`) `jochenott`_
- Bugfix for writing empty array to zarr (:pr:`10506`) `Ben`_
- Docs update, fixup styling, mention free (:pr:`10679`) `Matthew Rocklin`_
- Update deployment docs (:pr:`10680`) `Matthew Rocklin`_
- ``dask.order`` rewrite using a critical path approach (:pr:`10660`) `Florian Jetter`_
- Avoid substituting keys that occur multiple times (:pr:`10646`) `Florian Jetter`_
- Add missing image to docs (:pr:`10694`) `Matthew Rocklin`_
- Bump ``actions/setup-python`` from 4 to 5 (:pr:`10688`)
- Update landing page (:pr:`10674`) `Matthew Rocklin`_
- Make meta check simpler in dispatch (:pr:`10638`) `Patrick Hoefler`_
- Pin PR Labeler (:pr:`10675`) `Matthew Rocklin`_
- Reorganize docs index a bit (:pr:`10669`) `Matthew Rocklin`_
- Bump ``actions/setup-java`` from 3 to 4 (:pr:`10667`)
- Bump ``conda-incubator/setup-miniconda`` from 2.2.0 to 3.0.1 (:pr:`10668`)
- Bump ``xarray-contrib/issue-from-pytest-log`` from 1.2.6 to 1.2.7 (:pr:`10666`)
- Fix ``test_categorize_info`` with nightly ``pyarrow`` (:pr:`10662`) `James Bourbeau`_
- Rewrite ``test_subprocess_cluster_does_not_depend_on_logging`` (:pr-distributed:`8409`) `Hendrik Makait`_
- Avoid ``RecursionError`` when failing to pickle key in ``SpillBuffer`` and using ``tblib=3`` (:pr-distributed:`8404`) `Hendrik Makait`_
- Allow tasks to override ``is_rootish`` heuristic (:pr-distributed:`8412`) `Hendrik Makait`_
- Remove GPU executor (:pr-distributed:`8399`) `Hendrik Makait`_
- Do not rely on logging for subprocess cluster (:pr-distributed:`8398`) `Hendrik Makait`_
- Update gpuCI ``RAPIDS_VER`` to 24.02 (:pr-distributed:`8384`)
- Bump ``actions/setup-python`` from 4 to 5 (:pr-distributed:`8396`)
- Ensure output chunks in P2P rechunking are distributed homogeneously (:pr-distributed:`8207`) `Florian Jetter`_
- Trivial: fix typo (:pr-distributed:`8395`) `crusaderky`_
- Bump ``JamesIves/github-pages-deploy-action`` from 4.4.3 to 4.5.0 (:pr-distributed:`8387`)
- Bump ``conda-incubator/setup-miniconda`` from 3.0.0 to 3.0.1 (:pr-distributed:`8388`)

.. _v2023.12.0:

2023.12.0
---------

Released on December 1, 2023

Highlights
^^^^^^^^^^

PipInstall restart and environment variables
""""""""""""""""""""""""""""""""""""""""""""

The ``distributed.PipInstall`` plugin now has more robust restart logic and also supports `environment variables <https://pip.pypa.io/en/stable/reference/requirements-file-format/#using-environment-variables>`_.

The example below shows how to use the ``distributed.PipInstall`` plugin and a ``TOKEN`` environment variable to securely install a package from a private repository:

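.. code:: python

    from dask.distributed import Client, PipInstall

    client = Client()  # starts a local cluster; pass a scheduler address to use an existing one
    # ${TOKEN} is expanded from the environment on each worker at install time
    plugin = PipInstall(
        packages=["private_package@git+https://${TOKEN}@github.com/dask/private_package.git"]
    )
    client.register_plugin(plugin)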

See :pr-distributed:`8374`, :pr-distributed:`8357`, and :pr-distributed:`8343` from `Hendrik Makait`_ for details.

Bokeh 3.3.0 compatibility
"""""""""""""""""""""""""

This release contains compatibility updates for using ``bokeh>=3.3.0`` with proxied Dask dashboards. Previously, the contents of dashboard plots wouldn't be displayed.

See :pr-distributed:`8347` and :pr-distributed:`8381` from `Jacob Tomlinson`_ for details.

.. dropdown:: Additional changes

- Add network marker to ``test_pyarrow_filesystem_option_real_data`` (:pr:`10653`) `Richard (Rick) Zamora`_
- Bump GPU CI to CUDA 11.8 (:pr:`10656`) `Charles Blackmon-Luca`_
- Tokenize pandas offsets deterministically (:pr:`10643`) `Patrick Hoefler`_
- Add tokenize ``pd.NA`` functionality (:pr:`10640`) `Patrick Hoefler`_
- Update gpuCI ``RAPIDS_VER`` to 24.02 (:pr:`10636`)
- Fix precision handling in ``array.linalg.norm`` (:pr:`10556`) `joanrue`_
- Add ``axis`` argument to ``DataFrame.clip`` and ``Series.clip`` (:pr:`10616`) `Richard (Rick) Zamora`_
- Update changelog entry for in-memory rechunking (:pr:`10630`) `Florian Jetter`_
- Fix flaky ``test_resources_reset_after_cancelled_task`` (:pr-distributed:`8373`) `crusaderky`_
- Bump GPU CI to CUDA 11.8 (:pr-distributed:`8376`) `Charles Blackmon-Luca`_
- Bump ``conda-incubator/setup-miniconda`` from 2.2.0 to 3.0.0 (:pr-distributed:`8372`)
- Add debug logs to P2P scheduler plugin (:pr-distributed:`8358`) `Hendrik Makait`_
- O(1) access for ``/info/task/`` endpoint (:pr-distributed:`8363`) `crusaderky`_
- Remove stringification from shuffle annotations (:pr-distributed:`8362`) `crusaderky`_
- Don't cast int metrics to float (:pr-distributed:`8361`) `crusaderky`_
- Drop asyncio TCP backend (:pr-distributed:`8355`) `Florian Jetter`_
- Add offload support to ``context_meter.add_callback`` (:pr-distributed:`8360`) `crusaderky`_
- Test that ``sync()`` propagates contextvars (:pr-distributed:`8354`) `crusaderky`_
- ``captured_context_meter`` (:pr-distributed:`8352`) `crusaderky`_
- ``context_meter.clear_callbacks`` (:pr-distributed:`8353`) `crusaderky`_
- Use ``@log_errors`` decorator (:pr-distributed:`8351`) `crusaderky`_
- Fix ``test_statistical_profiling_cycle`` (:pr-distributed:`8356`) `Florian Jetter`_
- Shuffle: don't parse ``dask.config`` at every RPC (:pr-distributed:`8350`) `crusaderky`_
- Replace ``Client.register_plugin``'s ``idempotent`` argument with ``.idempotent`` attribute on plugins (:pr-distributed:`8342`) `Hendrik Makait`_
- Fix test report generation (:pr-distributed:`8346`) `Hendrik Makait`_
- Install ``pyarrow-hotfix`` on ``mindeps-pandas`` CI (:pr-distributed:`8344`) `Hendrik Makait`_
- Reduce memory usage of scheduler process - optimize ``scheduler.py::TaskState`` class (:pr-distributed:`8331`) `Miles`_
- Bump pre-commit linters (:pr-distributed:`8340`) `crusaderky`_
- Update cuDF test with explicit ``dtype=object`` (:pr-distributed:`8339`) `Peter Andreas Entschev`_
- Fix ``Cluster`` / ``SpecCluster`` calls to async close methods (:pr-distributed:`8327`) `Peter Andreas Entschev`_

.. _v2023.11.0:

2023.11.0
---------

Released on November 10, 2023

Highlights
^^^^^^^^^^

Zero-copy P2P Array Rechunking
""""""""""""""""""""""""""""""

Users should see significant performance improvements when using in-memory P2P array rechunking, because underlying data buffers are no longer copied.

Below is a simple example of the kind of workload used to compare the performance of different rechunking methods:

.. code:: python

    import dask
    import dask.array as da

    # P2P rechunking requires a running distributed cluster (Client)
    shape = (30_000, 6_000, 150)  # 201.17 GiB
    input_chunks = (60, -1, -1)  # 411.99 MiB
    output_chunks = (-1, 6, -1)  # 205.99 MiB

    arr = da.random.random(shape, chunks=input_chunks)
    with dask.config.set({
        "array.rechunk.method": "p2p",
        "distributed.p2p.disk": True,
    }):
        (
            arr
            .rechunk(output_chunks)
            .sum()
            .compute()
        )

.. image:: images/changelog/2023110-rechunking-disk-perf.png
    :width: 75%
    :align: center
    :alt: A comparison of rechunking performance between the different methods (tasks, p2p with disk, and p2p without disk) on different cluster sizes. The graph shows that p2p without disk is up to 60% faster than the default tasks-based approach.

See :pr-distributed:`8282`, :pr-distributed:`8318`, :pr-distributed:`8321` from `crusaderky`_ and :pr-distributed:`8322` from `Hendrik Makait`_ for details.

Deprecating PyArrow <14.0.1
"""""""""""""""""""""""""""

``pyarrow<14.0.1`` usage is deprecated starting in this release. All users are encouraged to upgrade their version of ``pyarrow`` or install ``pyarrow-hotfix``. See `this CVE <https://www.cve.org/CVERecord?id=CVE-2023-47248>`_ for full details.

See :pr:`10622` from `Florian Jetter`_ for details.
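A quick way to check whether an environment falls in the deprecated range, sketched with ``packaging.version`` (``pyarrow_is_deprecated`` is a hypothetical helper, not a Dask API). Installing ``pyarrow-hotfix`` is the alternative when upgrading is not possible.

```python
from packaging.version import Version


def pyarrow_is_deprecated(installed_version: str) -> bool:
    """Return True if ``installed_version`` is older than pyarrow 14.0.1."""
    return Version(installed_version) < Version("14.0.1")


# In practice, pass the installed version, e.g.:
#   import pyarrow
#   pyarrow_is_deprecated(pyarrow.__version__)
```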

Improved PyArrow filesystem for Parquet
"""""""""""""""""""""""""""""""""""""""

Using ``filesystem="arrow"`` when reading Parquet datasets now correctly infers the cloud region when accessing remote, cloud-hosted data.

See :pr:`10590` from `Richard (Rick) Zamora`_ for details.

Improve Type Reconciliation in P2P Shuffling
""""""""""""""""""""""""""""""""""""""""""""

See :pr-distributed:`8332` from `Hendrik Makait`_ for details.

.. dropdown:: Additional changes

- Fix sporadic failure of ``test_dataframe::test_quantile`` (:pr:`10625`) `Miles`_
- Bump minimum ``click`` to ``>=8.1`` (:pr:`10623`) `Jacob Tomlinson`_
- Refactor ``test_quantile`` (:pr:`10620`) `Miles`_
- Avoid ``PerformanceWarning`` for fragmented DataFrame (:pr:`10621`) `Patrick Hoefler`_
- Generalize computation of ``NEW_*_VER`` in GPU CI updating workflow (:pr:`10610`) `Charles Blackmon-Luca`_
- Switch to newer GPU CI images (:pr:`10608`) `Charles Blackmon-Luca`_
- Remove double slash in ``fsspec`` tests (:pr:`10605`) `Mario Šaško`_
- Reenable ``test_ucx_config_w_env_var`` (:pr-distributed:`8272`) `Peter Andreas Entschev`_
- Don't share ``host_array`` when receiving from network (:pr-distributed:`8308`) `crusaderky`_
- Generalize computation of ``NEW_*_VER`` in GPU CI updating workflow (:pr-distributed:`8319`) `Charles Blackmon-Luca`_
- Switch to newer GPU CI images (:pr-distributed:`8316`) `Charles Blackmon-Luca`_
- Minor updates to shuffle dashboard (:pr-distributed:`8315`) `Matthew Rocklin`_
- Don't use ``bytearray().join`` (:pr-distributed:`8312`) `crusaderky`_
- Reuse identical shuffles in P2P hash join (:pr-distributed:`8306`) `Hendrik Makait`_

.. _v2023.10.1:

2023.10.1
---------

Released on October 27, 2023

Highlights
^^^^^^^^^^

Python 3.12
"""""""""""

This release adds official support for Python 3.12.

See :pr:`10544` and :pr-distributed:`8223` from `Thomas Grainger`_ for details.

.. dropdown:: Additional changes

    - Avoid splitting parquet files to row groups as aggressively (:pr:`10600`) `Matthew Rocklin`_
    - Speed up ``normalize_chunks`` for common case (:pr:`10579`) `Martin Durant`_
    - Use Python 3.11 for upstream and doctests CI build (:pr:`10596`) `Thomas Grainger`_
    - Bump ``actions/checkout`` from 4.1.0 to 4.1.1 (:pr:`10592`)
    - Switch to PyTables ``HEAD`` (:pr:`10580`) `Thomas Grainger`_
    - Remove ``numpy.core`` warning filter, link to issue on ``pyarrow`` caused ``BlockManager`` warning (:pr:`10571`) `Thomas Grainger`_
    - Unignore and fix deprecated freq aliases (:pr:`10577`) `Thomas Grainger`_
    - Move ``register_assert_rewrite`` earlier in ``conftest`` to fix warnings (:pr:`10578`) `Thomas Grainger`_
    - Upgrade ``versioneer`` to 0.29 (:pr:`10575`) `Thomas Grainger`_
    - Change ``test_concat_categorical`` to be non-strict (:pr:`10574`) `Thomas Grainger`_
    - Enable SciPy tests with NumPy 2.0 `Thomas Grainger`_
    - Enable tests for scikit-image with NumPy 2.0 (:pr:`10569`) `Thomas Grainger`_
    - Fix upstream build (:pr:`10549`) `Thomas Grainger`_
    - Add optimized code paths for ``drop_duplicates`` (:pr:`10542`) `Richard (Rick) Zamora`_
    - Support ``cudf`` backend in ``dd.DataFrame.sort_values`` (:pr:`10551`) `Richard (Rick) Zamora`_
    - Rename "GIL Contention" to just GIL in chart labels (:pr-distributed:`8305`) `Matthew Rocklin`_
    - Bump ``actions/checkout`` from 4.1.0 to 4.1.1 (:pr-distributed:`8299`)
    - Fix dashboard (:pr-distributed:`8293`) `Hendrik Makait`_
    - ``@log_errors`` for async tasks (:pr-distributed:`8294`) `crusaderky`_
    - Annotations and better tests for serialize_bytes (:pr-distributed:`8300`) `crusaderky`_
    - Temporarily xfail ``test_decide_worker_coschedule_order_neighbors`` to unblock CI (:pr-distributed:`8298`) `James Bourbeau`_
    - Skip ``xdist`` and ``matplotlib`` in code samples (:pr-distributed:`8290`) `Matthew Rocklin`_
    - Use ``numpy._core`` on ``numpy>=2.dev0`` (:pr-distributed:`8291`) `Thomas Grainger`_
    - Fix calculation of ``MemoryShardsBuffer.bytes_read`` (:pr-distributed:`8289`) `crusaderky`_
    - Allow P2P to store data in-memory (:pr-distributed:`8279`) `Hendrik Makait`_
    - Upgrade ``versioneer`` to 0.29 (:pr-distributed:`8288`) `Thomas Grainger`_
    - Allow ``ResourceLimiter`` to be unlimited (:pr-distributed:`8276`) `Hendrik Makait`_
    - Run ``pre-commit`` autoupdate (:pr-distributed:`8281`) `Thomas Grainger`_
    - Annotate instance variables for P2P layers (:pr-distributed:`8280`) `Hendrik Makait`_
    - Remove worker gracefully should not mark tasks as suspicious (:pr-distributed:`8234`) `Thomas Grainger`_
    - Add signal handling to ``dask spec`` (:pr-distributed:`8261`) `Thomas Grainger`_
    - Add typing for ``sync`` (:pr-distributed:`8275`) `Hendrik Makait`_
    - Better annotations for shuffle offload (:pr-distributed:`8277`) `crusaderky`_
    - Test minimum versions for p2p shuffle (:pr-distributed:`8270`) `crusaderky`_
    - Run coverage on test failures (:pr-distributed:`8269`) `crusaderky`_
    - Use ``aiohttp`` with extensions (:pr-distributed:`8274`) `Thomas Grainger`_

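The ``normalize_chunks`` speed-up listed above concerns the helper that expands a chunk specification into explicit per-dimension block sizes. A minimal illustration of what it returns, assuming it is imported from ``dask.array.core``; this shows the behavior only, not the speed-up itself:

.. code-block:: python

    from dask.array.core import normalize_chunks

    # An integer chunk size is repeated along the axis; the last block holds the remainder.
    print(normalize_chunks(3, shape=(10,)))  # ((3, 3, 3, 1),)

    # A tuple gives one chunk size per dimension, each expanded independently.
    print(normalize_chunks((2, 5), shape=(4, 10)))  # ((2, 2), (5, 5))
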
.. _v2023.10.0:

2023.10.0
---------

Released on October 13, 2023

Highlights
^^^^^^^^^^

Reduced memory pressure for multi-array reductions
""""""""""""""""""""""""""""""""""""""""""""""""""

This release contains major updates to Dask's task graph scheduling logic. The updates significantly reduce memory pressure on array reductions. We anticipate this will have a strong impact on the array computing community.

See :pr:`10535` from `Florian Jetter`_ for details.

Improved P2P shuffling robustness
"""""""""""""""""""""""""""""""""

There are several updates (listed below) that make P2P shuffling much more robust and less likely to fail.

See :pr-distributed:`8262`, :pr-distributed:`8264`, :pr-distributed:`8242`, :pr-distributed:`8244`, and :pr-distributed:`8235` from `Hendrik Makait`_ and :pr-distributed:`8124` from `Charles Blackmon-Luca`_ for details.

Reduced scheduler CPU load for large graphs
"""""""""""""""""""""""""""""""""""""""""""

Users should see reduced CPU load on their scheduler when computing large task graphs.

See :pr-distributed:`8238` and :pr:`10547` from `Florian Jetter`_ and :pr-distributed:`8240` from `crusaderky`_ for details.

.. dropdown:: Additional changes

    - Dispatch the ``partd.Encode`` class used for disk-based shuffling (:pr:`10552`) `Richard (Rick) Zamora`_
    - Add documentation for hive partitioning (:pr:`10454`) `Richard (Rick) Zamora`_
    - Add typing to ``dask.order`` (:pr:`10553`) `Florian Jetter`_
    - Allow passing ``index_col=False`` in ``dd.read_csv`` (:pr:`9961`) `Michael Leslie`_
    - Tighten ``HighLevelGraph`` annotations (:pr:`10524`) `crusaderky`_
    - Support for latest ``ipykernel``/``ipywidgets`` (:pr-distributed:`8253`) `crusaderky`_
    - Check minimal ``pyarrow`` version for P2P merge (:pr-distributed:`8266`) `Hendrik Makait`_
    - Support for Python 3.12 (:pr-distributed:`8223`) `Thomas Grainger`_
    - Use ``memoryview.nbytes`` when warning on large graph send (:pr-distributed:`8268`) `crusaderky`_
    - Run tests without ``gilknocker`` (:pr-distributed:`8263`) `crusaderky`_
    - Disable ipv6 on MacOS CI (:pr-distributed:`8254`) `crusaderky`_
    - Clean up redundant minimum versions (:pr-distributed:`8251`) `crusaderky`_
    - Clean up use of ``BARRIER_PREFIX`` in scheduler plugin (:pr-distributed:`8252`) `crusaderky`_
    - Improve shuffle run handling in P2P's worker plugin (:pr-distributed:`8245`) `Hendrik Makait`_
    - Explicitly set ``charset=utf-8`` (:pr-distributed:`8250`) `crusaderky`_
    - Typing tweaks to :pr-distributed:`8239` (:pr-distributed:`8247`) `crusaderky`_
    - Simplify scheduler assertion (:pr-distributed:`8246`) `crusaderky`_
    - Improve typing (:pr-distributed:`8239`) `Hendrik Makait`_
    - Respect cgroups v2 "low" memory limit (:pr-distributed:`8243`) `Samantha Hughes`_
    - Fix ``PackageInstall`` by making it a scheduler plugin (:pr-distributed:`8142`) `Hendrik Makait`_
    - Xfail ``test_ucx_config_w_env_var`` (:pr-distributed:`8241`) `crusaderky`_
    - ``SpecCluster`` resilience to broken workers (:pr-distributed:`8233`) `crusaderky`_
    - Suppress ``SpillBuffer`` stack traces for cancelled tasks (:pr-distributed:`8232`) `crusaderky`_
    - Update annotations after stringification changes (:pr-distributed:`8195`) `crusaderky`_
    - Reduce max recursion depth of profile (:pr-distributed:`8224`) `crusaderky`_
    - Offload deeply nested objects (:pr-distributed:`8214`) `crusaderky`_
    - Fix flaky ``test_close_connections`` (:pr-distributed:`8231`) `crusaderky`_
    - Fix flaky ``test_popen_timeout`` (:pr-distributed:`8229`) `crusaderky`_
    - Fix flaky ``test_adapt_then_manual`` (:pr-distributed:`8228`) `crusaderky`_
    - Prevent collisions in ``SpillBuffer`` (:pr-distributed:`8226`) `crusaderky`_
    - Allow ``retire_workers`` to run concurrently (:pr-distributed:`8056`) `Florian Jetter`_
    - Fix HTML repr for ``TaskState`` objects (:pr-distributed:`8188`) `Florian Jetter`_
    - Fix ``AttributeError`` for ``builtin_function_or_method`` in ``profile.py`` (:pr-distributed:`8181`) `Florian Jetter`_
    - Fix flaky ``test_spans`` (v2) (:pr-distributed:`8222`) `crusaderky`_

.. _v2023.9.3:

2023.9.3
--------

Released on September 29, 2023

Highlights
^^^^^^^^^^

Restore previous configuration override behavior
""""""""""""""""""""""""""""""""""""""""""""""""

The 2023.9.2 release introduced an unintentional breaking change in how configuration options are overridden in ``dask.config.get`` with the ``override_with=`` keyword (see :issue:`10519`). This release restores the previous behavior.

See :pr:`10521` from `crusaderky`_ for details.

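As a sketch of the ``override_with=`` keyword mentioned above, assuming a recent Dask release: passing an explicit value such as ``5`` makes ``dask.config.get`` return that value instead of the configured one. The key ``example.retries`` is hypothetical and exists only for this illustration.

.. code-block:: python

    import dask

    # "example.retries" is a hypothetical key used only for this illustration.
    with dask.config.set({"example.retries": 3}):
        # Without override_with, the configured value is returned.
        configured = dask.config.get("example.retries")
        # An explicit override_with value is returned as-is, bypassing the configuration.
        overridden = dask.config.get("example.retries", override_with=5)

    print(configured, overridden)
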
Complex dtypes in Dask Array reductions
"""""""""""""""""""""""""""""""""""""""

This release includes improved support for using common reductions in Dask Array (e.g. ``var``, ``std``, ``moment``) with complex dtypes.

See :pr:`10009` from `wkrasnicki`_ for details.

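A minimal check of the complex-dtype reductions described above, assuming a Dask release that includes :pr:`10009`: the chunked ``var`` and ``std`` results are compared against NumPy's in-memory results for the same data.

.. code-block:: python

    import numpy as np
    import dask.array as da

    # A small complex-valued array split into two chunks.
    values = np.array([1 + 1j, 2 - 1j, 3 + 0j, 4 + 2j])
    x = da.from_array(values, chunks=2)

    # Dask computes these reductions chunk-wise and combines the partial results.
    dask_var = x.var().compute()
    dask_std = x.std().compute()

    # Compare against NumPy's single-pass, in-memory result.
    print(np.isclose(dask_var, np.var(values)), np.isclose(dask_std, np.std(values)))
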
.. dropdown:: Additional changes

    - Bump ``actions/checkout`` from 4.0.0 to 4.1.0 (:pr:`10532`)
    - Match ``pandas`` reverting ``apply`` deprecation (:pr:`10531`) `James Bourbeau`_
    - Update gpuCI ``RAPIDS_VER`` to ``23.12`` (:pr:`10526`)
    - Temporarily skip failing tests with ``fsspec==2023.9.1`` (:pr:`10520`) `James Bourbeau`_

.. _v2023.9.2:

2023.9.2
--------

Released on September 15, 2023

Highlights
^^^^^^^^^^

P2P shuffling now raises when outdated PyArrow is installed
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Previously the default shuffling method would silently fall back from P2P to task-based shuffling if an older version of ``pyarrow`` was installed. Now we raise an informative error with the minimum required ``pyarrow`` version for P2P instead of silently falling back.

See :pr:`10496` from `Hendrik Makait`_ for details.

Deprecation cycle for ``admin.traceback.shorten``
"""""""""""""""""""""""""""""""""""""""""""""""""

The 2023.9.0 release modified the ``admin.traceback.shorten`` configuration option without introducing a deprecation cycle. This resulted in failures to create Dask clusters in some cases. This release introduces a deprecation cycle for this configuration change.

See :pr:`10509` from `crusaderky`_ for details.

.. dropdown:: Additional changes

    - Avoid materializing all iterators in ``delayed`` tasks (:pr:`10498`) `James Bourbeau`_
    - Overhaul deprecations system in ``dask.config`` (:pr:`10499`) `crusaderky`_
    - Remove unnecessary check in ``timeseries`` (:pr:`10447`) `Patrick Hoefler`_
    - Use ``register_plugin`` in tests (:pr:`10503`) `James Bourbeau`_
    - Make ``preserve_index`` explicit in ``pyarrow_schema_dispatch`` (:pr:`10501`) `Hendrik Makait`_
    - Add ``**kwargs`` support for ``pyarrow_schema_dispatch`` (:pr:`10500`) `Hendrik Makait`_
    - Centralize and type ``no_default`` (:pr:`10495`) `crusaderky`_

.. _v2023.9.1:

2023.9.1
--------

Released on September 6, 2023

.. note:: This is a hotfix release that fixes a P2P shuffling bug introduced in the 2023.9.0 release (see :pr:`10493`).

Enhancements
^^^^^^^^^^^^

- Stricter data type for dask keys (:pr:`10485`) `crusaderky`_
- Special handling for ``None`` in ``DASK_`` environment variables (:pr:`10487`) `crusaderky`_

Bug Fixes
^^^^^^^^^

- Fix ``_partitions`` dtype in ``meta`` for ``DataFrame.set_index`` and ``DataFrame.sort_values`` (:pr:`10493`) `Hendrik Makait`_
- Handle ``cached_property`` decorators in ``derived_from`` (:pr:`10490`) `Lawrence Mitchell`_

Maintenance
^^^^^^^^^^^

- Bump ``actions/checkout`` from 3.6.0 to 4.0.0 (:pr:`10492`)
- Simplify some tests that ``import distributed`` (:pr:`10484`) `crusaderky`_

.. _v2023.9.0:

2023.9.0
--------

Released on September 1, 2023

Bug Fixes
^^^^^^^^^

- Remove support for ``np.int64`` in keys (:pr:`10483`) `crusaderky`_
- Fix ``_partitions`` dtype in ``meta`` for shuffling (:pr:`10462`) `Hendrik Makait`_
- Don't use exception hooks to shorten tracebacks (:pr:`10456`) `crusaderky`_

Documentation
^^^^^^^^^^^^^

- Add P2P shuffle option to DataFrame docs (:pr:`10477`) `Patrick Hoefler`_

Maintenance
^^^^^^^^^^^

- Skip failing tests for ``pandas=2.1.0`` (:pr:`10488`) `Patrick Hoefler`_
- Update tests for ``pandas=2.1.0`` (:pr:`10439`) `Patrick Hoefler`_
- Enable ``pytest-timeout`` (:pr:`10482`) `crusaderky`_
- Bump ``actions/checkout`` from 3.5.3 to 3.6.0 (:pr:`10470`)

.. _v2023.8.1:

2023.8.1
--------

Released on August 18, 2023

Enhancements
^^^^^^^^^^^^

- Add support for cgroup v2 to ``cpu_count`` (:pr:`10419`) `Johan Olsson`_
- Support multi-column ``groupby`` with ``sort=True`` and ``split_out>1`` (:pr:`10425`) `Richard (Rick) Zamora`_
- Add ``DataFrame.enforce_runtime_divisions`` method (:pr:`10404`) `Richard (Rick) Zamora`_
- Enable file ``mode="x"`` with ``single_file=True`` for Dask DataFrame ``to_csv`` (:pr:`10443`) `Genevieve Buckley`_

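The cgroup v2 entry above concerns CPU quotas. Under cgroup v2 the quota is exposed in a ``cpu.max`` file whose contents are ``<quota> <period>`` in microseconds, or ``max <period>`` when no quota is set. The hypothetical helper below sketches one way to turn that line into a CPU count; it is not Dask's ``cpu_count`` implementation, and rounding fractional quotas up is an assumption.

.. code-block:: python

    import math


    def cpu_count_from_cgroup_v2(cpu_max: str, host_cpus: int) -> int:
        """Hypothetical helper: derive a CPU count from a cgroup v2 ``cpu.max`` line."""
        quota, period = cpu_max.split()
        if quota == "max":
            # No CPU quota: the container may use every host CPU.
            return host_cpus
        # quota / period is the number of CPUs' worth of time per period.
        # Round up so that a fractional quota still counts as one usable CPU.
        limit = math.ceil(int(quota) / int(period))
        # Never report more CPUs than the host has, and never fewer than one.
        return max(1, min(host_cpus, limit))


    print(cpu_count_from_cgroup_v2("150000 100000\n", host_cpus=8))  # 2
    print(cpu_count_from_cgroup_v2("max 100000\n", host_cpus=8))  # 8
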
Bug Fixes
^^^^^^^^^

- Fix ``ValueError`` when running ``to_csv`` in append mode with ``single_file`` as ``True`` (:pr:`10441`) `Ben`_

Maintenance
^^^^^^^^^^^

- Add default ``types_mapper`` to ``from_pyarrow_table_dispatch`` for pandas (:pr:`10446`) `Richard (Rick) Zamora`_

.. _v2023.8.0:

2023.8.0
--------

Released on August 4, 2023

Enhancements
^^^^^^^^^^^^

- Fix for ``make_timeseries`` performance regression (:pr:`10428`) `Irina Truong`_

Documentation
^^^^^^^^^^^^^

- Add ``distributed.print`` to debugging docs (:pr:`10435`) `James Bourbeau`_
- Documenting compatibility of NumPy functions with Dask functions (:pr:`9941`) `Chiara Marmo`_

Maintenance
^^^^^^^^^^^

- Use SPDX in license metadata (:pr:`10437`) `John A Kirkham`_
- Require ``dask[array]`` in ``dask[dataframe]`` (:pr:`10357`) `John A Kirkham`_
- Update gpuCI ``RAPIDS_VER`` to ``23.10`` (:pr:`10427`)
- Simplify compatibility code (:pr:`10426`) `Hendrik Makait`_
- Fix compatibility variable naming (:pr:`10424`) `Hendrik Makait`_
- Fix a few errors with upstream pandas and pyarrow (:pr:`10412`) `Irina Truong`_

.. _v2023.7.1:

2023.7.1
--------

Released on July 20, 2023

.. note::

    This release updates Dask DataFrame to automatically convert text data using ``object`` data types to ``string[pyarrow]`` if ``pandas>=2`` and ``pyarrow>=12`` are installed.

    This should result in significantly reduced memory consumption and increased computation performance in many workflows that deal with text data.

    You can disable this change by setting the ``dataframe.convert-string`` configuration value to ``False`` with:

    .. code-block:: python

        import dask

        dask.config.set({"dataframe.convert-string": False})

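The note above changes the setting globally. As a sketch, the same key can instead be scoped to a block of code, since ``dask.config.set`` also works as a context manager that restores the previous value on exit:

.. code-block:: python

    import dask

    # Remember the current value (None if the key is unset) to check it is restored.
    before = dask.config.get("dataframe.convert-string", None)

    # The setting applies only inside the with-block.
    with dask.config.set({"dataframe.convert-string": False}):
        inside = dask.config.get("dataframe.convert-string")

    after = dask.config.get("dataframe.convert-string", None)
    print(inside, before == after)
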
Enhancements
^^^^^^^^^^^^

- Convert to pyarrow strings if proper dependencies are installed (:pr:`10400`) `James Bourbeau`_
- Avoid ``repartition`` before shuffle for P2P (:pr:`10421`) `Patrick Hoefler`_
- API to generate random Dask DataFrames (:pr:`10392`) `Irina Truong`_
- Speed up ``dask.bag.Bag.random_sample`` (:pr:`10356`) `crusaderky`_
- Raise helpful ``ValueError`` for invalid time units (:pr:`10408`) `Nat Tabris`_
- Make ``repartition`` a no-op when divisions match (divisions provided as a list) (:pr:`10395`) `Nicolas Grandemange`_

Bug Fixes
^^^^^^^^^

- Use ``dataframe.convert-string`` in ``read_parquet`` token (:pr:`10411`) `James Bourbeau`_
- Category dtype is lost when concatenating ``MultiIndex`` (:pr:`10407`) `Irina Truong`_
- Fix ``FutureWarning``: The provided callable... (:pr:`10405`) `Irina Truong`_
- Enable non-categorical hive-partition columns in ``read_parquet`` (:pr:`10353`) `Richard (Rick) Zamora`_
- ``concat`` ignoring DataFrame without columns (:pr:`10359`) `Patrick Hoefler`_

.. _v2023.7.0:

2023.7.0
--------

Released on July 7, 2023

Enhancements
^^^^^^^^^^^^

- Catch exceptions when attempting to load CLI entry points (:pr:`10380`) `Jacob Tomlinson`_

Bug Fixes
^^^^^^^^^

- Fix typo in ``_clean_ipython_traceback`` (:pr:`10385`) `Alexander Clausen`_
- Ensure that ``df`` is immutable after ``from_pandas`` (:pr:`10383`) `Patrick Hoefler`_
- Warn consistently for ``inplace`` in ``Series.rename`` (:pr:`10313`) `Patrick Hoefler`_

Documentation
^^^^^^^^^^^^^

- Add clarification about output shape and reshaping in ``rechunk`` documentation (:pr:`10377`) `Swayam Patil`_

Maintenance
^^^^^^^^^^^

- Simplify ``astype`` implementation (:pr:`10393`) `Patrick Hoefler`_
- Fix ``test_first_and_last`` to accommodate deprecated ``last`` (:pr:`10373`) `James Bourbeau`_
- Add ``level`` to ``create_merge_tree`` (:pr:`10391`) `Patrick Hoefler`_
- Do not derive from ``scipy.stats.chisquare`` docstring (:pr:`10382`) `Doug Davis`_

.. _v2023.6.1:

2023.6.1
--------

Released on June 26, 2023

Enhancements
^^^^^^^^^^^^

- Remove no longer supported ``clip_lower`` and ``clip_upper`` (:pr:`10371`) `Patrick Hoefler`_
- Support ``DataFrame.set_index(..., sort=False)`` (:pr:`10342`) `Miles`_
- Clean up remote tracebacks (:pr:`10354`) `Irina Truong`_
- Add dispatching mechanisms for ``pyarrow.Table`` conversion (:pr:`10312`) `Richard (Rick) Zamora`_
- Choose P2P even if fusion is enabled (:pr:`10344`) `Hendrik Makait`_
- Validate that rechunking is possible earlier in graph generation (:pr:`10336`) `Hendrik Makait`_

Bug Fixes
^^^^^^^^^

- Fix issue with ``header`` passed to ``read_csv`` (:pr:`10355`) `GALI PREM SAGAR`_
- Respect ``dropna`` and ``observed`` in ``GroupBy.var`` and ``GroupBy.std`` (:pr:`10350`) `Patrick Hoefler`_
- Fix ``H5FD_lock`` error when writing to HDF with distributed client (:pr:`10309`) `Irina Truong`_
- Fix for ``total_mem_usage`` of ``bag.map()`` (:pr:`10341`) `Irina Truong`_

Deprecations
^^^^^^^^^^^^

- Deprecate ``DataFrame.fillna``/``Series.fillna`` with ``method`` (:pr:`10349`) `Irina Truong`_
- Deprecate ``DataFrame.first`` and ``Series.first`` (:pr:`10352`) `Irina Truong`_

Maintenance
^^^^^^^^^^^

- Deprecate ``numpy.compat`` (:pr:`10370`) `Irina Truong`_
- Fix annotations and spans leaking between threads (:pr:`10367`) `Irina Truong`_
- Use general ``kwargs`` in ``pyarrow_table_dispatch`` functions (:pr:`10364`) `Richard (Rick) Zamora`_
- Remove unnecessary ``try``/``except`` in ``isna`` (:pr:`10363`) `Patrick Hoefler`_
- ``mypy`` support for ``numpy`` 1.25 (:pr:`10362`) `crusaderky`_
- Bump ``actions/checkout`` from 3.5.2 to 3.5.3 (:pr:`10348`)
- Restore ``numba`` in upstream build (:pr:`10330`) `James Bourbeau`_
- Update nightly wheel index for pandas/numpy/scipy (:pr:`10346`) `Matthew Roeschke`_
- Add ``rechunk`` config values to YAML (:pr:`10343`) `Hendrik Makait`_

.. _v2023.6.0:

2023.6.0
--------

Released on June 9, 2023

Enhancements
^^^^^^^^^^^^

- Add missing ``not in`` predicate support to ``read_parquet`` (:pr:`10320`) `Richard (Rick) Zamora`_

Bug Fixes
^^^^^^^^^

- Fix for incorrect ``value_counts`` (:pr:`10323`) `Irina Truong`_
- Update empty ``describe`` top and freq values (:pr:`10319`) `James Bourbeau`_

Documentation
^^^^^^^^^^^^^

- Fix hetzner typo (:pr:`10332`) `Sarah Charlotte Johnson`_

Maintenance
^^^^^^^^^^^

- Test with ``numba`` and ``sparse`` on Python 3.11 (:pr:`10329`) `Thomas Grainger`_
- Remove ``numpy.find_common_type`` warning ignore (:pr:`10311`) `James Bourbeau`_
- Update gpuCI ``RAPIDS_VER`` to ``23.08`` (:pr:`10310`)

.. _v2023.5.1:

2023.5.1
--------

Released on May 26, 2023

.. note::

    This release drops support for Python 3.8. As of this release Dask supports Python 3.9, 3.10, and 3.11. See this `community issue <https://github.com/dask/community/issues/315>`_ for more details.

Enhancements
^^^^^^^^^^^^

- Drop Python 3.8 support (:pr:`10295`) `Thomas Grainger`_
- Change Dask Bag partitioning scheme to improve cluster saturation (:pr:`10294`) `Jacob Tomlinson`_
- Generalize ``dd.to_datetime`` for GPU-backed collections, introduce ``get_meta_library`` utility (:pr:`9881`) `Charles Blackmon-Luca`_
- Add ``na_action`` to ``DataFrame.map`` (:pr:`10305`) `Patrick Hoefler`_
- Raise ``TypeError`` in ``DataFrame.nsmallest`` and ``DataFrame.nlargest`` when ``columns`` is not given (:pr:`10301`) `Patrick Hoefler`_
- Improve ``sizeof`` for ``pd.MultiIndex`` (:pr:`10230`) `Patrick Hoefler`_
- Support duplicated columns in a bunch of DataFrame methods (:pr:`10261`) `Patrick Hoefler`_
- Add ``numeric_only`` support to ``DataFrame.idxmin`` and ``DataFrame.idxmax`` (:pr:`10253`) `Patrick Hoefler`_
- Implement ``numeric_only`` support for ``DataFrame.quantile`` (:pr:`10259`) `Patrick Hoefler`_
- Add support for ``numeric_only=False`` in ``DataFrame.std`` (:pr:`10251`) `Patrick Hoefler`_
- Implement ``numeric_only=False`` for ``GroupBy.cumprod`` and ``GroupBy.cumsum`` (:pr:`10262`) `Patrick Hoefler`_
- Implement ``numeric_only`` for ``skew`` and ``kurtosis`` (:pr:`10258`) `Patrick Hoefler`_
- ``mask`` and ``where`` should accept a callable (:pr:`10289`) `Irina Truong`_
- Fix conversion from ``Categorical`` to ``pa.dictionary`` in ``read_parquet`` (:pr:`10285`) `Patrick Hoefler`_

Bug Fixes
^^^^^^^^^

- Spurious config on nested annotations (:pr:`10318`) `crusaderky`_
- Fix rechunking behavior for dimensions with known and unknown chunk sizes (:pr:`10157`) `Hendrik Makait`_
- Enable ``drop`` to support mismatched partitions (:pr:`10300`) `James Bourbeau`_
- Fix divisions construction for ``to_timestamp`` (:pr:`10304`) `Patrick Hoefler`_
- pandas ``ExtensionDtype`` raising in ``Series`` reduction operations (:pr:`10149`) `Patrick Hoefler`_
- Fix regression in ``da.random`` interface (:pr:`10247`) `Eray Aslan`_
- ``da.coarsen`` doesn't trim an empty chunk in meta (:pr:`10281`) `Irina Truong`_
- Fix dtype inference for ``engine="pyarrow"`` in ``read_csv`` (:pr:`10280`) `Patrick Hoefler`_

Documentation
^^^^^^^^^^^^^

- Add ``meta_from_array`` to API docs (:pr:`10306`) `Ruth Comer`_
- Update Coiled links (:pr:`10296`) `Sarah Charlotte Johnson`_
- Add docs for demo day (:pr:`10288`) `Matthew Rocklin`_

Maintenance
^^^^^^^^^^^

- Explicitly install ``anaconda-client`` from conda-forge when uploading conda nightlies (:pr:`10316`) `Charles Blackmon-Luca`_
- Configure ``isort`` to add ``from __future__ import annotations`` (:pr:`10314`) `Thomas Grainger`_
- Avoid pandas ``Series.__getitem__`` deprecation in tests (:pr:`10308`) `James Bourbeau`_
- Ignore ``numpy.find_common_type`` warning from pandas (:pr:`10307`) `James Bourbeau`_
- Add test to check that ``DataFrame.__setitem__`` does not modify ``df`` inplace (:pr:`10223`) `Patrick Hoefler`_
- Clean up default value of ``dropna`` in ``value_counts`` (:pr:`10299`) `Patrick Hoefler`_
- Add ``pytest-cov`` to ``test`` extra (:pr:`10271`) `James Bourbeau`_

.. _v2023.5.0:

2023.5.0
--------

Released on May 12, 2023

Enhancements
^^^^^^^^^^^^

- Implement ``numeric_only=False`` for ``GroupBy.corr`` and ``GroupBy.cov`` (:pr:`10264`) `Patrick Hoefler`_
- Add support for ``numeric_only=False`` in ``DataFrame.var`` (:pr:`10250`) `Patrick Hoefler`_
- Add ``numeric_only`` support to ``DataFrame.mode`` (:pr:`10257`) `Patrick Hoefler`_
- Add ``DataFrame.map`` to ``dask.DataFrame`` API (:pr:`10246`) `Patrick Hoefler`_
- Adjust for ``DataFrame.applymap`` deprecation and all NA ``concat`` behaviour change (:pr:`10245`) `Patrick Hoefler`_
- Enable ``numeric_only=False`` for ``DataFrame.count`` (:pr:`10234`) `Patrick Hoefler`_
- Disallow array input in ``mask``/``where`` (:pr:`10163`) `Irina Truong`_
- Support ``numeric_only=True`` in ``GroupBy.corr`` and ``GroupBy.cov`` (:pr:`10227`) `Patrick Hoefler`_
- Add ``numeric_only`` support to ``GroupBy.median`` (:pr:`10236`) `Patrick Hoefler`_
- Support ``mimesis=9`` in ``dask.datasets`` (:pr:`10241`) `James Bourbeau`_
- Add ``numeric_only`` support to ``min``, ``max`` and ``prod`` (:pr:`10219`) `Patrick Hoefler`_
- Add ``numeric_only=True`` support for ``GroupBy.cumsum`` and ``GroupBy.cumprod`` (:pr:`10224`) `Patrick Hoefler`_
- Add helper to unpack ``numeric_only`` keyword (:pr:`10228`) `Patrick Hoefler`_

Bug Fixes
^^^^^^^^^

- Fix ``clone`` + ``from_array`` failure (:pr:`10211`) `crusaderky`_
- Fix DataFrame reductions for extension array dtypes (:pr:`10150`) `Patrick Hoefler`_
- Avoid scalar conversion deprecation warning in ``numpy=1.25`` (:pr:`10248`) `James Bourbeau`_
- Make sure ``transform`` output has the same index as input (:pr:`10184`) `Irina Truong`_
- Fix ``corr`` and ``cov`` on a single-row partition (:pr:`9756`) `Irina Truong`_
- Fix ``test_groupby_numeric_only_supported`` and ``test_groupby_aggregate_categorical_observed`` upstream errors (:pr:`10243`) `Irina Truong`_

Documentation
^^^^^^^^^^^^^

- Clean up futures docs (:pr:`10266`) `Matthew Rocklin`_
- Add ``Index`` API reference (:pr:`10263`) `hotpotato`_

Maintenance
^^^^^^^^^^^

- Warn when ``meta`` is passed to ``apply`` (:pr:`10256`) `Patrick Hoefler`_
- Remove ``imageio`` version restriction in CI (:pr:`10260`) `Patrick Hoefler`_
- Remove unused DataFrame variance methods (:pr:`10252`) `Patrick Hoefler`_
- Un-xfail ``test_categories`` with pyarrow strings and ``pyarrow>=12`` (:pr:`10244`) `Irina Truong`_
- Bump gpuCI ``PYTHON_VER`` from 3.8 to 3.9 (:pr:`10233`) `Charles Blackmon-Luca`_

.. _v2023.4.1:

2023.4.1
--------

Released on April 28, 2023

Enhancements
^^^^^^^^^^^^

- Implement ``numeric_only`` support for ``DataFrame.sum`` (:pr:`10194`) `Patrick Hoefler`_
- Add support for ``numeric_only=True`` in ``GroupBy`` operations (:pr:`10222`) `Patrick Hoefler`_
- Avoid deep copy in ``DataFrame.__setitem__`` for pandas 1.4 and up (:pr:`10221`) `Patrick Hoefler`_
- Avoid calling ``Series.apply`` with ``_meta_nonempty`` (:pr:`10212`) `Patrick Hoefler`_
- Unpin ``sqlalchemy`` and fix compatibility issues (:pr:`10140`) `Patrick Hoefler`_

Bug Fixes
^^^^^^^^^

- Partially revert default client discovery (:pr:`10225`) `Florian Jetter`_
- Support arrow dtypes in ``Index`` meta creation (:pr:`10170`) `Patrick Hoefler`_
- Repartitioning raises with extension dtype when truncating floats (:pr:`10169`) `Patrick Hoefler`_
- Adjust empty ``Index`` from ``fastparquet`` to ``object`` dtype (:pr:`10179`) `Patrick Hoefler`_

Documentation
^^^^^^^^^^^^^

- Update Kubernetes docs (:pr:`10232`) `Jacob Tomlinson`_
- Add ``DataFrame.reduction`` to API docs (:pr:`10229`) `James Bourbeau`_
- Add ``DataFrame.persist`` to docs and fix links (:pr:`10231`) `Patrick Hoefler`_
- Add documentation for ``GroupBy.transform`` (:pr:`10185`) `Irina Truong`_
- Fix formatting in random number generation docs (:pr:`10189`) `Eray Aslan`_

Maintenance
^^^^^^^^^^^

- Pin ``imageio`` to ``<2.28`` (:pr:`10216`) `Patrick Hoefler`_
- Add note about ``importlib_metadata`` backport (:pr:`10207`) `James Bourbeau`_
- Add xarray back to Python 3.11 CI builds (:pr:`10200`) `James Bourbeau`_
- Add mindeps build with all optional dependencies (:pr:`10161`) `Charles Blackmon-Luca`_
- Provide proper ``like`` value for ``array_safe`` in ``percentiles_summary`` (:pr:`10156`) `Charles Blackmon-Luca`_
- Avoid re-opening HDF file multiple times in ``read_hdf`` (:pr:`10205`) `Thomas Grainger`_
- Add merge tests on nullable columns (:pr:`10071`) `Charles Blackmon-Luca`_
- Fix coverage configuration (:pr:`10203`) `Thomas Grainger`_
- Remove ``is_period_dtype`` and ``is_sparse_dtype`` (:pr:`10197`) `Patrick Hoefler`_
- Bump ``actions/checkout`` from 3.5.0 to 3.5.2 (:pr:`10201`)
- Avoid deprecated ``is_categorical_dtype`` from pandas (:pr:`10180`) `Patrick Hoefler`_
- Adjust for deprecated ``is_interval_dtype`` and ``is_datetime64tz_dtype`` (:pr:`10188`) `Patrick Hoefler`_

.. _v2023.4.0:

2023.4.0
--------

Released on April 14, 2023

Enhancements
^^^^^^^^^^^^

- Override old default values in ``update_defaults`` (:pr:`10159`) `Gabe Joseph`_
- Add a CLI command to list and get a value from dask config (:pr:`9936`) `Irina Truong`_
- Handle string-based ``engine`` argument to ``read_json`` (:pr:`9947`) `Richard (Rick) Zamora`_
- Avoid deprecated ``GroupBy.dtypes`` (:pr:`10111`) `Irina Truong`_

Bug Fixes
^^^^^^^^^

- Revert grouper-related changes (:pr:`10182`) `Irina Truong`_
- ``GroupBy.cov`` raising for non-numeric grouping column (:pr:`10171`) `Patrick Hoefler`_
- Updates for ``Index`` supporting NumPy numeric dtypes (:pr:`10154`) `Irina Truong`_
- Preserve dtype for partitioning columns when read with pyarrow (:pr:`10115`) `Patrick Hoefler`_
- Fix annotations for ``to_hdf`` (:pr:`10123`) `Hendrik Makait`_
- Handle ``None`` column name when checking if columns are all numeric (:pr:`10128`) `Lawrence Mitchell`_
- Fix ``valid_divisions`` when passed a ``tuple`` (:pr:`10126`) `Brian Phillips`_
- Maintain annotations in ``DataFrame.categorize`` (:pr:`10120`) `Hendrik Makait`_
- Fix handling of missing min/max parquet statistics during filtering (:pr:`10042`) `Richard (Rick) Zamora`_

Deprecations
^^^^^^^^^^^^

- Deprecate ``use_nullable_dtypes=`` and add ``dtype_backend=`` (:pr:`10076`) `Irina Truong`_
- Deprecate ``convert_dtype`` in ``Series.apply`` (:pr:`10133`) `Irina Truong`_

Documentation
^^^^^^^^^^^^^

- Document ``Generator``-based random number generation (:pr:`10134`) `Eray Aslan`_

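A short sketch of the ``Generator``-based interface documented above, assuming ``dask.array.random.default_rng``, which mirrors ``numpy.random.default_rng`` and returns chunked Dask arrays from its sampling methods:

.. code-block:: python

    import dask.array as da

    # default_rng mirrors numpy.random.default_rng and returns a Dask Generator.
    rng = da.random.default_rng(42)

    # Draw uniform samples in [0, 1) as a 4x4 array split into 2x2 blocks.
    x = rng.random((4, 4), chunks=(2, 2))
    result = x.compute()

    print(x.chunks, result.shape)
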
Maintenance
^^^^^^^^^^^

- Update ``dataframe.convert_string`` to ``dataframe.convert-string`` (:pr:`10191`) `Irina Truong`_
- Add ``python-cityhash`` to CI environments (:pr:`10190`) `Charles Blackmon-Luca`_
- Temporarily pin ``scikit-image`` to fix Windows CI (:pr:`10186`) `Patrick Hoefler`_
- Handle pandas deprecation warnings for ``to_pydatetime`` and ``apply`` (:pr:`10168`) `Patrick Hoefler`_
- Drop ``bokeh<3`` restriction (:pr:`10177`) `James Bourbeau`_
- Fix failing tests under copy-on-write (:pr:`10173`) `Patrick Hoefler`_
- Allow pyarrow CI to fail (:pr:`10176`) `James Bourbeau`_
- Switch to ``Generator`` for random number generation in ``dask.array`` (:pr:`10003`) `Eray Aslan`_
- Bump ``peter-evans/create-pull-request`` from 4 to 5 (:pr:`10166`)
- Fix flaky ``modf`` operation in ``test_arithmetic`` (:pr:`10162`) `Irina Truong`_
- Temporarily remove xarray from CI with pandas 2.0 (:pr:`10153`) `James Bourbeau`_
- Fix ``update_graph`` counting logic in ``test_default_scheduler_on_worker`` (:pr:`10145`) `James Bourbeau`_
- Fix documentation build with pandas 2.0 (:pr:`10138`) `James Bourbeau`_
- Remove ``dask/gpu`` from gpuCI update reviewers (:pr:`10135`) `Charles Blackmon-Luca`_
- Update gpuCI ``RAPIDS_VER`` to ``23.06`` (:pr:`10129`)
- Bump ``actions/stale`` from 6 to 8 (:pr:`10121`)
- Use declarative setuptools (:pr:`10102`) `Thomas Grainger`_
- Relax ``assert_eq`` checks on Scalar-like objects (:pr:`10125`) `Matthew Rocklin`_
- Upgrade readthedocs config to Ubuntu 22.04 and Python 3.11 (:pr:`10124`) `Thomas Grainger`_
- Bump ``actions/checkout`` from 3.4.0 to 3.5.0 (:pr:`10122`)
- Fix ``test_null_partition_pyarrow`` in pyarrow CI build (:pr:`10116`) `Irina Truong`_
- Drop distributed pack (:pr:`9988`) `Florian Jetter`_
- Make ``dask.compatibility`` private (:pr:`10114`) `Jacob Tomlinson`_

.. _v2023.3.2:

2023.3.2
--------

Released on March 24, 2023

Enhancements
^^^^^^^^^^^^

- Deprecate ``observed=False`` for groupby with categoricals (:pr:`10095`) `Irina Truong`_
- Deprecate ``axis=`` for some groupby operations (:pr:`10094`) `James Bourbeau`_
- The ``axis`` keyword in ``DataFrame.rolling``/``Series.rolling`` is deprecated (:pr:`10110`) `Irina Truong`_
- ``DataFrame._data`` deprecation in pandas (:pr:`10081`) `Irina Truong`_
- Use ``importlib_metadata`` backport to avoid CLI ``UserWarning`` (:pr:`10070`) `Thomas Grainger`_
- Port option parsing logic from ``dask.dataframe.read_parquet`` to ``to_parquet`` (:pr:`9981`) `Anton Loukianov`_

Bug Fixes
^^^^^^^^^

- Avoid using ``dd.shuffle`` in groupby-apply (:pr:`10043`) `Richard (Rick) Zamora`_
- Enable null hive partitions with pyarrow parquet engine (:pr:`10007`) `Richard (Rick) Zamora`_
- Support unknown shapes in ``*_like`` functions (:pr:`10064`) `Doug Davis`_

Documentation
^^^^^^^^^^^^^

- Add ``to_backend`` methods to API docs (:pr:`10093`) `Lawrence Mitchell`_
- Remove broken gpuCI link in developer docs (:pr:`10065`) `Charles Blackmon-Luca`_

Maintenance
^^^^^^^^^^^

- Configure readthedocs Sphinx warnings as errors (:pr:`10104`) `Thomas Grainger`_
- Un-xfail ``test_division_or_partition`` with pyarrow strings active (:pr:`10108`) `Irina Truong`_
- Un-xfail ``test_different_columns_are_allowed`` with pyarrow strings active (:pr:`10109`) `Irina Truong`_
- Restore Entrypoints compatibility (:pr:`10113`) `Jacob Tomlinson`_
- Un-xfail ``test_to_dataframe_optimize_graph`` with pyarrow strings active (:pr:`10087`) `Irina Truong`_
- Only run ``test_development_guidelines_matches_ci`` on editable install (:pr:`10106`) `Charles Blackmon-Luca`_
- Un-xfail ``test_dataframe_cull_key_dependencies_materialized`` with pyarrow strings active (:pr:`10088`) `Irina Truong`_
- Install ``mimesis`` in CI environments (:pr:`10105`) `Charles Blackmon-Luca`_
- Fix for no module named ``ipykernel`` (:pr:`10101`) `Irina Truong`_
- Fix docs builds by installing ``ipykernel`` (:pr:`10103`) `Thomas Grainger`_
- Allow pyarrow build to continue on failures (:pr:`10097`) `James Bourbeau`_
- Bump ``actions/checkout`` from 3.3.0 to 3.4.0 (:pr:`10096`)
- Fix ``test_set_index_on_empty`` with pyarrow strings active (:pr:`10054`) `Irina Truong`_
- Un-xfail pyarrow pickling tests (:pr:`10082`) `James Bourbeau`_
- CI environment file cleanup (:pr:`10078`) `James Bourbeau`_
- Un-xfail more pyarrow tests (:pr:`10066`) `Irina Truong`_
- Temporarily skip ``pyarrow_compat`` tests with pandas 2.0 (:pr:`10063`) `James Bourbeau`_
- Fix ``test_melt`` with pyarrow strings active (:pr:`10052`) `Irina Truong`_
- Fix ``test_str_accessor`` with pyarrow strings active (:pr:`10048`) `James Bourbeau`_
- Fix ``test_better_errors_object_reductions`` with pyarrow strings active (:pr:`10051`) `James Bourbeau`_
- Fix ``test_loc_with_non_boolean_series`` with pyarrow strings active (:pr:`10046`) `James Bourbeau`_
- Fix ``test_values`` with pyarrow strings active (:pr:`10050`) `James Bourbeau`_
- Temporarily xfail ``test_upstream_packages_installed`` (:pr:`10047`) `James Bourbeau`_

.. _v2023.3.1:

2023.3.1
--------

Released on March 10, 2023

Enhancements
^^^^^^^^^^^^

- Support pyarrow strings in ``MultiIndex`` (:pr:`10040`) `Irina Truong`_
- Improved support for pyarrow strings (:pr:`10000`) `Irina Truong`_
- Fix flaky ``RuntimeWarning`` during array reductions (:pr:`10030`) `James Bourbeau`_
- Extend ``complete`` extras (:pr:`10023`) `James Bourbeau`_
- Raise an error with ``dataframe.convert-string=True`` and ``pandas<2.0`` (:pr:`10033`) `Irina Truong`_
- Rename shuffle/rechunk config option/kwarg to ``method`` (:pr:`10013`) `James Bourbeau`_
- Add initial support for converting pandas extension dtypes to arrays (:pr:`10018`) `James Bourbeau`_
- Remove ``randomgen`` support (:pr:`9987`) `Eray Aslan`_

Bug Fixes
^^^^^^^^^

- Skip rechunk when rechunking to the same chunks with unknown sizes (:pr:`10027`) `Hendrik Makait`_
- Custom utility to convert parquet filters to ``pyarrow`` expression (:pr:`9885`) `Richard (Rick) Zamora`_
- Consider numpy scalars and 0d arrays as scalars when padding (:pr:`9653`) `Justus Magin`_
- Fix parquet overwrite behavior after an adaptive ``read_parquet`` operation (:pr:`10002`) `Richard (Rick) Zamora`_

Documentation
^^^^^^^^^^^^^

- Add and update docs for Data Transfer section (:pr:`10022`) `Miles`_

Maintenance
^^^^^^^^^^^

- Remove stale hive-partitioning code from ``pyarrow`` parquet engine (:pr:`10039`) `Richard (Rick) Zamora`_
- Increase minimum supported pyarrow to 7.0 (:pr:`10024`) `James Bourbeau`_
- Revert "Prepare drop packunpack (:pr:`9994`)" (:pr:`10037`) `Florian Jetter`_
- Have codecov wait for more builds before reporting (:pr:`10031`) `James Bourbeau`_
- Prepare drop packunpack (:pr:`9994`) `Florian Jetter`_
- Add CI job with pyarrow strings turned on (:pr:`10017`) `James Bourbeau`_
- Fix ``test_groupby_dropna_with_agg`` for pandas 2.0 (:pr:`10001`) `Irina Truong`_
- Fix ``test_pickle_roundtrip`` for pandas 2.0 (:pr:`10011`) `James Bourbeau`_

.. _v2023.3.0:

2023.3.0
--------

Released on March 1, 2023

Bug Fixes
^^^^^^^^^

- Bag must not pick P2P as shuffle default (:pr:`10005`) `Florian Jetter`_

Documentation
^^^^^^^^^^^^^

- Minor follow-up to P2P by default (:pr:`10008`) `James Bourbeau`_

Maintenance
^^^^^^^^^^^

- Add minimum version to optional ``jinja2`` dependency (:pr:`9999`) `Charles Blackmon-Luca`_

.. _v2023.2.1:

2023.2.1
--------

Released on February 24, 2023

.. note::

    This release changes the default DataFrame shuffle algorithm to ``p2p``
    to improve stability and performance. `Learn more here <https://blog.coiled.io/blog/shuffling-large-data-at-constant-memory.html?utm_source=dask-docs&utm_medium=changelog>`_
    and please provide any feedback `on this discussion <https://github.com/dask/distributed/discussions/7509>`_.

    If you encounter issues with this new algorithm, please see the :ref:`documentation <shuffle-methods>`
    for more information, including how to switch back to the old mode.

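A minimal sketch of switching back to task-based shuffling for a block of code, assuming the ``dataframe.shuffle.method`` configuration key (the name used after the rename to ``method`` in :pr:`10013`); the linked shuffle documentation is authoritative for your version:

.. code-block:: python

    import dask

    # Assumed key name "dataframe.shuffle.method"; "tasks" selects task-based shuffling.
    with dask.config.set({"dataframe.shuffle.method": "tasks"}):
        method = dask.config.get("dataframe.shuffle.method")
        # DataFrame shuffles built inside this block would be expected to use "tasks".

    print(method)
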
Enhancements ^^^^^^^^^^^^

Enable P2P shuffling by default (:pr:9991) Florian Jetter_
P2P rechunking (:pr:9939) Hendrik Makait_
Efficient dataframe.convert-string support for read_parquet (:pr:9979) Irina Truong_
Allow p2p shuffle kwarg for DataFrame merges (:pr:9900) Florian Jetter_
Change split_row_groups default to "infer" (:pr:9637) Richard (Rick) Zamora_
Add option for converting string data to use pyarrow strings (:pr:9926) James Bourbeau_
Add support for multi-column sort_values (:pr:8263) Charles Blackmon-Luca_
Generator based random-number generation indask.array (:pr:9038) Eray Aslan_
Support numeric_only for simple groupby aggregations for pandas 2.0 compatibility (:pr:9889) Irina Truong_

Bug Fixes ^^^^^^^^^

Fix profilers plot not being aligned to context manager enter time (:pr:9739) David Hoese_
Relax dask.dataframe assert_eq type checks (:pr:9989) Matthew Rocklin_
Restore describe compatibility for pandas 2.0 (:pr:9982) James Bourbeau_

Documentation ^^^^^^^^^^^^^

Improving deploying Dask docs (:pr:9912) Sarah Charlotte Johnson_
More docs for DataFrame.partitions (:pr:9976) Tom Augspurger_
Update docs with more information on default Delayed scheduler (:pr:9903) Guillaume Eynard-Bontemps_
Deployment Considerations documentation (:pr:9933) Gabe Joseph_

Maintenance ^^^^^^^^^^^

Temporarily rerun flaky tests (:pr:9983) James Bourbeau_
Update parsing of FULL_RAPIDS_VER/FULL_UCX_PY_VER (:pr:9990) Charles Blackmon-Luca_
Increase minimum supported versions to pandas=1.3 and numpy=1.21 (:pr:9950) James Bourbeau_
Fix std to work with numeric_only for pandas 2.0 (:pr:9960) Irina Truong_
Temporarily xfail test_roundtrip_partitioned_pyarrow_dataset (:pr:9977) James Bourbeau_
Fix copy on write failure in test_idxmaxmin (:pr:9944) Patrick Hoefler_
Bump pre-commit versions (:pr:9955) crusaderky_
Fix test_groupby_unaligned_index for pandas 2.0 (:pr:9963) Irina Truong_
Un-xfail test_set_index_overlap_2 for pandas 2.0 (:pr:9959) James Bourbeau_
Fix test_merge_by_index_patterns for pandas 2.0 (:pr:9930) Irina Truong_
Bump jacobtomlinson/gha-find-replace from 2 to 3 (:pr:9953) James Bourbeau_
Fix test_rolling_agg_aggregate for pandas 2.0 compatibility (:pr:9948) Irina Truong_
Bump black to 23.1.0 (:pr:9956) crusaderky_
Run GPU tests on Python 3.8 & 3.10 (:pr:9940) Charles Blackmon-Luca_
Fix test_to_timestamp for pandas 2.0 (:pr:9932) Irina Truong_
Fix an error with groupby value_counts for pandas 2.0 compatibility (:pr:9928) Irina Truong_
Config converter: replace all dashes with underscores (:pr:9945) Jacob Tomlinson_
CI: use nightly wheel to install pyarrow in upstream test build (:pr:9873) Joris Van den Bossche_

.. _v2023.2.0:

2023.2.0

Released on February 10, 2023

Enhancements ^^^^^^^^^^^^

Update numeric_only default in quantile for pandas 2.0 (:pr:9854) Irina Truong_
Make repartition a no-op when divisions match (:pr:9924) James Bourbeau_
Update datetime_is_numeric behavior in describe for pandas 2.0 (:pr:9868) Irina Truong_
Update value_counts to return correct name in pandas 2.0 (:pr:9919) Irina Truong_
Support new axis=None behavior in pandas 2.0 for certain reductions (:pr:9867) James Bourbeau_
Filter out all-nan RuntimeWarning at the chunk level for nanmin and nanmax (:pr:9916) Julia Signell_
Fix numeric meta_nonempty index creation for pandas 2.0 (:pr:9908) James Bourbeau_
Fix DataFrame.info() tests for pandas 2.0 (:pr:9909) James Bourbeau_

Bug Fixes ^^^^^^^^^

Fix GroupBy.value_counts handling for multiple groupby columns (:pr:9905) Charles Blackmon-Luca_

Documentation ^^^^^^^^^^^^^

Fix some outdated information/typos in development guide (:pr:9893) Patrick Hoefler_
Add note about keep=False in drop_duplicates docstring (:pr:9887) Jayesh Manani_
Add meta details to dask Array (:pr:9886) Jayesh Manani_
Clarify task stream showing more rows than threads (:pr:9906) Gabe Joseph_

Maintenance ^^^^^^^^^^^

Fix test_numeric_column_names for pandas 2.0 (:pr:9937) Irina Truong_
Fix dask/dataframe/tests/test_utils_dataframe.py tests for pandas 2.0 (:pr:9788) James Bourbeau_
Replace index.is_numeric with is_any_real_numeric_dtype for pandas 2.0 compatibility (:pr:9918) Irina Truong_
Avoid pd.core import in dask utils (:pr:9907) Matthew Roeschke_
Use label for upstream build on pull requests (:pr:9910) James Bourbeau_
Broaden exception catching for sqlalchemy.exc.RemovedIn20Warning (:pr:9904) James Bourbeau_
Temporarily restrict sqlalchemy < 2 in CI (:pr:9897) James Bourbeau_
Update isort version to 5.12.0 (:pr:9895) Lawrence Mitchell_
Remove unused skiprows variable in read_csv (:pr:9892) Patrick Hoefler_

.. _v2023.1.1:

2023.1.1

Released on January 27, 2023

Enhancements ^^^^^^^^^^^^

Add to_backend method to Array and _Frame (:pr:9758) Richard (Rick) Zamora_
Small fix for timestamp index divisions in pandas 2.0 (:pr:9872) Irina Truong_
Add numeric_only to DataFrame.cov and DataFrame.corr (:pr:9787) James Bourbeau_
Fixes related to group_keys default change in pandas 2.0 (:pr:9855) Irina Truong_
infer_datetime_format compatibility for pandas 2.0 (:pr:9783) James Bourbeau_

Bug Fixes ^^^^^^^^^

Fix serialization bug in BroadcastJoinLayer (:pr:9871) Richard (Rick) Zamora_
Satisfy broadcast argument in DataFrame.merge (:pr:9852) Richard (Rick) Zamora_
Fix pyarrow parquet columns statistics computation (:pr:9772) aywandji_

Documentation ^^^^^^^^^^^^^

Fix "duplicate explicit target name" docs warning (:pr:9863) Chiara Marmo_
Fix code formatting issue in "Defining a new collection backend" docs (:pr:9864) Chiara Marmo_
Update dashboard documentation for memory plot (:pr:9768) Jayesh Manani_
Add docs section about no-worker tasks (:pr:9839) Florian Jetter_

Maintenance ^^^^^^^^^^^

Additional updates for detecting a distributed scheduler (:pr:9890) James Bourbeau_
Update gpuCI RAPIDS_VER to 23.04 (:pr:9876)
Reverse precedence between collection and distributed default (:pr:9869) Florian Jetter_
Update xarray-contrib/issue-from-pytest-log to version 1.2.6 (:pr:9865) James Bourbeau_
Don't require dask config shuffle default (:pr:9826) Florian Jetter_
Un-xfail datetime64 Parquet roundtripping tests for new fastparquet (:pr:9811) James Bourbeau_
Add option to manually run upstream CI build (:pr:9853) James Bourbeau_
Use custom timeout in CI builds (:pr:9844) James Bourbeau_
Remove kwargs from make_blockwise_graph (:pr:9838) Florian Jetter_
Ignore warnings on persist call in test_setitem_extended_API_2d_mask (:pr:9843) Charles Blackmon-Luca_
Fix running S3 tests locally (:pr:9833) James Bourbeau_

.. _v2023.1.0:

2023.1.0

Released on January 13, 2023

Enhancements ^^^^^^^^^^^^

Use distributed default clients even if no config is set (:pr:9808) Florian Jetter_
Implement ma.where and ma.nonzero (:pr:9760) Erik Holmgren_
Update zarr store creation functions (:pr:9790) Ryan Abernathey_
iteritems compatibility for pandas 2.0 (:pr:9785) James Bourbeau_
Accurate sizeof for pandas string[python] dtype (:pr:9781) crusaderky_
Deflate sizeof() of duplicate references to pandas object types (:pr:9776) crusaderky_
GroupBy.__getitem__ compatibility for pandas 2.0 (:pr:9779) James Bourbeau_
append compatibility for pandas 2.0 (:pr:9750) James Bourbeau_
get_dummies compatibility for pandas 2.0 (:pr:9752) James Bourbeau_
is_monotonic compatibility for pandas 2.0 (:pr:9751) James Bourbeau_
numpy=1.24 compatibility (:pr:9777) James Bourbeau_

Documentation ^^^^^^^^^^^^^

Remove duplicated encoding kwarg in docstring for to_json (:pr:9796) Sultan Orazbayev_
Mention SubprocessCluster in LocalCluster documentation (:pr:9784) Hendrik Makait_
Move Prometheus docs to dask/distributed (:pr:9761) crusaderky_

Maintenance ^^^^^^^^^^^

Temporarily ignore RuntimeWarning in test_setitem_extended_API_2d_mask (:pr:9828) James Bourbeau_
Fix flaky test_threaded.py::test_interrupt (:pr:9827) Hendrik Makait_
Update xarray-contrib/issue-from-pytest-log in upstream report (:pr:9822) James Bourbeau_
pip install dask on gpuCI builds (:pr:9816) Charles Blackmon-Luca_
Bump actions/checkout from 3.2.0 to 3.3.0 (:pr:9815)
Resolve sqlalchemy import failures in mindeps testing (:pr:9809) Charles Blackmon-Luca_
Ignore sqlalchemy.exc.RemovedIn20Warning (:pr:9801) Thomas Grainger_
xfail datetime64 Parquet roundtripping tests for pandas 2.0 (:pr:9786) James Bourbeau_
Remove sqlalchemy 1.3 compatibility (:pr:9695) McToel_
Reduce size of expected DoK sparse matrix (:pr:9775) Elliott Sales de Andrade_
Remove executable flag from dask/dataframe/io/orc/utils.py (:pr:9774) Elliott Sales de Andrade_

.. _v2022.12.1:

2022.12.1

Released on December 16, 2022

Enhancements ^^^^^^^^^^^^

Support dtype_backend="pandas|pyarrow" configuration (:pr:9719) James Bourbeau_
Support cupy.ndarray to cudf.DataFrame dispatching in dask.dataframe (:pr:9579) Richard (Rick) Zamora_
Make filesystem-backend configurable in read_parquet (:pr:9699) Richard (Rick) Zamora_
Serialize all pyarrow extension arrays efficiently (:pr:9740) James Bourbeau_

Bug Fixes ^^^^^^^^^

Fix bug when repartitioning with tz-aware datetime index (:pr:9741) James Bourbeau_
Partial functions in aggs may have arguments (:pr:9724) Irina Truong_
Add support for simple operation with pyarrow-backed extension dtypes (:pr:9717) James Bourbeau_
Rename columns correctly in case of SeriesGroupby (:pr:9716) Lawrence Mitchell_

Documentation ^^^^^^^^^^^^^

Fix url link typo in collection backend doc (:pr:9748) Shawn_
Update Prometheus docs (:pr:9696) Hendrik Makait_

Maintenance ^^^^^^^^^^^

Add zarr to Python 3.11 CI environment (:pr:9771) James Bourbeau_
Add support for Python 3.11 (:pr:9708) Thomas Grainger_
Bump actions/checkout from 3.1.0 to 3.2.0 (:pr:9753)
Avoid np.bool8 deprecation warning (:pr:9737) James Bourbeau_
Make sure dev packages aren't overwritten in upstream CI build (:pr:9731) James Bourbeau_
Avoid adding data.h5 and mydask.html files during tests (:pr:9726) Thomas Grainger_

.. _v2022.12.0:

2022.12.0

Released on December 2, 2022

Enhancements ^^^^^^^^^^^^

Remove statistics-based set_index logic from read_parquet (:pr:9661) Richard (Rick) Zamora_
Add support for use_nullable_dtypes to dd.read_parquet (:pr:9617) Ian Rose_
Fix map_overlap in order to accept pandas arguments (:pr:9571) Fabien Aulaire_
Fix pandas 1.5+ FutureWarning in .str.split(..., expand=True) (:pr:9704) Jacob Hayes_
Enable column projection for groupby slicing (:pr:9667) Richard (Rick) Zamora_
Support duplicate column cum-functions (:pr:9685) Ben_
Improve error message for failed backend dispatch call (:pr:9677) Richard (Rick) Zamora_

Bug Fixes ^^^^^^^^^

Revise meta creation in arrow parquet engine (:pr:9672) Richard (Rick) Zamora_
Fix da.fft.fft for array-like inputs (:pr:9688) James Bourbeau_
Fix groupby aggregation when grouping on an index by name (:pr:9646) Richard (Rick) Zamora_

Maintenance ^^^^^^^^^^^

Avoid PytestReturnNotNoneWarning in test_inheriting_class (:pr:9707) Thomas Grainger_
Fix flaky test_dataframe_aggregations_multilevel (:pr:9701) Richard (Rick) Zamora_
Bump mypy version (:pr:9697) crusaderky_
Disable dashboard in test_map_partitions_df_input (:pr:9687) James Bourbeau_
Use latest xarray-contrib/issue-from-pytest-log in upstream build (:pr:9682) James Bourbeau_
xfail ttest_1samp for upstream scipy (:pr:9670) James Bourbeau_
Update gpuCI RAPIDS_VER to 23.02 (:pr:9678)

.. _v2022.11.1:

2022.11.1

Released on November 18, 2022

Enhancements ^^^^^^^^^^^^

Restrict bokeh=3 support (:pr:9673) Gabe Joseph_
Updates for fastparquet evolution (:pr:9650) Martin Durant_

Maintenance ^^^^^^^^^^^

Update ga-yaml-parser step in gpuCI updating workflow (:pr:9675) Charles Blackmon-Luca_
Revert importlib.metadata workaround (:pr:9658) James Bourbeau_
Fix mindeps-distributed CI build to handle numpy/pandas not being installed (:pr:9668) James Bourbeau_

.. _v2022.11.0:

2022.11.0

Released on November 15, 2022

Enhancements ^^^^^^^^^^^^

Generalize from_dict implementation to allow usage from other backends (:pr:9628) GALI PREM SAGAR_

Bug Fixes ^^^^^^^^^

Avoid pandas constructors in dask.dataframe.core (:pr:9570) Richard (Rick) Zamora_
Fix sort_values with Timestamp data (:pr:9642) James Bourbeau_
Generalize array checking and remove pd.Index call in _get_partitions (:pr:9634) Benjamin Zaitlen_
Fix read_csv behavior for header=0 and names (:pr:9614) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

Update dashboard docs for queuing (:pr:9660) Gabe Joseph_
Remove import dask as d from docstrings (:pr:9644) Matthew Rocklin_
Fix link to partitions docs in read_parquet docstring (:pr:9636) qheuristics_
Add API doc links to array/bag/dataframe sections (:pr:9630) Matthew Rocklin_

Maintenance ^^^^^^^^^^^

Use conda-incubator/setup-miniconda (:pr:9662) John A Kirkham_
Allow bokeh=3 (:pr:9659) James Bourbeau_
Run upstream build with Python 3.10 (:pr:9655) James Bourbeau_
Pin pyyaml version in mindeps testing (:pr:9640) Charles Blackmon-Luca_
Add pre-commit to catch breakpoint() (:pr:9638) James Bourbeau_
Bump xarray-contrib/issue-from-pytest-log from 1.1 to 1.2 (:pr:9635)
Remove blosc references (:pr:9625) Naty Clementi_
Upgrade mypy and drop unused comments (:pr:9616) Hendrik Makait_
Harden test_repartition_npartitions (:pr:9585) Richard (Rick) Zamora_

.. _v2022.10.2:

2022.10.2

Released on October 31, 2022

This was a hotfix and has no changes in this repository. The necessary fix was in dask/distributed, but we decided to bump this version number for consistency.

.. _v2022.10.1:

2022.10.1

Released on October 28, 2022

Enhancements ^^^^^^^^^^^^

Enable named aggregation syntax (:pr:9563) ChrisJar_
Add extension dtype support to set_index (:pr:9566) James Bourbeau_
Redesigning the array HTML repr for clarity (:pr:9519) Shingo OKAWA_

Bug Fixes ^^^^^^^^^

Fix merge with empty left DataFrame (:pr:9578) Ian Rose_

Documentation ^^^^^^^^^^^^^

Add note about limiting thread oversubscription by default (:pr:9592) James Bourbeau_
Use sphinx-click for dask CLI (:pr:9589) James Bourbeau_
Fix Semaphore API docs (:pr:9584) James Bourbeau_
Render meta description in map_overlap docstring (:pr:9568) James Bourbeau_

Maintenance ^^^^^^^^^^^

Require Click 7.0+ in Dask (:pr:9595) John A Kirkham_
Temporarily restrict bokeh<3 (:pr:9607) James Bourbeau_
Resolve importlib-related failures in upstream CI (:pr:9604) Charles Blackmon-Luca_
Improve upstream CI report (:pr:9603) James Bourbeau_
Fix upstream CI report (:pr:9602) James Bourbeau_
Remove setuptools host dep, add CLI entrypoint (:pr:9600) Charles Blackmon-Luca_
More Backend dispatch class type annotations (:pr:9573) Ian Rose_

.. _v2022.10.0:

2022.10.0

Released on October 14, 2022

New Features ^^^^^^^^^^^^

Backend library dispatching for IO in Dask-Array and Dask-DataFrame (:pr:9475) Richard (Rick) Zamora_
Add new CLI that is extensible (:pr:9283) Doug Davis_

Enhancements ^^^^^^^^^^^^

Groupby median (:pr:9516) Ian Rose_
Fix array copy not being a no-op (:pr:9555) David Hoese_
Add support for string timedelta in map_overlap (:pr:9559) Nicolas Grandemange_
Shuffle-based groupby for single functions (:pr:9504) Ian Rose_
Make datetime.datetime tokenize idempotently (:pr:9532) Martin Durant_
Support tokenizing datetime.time (:pr:9528) Tim Paine_

Bug Fixes ^^^^^^^^^

Avoid race condition in lazy dispatch registration (:pr:9545) James Bourbeau_
Do not allow setitem to np.nan for int dtype (:pr:9531) Doug Davis_
Stable demo column projection (:pr:9538) Ian Rose_
Ensure pickle-able binops in delayed (:pr:9540) Ian Rose_
Fix CSV column projection when selecting (:pr:9534) Martin Durant_

Documentation ^^^^^^^^^^^^^

Update Parquet best practice (:pr:9537) Matthew Rocklin_

Maintenance ^^^^^^^^^^^

Restrict tiledb-py version to avoid CI failures (:pr:9569) James Bourbeau_
Bump actions/github-script from 3 to 6 (:pr:9564)
Bump actions/stale from 4 to 6 (:pr:9551)
Bump peter-evans/create-pull-request from 3 to 4 (:pr:9550)
Bump actions/checkout from 2 to 3.1.0 (:pr:9552)
Bump codecov/codecov-action from 1 to 3 (:pr:9549)
Bump the-coding-turtle/ga-yaml-parser from 0.1.1 to 0.1.2 (:pr:9553)
Move dependabot configuration file (:pr:9547) James Bourbeau_
Add dependabot for GitHub actions (:pr:9542) James Bourbeau_
Run mypy on Windows and Linux (:pr:9530) crusaderky_
Update gpuCI RAPIDS_VER to 22.12 (:pr:9524)

.. _v2022.9.2:

2022.9.2

Released on September 30, 2022

Enhancements ^^^^^^^^^^^^

Remove factorization logic from array auto chunking (:pr:9507) James Bourbeau_

Documentation ^^^^^^^^^^^^^

Add docs on running Dask in a standalone Python script (:pr:9513) James Bourbeau_
Clarify custom-graph multiprocessing example (:pr:9511) nouman_

Maintenance ^^^^^^^^^^^

Groupby sort upstream compatibility (:pr:9486) Ian Rose_

.. _v2022.9.1:

2022.9.1

Released on September 16, 2022

New Features ^^^^^^^^^^^^

Add DataFrame and Series median methods (:pr:9483) James Bourbeau_

Enhancements ^^^^^^^^^^^^

Shuffle groupby default (:pr:9453) Ian Rose_
Filter by list (:pr:9419) Greg Hayes_
Added distributed.utils.key_split functionality to dask.utils.key_split (:pr:9464) Luke Conibear_

Bug Fixes ^^^^^^^^^

Fix overlap so that set_index doesn't drop rows (:pr:9423) Julia Signell_
Fix assigning pandas Series to column when ddf.columns.min() raises (:pr:9485) Erik Welch_
Fix metadata comparison stack_partitions (:pr:9481) James Bourbeau_
Provide default for split_out (:pr:9493) Lawrence Mitchell_

Deprecations ^^^^^^^^^^^^

Allow split_out to be None, which then defaults to 1 in groupby().aggregate() (:pr:9491) Ian Rose_

Documentation ^^^^^^^^^^^^^

Fix enforce_metadata documentation: it does not check dtypes (:pr:9474) Nicolas Grandemange_
Fix it's --> its typo (:pr:9484) Nat Tabris_

Maintenance ^^^^^^^^^^^

Workaround for parquet writing failure using some datetime series but not others (:pr:9500) Ian Rose_
Filter out numeric_only warnings from pandas (:pr:9496) James Bourbeau_
Avoid set_index(..., inplace=True) where not necessary (:pr:9472) James Bourbeau_
Avoid passing groupby key list of length one (:pr:9495) James Bourbeau_
Update test_groupby_dropna_cudf based on cudf support for group_keys (:pr:9482) James Bourbeau_
Remove dd.from_bcolz (:pr:9479) James Bourbeau_
Added flake8-bugbear to pre-commit hooks (:pr:9457) Luke Conibear_
Bind loop variables in function definitions (B023) (:pr:9461) Luke Conibear_
Added assert for comparisons (B015) (:pr:9459) Luke Conibear_
Set top-level default shell in CI workflows (:pr:9469) James Bourbeau_
Removed unused loop control variables (B007) (:pr:9458) Luke Conibear_
Replaced getattr calls for constant attributes (B009) (:pr:9460) Luke Conibear_
Pin libprotobuf to allow nightly pyarrow in the upstream CI build (:pr:9465) Joris Van den Bossche_
Replaced mutable data structures for default arguments (B006) (:pr:9462) Luke Conibear_
Changed flake8 mirror and updated version (:pr:9456) Luke Conibear_

.. _v2022.9.0:

2022.9.0

Released on September 2, 2022

Enhancements ^^^^^^^^^^^^

Enable automatic column projection for groupby aggregations (:pr:9442) Richard (Rick) Zamora_
Accept superclasses in NEP-13/17 dispatching (:pr:6710) Gabe Joseph_

Bug Fixes ^^^^^^^^^

Rename by columns internally for cumulative operations on the same by columns (:pr:9430) Pavithra Eswaramoorthy_
Fix get_group with categoricals (:pr:9436) Pavithra Eswaramoorthy_
Fix caching-related MaterializedLayer.cull performance regression (:pr:9413) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

Add maintainer documentation page (:pr:9309) James Bourbeau_

Maintenance ^^^^^^^^^^^

Revert skipped fastparquet test (:pr:9439) Pavithra Eswaramoorthy_
tmpfile does not end files with period on empty extension (:pr:9429) Hendrik Makait_
Skip failing fastparquet test with latest release (:pr:9432) James Bourbeau_

.. _v2022.8.1:

2022.8.1

Released on August 19, 2022

New Features ^^^^^^^^^^^^

Implement ma.*_like functions (:pr:9378) Ruth Comer_

Enhancements ^^^^^^^^^^^^

Fuse compatible annotations (:pr:9402) Ian Rose_
Shuffle-based groupby aggregation for high-cardinality groups (:pr:9302) Richard (Rick) Zamora_
Unpack namedtuple (:pr:9361) Hendrik Makait_

Bug Fixes ^^^^^^^^^

Fix SeriesGroupBy cumulative functions with axis=1 (:pr:9377) Pavithra Eswaramoorthy_
Sparse array reductions (:pr:9342) Ian Rose_
Fix make_meta while using categorical column with index (:pr:9348) Pavithra Eswaramoorthy_
Don't allow incompatible keywords in DataFrame.dropna (:pr:9366) Naty Clementi_
Make set_index handle entirely empty dataframes (:pr:8896) Julia Signell_
Improve dataclass handling in unpack_collections (:pr:9345) Hendrik Makait_
Fix bag sampling when there are some smaller partitions (:pr:9349) Ian Rose_
Add support for empty partitions to da.min/da.max functions (:pr:9268) geraninam_

Documentation ^^^^^^^^^^^^^

Clarify that bind() etc. regenerate the keys (:pr:9385) crusaderky_
Consolidate dashboard diagnostics documentation (:pr:9357) Sarah Charlotte Johnson_
Remove outdated meta information Pavithra Eswaramoorthy_

Maintenance ^^^^^^^^^^^

Use entry_points utility in sizeof (:pr:9390) James Bourbeau_
Add entry_points compatibility utility (:pr:9388) Jacob Tomlinson_
Upload environment file artifact for each CI build (:pr:9372) James Bourbeau_
Remove werkzeug pin in CI (:pr:9371) James Bourbeau_
Fix type annotations for dd.from_pandas and dd.from_delayed (:pr:9362) Jordan Yap_

.. _v2022.8.0:

2022.8.0

Released on August 5, 2022

Enhancements ^^^^^^^^^^^^

Ensure make_meta doesn't hold ref to data (:pr:9354) Jim Crist-Harif_
Revise divisions logic in from_pandas (:pr:9221) Richard (Rick) Zamora_
Warn if user sets index with existing index (:pr:9341) Julia Signell_
Add keepdims keyword for da.average (:pr:9332) Ruth Comer_
Change repr methods to avoid Layer materialization (:pr:9289) Richard (Rick) Zamora_

Bug Fixes ^^^^^^^^^

Make sure order kwarg will not crash the astype method (:pr:9317) Genevieve Buckley_
Fix bug for cumsum on cupy chunked dask arrays (:pr:9320) Genevieve Buckley_
Match input and output structure in _sample_reduce (:pr:9272) Pavithra Eswaramoorthy_
Include meta in array serialization (:pr:9240) Frédéric BRIOL_
Fix Index.memory_usage (:pr:9290) James Bourbeau_
Fix division calculation in dask.dataframe.io.from_dask_array (:pr:9282) Jordan Yap_

Documentation ^^^^^^^^^^^^^

How to use kwargs with custom task graphs (:pr:9322) Genevieve Buckley_
Add note to da.from_array about how the order is not preserved (:pr:9346) Julia Signell_
Add I/O info for async functions (:pr:9326) Logan Norman_
Tidy up docs snippet for futures IO functions (:pr:9340) Julia Signell_
Use consistent variable names for pandas df and Dask ddf in dataframe-groupby.rst (:pr:9304) ivojuroro_
Switch js-yaml for yaml.js in config converter (:pr:9306) Jacob Tomlinson_

Maintenance ^^^^^^^^^^^

Update da.linalg.solve for SciPy 1.9.0 compatibility (:pr:9350) Pavithra Eswaramoorthy_
Update test_getitem_avoids_large_chunks_missing (:pr:9347) Pavithra Eswaramoorthy_
Fix docs title formatting for "Extend sizeof" Doug Davis_
Import loop_in_thread fixture in tests (:pr:9337) James Bourbeau_
Temporarily xfail test_solve_sym_pos (:pr:9336) Pavithra Eswaramoorthy_
Fix small typo in 10 minutes to Dask page (:pr:9329) Shaghayegh_
Temporarily pin werkzeug in CI to avoid test suite hanging (:pr:9325) James Bourbeau_
Add tests for cupy.angle() (:pr:9312) Peter Andreas Entschev_
Update gpuCI RAPIDS_VER to 22.10 (:pr:9314)
Add pandas[test] to test extra (:pr:9110) Ben Beasley_
Add bokeh and scipy to upstream CI build (:pr:9265) James Bourbeau_

.. _v2022.7.1:

2022.7.1

Released on July 22, 2022

Enhancements ^^^^^^^^^^^^

Return Dask array if all axes are squeezed (:pr:9250) Pavithra Eswaramoorthy_
Make cycle reported by toposort shorter (:pr:9068) Erik Welch_
Unknown chunk slicing - raise informative error (:pr:9285) Naty Clementi_

Bug Fixes ^^^^^^^^^

Fix bug in HighLevelGraph.cull (:pr:9267) Richard (Rick) Zamora_
Sort categories (:pr:9264) Pavithra Eswaramoorthy_
Use max (instead of sum) for calculating warnsize (:pr:9235) Pavithra Eswaramoorthy_
Fix bug when filtering on partitioned column with pyarrow (:pr:9252) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

Updated repartition documentation to add note about partition_size (:pr:9288) Dylan Stewart_
Don't include docs in Array methods, just refer to module docs (:pr:9244) Julia Signell_
Remove outdated reference to scheduler and worker dashboards (:pr:9278) Pavithra Eswaramoorthy_
Fix a few typos (:pr:9270) Tim Gates_
Add a custom aggregate example using NumPy methods (:pr:9260) geraninam_

Maintenance ^^^^^^^^^^^

Add type annotations to dd.from_pandas and dd.from_delayed (:pr:9237) Michael Milton_
Update calculate_divisions docstring (:pr:9275) Tom Augspurger_
Update test_plot_multiple for upcoming bokeh release (:pr:9261) James Bourbeau_
Add typing to common array properties (:pr:9255) Illviljan_

.. _v2022.7.0:

2022.7.0

Released on July 8, 2022

Enhancements ^^^^^^^^^^^^

Support pathlib.PurePath in normalize_token (:pr:9229) Angus Hollands_
Add AttributeNotImplementedError for properties so IPython glob search works (:pr:9231) Erik Welch_
map_overlap: multiple dataframe handling (:pr:9145) Fabien Aulaire_
Read entrypoints in dask.sizeof (:pr:7688) Angus Hollands_

Bug Fixes ^^^^^^^^^

Fix TypeError: 'Serialize' object is not subscriptable when writing parquet dataset with Client(processes=False) (:pr:9015) Lucas Miguel Ponce_
Correct dtypes when concat with an empty dataframe (:pr:9193) Pavithra Eswaramoorthy_

Documentation ^^^^^^^^^^^^^

Highlight note about persist (:pr:9234) Pavithra Eswaramoorthy_
Update release-procedure to include more detail and helpful commands (:pr:9215) Julia Signell_
Better SEO for Futures and Dask vs. Spark pages (:pr:9217) Sarah Charlotte Johnson_

Maintenance ^^^^^^^^^^^

Use math.prod instead of np.prod on lists, tuples, and iters (:pr:9232) crusaderky_
Only import IPython if type checking (:pr:9230) Florian Jetter_
Tougher mypy checks (:pr:9206) crusaderky_

.. _v2022.6.1:

2022.6.1

Released on June 24, 2022

Enhancements ^^^^^^^^^^^^

Dask in pyodide (:pr:9053) Ian Rose_
Create dask.utils.show_versions (:pr:9144) Sultan Orazbayev_
Better error message for unsupported numpy operations on dask.dataframe objects. (:pr:9201) Julia Signell_
Add allow_rechunk kwarg to dask.array.overlap function (:pr:7776) Genevieve Buckley_
Add minutes and hours to dask.utils.format_time (:pr:9116) Matthew Rocklin_
More retries when writing parquet to remote filesystem (:pr:9175) Ian Rose_

Bug Fixes ^^^^^^^^^

Timedelta deterministic hashing (:pr:9213) Fabien Aulaire_
Enum deterministic hashing (:pr:9212) Fabien Aulaire_
shuffle_group(): avoid converting to arrays (:pr:9157) Mads R. B. Kristensen_

Deprecations ^^^^^^^^^^^^

Deprecate extra format_time utility (:pr:9184) James Bourbeau_

Documentation ^^^^^^^^^^^^^

Better SEO for 10 Minutes to Dask (:pr:9182) Sarah Charlotte Johnson_
Better SEO for Delayed and Best Practices (:pr:9194) Sarah Charlotte Johnson_
Include known inconsistency in DataFrame str.split accessor docstring (:pr:9177) Richard Pelgrim_
Add inconsistencies keyword to derived_from (:pr:9192) Richard Pelgrim_
Add missing append in delayed best practices example (:pr:9202) Ben_
Fix indentation in Best Practices (:pr:9196) Sarah Charlotte Johnson_
Add link to Genevieve Buckley's blog on chunk sizes (:pr:9199) Pavithra Eswaramoorthy_
Update to_csv docstring (:pr:9094) Sarah Charlotte Johnson_

Maintenance ^^^^^^^^^^^

Update versioneer: change from using SafeConfigParser to ConfigParser (:pr:9205) Thomas A Caswell_
Remove IPython hack in CI (:pr:9200) crusaderky_

.. _v2022.6.0:

2022.6.0

Released on June 10, 2022

Enhancements ^^^^^^^^^^^^

Add feature to show names of layer dependencies in HLG JupyterLab repr (:pr:9081) Angelos Omirolis_
Add arrow schema extraction dispatch (:pr:9169) GALI PREM SAGAR_
Add sort_results argument to assert_eq (:pr:9130) Pavithra Eswaramoorthy_
Add weeks to parse_timedelta (:pr:9168) Matthew Rocklin_
Warn that cloudpickle is not always deterministic (:pr:9148) Pavithra Eswaramoorthy_
Switch parquet default engine (:pr:9140) Jim Crist-Harif_
Use deterministic hashing with _iLocIndexer / _LocIndexer (:pr:9108) Fabien Aulaire_
Enforce consistent schema in to_parquet pyarrow (:pr:9131) Jim Crist-Harif_

Bug Fixes ^^^^^^^^^

Fix pyarrow.StringArray pickle (:pr:9170) Jim Crist-Harif_
Fix parallel metadata collection in pyarrow engine (:pr:9165) Richard (Rick) Zamora_
Improve pyarrow partitioning logic (:pr:9147) James Bourbeau_
pyarrow 8.0 partitioning fix (:pr:9143) James Bourbeau_

Documentation ^^^^^^^^^^^^^

Better SEO for Installing Dask and Dask DataFrame Best Practices (:pr:9178) Sarah Charlotte Johnson_
Update logos page in docs (:pr:9167) Sarah Charlotte Johnson_
Add example using pandas Series to map_partitions docstring (:pr:9161) Alex-JG3_
Update docs theme for rebranding (:pr:9160) Sarah Charlotte Johnson_
Better SEO for docs on Dask DataFrames (:pr:9128) Sarah Charlotte Johnson_

Maintenance ^^^^^^^^^^^

Remove ensure_file from recommended practice for downstream libraries (:pr:9171) Matthew Rocklin_
Test round-tripping DataFrame parquet I/O including pyspark (:pr:9156) Ian Rose_
Try disabling HDF5 locking (:pr:9154) Ian Rose_
Link best practices to DataFrame-parquet (:pr:9150) Tom Augspurger_
Fix typo in map_partitions func parameter description (:pr:9149) Christopher Akiki_
Un-xfail test_groupby_grouper_dispatch (:pr:9139) GALI PREM SAGAR_
Temporarily import cleanup fixture from distributed (:pr:9138) James Bourbeau_
Simplify partitioning logic in pyarrow parquet engine (:pr:9041) Richard (Rick) Zamora_

.. _v2022.05.2:

2022.05.2

Released on May 26, 2022

Enhancements ^^^^^^^^^^^^

Add a dispatch for non-pandas Grouper objects and use it in GroupBy (:pr:9074) brandon-b-miller_
Error if read_parquet & to_parquet files intersect (:pr:9124) Jim Crist-Harif_
Visualize task graphs using ipycytoscape (:pr:9091) Ian Rose_

Documentation ^^^^^^^^^^^^^

Fix various typos (:pr:9126) Ryan Russell_

Maintenance ^^^^^^^^^^^

Fix flaky test_filter_nonpartition_columns (:pr:9127) Pavithra Eswaramoorthy_
Update gpuCI RAPIDS_VER to 22.08 (:pr:9120)
Include ``conftest.py`` in sdists (:pr:9115) Ben Beasley_

.. _v2022.05.1:

2022.05.1

Released on May 24, 2022

New Features ^^^^^^^^^^^^

Add DataFrame.from_dict classmethod (:pr:9017) Matthew Powers_
Add from_map function to Dask DataFrame (:pr:8911) Richard (Rick) Zamora_

Enhancements ^^^^^^^^^^^^

Improve to_parquet error for appended divisions overlap (:pr:9102) Jim Crist-Harif_
Enabled user-defined process-initializer functions (:pr:9087) ParticularMiner_
Mention align_dataframes=False option in map_partitions error (:pr:9075) Gabe Joseph_
Add kwarg enforce_ndim to dask.array.map_blocks() (:pr:8865) ParticularMiner_
Implement Series.GroupBy.fillna / DataFrame.GroupBy.fillna methods (:pr:8869) Pavithra Eswaramoorthy_
Allow fillna with Dask DataFrame (:pr:8950) Pavithra Eswaramoorthy_
Update error message for assignment with 1-d dask array (:pr:9036) Pavithra Eswaramoorthy_
Collection Protocol (:pr:8674) Doug Davis_
Patch around pandas ArrowStringArray pickling (:pr:9024) Jim Crist-Harif_
Band-aid for compute_as_if_collection (:pr:8998) Ian Rose_
Add p2p shuffle option (:pr:8836) Matthew Rocklin_

Bug Fixes ^^^^^^^^^

Fixup column projection with no columns (:pr:9106) Jim Crist-Harif_
Blockwise cull NumPy dtype (:pr:9100) Ian Rose_
Fix column-projection bug in from_map (:pr:9078) Richard (Rick) Zamora_
Prevent nulls in index for non-numeric dtypes (:pr:8963) Jorge López_
Fix is_monotonic methods for more than 8 partitions (:pr:9019) Julia Signell_
Handle enumerate and generator inputs to from_map (:pr:9066) Richard (Rick) Zamora_
Revert is_dask_collection; back to previous implementation (:pr:9062) Doug Davis_
Fix Blockwise.clone not handling iterable literal arguments correctly (:pr:8979) JSKenyon_
Array setitem hardmask (:pr:9027) David Hassell_
Fix overlapping divisions error on append (:pr:8997) Ian Rose_

Deprecations ^^^^^^^^^^^^

Add pre-deprecation warnings for read_parquet kwargs chunksize and aggregate_files (:pr:9052) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

Document map_partitions handling of args vs kwargs, usage of partition_info (:pr:9084) Charles Blackmon-Luca_
Update custom collection documentation (leverage new collection protocol) (:pr:9097) Doug Davis_
Better SEO for docs on creating and storing Dask DataFrames (:pr:9098) Sarah Charlotte Johnson_
Clarify chunking in imread docstring (:pr:9082) Genevieve Buckley_
Rearrange docs TOC (:pr:9001) Matthew Rocklin_
Corrected map_blocks() docstring for kwarg enforce_ndim (:pr:9071) ParticularMiner_
Update DataFrame SQL docs references to other libraries (:pr:9077) Charles Blackmon-Luca_
Update page on creating and storing Dask DataFrames (:pr:9025) Sarah Charlotte Johnson_

Maintenance ^^^^^^^^^^^

Include NUMPY_LICENSE.txt in license files (:pr:9113) Ben Beasley_
Increase retries when installing nightly pandas (:pr:9103) James Bourbeau_
Force nightly pyarrow in the upstream build (:pr:9095) Joris Van den Bossche_
Improve object handling & testing of ensure_unicode (:pr:9059) John A Kirkham_
Force nightly pyarrow in the upstream build (:pr:8993) Joris Van den Bossche_
Additional check on is_dask_collection (:pr:9054) Doug Davis_
Update ensure_bytes (:pr:9050) John A Kirkham_
Add end of file pre-commit hook (:pr:9045) James Bourbeau_
Add codespell pre-commit hook (:pr:9040) James Bourbeau_
Remove the HDFS tests (:pr:9039) Jim Crist-Harif_
Fix flaky test_reductions_2D (:pr:9037) Jim Crist-Harif_
Prevent codecov from notifying of failure too soon (:pr:9031) Jim Crist-Harif_
Only test on Python 3.9 on macos (:pr:9029) Jim Crist-Harif_
Update to_timedelta default unit (:pr:9010) Pavithra Eswaramoorthy_

.. _v2022.05.0:

2022.05.0

Released on May 2, 2022

Highlights ^^^^^^^^^^

This is a bugfix release for `this issue <https://github.com/dask/distributed/issues/6255>`_.

Documentation ^^^^^^^^^^^^^

Add highlights section to 2022.04.2 release notes (:pr:9012) James Bourbeau_

.. _v2022.04.2:

2022.04.2

Released on April 29, 2022

Highlights ^^^^^^^^^^

This release includes several deprecations and breaking API changes to dask.dataframe.read_parquet and dask.dataframe.to_parquet:

to_parquet no longer writes _metadata files by default. If you want to write a _metadata file, you can pass in write_metadata_file=True.
read_parquet now defaults to split_row_groups=False, which results in one Dask dataframe partition per parquet file when reading in a parquet dataset. If you're working with large parquet files you may need to set split_row_groups=True to reduce your partition size.
read_parquet no longer calculates divisions by default. If you require read_parquet to return dataframes with known divisions, please set calculate_divisions=True.
read_parquet has deprecated the gather_statistics keyword argument. Please use the calculate_divisions keyword argument instead.
read_parquet has deprecated the require_extension keyword argument. Please use the parquet_file_extension keyword argument instead.

New Features ^^^^^^^^^^^^

Add removeprefix and removesuffix as StringMethods (:pr:8912) Jorge López_

Enhancements ^^^^^^^^^^^^

Call fs.invalidate_cache in to_parquet (:pr:8994) Jim Crist-Harif_
Change to_parquet default to write_metadata_file=None (:pr:8988) Jim Crist-Harif_
Let arg reductions pass keepdims (:pr:8926) Julia Signell_
Change split_row_groups default to False in read_parquet (:pr:8981) Richard (Rick) Zamora_
Improve NotImplementedError message for da.reshape (:pr:8987) Jim Crist-Harif_
Simplify to_parquet compute path (:pr:8982) Jim Crist-Harif_
Raise an error if you try to use vindex with a Dask object (:pr:8945) Julia Signell_
Avoid pre_buffer=True when a precache method is specified (:pr:8957) Richard (Rick) Zamora_
from_dask_array uses blockwise instead of merging graphs (:pr:8889) Bryan Weber_
Use pre_buffer=True for "pyarrow" Parquet engine (:pr:8952) Richard (Rick) Zamora_

Bug Fixes ^^^^^^^^^

Handle dtype=None correctly in da.full (:pr:8954) Tom White_
Fix dask-sql bug caused by blockwise fusion (:pr:8989) Richard (Rick) Zamora_
to_parquet errors for non-string column names (:pr:8990) Jim Crist-Harif_
Make sure da.roll works even if shape is 0 (:pr:8925) Julia Signell_
Fix recursion error issue with set_index (:pr:8967) Paul Hobson_
Stringify BlockwiseDepDict mapping values when produces_keys=True (:pr:8972) Richard (Rick) Zamora_
Use DataFrameIOLayer in ``DataFrame.from_delayed`` (:pr:8852) Richard (Rick) Zamora_
Check that values for the in predicate in read_parquet are correct (:pr:8846) Bryan Weber_
Fix bug for reduction of zero dimensional arrays (:pr:8930) Tom White_
Specify dtype when deciding division using np.linspace in read_sql_query (:pr:8940) Cheun Hong_

Deprecations ^^^^^^^^^^^^

Deprecate gather_statistics from read_parquet (:pr:8992) Richard (Rick) Zamora_
Change require_extension to top-level parquet_file_extension read_parquet kwarg (:pr:8935) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

Update write_metadata_file discussion in documentation (:pr:8995) Richard (Rick) Zamora_
Update DataFrame.merge docstring (:pr:8966) Pavithra Eswaramoorthy_
Added description for parameter align_arrays in array.blockwise() (:pr:8977) ParticularMiner_
Recommend not to use map_blocks(drop_axis=...) on chunked axes of an array (:pr:8921) ParticularMiner_
Add copy button to code snippets in docs (:pr:8956) James Bourbeau_

Maintenance ^^^^^^^^^^^

Pandas 1.5.0 compatibility (:pr:8961) Ian Rose_
Add pytest-timeout to distributed envs on CI (:pr:8986) Julia Signell_
Improve read_parquet docstring formatting (:pr:8971) Bryan Weber_
Remove pytest.warns(None) (:pr:8924) Pavithra Eswaramoorthy_
Document Python 3.10 as supported (:pr:8976) Eray Aslan_
parse_timedelta option to enforce explicit unit (:pr:8969) crusaderky_
mypy compatibility (:pr:8854) Paul Hobson_
Add a docs page for Dask & Parquet (:pr:8899) Jim Crist-Harif_
Adds configuration to ignore revs in blame (:pr:8933) Bryan Weber_

.. _v2022.04.1:

2022.04.1

Released on April 15, 2022

New Features ^^^^^^^^^^^^

Add missing NumPy ufuncs: abs, left_shift, right_shift, positive. (:pr:8920) Tom White_

Enhancements ^^^^^^^^^^^^

Avoid collecting parquet metadata in pyarrow when write_metadata_file=False (:pr:8906) Richard (Rick) Zamora_
Better error for failed wildcard path in dd.read_csv() (fixes #8878) (:pr:8908) Roger Filmyer_
Return da.Array rather than dd.Series for non-ufunc elementwise functions on dd.Series (:pr:8558) Julia Signell_
Let get_dummies use meta computation in map_partitions (:pr:8898) Julia Signell_
Masked scalars input to da.from_array (:pr:8895) David Hassell_
Raise ValueError in merge_asof for duplicate kwargs (:pr:8861) Bryan Weber_

Bug Fixes ^^^^^^^^^

Make is_monotonic work when some partitions are empty (:pr:8897) Julia Signell_
Fix custom getter in da.from_array when inline_array=False (:pr:8903) Ian Rose_
Correctly handle dict-specification for rechunk. (:pr:8859) Richard_
Fix merge_asof: drop index column if left_on == right_on (:pr:8874) Gil Forsyth_

Deprecations ^^^^^^^^^^^^

Warn users that engine='auto' will change in future (:pr:8907) Jim Crist-Harif_
Remove pyarrow-legacy engine from parquet API (:pr:8835) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

Add note on missing parameter out for dask.array.dot (:pr:8913) Francesco Andreuzzi_
Update DataFrame.query docstring (:pr:8890) Pavithra Eswaramoorthy_

Maintenance ^^^^^^^^^^^

Don't test da.prod on large integer data (:pr:8893) Jim Crist-Harif_
Add network marks to tests that fail without an internet connection (:pr:8881) Paul Hobson_
Fix gpuCI GHA version (:pr:8891) Charles Blackmon-Luca_
xfail/skip some flaky distributed tests (:pr:8887) Jim Crist-Harif_
Remove unused (deprecated) code from ArrowDatasetEngine (:pr:8885) Richard (Rick) Zamora_
Add mild typing to common utils functions, part 2 (:pr:8867) crusaderky_
Documentation of Limitation of sample() (:pr:8858) Nadiem Sissouno_

.. _v2022.04.0:

2022.04.0

Released on April 1, 2022

.. note::

   This is the first release with support for Python 3.10.

New Features ^^^^^^^^^^^^

Add Python 3.10 support (:pr:8566) James Bourbeau_

Enhancements ^^^^^^^^^^^^

Add check on dtype.itemsize in order to produce a useful error (:pr:8860) Davide Gavio_
Add mild typing to common utils functions (:pr:8848) Matthew Rocklin_
Add sanity checks to divisions setter (:pr:8806) Jim Crist-Harif_
Use Blockwise and map_partitions for more tasks (:pr:8831) Bryan Weber_

Bug Fixes ^^^^^^^^^

Fix dataframe.merge_asof to preserve right_on column (:pr:8857) Sarah Charlotte Johnson_
Fix "Buffer dtype mismatch" for pandas >= 1.3 on 32bit (:pr:8851) Ben Greiner_
Fix slicing fusion by altering SubgraphCallable getter (:pr:8827) Ian Rose_

Deprecations ^^^^^^^^^^^^

Remove support for PyPy (:pr:8863) James Bourbeau_
Drop setuptools at runtime (:pr:8855) crusaderky_
Remove dataframe.tseries.resample.getnanos (:pr:8834) Sarah Charlotte Johnson_

Documentation ^^^^^^^^^^^^^

Organize diagnostic and performance docs (:pr:8871) Naty Clementi_
Add image to explain drop_axis option of map_blocks (:pr:8868) ParticularMiner_

Maintenance ^^^^^^^^^^^

Update gpuCI RAPIDS_VER to 22.06 (:pr:8828)
Restore test_parquet in http (:pr:8850) Bryan Weber_
Simplify gpuCI updating workflow (:pr:8849) Charles Blackmon-Luca_

.. _v2022.03.0:

2022.03.0

Released on March 18, 2022

New Features ^^^^^^^^^^^^

Bag: add implementation for reservoir sampling (:pr:7636) Daniel Mesejo-León_
Add ma.count to Dask array (:pr:8785) David Hassell_
Change to_parquet default to compression="snappy" (:pr:8814) Jim Crist-Harif_
Add weights parameter to dask.array.reduction (:pr:8805) David Hassell_
Add ddf.compute_current_divisions to get divisions on a sorted index or column (:pr:8517) Julia Signell_

Enhancements ^^^^^^^^^^^^

Pass __name__ and __doc__ through on DelayedLeaf (:pr:8820) Leo Gao_
Raise exception for not implemented merge how option (:pr:8818) Naty Clementi_
Move Bag.map_partitions to Blockwise (:pr:8646) Richard (Rick) Zamora_
Improve error messages for malformed config files (:pr:8801) Jim Crist-Harif_
Revise column-projection optimization to capture common dask-sql patterns (:pr:8692) Richard (Rick) Zamora_
Useful error for empty divisions (:pr:8789) Pavithra Eswaramoorthy_
Scipy 1.8.0 compat: copy private classes into dask/array/stats.py (:pr:8694) Julia Signell_
Raise warning when using multiple types of schedulers where one is distributed (:pr:8700) Pedro Silva_

Bug Fixes ^^^^^^^^^

Fix bug in applying != filter in read_parquet (:pr:8824) Richard (Rick) Zamora_
Fix set_index when directly passed a dask Index (:pr:8680) Paul Hobson_
Quick fix for unbounded memory usage in tensordot (:pr:7980) Genevieve Buckley_
If hdf file is empty, don't fail on meta creation (:pr:8809) Julia Signell_
Update clone_key("x") to retain prefix (:pr:8792) crusaderky_
Fix "physical" column bug in pyarrow-based read_parquet (:pr:8775) Richard (Rick) Zamora_
Fix groupby.shift bug caused by unsorted partitions after shuffle (:pr:8782) kori73_
Fix serialization bug (:pr:8786) Richard (Rick) Zamora_

Deprecations ^^^^^^^^^^^^

Bump diagnostics bokeh dependency to 2.4.2 (:pr:8791) Charles Blackmon-Luca_
Deprecate bcolz support (:pr:8754) Pavithra Eswaramoorthy_
Finish making map_overlap default boundary kwarg 'none' (:pr:8743) Genevieve Buckley_

Documentation ^^^^^^^^^^^^^

Custom collection example docs fix (:pr:8807) Doug Davis_
Add Series.str, Series.dt, and Series.cat accessors to docs (:pr:8757) Sarah Charlotte Johnson_
Fix docstring for ddf.compute_current_divisions (:pr:8793) Julia Signell_
Dashboard docs on /status page (:pr:8648) Naty Clementi_
Clarify divisions kwarg in repartition docstring (:pr:8781) Sarah Charlotte Johnson_
Update Docker images to use ghcr.io (:pr:8774) Jacob Tomlinson_

Maintenance ^^^^^^^^^^^

Reduce gpuci pytest parallelism (:pr:8826) GALI PREM SAGAR_
absolufy-imports - No relative imports - PEP8 (:pr:8796) Julia Signell_
Tidy up assert_eq calls in array tests (:pr:8812) Julia Signell_
Avoid pytest.warns(None) (:pr:8718) LSturtew_
Fix test_describe_empty to work without global -Werror (:pr:8291) Michał Górny_
Temporarily xfail graphviz tests on windows (:pr:8794) Jim Crist-Harif_
Use packaging.parse for md5 compatibility (:pr:8763) James Bourbeau_
Make tokenize work in a FIPS 140-2 environment (:pr:8762) Jim Crist-Harif_
Label issues and PRs on open with 'needs triage' (:pr:8761) Julia Signell_
Add some extra test coverage (:pr:8302) lrjball_
Specify action version and change from pull_request_target to pull_request (:pr:8767) Julia Signell_
Make scheduler kwarg pass through to sub functions in da.assert_eq (:pr:8755) Julia Signell_

.. _v2022.02.1:

2022.02.1

Released on February 25, 2022

New Features ^^^^^^^^^^^^

Add aggregate functions first and last to dask.dataframe.pivot_table (:pr:8649) Knut Nordanger_
Add std() support for datetime64 dtype for pandas-like objects (:pr:8523) Ben Glossner_
Add materialized task counts to HighLevelGraph and Layer html reprs (:pr:8589) kori73_

Enhancements ^^^^^^^^^^^^

Do not allow iterating a DataFrameGroupBy (:pr:8696) Bryan Weber_
Fix missing newline after info() call on empty DataFrame (:pr:8727) Naty Clementi_
Add groupby.compute as a not implemented method (:pr:8734) Dranaxel_
Improve multi dataframe join performance (:pr:8740) Holden Karau_
Include bool type for Index (:pr:8732) Naty Clementi_
Allow ArrowDatasetEngine subclass to override pandas->arrow conversion also for partitioned write (:pr:8741) Joris Van den Bossche_
Increase performance of k-diagonal extraction in da.diag() and da.diagonal() (:pr:8689) ParticularMiner_
Change linspace creation to match NumPy when num is equal to 0 (:pr:8676) Peter_
Tokenize dataclasses (:pr:8557) Gabe Joseph_
Update tokenize to treat dict and kwargs differently (:pr:8655) James Bourbeau_

Bug Fixes ^^^^^^^^^

Fix bug in dask.array.roll() for roll-shifts that match the size of the input array (:pr:8723) ParticularMiner_
Fix for normalize_function dataclass methods (:pr:8527) Sarah Charlotte Johnson_
Fix rechunking with zero-size-chunks (:pr:8703) ParticularMiner_
Move creation of sqlalchemy connection for picklability (:pr:8745) Julia Signell_

Deprecations ^^^^^^^^^^^^

Drop Python 3.7 (:pr:8572) James Bourbeau_
Deprecate iteritems (:pr:8660) James Bourbeau_
Deprecate dataframe.tseries.resample.getnanos (:pr:8752) Sarah Charlotte Johnson_
Add deprecation warning for pyarrow-legacy engine (:pr:8758) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

Update link typos in changelog (:pr:8717) James Bourbeau_
Clarify dask.visualize docstring (:pr:8710) Dranaxel_
Update Docker example to use current best practices (:pr:8731) Jacob Tomlinson_
Update docs to include distributed.Client.preload (:pr:8679) Bryan Weber_
Document monthly social meeting (:pr:8595) Thomas Grainger_
Add docs for Gen2 access with RBAC/ACL i.e. security principal (:pr:8748) Martin Thøgersen_
Use Dask configuration extension from dask-sphinx-theme (:pr:8751) Benjamin Zaitlen_

Maintenance ^^^^^^^^^^^

Unpin coverage in CI (:pr:8690) James Bourbeau_
Add manual trigger for running test suite (:pr:8716) James Bourbeau_
Xfail scheduler_HLG_unpack_import; flaky test (:pr:8724) Mike McCarty_
Temporarily remove scipy upstream CI build (:pr:8725) James Bourbeau_
Bump pre-release version to be greater than stable releases (:pr:8728) Charles Blackmon-Luca_
Move custom sort function logic to internal sort_values (:pr:8571) Charles Blackmon-Luca_
Pin cloudpickle and scipy in docs requirements (:pr:8737) Julia Signell_
Make the labeler not delete labels, and look for the docs at the right spot (:pr:8746) Julia Signell_
Fix docs build warnings (:pr:8432) Kristopher Overholt_
Update test status badge (:pr:8747) James Bourbeau_
Fix parquet test_pandas_timestamp_overflow_pyarrow test (:pr:8733) Joris Van den Bossche_
Only run PR builds on changes to relevant files (:pr:8756) Charles Blackmon-Luca_

.. _v2022.02.0:

2022.02.0

Released on February 11, 2022

.. note::

   This is the last release with support for Python 3.7.

New Features ^^^^^^^^^^^^

Add region to to_zarr when using existing array (:pr:8590) Chris Roat_
Add engine_kwargs support to dask.dataframe.to_sql (:pr:8609) Amir Kadivar_
Add include_path_column arg to read_json (:pr:8603) Bryan Weber_
Add expand_dims to Dask array (:pr:8687) Tom White_

Enhancements ^^^^^^^^^^^^

Add scheduler option to assert_eq utilities (:pr:8610) Xinrong Meng_
Fix eye inconsistency with NumPy for dtype=None (:pr:8685) Tom White_
Fix concatenate inconsistency with NumPy for axis=None (:pr:8686) Tom White_
Type annotations, part 1 (:pr:8295) crusaderky_
Really allow any iterable to be passed as a meta (:pr:8629) Julia Signell_
Use map_partitions (Blockwise) in to_parquet (:pr:8487) Richard (Rick) Zamora_

Bug Fixes ^^^^^^^^^

Result of reducing an array should not depend on its chunk-structure (:pr:8637) ParticularMiner_
Pass place-holder metadata to map_partitions in ACA code path (:pr:8643) Richard (Rick) Zamora_

Deprecations ^^^^^^^^^^^^

Deprecate is_monotonic (:pr:8653) James Bourbeau_
Remove some deprecations (:pr:8605) James Bourbeau_

Documentation ^^^^^^^^^^^^^

Add Domino Data Lab to Hosted / managed Dask clusters (:pr:8675) Ray Bell_
Fix inter-linking and remove deprecated function (:pr:8715) Julia Signell_
Fix imbalanced backticks. (:pr:8693) Matthias Bussonnier_
Add documentation for high level graph visualization (:pr:8483) Genevieve Buckley_
Update documentation of ProgressBar out parameter (:pr:8604) Pedro Silva_
Improve documentation of dask.config.set (:pr:8705) crusaderky_
Revert mention to mypy among type checkers (:pr:8699) crusaderky_

Maintenance ^^^^^^^^^^^

Update warning handling in get_dummies tests (:pr:8651) James Bourbeau_
Add a github changelog template (:pr:8714) Julia Signell_
Update year in LICENSE.txt (:pr:8665) David Hoese_
Update pre-commit version (:pr:8691) James Bourbeau_
Include scipy in upstream CI build (:pr:8681) James Bourbeau_
Temporarily pin scipy < 1.8.0 in CI (:pr:8683) James Bourbeau_
Pin scipy to less than 1.8.0 in GPU CI (:pr:8698) Julia Signell_
Avoid pytest.warns(None) in test_multi.py (:pr:8678) James Bourbeau_
Update GHA concurrent job cancellation (:pr:8652) James Bourbeau_
Make test__get_paths robust to site.PREFIXES being set (:pr:8644) James Bourbeau_
Bump gpuCI PYTHON_VER to 3.9 (:pr:8642) Charles Blackmon-Luca_

.. _v2022.01.1:

2022.01.1

Released on January 28, 2022

New Features ^^^^^^^^^^^^

Add dask.dataframe.series.view() (:pr:8533) Pavithra Eswaramoorthy_

Enhancements ^^^^^^^^^^^^

Update tz for fastparquet + pandas 1.4.0 (:pr:8626) Martin Durant_
Cleaning up misc tests for pandas compat (:pr:8623) Julia Signell_
Moving to SQLAlchemy >= 1.4 (:pr:8158) McToel_
Pandas compat: Filter sparse warnings (:pr:8621) Julia Signell_
Fail if meta is not a pandas object (:pr:8563) Julia Signell_
Use fsspec.parquet module for better remote-storage read_parquet performance (:pr:8339) Richard (Rick) Zamora_
Move DataFrame ACA aggregations to HLG (:pr:8468) Richard (Rick) Zamora_
Add optional information about originating function call in DataFrameIOLayer (:pr:8453) Richard (Rick) Zamora_
Blockwise array creation redux (:pr:7417) Ian Rose_
Refactor config default search path retrieval (:pr:8573) James Bourbeau_
Add optimize_graph flag to Bag.to_dataframe function (:pr:8486) Maxim Lippeveld_
Make sure that delayed output operations still return lists of paths (:pr:8498) Julia Signell_
Pandas compat: Fix to_frame name to not pass None (:pr:8554) Julia Signell_
Pandas compat: Fix axis=None warning (:pr:8555) Julia Signell_
Expand Dask YAML config search directories (:pr:8531) abergou_

Bug Fixes ^^^^^^^^^

Fix groupby.cumsum with series grouped by index (:pr:8588) Julia Signell_
Fix derived_from for pandas methods (:pr:8612) Thomas J. Fan_
Enforce boolean ascending for sort_values (:pr:8440) Charles Blackmon-Luca_
Fix parsing of __setitem__ indices (:pr:8601) David Hassell_
Avoid divide by zero in slicing (:pr:8597) Doug Davis_

Deprecations ^^^^^^^^^^^^

Downgrade meta error in (:pr:8563) to warning (:pr:8628) Julia Signell_
Pandas compat: Deprecate append when pandas >= 1.4.0 (:pr:8617) Julia Signell_

Documentation ^^^^^^^^^^^^^

Replace outdated columns argument with meta in DataFrame constructor (:pr:8614) kori73_
Refactor deploying docs (:pr:8602) Jacob Tomlinson_

Maintenance ^^^^^^^^^^^

Pin coverage in CI (:pr:8631) James Bourbeau_
Move cached_cumsum imports to be from dask.utils (:pr:8606) James Bourbeau_
Update gpuCI RAPIDS_VER to 22.04 (:pr:8600)
Update docstring for from_delayed function (:pr:8576) Kirito1397_
Handle plot_width / plot_height deprecations (:pr:8544) Bryan Van de Ven_
Remove unnecessary pyyaml importorskip (:pr:8562) James Bourbeau_
Specify scheduler in DataFrame assert_eq (:pr:8559) Gabe Joseph_

.. _v2022.01.0:

2022.01.0

Released on January 14, 2022

New Features ^^^^^^^^^^^^

Add groupby.shift method (:pr:8522) kori73_
Add DataFrame.nunique (:pr:8479) Sarah Charlotte Johnson_
Add da.ndim to match np.ndim (:pr:8502) Julia Signell_

Enhancements ^^^^^^^^^^^^

Only show percentile interpolation= keyword warning if NumPy version >= 1.22 (:pr:8564) Julia Signell_
Raise PerformanceWarning when limit and "array.slicing.split-large-chunks" are None (:pr:8511) Julia Signell_
Define normalize_seq function at import time (:pr:8521) Illviljan_
Ensure that divisions are always tuples (:pr:8393) Charles Blackmon-Luca_
Allow a callable scheduler for bag.groupby (:pr:8492) Julia Signell_
Save Zarr arrays with dask-on-ray scheduler (:pr:8472) TnTo_
Make byte blocks more even in read_bytes (:pr:8459) Martin Durant_
Improved the efficiency of matmul() by completely removing concatenation (:pr:8423) ParticularMiner_
Limit max chunk size when reshaping dask arrays (:pr:8124) Genevieve Buckley_
Changes for fastparquet superthrift (:pr:8470) Martin Durant_

Bug Fixes ^^^^^^^^^

Fix boolean indices in array assignment (:pr:8538) David Hassell_
Detect default dtype on array-likes (:pr:8501) aeisenbarth_
Fix optimize_blockwise bug for duplicate dependency names (:pr:8542) Richard (Rick) Zamora_
Update warnings for DataFrame.GroupBy.apply and transform (:pr:8507) Sarah Charlotte Johnson_
Track HLG layer name in Delayed (:pr:8452) Gabe Joseph_
Fix single item nanmin and nanmax reductions (:pr:8484) Julia Signell_
Make read_csv with comment kwarg work even if there is a comment in the header (:pr:8433) Julia Signell_

Deprecations ^^^^^^^^^^^^

Replace interpolation with method and method with internal_method (:pr:8525) Julia Signell_
Remove daily stock demo utility (:pr:8477) James Bourbeau_

Documentation ^^^^^^^^^^^^^

Add a join example in docs that can be run with copy/paste (:pr:8520) kori73_
Mention dashboard link in config (:pr:8510) Ray Bell_
Fix changelog section hyperlinks (:pr:8534) Aneesh Nema_
Hyphenate "single-machine scheduler" for consistency (:pr:8519) Deepyaman Datta_
Normalize whitespace in doctests in slicing.py (:pr:8512) Maren Westermann_
Best practices storage line typo (:pr:8529) Michael Delgado_
Update figures (:pr:8401) Sarah Charlotte Johnson_
Remove pyarrow-only reference from split_row_groups in read_parquet docstring (:pr:8490) Naty Clementi_

Maintenance ^^^^^^^^^^^

Remove obsolete LocalFileSystem tests that fail for fsspec>=2022.1.0 (:pr:8565) Richard (Rick) Zamora_
Tweak: "RuntimeWarning: invalid value encountered in reciprocal" (:pr:8561) crusaderky_
Fix skipna=None for DataFrame.sem (:pr:8556) Julia Signell_
Fix PANDAS_GT_140 (:pr:8552) Julia Signell_
Collections with HLG must always implement __dask_layers__ (:pr:8548) crusaderky_
Work around race condition in import llvmlite (:pr:8550) crusaderky_
Set a minimum version for pyyaml (:pr:8545) Gaurav Sheni_
Adding nodefaults to environments to fix tiledb + mac issue (:pr:8505) Julia Signell_
Set ceiling for setuptools (:pr:8509) Julia Signell_
Add workflow / recipe to generate Dask nightlies (:pr:8469) Charles Blackmon-Luca_
Bump gpuCI CUDA_VER to 11.5 (:pr:8489) Charles Blackmon-Luca_

.. _v2021.12.0:

2021.12.0

Released on December 10, 2021

New Features ^^^^^^^^^^^^

Add Series and Index is_monotonic* methods (:pr:8304) Daniel Mesejo-León_

Enhancements ^^^^^^^^^^^^

Blockwise map_partitions with partition_info (:pr:8310) Gabe Joseph_
Better error message for length of array with unknown chunk sizes (:pr:8436) Doug Davis_
Use by instead of index internally on the Groupby class (:pr:8441) Julia Signell_
Allow custom sort functions for sort_values (:pr:8345) Charles Blackmon-Luca_
Add warning to read_parquet when statistics and partitions are misaligned (:pr:8416) Richard (Rick) Zamora_
Support where argument in ufuncs (:pr:8253) mihir_
Make visualize more consistent with compute (:pr:8328) JSKenyon_

Bug Fixes ^^^^^^^^^

Fix map_blocks not using own arguments in name generation (:pr:8462) David Hoese_
Fix for index error with reading empty parquet file (:pr:8410) Sarah Charlotte Johnson_
Fix nullable-dtype error when writing partitioned parquet data (:pr:8400) Richard (Rick) Zamora_
Fix CSV header bug (:pr:8413) Richard (Rick) Zamora_
Fix empty chunk causes exception in nanmin/nanmax (:pr:8375) Boaz Mohar_

Deprecations ^^^^^^^^^^^^

Deprecate token keyword argument to map_blocks (:pr:8464) James Bourbeau_
Deprecation warning for default value of boundary kwarg in map_overlap (:pr:8397) Genevieve Buckley_

Documentation ^^^^^^^^^^^^^

Clarify block_info documentation (:pr:8425) Genevieve Buckley_
Output from alt text sprint (:pr:8456) Sarah Charlotte Johnson_
Update talks and presentations (:pr:8370) Naty Clementi_
Update Anaconda link in "Paid support" section of docs (:pr:8427) Martin Durant_
Fixed broken dask-gateway link in ecosystem.rst (:pr:8424) ofirr_
Fix CuPy doctest error (:pr:8412) Genevieve Buckley_

Maintenance ^^^^^^^^^^^

Bump Bokeh min version to 2.1.1 (:pr:8431) Bryan Van de Ven_
Fix following fsspec=2021.11.1 release (:pr:8428) Martin Durant_
Add dask/ml.py to pytest exclude list (:pr:8414) Genevieve Buckley_
Update gpuCI RAPIDS_VER to 22.02 (:pr:8394)
Unpin graphviz and improve package management in environment-3.7 (:pr:8411) Julia Signell_

.. _v2021.11.2:

2021.11.2

Released on November 19, 2021

Only run gpuCI bump script daily (:pr:8404) Charles Blackmon-Luca_
Actually ignore index when asked in assert_eq (:pr:8396) Gabe Joseph_
Ensure single-partition join divisions is tuple (:pr:8389) Charles Blackmon-Luca_
Try to make divisions behavior clearer (:pr:8379) Julia Signell_
Fix typo in set_index partition_size parameter description (:pr:8384) FredericOdermatt_
Use blockwise in single_partition_join (:pr:8341) Gabe Joseph_
Use more explicit keyword arguments (:pr:8354) Boaz Mohar_
Fix .loc of DataFrame with nullable boolean dtype (:pr:8368) Marco Rossi_
Parameterize shuffle implementation in tests (:pr:8250) Ian Rose_
Remove some doc build warnings (:pr:8369) Boaz Mohar_
Include properties in array API docs (:pr:8356) Julia Signell_
Fix Zarr for upstream (:pr:8367) Julia Signell_
Pin graphviz to avoid issue with windows and Python 3.7 (:pr:8365) Julia Signell_
Import graphviz.Digraph from top of module, not from dot (:pr:8363) Julia Signell_

.. _v2021.11.1:

2021.11.1

Released on November 8, 2021

Patch release to update distributed dependency to version 2021.11.1.

.. _v2021.11.0:

2021.11.0

Released on November 5, 2021

Fix require_extension behavior in read_parquet (:pr:8351) Richard (Rick) Zamora_
Add align_dataframes to map_partitions to broadcast a dataframe passed as an arg (:pr:6628) Julia Signell_
Better handling for arrays/series of keys in dask.dataframe.loc (:pr:8254) Julia Signell_
Point users to Discourse (:pr:8332) Ian Rose_
Add name_function option to to_parquet (:pr:7682) Matthew Powers_
Get rid of environment-latest.yml and update to Python 3.9 (:pr:8275) Julia Signell_
Require newer s3fs in CI (:pr:8336) James Bourbeau_
Groupby Rolling (:pr:8176) Julia Signell_
Add more ordering diagnostics to dask.visualize (:pr:7992) Erik Welch_
Use HighLevelGraph optimizations for delayed (:pr:8316) Ian Rose_
demo_tuples produces malformed HighLevelGraph (:pr:8325) crusaderky_
Dask calendar should show events in local time (:pr:8312) Genevieve Buckley_
Fix flaky test_interrupt (:pr:8314) crusaderky_
Deprecate AxisError (:pr:8305) crusaderky_
Fix name of cuDF in extension documentation. (:pr:8311) Vyas Ramasubramani_
Add single eq operator (=) to parquet filters (:pr:8300) Ayush Dattagupta_
Improve support for Spark output in read_parquet (:pr:8274) Richard (Rick) Zamora_
Add dask.ml module (:pr:6384) Matthew Rocklin_
CI fixups (:pr:8298) James Bourbeau_
Make slice errors match NumPy (:pr:8248) Julia Signell_
Fix API docs misrendering with new sphinx theme (:pr:8296) Julia Signell_
Replace block property with blockview for array-like operations on blocks (:pr:8242) Davis Bennett_
Deprecate file_path and make it possible to save from within a notebook (:pr:8283) Julia Signell_

.. _v2021.10.0:

2021.10.0

Released on October 22, 2021

da.store to create well-formed HighLevelGraph (:pr:8261) crusaderky_
CI: force nightly pyarrow in the upstream build (:pr:8281) Joris Van den Bossche_
Remove chest (:pr:8279) James Bourbeau_
Skip doctests if optional dependencies are not installed (:pr:8258) Genevieve Buckley_
Update tmpdir and tmpfile context manager docstrings (:pr:8270) Daniel Mesejo-León_
Unregister callbacks in doctests (:pr:8276) James Bourbeau_
Fix typo in docs (:pr:8277) JoranDox_
Stale label GitHub action (:pr:8244) Genevieve Buckley_
Client-shutdown method appears twice (:pr:8273) German Shiklov_
Add pre-commit to test requirements (:pr:8257) Genevieve Buckley_
Refactor read_metadata in fastparquet engine (:pr:8092) Richard (Rick) Zamora_
Support Path objects in from_zarr (:pr:8266) Samuel Gaist_
Make nested redirects work (:pr:8272) Julia Signell_
Set memory_usage to True if verbose is True in info (:pr:8222) Kinshuk Dua_
Remove individual API doc pages from sphinx toctree (:pr:8238) James Bourbeau_
Ignore whitespace in gufunc signature (:pr:8267) James Bourbeau_
Add workflow to update gpuCI (:pr:8215) Charles Blackmon-Luca_
DataFrame.head shouldn't warn when there's one partition (:pr:8091) Pankaj Patil_
Ignore arrow doctests if pyarrow not installed (:pr:8256) Genevieve Buckley_
Fix debugging.html redirect (:pr:8251) James Bourbeau_
Fix null sorting for single partition dataframes (:pr:8225) Charles Blackmon-Luca_
Fix setup.html redirect (:pr:8249) Florian Jetter_
Run pyupgrade in CI (:pr:8246) crusaderky_
Fix label typo in upstream CI build (:pr:8237) James Bourbeau_
Add support for "dependent" columns in DataFrame.assign (:pr:8086) Suriya Senthilkumar_
Add NumPy array of Dask keys to Array (:pr:7922) Davis Bennett_
Remove unnecessary dask.multiprocessing import in docs (:pr:8240) Ray Bell_
Adjust retrieving _max_workers from Executor (:pr:8228) John A Kirkham_
Update function signatures in delayed best practices docs (:pr:8231) Vũ Trung Đức_
Docs reorganization (:pr:7984) Julia Signell_
Fix df.quantile on all missing data (:pr:8129) Julia Signell_
Add tokenize.ensure-deterministic config option (:pr:7413) Hristo Georgiev_
Use inclusive rather than closed with pandas>=1.4.0 and pd.date_range (:pr:8213) Julia Signell_
Add dask-gateway, Coiled, and Saturn-Cloud to list of Dask setup tools (:pr:7814) Kristopher Overholt_
Ensure existing futures get passed as deps when serializing HighLevelGraph layers (:pr:8199) Jim Crist-Harif_
Make sure that the divisions of the single partition merge is left (:pr:8162) Julia Signell_
Refactor read_metadata in pyarrow parquet engines (:pr:8072) Richard (Rick) Zamora_
Support negative drop_axis in map_blocks and map_overlap (:pr:8192) Gregory R. Lee_
Fix upstream tests (:pr:8205) Julia Signell_
Add support for scalar item assignment by Series (:pr:8195) Charles Blackmon-Luca_
Add some basic examples to docstrings on dask.bag all, any, count methods (:pr:7630) Nathan Danielsen_
Don't have upstream report depend on commit message (:pr:8202) James Bourbeau_
Ensure upstream CI cron job runs (:pr:8200) James Bourbeau_
Use pytest.param to properly label param-specific GPU tests (:pr:8197) Charles Blackmon-Luca_
Add test_set_index to tests run on gpuCI (:pr:8198) Charles Blackmon-Luca_
Suppress tmpfile OSError (:pr:8191) James Bourbeau_
Use s.isna instead of pd.isna(s) in set_partitions_pre (fix cudf CI) (:pr:8193) Charles Blackmon-Luca_
Open an issue for test-upstream failures (:pr:8067) Wallace Reis_
Fix to_parquet bug in call to pyarrow.parquet.read_metadata (:pr:8186) Richard (Rick) Zamora_
Add handling for null values in sort_values (:pr:8167) Charles Blackmon-Luca_
Bump RAPIDS_VER for gpuCI (:pr:8184) Charles Blackmon-Luca_
Dispatch walks MRO for lazily registered handlers (:pr:8185) Jim Crist-Harif_
Configure SSHCluster instructions (:pr:8181) Ray Bell_
Preserve HighLevelGraphs in DataFrame.from_delayed (:pr:8174) Gabe Joseph_
Deprecate inplace argument for Dask series renaming (:pr:8136) Marcel Coetzee_
Fix rolling for compatibility with pandas > 1.3.0 (:pr:8150) Julia Signell_
Raise error when calling setitem on an array with unknown chunks (:pr:8166) Julia Signell_
Include divisions when doing Index.to_series (:pr:8165) Julia Signell_

.. _v2021.09.1:

2021.09.1

Released on September 21, 2021

Fix groupby for future pandas (:pr:8151) Julia Signell_
Remove warning filters in tests that are no longer needed (:pr:8155) Julia Signell_
Add link to diagnostic visualize function in local diagnostic docs (:pr:8157) David Hoese_
Add datetime_is_numeric to dataframe.describe (:pr:7719) Julia Signell_
Remove references to pd.Int64Index in anticipation of deprecation (:pr:8144) Julia Signell_
Use loc if needed for Series __getitem__ (:pr:7953) Julia Signell_
Specifically ignore warnings on mean for empty slices (:pr:8125) Julia Signell_
Skip groupby nunique test for pandas >= 1.3.3 (:pr:8142) Julia Signell_
Implement ascending arg for sort_values (:pr:8130) Charles Blackmon-Luca_
Replace operator.getitem (:pr:8015) Naty Clementi_
Deprecate zero_broadcast_dimensions and homogeneous_deepmap (:pr:8134) SnkSynthesis_
Add error if drop_index is negative (:pr:8064) neel iyer_
Allow scheduler to be an Executor (:pr:8112) John A Kirkham_
Handle asarray/asanyarray cases where like is a dask.Array (:pr:8128) Peter Andreas Entschev_
Fix index_col duplication if index_col is type str (:pr:7661) McToel_
Add dtype and order to asarray and asanyarray definitions (:pr:8106) Julia Signell_
Deprecate dask.dataframe.Series.__contains__ (:pr:7914) Julia Signell_
Fix edge case with like-arrays in _wrapped_qr (:pr:8122) Peter Andreas Entschev_
Deprecate the boundary_slice kwarg kind for pandas compatibility (:pr:8037) Julia Signell_

.. _v2021.09.0:

2021.09.0

Released on September 3, 2021

Fewer open files (:pr:7303) Julia Signell_
Add FileNotFound to expected http errors (:pr:8109) Martin Durant_
Add DataFrame.sort_values to API docs (:pr:8107) Benjamin Zaitlen_
Change to dask.order: be more eager at times (:pr:7929) Erik Welch_
Add pytest color to CI (:pr:8090) James Bourbeau_
FIX: make_people works with processes scheduler (:pr:8103) Dahn_
Add deep param to DataFrame copy method and restrict it to False (:pr:8068) João Paulo Lacerda_
Fix typo in configuration docs (:pr:8104) Robert Hales_
Update formatting in DataFrame.query docstring (:pr:8100) James Bourbeau_
Un-xfail sparse tests for 0.13.0 release (:pr:8102) James Bourbeau_
Add axes property to DataFrame and Series (:pr:8069) Jordan Jensen_
Add CuPy support in da.unique (values only) (:pr:8021) Peter Andreas Entschev_
Unit tests for sparse.zeros_like (xfailed) (:pr:8093) crusaderky_
Add explicit like kwarg support to array creation functions (:pr:8054) Peter Andreas Entschev_
Separate Array and DataFrame mindeps builds (:pr:8079) James Bourbeau_
Fork out percentile_dispatch to dask.array (:pr:8083) GALI PREM SAGAR_
Ensure filepath exists in to_parquet (:pr:8057) James Bourbeau_
Update scheduler plugin usage in test_scheduler_highlevel_graph_unpack_import (:pr:8080) James Bourbeau_
Add DataFrame.shuffle to API docs (:pr:8076) Martin Fleischmann_
Order requirements alphabetically (:pr:8073) John A Kirkham_

.. _v2021.08.1:

2021.08.1

Released on August 20, 2021

Add ignore_metadata_file option to read_parquet (pyarrow-dataset and fastparquet support only) (:pr:8034) Richard (Rick) Zamora_
Add reference to pytest-xdist in dev docs (:pr:8066) Julia Signell_
Include tz in meta from to_datetime (:pr:8000) Julia Signell_
CI Infra Docs (:pr:7985) Benjamin Zaitlen_
Include invalid DataFrame key in assert_eq check (:pr:8061) James Bourbeau_
Use __class__ when creating DataFrames (:pr:8053) Mads R. B. Kristensen_
Use development version of distributed in gpuCI build (:pr:7976) James Bourbeau_
Ignore whitespace in gufunc signature (:pr:8049) James Bourbeau_
Move pandas import and percentile dispatch refactor (:pr:8055) GALI PREM SAGAR_
Add colors to represent high level layer types (:pr:7974) Freyam Mehta_
Upstream instance fix (:pr:8060) Jacob Tomlinson_
Add dask.widgets and migrate HTML reprs to jinja2 (:pr:8019) Jacob Tomlinson_
Remove wrap_func_like_safe, not required with NumPy >= 1.17 (:pr:8052) Peter Andreas Entschev_
Fix threaded scheduler memory backpressure regression (:pr:8040) David Hoese_
Add percentile dispatch (:pr:8029) GALI PREM SAGAR_
Use a publicly documented attribute obj in groupby rather than private _selected_obj (:pr:8038) GALI PREM SAGAR_
Specify module to import rechunk from (:pr:8039) Illviljan_
Use dict to store data for {nan,}arg{min,max} in certain cases (:pr:8014) Peter Andreas Entschev_
Fix blocksize description formatting in read_pandas (:pr:8047) Louis Maddox_
Fix "point" -> "pointers" typo in docs (:pr:8043) David Chudzicki_

.. _v2021.08.0:

2021.08.0

Released on August 13, 2021

Fix to_orc delayed compute behavior (:pr:8035) Richard (Rick) Zamora_
Don't convert to low-level task graph in compute_as_if_collection (:pr:7969) James Bourbeau_
Fix multifile read for hdf (:pr:8033) Julia Signell_
Resolve warning in distributed tests (:pr:8025) James Bourbeau_
Update to_orc collection name (:pr:8024) James Bourbeau_
Resolve skipfooter problem (:pr:7855) Ross_
Raise NotImplementedError for non-indexable arg passed to to_datetime (:pr:7989) Doug Davis_
Ensure we error on warnings from distributed (:pr:8002) James Bourbeau_
Add dict format to DataFrame.to_bag (:pr:7932) gurunath_
Delayed docs indirect dependencies (:pr:8016) aa1371_
Add tooltips to graphviz high-level graphs (:pr:7973) Freyam Mehta_
Close 2021 User Survey (:pr:8007) Julia Signell_
Reorganize CuPy tests into multiple files (:pr:8013) Peter Andreas Entschev_
Refactor and expand Dask DataFrame ORC API (:pr:7756) Richard (Rick) Zamora_
Don't enforce columns if enforce=False (:pr:7916) Julia Signell_
Fix map_overlap trimming behavior when drop_axis is not None (:pr:7894) Gregory R. Lee_
Mark gpuCI CuPy test as flaky (:pr:7994) Peter Andreas Entschev_
Avoid using Delayed in to_csv and to_parquet (:pr:7968) Matthew Rocklin_
Removed redundant check_dtypes (:pr:7952) gurunath_
Use pytest.warns instead of raises for checking parquet engine deprecation (:pr:7993) Joris Van den Bossche_
Bump RAPIDS_VER in gpuCI to 21.10 (:pr:7991) Charles Blackmon-Luca_
Add back pyarrow-legacy test coverage for pyarrow>=5 (:pr:7988) Richard (Rick) Zamora_
Allow pyarrow>=5 in to_parquet and read_parquet (:pr:7967) Richard (Rick) Zamora_
Skip CuPy tests requiring NEP-35 when NumPy < 1.20 is available (:pr:7982) Peter Andreas Entschev_
Add tail and head to SeriesGroupby (:pr:7935) Daniel Mesejo-León_
Update Zoom link for monthly meeting (:pr:7979) James Bourbeau_
Add gpuCI build script (:pr:7966) Charles Blackmon-Luca_
Deprecate daily_stock utility (:pr:7949) James Bourbeau_
Add distributed.nanny to configuration reference docs (:pr:7955) James Bourbeau_
Require NumPy 1.18+ & Pandas 1.0+ (:pr:7939) John A Kirkham_

.. _v2021.07.2:

2021.07.2

Released on July 30, 2021

.. note::

   This is the last release with support for NumPy 1.17 and pandas 0.25. Beginning with the next release, NumPy 1.18 and pandas 1.0 will be the minimum supported versions.

Add dask.array SVG to the HTML Repr (:pr:7886) Freyam Mehta_
Avoid use of Delayed in to_parquet (:pr:7958) Matthew Rocklin_
Temporarily pin pyarrow<5 in CI (:pr:7960) James Bourbeau_
Add deprecation warning for top-level ucx and rmm config values (:pr:7956) James Bourbeau_
Remove skips from doctests (4 of 6) (:pr:7865) Zhengnan Zhao_
Remove skips from doctests (5 of 6) (:pr:7864) Zhengnan Zhao_
Adds missing prepend/append functionality to da.diff (:pr:7946) Peter Andreas Entschev_
Change graphviz font family to sans (:pr:7931) Freyam Mehta_
Fix read_csv task name: use a different task name when the path is different (:pr:7942) Julia Signell_
Update configuration reference for ucx and rmm changes (:pr:7943) James Bourbeau_
Add meta support to __setitem__ (:pr:7940) Peter Andreas Entschev_
NEP-35 support for slice_with_int_dask_array (:pr:7927) Peter Andreas Entschev_
Unpin fastparquet in CI (:pr:7928) James Bourbeau_
Remove skips from doctests (3 of 6) (:pr:7872) Zhengnan Zhao_

.. _v2021.07.1:

2021.07.1

Released on July 23, 2021

Make array assert_eq check dtype (:pr:7903) Julia Signell_
Remove skips from doctests (6 of 6) (:pr:7863) Zhengnan Zhao_
Remove experimental feature warning from actors docs (:pr:7925) Matthew Rocklin_
Remove skips from doctests (2 of 6) (:pr:7873) Zhengnan Zhao_
Separate out Array and Bag API (:pr:7917) Julia Signell_
Implement lazy Array.__iter__ (:pr:7905) Julia Signell_
Clean up places where we inadvertently iterate over arrays (:pr:7913) Julia Signell_
Add numeric_only kwarg to DataFrame reductions (:pr:7831) Julia Signell_
Add pytest marker for GPU tests (:pr:7876) Charles Blackmon-Luca_
Add support for histogram2d in dask.array (:pr:7827) Doug Davis_
Remove skips from doctests (1 of 6) (:pr:7874) Zhengnan Zhao_
Add node size scaling to the Graphviz output for the high level graphs (:pr:7869) Freyam Mehta_
Update old Bokeh links (:pr:7915) Bryan Van de Ven_
Temporarily pin fastparquet in CI (:pr:7907) James Bourbeau_
Add dask.array import to progress bar docs (:pr:7910) Fabian Gebhart_
Use separate files for each DataFrame API function and method (:pr:7890) Julia Signell_
Fix pyarrow-dataset ordering bug (:pr:7902) Richard (Rick) Zamora_
Generalize unique aggregate (:pr:7892) GALI PREM SAGAR_
Raise NotImplementedError when using pd.Grouper (:pr:7857) Ruben van de Geer_
Add aggregate_files argument to enable multi-file partitions in read_parquet (:pr:7557) Richard (Rick) Zamora_
Un-xfail test_daily_stock (:pr:7895) James Bourbeau_
Update access configuration docs (:pr:7837) Naty Clementi_
Use packaging for version comparisons (:pr:7820) Elliott Sales de Andrade_
Handle infinite loops in merge_asof (:pr:7842) gerrymanoim_

.. _v2021.07.0:

2021.07.0

Released on July 9, 2021

Include fastparquet in upstream CI build (:pr:7884) James Bourbeau_
Blockwise: handle non-string constant dependencies (:pr:7849) Mads R. B. Kristensen_
fastparquet now supports new time types, including ns precision (:pr:7880) Martin Durant_
Avoid ParquetDataset API when appending in ArrowDatasetEngine (:pr:7544) Richard (Rick) Zamora_
Add retry logic to test_shuffle_priority (:pr:7879) Richard (Rick) Zamora_
Use strict channel priority in CI (:pr:7878) James Bourbeau_
Support nested dask.distributed imports (:pr:7866) Matthew Rocklin_
Should check module name only, not the entire directory filepath (:pr:7856) Genevieve Buckley_
Updates due to https://github.com/dask/fastparquet/pull/623 (:pr:7875) Martin Durant_
da.eye fix for chunks=-1 (:pr:7854) Naty Clementi_
Temporarily xfail test_daily_stock (:pr:7858) James Bourbeau_
Set priority annotations in SimpleShuffleLayer (:pr:7846) Richard (Rick) Zamora_
Blockwise: stringify constant key inputs (:pr:7838) Mads R. B. Kristensen_
Allow mixing dask and numpy arrays in @guvectorize (:pr:6863) Julia Signell_
Don't sample dict result of a shuffle group when calculating its size (:pr:7834) Florian Jetter_
Fix scipy tests (:pr:7841) Julia Signell_
Deterministically tokenize datetime.date (:pr:7836) James Bourbeau_
Add sample_rows to read_csv-like (:pr:7825) Martin Durant_
Fix typo in config.deserialize docstring (:pr:7830) Geoffrey Lentner_
Remove warning filter in test_dataframe_picklable (:pr:7822) James Bourbeau_
Improvements to histogramdd (for handling inputs that are sequences-of-arrays). (:pr:7634) Doug Davis_
Make PY_VERSION private (:pr:7824) James Bourbeau_

.. _v2021.06.2:

2021.06.2

Released on June 22, 2021

layers.py compare parts_out with set(self.parts_out) (:pr:7787) Genevieve Buckley_
Make check_meta understand pandas dtypes better (:pr:7813) Julia Signell_
Remove "Educational Resources" doc page (:pr:7818) James Bourbeau_

.. _v2021.06.1:

2021.06.1

Released on June 18, 2021

Replace funding page with 'Supported By' section on dask.org (:pr:7817) James Bourbeau_
Add initial deprecation utilities (:pr:7810) James Bourbeau_
Enforce dtype conservation in ufuncs that explicitly use dtype= (:pr:7808) Doug Davis_
Add Coiled to list of paid support organizations (:pr:7811) Kristopher Overholt_
Small tweaks to the HTML repr for Layer & HighLevelGraph (:pr:7812) Genevieve Buckley_
Add dark mode support to HLG HTML repr (:pr:7809) Jacob Tomlinson_
Remove compatibility entries for old distributed (:pr:7801) Elliott Sales de Andrade_
Implementation of HTML repr for HighLevelGraph layers (:pr:7763) Genevieve Buckley_
Update default blockwise token to avoid DataFrame column name clash (:pr:6546) James Bourbeau_
Use dispatch concat for merge_asof (:pr:7806) Julia Signell_
Fix upstream freq tests (:pr:7795) Julia Signell_
Use more context managers from the standard library (:pr:7796) James Bourbeau_
Simplify skips in parquet tests (:pr:7802) Elliott Sales de Andrade_
Remove check for outdated bokeh (:pr:7804) Elliott Sales de Andrade_
More test coverage uploads (:pr:7799) James Bourbeau_
Remove ImportError catching from dask/__init__.py (:pr:7797) James Bourbeau_
Allow DataFrame.join() to take a list of DataFrames to merge with (:pr:7578) Krishan Bhasin_
Fix maximum recursion depth exception in dask.array.linspace (:pr:7667) Daniel Mesejo-León_
Fix docs links (:pr:7794) Julia Signell_
Initial da.select() implementation and test (:pr:7760) Gabriel Miretti_
Layers must implement get_output_keys method (:pr:7790) Genevieve Buckley_
Don't include or expect freq in divisions (:pr:7785) Julia Signell_
A HighLevelGraph abstract layer for map_overlap (:pr:7595) Genevieve Buckley_
Always include kwarg name in drop (:pr:7784) Julia Signell_
Only rechunk for median if needed (:pr:7782) Julia Signell_
Add add_(prefix|suffix) to DataFrame and Series (:pr:7745) tsuga_
Move read_hdf to Blockwise (:pr:7625) Richard (Rick) Zamora_
Make Layer.get_output_keys officially an abstract method (:pr:7775) Genevieve Buckley_
Non-dask-arrays and broadcasting in ravel_multi_index (:pr:7594) Gabe Joseph_
Fix for paths ending with "/" in parquet overwrite (:pr:7773) Martin Durant_
Fix calling .visualize() with filename=None (:pr:7740) Freyam Mehta_
Generate unique names for SubgraphCallable (:pr:7637) Bruce Merry_
Pin fsspec to 2021.5.0 in CI (:pr:7771) James Bourbeau_
Evaluate graph lazily if meta is provided in from_delayed (:pr:7769) Florian Jetter_
Add meta support for DatetimeTZDtype (:pr:7627) gerrymanoim_
Add dispatch label to automatic PR labeler (:pr:7701) James Bourbeau_
Fix HDFS tests (:pr:7752) Julia Signell_

.. _v2021.06.0:

2021.06.0

Released on June 4, 2021

Remove abstract tokens from graph keys in rewrite_blockwise (:pr:7721) Richard (Rick) Zamora_
Ensure correct column order in csv project_columns (:pr:7761) Richard (Rick) Zamora_
Renamed inner loop variables to avoid duplication (:pr:7741) Boaz Mohar_
Do not return delayed object from to_zarr (:pr:7738) Chris Roat_
Array: correct number of outputs in apply_gufunc (:pr:7669) Gabe Joseph_
Rewrite da.fromfunction with da.blockwise (:pr:7704) John A Kirkham_
Rename make_meta_util to make_meta (:pr:7743) GALI PREM SAGAR_
Repartition before shuffle if the requested partitions are fewer than the input partitions (:pr:7715) Vibhu Jawa_
Blockwise: handle constant key inputs (:pr:7734) Mads R. B. Kristensen_
Added raise to apply_gufunc (:pr:7744) Boaz Mohar_
Show failing tests summary in CI (:pr:7735) Genevieve Buckley_
sizeof sets in Python 3.9 (:pr:7739) Mads R. B. Kristensen_
Warn if using pandas datetimelike string in dataframe.__getitem__ (:pr:7749) Julia Signell_
Highlight the client.dashboard_link (:pr:7747) Genevieve Buckley_
Easier link for subscribing to the Google calendar (:pr:7733) Genevieve Buckley_
Automatically show graph visualization in Jupyter notebooks (:pr:7716) Genevieve Buckley_
Add autofunction for unify_chunks in API docs (:pr:7730) James Bourbeau_

.. _v2021.05.1:

2021.05.1

Released on May 28, 2021

Pandas compatibility (:pr:7712) Julia Signell_
Fix optimize_dataframe_getitem bug (:pr:7698) Richard (Rick) Zamora_
Update make_meta import in docs (:pr:7713) Benjamin Zaitlen_
Implement da.searchsorted (:pr:7696) Tom White_
Fix format string in error message (:pr:7706) Jiaming Yuan_
Fix read_sql_table returning wrong result for single column loads (:pr:7572) c-thiel_
Add slack join link in support.rst (:pr:7679) Naty Clementi_
Remove unused alphabet variable (:pr:7700) James Bourbeau_
Fix meta creation in case of object (:pr:7586) GALI PREM SAGAR_
Add dispatch for union_categoricals (:pr:7699) GALI PREM SAGAR_
Consolidate array Dispatch objects (:pr:7505) James Bourbeau_
Move DataFrame dispatch.registers to their own file (:pr:7503) Julia Signell_
Fix delayed with dataclasses where init=False (:pr:7656) Julia Signell_
Allow a column to be named divisions (:pr:7605) Julia Signell_
Stack nd array with unknown chunks (:pr:7562) Chris Roat_
Promote the 2021 Dask User Survey (:pr:7694) Genevieve Buckley_
Fix typo in DataFrame.set_index() (:pr:7691) James Lamb_
Cleanup array API reference links (:pr:7684) David Hoese_
Accept axis tuple for flip to be consistent with NumPy (:pr:7675) Andrew Champion_
Bump pre-commit hook versions (:pr:7676) James Bourbeau_
Cleanup to_zarr docstring (:pr:7683) David Hoese_
Fix the docstring of read_orc (:pr:7678) Justus Magin_
Doc ipyparallel & mpi4py concurrent.futures (:pr:7665) John A Kirkham_
Update tests to support CuPy 9 (:pr:7671) Peter Andreas Entschev_
Fix some HighLevelGraph documentation inaccuracies (:pr:7662) Mads R. B. Kristensen_
Fix spelling in Series getitem error message (:pr:7659) Maisie Marshall_

.. _v2021.05.0:

2021.05.0

Released on May 14, 2021

Remove deprecated kind kwarg to comply with pandas 1.3.0 (:pr:7653) Julia Signell_
Fix bug in DataFrame column projection (:pr:7645) Richard (Rick) Zamora_
Merge global annotations when packing (:pr:7565) Mads R. B. Kristensen_
Avoid inplace= in pandas set_categories (:pr:7633) James Bourbeau_
Change the active-fusion default to False for Dask DataFrame (:pr:7620) Richard (Rick) Zamora_
Array: remove extraneous code from RandomState (:pr:7487) Gabe Joseph_
Implement str.concat when others=None (:pr:7623) Daniel Mesejo-León_
Fix dask.dataframe in sandboxed environments (:pr:7601) Noah D. Brenowitz_
Support for cupyx.scipy.linalg (:pr:7563) Benjamin Zaitlen_
Move timeseries and daily-stock to Blockwise (:pr:7615) Richard (Rick) Zamora_
Fix bugs in broadcast join (:pr:7617) Richard (Rick) Zamora_
Use Blockwise for DataFrame IO (parquet, csv, and orc) (:pr:7415) Richard (Rick) Zamora_
Add chunk & type information to Dask HighLevelGraphs (:pr:7309) Genevieve Buckley_
Add pyarrow sphinx intersphinx_mapping (:pr:7612) Ray Bell_
Remove skip on test freq (:pr:7608) Julia Signell_
Defaults in read_parquet parameters (:pr:7567) Ray Bell_
Remove ignore_abc_warning (:pr:7606) Julia Signell_
Harden DataFrame merge between column-selection and index (:pr:7575) Richard (Rick) Zamora_
Get rid of ignore_abc decorator (:pr:7604) Julia Signell_
Remove kwarg validation for bokeh (:pr:7597) Julia Signell_
Add loky example (:pr:7590) Naty Clementi_
Delayed: nout when arguments become tasks (:pr:7593) Gabe Joseph_
Update distributed version in mindep CI build (:pr:7602) James Bourbeau_
Support all or no overlap between partition columns and real columns (:pr:7541) Richard (Rick) Zamora_

.. _v2021.04.1:

2021.04.1

Released on April 23, 2021

Handle Blockwise HLG pack/unpack for concatenate=True (:pr:7455) Richard (Rick) Zamora_
map_partitions: use tokenized info as name of the SubgraphCallable (:pr:7524) Mads R. B. Kristensen_
Use tmp_path and tmpdir to avoid temporary files and directories lingering in the repo (:pr:7592) Naty Clementi_
Contributing to docs (development guide) (:pr:7591) Naty Clementi_
Add more packages to Python 3.9 CI build (:pr:7588) James Bourbeau_
Array: Fix NEP-18 dispatching in finalize (:pr:7508) Gabe Joseph_
Misc fixes for numpydoc (:pr:7569) Matthias Bussonnier_
Avoid pandas level= keyword deprecation (:pr:7577) James Bourbeau_
Map e.g. .repartition(freq="M") to .repartition(freq="MS") (:pr:7504) Ruben van de Geer_
Remove hash seeding in parallel CI runs (:pr:7128) Elliott Sales de Andrade_
Add defaults in parameters in to_parquet (:pr:7564) Ray Bell_
Simplify transpose axes cleanup (:pr:7561) Julia Signell_
Make the ValueError raised when len(index_names) > 1 state explicitly that it comes from fastparquet (:pr:7556) Ray Bell_
Fix dict-column appending for pyarrow parquet engines (:pr:7527) Richard (Rick) Zamora_
Add a documentation auto label (:pr:7560) Doug Davis_
Add dask.delayed.Delayed to docs so it can be referenced by other sphinx docs (:pr:7559) Doug Davis_
Fix upstream idxmaxmin for uneven split_every (:pr:7538) Julia Signell_
Make normalize_token for pandas Series/DataFrame future proof (no direct block access) (:pr:7318) Joris Van den Bossche_
Redesigned __setitem__ implementation (:pr:7393) David Hassell_
histogram, histogramdd improvements (docs; return consistencies) (:pr:7520) Doug Davis_
Force nightly pyarrow in the upstream build (:pr:7530) Joris Van den Bossche_
Fix Configuration Reference (:pr:7533) Benjamin Zaitlen_
Use .to_parquet on dask.dataframe in doc string (:pr:7528) Ray Bell_
Avoid double msgpack serialization of HLGs (:pr:7525) Mads R. B. Kristensen_
Encourage usage of yaml.safe_load() in configuration doc (:pr:7529) Hristo Georgiev_
Fix reshape bug. Add relevant test. Fixes #7171. (:pr:7523) JSKenyon_
Support custom_metadata= argument in to_parquet (:pr:7359) Richard (Rick) Zamora_
Clean some documentation warnings (:pr:7518) Daniel Mesejo-León_
Getting rid of more docs warnings (:pr:7426) Julia Signell_
Added product (alias of prod) (:pr:7517) Freyam Mehta_
Fix upstream __array_ufunc__ tests (:pr:7494) Julia Signell_
Escape from map_overlap to map_blocks if depth is zero (:pr:7481) Genevieve Buckley_
Add check_type to array assert_eq (:pr:7491) Julia Signell_

.. _v2021.04.0:

2021.04.0

Released on April 2, 2021

Add support for multidimensional histograms with dask.array.histogramdd (:pr:7387) Doug Davis_
Update docs on number of threads and workers in default LocalCluster (:pr:7497) cameron16_
Add labels automatically when certain files are touched in a PR (:pr:7506) Julia Signell_
Extract ignore_order from kwargs (:pr:7500) GALI PREM SAGAR_
Only provide installation instructions when distributed is missing (:pr:7498) Matthew Rocklin_
Start adding isort (:pr:7370) Julia Signell_
Add ignore_order parameter in dd.concat (:pr:7473) Daniel Mesejo-León_
Use powers-of-two when displaying RAM (:pr:7484) crusaderky_
Added License Classifier (:pr:7485) Tom Augspurger_
Replace conda with mamba (:pr:7227) crusaderky_
Fix typo in array docs (:pr:7478) James Lamb_
Use concurrent.futures in local scheduler (:pr:6322) John A Kirkham_

.. _v2021.03.1:

2021.03.1

Released on March 26, 2021

Add a dispatch for is_categorical_dtype to handle non-pandas objects (:pr:7469) brandon-b-miller_
Use multiprocessing.Pool in test_read_text (:pr:7472) John A Kirkham_
Add missing meta kwarg to gufunc class (:pr:7423) Peter Andreas Entschev_
Example for memory-mapped Dask array (:pr:7380) Dieter Weber_
Fix NumPy upstream failures; xfail pandas and fastparquet failures (:pr:7441) Julia Signell_
Fix bug in repartition with freq (:pr:7357) Ruben van de Geer_
Fix __array_function__ dispatching for tril/triu (:pr:7457) Peter Andreas Entschev_
Use concurrent.futures.Executors in a few tests (:pr:7429) John A Kirkham_
Require NumPy >=1.16 (:pr:7383) crusaderky_
Minor sort_values housekeeping (:pr:7462) Ryan Williams_
Ensure natural sort order in parquet part paths (:pr:7249) Ryan Williams_
Remove global env mutation upon running test_config.py (:pr:7464) Hristo Georgiev_
Update NumPy intersphinx URL (:pr:7460) Gabe Joseph_
Add rot90 (:pr:7440) Trevor Manz_
Update docs for required package for endpoint (:pr:7454) Nick Vazquez_
Master -> main in slice_array docstring (:pr:7453) Gabe Joseph_
Expand dask.utils.is_arraylike docstring (:pr:7445) Doug Davis_
Simplify BlockwiseIODeps importing (:pr:7420) Richard (Rick) Zamora_
Update layer annotation packing method (:pr:7430) James Bourbeau_
Drop duplicate test in test_describe_empty (:pr:7431) John A Kirkham_
Add Series.dot method to dataframe module (:pr:7236) Madhu94_
Add DataFrame kurtosis method and tests (:pr:7273) Jan Borchmann_
Avoid quadratic-time performance for HLG culling (:pr:7403) Bruce Merry_
Temporarily skip problematic sparse test (:pr:7421) James Bourbeau_
Update some CI workflow names (:pr:7422) James Bourbeau_
Fix HDFS test (:pr:7418) Julia Signell_
Make changelog subtitles match the hierarchy (:pr:7419) Julia Signell_
Add support for normalize in value_counts (:pr:7342) Julia Signell_
Avoid unnecessary imports for HLG Layer unpacking and materialization (:pr:7381) Richard (Rick) Zamora_
Fix bincount slicing (:pr:7391) Genevieve Buckley_
Add sliding_window_view (:pr:7234) Deepak Cherian_
Fix typo in docs/source/develop.rst (:pr:7414) Hristo Georgiev_
Switch documentation builds for PRs to readthedocs (:pr:7397) James Bourbeau_
Adds sort_values to dask.DataFrame (:pr:7286) gerrymanoim_
Pin sqlalchemy<1.4.0 in CI (:pr:7405) James Bourbeau_
Comment fixes (:pr:7215) Ryan Williams_
Dead code removal / fixes (:pr:7388) Ryan Williams_
Use single thread for pa.Table.from_pandas calls (:pr:7347) Richard (Rick) Zamora_
Replace 'container' with 'image' (:pr:7389) James Lamb_
DOC hyperlink repartition (:pr:7394) Ray Bell_
Pass delimiter to fsspec in bag.read_text (:pr:7349) Martin Durant_
Update read_hdf default mode to "r" (:pr:7039) rs9w33_
Embed literals in SubgraphCallable when packing Blockwise (:pr:7353) Mads R. B. Kristensen_
Update test_hdf.py to not reuse file handlers (:pr:7044) rs9w33_
Require additional dependencies: cloudpickle, partd, fsspec, toolz (:pr:7345) Julia Signell_
Prepare Blockwise + IO infrastructure (:pr:7281) Richard (Rick) Zamora_
Remove duplicated imports from test_slicing.py (:pr:7365) Hristo Georgiev_
Add test deps for pip development (:pr:7360) Julia Signell_
Support int slicing for non-NumPy arrays (:pr:7364) Peter Andreas Entschev_
Automatically cancel previous CI builds (:pr:7348) James Bourbeau_
dask.array.asarray should handle case where xarray class is in top-level namespace (:pr:7335) Tom White_
HighLevelGraph length without materializing layers (:pr:7274) Gabe Joseph_
Drop support for Python 3.6 (:pr:7006) James Bourbeau_
Fix fsspec usage in create_metadata_file (:pr:7295) Richard (Rick) Zamora_
Change default branch from master to main (:pr:7198) Julia Signell_
Add Xarray to CI software environment (:pr:7338) James Bourbeau_
Update repartition argument name in error text (:pr:7336) Eoin Shanaghy_
Run upstream tests based on commit message (:pr:7329) James Bourbeau_
Use pytest.register_assert_rewrite on util modules (:pr:7278) Bruce Merry_
Add example on using specific chunk sizes in from_array() (:pr:7330) James Lamb_
Move NumPy skip into test (:pr:7247) Julia Signell_

.. _v2021.03.0:

2021.03.0

Released on March 5, 2021

.. note::

   This is the first release with support for Python 3.9 and the
   last release with support for Python 3.6.

Bump minimum version of distributed (:pr:7328) James Bourbeau_
Fix percentiles_summary with dask_cudf (:pr:7325) Peter Andreas Entschev_
Temporarily revert recent Array.__setitem__ updates (:pr:7326) James Bourbeau_
Blockwise.clone (:pr:7312) crusaderky_
NEP-35 duck array update (:pr:7321) James Bourbeau_
Don't allow setting .name for array (:pr:7222) Julia Signell_
Use nearest interpolation for creating percentiles of integer input (:pr:7305) Kyle Barron_
Test exp with CuPy arrays (:pr:7322) John A Kirkham_
Check that computed chunks have right size and dtype (:pr:7277) Bruce Merry_
pytest.mark.flaky (:pr:7319) crusaderky_
Contributing docs: add note to pull the latest git tags before pip installing Dask (:pr:7308) Genevieve Buckley_
Support for Python 3.9 (:pr:7289) crusaderky_
Add broadcast-based merge implementation (:pr:7143) Richard (Rick) Zamora_
Add split_every to graph_manipulation (:pr:7282) crusaderky_
Typo in optimize docs (:pr:7306) Julius Busecke_
dask.graph_manipulation support for xarray.Dataset (:pr:7276) crusaderky_
Add plot width and height support for Bokeh 2.3.0 (:pr:7297) James Bourbeau_
Add NumPy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from (:pr:6997) Illviljan_
Remove "cleanup" task in DataFrame on-disk shuffle (:pr:7260) Sinclair Target_
Use development version of distributed in CI (:pr:7279) James Bourbeau_
Move HighLevelGraph pack/unpack into Dask (:pr:7179) Mads R. B. Kristensen_
Improve performance of merge_percentiles (:pr:7172) Ashwin Srinath_
DOC: add dask-sql and fugue (:pr:7129) Ray Bell_
Example for working with categoricals and parquet (:pr:7085) McToel_
Adds tree reduction to bincount (:pr:7183) Thomas J. Fan_
Improve documentation of name in from_array (:pr:7264) Bruce Merry_
Fix cumsum for empty partitions (:pr:7230) Julia Signell_
Add map_blocks example to dask array creation docs (:pr:7221) Julia Signell_
Fix performance issue in dask.graph_manipulation.wait_on() (:pr:7258) crusaderky_
Replace coveralls with codecov.io (:pr:7246) crusaderky_
Pin to a particular black rev in pre-commit (:pr:7256) Julia Signell_
Minor typo in documentation: array-chunks.rst (:pr:7254) Magnus Nord_
Fix bugs in Blockwise and ShuffleLayer (:pr:7213) Richard (Rick) Zamora_
Fix parquet filtering bug for "pyarrow-dataset" with pyarrow-3.0.0 (:pr:7200) Richard (Rick) Zamora_
graph_manipulation without NumPy (:pr:7243) crusaderky_
Support for NEP-35 (:pr:6738) Peter Andreas Entschev_
Avoid running unit tests during doctest CI build (:pr:7240) James Bourbeau_
Run doctests on CI (:pr:7238) Julia Signell_
Clean up code quality of set arithmetic (:pr:7196) crusaderky_
Add dask.array.delete (:pr:7125) Julia Signell_
Unpin graphviz now that new conda-forge recipe is built (:pr:7235) Julia Signell_
Don't use NumPy 1.20 from conda-forge on Mac (:pr:7211) crusaderky_
map_overlap: Don't rechunk axes without overlap (:pr:7233) Deepak Cherian_
Pin graphviz to avoid issue with latest conda-forge build (:pr:7232) Julia Signell_
Use html_css_files in docs for custom CSS (:pr:7220) James Bourbeau_
Graph manipulation: clone, bind, checkpoint, wait_on (:pr:7109) crusaderky_
Fix handling of filter expressions in parquet pyarrow-dataset engine (:pr:7186) Joris Van den Bossche_
Extend __setitem__ to more closely match numpy (:pr:7033) David Hassell_
Clean up Python 2 syntax (:pr:7195) crusaderky_
Fix regression in Delayed._length (:pr:7194) crusaderky_
__dask_layers__() tests and tweaks (:pr:7177) crusaderky_
Properly convert HighLevelGraph in multiprocessing scheduler (:pr:7191) Jim Crist-Harif_
Don't fail fast in CI (:pr:7188) James Bourbeau_

.. _v2021.02.0:

2021.02.0

Released on February 5, 2021

Add percentile support for NEP-35 (:pr:7162) Peter Andreas Entschev_
Added support for Float64 in column assignment (:pr:7173) Nils Braun_
Coarsen rechunking error (:pr:7127) Davis Bennett_
Fix upstream CI tests (:pr:6896) Julia Signell_
Revise HighLevelGraph Mapping API (:pr:7160) crusaderky_
Update low-level graph spec to use any hashable for keys (:pr:7163) James Bourbeau_
Generically rebuild a collection with different keys (:pr:7142) crusaderky_
Make it easier to link issues in PRs (:pr:7130) Ray Bell_
Add dask.array.append (:pr:7146) D-Stacks_
Allow dask.array.ravel to accept array_like argument (:pr:7138) D-Stacks_
Fixes link in array design doc (:pr:7152) Thomas J. Fan_
Fix example of using blockwise for an outer product (:pr:7119) Bruce Merry_
Deprecate HighLevelGraph.dicts in favor of .layers (:pr:7145) Amit Kumar_
Align FastParquetEngine with pyarrow engines (:pr:7091) Richard (Rick) Zamora_
Merge annotations (:pr:7102) Ian Rose_
Simplify contents of parts list in read_parquet (:pr:7066) Richard (Rick) Zamora_
check_meta(): use __class__ when checking DataFrame types (:pr:7099) Mads R. B. Kristensen_
Cache several properties (:pr:7104) Illviljan_
Fix parquet getitem optimization (:pr:7106) Richard (Rick) Zamora_
Add cytoolz back to CI environment (:pr:7103) James Bourbeau_

.. _v2021.01.1:

2021.01.1

Released on January 22, 2021

Partially fix cumprod (:pr:7089) Julia Signell_
Test pandas 1.1.x / 1.2.0 releases and pandas nightly (:pr:6996) Joris Van den Bossche_
Use assign to avoid SettingWithCopyWarning (:pr:7092) Julia Signell_
'mode' argument passed to bokeh.output_file() (:pr:7034) (:pr:7075) patquem_
Skip empty partitions when doing groupby.value_counts (:pr:7073) Julia Signell_
Add error messages to assert_eq() (:pr:7083) James Lamb_
Make cached properties read-only (:pr:7077) Illviljan_

.. _v2021.01.0:

2021.01.0

Released on January 15, 2021

map_partitions with review comments (:pr:6776) Kumar Bharath Prabhu_
Make sure that population is a real list (:pr:7027) Julia Signell_
Propagate storage_options in read_csv (:pr:7074) Richard (Rick) Zamora_
Remove all BlockwiseIO code (:pr:7067) Richard (Rick) Zamora_
Fix CI (:pr:7069) James Bourbeau_
Add option to control rechunking in reshape (:pr:6753) Tom Augspurger_
Fix linalg.lstsq for complex inputs (:pr:7056) Johnnie Gray_
Add compression='infer' default to read_csv (:pr:6960) Richard (Rick) Zamora_
Revert parameter changes in svd_compressed #7003 (:pr:7004) Eric Czech_
Skip failing s3 test (:pr:7064) Martin Durant_
Revert BlockwiseIO (:pr:7048) Richard (Rick) Zamora_
Add some cross-references to DataFrame.to_bag() and Series.to_bag() (:pr:7049) Rob Malouf_
Rewrite matmul as blockwise without contraction/concatenate (:pr:7000) Rafal Wojdyla_
Use functools.cached_property in da.shape (:pr:7023) Illviljan_
Use meta value in series non_empty (:pr:6976) Julia Signell_
Revert "Temporarily pin sphinx version to 3.3.1 (:pr:7002)" (:pr:7014) Rafal Wojdyla_
Revert python-graphviz pinning (:pr:7037) Julia Signell_
Accidentally committed print statement (:pr:7038) Julia Signell_
Pass dropna and observed in agg (:pr:6992) Julia Signell_
Add index to meta after .str.split with expand (:pr:7026) Ruben van de Geer_
CI: test pyarrow 2.0 and nightly (:pr:7030) Joris Van den Bossche_
Temporarily pin python-graphviz in CI (:pr:7031) James Bourbeau_
Underline section in numpydoc (:pr:7013) Matthias Bussonnier_
Keep normal optimizations when adding custom optimizations (:pr:7016) Matthew Rocklin_
Temporarily pin sphinx version to 3.3.1 (:pr:7002) Rafal Wojdyla_
DOC: Misc formatting (:pr:6998) Matthias Bussonnier_
Add inline_array option to from_array (:pr:6773) Tom Augspurger_
Revert "Initial pass at blockwise array creation routines (:pr:6931)" (:pr:6995) James Bourbeau_
Set npartitions in set_index (:pr:6978) Julia Signell_
Upstream config serialization and inheritance (:pr:6987) Jacob Tomlinson_
Bump the minimum time in test_minimum_time (:pr:6988) Martin Durant_
Fix pandas dtype inference for read_parquet (:pr:6985) Richard (Rick) Zamora_
Avoid data loss in set_index with sorted=True (:pr:6980) Richard (Rick) Zamora_
Bugfix in read_parquet for handling un-named indices with index=False (:pr:6969) Richard (Rick) Zamora_
Use __class__ when comparing meta data (:pr:6981) Mads R. B. Kristensen_
Comparing string versions won't always work (:pr:6979) Rafal Wojdyla_
Fix :pr:6925 (:pr:6982) sdementen_
Initial pass at blockwise array creation routines (:pr:6931) Ian Rose_
Simplify has_parallel_type() (:pr:6927) Mads R. B. Kristensen_
Handle annotation unpacking in BlockwiseIO (:pr:6934) Simon Perkins_
Avoid deprecated yield_fixture in test_sql.py (:pr:6968) Richard (Rick) Zamora_
Remove bad graph logic in BlockwiseIO (:pr:6933) Richard (Rick) Zamora_
Get config item if variable is None (:pr:6862) Jacob Tomlinson_
Update from_pandas docstring (:pr:6957) Richard (Rick) Zamora_
Prevent fuse_roots from clobbering annotations (:pr:6955) Simon Perkins_
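
The entry "Comparing string versions won't always work" (:pr:6979) refers to a common pitfall: plain string comparison of version numbers is lexicographic. The sketch below is illustrative only and is not the code from that PR. The helper name numeric_version is hypothetical, and it assumes purely numeric, dot-separated versions. Real code usually relies on packaging.version.Version, which also understands pre-release suffixes.

```python
def numeric_version(version: str) -> tuple:
    """Hypothetical helper: turn "1.10.0" into (1, 10, 0) for comparison.

    Assumes every dot-separated component is a plain integer; suffixes such
    as "rc1" or "dev0" are not handled.
    """
    return tuple(int(part) for part in version.split("."))


# String comparison is lexicographic, so "1.10.0" sorts before "1.9.0".
assert "1.10.0" < "1.9.0"

# Comparing integer tuples gives the intended ordering.
assert numeric_version("1.10.0") > numeric_version("1.9.0")

# Zero-padded CalVer components such as "01" also compare correctly.
assert numeric_version("2021.01.0") < numeric_version("2021.02.0")
```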

.. _v2020.12.0:

2020.12.0

Released on December 10, 2020

Highlights ^^^^^^^^^^

Switched to `CalVer <https://calver.org/>`_ for the versioning scheme.
Introduced new APIs for HighLevelGraph to enable sending high-level representations of task graphs to the distributed scheduler.
Introduced new HighLevelGraph layer objects including BasicLayer, Blockwise, BlockwiseIO, ShuffleLayer, and more.
Added support for applying custom Layer-level annotations like priority, retries, etc. with the dask.annotate context manager.
Updated minimum supported version of pandas to 0.25.0 and NumPy to 1.15.1.
Added support for the pyarrow.dataset API in read_parquet.
Several fixes to Dask Array's SVD.

All changes ^^^^^^^^^^^

Make observed kwarg optional (:pr:6952) Julia Signell_
Min supported pandas 0.25.0 numpy 1.15.1 (:pr:6895) Julia Signell_
Make order of categoricals unambiguous (:pr:6949) Julia Signell_
Improve "pyarrow-dataset" statistics performance for read_parquet (:pr:6918) Richard (Rick) Zamora_
Add observed keyword to groupby (:pr:6854) Julia Signell_
Make sure include_path_column works when there are multiple partitions per file (:pr:6911) Julia Signell_
Fix: array.overlap and array.map_overlap block sizes are incorrect when depth is an unsigned bit type (:pr:6909) GFleishman_
Fix syntax error in HLG docs example (:pr:6946) Mark_
Return a Bag from sample (:pr:6941) Shang Wang_
Add ravel_multi_index (:pr:6939) Illviljan_
Enable parquet metadata collection in parallel (:pr:6921) Richard (Rick) Zamora_
Avoid using _file in progressbar if it is None (:pr:6938) Mark Harfouche_
Add Zarr to upstream CI build (:pr:6932) James Bourbeau_
Introduce BlockwiseIO layer (:pr:6878) Richard (Rick) Zamora_
Transmit Layer Annotations to Scheduler (:pr:6889) Simon Perkins_
Update opportunistic caching page to remove experimental warning (:pr:6926) Timost_
Allow pyarrow >2.0.0 (:pr:6772) Richard (Rick) Zamora_
Support pyarrow.dataset API for read_parquet (:pr:6534) Richard (Rick) Zamora_
Add more informative error message to da.coarsen when coarsening factors do not divide shape (:pr:6908) Davis Bennett_
Only run the cron CI on dask/dask not forks (:pr:6905) Jacob Tomlinson_
Add annotations to ShuffleLayers (:pr:6913) Matthew Rocklin_
Temporarily xfail test_from_s3 (:pr:6915) James Bourbeau_
Added dataframe skew method (:pr:6881) Jan Borchmann_
Fix dtype in array meta (:pr:6893) Julia Signell_
Missing name arg in helm install ... (:pr:6903) Ruben van de Geer_
Fix: exception when reading an item with filters (:pr:6901) Martin Durant_
Add support for cupyx sparse to dask.array.dot (:pr:6846) Akira Naruse_
Pin array mindeps up a bit to get the tests to pass [test-mindeps] (:pr:6894) Julia Signell_
Update/remove pandas and numpy in mindeps (:pr:6888) Julia Signell_
Fix ArrowEngine bug in use of clear_known_categories (:pr:6887) Richard (Rick) Zamora_
Fix documentation about task scheduler (:pr:6879) Zhengnan Zhao_
Add human relative time formatting utility (:pr:6883) Jacob Tomlinson_
Possible fix for 6864 set_index issue (:pr:6866) Richard (Rick) Zamora_
BasicLayer: remove dependency arguments (:pr:6859) Mads R. B. Kristensen_
Serialization of Blockwise (:pr:6848) Mads R. B. Kristensen_
Address columns=[] bug (:pr:6871) Richard (Rick) Zamora_
Avoid duplicate parquet schema communication (:pr:6841) Richard (Rick) Zamora_
Add create_metadata_file utility for existing parquet datasets (:pr:6851) Richard (Rick) Zamora_
Improve ordering for workloads with a common terminus (:pr:6779) Tom Augspurger_
Stringify utilities (:pr:6852) Mads R. B. Kristensen_
Add keyword overwrite=True to to_parquet to remove dangling files when overwriting a pyarrow Dataset. (:pr:6825) Greg Hayes_
Removed map_tasks() and map_basic_layers() (:pr:6853) Mads R. B. Kristensen_
Introduce QR iteration to svd_compressed (:pr:6813) RogerMoens_
__dask_distributed_pack__() now takes a client argument (:pr:6850) Mads R. B. Kristensen_
Use map_partitions instead of delayed in set_index (:pr:6837) Mads R. B. Kristensen_
Add doc hint for as_completed().update(futures) (:pr:6817) manuels_
Bump GHA setup-miniconda version (:pr:6847) Jacob Tomlinson_
Remove nans when setting sorted index (:pr:6829) Rockwell Weiner_
Fix transpose of u in SVD (:pr:6799) RogerMoens_
Migrate to GitHub Actions (:pr:6794) Jacob Tomlinson_
Fix sphinx currentmodule usage (:pr:6839) James Bourbeau_
Fix minimum dependencies CI builds (:pr:6838) James Bourbeau_
Avoid graph materialization during Blockwise culling (:pr:6815) Richard (Rick) Zamora_
Fixed typo (:pr:6834) Devanshu Desai_
Use HighLevelGraph.merge in collections_to_dsk (:pr:6836) Mads R. B. Kristensen_
Respect dtype in svd compression_matrix #2849 (:pr:6802) RogerMoens_
Add blocksize to task name (:pr:6818) Julia Signell_
Check for all-NaN partitions (:pr:6821) Rockwell Weiner_
Change "institutional" SQL doc section to point to main SQL doc (:pr:6823) Martin Durant_
Fix: DataFrame.join doesn't accept Series as other (:pr:6809) David Katz_
Remove to_delayed operations from to_parquet (:pr:6801) Richard (Rick) Zamora_
Layer annotation docstrings improvements (:pr:6806) Simon Perkins_
Avro reader (:pr:6780) Martin Durant_
Rechunk array if smallest chunk size is smaller than depth (:pr:6708) Julia Signell_
Add Layer Annotations (:pr:6767) Simon Perkins_
Add "view code" links to documentation (:pr:6793) manuels_
Add optional IO-subgraph to Blockwise Layers (:pr:6715) Richard (Rick) Zamora_
Add high level graph pack/unpack for distributed (:pr:6786) Mads R. B. Kristensen_
Add missing methods of the DataFrame API (:pr:6789) Stephannie Jimenez Gacha_
Add doc on managing environments (:pr:6778) Martin Durant_
HLG: get_all_external_keys() (:pr:6774) Mads R. B. Kristensen_
Avoid rechunking in reshape with chunksize=1 (:pr:6748) Tom Augspurger_
Try to make categoricals work on join (:pr:6205) Julia Signell_
Fix some minor typos and trailing whitespaces in array-slice.rst (:pr:6771) Magnus Nord_
Bugfix for parquet metadata writes of empty dataframe partitions (pyarrow) (:pr:6741) Callum Noble_
Document meta kwarg in map_blocks and map_overlap. (:pr:6763) Peter Andreas Entschev_
Begin experimenting with parallel prefix scan for cumsum and cumprod (:pr:6675) Erik Welch_
Clarify differences in boolean indexing between dask and numpy arrays (:pr:6764) Illviljan_
Efficient serialization of shuffle layers (:pr:6760) James Bourbeau_
Config array optimize to skip fusion and return a HLG (:pr:6751) Mads R. B. Kristensen_
Temporarily use pyarrow<2 in CI (:pr:6759) James Bourbeau_
Fix meta for min/max reductions (:pr:6736) Peter Andreas Entschev_
Add 2D possibility to da.linalg.lstsq - mirroring numpy (:pr:6749) Pascal Bourgault_
CI: Fixed bug causing flaky test failure in pivot (:pr:6752) Tom Augspurger_
Serialization of layers (:pr:6693) Mads R. B. Kristensen_
Add attrs property to Series/Dataframe (:pr:6742) Illviljan_
Removed Mutable Default Argument (:pr:6747) Mads R. B. Kristensen_
Adjust parquet ArrowEngine to allow more easy subclass for writing (:pr:6505) Joris Van den Bossche_
Add ShuffleStage HLG Layer (:pr:6650) Richard (Rick) Zamora_
Handle literal in meta_from_array (:pr:6731) Peter Andreas Entschev_
Do balanced rechunking even if chunks are the same (:pr:6735) Chris Roat_
Fix docstring DataFrame.set_index (:pr:6739) Gil Forsyth_
Ensure HighLevelGraph layers always contain Layer instances (:pr:6716) James Bourbeau_
Map on HighLevelGraph Layers (:pr:6689) Mads R. B. Kristensen_
Update overlap *_like function calls and CuPy tests (:pr:6728) Peter Andreas Entschev_
Fixes for svd with __array_function__ (:pr:6727) Peter Andreas Entschev_
Added doctest extension for documentation (:pr:6397) Jim Circadian_
Minor fix to #5628 using @pentschev's suggestion (:pr:6724) John A Kirkham_
Change type of Dask array when meta type changes (:pr:5628) Matthew Rocklin_
Add az (:pr:6719) Ray Bell_
HLG: get_dependencies() of single keys (:pr:6699) Mads R. B. Kristensen_
Revert "Revert "Use HighLevelGraph layers everywhere in collections (:pr:6510)" (:pr:6697)" (:pr:6707) Tom Augspurger_
Allow *_like array creation functions to respect input array type (:pr:6680) Genevieve Buckley_
Update dask-sphinx-theme version (:pr:6700) Gil Forsyth_

.. _v2.30.0 / 2020-10-06:

2.30.0 / 2020-10-06

Array ^^^^^

Allow rechunk to evenly split into N chunks (:pr:6420) Scott Sievert_
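
The entry for :pr:6420 describes splitting an axis into N chunks of near-equal size. The arithmetic behind that idea is sketched below with a hypothetical split_evenly helper. It is not Dask's implementation, and this changelog does not describe the exact rechunk arguments that trigger this behavior.

```python
def split_evenly(length: int, n: int) -> tuple:
    """Hypothetical helper: split `length` elements into `n` chunks.

    Chunk sizes differ by at most one; the first `length % n` chunks
    receive the extra element. If length < n, trailing chunks are empty.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    base, remainder = divmod(length, n)
    return tuple(base + 1 if i < remainder else base for i in range(n))


print(split_evenly(10, 3))  # (4, 3, 3)
```

Giving the remainder to the leading chunks keeps the largest and smallest chunk within one element of each other. Any rule with that property would serve equally well.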

.. _v2.29.0 / 2020-10-02:

2.29.0 / 2020-10-02

Array ^^^^^

_repr_html_: color sides darker instead of drawing all the lines (:pr:6683) Julia Signell_
Removes warning from nanstd and nanvar (:pr:6667) Thomas J. Fan_
Get shape of output from original array - map_overlap (:pr:6682) Julia Signell_
Replace np.searchsorted with bisect in indexing (:pr:6669) Joachim B Haga_
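
:pr:6669 replaces np.searchsorted with the standard-library bisect module during indexing. For a single scalar lookup, bisect avoids NumPy's per-call overhead, which is presumably the motivation. The sketch below shows the kind of lookup involved: finding which chunk holds a given position. The name locate is hypothetical and is not a Dask function.

```python
from bisect import bisect_right
from itertools import accumulate


def locate(index: int, chunks: tuple) -> tuple:
    """Hypothetical helper: map a global index to (block number, offset).

    `chunks` holds the chunk sizes along one axis, e.g. (4, 3, 3).
    """
    # Cumulative end boundaries: (4, 3, 3) -> [4, 7, 10].
    bounds = list(accumulate(chunks))
    total = bounds[-1] if bounds else 0
    if not 0 <= index < total:
        raise IndexError(f"index {index} out of range for length {total}")
    # bisect_right moves past any boundary equal to `index`, so
    # zero-length chunks are never selected.
    block = bisect_right(bounds, index)
    start = bounds[block - 1] if block > 0 else 0
    return block, index - start


print(locate(4, (4, 3, 3)))  # (1, 0)
```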

Bag ^^^

Make sure subprocesses have a consistent hash for bag groupby (:pr:6660) Itamar Turner-Trauring_
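
Python randomizes str hashes per process unless PYTHONHASHSEED is fixed. Separate worker processes can therefore disagree about hash(key) and send equal keys to different groups, which is the problem :pr:6660 addresses. This changelog does not say how the PR fixes it. The sketch below shows one alternative, a content-based hash from hashlib that is stable across processes. The helper partition_for is hypothetical.

```python
import hashlib


def partition_for(key: str, npartitions: int) -> int:
    """Hypothetical helper: choose a partition for `key` deterministically.

    Unlike the built-in hash() for str, a hashlib digest depends only on
    the bytes of the key, so every process computes the same result.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "little") % npartitions
```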

Core ^^^^

Revert "Use HighLevelGraph layers everywhere in collections (:pr:6510)" (:pr:6697) Tom Augspurger_
Use pandas.testing (:pr:6687) John A Kirkham_
Improve 128-bit floating-point skip in tests (:pr:6676) Elliott Sales de Andrade_

DataFrame ^^^^^^^^^

Allow setting dataframe items using a bool dataframe (:pr:6608) Julia Signell_

Documentation ^^^^^^^^^^^^^

Fix typo (:pr:6692) garanews_
Fix a few typos (:pr:6678) Pav A_

.. _v2.28.0 / 2020-09-25:

2.28.0 / 2020-09-25

Array ^^^^^

Partially reverted changes to Array indexing that produces large changes. This restores the behavior from Dask 2.25.0 and earlier, with a warning when large chunks are produced. A configuration option is provided to avoid creating the large chunks, see :ref:array.slicing.efficiency. (:pr:6665) Tom Augspurger_
Add meta to to_dask_array (:pr:6651) Kyle Nicholson_
Fix :pr:6631 and :pr:6611 (:pr:6632) Rafal Wojdyla_
Infer object in array reductions (:pr:6629) Daniel Saxton_
Adding v_based flag for svd_flip (:pr:6658) Eric Czech_
Fix flaky array mean (:pr:6656) Sam Grayson_

Core ^^^^

Removed dsk equality check from SubgraphCallable.__eq__ (:pr:6666) Mads R. B. Kristensen_
Use HighLevelGraph layers everywhere in collections (:pr:6510) Mads R. B. Kristensen_
Adds hash dunder method to SubgraphCallable for caching purposes (:pr:6424) Andrew Fulton_
Stop writing commented out config files by default (:pr:6647) Matthew Rocklin_

DataFrame ^^^^^^^^^

Add support for collect list aggregation via agg API (:pr:6655) Madhur Tandon_
Slightly better error message (:pr:6657) Julia Signell_

.. _v2.27.0 / 2020-09-18:

2.27.0 / 2020-09-18

Array ^^^^^

Preserve dtype in svd (:pr:6643) Eric Czech_

Core ^^^^

store(): create a single HLG layer (:pr:6601) Mads R. B. Kristensen_
Add pre-commit CI build (:pr:6645) James Bourbeau_
Update .pre-commit-config to latest black. (:pr:6641) Julia Signell_
Update super usage to remove Python 2 compatibility (:pr:6630) Poruri Sai Rahul_
Remove u string prefixes (:pr:6633) Poruri Sai Rahul_

DataFrame ^^^^^^^^^

Improve error message for to_sql (:pr:6638) Julia Signell_
Use empty list as categories (:pr:6626) Julia Signell_

Documentation ^^^^^^^^^^^^^

Add autofunction to array api docs for more ufuncs (:pr:6644) James Bourbeau_
Add a number of missing ufuncs to dask.array docs (:pr:6642) Ralf Gommers_
Add HelmCluster docs (:pr:6290) Jacob Tomlinson_

.. _v2.26.0 / 2020-09-11:

2.26.0 / 2020-09-11

Array ^^^^^

Backend-aware dtype inference for single-chunk svd (:pr:6623) Eric Czech_
Make array.reduction docstring match for dtype (:pr:6624) Martin Durant_
Set lower bound on compression level for svd_compressed using rows and cols (:pr:6622) Eric Czech_
Improve SVD consistency and small array handling (:pr:6616) Eric Czech_
Add svd_flip #6599 (:pr:6613) Eric Czech_
Handle sequences containing dask Arrays (:pr:6595) Gabe Joseph_
Avoid large chunks from getitem with lists (:pr:6514) Tom Augspurger_
Eagerly slice numpy arrays in from_array (:pr:6605) Deepak Cherian_
Restore ability to pickle dask arrays (:pr:6594) Noah D. Brenowitz_
Add SVD support for short-and-fat arrays (:pr:6591) Eric Czech_
Add simple chunk type registry and defer as appropriate to upcast types (:pr:6393) Jon Thielen_
Align coarsen chunks by default (:pr:6580) Deepak Cherian_
Fixup reshape on unknown dimensions and other testing fixes (:pr:6578) Ryan Williams_

Core ^^^^

Add validation and fixes for HighLevelGraph dependencies (:pr:6588) Mads R. B. Kristensen_
Fix linting issue (:pr:6598) Tom Augspurger_
Skip bokeh version 2.0.0 (:pr:6572) John A Kirkham_

DataFrame ^^^^^^^^^

Added bytes/row calculation when using meta (:pr:6585) McToel_
Handle min_count in Series.sum / prod (:pr:6618) Daniel Saxton_
Update DataFrame.set_index docstring (:pr:6549) Timost_
Always compute 0 and 1 quantiles during quantile calculations (:pr:6564) Erik Welch_
Fix wrong path when reading empty csv file (:pr:6573) Abdulelah Bin Mahfoodh_

Documentation ^^^^^^^^^^^^^

Doc: Troubleshooting dashboard 404 (:pr:6215) Kilian Lieret_
Fixup extraConfig example (:pr:6625) Tom Augspurger_
Update supported Python versions (:pr:6609) Julia Signell_
Document dask/daskhub helm chart (:pr:6560) Tom Augspurger_

.. _v2.25.0 / 2020-08-28:

2.25.0 / 2020-08-28

Core ^^^^

Compare key hashes in subs() (:pr:6559) Mads R. B. Kristensen_
Rerun with latest black release (:pr:6568) James Bourbeau_
License update (:pr:6554) Tom Augspurger_

DataFrame ^^^^^^^^^

Add gs read_parquet example (:pr:6548) Ray Bell_

Documentation ^^^^^^^^^^^^^

Remove version from documentation page names (:pr:6558) James Bourbeau_
Update kubernetes-helm.rst (:pr:6523) David Sheldon_
Stop 2020 survey (:pr:6547) Tom Augspurger_

.. _v2.24.0 / 2020-08-22:

2.24.0 / 2020-08-22

Array ^^^^^

Fix setting random seed in tests. (:pr:6518) Elliott Sales de Andrade_
Support meta in apply gufunc (:pr:6521) joshreback_
Replace cupy.sparse with cupyx.scipy.sparse (:pr:6530) John A Kirkham_

DataFrame ^^^^^^^^^

Bump up tolerance for rolling tests (:pr:6502) Julia Signell_
Implement DataFrame.len (:pr:6515) Tom Augspurger_
Infer arrow schema in to_parquet (for ArrowEngine) (:pr:6490) Richard (Rick) Zamora_
Fix parquet test when no pyarrow (:pr:6524) Martin Durant_
Remove problematic filter arguments in ArrowEngine (:pr:6527) Richard (Rick) Zamora_
Avoid schema validation by default in ArrowEngine (:pr:6536) Richard (Rick) Zamora_

Core ^^^^

Use unpack_collections in make_blockwise_graph (:pr:6517) Thomas J. Fan_
Move key_split() from optimization.py to utils.py (:pr:6529) Mads R. B. Kristensen_
Make tests run on moto server (:pr:6528) Martin Durant_

.. _v2.23.0 / 2020-08-14:

2.23.0 / 2020-08-14

Array ^^^^^

Reduce np.zeros, ones, and full array size with broadcasting (:pr:6491) Matthias Bussonnier_
Add missing meta= for trim in map_overlap (:pr:6494) Peter Andreas Entschev_

Bag ^^^

Bag repartition partition size (:pr:6371) joshreback_

Core ^^^^

Scalar.__dask_layers__() to return self._name instead of self.key (:pr:6507) Mads R. B. Kristensen_
Update dependencies correctly in fuse_root optimization (:pr:6508) Mads R. B. Kristensen_

DataFrame ^^^^^^^^^

Adds items to dataframe (:pr:6503) Thomas J. Fan_
Include compression in write_table call (:pr:6499) Julia Signell_
Fixed warning in nonempty_series (:pr:6485) Tom Augspurger_
Intelligently determine partitions based on type of first arg (:pr:6479) Matthew Rocklin_
Fix pyarrow mkdirs (:pr:6475) Julia Signell_
Fix duplicate parquet output in to_parquet (:pr:6451) michaelnarodovitch_

Documentation ^^^^^^^^^^^^^

Fix documentation da.histogram (:pr:6439) Roberto Panai_
Add agg nunique example (:pr:6404) Ray Bell_
Fixed a few typos in the SQL docs (:pr:6489) Mike McCarty_
Docs for SQLing (:pr:6453) Martin Durant_

.. _v2.22.0 / 2020-07-31:

2.22.0 / 2020-07-31

Array ^^^^^

Compatibility for NumPy dtype deprecation (:pr:6430) Tom Augspurger_

Core ^^^^

Implement sizeof for some bytes-like objects (:pr:6457) John A Kirkham_
HTTP error for new fsspec (:pr:6446) Martin Durant_
When RecursionError is raised, return uuid from tokenize function (:pr:6437) Julia Signell_
Install deps of upstream-dev packages (:pr:6431) Tom Augspurger_
Use updated link in setup.cfg (:pr:6426) Zhengnan Zhao_
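
:pr:6437 makes tokenize return a random uuid when a RecursionError is raised while hashing a deeply nested object. A random token is safe because it is effectively unique, so it never falsely matches another object's token. The cost is that identical inputs no longer share a token. The sketch below mirrors that fallback, using pickle plus hashlib as a stand-in for Dask's own normalization logic. tokenize_sketch is hypothetical and is not dask.base.tokenize.

```python
import hashlib
import pickle
import uuid


def tokenize_sketch(obj) -> str:
    """Hypothetical stand-in for a deterministic tokenizer.

    Hashes the pickled bytes of `obj`; if serialization recurses too deeply,
    falls back to a random UUID so callers still get a unique token.
    """
    try:
        payload = pickle.dumps(obj, protocol=4)
    except RecursionError:
        # Non-deterministic but effectively unique: this object will never
        # be treated as equal to anything else.
        return uuid.uuid4().hex
    return hashlib.sha256(payload).hexdigest()


# Equal inputs give equal tokens on the normal path.
assert tokenize_sketch([1, 2, 3]) == tokenize_sketch([1, 2, 3])
```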

DataFrame ^^^^^^^^^

Add single quotes around column names if strings (:pr:6471) Gil Forsyth_
Refactor ArrowEngine for better read_parquet performance (:pr:6346) Richard (Rick) Zamora_
Add tolist dispatch (:pr:6444) GALI PREM SAGAR_
Compatibility with pandas 1.1.0rc0 (:pr:6429) Tom Augspurger_
Multi value pivot table (:pr:6428) joshreback_
Duplicate argument definitions in to_csv docstring (:pr:6411) Jun Han (Johnson) Ooi_

Documentation ^^^^^^^^^^^^^

Add utility to docs to convert YAML config to env vars and back (:pr:6472) Jacob Tomlinson_
Fix parameter server rendering (:pr:6466) Scott Sievert_
Fixes broken links (:pr:6403) Jim Circadian_
Complete parameter server implementation in docs (:pr:6449) Scott Sievert_
Fix typo (:pr:6436) Jack Xiaosong Xu_

.. _v2.21.0 / 2020-07-17:

2.21.0 / 2020-07-17

Array ^^^^^

Correct error message in array.routines.gradient() (:pr:6417) johnomotani_
Fix blockwise concatenate for array with some dimension=1 (:pr:6342) Matthias Bussonnier_

Bag ^^^

Fix bag.take example (:pr:6418) Roberto Panai_

Core ^^^^

Groups values in optimization pass should only be graph and keys -- not an optimization + keys (:pr:6409) Benjamin Zaitlen_
Call custom optimizations once, with kwargs provided (:pr:6382) Clark Zinzow_
Include pickle5 for testing on Python 3.7 (:pr:6379) John A Kirkham_

DataFrame ^^^^^^^^^

Correct typo in error message (:pr:6422) Tom McTiernan_
Use pytest.warns to check for UserWarning (:pr:6378) Richard (Rick) Zamora_
Parse bytes_per_chunk keyword from string (:pr:6370) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

Numpydoc formatting (:pr:6421) Matthias Bussonnier_
Unpin numpydoc following 1.1 release (:pr:6407) Gil Forsyth_
Numpydoc formatting (:pr:6402) Matthias Bussonnier_
Add instructions for using conda when installing code for development (:pr:6399) Ray Bell_
Update visualize docstrings (:pr:6383) Zhengnan Zhao_

.. _v2.20.0 / 2020-07-02:

2.20.0 / 2020-07-02

Array ^^^^^

Register sizeof for numpy zero-strided arrays (:pr:6343) Matthias Bussonnier_
Use concatenate_lookup in concatenate (:pr:6339) John A Kirkham_
Fix rechunking of arrays with some zero-length dimensions (:pr:6335) Matthias Bussonnier_

DataFrame ^^^^^^^^^

Dispatch iloc calls to getitem (:pr:6355) Gil Forsyth_
Handle unnamed pandas RangeIndex in fastparquet engine (:pr:6350) Richard (Rick) Zamora_
Preserve index when writing partitioned parquet datasets with pyarrow (:pr:6282) Richard (Rick) Zamora_
Use ignore_index for pandas' group_split_dispatch (:pr:6251) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

Add doc describing argument (:pr:6318) asmith26_

.. _v2.19.0 / 2020-06-19:

2.19.0 / 2020-06-19

Array ^^^^^

Cast chunk sizes to python int dtype (:pr:6326) Gil Forsyth_
Add shape=None to *_like() array creation functions (:pr:6064) Anderson Banihirwe_

Core ^^^^

Update expected error msg for protocol difference in fsspec (:pr:6331) Gil Forsyth_
Fix for floats < 1 in parse_bytes (:pr:6311) Gil Forsyth_
Fix exception causes all over the codebase (:pr:6308) Ram Rachum_
Fix duplicated tests (:pr:6303) James Lamb_
Remove unused testing function (:pr:6304) James Lamb_
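
:pr:6311 fixes parse_bytes for fractional values below one, such as "0.5 kB". The sketch below is a simplified, hypothetical parser that shows the idea. It is not dask.utils.parse_bytes, and its unit table covers only a few common decimal and binary prefixes.

```python
import re

# Hypothetical, deliberately small unit table (lower-cased keys).
_UNITS = {
    "": 1, "b": 1,
    "kb": 10**3, "mb": 10**6, "gb": 10**9,
    "kib": 2**10, "mib": 2**20, "gib": 2**30,
}


def parse_bytes_sketch(text: str) -> int:
    """Parse strings like "100", "0.5 kB", or ".5 MiB" into a byte count."""
    # The number may start with a bare decimal point, so values below one
    # such as ".5" or "0.5" are accepted.
    match = re.fullmatch(r"\s*(\d*\.?\d+)\s*([a-z]*)\s*", text.lower())
    if match is None:
        raise ValueError(f"could not parse byte string: {text!r}")
    number, unit = match.groups()
    if unit not in _UNITS:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(round(float(number) * _UNITS[unit]))


print(parse_bytes_sketch("0.5 kB"))  # 500
```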

DataFrame ^^^^^^^^^

Add high-level CSV Subgraph (:pr:6262) Gil Forsyth_
Fix ValueError when merging an index-only 1-partition dataframe (:pr:6309) Krishan Bhasin_
Make index.map clear divisions. (:pr:6285) Julia Signell_

Documentation ^^^^^^^^^^^^^

Add link to 2020 survey (:pr:6328) Tom Augspurger_
Update bag.rst (:pr:6317) Ben Shaver_

.. _v2.18.1 / 2020-06-09:

2.18.1 / 2020-06-09

Array ^^^^^

Don't try to set name on full (:pr:6299) Julia Signell_
Histogram: support lazy values for range/bins (another way) (:pr:6252) Gabe Joseph_

Core ^^^^

Fix exception causes in utils.py (:pr:6302) Ram Rachum_
Improve performance of HighLevelGraph construction (:pr:6293) Julia Signell_

Documentation ^^^^^^^^^^^^^

Now readthedocs builds unreleased features' docstrings (:pr:6295) Antonio Ercole De Luca_
Add asyncssh intersphinx mappings (:pr:6298) Jacob Tomlinson_

.. _v2.18.0 / 2020-06-05:

2.18.0 / 2020-06-05

Array ^^^^^

Cast slicing index to dask array if same shape as original (:pr:6273) Julia Signell_
Fix stack error message (:pr:6268) Stephanie Gott_
full & full_like: error on non-scalar fill_value (:pr:6129) Huite_
Support for multiple arrays in map_overlap (:pr:6165) Eric Czech_
Pad resample divisions so that edges are counted (:pr:6255) Julia Signell_

Bag ^^^

Random sampling of k elements from a dask bag #4799 (:pr:6239) Antonio Ercole De Luca_

DataFrame ^^^^^^^^^

Add dropna, sort, and ascending to sort_values (:pr:5880) Julia Signell_
Generalize from_dask_array (:pr:6263) GALI PREM SAGAR_
Add derived docstring for SeriesGroupby.nunique (:pr:6284) Julia Signell_
Remove NotImplementedError in resample with rule (:pr:6274) Abdulelah Bin Mahfoodh_
Add dd.to_sql (:pr:6038) Ryan Williams_

Documentation ^^^^^^^^^^^^^

Update remote data section (:pr:6258) Ray Bell_

.. _v2.17.2 / 2020-05-28:

2.17.2 / 2020-05-28

Core ^^^^

Re-add the complete extra (:pr:6257) Jim Crist-Harif_

DataFrame ^^^^^^^^^

Raise error if resample isn't going to give the right answer (:pr:6244) Julia Signell_

.. _v2.17.1 / 2020-05-28:

2.17.1 / 2020-05-28

Array ^^^^^

Empty array rechunk (:pr:6233) Andrew Fulton_

Core ^^^^

Make pyyaml required (:pr:6250) Jim Crist-Harif_
Fix install commands from ImportError (:pr:6238) Gaurav Sheni_
Remove issue template (:pr:6249) Jacob Tomlinson_

DataFrame ^^^^^^^^^

Pass ignore_index to dd_shuffle from DataFrame.shuffle (:pr:6247) Richard (Rick) Zamora_
Cope with missing HDF keys (:pr:6204) Martin Durant_
Generalize describe & quantile apis (:pr:5137) GALI PREM SAGAR_

.. _v2.17.0 / 2020-05-26:

2.17.0 / 2020-05-26

Array ^^^^^

Small improvements to da.pad (:pr:6213) Mark Boer_
Return tuple if multiple outputs in dask.array.apply_gufunc, add test to check for tuple (:pr:6207) Kai Mühlbauer_
Support stack with unknown chunksizes (:pr:6195) swapna_

Bag ^^^

Random Choice on Bags (:pr:6208) Antonio Ercole De Luca_

Core ^^^^

Raise warning delayed.visualise() (:pr:6216) Amol Umbarkar_
Ensure other pickle arguments work (:pr:6229) John A Kirkham_
Overhaul fuse() config (:pr:6198) crusaderky_
Update dask.order.order to consider "next" nodes using both FIFO and LIFO (:pr:5872) Erik Welch_

DataFrame ^^^^^^^^^

Use 0 as fill_value for more agg methods (:pr:6245) Julia Signell_
Generalize rearrange_by_column_tasks and add DataFrame.shuffle (:pr:6066) Richard (Rick) Zamora_
Xfail test_rolling_numba_engine for newer numba and older pandas (:pr:6236) James Bourbeau_
Generalize fix_overlap (:pr:6240) GALI PREM SAGAR_
Fix DataFrame.shape with no columns (:pr:6237) noreentry_
Avoid shuffle when setting a presorted index with overlapping divisions (:pr:6226) Krishan Bhasin_
Adjust the Parquet engine classes to allow more easily subclassing (:pr:6211) Marius van Niekerk_
Fix dd.merge_asof with left_on='col' & right_index=True (:pr:6192) noreentry_
Disable warning for concat (:pr:6210) Tung Dang_
Move AUTO_BLOCKSIZE out of read_csv signature (:pr:6214) Jim Crist-Harif_
.loc indexing with callable (:pr:6185) Endre Mark Borza_
Avoid apply in _compute_sum_of_squares for groupby std agg (:pr:6186) Richard (Rick) Zamora_
Minor correction to test_parquet (:pr:6190) Brian Larsen_
Adhere to the passed pat for delimiter join and fix error message (:pr:6194) GALI PREM SAGAR_
Skip test_to_parquet_with_get if no parquet libs available (:pr:6188) Scott Sanderson_

Documentation ^^^^^^^^^^^^^

Added documentation for distributed.Event class (:pr:6231) Nils Braun_
Doc write to remote (:pr:6124) Ray Bell_

.. _v2.16.0 / 2020-05-08:

2.16.0 / 2020-05-08

Array ^^^^^

Fix array general-reduction name (:pr:6176) Nick Evans_
Replace dim with shape in unravel_index (:pr:6155) Julia Signell_
Moment: handle all elements being masked (:pr:5339) Gabe Joseph_

Core ^^^^

Remove redundant string concatenations in dask codebase (:pr:6137) GALI PREM SAGAR_
Upstream compat (:pr:6159) Tom Augspurger_
Ensure sizeof of dict and sequences returns an integer (:pr:6179) James Bourbeau_
Estimate python collection sizes with random sampling (:pr:6154) Florian Jetter_
Update test upstream (:pr:6146) Tom Augspurger_
Skip test for mindeps build (:pr:6144) Tom Augspurger_
Switch default multiprocessing context to "spawn" (:pr:4003) Itamar Turner-Trauring_
Update manifest to include dask-schema (:pr:6140) Benjamin Zaitlen_
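
:pr:6154 estimates the size of Python collections by sampling, rather than inspecting every element. The sketch below illustrates the sampling idea with sys.getsizeof. estimate_sizeof and its num_samples default are hypothetical and are not taken from dask.sizeof.

```python
import random
import sys
from typing import Optional, Sequence


def estimate_sizeof(items: Sequence, num_samples: int = 10,
                    seed: Optional[int] = None) -> int:
    """Hypothetical helper: estimate the memory used by a sequence.

    Measures the container itself plus a random sample of its elements,
    then scales the sample's mean element size up to the full length.
    Nested contents of each element are not counted (sys.getsizeof is shallow).
    """
    total = sys.getsizeof(items)
    n = len(items)
    if n <= num_samples:
        # Small collections are cheap to measure exactly.
        return total + sum(sys.getsizeof(x) for x in items)
    rng = random.Random(seed)
    indices = rng.sample(range(n), num_samples)
    mean = sum(sys.getsizeof(items[i]) for i in indices) / num_samples
    return total + int(mean * n)
```

For large collections with uniformly sized elements, the estimate matches the exact total. For heterogeneous elements it trades accuracy for a cost that does not grow with the collection's length.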

DataFrame ^^^^^^^^^

Harden inconsistent-schema handling in pyarrow-based read_parquet (:pr:6160) Richard (Rick) Zamora_
Add compute kwargs to methods that write data to disk (:pr:6056) Krishan Bhasin_
Fix issue where unique returns an index-like result from backends (:pr:6153) GALI PREM SAGAR_
Fix internal error in map_partitions with collections (:pr:6103) Tom Augspurger_

Documentation ^^^^^^^^^^^^^

Add phase of computation to index TOC (:pr:6157) Benjamin Zaitlen_
Remove unused imports in scheduling script (:pr:6138) James Lamb_
Fix indent (:pr:6147) Martin Durant_
Add Tom's log config example (:pr:6143) Martin Durant_

.. _v2.15.0 / 2020-04-24:

2.15.0 / 2020-04-24

Array ^^^^^

Update dask.array.from_array to warn when passed a Dask collection (:pr:6122) James Bourbeau_
Un-numpy like behaviour in dask.array.pad (:pr:6042) Mark Boer_
Add support for repeats=0 in da.repeat (:pr:6080) James Bourbeau_

Core ^^^^

Fix yaml layout for schema (:pr:6132) Benjamin Zaitlen_
Configuration Reference (:pr:6069) Benjamin Zaitlen_
Add configuration option to turn off task fusion (:pr:6087) Matthew Rocklin_
Skip pyarrow on windows (:pr:6094) Tom Augspurger_
Set limit to maximum length of fused key (:pr:6057) Lucas Rademaker_
Add test against #6062 (:pr:6072) Martin Durant_
Bump checkout action to v2 (:pr:6065) James Bourbeau_

DataFrame ^^^^^^^^^

Generalize categorical calls to support cudf Categorical (:pr:6113) GALI PREM SAGAR_
Avoid reading _metadata on every worker (:pr:6017) Richard (Rick) Zamora_
Use group_split_dispatch and ignore_index in apply_concat_apply (:pr:6119) Richard (Rick) Zamora_
Handle new (dtype) pandas metadata with pyarrow (:pr:6090) Richard (Rick) Zamora_
Skip test_partition_on_cats_pyarrow if pyarrow is not installed (:pr:6112) James Bourbeau_
Update DataFrame len to handle columns with the same name (:pr:6111) James Bourbeau_
ArrowEngine bug fixes and test coverage (:pr:6047) Richard (Rick) Zamora_
Added mode (:pr:5958) Adam Lewis_

Documentation ^^^^^^^^^^^^^

Update "helm install" for helm 3 usage (:pr:6130) JulianWgs_
Extend preload documentation (:pr:6077) Matthew Rocklin_
Fixed small typo in DataFrame map_partitions() docstring (:pr:6115) Eugene Huang_
Fix typo: "double" should be times, not plus (:pr:6091) David Chudzicki_
Fix first line of array.random.* docs (:pr:6063) Martin Durant_
Add section about Semaphore in distributed (:pr:6053) Florian Jetter_

.. _v2.14.0 / 2020-04-03:

2.14.0 / 2020-04-03

Array ^^^^^

Added np.iscomplexobj implementation (:pr:6045) Tom Augspurger_

Core ^^^^

Update test_rearrange_disk_cleanup_with_exception to pass without cloudpickle installed (:pr:6052) James Bourbeau_
Fixed flaky test-rearrange (:pr:5977) Tom Augspurger_

DataFrame ^^^^^^^^^

Use _meta_nonempty for dtype casting in stack_partitions (:pr:6061) mlondschien_
Fix bugs in _metadata creation and filtering in parquet ArrowEngine (:pr:6023) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

DOC: Add name caveats (:pr:6040) Tom Augspurger_

.. _v2.13.0 / 2020-03-25:

2.13.0 / 2020-03-25

Array ^^^^^

Support dtype and other keyword arguments in da.random (:pr:6030) Matthew Rocklin_
Register support for cupy sparse hstack/vstack (:pr:5735) Corey J. Nolet_
Force self.name to str in dask.array (:pr:6002) Chuanzhu Xu_

Bag ^^^

Set rename_fused_keys to None by default in bag.optimize (:pr:6000) Lucas Rademaker_

Core ^^^^

Copy dict in to_graphviz to prevent overwriting (:pr:5996) JulianWgs_
Stricter pandas xfail (:pr:6024) Tom Augspurger_
Fix CI failures (:pr:6013) James Bourbeau_
Update toolz to 0.8.2 and use tlz (:pr:5997) Ryan Grout_
Move Windows CI builds to GitHub Actions (:pr:5862) James Bourbeau_

DataFrame ^^^^^^^^^

Improve path-related exceptions in read_hdf (:pr:6032) psimaj_
Fix dtype handling in dd.concat (:pr:6006) mlondschien_
Handle cudf's leftsemi and leftanti joins (:pr:6025) Richard J Zamora_
Remove unused npartitions variable in dd.from_pandas (:pr:6019) Daniel Saxton_
Added shuffle to DataFrame.random_split (:pr:5980) petiop_

Documentation ^^^^^^^^^^^^^

Fix indentation in scheduler-overview docs (:pr:6022) Matthew Rocklin_
Update task graphs in optimize docs (:pr:5928) Julia Signell_
Optionally get rid of intermediary boxes in visualize, and add more labels (:pr:5976) Julia Signell_

.. _v2.12.0 / 2020-03-06:

2.12.0 / 2020-03-06

Array ^^^^^

Improve reuse of temporaries with numpy (:pr:5933) Bruce Merry_
Make map_blocks with block_info produce a Blockwise (:pr:5896) Bruce Merry_
Optimize make_blockwise_graph (:pr:5940) Bruce Merry_
Fix axes ordering in da.tensordot (:pr:5975) Gil Forsyth_
Adds empty mode to array.pad (:pr:5931) Thomas J. Fan_

Core ^^^^

Remove toolz.memoize dependency in dask.utils (:pr:5978) Ryan Grout_
Close pool leaking subprocess (:pr:5979) Tom Augspurger_
Pin numpydoc to 0.8.0 (fix double autoescape) (:pr:5961) Gil Forsyth_
Register deterministic tokenization for range objects (:pr:5947) James Bourbeau_
Unpin msgpack in CI (:pr:5930) James Bourbeau_
Ensure dot results are placed in unique files. (:pr:5937) Elliott Sales de Andrade_
Add remaining optional dependencies to Travis 3.8 CI build environment (:pr:5920) James Bourbeau_

DataFrame ^^^^^^^^^

Skip parquet getitem optimization for some keys (:pr:5917) Tom Augspurger_
Add ignore_index argument to rearrange_by_column code path (:pr:5973) Richard J Zamora_
Add DataFrame and Series memory_usage_per_partition methods (:pr:5971) James Bourbeau_
xfail test_describe when using Pandas 0.24.2 (:pr:5948) James Bourbeau_
Implement dask.dataframe.to_numeric (:pr:5929) Julia Signell_
Add new error message content when columns are in a different order (:pr:5927) Julia Signell_
Use shallow copy for assign operations when possible (:pr:5740) Richard J Zamora_

Documentation ^^^^^^^^^^^^^

Changed above to below in dask.array.triu docs (:pr:5984) Henrik Andersson_
Array slicing: fix typo in slice_with_int_dask_array error message (:pr:5981) Gabe Joseph_
Grammar and formatting updates to docstrings (:pr:5963) James Lamb_
Update develop doc with conda option (:pr:5939) Ray Bell_
Update title of DataFrame extension docs (:pr:5954) James Bourbeau_
Fixed typos in documentation (:pr:5962) James Lamb_
Add original class or module as a kwarg on _bind_* methods (:pr:5946) Julia Signell_
Add collect list example (:pr:5938) Ray Bell_
Update optimization doc for python 3 (:pr:5926) Julia Signell_

.. _v2.11.0 / 2020-02-19:

2.11.0 / 2020-02-19

Array ^^^^^

Cache result of Array.shape (:pr:5916) Bruce Merry_
Improve accuracy of estimate_graph_size for rechunk (:pr:5907) Bruce Merry_
Skip rechunk steps that do not alter chunking (:pr:5909) Bruce Merry_
Support dtype and other kwargs in coarsen (:pr:5903) Matthew Rocklin_
Push chunk override from map_blocks into blockwise (:pr:5895) Bruce Merry_
Avoid using rewrite_blockwise for a singleton (:pr:5890) Bruce Merry_
Optimize slices_from_chunks (:pr:5891) Bruce Merry_
Avoid unnecessary __getitem__ in block() when chunks have correct dimensionality (:pr:5884) Thomas Robitaille_

Bag ^^^

Add include_path option for dask.bag.read_text (:pr:5836) Yifan Gu_
Fixes ValueError in delayed execution of bagged NumPy array (:pr:5828) Surya Avala_

Core ^^^^

CI: Pin msgpack (:pr:5923) Tom Augspurger_
Rename test_inner to test_outer (:pr:5922) Shiva Raisinghani_
quote should quote dicts too (:pr:5905) Bruce Merry_
Register a normalizer for literal (:pr:5898) Bruce Merry_
Improve layer name synthesis for non-HLGs (:pr:5888) Bruce Merry_
Replace flake8 pre-commit-hook with upstream (:pr:5892) Julia Signell_
Call pip as a module to avoid warnings (:pr:5861) Cyril Shcherbin_
Close ThreadPool at exit (:pr:5852) Tom Augspurger_
Remove dask.dataframe import in tokenization code (:pr:5855) James Bourbeau_

DataFrame ^^^^^^^^^

Require pandas>=0.23 (:pr:5883) Tom Augspurger_
Remove lambda from dataframe aggregation (:pr:5901) Matthew Rocklin_
Fix exception chaining in dataframe/__init__.py (:pr:5882) Ram Rachum_
Add support for reductions on empty dataframes (:pr:5804) Shiva Raisinghani_
Expose sort= argument for groupby (:pr:5801) Richard J Zamora_
Add df.empty property (:pr:5711) rockwellw_
Use parquet read speed-ups from fastparquet.api.paths_to_cats. (:pr:5821) Igor Gotlibovych_

Documentation ^^^^^^^^^^^^^

Deprecate doc_wraps (:pr:5912) Tom Augspurger_
Update array internal design docs for HighLevelGraph era (:pr:5889) Bruce Merry_
Move over dashboard connection docs (:pr:5877) Matthew Rocklin_
Move prometheus docs from distributed.dask.org (:pr:5876) Matthew Rocklin_
Removing duplicated DO block at the end (:pr:5878) K.-Michael Aye_
map_blocks see also (:pr:5874) Tom Augspurger_
More derived from (:pr:5871) Julia Signell_
Fix typo (:pr:5866) Yetunde Dada_
Fix typo in cloud.rst (:pr:5860) Andrew Thomas_
Add note pointing to code of conduct and diversity statement (:pr:5844) Matthew Rocklin_

.. _v2.10.1 / 2020-01-30:

2.10.1 / 2020-01-30

Fix Pandas 1.0 version comparison (:pr:5851) Tom Augspurger_
Fix typo in distributed diagnostics documentation (:pr:5841) Gerrit Holl_

.. _v2.10.0 / 2020-01-28:

2.10.0 / 2020-01-28

Support for pandas 1.0's new BooleanDtype and StringDtype (:pr:5815) Tom Augspurger_
Compatibility with pandas 1.0's API breaking changes and deprecations (:pr:5792) Tom Augspurger_
Fixed non-deterministic tokenization of some extension-array backed pandas objects (:pr:5813) Tom Augspurger_
Fixed handling of dataclass class objects in collections (:pr:5812) Matteo De Wint_
Fixed resampling with tz-aware dates when one of the endpoints fell in a non-existent time (:pr:5807) dfonnegra_
Delay initial Zarr dataset creation until the computation occurs (:pr:5797) Chris Roat_
Use parquet dataset statistics in more cases with the pyarrow engine (:pr:5799) Richard J Zamora_
Fixed exception in groupby.std() when some of the keys were large integers (:pr:5737) H. Thomson Comer_

.. _v2.9.2 / 2020-01-16:

2.9.2 / 2020-01-16

Array ^^^^^

Unify chunks in broadcast_arrays (:pr:5765) Matthew Rocklin_

Core ^^^^

xfail CSV encoding tests (:pr:5791) Tom Augspurger_
Update order to handle empty dask graph (:pr:5789) James Bourbeau_
Redo dask.order.order (:pr:5646) Erik Welch_

DataFrame ^^^^^^^^^

Add transparent compression for on-disk shuffle with partd (:pr:5786) Christian Wesp_
Fix repr for empty dataframes (:pr:5781) Shiva Raisinghani_
Pandas 1.0.0RC0 compat (:pr:5784) Tom Augspurger_
Remove buggy assertions (:pr:5783) Tom Augspurger_
Pandas 1.0 compat (:pr:5782) Tom Augspurger_
Fix bug in pyarrow-based read_parquet on partitioned datasets (:pr:5777) Richard J Zamora_
Compat for pandas 1.0 (:pr:5779) Tom Augspurger_
Fix groupby/mean error with categorical index (:pr:5776) Richard J Zamora_
Support empty partitions when performing cumulative aggregation (:pr:5730) Matthew Rocklin_
set_index accepts single-item unnested list (:pr:5760) Wes Roach_
Fixed partitioning in set index for ordered Categorical (:pr:5715) Tom Augspurger_

Documentation ^^^^^^^^^^^^^

Note additional use case for normalize_token.register (:pr:5766) Thomas A Caswell_
Update bag repartition docstring (:pr:5772) Timost_
Small typos (:pr:5771) Maarten Breddels_
Fix typo in Task Expectations docs (:pr:5767) James Bourbeau_
Add docs section on task expectations to graph page (:pr:5764) Devin Petersohn_

.. _v2.9.1 / 2019-12-27:

2.9.1 / 2019-12-27

Array ^^^^^

Support Array.view with dtype=None (:pr:5736) Anderson Banihirwe_
Add dask.array.nanmedian (:pr:5684) Deepak Cherian_

Core ^^^^

xfail test_temporary_directory on Python 3.8 (:pr:5734) James Bourbeau_
Add support for Python 3.8 (:pr:5603) James Bourbeau_
Use id to dedupe constants in rewrite_blockwise (:pr:5696) Jim Crist_

DataFrame ^^^^^^^^^

Raise error when converting a dask dataframe scalar to a boolean (:pr:5743) James Bourbeau_
Ensure dataframe groupby-variance is greater than zero (:pr:5728) Matthew Rocklin_
Fix DataFrame.iter (:pr:5719) Tom Augspurger_
Support Parquet filters in disjunctive normal form, like PyArrow (:pr:5656) Matteo De Wint_
Auto-detect categorical columns in ArrowEngine-based read_parquet (:pr:5690) Richard J Zamora_
Skip parquet getitem optimization tests if no engine found (:pr:5697) James Bourbeau_
Fix independent optimization of parquet-getitem (:pr:5613) Tom Augspurger_

Documentation ^^^^^^^^^^^^^

Update helm config doc (:pr:5750) Ray Bell_
Link to examples.dask.org in several places (:pr:5733) Tom Augspurger_
Add missing " in performance report example (:pr:5724) James Bourbeau_
Resolve several documentation build warnings (:pr:5685) James Bourbeau_
add info on performance_report (:pr:5713) Benjamin Zaitlen_
Add more docs disclaimers (:pr:5710) Julia Signell_
Fix simple typo: wihout -> without (:pr:5708) Tim Gates_
Update numpydoc dependency (:pr:5694) James Bourbeau_

.. _v2.9.0 / 2019-12-06:

2.9.0 / 2019-12-06

Array ^^^^^

Fix da.std to work with NumPy arrays (:pr:5681) James Bourbeau_

Core ^^^^

Register sizeof functions for Numba and RMM (:pr:5668) John A Kirkham_
Update meeting time (:pr:5682) Tom Augspurger_

DataFrame ^^^^^^^^^

Modify dd.DataFrame.drop to use shallow copy (:pr:5675) Richard J Zamora_
Fix bug in _get_md_row_groups (:pr:5673) Richard J Zamora_
Close sqlalchemy engine after querying DB (:pr:5629) Krishan Bhasin_
Allow dd.map_partitions to not enforce meta (:pr:5660) Matthew Rocklin_
Generalize concat_unindexed_dataframes to support cudf-backend (:pr:5659) Richard J Zamora_
Add dataframe resample methods (:pr:5636) Benjamin Zaitlen_
Compute length of dataframe as length of first column (:pr:5635) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

Doc fixup (:pr:5665) James Bourbeau_
Update doc build instructions (:pr:5640) James Bourbeau_
Fix ADL link (:pr:5639) Ray Bell_
Add documentation build (:pr:5617) James Bourbeau_

.. _v2.8.1 / 2019-11-22:

2.8.1 / 2019-11-22

Array ^^^^^

Use auto rechunking in da.rechunk if no value given (:pr:5605) Matthew Rocklin_

Core ^^^^

Add simple action to activate GH actions (:pr:5619) James Bourbeau_

DataFrame ^^^^^^^^^

Fix "file_path_0" bug in aggregate_row_groups (:pr:5627) Richard J Zamora_
Add chunksize argument to read_parquet (:pr:5607) Richard J Zamora_
Change test_repartition_npartitions to support aarch64 architecture (:pr:5620) ossdev07_
Categories lost after groupby + agg (:pr:5423) Oliver Hofkens_
Fixed relative path issue with parquet metadata file (:pr:5608) Nuno Gomes Silva_
Enable gpu-backed covariance/correlation in dataframes (:pr:5597) Richard J Zamora_

Documentation ^^^^^^^^^^^^^

Fix institutional faq and unknown doc warnings (:pr:5616) James Bourbeau_
Add doc for some utils (:pr:5609) Tom Augspurger_
Removes html_extra_path (:pr:5614) James Bourbeau_
Fixed See Also reference (:pr:5612) Tom Augspurger_

.. _v2.8.0 / 2019-11-14:

2.8.0 / 2019-11-14

Array ^^^^^

Implement complete dask.array.tile function (:pr:5574) Bouwe Andela_
Add median along an axis with automatic rechunking (:pr:5575) Matthew Rocklin_
Allow da.asarray to chunk inputs (:pr:5586) Matthew Rocklin_

Bag ^^^

Use key_split in Bag name (:pr:5571) Matthew Rocklin_

Core ^^^^

Switch Doctests to Py3.7 (:pr:5573) Ryan Nazareth_
Relax get_colors test to adapt to new Bokeh release (:pr:5576) Matthew Rocklin_
Add dask.blockwise.fuse_roots optimization (:pr:5451) Matthew Rocklin_
Add sizeof implementation for small dicts (:pr:5578) Matthew Rocklin_
Update fsspec, gcsfs, s3fs (:pr:5588) Tom Augspurger_

DataFrame ^^^^^^^^^

Add dropna argument to groupby (:pr:5579) Richard J Zamora_
Revert "Remove import of dask_cudf, which is now a part of cudf (:pr:5568)" (:pr:5590) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

Add best practice for dask.compute function (:pr:5583) Matthew Rocklin_
Create FUNDING.yml (:pr:5587) Gina Helfrich_
Add screencast for coordination primitives (:pr:5593) Matthew Rocklin_
Move funding to .github repo (:pr:5589) Tom Augspurger_
Update calendar link (:pr:5569) Tom Augspurger_

.. _v2.7.0 / 2019-11-08:

2.7.0 / 2019-11-08

This release drops support for Python 3.5.

Array ^^^^^

Reuse code for assert_eq util method (:pr:5496) Vijayant_
Update da.array to always return a dask array (:pr:5510) James Bourbeau_
Skip transpose on trivial inputs (:pr:5523) Ryan Abernathey_
Avoid NumPy scalar string representation in tokenize (:pr:5527) James Bourbeau_
Remove unnecessary tiledb shape constraint (:pr:5545) Norman Barker_
Removes bytes from sparse array HTML repr (:pr:5556) James Bourbeau_

Core ^^^^

Drop Python 3.5 (:pr:5528) James Bourbeau_
Update the use of fixtures in distributed tests (:pr:5497) Matthew Rocklin_
Changed deprecated bokeh-port to dashboard-address (:pr:5507) darindf_
Avoid updating with identical dicts in ensure_dict (:pr:5501) James Bourbeau_
Test Upstream (:pr:5516) Tom Augspurger_
Accelerate reverse_dict (:pr:5479) Ryan Grout_
Update test_imports.sh (:pr:5534) James Bourbeau_
Support cgroups limits on cpu count in multiprocess and threaded schedulers (:pr:5499) Albert DeFusco_
Update minimum pyarrow version on CI (:pr:5562) James Bourbeau_
Make cloudpickle optional (:pr:5511) crusaderky_

DataFrame ^^^^^^^^^

Add an example of index_col usage (:pr:3072) Bruno Bonfils_
Explicitly use iloc for row indexing (:pr:5500) Krishan Bhasin_
Accept dask arrays on columns assignment (:pr:5224) Henrique Ribeiro_
Implement unique and value_counts for SeriesGroupBy (:pr:5358) Scott Sievert_
Add sizeof definition for pyarrow tables and columns (:pr:5522) Richard J Zamora_
Enable row-group task partitioning in pyarrow-based read_parquet (:pr:5508) Richard J Zamora_
Removes npartitions='auto' from dd.merge docstring (:pr:5531) James Bourbeau_
Apply enforce error message shows non-overlapping columns. (:pr:5530) Tom Augspurger_
Optimize meta_nonempty for repetitive dtypes (:pr:5553) Petio Petrov_
Remove import of dask_cudf, which is now a part of cudf (:pr:5568) Mads R. B. Kristensen_

Documentation ^^^^^^^^^^^^^

Make capitalization more consistent in FAQ docs (:pr:5512) Matthew Rocklin_
Add CONTRIBUTING.md (:pr:5513) Jacob Tomlinson_
Document optional dependencies (:pr:5456) Prithvi MK_
Update helm chart docs to reflect new chart repo (:pr:5539) Jacob Tomlinson_
Add Resampler to API docs (:pr:5551) James Bourbeau_
Fix typo in read_sql_table (:pr:5554) Eric Dill_
Add adaptive deployments screencast [skip ci] (:pr:5566) Matthew Rocklin_

.. _v2.6.0 / 2019-10-15:

2.6.0 / 2019-10-15

Core ^^^^

Call ensure_dict on graphs before entering toolz.merge (:pr:5486) Matthew Rocklin_
Consolidating hash dispatch functions (:pr:5476) Richard J Zamora_

DataFrame ^^^^^^^^^

Support Python 3.5 in Parquet code (:pr:5491) Benjamin Zaitlen_
Avoid identity check in warn_dtype_mismatch (:pr:5489) Tom Augspurger_
Enable unused groupby tests (:pr:3480) Jörg Dietrich_
Remove old parquet and bcolz dataframe optimizations (:pr:5484) Matthew Rocklin_
Add getitem optimization for read_parquet (:pr:5453) Tom Augspurger_
Use _constructor_sliced method to determine Series type (:pr:5480) Richard J Zamora_
Fix map(series) for unsorted base series index (:pr:5459) Justin Waugh_
Fix KeyError with Groupby label (:pr:5467) Ryan Nazareth_

Documentation ^^^^^^^^^^^^^

Use Zoom meeting instead of appear.in (:pr:5494) Matthew Rocklin_
Added curated list of resources (:pr:5460) Javad_
Update SSH docs to include SSHCluster (:pr:5482) Matthew Rocklin_
Update "Why Dask?" page (:pr:5473) Matthew Rocklin_
Fix typos in docstrings (:pr:5469) garanews_

.. _v2.5.2 / 2019-10-04:

2.5.2 / 2019-10-04

Array ^^^^^

Correct chunk size logic for asymmetric overlaps (:pr:5449) Ben Jeffery_
Make da.unify_chunks public API (:pr:5443) Matthew Rocklin_

DataFrame ^^^^^^^^^

Fix dask.dataframe.fillna handling of Scalar object (:pr:5463) Zhenqing Li_

Documentation ^^^^^^^^^^^^^

Remove boxes in Spark comparison page (:pr:5445) Matthew Rocklin_
Add latest presentations (:pr:5446) Javad_
Update cloud documentation (:pr:5444) Matthew Rocklin_

.. _v2.5.0 / 2019-09-27:

2.5.0 / 2019-09-27

Core ^^^^

Add sentinel no_default to get_dependencies task (:pr:5420) James Bourbeau_
Update fsspec version (:pr:5415) Matthew Rocklin_
Remove PY2 checks (:pr:5400) Jim Crist_

DataFrame ^^^^^^^^^

Add option to not check meta in dd.from_delayed (:pr:5436) Christopher J. Wright_
Fix test_timeseries_nulls_in_schema failures with pyarrow master (:pr:5421) Richard J Zamora_
Reduce read_metadata output size in pyarrow/parquet (:pr:5391) Richard J Zamora_
Test numeric edge case for repartition with npartitions. (:pr:5433) amerkel2_
Unxfail pandas-datareader test (:pr:5430) Tom Augspurger_
Add DataFrame.pop implementation (:pr:5422) Matthew Rocklin_
Enable merge/set_index for cudf-based dataframes with cupy values (:pr:5322) Richard J Zamora_
drop_duplicates support for positional subset parameter (:pr:5410) Wes Roach_

Documentation ^^^^^^^^^^^^^

Add screencasts to array, bag, dataframe, delayed, futures and setup (:pr:5429) (:pr:5424) Matthew Rocklin_
Fix delimiter parsing documentation (:pr:5428) Mahmut Bulut_
Update overview image (:pr:5404) James Bourbeau_

.. _v2.4.0 / 2019-09-13:

2.4.0 / 2019-09-13

Array ^^^^^

Adds explicit h5py.File mode (:pr:5390) James Bourbeau_
Provides method to compute unknown array chunks sizes (:pr:5312) Scott Sievert_
Ignore runtime warning in Array compute_meta (:pr:5356) estebanag_
Add _meta to Array.__dask_postpersist__ (:pr:5353) Benoit Bovy_
Fixup da.asarray and da.asanyarray for datetime64 dtype and xarray objects (:pr:5334) Stephan Hoyer_
Add shape implementation (:pr:5293) Tom Augspurger_
Add chunktype to array text repr (:pr:5289) James Bourbeau_
Array.random.choice: handle array-like non-arrays (:pr:5283) Gabe Joseph_

Core ^^^^

Remove deprecated code (:pr:5401) Jim Crist_
Fix funcname when vectorized func has no __name__ (:pr:5399) James Bourbeau_
Truncate funcname to avoid long key names (:pr:5383) Matthew Rocklin_
Add support for numpy.vectorize in funcname (:pr:5396) James Bourbeau_
Fixed HDFS upstream test (:pr:5395) Tom Augspurger_
Support numbers and None in parse_bytes/timedelta (:pr:5384) Matthew Rocklin_
Fix tokenizing of subindexes on memmapped numpy arrays (:pr:5351) Henry Pinkard_
Upstream fixups (:pr:5300) Tom Augspurger_

DataFrame ^^^^^^^^^

Allow pandas to cast type of statistics (:pr:5402) Richard J Zamora_
Preserve index dtype after applying dd.pivot_table (:pr:5385) therhaag_
Implement explode for Series and DataFrame (:pr:5381) Arpit Solanki_
set_index on categorical fails with less categories than partitions (:pr:5354) Oliver Hofkens_
Support output to a single CSV file (:pr:5304) Hongjiu Zhang_
Add groupby().transform() (:pr:5327) Oliver Hofkens_
Adding filter kwarg to pyarrow dataset call (:pr:5348) Richard J Zamora_
Implement and check compression defaults for parquet (:pr:5335) Sarah Bird_
Pass sqlalchemy params to delayed objects (:pr:5332) Arpit Solanki_
Fixing schema handling in arrow-parquet (:pr:5307) Richard J Zamora_
Add support for DF and Series groupby().idxmin/max() (:pr:5273) Oliver Hofkens_
Add correlation calculation and add test (:pr:5296) Benjamin Zaitlen_

Documentation ^^^^^^^^^^^^^

Numpy docstring standard has moved (:pr:5405) Wes Roach_
Reference correct NumPy array name (:pr:5403) Wes Roach_
Minor edits to Array chunk documentation (:pr:5372) Scott Sievert_
Add methods to API docs (:pr:5387) Tom Augspurger_
Add namespacing to configuration example (:pr:5374) Matthew Rocklin_
Add get_task_stream and profile to the diagnostics page (:pr:5375) Matthew Rocklin_
Add best practice to load data with Dask (:pr:5369) Matthew Rocklin_
Update institutional-faq.rst (:pr:5345) DomHudson_
Add threads and processes note to the best practices (:pr:5340) Matthew Rocklin_
Update cuDF links (:pr:5328) James Bourbeau_
Fixed small typo with parentheses placement (:pr:5311) Eugene Huang_
Update link in reshape docstring (:pr:5297) James Bourbeau_

.. _v2.3.0 / 2019-08-16:

2.3.0 / 2019-08-16

Array ^^^^^

Raise exception when from_array is given a dask array (:pr:5280) David Hoese_
Avoid adjusting gufunc's meta dtype twice (:pr:5274) Peter Andreas Entschev_
Add meta= keyword to map_blocks and add test with sparse (:pr:5269) Matthew Rocklin_
Add rollaxis and moveaxis (:pr:4822) Tobias de Jong_
Always increment old chunk index (:pr:5256) James Bourbeau_
Shuffle dask array (:pr:3901) Tom Augspurger_
Fix ordering when indexing a dask array with a bool dask array (:pr:5151) James Bourbeau_

Bag ^^^

Add workaround for memory leaks in bag generators (:pr:5208) Marco Neumann_

Core ^^^^

Set strict xfail option (:pr:5220) James Bourbeau_
test-upstream (:pr:5267) Tom Augspurger_
Fixed HDFS CI failure (:pr:5234) Tom Augspurger_
Error nicely if no file size inferred (:pr:5231) Jim Crist_
A few changes to config.set (:pr:5226) Jim Crist_
Fixup black string normalization (:pr:5227) Jim Crist_
Pin NumPy in windows tests (:pr:5228) Jim Crist_
Ensure parquet tests are skipped if fastparquet and pyarrow not installed (:pr:5217) James Bourbeau_
Add fsspec to readthedocs (:pr:5207) Matthew Rocklin_
Bump NumPy and Pandas to 1.17 and 0.25 in CI test (:pr:5179) John A Kirkham_

DataFrame ^^^^^^^^^

Fix DataFrame.query docstring (incorrect numexpr API) (:pr:5271) Doug Davis_
Parquet metadata-handling improvements (:pr:5218) Richard J Zamora_
Improve messaging around sorted parquet columns for index (:pr:5265) Martin Durant_
Add rearrange_by_divisions and set_index support for cudf (:pr:5205) Richard J Zamora_
Fix groupby.std() with integer column names (:pr:5096) Nicolas Hug_
Add Series.__iter__ (:pr:5071) Blane_
Generalize hash_pandas_object to work for non-pandas backends (:pr:5184) GALI PREM SAGAR_
Add rolling cov (:pr:5154) Ivars Geidans_
Add columns argument in drop function (:pr:5223) Henrique Ribeiro_

Documentation ^^^^^^^^^^^^^

Update institutional FAQ doc (:pr:5277) Matthew Rocklin_
Add draft of institutional FAQ (:pr:5214) Matthew Rocklin_
Make boxes for dask-spark page (:pr:5249) Martin Durant_
Add motivation for shuffle docs (:pr:5213) Matthew Rocklin_
Fix links and API entries for best-practices (:pr:5246) Martin Durant_
Remove "bytes" (internal data ingestion) doc page (:pr:5242) Martin Durant_
Redirect from our local distributed page to distributed.dask.org (:pr:5248) Matthew Rocklin_
Cleanup API page (:pr:5247) Matthew Rocklin_
Remove excess endlines from install docs (:pr:5243) Matthew Rocklin_
Remove item list in phases of computation doc (:pr:5245) Martin Durant_
Remove custom graphs from the TOC sidebar (:pr:5241) Matthew Rocklin_
Remove experimental status of custom collections (:pr:5236) James Bourbeau_
Adds table of contents to Why Dask? (:pr:5244) James Bourbeau_
Moves bag overview to top-level bag page (:pr:5240) James Bourbeau_
Remove use-cases in favor of stories.dask.org (:pr:5238) Matthew Rocklin_
Removes redundant TOC information in index.rst (:pr:5235) James Bourbeau_
Elevate dashboard in distributed diagnostics documentation (:pr:5239) Martin Durant_
Updates "add" layer in HLG docs example (:pr:5237) James Bourbeau_
Update GUFunc documentation (:pr:5232) Matthew Rocklin_

.. _v2.2.0 / 2019-08-01:

2.2.0 / 2019-08-01

Array ^^^^^

Use da.from_array(..., asarray=False) if input follows NEP-18 (:pr:5074) Matthew Rocklin_
Add missing attributes to from_array documentation (:pr:5108) Peter Andreas Entschev_
Fix meta computation for some reduction functions (:pr:5035) Peter Andreas Entschev_
Raise informative error in to_zarr if unknown chunks (:pr:5148) James Bourbeau_
Remove invalid pad tests (:pr:5122) Tom Augspurger_
Ignore NumPy warnings in compute_meta (:pr:5103) Peter Andreas Entschev_
Fix kurtosis calc for single dimension input array (:pr:5177) @andrethrill_
Support Numpy 1.17 in tests (:pr:5192) Matthew Rocklin_

Bag ^^^

Supply pool to bag test to resolve intermittent failure (:pr:5172) Tom Augspurger_

Core ^^^^

Base dask on fsspec (:pr:5064) (:pr:5121) Martin Durant_
Various upstream compatibility fixes (:pr:5056) Tom Augspurger_
Make distributed tests optional again. (:pr:5128) Elliott Sales de Andrade_
Fix HDFS in dask (:pr:5130) Martin Durant_
Ignore some more invalid value warnings. (:pr:5140) Elliott Sales de Andrade_

DataFrame ^^^^^^^^^

Fix pd.MultiIndex size estimate (:pr:5066) Brett Naul_
Generalizing has_known_categories (:pr:5090) GALI PREM SAGAR_
Refactor Parquet engine (:pr:4995) Richard J Zamora_
Add divide method to series and dataframe (:pr:5094) msbrown47_
fix flaky partd test (:pr:5111) Tom Augspurger_
Adjust is_dataframe_like to adjust for value_counts change (:pr:5143) Tom Augspurger_
Generalize rolling windows to support non-Pandas dataframes (:pr:5149) Nick Becker_
Avoid unnecessary aggregation in pivot_table (:pr:5173) Daniel Saxton_
Add column names to apply_and_enforce error message (:pr:5180) Matthew Rocklin_
Add schema keyword argument to to_parquet (:pr:5150) Sarah Bird_
Remove recursion error in accessors (:pr:5182) Jim Crist_
Allow fastparquet to handle gather_statistics=False for file lists (:pr:5157) Richard J Zamora_

Documentation ^^^^^^^^^^^^^

Adds NumFOCUS badge to the README (:pr:5086) James Bourbeau_
Update developer docs [ci skip] (:pr:5093) Jim Crist_
Document DataFrame.set_index computation behavior Natalya Rapstine_
Use pip install . instead of calling setup.py (:pr:5139) Matthias Bussonier_
Close user survey (:pr:5147) Tom Augspurger_
Fix Google Calendar meeting link (:pr:5155) Loïc Estève_
Add docker image customization example (:pr:5171) James Bourbeau_
Update remote-data-services after fsspec (:pr:5170) Martin Durant_
Fix typo in spark.rst (:pr:5164) Xavier Holt_
Update setup/python docs for async/await API (:pr:5163) Matthew Rocklin_
Update Local Storage HPC documentation (:pr:5165) Matthew Rocklin_

.. _v2.1.0 / 2019-07-08:

2.1.0 / 2019-07-08

Array ^^^^^

Add recompute= keyword to svd_compressed for lower-memory use (:pr:5041) Matthew Rocklin_
Change __array_function__ implementation for backwards compatibility (:pr:5043) Ralf Gommers_
Added dtype and shape kwargs to apply_along_axis (:pr:3742) Davis Bennett_
Fix reduction with empty tuple axis (:pr:5025) Peter Andreas Entschev_
Drop size 0 arrays in stack (:pr:4978) John A Kirkham_

Core ^^^^

Removes index keyword from pandas to_parquet call (:pr:5075) James Bourbeau_
Fixes upstream dev CI build installation (:pr:5072) James Bourbeau_
Ensure scalar arrays are not rendered to SVG (:pr:5058) Willi Rath_
Environment creation overhaul (:pr:5038) Tom Augspurger_
s3fs, moto compatibility (:pr:5033) Tom Augspurger_
pytest 5.0 compat (:pr:5027) Tom Augspurger_

DataFrame ^^^^^^^^^

Fix compute_meta recursion in blockwise (:pr:5048) Peter Andreas Entschev_
Remove hard dependency on pandas in get_dummies (:pr:5057) GALI PREM SAGAR_
Check dtypes unchanged when using DataFrame.assign (:pr:5047) asmith26_
Fix cumulative functions on tables with more than 1 partition (:pr:5034) tshatrov_
Handle non-divisible sizes in repartition (:pr:5013) George Sakkis_
Handles timestamp and preserve_index changes in pyarrow (:pr:5018) Richard J Zamora_
Fix undefined meta for str.split(expand=False) (:pr:5022) Brett Naul_
Removed checks used for debugging merge_asof (:pr:5011) Cody Johnson_
Don't use type when getting accessor in dataframes (:pr:4992) Matthew Rocklin_
Add melt as a method of Dask DataFrame (:pr:4984) Dustin Tindall_
Adds path-like support to to_hdf (:pr:5003) James Bourbeau_

Documentation ^^^^^^^^^^^^^

Point to latest K8s setup article in JupyterHub docs (:pr:5065) Sean McKenna_
Changes vizualize to visualize (:pr:5061) David Brochart_
Fix from_sequence typo in delayed best practices (:pr:5045) James Bourbeau_
Add user survey link to docs (:pr:5026) James Bourbeau_
Fixes typo in optimization docs (:pr:5015) James Bourbeau_
Update community meeting information (:pr:5006) Tom Augspurger_

.. _v2.0.0 / 2019-06-25:

2.0.0 / 2019-06-25

Array ^^^^^

Support automatic chunking in da.indices (:pr:4981) James Bourbeau_
Err if there are no arrays to stack (:pr:4975) John A Kirkham_
Asymmetrical Array Overlap (:pr:4863) Michael Eaton_
Dispatch concatenate where possible within dask array (:pr:4669) Hameer Abbasi_
Fix tokenization of memmapped numpy arrays on different part of same file (:pr:4931) Henry Pinkard_
Preserve NumPy condition in da.asarray to preserve output shape (:pr:4945) Alistair Miles_
Expand foo_like_safe usage (:pr:4946) Peter Andreas Entschev_
Defer order/casting einsum parameters to NumPy implementation (:pr:4914) Peter Andreas Entschev_
Remove numpy warning in moment calculation (:pr:4921) Matthew Rocklin_
Fix meta_from_array to support Xarray test suite (:pr:4938) Matthew Rocklin_
Cache chunk boundaries for integer slicing (:pr:4923) Bruce Merry_
Drop size 0 arrays in concatenate (:pr:4167) John A Kirkham_
Raise ValueError if concatenate is given no arrays (:pr:4927) John A Kirkham_
Promote types in concatenate using _meta (:pr:4925) John A Kirkham_
Add chunk type to html repr in Dask array (:pr:4895) Matthew Rocklin_
Add Dask Array.meta attribute (:pr:4543) Peter Andreas Entschev_
- Fix meta slicing of flexible types (:pr:4912) Peter Andreas Entschev_
- Minor meta construction cleanup in concatenate (:pr:4937) Peter Andreas Entschev_
- Further relax Array meta checks for Xarray (:pr:4944) Matthew Rocklin_
- Support meta= keyword in da.from_delayed (:pr:4972) Matthew Rocklin_
- Concatenate meta along axis (:pr:4977) John A Kirkham_
- Use meta in stack (:pr:4976) John A Kirkham_
- Move blockwise_meta to more general compute_meta function (:pr:4954) Matthew Rocklin_
Alias .partitions to .blocks attribute of dask arrays (:pr:4853) Genevieve Buckley_
Drop outdated numpy_compat functions (:pr:4850) John A Kirkham_
Allow da.eye to support arbitrary chunking sizes with chunks='auto' (:pr:4834) Anderson Banihirwe_
Fix CI warnings in dask.array tests (:pr:4805) Tom Augspurger_
Make map_blocks work with drop_axis + block_info (:pr:4831) Bruce Merry_
Add SVG image and table in Array.repr_html (:pr:4794) Matthew Rocklin_
ufunc: avoid array_wrap in favor of array_function (:pr:4708) Peter Andreas Entschev_
Ensure trivial padding returns the original array (:pr:4990) John A Kirkham_
Test da.block with 0-size arrays (:pr:4991) John A Kirkham_

Core ^^^^

Drop Python 2.7 (:pr:4919) Jim Crist_
Quiet dependency installs in CI (:pr:4960) Tom Augspurger_
Raise on warnings in tests (:pr:4916) Tom Augspurger_
Add a diagnostics extra to setup.py (includes bokeh) (:pr:4924) John A Kirkham_
Add newline delimiter keyword to OpenFile (:pr:4935) btw08_
Overload HighLevelGraphs values method (:pr:4918) James Bourbeau_
Add await method to Dask collections (:pr:4901) Matthew Rocklin_
Also ignore AttributeErrors which may occur if snappy (not python-snappy) is installed (:pr:4908) Mark Bell_
Canonicalize key names in config.rename (:pr:4903) Ian Bolliger_
Bump minimum partd to 0.3.10 (:pr:4890) Tom Augspurger_
Catch async def SyntaxError (:pr:4836) James Bourbeau_
catch IOError in ensure_file (:pr:4806) Justin Poehnelt_
Cleanup CI warnings (:pr:4798) Tom Augspurger_
Move distributed's parse and format functions to dask.utils (:pr:4793) Matthew Rocklin_
Apply black formatting (:pr:4983) James Bourbeau_
Package license file in wheels (:pr:4988) John A Kirkham_

DataFrame ^^^^^^^^^

Add an optional partition_size parameter to repartition (:pr:4416) George Sakkis_
merge_asof and prefix_reduction (:pr:4877) Cody Johnson_
Allow dataframes to be indexed by dask arrays (:pr:4882) Endre Mark Borza_
Avoid deprecated message parameter in pytest.raises (:pr:4962) James Bourbeau_
Update test_to_records to test with lengths argument (:pr:4515) asmith26_
Remove pandas pinning in Dataframe accessors (:pr:4955) Matthew Rocklin_
Fix correlation of series with same names (:pr:4934) Philipp S. Sommer_
Map Dask Series to Dask Series (:pr:4872) Justin Waugh_
Warn in dd.merge on dtype warning (:pr:4917) mcsoini_
Add groupby Covariance/Correlation (:pr:4889) Benjamin Zaitlen_
keep index name with to_datetime (:pr:4905) Ian Bolliger_
Add Parallel variance computation for dataframes (:pr:4865) Ksenia Bobrova_
Add divmod implementation to arrays and dataframes (:pr:4884) Henrique Ribeiro_
Add documentation for dataframe reshape methods (:pr:4896) tpanza_
Avoid use of pandas.compat (:pr:4881) Tom Augspurger_
Added accessor registration for Series, DataFrame, and Index (:pr:4829) Tom Augspurger_
Add read_function keyword to read_json (:pr:4810) Richard J Zamora_
Provide full type name in check_meta (:pr:4819) Matthew Rocklin_
Correctly estimate bytes per row in read_sql_table (:pr:4807) Lijo Jose_
Adding support of non-numeric data to describe() (:pr:4791) Ksenia Bobrova_
Scalars for extension dtypes. (:pr:4459) Tom Augspurger_
Call head before compute in dd.from_delayed (:pr:4802) Matthew Rocklin_
Add support for rolling operations with a window larger than the partition size in DataFrames with a time-based index (:pr:4796) Jorge Pessoa_
Update groupby-apply doc with warning (:pr:4800) Tom Augspurger_
Change groupby-ness tests in _maybe_slice (:pr:4786) Benjamin Zaitlen_
Add master best practices document (:pr:4745) Matthew Rocklin_
Add document for how Dask works with GPUs (:pr:4792) Matthew Rocklin_
Add cli API docs (:pr:4788) James Bourbeau_
Ensure concat output has coherent dtypes (:pr:4692) Guillaume Lemaitre_
Fixes pandas_datareader dependencies installation (:pr:4989) James Bourbeau_
Accept pathlib.Path as pattern in read_hdf (:pr:3335) Jörg Dietrich_

Documentation ^^^^^^^^^^^^^

Move CLI API docs to relevant pages (:pr:4980) James Bourbeau_
Add to_datetime function to dataframe API docs Matthew Rocklin_
Add documentation entry for dask.array.ma.average (:pr:4970) Bouwe Andela_
Add bag.read_avro to bag API docs (:pr:4969) James Bourbeau_
Fix typo (:pr:4968) mbarkhau_
Docs: Drop support for Python 2.7 (:pr:4932) Hugo_
Remove requirement to modify changelog (:pr:4915) Matthew Rocklin_
Add documentation about meta column order (:pr:4887) Tom Augspurger_
Add documentation note in DataFrame.shift (:pr:4886) Tom Augspurger_
Docs: Fix typo (:pr:4868) Paweł Kordek_
Put do/don't into boxes for delayed best practice docs (:pr:3821) Martin Durant_
Doc fixups (:pr:2528) Tom Augspurger_
Add quansight to paid support doc section (:pr:4838) Martin Durant_
Add document for custom startup (:pr:4833) Matthew Rocklin_
Allow utils.derive_from to accept functions, apply across array (:pr:4804) Martin Durant_
Add "Avoid Large Partitions" section to best practices (:pr:4808) Matthew Rocklin_
Update URL for joblib to new website hosting their doc (:pr:4816) Christian Hudon_

.. _v1.2.2 / 2019-05-08:

1.2.2 / 2019-05-08

Array ^^^^^

Clarify regions kwarg to array.store (:pr:4759) Martin Durant_
Add dtype= parameter to da.random.randint (:pr:4753) Matthew Rocklin_
Use "row major" rather than "C order" in docstring (:pr:4452) @asmith26_
Normalize Xarray datasets to Dask arrays (:pr:4756) Matthew Rocklin_
Remove normed keyword in da.histogram (:pr:4755) Matthew Rocklin_

Bag ^^^

Add key argument to Bag.distinct (:pr:4423) Daniel Severo_

Core ^^^^

Add core dask config file (:pr:4774) Matthew Rocklin_
Add core dask config file to MANIFEST.in (:pr:4780) James Bourbeau_
Enabling glob with HTTP file-system (:pr:3926) Martin Durant_
HTTPFile.seek with whence=1 (:pr:4751) Martin Durant_
Remove config key normalization (:pr:4742) Jim Crist_

DataFrame ^^^^^^^^^

Remove explicit references to Pandas in dask.dataframe.groupby (:pr:4778) Matthew Rocklin_
Add support for group_keys kwarg in DataFrame.groupby() (:pr:4771) Brian Chu_
Describe doc (:pr:4762) Martin Durant_
Remove explicit pandas check in cumulative aggregations (:pr:4765) Nick Becker_
Added meta for read_json and test (:pr:4588) Abhinav Ralhan_
Add test for dtype casting (:pr:4760) Martin Durant_
Document alignment in map_partitions (:pr:4757) Jim Crist_
Implement Series.str.split(expand=True) (:pr:4744) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

Tweaks to develop.rst from trying to run tests (:pr:4772) Christian Hudon_
Add document describing phases of computation (:pr:4766) Matthew Rocklin_
Point users to Dask-Yarn from spark documentation (:pr:4770) Matthew Rocklin_
Update images in delayed doc to remove labels (:pr:4768) Martin Durant_
Explain intermediate storage for dask arrays (:pr:4025) John A Kirkham_
Specify bash code-block in array best practices (:pr:4764) James Bourbeau_
Add array best practices doc (:pr:4705) Matthew Rocklin_
Update optimization docs now that cull is not automatic (:pr:4752) Matthew Rocklin_

.. _v1.2.1 / 2019-04-29:

1.2.1 / 2019-04-29

Array ^^^^^

Fix map_blocks with block_info and broadcasting (:pr:4737) Bruce Merry_
Make 'minlength' keyword argument optional in da.bincount (:pr:4684) Genevieve Buckley_
Add support for map_blocks with no array arguments (:pr:4713) Bruce Merry_
Add dask.array.trace (:pr:4717) Danilo Horta_
Add sizeof support for cupy.ndarray (:pr:4715) Peter Andreas Entschev_
Add name kwarg to from_zarr (:pr:4663) Michael Eaton_
Add chunks='auto' to from_array (:pr:4704) Matthew Rocklin_
Raise TypeError if dask array is given as shape for da.ones, zeros, empty or full (:pr:4707) Genevieve Buckley_
Add TileDB backend (:pr:4679) Isaiah Norton_

Core ^^^^

Delay long list arguments (:pr:4735) Matthew Rocklin_
Bump to numpy >= 1.13, pandas >= 0.21.0 (:pr:4720) Jim Crist_
Remove file "test" (:pr:4710) James Bourbeau_
Reenable development build, uses upstream libraries (:pr:4696) Peter Andreas Entschev_
Remove assertion in HighLevelGraph constructor (:pr:4699) Matthew Rocklin_

DataFrame ^^^^^^^^^

Change cum-aggregation last-nonnull-value algorithm (:pr:4736) Nick Becker_
Fixup series-groupby-apply (:pr:4738) Jim Crist_
Refactor array.percentile and dataframe.quantile to use t-digest (:pr:4677) Janne Vuorela_
Allow naive concatenation of sorted dataframes (:pr:4725) Matthew Rocklin_
Fix perf issue in dd.Series.isin (:pr:4727) Jim Crist_
Remove hard pandas dependency for melt by using methodcaller (:pr:4719) Nick Becker_
A few dataframe metadata fixes (:pr:4695) Jim Crist_
Add Dataframe.replace (:pr:4714) Matthew Rocklin_
Add 'thresh' parameter to DataFrame.dropna (:pr:4625) Nathan Matare_

Documentation ^^^^^^^^^^^^^

Add warning about derived docstrings early in the docstring (:pr:4716) Matthew Rocklin_
Create dataframe best practices doc (:pr:4703) Matthew Rocklin_
Uncomment dask_sphinx_theme (:pr:4728) James Bourbeau_
Fix minor typo in a Queue/fire_and_forget example (:pr:4709) Matthew Rocklin_
Update from_pandas docstring to match signature (:pr:4698) James Bourbeau_

.. _v1.2.0 / 2019-04-12:

1.2.0 / 2019-04-12

Array ^^^^^

Fixed mean() and moment() on sparse arrays (:pr:4525) Peter Andreas Entschev_
Add test for NEP-18. (:pr:4675) Hameer Abbasi_
Allow None to say "no chunking" in normalize_chunks (:pr:4656) Matthew Rocklin_
Fix limit value in auto_chunks (:pr:4645) Matthew Rocklin_

Core ^^^^

Updated diagnostic bokeh test for compatibility with bokeh>=1.1.0 (:pr:4680) Philipp Rudiger_
Adjusts codecov's target/threshold, disable patch (:pr:4671) Peter Andreas Entschev_
Always start with empty http buffer, not None (:pr:4673) Martin Durant_

DataFrame ^^^^^^^^^

Propagate index dtype and name when creating a dask dataframe from an array (:pr:4686) Henrique Ribeiro_
Fix ordering of quantiles in describe (:pr:4647) gregrf_
Clean up and document rearrange_column_by_tasks (:pr:4674) Matthew Rocklin_
Mark some parquet tests xfail (:pr:4667) Peter Andreas Entschev_
Fix parquet breakages with arrow 0.13.0 (:pr:4668) Martin Durant_
Allow sample to be False when reading CSV from a remote URL (:pr:4634) Ian Rose_
Fix timezone metadata inference on parquet load (:pr:4655) Martin Durant_
Use is_dataframe/index_like in dd.utils (:pr:4657) Matthew Rocklin_
Add min_count parameter to groupby sum method (:pr:4648) Henrique Ribeiro_
Correct quantile to handle unsorted quantiles (:pr:4650) gregrf_

Documentation ^^^^^^^^^^^^^

Add delayed extra dependencies to install docs (:pr:4660) James Bourbeau_

.. _v1.1.5 / 2019-03-29:

1.1.5 / 2019-03-29

Array ^^^^^

Ensure that we use the dtype keyword in normalize_chunks (:pr:4646) Matthew Rocklin_

Core ^^^^

Use recursive glob in LocalFileSystem (:pr:4186) Brett Naul_
Avoid YAML deprecation (:pr:4603)
Fix CI and add set -e (:pr:4605) James Bourbeau_
Support builtin sequence types in dask.visualize (:pr:4602)
unpack/repack orderedDict (:pr:4623) Justin Poehnelt_
Add da.random.randint to API docs (:pr:4628) James Bourbeau_
Add zarr to CI environment (:pr:4604) James Bourbeau_
Enable codecov (:pr:4631) Peter Andreas Entschev_

DataFrame ^^^^^^^^^

Support setting the index (:pr:4565)
DataFrame.itertuples accepts index, name kwargs (:pr:4593) Dan O'Donovan_
Support non-Pandas series in dd.Series.unique (:pr:4599) Benjamin Zaitlen_
Replace use of explicit type check with ._is_partition_type predicate (:pr:4533)
Remove additional pandas warnings in tests (:pr:4576)
Check object for name/dtype attributes rather than type (:pr:4606)
Fix comparison against pd.Series (:pr:4613) amerkel2_
Fixing warning from setting categorical codes to floats (:pr:4624) Julia Signell_
Fix renaming on index to_frame method (:pr:4498) Henrique Ribeiro_
Fix divisions when joining two single-partition dataframes (:pr:4636) Justin Waugh_
Warn if partitions overlap in compute_divisions (:pr:4600) Brian Chu_
Give informative meta= warning (:pr:4637) Matthew Rocklin_
Add informative error message to Series.getitem (:pr:4638) Matthew Rocklin_
Add clear exception message when using index or index_col in read_csv (:pr:4651) Álvaro Abella Bascarán_

Documentation ^^^^^^^^^^^^^

Add documentation for custom groupby aggregations (:pr:4571)
Docs dataframe joins (:pr:4569)
Specify fork-based contributions (:pr:4619) James Bourbeau_
correct to_parquet example in docs (:pr:4641) Aaron Fowles_
Update and secure several references (:pr:4649) Søren Fuglede Jørgensen_

.. _v1.1.4 / 2019-03-08:

1.1.4 / 2019-03-08

Array ^^^^^

Use mask selection in compress (:pr:4548) John A Kirkham_
Use asarray in extract (:pr:4549) John A Kirkham_
Use correct dtype when testing concatenation (:pr:4539) Elliott Sales de Andrade_
Fix CuPy tests or properly mark them as xfail (:pr:4564) Peter Andreas Entschev_

Core ^^^^

Fix local scheduler callback to deal with custom caching (:pr:4542) Yu Feng_
Use parse_bytes in read_bytes(sample=...) (:pr:4554) Matthew Rocklin_

DataFrame ^^^^^^^^^

Fix up groupby-standard deviation again on object dtype keys (:pr:4541) Matthew Rocklin_
TST/CI: Updates for pandas 0.24.1 (:pr:4551) Tom Augspurger_
Add ability to control number of unique elements in timeseries (:pr:4557) Matthew Rocklin_
Add support in read_csv for parameter skiprows for other iterables (:pr:4560) @JulianWgs_

Documentation ^^^^^^^^^^^^^

DataFrame to Array conversion and unknown chunks (:pr:4516) Scott Sievert_
Add docs for random array creation (:pr:4566) Matthew Rocklin_
Fix typo in docstring (:pr:4572) Shyam Saladi_

.. _v1.1.3 / 2019-03-01:

1.1.3 / 2019-03-01

Array ^^^^^

Modify mean chunk functions to return dicts rather than arrays (:pr:4513) Matthew Rocklin_
Change sparse installation in CI for NumPy/Python2 compatibility (:pr:4537) Matthew Rocklin_

DataFrame ^^^^^^^^^

Make merge dispatchable on pandas/other dataframe types (:pr:4522) Matthew Rocklin_
read_sql_table - datetime index fix and index type checking (:pr:4474) Joe Corbett_
Use generalized form of index checking (is_index_like) (:pr:4531) Benjamin Zaitlen_
Add tests for groupby reductions with object dtypes (:pr:4535) Matthew Rocklin_
Fixes #4467: Update time_series for pandas deprecation (:pr:4530) @HSR05_

Documentation ^^^^^^^^^^^^^

Add missing method to documentation index (:pr:4528) Bart Broere_

.. _v1.1.2 / 2019-02-25:

1.1.2 / 2019-02-25

Array ^^^^^

Fix another unicode/mixed-type edge case in normalize_array (:pr:4489) Marco Neumann_
Add dask.array.diagonal (:pr:4431) Danilo Horta_
Call asanyarray in unify_chunks (:pr:4506) Jim Crist_
Modify moment chunk functions to return dicts (:pr:4519) Peter Andreas Entschev_

Bag ^^^

Don't inline output keys in dask.bag (:pr:4464) Jim Crist_
Ensure that bag.from_sequence always includes at least one partition (:pr:4475) Anderson Banihirwe_
Implement out_type for bag.fold (:pr:4502) Matthew Rocklin_
Remove map from bag keynames (:pr:4500) Matthew Rocklin_
Avoid itertools.repeat in map_partitions (:pr:4507) Matthew Rocklin_

DataFrame ^^^^^^^^^

Fix relative path parsing on windows when using fastparquet (:pr:4445) Janne Vuorela_
Fix bug in pyarrow and hdfs (:pr:4453) (:pr:4455) Michał Jastrzębski_
df getitem with integer slices is not implemented (:pr:4466) Jim Crist_
Replace cudf-specific code with dask-cudf import (:pr:4470) Matthew Rocklin_
Avoid groupby.agg(callable) in groupby-var (:pr:4482) Matthew Rocklin_
Consider uint types as numerical in check_meta (:pr:4485) Marco Neumann_
Fix some typos in groupby comments (:pr:4494) Daniel Saxton_
Add error message around set_index(inplace=True) (:pr:4501) Matthew Rocklin_
meta_nonempty works with categorical index (:pr:4505) Jim Crist_
Add module name to expected meta error message (:pr:4499) Matthew Rocklin_
groupby-nunique works on empty chunk (:pr:4504) Jim Crist_
Propagate index metadata if not specified (:pr:4509) Jim Crist_

Documentation ^^^^^^^^^^^^^

Update docs to use from_zarr (:pr:4472) John A Kirkham_
DOC: add section on Using Other S3-Compatible Services for remote-data-services (:pr:4405) Aploium_
Fix header level of section in changelog (:pr:4483) Bruce Merry_
Add quotes to pip install [skip-ci] (:pr:4508) James Bourbeau_

Core ^^^^

Extend started_cbs AFTER state is initialized (:pr:4460) Marco Neumann_
Fix bug in HTTPFile.fetch_range with headers (:pr:4479) (:pr:4480) Ross Petchler
Repeat optimize_blockwise for diamond fusion (:pr:4492) Matthew Rocklin_

.. _v1.1.1 / 2019-01-31:

1.1.1 / 2019-01-31

Array ^^^^^

Add support for cupy.einsum (:pr:4402) Johnnie Gray_
Provide byte size in chunks keyword (:pr:4434) Adam Beberg_
Raise more informative error for histogram bins and range (:pr:4430) James Bourbeau_

DataFrame ^^^^^^^^^

Lazily register more cudf functions and move to backends file (:pr:4396) Matthew Rocklin_
Fix ORC tests for pyarrow 0.12.0 (:pr:4413) Jim Crist_
rearrange_by_column: ensure that shuffle arg defaults to 'disk' if it's None in dask.config (:pr:4414) George Sakkis_
Implement filters for read_pyarrow (:pr:4415) George Sakkis_
Avoid checking against types in is_dataframe_like (:pr:4418) Matthew Rocklin_
Pass username as 'user' when using pyarrow (:pr:4438) Roma Sokolov_

Delayed ^^^^^^^

Fix DelayedAttr return value (:pr:4440) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

Use SVG for pipeline graphic (:pr:4406) John A Kirkham_
Add doctest-modules to py.test documentation (:pr:4427) Daniel Severo_

Core ^^^^

Work around psutil 5.5.0 not allowing pickling Process objects Janne Vuorela_

.. _v1.1.0 / 2019-01-18:

1.1.0 / 2019-01-18

Array ^^^^^

Fix the average function when there is a masked array (:pr:4236) Damien Garaud_
Add allow_unknown_chunksizes to hstack and vstack (:pr:4287) Paul Vecchio_
Fix tensordot for 27+ dimensions (:pr:4304) Johnnie Gray_
Fixed block_info with axes. (:pr:4301) Tom Augspurger_
Use safe_wraps for matmul (:pr:4346) Mark Harfouche_
Use chunks="auto" in array creation routines (:pr:4354) Matthew Rocklin_
Fix np.matmul in dask.array.Array.array_ufunc (:pr:4363) Stephan Hoyer_
COMPAT: Re-enable multifield copy->view change (:pr:4357) Diane Trout_
Calling np.dtype on a delayed object works (:pr:4387) Jim Crist_
Rework normalize_array for numpy data (:pr:4312) Marco Neumann_

DataFrame ^^^^^^^^^

Add fill_value support for series comparisons (:pr:4250) James Bourbeau_
Add schema name in read_sql_table for empty tables (:pr:4268) Mina Farid_
Adjust check for bad chunks in map_blocks (:pr:4308) Tom Augspurger_
Add dask.dataframe.read_fwf (:pr:4316) @slnguyen_
Use atop fusion in dask dataframe (:pr:4229) Matthew Rocklin_
Use parallel_types() in from_pandas (:pr:4331) Matthew Rocklin_
Change DataFrame.repr_data to method (:pr:4330) Matthew Rocklin_
Install pyarrow fastparquet for Appveyor (:pr:4338) Gábor Lipták_
Remove explicit pandas checks and provide cudf lazy registration (:pr:4359) Matthew Rocklin_
Replace isinstance(..., pandas) with is_dataframe_like (:pr:4375) Matthew Rocklin_
ENH: Support 3rd-party ExtensionArrays (:pr:4379) Tom Augspurger_
Pandas 0.24.0 compat (:pr:4374) Tom Augspurger_

Documentation ^^^^^^^^^^^^^

Fix link to 'map_blocks' function in array api docs (:pr:4258) David Hoese_
Add a paragraph on Dask-Yarn in the cloud docs (:pr:4260) Jim Crist_
Copy edit documentation (:pr:4267), (:pr:4263), (:pr:4262), (:pr:4277), (:pr:4271), (:pr:4279), (:pr:4265), (:pr:4295), (:pr:4293), (:pr:4296), (:pr:4302), (:pr:4306), (:pr:4318), (:pr:4314), (:pr:4309), (:pr:4317), (:pr:4326), (:pr:4325), (:pr:4322), (:pr:4332), (:pr:4333) Miguel Farrajota_
Fix typo in code example (:pr:4272) Daniel Li_
Doc: Update array-api.rst (:pr:4259) (:pr:4282) Prabakaran Kumaresshan_
Update hpc doc (:pr:4266) Guillaume Eynard-Bontemps_
Doc: Replace from_avro with read_avro in documents (:pr:4313) Prabakaran Kumaresshan_
Remove reference to "get" scheduler functions in docs (:pr:4350) Matthew Rocklin_
Fix typo in docstring (:pr:4376) Daniel Saxton_
Added documentation for dask.dataframe.merge (:pr:4382) Jendrik Jördening_

Core ^^^^

Avoid recursion in dask.core.get (:pr:4219) Matthew Rocklin_
Remove verbose flag from pytest setup.cfg (:pr:4281) Matthew Rocklin_
Support Pytest 4.0 by specifying marks explicitly (:pr:4280) Takahiro Kojima_
Add High Level Graphs (:pr:4092) Matthew Rocklin_
Fix SerializableLock locked and acquire methods (:pr:4294) Stephan Hoyer_
Pin boto3 to earlier version in tests to avoid moto conflict (:pr:4276) Martin Durant_
Treat None as missing in config when updating (:pr:4324) Matthew Rocklin_
Update Appveyor to Python 3.6 (:pr:4337) Gábor Lipták_
Use parse_bytes more liberally in dask.dataframe/bytes/bag (:pr:4339) Matthew Rocklin_
Add a better error message when cloudpickle is missing (:pr:4342) Mark Harfouche_
Support pool= keyword argument in threaded/multiprocessing get functions (:pr:4351) Matthew Rocklin_
Allow updates from arbitrary Mappings in config.update, not only dicts. (:pr:4356) Stuart Berg_
Move dask/array/top.py code to dask/blockwise.py (:pr:4348) Matthew Rocklin_
Add has_parallel_type (:pr:4395) Matthew Rocklin_
CI: Update Appveyor (:pr:4381) Tom Augspurger_
Ignore non-readable config files (:pr:4388) Jim Crist_

.. _v1.0.0 / 2018-11-28:

1.0.0 / 2018-11-28

Array ^^^^^

Add nancumsum/nancumprod unit tests (:pr:4215) crusaderky_

DataFrame ^^^^^^^^^

Add index to to_dask_dataframe docstring (:pr:4232) James Bourbeau_
Test and fix when appending categoricals with fastparquet (:pr:4245) Martin Durant_
Don't reread metadata when passing ParquetFile to read_parquet (:pr:4247) Martin Durant_

Documentation ^^^^^^^^^^^^^

Copy edit documentation (:pr:4222) (:pr:4224) (:pr:4228) (:pr:4231) (:pr:4230) (:pr:4234) (:pr:4235) (:pr:4254) Miguel Farrajota_
Updated doc for the new scheduler keyword (:pr:4251) @milesial_

Core ^^^^

Avoid a few warnings (:pr:4223) Matthew Rocklin_
Remove dask.store module (:pr:4221) Matthew Rocklin_
Remove AUTHORS.md Jim Crist_

.. _v0.20.2 / 2018-11-15:

0.20.2 / 2018-11-15

Array ^^^^^

Avoid fusing dependencies of atop reductions (:pr:4207) Matthew Rocklin_

Dataframe ^^^^^^^^^

Improve memory footprint for dataframe correlation (:pr:4193) Damien Garaud_
Add empty DataFrame check to boundary_slice (:pr:4212) James Bourbeau_

Documentation ^^^^^^^^^^^^^

Copy edit documentation (:pr:4197) (:pr:4204) (:pr:4198) (:pr:4199) (:pr:4200) (:pr:4202) (:pr:4209) Miguel Farrajota_
Add stats module namespace (:pr:4206) James Bourbeau_
Fix link in dataframe documentation (:pr:4208) James Bourbeau_

.. _v0.20.1 / 2018-11-09:

0.20.1 / 2018-11-09

Array ^^^^^

Only allocate the result space in wrapped_pad_func (:pr:4153) John A Kirkham_
Generalize expand_pad_width to expand_pad_value (:pr:4150) John A Kirkham_
Test da.pad with 2D linear_ramp case (:pr:4162) John A Kirkham_
Fix import for broadcast_to. (:pr:4168) samc0de_
Rewrite Dask Array's pad to add only new chunks (:pr:4152) John A Kirkham_
Validate index inputs to atop (:pr:4182) Matthew Rocklin_

Core ^^^^

Dask.config set and get normalize underscores and hyphens (:pr:4143) James Bourbeau_
Only subs on core collections, not subclasses (:pr:4159) Matthew Rocklin_
Add block_size=0 option to HTTPFileSystem. (:pr:4171) Martin Durant_
Add traverse support for dataclasses (:pr:4165) Armin Berres_
Avoid optimization on sharedicts without dependencies (:pr:4181) Matthew Rocklin_
Update the pytest version for TravisCI (:pr:4189) Damien Garaud_
Use key_split rather than funcname in visualize names (:pr:4160) Matthew Rocklin_

Dataframe ^^^^^^^^^

Add fix for DataFrame.setitem for index (:pr:4151) Anderson Banihirwe_
Fix column choice when passing list of files to fastparquet (:pr:4174) Martin Durant_
Pass engine_kwargs from read_sql_table to sqlalchemy (:pr:4187) Damien Garaud_

Documentation ^^^^^^^^^^^^^

Fix documentation in Delayed best practices example that returned an empty list (:pr:4147) Jonathan Fraine_
Copy edit documentation (:pr:4164) (:pr:4175) (:pr:4185) (:pr:4192) (:pr:4191) (:pr:4190) (:pr:4180) Miguel Farrajota_
Fix typo in docstring (:pr:4183) Carlos Valiente_

.. _v0.20.0 / 2018-10-26:

0.20.0 / 2018-10-26

Array ^^^^^

Fuse Atop operations (:pr:3998), (:pr:4081) Matthew Rocklin_
Support da.asanyarray on dask dataframes (:pr:4080) Matthew Rocklin_
Remove unnecessary endianness check in datetime test (:pr:4113) Elliott Sales de Andrade_
Set name=False in array foo_like functions (:pr:4116) Matthew Rocklin_
Remove dask.array.ghost module (:pr:4121) Matthew Rocklin_
Fix use of getargspec in dask array (:pr:4125) Stephan Hoyer_
Adds dask.array.invert (:pr:4127), (:pr:4131) Anderson Banihirwe_
Raise informative error on arg-reduction on unknown chunksize (:pr:4128), (:pr:4135) Matthew Rocklin_
Normalize reversed slices in dask array (:pr:4126) Matthew Rocklin_

Bag ^^^

Add bag.to_avro (:pr:4076) Martin Durant_

Core ^^^^

Pull num_workers from config.get (:pr:4086), (:pr:4093) James Bourbeau_
Fix invalid escape sequences with raw strings (:pr:4112) Elliott Sales de Andrade_
Raise an error on the use of the get= keyword and set_options (:pr:4077) Matthew Rocklin_
Add import for Azure DataLake storage, and add docs (:pr:4132) Martin Durant_
Avoid collections.Mapping/Sequence (:pr:4138) Matthew Rocklin_

Dataframe ^^^^^^^^^

Include index keyword in to_dask_dataframe (:pr:4071) Matthew Rocklin_
add support for duplicate column names (:pr:4087) Jan Koch_
Implement min_count for the DataFrame methods sum and prod (:pr:4090) Bart Broere_
Remove pandas warnings in concat (:pr:4095) Matthew Rocklin_
DataFrame.to_csv header option to only output headers in the first chunk (:pr:3909) Rahul Vaidya_
Remove Series.to_parquet (:pr:4104) Justin Dennison_
Avoid warnings and deprecated pandas methods (:pr:4115) Matthew Rocklin_
Swap 'old' and 'previous' when reporting append error (:pr:4130) Martin Durant_

Documentation ^^^^^^^^^^^^^

Copy edit documentation (:pr:4073), (:pr:4074), (:pr:4094), (:pr:4097), (:pr:4107), (:pr:4124), (:pr:4133), (:pr:4139) Miguel Farrajota_
Fix typo in code example (:pr:4089) Antonino Ingargiola_
Add pycon 2018 presentation (:pr:4102) Javad_
Quick description for gcsfs (:pr:4109) Martin Durant_
Fixed typo in docstrings of read_sql_table method (:pr:4114) TakaakiFuruse_
Make target directories in redirects if they don't exist (:pr:4136) Matthew Rocklin_

.. _v0.19.4 / 2018-10-09:

0.19.4 / 2018-10-09

Array ^^^^^

Implement apply_gufunc(..., axes=..., keepdims=...) (:pr:3985) Markus Gonser_

Bag ^^^

Fix typo in datasets.make_people (:pr:4069) Matthew Rocklin_

Dataframe ^^^^^^^^^

Added percentiles options for dask.dataframe.describe method (:pr:4067) Zhenqing Li_
Add DataFrame.partitions accessor similar to Array.blocks (:pr:4066) Matthew Rocklin_

Core ^^^^

Pass get functions and Clients through scheduler keyword (:pr:4062) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

Fix typo in HPC example (missing = in kwarg) (:pr:4068) Matthias Bussonier_
Extensive copy-editing: (:pr:4065), (:pr:4064), (:pr:4063) Miguel Farrajota_

.. _v0.19.3 / 2018-10-05:

0.19.3 / 2018-10-05

Array ^^^^^

Make da.RandomState extensible to other modules (:pr:4041) Matthew Rocklin_
Support unknown dims in ravel no-op case (:pr:4055) Jim Crist_
Add basic infrastructure for cupy (:pr:4019) Matthew Rocklin_
Avoid asarray and lock arguments for from_array(getitem) (:pr:4044) Matthew Rocklin_
Move local imports in corrcoef to global imports (:pr:4030) John A Kirkham_
Move local indices import to global import (:pr:4029) John A Kirkham_
Fix-up Dask Array's fromfunction w.r.t. dtype and kwargs (:pr:4028) John A Kirkham_
Don't use dummy expansion for trim_internal in overlapped (:pr:3964) Mark Harfouche_
Add unravel_index (:pr:3958) John A Kirkham_

Bag ^^^

Sort result in Bag.frequencies (:pr:4033) Matthew Rocklin_
Add support for npartitions=1 edge case in groupby (:pr:4050) James Bourbeau_
Add new random dataset for people (:pr:4018) Matthew Rocklin_
Improve performance of bag.read_text on small files (:pr:4013) Eric Wolak_
Add bag.read_avro (:pr:4000) (:pr:4007) Martin Durant_

Dataframe ^^^^^^^^^

Added an index parameter to :meth:dask.dataframe.from_dask_array for creating a dask DataFrame from a dask Array with a given index. (:pr:3991) Tom Augspurger_
Improve sub-classability of dask dataframe (:pr:4015) Matthew Rocklin_
Fix failing hdfs test [test-hdfs] (:pr:4046) Jim Crist_
fuse_subgraphs works without normal fuse (:pr:4042) Jim Crist_
Make path for reading many parquet files without prescan (:pr:3978) Martin Durant_
Index in dd.from_dask_array (:pr:3991) Tom Augspurger_
Making skiprows accept lists (:pr:3975) Julia Signell_
Fail early in fastparquet read for nonexistent column (:pr:3989) Martin Durant_

Core ^^^^

Add support for npartitions=1 edge case in groupby (:pr:4050) James Bourbeau_
Automatically wrap large arguments with dask.delayed in map_blocks/partitions (:pr:4002) Matthew Rocklin_
Fuse linear chains of subgraphs (:pr:3979) Jim Crist_
Make multiprocessing context configurable (:pr:3763) Itamar Turner-Trauring_

Documentation ^^^^^^^^^^^^^

Extensive copy-editing (:pr:4049), (:pr:4034), (:pr:4031), (:pr:4020), (:pr:4021), (:pr:4022), (:pr:4023), (:pr:4016), (:pr:4017), (:pr:4010), (:pr:3997), (:pr:3996) Miguel Farrajota_
Update shuffle method selection docs (:pr:4048) James Bourbeau_
Remove docs/source/examples, point to examples.dask.org (:pr:4014) Matthew Rocklin_
Replace readthedocs links with dask.org (:pr:4008) Matthew Rocklin_
Updates DataFrame.to_hdf docstring for returned values (:pr:3992) James Bourbeau_

.. _v0.19.2 / 2018-09-17:

0.19.2 / 2018-09-17

Array ^^^^^

apply_gufunc implements automatic inference of function output dtypes (:pr:3936) Markus Gonser_
Fix array histogram range error when array has nans (:pr:3980) James Bourbeau_
Issue 3937 follow up, int type checks. (:pr:3956) Yu Feng_
from_array: add @martindurant's explanation of how hashing is done for an array (:pr:3965) Mark Harfouche_
Support gradient with coordinate (:pr:3949) Keisuke Fujii_

Core ^^^^

Fix use of has_keyword with partial in Python 2.7 (:pr:3966) Mark Harfouche_
Set pyarrow as default for HDFS (:pr:3957) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

Use dask_sphinx_theme (:pr:3963) Matthew Rocklin_
Use JupyterLab in Binder links from main page Matthew Rocklin_
DOC: fixed sphinx syntax (:pr:3960) Tom Augspurger_

.. _v0.19.1 / 2018-09-06:

0.19.1 / 2018-09-06

Array ^^^^^

Don't enforce dtype if result has no dtype (:pr:3928) Matthew Rocklin_
Fix NumPy issubtype deprecation warning (:pr:3939) Bruce Merry_
Fix arg reduction tokens to be unique with different arguments (:pr:3955) Tobias de Jong_
Coerce numpy integers to ints in slicing code (:pr:3944) Yu Feng_
Linalg.norm ndim along axis partial fix (:pr:3933) Tobias de Jong_

Dataframe ^^^^^^^^^

Deterministic DataFrame.set_index (:pr:3867) George Sakkis_
Fix divisions in read_parquet when dealing with filters #3831 #3930 (:pr:3923) (:pr:3931) @andrethrill_
Fix returned type in categorical.as_known (:pr:3888) Sriharsha Hatwar_
Fix DataFrame.assign for callables (:pr:3919) Tom Augspurger_
Include partitions with no width in repartition (:pr:3941) Matthew Rocklin_
Don't constrict stage/k dtype in dataframe shuffle (:pr:3942) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

DOC: Add hint on how to render task graphs horizontally (:pr:3922) Uwe Korn_
Add try-now button to main landing page (:pr:3924) Matthew Rocklin_

.. _v0.19.0 / 2018-08-29:

0.19.0 / 2018-08-29

Array ^^^^^

Support coordinate in gradient (:pr:3949) Keisuke Fujii_
Fix argtopk split_every bug (:pr:3810) crusaderky_
Ensure that computing dask.array.isnull() always gives a numpy array (:pr:3825) Stephan Hoyer_
Support concatenate for scipy.sparse in dask array (:pr:3836) Matthew Rocklin_
Fix argtopk on 32-bit systems. (:pr:3823) Elliott Sales de Andrade_
Normalize keys in rechunk (:pr:3820) Matthew Rocklin_
Allow shape of dask.array to be a numpy array (:pr:3844) Mark Harfouche_
Fix numpy deprecation warning on tuple indexing (:pr:3851) Tobias de Jong_
Rename ghost module to overlap (:pr:3830) Robert Sare_
Re-add the ghost import to da init (:pr:3861) Jim Crist_
Ensure copy preserves masked arrays (:pr:3852) Tobias de Jong_

DataFrame ^^^^^^^^^^

Added dtype and sparse keywords to :func:dask.dataframe.get_dummies (:pr:3792) Tom Augspurger_
Added :meth:dask.dataframe.to_dask_array for converting a Dask Series or DataFrame to a Dask Array, possibly with known chunk sizes (:pr:3884) Tom Augspurger_
Changed the behavior of :meth:dask.array.asarray for dask DataFrame and Series inputs. Previously, the input was eagerly converted to an in-memory NumPy array before creating a dask array with known chunk sizes, which caused unexpectedly high memory usage. Now no intermediate NumPy array is created, and a Dask array with unknown chunk sizes is returned (:pr:3884) Tom Augspurger_
DataFrame.iloc (:pr:3805) Tom Augspurger_
When reading multiple paths, expand globs. (:pr:3828) Irina Truong_
Added index column name after resample (:pr:3833) Eric Bonfadini_
Add (lazy) shape property to dataframe and series (:pr:3212) Henrique Ribeiro_
Fix failing hdfs test [test-hdfs] (:pr:3858) Jim Crist_
Fixes for pyarrow 0.10.0 release (:pr:3860) Jim Crist_
Rename to_csv keys for diagnostics (:pr:3890) Matthew Rocklin_
Match pandas warnings for concat sort (:pr:3897) Tom Augspurger_
Include filename in read_csv (:pr:3908) Julia Signell_

Core ^^^^

Better error message on import when missing common dependencies (:pr:3771) Danilo Horta_
Drop Python 3.4 support (:pr:3840) Jim Crist_
Remove expired deprecation warnings (:pr:3841) Jim Crist_
Add DASK_ROOT_CONFIG environment variable (:pr:3849) Joe Hamman_
Don't cull in local scheduler, do cull in delayed (:pr:3856) Jim Crist_
Increase conda download retries (:pr:3857) Jim Crist_
Add python_requires and Trove classifiers (:pr:3855) @hugovk_
Fix collections.abc deprecation warnings in Python 3.7.0 (:pr:3876) Jan Margeta_
Allow dot jpeg to xfail in visualize tests (:pr:3896) Matthew Rocklin_
Add Python 3.7 to travis.yml (:pr:3894) Matthew Rocklin_
Add expand_environment_variables to dask.config (:pr:3893) Joe Hamman_

Docs ^^^^

Fix typo in import statement of diagnostics (:pr:3826) John Mrziglod_
Add link to YARN docs (:pr:3838) Jim Crist_
Fix minor typos in landing page index.html (:pr:3746) Christoph Moehl_
Update delayed-custom.rst (:pr:3850) Anderson Banihirwe_
DOC: clarify delayed docstring (:pr:3709) Scott Sievert_
Add new presentations (:pr:3880) Javad_
Add dask array normalize_chunks to documentation (:pr:3878) Daniel Rothenberg_
Docs: Fix link to snakeviz (:pr:3900) Hans Moritz Günther_
Add missing to docstring (:pr:3915) @rtobar_

.. _v0.18.2 / 2018-07-23:

0.18.2 / 2018-07-23

Array ^^^^^

Reimplemented argtopk to make it release the GIL (:pr:3610) crusaderky_
Don't overlap on non-overlapped dimensions in map_overlap (:pr:3653) Matthew Rocklin_
Fix linalg.tsqr for dimensions of uncertain length (:pr:3662) Jeremy Chen_
Break apart uneven array-of-int slicing to separate chunks (:pr:3648) Matthew Rocklin_
Align auto chunks to provided chunks, rather than shape (:pr:3679) Matthew Rocklin_
Adds endpoint and retstep support for linspace (:pr:3675) James Bourbeau_
Implement .blocks accessor (:pr:3689) Matthew Rocklin_
Add block_info keyword to map_blocks functions (:pr:3686) Matthew Rocklin_
Slice by dask array of ints (:pr:3407) crusaderky_
Support dtype in arange (:pr:3722) crusaderky_
Fix argtopk with uneven chunks (:pr:3720) crusaderky_
Raise error when replace=False in da.choice (:pr:3765) James Bourbeau_
Update chunks in Array.__setitem__ (:pr:3767) Itamar Turner-Trauring_
Add a chunksize convenience property (:pr:3777) Jacob Tomlinson_
Fix and simplify array slicing behavior when step < 0 (:pr:3702) Ziyao Wei_
Ensure to_zarr with return_stored=True returns a Dask Array (:pr:3786) John A Kirkham_
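The .blocks accessor and the block_info keyword for map_blocks can be combined as follows. This sketch assumes a current Dask release. label_with_chunk_index is a hypothetical helper that replaces every element with the index of the chunk it belongs to. meta= is passed so that Dask does not need to call the helper on dummy data to infer the output type.

```python
import numpy as np
import dask.array as da

x = da.zeros(6, chunks=2, dtype=int)

# .blocks indexes by block rather than by element
second_block = x.blocks[1]
print(second_block.shape)  # (2,)

def label_with_chunk_index(block, block_info=None):
    # Dask fills block_info at compute time; fall back gracefully if it is absent
    if block_info is None:
        return block
    # block_info[0] describes the first input array; chunk-location is a tuple per dimension
    (chunk_index,) = block_info[0]["chunk-location"]
    return np.full_like(block, chunk_index)

labels = x.map_blocks(label_with_chunk_index, meta=np.array((), dtype=int))
print(labels.compute())  # [0 0 1 1 2 2]
```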

Bag ^^^

Add last_endline optional parameter in to_textfiles (:pr:3745) George Sakkis_

Dataframe ^^^^^^^^^

Add aggregate function for rolling objects (:pr:3772) Gerome Pistre_
Properly tokenize cumulative groupby aggregations (:pr:3799) Cloves Almeida_

Delayed ^^^^^^^

Add the @ operator to the delayed objects (:pr:3691) Mark Harfouche_
Add delayed best practices to documentation (:pr:3737) Matthew Rocklin_
Fix @delayed decorator for methods and add tests (:pr:3757) Ziyao Wei_

Core ^^^^

Fix extra progressbar (:pr:3669) Mike Neish_
Allow tasks back onto ordering stack if they have one dependency (:pr:3652) Matthew Rocklin_
Prefer end-tasks with low numbers of dependencies when ordering (:pr:3588) Tom Augspurger_
Add assert_eq to top-level modules (:pr:3726) Matthew Rocklin_
Test that dask collections can hold scipy.sparse arrays (:pr:3738) Matthew Rocklin_
Fix setup of lz4 decompression functions (:pr:3782) Elliott Sales de Andrade_
Add datasets module (:pr:3780) Matthew Rocklin_

.. _v0.18.1 / 2018-06-22:

0.18.1 / 2018-06-22

Array ^^^^^

from_array now supports scalar types and nested lists/tuples in input, just like all numpy functions do; it also produces a simpler graph when the input is a plain ndarray (:pr:3568) crusaderky_
Fix slicing of big arrays due to cumsum dtype bug (:pr:3620) Marco Rossi_
Add Dask Array implementation of pad (:pr:3578) John A Kirkham_
Fix array random API examples (:pr:3625) James Bourbeau_
Add average function to dask array (:pr:3640) James Bourbeau_
Tokenize ghost_internal with axes (:pr:3643) Matthew Rocklin_
Add outer for Dask Arrays (:pr:3658) John A Kirkham_

DataFrame ^^^^^^^^^

Add Index.to_series method (:pr:3613) Henrique Ribeiro_
Fix missing partition columns in pyarrow-parquet (:pr:3636) Martin Durant_

Core ^^^^

Minor tweaks to CI (:pr:3629) crusaderky_
Add back dask.utils.effective_get (:pr:3642) Matthew Rocklin_
DASK_CONFIG dictates config write location (:pr:3621) Jim Crist_
Replace 'collections' key in unpack_collections with unique key (:pr:3632) Yu Feng_
Avoid deepcopy in dask.config.set (:pr:3649) Matthew Rocklin_

.. _v0.18.0 / 2018-06-14:

0.18.0 / 2018-06-14

Array ^^^^^

Add to/from_zarr for Zarr-format datasets and arrays (:pr:3460) Martin Durant_
Experimental addition of generalized ufunc support, apply_gufunc, gufunc, and as_gufunc (:pr:3109) (:pr:3526) (:pr:3539) Markus Gonser_
Avoid unnecessary rechunking tasks (:pr:3529) Matthew Rocklin_
Compute dtypes at runtime for fft (:pr:3511) Matthew Rocklin_
Generate UUIDs for all da.store operations (:pr:3540) Martin Durant_
Correct internal dimension of Dask's SVD (:pr:3517) John A Kirkham_
BUG: do not raise IndexError for identity slice in array.vindex (:pr:3559) Scott Sievert_
Adds isneginf and isposinf (:pr:3581) John A Kirkham_
Drop Dask Array's learn module (:pr:3580) John A Kirkham_
Added sfqr (short-and-fat) as a counterpart to tsqr… (:pr:3575) Jeremy Chen_
Allow 0-width chunks in dask.array.rechunk (:pr:3591) Marc Pfister_
Document Dask Array's nan_to_num in public API (:pr:3599) John A Kirkham_
Show block example (:pr:3601) John A Kirkham_
Replace token= keyword with name= in map_blocks (:pr:3597) Matthew Rocklin_
Disable locking in to_zarr (needed for using to_zarr in a distributed context) (:pr:3607) John A Kirkham_
Support Zarr Arrays in to_zarr/from_zarr (:pr:3561) John A Kirkham_
Added recursion to array/linalg/tsqr to better manage the single core bottleneck (:pr:3586) Jeremy Chan_ (:pr:3396) crusaderky_

Dataframe ^^^^^^^^^

Add to/read_json (:pr:3494) Martin Durant_
Adds index to unsupported arguments for DataFrame.rename method (:pr:3522) James Bourbeau_
Adds support to subset Dask DataFrame columns using numpy.ndarray, pandas.Series, and pandas.Index objects (:pr:3536) James Bourbeau_
Raise error if meta columns do not match dataframe (:pr:3485) Christopher Ren_
Dataframe sample method docstring fix (:pr:3566) James Bourbeau_
Fix dd.read_json to infer file compression (:pr:3594) Matt Lee_
Adds n to sample method (:pr:3606) James Bourbeau_
Add fastparquet ParquetFile object support (:pr:3573) @andrethrill_

Bag ^^^

Rename method= keyword to shuffle= in bag.groupby (:pr:3470) Matthew Rocklin_

Core ^^^^

Replace get= keyword with scheduler= keyword (:pr:3448) Matthew Rocklin_
Add centralized dask.config module to handle configuration for all Dask subprojects (:pr:3432) (:pr:3513) (:pr:3520) Matthew Rocklin_
Add dask-ssh CLI Options and Description. (:pr:3476) @beomi_
Fix reading whole files over HTTP regardless of headers (:pr:3496) Martin Durant_
Adds synchronous scheduler syntax to debugging docs (:pr:3509) James Bourbeau_
Replace dask.set_options with dask.config.set (:pr:3502) Matthew Rocklin_
Update sphinx readthedocs-theme (:pr:3516) Matthew Rocklin_
Introduce "auto" value for normalize_chunks (:pr:3507) Matthew Rocklin_
Fix check in configuration with env=None (:pr:3562) Simon Perkins_
Update sizeof definitions (:pr:3582) Matthew Rocklin_
Remove --verbose flag from travis-ci (:pr:3477) Matthew Rocklin_
Remove "da.random" from random array keys (:pr:3604) Matthew Rocklin_
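The following sketch shows how the scheduler= keyword and dask.config.set replace the old get= and dask.set_options idioms. It assumes a current Dask release.

```python
import dask
import dask.array as da

x = da.arange(10, chunks=5)

# Choose the scheduler for a single call
total = x.sum().compute(scheduler="synchronous")
print(total)  # 45

# Or set it for a block of code through the central configuration system
with dask.config.set(scheduler="threads"):
    print(dask.config.get("scheduler"))  # threads
    total_threads = x.sum().compute()
```

The synchronous scheduler runs everything in the calling thread, which is why the debugging docs recommend it for stepping through failures with pdb.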

.. _v0.17.5 / 2018-05-16:

0.17.5 / 2018-05-16

Array ^^^^^

Fix rechunk with chunksize of -1 in a dict (:pr:3469) Stephan Hoyer_
einsum now accepts the split_every parameter (:pr:3471) crusaderky_
Improved slicing performance (:pr:3479) Yu Feng_

DataFrame ^^^^^^^^^

Compatibility with pandas 0.23.0 (:pr:3499) Tom Augspurger_

.. _v0.17.4 / 2018-05-03:

0.17.4 / 2018-05-03

Dataframe ^^^^^^^^^

Add support for indexing Dask DataFrames with string subclasses (:pr:3461) James Bourbeau_
Allow using both sorted_index and chunksize in read_hdf (:pr:3463) Pierre Bartet_
Pass filesystem to arrow piece reader (:pr:3466) Martin Durant_
Switches to using dask.compat string_types (:pr:3462) James Bourbeau_

.. _v0.17.3 / 2018-05-02:

0.17.3 / 2018-05-02

Array ^^^^^

Add einsum for Dask Arrays (:pr:3412) Simon Perkins_
Add piecewise for Dask Arrays (:pr:3350) John A Kirkham_
Fix handling of nan in broadcast_shapes (:pr:3356) John A Kirkham_
Add isin for dask arrays (:pr:3363) Stephan Hoyer_
Overhauled topk for Dask Arrays: faster algorithm, particularly for large k's; added support for multiple axes, recursive aggregation, and an option to pick the bottom k elements instead. (:pr:3395) crusaderky_
The topk API has changed from topk(k, array) to the more conventional topk(array, k). The legacy API still works but is now deprecated. (:pr:2965) crusaderky_
New function argtopk for Dask Arrays (:pr:3396) crusaderky_
Fix handling partial depth and boundary in map_overlap (:pr:3445) John A Kirkham_
Add gradient for Dask Arrays (:pr:3434) John A Kirkham_
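Here is a small sketch of the new topk argument order, the negative-k variant, and argtopk. It assumes a current Dask release.

```python
import numpy as np
import dask.array as da

x = da.from_array(np.array([5, 1, 3, 9, 7, 2]), chunks=2)

# New argument order: array first, then k; results are sorted largest first
largest = da.topk(x, 2).compute()
print(largest)   # [9 7]

# A negative k selects the smallest elements, sorted smallest first
smallest = da.topk(x, -2).compute()
print(smallest)  # [1 2]

# argtopk returns positions rather than values
positions = da.argtopk(x, 2).compute()
print(positions)  # [3 4]
```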

DataFrame ^^^^^^^^^

Allow t as shorthand for table in to_hdf for pandas compatibility (:pr:3330) Jörg Dietrich_
Added top level isna method for Dask DataFrames (:pr:3294) Christopher Ren_
Fix selection on partition column on read_parquet for engine="pyarrow" (:pr:3207) Uwe Korn_
Added DataFrame.squeeze method (:pr:3366) Christopher Ren_
Added infer_divisions option to read_parquet to specify whether read engines should compute divisions (:pr:3387) Jon Mease_
Added support for inferring divisions for engine="pyarrow" (:pr:3387) Jon Mease_
Provide more informative error message for meta= errors (:pr:3343) Matthew Rocklin_
Add ORC reader (:pr:3284) Martin Durant_
Default compression for parquet now always Snappy, in line with pandas (:pr:3373) Martin Durant_
Fixed bug in Dask DataFrame and Series comparisons with NumPy scalars (:pr:3436) James Bourbeau_
Remove outdated requirement from repartition docstring (:pr:3440) Jörg Dietrich_
Fixed bug in aggregation when only a Series is selected (:pr:3446) Jörg Dietrich_
Add default values to make_timeseries (:pr:3421) Matthew Rocklin_

Core ^^^^

Support traversing collections in persist, visualize, and optimize (:pr:3410) Jim Crist_
Add scheduler= keyword to compute and persist. This replaces common use of the get= keyword (:pr:3448) Matthew Rocklin_

.. _v0.17.2 / 2018-03-21:

0.17.2 / 2018-03-21

Array ^^^^^

Add broadcast_arrays for Dask Arrays (:pr:3217) John A Kirkham_
Add bitwise_* ufuncs (:pr:3219) John A Kirkham_
Add optional axis argument to squeeze (:pr:3261) John A Kirkham_
Validate inputs to atop (:pr:3307) Matthew Rocklin_
Avoid calls to astype in concatenate if all parts have the same dtype (:pr:3301) Martin Durant_

DataFrame ^^^^^^^^^

Fixed bug in shuffle due to aggressive truncation (:pr:3201) Matthew Rocklin_
Support specifying categorical columns on read_parquet with categories=[…] for engine="pyarrow" (:pr:3177) Uwe Korn_
Add dd.tseries.Resampler.agg (:pr:3202) Richard Postelnik_
Support operations that mix dataframes and arrays (:pr:3230) Matthew Rocklin_
Support extra Scalar and Delayed args in dd.groupby._Groupby.apply (:pr:3256) Gabriele Lanaro_

Bag ^^^

Support joining against single-partitioned bags and delayed objects (:pr:3254) Matthew Rocklin_

Core ^^^^

Fixed bug when using unexpected but hashable types for keys (:pr:3238) Daniel Collins_
Fix bug in task ordering so that we break ties consistently with the key name (:pr:3271) Matthew Rocklin_
Avoid sorting tasks in order when the number of tasks is very large (:pr:3298) Matthew Rocklin_

.. _v0.17.1 / 2018-02-22:

0.17.1 / 2018-02-22

Array ^^^^^

Corrected dimension chunking in indices (:issue:3166, :pr:3167) Simon Perkins_
Inline store_chunk calls for store's return_stored option (:pr:3153) John A Kirkham_
Compatibility with struct dtypes for NumPy 1.14.1 release (:pr:3187) Matthew Rocklin_

DataFrame ^^^^^^^^^

Bugfix to allow column assignment of pandas datetimes (:pr:3164) Max Epstein_

Core ^^^^

New file-system for HTTP(S), allowing direct loading from specific URLs (:pr:3160) Martin Durant_
Fix bug when tokenizing partials with no keywords (:pr:3191) Matthew Rocklin_
Use more recent LZ4 API (:pr:3157) Thrasibule_
Introduce output stream parameter for progress bar (:pr:3185) Dieter Weber_

.. _v0.17.0 / 2018-02-09:

0.17.0 / 2018-02-09

Array ^^^^^

Added support for object-dtype arrays in nansum, nanmin, and nanmax (:issue:3133) Keisuke Fujii_
Update error handling when len is called with empty chunks (:issue:3058) Xander Johnson_
Fixes a metadata bug with store's return_stored option (:pr:3064) John A Kirkham_
Fix a bug in optimization.fuse_slice to properly handle when first input is None (:pr:3076) James Bourbeau_
Support arrays with unknown chunk sizes in percentile (:pr:3107) Matthew Rocklin_
Tokenize scipy.sparse arrays and np.matrix (:pr:3060) Roman Yurchak_

DataFrame ^^^^^^^^^

Support month timedeltas in repartition(freq=...) (:pr:3110) Matthew Rocklin_
Avoid mutation in dataframe groupby tests (:pr:3118) Matthew Rocklin_
read_csv, read_table, and read_parquet accept iterables of paths (:pr:3124) Jim Crist_
Deprecates the dd.to_delayed function in favor of the existing method (:pr:3126) Jim Crist_
Return dask.arrays from df.map_partitions calls when the UDF returns a numpy array (:pr:3147) Matthew Rocklin_
Change handling of columns and index in dd.read_parquet to be more consistent, especially in handling of multi-indices (:pr:3149) Jim Crist_
With fastparquet, append=True can now create a new dataset (:pr:3097) Martin Durant_
dtype rationalization for sql queries (:pr:3100) Martin Durant_

Bag ^^^

Document that the function passed to bag.map_partitions may receive either a list or a generator (:pr:3150) Nir_

Core ^^^^

Change default task ordering to prefer nodes with few dependents and then many downstream dependencies (:pr:3056) Matthew Rocklin_
Add color= option to visualize to color by task order (:pr:3057) (:pr:3122) Matthew Rocklin_
Deprecate dask.bytes.open_text_files (:pr:3077) Jim Crist_
Remove short-circuit hdfs reads handling due to maintenance costs. May be re-added in a more robust manner later (:pr:3079) Jim Crist_
Add dask.base.optimize for optimizing multiple collections without computing. (:pr:3071) Jim Crist_
Rename dask.optimize module to dask.optimization (:pr:3071) Jim Crist_
Change task ordering to do a full traversal (:pr:3066) Matthew Rocklin_
Adds an optimize_graph keyword to all to_delayed methods to allow controlling whether optimizations occur on conversion. (:pr:3126) Jim Crist_
Support using pyarrow for hdfs integration (:pr:3123) Jim Crist_
Move HDFS integration and tests into dask repo (:pr:3083) Jim Crist_
Remove write_bytes (:pr:3116) Jim Crist_

.. _v0.16.1 / 2018-01-09:

0.16.1 / 2018-01-09

Array ^^^^^

Fix handling of scalar percentile values in percentile (:pr:3021) James Bourbeau_
Prevent bool() coercion from calling compute (:pr:2958) Albert DeFusco_
Add matmul (:pr:2904) John A Kirkham_
Support N-D arrays with matmul (:pr:2909) John A Kirkham_
Add vdot (:pr:2910) John A Kirkham_
Explicit chunks argument for broadcast_to (:pr:2943) Stephan Hoyer_
Add meshgrid (:pr:2938) John A Kirkham_ and (:pr:3001) Markus Gonser_
Preserve singleton chunks in fftshift/ifftshift (:pr:2733) John A Kirkham_
Fix handling of negative indexes in vindex and raise errors for out of bounds indexes (:pr:2967) Stephan Hoyer_
Add flip, flipud, fliplr (:pr:2954) John A Kirkham_
Add float_power ufunc (:pr:2962) (:pr:2969) John A Kirkham_
Compatibility for changes to structured arrays in the upcoming NumPy 1.14 release (:pr:2964) Tom Augspurger_
Add block (:pr:2650) John A Kirkham_
Add frompyfunc (:pr:3030) Jim Crist_
Add the return_stored option to store for chaining stored results (:pr:2980) John A Kirkham_

DataFrame ^^^^^^^^^

Fixed naming bug in cumulative aggregations (:issue:3037) Martijn Arts_
Fixed dd.read_csv when names is given but header is not set to None (:issue:2976) Martijn Arts_
Fixed dd.read_csv so that passing instances of CategoricalDtype in dtype will result in known categoricals (:pr:2997) Tom Augspurger_
Prevent bool() coercion from calling compute (:pr:2958) Albert DeFusco_
DataFrame.read_sql() on an empty database table now returns an empty Dask DataFrame (:pr:2928) Apostolos Vlachopoulos_
Compatibility for reading Parquet files written by PyArrow 0.8.0 (:pr:2973) Tom Augspurger_
Correctly handle the column name (df.columns.name) when reading in dd.read_parquet (:pr:2973) Tom Augspurger_
Fixed dd.concat losing the index dtype when the data contained a categorical (:issue:2932) Tom Augspurger_
Add dd.Series.rename (:pr:3027) Jim Crist_
DataFrame.merge() now supports merging on a combination of columns and the index (:pr:2960) Jon Mease_
Removed the deprecated dd.rolling* methods, in preparation for their removal in the next pandas release (:pr:2995) Tom Augspurger_
Fix metadata inference bug in which single-partition series were mistakenly special cased (:pr:3035) Jim Crist_
Add support for Series.str.cat (:pr:3028) Jim Crist_

Core ^^^^

Improve 32-bit compatibility (:pr:2937) Matthew Rocklin_
Change task prioritization to avoid upwards branching (:pr:3017) Matthew Rocklin_

.. _v0.16.0 / 2017-11-17:

0.16.0 / 2017-11-17

This is a major release. It includes breaking changes, new protocols, and a large number of bug fixes.

Array ^^^^^

Add atleast_1d, atleast_2d, and atleast_3d (:pr:2760) (:pr:2765) John A Kirkham_
Add allclose (:pr:2771) John A Kirkham_
Remove random.different_seeds from Dask Array API docs (:pr:2772) John A Kirkham_
Deprecate vnorm in favor of dask.array.linalg.norm (:pr:2773) John A Kirkham_
Reimplement unique to be lazy (:pr:2775) John A Kirkham_
Support broadcasting of Dask Arrays with 0-length dimensions (:pr:2784) John A Kirkham_
Add asarray and asanyarray to Dask Array API docs (:pr:2787) James Bourbeau_
Support unique's return_* arguments (:pr:2779) John A Kirkham_
Simplify _unique_internal (:pr:2850) (:pr:2855) John A Kirkham_
Avoid removing some getter calls in array optimizations (:pr:2826) Jim Crist_

DataFrame ^^^^^^^^^

Support pyarrow in dd.to_parquet (:pr:2868) Jim Crist_
Fixed DataFrame.quantile and Series.quantile returning nan when missing values are present (:pr:2791) Tom Augspurger_
Fixed DataFrame.quantile losing the result .name when q is a scalar (:pr:2791) Tom Augspurger_
Fixed dd.concat to return a dask.DataFrame when concatenating a single series along the columns, matching pandas' behavior (:pr:2800) James Munroe_
Fixed default inplace parameter for DataFrame.eval to match the pandas default for pandas >= 0.21.0 (:pr:2838) Tom Augspurger_
Fix exception when calling DataFrame.set_index on text column where one of the partitions was empty (:pr:2831) Jesse Vogt_
Do not raise exception when calling DataFrame.set_index on empty dataframe (:pr:2827) Jesse Vogt_
Fixed bug in Dataframe.fillna when filling with a Series value (:pr:2810) Tom Augspurger_
Deprecate old argument ordering in dd.to_parquet to better match convention of putting the dataframe first (:pr:2867) Jim Crist_
df.astype(categorical_dtype) now produces known categoricals (:pr:2835) Jim Crist_
Test against Pandas release candidate (:pr:2814) Tom Augspurger_
Add more tests for read_parquet(engine='pyarrow') (:pr:2822) Uwe Korn_
Remove unnecessary map_partitions in aggregate (:pr:2712) Christopher Prohm_
Fix bug calling sample on empty partitions (:pr:2818) @xwang777_
Error nicely when parsing dates in read_csv (:pr:2863) Jim Crist_
Cleanup handling of passing filesystem objects to PyArrow readers (:pr:2527) @fjetter_
Support repartitioning even if there are no divisions (:pr:2873) @Ced4_
Support reading/writing to hdfs using pyarrow in dd.to_parquet (:pr:2894, :pr:2881) Jim Crist_

Core ^^^^

Allow tuples as sharedict keys (:pr:2763) Matthew Rocklin_
Calling compute within a dask.distributed task defaults to distributed scheduler (:pr:2762) Matthew Rocklin_
Auto-import gcsfs when gcs:// protocol is used (:pr:2776) Matthew Rocklin_
Fully remove dask.async module, use dask.local instead (:pr:2828) Thomas Caswell_
Compatibility with bokeh 0.12.10 (:pr:2844) Tom Augspurger_
Reduce test memory usage (:pr:2782) Jim Crist_
Add Dask collection interface (:pr:2748) Jim Crist_
Update Dask collection interface during XArray integration (:pr:2847) Matthew Rocklin_
Close resource profiler process on exit (:pr:2871) Jim Crist_
Fix S3 tests (:pr:2875) Jim Crist_
Fix port for bokeh dashboard in docs (:pr:2889) Ian Hopkinson_
Wrap Dask filesystems for PyArrow compatibility (:pr:2881) Jim Crist_

.. _v0.15.4 / 2017-10-06:

0.15.4 / 2017-10-06

Array ^^^^^

da.random.choice now works with array arguments (:pr:2781)
Support indexing in arrays with np.int (fixes regression) (:pr:2719)
Handle zero dimension with rechunking (:pr:2747)
Support -1 as an alias for "size of the dimension" in chunks (:pr:2749)
Call mkdir in array.to_npy_stack (:pr:2709)

DataFrame ^^^^^^^^^

Added the .str accessor to Categoricals with string categories (:pr:2743)
Support int96 (spark) datetimes in parquet writer (:pr:2711)
Pass on file scheme to fastparquet (:pr:2714)
Support Pandas 0.21 (:pr:2737)

Bag ^^^

Add tree reduction support for foldby (:pr:2710)

Core ^^^^

Drop s3fs from pip install dask[complete] (:pr:2750)

.. _v0.15.3 / 2017-09-24:

0.15.3 / 2017-09-24

Array ^^^^^

Add masked arrays (:pr:2301)
Add *_like array creation functions (:pr:2640)
Indexing with unsigned integer array (:pr:2647)
Improved slicing with boolean arrays of different dimensions (:pr:2658)
Support literals in top and atop (:pr:2661)
Optional axis argument in cumulative functions (:pr:2664)
Improve tests on scalars with assert_eq (:pr:2681)
Fix norm keepdims (:pr:2683)
Add ptp (:pr:2691)
Add apply_along_axis (:pr:2690) and apply_over_axes (:pr:2702)

DataFrame ^^^^^^^^^

Added Series.str[index] (:pr:2634)
Allow the groupby by param to handle columns and index levels (:pr:2636)
DataFrame.to_csv and Bag.to_textfiles now return the filenames to which they have written (:pr:2655)
Fix combination of partition_on and append in to_parquet (:pr:2645)
Fix for parquet file schemes (:pr:2667)
Repartition works with mixed categoricals (:pr:2676)

Core ^^^^

python setup.py test now runs tests (:pr:2641)
Added new cheatsheet (:pr:2649)
Remove resize tool in Bokeh plots (:pr:2688)

.. _v0.15.2 / 2017-08-25:

0.15.2 / 2017-08-25

Array ^^^^^

Remove spurious keys from map_overlap graph (:pr:2520)
where works with non-bool condition and scalar values (:pr:2543) (:pr:2549)
Improve compress (:pr:2541) (:pr:2545) (:pr:2555)
Add argwhere, _nonzero, and where(cond) (:pr:2539)
Generalize vindex in dask.array to handle multi-dimensional indices (:pr:2573)
Add choose method (:pr:2584)
Split code into reorganized files (:pr:2595)
Add linalg.norm (:pr:2597)
Add diff, ediff1d (:pr:2607), (:pr:2609)
Improve dtype inference and reflection (:pr:2571)

Bag ^^^

Remove deprecated Bag behaviors (:pr:2525)

DataFrame ^^^^^^^^^

Support callables in assign (:pr:2513)
Better error messages for read_csv (:pr:2522)
Add dd.to_timedelta (:pr:2523)
Verify metadata in from_delayed (:pr:2534) (:pr:2591)
Add DataFrame.isin (:pr:2558)
read_hdf supports iterables of files (:pr:2547)

Core ^^^^

Remove bare except: blocks everywhere (:pr:2590)

.. _v0.15.1 / 2017-07-08:

0.15.1 / 2017-07-08

Add storage_options to to_textfiles and to_csv (:pr:2466)
Rechunk and simplify rfftfreq (:pr:2473), (:pr:2475)
Better support ndarray subclasses (:pr:2486)
Import star in dask.distributed (:pr:2503)
Threadsafe cache handling with tokenization (:pr:2511)

.. _v0.15.0 / 2017-06-09:

0.15.0 / 2017-06-09

Array ^^^^^

Add dask.array.stats submodule (:pr:2269)
Support ufunc.outer (:pr:2345)
Optimize fancy indexing by reducing graph overhead (:pr:2333) (:pr:2394)
Faster array tokenization using alternative hashes (:pr:2377)
Added the matmul @ operator (:pr:2349)
Improved coverage of the numpy.fft module (:pr:2320) (:pr:2322) (:pr:2327) (:pr:2323)
Support NumPy's __array_ufunc__ protocol (:pr:2438)
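The @ operator and NumPy's __array_ufunc__ protocol both keep computations lazy. The following is a minimal sketch that assumes a current Dask and NumPy.

```python
import numpy as np
import dask.array as da

a = da.ones((4, 3), chunks=2)
b = da.ones((3, 2), chunks=2)

# Lazy matrix product; each output entry sums three 1.0 * 1.0 terms
c = a @ b
print(c.compute())

# A NumPy ufunc applied to a Dask array dispatches through __array_ufunc__ and stays lazy
s = np.sin(a)
print(type(s).__name__)  # Array
```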

Bag ^^^

Fix bug where reductions on bags with no partitions would fail (:pr:2324)
Add broadcasting and variadic db.map top-level function. Also remove auto-expansion of tuples as map arguments (:pr:2339)
Rename Bag.concat to Bag.flatten (:pr:2402)
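Here is a sketch of variadic db.map and Bag.flatten, assuming a current Dask release. Both bags are built with the same number of partitions because db.map pairs them up partition by partition. The plain 100 argument is broadcast to every call.

```python
import dask.bag as db

xs = db.from_sequence([1, 2, 3, 4], npartitions=2)
ys = db.from_sequence([10, 20, 30, 40], npartitions=2)

# Map over several bags in lockstep; non-bag arguments are repeated for every element
sums = db.map(lambda x, y, offset: x + y + offset, xs, ys, 100)
print(sums.compute())  # [111, 122, 133, 144]

# flatten (formerly concat) turns a bag of lists into a bag of elements
nested = db.from_sequence([[1, 2], [3]], npartitions=1)
print(nested.flatten().compute())  # [1, 2, 3]
```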

DataFrame ^^^^^^^^^

Parquet improvements (:pr:2277) (:pr:2422)

Core ^^^^

Move dask.async module to dask.local (:pr:2318)
Support callbacks with nested scheduler calls (:pr:2397)
Support pathlib.Path objects as uris (:pr:2310)

.. _v0.14.3 / 2017-05-05:

0.14.3 / 2017-05-05

DataFrame ^^^^^^^^^

Pandas 0.20.0 support

.. _v0.14.2 / 2017-05-03:

0.14.2 / 2017-05-03

Array ^^^^^

Add da.indices (:pr:2268), da.tile (:pr:2153), da.roll (:pr:2135)
Simultaneously support drop_axis and new_axis in da.map_blocks (:pr:2264)
Rechunk and concatenate work with unknown chunksizes (:pr:2235) and (:pr:2251)
Support non-numpy container arrays, notably sparse arrays (:pr:2234)
Tensordot contracts over multiple axes (:pr:2186)
Allow delayed targets in da.store (:pr:2181)
Support interactions against lists and tuples (:pr:2148)
Constructor plugins for debugging (:pr:2142)
Multi-dimensional FFTs (single chunk) (:pr:2116)
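The new da.indices, da.tile, and da.roll functions mirror their NumPy counterparts. This sketch assumes a current Dask release and compares them against NumPy.

```python
import numpy as np
import dask.array as da

x = da.arange(5, chunks=2)

rolled = da.roll(x, 2).compute()
print(rolled)  # [3 4 0 1 2]

tiled = da.tile(x, 2).compute()
print(tiled)   # [0 1 2 3 4 0 1 2 3 4]

# A grid of row and column indices, like np.indices
idx = da.indices((2, 3)).compute()
print(idx.shape)  # (2, 2, 3)
```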

Bag ^^^

to_dataframe enforces consistent types (:pr:2199)

DataFrame ^^^^^^^^^

set_index always fully sorts the index (:pr:2290)
Support compatibility with pandas 0.20.0 (:pr:2249), (:pr:2248), and (:pr:2246)
Support Arrow Parquet reader (:pr:2223)
Time-based rolling windows (:pr:2198)
Repartition can now create more partitions, not just fewer (:pr:2168)

Core ^^^^

Always use absolute paths when on POSIX file system (:pr:2263)
Support user provided graph optimizations (:pr:2219)
Refactor path handling (:pr:2207)
Improve fusion performance (:pr:2129), (:pr:2131), and (:pr:2112)

.. _v0.14.1 / 2017-03-22:

0.14.1 / 2017-03-22

Array ^^^^^

Micro-optimize optimizations (:pr:2058)
Change slicing optimizations to avoid fusing raw numpy arrays (:pr:2075) (:pr:2080)
Dask.array operations now work on numpy arrays (:pr:2079)
Reshape now works in a much broader set of cases (:pr:2089)
Support deepcopy python protocol (:pr:2090)
Allow user-provided FFT implementations in da.fft (:pr:2093)

DataFrame ^^^^^^^^^

Fix to_parquet with empty partitions (:pr:2020)
Optional npartitions='auto' mode in set_index (:pr:2025)
Optimize shuffle performance (:pr:2032)
Support efficient repartitioning along time windows like repartition(freq='12h') (:pr:2059)
Improve speed of categorize (:pr:2010)
Support single-row dataframe arithmetic (:pr:2085)
Automatically avoid shuffle when setting index with a sorted column (:pr:2091)
Improve integer-NA handling in read_csv (:pr:2098)

Delayed ^^^^^^^

Repeated attribute access on delayed objects uses the same key (:pr:2084)

Core ^^^^

Improve naming of nodes in dot visuals to avoid generic apply (:pr:2070)
Ensure that worker processes have different random seeds (:pr:2094)

.. _v0.14.0 / 2017-02-24:

0.14.0 / 2017-02-24

Array ^^^^^

Fix corner cases with zero shape and misaligned values in arange (:pr:1902), (:pr:1904), (:pr:1935), (:pr:1955), (:pr:1956)
Improve concatenation efficiency (:pr:1923)
Avoid hashing in from_array if name is provided (:pr:1972)

Bag ^^^

Repartition can now increase number of partitions (:pr:1934)
Fix bugs in some reductions with empty partitions (:pr:1939), (:pr:1950), (:pr:1953)

DataFrame ^^^^^^^^^

Support non-uniform categoricals (:pr:1877), (:pr:1930)
Groupby cumulative reductions (:pr:1909)
DataFrame.loc indexing now supports lists (:pr:1913)
Improve multi-level groupbys (:pr:1914)
Improved HTML and string repr for DataFrames (:pr:1637)
Parquet append (:pr:1940)
Add dd.demo.daily_stock function for teaching (:pr:1992)

Delayed ^^^^^^^

Add traverse= keyword to delayed to optionally avoid traversing nested data structures (:pr:1899)
Support Futures in from_delayed functions (:pr:1961)
Improve serialization of decorated delayed functions (:pr:1969)

Core ^^^^

Improve Windows path parsing in corner cases (:pr:1910)
Rename tasks when fusing (:pr:1919)
Add top level persist function (:pr:1927)
Propagate errors= keyword in byte handling (:pr:1954)
dask.compute traverses Python collections (:pr:1975)
Structural sharing between graphs in dask.array and dask.delayed (:pr:1985)
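Because dask.compute traverses Python collections, you can compute nested containers of lazy results in a single call. The top-level persist accepts collections the same way. This sketch assumes a current Dask release.

```python
import dask
import dask.array as da

x = da.arange(10, chunks=5)

# dask.compute walks dicts and lists, computing every Dask collection it finds
# and passing plain values (like the "label" string) through unchanged
(result,) = dask.compute({"total": x.sum(), "stats": [x.min(), x.max()], "label": "demo"})
print(result)

# Top-level persist returns in-memory-backed collections in the same structure
(x_persisted,) = dask.persist(x)
```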

.. _v0.13.0 / 2017-01-02:

0.13.0 / 2017-01-02

Array ^^^^^

Mandatory dtypes on dask.array. All operations maintain dtype information, and UDF functions like map_blocks now require a dtype= keyword if it cannot be inferred. (:pr:1755)
Support arrays without known shapes, such as those that arise when slicing arrays with arrays or converting dataframes to arrays (:pr:1838)
Support mutation by setting one array with another (:pr:1840)
Tree reductions for covariance and correlations. (:pr:1758)
Add SerializableLock for better use with distributed scheduling (:pr:1766)
Improved atop support (:pr:1800)
Rechunk optimization (:pr:1737), (:pr:1827)
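Dtype tracking and unknown shapes are sketched below against a current Dask release. map_blocks receives dtype= explicitly. Boolean-mask indexing produces a length that stays unknown until compute time.

```python
import numpy as np
import dask.array as da

x = da.arange(6, chunks=3)

# Every Dask array carries a dtype; pass dtype= when it cannot be inferred from the function
halves = x.map_blocks(lambda block: block / 2, dtype="f8")
print(halves.dtype)  # float64

# Indexing with a boolean Dask array yields an unknown (NaN) length until computed
y = x[x > 2]
print(y.shape)  # (nan,)
```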

Bag ^^^

Avoid wrong results when recomputing the same groupby twice (:pr:1867)

DataFrame ^^^^^^^^^

Add map_overlap for custom rolling operations (:pr:1769)
Add shift (:pr:1773)
Add Parquet support (:pr:1782) (:pr:1792) (:pr:1810), (:pr:1843), (:pr:1859), (:pr:1863)
Add missing methods combine, abs, autocorr, sem, nsmallest, first, last, and prod (:pr:1787)
Approximate nunique (:pr:1807), (:pr:1824)
Reductions with multiple output partitions (for operations like drop_duplicates) (:pr:1808), (:pr:1823) (:pr:1828)
Add delitem and copy to DataFrames, increasing mutation support (:pr:1858)

Delayed ^^^^^^^

Changed behavior for delayed(nout=0) and delayed(nout=1): delayed(nout=1) no longer behaves like the default nout=None, and delayed(nout=0) is now supported, so functions that return tuples of length 1 or 0 are handled correctly. This is especially handy when wrapping functions with a variable number of outputs in delayed, e.g. delayed(lambda *args: args, nout=len(vals))(*vals)
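The following sketch shows the nout behavior, assuming a current Dask release. as_tuple is a hypothetical helper that returns its arguments as a tuple, so the number of outputs equals the number of inputs.

```python
import dask
from dask import delayed

def as_tuple(*args):
    # Hypothetical helper: returns its arguments unchanged, as a tuple
    return args

vals = [1, 2, 3]
# nout tells Dask how many outputs to expect, so the Delayed result can be unpacked
a, b, c = delayed(as_tuple, nout=len(vals))(*vals)
print(dask.compute(a, b, c))  # (1, 2, 3)

# nout=1 gives a one-element, unpackable result rather than an opaque Delayed
(single,) = delayed(as_tuple, nout=1)(42)
print(single.compute())  # 42

# nout=0 gives an empty iterable
empty = delayed(as_tuple, nout=0)()
print(list(empty))  # []
```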

Core ^^^^

Refactor core byte ingest (:pr:1768), (:pr:1774)
Improve import time (:pr:1833)

.. _v0.12.0 / 2016-11-03:

0.12.0 / 2016-11-03

DataFrame ^^^^^^^^^

Return a series when functions given to dataframe.map_partitions return scalars (:pr:1515)
Fix type size inference for series (:pr:1513)
dataframe.DataFrame.categorize no longer includes missing values in the categories. This is for compatibility with a `pandas change <https://github.com/pydata/pandas/pull/10929>`_ (:pr:1565)
Fix head parser error in dataframe.read_csv when some lines have quotes (:pr:1495)
Add dataframe.reduction and series.reduction methods to apply generic row-wise reduction to dataframes and series (:pr:1483)
Add dataframe.select_dtypes, which mirrors the `pandas method <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.select_dtypes.html>`_ (:pr:1556)
dataframe.read_hdf now supports reading Series (:pr:1564)
Support Pandas 0.19.0 (:pr:1540)
String accessor works with indexes (:pr:1561)
Add pipe method to dask.dataframe (:pr:1567)
Add indicator keyword to merge (:pr:1575)
Support Series in read_hdf (:pr:1575)
Support Categories with missing values (:pr:1578)
Support inplace operators like df.x += 1 (:pr:1585)
str accessor passes through args and kwargs (:pr:1621)
Improved groupby support for single-machine multiprocessing scheduler (:pr:1625)
Tree reductions (:pr:1663)
Pivot tables (:pr:1665)
Add clip (:pr:1667), align (:pr:1668), combine_first (:pr:1725), and any/all (:pr:1724)
Improved handling of divisions on dask-pandas merges (:pr:1666)
Add groupby.aggregate method (:pr:1678)
Add dd.read_table function (:pr:1682)
Improve support for multi-level columns (:pr:1697) (:pr:1712)
Support 2d indexing in loc (:pr:1726)
Extend resample to include DataFrames (:pr:1741)
Support dask.array ufuncs on dask.dataframe objects (:pr:1669)

Array ^^^^^

Add information about how the dask.array chunks argument works (:pr:1504)
Fix field access with non-scalar fields in dask.array (:pr:1484)
Add concatenate= keyword to atop to concatenate chunks of contracted dimensions
Optimized slicing performance (:pr:1539) (:pr:1731)
Extend atop with a concatenate= (:pr:1609) new_axes= (:pr:1612) and adjust_chunks= (:pr:1716) keywords
Add clip (:pr:1610) swapaxes (:pr:1611) round (:pr:1708) repeat
Automatically align chunks in atop-backed operations (:pr:1644)
Cull dask.arrays on slicing (:pr:1709)

Bag ^^^

Fix issue with callables in bag.from_sequence being interpreted as tasks (:pr:1491)
Avoid non-lazy memory use in reductions (:pr:1747)

Administration ^^^^^^^^^^^^^^

Added changelog (:pr:1526)
Create new threadpool when operating from thread (:pr:1487)
Unify example documentation pages into one (:pr:1520)
Add versioneer for git-commit based versions (:pr:1569)
Pass through node_attr and edge_attr keywords in dot visualization (:pr:1614)
Add continuous testing for Windows with Appveyor (:pr:1648)
Remove use of multiprocessing.Manager (:pr:1653)
Add global optimizations keyword to compute (:pr:1675)
Micro-optimize get_dependencies (:pr:1722)

.. _v0.11.0 / 2016-08-24:

0.11.0 / 2016-08-24

Major Points ^^^^^^^^^^^^

DataFrames now enforce knowing full metadata (columns, dtypes) everywhere. Previously we would operate in an ambiguous state when functions lost dtype information (such as apply). Now all dataframes know their dtypes and raise errors asking for this information when they are unable to infer it (which they usually can). Some internal attributes like _pd and _pd_nonempty have been moved.

The internals of the distributed scheduler have been refactored to transition tasks between explicit states. This improves resilience, reasoning about scheduling, plugin operation, and logging. It also makes the scheduler code easier to understand for newcomers.

Breaking Changes ^^^^^^^^^^^^^^^^

The distributed.s3 and distributed.hdfs namespaces are gone. Use protocols in normal methods like read_text('s3://...') instead.
dask.array.reshape now raises an error in some cases where previously it would have created a very large number of tasks

.. _v0.10.2 / 2016-07-27:

0.10.2 / 2016-07-27

More DataFrame shuffles now work in distributed settings, ranging from setting the index to hash joins, sorted joins, and groupbys.
Dask passes the full test suite when run under Python's optimized -OO mode.
On-disk shuffles were found to produce wrong results in some highly-concurrent situations, especially on Windows. This has been resolved by a fix to the partd library.
Fixed a growth in open file descriptors that occurred during large data transfers
Support ports in the --bokeh-whitelist option of dask-scheduler to better route web interface messages behind non-trivial network settings
Some improvements to resilience to worker failure (though other known failures persist)
You can now start an IPython kernel on any worker for improved debugging and analysis
Improvements to dask.dataframe.read_hdf and its docs, especially when reading from multiple files

.. _v0.10.0 / 2016-06-13:

0.10.0 / 2016-06-13

Major Changes ^^^^^^^^^^^^^

This version drops support for Python 2.6
Conda packages are built and served from conda-forge
The dask.distributed executables have been renamed from dfoo to dask-foo. For example dscheduler is renamed to dask-scheduler
Both Bag and DataFrame include a preliminary distributed shuffle.

Bag ^^^

Add task-based shuffle for distributed groupbys
Add accumulate for cumulative reductions

DataFrame ^^^^^^^^^

Add a task-based shuffle suitable for distributed joins, groupby-applys, and set_index operations. The single-machine shuffle remains untouched (and much more efficient).
Add support for new Pandas rolling API with improved communication performance on distributed systems.
Add groupby.std/var
Pass through S3/HDFS storage options in read_csv
Improve categorical partitioning
Add eval, info, isnull, notnull for dataframes
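A shuffle moves each row to an output partition determined by its key, so that all rows sharing a key end up together, which joins and groupby-applys require. The following is a minimal pure-Python sketch of hash partitioning over lists of dict rows; ``stable_hash`` and ``shuffle`` are hypothetical names, and this is not dask's implementation.

.. code-block:: python

    import zlib
    from collections import defaultdict


    def stable_hash(value):
        # The built-in hash() of str is salted per process, so use a checksum
        # of repr() to keep partition assignment reproducible across runs.
        return zlib.crc32(repr(value).encode("utf-8"))


    def shuffle(partitions, key, npartitions):
        """Hypothetical sketch: move rows so equal keys share an output partition.

        ``partitions`` is a list of input partitions, each a list of dict rows.
        """
        # Stage 1: every input partition splits its rows into npartitions groups
        split = []
        for part in partitions:
            groups = defaultdict(list)
            for row in part:
                groups[stable_hash(row[key]) % npartitions].append(row)
            split.append(groups)
        # Stage 2: output partition i gathers group i from every input partition
        return [
            [row for groups in split for row in groups.get(i, [])]
            for i in range(npartitions)
        ]

In a task-graph version of this sketch, each stage-1 split and each stage-2 gather would be its own task, so no single worker has to hold the whole dataset.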

Distributed ^^^^^^^^^^^

Rename executables like dscheduler to dask-scheduler
Improve scheduler performance in the many-fast-tasks case (important for shuffling)
Improve work stealing to be aware of expected function run-times and data sizes. This drastically increases the breadth of algorithms that can be efficiently run on the distributed scheduler without significant user expertise.
Support maximum buffer sizes in streaming queues
Improve Windows support when using the Bokeh diagnostic web interface
Support compression of very large bytestrings in protocol
Support clean cancellation of submitted futures in Joblib interface

Other ^^^^^

All dask-related projects (dask, distributed, s3fs, hdfs, partd) are now building conda packages on conda-forge.
Change credential handling in s3fs to pass around delegated credentials only if a secret/key is explicitly given. The default is now to rely on managed environments. This can be changed back by explicitly providing a keyword argument. Anonymous mode must be explicitly declared if desired.

.. _v0.9.0 / 2016-05-11:

0.9.0 / 2016-05-11

API Changes ^^^^^^^^^^^

dask.do and dask.value have been renamed to dask.delayed
dask.bag.from_filenames has been renamed to dask.bag.read_text
All S3/HDFS data ingest functions like db.from_s3 or distributed.s3.read_csv have been moved into the plain read_text, read_csv functions, which now support protocols, like dd.read_csv('s3://bucket/keys*.csv')
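The protocol prefix in a path such as 's3://bucket/keys*.csv' is what selects the storage backend. The following is a minimal sketch of splitting such a path into protocol and location, assuming that bare paths with no "://" marker refer to the local filesystem; ``split_protocol`` is a hypothetical helper, not dask's internal function.

.. code-block:: python

    def split_protocol(path):
        """Hypothetical helper: 's3://bucket/key' -> ('s3', 'bucket/key')."""
        protocol, sep, rest = path.partition("://")
        if not sep:
            # No "://" marker: treat it as a local filesystem path
            return "file", path
        return protocol, rest


    print(split_protocol("s3://bucket/keys*.csv"))  # ('s3', 'bucket/keys*.csv')

Splitting on the literal "://" marker, rather than parsing a full URL, keeps glob characters like * and ? in the path intact.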

Array ^^^^^

Add support for scipy.LinearOperator
Improve optional locking for on-disk data structures
Change rechunk to expose the intermediate chunks

Bag ^^^

Rename from_filenames to read_text
Remove from_s3 in favor of read_text('s3://...')

DataFrame ^^^^^^^^^

Fixed numerical stability issue for correlation and covariance
Allow no-hash from_pandas for speedy round-trips to and from pandas objects
Generally reengineered read_csv to be more in line with Pandas behavior
Support fast set_index operations for sorted columns

Delayed ^^^^^^^

Rename do/value to delayed
Rename to/from_imperative to to/from_delayed

Distributed ^^^^^^^^^^^

Move s3 and hdfs functionality into the dask repository
Adaptively oversubscribe workers for very fast tasks
Improve PyPy support
Improve work stealing for unbalanced workers
Scatter data efficiently with tree-scatters

Other ^^^^^

Add lzma/xz compression support
Raise a warning when trying to split unsplittable compression types, like gzip or bz2
Improve hashing for single-machine shuffle operations
Add new callback method for start state
General performance tuning
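Splitting a file into byte ranges lets several tasks read it in parallel, but a gzip stream can only be decompressed from its beginning, so such a file has to be read as a single block. The following is a minimal sketch of that rule, assuming a hypothetical helper ``byte_ranges`` and the unsplittable formats named in this release; it is not dask's implementation.

.. code-block:: python

    import warnings

    # Formats treated as unsplittable in this release note
    UNSPLITTABLE = {"gzip", "bz2"}


    def byte_ranges(size, blocksize, compression=None):
        """Hypothetical helper: (offset, length) pairs covering ``size`` bytes.

        Files compressed with an unsplittable format are returned as one
        block, with a warning, instead of being split.
        """
        if compression in UNSPLITTABLE and size > blocksize:
            warnings.warn(
                f"Cannot split {compression}-compressed file; reading it as one block"
            )
            return [(0, size)]
        return [(off, min(blocksize, size - off)) for off in range(0, size, blocksize)]


    print(byte_ranges(10, 4))  # [(0, 4), (4, 4), (8, 2)]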

.. _v0.8.1 / 2016-03-11:

0.8.1 / 2016-03-11

Array ^^^^^

Bugfix for range slicing that could occasionally lead to incorrect results.
Improved support and resiliency of arg reductions (argmin, argmax, etc.)

Bag ^^^

Add zip function

DataFrame ^^^^^^^^^

Add corr and cov functions
Add melt function
Bugfixes for io to bcolz and hdf5

.. _v0.8.0 / 2016-02-20:

0.8.0 / 2016-02-20

Array ^^^^^

Changed default array reduction split from 32 to 4
Linear algebra, tril, triu, LU, inv, cholesky, solve, solve_triangular, eye, lstsq, diag, corrcoef.
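The reduction split controls how many partial results are combined at each level of a tree reduction: a smaller split gives a deeper tree of smaller combine tasks. The following pure-Python sketch illustrates the idea; ``tree_reduce`` and its ``split_every`` argument are illustrative names, and this is not dask's implementation.

.. code-block:: python

    from functools import reduce


    def tree_reduce(chunks, combine, split_every=4):
        """Reduce per-chunk results level by level in groups of split_every.

        Returns (result, depth), where depth is the number of combine levels.
        """
        if split_every < 2:
            raise ValueError("split_every must be at least 2")
        if not chunks:
            raise ValueError("need at least one chunk")
        level = list(chunks)
        depth = 0
        while len(level) > 1:
            # Combine each consecutive group of up to split_every partial results
            level = [
                reduce(combine, level[i:i + split_every])
                for i in range(0, len(level), split_every)
            ]
            depth += 1
        return level[0], depth

With 64 chunks, split_every=4 needs three combine levels (64, 16, 4, 1), while split_every=32 needs two (64, 2, 1) with much wider combine steps.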

Bag ^^^

Add tree reductions
Add range function
Drop from_hdfs function (better functionality now exists in hdfs3 and distributed projects)

DataFrame ^^^^^^^^^

Refactor dask.dataframe to include a full empty pandas dataframe as metadata. Drop the .columns attribute on Series
Add a Series categorical accessor and series.nunique; drop the .columns attribute for Series.
read_csv fixes (multi-column parse_dates, integer column names, etc.)
Internal changes to improve graph serialization

Other ^^^^^

Documentation updates
Add from_imperative and to_imperative functions for all collections
Aesthetic changes to profiler plots
Moved the dask project to a new dask organization

.. _v0.7.6 / 2016-01-05:

0.7.6 / 2016-01-05

Array ^^^^^

Improve thread safety
Tree reductions
Add view, compress, hstack, dstack, vstack methods
map_blocks can now remove and add dimensions

DataFrame ^^^^^^^^^

Improve thread safety
Extend sampling to include replacement options

Imperative ^^^^^^^^^^

Removed optimization passes that fused results.

Core ^^^^

Removed dask.distributed
Improved performance of blocked file reading
Serialization improvements
Test Python 3.5

.. _v0.7.4 / 2015-10-23:

0.7.4 / 2015-10-23

This was mostly a bugfix release. Some notable changes:

Fix minor bugs associated with the release of numpy 1.10 and pandas 0.17
Fixed a bug with random number generation that would cause repeated blocks due to the birthday paradox
Use locks in dask.dataframe.read_hdf by default to avoid concurrency issues
Change dask.get to point to dask.async.get_sync by default
Allow visualization functions to accept general graphviz graph options like rankdir='LR'
Add reshape and ravel to dask.array
Support the creation of dask.arrays from dask.imperative objects
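The birthday paradox explains how repeated blocks can appear: if every block draws its seed independently and uniformly from ``n_seeds`` possible values, the chance that some pair of blocks collides grows roughly with the square of the number of blocks. The following sketch computes that probability exactly; the seed-space sizes used here are illustrative assumptions, not the values dask used.

.. code-block:: python

    def collision_probability(n_blocks, n_seeds):
        """Probability that at least two of n_blocks share a seed drawn
        uniformly and independently from n_seeds possible values."""
        if n_blocks > n_seeds:
            return 1.0
        p_unique = 1.0
        for i in range(n_blocks):
            # Block i must avoid the i seeds already taken by earlier blocks
            p_unique *= (n_seeds - i) / n_seeds
        return 1.0 - p_unique


    print(round(collision_probability(23, 365), 3))  # 0.507

Even with a 32-bit seed space, 10,000 blocks give roughly a 1.2% chance of at least one repeated block.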

Deprecation ^^^^^^^^^^^

This release also includes a deprecation warning for dask.distributed, which will be removed in the next version.

Future development in distributed computing for dask is happening here: https://distributed.dask.org. General feedback on that project is most welcome from this community.

.. _v0.7.3 / 2015-09-25:

0.7.3 / 2015-09-25

Diagnostics ^^^^^^^^^^^

A utility for profiling memory and CPU usage has been added to the dask.diagnostics module.

DataFrame ^^^^^^^^^

This release improves coverage of the pandas API, adding nunique, nlargest, and quantile among other methods. It also fixes encoding issues when reading non-ASCII CSV files, improves the performance of resample and fixes bugs in it, makes read_hdf more flexible with globbing, and includes various bug fixes in dask.imperative and dask.bag.

.. _v0.7.0 / 2015-08-15:

0.7.0 / 2015-08-15

DataFrame ^^^^^^^^^

This release includes significant bugfixes and alignment with the Pandas API. This has resulted both from use and from recent involvement by Pandas core developers.

New operations: query, rolling operations, drop
Improved operations: quantiles, arithmetic on full dataframes, dropna, constructor logic, merge/join, elemwise operations, groupby aggregations

Bag ^^^

Fixed a bug in fold with a null default argument

Array ^^^^^

New operations: da.fft module, da.image.imread

Infrastructure ^^^^^^^^^^^^^^

The array and dataframe collections create graphs with deterministic keys. These tend to be longer (hash strings) but should be consistent between computations. This will be useful for caching in the future.
All collections (Array, Bag, DataFrame) inherit from a common base class
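Deterministic keys come from hashing a description of each task (its name plus its arguments), so building the same graph twice yields the same keys. The following is a minimal sketch using ``hashlib``, assuming the arguments have a stable ``repr()``; ``make_key`` is a hypothetical helper, and dask's own ``tokenize`` function handles many more argument types.

.. code-block:: python

    import hashlib


    def make_key(name, *args, **kwargs):
        """Hypothetical helper: build a key like 'sum-<hex digest>'.

        Relies on repr() being stable for the arguments, which holds for
        builtins such as ints, strings, and tuples.
        """
        # Sort kwargs so that keyword order does not change the key
        payload = repr((name, args, sorted(kwargs.items())))
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        return f"{name}-{digest}"

Two calls such as make_key("sum", 1, 2) return the same string in any process, which is what makes keys usable for caching; the long hex digest is why these keys are longer than sequential names.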

.. _v0.6.1 / 2015-07-23:

0.6.1 / 2015-07-23

Distributed ^^^^^^^^^^^

Improved (though not yet sufficient) resiliency for dask.distributed when workers die

DataFrame ^^^^^^^^^

Improved writing to various formats, including to_hdf, to_castra, and to_csv
Improved creation of dask DataFrames from dask Arrays and Bags
Improved support for categoricals and various other methods

Array ^^^^^

Various bug fixes
Histogram function

Scheduling ^^^^^^^^^^

Added tie-breaking ordering of tasks within parallel workloads to better handle and clear intermediate results

Other ^^^^^

Added the dask.do function for explicit construction of graphs with normal python code
Replaced pydot with the graphviz library for graph printing, to support Python 3
There is also a gitter chat room and a stackoverflow tag