Changelog

.. note::

   This is not exhaustive. For an exhaustive list of changes, see the git log.

.. _v2026.3.0:

2026.3.0

Highlights
^^^^^^^^^^

  • Preliminary Python 3.14t support (:pr:12223) Guido Imperiale_
  • Bokeh 3.9.0 compatibility (:pr-distributed:9205) Dimitri Papadopoulos Orfanos_

.. dropdown:: Additional changes

  • docs: document approximate algorithm and Dask-specific params in describe() (:pr:12300) Maxime Grenu_

  • docs: clarify coarsen reduction function contract (:pr:12314) monkeyjack123_

  • Fix misleading TypeError for scalar overflow in dask.array elemwise (:pr:12301) Maxime Grenu_

  • Stricter warnings filter (:pr:12274) Guido Imperiale_

  • Clean up obsolete PANDAS_GE markers (:pr:12279) Guido Imperiale_

  • Bump actions/upload-artifact from 6 to 7 (:pr:12311) dependabot[bot]_

  • Remove mention of obsolete default value for 'boundary' parameter. (:pr:12304) Marianne Corvellec_

  • Pandas in 3.14t CI (:pr:12284) Guido Imperiale_

  • Quadratic definition time in xarray.DataArray.to_zarr(compute=False) (:pr:12299) Guido Imperiale_

  • Bump scientific-python/issue-from-pytest-log-action from 1.4.0 to 1.5.0 (:pr:12294) dependabot[bot]_

  • test_tokenize_range_index fails if cityhash is not installed (:pr:12286) Guido Imperiale_

  • Bump minimum version of scipy (:pr:12271) Guido Imperiale_

  • Fix flaky categorical concat test (:pr:12276) Harshith J_

  • Doc: document Zarr compression options for to_zarr (:pr:12269) Harshith J_

  • Disable the GIL on 3.14t Windows CI (:pr:12280) Guido Imperiale_

  • Update obsolete pandas URLs (:pr:12278) Guido Imperiale_

  • Suppress warning: Consolidated metadata is not part of Zarr 3 (:pr:12273) Guido Imperiale_

  • Pandas4Warning: Copy-on-Write is always enabled with pandas >= 3.0 (:pr:12272) Guido Imperiale_

  • Disable the GIL in 3.14t CI (:pr:12270) Guido Imperiale_

  • Propagate contextvars to worker threads; catch warnings in 3.14t (:pr:12224) Guido Imperiale_

  • Fix bugs in env.yaml / pytest.xml upload (:pr:12266) Guido Imperiale_

  • Added full_matrices parameter to dask.array.linalg.svd (:pr:12292) Ayan Bag_

  • fix: zarr.create_array for better backward compatibility (:pr:12291) Wouter-Michiel Vierdag_

  • Silence deprecations in global config if local config overrides them (:pr:12315) Guido Imperiale_

  • Fix Total CPU % on /workers tab to normalize by total nthreads (:pr-distributed:9195) Ernest Provo_

  • setproctitle: avoid being caught by dask.config; add to test envs (:pr-distributed:9202) Guido Imperiale_

  • Add return type annotation for Client.register_plugin (:pr-distributed:9201) Simon-Martin Schröder_

  • Bump actions/upload-artifact from 6 to 7 (:pr-distributed:9199) dependabot[bot]_

  • docs: fix Scheduler.close docstring (:pr-distributed:9198) Chase Naples_

  • XFAIL test_handle_null_partitions_2 (:pr-distributed:9191) Guido Imperiale_

  • Type hints for Future.status (:pr-distributed:9188) Navid_

  • Pin sphinx=8 (:pr-distributed:9190) Guido Imperiale_
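
The entry above on propagating contextvars to worker threads (:pr:12224) touches a general Python behaviour: a ``ContextVar`` value set in one thread is not guaranteed to be visible inside a thread pool worker unless the context is carried over explicitly. The following is a minimal standard-library sketch of that idea, not Dask's implementation; ``request_id``, ``read_request_id`` and ``submit_with_context`` are hypothetical names used only for illustration.

```python
import contextvars
from concurrent.futures import ThreadPoolExecutor

# Hypothetical context variable, used only for this illustration.
request_id = contextvars.ContextVar("request_id", default="unset")


def read_request_id():
    """Return the value of the context variable as seen by the calling thread."""
    return request_id.get()


def submit_with_context(executor, fn, *args):
    """Submit fn so that it runs inside a copy of the caller's context."""
    ctx = contextvars.copy_context()
    return executor.submit(ctx.run, fn, *args)


request_id.set("job-42")

with ThreadPoolExecutor(max_workers=1) as pool:
    # Without explicit propagation, what the worker sees depends on the
    # Python version and build, so no particular value is assumed here.
    plain = pool.submit(read_request_id).result()
    # With an explicit copy of the context, the worker sees the caller's value.
    propagated = submit_with_context(pool, read_request_id).result()

print("plain:", plain)
print("propagated:", propagated)
```

Copying the context at submission time means later changes in the submitting thread do not leak into work that is already queued.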

.. _v2026.2.0:

2026.2.0

Highlights
^^^^^^^^^^

.. dropdown:: Additional changes

  • Minimum version of optional dependency scipy bumped to 1.10.0 (was 1.7.2)

.. _v2026.1.2:

2026.1.2

Highlights
^^^^^^^^^^

  • dask.dataframe now requires PyArrow 16 or greater (was 14)
  • Have **kwargs in to_zarr follow zarr-python API and add mode argument (:pr:12205) Wouter-Michiel Vierdag_

.. note::

   Passing io-related arguments through ``**kwargs`` in ``to_zarr`` will be deprecated,
   and the ``read_kwargs`` argument as well as the ``zarr_array_kwargs`` (dict) argument
   introduced in 2025.12.0 have been removed.
   If you passed either ``mode`` or ``read_only`` through ``**kwargs`` or ``read_kwargs`` in
   ``to_zarr``, please use the new ``mode`` argument. The ``read_only`` argument can still
   be passed, but it emits a warning and has no effect (since ``to_zarr``
   is meant to write, this should not be an issue). For now, no error is raised.
   ``**kwargs`` in ``to_zarr`` has been renamed to ``**zarr_array_kwargs`` to indicate
   that it directly follows the ``zarr-python`` API of ``Group.create_array``
   when ``zarr>v3.0.0`` and of ``zarr.create`` when ``zarr<v3.0.0``. Please see
   :func:`dask.array.to_zarr` for more.

.. dropdown:: Additional changes

  • Minimum version of optional dependency h5py bumped to 3.7.0 (was 3.4.0)
  • Minimum version of optional dependency python-snappy bumped to 0.7.1 (was 0.6.0)
  • Minimum version of optional dependency tiledb bumped to 0.27.0 (was 0.12.0)
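
To see whether an environment already satisfies the minimums listed for this release, something like the following standard-library sketch may help. It is not part of Dask: ``MINIMUM_VERSIONS``, ``version_tuple``, ``meets_minimum`` and ``check_installed`` are hypothetical helpers, and the simplified parser only compares the leading numeric release segment (the ``packaging`` library is the more robust choice where available).

```python
from importlib import metadata

# Minimums copied from the 2026.1.2 notes above (distribution names as on PyPI).
MINIMUM_VERSIONS = {
    "pyarrow": "16",
    "h5py": "3.7.0",
    "python-snappy": "0.7.1",
    "tiledb": "0.27.0",
}


def version_tuple(version):
    """Parse the leading numeric release segment, e.g. '3.7.0rc1' -> (3, 7, 0)."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings after padding them to the same length."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(minimums=MINIMUM_VERSIONS):
    """Return a {distribution: status} report; missing packages are reported, not errors."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
            continue
        status = "ok" if meets_minimum(installed, minimum) else "below " + minimum
        report[name] = f"{installed} ({status})"
    return report


for name, status in check_installed().items():
    print(f"{name}: {status}")
```

Because all four packages are optional, a "not installed" result is not a problem in itself.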

.. _v2026.1.1:

2026.1.1

Highlights
^^^^^^^^^^

  • Fix XSS vulnerability CVE-2026-23528 <https://github.com/dask/distributed/security/advisories/GHSA-c336-7962-wfj2>_ Jacob Tomlinson_
  • Support duck-typed Futures in task graph processing (:pr:12213) Matthew Rocklin_

.. dropdown:: Additional changes

  • Remove the Python 2 Comment (:pr:12229) Vipin Kataria_

  • Fix changelog: distributed-pr -> pr-distributed (:pr:12227) Matthew Plough_

  • Support duck-typed Futures in task graph processing (:pr:12213) Matthew Rocklin_

  • Relax test_serialization (:pr:12226) Guido Imperiale_

  • [cosmetic] Reorganise dependency groups in CI environment files (:pr:12222) Guido Imperiale_

  • Review _array_expr_enabled() (:pr:12217) Guido Imperiale_

  • Increase coverage; lower codecov threshold to pass (:pr:12214) Guido Imperiale_

  • Test array expr on mindeps (:pr:12216) Guido Imperiale_

  • Disable some Mac builds (:pr:12218) Guido Imperiale_

  • Typing tweaks (:pr:12215) Guido Imperiale_

  • [CI] unbreak codecov (:pr:12211) Guido Imperiale_

  • Test array expr on Python 3.14 (:pr:12212) Guido Imperiale_

  • Fix pickle compatibility for Python 3.14 (:pr:12206) Matthew Rocklin_

  • Remove deprecated dask._compatibility.entry_points (:pr:12202) Guido Imperiale_

  • Tweak MacOS CI (:pr:12200) Guido Imperiale_

  • Remove obsolete CI pins (:pr:12199) Guido Imperiale_

  • Fix XSS vulnerability CVE-2026-23528 <https://github.com/dask/distributed/security/advisories/GHSA-c336-7962-wfj2>_ Jacob Tomlinson_

  • Clean up obsolete pins in CI (:pr-distributed:9172) Guido Imperiale_

  • Fix incompatibility of pyparsing vs. packaging in mindeps CI (:pr-distributed:9170) Guido Imperiale_

  • Bump mypy; fix mypy failure (:pr-distributed:9171) Guido Imperiale_

.. _v2026.1.0:

2026.1.0

Broken yanked release, please ignore.

.. _v2025.12.0:

2025.12.0

Highlights
^^^^^^^^^^

  • More improvements for pandas 3.x Tom Augspurger_
  • Support zarr sharding through create_array (:pr:12153) Wouter-Michiel Vierdag_
  • Various improvements for project linting and type hinting Dimitri Papadopoulos Orfanos_
  • Add new "optimization.tune.active" configuration option to disable partition fusion (:pr:12194) Richard (Rick) Zamora_
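
As a rough sketch of how the new ``"optimization.tune.active"`` option might be used, the snippet below sets it through the regular ``dask.config.set`` context manager. It assumes the key behaves like any other Dask configuration value. It also assumes that ``False`` is the value that disables partition fusion, which is a reading of the entry's wording and should be checked against :pr:12194 or the configuration reference.

```python
import dask

# Assumption: False turns the tuning step (partition fusion) off.
with dask.config.set({"optimization.tune.active": False}):
    # Inside the block, the override is visible to anything reading the config.
    active = dask.config.get("optimization.tune.active")
    print("optimization.tune.active =", active)
```

Using the context manager keeps the override local to the block; the same key could presumably also be set in a Dask YAML configuration file.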

.. dropdown:: Additional changes

  • Stable sort in Series.value_counts for pandas 3.x (:pr:12191) Tom Augspurger_

  • Add new "optimization.tune.active" configuration option to disable partition fusion (:pr:12194) Richard (Rick) Zamora_

  • Build llms.txt files in Sphinx documentation (:pr:12192) Jacob Tomlinson_

  • Support zarr sharding through create_array (:pr:12153) Wouter-Michiel Vierdag_

  • Support min/max of datetime (:pr:12183) Julia Signell_

  • pandas 3.x compatibility (:pr:12180) Tom Augspurger_

  • Minimal version of setuptools-scm (:pr:12184) Dimitri Papadopoulos Orfanos_

  • Update test_ufunc_meta for upstream-dev failure (:pr:12170) Tom Augspurger_

  • Upstream compat (:pr:12165) Tom Augspurger_

  • Enforce a few more ruff rules (:pr:12157) Dimitri Papadopoulos Orfanos_

  • Enforce ruff/refurb rules (FURB) (:pr:12144) Dimitri Papadopoulos Orfanos_

  • DEP: bump minimal requirement on toolz (0.10.0 -> 0.12.0) (:pr:12163) Clément Robert_

  • Fix execution stop in da.to_zarr due to (misleading) PerformanceWarning raised as exception (:pr:12161) Marvin Albert_

  • Use f-string interpolation where possible (:pr:12140) Dimitri Papadopoulos Orfanos_

  • pre-commit black hook: use implicit defaults (:pr:12156) Dimitri Papadopoulos Orfanos_

  • Enforce ruff/pygrep-hooks rules (PGH) (:pr:12143) Dimitri Papadopoulos Orfanos_

  • Apply Repo-Review rules (:pr:12148) Dimitri Papadopoulos Orfanos_

  • Document groupby: split_every, split_out (:pr:12135) Jayesh Manani_

  • isort → ruff (:pr:12149) Dimitri Papadopoulos Orfanos_

  • Enforce ruff/pyupgrade rule UP031 (:pr:12137) Dimitri Papadopoulos Orfanos_

  • Replace pre-commit hook with ruff rule (:pr:12142) Dimitri Papadopoulos Orfanos_

  • Fix reify to handle sparse arrays and other objects without len (:pr:12103) Gautham Hullikunte_

  • Ruff supersedes absolufy-imports (:pr:12141) Dimitri Papadopoulos Orfanos_

  • Enforce ruff/pyupgrade rule UP032 (:pr:12136) Dimitri Papadopoulos Orfanos_

  • Typing fixes (:pr-distributed:9159) Jacob Tomlinson_

  • Explicit setuptools-scm minimum version (:pr-distributed:9160) Jacob Tomlinson_

  • Enforce ruff rules (RUF) (:pr-distributed:9153) Dimitri Papadopoulos Orfanos_

  • Clean up MANIFEST.in (:pr-distributed:9149) Dimitri Papadopoulos Orfanos_

  • isort → ruff (:pr-distributed:9152) Dimitri Papadopoulos Orfanos_

  • Ruff supersedes absolufy-imports (:pr-distributed:9154) Dimitri Papadopoulos Orfanos_

  • Bump minimum supported toolz to 0.12.0 (:pr-distributed:9151) James Bourbeau_

  • flake8, bugbear, pyupgrade → ruff (:pr-distributed:9147) Dimitri Papadopoulos Orfanos_

  • Fix typos found by codespell (:pr-distributed:9145) Dimitri Papadopoulos Orfanos_

  • Clean up setuptools-specific configuration (:pr-distributed:9150) Dimitri Papadopoulos Orfanos_

  • PEP 639 compliance (:pr-distributed:9146) Dimitri Papadopoulos Orfanos_

  • Update black (:pr-distributed:9148) Dimitri Papadopoulos Orfanos_

  • Fix empty progress bar (:pr-distributed:9144) Jacob Tomlinson_

  • Exclude broken tblib versions in CI (:pr-distributed:9141) Jacob Tomlinson_

.. _v2025.11.0:

2025.11.0

Highlights
^^^^^^^^^^

  • Use shard shape when available in to_zarr (:pr:12105) Davis Bennett_
  • Improve worker and nanny support for ipv6 (:pr-distributed:9133) Jianyu Sun_
  • Linting and type hinting improvements across the codebase

.. dropdown:: Additional changes

  • Replace versioneer with setuptools-scm (:pr:12133) Jacob Tomlinson_

  • Apply ruff/Pylint Refactor rules (PLR) (:pr:12010) Dimitri Papadopoulos Orfanos_

  • Remove files from MANIFEST.in (:pr:12041) Dimitri Papadopoulos Orfanos_

  • Stabilize test_filter_nonpartition_columns (:pr:12131) DongWon_

  • Enforce ruff/pyupgrade rules UP007 and UP033 (:pr:12125) Dimitri Papadopoulos Orfanos_

  • Update np.accumulate workaround comment (:pr:12129) Jacob Tomlinson_

  • flake8, bugbear, pyupgrade → ruff (:pr:12002) Dimitri Papadopoulos Orfanos_

  • Adjust pyarrow version skip in test_parquet (:pr:12124) Tom Augspurger_

  • Fix ufunc in dask.array.cumreduction (:pr:12119) Tony Ding_

  • Fix docs footer (:pr:12120) Jacob Tomlinson_

  • Use integer multiple of shard shape when rechunking in to_zarr (:pr:12106) Davis Bennett_

  • Ensure that the shard shape is used as the default chunk shape for sharded Zarr arrays (:pr:12104) Davis Bennett_

  • Skip test_parquet for pyarrow==22.0 (:pr:12116) Tom Augspurger_

  • Clean up setuptools-specific configuration (:pr:12040) Dimitri Papadopoulos Orfanos_

  • PEP 639 compliance (:pr:12024) Dimitri Papadopoulos Orfanos_

  • Fix deprecated quantile interpolation being passed to numpy (:pr:12108) David Hoese_

  • Add uv.lock to .gitignore (:pr:12110) Jacob Tomlinson_

  • Use shard shape when available in to_zarr (:pr:12105) Davis Bennett_

  • Add more optional dependencies to Python 3.13 CI builds (:pr:12100) James Bourbeau_

  • Remove pip pin for docs (:pr:12102) James Bourbeau_

  • Address collection-based meta arguments in GroupByApply (:pr:12099) Richard (Rick) Zamora_

  • Replace versioneer with setuptools-scm (:pr-distributed:9137) Jacob Tomlinson_

  • Improve worker and nanny support for ipv6 (:pr-distributed:9133) Jianyu Sun_

  • Fix CI Multiple aliased keys in file /Users/runner/.condarc (:pr-distributed:9136) Jacob Tomlinson_

  • Remove pip pin for docs (:pr-distributed:9132) James Bourbeau_

  • Remove UCX configuration schema (:pr-distributed:9127) Peter Andreas Entschev_

  • Add generic type support to Future and Client methods (:pr-distributed:9123) Simon-Martin Schröder_

.. _v2025.10.0:

2025.10.0

Highlights
^^^^^^^^^^

  • Several Dask Array bug fixes including :pr:12097, :pr:12089, :pr:12088, and :pr:12090.

.. dropdown:: Additional changes

  • Use updated docs theme (:pr:12093) Jacob Tomlinson_

  • Fix: dask.array.cumprod does not deal with dtype (:pr:12097) Tony Ding_

  • CuPy compatibility for percentile (:pr:12098) Tom Augspurger_

  • Avoid using methods.concat on empty lists (:pr:12096) Tony Ding_

  • Add distribution check for optional dependencies (:pr:12087) James Bourbeau_

  • Fix percentile inconsistencies (:pr:12088) Oisin-M_

  • Fix warning in test_ufunc_where_no_out (:pr:12094) Tom Augspurger_

  • Fix/choose trivial case (:pr:12090) Oisin-M_

  • Add input validation on dask.dataframe.read_sql_query() (:pr:12091) Jacob Tomlinson_

  • Numpy 2.2 updates for cov function with tests (:pr:12079) Mike McCarty_

  • Fix nanvar (:pr:12089) Oisin-M_

  • Document manually triggering the conda-forge bots (:pr:12083) Jacob Tomlinson_

  • Fix mixed HLG/Expr handling in _ExprSequence._simplify_down (:pr:12081) Richard (Rick) Zamora_

  • Add dask.tokenize to API docs (:pr:12080) Username46786_

  • CreateOverlappingPartitions: Add before and after to prepend name (:pr:11965) Fabien Aulaire_

  • Fix scipy.sparse.csc_matrix scalar declaration in _array_like_safe (:pr:12078) Ilan Gold_

  • Update docs theme and remove docs env pins (:pr-distributed:9125) Jacob Tomlinson_

  • Add worker name as prefix to ThreadPoolExecutor name (:pr-distributed:9120) Maneesh Sutar_

  • Skip hanging SSH tests on Windows (:pr-distributed:9115) Jacob Tomlinson_

  • Fix macOS CI failure during job startup (:pr-distributed:9113) Jacob Tomlinson_

  • Prevent task stream dashboard showing 1970 date (:pr-distributed:9109) Guillaume Eynard-Bontemps_

.. _v2025.9.2:

2025.9.2

This is a backport security release only.

See CVE-2026-23528 <https://github.com/dask/distributed/security/advisories/GHSA-c336-7962-wfj2>_ for more details.

.. _v2025.9.1:

2025.9.1

Highlights
^^^^^^^^^^

  • Avoid unconditional pyarrow dependency in dataframe.backends (:pr:12075) Tom Augspurger_
  • pandas 3.x compatibility for .groups (:pr:12071) Tom Augspurger_

.. dropdown:: Additional changes

  • Avoid unconditional pyarrow dependency in dataframe.backends (:pr:12075) Tom Augspurger_
  • pandas 3.x compatibility for .groups (:pr:12071) Tom Augspurger_
  • Expose details about worker start timeout in the exception message (:pr-distributed:9092) Taylor Braun-Jones_
  • pynvml => nvidia-ml-py in CI (:pr-distributed:9111) Jacob Tomlinson_

.. _v2025.9.0:

2025.9.0

Highlights
^^^^^^^^^^

  • pandas 3.x compatibility (:pr:12025) Tom Augspurger_
  • Remove protocol="ucx" support in favor of distributed-ucxx (:pr-distributed:9105) Peter Andreas Entschev_

.. dropdown:: Additional changes

  • Fix 0 scalar setting for scipy.sparse (:pr:12027) Ilan Gold_

  • Workaround failing upstream-dev tests (:pr:12061) Tom Augspurger_

  • avoid instantiating a potentially very large arange in take (:pr:11998) Justus Magin_

  • MAINT: address NumPy deprecation in np.minimum (:pr:12059) Marco Edward Gorelli_

  • CI fixes (:pr:12058) Tom Augspurger_

  • MAINT: Address NumPy DeprecationWarning (:pr:12056) Marco Edward Gorelli_

  • Fix test_enforce_columns on Python 3.14 (:pr:12047) Elliott Sales de Andrade_

  • Fix "th" --> "the" typo in DataFrame SQL docs (:pr:12038) Peter A. Jonsson_

  • Advance rng state in permutation (:pr:12031) James Bourbeau_

  • Fix pyarrow chunked array conversion (:pr:12034) James Bourbeau_

  • Fix xfail condition for pyarrow large_string issue (:pr:12032) James Bourbeau_

  • pandas 3.x compatibility (:pr:12025) Tom Augspurger_

  • Fix name not propagated correctly in map_blocks (:pr:11952) Ilan Gold_

  • Clean tuples dict keys from workers_info in /api/v1/retire_workers. (:pr-distributed:8996) Florian Courtial_

  • Remove protocol="ucx" support in favor of distributed-ucxx (:pr-distributed:9105) Peter Andreas Entschev_

.. _v2025.7.0:

2025.7.0

Highlights
^^^^^^^^^^

  • Account for __main__ in pickle normalization (:pr:11970) James Bourbeau_
  • Enable column projection in MapPartitions (:pr:11875) Richard (Rick) Zamora_
  • Add config option for direct-to-workers (:pr-distributed:9097) James Bourbeau_

.. dropdown:: Additional changes

  • CI: update actions location (:pr:12019) Brigitta Sipőcz_

  • Apply ruff/flake8-comprehensions rules (C4) (:pr:12004) Dimitri Papadopoulos Orfanos_

  • Apply ruff/flake8-pie rules (PIE) (:pr:12006) Dimitri Papadopoulos Orfanos_

  • Apply ruff/Pylint Error rules (PLE) (:pr:12013) Dimitri Papadopoulos Orfanos_

  • Apply ruff/Pylint Convention rules (PLC) (:pr:12012) Dimitri Papadopoulos Orfanos_

  • Apply ruff/flake8-pyi rules (PYI) (:pr:12007) Dimitri Papadopoulos Orfanos_

  • Apply ruff/flake8-simplify rules (SIM) (:pr:12008) Dimitri Papadopoulos Orfanos_

  • Apply ruff/Pylint Warning rules (PLW) (:pr:12011) Dimitri Papadopoulos Orfanos_

  • Apply ruff/flake8-implicit-str-concat rules (ISC) (:pr:12005) Dimitri Papadopoulos Orfanos_

  • Apply ruff/pycodestyle rule E714 (:pr:12000) Dimitri Papadopoulos Orfanos_

  • Fix typos found by codespell (:pr:12001) Dimitri Papadopoulos Orfanos_

  • Update PyPI URL for official nightly pyarrow repository (:pr:11996) Raúl Cumplido_

  • Fall-back to textual repr in case jinja2 is not installed (:pr:11987) Lukas Bindreiter_

  • Prevent builtins.any from being shadowed in dask.array.reductions (:pr:11988) Marvin Albert_

  • Bump conda-incubator/setup-miniconda from 3.1.1 to 3.2.0 (:pr:11982)

  • Skip groupby cov test for pandas 3.x (:pr:11977) Tom Augspurger_

  • Fix upstream CI installation (:pr:11976) James Bourbeau_

  • Make module name logic more resilient in Dispatch (:pr:11974) James Bourbeau_

  • Ensure memray profiler runs on all workers (:pr-distributed:9095) James Bourbeau_

  • Update def to class typo in actors docs (:pr-distributed:9091) Peter Fackeldey_

  • Bump conda-incubator/setup-miniconda from 3.1.1 to 3.2.0 (:pr-distributed:9090)

  • Update persist in tests for async clients (:pr-distributed:9089) Tom Augspurger_

  • Fix pyarrow FileInfo import (:pr-distributed:9078) James Bourbeau_

  • Make module name logic more resilient in _always_use_pickle_for (:pr-distributed:9086) James Bourbeau_

  • Temporarily pin pytest in CI to avoid coverage error (:pr-distributed:9088) James Bourbeau_

  • Remove s3fs from testing CI environment (:pr-distributed:9087) James Bourbeau_

  • Reuse Comm objects in Scheduler.broadcast (:pr-distributed:9083) Tom Augspurger_

  • Fix test_resubmit_nondeterministic_task_different_deps (:pr-distributed:9085) James Bourbeau_

.. _v2025.5.1:

2025.5.1

Highlights
^^^^^^^^^^

Fixed Dask Array slicing regression introduced in the 2025.5.0 release. See :pr:11947 from Florian Jetter_ for more details.

.. dropdown:: Additional changes

  • Speed up slicing graph generation (:pr:11945) Florian Jetter_
  • Revert "Don't handle tuple in task_spec.parse_input" (:pr:11953) Florian Jetter_
  • Optimize slicing graph generation (:pr:11946) Florian Jetter_
  • Fix xarray slicing regression (:pr:11947) Florian Jetter_
  • Don't handle tuple in task_spec.parse_input (:pr:11948) Florian Jetter_

.. _v2025.5.0:

2025.5.0

Highlights
^^^^^^^^^^

  • Fixed Array setitem when both the array and the indexer have unknown shape. See :pr:11753 from Tom Augspurger_ for more details.
  • Fixed several delayed graph handling issues introduced in the 2025.4.0 release. See :pr:11917, :pr:11907, and :pr-distributed:9071 from Florian Jetter_ for more details.

.. dropdown:: Additional changes

  • Speed up slicing graph generation (:pr:11945) Florian Jetter_

  • Optimize dask order for worst case of get_target (:pr:11935) Florian Jetter_

  • Raise on local executor if tasks are missing dependency (:pr:11944) Florian Jetter_

  • Fix to_dask_array for single partition (:pr:11931) James Bourbeau_

  • Ensure parquet plan is fully cached during optimization (:pr:11933) Florian Jetter_

  • Better documentation for expression system (:pr:11915) Florian Jetter_

  • Simplify (and speed up) culling (:pr:11899) Florian Jetter_

  • Update pre-commit (:pr:11926) Florian Jetter_

  • Don't run post setup-miniconda step in CI (:pr:11925) James Bourbeau_

  • Try to pin pip for readthedocs (:pr:11923) Florian Jetter_

  • Fix windows CI (:pr:11919) Florian Jetter_

  • Use stable crick for py310 (:pr-distributed:9072) Florian Jetter_

  • Remove internal dependencies mapping in update_graph (:pr-distributed:9036) Florian Jetter_

  • Partially forgotten dependencies (:pr-distributed:9068) Florian Jetter_

  • Replace filesystem-spec in CI environment with fsspec (:pr-distributed:9069) James Bourbeau_

  • Ensure actors set erred state properly in case of worker failure (:pr-distributed:9067) Florian Jetter_

  • Refactor timeouts in start cluster (:pr-distributed:9062) Florian Jetter_

  • Fix workers / threads / memory displayed in client repr (:pr-distributed:9066) James Bourbeau_

  • Pin pip for readthedocs (:pr-distributed:9063) Florian Jetter_

  • Skip TLS functional tests (:pr-distributed:9061) Florian Jetter_

  • Ensure client submit does not serialize unnecessarily (:pr-distributed:9057) Florian Jetter_

.. _v2025.4.1:

2025.4.1

Highlights
^^^^^^^^^^

This release contains several graph optimization fixes for issues introduced in the 2025.4.0 release.

See :pr:11906, :pr:11898, :pr:11903, and :pr:11904 by Florian Jetter_ for more details.

.. dropdown:: Additional changes

  • Implement ufuncs and gufunc for array-expr (:pr:11818) Patrick Hoefler_
  • Implement map_overlap for array-expr (:pr:11822) Patrick Hoefler_

.. _v2025.4.0:

2025.4.0

Highlights
^^^^^^^^^^

  • When computing multiple Dask-Expr backed collections like DataFrames, they are now optimized together instead of individually.
  • Graph materialization and low level optimization are now performed on the scheduler of a distributed cluster (if available).
  • New kwarg force for DataFrame.shuffle which signals the optimizer to not drop the shuffle during optimization.
  • Collections that are passed to Dask methods as arguments are now properly optimized. If multiple collections are passed as arguments, they will be optimized together. Collections passed this way are prohibited from being reused, i.e. if the collection is used again in another function call, it will be computed again. This pattern is used to avoid pipeline breakers, which typically drive memory usage. Avoiding those should reduce memory pressure on the cluster but can cause runtime regressions.
  • (Special case of above point) Collections passed to Delayed objects are now optimized automatically.

Breaking changes
^^^^^^^^^^^^^^^^

  • Support for custom low level optimizers removed.
  • Top level dask.optimize will now always trigger graph materialization. Previously this was not always the case. This also causes any low level HLG annotations to be dropped.
  • DataFrame and Array compute results are now always concatenated on the cluster. Previously, the behavior was dependent on the API used to call compute (dask.compute, DaskCollection.compute, or Client.compute).
  • dask.base.collections_to_dsk has been renamed to collections_to_expr and no longer returns a HighLevelGraph or dict object but instead guarantees a dask._expr.Expr object. Further, it no longer performs low level optimization immediately but instead delays it until the Expr instance is materialized. The returned object is therefore no longer a mapping, so converting it to dict or iterating over it is no longer possible.

.. dropdown:: Additional changes

  • Ensure Future value is in da.from_delayed task graph (:pr:11896) Tom Augspurger_

  • Fix annotations passed to delayed (:pr:11893) Florian Jetter_

  • Migrate delayed unpack_collections (:pr:11881) Florian Jetter_

  • Remove Pub / Sub references from docs (:pr:11891) James Bourbeau_

  • Ensure only classes without custom init are singletons (:pr:11886) Florian Jetter_

  • Remove custom initializers for delayed expressions (:pr:11888) Florian Jetter_

  • Fix persisting multiple DFs at the same time (:pr:11887) Florian Jetter_

  • Avoid always parsing list inputs to DataFrame.isin as object type numpy arrays (:pr:11869) Matthew Roeschke_

  • Unskip pandas-dev cov / corr tests (:pr:11873) Tom Augspurger_

  • HLG blockwise fix (:pr:11871) Florian Jetter_

  • Ensure annotations for HLG objects are properly generated (:pr:11866) Florian Jetter_

  • Factor out singleton logic from base Expr class (:pr:11868) Florian Jetter_

  • Ensure HLGs are using dependencies properly in optimization (:pr:11859) Florian Jetter_

  • Ensure dictionaries tokenize deterministically (:pr:11867) Florian Jetter_

  • Ensure default dask scheduler only compute what's needed (:pr:11861) Florian Jetter_

  • Faster tokenization of pd.RangeIndex (:pr:11863) Florian Jetter_

  • Update link to Quansight in community doc (:pr:11860) Pavithra Eswaramoorthy_

  • Relax tolerance in autocorr test (:pr:11857) Tom Augspurger_

  • Use map_blocks in array.store to avoid materialization and dropping of annotations (:pr:11844) Florian Jetter_

  • Ensure repartition does not trigger memory size computation during lowering (i.e. on the scheduler) (:pr:11855) Florian Jetter_

  • Support args and kwargs for rolling aggregations (:pr:11856) Florian Jetter_

  • Remove nightly h5py from upstream CI job (:pr:11847) James Bourbeau_

  • Ensure HLGExpr tokenize uniquely (:pr:11849) Florian Jetter_

  • Do not inject median in describe for pandas 3 (:pr:11846) Florian Jetter_

  • Fixed Expr.__setattr__ for subclasses (:pr:11845) Tom Augspurger_

  • Wrap HLGs in an Expr to avoid Client side materialization (:pr:11736) Florian Jetter_

  • Improve error when submitting work from a closed client (:pr-distributed:9049) James Bourbeau_

  • Return a default value if address resolution fails (:pr-distributed:9051) Sandro_

  • Avoid deepcopy when submitting graph (:pr-distributed:8633) Florian Jetter_

  • Dynamically scale heartbeat and scheduler_info intervals (:pr-distributed:9046) Florian Jetter_

  • Speed up process startup time by avoiding importing packages on version check (:pr-distributed:9048) Florian Jetter_

  • Reduce size of scheduler_info (:pr-distributed:9045) Florian Jetter_

  • Cache WorkerState host property (:pr-distributed:9044) Florian Jetter_

  • Clear ci env cache (:pr-distributed:9047) Florian Jetter_

  • Remove deprecated Pub / Sub (:pr-distributed:9039) Florian Jetter_

  • Perform explicit culling step only if LLG is submitted (:pr-distributed:9040) Florian Jetter_

  • Do not fully materialize global annotations by type (:pr-distributed:9035) Florian Jetter_

  • Allow nested worker_client calls (:pr-distributed:9038) George Sakkis_

  • Dump ci cache (:pr-distributed:9037) Florian Jetter_

  • Scheduler type annotations (:pr-distributed:9030) Florian Jetter_

  • Reduce dask.order overhead by removing stripped_dep computation (:pr-distributed:9031) Florian Jetter_

  • Use Expr instead of HLG (:pr-distributed:9008) Florian Jetter_

.. _v2025.3.0:

2025.3.0

Highlights
^^^^^^^^^^

Automatically adjust chunksizes in xarray.apply_ufunc
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""

apply_ufunc requires the core dimension to have chunksize=-1. The underlying rechunking operation will automatically adjust the chunksize of the core dimension but keep the other dimensions the same. This can cause exploding chunksizes under the hood.

This release adds an intermediate step that resizes the non-core dimensions by the same factor that the core dimension will increase to keep the maximum chunksize under control. This behavior is automatically enabled when allow_rechunk=True is set.

.. code-block::

    import xarray as xr
    import dask.array as da

    arr = xr.DataArray(
        da.random.random((1, 750, 45910), chunks=(1, "auto", -1)),
        dims=["band", "y", "x"],
    )

    result = arr.interp(
        y=arr.coords["y"],
        method="linear",
    )

.. grid:: 2

.. grid-item:: **Previously**

    Individual chunks are exploding to 25 GiB, likely causing out of memory errors.

    .. image:: images/changelog/gufunc_chunksizes_exploding.png
      :width: 100%
      :align: center
      :alt: Individual chunks are exploding to 25 GiB, likely causing out of memory errors.

.. grid-item:: **Now**

    Dask will now automatically split individual chunks into chunks that will have the
    same chunksize minus a small tolerance.

    .. image:: images/changelog/gufunc_chunksizes_constant.png
      :width: 100%
      :align: center
      :alt: Individual chunks are now roughly the same size
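
The arithmetic behind this change can be sketched in plain Python. This is an illustration of the idea described above, not Dask's actual algorithm: the helper names (``chunk_nbytes``, ``naive_core_rechunk``, ``balanced_core_rechunk``) and the shapes are hypothetical, and the sketch simply shrinks the largest non-core axes first until the growth factor of the core dimension is absorbed.

```python
import math


def chunk_nbytes(chunk_shape, itemsize=8):
    """Size in bytes of one chunk, assuming 8-byte (float64) elements by default."""
    return math.prod(chunk_shape) * itemsize


def naive_core_rechunk(chunk_shape, full_shape, core_axis):
    """Grow only the core axis to its full length, leaving other axes untouched."""
    new = list(chunk_shape)
    new[core_axis] = full_shape[core_axis]
    return tuple(new)


def balanced_core_rechunk(chunk_shape, full_shape, core_axis):
    """Grow the core axis, then shrink non-core axes by the same overall factor."""
    new = list(naive_core_rechunk(chunk_shape, full_shape, core_axis))
    factor = full_shape[core_axis] / chunk_shape[core_axis]
    non_core = [ax for ax in range(len(new)) if ax != core_axis]
    # Shrink the largest non-core axes first until the factor is absorbed.
    for ax in sorted(non_core, key=lambda a: new[a], reverse=True):
        if factor <= 1:
            break
        shrink = min(factor, new[ax])
        new[ax] = max(1, math.ceil(new[ax] / shrink))
        factor /= shrink
    return tuple(new)


original = (100, 100, 100)      # hypothetical chunk shape
full = (1000, 1000, 1000)       # hypothetical array shape
naive = naive_core_rechunk(original, full, core_axis=2)
balanced = balanced_core_rechunk(original, full, core_axis=2)

print(original, chunk_nbytes(original))  # (100, 100, 100) 8000000
print(naive, chunk_nbytes(naive))        # (100, 100, 1000) 80000000
print(balanced, chunk_nbytes(balanced))  # (10, 100, 1000) 8000000
```

In this toy case the core axis grows tenfold, so growing it alone makes every chunk ten times larger, while shrinking one non-core axis by the same factor keeps the chunk size unchanged.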

.. dropdown:: Additional changes

  • Fix dataset info cache assignment (:pr:11840) Florian Jetter_

  • Expr setattr (:pr:11836) Florian Jetter_

  • Follow up to expression tokenization caching (:pr:11837) Florian Jetter_

  • Consolidate getattr for expr classes (:pr:11835) Florian Jetter_

  • Reduce pickle size of ReadParquet expression (:pr:11797) Florian Jetter_

  • arange loses precision on ~2**63 (:pr:11801) Guido Imperiale_

  • Remove numbagg from upstream build (:pr:11821) Patrick Hoefler_

  • Dispatch to numbagg for nanmedian and nanquantile (:pr:11817) Patrick Hoefler_

  • Make missing meta warning more ergonomic (:pr:11814) Patrick Hoefler_

  • Remove name doc from from_pandas (:pr:11812) Patrick Hoefler_

  • Implement an Array Scalar (:pr:11810) Patrick Hoefler_

  • Added to_orc to DataFrame API (:pr:11807) Tom Augspurger_

  • Implement reverse indexing for DataFrames (:pr:11803) Patrick Hoefler_

  • Add lazy to_pandas_dispatch registration for cudf (:pr:11799) Richard (Rick) Zamora_

  • Fix missing imports in array-expr (:pr:11796) Florian Jetter_

  • Cache tokens on expressions and restore after pickle roundtrip (:pr:11791) Florian Jetter_

  • Use random dashboard ports for LocalCluster in distributed tests (:pr:11795) Florian Jetter_

  • Implement slicing for array-expr (:pr:11783) Patrick Hoefler_

  • Never use an asynchronous Client when calling top level compute function (:pr:11790) Florian Jetter_

  • Refactor import tests (:pr:11794) Florian Jetter_

  • Migrate base.unpack_collections to Task class (:pr:11793) Florian Jetter_

  • Ensure map_blocks generates unique tokens (:pr:11792) Florian Jetter_

  • Speed up normalize_pickle by 50 percent (:pr:11788) Florian Jetter_

  • Fix divisions calculation with duplicates (:pr:11787) Patrick Hoefler_

  • Fix assign align for duplicated divisions (:pr:11786) Patrick Hoefler_

  • Ensure concat optimize project does not raise (:pr:11784) Florian Jetter_

  • Add array-expr from_array (:pr:11772) Patrick Hoefler_

  • Keep chunksizes consistent in apply_gufunc (:pr:11683) Patrick Hoefler_

  • Test dask.dataframe.__all__ (:pr:11782) Philipp A._

  • Add __all__ to dask.bag (:pr:11781) Philipp A._

  • Add test for dask.array.__all__ (:pr:11780) Philipp A._

  • Bump JamesIves/github-pages-deploy-action from 4.7.2 to 4.7.3 (:pr:11777)

  • Export dask.array members (:pr:11779) Philipp A._

  • Fix sorted_divisions_locations with duplicates (:pr:11773) Tom Augspurger_

  • Fix small typo in best-practices.rst (:pr:11775) Sergey Kolesnikov_

  • Allow unknown chunks in blockwise adjust_chunks (:pr:11769) Lindsey Gray_

  • Fix crash in asarray(..., like=...) vs. scipy.sparse objects (:pr:11755) Guido Imperiale_

  • Remove flaky optional dependency (:pr:11771) Tom Augspurger_

  • Add support for scipy sparray (:pr:11750) Philipp A._

  • Added flaky to tests extra (:pr:11770) Tom Augspurger_

  • Ensure divisions are plain scalars (:pr:11767) Tom Augspurger_

  • Remove divisions code duplication (:pr:11764) Florian Jetter_

  • Ensure divisions not diverging from npartitions in Merge (:pr:11762) Florian Jetter_

  • Skip test_visualize_int_overflow on windows (:pr:11761) Florian Jetter_

  • Reduce pickle size for tasks (:pr:11687) Florian Jetter_

  • Implement unify_chunks and Rechunk (:pr:11692) Patrick Hoefler_

  • Fix expression getitem to avoid alignment (:pr:11760) Patrick Hoefler_

  • arange(..., like=x) embeds the graph of x (:pr:11754) Guido Imperiale_

  • Simplify assert_divisions (:pr:11745) Florian Jetter_

  • Fix Projection logic for Series objects (:pr:11747) Patrick Hoefler_

  • Remove bytes as keys (:pr:11757) Florian Jetter_

  • Ensure map_partitions returns Series object if function returns scalar (:pr:11756) Florian Jetter_

  • Don't upload env twice (:pr:11748) Patrick Hoefler_

  • Fix badges in readme (:pr-distributed:9029) Florian Jetter_

  • Properly forward cancellation reason (:pr-distributed:9028) Florian Jetter_

  • Fix bokeh circle (:pr-distributed:9026) Florian Jetter_

  • Ensure FileInfo can be serialized (:pr-distributed:9025) Florian Jetter_

  • Add ipykernel to skipped modules in code sampling (:pr-distributed:9022) Matthew Rocklin_

  • SpecCluster: add option to not shut down the scheduler when the cluster is closed (:pr-distributed:9021) Taylor Braun-Jones_

  • Fix CI by using client.persist(collection) instead of collection.persist() (:pr-distributed:9020) Hendrik Makait_

  • Add redirect from prefix root to status (:pr-distributed:9015) Isaac_

  • Bump JamesIves/github-pages-deploy-action from 4.7.2 to 4.7.3 (:pr-distributed:9018)

  • Remove bytes keys from tests (:pr-distributed:9017) Jacob Tomlinson_

.. _v2025.2.0:

2025.2.0

Highlights ^^^^^^^^^^

This release includes a critical fix for a deadlock that can arise when seceded tasks are rescheduled, or cancelled and resubmitted, e.g. due to a worker being lost.

See :pr-distributed:8991 by Hendrik Makait_ for more details.

.. dropdown:: Additional changes

  • Add big array example (:pr:11744) James Bourbeau_

  • Fix exploding chunksizes in pad for constant padding (:pr:11743) Patrick Hoefler_

  • Move optimize method to base class (:pr:11742) Florian Jetter_

  • Add changelog entry for fixed deadlock (:pr:11741) Hendrik Makait_

  • Fix graph creation in dask-expr to_delayed (:pr:11739) Patrick Hoefler_

  • Remove culling from delayed optimisation (:pr:11737) Patrick Hoefler_

  • Compute meta for from_map on the cluster (:pr:11738) Patrick Hoefler_

  • Bugs in __setitem__ with dask bool mask (:pr:11728) Guido Imperiale_

  • Implement infrastructure, random, blockwise and Elemwise (:pr:11689) Patrick Hoefler_

  • array / asarray with both like= and dtype= (:pr:11733) Guido Imperiale_

  • Fix annotations warnings test (:pr:11734) Patrick Hoefler_

  • Catch warnings when writing to remote storage with to_parquet (:pr:11731) Patrick Hoefler_

  • Remove LocalCluster from tests (:pr:11729) Patrick Hoefler_

  • Fix partition pruning when using from_array (:pr:11725) Patrick Hoefler_

  • Fix concatenation with mixed dtype columns (:pr:11727) Patrick Hoefler_

  • arange: fix extreme values (:pr:11707) Guido Imperiale_

  • Graph corruption on scalar getitem -> setitem (:pr:11723) Guido Imperiale_

  • Never share buffers after compute() (:pr:11697) Guido Imperiale_

  • Extract Dask Array from xarray DataArray in from_array (:pr:11712) Patrick Hoefler_

  • arange: support kwargs (:pr:11710) Guido Imperiale_

  • Ensure normalize_token is threadsafe (:pr:11709) Florian Jetter_

  • Expand advice for instance types and processes (:pr:11705) Florian Jetter_

  • Drop legacy timeseries implementation (:pr:11704) Florian Jetter_

  • Update Dask Cloud Provider documentation to include Nebius as a supported cloud option (:pr:11703) Alexander_

  • Fix normalize_chunks when squashing into a single chunk (:pr:11702) Patrick Hoefler_

  • Fix positional indexing with newaxis (:pr:11699) Patrick Hoefler_

  • Set array backend in scipy-sparse-indexing (:pr:11700) Tom Augspurger_

  • Fix value_counts shuffling strategy (:pr:11698) Patrick Hoefler_

  • Disentangle core expression class from dataframe specific code (:pr:11688) Patrick Hoefler_

  • Bump conda-incubator/setup-miniconda from 3.1.0 to 3.1.1 (:pr:11685)

  • Fixup dataframe conversion from array methods (:pr:11684) Patrick Hoefler_

  • Remove remaining artifacts of fastparquet (:pr:11682) Patrick Hoefler_

  • Remove traceback from sizeof failure warning (:pr-distributed:9006) Jacob Tomlinson_

  • Hotfix: Ignore negative occupancy (:pr-distributed:9012) Hendrik Makait_

  • Remove expensive tokenization for key uniqueness check (:pr-distributed:9009) Patrick Hoefler_

  • Fix CI for changes in from_map (:pr-distributed:9011) Patrick Hoefler_

  • Avoid handling stale long-running messages on scheduler (:pr-distributed:8991) Hendrik Makait_

  • Bump test_stress timeout (:pr-distributed:9002) Tom Augspurger_

  • Poll in test_rmm_metrics test (:pr-distributed:9004) Tom Augspurger_

  • Cache occupancy in WorkStealing.balance() (:pr-distributed:9005) Hendrik Makait_

  • Homogeneous balancing by accounting for in-flight requests (:pr-distributed:9003) Hendrik Makait_

  • Consistent estimation of task duration between stealing, adaptive and occupancy calculation (:pr-distributed:9000) Hendrik Makait_

  • Increase default work-stealing interval by 10x (:pr-distributed:8997) Hendrik Makait_

  • Remove occupancy plot from status dashboard (:pr-distributed:8995) Hendrik Makait_

  • Bump conda-incubator/setup-miniconda from 3.1.0 to 3.1.1 (:pr-distributed:8990)

.. _v2025.1.0:

2025.1.0

Highlights ^^^^^^^^^^

Legacy Dask DataFrame Implementation removed """"""""""""""""""""""""""""""""""""""""""""

This release drops the legacy Dask DataFrame implementation. The API with query planning is now the only available Dask DataFrame implementation.

This enforces the deprecation of the configuration:

.. code-block::

dask.config.set({"dataframe.query-planning": False})

Dask-Expr was merged into the dask package as well as the dask/dask repository. It is no longer necessary to install dask-expr separately.

Reducing Memory Pressure for Xarray Workloads """""""""""""""""""""""""""""""""""""""""""""

In 2022, Dask introduced a mechanism called root task queuing <https://distributed.dask.org/en/stable/scheduling-policies.html#queuing>_. This mechanism allows Dask to detect tasks that read data from storage and to schedule them defensively, so that overproduction of these tasks does not put memory pressure on the cluster. The underlying detection was very fragile and failed for specific types of computations, such as opening multiple zarr stores or loading a large number of netcdf files.

The recent changes in Dask's task graph representation allow for more robust detection of root tasks. This change makes the detection mechanism independent of the workload running and is especially beneficial for Xarray workloads.

This results in significantly more memory stability and a reduced memory footprint for workloads where root task detection was previously failing. It also makes the expected memory profile deterministic and independent of the topology of the task graph.

.. _v2024.12.1:

2024.12.1

Highlights ^^^^^^^^^^

Improved scheduler responsiveness for large task graphs """""""""""""""""""""""""""""""""""""""""""""""""""""""

This release reduces the number of Python object references related to tracking tasks by the Dask scheduler. This increases scheduler responsiveness by reducing the time needed to run garbage collection on the scheduler.

See :issue:8958, :pr:11608, :pr:11600, :pr:11598, :pr:11597, and :pr-distributed:8963 from Hendrik Makait_ for more details.

.. dropdown:: Additional changes

  • Fix map_overlap bug where rechunking and trim=False caused inconsistent chunkings (:pr:11605) Patrick Hoefler_

  • Avoid legacy implementation in read-csv (:pr:11603) Patrick Hoefler_

  • Remove legacy DataFrame import (:pr:11604) Patrick Hoefler_

  • asarray ignores dtype for array inputs (:pr:11586) crusaderky_

  • Add back LLM chatbot to Dask docs (:pr:11594) dchudz_

  • Bump JamesIves/github-pages-deploy-action from 4.6.9 to 4.7.2 (:pr:11593)

  • Migrate dask array creation routines to task spec (:pr:11582) James Bourbeau_

  • Migrate most of dask array random to task spec (:pr:11581) James Bourbeau_

  • Do not use local function in array.push (:pr:11576) Florian Jetter_

  • Bump conda-incubator/setup-miniconda from 3.0.3 to 3.1.0 (:pr-distributed:8922)

  • Pick random dashboard port in tests (:pr-distributed:8965) Hendrik Makait_

  • Fix formatting for NoValidWorkerException message (:pr-distributed:8967) Hendrik Makait_

  • Support pynvml>=11.5 in WSL (:pr-distributed:8962) Richard (Rick) Zamora_

  • Bump JamesIves/github-pages-deploy-action from 4.6.9 to 4.7.2 (:pr-distributed:8960)

.. _v2024.12.0:

2024.12.0

Highlights ^^^^^^^^^^

Python 3.13 Support """""""""""""""""""

This release adds support for Python 3.13. Dask now supports Python 3.10-3.13.

See :pr:11456 and :pr-distributed:8904 from Patrick Hoefler_ and James Bourbeau_ for more details.

.. dropdown:: Additional changes

  • Revert "Add LLM chatbot to Dask docs (:pr:11556)" (:pr:11577) dchudz_

  • Automatically rechunk if array in to_zarr has irregular chunks (:pr:11553) Patrick Hoefler_

  • Blockwise uses Task class (:pr:11568) Florian Jetter_

  • Migrate rechunk and reshape to task spec (:pr:11555) Patrick Hoefler_

  • Cache svg-representation for arrays (:pr:11560) Deepak Cherian_

  • Fix empty input for containers (:pr:11571) Florian Jetter_

  • Convert Bag graphs to TaskSpec graphs during optimization (:pr:11569) Florian Jetter_

  • Add LLM chatbot to Dask docs (:pr:11556) dchudz_

  • Fuse data nodes in linear fusion too (:pr:11549) Patrick Hoefler_

  • Migrate slicing code to task spec (:pr:11548) Patrick Hoefler_

  • Speed up ArraySliceDep tokenization (:pr:11551) Patrick Hoefler_

  • Fix fusing of p2p barrier tasks (:pr:11543) Patrick Hoefler_

  • Remove infra/mentions of GPU CI (:pr:11546) Charles Blackmon-Luca_

  • Temporarily disable gpuCI update CI job (:pr:11545) James Bourbeau_

  • Use BlockwiseDep to implement map_blocks keywords (:pr:11542) Patrick Hoefler_

  • Remove optimize_slices (:pr:11538) Patrick Hoefler_

  • Make reshape_blockwise a noop if shape is the same (:pr:11541) Patrick Hoefler_

  • Remove read-only flag from open_array in open_zarr (:pr:11539) Patrick Hoefler_

  • Implement linear_fusion for task spec class (:pr:11525) Patrick Hoefler_

  • Remove recursion from TaskSpec (:pr:11477) Florian Jetter_

  • Fixup test after dask-expr change (:pr:11536) Patrick Hoefler_

  • Bump codecov/codecov-action from 3 to 5 (:pr:11532)

  • Create dask-expr frame directly without roundtripping (:pr:11529) Patrick Hoefler_

  • Add scikit-image nightly back to upstream CI (:pr:11530) James Bourbeau_

  • Remove from_dask_dataframe import (:pr:11528) Patrick Hoefler_

  • Ensure that from_array creates a copy (:pr:11524) Patrick Hoefler_

  • Simplify and improve performance of normalize chunks (:pr:11521) Patrick Hoefler_

  • Fix flaky nanquantile test (:pr:11518) Patrick Hoefler_

  • Fix tests for new read_only kwarg in zarr=3 (:pr:11516) Patrick Hoefler_

  • Fix test_jupyter.py::test_shutsdown_cleanly (:pr-distributed:8954) Hendrik Makait_

  • Install tornado from conda-forge in Python 3.13 CI (:pr-distributed:8951) James Bourbeau_

  • Restore retire workers API (:pr-distributed:8939) Florian Jetter_

  • Properly convert finalize dependencies to references (:pr-distributed:8949) Hendrik Makait_

  • Block fusion for barrier tasks (:pr-distributed:8944) Patrick Hoefler_

  • Remove infra/mentions of GPUCI (:pr-distributed:8946) Charles Blackmon-Luca_

  • Temporarily disable gpuCI update CI job (:pr-distributed:8945) James Bourbeau_

  • Remove recursion in task spec (:pr-distributed:8920) Florian Jetter_

  • Less verbose log messages for remove and register worker (:pr-distributed:8938) Florian Jetter_

  • Do not log full worker info in retire_workers (:pr-distributed:8935) Florian Jetter_

.. _v2024.11.2:

2024.11.2

.. note:: Versions 2024.11.0 and 2024.11.1 included a critical performance regression and should be skipped by every user.

Highlights ^^^^^^^^^^

Legacy Dask DataFrame Deprecated """"""""""""""""""""""""""""""""

This release deprecates the legacy Dask DataFrame implementation. The old implementation will be removed completely in a future release. Users are encouraged to switch to the new implementation now and to report any issues they are facing.

Users are also encouraged to check that they are only importing functions from dask.dataframe and not from any of its submodules.
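
As a rough illustration of that advice, the sketch below touches only the top-level dask.dataframe namespace. The toy pandas frame and the variable names are made up for this example.

.. code-block:: python

    import pandas as pd
    import dask.dataframe as dd  # import from the top-level namespace only

    # Hypothetical toy data, just to have something to wrap.
    pdf = pd.DataFrame({"x": range(6)})
    ddf = dd.from_pandas(pdf, npartitions=2)

    # Sum of 0..5, computed across both partitions.
    total = int(ddf["x"].sum().compute())
    print(total)

Code that reaches into submodules instead is the kind of import this note suggests auditing.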

New quantile methods for Dask Array API """""""""""""""""""""""""""""""""""""""

Dask Array added new quantile and nanquantile methods. Previously, Dask dispatched to the NumPy implementation, which held the GIL for long stretches. This caused large slowdowns on workers with more than one thread and could lead to runtimes of over 200s per chunk.

The new quantile implementation avoids many of these problems and reduces runtime to around 1s per chunk independently of the number of threads.
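
A minimal usage sketch, assuming the new functions are exposed as da.quantile and da.nanquantile and follow the NumPy call signature. The toy data is made up, and the comparison against NumPy is only a sanity check, not a benchmark.

.. code-block:: python

    import numpy as np
    import dask.array as da

    # Hypothetical toy data: 4 rows of 6 values, split into several chunks.
    x_np = np.arange(24, dtype=float).reshape(4, 6)
    x = da.from_array(x_np, chunks=(2, 3))

    # Median of every row; the reduction runs along the last axis.
    medians = da.quantile(x, 0.5, axis=-1).compute()
    expected = np.quantile(x_np, 0.5, axis=-1)
    print(medians)

da.nanquantile is assumed to be called the same way while ignoring NaN values.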

Consistent chunksize in Xarray rolling-construct """"""""""""""""""""""""""""""""""""""""""""""""

Using Xarray's rolling(...).construct(...) with Dask Arrays led to very large chunksizes that rarely fit into memory on a single worker.

The underlying operation is a view on the smaller NumPy array, but triggering a copy of the data leads to very large memory usage.

.. code-block::

import xarray as xr
import dask.array as da

arr = xr.DataArray(
    da.ones((93504, 721, 1440), chunks=("auto", -1, -1)),
    dims=["time", "lat", "longitude"],
)   # Initial chunks are ~128 MiB
arr.rolling(time=30).construct("window_dim")

.. grid:: 2

.. grid-item:: **Previously**

    Individual chunks are exploding to 10 GiB, likely causing out of memory errors.

    .. image:: images/changelog/rolling-construct-exploding-chunks.png
      :width: 100%
      :align: center
      :alt: Individual chunks are exploding to 10 GiB, likely causing out of memory errors.

.. grid-item:: **Now**

    Dask will now automatically split individual chunks into chunks that will have the
    same chunksize minus a small tolerance.

    .. image:: images/changelog/rolling-construct-constant-chunks.png
      :width: 100%
      :align: center
      :alt: Individual chunks are now roughly the same size

Improved efficiency of map overlap """"""""""""""""""""""""""""""""""

map_overlap now creates smaller and more efficient task graphs.

The previous version injected a lot of tasks that weren't necessary, increasing the number of tasks by a factor of 2-10 over what was actually necessary. This caused a lot of stress on the scheduler.
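
For readers unfamiliar with the operation, here is a small usage sketch of map_overlap on a one-dimensional array. It illustrates the API only and says nothing about graph size; the derivative helper and the sample values are made up for this example.

.. code-block:: python

    import numpy as np
    import dask.array as da

    x_np = np.array([1, 1, 2, 3, 3, 3, 2, 1, 1])
    x = da.from_array(x_np, chunks=5)

    def derivative(block):
        # Difference to the left neighbour; needs one element of overlap.
        return block - np.roll(block, 1)

    # depth=1 shares one element between neighbouring chunks,
    # boundary=0 pads the outer edges of the array with zeros.
    result = x.map_overlap(derivative, depth=1, boundary=0).compute()
    expected = np.diff(x_np, prepend=0)
    print(result)

The overlap is trimmed again after the function runs, so the result has the same length as the input.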

Consistent chunksizes for Einstein summation """"""""""""""""""""""""""""""""""""""""""""

Einstein summation historically led to very large chunksizes if applied to more than one Dask Array. This behavior is inherited from NumPy but led to out of memory errors on workers:

.. code-block::

import dask.array as da
arr = da.random.random((1024, 64, 64, 64, 64), chunks=(256, 16, 16, 16, 16)) # Initial chunks are 128 MiB
result = da.einsum("aijkl,amnop->ijklmnop", arr, arr)

.. grid:: 2

.. grid-item:: **Previously**

    Individual chunks are exploding to 32 GiB, very likely causing out of memory errors.

    .. image:: images/changelog/einstein-exploding-chunks.png
      :width: 100%
      :align: center
      :alt: Individual chunks are exploding to 32 GiB, very likely causing out of memory errors

.. grid-item:: **Now**

    The operation keeps individual chunksizes the same.

    .. image:: images/changelog/einstein-constant-chunks.png
      :width: 100%
      :align: center
      :alt: Individual chunks are now roughly the same size
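
The chunk sizes quoted above can be checked with plain arithmetic. This sketch assumes float64 data (8 bytes per element) and that the previous behaviour simply combined the input chunk extents along every output index.

.. code-block:: python

    from math import prod

    ITEMSIZE = 8  # bytes per float64 element (assumption)

    input_chunk = (256, 16, 16, 16, 16)

    # "aijkl,amnop->ijklmnop": the output keeps i, j, k, l from the first
    # operand and m, n, o, p from the second; only "a" is summed away.
    naive_output_chunk = input_chunk[1:] + input_chunk[1:]

    input_mib = prod(input_chunk) * ITEMSIZE / 2**20
    output_gib = prod(naive_output_chunk) * ITEMSIZE / 2**30
    print(input_mib, output_gib)

Under these assumptions an input chunk is 128 MiB while a combined output chunk would be 32 GiB, which matches the figures in this section.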

.. dropdown:: Additional changes

  • Add changelog for Dask release (:pr:11502) Patrick Hoefler_

  • Minor updates to optional dependencies table (:pr:11503) James Bourbeau_

  • Add push for ffill like operations (:pr:11501) Patrick Hoefler_

  • Remove func packing for TaskSpec (:pr:11496) Florian Jetter_

  • Make tokenization for vindex more efficient (:pr:11493) Patrick Hoefler_

  • Cut down runtime of einstein summation test (:pr:11499) Patrick Hoefler_

  • Improve test runtime for test_rot90 (:pr:11498) Florian Jetter_

  • Disable low level optimization for TaskSpec in Bags (:pr:11495) Florian Jetter_

  • Add automatic rechunking to sliding-window-view (:pr:11479) Patrick Hoefler_

  • Add load_stored kwarg to dask.array.store (:pr:11465) Deepak Cherian_

  • Fix quantile error in two dimensions (:pr:11489) Patrick Hoefler_

  • Bump conda-incubator/setup-miniconda from 3.0.4 to 3.1.0 (:pr:11490)

  • Update map_blocks docstring (:pr:11491) Patrick Hoefler_

  • Fix einsum with empty arrays (:pr:11488) Patrick Hoefler_

  • Implement non gil-blocking quantile method (:pr:11473) Patrick Hoefler_

  • Use internal keyword for trimming in map_overlap to reduce graph size (:pr:11486) Patrick Hoefler_

  • Minor dask order refactor (:pr:11467) Florian Jetter_

  • Remove empty tasks from map_overlap (:pr:11483) Patrick Hoefler_

  • Fixup auto chunks calculation if single chunk goes below 1 (:pr:11485) Patrick Hoefler_

  • Fix CI after pandas upstream changes (:pr:11482) Patrick Hoefler_

  • Make sure that block_id and block_info don't create extra tasks (:pr:11484) Patrick Hoefler_

  • Use repeat to build nearest boundary (:pr:9666) Jean-Baptiste Bayle_

  • Remove dead code from make_blockwise (:pr:11478) Florian Jetter_

  • Patch auto-chunks calculation for rioxarray (:pr:11480) Patrick Hoefler_

  • Skip legacy test because of flaky warning (:pr:11475) Patrick Hoefler_

  • Unskip a few dask-expr tests (:pr:11474) Patrick Hoefler_

  • Keep chunk sizes consistent in einsum (:pr:11464) Patrick Hoefler_

  • Improve how normalize_chunks squashes together chunks when "auto" is set (:pr:11468) Patrick Hoefler_

  • Fix resolve_aliases when multiple aliases are in graph (:pr:11469) Patrick Hoefler_

  • Avoid cyclic import in dask.array (:pr:11472) Hendrik Makait_

  • Unskip dataframe test (:pr:11471) Patrick Hoefler_

  • Improve dask.order performance for large graphs (:pr:11466) Florian Jetter_

  • Ensure that slice(None) just maps the keys (:pr:11450) Patrick Hoefler_

  • Fix Task.__repr__() of unpickled object (:pr:11463) Peter Andreas Entschev_

  • Use TaskSpec in local dask execution (:pr:11378) Florian Jetter_

  • Adjust accuracy in test_solve_triangular_vector (:pr:11461) Florian Jetter_

  • Update Aggregation docstring (:pr:11459) Guillaume Eynard-Bontemps_

  • Implement fuse option for delayed objects (:pr:11441) Patrick Hoefler_

  • Deprecate legacy dask dataframe implementation (:pr:11437) Patrick Hoefler_

  • Fix na casting behavior for groupby.agg with arrow dtypes (:pr:11118) Patrick Hoefler_

  • Fix behavior of keys_in_tasks for TaskSpec nodes (:pr:11445) Florian Jetter_

  • Convert dtype to int instead of np.uint8 for visualizing large task graphs (:pr:11440) Patrick Hoefler_

  • Ensure dependencies are not mutated (:pr:11438) Florian Jetter_

  • Full support for task spec in dask.order (:pr:11347) Florian Jetter_

  • Remove redundant methods in P2PBarrierTask (:pr-distributed:8924) Florian Jetter_

  • Fix skipif condition for test_tell_workers_when_peers_have_left (:pr-distributed:8929) Florian Jetter_

  • Ensure ConnectionPool is closed even if network stack swallows CancelledErrors (:pr-distributed:8928) Florian Jetter_

  • Fix flaky test_server_comms_mark_active_handlers (:pr-distributed:8927) Florian Jetter_

  • Make assumption in P2P's barrier mechanism explicit (:pr-distributed:8926) Hendrik Makait_

  • Adjust timeouts in Jupyter cli test (:pr-distributed:8925) Florian Jetter_

  • Add stimulus_id to update_graph plugin hook (:pr-distributed:8923) Hendrik Makait_

  • Reduce P2P transfer task overhead (:pr-distributed:8912) Hendrik Makait_

  • Disable profiler on Python 3.11 (:pr-distributed:8916) Florian Jetter_

  • Fix test_restarting_does_not_deadlock (:pr-distributed:8849) Florian Jetter_

  • Adjust popen timeouts for testing (:pr-distributed:8848) Florian Jetter_

  • Add retry to shuffle broadcast (:pr-distributed:8900) Florian Jetter_

  • Fix test_shuffle_with_array_conversion (:pr-distributed:8909) Florian Jetter_

  • Refactor some tests (:pr-distributed:8908) Florian Jetter_

  • Graduate dask-expr from contrib to core project (:pr-distributed:8911) Hendrik Makait_

  • Skip test_tell_workers_when_peers_have_left on py10 (:pr-distributed:8910) Florian Jetter_

  • Internal cleanup of P2P code (:pr-distributed:8907) Hendrik Makait_

  • Use Task class instead of tuple (:pr-distributed:8797) Florian Jetter_

  • Increase connect timeout for test_tell_workers_when_peers_have_left (:pr-distributed:8906) Florian Jetter_

  • Remove dispatching in TaskCollection (:pr-distributed:8903) Florian Jetter_

  • Deduplicate requests to scheduler in P2P (:pr-distributed:8899) Hendrik Makait_

  • Add configurations for rootish taskgroup threshold (:pr-distributed:8898) Patrick Hoefler_

.. _v2024.10.0:

2024.10.0

Notable Changes ^^^^^^^^^^^^^^^

  • Zarr-Python 3 compatibility (:pr:11388)
  • Avoid exponentially increasing taskgraph in overlap (:pr:11423)
  • Ensure numba tokenization does not use slow pickle path (:pr:11419)

.. dropdown:: Additional changes

  • Ensure broadcast_shapes() returns integers, not NumPy scalars. (:pr:11434) Martin Yeo_
  • (fix): sparse indexing (:pr:11430) Ilan Gold_
  • Ensure that recursively calling tokenize respects ensure_deterministic (:pr:11431) Florian Jetter_
  • Make P2P more configurable (:pr-distributed:8469) Hendrik Makait_
  • Fit Dashboard worker table to page width (:pr-distributed:8897) Jacob Tomlinson_
  • Raise helpful error when using the wrong plugin base classes (:pr-distributed:8893) Jacob Tomlinson_
  • Fix url escaping on exceptions dashboard for non-string keys (:pr-distributed:8891) Patrick Hoefler_
  • Add meaningful error for out of disk exception during write (:pr-distributed:8886) Hendrik Makait_
  • Fix binary operations with scalar on the left (:pr-expr:1150) Patrick Hoefler_
  • Raise exception when calculating divisions (:pr-expr:1149) Patrick Hoefler_
  • Fix merge_asof for single partition (:pr-expr:1145) Patrick Hoefler_
  • Improve handling of optional dependencies in analyze and explain (:pr-expr:1146) Hendrik Makait_
  • Fix alignment issue with groupby index accessors (:pr-expr:1142) Patrick Hoefler_
  • Fix displaying timestamp scalar (:pr-expr:1141) Patrick Hoefler_

.. _v2024.9.1:

2024.9.1

Highlights ^^^^^^^^^^

Improved adaptive scaling resilience """"""""""""""""""""""""""""""""""""

Adaptive scaling clusters now recover from spurious errors during scaling.

See :pr-distributed:8871 by Hendrik Makait_ for more details.

.. dropdown:: Additional changes

  • Improve error message for incorrect columns order in meta information (:pr:11393) Dmitry Balabka_

  • Update gpuCI RAPIDS_VER to 24.12 (:pr:11407)

  • Bump jacobtomlinson/gha-anaconda-package-version from 0.1.3 to 0.1.4 (:pr:11405)

  • Switch to using zarr.open_array instead of using the zarr.Array constructor (:pr:11387) Joe Hamman_

  • Update gpuCI RAPIDS_VER to 24.12 (:pr-distributed:8879)

  • Don't consider scheduler idle while executing Scheduler.update_graph (:pr-distributed:8877) Hendrik Makait_

  • Bump jacobtomlinson/gha-anaconda-package-version from 0.1.3 to 0.1.4 (:pr-distributed:8878)

  • Support P2P rechunking datetime arrays (:pr-distributed:8875) James Bourbeau_

.. _v2024.9.0:

2024.9.0

Highlights ^^^^^^^^^^

Bump Bokeh minimum version to 3.1.0 """""""""""""""""""""""""""""""""""

bokeh>=3.1.0 is now required for diagnostics and the distributed cluster dashboard.

See :pr:11375 and :pr-distributed:8861 by James Bourbeau_ for more details.

Introduce new Task class """"""""""""""""""""""""

Add a Task class to replace tuples for task specification.

See :pr:11248 by Florian Jetter_ for more details.

.. dropdown:: Additional changes

  • Bump peter-evans/create-pull-request from 6 to 7 (:pr:11380)

  • Reduce overhead in tokenize (:pr:11373) Florian Jetter_

  • Move tokenize to dedicated submodule (:pr:11371) Florian Jetter_

  • Ensure process_runnables is not too eager in the presence of multiple splits (:pr:11367) Florian Jetter_

  • Use np.min_scalar_type in shuffle (:pr:11369) James Bourbeau_

  • Write indexing arrays into dask graph to reduce size for multiple xarray variables (:pr:11362) Patrick Hoefler_

  • Cast indexer to minimal dtype in shuffle (:pr:11364) Patrick Hoefler_

  • Reduce memory usage of dask.order (:pr:11361) Florian Jetter_

  • Bump JamesIves/github-pages-deploy-action from 4.6.3 to 4.6.4 (:pr:11366)

  • precommit autoupdate (:pr:11360) Florian Jetter_

  • Homogeneously schedule P2P's unpack tasks (:pr-distributed:8873) Hendrik Makait_

  • Work/fix firewall for localhost (:pr-distributed:8868) Mario Linker_

  • Use new tokenize module (:pr-distributed:8858) James Bourbeau_

  • Point to user code with idempotent plugin warning (:pr-distributed:8856) James Bourbeau_

  • Fix test nanny timeout (:pr-distributed:8847) Florian Jetter_

  • Bump JamesIves/github-pages-deploy-action from 4.5.0 to 4.6.4 (:pr-distributed:8853)

  • Speed up Client.map by computing token only once for func and kwargs (:pr-distributed:8855) Florian Jetter_

  • Update pre-commit (:pr-distributed:8852) Florian Jetter_

.. _v2024.8.2:

2024.8.2

Highlights ^^^^^^^^^^

Automatic selection of rechunking method """"""""""""""""""""""""""""""""""""""""

To enable users to rechunk data at larger scales than before, Dask now automatically chooses an appropriate rechunking method when rechunking on a cluster. This requires no additional configuration and is enabled by default.

Specifically, Dask chooses between task-based and P2P rechunking. While task-based rechunking has been the previous default, P2P rechunking is beneficial when rechunking requires almost all-to-all communication between the old and new chunks, e.g., when changing between spatial and temporal chunking. In these cases, P2P rechunking offers constant memory usage and creates smaller task graphs. As a result, it works for cases where task-based rechunking would have previously failed.

To disable automatic selection, users can select their preferred method via the configuration

.. code-block::

import dask.config
# Choose either "tasks" or "p2p"
dask.config.set({"array.rechunk.method": "tasks"})

or when rechunking

.. code-block::

import dask.array as da
arr = da.random.random(size=(1000, 1000, 365), chunks=(-1, -1, "auto"))
# Choose either "tasks" or "p2p"
arr = arr.rechunk(("auto", "auto", -1), method="tasks")

See :pr:11337 by Hendrik Makait_ for more details.

New shuffle API for Dask Arrays """""""""""""""""""""""""""""""

Dask added a shuffle API to Dask Arrays. This API allows for shuffling the data along a single dimension. It ensures that every group of elements along this dimension ends up in exactly one chunk. This is a very useful operation for GroupBy-Map patterns in Xarray. See :py:func:~dask.array.Array.shuffle for more information and the API signature.

See :pr:11267, :pr:11311 and :pr:11326 by Patrick Hoefler_ for more details.
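
A minimal sketch of the shuffle API, assuming the indexer is passed as a list of integer lists, one list per group. The toy array and the group layout are made up for this example.

.. code-block:: python

    import numpy as np
    import dask.array as da

    x_np = np.arange(16).reshape(2, 8)
    x = da.from_array(x_np, chunks=(2, 4))

    # Each inner list is one group of positions along axis 1.
    groups = [[6, 5, 2], [4, 1], [3, 0, 7]]
    shuffled = x.shuffle(groups, axis=1)

    # The values come back in the order in which the groups list them.
    flat_order = [i for group in groups for i in group]
    result = shuffled.compute()
    print(shuffled.chunks)

How the groups are distributed over the output chunks is left to Dask; the sketch only relies on the order of the values.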

New reshape_blockwise API for Dask Arrays """""""""""""""""""""""""""""""""""""""""

The new :py:func:~dask.array.reshape_blockwise enables an embarrassingly parallel reshaping operation that doesn't trigger a rechunking operation under the hood. This is useful when you don't care about the order of the resulting Array, e.g. if a reduction is applied to the array or if the reshaping is only temporary.

.. code-block::

import dask.array as da
from dask.array import reshape_blockwise

arr = da.random.random(size=(100, 100, 48_000), chunks=(1000, 100, 83))
result = reshape_blockwise(arr, (10_000, 48_000))
result.sum()

# or: do something that preserves the shape of each chunk

result = reshape_blockwise(result, (100, 100, 48_000), chunks=arr.chunks)

Dask will automatically calculate the resulting chunks if the number of dimensions is reduced, but you have to specify the resulting chunks if the number of dimensions is increased.

Reshaping a Dask Array often creates very complicated computations with rechunk operations in between because Dask respects the C ordering of the Array by default. This ensures that the resulting Dask Array is returned in the same order as the corresponding NumPy Array. However, this can lead to very inefficient computations. reshape_blockwise is a lot more efficient than the default implementation if you don't care about the order.

.. warning::

Blockwise reshape operations are more efficient than the default, but they will
return an Array that is ordered differently. Use with care!

See :pr:11328 by Patrick Hoefler_ for more details.

Multidimensional positional indexing keeping chunksizes consistent """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Indexing a Dask Array with :py:func:~dask.array.vindex previously created a single output chunk along the dimensions that were indexed. vindex is commonly used in Xarray when indexing multiple dimensions in a single step, e.g.:

.. code-block::

import dask.array as da
import xarray as xr

arr = xr.DataArray(
    da.random.random((100, 100, 100), chunks=(5, 5, 50)),
    dims=["a", "b", "c"],
)

Previously, this put the indexed dimensions into a single chunk:

.. image:: images/changelog/vindex-memory-increase.png
  :width: 75%
  :align: center
  :alt: Size of each individual chunk increases to over 1GB

Dask now uses an improved algorithm that ensures that the chunksizes are kept consistent:

.. image:: images/changelog/vindex-memory-constant.png
  :width: 75%
  :align: center
  :alt: Size of each individual chunk stays roughly constant

See :pr:11330 by Patrick Hoefler_ for more details.
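
As a hedged illustration of pointwise indexing on a plain Dask Array (outside of Xarray), the sketch below picks three (row, column) pairs in one step. The toy array and the index lists are made up for this example.

.. code-block:: python

    import numpy as np
    import dask.array as da

    x_np = np.arange(4 * 5 * 6).reshape(4, 5, 6)
    x = da.from_array(x_np, chunks=(2, 2, 3))

    # Pointwise selection: picks (0, 4), (1, 2) and (3, 0) along the first
    # two dimensions and keeps the third dimension intact.
    rows = [0, 1, 3]
    cols = [4, 2, 0]
    points = x.vindex[rows, cols, :]

    result = points.compute()
    print(result.shape)

The selected points form the leading dimension of the result, followed by the dimension that was sliced.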

.. dropdown:: Additional changes

  • Add changelog entries for shuffle, vindex and blockwise_reshape (:pr:11350) Patrick Hoefler_

  • Ensure persisted collections are released without GC (:pr:11348) Florian Jetter_

  • Update zoom link for dask meeting (:pr:11357) Sarah Charlotte Johnson_

  • Add more docstring examples for normalize_chunks (:pr:11271) Illviljan_

  • Choose automatically between tasks-based and p2p rechunking (:pr:11337) Hendrik Makait_

  • Implement blockwise reshaping API for arrays (:pr:11328) Patrick Hoefler_

  • Make rechunking in shuffle more intelligent to distribute unevenly if necessary (:pr:11326) Patrick Hoefler_

  • Increase visibility of GPU CI updates (:pr:11345) Charles Blackmon-Luca_

  • Update numpy and pyarrow versions in install docs (:pr:11340) James Bourbeau_

  • Fixup dask and distributed dependencies (:pr:11338) Patrick Hoefler_

  • Bump numpy>=1.24 and pyarrow>=14.0.1 minimum versions (:pr:11331) James Bourbeau_

  • Add crick back to Python 3.11+ CI builds (:pr:11335) James Bourbeau_

  • Preserve chunksizes in vindex (:pr:11330) Patrick Hoefler_

  • Fix dask.array.fft mismatch with Numpy's interface (add support for norm argument) (:pr:10665) joanrue_

  • Pass additional parameters to rechunk_p2p (:pr:11319) Hendrik Makait_

  • Fix docstring formatting for map_overlap (:pr:11332) Tao Xin_

  • Fix NumPy overflowing for prod on 2.0 (:pr:11327) Patrick Hoefler_

  • Ensure axes are positive / add tests for negative axes (:pr:10812) joanrue_

  • Fix map_overlap with new_axis (:pr:11128) David Stansby_

  • Avoid capturing code of xdist (:pr-distributed:8846) Florian Jetter_

  • Reduce memory footprint of culling P2P rechunking (:pr-distributed:8845) Hendrik Makait_

  • Add tests for choosing default rechunking method (:pr-distributed:8843) Hendrik Makait_

  • Increase visibility of GPU CI updates (:pr-distributed:8841) Charles Blackmon-Luca_

  • Bump test_pause_while_idle timeout (:pr-distributed:8844) Florian Jetter_

  • Concatenate small input chunks before P2P rechunking (:pr-distributed:8832) Hendrik Makait_

  • Remove dump cluster from gen_cluster (:pr-distributed:8823) Florian Jetter_

  • Bump numpy>=1.24 and pyarrow>=14.0.1 minimum versions (:pr-distributed:8837) James Bourbeau_

  • Fix PipInstall plugin on Worker (:pr-distributed:8839) Hendrik Makait_

  • Remove more Python 3.10 compatibility code (:pr-distributed:8824) James Bourbeau_

  • Use task-based rechunking to prechunk along partial boundaries (:pr-distributed:8831) Hendrik Makait_

  • Ensure client_desires_keys does not corrupt Scheduler state (:pr-distributed:8827) Florian Jetter_

  • Bump minimum cloudpickle to 3 (:pr-distributed:8836) James Bourbeau_

.. _v2024.8.1:

2024.8.1

Highlights ^^^^^^^^^^

Improve output chunksizes for reshaping Dask Arrays """""""""""""""""""""""""""""""""""""""""""""""""""

Reshaping a Dask Array often squashed the reshaped dimensions into a single chunk. This caused very large output chunks and, subsequently, many out-of-memory errors and performance issues.

.. code-block:: python

    import dask.array as da

    arr = da.ones(shape=(1000, 100, 48_000), chunks=(1000, 100, 83))
    arr.reshape(1000, 100, 4, 12_000)

Previously, this put the last dimension into a single chunk of size 12_000.

.. image:: images/changelog/reshape-memory-increase.png :width: 75% :align: center :alt: Size of each individual chunk increases to over 1GB

The new algorithm keeps the chunk size of the input and the output the same. This avoids both large increases in chunk size and fragmentation of chunks.

.. image:: images/changelog/reshape-constant-memory.png :width: 75% :align: center :alt: Size of each individual chunk stays the same
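The effect can be checked on a scaled-down version of the array above. This is only a sketch: the smaller shape is an assumption chosen so that it runs quickly, and the printed chunk layout depends on the installed Dask version.

```python
import dask.array as da

# Scaled-down variant of the example above (the shape is an assumption).
arr = da.ones(shape=(10, 10, 480), chunks=(10, 10, 83))
reshaped = arr.reshape(10, 10, 4, 120)

# Compare the chunk layout before and after the reshape.
print("input chunks: ", arr.chunks)
print("output chunks:", reshaped.chunks)

# The number of elements is unchanged by the reshape.
total = reshaped.sum().compute()
```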

Improve scheduling efficiency for Xarray Rechunk-GroupBy-Reduce patterns """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

The scheduler previously created an inefficient execution graph for Xarray GroupBy-Reduction patterns that use the cohorts strategy:

.. code-block:: python

    import xarray as xr
    from xarray.groupers import TimeResampler

    arr = xr.open_zarr(...)
    arr.chunk(time=TimeResampler("ME")).groupby("time.month").mean()

An issue in the algorithm that creates the execution order of the task graph led to an inefficient execution strategy that accumulated a lot of unnecessary memory on the cluster. The improvement is very similar to :ref:the previous ordering improvement in 2024.8.0 <label.xarray_groupby_ordering>.

Drop support for Python 3.9 """""""""""""""""""""""""""

This release drops support for Python 3.9 in accordance with NEP 29. Python 3.10 is now the required minimum version to run Dask.

See :pr:11245 and :pr-distributed:8793 by Patrick Hoefler_ for more details.

.. dropdown:: Additional changes

  • Ensure pickle does not change tokens (:pr:11320) Florian Jetter_

  • Add changelog entry for reshape and ordering improvements (:pr:11324) Patrick Hoefler_

  • Rename chunksize-tolerance option (:pr:11317) Patrick Hoefler_

  • Upgrade gpuCI and fix Dask Array failures with "cupy" backend (:pr:11309) Richard (Rick) Zamora_

  • Implement automatic rechunking for shuffle (:pr:11311) Patrick Hoefler_

  • Ensure we test against numpy 2 in CI (:pr:11182) James Bourbeau_

  • Revert "Test ordering on distributed scheduler (:pr:11310)" (:pr:11321) Florian Jetter_

  • Test ordering on distributed scheduler (:pr:11310) Florian Jetter_

  • Add tests to cover more cases of new reshape implementation (:pr:11313) Patrick Hoefler_

  • Order: Choose better target for branches with multiple leaf nodes (:pr:11303) Patrick Hoefler_

  • Order: Ensure runnable tasks are certainly runnable (:pr:11305) Florian Jetter_

  • Fix upstream numpy build (:pr:11304) Patrick Hoefler_

  • Make shuffle a no-op if possible (:pr:11291) Patrick Hoefler_

  • Keep chunksize consistent in reshape (:pr:11273) Patrick Hoefler_

  • Enable slicing with only one unknown chunk (:pr:11301) Patrick Hoefler_

  • Link to dask vs spark benchmarks on Dask docs (:pr:11289) Sarah Charlotte Johnson_

  • Fix slicing for masked arrays (:pr:11300) Patrick Hoefler_

  • Array: fix asarray for array input with dtype (:pr:11288) Lucas Colley_

  • Add numpy constants to array api (:pr:11287) Lucas Colley_

  • Ignore typing of return value (:pr:11286) Patrick Hoefler_

  • Remove automatic resizing in reshape (:pr:11269) Patrick Hoefler_

  • API: expose np dtypes in dask.array namespace (:pr:11178) Lucas Colley_

  • Reduce frequency of unmanaged memory use warning (:pr-distributed:8834) Patrick Hoefler_

  • Update gpuCI RAPIDS_VER to 24.10 (:pr-distributed:8786)

  • Avoid RuntimeError: dictionary changed size during iteration in Server._shift_counters() (:pr-distributed:8828) Hendrik Makait_

  • Improve concurrent close for scheduler (:pr-distributed:8829) Hendrik Makait_

  • MINOR: Extract truncation logic out of partial concatenation in P2P rechunking (:pr-distributed:8826) Hendrik Makait_

  • avoid excessive attribute access overhead for remove_from_task_prefix_count (:pr-distributed:8821) Florian Jetter_

  • Avoid key validation if validation is disabled (:pr-distributed:8822) Florian Jetter_

  • Log worker_client event (:pr-distributed:8819) James Bourbeau_

.. _v2024.8.0:

2024.8.0

Highlights ^^^^^^^^^^

Improve efficiency and performance of slicing with positional indexers """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Slicing a Dask Array with a positional indexer is now faster. Random access patterns are now more stable and produce easier-to-use results.

.. code-block:: python

x[slice(None), [1, 1, 3, 6, 3, 4, 5]]

Using a positional indexer was previously prone to drastically increasing the number of output chunks and generating a very large task graph. This has been fixed with a more efficient algorithm.

The new algorithm will keep the chunk-sizes along the axis that is indexed the same to avoid fragmentation of chunks or a large increase in chunk-size.

See :pr:11262 and :pr:11267 by Patrick Hoefler_ for more details and performance benchmarks.
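A small, self-contained sketch of the indexing pattern described above. The array ``x`` and its chunking are assumptions made for illustration; the resulting chunk layout depends on the installed Dask version.

```python
import numpy as np
import dask.array as da

# Hypothetical input: 20 rows, 8 columns, chunked into 5 x 2 blocks.
source = np.arange(20 * 8).reshape(20, 8)
x = da.from_array(source, chunks=(5, 2))

# Positional indexer with repeated and unsorted positions along axis 1.
indexer = [1, 1, 3, 6, 3, 4, 5]
result = x[slice(None), indexer]

# Compare the chunking along the indexed axis before and after.
print("input chunks: ", x.chunks)
print("output chunks:", result.chunks)
```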

.. _label.xarray_groupby_ordering:

Improve scheduling efficiency for Xarray GroupBy-Reduce patterns """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

The scheduler previously created an inefficient execution graph for Xarray GroupBy-Reduction patterns like:

.. code-block:: python

import xarray as xr

arr = xr.open_zarr(...)
arr.groupby("time.month").mean()

An issue in the algorithm that creates the execution order of the task graph led to an inefficient execution strategy that accumulated a lot of unnecessary memory on the cluster.

.. image:: images/changelog/dask-order-growing-memory.png :width: 75% :align: center :alt: Memory keeps accumulating on the cluster when running an embarrassingly parallel operation.

The operation itself is embarrassingly parallel. Using the proper execution strategy, the scheduler can now execute the operation with constant memory, avoiding spilling and allowing us to scale to larger datasets.

.. image:: images/changelog/dask-order-constant-memory.png :width: 75% :align: center :alt: Same operation is running with constant memory usage for the whole computation and can scale for bigger datasets.

See :pr-distributed:8818 by Patrick Hoefler_ for more details and examples.

.. dropdown:: Additional changes

  • Add changelog for dask order patch (:pr:11278) Patrick Hoefler_

  • Add regression test for xarray map reduce (:pr:11277) Florian Jetter_

  • Add changelog entry for take (:pr:11274) Patrick Hoefler_

  • Revert "order: remove data task graph normalization" (:pr:11276) Patrick Hoefler_

  • Use the shuffle algorithm for take (:pr:11267) Patrick Hoefler_

  • Implement task-based array shuffle (:pr:11262) Patrick Hoefler_

  • Remove data task graph normalization (:pr:11263) Florian Jetter_

  • Update zoom link for monthly meeting (:pr:11265) Sarah Charlotte Johnson_

  • Update data loading section of best practices (:pr:11247) Patrick Hoefler_

  • Match default chunksize in docstring to actual default set in code (:pr:11254) Bernhard Raml_

  • Fixup casting error in pandas 3 (:pr:11250) Patrick Hoefler_

  • Skip new warning from pandas (:pr:11249) Patrick Hoefler_

  • Fix pandas nightly bugs (:pr:11244) Patrick Hoefler_

  • Run graph normalisation after dask order (:pr-distributed:8818) Patrick Hoefler_

  • Update large graph size warning to remove scatter recommendation (:pr-distributed:8815) Patrick Hoefler_

  • Fail tasks exceeding no-workers-timeout (:pr-distributed:8806) Hendrik Makait_

  • Fix exception handling for NannyPlugin.setup and NannyPlugin.teardown (:pr-distributed:8811) Hendrik Makait_

  • Fix exception handling for WorkerPlugin.setup and WorkerPlugin.teardown (:pr-distributed:8810) Hendrik Makait_

  • typo fix (:pr-distributed:8812) alex-rakowski_

  • Fix if / else for send_recv_from_rpc (:pr-distributed:8809) Patrick Hoefler_

  • Ensure that adaptive only stops once (:pr-distributed:8807) Hendrik Makait_

  • Reduce noise from GC-related logging (:pr-distributed:8804) Hendrik Makait_

  • Remove unused delete_interval and synchronize_worker_interval from Scheduler (:pr-distributed:8801) Hendrik Makait_

  • Change log level for Compute Failed log message (:pr-distributed:8802) Patrick Hoefler_

  • Add Prometheus metric for time spent on GC (:pr-distributed:8803) Hendrik Makait_

  • Add Prometheus metrics for dask_worker_{added|removed}_total (:pr-distributed:8798) Hendrik Makait_

  • Add log event for worker-ttl-timed-out (:pr-distributed:8800) Hendrik Makait_

  • Add Prometheus metrics for dask_client_connections_{added|removed}_total (:pr-distributed:8799) Hendrik Makait_

  • Fix PackageInstall plugin (:pr-distributed:8794) Hendrik Makait_

  • Make stealing more robust (:pr-distributed:8788) Hendrik Makait_

  • Leave a warning about future instantiation (:pr-distributed:8782) Florian Jetter_

.. _v2024.7.1:

2024.7.1

Highlights ^^^^^^^^^^

More resilient distributed lock """""""""""""""""""""""""""""""

:py:class:distributed.Lock is now resilient to worker failures. Previously deadlocks were possible in cases where a lock-holding worker was lost and/or failed to release the lock due to an error.

See :pr-distributed:8770 by Florian Jetter_ for more details.
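For context, a minimal sketch of basic lock usage from the client side. The lock name ``"shared-resource"`` is hypothetical, and the in-process cluster is only an assumption to keep the example self-contained; it does not exercise the worker-failure scenario itself.

```python
from distributed import Client, Lock

# In-process cluster without a dashboard, used only for demonstration.
with Client(processes=False, dashboard_address=None) as client:
    # Locks are identified by name; this name is hypothetical.
    lock = Lock("shared-resource")

    # Acquire with a timeout so a stuck lock cannot block forever.
    acquired = lock.acquire(timeout=5)
    try:
        if acquired:
            print("lock held, doing exclusive work")
    finally:
        if acquired:
            lock.release()
```

Passing a timeout to ``acquire`` remains a sensible safeguard on the calling side even with the more resilient implementation.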

.. dropdown:: Additional changes

  • Remove and warn of persist usage (:pr:11237) Patrick Hoefler_

  • Preserve timestamp unit during meta creation (:pr:11233) Patrick Hoefler_

  • Ensure that dask-expr DataFrames are optimized when put into delayed (:pr:11231) Patrick Hoefler_

  • Fixes for d freq deprecation in pandas=3 (:pr:11228) James Bourbeau_

  • bump approx threshold for test_quantile (:pr:10720) Florian Jetter_

  • Bump xarray-contrib/issue-from-pytest-log from 1.2.8 to 1.3.0 (:pr:11221)

  • Bump JamesIves/github-pages-deploy-action from 4.6.1 to 4.6.3 (:pr:11222)

  • Ensure Lock always register with scheduler (:pr-distributed:8781) Florian Jetter_

  • Temporarily pin setuptools < 71 (:pr-distributed:8785) James Bourbeau_

  • Restore len() on TaskPrefix (:pr-distributed:8783) Hendrik Makait_

  • Avoid false positives for p2p-failed log event (:pr-distributed:8777) Hendrik Makait_

  • Expose paused and retired workers separately in prometheus (:pr-distributed:8613) Patrick Hoefler_

  • Creating transitions-failures log event (:pr-distributed:8776) alex-rakowski_

  • Implement HLG layer for P2P rechunking (:pr-distributed:8751) Hendrik Makait_

  • Add another test for a possible deadlock scenario caused by (:pr-distributed:8703) (:pr-distributed:8769) Hendrik Makait_

  • Raise an error if compute on persisted collection with released futures (:pr-distributed:8764) Florian Jetter_

  • Re-raise P2PConsistencyError from failed P2P tasks (:pr-distributed:8748) Hendrik Makait_

  • Robuster faster tests memory sampler (:pr-distributed:8758) Florian Jetter_

  • Fix scheduler_bokeh::test_shuffling (:pr-distributed:8766) Florian Jetter_

  • Increase timeouts for pubsub::test_client_worker (:pr-distributed:8765) Florian Jetter_

  • Factor out async taskgroup (:pr-distributed:8756) Florian Jetter_

  • Don't sort keys lexicographically in worker table (:pr-distributed:8753) Florian Jetter_

  • Use functools.cache instead of functools.lru_cache for extremely often called functions (:pr-distributed:8762) Jonas Dedden_

  • Robuster deeply nested structures (:pr-distributed:8730) Florian Jetter_

  • Adding HLG to MAP (:pr-distributed:8740) alex-rakowski_

  • Add close worker button to worker info page (:pr-distributed:8742) James Bourbeau_

.. _v2024.7.0:

2024.7.0

Highlights ^^^^^^^^^^

Drop support for pandas 1.x """""""""""""""""""""""""""

This release drops support for pandas<2. pandas 2.0 is now the required minimum version to run Dask DataFrame.

The minimum version of partd was also raised to 1.4.0. Versions before 1.4 are not compatible with pandas 2.

See :pr:11199 by Patrick Hoefler_ for more details.

Publish-subscribe APIs deprecated """""""""""""""""""""""""""""""""

:py:class:distributed.Pub and :py:class:distributed.Sub have been deprecated and will be removed in a future release. Please switch to :py:func:distributed.Client.log_event and :py:func:distributed.Worker.log_event instead.

See :pr-distributed:8724 by Hendrik Makait_ for more details.
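A minimal sketch of the event-logging replacement, shown from the client side. The topic name and message payload are hypothetical, and the in-process cluster is an assumption to keep the example self-contained.

```python
from distributed import Client

# In-process cluster without a dashboard, used only for demonstration.
with Client(processes=False, dashboard_address=None) as client:
    # Publish a structured event under a topic (both are hypothetical).
    client.log_event("demo-topic", {"status": "started"})

    # Retrieve the events recorded for that topic so far.
    events = client.get_events("demo-topic")

# Each entry pairs a timestamp with the logged message.
for timestamp, message in events:
    print(timestamp, message)
```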

.. dropdown:: Additional changes

  • Only count data that is in memory for xarray sizeof (:pr:11206) Florian Jetter_

  • Fix botocore re-raising error (:pr:11209) Patrick Hoefler_

  • Update Coiled links in documentation (:pr:11211) Sarah Charlotte Johnson_

  • Add some array-expr methods (:pr:11210) Patrick Hoefler_

  • Fix quantile for arrow dtypes (:pr:11202) Patrick Hoefler_

  • Add utility to verify optional dependencies (:pr:11205) Patrick Hoefler_

  • Implement array expression switch (:pr:11203) Patrick Hoefler_

  • Remove no longer supported ipython reference (:pr:11196) Patrick Hoefler_

  • Remove from_delayed references (:pr:11195) Patrick Hoefler_

  • Add other IO connectors to docs (:pr:11189) Patrick Hoefler_

  • Fix assert_eq import from cudf (:pr-distributed:8747) James Bourbeau_

  • Log traceback upon task error (:pr-distributed:8746) Hendrik Makait_

  • Update system monitor when polling Prometheus metrics (:pr-distributed:8745) Hendrik Makait_

  • Bump pandas to 2.0 in mindeps build (:pr-distributed:8743) James Bourbeau_

  • Refactor event logging functionality into broker (:pr-distributed:8731) Hendrik Makait_

  • Drop support for pandas 1.X (:pr-distributed:8741) Hendrik Makait_

  • Remove is_python_shutting_down (:pr-distributed:8492) Hendrik Makait_

  • Fix test_task_state_instance_are_garbage_collected (:pr-distributed:8735) Hendrik Makait_

  • Fix floating-point inaccuracy (:pr-distributed:8736) Hendrik Makait_

  • Fix pynvml handles (:pr-distributed:8693) Benjamin Zaitlen_

  • get_ip: handle getting 0.0.0.0 (:pr-distributed:8712) Adam Williamson_

  • Remove FutureWarning in test_task_state_instance_are_garbage_collected (:pr-distributed:8734) Hendrik Makait_

  • Fix mindeps-testing on CI (:pr-distributed:8728) Hendrik Makait_

  • Extract tests related to event-logging into separate file (:pr-distributed:8733) Hendrik Makait_

  • Use safer context for ProcessPoolExecutor (:pr-distributed:8715) Elliott Sales de Andrade_

  • Cache URL encoding of worker addresses in dashboard (:pr-distributed:8725) Florian Jetter_

  • More robust bokeh test_shuffling (:pr-distributed:8727) Florian Jetter_

  • Fix type in actor docs (:pr-distributed:8711) Sultan Orazbayev_

  • More useful warning if a plugin type is provided instead of instance (:pr-distributed:8689) Florian Jetter_

  • Improve error on cancelled tasks due to disconnect (:pr-distributed:8705) Hendrik Makait_

  • Fix wait condition on test_forget_errors (:pr-distributed:8714) Elliott Sales de Andrade_

  • Skip test_deadlock_dependency_of_queued_released (:pr-distributed:8723) Hendrik Makait_

  • Fix test_quiet_client_close (:pr-distributed:8722) Hendrik Makait_

  • Fix cleanup iteration in save_sys_modules (:pr-distributed:8713) Elliott Sales de Andrade_

  • Add quotes to missing bokeh installation commands (:pr-distributed:8717) James Bourbeau_

.. _v2024.6.2:

2024.6.2

This is a patch release to fix an issue with dask and distributed version pinning in the 2024.6.1 release.

.. dropdown:: Additional changes

  • Get docs build passing (:pr:11184) James Bourbeau_
  • profile._f_lineno: handle next_line being None in Python 3.13 (:pr-distributed:8710) Adam Williamson_

.. _v2024.6.1:

2024.6.1

Highlights ^^^^^^^^^^

This release includes a critical fix for a deadlock that can arise when dependencies of root-ish tasks are rescheduled, e.g. due to a worker being lost.

See :pr-distributed:8703 by Hendrik Makait_ for more details.

.. dropdown:: Additional changes

  • Cache global query-planning config (:pr:11183) Richard (Rick) Zamora_
  • Python 3.13 fixes (:pr:11185) Adam Williamson_
  • Fix test_map_freq_to_period_start for pandas=3 (:pr:11181) James Bourbeau_
  • Bump release-drafter/release-drafter from 5 to 6 (:pr-distributed:8699)

.. _v2024.6.0:

2024.6.0

Highlights ^^^^^^^^^^

memmap array tokenization """"""""""""""""""""""""" Tokenizing memmap arrays will now avoid materializing the array into memory.

See :pr:11161 by Florian Jetter_ for more details.
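A small sketch of tokenizing a memory-mapped array. The temporary file, its name and its size are assumptions made for illustration; the example only shows that tokenization is deterministic and does not inspect how the token is derived.

```python
import os
import tempfile

import numpy as np
from dask.base import tokenize

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical on-disk array backing the memmap.
    path = os.path.join(tmpdir, "data.bin")
    mm = np.memmap(path, dtype="float64", mode="w+", shape=(1000,))
    mm[:] = 1.0
    mm.flush()

    # Tokenizing the same memmap twice yields the same token.
    token_a = tokenize(mm)
    token_b = tokenize(mm)

    # Release the file handle before the directory is removed.
    del mm

print(token_a)
```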

.. dropdown:: Additional changes

  • Fix test_dt_accessor with query planning disabled (:pr:11177) James Bourbeau_

  • Use packaging.version.Version (:pr:11171) James Bourbeau_

  • Remove deprecated dask.compatibility module (:pr:11172) James Bourbeau_

  • Ensure compatibility for xarray.NamedArray (:pr:11168) Hendrik Makait_

  • Estimate sizes of xarray collections (:pr:11166) Florian Jetter_

  • Add section about futures and variables (:pr:11164) Florian Jetter_

  • Update docs for combined Dask community meeting info (:pr:11159) Sarah Charlotte Johnson_

  • Avoid rounding error in test_prometheus_collect_count_total_by_cost_multipliers (:pr-distributed:8687) Hendrik Makait_

  • Log key collision count in update_graph log event (:pr-distributed:8692) Hendrik Makait_

  • Automate GitHub Releases when new tags are pushed (:pr-distributed:8626) Jacob Tomlinson_

  • Fix log event with multiple topics (:pr-distributed:8691) Hendrik Makait_

  • Rename safe to expected in Scheduler.remove_worker (:pr-distributed:8686) Hendrik Makait_

  • Log event during failure (:pr-distributed:8663) Hendrik Makait_

  • Eagerly update aggregate statistics for TaskPrefix instead of calculating them on-demand (:pr-distributed:8681) Hendrik Makait_

  • Improve graph submission time for P2P rechunking by avoiding unpack recursion into indices (:pr-distributed:8672) Florian Jetter_

  • Add safe keyword to remove-worker event (:pr-distributed:8647) alex-rakowski_

  • Improved errors and reduced logging for P2P RPC calls (:pr-distributed:8666) Hendrik Makait_

  • Adjust P2P tests for dask-expr (:pr-distributed:8662) Hendrik Makait_

  • Iterate over copy of Server.digests_total_since_heartbeat to avoid RuntimeError (:pr-distributed:8670) Hendrik Makait_

  • Log task state in Compute Failed (:pr-distributed:8668) Hendrik Makait_

  • Add Prometheus gauge for task groups (:pr-distributed:8661) Hendrik Makait_

  • Fix too strict assertion in shuffle code for pandas subclasses (:pr-distributed:8667) Joris Van den Bossche_

  • Reduce noise from erring tasks that are not supposed to be running (:pr-distributed:8664) Hendrik Makait_

.. _v2024.5.2:

2024.5.2

This release primarily contains minor bug fixes.

.. dropdown:: Additional changes

  • Fix nightly Zarr installation in CI (:pr:11151) James Bourbeau_

  • Add python 3.11 build to GPU CI (:pr:11135) Charles Blackmon-Luca_

  • Update gpuCI RAPIDS_VER to 24.08 (:pr:11141)

  • Update test_groupby_grouper_dispatch (:pr:11144) Richard (Rick) Zamora_

  • Bump JamesIves/github-pages-deploy-action from 4.6.0 to 4.6.1 (:pr:11136)

  • Unskip test_array_function_sparse with new sparse release (:pr:11139) James Bourbeau_

  • Fix test_parse_dates_multi_column on pandas=3 (:pr:11132) James Bourbeau_

  • Don't draft release notes for tagged commits (:pr:11138) Jacob Tomlinson_

  • Reduce task group count for partial P2P rechunks (:pr-distributed:8655) Hendrik Makait_

  • Update gpuCI RAPIDS_VER to 24.08 (:pr-distributed:8652)

  • Submit collections metadata to scheduler (:pr-distributed:8612) Florian Jetter_

  • Fix indent in code example in task-launch.rst (:pr-distributed:8650) Ray Bell_

  • Avoid multiple WorkerState sphinx error (:pr-distributed:8643) James Bourbeau_

.. _v2024.5.1:

2024.5.1

Highlights ^^^^^^^^^^

NumPy 2.0 support """"""""""""""""" This release contains compatibility updates for the upcoming NumPy 2.0 release.

See :pr:11096 by Benjamin Zaitlen_ and :pr:11106 by James Bourbeau_ for more details.

Increased Zarr store support """""""""""""""""""""""""""" This release adds support for MutableMapping-backed Zarr stores like :py:class:zarr.storage.DirectoryStore, etc.

See :pr:10422 by Greg M. Fleishman_ for more details.

.. dropdown:: Additional changes

  • Minor updates to ML page (:pr:11129) James Bourbeau_

  • Skip failing sparse test on 0.15.2 (:pr:11131) James Bourbeau_

  • Make sure nightly pyarrow is installed in upstream CI build (:pr:11121) James Bourbeau_

  • Add initial draft of ML overview document (:pr:11114) Matthew Rocklin_

  • Test query-planning in gpuCI (:pr:11060) Richard (Rick) Zamora_

  • Avoid pytest error when skipping NumPy 2.0 tests (:pr:11110) James Bourbeau_

  • Use nightly h5py in upstream CI build (:pr:11108) James Bourbeau_

  • Use nightly scikit-image in upstream CI build (:pr:11107) James Bourbeau_

  • Bump actions/checkout from 4.1.4 to 4.1.5 (:pr:11105)

  • Enable parquet append tests after fix (:pr:11104) Patrick Hoefler_

  • Skip fastparquet tests for numpy 2 (:pr:11103) Patrick Hoefler_

  • Fix misspelling found by codespell (:pr:11097) Dimitri Papadopoulos Orfanos_

  • Fix doc build (:pr:11099) Patrick Hoefler_

  • Clean up percentiles_summary logic (:pr:11094) Richard (Rick) Zamora_

  • Apply ruff/flake8-implicit-str-concat rule ISC001 (:pr:11098) Dimitri Papadopoulos Orfanos_

  • Fix clocks on Windows with Python 3.13 (:pr-distributed:8642) Victor Stinner_

  • Fix "Print host info" CI step on Mac OS (arm64) (:pr-distributed:8638) Hendrik Makait_

.. _v2024.5.0:

2024.5.0

Highlights ^^^^^^^^^^

This release primarily contains minor bugfixes.

.. dropdown:: Additional changes

  • Don't link to click intersphinx dev version (:pr:11091) M Bussonnier_

  • Fix API doc links for some dask-expr expressions (:pr:11092) Patrick Hoefler_

  • Add dask-expr to upstream build (:pr:11086) Patrick Hoefler_

  • Add melt support when query-planning is enabled (:pr:11088) Richard (Rick) Zamora_

  • Skip dataframe/product when in numpy 2 envs (:pr:11089) Benjamin Zaitlen_

  • Add plots to illustrate what the optimizer does (:pr:11072) Patrick Hoefler_

  • Fixup pandas upstream tests (:pr:11085) Patrick Hoefler_

  • Bump conda-incubator/setup-miniconda from 3.0.3 to 3.0.4 (:pr:11084)

  • Bump actions/checkout from 4.1.3 to 4.1.4 (:pr:11083)

  • Fix CI after pytest changes (:pr:11082) Patrick Hoefler_

  • Fixup tests for more efficient dask-expr implementation (:pr:11071) Patrick Hoefler_

  • Generalize clear_known_categories utility (:pr:11059) Richard (Rick) Zamora_

  • Bump JamesIves/github-pages-deploy-action from 4.5.0 to 4.6.0 (:pr:11062)

  • Bump release-drafter/release-drafter from 5 to 6 (:pr:11063)

  • Bump actions/checkout from 4.1.2 to 4.1.3 (:pr:11061)

  • Update GPU CI RAPIDS_VER to 24.06, disable query planning (:pr:11045) Charles Blackmon-Luca_

  • Move tests (:pr-distributed:8631) Hendrik Makait_

  • Bump actions/checkout from 4.1.2 to 4.1.3 (:pr-distributed:8628)

.. _v2024.4.2:

2024.4.2

Highlights ^^^^^^^^^^

Trivial Merge Implementation """"""""""""""""""""""""""""

The Query Optimizer will inspect queries to determine if a merge(...) or groupby(...).apply(...) requires a shuffle. A shuffle can be avoided if the DataFrame was shuffled on the same columns in a previous step without any operations in between that change the partitioning layout or the relevant values in each partition.

.. code-block:: python

>>> result = df.merge(df2, on="a")
>>> result = result.merge(df3, on="a")

The Query Optimizer will identify that result was previously shuffled on "a" as well and thus only shuffle df3 in the second merge operation before doing a blockwise merge.

Auto-partitioning in read_parquet """""""""""""""""""""""""""""""""""""

The Query Optimizer will automatically repartition datasets read from Parquet files if individual partitions are too small. This will reduce the number of partitions and consequently also the size of the task graph.

The Optimizer aims to produce partitions of at least 75MB and will combine multiple files together if necessary to reach this threshold. The value can be configured by using

.. code-block:: python

>>> dask.config.set({"dataframe.parquet.minimum-partition-size": 100_000_000})

The value is given in bytes. The default threshold is relatively conservative to avoid memory issues on worker nodes with a relatively small amount of memory per thread.
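The same setting can also be applied temporarily with the context-manager form of ``dask.config.set``. This is a sketch; the 100 MB value is only an example, and the override affects only code that reads the setting while the block is active.

```python
import dask

key = "dataframe.parquet.minimum-partition-size"

# Override the threshold (in bytes) only inside this block.
with dask.config.set({key: 100_000_000}):
    inside = dask.config.get(key)

# Outside the block the previous value (if any) applies again.
outside = dask.config.get(key, default=None)

print("inside: ", inside)
print("outside:", outside)
```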

.. dropdown:: Additional changes

  • Add GitHub Releases automation (:pr:11057) Jacob Tomlinson_

  • Add changelog entries for new release (:pr:11058) Patrick Hoefler_

  • Reinstate try/except block in _bind_property (:pr:11049) Lawrence Mitchell_

  • Fix link for query planning docs (:pr:11054) Patrick Hoefler_

  • Add config parameter for parquet file size (:pr:11052) Patrick Hoefler_

  • Update percentile docstring (:pr:11053) Abel Aoun_

  • Add docs for query optimizer (:pr:11043) Patrick Hoefler_

  • Assignment of np.ma.masked to object-type Array (:pr:9627) David Hassell_

  • Don't error if dask_expr is not installed (:pr:11048) Simon Høxbro Hansen_

  • Adjust test_set_index for "cudf" backend (:pr:11029) Richard (Rick) Zamora_

  • Use to/from_legacy_dataframe instead of to/from_dask_dataframe (:pr:11025) Richard (Rick) Zamora_

  • Tokenize bag groupby keys (:pr:10734) Charles Stern_

  • Add lazy "cudf" registration for p2p-related dispatch functions (:pr:11040) Richard (Rick) Zamora_

  • Collect memray profiles on exception (:pr-distributed:8625) Florian Jetter_

  • Ensure inproc properly emulates serialization protocol (:pr-distributed:8622) Florian Jetter_

  • Relax test stats profiling2 (:pr-distributed:8621) Florian Jetter_

  • Restart workers when worker-ttl expires (:pr-distributed:8538) crusaderky_

  • Use monotonic for deadline test (:pr-distributed:8620) Florian Jetter_

  • Fix race condition for published futures with annotations (:pr-distributed:8577) Florian Jetter_

  • Scatter by worker instead of worker -> nthreads (:pr-distributed:8590) Miles_

  • Send log-event if worker is restarted because of memory pressure (:pr-distributed:8617) Patrick Hoefler_

  • Do not print xfailed tests in CI (:pr-distributed:8619) Florian Jetter_

  • ensure workers are not downscaled when participating in p2p (:pr-distributed:8610) Florian Jetter_

  • Run against stable fsspec (:pr-distributed:8615) Florian Jetter_

.. _v2024.4.1:

2024.4.1

This is a minor bugfix release that fixes an error when importing dask.dataframe with Python 3.11.9.

See :pr:11035 and :pr:11039 from Richard (Rick) Zamora_ for details.

.. dropdown:: Additional changes

  • Remove skips for named aggregations (:pr:11036) Patrick Hoefler_
  • Don't deep-copy read-only buffers on unpickle (:pr-distributed:8609) crusaderky_
  • Add dask-expr to dask conda recipe (:pr-distributed:8601) Charles Blackmon-Luca_

.. _v2024.4.0:

2024.4.0

Highlights ^^^^^^^^^^

Query planning fixes """""""""""""""""""" This release contains a variety of bugfixes in Dask DataFrame's new query planner.

GPU metric dashboard fixes """""""""""""""""""""""""" GPU memory and utilization dashboard functionality has been restored. Previously these plots were unintentionally left blank.

See :pr-distributed:8572 from Benjamin Zaitlen_ for details.

.. dropdown:: Additional changes

  • Build nightlies on tag releases (:pr:11014) Charles Blackmon-Luca_

  • Remove xfail tracebacks from test suite (:pr:11028) Patrick Hoefler_

  • Fix CI for upstream pandas changes (:pr:11027) Patrick Hoefler_

  • Fix value_counts raising if branch exists of nans only (:pr:11023) Patrick Hoefler_

  • Enable custom expressions in dask_cudf (:pr:11013) Richard (Rick) Zamora_

  • Raise ImportError instead of ValueError when dask-expr cannot be imported (:pr:11007) James Lamb_

  • Add HyperSpy to ecosystem.rst (:pr:11008) Jonas Lähnemann_

  • Add Hugging Face hf:// to the list of fsspec compatible remote services (:pr:11012) Quentin Lhoest_

  • Bump actions/checkout from 4.1.1 to 4.1.2 (:pr:11009)

  • Refresh documentation for annotations and spans (:pr-distributed:8593) crusaderky_

  • Fixup deprecation warning from pandas (:pr-distributed:8564) Patrick Hoefler_

  • Add Python 3.11 to GPU CI matrix (:pr-distributed:8598) Charles Blackmon-Luca_

  • Deadline to use a monotonic timer (:pr-distributed:8597) crusaderky_

  • Update gpuCI RAPIDS_VER to 24.06 (:pr-distributed:8588)

  • Refactor restart() and restart_workers() (:pr-distributed:8550) crusaderky_

  • Bump actions/checkout from 4.1.1 to 4.1.2 (:pr-distributed:8587)

  • Fix bokeh deprecations (:pr-distributed:8594) Miles_

  • Fix flaky test: test_shutsdown_cleanly (:pr-distributed:8582) Miles_

  • Include type in failed sizeof warning (:pr-distributed:8580) James Bourbeau_

.. _v2024.3.1:

2024.3.1

This is a minor release that primarily demotes an exception to a warning if dask-expr is not installed when upgrading.

.. dropdown:: Additional changes

  • Only warn if dask-expr is not installed (:pr:11003) Florian Jetter_
  • Fix typos found by codespell (:pr:10993) Dimitri Papadopoulos Orfanos_
  • Extra CI job with dask-expr disabled (:pr-distributed:8583) crusaderky_
  • Fix worker dashboard proxy (:pr-distributed:8528) Miles_
  • Fix flaky test_restart_waits_for_new_workers (:pr-distributed:8573) crusaderky_
  • Fix flaky test_raise_on_incompatible_partitions (:pr-distributed:8571) crusaderky_

.. _v2024.3.0:

2024.3.0

Released on March 11, 2024

Highlights ^^^^^^^^^^

Query planning """"""""""""""

This release enables query planning by default for all users of dask.dataframe.

The query planning functionality represents a rewrite of the DataFrame using dask-expr. This is a drop-in replacement and we expect that most users will not have to adjust any of their code. Any feedback can be reported on the Dask issue tracker <https://github.com/dask/dask/issues>_ or on the query planning feedback issue <https://github.com/dask/dask/issues/10995>_.

If you encounter any issues, you can still opt out by setting

.. code-block:: python

>>> import dask
>>> dask.config.set({'dataframe.query-planning': False})

Sunset of Pandas 1.X support """"""""""""""""""""""""""""

The new query planning backend requires at least pandas 2.0. This pandas version will automatically be installed if you are installing from conda or if you are installing using dask[complete] or dask[dataframe] from pip.

The legacy DataFrame implementation still supports pandas 1.X if you install dask without extras.

.. dropdown:: Additional changes

  • Update tests for pandas nightlies with dask-expr (:pr:10989) Patrick Hoefler_
  • Use dask-expr docs as main reference docs for DataFrames (:pr:10990) Patrick Hoefler_
  • Adjust from_array test for dask-expr (:pr:10988) Patrick Hoefler_
  • Unskip to_delayed test (:pr:10985) Patrick Hoefler_
  • Bump conda-incubator/setup-miniconda from 3.0.1 to 3.0.3 (:pr:10978)
  • Fix bug when enabling dask-expr (:pr:10977) Patrick Hoefler_
  • Update docs and requirements for dask-expr and remove warning (:pr:10976) Patrick Hoefler_
  • Fix numpy 2 compatibility with ogrid usage (:pr:10929) David Hoese_
  • Turn on dask-expr switch (:pr:10967) Patrick Hoefler_
  • Force initializing the random seed with the same byte order interpret… (:pr:10970) Elliott Sales de Andrade_
  • Use correct encoding for line terminator when reading CSV (:pr:10972) Elliott Sales de Andrade_
  • perf: do not unnecessarily recalculate input/output indices in optimize_blockwise (:pr:10966) Lindsey Gray_
  • Adjust tests for string option in dask-expr (:pr:10968) Patrick Hoefler_
  • Adjust tests for array conversion in dask-expr (:pr:10973) Patrick Hoefler_
  • TST: Fix sizeof tests on 32bit (:pr:10971) Elliott Sales de Andrade_
  • TST: Add missing skip for pyarrow (:pr:10969) Elliott Sales de Andrade_
  • Implement dask-expr conversion for bag.to_dataframe (:pr:10963) Patrick Hoefler_
  • Fix dask-expr import errors (:pr:10964) Miles_
  • Clean up Sphinx documentation for dask.config (:pr:10959) crusaderky_
  • Use stdlib importlib.metadata on Python 3.12+ (:pr:10955) wim glenn_
  • Cast partitioning_index to smaller size (:pr:10953) Florian Jetter_
  • Reuse dask/dask groupby Aggregation (:pr:10952) Patrick Hoefler_
  • ensure tokens on futures are unique (:pr-distributed:8569) Florian Jetter_
  • Don't obfuscate fine performance metrics failures (:pr-distributed:8568) crusaderky_
  • Mark shuffle fast tasks in dask-expr (:pr-distributed:8563) crusaderky_
  • Weigh gilknocker Prometheus metric by duration (:pr-distributed:8558) crusaderky_
  • Fix scheduler transition error on memory->erred (:pr-distributed:8549) Hendrik Makait_
  • Make CI happy again (:pr-distributed:8560) Miles_
  • Fix flaky test_Future_release_sync (:pr-distributed:8562) crusaderky_
  • Fix flaky test_flaky_connect_recover_with_retry (:pr-distributed:8556) Hendrik Makait_
  • typing tweaks in scheduler.py (:pr-distributed:8551) crusaderky_
  • Bump conda-incubator/setup-miniconda from 3.0.2 to 3.0.3 (:pr-distributed:8553)
  • Install dask-expr on CI (:pr-distributed:8552) Hendrik Makait_
  • P2P shuffle can drop partition column before writing to disk (:pr-distributed:8531) Hendrik Makait_
  • Better logging for worker removal (:pr-distributed:8517) crusaderky_
  • Add indicator support to merge (:pr-distributed:8539) Patrick Hoefler_
  • Bump conda-incubator/setup-miniconda from 3.0.1 to 3.0.2 (:pr-distributed:8535)
  • Avoid iteration error when getting module path (:pr-distributed:8533) James Bourbeau_
  • Ignore stdlib threading module in code collection (:pr-distributed:8532) James Bourbeau_
  • Fix excessive logging on P2P retry (:pr-distributed:8511) Hendrik Makait_
  • Prevent typos in retire_workers parameters (:pr-distributed:8524) crusaderky_
  • Cosmetic cleanup of test_steal (backport from #8185) (:pr-distributed:8509) crusaderky_
  • Fix flaky test_compute_per_key (:pr-distributed:8521) crusaderky_
  • Fix flaky test_no_workers_timeout_queued (:pr-distributed:8523) crusaderky_

.. _v2024.2.1:

2024.2.1

Released on February 23, 2024

Highlights ^^^^^^^^^^

Allow silencing dask.DataFrame deprecation warning """"""""""""""""""""""""""""""""""""""""""""""""""

The last release contained a DeprecationWarning that alerts users to an upcoming switch of dask.dataframe to use the new backend with support for query planning (see also :issue:10934).

This DeprecationWarning is triggered on import of the dask.dataframe module, and the community raised concerns about this being too verbose.

It is now possible to silence this warning:

.. code::

    # via Python
    >>> import dask
    >>> dask.config.set({'dataframe.query-planning-warning': False})

    # via CLI
    dask config set dataframe.query-planning-warning False

See :pr:10936 and :pr:10925 from Miles_ for details.

More robust distributed scheduler for rare key collisions """""""""""""""""""""""""""""""""""""""""""""""""""""""""

Blockwise fusion optimization can cause a task key collision that is not handled properly by the distributed scheduler (see :issue:9888). Users will typically notice this by seeing one of various internal exceptions that cause a system deadlock or critical failure. While this issue could not be fixed, the scheduler now implements a mechanism that should mitigate most occurrences and issues a warning if the issue is detected.

See :pr-distributed:8185 from crusaderky_ and Florian Jetter_ for details.

As part of this work, various improvements to tokenization have been implemented. See :pr:10913, :pr:10884, :pr:10919, :pr:10896 and primarily :pr:10883 from crusaderky_ for more details.

More robust adaptive scaling on large clusters """"""""""""""""""""""""""""""""""""""""""""""

Adaptive scaling could previously lose data during downscaling if many tasks had to be moved. This typically, but not exclusively, occurred on large clusters. It would manifest as a recomputation of tasks and could cause clusters to oscillate between up- and downscaling without ever finishing.

See :pr-distributed:8522 from crusaderky_ for more details.

.. dropdown:: Additional changes

  • Remove flaky fastparquet test (:pr:10948) Patrick Hoefler_
  • Enable Aggregation from dask-expr (:pr:10947) Patrick Hoefler_
  • Update tests for assign change in dask-expr (:pr:10944) Patrick Hoefler_
  • Adjust for pandas large string change (:pr:10942) Patrick Hoefler_
  • Fix flaky test_describe_empty (:pr:10943) crusaderky_
  • Use Python 3.12 as reference environment (:pr:10939) crusaderky_
  • [Cosmetic] Clean up temp paths in test_config.py (:pr:10938) crusaderky_
  • [CLI] dask config set and dask config find updates. (:pr:10930) Miles_
  • combine_first when a chunk is full of NaNs (:pr:10932) crusaderky_
  • Correctly parse lowercase true/false config from CLI (:pr:10926) crusaderky_
  • dask config get fix when printing None values (:pr:10927) crusaderky_
  • query-planning can't be None (:pr:10928) crusaderky_
  • Add dask config set (:pr:10921) Miles_
  • Make nunique faster again (:pr:10922) Patrick Hoefler_
  • Clean up some Cython warnings handling (:pr:10924) crusaderky_
  • Bump pre-commit/action from 3.0.0 to 3.0.1 (:pr:10920)
  • Raise and avoid data loss of meta provided to P2P shuffle is wrong (:pr-distributed:8520) Florian Jetter_
  • Fix gpuci: np.product is deprecated (:pr-distributed:8518) crusaderky_
  • Update gpuCI RAPIDS_VER to 24.04 (:pr-distributed:8471)
  • Unpin ipywidgets on Python 3.12 (:pr-distributed:8516) crusaderky_
  • Keep old dependencies on run_spec collision (:pr-distributed:8512) crusaderky_
  • Trivial mypy fix (:pr-distributed:8513) crusaderky_
  • Ensure large payload can be serialized and sent over comms (:pr-distributed:8507) Florian Jetter_
  • Allow large graph warning threshold to be configured (:pr-distributed:8508) Florian Jetter_
  • Tokenization-related test tweaks (backport from #8185) (:pr-distributed:8499) crusaderky_
  • Tweaks to update_graph (backport from #8185) (:pr-distributed:8498) crusaderky_
  • AMM: test incremental retirements (:pr-distributed:8501) crusaderky_
  • Suppress dask-expr warning in CI (:pr-distributed:8505) crusaderky_
  • Ignore dask-expr warning in CI (:pr-distributed:8504) James Bourbeau_
  • Improve tests for P2P stable ordering (:pr-distributed:8458) Hendrik Makait_
  • Bump pre-commit/action from 3.0.0 to 3.0.1 (:pr-distributed:8503)

.. _v2024.2.0:

2024.2.0

Released on February 9, 2024

Highlights ^^^^^^^^^^

Deprecate Dask DataFrame implementation """"""""""""""""""""""""""""""""""""""" The current Dask DataFrame implementation is deprecated. In a future release, Dask DataFrame will use a new implementation that contains several improvements, including logical query planning. The user-facing DataFrame API will remain unchanged.

The new implementation is already available and can be enabled by installing the dask-expr library:

.. code-block:: bash

    $ pip install dask-expr

and turning the query planning option on:

.. code-block:: python

    >>> import dask
    >>> dask.config.set({'dataframe.query-planning': True})
    >>> import dask.dataframe as dd

API documentation for the new implementation is available at https://docs.dask.org/en/stable/dataframe-api.html

Any feedback can be reported on the Dask issue tracker https://github.com/dask/dask/issues

See :pr:10912 from Patrick Hoefler_ for details.

Improved tokenization """"""""""""""""""""" This release contains several improvements to Dask's object tokenization logic. More objects now produce deterministic tokens, which can lead to improved performance through caching of intermediate results.

See :pr:10898, :pr:10904, :pr:10876, :pr:10874, and :pr:10865 from crusaderky_ for details.
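
As a minimal sketch of what deterministic tokens mean in practice, the snippet below calls ``dask.base.tokenize`` on equal and unequal inputs. The sample values are made up for illustration; the token strings themselves are opaque and may change between Dask versions.

```python
from dask.base import tokenize

# Equal inputs should yield the same token, so results keyed by the
# token can be reused instead of being recomputed.
token_a = tokenize([1, 2, 3], {"factor": 2})
token_b = tokenize([1, 2, 3], {"factor": 2})

# A different input should yield a different token.
token_c = tokenize([1, 2, 4], {"factor": 2})

print(token_a == token_b)
print(token_a == token_c)
```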

.. dropdown:: Additional changes

  • Fix inplace modification on read-only arrays for string conversion (:pr:10886) Patrick Hoefler_

  • Add changelog entry for dask-expr (:pr:10915) Patrick Hoefler_

  • Fix leftsemi merge for cudf (:pr:10914) Patrick Hoefler_

  • Slight update to dask-expr warning (:pr:10916) James Bourbeau_

  • Improve performance for groupby.nunique (:pr:10910) Patrick Hoefler_

  • Add configuration for leftsemi merges in dask-expr (:pr:10908) Patrick Hoefler_

  • Adjust assign test for dask-expr (:pr:10907) Patrick Hoefler_

  • Avoid pytest.warns in test_to_datetime for GPU CI (:pr:10902) Richard (Rick) Zamora_

  • Update deployment options in docs homepage (:pr:10901) James Bourbeau_

  • Fix typo in dataframe docs (:pr:10900) Matthew Rocklin_

  • Bump peter-evans/create-pull-request from 5 to 6 (:pr:10894)

  • Fix mimesis API >=13.1.0 - use random.randint (:pr:10888) Miles_

  • Adjust invalid test (:pr:10897) Patrick Hoefler_

  • Pickle da.argwhere and da.count_nonzero (:pr:10885) crusaderky_

  • Fix dask-expr tests after singleton pr (:pr:10892) Patrick Hoefler_

  • Set lower bound version for s3fs (:pr:10889) Miles_

  • Add a couple of dask-expr fixes for new parquet cache (:pr:10880) Florian Jetter_

  • Update deployment documentation (:pr:10882) Matthew Rocklin_

  • Start with dask-expr doc build (:pr:10879) Patrick Hoefler_

  • Test tokenization of static and class methods (:pr:10872) crusaderky_

  • Add distributed.print and distributed.warn to API docs (:pr:10878) James Bourbeau_

  • Run macos ci on M1 architecture (:pr:10877) Patrick Hoefler_

  • Update tests for dask-expr (:pr:10838) Patrick Hoefler_

  • Update parquet tests to align with dask-expr fixes (:pr:10851) Richard (Rick) Zamora_

  • Fix regression in test_graph_manipulation (:pr:10873) crusaderky_

  • Adjust pytest errors for dask-expr ci (:pr:10871) Patrick Hoefler_

  • Set upper bound version for numba when pandas<2.1 (:pr:10890) Miles_

  • Deprecate method parameter in DataFrame.fillna (:pr:10846) Miles_

  • Remove warning filter from pyproject.toml (:pr:10867) Patrick Hoefler_

  • Skip test_append_with_partition for fastparquet (:pr:10828) Patrick Hoefler_

  • Fix pytest 8 issues (:pr:10868) Patrick Hoefler_

  • Adjust test for support of median in Groupby.aggregate in dask-expr (2/2) (:pr:10870) Hendrik Makait_

  • Allow length of ascending to be larger than one in sort_values (:pr:10864) Florian Jetter_

  • Allow other message raised in Python 3.9 (:pr:10862) Hendrik Makait_

  • Don't crash when getting computation code in pathological cases (:pr-distributed:8502) James Bourbeau_

  • Bump peter-evans/create-pull-request from 5 to 6 (:pr-distributed:8494)

  • fix test of cudf spilling metrics (:pr-distributed:8478) Mads R. B. Kristensen_

  • Upgrade to pytest 8 (:pr-distributed:8482) crusaderky_

  • Fix test_two_consecutive_clients_share_results (:pr-distributed:8484) crusaderky_

  • Client word mix-up (:pr-distributed:8481) templiert_

.. _v2024.1.1:

2024.1.1

Released on January 26, 2024

Highlights ^^^^^^^^^^

Pandas 2.2 and Scipy 1.12 support """"""""""""""""""""""""""""""""" This release contains compatibility updates for the latest pandas and scipy releases.

See :pr:10834, :pr:10849, :pr:10845, and :pr-distributed:8474 from crusaderky_ for details.

Deprecations """"""""""""

  • Deprecate convert_dtype in apply (:pr:10827) Miles_
  • Deprecate axis in DataFrame.rolling (:pr:10803) Miles_
  • Deprecate out= and dtype= parameter in most DataFrame methods (:pr:10800) crusaderky_
  • Deprecate axis in groupby cumulative transformers (:pr:10796) Miles_
  • Rename shuffle to shuffle_method in remaining methods (:pr:10797) Miles_

.. dropdown:: Additional changes

  • Add recommended deployment options to deployment docs (:pr:10866) James Bourbeau_

  • Improve _agg_finalize to confirm to output expectation (:pr:10835) Hendrik Makait_

  • Implement deterministic tokenization for hlg (:pr:10817) Patrick Hoefler_

  • Refactor: move tests for tokenize() to its own module (:pr:10863) crusaderky_

  • Update DataFrame examples section (:pr:10856) James Bourbeau_

  • Temporarily pin mimesis<13.1.0 (:pr:10860) James Bourbeau_

  • Trivial cosmetic tweaks to _testing.py (:pr:10857) crusaderky_

  • Unskip and adjust tests for groupby-aggregate with median using dask-expr (:pr:10832) Hendrik Makait_

  • Fix test for sizeof(pd.MultiIndex) in upstream CI (:pr:10850) crusaderky_

  • numpy 2.0: fix slicing by uint64 array (:pr:10854) crusaderky_

  • Rename numpy version constants to match pandas (:pr:10843) crusaderky_

  • Bump actions/cache from 3 to 4 (:pr:10852)

  • Update gpuCI RAPIDS_VER to 24.04 (:pr:10841)

  • Fix deprecations in doctest (:pr:10844) crusaderky_

  • Changed dtype arithmetics in numpy 2.x (:pr:10831) crusaderky_

  • Adjust tests for median support in dask-expr (:pr:10839) Patrick Hoefler_

  • Adjust tests for median support in groupby-aggregate in dask-expr (:pr:10840) Hendrik Makait_

  • numpy 2.x: fix std() on MaskedArray (:pr:10837) crusaderky_

  • Fail dask-expr ci if tests fail (:pr:10829) Patrick Hoefler_

  • Activate query_planning when exporting tests (:pr:10833) Patrick Hoefler_

  • Expose dataframe tests (:pr:10830) Patrick Hoefler_

  • numpy 2: deprecations in n-dimensional fft functions (:pr:10821) crusaderky_

  • Generalize CreationDispatch for dask-expr (:pr:10794) Richard (Rick) Zamora_

  • Remove circular import when dask-expr enabled (:pr:10824) Miles_

  • Minor[CI]: publish-test-results not marked as failed (:pr:10825) Miles_

  • Fix more tests to use pytest.warns() (:pr:10818) Michał Górny_

  • np.unique(): inverse is shaped in numpy 2 (:pr:10819) crusaderky_

  • Pin test_split_adaptive_files to pyarrow engine (:pr:10820) Patrick Hoefler_

  • Adjust remaining tests in dask/dask (:pr:10813) Patrick Hoefler_

  • Restrict test to Arrow only (:pr:10814) Patrick Hoefler_

  • Filter warnings from std test (:pr:10815) Patrick Hoefler_

  • Adjust mostly indexing tests (:pr:10790) Patrick Hoefler_

  • Updates to deployment docs (:pr:10778) Sarah Charlotte Johnson_

  • Unblock documentation build (:pr:10807) Miles_

  • Adjust test_to_datetime for dask-expr compatibility Hendrik Makait_

  • Upstream CI tweaks (:pr:10806) crusaderky_

  • Improve tests for to_numeric (:pr:10804) Hendrik Makait_

  • Fix test-report cache key indent (:pr:10798) Miles_

  • Add test-report workflow (:pr:10783) Miles_

  • Handle matrix subclass serialization (:pr-distributed:8480) Florian Jetter_

  • Use smallest data type for partition column in P2P (:pr-distributed:8479) Florian Jetter_

  • pandas 2.2: fix test_dataframe_groupby_tasks (:pr-distributed:8475) crusaderky_

  • Bump actions/cache from 3 to 4 (:pr-distributed:8477)

  • pandas 2.2 vs. pyarrow 14: deprecated DatetimeTZBlock (:pr-distributed:8476) crusaderky_

  • pandas 2.2.0: Deprecated frequency alias M in favor of ME (:pr-distributed:8473) Hendrik Makait_

  • Fix docs build (:pr-distributed:8472) Hendrik Makait_

  • Fix P2P-based joins with explicit npartitions (:pr-distributed:8470) Hendrik Makait_

  • Ignore dask-expr in test_report.py script (:pr-distributed:8464) Miles_

  • Nit: hardcode Python version in test report environment (:pr-distributed:8462) crusaderky_

  • Change test_report.py - skip bad artifacts in dask/dask (:pr-distributed:8461) Miles_

  • Replace all occurrences of sys.is_finalizing (:pr-distributed:8449) Florian Jetter_

.. _v2024.1.0:

2024.1.0

Released on January 12, 2024

Highlights ^^^^^^^^^^

Partial rechunks within P2P """"""""""""""""""""""""""" P2P rechunking now utilizes the relationships between input and output chunks. For situations that do not require all-to-all data transfer, this may significantly reduce the runtime and memory/disk footprint. It also enables task culling.

See :pr-distributed:8330 from Hendrik Makait_ for details.
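
The relationship between input and output chunks can be pictured with a small standalone sketch. This is not the implementation used by distributed; the helper names ``chunk_bounds`` and ``overlapping_inputs`` are hypothetical and only look at a single axis. It shows why a rechunk does not always need all-to-all transfer: each output chunk only depends on the input chunks it overlaps.

```python
from itertools import accumulate


def chunk_bounds(chunks):
    """Return (start, stop) offsets for each chunk along one axis."""
    stops = list(accumulate(chunks))
    starts = [0] + stops[:-1]
    return list(zip(starts, stops))


def overlapping_inputs(input_chunks, output_chunks):
    """Map each output chunk index to the input chunk indices it overlaps."""
    in_bounds = chunk_bounds(input_chunks)
    mapping = {}
    for out_idx, (o_start, o_stop) in enumerate(chunk_bounds(output_chunks)):
        mapping[out_idx] = [
            in_idx
            for in_idx, (i_start, i_stop) in enumerate(in_bounds)
            if i_start < o_stop and o_start < i_stop
        ]
    return mapping


# An axis of length 12, rechunked from four chunks of 3 to two chunks of 6.
mapping = overlapping_inputs((3, 3, 3, 3), (6, 6))
print(mapping)  # {0: [0, 1], 1: [2, 3]}
```

In this toy case the two output chunks draw on disjoint sets of input chunks, so the rechunk splits into two independent partial rechunks.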

Fastparquet engine deprecated """"""""""""""""""""""""""""" The fastparquet Parquet engine has been deprecated. Users should migrate to the pyarrow engine by installing PyArrow <https://arrow.apache.org/docs/python/install.html>_ and removing engine="fastparquet" in read_parquet or to_parquet calls.

See :pr:10743 from crusaderky_ for details.

Improved serialization for arbitrary data """"""""""""""""""""""""""""""""""""""""" This release improves serialization robustness for arbitrary data. Previously there were some cases where serialization could fail for non-msgpack serializable data. In those cases we now fall back to using pickle.

See :pr-distributed:8447 from Hendrik Makait_ for details.
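
The fallback pattern can be sketched with the standard library alone. This is an analogy rather than the code in distributed: ``json`` stands in for msgpack as the restrictive serializer, and the helper names ``dumps_with_fallback`` and ``loads_with_fallback`` are hypothetical.

```python
import json
import pickle


def dumps_with_fallback(obj):
    """Try a restrictive serializer first; fall back to pickle on failure."""
    try:
        return "json", json.dumps(obj).encode()
    except TypeError:
        # The restrictive format cannot represent this object.
        return "pickle", pickle.dumps(obj)


def loads_with_fallback(kind, payload):
    """Decode a payload using the serializer recorded alongside it."""
    if kind == "json":
        return json.loads(payload.decode())
    return pickle.loads(payload)


# A plain dict of lists is handled by the restrictive serializer.
kind_plain, payload_plain = dumps_with_fallback({"a": [1, 2, 3]})

# A set is not JSON-serializable, so pickle is used instead.
kind_set, payload_set = dumps_with_fallback({"a": {1, 2, 3}})

print(kind_plain, kind_set)
```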

Additional deprecations """""""""""""""""""""""

  • Deprecate shuffle keyword in favour of shuffle_method for DataFrame methods (:pr:10738) Hendrik Makait_
  • Deprecate automatic argument inference in repartition (:pr:10691) Patrick Hoefler_
  • Deprecate compute parameter in set_index (:pr:10784) Miles_
  • Deprecate inplace in eval (:pr:10785) Miles_
  • Deprecate Series.view (:pr:10754) Miles_
  • Deprecate npartitions="auto" for set_index & sort_values (:pr:10750) Miles_

.. dropdown:: Additional changes

  • Avoid shortcut in tasks shuffle that let to data loss (:pr:10763) Patrick Hoefler_

  • Ignore data tasks when ordering (:pr:10706) Florian Jetter_

  • Add get_dummies from dask-expr (:pr:10791) Patrick Hoefler_

  • Adjust IO tests for dask-expr migration (:pr:10776) Patrick Hoefler_

  • Remove deprecation warning about sort and split_out in groupby (:pr:10788) Patrick Hoefler_

  • Address pandas deprecations (:pr:10789) Patrick Hoefler_

  • Import distributed only once in get_scheduler (:pr:10771) Florian Jetter_

  • Simplify GitHub actions (:pr:10781) crusaderky_

  • Add unit test overview (:pr:10769) Miles_

  • Clean up redundant bits in CI (:pr:10768) crusaderky_

  • Update tests for ufunc (:pr:10773) Patrick Hoefler_

  • Use pytest.mark.skipif(DASK_EXPR_ENABLED) (:pr:10774) crusaderky_

  • Adjust shuffle tests for dask-expr (:pr:10759) Patrick Hoefler_

  • Fix some deprecation warnings from pandas (:pr:10749) Patrick Hoefler_

  • Adjust shuffle tests for dask-expr (:pr:10762) Patrick Hoefler_

  • Update pre-commit (:pr:10767) Hendrik Makait_

  • Clean up config switches in CI (:pr:10766) crusaderky_

  • Improve exception for validate_key (:pr:10765) Hendrik Makait_

  • Handle datetimeindexes in set_index with unknown divisions (:pr:10757) Patrick Hoefler_

  • Add hashing for decimals (:pr:10758) Patrick Hoefler_

  • Review tests for is_monotonic (:pr:10756) crusaderky_

  • Change argument order in value_counts_aggregate (:pr:10751) Patrick Hoefler_

  • Adjust some groupby tests for dask-expr (:pr:10752) Patrick Hoefler_

  • Restrict mimesis to < 12 for 3.9 build (:pr:10755) Patrick Hoefler_

  • Don't evaluate config in skip condition (:pr:10753) Patrick Hoefler_

  • Adjust some tests to be compatible with dask-expr (:pr:10714) Patrick Hoefler_

  • Make dask.array.utils functions more generic to other Dask Arrays (:pr:10676) Matthew Rocklin_

  • Remove duplciate "single machine" section (:pr:10747) Matthew Rocklin_

  • Tweak ORC engine= parameter (:pr:10746) crusaderky_

  • Add pandas 3.0 deprecations and migration prep for dask-expr (:pr:10723) Miles_

  • Add task graph animation to docs homepage (:pr:10730) Sarah Charlotte Johnson_

  • Use new Xarray logo (:pr:10729) James Bourbeau_

  • Update tab styling on "10 Minutes to Dask" page (:pr:10728) James Bourbeau_

  • Update environment file upload step in CI (:pr:10726) James Bourbeau_

  • Don't duplicate unobserved categories in GroupBy.nunqiue if split_out>1 (:pr:10716) Patrick Hoefler_

  • Changelog entry for dask.order update (:pr:10715) Florian Jetter_

  • Relax redundant-key check in _check_dsk (:pr:10701) Richard (Rick) Zamora_

  • Fix test_report.py (:pr-distributed:8459) Miles_

  • Revert pickle change (:pr-distributed:8456) Florian Jetter_

  • Adapt test_report.py to support dask/dask repository (:pr-distributed:8450) Miles_

  • Maintain stable ordering for P2P shuffling (:pr-distributed:8453) Hendrik Makait_

  • Add no worker timeout for scheduler (:pr-distributed:8371) FTang21_

  • Allow tests workflow to be dispatched manually by maintainers (:pr-distributed:8445) Erik Sundell_

  • Make scheduler-related transition functionality private (:pr-distributed:8448) Hendrik Makait_

  • Update pre-commit hooks (:pr-distributed:8444) Hendrik Makait_

  • Do not always check if __main__ in result when pickling (:pr-distributed:8443) Florian Jetter_

  • Delegate wait_for_workers to cluster instances only when implemented (:pr-distributed:8441) Erik Sundell_

  • Extend sleep in test_pandas (:pr-distributed:8440) Julian Gilbey_

  • Avoid deprecated shuffle keyword (:pr-distributed:8439) Hendrik Makait_

  • Shuffle metrics 4/4: Remove bespoke diagnostics (:pr-distributed:8367) crusaderky_

  • Do not run gilknocker in testsuite (:pr-distributed:8423) Florian Jetter_

  • Tweak abstractmethods (:pr-distributed:8427) crusaderky_

  • Shuffle metrics 3/4: Capture background metrics (:pr-distributed:8366) crusaderky_

  • Shuffle metrics 2/4: Add background metrics (:pr-distributed:8365) crusaderky_

  • Shuffle metrics 1/4: Add foreground metrics (:pr-distributed:8364) crusaderky_

  • Bump actions/upload-artifact from 3 to 4 (:pr-distributed:8420)

  • Fix test_merge_p2p_shuffle_reused_dataframe_with_different_parameters (:pr-distributed:8422) Hendrik Makait_

  • Expand Client.upload_file docs example (:pr-distributed:8313) Miles_

  • Improve logging in P2P's scheduler plugin (:pr-distributed:8410) Hendrik Makait_

  • Re-enable test_decide_worker_coschedule_order_neighbors (:pr-distributed:8402) Florian Jetter_

  • Add cuDF spilling statistics to RMM/GPU memory plot (:pr-distributed:8148) Charles Blackmon-Luca_

  • Fix inconsistent hashing for Nanny-spawned workers (:pr-distributed:8400) Charles Stern_

  • Do not allow workers to downscale if they are running long-running tasks (e.g. worker_client) (:pr-distributed:7481) Florian Jetter_

  • Fix flaky test_subprocess_cluster_does_not_depend_on_logging (:pr-distributed:8417) crusaderky_

.. _v2023.12.1:

2023.12.1

Released on December 15, 2023

Highlights ^^^^^^^^^^

Logical Query Planning now available for Dask DataFrames """"""""""""""""""""""""""""""""""""""""""""""""""""""""

Dask DataFrames are now much more performant by using a logical query planner. This feature is currently off by default, but can be turned on with:

.. code:: python

    dask.config.set({"dataframe.query-planning": True})

You also need to have dask-expr installed:

.. code:: bash

    pip install dask-expr

We've seen promising performance improvements so far; see this blog post <https://blog.coiled.io/blog/dask-expr-tpch-dask.html>__ and these regularly updated benchmarks <https://tpch.coiled.io>__ for more information. A more detailed explanation of how the query optimizer works can be found in this blog post <https://blog.coiled.io/blog/dask-expr-introduction.html>__.

This feature is still under active development and the API <https://github.com/dask-contrib/dask-expr#api-coverage>__ isn't stable yet, so breaking changes can occur. We expect to make the query optimizer the default early next year.

See :pr:10634 from Patrick Hoefler_ for details.

Dtype inference in read_parquet """""""""""""""""""""""""""""""""""

read_parquet will now infer the Arrow types pa.date32(), pa.date64() and pa.decimal() as an ArrowDtype in pandas. These dtypes are backed by the original Arrow array, and thus avoid the conversion to NumPy object. Additionally, read_parquet will no longer infer nested and binary types as strings; they will be stored in NumPy object arrays.

See :pr:10698 and :pr:10705 from Patrick Hoefler_ for details.

Scheduling improvements to reduce memory usage """"""""""""""""""""""""""""""""""""""""""""""

This release includes a major rewrite to a core part of our scheduling logic. It includes a new approach to the topological sorting algorithm in dask.order which determines the order in which tasks are run. Improper ordering is known to be a major contributor to excessive cluster memory pressure.

Updates in this release fix a couple of performance regressions that were introduced in release 2023.10.0 (see :pr:10535). Generally, computations should now be much more eager to release data if it is no longer required in memory.

See :pr:10660, :pr:10697 from Florian Jetter_ for details.
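
To inspect the ordering on a small graph, ``dask.order.order`` can be called directly. The following is a minimal sketch; the graph and the helper functions ``load``, ``inc`` and ``add`` are made up for illustration, and the relative priority of independent tasks may differ between releases.

```python
from dask.order import order


def load():
    return 1


def inc(x):
    return x + 1


def add(x, y):
    return x + y


# A small diamond-shaped task graph: "a" feeds "b" and "c", which feed "d".
dsk = {
    "a": (load,),
    "b": (inc, "a"),
    "c": (inc, "a"),
    "d": (add, "b", "c"),
}

# order() maps each key to an integer priority; lower values run earlier.
priorities = order(dsk)
for key in sorted(priorities, key=priorities.get):
    print(key, priorities[key])
```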

Improved P2P-based merging robustness and performance """""""""""""""""""""""""""""""""""""""""""""""""""""

This release contains several updates that fix a possible deadlock introduced in 2023.9.2 and improve the robustness of P2P-based merging when the cluster is dynamically scaling up.

See :pr-distributed:8415, :pr-distributed:8416, and :pr-distributed:8414 from Hendrik Makait_ for details.

Removed disabling pickle option """""""""""""""""""""""""""""""

The distributed.scheduler.pickle configuration option is no longer supported. As of the 2023.4.0 release, pickle is used to transmit task graphs, so it can no longer be disabled. We now raise an informative error when distributed.scheduler.pickle is set to False.

See :pr-distributed:8401 from Florian Jetter_ for details.

.. dropdown:: Additional changes

  • Add changelog entry for recent P2P merge fixes (:pr:10712) Hendrik Makait_

  • Update DataFrame page (:pr:10710) Matthew Rocklin_

  • Add changelog entry for dask-expr switch (:pr:10704) Patrick Hoefler_

  • Improve changelog entry for PipInstall changes (:pr:10711) Hendrik Makait_

  • Remove PR labeler (:pr:10709) James Bourbeau_

  • Add .__wrapped__ to Delayed object (:pr:10695) Andrew S. Rosen_

  • Bump actions/labeler from 4.3.0 to 5.0.0 (:pr:10689)

  • Bump actions/stale from 8 to 9 (:pr:10690)

  • [Dask.order] Remove non-runnable leaf nodes from ordering (:pr:10697) Florian Jetter_

  • Update installation docs (:pr:10699) Matthew Rocklin_

  • Fix software environment link in docs (:pr:10700) James Bourbeau_

  • Avoid converting non-strings to arrow strings for read_parquet (:pr:10692) Patrick Hoefler_

  • Bump xarray-contrib/issue-from-pytest-log from 1.2.7 to 1.2.8 (:pr:10687)

  • Fix tokenize for pd.DateOffset (:pr:10664) jochenott_

  • Bugfix for writing empty array to zarr (:pr:10506) Ben_

  • Docs update, fixup styling, mention free (:pr:10679) Matthew Rocklin_

  • Update deployment docs (:pr:10680) Matthew Rocklin_

  • Dask.order rewrite using a critical path approach (:pr:10660) Florian Jetter_

  • Avoid substituting keys that occur multiple times (:pr:10646) Florian Jetter_

  • Add missing image to docs (:pr:10694) Matthew Rocklin_

  • Bump actions/setup-python from 4 to 5 (:pr:10688)

  • Update landing page (:pr:10674) Matthew Rocklin_

  • Make meta check simpler in dispatch (:pr:10638) Patrick Hoefler_

  • Pin PR Labeler (:pr:10675) Matthew Rocklin_

  • Reorganize docs index a bit (:pr:10669) Matthew Rocklin_

  • Bump actions/setup-java from 3 to 4 (:pr:10667)

  • Bump conda-incubator/setup-miniconda from 2.2.0 to 3.0.1 (:pr:10668)

  • Bump xarray-contrib/issue-from-pytest-log from 1.2.6 to 1.2.7 (:pr:10666)

  • Fix test_categorize_info with nightly pyarrow (:pr:10662) James Bourbeau_

  • Rewrite test_subprocess_cluster_does_not_depend_on_logging (:pr-distributed:8409) Hendrik Makait_

  • Avoid RecursionError when failing to pickle key in SpillBuffer and using tblib=3 (:pr-distributed:8404) Hendrik Makait_

  • Allow tasks to override is_rootish heuristic (:pr-distributed:8412) Hendrik Makait_

  • Remove GPU executor (:pr-distributed:8399) Hendrik Makait_

  • Do not rely on logging for subprocess cluster (:pr-distributed:8398) Hendrik Makait_

  • Update gpuCI RAPIDS_VER to 24.02 (:pr-distributed:8384)

  • Bump actions/setup-python from 4 to 5 (:pr-distributed:8396)

  • Ensure output chunks in P2P rechunking are distributed homogeneously (:pr-distributed:8207) Florian Jetter_

  • Trivial: fix typo (:pr-distributed:8395) crusaderky_

  • Bump JamesIves/github-pages-deploy-action from 4.4.3 to 4.5.0 (:pr-distributed:8387)

  • Bump conda-incubator/setup-miniconda from 3.0.0 to 3.0.1 (:pr-distributed:8388)

.. _v2023.12.0:

2023.12.0

Released on December 1, 2023

Highlights ^^^^^^^^^^

PipInstall restart and environment variables """"""""""""""""""""""""""""""""""""""""""""

The distributed.PipInstall plugin now has more robust restart logic and also supports environment variables <https://pip.pypa.io/en/stable/reference/requirements-file-format/#using-environment-variables>_.

The example below shows how users can use the distributed.PipInstall plugin and a TOKEN environment variable to securely install a package from a private repository:

.. code:: python

    from dask.distributed import PipInstall

    # `client` is assumed to be an existing distributed.Client
    plugin = PipInstall(packages=["private_package@git+https://${TOKEN}@github.com/dask/private_package.git"])
    client.register_plugin(plugin)

See :pr-distributed:8374, :pr-distributed:8357, and :pr-distributed:8343 from Hendrik Makait_ for details.

Bokeh 3.3.0 compatibility """"""""""""""""""""""""" This release contains compatibility updates for using bokeh>=3.3.0 with proxied Dask dashboards. Previously the contents of dashboard plots wouldn't be displayed.

See :pr-distributed:8347 and :pr-distributed:8381 from Jacob Tomlinson_ for details.

.. dropdown:: Additional changes

  • Add network marker to test_pyarrow_filesystem_option_real_data (:pr:10653) Richard (Rick) Zamora_
  • Bump GPU CI to CUDA 11.8 (:pr:10656) Charles Blackmon-Luca_
  • Tokenize pandas offsets deterministically (:pr:10643) Patrick Hoefler_
  • Add tokenize pd.NA functionality (:pr:10640) Patrick Hoefler_
  • Update gpuCI RAPIDS_VER to 24.02 (:pr:10636)
  • Fix precision handling in array.linalg.norm (:pr:10556) joanrue_
  • Add axis argument to DataFrame.clip and Series.clip (:pr:10616) Richard (Rick) Zamora_
  • Update changelog entry for in-memory rechunking (:pr:10630) Florian Jetter_
  • Fix flaky test_resources_reset_after_cancelled_task (:pr-distributed:8373) crusaderky_
  • Bump GPU CI to CUDA 11.8 (:pr-distributed:8376) Charles Blackmon-Luca_
  • Bump conda-incubator/setup-miniconda from 2.2.0 to 3.0.0 (:pr-distributed:8372)
  • Add debug logs to P2P scheduler plugin (:pr-distributed:8358) Hendrik Makait_
  • O(1) access for /info/task/ endpoint (:pr-distributed:8363) crusaderky_
  • Remove stringification from shuffle annotations (:pr-distributed:8362) crusaderky_
  • Don't cast int metrics to float (:pr-distributed:8361) crusaderky_
  • Drop asyncio TCP backend (:pr-distributed:8355) Florian Jetter_
  • Add offload support to context_meter.add_callback (:pr-distributed:8360) crusaderky_
  • Test that sync() propagates contextvars (:pr-distributed:8354) crusaderky_
  • captured_context_meter (:pr-distributed:8352) crusaderky_
  • context_meter.clear_callbacks (:pr-distributed:8353) crusaderky_
  • Use @log_errors decorator (:pr-distributed:8351) crusaderky_
  • Fix test_statistical_profiling_cycle (:pr-distributed:8356) Florian Jetter_
  • Shuffle: don't parse dask.config at every RPC (:pr-distributed:8350) crusaderky_
  • Replace Client.register_plugin's idempotent argument with .idempotent attribute on plugins (:pr-distributed:8342) Hendrik Makait_
  • Fix test report generation (:pr-distributed:8346) Hendrik Makait_
  • Install pyarrow-hotfix on mindeps-pandas CI (:pr-distributed:8344) Hendrik Makait_
  • Reduce memory usage of scheduler process - optimize scheduler.py::TaskState class (:pr-distributed:8331) Miles_
  • Bump pre-commit linters (:pr-distributed:8340) crusaderky_
  • Update cuDF test with explicit dtype=object (:pr-distributed:8339) Peter Andreas Entschev_
  • Fix Cluster / SpecCluster calls to async close methods (:pr-distributed:8327) Peter Andreas Entschev_

.. _v2023.11.0:

2023.11.0

Released on November 10, 2023

Highlights
^^^^^^^^^^

Zero-copy P2P Array Rechunking
""""""""""""""""""""""""""""""

Users should see significant performance improvements when using in-memory P2P array rechunking. This is due to no longer copying underlying data buffers.

The example below compares the performance of different rechunking methods.

.. code:: python

    import dask
    import dask.array as da

    shape = (30_000, 6_000, 150)  # 201.17 GiB
    input_chunks = (60, -1, -1)   # 411.99 MiB
    output_chunks = (-1, 6, -1)   # 205.99 MiB

    # P2P rechunking runs on a distributed cluster, so a Client
    # is assumed to be connected already.
    with dask.config.set({
        "array.rechunk.method": "p2p",
        "distributed.p2p.disk": True,
    }):
        (
            da.random.random(shape, chunks=input_chunks)
            .rechunk(output_chunks)
            .sum()
            .compute()
        )

.. image:: images/changelog/2023110-rechunking-disk-perf.png
  :width: 75%
  :align: center
  :alt: A comparison of rechunking performance between the different methods tasks, p2p with disk and p2p without disk on different cluster sizes. The graph shows that p2p without disk is up to 60% faster than the default tasks based approach.

See :pr-distributed:`8282`, :pr-distributed:`8318`, and :pr-distributed:`8321` from `crusaderky`_ and :pr-distributed:`8322` from `Hendrik Makait`_ for details.

Deprecating PyArrow <14.0.1
"""""""""""""""""""""""""""
``pyarrow<14.0.1`` usage is deprecated starting in this release. All users are encouraged to upgrade
``pyarrow`` or install ``pyarrow-hotfix``. See `this CVE <https://www.cve.org/CVERecord?id=CVE-2023-47248>`_
for full details.

See :pr:`10622` from `Florian Jetter`_ for details.
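As a rough way to check whether an environment is affected, the sketch below compares the installed ``pyarrow`` version against 14.0.1 and looks for ``pyarrow-hotfix``. It is an illustration only: the helper names (``parse_release``, ``needs_upgrade``, ``installed_version``) are hypothetical and not part of Dask, and the version parsing is deliberately simplified (pre-release suffixes such as ``rc1`` are ignored).

```python
from importlib import metadata

# First pyarrow release that is not deprecated by this changelog entry.
MIN_SAFE = (14, 0, 1)


def parse_release(version: str) -> tuple:
    """Return the leading numeric components of a version string."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def needs_upgrade(version: str) -> bool:
    """True if the given pyarrow version is older than 14.0.1."""
    return parse_release(version) < MIN_SAFE


def installed_version(name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


pyarrow_version = installed_version("pyarrow")
hotfix_installed = installed_version("pyarrow-hotfix") is not None

if pyarrow_version is None:
    print("pyarrow is not installed")
elif needs_upgrade(pyarrow_version) and not hotfix_installed:
    print(f"pyarrow {pyarrow_version} is deprecated: upgrade or install pyarrow-hotfix")
else:
    print("pyarrow setup looks fine")
```

For anything beyond a quick check, a full version parser such as ``packaging.version`` is a more robust choice than this simplified comparison.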

Improved PyArrow filesystem for Parquet
"""""""""""""""""""""""""""""""""""""""
Using ``filesystem="arrow"`` when reading Parquet datasets now properly infers the correct cloud region
when accessing remote, cloud-hosted data.

See :pr:`10590` from `Richard (Rick) Zamora`_ for details.

Improve Type Reconciliation in P2P Shuffling
""""""""""""""""""""""""""""""""""""""""""""
See :pr-distributed:`8332` from `Hendrik Makait`_ for details.

.. dropdown:: Additional changes

- Fix sporadic failure of ``test_dataframe::test_quantile`` (:pr:`10625`) `Miles`_
- Bump minimum ``click`` to ``>=8.1`` (:pr:`10623`) `Jacob Tomlinson`_
- Refactor ``test_quantile`` (:pr:`10620`) `Miles`_
- Avoid ``PerformanceWarning`` for fragmented DataFrame (:pr:`10621`) `Patrick Hoefler`_
- Generalize computation of ``NEW_*_VER`` in GPU CI updating workflow (:pr:`10610`) `Charles Blackmon-Luca`_
- Switch to newer GPU CI images (:pr:`10608`) `Charles Blackmon-Luca`_
- Remove double slash in ``fsspec`` tests (:pr:`10605`) `Mario Šaško`_
- Reenable ``test_ucx_config_w_env_var`` (:pr-distributed:`8272`) `Peter Andreas Entschev`_
- Don't share ``host_array`` when receiving from network (:pr-distributed:`8308`) `crusaderky`_
- Generalize computation of ``NEW_*_VER`` in GPU CI updating workflow (:pr-distributed:`8319`) `Charles Blackmon-Luca`_
- Switch to newer GPU CI images (:pr-distributed:`8316`) `Charles Blackmon-Luca`_
- Minor updates to shuffle dashboard (:pr-distributed:`8315`) `Matthew Rocklin`_
- Don't use ``bytearray().join`` (:pr-distributed:`8312`) `crusaderky`_
- Reuse identical shuffles in P2P hash join (:pr-distributed:`8306`) `Hendrik Makait`_

.. _v2023.10.1:

2023.10.1

Released on October 27, 2023

Highlights
^^^^^^^^^^

Python 3.12
"""""""""""
This release adds official support for Python 3.12.

See :pr:`10544` and :pr-distributed:`8223` from `Thomas Grainger`_ for details.

.. dropdown:: Additional changes

- Avoid splitting parquet files to row groups as aggressively (:pr:`10600`) `Matthew Rocklin`_
- Speed up ``normalize_chunks`` for common case (:pr:`10579`) `Martin Durant`_
- Use Python 3.11 for upstream and doctests CI build (:pr:`10596`) `Thomas Grainger`_
- Bump ``actions/checkout`` from 4.1.0 to 4.1.1 (:pr:`10592`)
- Switch to PyTables ``HEAD`` (:pr:`10580`) `Thomas Grainger`_
- Remove ``numpy.core`` warning filter, link to issue on ``pyarrow`` caused ``BlockManager`` warning (:pr:`10571`) `Thomas Grainger`_
- Unignore and fix deprecated freq aliases (:pr:`10577`) `Thomas Grainger`_
- Move ``register_assert_rewrite`` earlier in ``conftest`` to fix warnings (:pr:`10578`) `Thomas Grainger`_
- Upgrade ``versioneer`` to 0.29 (:pr:`10575`) `Thomas Grainger`_
- change ``test_concat_categorical`` to be non-strict (:pr:`10574`) `Thomas Grainger`_
- Enable SciPy tests with NumPy 2.0 `Thomas Grainger`_
- Enable tests for scikit-image with NumPy 2.0 (:pr:`10569`) `Thomas Grainger`_
- Fix upstream build (:pr:`10549`) `Thomas Grainger`_
- Add optimized code paths for ``drop_duplicates`` (:pr:`10542`) `Richard (Rick) Zamora`_
- Support ``cudf`` backend in ``dd.DataFrame.sort_values`` (:pr:`10551`) `Richard (Rick) Zamora`_
- Rename "GIL Contention" to just GIL in chart labels (:pr-distributed:`8305`) `Matthew Rocklin`_
- Bump ``actions/checkout`` from 4.1.0 to 4.1.1 (:pr-distributed:`8299`)
- Fix dashboard (:pr-distributed:`8293`) `Hendrik Makait`_
- ``@log_errors`` for async tasks (:pr-distributed:`8294`) `crusaderky`_
- Annotations and better tests for serialize_bytes (:pr-distributed:`8300`) `crusaderky`_
- Temporarily xfail ``test_decide_worker_coschedule_order_neighbors`` to unblock CI (:pr-distributed:`8298`) `James Bourbeau`_
- Skip ``xdist`` and ``matplotlib`` in code samples (:pr-distributed:`8290`) `Matthew Rocklin`_
- Use ``numpy._core`` on ``numpy>=2.dev0`` (:pr-distributed:`8291`) `Thomas Grainger`_
- Fix calculation of ``MemoryShardsBuffer.bytes_read`` (:pr-distributed:`8289`) `crusaderky`_
- Allow P2P to store data in-memory (:pr-distributed:`8279`) `Hendrik Makait`_
- Upgrade ``versioneer`` to 0.29 (:pr-distributed:`8288`) `Thomas Grainger`_
- Allow ``ResourceLimiter`` to be unlimited (:pr-distributed:`8276`) `Hendrik Makait`_
- Run ``pre-commit`` autoupdate (:pr-distributed:`8281`) `Thomas Grainger`_
- Annotate instance variables for P2P layers (:pr-distributed:`8280`) `Hendrik Makait`_
- Remove worker gracefully should not mark tasks as suspicious (:pr-distributed:`8234`) `Thomas Grainger`_
- Add signal handling to ``dask spec`` (:pr-distributed:`8261`) `Thomas Grainger`_
- Add typing for ``sync`` (:pr-distributed:`8275`) `Hendrik Makait`_
- Better annotations for shuffle offload (:pr-distributed:`8277`) `crusaderky`_
- Test minimum versions for p2p shuffle (:pr-distributed:`8270`) `crusaderky`_
- Run coverage on test failures (:pr-distributed:`8269`) `crusaderky`_
- Use ``aiohttp`` with extensions (:pr-distributed:`8274`) `Thomas Grainger`_

.. _v2023.10.0:

2023.10.0

Released on October 13, 2023

Highlights
^^^^^^^^^^

Reduced memory pressure for multi array reductions
""""""""""""""""""""""""""""""""""""""""""""""""""
This release contains major updates to Dask's task graph scheduling logic.
The updates here significantly reduce memory pressure on array reductions.
We anticipate this will have a strong impact on the array computing community.

See :pr:`10535` from `Florian Jetter`_ for details.

Improved P2P shuffling robustness
"""""""""""""""""""""""""""""""""
There are several updates (listed below) that make P2P shuffling much more robust and less likely to fail.

See :pr-distributed:`8262`, :pr-distributed:`8264`, :pr-distributed:`8242`, :pr-distributed:`8244`, and :pr-distributed:`8235` from `Hendrik Makait`_ and :pr-distributed:`8124` from `Charles Blackmon-Luca`_ for details.

Reduced scheduler CPU load for large graphs
"""""""""""""""""""""""""""""""""""""""""""
Users should see reduced CPU load on their scheduler when computing large task graphs.

See :pr-distributed:`8238` and :pr:`10547` from `Florian Jetter`_ and :pr-distributed:`8240` from `crusaderky`_ for details.

.. dropdown:: Additional changes

- Dispatch the ``partd.Encode`` class used for disk-based shuffling (:pr:`10552`) `Richard (Rick) Zamora`_
- Add documentation for hive partitioning (:pr:`10454`) `Richard (Rick) Zamora`_
- Add typing to ``dask.order`` (:pr:`10553`) `Florian Jetter`_
- Allow passing ``index_col=False`` in ``dd.read_csv`` (:pr:`9961`) `Michael Leslie`_
- Tighten ``HighLevelGraph`` annotations (:pr:`10524`) `crusaderky`_
- Support for latest ``ipykernel``/``ipywidgets`` (:pr-distributed:`8253`) `crusaderky`_
- Check minimal ``pyarrow`` version for P2P merge (:pr-distributed:`8266`) `Hendrik Makait`_
- Support for Python 3.12 (:pr-distributed:`8223`) `Thomas Grainger`_
- Use ``memoryview.nbytes`` when warning on large graph send (:pr-distributed:`8268`) `crusaderky`_
- Run tests without ``gilknocker`` (:pr-distributed:`8263`) `crusaderky`_
- Disable ipv6 on MacOS CI (:pr-distributed:`8254`) `crusaderky`_
- Clean up redundant minimum versions (:pr-distributed:`8251`) `crusaderky`_
- Clean up use of ``BARRIER_PREFIX`` in scheduler plugin (:pr-distributed:`8252`) `crusaderky`_
- Improve shuffle run handling in P2P's worker plugin (:pr-distributed:`8245`) `Hendrik Makait`_
- Explicitly set ``charset=utf-8`` (:pr-distributed:`8250`) `crusaderky`_
- Typing tweaks to :pr-distributed:`8239` (:pr-distributed:`8247`) `crusaderky`_
- Simplify scheduler assertion (:pr-distributed:`8246`) `crusaderky`_
- Improve typing (:pr-distributed:`8239`) `Hendrik Makait`_
- Respect cgroups v2 "low" memory limit (:pr-distributed:`8243`) `Samantha Hughes`_
- Fix ``PackageInstall`` by making it a scheduler plugin (:pr-distributed:`8142`) `Hendrik Makait`_
- Xfail ``test_ucx_config_w_env_var`` (:pr-distributed:`8241`) `crusaderky`_
- ``SpecCluster`` resilience to broken workers (:pr-distributed:`8233`) `crusaderky`_
- Suppress ``SpillBuffer`` stack traces for cancelled tasks (:pr-distributed:`8232`) `crusaderky`_
- Update annotations after stringification changes (:pr-distributed:`8195`) `crusaderky`_
- Reduce max recursion depth of profile (:pr-distributed:`8224`) `crusaderky`_
- Offload deeply nested objects (:pr-distributed:`8214`) `crusaderky`_
- Fix flaky ``test_close_connections`` (:pr-distributed:`8231`) `crusaderky`_
- Fix flaky ``test_popen_timeout`` (:pr-distributed:`8229`) `crusaderky`_
- Fix flaky ``test_adapt_then_manual`` (:pr-distributed:`8228`) `crusaderky`_
- Prevent collisions in ``SpillBuffer`` (:pr-distributed:`8226`) `crusaderky`_
- Allow ``retire_workers`` to run concurrently (:pr-distributed:`8056`) `Florian Jetter`_
- Fix HTML repr for ``TaskState`` objects (:pr-distributed:`8188`) `Florian Jetter`_
- Fix ``AttributeError`` for ``builtin_function_or_method`` in ``profile.py`` (:pr-distributed:`8181`) `Florian Jetter`_
- Fix flaky ``test_spans`` (v2) (:pr-distributed:`8222`) `crusaderky`_

.. _v2023.9.3:

2023.9.3

Released on September 29, 2023

Highlights
^^^^^^^^^^

Restore previous configuration override behavior
""""""""""""""""""""""""""""""""""""""""""""""""
The 2023.9.2 release introduced an unintentional breaking change in
how configuration options are overridden in ``dask.config.get`` with
the ``override_with=`` keyword (see :issue:`10519`).
This release restores the previous behavior.

See :pr:`10521` from `crusaderky`_ for details.
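As a minimal sketch of what the ``override_with=`` keyword does, the example below assumes the behavior of ``dask.config.get`` in current releases: when ``override_with`` is ``None`` the configured value is returned, otherwise the override is returned as-is. The key ``array.chunk-size`` and both values are arbitrary choices for illustration and do not reproduce the exact case reported in the issue.

```python
import dask

# Temporarily set a configuration value so the example is self-contained.
with dask.config.set({"array.chunk-size": "64MiB"}):
    # override_with=None falls through to the configured value.
    from_config = dask.config.get("array.chunk-size", override_with=None)
    # A non-None override takes precedence over the configuration.
    overridden = dask.config.get("array.chunk-size", override_with="256MiB")

print(from_config, overridden)
```

This pattern is convenient for functions that accept an optional argument and fall back to the global configuration when the caller passes ``None``.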

Complex dtypes in Dask Array reductions
"""""""""""""""""""""""""""""""""""""""
This release includes improved support for using common reductions
in Dask Array (e.g. ``var``, ``std``, ``moment``) with complex dtypes.

See :pr:`10009` from `wkrasnicki`_ for details.

.. dropdown:: Additional changes

- Bump ``actions/checkout`` from 4.0.0 to 4.1.0 (:pr:`10532`)
- Match ``pandas`` reverting ``apply`` deprecation (:pr:`10531`) `James Bourbeau`_
- Update gpuCI ``RAPIDS_VER`` to ``23.12`` (:pr:`10526`)
- Temporarily skip failing tests with ``fsspec==2023.9.1`` (:pr:`10520`) `James Bourbeau`_

.. _v2023.9.2:

2023.9.2

Released on September 15, 2023

Highlights
^^^^^^^^^^

P2P shuffling now raises when outdated PyArrow is installed
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Previously the default shuffling method would silently fall back from P2P
to task-based shuffling if an older version of ``pyarrow`` was installed.
Now we raise an informative error with the minimum required ``pyarrow``
version for P2P instead of silently falling back.

See :pr:`10496` from `Hendrik Makait`_ for details.

Deprecation cycle for admin.traceback.shorten
"""""""""""""""""""""""""""""""""""""""""""""
The 2023.9.0 release modified the ``admin.traceback.shorten`` configuration option
without introducing a deprecation cycle. This resulted in failures to create Dask
clusters in some cases. This release introduces a deprecation cycle for this configuration
change.

See :pr:`10509` from `crusaderky`_ for details.

.. dropdown:: Additional changes

- Avoid materializing all iterators in ``delayed`` tasks (:pr:`10498`) `James Bourbeau`_
- Overhaul deprecations system in ``dask.config`` (:pr:`10499`) `crusaderky`_
- Remove unnecessary check in ``timeseries`` (:pr:`10447`) `Patrick Hoefler`_
- Use ``register_plugin`` in tests (:pr:`10503`) `James Bourbeau`_
- Make ``preserve_index`` explicit in ``pyarrow_schema_dispatch`` (:pr:`10501`) `Hendrik Makait`_
- Add ``**kwargs`` support for ``pyarrow_schema_dispatch`` (:pr:`10500`) `Hendrik Makait`_
- Centralize and type ``no_default`` (:pr:`10495`) `crusaderky`_

.. _v2023.9.1:

2023.9.1

Released on September 6, 2023

.. note::

  This is a hotfix release that fixes a P2P shuffling bug introduced in the 2023.9.0
  release (see :pr:`10493`).

Enhancements
^^^^^^^^^^^^

  • Stricter data type for dask keys (:pr:10485) crusaderky_
  • Special handling for None in DASK_ environment variables (:pr:10487) crusaderky_

Bug Fixes
^^^^^^^^^

  • Fix _partitions dtype in meta for DataFrame.set_index and DataFrame.sort_values (:pr:10493) Hendrik Makait_
  • Handle cached_property decorators in derived_from (:pr:10490) Lawrence Mitchell_

Maintenance
^^^^^^^^^^^

  • Bump actions/checkout from 3.6.0 to 4.0.0 (:pr:10492)
  • Simplify some tests that import distributed (:pr:10484) crusaderky_

.. _v2023.9.0:

2023.9.0

Released on September 1, 2023

Bug Fixes
^^^^^^^^^

  • Remove support for np.int64 in keys (:pr:10483) crusaderky_
  • Fix _partitions dtype in meta for shuffling (:pr:10462) Hendrik Makait_
  • Don't use exception hooks to shorten tracebacks (:pr:10456) crusaderky_

Documentation
^^^^^^^^^^^^^

  • Add p2p shuffle option to DataFrame docs (:pr:10477) Patrick Hoefler_

Maintenance
^^^^^^^^^^^

  • Skip failing tests for pandas=2.1.0 (:pr:10488) Patrick Hoefler_
  • Update tests for pandas=2.1.0 (:pr:10439) Patrick Hoefler_
  • Enable pytest-timeout (:pr:10482) crusaderky_
  • Bump actions/checkout from 3.5.3 to 3.6.0 (:pr:10470)

.. _v2023.8.1:

2023.8.1

Released on August 18, 2023

Enhancements
^^^^^^^^^^^^

  • Adding support for cgroup v2 to cpu_count (:pr:10419) Johan Olsson_
  • Support multi-column groupby with sort=True and split_out>1 (:pr:10425) Richard (Rick) Zamora_
  • Add DataFrame.enforce_runtime_divisions method (:pr:10404) Richard (Rick) Zamora_
  • Enable file mode="x" with a single_file=True for Dask DataFrame to_csv (:pr:10443) Genevieve Buckley_

Bug Fixes
^^^^^^^^^

  • Fix ValueError when running to_csv in append mode with single_file as True (:pr:10441) Ben_

Maintenance
^^^^^^^^^^^

  • Add default types_mapper to from_pyarrow_table_dispatch for pandas (:pr:10446) Richard (Rick) Zamora_

.. _v2023.8.0:

2023.8.0

Released on August 4, 2023

Enhancements
^^^^^^^^^^^^

  • Fix for make_timeseries performance regression (:pr:10428) Irina Truong_

Documentation
^^^^^^^^^^^^^

  • Add distributed.print to debugging docs (:pr:10435) James Bourbeau_
  • Documenting compatibility of NumPy functions with Dask functions (:pr:9941) Chiara Marmo_

Maintenance
^^^^^^^^^^^

  • Use SPDX in license metadata (:pr:10437) John A Kirkham_
  • Require dask[array] in dask[dataframe] (:pr:10357) John A Kirkham_
  • Update gpuCI RAPIDS_VER to 23.10 (:pr:10427)
  • Simplify compatibility code (:pr:10426) Hendrik Makait_
  • Fix compatibility variable naming (:pr:10424) Hendrik Makait_
  • Fix a few errors with upstream pandas and pyarrow (:pr:10412) Irina Truong_

.. _v2023.7.1:

2023.7.1

Released on July 20, 2023

.. note::

  This release updates Dask DataFrame to automatically convert
  text data using ``object`` data types to ``string[pyarrow]``
  if ``pandas>=2`` and ``pyarrow>=12`` are installed.

  This should result in significantly reduced
  memory consumption and increased computation performance
  in many workflows that deal with text data.

  You can disable this change by setting the ``dataframe.convert-string``
  configuration value to ``False`` with

  .. code-block:: python

      dask.config.set({"dataframe.convert-string": False})

Enhancements
^^^^^^^^^^^^

  • Convert to pyarrow strings if proper dependencies are installed (:pr:10400) James Bourbeau_
  • Avoid repartition before shuffle for p2p (:pr:10421) Patrick Hoefler_
  • API to generate random Dask DataFrames (:pr:10392) Irina Truong_
  • Speed up dask.bag.Bag.random_sample (:pr:10356) crusaderky_
  • Raise helpful ValueError for invalid time units (:pr:10408) Nat Tabris_
  • Make repartition a no-op when divisions match (divisions provided as a list) (:pr:10395) Nicolas Grandemange_

Bug Fixes
^^^^^^^^^

  • Use dataframe.convert-string in read_parquet token (:pr:10411) James Bourbeau_
  • Category dtype is lost when concatenating MultiIndex (:pr:10407) Irina Truong_
  • Fix FutureWarning: The provided callable... (:pr:10405) Irina Truong_
  • Enable non-categorical hive-partition columns in read_parquet (:pr:10353) Richard (Rick) Zamora_
  • concat ignoring DataFrame without columns (:pr:10359) Patrick Hoefler_

.. _v2023.7.0:

2023.7.0

Released on July 7, 2023

Enhancements
^^^^^^^^^^^^

  • Catch exceptions when attempting to load CLI entry points (:pr:10380) Jacob Tomlinson_

Bug Fixes
^^^^^^^^^

  • Fix typo in _clean_ipython_traceback (:pr:10385) Alexander Clausen_
  • Ensure that df is immutable after from_pandas (:pr:10383) Patrick Hoefler_
  • Warn consistently for inplace in Series.rename (:pr:10313) Patrick Hoefler_

Documentation
^^^^^^^^^^^^^

  • Add clarification about output shape and reshaping in rechunk documentation (:pr:10377) Swayam Patil_

Maintenance
^^^^^^^^^^^

  • Simplify astype implementation (:pr:10393) Patrick Hoefler_
  • Fix test_first_and_last to accommodate deprecated last (:pr:10373) James Bourbeau_
  • Add level to create_merge_tree (:pr:10391) Patrick Hoefler_
  • Do not derive from scipy.stats.chisquare docstring (:pr:10382) Doug Davis_

.. _v2023.6.1:

2023.6.1

Released on June 26, 2023

Enhancements
^^^^^^^^^^^^

  • Remove no longer supported clip_lower and clip_upper (:pr:10371) Patrick Hoefler_
  • Support DataFrame.set_index(..., sort=False) (:pr:10342) Miles_
  • Cleanup remote tracebacks (:pr:10354) Irina Truong_
  • Add dispatching mechanisms for pyarrow.Table conversion (:pr:10312) Richard (Rick) Zamora_
  • Choose P2P even if fusion is enabled (:pr:10344) Hendrik Makait_
  • Validate that rechunking is possible earlier in graph generation (:pr:10336) Hendrik Makait_

Bug Fixes
^^^^^^^^^

  • Fix issue with header passed to read_csv (:pr:10355) GALI PREM SAGAR_
  • Respect dropna and observed in GroupBy.var and GroupBy.std (:pr:10350) Patrick Hoefler_
  • Fix H5FD_lock error when writing to hdf with distributed client (:pr:10309) Irina Truong_
  • Fix for total_mem_usage of bag.map() (:pr:10341) Irina Truong_

Deprecations
^^^^^^^^^^^^

  • Deprecate DataFrame.fillna/Series.fillna with method (:pr:10349) Irina Truong_
  • Deprecate DataFrame.first and Series.first (:pr:10352) Irina Truong_

Maintenance
^^^^^^^^^^^

  • Deprecate numpy.compat (:pr:10370) Irina Truong_
  • Fix annotations and spans leaking between threads (:pr:10367) Irina Truong_
  • Use general kwargs in pyarrow_table_dispatch functions (:pr:10364) Richard (Rick) Zamora_
  • Remove unnecessary try/except in isna (:pr:10363) Patrick Hoefler_
  • mypy support for numpy 1.25 (:pr:10362) crusaderky_
  • Bump actions/checkout from 3.5.2 to 3.5.3 (:pr:10348)
  • Restore numba in upstream build (:pr:10330) James Bourbeau_
  • Update nightly wheel index for pandas/numpy/scipy (:pr:10346) Matthew Roeschke_
  • Add rechunk config values to yaml (:pr:10343) Hendrik Makait_

.. _v2023.6.0:

2023.6.0

Released on June 9, 2023

Enhancements
^^^^^^^^^^^^

  • Add missing not in predicate support to read_parquet (:pr:10320) Richard (Rick) Zamora_

Bug Fixes
^^^^^^^^^

  • Fix for incorrect value_counts (:pr:10323) Irina Truong_
  • Update empty describe top and freq values (:pr:10319) James Bourbeau_

Documentation
^^^^^^^^^^^^^

  • Fix hetzner typo (:pr:10332) Sarah Charlotte Johnson_

Maintenance
^^^^^^^^^^^

  • Test with numba and sparse on Python 3.11 (:pr:10329) Thomas Grainger_
  • Remove numpy.find_common_type warning ignore (:pr:10311) James Bourbeau_
  • Update gpuCI RAPIDS_VER to 23.08 (:pr:10310)

.. _v2023.5.1:

2023.5.1

Released on May 26, 2023

.. note::

  This release drops support for Python 3.8. As of this release
  Dask supports Python 3.9, 3.10, and 3.11.
  See `this community issue <https://github.com/dask/community/issues/315>`_
  for more details.

Enhancements
^^^^^^^^^^^^

  • Drop Python 3.8 support (:pr:10295) Thomas Grainger_
  • Change Dask Bag partitioning scheme to improve cluster saturation (:pr:10294) Jacob Tomlinson_
  • Generalize dd.to_datetime for GPU-backed collections, introduce get_meta_library utility (:pr:9881) Charles Blackmon-Luca_
  • Add na_action to DataFrame.map (:pr:10305) Patrick Hoefler_
  • Raise TypeError in DataFrame.nsmallest and DataFrame.nlargest when columns is not given (:pr:10301) Patrick Hoefler_
  • Improve sizeof for pd.MultiIndex (:pr:10230) Patrick Hoefler_
  • Support duplicated columns in a bunch of DataFrame methods (:pr:10261) Patrick Hoefler_
  • Add numeric_only support to DataFrame.idxmin and DataFrame.idxmax (:pr:10253) Patrick Hoefler_
  • Implement numeric_only support for DataFrame.quantile (:pr:10259) Patrick Hoefler_
  • Add support for numeric_only=False in DataFrame.std (:pr:10251) Patrick Hoefler_
  • Implement numeric_only=False for GroupBy.cumprod and GroupBy.cumsum (:pr:10262) Patrick Hoefler_
  • Implement numeric_only for skew and kurtosis (:pr:10258) Patrick Hoefler_
  • mask and where should accept a callable (:pr:10289) Irina Truong_
  • Fix conversion from Categorical to pa.dictionary in read_parquet (:pr:10285) Patrick Hoefler_

Bug Fixes
^^^^^^^^^

  • Spurious config on nested annotations (:pr:10318) crusaderky_
  • Fix rechunking behavior for dimensions with known and unknown chunk sizes (:pr:10157) Hendrik Makait_
  • Enable drop to support mismatched partitions (:pr:10300) James Bourbeau_
  • Fix divisions construction for to_timestamp (:pr:10304) Patrick Hoefler_
  • pandas ExtensionDtype raising in Series reduction operations (:pr:10149) Patrick Hoefler_
  • Fix regression in da.random interface (:pr:10247) Eray Aslan_
  • da.coarsen doesn't trim an empty chunk in meta (:pr:10281) Irina Truong_
  • Fix dtype inference for engine="pyarrow" in read_csv (:pr:10280) Patrick Hoefler_

Documentation
^^^^^^^^^^^^^

  • Add meta_from_array to API docs (:pr:10306) Ruth Comer_
  • Update Coiled links (:pr:10296) Sarah Charlotte Johnson_
  • Add docs for demo day (:pr:10288) Matthew Rocklin_

Maintenance
^^^^^^^^^^^

  • Explicitly install anaconda-client from conda-forge when uploading conda nightlies (:pr:10316) Charles Blackmon-Luca_
  • Configure isort to add from __future__ import annotations (:pr:10314) Thomas Grainger_
  • Avoid pandas Series.__getitem__ deprecation in tests (:pr:10308) James Bourbeau_
  • Ignore numpy.find_common_type warning from pandas (:pr:10307) James Bourbeau_
  • Add test to check that DataFrame.__setitem__ does not modify df inplace (:pr:10223) Patrick Hoefler_
  • Clean up default value of dropna in value_counts (:pr:10299) Patrick Hoefler_
  • Add pytest-cov to test extra (:pr:10271) James Bourbeau_

.. _v2023.5.0:

2023.5.0

Released on May 12, 2023

Enhancements
^^^^^^^^^^^^

  • Implement numeric_only=False for GroupBy.corr and GroupBy.cov (:pr:10264) Patrick Hoefler_
  • Add support for numeric_only=False in DataFrame.var (:pr:10250) Patrick Hoefler_
  • Add numeric_only support to DataFrame.mode (:pr:10257) Patrick Hoefler_
  • Add DataFrame.map to dask.DataFrame API (:pr:10246) Patrick Hoefler_
  • Adjust for DataFrame.applymap deprecation and all NA concat behaviour change (:pr:10245) Patrick Hoefler_
  • Enable numeric_only=False for DataFrame.count (:pr:10234) Patrick Hoefler_
  • Disallow array input in mask/where (:pr:10163) Irina Truong_
  • Support numeric_only=True in GroupBy.corr and GroupBy.cov (:pr:10227) Patrick Hoefler_
  • Add numeric_only support to GroupBy.median (:pr:10236) Patrick Hoefler_
  • Support mimesis=9 in dask.datasets (:pr:10241) James Bourbeau_
  • Add numeric_only support to min, max and prod (:pr:10219) Patrick Hoefler_
  • Add numeric_only=True support for GroupBy.cumsum and GroupBy.cumprod (:pr:10224) Patrick Hoefler_
  • Add helper to unpack numeric_only keyword (:pr:10228) Patrick Hoefler_

Bug Fixes
^^^^^^^^^

  • Fix clone + from_array failure (:pr:10211) crusaderky_
  • Fix dataframe reductions for ea dtypes (:pr:10150) Patrick Hoefler_
  • Avoid scalar conversion deprecation warning in numpy=1.25 (:pr:10248) James Bourbeau_
  • Make sure transform output has the same index as input (:pr:10184) Irina Truong_
  • Fix corr and cov on a single-row partition (:pr:9756) Irina Truong_
  • Fix test_groupby_numeric_only_supported and test_groupby_aggregate_categorical_observed upstream errors (:pr:10243) Irina Truong_

Documentation
^^^^^^^^^^^^^

  • Clean up futures docs (:pr:10266) Matthew Rocklin_
  • Add Index API reference (:pr:10263) hotpotato_

Maintenance
^^^^^^^^^^^

  • Warn when meta is passed to apply (:pr:10256) Patrick Hoefler_
  • Remove imageio version restriction in CI (:pr:10260) Patrick Hoefler_
  • Remove unused DataFrame variance methods (:pr:10252) Patrick Hoefler_
  • Un-xfail test_categories with pyarrow strings and pyarrow>=12 (:pr:10244) Irina Truong_
  • Bump gpuCI PYTHON_VER 3.8->3.9 (:pr:10233) Charles Blackmon-Luca_

.. _v2023.4.1:

2023.4.1

Released on April 28, 2023

Enhancements
^^^^^^^^^^^^

  • Implement numeric_only support for DataFrame.sum (:pr:10194) Patrick Hoefler_
  • Add support for numeric_only=True in GroupBy operations (:pr:10222) Patrick Hoefler_
  • Avoid deep copy in DataFrame.__setitem__ for pandas 1.4 and up (:pr:10221) Patrick Hoefler_
  • Avoid calling Series.apply with _meta_nonempty (:pr:10212) Patrick Hoefler_
  • Unpin sqlalchemy and fix compatibility issues (:pr:10140) Patrick Hoefler_

Bug Fixes
^^^^^^^^^

  • Partially revert default client discovery (:pr:10225) Florian Jetter_
  • Support arrow dtypes in Index meta creation (:pr:10170) Patrick Hoefler_
  • Repartitioning raises with extension dtype when truncating floats (:pr:10169) Patrick Hoefler_
  • Adjust empty Index from fastparquet to object dtype (:pr:10179) Patrick Hoefler_

Documentation
^^^^^^^^^^^^^

  • Update Kubernetes docs (:pr:10232) Jacob Tomlinson_
  • Add DataFrame.reduction to API docs (:pr:10229) James Bourbeau_
  • Add DataFrame.persist to docs and fix links (:pr:10231) Patrick Hoefler_
  • Add documentation for GroupBy.transform (:pr:10185) Irina Truong_
  • Fix formatting in random number generation docs (:pr:10189) Eray Aslan_

Maintenance
^^^^^^^^^^^

  • Pin imageio to <2.28 (:pr:10216) Patrick Hoefler_
  • Add note about importlib_metadata backport (:pr:10207) James Bourbeau_
  • Add xarray back to Python 3.11 CI builds (:pr:10200) James Bourbeau_
  • Add mindeps build with all optional dependencies (:pr:10161) Charles Blackmon-Luca_
  • Provide proper like value for array_safe in percentiles_summary (:pr:10156) Charles Blackmon-Luca_
  • Avoid re-opening hdf file multiple times in read_hdf (:pr:10205) Thomas Grainger_
  • Add merge tests on nullable columns (:pr:10071) Charles Blackmon-Luca_
  • Fix coverage configuration (:pr:10203) Thomas Grainger_
  • Remove is_period_dtype and is_sparse_dtype (:pr:10197) Patrick Hoefler_
  • Bump actions/checkout from 3.5.0 to 3.5.2 (:pr:10201)
  • Avoid deprecated is_categorical_dtype from pandas (:pr:10180) Patrick Hoefler_
  • Adjust for deprecated is_interval_dtype and is_datetime64tz_dtype (:pr:10188) Patrick Hoefler_

.. _v2023.4.0:

2023.4.0

Released on April 14, 2023

Enhancements
^^^^^^^^^^^^

  • Override old default values in update_defaults (:pr:10159) Gabe Joseph_
  • Add a CLI command to list and get a value from dask config (:pr:9936) Irina Truong_
  • Handle string-based engine argument to read_json (:pr:9947) Richard (Rick) Zamora_
  • Avoid deprecated GroupBy.dtypes (:pr:10111) Irina Truong_

Bug Fixes
^^^^^^^^^

  • Revert grouper-related changes (:pr:10182) Irina Truong_
  • GroupBy.cov raising for non-numeric grouping column (:pr:10171) Patrick Hoefler_
  • Updates for Index supporting numpy numeric dtypes (:pr:10154) Irina Truong_
  • Preserve dtype for partitioning columns when read with pyarrow (:pr:10115) Patrick Hoefler_
  • Fix annotations for to_hdf (:pr:10123) Hendrik Makait_
  • Handle None column name when checking if columns are all numeric (:pr:10128) Lawrence Mitchell_
  • Fix valid_divisions when passed a tuple (:pr:10126) Brian Phillips_
  • Maintain annotations in DataFrame.categorize (:pr:10120) Hendrik Makait_
  • Fix handling of missing min/max parquet statistics during filtering (:pr:10042) Richard (Rick) Zamora_

Deprecations
^^^^^^^^^^^^

  • Deprecate use_nullable_dtypes= and add dtype_backend= (:pr:10076) Irina Truong_
  • Deprecate convert_dtype in Series.apply (:pr:10133) Irina Truong_

Documentation
^^^^^^^^^^^^^

  • Document Generator based random number generation (:pr:10134) Eray Aslan_

Maintenance
^^^^^^^^^^^

  • Update dataframe.convert_string to dataframe.convert-string (:pr:10191) Irina Truong_
  • Add python-cityhash to CI environments (:pr:10190) Charles Blackmon-Luca_
  • Temporarily pin scikit-image to fix Windows CI (:pr:10186) Patrick Hoefler_
  • Handle pandas deprecation warnings for to_pydatetime and apply (:pr:10168) Patrick Hoefler_
  • Drop bokeh<3 restriction (:pr:10177) James Bourbeau_
  • Fix failing tests under copy-on-write (:pr:10173) Patrick Hoefler_
  • Allow pyarrow CI to fail (:pr:10176) James Bourbeau_
  • Switch to Generator for random number generation in dask.array (:pr:10003) Eray Aslan_
  • Bump peter-evans/create-pull-request from 4 to 5 (:pr:10166)
  • Fix flaky modf operation in test_arithmetic (:pr:10162) Irina Truong_
  • Temporarily remove xarray from CI with pandas 2.0 (:pr:10153) James Bourbeau_
  • Fix update_graph counting logic in test_default_scheduler_on_worker (:pr:10145) James Bourbeau_
  • Fix documentation build with pandas 2.0 (:pr:10138) James Bourbeau_
  • Remove dask/gpu from gpuCI update reviewers (:pr:10135) Charles Blackmon-Luca_
  • Update gpuCI RAPIDS_VER to 23.06 (:pr:10129)
  • Bump actions/stale from 6 to 8 (:pr:10121)
  • Use declarative setuptools (:pr:10102) Thomas Grainger_
  • Relax assert_eq checks on Scalar-like objects (:pr:10125) Matthew Rocklin_
  • Upgrade readthedocs config to ubuntu 22.04 and Python 3.11 (:pr:10124) Thomas Grainger_
  • Bump actions/checkout from 3.4.0 to 3.5.0 (:pr:10122)
  • Fix test_null_partition_pyarrow in pyarrow CI build (:pr:10116) Irina Truong_
  • Drop distributed pack (:pr:9988) Florian Jetter_
  • Make dask.compatibility private (:pr:10114) Jacob Tomlinson_

.. _v2023.3.2:

2023.3.2

Released on March 24, 2023

Enhancements ^^^^^^^^^^^^

  • Deprecate observed=False for groupby with categoricals (:pr:10095) Irina Truong_
  • Deprecate axis= for some groupby operations (:pr:10094) James Bourbeau_
  • The axis keyword in DataFrame.rolling/Series.rolling is deprecated (:pr:10110) Irina Truong_
  • DataFrame._data deprecation in pandas (:pr:10081) Irina Truong_
  • Use importlib_metadata backport to avoid CLI UserWarning (:pr:10070) Thomas Grainger_
  • Port option parsing logic from dask.dataframe.read_parquet to to_parquet (:pr:9981) Anton Loukianov_

Bug Fixes ^^^^^^^^^

  • Avoid using dd.shuffle in groupby-apply (:pr:10043) Richard (Rick) Zamora_
  • Enable null hive partitions with pyarrow parquet engine (:pr:10007) Richard (Rick) Zamora_
  • Support unknown shapes in *_like functions (:pr:10064) Doug Davis_

Documentation ^^^^^^^^^^^^^

  • Add to_backend methods to API docs (:pr:10093) Lawrence Mitchell_
  • Remove broken gpuCI link in developer docs (:pr:10065) Charles Blackmon-Luca_

Maintenance ^^^^^^^^^^^

  • Configure readthedocs sphinx warnings as errors (:pr:10104) Thomas Grainger_
  • Un-xfail test_division_or_partition with pyarrow strings active (:pr:10108) Irina Truong_
  • Un-xfail test_different_columns_are_allowed with pyarrow strings active (:pr:10109) Irina Truong_
  • Restore Entrypoints compatibility (:pr:10113) Jacob Tomlinson_
  • Un-xfail test_to_dataframe_optimize_graph with pyarrow strings active (:pr:10087) Irina Truong_
  • Only run test_development_guidelines_matches_ci on editable install (:pr:10106) Charles Blackmon-Luca_
  • Un-xfail test_dataframe_cull_key_dependencies_materialized with pyarrow strings active (:pr:10088) Irina Truong_
  • Install mimesis in CI environments (:pr:10105) Charles Blackmon-Luca_
  • Fix for no module named ipykernel (:pr:10101) Irina Truong_
  • Fix docs builds by installing ipykernel (:pr:10103) Thomas Grainger_
  • Allow pyarrow build to continue on failures (:pr:10097) James Bourbeau_
  • Bump actions/checkout from 3.3.0 to 3.4.0 (:pr:10096)
  • Fix test_set_index_on_empty with pyarrow strings active (:pr:10054) Irina Truong_
  • Un-xfail pyarrow pickling tests (:pr:10082) James Bourbeau_
  • CI environment file cleanup (:pr:10078) James Bourbeau_
  • Un-xfail more pyarrow tests (:pr:10066) Irina Truong_
  • Temporarily skip pyarrow_compat tests with pandas 2.0 (:pr:10063) James Bourbeau_
  • Fix test_melt with pyarrow strings active (:pr:10052) Irina Truong_
  • Fix test_str_accessor with pyarrow strings active (:pr:10048) James Bourbeau_
  • Fix test_better_errors_object_reductions with pyarrow strings active (:pr:10051) James Bourbeau_
  • Fix test_loc_with_non_boolean_series with pyarrow strings active (:pr:10046) James Bourbeau_
  • Fix test_values with pyarrow strings active (:pr:10050) James Bourbeau_
  • Temporarily xfail test_upstream_packages_installed (:pr:10047) James Bourbeau_

.. _v2023.3.1:

2023.3.1

Released on March 10, 2023

Enhancements ^^^^^^^^^^^^

  • Support pyarrow strings in MultiIndex (:pr:10040) Irina Truong_
  • Improved support for pyarrow strings (:pr:10000) Irina Truong_
  • Fix flaky RuntimeWarning during array reductions (:pr:10030) James Bourbeau_
  • Extend complete extras (:pr:10023) James Bourbeau_
  • Raise an error with dataframe.convert-string=True and pandas<2.0 (:pr:10033) Irina Truong_
  • Rename shuffle/rechunk config option/kwarg to method (:pr:10013) James Bourbeau_
  • Add initial support for converting pandas extension dtypes to arrays (:pr:10018) James Bourbeau_
  • Remove randomgen support (:pr:9987) Eray Aslan_

Bug Fixes ^^^^^^^^^

  • Skip rechunk when rechunking to the same chunks with unknown sizes (:pr:10027) Hendrik Makait_
  • Custom utility to convert parquet filters to pyarrow expression (:pr:9885) Richard (Rick) Zamora_
  • Consider numpy scalars and 0d arrays as scalars when padding (:pr:9653) Justus Magin_
  • Fix parquet overwrite behavior after an adaptive read_parquet operation (:pr:10002) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

  • Add and update docs for Data Transfer section (:pr:10022) Miles_

Maintenance ^^^^^^^^^^^

  • Remove stale hive-partitioning code from pyarrow parquet engine (:pr:10039) Richard (Rick) Zamora_
  • Increase minimum supported pyarrow to 7.0 (:pr:10024) James Bourbeau_
  • Revert "Prepare drop packunpack (:pr:9994)" (:pr:10037) Florian Jetter_
  • Have codecov wait for more builds before reporting (:pr:10031) James Bourbeau_
  • Prepare drop packunpack (:pr:9994) Florian Jetter_
  • Add CI job with pyarrow strings turned on (:pr:10017) James Bourbeau_
  • Fix test_groupby_dropna_with_agg for pandas 2.0 (:pr:10001) Irina Truong_
  • Fix test_pickle_roundtrip for pandas 2.0 (:pr:10011) James Bourbeau_

.. _v2023.3.0:

2023.3.0

Released on March 1, 2023

Bug Fixes ^^^^^^^^^

  • Bag must not pick p2p as shuffle default (:pr:10005) Florian Jetter_

Documentation ^^^^^^^^^^^^^

  • Minor follow-up to P2P by default (:pr:10008) James Bourbeau_

Maintenance ^^^^^^^^^^^

  • Add minimum version to optional jinja2 dependency (:pr:9999) Charles Blackmon-Luca_

.. _v2023.2.1:

2023.2.1

Released on February 24, 2023

.. note::

   This release changes the default DataFrame shuffle algorithm to ``p2p``
   to improve stability and performance. `Learn more here <https://blog.coiled.io/blog/shuffling-large-data-at-constant-memory.html?utm_source=dask-docs&utm_medium=changelog>`_,
   and please share any feedback `on this discussion <https://github.com/dask/distributed/discussions/7509>`_.

   If you encounter issues with this new algorithm, please see the :ref:`documentation <shuffle-methods>`
   for more information, including how to switch back to the previous method.
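
As a rough illustration of switching back, the sketch below overrides the shuffle method through Dask's configuration system. The key name ``dataframe.shuffle.method`` is an assumption based on the later rename noted under 2023.3.1 (this release may use a different key), and ``"tasks"`` is assumed to be the name of the older task-based method. The linked documentation is the authoritative source.

```python
import dask

# Assumption: "dataframe.shuffle.method" is the post-rename config key and
# "tasks" selects the older task-based shuffle. Check the shuffle
# documentation for the exact key used by your installed version.
with dask.config.set({"dataframe.shuffle.method": "tasks"}):
    # Inside this block, the override is visible to anything reading the config.
    method_inside = dask.config.get("dataframe.shuffle.method")

print(method_inside)
```

Using ``dask.config.set`` as a context manager keeps the override local to one block of code; calling it without ``with`` applies the setting for the rest of the session.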

Enhancements ^^^^^^^^^^^^

  • Enable P2P shuffling by default (:pr:9991) Florian Jetter_
  • P2P rechunking (:pr:9939) Hendrik Makait_
  • Efficient dataframe.convert-string support for read_parquet (:pr:9979) Irina Truong_
  • Allow p2p shuffle kwarg for DataFrame merges (:pr:9900) Florian Jetter_
  • Change split_row_groups default to "infer" (:pr:9637) Richard (Rick) Zamora_
  • Add option for converting string data to use pyarrow strings (:pr:9926) James Bourbeau_
  • Add support for multi-column sort_values (:pr:8263) Charles Blackmon-Luca_
  • Generator based random-number generation in dask.array (:pr:9038) Eray Aslan_
  • Support numeric_only for simple groupby aggregations for pandas 2.0 compatibility (:pr:9889) Irina Truong_

Bug Fixes ^^^^^^^^^

  • Fix profilers plot not being aligned to context manager enter time (:pr:9739) David Hoese_
  • Relax dask.dataframe assert_eq type checks (:pr:9989) Matthew Rocklin_
  • Restore describe compatibility for pandas 2.0 (:pr:9982) James Bourbeau_

Documentation ^^^^^^^^^^^^^

  • Improving deploying Dask docs (:pr:9912) Sarah Charlotte Johnson_
  • More docs for DataFrame.partitions (:pr:9976) Tom Augspurger_
  • Update docs with more information on default Delayed scheduler (:pr:9903) Guillaume Eynard-Bontemps_
  • Deployment Considerations documentation (:pr:9933) Gabe Joseph_

Maintenance ^^^^^^^^^^^

  • Temporarily rerun flaky tests (:pr:9983) James Bourbeau_
  • Update parsing of FULL_RAPIDS_VER/FULL_UCX_PY_VER (:pr:9990) Charles Blackmon-Luca_
  • Increase minimum supported versions to pandas=1.3 and numpy=1.21 (:pr:9950) James Bourbeau_
  • Fix std to work with numeric_only for pandas 2.0 (:pr:9960) Irina Truong_
  • Temporarily xfail test_roundtrip_partitioned_pyarrow_dataset (:pr:9977) James Bourbeau_
  • Fix copy on write failure in test_idxmaxmin (:pr:9944) Patrick Hoefler_
  • Bump pre-commit versions (:pr:9955) crusaderky_
  • Fix test_groupby_unaligned_index for pandas 2.0 (:pr:9963) Irina Truong_
  • Un-xfail test_set_index_overlap_2 for pandas 2.0 (:pr:9959) James Bourbeau_
  • Fix test_merge_by_index_patterns for pandas 2.0 (:pr:9930) Irina Truong_
  • Bump jacobtomlinson/gha-find-replace from 2 to 3 (:pr:9953) James Bourbeau_
  • Fix test_rolling_agg_aggregate for pandas 2.0 compatibility (:pr:9948) Irina Truong_
  • Bump black to 23.1.0 (:pr:9956) crusaderky_
  • Run GPU tests on python 3.8 & 3.10 (:pr:9940) Charles Blackmon-Luca_
  • Fix test_to_timestamp for pandas 2.0 (:pr:9932) Irina Truong_
  • Fix an error with groupby value_counts for pandas 2.0 compatibility (:pr:9928) Irina Truong_
  • Config converter: replace all dashes with underscores (:pr:9945) Jacob Tomlinson_
  • CI: use nightly wheel to install pyarrow in upstream test build (:pr:9873) Joris Van den Bossche_

.. _v2023.2.0:

2023.2.0

Released on February 10, 2023

Enhancements ^^^^^^^^^^^^

  • Update numeric_only default in quantile for pandas 2.0 (:pr:9854) Irina Truong_
  • Make repartition a no-op when divisions match (:pr:9924) James Bourbeau_
  • Update datetime_is_numeric behavior in describe for pandas 2.0 (:pr:9868) Irina Truong_
  • Update value_counts to return correct name in pandas 2.0 (:pr:9919) Irina Truong_
  • Support new axis=None behavior in pandas 2.0 for certain reductions (:pr:9867) James Bourbeau_
  • Filter out all-nan RuntimeWarning at the chunk level for nanmin and nanmax (:pr:9916) Julia Signell_
  • Fix numeric meta_nonempty index creation for pandas 2.0 (:pr:9908) James Bourbeau_
  • Fix DataFrame.info() tests for pandas 2.0 (:pr:9909) James Bourbeau_

Bug Fixes ^^^^^^^^^

  • Fix GroupBy.value_counts handling for multiple groupby columns (:pr:9905) Charles Blackmon-Luca_

Documentation ^^^^^^^^^^^^^

  • Fix some outdated information/typos in development guide (:pr:9893) Patrick Hoefler_
  • Add note about keep=False in drop_duplicates docstring (:pr:9887) Jayesh Manani_
  • Add meta details to dask Array (:pr:9886) Jayesh Manani_
  • Clarify task stream showing more rows than threads (:pr:9906) Gabe Joseph_

Maintenance ^^^^^^^^^^^

  • Fix test_numeric_column_names for pandas 2.0 (:pr:9937) Irina Truong_
  • Fix dask/dataframe/tests/test_utils_dataframe.py tests for pandas 2.0 (:pr:9788) James Bourbeau_
  • Replace index.is_numeric with is_any_real_numeric_dtype for pandas 2.0 compatibility (:pr:9918) Irina Truong_
  • Avoid pd.core import in dask utils (:pr:9907) Matthew Roeschke_
  • Use label for upstream build on pull requests (:pr:9910) James Bourbeau_
  • Broaden exception catching for sqlalchemy.exc.RemovedIn20Warning (:pr:9904) James Bourbeau_
  • Temporarily restrict sqlalchemy < 2 in CI (:pr:9897) James Bourbeau_
  • Update isort version to 5.12.0 (:pr:9895) Lawrence Mitchell_
  • Remove unused skiprows variable in read_csv (:pr:9892) Patrick Hoefler_

.. _v2023.1.1:

2023.1.1

Released on January 27, 2023

Enhancements ^^^^^^^^^^^^

  • Add to_backend method to Array and _Frame (:pr:9758) Richard (Rick) Zamora_
  • Small fix for timestamp index divisions in pandas 2.0 (:pr:9872) Irina Truong_
  • Add numeric_only to DataFrame.cov and DataFrame.corr (:pr:9787) James Bourbeau_
  • Fixes related to group_keys default change in pandas 2.0 (:pr:9855) Irina Truong_
  • infer_datetime_format compatibility for pandas 2.0 (:pr:9783) James Bourbeau_

Bug Fixes ^^^^^^^^^

  • Fix serialization bug in BroadcastJoinLayer (:pr:9871) Richard (Rick) Zamora_
  • Satisfy broadcast argument in DataFrame.merge (:pr:9852) Richard (Rick) Zamora_
  • Fix pyarrow parquet columns statistics computation (:pr:9772) aywandji_

Documentation ^^^^^^^^^^^^^

  • Fix "duplicate explicit target name" docs warning (:pr:9863) Chiara Marmo_
  • Fix code formatting issue in "Defining a new collection backend" docs (:pr:9864) Chiara Marmo_
  • Update dashboard documentation for memory plot (:pr:9768) Jayesh Manani_
  • Add docs section about no-worker tasks (:pr:9839) Florian Jetter_

Maintenance ^^^^^^^^^^^

  • Additional updates for detecting a distributed scheduler (:pr:9890) James Bourbeau_
  • Update gpuCI RAPIDS_VER to 23.04 (:pr:9876)
  • Reverse precedence between collection and distributed default (:pr:9869) Florian Jetter_
  • Update xarray-contrib/issue-from-pytest-log to version 1.2.6 (:pr:9865) James Bourbeau_
  • Don't require dask config shuffle default (:pr:9826) Florian Jetter_
  • Un-xfail datetime64 Parquet roundtripping tests for new fastparquet (:pr:9811) James Bourbeau_
  • Add option to manually run upstream CI build (:pr:9853) James Bourbeau_
  • Use custom timeout in CI builds (:pr:9844) James Bourbeau_
  • Remove kwargs from make_blockwise_graph (:pr:9838) Florian Jetter_
  • Ignore warnings on persist call in test_setitem_extended_API_2d_mask (:pr:9843) Charles Blackmon-Luca_
  • Fix running S3 tests locally (:pr:9833) James Bourbeau_

.. _v2023.1.0:

2023.1.0

Released on January 13, 2023

Enhancements ^^^^^^^^^^^^

  • Use distributed default clients even if no config is set (:pr:9808) Florian Jetter_
  • Implement ma.where and ma.nonzero (:pr:9760) Erik Holmgren_
  • Update zarr store creation functions (:pr:9790) Ryan Abernathey_
  • iteritems compatibility for pandas 2.0 (:pr:9785) James Bourbeau_
  • Accurate sizeof for pandas string[python] dtype (:pr:9781) crusaderky_
  • Deflate sizeof() of duplicate references to pandas object types (:pr:9776) crusaderky_
  • GroupBy.__getitem__ compatibility for pandas 2.0 (:pr:9779) James Bourbeau_
  • append compatibility for pandas 2.0 (:pr:9750) James Bourbeau_
  • get_dummies compatibility for pandas 2.0 (:pr:9752) James Bourbeau_
  • is_monotonic compatibility for pandas 2.0 (:pr:9751) James Bourbeau_
  • numpy=1.24 compatibility (:pr:9777) James Bourbeau_

Documentation ^^^^^^^^^^^^^

  • Remove duplicated encoding kwarg in docstring for to_json (:pr:9796) Sultan Orazbayev_
  • Mention SubprocessCluster in LocalCluster documentation (:pr:9784) Hendrik Makait_
  • Move Prometheus docs to dask/distributed (:pr:9761) crusaderky_

Maintenance ^^^^^^^^^^^

  • Temporarily ignore RuntimeWarning in test_setitem_extended_API_2d_mask (:pr:9828) James Bourbeau_
  • Fix flaky test_threaded.py::test_interrupt (:pr:9827) Hendrik Makait_
  • Update xarray-contrib/issue-from-pytest-log in upstream report (:pr:9822) James Bourbeau_
  • pip install dask on gpuCI builds (:pr:9816) Charles Blackmon-Luca_
  • Bump actions/checkout from 3.2.0 to 3.3.0 (:pr:9815)
  • Resolve sqlalchemy import failures in mindeps testing (:pr:9809) Charles Blackmon-Luca_
  • Ignore sqlalchemy.exc.RemovedIn20Warning (:pr:9801) Thomas Grainger_
  • xfail datetime64 Parquet roundtripping tests for pandas 2.0 (:pr:9786) James Bourbeau_
  • Remove sqlalchemy 1.3 compatibility (:pr:9695) McToel_
  • Reduce size of expected DoK sparse matrix (:pr:9775) Elliott Sales de Andrade_
  • Remove executable flag from dask/dataframe/io/orc/utils.py (:pr:9774) Elliott Sales de Andrade_

.. _v2022.12.1:

2022.12.1

Released on December 16, 2022

Enhancements ^^^^^^^^^^^^

  • Support dtype_backend="pandas|pyarrow" configuration (:pr:9719) James Bourbeau_
  • Support cupy.ndarray to cudf.DataFrame dispatching in dask.dataframe (:pr:9579) Richard (Rick) Zamora_
  • Make filesystem-backend configurable in read_parquet (:pr:9699) Richard (Rick) Zamora_
  • Serialize all pyarrow extension arrays efficiently (:pr:9740) James Bourbeau_

Bug Fixes ^^^^^^^^^

  • Fix bug when repartitioning with tz-aware datetime index (:pr:9741) James Bourbeau_
  • Partial functions in aggs may have arguments (:pr:9724) Irina Truong_
  • Add support for simple operation with pyarrow-backed extension dtypes (:pr:9717) James Bourbeau_
  • Rename columns correctly in case of SeriesGroupby (:pr:9716) Lawrence Mitchell_

Documentation ^^^^^^^^^^^^^

  • Fix url link typo in collection backend doc (:pr:9748) Shawn_
  • Update Prometheus docs (:pr:9696) Hendrik Makait_

Maintenance ^^^^^^^^^^^

  • Add zarr to Python 3.11 CI environment (:pr:9771) James Bourbeau_
  • Add support for Python 3.11 (:pr:9708) Thomas Grainger_
  • Bump actions/checkout from 3.1.0 to 3.2.0 (:pr:9753)
  • Avoid np.bool8 deprecation warning (:pr:9737) James Bourbeau_
  • Make sure dev packages aren't overwritten in upstream CI build (:pr:9731) James Bourbeau_
  • Avoid adding data.h5 and mydask.html files during tests (:pr:9726) Thomas Grainger_

.. _v2022.12.0:

2022.12.0

Released on December 2, 2022

Enhancements ^^^^^^^^^^^^

  • Remove statistics-based set_index logic from read_parquet (:pr:9661) Richard (Rick) Zamora_
  • Add support for use_nullable_dtypes to dd.read_parquet (:pr:9617) Ian Rose_
  • Fix map_overlap in order to accept pandas arguments (:pr:9571) Fabien Aulaire_
  • Fix pandas 1.5+ FutureWarning in .str.split(..., expand=True) (:pr:9704) Jacob Hayes_
  • Enable column projection for groupby slicing (:pr:9667) Richard (Rick) Zamora_
  • Support duplicate column cum-functions (:pr:9685) Ben_
  • Improve error message for failed backend dispatch call (:pr:9677) Richard (Rick) Zamora_

Bug Fixes ^^^^^^^^^

  • Revise meta creation in arrow parquet engine (:pr:9672) Richard (Rick) Zamora_
  • Fix da.fft.fft for array-like inputs (:pr:9688) James Bourbeau_
  • Fix groupby-aggregation when grouping on an index by name (:pr:9646) Richard (Rick) Zamora_

Maintenance ^^^^^^^^^^^

  • Avoid PytestReturnNotNoneWarning in test_inheriting_class (:pr:9707) Thomas Grainger_
  • Fix flaky test_dataframe_aggregations_multilevel (:pr:9701) Richard (Rick) Zamora_
  • Bump mypy version (:pr:9697) crusaderky_
  • Disable dashboard in test_map_partitions_df_input (:pr:9687) James Bourbeau_
  • Use latest xarray-contrib/issue-from-pytest-log in upstream build (:pr:9682) James Bourbeau_
  • xfail ttest_1samp for upstream scipy (:pr:9670) James Bourbeau_
  • Update gpuCI RAPIDS_VER to 23.02 (:pr:9678)

.. _v2022.11.1:

2022.11.1

Released on November 18, 2022

Enhancements ^^^^^^^^^^^^

  • Restrict bokeh=3 support (:pr:9673) Gabe Joseph_
  • Updates for fastparquet evolution (:pr:9650) Martin Durant_

Maintenance ^^^^^^^^^^^

  • Update ga-yaml-parser step in gpuCI updating workflow (:pr:9675) Charles Blackmon-Luca_
  • Revert importlib.metadata workaround (:pr:9658) James Bourbeau_
  • Fix mindeps-distributed CI build to handle numpy/pandas not being installed (:pr:9668) James Bourbeau_

.. _v2022.11.0:

2022.11.0

Released on November 15, 2022

Enhancements ^^^^^^^^^^^^

  • Generalize from_dict implementation to allow usage from other backends (:pr:9628) GALI PREM SAGAR_

Bug Fixes ^^^^^^^^^

  • Avoid pandas constructors in dask.dataframe.core (:pr:9570) Richard (Rick) Zamora_
  • Fix sort_values with Timestamp data (:pr:9642) James Bourbeau_
  • Generalize array checking and remove pd.Index call in _get_partitions (:pr:9634) Benjamin Zaitlen_
  • Fix read_csv behavior for header=0 and names (:pr:9614) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

  • Update dashboard docs for queuing (:pr:9660) Gabe Joseph_
  • Remove import dask as d from docstrings (:pr:9644) Matthew Rocklin_
  • Fix link to partitions docs in read_parquet docstring (:pr:9636) qheuristics_
  • Add API doc links to array/bag/dataframe sections (:pr:9630) Matthew Rocklin_

Maintenance ^^^^^^^^^^^

  • Use conda-incubator/[email protected] (:pr:9662) John A Kirkham_
  • Allow bokeh=3 (:pr:9659) James Bourbeau_
  • Run upstream build with Python 3.10 (:pr:9655) James Bourbeau_
  • Pin pyyaml version in mindeps testing (:pr:9640) Charles Blackmon-Luca_
  • Add pre-commit to catch breakpoint() (:pr:9638) James Bourbeau_
  • Bump xarray-contrib/issue-from-pytest-log from 1.1 to 1.2 (:pr:9635)
  • Remove blosc references (:pr:9625) Naty Clementi_
  • Upgrade mypy and drop unused comments (:pr:9616) Hendrik Makait_
  • Harden test_repartition_npartitions (:pr:9585) Richard (Rick) Zamora_

.. _v2022.10.2:

2022.10.2

Released on October 31, 2022

This was a hotfix and has no changes in this repository. The necessary fix was in dask/distributed, but we decided to bump this version number for consistency.

.. _v2022.10.1:

2022.10.1

Released on October 28, 2022

Enhancements ^^^^^^^^^^^^

  • Enable named aggregation syntax (:pr:9563) ChrisJar_
  • Add extension dtype support to set_index (:pr:9566) James Bourbeau_
  • Redesigning the array HTML repr for clarity (:pr:9519) Shingo OKAWA_

Bug Fixes ^^^^^^^^^

  • Fix merge with empty left DataFrame (:pr:9578) Ian Rose_

Documentation ^^^^^^^^^^^^^

  • Add note about limiting thread oversubscription by default (:pr:9592) James Bourbeau_
  • Use sphinx-click for dask CLI (:pr:9589) James Bourbeau_
  • Fix Semaphore API docs (:pr:9584) James Bourbeau_
  • Render meta description in map_overlap docstring (:pr:9568) James Bourbeau_

Maintenance ^^^^^^^^^^^

  • Require Click 7.0+ in Dask (:pr:9595) John A Kirkham_
  • Temporarily restrict bokeh<3 (:pr:9607) James Bourbeau_
  • Resolve importlib-related failures in upstream CI (:pr:9604) Charles Blackmon-Luca_
  • Improve upstream CI report (:pr:9603) James Bourbeau_
  • Fix upstream CI report (:pr:9602) James Bourbeau_
  • Remove setuptools host dep, add CLI entrypoint (:pr:9600) Charles Blackmon-Luca_
  • More Backend dispatch class type annotations (:pr:9573) Ian Rose_

.. _v2022.10.0:

2022.10.0

Released on October 14, 2022

New Features ^^^^^^^^^^^^

  • Backend library dispatching for IO in Dask-Array and Dask-DataFrame (:pr:9475) Richard (Rick) Zamora_
  • Add new CLI that is extensible (:pr:9283) Doug Davis_

Enhancements ^^^^^^^^^^^^

  • Groupby median (:pr:9516) Ian Rose_
  • Fix array copy not being a no-op (:pr:9555) David Hoese_
  • Add support for string timedelta in map_overlap (:pr:9559) Nicolas Grandemange_
  • Shuffle-based groupby for single functions (:pr:9504) Ian Rose_
  • Make datetime.datetime tokenize idempotently (:pr:9532) Martin Durant_
  • Support tokenizing datetime.time (:pr:9528) Tim Paine_

Bug Fixes ^^^^^^^^^

  • Avoid race condition in lazy dispatch registration (:pr:9545) James Bourbeau_
  • Do not allow setitem to np.nan for int dtype (:pr:9531) Doug Davis_
  • Stable demo column projection (:pr:9538) Ian Rose_
  • Ensure pickle-able binops in delayed (:pr:9540) Ian Rose_
  • Fix project CSV columns when selecting (:pr:9534) Martin Durant_

Documentation ^^^^^^^^^^^^^

  • Update Parquet best practice (:pr:9537) Matthew Rocklin_

Maintenance ^^^^^^^^^^^

  • Restrict tiledb-py version to avoid CI failures (:pr:9569) James Bourbeau_
  • Bump actions/github-script from 3 to 6 (:pr:9564)
  • Bump actions/stale from 4 to 6 (:pr:9551)
  • Bump peter-evans/create-pull-request from 3 to 4 (:pr:9550)
  • Bump actions/checkout from 2 to 3.1.0 (:pr:9552)
  • Bump codecov/codecov-action from 1 to 3 (:pr:9549)
  • Bump the-coding-turtle/ga-yaml-parser from 0.1.1 to 0.1.2 (:pr:9553)
  • Move dependabot configuration file (:pr:9547) James Bourbeau_
  • Add dependabot for GitHub actions (:pr:9542) James Bourbeau_
  • Run mypy on Windows and Linux (:pr:9530) crusaderky_
  • Update gpuCI RAPIDS_VER to 22.12 (:pr:9524)

.. _v2022.9.2:

2022.9.2

Released on September 30, 2022

Enhancements ^^^^^^^^^^^^

  • Remove factorization logic from array auto chunking (:pr:9507) James Bourbeau_

Documentation ^^^^^^^^^^^^^

  • Add docs on running Dask in a standalone Python script (:pr:9513) James Bourbeau_
  • Clarify custom-graph multiprocessing example (:pr:9511) nouman_

Maintenance ^^^^^^^^^^^

  • Groupby sort upstream compatibility (:pr:9486) Ian Rose_

.. _v2022.9.1:

2022.9.1

Released on September 16, 2022

New Features ^^^^^^^^^^^^

  • Add DataFrame and Series median methods (:pr:9483) James Bourbeau_

Enhancements ^^^^^^^^^^^^

  • Shuffle groupby default (:pr:9453) Ian Rose_
  • Filter by list (:pr:9419) Greg Hayes_
  • Added distributed.utils.key_split functionality to dask.utils.key_split (:pr:9464) Luke Conibear_

Bug Fixes ^^^^^^^^^

  • Fix overlap so that set_index doesn't drop rows (:pr:9423) Julia Signell_
  • Fix assigning pandas Series to column when ddf.columns.min() raises (:pr:9485) Erik Welch_
  • Fix metadata comparison stack_partitions (:pr:9481) James Bourbeau_
  • Provide default for split_out (:pr:9493) Lawrence Mitchell_

Deprecations ^^^^^^^^^^^^

  • Allow split_out to be None, which then defaults to 1 in groupby().aggregate() (:pr:9491) Ian Rose_

Documentation ^^^^^^^^^^^^^

  • Fixing enforce_metadata documentation, not checking for dtypes (:pr:9474) Nicolas Grandemange_
  • Fix it's --> its typo (:pr:9484) Nat Tabris_

Maintenance ^^^^^^^^^^^

  • Workaround for parquet writing failure using some datetime series but not others (:pr:9500) Ian Rose_
  • Filter out numeric_only warnings from pandas (:pr:9496) James Bourbeau_
  • Avoid set_index(..., inplace=True) where not necessary (:pr:9472) James Bourbeau_
  • Avoid passing groupby key list of length one (:pr:9495) James Bourbeau_
  • Update test_groupby_dropna_cudf based on cudf support for group_keys (:pr:9482) James Bourbeau_
  • Remove dd.from_bcolz (:pr:9479) James Bourbeau_
  • Added flake8-bugbear to pre-commit hooks (:pr:9457) Luke Conibear_
  • Bind loop variables in function definitions (B023) (:pr:9461) Luke Conibear_
  • Added assert for comparisons (B015) (:pr:9459) Luke Conibear_
  • Set top-level default shell in CI workflows (:pr:9469) James Bourbeau_
  • Removed unused loop control variables (B007) (:pr:9458) Luke Conibear_
  • Replaced getattr calls for constant attributes (B009) (:pr:9460) Luke Conibear_
  • Pin libprotobuf to allow nightly pyarrow in the upstream CI build (:pr:9465) Joris Van den Bossche_
  • Replaced mutable data structures for default arguments (B006) (:pr:9462) Luke Conibear_
  • Changed flake8 mirror and updated version (:pr:9456) Luke Conibear_

.. _v2022.9.0:

2022.9.0

Released on September 2, 2022

Enhancements ^^^^^^^^^^^^

  • Enable automatic column projection for groupby aggregations (:pr:9442) Richard (Rick) Zamora_
  • Accept superclasses in NEP-13/17 dispatching (:pr:6710) Gabe Joseph_

Bug Fixes ^^^^^^^^^

  • Rename by columns internally for cumulative operations on the same by columns (:pr:9430) Pavithra Eswaramoorthy_
  • Fix get_group with categoricals (:pr:9436) Pavithra Eswaramoorthy_
  • Fix caching-related MaterializedLayer.cull performance regression (:pr:9413) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

  • Add maintainer documentation page (:pr:9309) James Bourbeau_

Maintenance ^^^^^^^^^^^

  • Revert skipped fastparquet test (:pr:9439) Pavithra Eswaramoorthy_
  • tmpfile does not end files with period on empty extension (:pr:9429) Hendrik Makait_
  • Skip failing fastparquet test with latest release (:pr:9432) James Bourbeau_

.. _v2022.8.1:

2022.8.1

Released on August 19, 2022

New Features ^^^^^^^^^^^^

  • Implement ma.*_like functions (:pr:9378) Ruth Comer_

Enhancements ^^^^^^^^^^^^

  • Fuse compatible annotations (:pr:9402) Ian Rose_
  • Shuffle-based groupby aggregation for high-cardinality groups (:pr:9302) Richard (Rick) Zamora_
  • Unpack namedtuple (:pr:9361) Hendrik Makait_

Bug Fixes ^^^^^^^^^

  • Fix SeriesGroupBy cumulative functions with axis=1 (:pr:9377) Pavithra Eswaramoorthy_
  • Sparse array reductions (:pr:9342) Ian Rose_
  • Fix make_meta while using categorical column with index (:pr:9348) Pavithra Eswaramoorthy_
  • Don't allow incompatible keywords in DataFrame.dropna (:pr:9366) Naty Clementi_
  • Make set_index handle entirely empty dataframes (:pr:8896) Julia Signell_
  • Improve dataclass handling in unpack_collections (:pr:9345) Hendrik Makait_
  • Fix bag sampling when there are some smaller partitions (:pr:9349) Ian Rose_
  • Add support for empty partitions to da.min/da.max functions (:pr:9268) geraninam_

Documentation ^^^^^^^^^^^^^

  • Clarify that bind() etc. regenerate the keys (:pr:9385) crusaderky_
  • Consolidate dashboard diagnostics documentation (:pr:9357) Sarah Charlotte Johnson_
  • Remove outdated meta information Pavithra Eswaramoorthy_

Maintenance ^^^^^^^^^^^

  • Use entry_points utility in sizeof (:pr:9390) James Bourbeau_
  • Add entry_points compatibility utility (:pr:9388) Jacob Tomlinson_
  • Upload environment file artifact for each CI build (:pr:9372) James Bourbeau_
  • Remove werkzeug pin in CI (:pr:9371) James Bourbeau_
  • Fix type annotations for dd.from_pandas and dd.from_delayed (:pr:9362) Jordan Yap_

.. _v2022.8.0:

2022.8.0

Released on August 5, 2022

Enhancements ^^^^^^^^^^^^

  • Ensure make_meta doesn't hold ref to data (:pr:9354) Jim Crist-Harif_
  • Revise divisions logic in from_pandas (:pr:9221) Richard (Rick) Zamora_
  • Warn if user sets index with existing index (:pr:9341) Julia Signell_
  • Add keepdims keyword for da.average (:pr:9332) Ruth Comer_
  • Change repr methods to avoid Layer materialization (:pr:9289) Richard (Rick) Zamora_

Bug Fixes ^^^^^^^^^

  • Make sure order kwarg will not crash the astype method (:pr:9317) Genevieve Buckley_
  • Fix bug for cumsum on cupy chunked dask arrays (:pr:9320) Genevieve Buckley_
  • Match input and output structure in _sample_reduce (:pr:9272) Pavithra Eswaramoorthy_
  • Include meta in array serialization (:pr:9240) Frédéric BRIOL_
  • Fix Index.memory_usage (:pr:9290) James Bourbeau_
  • Fix division calculation in dask.dataframe.io.from_dask_array (:pr:9282) Jordan Yap_

Documentation ^^^^^^^^^^^^^

  • How to use kwargs with custom task graphs (:pr:9322) Genevieve Buckley_
  • Add note to da.from_array about how the order is not preserved (:pr:9346) Julia Signell_
  • Add I/O info for async functions (:pr:9326) Logan Norman_
  • Tidy up docs snippet for futures IO functions (:pr:9340) Julia Signell_
  • Use consistent variable names for pandas df and Dask ddf in dataframe-groupby.rst (:pr:9304) ivojuroro_
  • Switch js-yaml for yaml.js in config converter (:pr:9306) Jacob Tomlinson_

Maintenance ^^^^^^^^^^^

  • Update da.linalg.solve for SciPy 1.9.0 compatibility (:pr:9350) Pavithra Eswaramoorthy_
  • Update test_getitem_avoids_large_chunks_missing (:pr:9347) Pavithra Eswaramoorthy_
  • Fix docs title formatting for "Extend sizeof" Doug Davis_
  • Import loop_in_thread fixture in tests (:pr:9337) James Bourbeau_
  • Temporarily xfail test_solve_sym_pos (:pr:9336) Pavithra Eswaramoorthy_
  • Fix small typo in 10 minutes to Dask page (:pr:9329) Shaghayegh_
  • Temporarily pin werkzeug in CI to avoid test suite hanging (:pr:9325) James Bourbeau_
  • Add tests for cupy.angle() (:pr:9312) Peter Andreas Entschev_
  • Update gpuCI RAPIDS_VER to 22.10 (:pr:9314)
  • Add pandas[test] to test extra (:pr:9110) Ben Beasley_
  • Add bokeh and scipy to upstream CI build (:pr:9265) James Bourbeau_

.. _v2022.7.1:

2022.7.1

Released on July 22, 2022

Enhancements ^^^^^^^^^^^^

  • Return Dask array if all axes are squeezed (:pr:9250) Pavithra Eswaramoorthy_
  • Make cycle reported by toposort shorter (:pr:9068) Erik Welch_
  • Unknown chunk slicing - raise informative error (:pr:9285) Naty Clementi_

Bug Fixes ^^^^^^^^^

  • Fix bug in HighLevelGraph.cull (:pr:9267) Richard (Rick) Zamora_
  • Sort categories (:pr:9264) Pavithra Eswaramoorthy_
  • Use max (instead of sum) for calculating warnsize (:pr:9235) Pavithra Eswaramoorthy_
  • Fix bug when filtering on partitioned column with pyarrow (:pr:9252) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

  • Updated repartition documentation to add note about partition_size (:pr:9288) Dylan Stewart_
  • Don't include docs in Array methods, just refer to module docs (:pr:9244) Julia Signell_
  • Remove outdated reference to scheduler and worker dashboards (:pr:9278) Pavithra Eswaramoorthy_
  • Fix a few typos (:pr:9270) Tim Gates_
  • Adds a custom aggregate example using numpy methods (:pr:9260) geraninam_

Maintenance ^^^^^^^^^^^

  • Add type annotations to dd.from_pandas and dd.from_delayed (:pr:9237) Michael Milton_
  • Update calculate_divisions docstring (:pr:9275) Tom Augspurger_
  • Update test_plot_multiple for upcoming bokeh release (:pr:9261) James Bourbeau_
  • Add typing to common array properties (:pr:9255) Illviljan_

.. _v2022.7.0:

2022.7.0

Released on July 8, 2022

Enhancements ^^^^^^^^^^^^

  • Support pathlib.PurePath in normalize_token (:pr:9229) Angus Hollands_
  • Add AttributeNotImplementedError for properties so IPython glob search works (:pr:9231) Erik Welch_
  • map_overlap: multiple dataframe handling (:pr:9145) Fabien Aulaire_
  • Read entrypoints in dask.sizeof (:pr:7688) Angus Hollands_

Bug Fixes ^^^^^^^^^

  • Fix TypeError: 'Serialize' object is not subscriptable when writing parquet dataset with Client(processes=False) (:pr:9015) Lucas Miguel Ponce_
  • Correct dtypes when concat with an empty dataframe (:pr:9193) Pavithra Eswaramoorthy_

Documentation ^^^^^^^^^^^^^

  • Highlight note about persist (:pr:9234) Pavithra Eswaramoorthy_
  • Update release-procedure to include more detail and helpful commands (:pr:9215) Julia Signell_
  • Better SEO for Futures and Dask vs. Spark pages (:pr:9217) Sarah Charlotte Johnson_

Maintenance ^^^^^^^^^^^

  • Use math.prod instead of np.prod on lists, tuples, and iters (:pr:9232) crusaderky_
  • Only import IPython if type checking (:pr:9230) Florian Jetter_
  • Tougher mypy checks (:pr:9206) crusaderky_

.. _v2022.6.1:

2022.6.1

Released on June 24, 2022

Enhancements ^^^^^^^^^^^^

  • Dask in pyodide (:pr:9053) Ian Rose_
  • Create dask.utils.show_versions (:pr:9144) Sultan Orazbayev_
  • Better error message for unsupported numpy operations on dask.dataframe objects. (:pr:9201) Julia Signell_
  • Add allow_rechunk kwarg to dask.array.overlap function (:pr:7776) Genevieve Buckley_
  • Add minutes and hours to dask.utils.format_time (:pr:9116) Matthew Rocklin_
  • More retries when writing parquet to remote filesystem (:pr:9175) Ian Rose_

Bug Fixes ^^^^^^^^^

  • Timedelta deterministic hashing (:pr:9213) Fabien Aulaire_
  • Enum deterministic hashing (:pr:9212) Fabien Aulaire_
  • shuffle_group(): avoid converting to arrays (:pr:9157) Mads R. B. Kristensen_

Deprecations ^^^^^^^^^^^^

  • Deprecate extra format_time utility (:pr:9184) James Bourbeau_

Documentation ^^^^^^^^^^^^^

  • Better SEO for 10 Minutes to Dask (:pr:9182) Sarah Charlotte Johnson_
  • Better SEO for Delayed and Best Practices (:pr:9194) Sarah Charlotte Johnson_
  • Include known inconsistency in DataFrame str.split accessor docstring (:pr:9177) Richard Pelgrim_
  • Add inconsistencies keyword to derived_from (:pr:9192) Richard Pelgrim_
  • Add missing append in delayed best practices example (:pr:9202) Ben_
  • Fix indentation in Best Practices (:pr:9196) Sarah Charlotte Johnson_
  • Add link to Genevieve Buckley's blog on chunk sizes (:pr:9199) Pavithra Eswaramoorthy_
  • Update to_csv docstring (:pr:9094) Sarah Charlotte Johnson_

Maintenance ^^^^^^^^^^^

  • Update versioneer: change from using SafeConfigParser to ConfigParser (:pr:9205) Thomas A Caswell_
  • Remove ipython hack in CI (:pr:9200) crusaderky_

.. _v2022.6.0:

2022.6.0

Released on June 10, 2022

Enhancements ^^^^^^^^^^^^

  • Add feature to show names of layer dependencies in HLG JupyterLab repr (:pr:9081) Angelos Omirolis_
  • Add arrow schema extraction dispatch (:pr:9169) GALI PREM SAGAR_
  • Add sort_results argument to assert_eq (:pr:9130) Pavithra Eswaramoorthy_
  • Add weeks to parse_timedelta (:pr:9168) Matthew Rocklin_
  • Warn that cloudpickle is not always deterministic (:pr:9148) Pavithra Eswaramoorthy_
  • Switch parquet default engine (:pr:9140) Jim Crist-Harif_
  • Use deterministic hashing with _iLocIndexer / _LocIndexer (:pr:9108) Fabien Aulaire_
  • Enforce consistent schema in to_parquet pyarrow (:pr:9131) Jim Crist-Harif_

Bug Fixes ^^^^^^^^^

  • Fix pyarrow.StringArray pickle (:pr:9170) Jim Crist-Harif_
  • Fix parallel metadata collection in pyarrow engine (:pr:9165) Richard (Rick) Zamora_
  • Improve pyarrow partitioning logic (:pr:9147) James Bourbeau_
  • pyarrow 8.0 partitioning fix (:pr:9143) James Bourbeau_

Documentation ^^^^^^^^^^^^^

  • Better SEO for Installing Dask and Dask DataFrame Best Practices (:pr:9178) Sarah Charlotte Johnson_
  • Update logos page in docs (:pr:9167) Sarah Charlotte Johnson_
  • Add example using pandas Series to map_partitions docstring (:pr:9161) Alex-JG3_
  • Update docs theme for rebranding (:pr:9160) Sarah Charlotte Johnson_
  • Better SEO for docs on Dask DataFrames (:pr:9128) Sarah Charlotte Johnson_

Maintenance ^^^^^^^^^^^

  • Remove ensure_file from recommended practice for downstream libraries (:pr:9171) Matthew Rocklin_
  • Test round-tripping DataFrame parquet I/O including pyspark (:pr:9156) Ian Rose_
  • Try disabling HDF5 locking (:pr:9154) Ian Rose_
  • Link best practices to DataFrame-parquet (:pr:9150) Tom Augspurger_
  • Fix typo in map_partitions func parameter description (:pr:9149) Christopher Akiki_
  • Un-xfail test_groupby_grouper_dispatch (:pr:9139) GALI PREM SAGAR_
  • Temporarily import cleanup fixture from distributed (:pr:9138) James Bourbeau_
  • Simplify partitioning logic in pyarrow parquet engine (:pr:9041) Richard (Rick) Zamora_

.. _v2022.05.2:

2022.05.2

Released on May 26, 2022

Enhancements ^^^^^^^^^^^^

  • Add a dispatch for non-pandas Grouper objects and use it in GroupBy (:pr:9074) brandon-b-miller_
  • Error if read_parquet & to_parquet files intersect (:pr:9124) Jim Crist-Harif_
  • Visualize task graphs using ipycytoscape (:pr:9091) Ian Rose_

Documentation ^^^^^^^^^^^^^

  • Fix various typos (:pr:9126) Ryan Russell_

Maintenance ^^^^^^^^^^^

  • Fix flaky test_filter_nonpartition_columns (:pr:9127) Pavithra Eswaramoorthy_
  • Update gpuCI RAPIDS_VER to 22.08 (:pr:9120)
  • Include ``conftest.py`` in sdists (:pr:9115) Ben Beasley_

.. _v2022.05.1:

2022.05.1

Released on May 24, 2022

New Features ^^^^^^^^^^^^

  • Add DataFrame.from_dict classmethod (:pr:9017) Matthew Powers_
  • Add from_map function to Dask DataFrame (:pr:8911) Richard (Rick) Zamora_

Enhancements ^^^^^^^^^^^^

  • Improve to_parquet error for appended divisions overlap (:pr:9102) Jim Crist-Harif_
  • Enabled user-defined process-initializer functions (:pr:9087) ParticularMiner_
  • Mention align_dataframes=False option in map_partitions error (:pr:9075) Gabe Joseph_
  • Add kwarg enforce_ndim to dask.array.map_blocks() (:pr:8865) ParticularMiner_
  • Implement Series.GroupBy.fillna / DataFrame.GroupBy.fillna methods (:pr:8869) Pavithra Eswaramoorthy_
  • Allow fillna with Dask DataFrame (:pr:8950) Pavithra Eswaramoorthy_
  • Update error message for assignment with 1-d dask array (:pr:9036) Pavithra Eswaramoorthy_
  • Collection Protocol (:pr:8674) Doug Davis_
  • Patch around pandas ArrowStringArray pickling (:pr:9024) Jim Crist-Harif_
  • Band-aid for compute_as_if_collection (:pr:8998) Ian Rose_
  • Add p2p shuffle option (:pr:8836) Matthew Rocklin_

Bug Fixes ^^^^^^^^^

  • Fixup column projection with no columns (:pr:9106) Jim Crist-Harif_
  • Blockwise cull NumPy dtype (:pr:9100) Ian Rose_
  • Fix column-projection bug in from_map (:pr:9078) Richard (Rick) Zamora_
  • Prevent nulls in index for non-numeric dtypes (:pr:8963) Jorge López_
  • Fix is_monotonic methods for more than 8 partitions (:pr:9019) Julia Signell_
  • Handle enumerate and generator inputs to from_map (:pr:9066) Richard (Rick) Zamora_
  • Revert is_dask_collection; back to previous implementation (:pr:9062) Doug Davis_
  • Fix Blockwise.clone does not handle iterable literal arguments correctly (:pr:8979) JSKenyon_
  • Array setitem hardmask (:pr:9027) David Hassell_
  • Fix overlapping divisions error on append (:pr:8997) Ian Rose_

Deprecations ^^^^^^^^^^^^

  • Add pre-deprecation warnings for read_parquet kwargs chunksize and aggregate_files (:pr:9052) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

  • Document map_partitions handling of args vs kwargs, usage of partition_info (:pr:9084) Charles Blackmon-Luca_
  • Update custom collection documentation (leverage new collection protocol) (:pr:9097) Doug Davis_
  • Better SEO for docs on creating and storing Dask DataFrames (:pr:9098) Sarah Charlotte Johnson_
  • Clarify chunking in imread docstring (:pr:9082) Genevieve Buckley_
  • Rearrange docs TOC (:pr:9001) Matthew Rocklin_
  • Corrected map_blocks() docstring for kwarg enforce_ndim (:pr:9071) ParticularMiner_
  • Update DataFrame SQL docs references to other libraries (:pr:9077) Charles Blackmon-Luca_
  • Update page on creating and storing Dask DataFrames (:pr:9025) Sarah Charlotte Johnson_

Maintenance ^^^^^^^^^^^

  • Include NUMPY_LICENSE.txt in license files (:pr:9113) Ben Beasley_
  • Increase retries when installing nightly pandas (:pr:9103) James Bourbeau_
  • Force nightly pyarrow in the upstream build (:pr:9095) Joris Van den Bossche_
  • Improve object handling & testing of ensure_unicode (:pr:9059) John A Kirkham_
  • Force nightly pyarrow in the upstream build (:pr:8993) Joris Van den Bossche_
  • Additional check on is_dask_collection (:pr:9054) Doug Davis_
  • Update ensure_bytes (:pr:9050) John A Kirkham_
  • Add end of file pre-commit hook (:pr:9045) James Bourbeau_
  • Add codespell pre-commit hook (:pr:9040) James Bourbeau_
  • Remove the HDFS tests (:pr:9039) Jim Crist-Harif_
  • Fix flaky test_reductions_2D (:pr:9037) Jim Crist-Harif_
  • Prevent codecov from notifying of failure too soon (:pr:9031) Jim Crist-Harif_
  • Only test on Python 3.9 on macos (:pr:9029) Jim Crist-Harif_
  • Update to_timedelta default unit (:pr:9010) Pavithra Eswaramoorthy_

.. _v2022.05.0:

2022.05.0

Released on May 2, 2022

Highlights ^^^^^^^^^^

This is a bugfix release for this issue <https://github.com/dask/distributed/issues/6255>_.

Documentation ^^^^^^^^^^^^^

  • Add highlights section to 2022.04.2 release notes (:pr:9012) James Bourbeau_

.. _v2022.04.2:

2022.04.2

Released on April 29, 2022

Highlights ^^^^^^^^^^

This release includes several deprecations/breaking API changes to dask.dataframe.read_parquet and dask.dataframe.to_parquet:

  • to_parquet no longer writes _metadata files by default. If you want to write a _metadata file, you can pass in write_metadata_file=True.
  • read_parquet now defaults to split_row_groups=False, which results in one Dask dataframe partition per parquet file when reading in a parquet dataset. If you're working with large parquet files you may need to set split_row_groups=True to reduce your partition size.
  • read_parquet no longer calculates divisions by default. If you require read_parquet to return dataframes with known divisions, please set calculate_divisions=True.
  • read_parquet has deprecated the gather_statistics keyword argument. Please use the calculate_divisions keyword argument instead.
  • read_parquet has deprecated the require_extension keyword argument. Please use the parquet_file_extension keyword argument instead.

New Features ^^^^^^^^^^^^

  • Add removeprefix and removesuffix as StringMethods (:pr:8912) Jorge López_

Enhancements ^^^^^^^^^^^^

  • Call fs.invalidate_cache in to_parquet (:pr:8994) Jim Crist-Harif_
  • Change to_parquet default to write_metadata_file=None (:pr:8988) Jim Crist-Harif_
  • Let arg reductions pass keepdims (:pr:8926) Julia Signell_
  • Change split_row_groups default to False in read_parquet (:pr:8981) Richard (Rick) Zamora_
  • Improve NotImplementedError message for da.reshape (:pr:8987) Jim Crist-Harif_
  • Simplify to_parquet compute path (:pr:8982) Jim Crist-Harif_
  • Raise an error if you try to use vindex with a Dask object (:pr:8945) Julia Signell_
  • Avoid pre_buffer=True when a precache method is specified (:pr:8957) Richard (Rick) Zamora_
  • from_dask_array uses blockwise instead of merging graphs (:pr:8889) Bryan Weber_
  • Use pre_buffer=True for "pyarrow" Parquet engine (:pr:8952) Richard (Rick) Zamora_

Bug Fixes ^^^^^^^^^

  • Handle dtype=None correctly in da.full (:pr:8954) Tom White_
  • Fix dask-sql bug caused by blockwise fusion (:pr:8989) Richard (Rick) Zamora_
  • to_parquet errors for non-string column names (:pr:8990) Jim Crist-Harif_
  • Make sure da.roll works even if shape is 0 (:pr:8925) Julia Signell_
  • Fix recursion error issue with set_index (:pr:8967) Paul Hobson_
  • Stringify BlockwiseDepDict mapping values when produces_keys=True (:pr:8972) Richard (Rick) Zamora_
  • Use DataFrameIOLayer in ``DataFrame.from_delayed`` (:pr:8852) Richard (Rick) Zamora_
  • Check that values for the in predicate in read_parquet are correct (:pr:8846) Bryan Weber_
  • Fix bug for reduction of zero dimensional arrays (:pr:8930) Tom White_
  • Specify dtype when deciding division using np.linspace in read_sql_query (:pr:8940) Cheun Hong_

Deprecations ^^^^^^^^^^^^

  • Deprecate gather_statistics from read_parquet (:pr:8992) Richard (Rick) Zamora_
  • Change require_extension to top-level parquet_file_extension read_parquet kwarg (:pr:8935) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

  • Update write_metadata_file discussion in documentation (:pr:8995) Richard (Rick) Zamora_
  • Update DataFrame.merge docstring (:pr:8966) Pavithra Eswaramoorthy_
  • Added description for parameter align_arrays in array.blockwise() (:pr:8977) ParticularMiner_
  • Recommend not to use map_blocks(drop_axis=...) on chunked axes of an array (:pr:8921) ParticularMiner_
  • Add copy button to code snippets in docs (:pr:8956) James Bourbeau_

Maintenance ^^^^^^^^^^^

  • Pandas 1.5.0 compatibility (:pr:8961) Ian Rose_
  • Add pytest-timeout to distributed envs on CI (:pr:8986) Julia Signell_
  • Improve read_parquet docstring formatting (:pr:8971) Bryan Weber_
  • Remove pytest.warns(None) (:pr:8924) Pavithra Eswaramoorthy_
  • Document Python 3.10 as supported (:pr:8976) Eray Aslan_
  • parse_timedelta option to enforce explicit unit (:pr:8969) crusaderky_
  • mypy compatibility (:pr:8854) Paul Hobson_
  • Add a docs page for Dask & Parquet (:pr:8899) Jim Crist-Harif_
  • Adds configuration to ignore revs in blame (:pr:8933) Bryan Weber_

.. _v2022.04.1:

2022.04.1

Released on April 15, 2022

New Features ^^^^^^^^^^^^

  • Add missing NumPy ufuncs: abs, left_shift, right_shift, positive. (:pr:8920) Tom White_

Enhancements ^^^^^^^^^^^^

  • Avoid collecting parquet metadata in pyarrow when write_metadata_file=False (:pr:8906) Richard (Rick) Zamora_
  • Better error for failed wildcard path in dd.read_csv() (fixes #8878) (:pr:8908) Roger Filmyer_
  • Return da.Array rather than dd.Series for non-ufunc elementwise functions on dd.Series (:pr:8558) Julia Signell_
  • Let get_dummies use meta computation in map_partitions (:pr:8898) Julia Signell_
  • Masked scalars input to da.from_array (:pr:8895) David Hassell_
  • Raise ValueError in merge_asof for duplicate kwargs (:pr:8861) Bryan Weber_

Bug Fixes ^^^^^^^^^

  • Make is_monotonic work when some partitions are empty (:pr:8897) Julia Signell_
  • Fix custom getter in da.from_array when inline_array=False (:pr:8903) Ian Rose_
  • Correctly handle dict-specification for rechunk. (:pr:8859) Richard_
  • Fix merge_asof: drop index column if left_on == right_on (:pr:8874) Gil Forsyth_

Deprecations ^^^^^^^^^^^^

  • Warn users that engine='auto' will change in future (:pr:8907) Jim Crist-Harif_
  • Remove pyarrow-legacy engine from parquet API (:pr:8835) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

  • Add note on missing parameter out for dask.array.dot (:pr:8913) Francesco Andreuzzi_
  • Update DataFrame.query docstring (:pr:8890) Pavithra Eswaramoorthy_

Maintenance ^^^^^^^^^^^

  • Don't test da.prod on large integer data (:pr:8893) Jim Crist-Harif_
  • Add network marks to tests that fail without an internet connection (:pr:8881) Paul Hobson_
  • Fix gpuCI GHA version (:pr:8891) Charles Blackmon-Luca_
  • xfail/skip some flaky distributed tests (:pr:8887) Jim Crist-Harif_
  • Remove unused (deprecated) code from ArrowDatasetEngine (:pr:8885) Richard (Rick) Zamora_
  • Add mild typing to common utils functions, part 2 (:pr:8867) crusaderky_
  • Documentation of Limitation of sample() (:pr:8858) Nadiem Sissouno_

.. _v2022.04.0:

2022.04.0

Released on April 1, 2022

.. note::

This is the first release with support for Python 3.10

New Features ^^^^^^^^^^^^

  • Add Python 3.10 support (:pr:8566) James Bourbeau_

Enhancements ^^^^^^^^^^^^

  • Add check on dtype.itemsize in order to produce a useful error (:pr:8860) Davide Gavio_
  • Add mild typing to common utils functions (:pr:8848) Matthew Rocklin_
  • Add sanity checks to divisions setter (:pr:8806) Jim Crist-Harif_
  • Use Blockwise and map_partitions for more tasks (:pr:8831) Bryan Weber_

Bug Fixes ^^^^^^^^^

  • Fix dataframe.merge_asof to preserve right_on column (:pr:8857) Sarah Charlotte Johnson_
  • Fix "Buffer dtype mismatch" for pandas >= 1.3 on 32bit (:pr:8851) Ben Greiner_
  • Fix slicing fusion by altering SubgraphCallable getter (:pr:8827) Ian Rose_

Deprecations ^^^^^^^^^^^^

  • Remove support for PyPy (:pr:8863) James Bourbeau_
  • Drop setuptools at runtime (:pr:8855) crusaderky_
  • Remove dataframe.tseries.resample.getnanos (:pr:8834) Sarah Charlotte Johnson_

Documentation ^^^^^^^^^^^^^

  • Organize diagnostic and performance docs (:pr:8871) Naty Clementi_
  • Add image to explain drop_axis option of map_blocks (:pr:8868) ParticularMiner_

Maintenance ^^^^^^^^^^^

  • Update gpuCI RAPIDS_VER to 22.06 (:pr:8828)
  • Restore test_parquet in http (:pr:8850) Bryan Weber_
  • Simplify gpuCI updating workflow (:pr:8849) Charles Blackmon-Luca_

.. _v2022.03.0:

2022.03.0

Released on March 18, 2022

New Features ^^^^^^^^^^^^

  • Bag: add implementation for reservoir sampling (:pr:7636) Daniel Mesejo-León_
  • Add ma.count to Dask array (:pr:8785) David Hassell_
  • Change to_parquet default to compression="snappy" (:pr:8814) Jim Crist-Harif_
  • Add weights parameter to dask.array.reduction (:pr:8805) David Hassell_
  • Add ddf.compute_current_divisions to get divisions on a sorted index or column (:pr:8517) Julia Signell_

Enhancements ^^^^^^^^^^^^

  • Pass __name__ and __doc__ through on DelayedLeaf (:pr:8820) Leo Gao_
  • Raise exception for not implemented merge how option (:pr:8818) Naty Clementi_
  • Move Bag.map_partitions to Blockwise (:pr:8646) Richard (Rick) Zamora_
  • Improve error messages for malformed config files (:pr:8801) Jim Crist-Harif_
  • Revise column-projection optimization to capture common dask-sql patterns (:pr:8692) Richard (Rick) Zamora_
  • Useful error for empty divisions (:pr:8789) Pavithra Eswaramoorthy_
  • Scipy 1.8.0 compat: copy private classes into dask/array/stats.py (:pr:8694) Julia Signell_
  • Raise warning when using multiple types of schedulers where one is distributed (:pr:8700) Pedro Silva_

Bug Fixes ^^^^^^^^^

  • Fix bug in applying != filter in read_parquet (:pr:8824) Richard (Rick) Zamora_
  • Fix set_index when directly passed a dask Index (:pr:8680) Paul Hobson_
  • Quick fix for unbounded memory usage in tensordot (:pr:7980) Genevieve Buckley_
  • If hdf file is empty, don't fail on meta creation (:pr:8809) Julia Signell_
  • Update clone_key("x") to retain prefix (:pr:8792) crusaderky_
  • Fix "physical" column bug in pyarrow-based read_parquet (:pr:8775) Richard (Rick) Zamora_
  • Fix groupby.shift bug caused by unsorted partitions after shuffle (:pr:8782) kori73_
  • Fix serialization bug (:pr:8786) Richard (Rick) Zamora_

Deprecations ^^^^^^^^^^^^

  • Bump diagnostics bokeh dependency to 2.4.2 (:pr:8791) Charles Blackmon-Luca_
  • Deprecate bcolz support (:pr:8754) Pavithra Eswaramoorthy_
  • Finish making map_overlap default boundary kwarg 'none' (:pr:8743) Genevieve Buckley_

Documentation ^^^^^^^^^^^^^

  • Custom collection example docs fix (:pr:8807) Doug Davis_
  • Add Series.str, Series.dt, and Series.cat accessors to docs (:pr:8757) Sarah Charlotte Johnson_
  • Fix docstring for ddf.compute_current_divisions (:pr:8793) Julia Signell_
  • Dashboard docs on /status page (:pr:8648) Naty Clementi_
  • Clarify divisions kwarg in repartition docstring (:pr:8781) Sarah Charlotte Johnson_
  • Update Docker images to use ghcr.io (:pr:8774) Jacob Tomlinson_

Maintenance ^^^^^^^^^^^

  • Reduce gpuci pytest parallelism (:pr:8826) GALI PREM SAGAR_
  • absolufy-imports - No relative imports - PEP8 (:pr:8796) Julia Signell_
  • Tidy up assert_eq calls in array tests (:pr:8812) Julia Signell_
  • Avoid pytest.warns(None) (:pr:8718) LSturtew_
  • Fix test_describe_empty to work without global -Werror (:pr:8291) Michał Górny_
  • Temporarily xfail graphviz tests on windows (:pr:8794) Jim Crist-Harif_
  • Use packaging.parse for md5 compatibility (:pr:8763) James Bourbeau_
  • Make tokenize work in a FIPS 140-2 environment (:pr:8762) Jim Crist-Harif_
  • Label issues and PRs on open with 'needs triage' (:pr:8761) Julia Signell_
  • Add some extra test coverage (:pr:8302) lrjball_
  • Specify action version and change from pull_request_target to pull_request (:pr:8767) Julia Signell_
  • Make scheduler kwarg pass through to sub functions in da.assert_eq (:pr:8755) Julia Signell_

.. _v2022.02.1:

2022.02.1

Released on February 25, 2022

New Features ^^^^^^^^^^^^

  • Add aggregate functions first and last to dask.dataframe.pivot_table (:pr:8649) Knut Nordanger_
  • Add std() support for datetime64 dtype for pandas-like objects (:pr:8523) Ben Glossner_
  • Add materialized task counts to HighLevelGraph and Layer html reprs (:pr:8589) kori73_

Enhancements ^^^^^^^^^^^^

  • Do not allow iterating a DataFrameGroupBy (:pr:8696) Bryan Weber_
  • Fix missing newline after info() call on empty DataFrame (:pr:8727) Naty Clementi_
  • Add groupby.compute as a not implemented method (:pr:8734) Dranaxel_
  • Improve multi dataframe join performance (:pr:8740) Holden Karau_
  • Include bool type for Index (:pr:8732) Naty Clementi_
  • Allow ArrowDatasetEngine subclass to override pandas->arrow conversion also for partitioned write (:pr:8741) Joris Van den Bossche_
  • Increase performance of k-diagonal extraction in da.diag() and da.diagonal() (:pr:8689) ParticularMiner_
  • Change linspace creation to match numpy when num equal to 0 (:pr:8676) Peter_
  • Tokenize dataclasses (:pr:8557) Gabe Joseph_
  • Update tokenize to treat dict and kwargs differently (:pr:8655) James Bourbeau_

Bug Fixes ^^^^^^^^^

  • Fix bug in dask.array.roll() for roll-shifts that match the size of the input array (:pr:8723) ParticularMiner_
  • Fix for normalize_function dataclass methods (:pr:8527) Sarah Charlotte Johnson_
  • Fix rechunking with zero-size-chunks (:pr:8703) ParticularMiner_
  • Move creation of sqlalchemy connection for picklability (:pr:8745) Julia Signell_

Deprecations ^^^^^^^^^^^^

  • Drop Python 3.7 (:pr:8572) James Bourbeau_
  • Deprecate iteritems (:pr:8660) James Bourbeau_
  • Deprecate dataframe.tseries.resample.getnanos (:pr:8752) Sarah Charlotte Johnson_
  • Add deprecation warning for pyarrow-legacy engine (:pr:8758) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

  • Update link typos in changelog (:pr:8717) James Bourbeau_
  • Clarify dask.visualize docstring (:pr:8710) Dranaxel_
  • Update Docker example to use current best practices (:pr:8731) Jacob Tomlinson_
  • Update docs to include distributed.Client.preload (:pr:8679) Bryan Weber_
  • Document monthly social meeting (:pr:8595) Thomas Grainger_
  • Add docs for Gen2 access with RBAC/ACL i.e. security principal (:pr:8748) Martin Thøgersen_
  • Use Dask configuration extension from dask-sphinx-theme (:pr:8751) Benjamin Zaitlen_

Maintenance ^^^^^^^^^^^

  • Unpin coverage in CI (:pr:8690) James Bourbeau_
  • Add manual trigger for running test suite (:pr:8716) James Bourbeau_
  • Xfail scheduler_HLG_unpack_import; flaky test (:pr:8724) Mike McCarty_
  • Temporarily remove scipy upstream CI build (:pr:8725) James Bourbeau_
  • Bump pre-release version to be greater than stable releases (:pr:8728) Charles Blackmon-Luca_
  • Move custom sort function logic to internal sort_values (:pr:8571) Charles Blackmon-Luca_
  • Pin cloudpickle and scipy in docs requirements (:pr:8737) Julia Signell_
  • Make the labeler not delete labels, and look for the docs at the right spot (:pr:8746) Julia Signell_
  • Fix docs build warnings (:pr:8432) Kristopher Overholt_
  • Update test status badge (:pr:8747) James Bourbeau_
  • Fix parquet test_pandas_timestamp_overflow_pyarrow test (:pr:8733) Joris Van den Bossche_
  • Only run PR builds on changes to relevant files (:pr:8756) Charles Blackmon-Luca_

.. _v2022.02.0:

2022.02.0

Released on February 11, 2022

.. note::

This is the last release with support for Python 3.7

New Features ^^^^^^^^^^^^

  • Add region to to_zarr when using existing array (:pr:8590) Chris Roat_
  • Add engine_kwargs support to dask.dataframe.to_sql (:pr:8609) Amir Kadivar_
  • Add include_path_column arg to read_json (:pr:8603) Bryan Weber_
  • Add expand_dims to Dask array (:pr:8687) Tom White_

Enhancements ^^^^^^^^^^^^

  • Add scheduler option to assert_eq utilities (:pr:8610) Xinrong Meng_
  • Fix eye inconsistency with NumPy for dtype=None (:pr:8685) Tom White_
  • Fix concatenate inconsistency with NumPy for axis=None (:pr:8686) Tom White_
  • Type annotations, part 1 (:pr:8295) crusaderky_
  • Really allow any iterable to be passed as a meta (:pr:8629) Julia Signell_
  • Use map_partitions (Blockwise) in to_parquet (:pr:8487) Richard (Rick) Zamora_

Bug Fixes ^^^^^^^^^

  • Result of reducing an array should not depend on its chunk-structure (:pr:8637) ParticularMiner_
  • Pass place-holder metadata to map_partitions in ACA code path (:pr:8643) Richard (Rick) Zamora_

Deprecations ^^^^^^^^^^^^

  • Deprecate is_monotonic (:pr:8653) James Bourbeau_
  • Remove some deprecations (:pr:8605) James Bourbeau_

Documentation ^^^^^^^^^^^^^

  • Add Domino Data Lab to Hosted / managed Dask clusters (:pr:8675) Ray Bell_
  • Fix inter-linking and remove deprecated function (:pr:8715) Julia Signell_
  • Fix imbalanced backticks. (:pr:8693) Matthias Bussonnier_
  • Add documentation for high level graph visualization (:pr:8483) Genevieve Buckley_
  • Update documentation of ProgressBar out parameter (:pr:8604) Pedro Silva_
  • Improve documentation of dask.config.set (:pr:8705) crusaderky_
  • Revert mention to mypy among type checkers (:pr:8699) crusaderky_

Maintenance ^^^^^^^^^^^

  • Update warning handling in get_dummies tests (:pr:8651) James Bourbeau_
  • Add a github changelog template (:pr:8714) Julia Signell_
  • Update year in LICENSE.txt (:pr:8665) David Hoese_
  • Update pre-commit version (:pr:8691) James Bourbeau_
  • Include scipy in upstream CI build (:pr:8681) James Bourbeau_
  • Temporarily pin scipy < 1.8.0 in CI (:pr:8683) James Bourbeau_
  • Pin scipy to less than 1.8.0 in GPU CI (:pr:8698) Julia Signell_
  • Avoid pytest.warns(None) in test_multi.py (:pr:8678) James Bourbeau_
  • Update GHA concurrent job cancellation (:pr:8652) James Bourbeau_
  • Make test__get_paths robust to site.PREFIXES being set (:pr:8644) James Bourbeau_
  • Bump gpuCI PYTHON_VER to 3.9 (:pr:8642) Charles Blackmon-Luca_

.. _v2022.01.1:

2022.01.1

Released on January 28, 2022

New Features ^^^^^^^^^^^^

  • Add dask.dataframe.series.view() (:pr:8533) Pavithra Eswaramoorthy_

Enhancements ^^^^^^^^^^^^

  • Update tz for fastparquet + pandas 1.4.0 (:pr:8626) Martin Durant_
  • Cleaning up misc tests for pandas compat (:pr:8623) Julia Signell_
  • Moving to SQLAlchemy >= 1.4 (:pr:8158) McToel_
  • Pandas compat: Filter sparse warnings (:pr:8621) Julia Signell_
  • Fail if meta is not a pandas object (:pr:8563) Julia Signell_
  • Use fsspec.parquet module for better remote-storage read_parquet performance (:pr:8339) Richard (Rick) Zamora_
  • Move DataFrame ACA aggregations to HLG (:pr:8468) Richard (Rick) Zamora_
  • Add optional information about originating function call in DataFrameIOLayer (:pr:8453) Richard (Rick) Zamora_
  • Blockwise array creation redux (:pr:7417) Ian Rose_
  • Refactor config default search path retrieval (:pr:8573) James Bourbeau_
  • Add optimize_graph flag to Bag.to_dataframe function (:pr:8486) Maxim Lippeveld_
  • Make sure that delayed output operations still return lists of paths (:pr:8498) Julia Signell_
  • Pandas compat: Fix to_frame name to not pass None (:pr:8554) Julia Signell_
  • Pandas compat: Fix axis=None warning (:pr:8555) Julia Signell_
  • Expand Dask YAML config search directories (:pr:8531) abergou_

Bug Fixes ^^^^^^^^^

  • Fix groupby.cumsum with series grouped by index (:pr:8588) Julia Signell_
  • Fix derived_from for pandas methods (:pr:8612) Thomas J. Fan_
  • Enforce boolean ascending for sort_values (:pr:8440) Charles Blackmon-Luca_
  • Fix parsing of __setitem__ indices (:pr:8601) David Hassell_
  • Avoid divide by zero in slicing (:pr:8597) Doug Davis_

Deprecations ^^^^^^^^^^^^

  • Downgrade meta error in (:pr:8563) to warning (:pr:8628) Julia Signell_
  • Pandas compat: Deprecate append when pandas >= 1.4.0 (:pr:8617) Julia Signell_

Documentation ^^^^^^^^^^^^^

  • Replace outdated columns argument with meta in DataFrame constructor (:pr:8614) kori73_
  • Refactor deploying docs (:pr:8602) Jacob Tomlinson_

Maintenance ^^^^^^^^^^^

  • Pin coverage in CI (:pr:8631) James Bourbeau_
  • Move cached_cumsum imports to be from dask.utils (:pr:8606) James Bourbeau_
  • Update gpuCI RAPIDS_VER to 22.04 (:pr:8600)
  • Update docstring for from_delayed function (:pr:8576) Kirito1397_
  • Handle plot_width / plot_height deprecations (:pr:8544) Bryan Van de Ven_
  • Remove unnecessary pyyaml importorskip (:pr:8562) James Bourbeau_
  • Specify scheduler in DataFrame assert_eq (:pr:8559) Gabe Joseph_

.. _v2022.01.0:

2022.01.0

Released on January 14, 2022

New Features ^^^^^^^^^^^^

  • Add groupby.shift method (:pr:8522) kori73_
  • Add DataFrame.nunique (:pr:8479) Sarah Charlotte Johnson_
  • Add da.ndim to match np.ndim (:pr:8502) Julia Signell_

Enhancements ^^^^^^^^^^^^

  • Only show percentile interpolation= keyword warning if NumPy version >= 1.22 (:pr:8564) Julia Signell_
  • Raise PerformanceWarning when limit and "array.slicing.split-large-chunks" are None (:pr:8511) Julia Signell_
  • Define normalize_seq function at import time (:pr:8521) Illviljan_
  • Ensure that divisions are always tuples (:pr:8393) Charles Blackmon-Luca_
  • Allow a callable scheduler for bag.groupby (:pr:8492) Julia Signell_
  • Save Zarr arrays with dask-on-ray scheduler (:pr:8472) TnTo_
  • Make byte blocks more even in read_bytes (:pr:8459) Martin Durant_
  • Improved the efficiency of matmul() by completely removing concatenation (:pr:8423) ParticularMiner_
  • Limit max chunk size when reshaping dask arrays (:pr:8124) Genevieve Buckley_
  • Changes for fastparquet superthrift (:pr:8470) Martin Durant_

Bug Fixes ^^^^^^^^^

  • Fix boolean indices in array assignment (:pr:8538) David Hassell_
  • Detect default dtype on array-likes (:pr:8501) aeisenbarth_
  • Fix optimize_blockwise bug for duplicate dependency names (:pr:8542) Richard (Rick) Zamora_
  • Update warnings for DataFrame.GroupBy.apply and transform (:pr:8507) Sarah Charlotte Johnson_
  • Track HLG layer name in Delayed (:pr:8452) Gabe Joseph_
  • Fix single item nanmin and nanmax reductions (:pr:8484) Julia Signell_
  • Make read_csv with comment kwarg work even if there is a comment in the header (:pr:8433) Julia Signell_

Deprecations ^^^^^^^^^^^^

  • Replace interpolation with method and method with internal_method (:pr:8525) Julia Signell_
  • Remove daily stock demo utility (:pr:8477) James Bourbeau_

Documentation ^^^^^^^^^^^^^

  • Add a join example in docs that can be run with copy/paste (:pr:8520) kori73_
  • Mention dashboard link in config (:pr:8510) Ray Bell_
  • Fix changelog section hyperlinks (:pr:8534) Aneesh Nema_
  • Hyphenate "single-machine scheduler" for consistency (:pr:8519) Deepyaman Datta_
  • Normalize whitespace in doctests in slicing.py (:pr:8512) Maren Westermann_
  • Best practices storage line typo (:pr:8529) Michael Delgado_
  • Update figures (:pr:8401) Sarah Charlotte Johnson_
  • Remove pyarrow-only reference from split_row_groups in read_parquet docstring (:pr:8490) Naty Clementi_

Maintenance ^^^^^^^^^^^

  • Remove obsolete LocalFileSystem tests that fail for fsspec>=2022.1.0 (:pr:8565) Richard (Rick) Zamora_
  • Tweak: "RuntimeWarning: invalid value encountered in reciprocal" (:pr:8561) crusaderky_
  • Fix skipna=None for DataFrame.sem (:pr:8556) Julia Signell_
  • Fix PANDAS_GT_140 (:pr:8552) Julia Signell_
  • Collections with HLG must always implement __dask_layers__ (:pr:8548) crusaderky_
  • Work around race condition in import llvmlite (:pr:8550) crusaderky_
  • Set a minimum version for pyyaml (:pr:8545) Gaurav Sheni_
  • Adding nodefaults to environments to fix tiledb + mac issue (:pr:8505) Julia Signell_
  • Set ceiling for setuptools (:pr:8509) Julia Signell_
  • Add workflow / recipe to generate Dask nightlies (:pr:8469) Charles Blackmon-Luca_
  • Bump gpuCI CUDA_VER to 11.5 (:pr:8489) Charles Blackmon-Luca_

.. _v2021.12.0:

2021.12.0

Released on December 10, 2021

New Features ^^^^^^^^^^^^

  • Add Series and Index is_monotonic* methods (:pr:8304) Daniel Mesejo-León_

Enhancements ^^^^^^^^^^^^

  • Blockwise map_partitions with partition_info (:pr:8310) Gabe Joseph_
  • Better error message for length of array with unknown chunk sizes (:pr:8436) Doug Davis_
  • Use by instead of index internally on the Groupby class (:pr:8441) Julia Signell_
  • Allow custom sort functions for sort_values (:pr:8345) Charles Blackmon-Luca_
  • Add warning to read_parquet when statistics and partitions are misaligned (:pr:8416) Richard (Rick) Zamora_
  • Support where argument in ufuncs (:pr:8253) mihir_
  • Make visualize more consistent with compute (:pr:8328) JSKenyon_

Bug Fixes ^^^^^^^^^

  • Fix map_blocks not using own arguments in name generation (:pr:8462) David Hoese_
  • Fix for index error with reading empty parquet file (:pr:8410) Sarah Charlotte Johnson_
  • Fix nullable-dtype error when writing partitioned parquet data (:pr:8400) Richard (Rick) Zamora_
  • Fix CSV header bug (:pr:8413) Richard (Rick) Zamora_
  • Fix exception caused by empty chunk in nanmin/nanmax (:pr:8375) Boaz Mohar_

Deprecations ^^^^^^^^^^^^

  • Deprecate token keyword argument to map_blocks (:pr:8464) James Bourbeau_
  • Deprecation warning for default value of boundary kwarg in map_overlap (:pr:8397) Genevieve Buckley_

Documentation ^^^^^^^^^^^^^

  • Clarify block_info documentation (:pr:8425) Genevieve Buckley_
  • Output from alt text sprint (:pr:8456) Sarah Charlotte Johnson_
  • Update talks and presentations (:pr:8370) Naty Clementi_
  • Update Anaconda link in "Paid support" section of docs (:pr:8427) Martin Durant_
  • Fixed broken dask-gateway link in ecosystem.rst (:pr:8424) ofirr_
  • Fix CuPy doctest error (:pr:8412) Genevieve Buckley_

Maintenance ^^^^^^^^^^^

  • Bump Bokeh min version to 2.1.1 (:pr:8431) Bryan Van de Ven_
  • Fix following fsspec=2021.11.1 release (:pr:8428) Martin Durant_
  • Add dask/ml.py to pytest exclude list (:pr:8414) Genevieve Buckley_
  • Update gpuCI RAPIDS_VER to 22.02 (:pr:8394)
  • Unpin graphviz and improve package management in environment-3.7 (:pr:8411) Julia Signell_

.. _v2021.11.2:

2021.11.2

Released on November 19, 2021

  • Only run gpuCI bump script daily (:pr:8404) Charles Blackmon-Luca_
  • Actually ignore index when asked in assert_eq (:pr:8396) Gabe Joseph_
  • Ensure single-partition join divisions is tuple (:pr:8389) Charles Blackmon-Luca_
  • Try to make divisions behavior clearer (:pr:8379) Julia Signell_
  • Fix typo in set_index partition_size parameter description (:pr:8384) FredericOdermatt_
  • Use blockwise in single_partition_join (:pr:8341) Gabe Joseph_
  • Use more explicit keyword arguments (:pr:8354) Boaz Mohar_
  • Fix .loc of DataFrame with nullable boolean dtype (:pr:8368) Marco Rossi_
  • Parameterize shuffle implementation in tests (:pr:8250) Ian Rose_
  • Remove some doc build warnings (:pr:8369) Boaz Mohar_
  • Include properties in array API docs (:pr:8356) Julia Signell_
  • Fix Zarr for upstream (:pr:8367) Julia Signell_
  • Pin graphviz to avoid issue with windows and Python 3.7 (:pr:8365) Julia Signell_
  • Import graphviz.Digraph from top of module, not from dot (:pr:8363) Julia Signell_

.. _v2021.11.1:

2021.11.1

Released on November 8, 2021

Patch release to update distributed dependency to version 2021.11.1.

.. _v2021.11.0:

2021.11.0

Released on November 5, 2021

  • Fix required_extension behavior in read_parquet (:pr:8351) Richard (Rick) Zamora_
  • Add align_dataframes to map_partitions to broadcast a dataframe passed as an arg (:pr:6628) Julia Signell_
  • Better handling for arrays/series of keys in dask.dataframe.loc (:pr:8254) Julia Signell_
  • Point users to Discourse (:pr:8332) Ian Rose_
  • Add name_function option to to_parquet (:pr:7682) Matthew Powers_
  • Get rid of environment-latest.yml and update to Python 3.9 (:pr:8275) Julia Signell_
  • Require newer s3fs in CI (:pr:8336) James Bourbeau_
  • Groupby Rolling (:pr:8176) Julia Signell_
  • Add more ordering diagnostics to dask.visualize (:pr:7992) Erik Welch_
  • Use HighLevelGraph optimizations for delayed (:pr:8316) Ian Rose_
  • demo_tuples produces malformed HighLevelGraph (:pr:8325) crusaderky_
  • Dask calendar should show events in local time (:pr:8312) Genevieve Buckley_
  • Fix flaky test_interrupt (:pr:8314) crusaderky_
  • Deprecate AxisError (:pr:8305) crusaderky_
  • Fix name of cuDF in extension documentation. (:pr:8311) Vyas Ramasubramani_
  • Add single eq operator (=) to parquet filters (:pr:8300) Ayush Dattagupta_
  • Improve support for Spark output in read_parquet (:pr:8274) Richard (Rick) Zamora_
  • Add dask.ml module (:pr:6384) Matthew Rocklin_
  • CI fixups (:pr:8298) James Bourbeau_
  • Make slice errors match NumPy (:pr:8248) Julia Signell_
  • Fix API docs misrendering with new sphinx theme (:pr:8296) Julia Signell_
  • Replace block property with blockview for array-like operations on blocks (:pr:8242) Davis Bennett_
  • Deprecate file_path and make it possible to save from within a notebook (:pr:8283) Julia Signell_

.. _v2021.10.0:

2021.10.0

Released on October 22, 2021

  • da.store to create well-formed HighLevelGraph (:pr:8261) crusaderky_
  • CI: force nightly pyarrow in the upstream build (:pr:8281) Joris Van den Bossche_
  • Remove chest (:pr:8279) James Bourbeau_
  • Skip doctests if optional dependencies are not installed (:pr:8258) Genevieve Buckley_
  • Update tmpdir and tmpfile context manager docstrings (:pr:8270) Daniel Mesejo-León_
  • Unregister callbacks in doctests (:pr:8276) James Bourbeau_
  • Fix typo in docs (:pr:8277) JoranDox_
  • Stale label GitHub action (:pr:8244) Genevieve Buckley_
  • Client-shutdown method appears twice (:pr:8273) German Shiklov_
  • Add pre-commit to test requirements (:pr:8257) Genevieve Buckley_
  • Refactor read_metadata in fastparquet engine (:pr:8092) Richard (Rick) Zamora_
  • Support Path objects in from_zarr (:pr:8266) Samuel Gaist_
  • Make nested redirects work (:pr:8272) Julia Signell_
  • Set memory_usage to True if verbose is True in info (:pr:8222) Kinshuk Dua_
  • Remove individual API doc pages from sphinx toctree (:pr:8238) James Bourbeau_
  • Ignore whitespace in gufunc signature (:pr:8267) James Bourbeau_
  • Add workflow to update gpuCI (:pr:8215) Charles Blackmon-Luca_
  • DataFrame.head shouldn't warn when there's one partition (:pr:8091) Pankaj Patil_
  • Ignore arrow doctests if pyarrow not installed (:pr:8256) Genevieve Buckley_
  • Fix debugging.html redirect (:pr:8251) James Bourbeau_
  • Fix null sorting for single partition dataframes (:pr:8225) Charles Blackmon-Luca_
  • Fix setup.html redirect (:pr:8249) Florian Jetter_
  • Run pyupgrade in CI (:pr:8246) crusaderky_
  • Fix label typo in upstream CI build (:pr:8237) James Bourbeau_
  • Add support for "dependent" columns in DataFrame.assign (:pr:8086) Suriya Senthilkumar_
  • add NumPy array of Dask keys to Array (:pr:7922) Davis Bennett_
  • Remove unnecessary dask.multiprocessing import in docs (:pr:8240) Ray Bell_
  • Adjust retrieving _max_workers from Executor (:pr:8228) John A Kirkham_
  • Update function signatures in delayed best practices docs (:pr:8231) Vũ Trung Đức_
  • Docs reorganization (:pr:7984) Julia Signell_
  • Fix df.quantile on all missing data (:pr:8129) Julia Signell_
  • Add tokenize.ensure-deterministic config option (:pr:7413) Hristo Georgiev_
  • Use inclusive rather than closed with pandas>=1.4.0 and pd.date_range (:pr:8213) Julia Signell_
  • Add dask-gateway, Coiled, and Saturn-Cloud to list of Dask setup tools (:pr:7814) Kristopher Overholt_
  • Ensure existing futures get passed as deps when serializing HighLevelGraph layers (:pr:8199) Jim Crist-Harif_
  • Make sure that the divisions of the single partition merge is left (:pr:8162) Julia Signell_
  • Refactor read_metadata in pyarrow parquet engines (:pr:8072) Richard (Rick) Zamora_
  • Support negative drop_axis in map_blocks and map_overlap (:pr:8192) Gregory R. Lee_
  • Fix upstream tests (:pr:8205) Julia Signell_
  • Add support for scalar item assignment by Series (:pr:8195) Charles Blackmon-Luca_
  • Add some basic examples to doc strings on dask.bag all, any, count methods (:pr:7630) Nathan Danielsen_
  • Don't have upstream report depend on commit message (:pr:8202) James Bourbeau_
  • Ensure upstream CI cron job runs (:pr:8200) James Bourbeau_
  • Use pytest.param to properly label param-specific GPU tests (:pr:8197) Charles Blackmon-Luca_
  • Add test_set_index to tests run on gpuCI (:pr:8198) Charles Blackmon-Luca_
  • Suppress tmpfile OSError (:pr:8191) James Bourbeau_
  • Use s.isna instead of pd.isna(s) in set_partitions_pre (fix cudf CI) (:pr:8193) Charles Blackmon-Luca_
  • Open an issue for test-upstream failures (:pr:8067) Wallace Reis_
  • Fix to_parquet bug in call to pyarrow.parquet.read_metadata (:pr:8186) Richard (Rick) Zamora_
  • Add handling for null values in sort_values (:pr:8167) Charles Blackmon-Luca_
  • Bump RAPIDS_VER for gpuCI (:pr:8184) Charles Blackmon-Luca_
  • Dispatch walks MRO for lazily registered handlers (:pr:8185) Jim Crist-Harif_
  • Configure SSHCluster instructions (:pr:8181) Ray Bell_
  • Preserve HighLevelGraphs in DataFrame.from_delayed (:pr:8174) Gabe Joseph_
  • Deprecate inplace argument for Dask series renaming (:pr:8136) Marcel Coetzee_
  • Fix rolling for compatibility with pandas > 1.3.0 (:pr:8150) Julia Signell_
  • Raise error when setitem on unknown chunks (:pr:8166) Julia Signell_
  • Include divisions when doing Index.to_series (:pr:8165) Julia Signell_

.. _v2021.09.1:

2021.09.1

Released on September 21, 2021

  • Fix groupby for future pandas (:pr:8151) Julia Signell_
  • Remove warning filters in tests that are no longer needed (:pr:8155) Julia Signell_
  • Add link to diagnostic visualize function in local diagnostic docs (:pr:8157) David Hoese_
  • Add datetime_is_numeric to dataframe.describe (:pr:7719) Julia Signell_
  • Remove references to pd.Int64Index in anticipation of deprecation (:pr:8144) Julia Signell_
  • Use loc if needed for series __getitem__ (:pr:7953) Julia Signell_
  • Specifically ignore warnings on mean for empty slices (:pr:8125) Julia Signell_
  • Skip groupby nunique test for pandas >= 1.3.3 (:pr:8142) Julia Signell_
  • Implement ascending arg for sort_values (:pr:8130) Charles Blackmon-Luca_
  • Replace operator.getitem (:pr:8015) Naty Clementi_
  • Deprecate zero_broadcast_dimensions and homogeneous_deepmap (:pr:8134) SnkSynthesis_
  • Add error if drop_index is negative (:pr:8064) neel iyer_
  • Allow scheduler to be an Executor (:pr:8112) John A Kirkham_
  • Handle asarray/asanyarray cases where like is a dask.Array (:pr:8128) Peter Andreas Entschev_
  • Fix index_col duplication if index_col is type str (:pr:7661) McToel_
  • Add dtype and order to asarray and asanyarray definitions (:pr:8106) Julia Signell_
  • Deprecate dask.dataframe.Series.__contains__ (:pr:7914) Julia Signell_
  • Fix edge case with like-arrays in _wrapped_qr (:pr:8122) Peter Andreas Entschev_
  • Deprecate boundary_slice kwarg: kind for pandas compat (:pr:8037) Julia Signell_

.. _v2021.09.0:

2021.09.0

Released on September 3, 2021

  • Fewer open files (:pr:7303) Julia Signell_
  • Add FileNotFound to expected http errors (:pr:8109) Martin Durant_
  • Add DataFrame.sort_values to API docs (:pr:8107) Benjamin Zaitlen_
  • Change to dask.order: be more eager at times (:pr:7929) Erik Welch_
  • Add pytest color to CI (:pr:8090) James Bourbeau_
  • FIX: make_people works with processes scheduler (:pr:8103) Dahn_
  • Add deep param to DataFrame copy method and restrict it to False (:pr:8068) João Paulo Lacerda_
  • Fix typo in configuration docs (:pr:8104) Robert Hales_
  • Update formatting in DataFrame.query docstring (:pr:8100) James Bourbeau_
  • Un-xfail sparse tests for 0.13.0 release (:pr:8102) James Bourbeau_
  • Add axes property to DataFrame and Series (:pr:8069) Jordan Jensen_
  • Add CuPy support in da.unique (values only) (:pr:8021) Peter Andreas Entschev_
  • Unit tests for sparse.zeros_like (xfailed) (:pr:8093) crusaderky_
  • Add explicit like kwarg support to array creation functions (:pr:8054) Peter Andreas Entschev_
  • Separate Array and DataFrame mindeps builds (:pr:8079) James Bourbeau_
  • Fork out percentile_dispatch to dask.array (:pr:8083) GALI PREM SAGAR_
  • Ensure filepath exists in to_parquet (:pr:8057) James Bourbeau_
  • Update scheduler plugin usage in test_scheduler_highlevel_graph_unpack_import (:pr:8080) James Bourbeau_
  • Add DataFrame.shuffle to API docs (:pr:8076) Martin Fleischmann_
  • Order requirements alphabetically (:pr:8073) John A Kirkham_

.. _v2021.08.1:

2021.08.1

Released on August 20, 2021

  • Add ignore_metadata_file option to read_parquet (pyarrow-dataset and fastparquet support only) (:pr:8034) Richard (Rick) Zamora_
  • Add reference to pytest-xdist in dev docs (:pr:8066) Julia Signell_
  • Include tz in meta from to_datetime (:pr:8000) Julia Signell_
  • CI Infra Docs (:pr:7985) Benjamin Zaitlen_
  • Include invalid DataFrame key in assert_eq check (:pr:8061) James Bourbeau_
  • Use __class__ when creating DataFrames (:pr:8053) Mads R. B. Kristensen_
  • Use development version of distributed in gpuCI build (:pr:7976) James Bourbeau_
  • Ignore whitespace in gufunc signature (:pr:8049) James Bourbeau_
  • Move pandas import and percentile dispatch refactor (:pr:8055) GALI PREM SAGAR_
  • Add colors to represent high level layer types (:pr:7974) Freyam Mehta_
  • Upstream instance fix (:pr:8060) Jacob Tomlinson_
  • Add dask.widgets and migrate HTML reprs to jinja2 (:pr:8019) Jacob Tomlinson_
  • Remove wrap_func_like_safe, not required with NumPy >= 1.17 (:pr:8052) Peter Andreas Entschev_
  • Fix threaded scheduler memory backpressure regression (:pr:8040) David Hoese_
  • Add percentile dispatch (:pr:8029) GALI PREM SAGAR_
  • Use a publicly documented attribute obj in groupby rather than private _selected_obj (:pr:8038) GALI PREM SAGAR_
  • Specify module to import rechunk from (:pr:8039) Illviljan_
  • Use dict to store data for {nan,}arg{min,max} in certain cases (:pr:8014) Peter Andreas Entschev_
  • Fix blocksize description formatting in read_pandas (:pr:8047) Louis Maddox_
  • Fix "point" -> "pointers" typo in docs (:pr:8043) David Chudzicki_

.. _v2021.08.0:

2021.08.0

Released on August 13, 2021

  • Fix to_orc delayed compute behavior (:pr:8035) Richard (Rick) Zamora_
  • Don't convert to low-level task graph in compute_as_if_collection (:pr:7969) James Bourbeau_
  • Fix multifile read for hdf (:pr:8033) Julia Signell_
  • Resolve warning in distributed tests (:pr:8025) James Bourbeau_
  • Update to_orc collection name (:pr:8024) James Bourbeau_
  • Resolve skipfooter problem (:pr:7855) Ross_
  • Raise NotImplementedError for non-indexable arg passed to to_datetime (:pr:7989) Doug Davis_
  • Ensure we error on warnings from distributed (:pr:8002) James Bourbeau_
  • Added dict format in to_bag accessories of DataFrame (:pr:7932) gurunath_
  • Delayed docs indirect dependencies (:pr:8016) aa1371_
  • Add tooltips to graphviz high-level graphs (:pr:7973) Freyam Mehta_
  • Close 2021 User Survey (:pr:8007) Julia Signell_
  • Reorganize CuPy tests into multiple files (:pr:8013) Peter Andreas Entschev_
  • Refactor and Expand Dask-Dataframe ORC API (:pr:7756) Richard (Rick) Zamora_
  • Don't enforce columns if enforce=False (:pr:7916) Julia Signell_
  • Fix map_overlap trimming behavior when drop_axis is not None (:pr:7894) Gregory R. Lee_
  • Mark gpuCI CuPy test as flaky (:pr:7994) Peter Andreas Entschev_
  • Avoid using Delayed in to_csv and to_parquet (:pr:7968) Matthew Rocklin_
  • Removed redundant check_dtypes (:pr:7952) gurunath_
  • Use pytest.warns instead of raises for checking parquet engine deprecation (:pr:7993) Joris Van den Bossche_
  • Bump RAPIDS_VER in gpuCI to 21.10 (:pr:7991) Charles Blackmon-Luca_
  • Add back pyarrow-legacy test coverage for pyarrow>=5 (:pr:7988) Richard (Rick) Zamora_
  • Allow pyarrow>=5 in to_parquet and read_parquet (:pr:7967) Richard (Rick) Zamora_
  • Skip CuPy tests requiring NEP-35 when NumPy < 1.20 is available (:pr:7982) Peter Andreas Entschev_
  • Add tail and head to SeriesGroupby (:pr:7935) Daniel Mesejo-León_
  • Update Zoom link for monthly meeting (:pr:7979) James Bourbeau_
  • Add gpuCI build script (:pr:7966) Charles Blackmon-Luca_
  • Deprecate daily_stock utility (:pr:7949) James Bourbeau_
  • Add distributed.nanny to configuration reference docs (:pr:7955) James Bourbeau_
  • Require NumPy 1.18+ & Pandas 1.0+ (:pr:7939) John A Kirkham_

.. _v2021.07.2:

2021.07.2

Released on July 30, 2021

.. note::

   This is the last release with support for NumPy 1.17 and pandas 0.25. Beginning with the next release, NumPy 1.18 and pandas 1.0 will be the minimum supported versions.

  • Add dask.array SVG to the HTML Repr (:pr:7886) Freyam Mehta_
  • Avoid use of Delayed in to_parquet (:pr:7958) Matthew Rocklin_
  • Temporarily pin pyarrow<5 in CI (:pr:7960) James Bourbeau_
  • Add deprecation warning for top-level ucx and rmm config values (:pr:7956) James Bourbeau_
  • Remove skips from doctests (4 of 6) (:pr:7865) Zhengnan Zhao_
  • Remove skips from doctests (5 of 6) (:pr:7864) Zhengnan Zhao_
  • Adds missing prepend/append functionality to da.diff (:pr:7946) Peter Andreas Entschev_
  • Change graphviz font family to sans (:pr:7931) Freyam Mehta_
  • Fix read-csv name - when path is different, use different name for task (:pr:7942) Julia Signell_
  • Update configuration reference for ucx and rmm changes (:pr:7943) James Bourbeau_
  • Add meta support to __setitem__ (:pr:7940) Peter Andreas Entschev_
  • NEP-35 support for slice_with_int_dask_array (:pr:7927) Peter Andreas Entschev_
  • Unpin fastparquet in CI (:pr:7928) James Bourbeau_
  • Remove skips from doctests (3 of 6) (:pr:7872) Zhengnan Zhao_

.. _v2021.07.1:

2021.07.1

Released on July 23, 2021

  • Make array assert_eq check dtype (:pr:7903) Julia Signell_
  • Remove skips from doctests (6 of 6) (:pr:7863) Zhengnan Zhao_
  • Remove experimental feature warning from actors docs (:pr:7925) Matthew Rocklin_
  • Remove skips from doctests (2 of 6) (:pr:7873) Zhengnan Zhao_
  • Separate out Array and Bag API (:pr:7917) Julia Signell_
  • Implement lazy Array.__iter__ (:pr:7905) Julia Signell_
  • Clean up places where we inadvertently iterate over arrays (:pr:7913) Julia Signell_
  • Add numeric_only kwarg to DataFrame reductions (:pr:7831) Julia Signell_
  • Add pytest marker for GPU tests (:pr:7876) Charles Blackmon-Luca_
  • Add support for histogram2d in dask.array (:pr:7827) Doug Davis_
  • Remove skips from doctests (1 of 6) (:pr:7874) Zhengnan Zhao_
  • Add node size scaling to the Graphviz output for the high level graphs (:pr:7869) Freyam Mehta_
  • Update old Bokeh links (:pr:7915) Bryan Van de Ven_
  • Temporarily pin fastparquet in CI (:pr:7907) James Bourbeau_
  • Add dask.array import to progress bar docs (:pr:7910) Fabian Gebhart_
  • Use separate files for each DataFrame API function and method (:pr:7890) Julia Signell_
  • Fix pyarrow-dataset ordering bug (:pr:7902) Richard (Rick) Zamora_
  • Generalize unique aggregate (:pr:7892) GALI PREM SAGAR_
  • Raise NotImplementedError when using pd.Grouper (:pr:7857) Ruben van de Geer_
  • Add aggregate_files argument to enable multi-file partitions in read_parquet (:pr:7557) Richard (Rick) Zamora_
  • Un-xfail test_daily_stock (:pr:7895) James Bourbeau_
  • Update access configuration docs (:pr:7837) Naty Clementi_
  • Use packaging for version comparisons (:pr:7820) Elliott Sales de Andrade_
  • Handle infinite loops in merge_asof (:pr:7842) gerrymanoim_

.. _v2021.07.0:

2021.07.0

Released on July 9, 2021

  • Include fastparquet in upstream CI build (:pr:7884) James Bourbeau_
  • Blockwise: handle non-string constant dependencies (:pr:7849) Mads R. B. Kristensen_
  • fastparquet now supports new time types, including ns precision (:pr:7880) Martin Durant_
  • Avoid ParquetDataset API when appending in ArrowDatasetEngine (:pr:7544) Richard (Rick) Zamora_
  • Add retry logic to test_shuffle_priority (:pr:7879) Richard (Rick) Zamora_
  • Use strict channel priority in CI (:pr:7878) James Bourbeau_
  • Support nested dask.distributed imports (:pr:7866) Matthew Rocklin_
  • Should check module name only, not the entire directory filepath (:pr:7856) Genevieve Buckley_
  • Updates due to https://github.com/dask/fastparquet/pull/623 (:pr:7875) Martin Durant_
  • da.eye fix for chunks=-1 (:pr:7854) Naty Clementi_
  • Temporarily xfail test_daily_stock (:pr:7858) James Bourbeau_
  • Set priority annotations in SimpleShuffleLayer (:pr:7846) Richard (Rick) Zamora_
  • Blockwise: stringify constant key inputs (:pr:7838) Mads R. B. Kristensen_
  • Allow mixing dask and numpy arrays in @guvectorize (:pr:6863) Julia Signell_
  • Don't sample dict result of a shuffle group when calculating its size (:pr:7834) Florian Jetter_
  • Fix scipy tests (:pr:7841) Julia Signell_
  • Deterministically tokenize datetime.date (:pr:7836) James Bourbeau_
  • Add sample_rows to read_csv-like (:pr:7825) Martin Durant_
  • Fix typo in config.deserialize docstring (:pr:7830) Geoffrey Lentner_
  • Remove warning filter in test_dataframe_picklable (:pr:7822) James Bourbeau_
  • Improvements to histogramdd (for handling inputs that are sequences-of-arrays). (:pr:7634) Doug Davis_
  • Make PY_VERSION private (:pr:7824) James Bourbeau_

.. _v2021.06.2:

2021.06.2

Released on June 22, 2021

  • layers.py compare parts_out with set(self.parts_out) (:pr:7787) Genevieve Buckley_
  • Make check_meta understand pandas dtypes better (:pr:7813) Julia Signell_
  • Remove "Educational Resources" doc page (:pr:7818) James Bourbeau_

.. _v2021.06.1:

2021.06.1

Released on June 18, 2021

  • Replace funding page with 'Supported By' section on dask.org (:pr:7817) James Bourbeau_
  • Add initial deprecation utilities (:pr:7810) James Bourbeau_
  • Enforce dtype conservation in ufuncs that explicitly use dtype= (:pr:7808) Doug Davis_
  • Add Coiled to list of paid support organizations (:pr:7811) Kristopher Overholt_
  • Small tweaks to the HTML repr for Layer & HighLevelGraph (:pr:7812) Genevieve Buckley_
  • Add dark mode support to HLG HTML repr (:pr:7809) Jacob Tomlinson_
  • Remove compatibility entries for old distributed (:pr:7801) Elliott Sales de Andrade_
  • Implementation of HTML repr for HighLevelGraph layers (:pr:7763) Genevieve Buckley_
  • Update default blockwise token to avoid DataFrame column name clash (:pr:6546) James Bourbeau_
  • Use dispatch concat for merge_asof (:pr:7806) Julia Signell_
  • Fix upstream freq tests (:pr:7795) Julia Signell_
  • Use more context managers from the standard library (:pr:7796) James Bourbeau_
  • Simplify skips in parquet tests (:pr:7802) Elliott Sales de Andrade_
  • Remove check for outdated bokeh (:pr:7804) Elliott Sales de Andrade_
  • More test coverage uploads (:pr:7799) James Bourbeau_
  • Remove ImportError catching from dask/__init__.py (:pr:7797) James Bourbeau_
  • Allow DataFrame.join() to take a list of DataFrames to merge with (:pr:7578) Krishan Bhasin_
  • Fix maximum recursion depth exception in dask.array.linspace (:pr:7667) Daniel Mesejo-León_
  • Fix docs links (:pr:7794) Julia Signell_
  • Initial da.select() implementation and test (:pr:7760) Gabriel Miretti_
  • Layers must implement get_output_keys method (:pr:7790) Genevieve Buckley_
  • Don't include or expect freq in divisions (:pr:7785) Julia Signell_
  • A HighLevelGraph abstract layer for map_overlap (:pr:7595) Genevieve Buckley_
  • Always include kwarg name in drop (:pr:7784) Julia Signell_
  • Only rechunk for median if needed (:pr:7782) Julia Signell_
  • Add add_(prefix|suffix) to DataFrame and Series (:pr:7745) tsuga_
  • Move read_hdf to Blockwise (:pr:7625) Richard (Rick) Zamora_
  • Make Layer.get_output_keys officially an abstract method (:pr:7775) Genevieve Buckley_
  • Non-dask-arrays and broadcasting in ravel_multi_index (:pr:7594) Gabe Joseph_
  • Fix for paths ending with "/" in parquet overwrite (:pr:7773) Martin Durant_
  • Fixing calling .visualize() with filename=None (:pr:7740) Freyam Mehta_
  • Generate unique names for SubgraphCallable (:pr:7637) Bruce Merry_
  • Pin fsspec to 2021.5.0 in CI (:pr:7771) James Bourbeau_
  • Evaluate graph lazily if meta is provided in from_delayed (:pr:7769) Florian Jetter_
  • Add meta support for DatetimeTZDtype (:pr:7627) gerrymanoim_
  • Add dispatch label to automatic PR labeler (:pr:7701) James Bourbeau_
  • Fix HDFS tests (:pr:7752) Julia Signell_

.. _v2021.06.0:

2021.06.0

Released on June 4, 2021

  • Remove abstract tokens from graph keys in rewrite_blockwise (:pr:7721) Richard (Rick) Zamora_
  • Ensure correct column order in csv project_columns (:pr:7761) Richard (Rick) Zamora_
  • Renamed inner loop variables to avoid duplication (:pr:7741) Boaz Mohar_
  • Do not return delayed object from to_zarr (:pr:7738) Chris Roat_
  • Array: correct number of outputs in apply_gufunc (:pr:7669) Gabe Joseph_
  • Rewrite da.fromfunction with da.blockwise (:pr:7704) John A Kirkham_
  • Rename make_meta_util to make_meta (:pr:7743) GALI PREM SAGAR_
  • Repartition before shuffle if the requested partitions are fewer than input partitions (:pr:7715) Vibhu Jawa_
  • Blockwise: handle constant key inputs (:pr:7734) Mads R. B. Kristensen_
  • Added raise to apply_gufunc (:pr:7744) Boaz Mohar_
  • Show failing tests summary in CI (:pr:7735) Genevieve Buckley_
  • sizeof sets in Python 3.9 (:pr:7739) Mads R. B. Kristensen_
  • Warn if using pandas datetimelike string in dataframe.__getitem__ (:pr:7749) Julia Signell_
  • Highlight the client.dashboard_link (:pr:7747) Genevieve Buckley_
  • Easier link for subscribing to the Google calendar (:pr:7733) Genevieve Buckley_
  • Automatically show graph visualization in Jupyter notebooks (:pr:7716) Genevieve Buckley_
  • Add autofunction for unify_chunks in API docs (:pr:7730) James Bourbeau_

.. _v2021.05.1:

2021.05.1

Released on May 28, 2021

  • Pandas compatibility (:pr:7712) Julia Signell_
  • Fix optimize_dataframe_getitem bug (:pr:7698) Richard (Rick) Zamora_
  • Update make_meta import in docs (:pr:7713) Benjamin Zaitlen_
  • Implement da.searchsorted (:pr:7696) Tom White_
  • Fix format string in error message (:pr:7706) Jiaming Yuan_
  • Fix read_sql_table returning wrong result for single column loads (:pr:7572) c-thiel_
  • Add slack join link in support.rst (:pr:7679) Naty Clementi_
  • Remove unused alphabet variable (:pr:7700) James Bourbeau_
  • Fix meta creation in case of object (:pr:7586) GALI PREM SAGAR_
  • Add dispatch for union_categoricals (:pr:7699) GALI PREM SAGAR_
  • Consolidate array Dispatch objects (:pr:7505) James Bourbeau_
  • Move DataFrame dispatch.registers to their own file (:pr:7503) Julia Signell_
  • Fix delayed with dataclasses where init=False (:pr:7656) Julia Signell_
  • Allow a column to be named divisions (:pr:7605) Julia Signell_
  • Stack nd array with unknown chunks (:pr:7562) Chris Roat_
  • Promote the 2021 Dask User Survey (:pr:7694) Genevieve Buckley_
  • Fix typo in DataFrame.set_index() (:pr:7691) James Lamb_
  • Cleanup array API reference links (:pr:7684) David Hoese_
  • Accept axis tuple for flip to be consistent with NumPy (:pr:7675) Andrew Champion_
  • Bump pre-commit hook versions (:pr:7676) James Bourbeau_
  • Cleanup to_zarr docstring (:pr:7683) David Hoese_
  • Fix the docstring of read_orc (:pr:7678) Justus Magin_
  • Doc ipyparallel & mpi4py concurrent.futures (:pr:7665) John A Kirkham_
  • Update tests to support CuPy 9 (:pr:7671) Peter Andreas Entschev_
  • Fix some HighLevelGraph documentation inaccuracies (:pr:7662) Mads R. B. Kristensen_
  • Fix spelling in Series getitem error message (:pr:7659) Maisie Marshall_

.. _v2021.05.0:

2021.05.0

Released on May 14, 2021

  • Remove deprecated kind kwarg to comply with pandas 1.3.0 (:pr:7653) Julia Signell_
  • Fix bug in DataFrame column projection (:pr:7645) Richard (Rick) Zamora_
  • Merge global annotations when packing (:pr:7565) Mads R. B. Kristensen_
  • Avoid inplace= in pandas set_categories (:pr:7633) James Bourbeau_
  • Change the active-fusion default to False for Dask-Dataframe (:pr:7620) Richard (Rick) Zamora_
  • Array: remove extraneous code from RandomState (:pr:7487) Gabe Joseph_
  • Implement str.concat when others=None (:pr:7623) Daniel Mesejo-León_
  • Fix dask.dataframe in sandboxed environments (:pr:7601) Noah D. Brenowitz_
  • Support for cupyx.scipy.linalg (:pr:7563) Benjamin Zaitlen_
  • Move timeseries and daily-stock to Blockwise (:pr:7615) Richard (Rick) Zamora_
  • Fix bugs in broadcast join (:pr:7617) Richard (Rick) Zamora_
  • Use Blockwise for DataFrame IO (parquet, csv, and orc) (:pr:7415) Richard (Rick) Zamora_
  • Adding chunk & type information to Dask HighLevelGraphs (:pr:7309) Genevieve Buckley_
  • Add pyarrow sphinx intersphinx_mapping (:pr:7612) Ray Bell_
  • Remove skip on test freq (:pr:7608) Julia Signell_
  • Defaults in read_parquet parameters (:pr:7567) Ray Bell_
  • Remove ignore_abc_warning (:pr:7606) Julia Signell_
  • Harden DataFrame merge between column-selection and index (:pr:7575) Richard (Rick) Zamora_
  • Get rid of ignore_abc decorator (:pr:7604) Julia Signell_
  • Remove kwarg validation for bokeh (:pr:7597) Julia Signell_
  • Add loky example (:pr:7590) Naty Clementi_
  • Delayed: nout when arguments become tasks (:pr:7593) Gabe Joseph_
  • Update distributed version in mindep CI build (:pr:7602) James Bourbeau_
  • Support all or no overlap between partition columns and real columns (:pr:7541) Richard (Rick) Zamora_

.. _v2021.04.1:

2021.04.1

Released on April 23, 2021

  • Handle Blockwise HLG pack/unpack for concatenate=True (:pr:7455) Richard (Rick) Zamora_
  • map_partitions: use tokenized info as name of the SubgraphCallable (:pr:7524) Mads R. B. Kristensen_
  • Using tmp_path and tmpdir to avoid temporary files and directories hanging in the repo (:pr:7592) Naty Clementi_
  • Contributing to docs (development guide) (:pr:7591) Naty Clementi_
  • Add more packages to Python 3.9 CI build (:pr:7588) James Bourbeau_
  • Array: Fix NEP-18 dispatching in finalize (:pr:7508) Gabe Joseph_
  • Misc fixes for numpydoc (:pr:7569) Matthias Bussonnier_
  • Avoid pandas level= keyword deprecation (:pr:7577) James Bourbeau_
  • Map e.g. .repartition(freq="M") to .repartition(freq="MS") (:pr:7504) Ruben van de Geer_
  • Remove hash seeding in parallel CI runs (:pr:7128) Elliott Sales de Andrade_
  • Add defaults in parameters in to_parquet (:pr:7564) Ray Bell_
  • Simplify transpose axes cleanup (:pr:7561) Julia Signell_
  • Make the ValueError for len(index_names) > 1 explicit that it is using fastparquet (:pr:7556) Ray Bell_
  • Fix dict-column appending for pyarrow parquet engines (:pr:7527) Richard (Rick) Zamora_
  • Add a documentation auto label (:pr:7560) Doug Davis_
  • Add dask.delayed.Delayed to docs so it can be referenced by other sphinx docs (:pr:7559) Doug Davis_
  • Fix upstream idxmaxmin for uneven split_every (:pr:7538) Julia Signell_
  • Make normalize_token for pandas Series/DataFrame future proof (no direct block access) (:pr:7318) Joris Van den Bossche_
  • Redesigned __setitem__ implementation (:pr:7393) David Hassell_
  • histogram, histogramdd improvements (docs; return consistencies) (:pr:7520) Doug Davis_
  • Force nightly pyarrow in the upstream build (:pr:7530) Joris Van den Bossche_
  • Fix Configuration Reference (:pr:7533) Benjamin Zaitlen_
  • Use .to_parquet on dask.dataframe in doc string (:pr:7528) Ray Bell_
  • Avoid double msgpack serialization of HLGs (:pr:7525) Mads R. B. Kristensen_
  • Encourage usage of yaml.safe_load() in configuration doc (:pr:7529) Hristo Georgiev_
  • Fix reshape bug. Add relevant test. Fixes #7171. (:pr:7523) JSKenyon_
  • Support custom_metadata= argument in to_parquet (:pr:7359) Richard (Rick) Zamora_
  • Clean some documentation warnings (:pr:7518) Daniel Mesejo-León_
  • Getting rid of more docs warnings (:pr:7426) Julia Signell_
  • Added product (alias of prod) (:pr:7517) Freyam Mehta_
  • Fix upstream __array_ufunc__ tests (:pr:7494) Julia Signell_
  • Escape from map_overlap to map_blocks if depth is zero (:pr:7481) Genevieve Buckley_
  • Add check_type to array assert_eq (:pr:7491) Julia Signell_

.. _v2021.04.0:

2021.04.0

Released on April 2, 2021

  • Adding support for multidimensional histograms with dask.array.histogramdd (:pr:7387) Doug Davis_
  • Update docs on number of threads and workers in default LocalCluster (:pr:7497) cameron16_
  • Add labels automatically when certain files are touched in a PR (:pr:7506) Julia Signell_
  • Extract ignore_order from kwargs (:pr:7500) GALI PREM SAGAR_
  • Only provide installation instructions when distributed is missing (:pr:7498) Matthew Rocklin_
  • Start adding isort (:pr:7370) Julia Signell_
  • Add ignore_order parameter in dd.concat (:pr:7473) Daniel Mesejo-León_
  • Use powers-of-two when displaying RAM (:pr:7484) crusaderky_
  • Added License Classifier (:pr:7485) Tom Augspurger_
  • Replace conda with mamba (:pr:7227) crusaderky_
  • Fix typo in array docs (:pr:7478) James Lamb_
  • Use concurrent.futures in local scheduler (:pr:6322) John A Kirkham_
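
The powers-of-two entry above (:pr:7484) refers to binary prefixes, where 1 KiB is 1024 bytes rather than 1000. The following is a minimal sketch of that convention only; ``format_bytes_binary`` is a hypothetical helper written for illustration and is not the function Dask ships.

```python
def format_bytes_binary(n: int) -> str:
    """Format a byte count using binary (power-of-two) prefixes."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit where the value drops below 1024,
        # or at the largest unit this sketch knows about.
        if value < 1024 or unit == units[-1]:
            if unit == "B":
                return f"{int(value)} B"
            return f"{value:.2f} {unit}"
        value /= 1024


print(format_bytes_binary(512))        # 512 B
print(format_bytes_binary(1024))       # 1.00 KiB
print(format_bytes_binary(8 * 2**30))  # 8.00 GiB
```

With decimal prefixes the last value would read as roughly 8.59 GB, which is why the choice of base matters when reporting worker memory.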

.. _v2021.03.1:

2021.03.1

Released on March 26, 2021

  • Add a dispatch for is_categorical_dtype to handle non-pandas objects (:pr:7469) brandon-b-miller_
  • Use multiprocessing.Pool in test_read_text (:pr:7472) John A Kirkham_
  • Add missing meta kwarg to gufunc class (:pr:7423) Peter Andreas Entschev_
  • Example for memory-mapped Dask array (:pr:7380) Dieter Weber_
  • Fix NumPy upstream failures; xfail pandas and fastparquet failures (:pr:7441) Julia Signell_
  • Fix bug in repartition with freq (:pr:7357) Ruben van de Geer_
  • Fix __array_function__ dispatching for tril/triu (:pr:7457) Peter Andreas Entschev_
  • Use concurrent.futures.Executors in a few tests (:pr:7429) John A Kirkham_
  • Require NumPy >=1.16 (:pr:7383) crusaderky_
  • Minor sort_values housekeeping (:pr:7462) Ryan Williams_
  • Ensure natural sort order in parquet part paths (:pr:7249) Ryan Williams_
  • Remove global env mutation upon running test_config.py (:pr:7464) Hristo Georgiev_
  • Update NumPy intersphinx URL (:pr:7460) Gabe Joseph_
  • Add rot90 (:pr:7440) Trevor Manz_
  • Update docs for required package for endpoint (:pr:7454) Nick Vazquez_
  • Master -> main in slice_array docstring (:pr:7453) Gabe Joseph_
  • Expand dask.utils.is_arraylike docstring (:pr:7445) Doug Davis_
  • Simplify BlockwiseIODeps importing (:pr:7420) Richard (Rick) Zamora_
  • Update layer annotation packing method (:pr:7430) James Bourbeau_
  • Drop duplicate test in test_describe_empty (:pr:7431) John A Kirkham_
  • Add Series.dot method to dataframe module (:pr:7236) Madhu94_
  • Added df kurtosis-method and testing (:pr:7273) Jan Borchmann_
  • Avoid quadratic-time performance for HLG culling (:pr:7403) Bruce Merry_
  • Temporarily skip problematic sparse test (:pr:7421) James Bourbeau_
  • Update some CI workflow names (:pr:7422) James Bourbeau_
  • Fix HDFS test (:pr:7418) Julia Signell_
  • Make changelog subtitles match the hierarchy (:pr:7419) Julia Signell_
  • Add support for normalize in value_counts (:pr:7342) Julia Signell_
  • Avoid unnecessary imports for HLG Layer unpacking and materialization (:pr:7381) Richard (Rick) Zamora_
  • Bincount fix slicing (:pr:7391) Genevieve Buckley_
  • Add sliding_window_view (:pr:7234) Deepak Cherian_
  • Fix typo in docs/source/develop.rst (:pr:7414) Hristo Georgiev_
  • Switch documentation builds for PRs to readthedocs (:pr:7397) James Bourbeau_
  • Adds sort_values to dask.DataFrame (:pr:7286) gerrymanoim_
  • Pin sqlalchemy<1.4.0 in CI (:pr:7405) James Bourbeau_
  • Comment fixes (:pr:7215) Ryan Williams_
  • Dead code removal / fixes (:pr:7388) Ryan Williams_
  • Use single thread for pa.Table.from_pandas calls (:pr:7347) Richard (Rick) Zamora_
  • Replace 'container' with 'image' (:pr:7389) James Lamb_
  • DOC hyperlink repartition (:pr:7394) Ray Bell_
  • Pass delimiter to fsspec in bag.read_text (:pr:7349) Martin Durant_
  • Update read_hdf default mode to "r" (:pr:7039) rs9w33_
  • Embed literals in SubgraphCallable when packing Blockwise (:pr:7353) Mads R. B. Kristensen_
  • Update test_hdf.py to not reuse file handlers (:pr:7044) rs9w33_
  • Require additional dependencies: cloudpickle, partd, fsspec, toolz (:pr:7345) Julia Signell_
  • Prepare Blockwise + IO infrastructure (:pr:7281) Richard (Rick) Zamora_
  • Remove duplicated imports from test_slicing.py (:pr:7365) Hristo Georgiev_
  • Add test deps for pip development (:pr:7360) Julia Signell_
  • Support int slicing for non-NumPy arrays (:pr:7364) Peter Andreas Entschev_
  • Automatically cancel previous CI builds (:pr:7348) James Bourbeau_
  • dask.array.asarray should handle case where xarray class is in top-level namespace (:pr:7335) Tom White_
  • HighLevelGraph length without materializing layers (:pr:7274) Gabe Joseph_
  • Drop support for Python 3.6 (:pr:7006) James Bourbeau_
  • Fix fsspec usage in create_metadata_file (:pr:7295) Richard (Rick) Zamora_
  • Change default branch from master to main (:pr:7198) Julia Signell_
  • Add Xarray to CI software environment (:pr:7338) James Bourbeau_
  • Update repartition argument name in error text (:pr:7336) Eoin Shanaghy_
  • Run upstream tests based on commit message (:pr:7329) James Bourbeau_
  • Use pytest.register_assert_rewrite on util modules (:pr:7278) Bruce Merry_
  • Add example on using specific chunk sizes in from_array() (:pr:7330) James Lamb_
  • Move NumPy skip into test (:pr:7247) Julia Signell_
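
The natural-sort entry for parquet part paths (:pr:7249) addresses a common pitfall: plain lexicographic sorting places ``part.10.parquet`` before ``part.2.parquet``. The sketch below shows one way to obtain natural ordering with the standard library; ``natural_key`` is a hypothetical name used here for illustration, and the file names are made up.

```python
import re


def natural_key(path: str) -> list:
    """Split a string into text and integer pieces so digits sort numerically."""
    return [
        int(piece) if piece.isdigit() else piece
        for piece in re.split(r"(\d+)", path)
    ]


paths = ["part.10.parquet", "part.2.parquet", "part.1.parquet"]

print(sorted(paths))
# ['part.1.parquet', 'part.10.parquet', 'part.2.parquet']
print(sorted(paths, key=natural_key))
# ['part.1.parquet', 'part.2.parquet', 'part.10.parquet']
```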

.. _v2021.03.0:

2021.03.0

Released on March 5, 2021

.. note::

This is the first release with support for Python 3.9 and the
last release with support for Python 3.6.

  • Bump minimum version of distributed (:pr:7328) James Bourbeau_
  • Fix percentiles_summary with dask_cudf (:pr:7325) Peter Andreas Entschev_
  • Temporarily revert recent Array.__setitem__ updates (:pr:7326) James Bourbeau_
  • Blockwise.clone (:pr:7312) crusaderky_
  • NEP-35 duck array update (:pr:7321) James Bourbeau_
  • Don't allow setting .name for array (:pr:7222) Julia Signell_
  • Use nearest interpolation for creating percentiles of integer input (:pr:7305) Kyle Barron_
  • Test exp with CuPy arrays (:pr:7322) John A Kirkham_
  • Check that computed chunks have right size and dtype (:pr:7277) Bruce Merry_
  • pytest.mark.flaky (:pr:7319) crusaderky_
  • Contributing docs: add note to pull the latest git tags before pip installing Dask (:pr:7308) Genevieve Buckley_
  • Support for Python 3.9 (:pr:7289) crusaderky_
  • Add broadcast-based merge implementation (:pr:7143) Richard (Rick) Zamora_
  • Add split_every to graph_manipulation (:pr:7282) crusaderky_
  • Typo in optimize docs (:pr:7306) Julius Busecke_
  • dask.graph_manipulation support for xarray.Dataset (:pr:7276) crusaderky_
  • Add plot width and height support for Bokeh 2.3.0 (:pr:7297) James Bourbeau_
  • Add NumPy functions tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from (:pr:6997) Illviljan_
  • Remove "cleanup" task in DataFrame on-disk shuffle (:pr:7260) Sinclair Target_
  • Use development version of distributed in CI (:pr:7279) James Bourbeau_
  • Moving high level graph pack/unpack to Dask (:pr:7179) Mads R. B. Kristensen_
  • Improve performance of merge_percentiles (:pr:7172) Ashwin Srinath_
  • DOC: add dask-sql and fugue (:pr:7129) Ray Bell_
  • Example for working with categoricals and parquet (:pr:7085) McToel_
  • Adds tree reduction to bincount (:pr:7183) Thomas J. Fan_
  • Improve documentation of name in from_array (:pr:7264) Bruce Merry_
  • Fix cumsum for empty partitions (:pr:7230) Julia Signell_
  • Add map_blocks example to dask array creation docs (:pr:7221) Julia Signell_
  • Fix performance issue in dask.graph_manipulation.wait_on() (:pr:7258) crusaderky_
  • Replace coveralls with codecov.io (:pr:7246) crusaderky_
  • Pin to a particular black rev in pre-commit (:pr:7256) Julia Signell_
  • Minor typo in documentation: array-chunks.rst (:pr:7254) Magnus Nord_
  • Fix bugs in Blockwise and ShuffleLayer (:pr:7213) Richard (Rick) Zamora_
  • Fix parquet filtering bug for "pyarrow-dataset" with pyarrow-3.0.0 (:pr:7200) Richard (Rick) Zamora_
  • graph_manipulation without NumPy (:pr:7243) crusaderky_
  • Support for NEP-35 (:pr:6738) Peter Andreas Entschev_
  • Avoid running unit tests during doctest CI build (:pr:7240) James Bourbeau_
  • Run doctests on CI (:pr:7238) Julia Signell_
  • Cleanup code quality on set arithmetics (:pr:7196) crusaderky_
  • Add dask.array.delete (:pr:7125) Julia Signell_
  • Unpin graphviz now that new conda-forge recipe is built (:pr:7235) Julia Signell_
  • Don't use NumPy 1.20 from conda-forge on Mac (:pr:7211) crusaderky_
  • map_overlap: Don't rechunk axes without overlap (:pr:7233) Deepak Cherian_
  • Pin graphviz to avoid issue with latest conda-forge build (:pr:7232) Julia Signell_
  • Use html_css_files in docs for custom CSS (:pr:7220) James Bourbeau_
  • Graph manipulation: clone, bind, checkpoint, wait_on (:pr:7109) crusaderky_
  • Fix handling of filter expressions in parquet pyarrow-dataset engine (:pr:7186) Joris Van den Bossche_
  • Extend __setitem__ to more closely match numpy (:pr:7033) David Hassell_
  • Clean up Python 2 syntax (:pr:7195) crusaderky_
  • Fix regression in Delayed._length (:pr:7194) crusaderky_
  • __dask_layers__() tests and tweaks (:pr:7177) crusaderky_
  • Properly convert HighLevelGraph in multiprocessing scheduler (:pr:7191) Jim Crist-Harif_
  • Don't fail fast in CI (:pr:7188) James Bourbeau_

.. _v2021.02.0:

2021.02.0

Released on February 5, 2021

  • Add percentile support for NEP-35 (:pr:7162) Peter Andreas Entschev_
  • Added support for Float64 in column assignment (:pr:7173) Nils Braun_
  • Coarsen rechunking error (:pr:7127) Davis Bennett_
  • Fix upstream CI tests (:pr:6896) Julia Signell_
  • Revise HighLevelGraph Mapping API (:pr:7160) crusaderky_
  • Update low-level graph spec to use any hashable for keys (:pr:7163) James Bourbeau_
  • Generically rebuild a collection with different keys (:pr:7142) crusaderky_
  • Make it easier to link issues in PRs (:pr:7130) Ray Bell_
  • Add dask.array.append (:pr:7146) D-Stacks_
  • Allow dask.array.ravel to accept array_like argument (:pr:7138) D-Stacks_
  • Fixes link in array design doc (:pr:7152) Thomas J. Fan_
  • Fix example of using blockwise for an outer product (:pr:7119) Bruce Merry_
  • Deprecate HighlevelGraph.dicts in favor of .layers (:pr:7145) Amit Kumar_
  • Align FastParquetEngine with pyarrow engines (:pr:7091) Richard (Rick) Zamora_
  • Merge annotations (:pr:7102) Ian Rose_
  • Simplify contents of parts list in read_parquet (:pr:7066) Richard (Rick) Zamora_
  • check_meta(): use __class__ when checking DataFrame types (:pr:7099) Mads R. B. Kristensen_
  • Cache several properties (:pr:7104) Illviljan_
  • Fix parquet getitem optimization (:pr:7106) Richard (Rick) Zamora_
  • Add cytoolz back to CI environment (:pr:7103) James Bourbeau_

.. _v2021.01.1:

2021.01.1

Released on January 22, 2021

  • Partially fix cumprod (:pr:7089) Julia Signell_
  • Test pandas 1.1.x / 1.2.0 releases and pandas nightly (:pr:6996) Joris Van den Bossche_
  • Use assign to avoid SettingWithCopyWarning (:pr:7092) Julia Signell_
  • 'mode' argument passed to bokeh.output_file() (:pr:7034) (:pr:7075) patquem_
  • Skip empty partitions when doing groupby.value_counts (:pr:7073) Julia Signell_
  • Add error messages to assert_eq() (:pr:7083) James Lamb_
  • Make cached properties read-only (:pr:7077) Illviljan_
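
For context on the read-only cached properties entry (:pr:7077): ``functools.cached_property`` is a non-data descriptor, so a plain assignment silently replaces the cached value. One possible way to reject assignment is to add a ``__set__`` method, as sketched below. The class names ``readonly_cached_property`` and ``Grid`` are illustrative assumptions, not names taken from the changelog.

```python
import functools


class readonly_cached_property(functools.cached_property):
    """A cached property that rejects assignment."""

    def __set__(self, instance, value):
        # Defining __set__ makes this a data descriptor, so plain
        # assignment is intercepted here instead of overwriting the cache.
        raise AttributeError("can't set attribute")


class Grid:
    def __init__(self, chunks):
        self.chunks = chunks
        self.calls = 0  # counts how often shape is actually computed

    @readonly_cached_property
    def shape(self):
        self.calls += 1
        return tuple(sum(c) for c in self.chunks)


g = Grid(((2, 2), (3,)))
print(g.shape, g.shape, g.calls)  # (4, 3) (4, 3) 1
try:
    g.shape = (1, 1)
except AttributeError as exc:
    print("rejected:", exc)
```

The value is still computed only once; the trade-off is that every access now goes through the descriptor rather than reading the instance dictionary directly.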

.. _v2021.01.0:

2021.01.0

Released on January 15, 2021

  • map_partitions with review comments (:pr:6776) Kumar Bharath Prabhu_
  • Make sure that population is a real list (:pr:7027) Julia Signell_
  • Propagate storage_options in read_csv (:pr:7074) Richard (Rick) Zamora_
  • Remove all BlockwiseIO code (:pr:7067) Richard (Rick) Zamora_
  • Fix CI (:pr:7069) James Bourbeau_
  • Add option to control rechunking in reshape (:pr:6753) Tom Augspurger_
  • Fix linalg.lstsq for complex inputs (:pr:7056) Johnnie Gray_
  • Add compression='infer' default to read_csv (:pr:6960) Richard (Rick) Zamora_
  • Revert parameter changes in svd_compressed #7003 (:pr:7004) Eric Czech_
  • Skip failing s3 test (:pr:7064) Martin Durant_
  • Revert BlockwiseIO (:pr:7048) Richard (Rick) Zamora_
  • Add some cross-references to DataFrame.to_bag() and Series.to_bag() (:pr:7049) Rob Malouf_
  • Rewrite matmul as blockwise without contraction/concatenate (:pr:7000) Rafal Wojdyla_
  • Use functools.cached_property in da.shape (:pr:7023) Illviljan_
  • Use meta value in series non_empty (:pr:6976) Julia Signell_
  • Revert "Temporarily pin sphinx version to 3.3.1 (:pr:7002)" (:pr:7014) Rafal Wojdyla_
  • Revert python-graphviz pinning (:pr:7037) Julia Signell_
  • Accidentally committed print statement (:pr:7038) Julia Signell_
  • Pass dropna and observed in agg (:pr:6992) Julia Signell_
  • Add index to meta after .str.split with expand (:pr:7026) Ruben van de Geer_
  • CI: test pyarrow 2.0 and nightly (:pr:7030) Joris Van den Bossche_
  • Temporarily pin python-graphviz in CI (:pr:7031) James Bourbeau_
  • Underline section in numpydoc (:pr:7013) Matthias Bussonnier_
  • Keep normal optimizations when adding custom optimizations (:pr:7016) Matthew Rocklin_
  • Temporarily pin sphinx version to 3.3.1 (:pr:7002) Rafal Wojdyla_
  • DOC: Misc formatting (:pr:6998) Matthias Bussonnier_
  • Add inline_array option to from_array (:pr:6773) Tom Augspurger_
  • Revert "Initial pass at blockwise array creation routines (:pr:6931)" (:pr:6995) James Bourbeau_
  • Set npartitions in set_index (:pr:6978) Julia Signell_
  • Upstream config serialization and inheritance (:pr:6987) Jacob Tomlinson_
  • Bump the minimum time in test_minimum_time (:pr:6988) Martin Durant_
  • Fix pandas dtype inference for read_parquet (:pr:6985) Richard (Rick) Zamora_
  • Avoid data loss in set_index with sorted=True (:pr:6980) Richard (Rick) Zamora_
  • Bugfix in read_parquet for handling un-named indices with index=False (:pr:6969) Richard (Rick) Zamora_
  • Use __class__ when comparing meta data (:pr:6981) Mads R. B. Kristensen_
  • Comparing string versions won't always work (:pr:6979) Rafal Wojdyla_
  • Fix :pr:6925 (:pr:6982) sdementen_
  • Initial pass at blockwise array creation routines (:pr:6931) Ian Rose_
  • Simplify has_parallel_type() (:pr:6927) Mads R. B. Kristensen_
  • Handle annotation unpacking in BlockwiseIO (:pr:6934) Simon Perkins_
  • Avoid deprecated yield_fixture in test_sql.py (:pr:6968) Richard (Rick) Zamora_
  • Remove bad graph logic in BlockwiseIO (:pr:6933) Richard (Rick) Zamora_
  • Get config item if variable is None (:pr:6862) Jacob Tomlinson_
  • Update from_pandas docstring (:pr:6957) Richard (Rick) Zamora_
  • Prevent fuse_roots from clobbering annotations (:pr:6955) Simon Perkins_
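
The entry "Comparing string versions won't always work" (:pr:6979) points at a general Python pitfall, illustrated below with made-up version numbers. ``version_tuple`` is a hypothetical helper that only handles purely numeric dotted versions; real code would more likely rely on a dedicated version-parsing library.

```python
def version_tuple(version: str) -> tuple:
    """Turn a purely numeric dotted version such as '1.10.0' into (1, 10, 0)."""
    return tuple(int(part) for part in version.split("."))


# Lexicographic comparison looks at characters, so "1" < "9" decides it.
print("1.10.0" >= "1.9.0")                                # False
# Comparing integer tuples gives the intended ordering.
print(version_tuple("1.10.0") >= version_tuple("1.9.0"))  # True
```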

.. _v2020.12.0:

2020.12.0

Released on December 10, 2020

Highlights ^^^^^^^^^^

  • Switched to CalVer <https://calver.org/>_ as the versioning scheme.
  • Introduced new APIs for HighLevelGraph to enable sending high-level representations of task graphs to the distributed scheduler.
  • Introduced new HighLevelGraph layer objects including BasicLayer, Blockwise, BlockwiseIO, ShuffleLayer, and more.
  • Added support for applying custom Layer-level annotations like priority, retries, etc. with the dask.annotations context manager.
  • Updated minimum supported version of pandas to 0.25.0 and NumPy to 1.15.1.
  • Support for the pyarrow.dataset API to read_parquet.
  • Several fixes to Dask Array's SVD.

All changes ^^^^^^^^^^^

  • Make observed kwarg optional (:pr:6952) Julia Signell_
  • Min supported pandas 0.25.0 numpy 1.15.1 (:pr:6895) Julia Signell_
  • Make order of categoricals unambiguous (:pr:6949) Julia Signell_
  • Improve "pyarrow-dataset" statistics performance for read_parquet (:pr:6918) Richard (Rick) Zamora_
  • Add observed keyword to groupby (:pr:6854) Julia Signell_
  • Make sure include_path_column works when there are multiple partitions per file (:pr:6911) Julia Signell_
  • Fix: array.overlap and array.map_overlap block sizes are incorrect when depth is an unsigned bit type (:pr:6909) GFleishman_
  • Fix syntax error in HLG docs example (:pr:6946) Mark_
  • Return a Bag from sample (:pr:6941) Shang Wang_
  • Add ravel_multi_index (:pr:6939) Illviljan_
  • Enable parquet metadata collection in parallel (:pr:6921) Richard (Rick) Zamora_
  • Avoid using _file in progressbar if it is None (:pr:6938) Mark Harfouche_
  • Add Zarr to upstream CI build (:pr:6932) James Bourbeau_
  • Introduce BlockwiseIO layer (:pr:6878) Richard (Rick) Zamora_
  • Transmit Layer Annotations to Scheduler (:pr:6889) Simon Perkins_
  • Update opportunistic caching page to remove experimental warning (:pr:6926) Timost_
  • Allow pyarrow >2.0.0 (:pr:6772) Richard (Rick) Zamora_
  • Support pyarrow.dataset API for read_parquet (:pr:6534) Richard (Rick) Zamora_
  • Add more informative error message to da.coarsen when coarsening factors do not divide shape (:pr:6908) Davis Bennett_
  • Only run the cron CI on dask/dask not forks (:pr:6905) Jacob Tomlinson_
  • Add annotations to ShuffleLayers (:pr:6913) Matthew Rocklin_
  • Temporarily xfail test_from_s3 (:pr:6915) James Bourbeau_
  • Added dataframe skew method (:pr:6881) Jan Borchmann_
  • Fix dtype in array meta (:pr:6893) Julia Signell_
  • Missing name arg in helm install ... (:pr:6903) Ruben van de Geer_
  • Fix: exception when reading an item with filters (:pr:6901) Martin Durant_
  • Add support for cupyx sparse to dask.array.dot (:pr:6846) Akira Naruse_
  • Pin array mindeps up a bit to get the tests to pass [test-mindeps] (:pr:6894) Julia Signell_
  • Update/remove pandas and numpy in mindeps (:pr:6888) Julia Signell_
  • Fix ArrowEngine bug in use of clear_known_categories (:pr:6887) Richard (Rick) Zamora_
  • Fix documentation about task scheduler (:pr:6879) Zhengnan Zhao_
  • Add human relative time formatting utility (:pr:6883) Jacob Tomlinson_
  • Possible fix for 6864 set_index issue (:pr:6866) Richard (Rick) Zamora_
  • BasicLayer: remove dependency arguments (:pr:6859) Mads R. B. Kristensen_
  • Serialization of Blockwise (:pr:6848) Mads R. B. Kristensen_
  • Address columns=[] bug (:pr:6871) Richard (Rick) Zamora_
  • Avoid duplicate parquet schema communication (:pr:6841) Richard (Rick) Zamora_
  • Add create_metadata_file utility for existing parquet datasets (:pr:6851) Richard (Rick) Zamora_
  • Improve ordering for workloads with a common terminus (:pr:6779) Tom Augspurger_
  • Stringify utilities (:pr:6852) Mads R. B. Kristensen_
  • Add keyword overwrite=True to to_parquet to remove dangling files when overwriting a pyarrow Dataset. (:pr:6825) Greg Hayes_
  • Removed map_tasks() and map_basic_layers() (:pr:6853) Mads R. B. Kristensen_
  • Introduce QR iteration to svd_compressed (:pr:6813) RogerMoens_
  • __dask_distributed_pack__() now takes a client argument (:pr:6850) Mads R. B. Kristensen_
  • Use map_partitions instead of delayed in set_index (:pr:6837) Mads R. B. Kristensen_
  • Add doc hint for as_completed().update(futures) (:pr:6817) manuels_
  • Bump GHA setup-miniconda version (:pr:6847) Jacob Tomlinson_
  • Remove nans when setting sorted index (:pr:6829) Rockwell Weiner_
  • Fix transpose of u in SVD (:pr:6799) RogerMoens_
  • Migrate to GitHub Actions (:pr:6794) Jacob Tomlinson_
  • Fix sphinx currentmodule usage (:pr:6839) James Bourbeau_
  • Fix minimum dependencies CI builds (:pr:6838) James Bourbeau_
  • Avoid graph materialization during Blockwise culling (:pr:6815) Richard (Rick) Zamora_
  • Fixed typo (:pr:6834) Devanshu Desai_
  • Use HighLevelGraph.merge in collections_to_dsk (:pr:6836) Mads R. B. Kristensen_
  • Respect dtype in svd compression_matrix #2849 (:pr:6802) RogerMoens_
  • Add blocksize to task name (:pr:6818) Julia Signell_
  • Check for all-NaN partitions (:pr:6821) Rockwell Weiner_
  • Change "institutional" SQL doc section to point to main SQL doc (:pr:6823) Martin Durant_
  • Fix: DataFrame.join doesn't accept Series as other (:pr:6809) David Katz_
  • Remove to_delayed operations from to_parquet (:pr:6801) Richard (Rick) Zamora_
  • Layer annotation docstrings improvements (:pr:6806) Simon Perkins_
  • Avro reader (:pr:6780) Martin Durant_
  • Rechunk array if smallest chunk size is smaller than depth (:pr:6708) Julia Signell_
  • Add Layer Annotations (:pr:6767) Simon Perkins_
  • Add "view code" links to documentation (:pr:6793) manuels_
  • Add optional IO-subgraph to Blockwise Layers (:pr:6715) Richard (Rick) Zamora_
  • Add high level graph pack/unpack for distributed (:pr:6786) Mads R. B. Kristensen_
  • Add missing methods of the Dataframe API (:pr:6789) Stephannie Jimenez Gacha_
  • Add doc on managing environments (:pr:6778) Martin Durant_
  • HLG: get_all_external_keys() (:pr:6774) Mads R. B. Kristensen_
  • Avoid rechunking in reshape with chunksize=1 (:pr:6748) Tom Augspurger_
  • Try to make categoricals work on join (:pr:6205) Julia Signell_
  • Fix some minor typos and trailing whitespaces in array-slice.rst (:pr:6771) Magnus Nord_
  • Bugfix for parquet metadata writes of empty dataframe partitions (pyarrow) (:pr:6741) Callum Noble_
  • Document meta kwarg in map_blocks and map_overlap. (:pr:6763) Peter Andreas Entschev_
  • Begin experimenting with parallel prefix scan for cumsum and cumprod (:pr:6675) Erik Welch_
  • Clarify differences in boolean indexing between dask and numpy arrays (:pr:6764) Illviljan_
  • Efficient serialization of shuffle layers (:pr:6760) James Bourbeau_
  • Config array optimize to skip fusion and return a HLG (:pr:6751) Mads R. B. Kristensen_
  • Temporarily use pyarrow<2 in CI (:pr:6759) James Bourbeau_
  • Fix meta for min/max reductions (:pr:6736) Peter Andreas Entschev_
  • Add 2D possibility to da.linalg.lstsq - mirroring numpy (:pr:6749) Pascal Bourgault_
  • CI: Fixed bug causing flaky test failure in pivot (:pr:6752) Tom Augspurger_
  • Serialization of layers (:pr:6693) Mads R. B. Kristensen_
  • Add attrs property to Series/Dataframe (:pr:6742) Illviljan_
  • Removed Mutable Default Argument (:pr:6747) Mads R. B. Kristensen_
  • Adjust parquet ArrowEngine to allow easier subclassing for writing (:pr:6505) Joris Van den Bossche_
  • Add ShuffleStage HLG Layer (:pr:6650) Richard (Rick) Zamora_
  • Handle literal in meta_from_array (:pr:6731) Peter Andreas Entschev_
  • Do balanced rechunking even if chunks are the same (:pr:6735) Chris Roat_
  • Fix docstring DataFrame.set_index (:pr:6739) Gil Forsyth_
  • Ensure HighLevelGraph layers always contain Layer instances (:pr:6716) James Bourbeau_
  • Map on HighLevelGraph Layers (:pr:6689) Mads R. B. Kristensen_
  • Update overlap *_like function calls and CuPy tests (:pr:6728) Peter Andreas Entschev_
  • Fixes for svd with __array_function__ (:pr:6727) Peter Andreas Entschev_
  • Added doctest extension for documentation (:pr:6397) Jim Circadian_
  • Minor fix to #5628 using @pentschev's suggestion (:pr:6724) John A Kirkham_
  • Change type of Dask array when meta type changes (:pr:5628) Matthew Rocklin_
  • Add az (:pr:6719) Ray Bell_
  • HLG: get_dependencies() of single keys (:pr:6699) Mads R. B. Kristensen_
  • Revert "Revert "Use HighLevelGraph layers everywhere in collections (:pr:6510)" (:pr:6697)" (:pr:6707) Tom Augspurger_
  • Allow *_like array creation functions to respect input array type (:pr:6680) Genevieve Buckley_
  • Update dask-sphinx-theme version (:pr:6700) Gil Forsyth_
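
As background for the "Removed Mutable Default Argument" entry (:pr:6747), the sketch below shows the general Python behavior that makes such defaults risky. The function names are hypothetical and unrelated to the code touched by that PR.

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and is then shared by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Using None as a sentinel gives each call its own fresh list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


print(append_bad(1), append_bad(2))    # [1, 2] [1, 2]
print(append_good(1), append_good(2))  # [1] [2]
```

Both calls to ``append_bad`` return the same list object, which is why the first result also shows the second item.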

.. _v2.30.0 / 2020-10-06:

2.30.0 / 2020-10-06

Array ^^^^^

  • Allow rechunk to evenly split into N chunks (:pr:6420) Scott Sievert_

.. _v2.29.0 / 2020-10-02:

2.29.0 / 2020-10-02

Array ^^^^^

  • _repr_html_: color sides darker instead of drawing all the lines (:pr:6683) Julia Signell_
  • Removes warning from nanstd and nanvar (:pr:6667) Thomas J. Fan_
  • Get shape of output from original array - map_overlap (:pr:6682) Julia Signell_
  • Replace np.searchsorted with bisect in indexing (:pr:6669) Joachim B Haga_
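
To illustrate the kind of lookup where ``bisect`` can stand in for ``np.searchsorted`` (:pr:6669), the sketch below finds which chunk holds a given index along one axis. The chunk sizes and the ``locate`` helper are assumptions made for this example and do not reproduce Dask's indexing code.

```python
import bisect
from itertools import accumulate

chunks = (4, 4, 2)                     # chunk sizes along one axis
boundaries = list(accumulate(chunks))  # [4, 8, 10]: exclusive end of each chunk


def locate(index: int) -> tuple:
    """Return (chunk number, offset inside that chunk) for a global index."""
    if not 0 <= index < boundaries[-1]:
        raise IndexError(index)
    # bisect_right returns the position of the first boundary
    # that is strictly greater than index.
    chunk = bisect.bisect_right(boundaries, index)
    start = boundaries[chunk - 1] if chunk else 0
    return chunk, index - start


print(locate(0))  # (0, 0)
print(locate(4))  # (1, 0)
print(locate(9))  # (2, 1)
```

For a single scalar lookup on a small sorted list, the standard-library call avoids converting to a NumPy array first.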

Bag ^^^

  • Make sure subprocesses have a consistent hash for bag groupby (:pr:6660) Itamar Turner-Trauring_
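
The bag groupby entry (:pr:6660) relates to the fact that Python's built-in ``hash()`` for strings is salted per process unless ``PYTHONHASHSEED`` is fixed, so separate processes can disagree on which bucket a key belongs to. The sketch below shows one process-independent alternative based on ``hashlib``. It illustrates the problem only; whether the PR used this approach is not stated here, and ``stable_bucket`` is a hypothetical helper.

```python
import hashlib


def stable_bucket(key: str, nbuckets: int) -> int:
    """Assign a key to a bucket identically in every process."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer, then reduce.
    return int.from_bytes(digest[:8], "big") % nbuckets


keys = ["apple", "banana", "cherry"]
print({key: stable_bucket(key, 4) for key in keys})
```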

Core ^^^^

  • Revert "Use HighLevelGraph layers everywhere in collections (:pr:6510)" (:pr:6697) Tom Augspurger_
  • Use pandas.testing (:pr:6687) John A Kirkham_
  • Improve 128-bit floating-point skip in tests (:pr:6676) Elliott Sales de Andrade_

DataFrame ^^^^^^^^^

  • Allow setting dataframe items using a bool dataframe (:pr:6608) Julia Signell_

Documentation ^^^^^^^^^^^^^

  • Fix typo (:pr:6692) garanews_
  • Fix a few typos (:pr:6678) Pav A_

.. _v2.28.0 / 2020-09-25:

2.28.0 / 2020-09-25

Array ^^^^^

  • Partially reverted changes to Array indexing that produce large changes. This restores the behavior from Dask 2.25.0 and earlier, with a warning when large chunks are produced. A configuration option is provided to avoid creating the large chunks, see :ref:array.slicing.efficiency. (:pr:6665) Tom Augspurger_
  • Add meta to to_dask_array (:pr:6651) Kyle Nicholson_
  • Fix :pr:6631 and :pr:6611 (:pr:6632) Rafal Wojdyla_
  • Infer object in array reductions (:pr:6629) Daniel Saxton_
  • Adding v_based flag for svd_flip (:pr:6658) Eric Czech_
  • Fix flaky array mean (:pr:6656) Sam Grayson_

Core ^^^^

  • Removed dsk equality check from SubgraphCallable.__eq__ (:pr:6666) Mads R. B. Kristensen_
  • Use HighLevelGraph layers everywhere in collections (:pr:6510) Mads R. B. Kristensen_
  • Adds hash dunder method to SubgraphCallable for caching purposes (:pr:6424) Andrew Fulton_
  • Stop writing commented out config files by default (:pr:6647) Matthew Rocklin_

DataFrame ^^^^^^^^^

  • Add support for collect list aggregation via agg API (:pr:6655) Madhur Tandon_
  • Slightly better error message (:pr:6657) Julia Signell_

.. _v2.27.0 / 2020-09-18:

2.27.0 / 2020-09-18

Array ^^^^^

  • Preserve dtype in svd (:pr:6643) Eric Czech_

Core ^^^^

  • store(): create a single HLG layer (:pr:6601) Mads R. B. Kristensen_
  • Add pre-commit CI build (:pr:6645) James Bourbeau_
  • Update .pre-commit-config to latest black. (:pr:6641) Julia Signell_
  • Update super usage to remove Python 2 compatibility (:pr:6630) Poruri Sai Rahul_
  • Remove u string prefixes (:pr:6633) Poruri Sai Rahul_

DataFrame ^^^^^^^^^

  • Improve error message for to_sql (:pr:6638) Julia Signell_
  • Use empty list as categories (:pr:6626) Julia Signell_

Documentation ^^^^^^^^^^^^^

  • Add autofunction to array api docs for more ufuncs (:pr:6644) James Bourbeau_
  • Add a number of missing ufuncs to dask.array docs (:pr:6642) Ralf Gommers_
  • Add HelmCluster docs (:pr:6290) Jacob Tomlinson_

.. _v2.26.0 / 2020-09-11:

2.26.0 / 2020-09-11

Array ^^^^^

  • Backend-aware dtype inference for single-chunk svd (:pr:6623) Eric Czech_
  • Make array.reduction docstring match for dtype (:pr:6624) Martin Durant_
  • Set lower bound on compression level for svd_compressed using rows and cols (:pr:6622) Eric Czech_
  • Improve SVD consistency and small array handling (:pr:6616) Eric Czech_
  • Add svd_flip #6599 (:pr:6613) Eric Czech_
  • Handle sequences containing dask Arrays (:pr:6595) Gabe Joseph_
  • Avoid large chunks from getitem with lists (:pr:6514) Tom Augspurger_
  • Eagerly slice numpy arrays in from_array (:pr:6605) Deepak Cherian_
  • Restore ability to pickle dask arrays (:pr:6594) Noah D. Brenowitz_
  • Add SVD support for short-and-fat arrays (:pr:6591) Eric Czech_
  • Add simple chunk type registry and defer as appropriate to upcast types (:pr:6393) Jon Thielen_
  • Align coarsen chunks by default (:pr:6580) Deepak Cherian_
  • Fixup reshape on unknown dimensions and other testing fixes (:pr:6578) Ryan Williams_

Core ^^^^

  • Add validation and fixes for HighLevelGraph dependencies (:pr:6588) Mads R. B. Kristensen_
  • Fix linting issue (:pr:6598) Tom Augspurger_
  • Skip bokeh version 2.0.0 (:pr:6572) John A Kirkham_

DataFrame ^^^^^^^^^

  • Added bytes/row calculation when using meta (:pr:6585) McToel_
  • Handle min_count in Series.sum / prod (:pr:6618) Daniel Saxton_
  • Update DataFrame.set_index docstring (:pr:6549) Timost_
  • Always compute 0 and 1 quantiles during quantile calculations (:pr:6564) Erik Welch_
  • Fix wrong path when reading empty csv file (:pr:6573) Abdulelah Bin Mahfoodh_

Documentation ^^^^^^^^^^^^^

  • Doc: Troubleshooting dashboard 404 (:pr:6215) Kilian Lieret_
  • Fixup extraConfig example (:pr:6625) Tom Augspurger_
  • Update supported Python versions (:pr:6609) Julia Signell_
  • Document dask/daskhub helm chart (:pr:6560) Tom Augspurger_

.. _v2.25.0 / 2020-08-28:

2.25.0 / 2020-08-28

Core ^^^^

  • Compare key hashes in subs() (:pr:6559) Mads R. B. Kristensen_
  • Rerun with latest black release (:pr:6568) James Bourbeau_
  • License update (:pr:6554) Tom Augspurger_

DataFrame ^^^^^^^^^

  • Add gs read_parquet example (:pr:6548) Ray Bell_

Documentation ^^^^^^^^^^^^^

  • Remove version from documentation page names (:pr:6558) James Bourbeau_
  • Update kubernetes-helm.rst (:pr:6523) David Sheldon_
  • Stop 2020 survey (:pr:6547) Tom Augspurger_

.. _v2.24.0 / 2020-08-22:

2.24.0 / 2020-08-22

Array ^^^^^

  • Fix setting random seed in tests. (:pr:6518) Elliott Sales de Andrade_
  • Support meta in apply gufunc (:pr:6521) joshreback_
  • Replace cupy.sparse with cupyx.scipy.sparse (:pr:6530) John A Kirkham_

DataFrame ^^^^^^^^^

  • Bump up tolerance for rolling tests (:pr:6502) Julia Signell_
  • Implement DataFrame.__len__ (:pr:6515) Tom Augspurger_
  • Infer arrow schema in to_parquet (for ArrowEngine) (:pr:6490) Richard (Rick) Zamora_
  • Fix parquet test when no pyarrow (:pr:6524) Martin Durant_
  • Remove problematic filter arguments in ArrowEngine (:pr:6527) Richard (Rick) Zamora_
  • Avoid schema validation by default in ArrowEngine (:pr:6536) Richard (Rick) Zamora_

Core ^^^^

  • Use unpack_collections in make_blockwise_graph (:pr:6517) Thomas J. Fan_
  • Move key_split() from optimization.py to utils.py (:pr:6529) Mads R. B. Kristensen_
  • Make tests run on moto server (:pr:6528) Martin Durant_

.. _v2.23.0 / 2020-08-14:

2.23.0 / 2020-08-14

Array ^^^^^

  • Reduce np.zeros, ones, and full array size with broadcasting (:pr:6491) Matthias Bussonnier_
  • Add missing meta= for trim in map_overlap (:pr:6494) Peter Andreas Entschev_

Bag ^^^

  • Bag repartition partition size (:pr:6371) joshreback_

Core ^^^^

  • Scalar.__dask_layers__() to return self._name instead of self.key (:pr:6507) Mads R. B. Kristensen_
  • Update dependencies correctly in fuse_root optimization (:pr:6508) Mads R. B. Kristensen_

DataFrame ^^^^^^^^^

  • Adds items to dataframe (:pr:6503) Thomas J. Fan_
  • Include compression in write_table call (:pr:6499) Julia Signell_
  • Fixed warning in nonempty_series (:pr:6485) Tom Augspurger_
  • Intelligently determine partitions based on type of first arg (:pr:6479) Matthew Rocklin_
  • Fix pyarrow mkdirs (:pr:6475) Julia Signell_
  • Fix duplicate parquet output in to_parquet (:pr:6451) michaelnarodovitch_

Documentation ^^^^^^^^^^^^^

  • Fix documentation da.histogram (:pr:6439) Roberto Panai_
  • Add agg nunique example (:pr:6404) Ray Bell_
  • Fixed a few typos in the SQL docs (:pr:6489) Mike McCarty_
  • Docs for SQLing (:pr:6453) Martin Durant_

.. _v2.22.0 / 2020-07-31:

2.22.0 / 2020-07-31

Array ^^^^^

  • Compatibility for NumPy dtype deprecation (:pr:6430) Tom Augspurger_

Core ^^^^

  • Implement sizeof for some bytes-like objects (:pr:6457) John A Kirkham_
  • HTTP error for new fsspec (:pr:6446) Martin Durant_
  • When RecursionError is raised, return uuid from tokenize function (:pr:6437) Julia Signell_
  • Install deps of upstream-dev packages (:pr:6431) Tom Augspurger_
  • Use updated link in setup.cfg (:pr:6426) Zhengnan Zhao_

DataFrame ^^^^^^^^^

  • Add single quotes around column names if strings (:pr:6471) Gil Forsyth_
  • Refactor ArrowEngine for better read_parquet performance (:pr:6346) Richard (Rick) Zamora_
  • Add tolist dispatch (:pr:6444) GALI PREM SAGAR_
  • Compatibility with pandas 1.1.0rc0 (:pr:6429) Tom Augspurger_
  • Multi value pivot table (:pr:6428) joshreback_
  • Duplicate argument definitions in to_csv docstring (:pr:6411) Jun Han (Johnson) Ooi_

Documentation ^^^^^^^^^^^^^

  • Add utility to docs to convert YAML config to env vars and back (:pr:6472) Jacob Tomlinson_
  • Fix parameter server rendering (:pr:6466) Scott Sievert_
  • Fixes broken links (:pr:6403) Jim Circadian_
  • Complete parameter server implementation in docs (:pr:6449) Scott Sievert_
  • Fix typo (:pr:6436) Jack Xiaosong Xu_

.. _v2.21.0 / 2020-07-17:

2.21.0 / 2020-07-17

Array ^^^^^

  • Correct error message in array.routines.gradient() (:pr:6417) johnomotani_
  • Fix blockwise concatenate for array with some dimension=1 (:pr:6342) Matthias Bussonnier_

Bag ^^^

  • Fix bag.take example (:pr:6418) Roberto Panai_

Core ^^^^

  • Groups values in optimization pass should only be graph and keys -- not an optimization + keys (:pr:6409) Benjamin Zaitlen_
  • Call custom optimizations once, with kwargs provided (:pr:6382) Clark Zinzow_
  • Include pickle5 for testing on Python 3.7 (:pr:6379) John A Kirkham_

DataFrame ^^^^^^^^^

  • Correct typo in error message (:pr:6422) Tom McTiernan_
  • Use pytest.warns to check for UserWarning (:pr:6378) Richard (Rick) Zamora_
  • Parse bytes_per_chunk keyword from string (:pr:6370) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

  • Numpydoc formatting (:pr:6421) Matthias Bussonnier_
  • Unpin numpydoc following 1.1 release (:pr:6407) Gil Forsyth_
  • Numpydoc formatting (:pr:6402) Matthias Bussonnier_
  • Add instructions for using conda when installing code for development (:pr:6399) Ray Bell_
  • Update visualize docstrings (:pr:6383) Zhengnan Zhao_

.. _v2.20.0 / 2020-07-02:

2.20.0 / 2020-07-02

Array ^^^^^

  • Register sizeof for numpy zero-strided arrays (:pr:6343) Matthias Bussonnier_
  • Use concatenate_lookup in concatenate (:pr:6339) John A Kirkham_
  • Fix rechunking of arrays with some zero-length dimensions (:pr:6335) Matthias Bussonnier_

DataFrame ^^^^^^^^^

  • Dispatch iloc calls to getitem (:pr:6355) Gil Forsyth_
  • Handle unnamed pandas RangeIndex in fastparquet engine (:pr:6350) Richard (Rick) Zamora_
  • Preserve index when writing partitioned parquet datasets with pyarrow (:pr:6282) Richard (Rick) Zamora_
  • Use ignore_index for pandas' group_split_dispatch (:pr:6251) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

  • Add doc describing argument (:pr:6318) asmith26_

.. _v2.19.0 / 2020-06-19:

2.19.0 / 2020-06-19

Array ^^^^^

  • Cast chunk sizes to python int dtype (:pr:6326) Gil Forsyth_
  • Add shape=None to *_like() array creation functions (:pr:6064) Anderson Banihirwe_

Core ^^^^

  • Update expected error msg for protocol difference in fsspec (:pr:6331) Gil Forsyth_
  • Fix for floats < 1 in parse_bytes (:pr:6311) Gil Forsyth_
  • Fix exception causes all over the codebase (:pr:6308) Ram Rachum_
  • Fix duplicated tests (:pr:6303) James Lamb_
  • Remove unused testing function (:pr:6304) James Lamb_

DataFrame ^^^^^^^^^

  • Add high-level CSV Subgraph (:pr:6262) Gil Forsyth_
  • Fix ValueError when merging an index-only 1-partition dataframe (:pr:6309) Krishan Bhasin_
  • Make index.map clear divisions. (:pr:6285) Julia Signell_

Documentation ^^^^^^^^^^^^^

  • Add link to 2020 survey (:pr:6328) Tom Augspurger_
  • Update bag.rst (:pr:6317) Ben Shaver_

.. _v2.18.1 / 2020-06-09:

2.18.1 / 2020-06-09

Array ^^^^^

  • Don't try to set name on full (:pr:6299) Julia Signell_
  • Histogram: support lazy values for range/bins (another way) (:pr:6252) Gabe Joseph_

Core ^^^^

  • Fix exception causes in utils.py (:pr:6302) Ram Rachum_
  • Improve performance of HighLevelGraph construction (:pr:6293) Julia Signell_

Documentation ^^^^^^^^^^^^^

  • Now readthedocs builds unreleased features' docstrings (:pr:6295) Antonio Ercole De Luca_
  • Add asyncssh intersphinx mappings (:pr:6298) Jacob Tomlinson_

.. _v2.18.0 / 2020-06-05:

2.18.0 / 2020-06-05

Array ^^^^^

  • Cast slicing index to dask array if same shape as original (:pr:6273) Julia Signell_
  • Fix stack error message (:pr:6268) Stephanie Gott_
  • full & full_like: error on non-scalar fill_value (:pr:6129) Huite_
  • Support for multiple arrays in map_overlap (:pr:6165) Eric Czech_
  • Pad resample divisions so that edges are counted (:pr:6255) Julia Signell_

Bag ^^^

  • Random sampling of k elements from a dask bag #4799 (:pr:6239) Antonio Ercole De Luca_

DataFrame ^^^^^^^^^

  • Add dropna, sort, and ascending to sort_values (:pr:5880) Julia Signell_
  • Generalize from_dask_array (:pr:6263) GALI PREM SAGAR_
  • Add derived docstring for SeriesGroupby.nunique (:pr:6284) Julia Signell_
  • Remove NotImplementedError in resample with rule (:pr:6274) Abdulelah Bin Mahfoodh_
  • Add dd.to_sql (:pr:6038) Ryan Williams_

Documentation ^^^^^^^^^^^^^

  • Update remote data section (:pr:6258) Ray Bell_

.. _v2.17.2 / 2020-05-28:

2.17.2 / 2020-05-28

Core ^^^^

  • Re-add the complete extra (:pr:6257) Jim Crist-Harif_

DataFrame ^^^^^^^^^

  • Raise error if resample isn't going to give right answer (:pr:6244) Julia Signell_

.. _v2.17.1 / 2020-05-28:

2.17.1 / 2020-05-28

Array ^^^^^

  • Empty array rechunk (:pr:6233) Andrew Fulton_

Core ^^^^

  • Make pyyaml required (:pr:6250) Jim Crist-Harif_
  • Fix install commands from ImportError (:pr:6238) Gaurav Sheni_
  • Remove issue template (:pr:6249) Jacob Tomlinson_

DataFrame ^^^^^^^^^

  • Pass ignore_index to dd_shuffle from DataFrame.shuffle (:pr:6247) Richard (Rick) Zamora_
  • Cope with missing HDF keys (:pr:6204) Martin Durant_
  • Generalize describe & quantile apis (:pr:5137) GALI PREM SAGAR_

.. _v2.17.0 / 2020-05-26:

2.17.0 / 2020-05-26

Array ^^^^^

  • Small improvements to da.pad (:pr:6213) Mark Boer_
  • Return tuple if multiple outputs in dask.array.apply_gufunc, add test to check for tuple (:pr:6207) Kai Mühlbauer_
  • Support stack with unknown chunksizes (:pr:6195) swapna_

Bag ^^^

  • Random Choice on Bags (:pr:6208) Antonio Ercole De Luca_

Core ^^^^

  • Raise warning delayed.visualise() (:pr:6216) Amol Umbarkar_
  • Ensure other pickle arguments work (:pr:6229) John A Kirkham_
  • Overhaul fuse() config (:pr:6198) crusaderky_
  • Update dask.order.order to consider "next" nodes using both FIFO and LIFO (:pr:5872) Erik Welch_

DataFrame ^^^^^^^^^

  • Use 0 as fill_value for more agg methods (:pr:6245) Julia Signell_
  • Generalize rearrange_by_column_tasks and add DataFrame.shuffle (:pr:6066) Richard (Rick) Zamora_
  • Xfail test_rolling_numba_engine for newer numba and older pandas (:pr:6236) James Bourbeau_
  • Generalize fix_overlap (:pr:6240) GALI PREM SAGAR_
  • Fix DataFrame.shape with no columns (:pr:6237) noreentry_
  • Avoid shuffle when setting a presorted index with overlapping divisions (:pr:6226) Krishan Bhasin_
  • Adjust the Parquet engine classes to allow easier subclassing (:pr:6211) Marius van Niekerk_
  • Fix dd.merge_asof with left_on='col' & right_index=True (:pr:6192) noreentry_
  • Disable warning for concat (:pr:6210) Tung Dang_
  • Move AUTO_BLOCKSIZE out of read_csv signature (:pr:6214) Jim Crist-Harif_
  • .loc indexing with callable (:pr:6185) Endre Mark Borza_
  • Avoid apply in _compute_sum_of_squares for groupby std agg (:pr:6186) Richard (Rick) Zamora_
  • Minor correction to test_parquet (:pr:6190) Brian Larsen_
  • Adhere to the passed pat for delimiter join and fix error message (:pr:6194) GALI PREM SAGAR_
  • Skip test_to_parquet_with_get if no parquet libs available (:pr:6188) Scott Sanderson_

Documentation ^^^^^^^^^^^^^

  • Added documentation for distributed.Event class (:pr:6231) Nils Braun_
  • Doc write to remote (:pr:6124) Ray Bell_

.. _v2.16.0 / 2020-05-08:

2.16.0 / 2020-05-08

Array ^^^^^

  • Fix array general-reduction name (:pr:6176) Nick Evans_
  • Replace dim with shape in unravel_index (:pr:6155) Julia Signell_
  • Moment: handle all elements being masked (:pr:5339) Gabe Joseph_

Core ^^^^

  • Remove Redundant string concatenations in dask code-base (:pr:6137) GALI PREM SAGAR_
  • Upstream compat (:pr:6159) Tom Augspurger_
  • Ensure sizeof of dict and sequences returns an integer (:pr:6179) James Bourbeau_
  • Estimate python collection sizes with random sampling (:pr:6154) Florian Jetter_
  • Update test upstream (:pr:6146) Tom Augspurger_
  • Skip test for mindeps build (:pr:6144) Tom Augspurger_
  • Switch default multiprocessing context to "spawn" (:pr:4003) Itamar Turner-Trauring_
  • Update manifest to include dask-schema (:pr:6140) Benjamin Zaitlen_

DataFrame ^^^^^^^^^

  • Harden inconsistent-schema handling in pyarrow-based read_parquet (:pr:6160) Richard (Rick) Zamora_
  • Add compute kwargs to methods that write data to disk (:pr:6056) Krishan Bhasin_
  • Fix issue where unique returns an index like result from backends (:pr:6153) GALI PREM SAGAR_
  • Fix internal error in map_partitions with collections (:pr:6103) Tom Augspurger_

Documentation ^^^^^^^^^^^^^

  • Add phase of computation to index TOC (:pr:6157) Benjamin Zaitlen_
  • Remove unused imports in scheduling script (:pr:6138) James Lamb_
  • Fix indent (:pr:6147) Martin Durant_
  • Add Tom's log config example (:pr:6143) Martin Durant_

.. _v2.15.0 / 2020-04-24:

2.15.0 / 2020-04-24

Array ^^^^^

  • Update dask.array.from_array to warn when passed a Dask collection (:pr:6122) James Bourbeau_
  • Un-numpy like behaviour in dask.array.pad (:pr:6042) Mark Boer_
  • Add support for repeats=0 in da.repeat (:pr:6080) James Bourbeau_

Core ^^^^

  • Fix yaml layout for schema (:pr:6132) Benjamin Zaitlen_
  • Configuration Reference (:pr:6069) Benjamin Zaitlen_
  • Add configuration option to turn off task fusion (:pr:6087) Matthew Rocklin_
  • Skip pyarrow on windows (:pr:6094) Tom Augspurger_
  • Set limit to maximum length of fused key (:pr:6057) Lucas Rademaker_
  • Add test against #6062 (:pr:6072) Martin Durant_
  • Bump checkout action to v2 (:pr:6065) James Bourbeau_

DataFrame ^^^^^^^^^

  • Generalize categorical calls to support cudf Categorical (:pr:6113) GALI PREM SAGAR_
  • Avoid reading _metadata on every worker (:pr:6017) Richard (Rick) Zamora_
  • Use group_split_dispatch and ignore_index in apply_concat_apply (:pr:6119) Richard (Rick) Zamora_
  • Handle new (dtype) pandas metadata with pyarrow (:pr:6090) Richard (Rick) Zamora_
  • Skip test_partition_on_cats_pyarrow if pyarrow is not installed (:pr:6112) James Bourbeau_
  • Update DataFrame len to handle columns with the same name (:pr:6111) James Bourbeau_
  • ArrowEngine bug fixes and test coverage (:pr:6047) Richard (Rick) Zamora_
  • Added mode (:pr:5958) Adam Lewis_

Documentation ^^^^^^^^^^^^^

  • Update "helm install" for helm 3 usage (:pr:6130) JulianWgs_
  • Extend preload documentation (:pr:6077) Matthew Rocklin_
  • Fixed small typo in DataFrame map_partitions() docstring (:pr:6115) Eugene Huang_
  • Fix typo: "double" should be times, not plus (:pr:6091) David Chudzicki_
  • Fix first line of array.random.* docs (:pr:6063) Martin Durant_
  • Add section about Semaphore in distributed (:pr:6053) Florian Jetter_

.. _v2.14.0 / 2020-04-03:

2.14.0 / 2020-04-03

Array ^^^^^

  • Added np.iscomplexobj implementation (:pr:6045) Tom Augspurger_

Core ^^^^

  • Update test_rearrange_disk_cleanup_with_exception to pass without cloudpickle installed (:pr:6052) James Bourbeau_
  • Fixed flaky test-rearrange (:pr:5977) Tom Augspurger_

DataFrame ^^^^^^^^^

  • Use _meta_nonempty for dtype casting in stack_partitions (:pr:6061) mlondschien_
  • Fix bugs in _metadata creation and filtering in parquet ArrowEngine (:pr:6023) Richard (Rick) Zamora_

Documentation ^^^^^^^^^^^^^

  • DOC: Add name caveats (:pr:6040) Tom Augspurger_

.. _v2.13.0 / 2020-03-25:

2.13.0 / 2020-03-25

Array ^^^^^

  • Support dtype and other keyword arguments in da.random (:pr:6030) Matthew Rocklin_
  • Register support for cupy sparse hstack/vstack (:pr:5735) Corey J. Nolet_
  • Force self.name to str in dask.array (:pr:6002) Chuanzhu Xu_

Bag ^^^

  • Set rename_fused_keys to None by default in bag.optimize (:pr:6000) Lucas Rademaker_

Core ^^^^

  • Copy dict in to_graphviz to prevent overwriting (:pr:5996) JulianWgs_
  • Stricter pandas xfail (:pr:6024) Tom Augspurger_
  • Fix CI failures (:pr:6013) James Bourbeau_
  • Update toolz to 0.8.2 and use tlz (:pr:5997) Ryan Grout_
  • Move Windows CI builds to GitHub Actions (:pr:5862) James Bourbeau_

DataFrame ^^^^^^^^^

  • Improve path-related exceptions in read_hdf (:pr:6032) psimaj_
  • Fix dtype handling in dd.concat (:pr:6006) mlondschien_
  • Handle cudf's leftsemi and leftanti joins (:pr:6025) Richard J Zamora_
  • Remove unused npartitions variable in dd.from_pandas (:pr:6019) Daniel Saxton_
  • Added shuffle to DataFrame.random_split (:pr:5980) petiop_

Documentation ^^^^^^^^^^^^^

  • Fix indentation in scheduler-overview docs (:pr:6022) Matthew Rocklin_
  • Update task graphs in optimize docs (:pr:5928) Julia Signell_
  • Optionally get rid of intermediary boxes in visualize, and add more labels (:pr:5976) Julia Signell_

.. _v2.12.0 / 2020-03-06:

2.12.0 / 2020-03-06

Array ^^^^^

  • Improve reuse of temporaries with numpy (:pr:5933) Bruce Merry_
  • Make map_blocks with block_info produce a Blockwise (:pr:5896) Bruce Merry_
  • Optimize make_blockwise_graph (:pr:5940) Bruce Merry_
  • Fix axes ordering in da.tensordot (:pr:5975) Gil Forsyth_
  • Adds empty mode to array.pad (:pr:5931) Thomas J. Fan_

Core ^^^^

  • Remove toolz.memoize dependency in dask.utils (:pr:5978) Ryan Grout_
  • Close pool leaking subprocess (:pr:5979) Tom Augspurger_
  • Pin numpydoc to 0.8.0 (fix double autoescape) (:pr:5961) Gil Forsyth_
  • Register deterministic tokenization for range objects (:pr:5947) James Bourbeau_
  • Unpin msgpack in CI (:pr:5930) James Bourbeau_
  • Ensure dot results are placed in unique files. (:pr:5937) Elliott Sales de Andrade_
  • Add remaining optional dependencies to Travis 3.8 CI build environment (:pr:5920) James Bourbeau_

DataFrame ^^^^^^^^^

  • Skip parquet getitem optimization for some keys (:pr:5917) Tom Augspurger_
  • Add ignore_index argument to rearrange_by_column code path (:pr:5973) Richard J Zamora_
  • Add DataFrame and Series memory_usage_per_partition methods (:pr:5971) James Bourbeau_
  • xfail test_describe when using Pandas 0.24.2 (:pr:5948) James Bourbeau_
  • Implement dask.dataframe.to_numeric (:pr:5929) Julia Signell_
  • Add new error message content when columns are in a different order (:pr:5927) Julia Signell_
  • Use shallow copy for assign operations when possible (:pr:5740) Richard J Zamora_

Documentation ^^^^^^^^^^^^^

  • Changed above to below in dask.array.triu docs (:pr:5984) Henrik Andersson_
  • Array slicing: fix typo in slice_with_int_dask_array error message (:pr:5981) Gabe Joseph_
  • Grammar and formatting updates to docstrings (:pr:5963) James Lamb_
  • Update develop doc with conda option (:pr:5939) Ray Bell_
  • Update title of DataFrame extension docs (:pr:5954) James Bourbeau_
  • Fixed typos in documentation (:pr:5962) James Lamb_
  • Add original class or module as a kwarg on _bind_* methods (:pr:5946) Julia Signell_
  • Add collect list example (:pr:5938) Ray Bell_
  • Update optimization doc for python 3 (:pr:5926) Julia Signell_

.. _v2.11.0 / 2020-02-19:

2.11.0 / 2020-02-19

Array ^^^^^

  • Cache result of Array.shape (:pr:5916) Bruce Merry_
  • Improve accuracy of estimate_graph_size for rechunk (:pr:5907) Bruce Merry_
  • Skip rechunk steps that do not alter chunking (:pr:5909) Bruce Merry_
  • Support dtype and other kwargs in coarsen (:pr:5903) Matthew Rocklin_
  • Push chunk override from map_blocks into blockwise (:pr:5895) Bruce Merry_
  • Avoid using rewrite_blockwise for a singleton (:pr:5890) Bruce Merry_
  • Optimize slices_from_chunks (:pr:5891) Bruce Merry_
  • Avoid unnecessary __getitem__ in block() when chunks have correct dimensionality (:pr:5884) Thomas Robitaille_

Bag ^^^

  • Add include_path option for dask.bag.read_text (:pr:5836) Yifan Gu_
  • Fixes ValueError in delayed execution of bagged NumPy array (:pr:5828) Surya Avala_

Core ^^^^

  • CI: Pin msgpack (:pr:5923) Tom Augspurger_
  • Rename test_inner to test_outer (:pr:5922) Shiva Raisinghani_
  • quote should quote dicts too (:pr:5905) Bruce Merry_
  • Register a normalizer for literal (:pr:5898) Bruce Merry_
  • Improve layer name synthesis for non-HLGs (:pr:5888) Bruce Merry_
  • Replace flake8 pre-commit-hook with upstream (:pr:5892) Julia Signell_
  • Call pip as a module to avoid warnings (:pr:5861) Cyril Shcherbin_
  • Close ThreadPool at exit (:pr:5852) Tom Augspurger_
  • Remove dask.dataframe import in tokenization code (:pr:5855) James Bourbeau_

DataFrame ^^^^^^^^^

  • Require pandas>=0.23 (:pr:5883) Tom Augspurger_
  • Remove lambda from dataframe aggregation (:pr:5901) Matthew Rocklin_
  • Fix exception chaining in dataframe/__init__.py (:pr:5882) Ram Rachum_
  • Add support for reductions on empty dataframes (:pr:5804) Shiva Raisinghani_
  • Expose sort= argument for groupby (:pr:5801) Richard J Zamora_
  • Add df.empty property (:pr:5711) rockwellw_
  • Use parquet read speed-ups from fastparquet.api.paths_to_cats. (:pr:5821) Igor Gotlibovych_

Documentation ^^^^^^^^^^^^^

  • Deprecate doc_wraps (:pr:5912) Tom Augspurger_
  • Update array internal design docs for HighLevelGraph era (:pr:5889) Bruce Merry_
  • Move over dashboard connection docs (:pr:5877) Matthew Rocklin_
  • Move prometheus docs from distributed.dask.org (:pr:5876) Matthew Rocklin_
  • Removing duplicated DO block at the end (:pr:5878) K.-Michael Aye_
  • map_blocks see also (:pr:5874) Tom Augspurger_
  • More derived from (:pr:5871) Julia Signell_
  • Fix typo (:pr:5866) Yetunde Dada_
  • Fix typo in cloud.rst (:pr:5860) Andrew Thomas_
  • Add note pointing to code of conduct and diversity statement (:pr:5844) Matthew Rocklin_

.. _v2.10.1 / 2020-01-30:

2.10.1 / 2020-01-30

  • Fix Pandas 1.0 version comparison (:pr:5851) Tom Augspurger_
  • Fix typo in distributed diagnostics documentation (:pr:5841) Gerrit Holl_

.. _v2.10.0 / 2020-01-28:

2.10.0 / 2020-01-28

  • Support for pandas 1.0's new BooleanDtype and StringDtype (:pr:5815) Tom Augspurger_
  • Compatibility with pandas 1.0's API breaking changes and deprecations (:pr:5792) Tom Augspurger_
  • Fixed non-deterministic tokenization of some extension-array backed pandas objects (:pr:5813) Tom Augspurger_
  • Fixed handling of dataclass class objects in collections (:pr:5812) Matteo De Wint_
  • Fixed resampling with tz-aware dates when one of the endpoints fell in a non-existent time (:pr:5807) dfonnegra_
  • Delay initial Zarr dataset creation until the computation occurs (:pr:5797) Chris Roat_
  • Use parquet dataset statistics in more cases with the pyarrow engine (:pr:5799) Richard J Zamora_
  • Fixed exception in groupby.std() when some of the keys were large integers (:pr:5737) H. Thomson Comer_

.. _v2.9.2 / 2020-01-16:

2.9.2 / 2020-01-16

Array ^^^^^

  • Unify chunks in broadcast_arrays (:pr:5765) Matthew Rocklin_

Core ^^^^

  • xfail CSV encoding tests (:pr:5791) Tom Augspurger_
  • Update order to handle empty dask graph (:pr:5789) James Bourbeau_
  • Redo dask.order.order (:pr:5646) Erik Welch_

DataFrame ^^^^^^^^^

  • Add transparent compression for on-disk shuffle with partd (:pr:5786) Christian Wesp_
  • Fix repr for empty dataframes (:pr:5781) Shiva Raisinghani_
  • Pandas 1.0.0RC0 compat (:pr:5784) Tom Augspurger_
  • Remove buggy assertions (:pr:5783) Tom Augspurger_
  • Pandas 1.0 compat (:pr:5782) Tom Augspurger_
  • Fix bug in pyarrow-based read_parquet on partitioned datasets (:pr:5777) Richard J Zamora_
  • Compat for pandas 1.0 (:pr:5779) Tom Augspurger_
  • Fix groupby/mean error with categorical index (:pr:5776) Richard J Zamora_
  • Support empty partitions when performing cumulative aggregation (:pr:5730) Matthew Rocklin_
  • set_index accepts single-item unnested list (:pr:5760) Wes Roach_
  • Fixed partitioning in set index for ordered Categorical (:pr:5715) Tom Augspurger_

Documentation ^^^^^^^^^^^^^

  • Note additional use case for normalize_token.register (:pr:5766) Thomas A Caswell_
  • Update bag repartition docstring (:pr:5772) Timost_
  • Small typos (:pr:5771) Maarten Breddels_
  • Fix typo in Task Expectations docs (:pr:5767) James Bourbeau_
  • Add docs section on task expectations to graph page (:pr:5764) Devin Petersohn_

.. _v2.9.1 / 2019-12-27:

2.9.1 / 2019-12-27

Array ^^^^^

  • Support Array.view with dtype=None (:pr:5736) Anderson Banihirwe_
  • Add dask.array.nanmedian (:pr:5684) Deepak Cherian_

Core ^^^^

  • xfail test_temporary_directory on Python 3.8 (:pr:5734) James Bourbeau_
  • Add support for Python 3.8 (:pr:5603) James Bourbeau_
  • Use id to dedupe constants in rewrite_blockwise (:pr:5696) Jim Crist_

DataFrame ^^^^^^^^^

  • Raise error when converting a dask dataframe scalar to a boolean (:pr:5743) James Bourbeau_
  • Ensure dataframe groupby-variance is greater than zero (:pr:5728) Matthew Rocklin_
  • Fix DataFrame.iter (:pr:5719) Tom Augspurger_
  • Support Parquet filters in disjunctive normal form, like PyArrow (:pr:5656) Matteo De Wint_
  • Auto-detect categorical columns in ArrowEngine-based read_parquet (:pr:5690) Richard J Zamora_
  • Skip parquet getitem optimization tests if no engine found (:pr:5697) James Bourbeau_
  • Fix independent optimization of parquet-getitem (:pr:5613) Tom Augspurger_

Documentation ^^^^^^^^^^^^^

  • Update helm config doc (:pr:5750) Ray Bell_
  • Link to examples.dask.org in several places (:pr:5733) Tom Augspurger_
  • Add missing " in performance report example (:pr:5724) James Bourbeau_
  • Resolve several documentation build warnings (:pr:5685) James Bourbeau_
  • Add info on performance_report (:pr:5713) Benjamin Zaitlen_
  • Add more docs disclaimers (:pr:5710) Julia Signell_
  • Fix simple typo: wihout -> without (:pr:5708) Tim Gates_
  • Update numpydoc dependency (:pr:5694) James Bourbeau_

.. _v2.9.0 / 2019-12-06:

2.9.0 / 2019-12-06

Array ^^^^^

  • Fix da.std to work with NumPy arrays (:pr:5681) James Bourbeau_

Core ^^^^

  • Register sizeof functions for Numba and RMM (:pr:5668) John A Kirkham_
  • Update meeting time (:pr:5682) Tom Augspurger_

DataFrame ^^^^^^^^^

  • Modify dd.DataFrame.drop to use shallow copy (:pr:5675) Richard J Zamora_
  • Fix bug in _get_md_row_groups (:pr:5673) Richard J Zamora_
  • Close sqlalchemy engine after querying DB (:pr:5629) Krishan Bhasin_
  • Allow dd.map_partitions to not enforce meta (:pr:5660) Matthew Rocklin_
  • Generalize concat_unindexed_dataframes to support cudf-backend (:pr:5659) Richard J Zamora_
  • Add dataframe resample methods (:pr:5636) Benjamin Zaitlen_
  • Compute length of dataframe as length of first column (:pr:5635) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

  • Doc fixup (:pr:5665) James Bourbeau_
  • Update doc build instructions (:pr:5640) James Bourbeau_
  • Fix ADL link (:pr:5639) Ray Bell_
  • Add documentation build (:pr:5617) James Bourbeau_

.. _v2.8.1 / 2019-11-22:

2.8.1 / 2019-11-22

Array ^^^^^

  • Use auto rechunking in da.rechunk if no value given (:pr:5605) Matthew Rocklin_

Core ^^^^

  • Add simple action to activate GH actions (:pr:5619) James Bourbeau_

DataFrame ^^^^^^^^^

  • Fix "file_path_0" bug in aggregate_row_groups (:pr:5627) Richard J Zamora_
  • Add chunksize argument to read_parquet (:pr:5607) Richard J Zamora_
  • Change test_repartition_npartitions to support arch64 architecture (:pr:5620) ossdev07_
  • Categories lost after groupby + agg (:pr:5423) Oliver Hofkens_
  • Fixed relative path issue with parquet metadata file (:pr:5608) Nuno Gomes Silva_
  • Enable gpu-backed covariance/correlation in dataframes (:pr:5597) Richard J Zamora_

Documentation ^^^^^^^^^^^^^

  • Fix institutional faq and unknown doc warnings (:pr:5616) James Bourbeau_
  • Add doc for some utils (:pr:5609) Tom Augspurger_
  • Removes html_extra_path (:pr:5614) James Bourbeau_
  • Fixed See Also reference (:pr:5612) Tom Augspurger_

.. _v2.8.0 / 2019-11-14:

2.8.0 / 2019-11-14

Array ^^^^^

  • Implement complete dask.array.tile function (:pr:5574) Bouwe Andela_
  • Add median along an axis with automatic rechunking (:pr:5575) Matthew Rocklin_
  • Allow da.asarray to chunk inputs (:pr:5586) Matthew Rocklin_

Bag ^^^

  • Use key_split in Bag name (:pr:5571) Matthew Rocklin_

Core ^^^^

  • Switch Doctests to Py3.7 (:pr:5573) Ryan Nazareth_
  • Relax get_colors test to adapt to new Bokeh release (:pr:5576) Matthew Rocklin_
  • Add dask.blockwise.fuse_roots optimization (:pr:5451) Matthew Rocklin_
  • Add sizeof implementation for small dicts (:pr:5578) Matthew Rocklin_
  • Update fsspec, gcsfs, s3fs (:pr:5588) Tom Augspurger_

DataFrame ^^^^^^^^^

  • Add dropna argument to groupby (:pr:5579) Richard J Zamora_
  • Revert "Remove import of dask_cudf, which is now a part of cudf (:pr:5568)" (:pr:5590) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

  • Add best practice for dask.compute function (:pr:5583) Matthew Rocklin_
  • Create FUNDING.yml (:pr:5587) Gina Helfrich_
  • Add screencast for coordination primitives (:pr:5593) Matthew Rocklin_
  • Move funding to .github repo (:pr:5589) Tom Augspurger_
  • Update calendar link (:pr:5569) Tom Augspurger_

.. _v2.7.0 / 2019-11-08:

2.7.0 / 2019-11-08

This release drops support for Python 3.5.

Array ^^^^^

  • Reuse code for assert_eq util method (:pr:5496) Vijayant_
  • Update da.array to always return a dask array (:pr:5510) James Bourbeau_
  • Skip transpose on trivial inputs (:pr:5523) Ryan Abernathey_
  • Avoid NumPy scalar string representation in tokenize (:pr:5527) James Bourbeau_
  • Remove unnecessary tiledb shape constraint (:pr:5545) Norman Barker_
  • Removes bytes from sparse array HTML repr (:pr:5556) James Bourbeau_

Core ^^^^

  • Drop Python 3.5 (:pr:5528) James Bourbeau_
  • Update the use of fixtures in distributed tests (:pr:5497) Matthew Rocklin_
  • Changed deprecated bokeh-port to dashboard-address (:pr:5507) darindf_
  • Avoid updating with identical dicts in ensure_dict (:pr:5501) James Bourbeau_
  • Test Upstream (:pr:5516) Tom Augspurger_
  • Accelerate reverse_dict (:pr:5479) Ryan Grout_
  • Update test_imports.sh (:pr:5534) James Bourbeau_
  • Support cgroups limits on cpu count in multiprocess and threaded schedulers (:pr:5499) Albert DeFusco_
  • Update minimum pyarrow version on CI (:pr:5562) James Bourbeau_
  • Make cloudpickle optional (:pr:5511) crusaderky_

DataFrame ^^^^^^^^^

  • Add an example of index_col usage (:pr:3072) Bruno Bonfils_
  • Explicitly use iloc for row indexing (:pr:5500) Krishan Bhasin_
  • Accept dask arrays on columns assignment (:pr:5224) Henrique Ribeiro_
  • Implement unique and value_counts for SeriesGroupBy (:pr:5358) Scott Sievert_
  • Add sizeof definition for pyarrow tables and columns (:pr:5522) Richard J Zamora_
  • Enable row-group task partitioning in pyarrow-based read_parquet (:pr:5508) Richard J Zamora_
  • Removes npartitions='auto' from dd.merge docstring (:pr:5531) James Bourbeau_
  • Apply enforce error message shows non-overlapping columns. (:pr:5530) Tom Augspurger_
  • Optimize meta_nonempty for repetitive dtypes (:pr:5553) Petio Petrov_
  • Remove import of dask_cudf, which is now a part of cudf (:pr:5568) Mads R. B. Kristensen_

Documentation ^^^^^^^^^^^^^

  • Make capitalization more consistent in FAQ docs (:pr:5512) Matthew Rocklin_
  • Add CONTRIBUTING.md (:pr:5513) Jacob Tomlinson_
  • Document optional dependencies (:pr:5456) Prithvi MK_
  • Update helm chart docs to reflect new chart repo (:pr:5539) Jacob Tomlinson_
  • Add Resampler to API docs (:pr:5551) James Bourbeau_
  • Fix typo in read_sql_table (:pr:5554) Eric Dill_
  • Add adaptive deployments screencast [skip ci] (:pr:5566) Matthew Rocklin_

.. _v2.6.0 / 2019-10-15:

2.6.0 / 2019-10-15

Core ^^^^

  • Call ensure_dict on graphs before entering toolz.merge (:pr:5486) Matthew Rocklin_
  • Consolidating hash dispatch functions (:pr:5476) Richard J Zamora_

DataFrame ^^^^^^^^^

  • Support Python 3.5 in Parquet code (:pr:5491) Benjamin Zaitlen_
  • Avoid identity check in warn_dtype_mismatch (:pr:5489) Tom Augspurger_
  • Enable unused groupby tests (:pr:3480) Jörg Dietrich_
  • Remove old parquet and bcolz dataframe optimizations (:pr:5484) Matthew Rocklin_
  • Add getitem optimization for read_parquet (:pr:5453) Tom Augspurger_
  • Use _constructor_sliced method to determine Series type (:pr:5480) Richard J Zamora_
  • Fix map(series) for unsorted base series index (:pr:5459) Justin Waugh_
  • Fix KeyError with Groupby label (:pr:5467) Ryan Nazareth_

Documentation ^^^^^^^^^^^^^

  • Use Zoom meeting instead of appear.in (:pr:5494) Matthew Rocklin_
  • Added curated list of resources (:pr:5460) Javad_
  • Update SSH docs to include SSHCluster (:pr:5482) Matthew Rocklin_
  • Update "Why Dask?" page (:pr:5473) Matthew Rocklin_
  • Fix typos in docstrings (:pr:5469) garanews_

.. _v2.5.2 / 2019-10-04:

2.5.2 / 2019-10-04

Array ^^^^^

  • Correct chunk size logic for asymmetric overlaps (:pr:5449) Ben Jeffery_
  • Make da.unify_chunks public API (:pr:5443) Matthew Rocklin_

DataFrame ^^^^^^^^^

  • Fix dask.dataframe.fillna handling of Scalar object (:pr:5463) Zhenqing Li_

Documentation ^^^^^^^^^^^^^

  • Remove boxes in Spark comparison page (:pr:5445) Matthew Rocklin_
  • Add latest presentations (:pr:5446) Javad_
  • Update cloud documentation (:pr:5444) Matthew Rocklin_

.. _v2.5.0 / 2019-09-27:

2.5.0 / 2019-09-27

Core ^^^^

  • Add sentinel no_default to get_dependencies task (:pr:5420) James Bourbeau_
  • Update fsspec version (:pr:5415) Matthew Rocklin_
  • Remove PY2 checks (:pr:5400) Jim Crist_

DataFrame ^^^^^^^^^

  • Add option to not check meta in dd.from_delayed (:pr:5436) Christopher J. Wright_
  • Fix test_timeseries_nulls_in_schema failures with pyarrow master (:pr:5421) Richard J Zamora_
  • Reduce read_metadata output size in pyarrow/parquet (:pr:5391) Richard J Zamora_
  • Test numeric edge case for repartition with npartitions. (:pr:5433) amerkel2_
  • Unxfail pandas-datareader test (:pr:5430) Tom Augspurger_
  • Add DataFrame.pop implementation (:pr:5422) Matthew Rocklin_
  • Enable merge/set_index for cudf-based dataframes with cupy values (:pr:5322) Richard J Zamora_
  • drop_duplicates support for positional subset parameter (:pr:5410) Wes Roach_

Documentation ^^^^^^^^^^^^^

  • Add screencasts to array, bag, dataframe, delayed, futures and setup (:pr:5429) (:pr:5424) Matthew Rocklin_
  • Fix delimiter parsing documentation (:pr:5428) Mahmut Bulut_
  • Update overview image (:pr:5404) James Bourbeau_

.. _v2.4.0 / 2019-09-13:

2.4.0 / 2019-09-13

Array ^^^^^

  • Adds explicit h5py.File mode (:pr:5390) James Bourbeau_
  • Provides method to compute unknown array chunks sizes (:pr:5312) Scott Sievert_
  • Ignore runtime warning in Array compute_meta (:pr:5356) estebanag_
  • Add _meta to Array.__dask_postpersist__ (:pr:5353) Benoit Bovy_
  • Fixup da.asarray and da.asanyarray for datetime64 dtype and xarray objects (:pr:5334) Stephan Hoyer_
  • Add shape implementation (:pr:5293) Tom Augspurger_
  • Add chunktype to array text repr (:pr:5289) James Bourbeau_
  • Array.random.choice: handle array-like non-arrays (:pr:5283) Gabe Joseph_

Core ^^^^

  • Remove deprecated code (:pr:5401) Jim Crist_
  • Fix funcname when vectorized func has no __name__ (:pr:5399) James Bourbeau_
  • Truncate funcname to avoid long key names (:pr:5383) Matthew Rocklin_
  • Add support for numpy.vectorize in funcname (:pr:5396) James Bourbeau_
  • Fixed HDFS upstream test (:pr:5395) Tom Augspurger_
  • Support numbers and None in parse_bytes/timedelta (:pr:5384) Matthew Rocklin_
  • Fix tokenizing of subindexes on memmapped numpy arrays (:pr:5351) Henry Pinkard_
  • Upstream fixups (:pr:5300) Tom Augspurger_

DataFrame ^^^^^^^^^

  • Allow pandas to cast type of statistics (:pr:5402) Richard J Zamora_
  • Preserve index dtype after applying dd.pivot_table (:pr:5385) therhaag_
  • Implement explode for Series and DataFrame (:pr:5381) Arpit Solanki_
  • set_index on categorical fails with less categories than partitions (:pr:5354) Oliver Hofkens_
  • Support output to a single CSV file (:pr:5304) Hongjiu Zhang_
  • Add groupby().transform() (:pr:5327) Oliver Hofkens_
  • Adding filter kwarg to pyarrow dataset call (:pr:5348) Richard J Zamora_
  • Implement and check compression defaults for parquet (:pr:5335) Sarah Bird_
  • Pass sqlalchemy params to delayed objects (:pr:5332) Arpit Solanki_
  • Fixing schema handling in arrow-parquet (:pr:5307) Richard J Zamora_
  • Add support for DF and Series groupby().idxmin/max() (:pr:5273) Oliver Hofkens_
  • Add correlation calculation and add test (:pr:5296) Benjamin Zaitlen_

Documentation ^^^^^^^^^^^^^

  • Numpy docstring standard has moved (:pr:5405) Wes Roach_
  • Reference correct NumPy array name (:pr:5403) Wes Roach_
  • Minor edits to Array chunk documentation (:pr:5372) Scott Sievert_
  • Add methods to API docs (:pr:5387) Tom Augspurger_
  • Add namespacing to configuration example (:pr:5374) Matthew Rocklin_
  • Add get_task_stream and profile to the diagnostics page (:pr:5375) Matthew Rocklin_
  • Add best practice to load data with Dask (:pr:5369) Matthew Rocklin_
  • Update institutional-faq.rst (:pr:5345) DomHudson_
  • Add threads and processes note to the best practices (:pr:5340) Matthew Rocklin_
  • Update cuDF links (:pr:5328) James Bourbeau_
  • Fixed small typo with parentheses placement (:pr:5311) Eugene Huang_
  • Update link in reshape docstring (:pr:5297) James Bourbeau_

.. _v2.3.0 / 2019-08-16:

2.3.0 / 2019-08-16

Array ^^^^^

  • Raise exception when from_array is given a dask array (:pr:5280) David Hoese_
  • Avoid adjusting gufunc's meta dtype twice (:pr:5274) Peter Andreas Entschev_
  • Add meta= keyword to map_blocks and add test with sparse (:pr:5269) Matthew Rocklin_
  • Add rollaxis and moveaxis (:pr:4822) Tobias de Jong_
  • Always increment old chunk index (:pr:5256) James Bourbeau_
  • Shuffle dask array (:pr:3901) Tom Augspurger_
  • Fix ordering when indexing a dask array with a bool dask array (:pr:5151) James Bourbeau_

Bag ^^^

  • Add workaround for memory leaks in bag generators (:pr:5208) Marco Neumann_

Core ^^^^

  • Set strict xfail option (:pr:5220) James Bourbeau_
  • test-upstream (:pr:5267) Tom Augspurger_
  • Fixed HDFS CI failure (:pr:5234) Tom Augspurger_
  • Error nicely if no file size inferred (:pr:5231) Jim Crist_
  • A few changes to config.set (:pr:5226) Jim Crist_
  • Fixup black string normalization (:pr:5227) Jim Crist_
  • Pin NumPy in windows tests (:pr:5228) Jim Crist_
  • Ensure parquet tests are skipped if fastparquet and pyarrow not installed (:pr:5217) James Bourbeau_
  • Add fsspec to readthedocs (:pr:5207) Matthew Rocklin_
  • Bump NumPy and Pandas to 1.17 and 0.25 in CI test (:pr:5179) John A Kirkham_

DataFrame ^^^^^^^^^

  • Fix DataFrame.query docstring (incorrect numexpr API) (:pr:5271) Doug Davis_
  • Parquet metadata-handling improvements (:pr:5218) Richard J Zamora_
  • Improve messaging around sorted parquet columns for index (:pr:5265) Martin Durant_
  • Add rearrange_by_divisions and set_index support for cudf (:pr:5205) Richard J Zamora_
  • Fix groupby.std() with integer column names (:pr:5096) Nicolas Hug_
  • Add Series.__iter__ (:pr:5071) Blane_
  • Generalize hash_pandas_object to work for non-pandas backends (:pr:5184) GALI PREM SAGAR_
  • Add rolling cov (:pr:5154) Ivars Geidans_
  • Add columns argument in drop function (:pr:5223) Henrique Ribeiro_

Documentation ^^^^^^^^^^^^^

  • Update institutional FAQ doc (:pr:5277) Matthew Rocklin_
  • Add draft of institutional FAQ (:pr:5214) Matthew Rocklin_
  • Make boxes for dask-spark page (:pr:5249) Martin Durant_
  • Add motivation for shuffle docs (:pr:5213) Matthew Rocklin_
  • Fix links and API entries for best-practices (:pr:5246) Martin Durant_
  • Remove "bytes" (internal data ingestion) doc page (:pr:5242) Martin Durant_
  • Redirect from our local distributed page to distributed.dask.org (:pr:5248) Matthew Rocklin_
  • Cleanup API page (:pr:5247) Matthew Rocklin_
  • Remove excess endlines from install docs (:pr:5243) Matthew Rocklin_
  • Remove item list in phases of computation doc (:pr:5245) Martin Durant_
  • Remove custom graphs from the TOC sidebar (:pr:5241) Matthew Rocklin_
  • Remove experimental status of custom collections (:pr:5236) James Bourbeau_
  • Adds table of contents to Why Dask? (:pr:5244) James Bourbeau_
  • Moves bag overview to top-level bag page (:pr:5240) James Bourbeau_
  • Remove use-cases in favor of stories.dask.org (:pr:5238) Matthew Rocklin_
  • Removes redundant TOC information in index.rst (:pr:5235) James Bourbeau_
  • Elevate dashboard in distributed diagnostics documentation (:pr:5239) Martin Durant_
  • Updates "add" layer in HLG docs example (:pr:5237) James Bourbeau_
  • Update GUFunc documentation (:pr:5232) Matthew Rocklin_

.. _v2.2.0 / 2019-08-01:

2.2.0 / 2019-08-01

Array ^^^^^

  • Use da.from_array(..., asarray=False) if input follows NEP-18 (:pr:5074) Matthew Rocklin_
  • Add missing attributes to from_array documentation (:pr:5108) Peter Andreas Entschev_
  • Fix meta computation for some reduction functions (:pr:5035) Peter Andreas Entschev_
  • Raise informative error in to_zarr if unknown chunks (:pr:5148) James Bourbeau_
  • Remove invalid pad tests (:pr:5122) Tom Augspurger_
  • Ignore NumPy warnings in compute_meta (:pr:5103) Peter Andreas Entschev_
  • Fix kurtosis calc for single dimension input array (:pr:5177) @andrethrill_
  • Support Numpy 1.17 in tests (:pr:5192) Matthew Rocklin_

Bag ^^^

  • Supply pool to bag test to resolve intermittent failure (:pr:5172) Tom Augspurger_

Core ^^^^

  • Base dask on fsspec (:pr:5064) (:pr:5121) Martin Durant_
  • Various upstream compatibility fixes (:pr:5056) Tom Augspurger_
  • Make distributed tests optional again. (:pr:5128) Elliott Sales de Andrade_
  • Fix HDFS in dask (:pr:5130) Martin Durant_
  • Ignore some more invalid value warnings. (:pr:5140) Elliott Sales de Andrade_

DataFrame ^^^^^^^^^

  • Fix pd.MultiIndex size estimate (:pr:5066) Brett Naul_
  • Generalizing has_known_categories (:pr:5090) GALI PREM SAGAR_
  • Refactor Parquet engine (:pr:4995) Richard J Zamora_
  • Add divide method to series and dataframe (:pr:5094) msbrown47_
  • fix flaky partd test (:pr:5111) Tom Augspurger_
  • Adjust is_dataframe_like to adjust for value_counts change (:pr:5143) Tom Augspurger_
  • Generalize rolling windows to support non-Pandas dataframes (:pr:5149) Nick Becker_
  • Avoid unnecessary aggregation in pivot_table (:pr:5173) Daniel Saxton_
  • Add column names to apply_and_enforce error message (:pr:5180) Matthew Rocklin_
  • Add schema keyword argument to to_parquet (:pr:5150) Sarah Bird_
  • Remove recursion error in accessors (:pr:5182) Jim Crist_
  • Allow fastparquet to handle gather_statistics=False for file lists (:pr:5157) Richard J Zamora_

Documentation ^^^^^^^^^^^^^

  • Adds NumFOCUS badge to the README (:pr:5086) James Bourbeau_
  • Update developer docs [ci skip] (:pr:5093) Jim Crist_
  • Document DataFrame.set_index computation behavior Natalya Rapstine_
  • Use pip install . instead of calling setup.py (:pr:5139) Matthias Bussonier_
  • Close user survey (:pr:5147) Tom Augspurger_
  • Fix Google Calendar meeting link (:pr:5155) Loïc Estève_
  • Add docker image customization example (:pr:5171) James Bourbeau_
  • Update remote-data-services after fsspec (:pr:5170) Martin Durant_
  • Fix typo in spark.rst (:pr:5164) Xavier Holt_
  • Update setup/python docs for async/await API (:pr:5163) Matthew Rocklin_
  • Update Local Storage HPC documentation (:pr:5165) Matthew Rocklin_

.. _v2.1.0 / 2019-07-08:

2.1.0 / 2019-07-08

Array ^^^^^

  • Add recompute= keyword to svd_compressed for lower-memory use (:pr:5041) Matthew Rocklin_
  • Change __array_function__ implementation for backwards compatibility (:pr:5043) Ralf Gommers_
  • Added dtype and shape kwargs to apply_along_axis (:pr:3742) Davis Bennett_
  • Fix reduction with empty tuple axis (:pr:5025) Peter Andreas Entschev_
  • Drop size 0 arrays in stack (:pr:4978) John A Kirkham_

Core ^^^^

  • Removes index keyword from pandas to_parquet call (:pr:5075) James Bourbeau_
  • Fixes upstream dev CI build installation (:pr:5072) James Bourbeau_
  • Ensure scalar arrays are not rendered to SVG (:pr:5058) Willi Rath_
  • Environment creation overhaul (:pr:5038) Tom Augspurger_
  • s3fs, moto compatibility (:pr:5033) Tom Augspurger_
  • pytest 5.0 compat (:pr:5027) Tom Augspurger_

DataFrame ^^^^^^^^^

  • Fix compute_meta recursion in blockwise (:pr:5048) Peter Andreas Entschev_
  • Remove hard dependency on pandas in get_dummies (:pr:5057) GALI PREM SAGAR_
  • Check dtypes unchanged when using DataFrame.assign (:pr:5047) asmith26_
  • Fix cumulative functions on tables with more than 1 partition (:pr:5034) tshatrov_
  • Handle non-divisible sizes in repartition (:pr:5013) George Sakkis_
  • Handles timestamp and preserve_index changes in pyarrow (:pr:5018) Richard J Zamora_
  • Fix undefined meta for str.split(expand=False) (:pr:5022) Brett Naul_
  • Removed checks used for debugging merge_asof (:pr:5011) Cody Johnson_
  • Don't use type when getting accessor in dataframes (:pr:4992) Matthew Rocklin_
  • Add melt as a method of Dask DataFrame (:pr:4984) Dustin Tindall_
  • Adds path-like support to to_hdf (:pr:5003) James Bourbeau_

Documentation ^^^^^^^^^^^^^

  • Point to latest K8s setup article in JupyterHub docs (:pr:5065) Sean McKenna_
  • Changes vizualize to visualize (:pr:5061) David Brochart_
  • Fix from_sequence typo in delayed best practices (:pr:5045) James Bourbeau_
  • Add user survey link to docs (:pr:5026) James Bourbeau_
  • Fixes typo in optimization docs (:pr:5015) James Bourbeau_
  • Update community meeting information (:pr:5006) Tom Augspurger_

.. _v2.0.0 / 2019-06-25:

2.0.0 / 2019-06-25

Array ^^^^^

  • Support automatic chunking in da.indices (:pr:4981) James Bourbeau_
  • Err if there are no arrays to stack (:pr:4975) John A Kirkham_
  • Asymmetrical Array Overlap (:pr:4863) Michael Eaton_
  • Dispatch concatenate where possible within dask array (:pr:4669) Hameer Abbasi_
  • Fix tokenization of memmapped numpy arrays on different part of same file (:pr:4931) Henry Pinkard_
  • Preserve NumPy condition in da.asarray to preserve output shape (:pr:4945) Alistair Miles_
  • Expand foo_like_safe usage (:pr:4946) Peter Andreas Entschev_
  • Defer order/casting einsum parameters to NumPy implementation (:pr:4914) Peter Andreas Entschev_
  • Remove numpy warning in moment calculation (:pr:4921) Matthew Rocklin_
  • Fix meta_from_array to support Xarray test suite (:pr:4938) Matthew Rocklin_
  • Cache chunk boundaries for integer slicing (:pr:4923) Bruce Merry_
  • Drop size 0 arrays in concatenate (:pr:4167) John A Kirkham_
  • Raise ValueError if concatenate is given no arrays (:pr:4927) John A Kirkham_
  • Promote types in concatenate using _meta (:pr:4925) John A Kirkham_
  • Add chunk type to html repr in Dask array (:pr:4895) Matthew Rocklin_
  • Add Dask Array._meta attribute (:pr:4543) Peter Andreas Entschev_
    • Fix _meta slicing of flexible types (:pr:4912) Peter Andreas Entschev_
    • Minor meta construction cleanup in concatenate (:pr:4937) Peter Andreas Entschev_
    • Further relax Array meta checks for Xarray (:pr:4944) Matthew Rocklin_
    • Support meta= keyword in da.from_delayed (:pr:4972) Matthew Rocklin_
    • Concatenate meta along axis (:pr:4977) John A Kirkham_
    • Use meta in stack (:pr:4976) John A Kirkham_
    • Move blockwise_meta to more general compute_meta function (:pr:4954) Matthew Rocklin_
  • Alias .partitions to .blocks attribute of dask arrays (:pr:4853) Genevieve Buckley_
  • Drop outdated numpy_compat functions (:pr:4850) John A Kirkham_
  • Allow da.eye to support arbitrary chunking sizes with chunks='auto' (:pr:4834) Anderson Banihirwe_
  • Fix CI warnings in dask.array tests (:pr:4805) Tom Augspurger_
  • Make map_blocks work with drop_axis + block_info (:pr:4831) Bruce Merry_
  • Add SVG image and table in Array._repr_html_ (:pr:4794) Matthew Rocklin_
  • ufunc: avoid __array_wrap__ in favor of __array_function__ (:pr:4708) Peter Andreas Entschev_
  • Ensure trivial padding returns the original array (:pr:4990) John A Kirkham_
  • Test da.block with 0-size arrays (:pr:4991) John A Kirkham_

Core ^^^^

  • Drop Python 2.7 (:pr:4919) Jim Crist_
  • Quiet dependency installs in CI (:pr:4960) Tom Augspurger_
  • Raise on warnings in tests (:pr:4916) Tom Augspurger_
  • Add a diagnostics extra to setup.py (includes bokeh) (:pr:4924) John A Kirkham_
  • Add newline delimiter keyword to OpenFile (:pr:4935) btw08_
  • Overload HighLevelGraphs values method (:pr:4918) James Bourbeau_
  • Add __await__ method to Dask collections (:pr:4901) Matthew Rocklin_
  • Also ignore AttributeErrors which may occur if snappy (not python-snappy) is installed (:pr:4908) Mark Bell_
  • Canonicalize key names in config.rename (:pr:4903) Ian Bolliger_
  • Bump minimum partd to 0.3.10 (:pr:4890) Tom Augspurger_
  • Catch async def SyntaxError (:pr:4836) James Bourbeau_
  • catch IOError in ensure_file (:pr:4806) Justin Poehnelt_
  • Cleanup CI warnings (:pr:4798) Tom Augspurger_
  • Move distributed's parse and format functions to dask.utils (:pr:4793) Matthew Rocklin_
  • Apply black formatting (:pr:4983) James Bourbeau_
  • Package license file in wheels (:pr:4988) John A Kirkham_

DataFrame ^^^^^^^^^

  • Add an optional partition_size parameter to repartition (:pr:4416) George Sakkis_
  • merge_asof and prefix_reduction (:pr:4877) Cody Johnson_
  • Allow dataframes to be indexed by dask arrays (:pr:4882) Endre Mark Borza_
  • Avoid deprecated message parameter in pytest.raises (:pr:4962) James Bourbeau_
  • Update test_to_records to test with lengths argument (:pr:4515) asmith26_
  • Remove pandas pinning in Dataframe accessors (:pr:4955) Matthew Rocklin_
  • Fix correlation of series with same names (:pr:4934) Philipp S. Sommer_
  • Map Dask Series to Dask Series (:pr:4872) Justin Waugh_
  • Warn in dd.merge on dtype warning (:pr:4917) mcsoini_
  • Add groupby Covariance/Correlation (:pr:4889) Benjamin Zaitlen_
  • keep index name with to_datetime (:pr:4905) Ian Bolliger_
  • Add Parallel variance computation for dataframes (:pr:4865) Ksenia Bobrova_
  • Add divmod implementation to arrays and dataframes (:pr:4884) Henrique Ribeiro_
  • Add documentation for dataframe reshape methods (:pr:4896) tpanza_
  • Avoid use of pandas.compat (:pr:4881) Tom Augspurger_
  • Added accessor registration for Series, DataFrame, and Index (:pr:4829) Tom Augspurger_
  • Add read_function keyword to read_json (:pr:4810) Richard J Zamora_
  • Provide full type name in check_meta (:pr:4819) Matthew Rocklin_
  • Correctly estimate bytes per row in read_sql_table (:pr:4807) Lijo Jose_
  • Adding support of non-numeric data to describe() (:pr:4791) Ksenia Bobrova_
  • Scalars for extension dtypes. (:pr:4459) Tom Augspurger_
  • Call head before compute in dd.from_delayed (:pr:4802) Matthew Rocklin_
  • Add support for rolling operations with larger window than partition size in DataFrames with Time-based index (:pr:4796) Jorge Pessoa_
  • Update groupby-apply doc with warning (:pr:4800) Tom Augspurger_
  • Change groupby-ness tests in _maybe_slice (:pr:4786) Benjamin Zaitlen_
  • Add master best practices document (:pr:4745) Matthew Rocklin_
  • Add document for how Dask works with GPUs (:pr:4792) Matthew Rocklin_
  • Add cli API docs (:pr:4788) James Bourbeau_
  • Ensure concat output has coherent dtypes (:pr:4692) Guillaume Lemaitre_
  • Fixes pandas_datareader dependencies installation (:pr:4989) James Bourbeau_
  • Accept pathlib.Path as pattern in read_hdf (:pr:3335) Jörg Dietrich_

Documentation ^^^^^^^^^^^^^

  • Move CLI API docs to relevant pages (:pr:4980) James Bourbeau_
  • Add to_datetime function to dataframe API docs Matthew Rocklin_
  • Add documentation entry for dask.array.ma.average (:pr:4970) Bouwe Andela_
  • Add bag.read_avro to bag API docs (:pr:4969) James Bourbeau_
  • Fix typo (:pr:4968) mbarkhau_
  • Docs: Drop support for Python 2.7 (:pr:4932) Hugo_
  • Remove requirement to modify changelog (:pr:4915) Matthew Rocklin_
  • Add documentation about meta column order (:pr:4887) Tom Augspurger_
  • Add documentation note in DataFrame.shift (:pr:4886) Tom Augspurger_
  • Docs: Fix typo (:pr:4868) Paweł Kordek_
  • Put do/don't into boxes for delayed best practice docs (:pr:3821) Martin Durant_
  • Doc fixups (:pr:2528) Tom Augspurger_
  • Add quansight to paid support doc section (:pr:4838) Martin Durant_
  • Add document for custom startup (:pr:4833) Matthew Rocklin_
  • Allow utils.derive_from to accept functions, apply across array (:pr:4804) Martin Durant_
  • Add "Avoid Large Partitions" section to best practices (:pr:4808) Matthew Rocklin_
  • Update URL for joblib to new website hosting their doc (:pr:4816) Christian Hudon_

.. _v1.2.2 / 2019-05-08:

1.2.2 / 2019-05-08

Array ^^^^^

  • Clarify regions kwarg to array.store (:pr:4759) Martin Durant_
  • Add dtype= parameter to da.random.randint (:pr:4753) Matthew Rocklin_
  • Use "row major" rather than "C order" in docstring (:pr:4452) @asmith26_
  • Normalize Xarray datasets to Dask arrays (:pr:4756) Matthew Rocklin_
  • Remove normed keyword in da.histogram (:pr:4755) Matthew Rocklin_

Bag ^^^

  • Add key argument to Bag.distinct (:pr:4423) Daniel Severo_

Core ^^^^

  • Add core dask config file (:pr:4774) Matthew Rocklin_
  • Add core dask config file to MANIFEST.in (:pr:4780) James Bourbeau_
  • Enabling glob with HTTP file-system (:pr:3926) Martin Durant_
  • HTTPFile.seek with whence=1 (:pr:4751) Martin Durant_
  • Remove config key normalization (:pr:4742) Jim Crist_

DataFrame ^^^^^^^^^

  • Remove explicit references to Pandas in dask.dataframe.groupby (:pr:4778) Matthew Rocklin_
  • Add support for group_keys kwarg in DataFrame.groupby() (:pr:4771) Brian Chu_
  • Describe doc (:pr:4762) Martin Durant_
  • Remove explicit pandas check in cumulative aggregations (:pr:4765) Nick Becker_
  • Added meta for read_json and test (:pr:4588) Abhinav Ralhan_
  • Add test for dtype casting (:pr:4760) Martin Durant_
  • Document alignment in map_partitions (:pr:4757) Jim Crist_
  • Implement Series.str.split(expand=True) (:pr:4744) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

  • Tweaks to develop.rst from trying to run tests (:pr:4772) Christian Hudon_
  • Add document describing phases of computation (:pr:4766) Matthew Rocklin_
  • Point users to Dask-Yarn from spark documentation (:pr:4770) Matthew Rocklin_
  • Update images in delayed doc to remove labels (:pr:4768) Martin Durant_
  • Explain intermediate storage for dask arrays (:pr:4025) John A Kirkham_
  • Specify bash code-block in array best practices (:pr:4764) James Bourbeau_
  • Add array best practices doc (:pr:4705) Matthew Rocklin_
  • Update optimization docs now that cull is not automatic (:pr:4752) Matthew Rocklin_

.. _v1.2.1 / 2019-04-29:

1.2.1 / 2019-04-29

Array ^^^^^

  • Fix map_blocks with block_info and broadcasting (:pr:4737) Bruce Merry_
  • Make 'minlength' keyword argument optional in da.bincount (:pr:4684) Genevieve Buckley_
  • Add support for map_blocks with no array arguments (:pr:4713) Bruce Merry_
  • Add dask.array.trace (:pr:4717) Danilo Horta_
  • Add sizeof support for cupy.ndarray (:pr:4715) Peter Andreas Entschev_
  • Add name kwarg to from_zarr (:pr:4663) Michael Eaton_
  • Add chunks='auto' to from_array (:pr:4704) Matthew Rocklin_
  • Raise TypeError if dask array is given as shape for da.ones, zeros, empty or full (:pr:4707) Genevieve Buckley_
  • Add TileDB backend (:pr:4679) Isaiah Norton_

Core ^^^^

  • Delay long list arguments (:pr:4735) Matthew Rocklin_
  • Bump to numpy >= 1.13, pandas >= 0.21.0 (:pr:4720) Jim Crist_
  • Remove file "test" (:pr:4710) James Bourbeau_
  • Reenable development build, uses upstream libraries (:pr:4696) Peter Andreas Entschev_
  • Remove assertion in HighLevelGraph constructor (:pr:4699) Matthew Rocklin_

DataFrame ^^^^^^^^^

  • Change cum-aggregation last-nonnull-value algorithm (:pr:4736) Nick Becker_
  • Fixup series-groupby-apply (:pr:4738) Jim Crist_
  • Refactor array.percentile and dataframe.quantile to use t-digest (:pr:4677) Janne Vuorela_
  • Allow naive concatenation of sorted dataframes (:pr:4725) Matthew Rocklin_
  • Fix perf issue in dd.Series.isin (:pr:4727) Jim Crist_
  • Remove hard pandas dependency for melt by using methodcaller (:pr:4719) Nick Becker_
  • A few dataframe metadata fixes (:pr:4695) Jim Crist_
  • Add Dataframe.replace (:pr:4714) Matthew Rocklin_
  • Add 'threshold' parameter to pd.DataFrame.dropna (:pr:4625) Nathan Matare_

Documentation ^^^^^^^^^^^^^

  • Add warning about derived docstrings early in the docstring (:pr:4716) Matthew Rocklin_
  • Create dataframe best practices doc (:pr:4703) Matthew Rocklin_
  • Uncomment dask_sphinx_theme (:pr:4728) James Bourbeau_
  • Fix minor typo in a Queue/fire_and_forget example (:pr:4709) Matthew Rocklin_
  • Update from_pandas docstring to match signature (:pr:4698) James Bourbeau_

.. _v1.2.0 / 2019-04-12:

1.2.0 / 2019-04-12

Array ^^^^^

  • Fixed mean() and moment() on sparse arrays (:pr:4525) Peter Andreas Entschev_
  • Add test for NEP-18. (:pr:4675) Hameer Abbasi_
  • Allow None to say "no chunking" in normalize_chunks (:pr:4656) Matthew Rocklin_
  • Fix limit value in auto_chunks (:pr:4645) Matthew Rocklin_

Core ^^^^

  • Updated diagnostic bokeh test for compatibility with bokeh>=1.1.0 (:pr:4680) Philipp Rudiger_
  • Adjusts codecov's target/threshold, disable patch (:pr:4671) Peter Andreas Entschev_
  • Always start with empty http buffer, not None (:pr:4673) Martin Durant_

DataFrame ^^^^^^^^^

  • Propagate index dtype and name when creating a dask dataframe from an array (:pr:4686) Henrique Ribeiro_
  • Fix ordering of quantiles in describe (:pr:4647) gregrf_
  • Clean up and document rearrange_column_by_tasks (:pr:4674) Matthew Rocklin_
  • Mark some parquet tests xfail (:pr:4667) Peter Andreas Entschev_
  • Fix parquet breakages with arrow 0.13.0 (:pr:4668) Martin Durant_
  • Allow sample to be False when reading CSV from a remote URL (:pr:4634) Ian Rose_
  • Fix timezone metadata inference on parquet load (:pr:4655) Martin Durant_
  • Use is_dataframe/index_like in dd.utils (:pr:4657) Matthew Rocklin_
  • Add min_count parameter to groupby sum method (:pr:4648) Henrique Ribeiro_
  • Correct quantile to handle unsorted quantiles (:pr:4650) gregrf_

Documentation ^^^^^^^^^^^^^

  • Add delayed extra dependencies to install docs (:pr:4660) James Bourbeau_

.. _v1.1.5 / 2019-03-29:

1.1.5 / 2019-03-29

Array ^^^^^

  • Ensure that we use the dtype keyword in normalize_chunks (:pr:4646) Matthew Rocklin_

Core ^^^^

  • Use recursive glob in LocalFileSystem (:pr:4186) Brett Naul_
  • Avoid YAML deprecation (:pr:4603)
  • Fix CI and add set -e (:pr:4605) James Bourbeau_
  • Support builtin sequence types in dask.visualize (:pr:4602)
  • unpack/repack orderedDict (:pr:4623) Justin Poehnelt_
  • Add da.random.randint to API docs (:pr:4628) James Bourbeau_
  • Add zarr to CI environment (:pr:4604) James Bourbeau_
  • Enable codecov (:pr:4631) Peter Andreas Entschev_

DataFrame ^^^^^^^^^

  • Support setting the index (:pr:4565)
  • DataFrame.itertuples accepts index, name kwargs (:pr:4593) Dan O'Donovan_
  • Support non-Pandas series in dd.Series.unique (:pr:4599) Benjamin Zaitlen_
  • Replace use of explicit type check with ._is_partition_type predicate (:pr:4533)
  • Remove additional pandas warnings in tests (:pr:4576)
  • Check object for name/dtype attributes rather than type (:pr:4606)
  • Fix comparison against pd.Series (:pr:4613) amerkel2_
  • Fixing warning from setting categorical codes to floats (:pr:4624) Julia Signell_
  • Fix renaming on index to_frame method (:pr:4498) Henrique Ribeiro_
  • Fix divisions when joining two single-partition dataframes (:pr:4636) Justin Waugh_
  • Warn if partitions overlap in compute_divisions (:pr:4600) Brian Chu_
  • Give informative meta= warning (:pr:4637) Matthew Rocklin_
  • Add informative error message to Series.__getitem__ (:pr:4638) Matthew Rocklin_
  • Add clear exception message when using index or index_col in read_csv (:pr:4651) Álvaro Abella Bascarán_

Documentation ^^^^^^^^^^^^^

  • Add documentation for custom groupby aggregations (:pr:4571)
  • Docs dataframe joins (:pr:4569)
  • Specify fork-based contributions (:pr:4619) James Bourbeau_
  • correct to_parquet example in docs (:pr:4641) Aaron Fowles_
  • Update and secure several references (:pr:4649) Søren Fuglede Jørgensen_

.. _v1.1.4 / 2019-03-08:

1.1.4 / 2019-03-08

Array ^^^^^

  • Use mask selection in compress (:pr:4548) John A Kirkham_
  • Use asarray in extract (:pr:4549) John A Kirkham_
  • Use correct dtype when test concatenation. (:pr:4539) Elliott Sales de Andrade_
  • Fix CuPy tests or properly mark as xfail (:pr:4564) Peter Andreas Entschev_

Core ^^^^

  • Fix local scheduler callback to deal with custom caching (:pr:4542) Yu Feng_
  • Use parse_bytes in read_bytes(sample=...) (:pr:4554) Matthew Rocklin_

DataFrame ^^^^^^^^^

  • Fix up groupby-standard deviation again on object dtype keys (:pr:4541) Matthew Rocklin_
  • TST/CI: Updates for pandas 0.24.1 (:pr:4551) Tom Augspurger_
  • Add ability to control number of unique elements in timeseries (:pr:4557) Matthew Rocklin_
  • Add support in read_csv for parameter skiprows for other iterables (:pr:4560) @JulianWgs_

Documentation ^^^^^^^^^^^^^

  • DataFrame to Array conversion and unknown chunks (:pr:4516) Scott Sievert_
  • Add docs for random array creation (:pr:4566) Matthew Rocklin_
  • Fix typo in docstring (:pr:4572) Shyam Saladi_

.. _v1.1.3 / 2019-03-01:

1.1.3 / 2019-03-01

Array ^^^^^

  • Modify mean chunk functions to return dicts rather than arrays (:pr:4513) Matthew Rocklin_
  • Change sparse installation in CI for NumPy/Python2 compatibility (:pr:4537) Matthew Rocklin_

DataFrame ^^^^^^^^^

  • Make merge dispatchable on pandas/other dataframe types (:pr:4522) Matthew Rocklin_
  • read_sql_table - datetime index fix and index type checking (:pr:4474) Joe Corbett_
  • Use generalized form of index checking (is_index_like) (:pr:4531) Benjamin Zaitlen_
  • Add tests for groupby reductions with object dtypes (:pr:4535) Matthew Rocklin_
  • Fixes #4467 : Updates time_series for pandas deprecation (:pr:4530) @HSR05_

Documentation ^^^^^^^^^^^^^

  • Add missing method to documentation index (:pr:4528) Bart Broere_

.. _v1.1.2 / 2019-02-25:

1.1.2 / 2019-02-25

Array ^^^^^

  • Fix another unicode/mixed-type edge case in normalize_array (:pr:4489) Marco Neumann_
  • Add dask.array.diagonal (:pr:4431) Danilo Horta_
  • Call asanyarray in unify_chunks (:pr:4506) Jim Crist_
  • Modify moment chunk functions to return dicts (:pr:4519) Peter Andreas Entschev_

Bag ^^^

  • Don't inline output keys in dask.bag (:pr:4464) Jim Crist_
  • Ensure that bag.from_sequence always includes at least one partition (:pr:4475) Anderson Banihirwe_
  • Implement out_type for bag.fold (:pr:4502) Matthew Rocklin_
  • Remove map from bag keynames (:pr:4500) Matthew Rocklin_
  • Avoid itertools.repeat in map_partitions (:pr:4507) Matthew Rocklin_

DataFrame ^^^^^^^^^

  • Fix relative path parsing on windows when using fastparquet (:pr:4445) Janne Vuorela_
  • Fix bug in pyarrow and hdfs (:pr:4453) (:pr:4455) Michał Jastrzębski_
  • df getitem with integer slices is not implemented (:pr:4466) Jim Crist_
  • Replace cudf-specific code with dask-cudf import (:pr:4470) Matthew Rocklin_
  • Avoid groupby.agg(callable) in groupby-var (:pr:4482) Matthew Rocklin_
  • Consider uint types as numerical in check_meta (:pr:4485) Marco Neumann_
  • Fix some typos in groupby comments (:pr:4494) Daniel Saxton_
  • Add error message around set_index(inplace=True) (:pr:4501) Matthew Rocklin_
  • meta_nonempty works with categorical index (:pr:4505) Jim Crist_
  • Add module name to expected meta error message (:pr:4499) Matthew Rocklin_
  • groupby-nunique works on empty chunk (:pr:4504) Jim Crist_
  • Propagate index metadata if not specified (:pr:4509) Jim Crist_

Documentation ^^^^^^^^^^^^^

  • Update docs to use from_zarr (:pr:4472) John A Kirkham_
  • DOC: add section of Using Other S3-Compatible Services for remote-data-services (:pr:4405) Aploium_
  • Fix header level of section in changelog (:pr:4483) Bruce Merry_
  • Add quotes to pip install [skip-ci] (:pr:4508) James Bourbeau_

Core ^^^^

  • Extend started_cbs AFTER state is initialized (:pr:4460) Marco Neumann_
  • Fix bug in HTTPFile._fetch_range with headers (:pr:4479) (:pr:4480) Ross Petchler_
  • Repeat optimize_blockwise for diamond fusion (:pr:4492) Matthew Rocklin_

.. _v1.1.1 / 2019-01-31:

1.1.1 / 2019-01-31

Array ^^^^^

  • Add support for cupy.einsum (:pr:4402) Johnnie Gray_
  • Provide byte size in chunks keyword (:pr:4434) Adam Beberg_
  • Raise more informative error for histogram bins and range (:pr:4430) James Bourbeau_

DataFrame ^^^^^^^^^

  • Lazily register more cudf functions and move to backends file (:pr:4396) Matthew Rocklin_
  • Fix ORC tests for pyarrow 0.12.0 (:pr:4413) Jim Crist_
  • rearrange_by_column: ensure that shuffle arg defaults to 'disk' if it's None in dask.config (:pr:4414) George Sakkis_
  • Implement filters for _read_pyarrow (:pr:4415) George Sakkis_
  • Avoid checking against types in is_dataframe_like (:pr:4418) Matthew Rocklin_
  • Pass username as 'user' when using pyarrow (:pr:4438) Roma Sokolov_

Delayed ^^^^^^^

  • Fix DelayedAttr return value (:pr:4440) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

  • Use SVG for pipeline graphic (:pr:4406) John A Kirkham_
  • Add doctest-modules to py.test documentation (:pr:4427) Daniel Severo_

Core ^^^^

  • Work around psutil 5.5.0 not allowing pickling Process objects Janne Vuorela_

.. _v1.1.0 / 2019-01-18:

1.1.0 / 2019-01-18

Array ^^^^^

  • Fix the average function when there is a masked array (:pr:4236) Damien Garaud_
  • Add allow_unknown_chunksizes to hstack and vstack (:pr:4287) Paul Vecchio_
  • Fix tensordot for 27+ dimensions (:pr:4304) Johnnie Gray_
  • Fixed block_info with axes. (:pr:4301) Tom Augspurger_
  • Use safe_wraps for matmul (:pr:4346) Mark Harfouche_
  • Use chunks="auto" in array creation routines (:pr:4354) Matthew Rocklin_
  • Fix np.matmul in dask.array.Array.__array_ufunc__ (:pr:4363) Stephan Hoyer_
  • COMPAT: Re-enable multifield copy->view change (:pr:4357) Diane Trout_
  • Calling np.dtype on a delayed object works (:pr:4387) Jim Crist_
  • Rework normalize_array for numpy data (:pr:4312) Marco Neumann_

DataFrame ^^^^^^^^^

  • Add fill_value support for series comparisons (:pr:4250) James Bourbeau_
  • Add schema name in read_sql_table for empty tables (:pr:4268) Mina Farid_
  • Adjust check for bad chunks in map_blocks (:pr:4308) Tom Augspurger_
  • Add dask.dataframe.read_fwf (:pr:4316) @slnguyen_
  • Use atop fusion in dask dataframe (:pr:4229) Matthew Rocklin_
  • Use parallel_types() in from_pandas (:pr:4331) Matthew Rocklin_
  • Change DataFrame.repr_data to method (:pr:4330) Matthew Rocklin_
  • Install pyarrow fastparquet for Appveyor (:pr:4338) Gábor Lipták_
  • Remove explicit pandas checks and provide cudf lazy registration (:pr:4359) Matthew Rocklin_
  • Replace isinstance(..., pandas) with is_dataframe_like (:pr:4375) Matthew Rocklin_
  • ENH: Support 3rd-party ExtensionArrays (:pr:4379) Tom Augspurger_
  • Pandas 0.24.0 compat (:pr:4374) Tom Augspurger_

Documentation ^^^^^^^^^^^^^

  • Fix link to 'map_blocks' function in array api docs (:pr:4258) David Hoese_
  • Add a paragraph on Dask-Yarn in the cloud docs (:pr:4260) Jim Crist_
  • Copy edit documentation (:pr:4267), (:pr:4263), (:pr:4262), (:pr:4277), (:pr:4271), (:pr:4279), (:pr:4265), (:pr:4295), (:pr:4293), (:pr:4296), (:pr:4302), (:pr:4306), (:pr:4318), (:pr:4314), (:pr:4309), (:pr:4317), (:pr:4326), (:pr:4325), (:pr:4322), (:pr:4332), (:pr:4333) Miguel Farrajota_
  • Fix typo in code example (:pr:4272) Daniel Li_
  • Doc: Update array-api.rst (:pr:4259) (:pr:4282) Prabakaran Kumaresshan_
  • Update hpc doc (:pr:4266) Guillaume Eynard-Bontemps_
  • Doc: Replace from_avro with read_avro in documents (:pr:4313) Prabakaran Kumaresshan_
  • Remove reference to "get" scheduler functions in docs (:pr:4350) Matthew Rocklin_
  • Fix typo in docstring (:pr:4376) Daniel Saxton_
  • Added documentation for dask.dataframe.merge (:pr:4382) Jendrik Jördening_

Core ^^^^

  • Avoid recursion in dask.core.get (:pr:4219) Matthew Rocklin_
  • Remove verbose flag from pytest setup.cfg (:pr:4281) Matthew Rocklin_
  • Support Pytest 4.0 by specifying marks explicitly (:pr:4280) Takahiro Kojima_
  • Add High Level Graphs (:pr:4092) Matthew Rocklin_
  • Fix SerializableLock locked and acquire methods (:pr:4294) Stephan Hoyer_
  • Pin boto3 to earlier version in tests to avoid moto conflict (:pr:4276) Martin Durant_
  • Treat None as missing in config when updating (:pr:4324) Matthew Rocklin_
  • Update Appveyor to Python 3.6 (:pr:4337) Gábor Lipták_
  • Use parse_bytes more liberally in dask.dataframe/bytes/bag (:pr:4339) Matthew Rocklin_
  • Add a better error message when cloudpickle is missing (:pr:4342) Mark Harfouche_
  • Support pool= keyword argument in threaded/multiprocessing get functions (:pr:4351) Matthew Rocklin_
  • Allow updates from arbitrary Mappings in config.update, not only dicts. (:pr:4356) Stuart Berg_
  • Move dask/array/top.py code to dask/blockwise.py (:pr:4348) Matthew Rocklin_
  • Add has_parallel_type (:pr:4395) Matthew Rocklin_
  • CI: Update Appveyor (:pr:4381) Tom Augspurger_
  • Ignore non-readable config files (:pr:4388) Jim Crist_

.. _v1.0.0 / 2018-11-28:

1.0.0 / 2018-11-28

Array ^^^^^

  • Add nancumsum/nancumprod unit tests (:pr:4215) crusaderky_

DataFrame ^^^^^^^^^

  • Add index to to_dask_dataframe docstring (:pr:4232) James Bourbeau_
  • Test and fix when appending categoricals with fastparquet (:pr:4245) Martin Durant_
  • Don't reread metadata when passing ParquetFile to read_parquet (:pr:4247) Martin Durant_

Documentation ^^^^^^^^^^^^^

  • Copy edit documentation (:pr:4222) (:pr:4224) (:pr:4228) (:pr:4231) (:pr:4230) (:pr:4234) (:pr:4235) (:pr:4254) Miguel Farrajota_
  • Updated doc for the new scheduler keyword (:pr:4251) @milesial_

Core ^^^^

  • Avoid a few warnings (:pr:4223) Matthew Rocklin_
  • Remove dask.store module (:pr:4221) Matthew Rocklin_
  • Remove AUTHORS.md Jim Crist_

.. _v0.20.2 / 2018-11-15:

0.20.2 / 2018-11-15

Array ^^^^^

  • Avoid fusing dependencies of atop reductions (:pr:4207) Matthew Rocklin_

Dataframe ^^^^^^^^^

  • Improve memory footprint for dataframe correlation (:pr:4193) Damien Garaud_
  • Add empty DataFrame check to boundary_slice (:pr:4212) James Bourbeau_

Documentation ^^^^^^^^^^^^^

  • Copy edit documentation (:pr:4197) (:pr:4204) (:pr:4198) (:pr:4199) (:pr:4200) (:pr:4202) (:pr:4209) Miguel Farrajota_
  • Add stats module namespace (:pr:4206) James Bourbeau_
  • Fix link in dataframe documentation (:pr:4208) James Bourbeau_

.. _v0.20.1 / 2018-11-09:

0.20.1 / 2018-11-09

Array ^^^^^

  • Only allocate the result space in wrapped_pad_func (:pr:4153) John A Kirkham_
  • Generalize expand_pad_width to expand_pad_value (:pr:4150) John A Kirkham_
  • Test da.pad with 2D linear_ramp case (:pr:4162) John A Kirkham_
  • Fix import for broadcast_to. (:pr:4168) samc0de_
  • Rewrite Dask Array's pad to add only new chunks (:pr:4152) John A Kirkham_
  • Validate index inputs to atop (:pr:4182) Matthew Rocklin_

Core ^^^^

  • Dask.config set and get normalize underscores and hyphens (:pr:4143) James Bourbeau_
  • Only subs on core collections, not subclasses (:pr:4159) Matthew Rocklin_
  • Add block_size=0 option to HTTPFileSystem. (:pr:4171) Martin Durant_
  • Add traverse support for dataclasses (:pr:4165) Armin Berres_
  • Avoid optimization on sharedicts without dependencies (:pr:4181) Matthew Rocklin_
  • Update the pytest version for TravisCI (:pr:4189) Damien Garaud_
  • Use key_split rather than funcname in visualize names (:pr:4160) Matthew Rocklin_

Dataframe ^^^^^^^^^

  • Add fix for DataFrame.setitem for index (:pr:4151) Anderson Banihirwe_
  • Fix column choice when passing list of files to fastparquet (:pr:4174) Martin Durant_
  • Pass engine_kwargs from read_sql_table to sqlalchemy (:pr:4187) Damien Garaud_

Documentation ^^^^^^^^^^^^^

  • Fix documentation in Delayed best practices example that returned an empty list (:pr:4147) Jonathan Fraine_
  • Copy edit documentation (:pr:4164) (:pr:4175) (:pr:4185) (:pr:4192) (:pr:4191) (:pr:4190) (:pr:4180) Miguel Farrajota_
  • Fix typo in docstring (:pr:4183) Carlos Valiente_

.. _v0.20.0 / 2018-10-26:

0.20.0 / 2018-10-26

Array ^^^^^

  • Fuse Atop operations (:pr:3998), (:pr:4081) Matthew Rocklin_
  • Support da.asanyarray on dask dataframes (:pr:4080) Matthew Rocklin_
  • Remove unnecessary endianness check in datetime test (:pr:4113) Elliott Sales de Andrade_
  • Set name=False in array foo_like functions (:pr:4116) Matthew Rocklin_
  • Remove dask.array.ghost module (:pr:4121) Matthew Rocklin_
  • Fix use of getargspec in dask array (:pr:4125) Stephan Hoyer_
  • Adds dask.array.invert (:pr:4127), (:pr:4131) Anderson Banihirwe_
  • Raise informative error on arg-reduction on unknown chunksize (:pr:4128), (:pr:4135) Matthew Rocklin_
  • Normalize reversed slices in dask array (:pr:4126) Matthew Rocklin_

Bag ^^^

  • Add bag.to_avro (:pr:4076) Martin Durant_

Core ^^^^

  • Pull num_workers from config.get (:pr:4086), (:pr:4093) James Bourbeau_
  • Fix invalid escape sequences with raw strings (:pr:4112) Elliott Sales de Andrade_
  • Raise an error on the use of the get= keyword and set_options (:pr:4077) Matthew Rocklin_
  • Add import for Azure DataLake storage, and add docs (:pr:4132) Martin Durant_
  • Avoid collections.Mapping/Sequence (:pr:4138) Matthew Rocklin_

Dataframe ^^^^^^^^^

  • Include index keyword in to_dask_dataframe (:pr:4071) Matthew Rocklin_
  • Add support for duplicate column names (:pr:4087) Jan Koch_
  • Implement min_count for the DataFrame methods sum and prod (:pr:4090) Bart Broere_
  • Remove pandas warnings in concat (:pr:4095) Matthew Rocklin_
  • DataFrame.to_csv header option to only output headers in the first chunk (:pr:3909) Rahul Vaidya_
  • Remove Series.to_parquet (:pr:4104) Justin Dennison_
  • Avoid warnings and deprecated pandas methods (:pr:4115) Matthew Rocklin_
  • Swap 'old' and 'previous' when reporting append error (:pr:4130) Martin Durant_

Documentation ^^^^^^^^^^^^^

  • Copy edit documentation (:pr:4073), (:pr:4074), (:pr:4094), (:pr:4097), (:pr:4107), (:pr:4124), (:pr:4133), (:pr:4139) Miguel Farrajota_
  • Fix typo in code example (:pr:4089) Antonino Ingargiola_
  • Add pycon 2018 presentation (:pr:4102) Javad_
  • Quick description for gcsfs (:pr:4109) Martin Durant_
  • Fixed typo in docstrings of read_sql_table method (:pr:4114) TakaakiFuruse_
  • Make target directories in redirects if they don't exist (:pr:4136) Matthew Rocklin_

.. _v0.19.4 / 2018-10-09:

0.19.4 / 2018-10-09

Array ^^^^^

  • Implement apply_gufunc(..., axes=..., keepdims=...) (:pr:3985) Markus Gonser_

Bag ^^^

  • Fix typo in datasets.make_people (:pr:4069) Matthew Rocklin_

Dataframe ^^^^^^^^^

  • Added percentiles options for dask.dataframe.describe method (:pr:4067) Zhenqing Li_
  • Add DataFrame.partitions accessor similar to Array.blocks (:pr:4066) Matthew Rocklin_

Core ^^^^

  • Pass get functions and Clients through scheduler keyword (:pr:4062) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

  • Fix Typo on hpc example. (missing = in kwarg). (:pr:4068) Matthias Bussonier_
  • Extensive copy-editing: (:pr:4065), (:pr:4064), (:pr:4063) Miguel Farrajota_

.. _v0.19.3 / 2018-10-05:

0.19.3 / 2018-10-05

Array ^^^^^

  • Make da.RandomState extensible to other modules (:pr:4041) Matthew Rocklin_
  • Support unknown dims in ravel no-op case (:pr:4055) Jim Crist_
  • Add basic infrastructure for cupy (:pr:4019) Matthew Rocklin_
  • Avoid asarray and lock arguments for from_array(getitem) (:pr:4044) Matthew Rocklin_
  • Move local imports in corrcoef to global imports (:pr:4030) John A Kirkham_
  • Move local indices import to global import (:pr:4029) John A Kirkham_
  • Fix-up Dask Array's fromfunction w.r.t. dtype and kwargs (:pr:4028) John A Kirkham_
  • Don't use dummy expansion for trim_internal in overlapped (:pr:3964) Mark Harfouche_
  • Add unravel_index (:pr:3958) John A Kirkham_

Bag ^^^

  • Sort result in Bag.frequencies (:pr:4033) Matthew Rocklin_
  • Add support for npartitions=1 edge case in groupby (:pr:4050) James Bourbeau_
  • Add new random dataset for people (:pr:4018) Matthew Rocklin_
  • Improve performance of bag.read_text on small files (:pr:4013) Eric Wolak_
  • Add bag.read_avro (:pr:4000) (:pr:4007) Martin Durant_

Dataframe ^^^^^^^^^

  • Added an index parameter to :meth:dask.dataframe.from_dask_array for creating a dask DataFrame from a dask Array with a given index. (:pr:3991) Tom Augspurger_
  • Improve sub-classability of dask dataframe (:pr:4015) Matthew Rocklin_
  • Fix failing hdfs test [test-hdfs] (:pr:4046) Jim Crist_
  • fuse_subgraphs works without normal fuse (:pr:4042) Jim Crist_
  • Make path for reading many parquet files without prescan (:pr:3978) Martin Durant_
  • Index in dd.from_dask_array (:pr:3991) Tom Augspurger_
  • Making skiprows accept lists (:pr:3975) Julia Signell_
  • Fail early in fastparquet read for nonexistent column (:pr:3989) Martin Durant_

Core ^^^^

  • Add support for npartitions=1 edge case in groupby (:pr:4050) James Bourbeau_
  • Automatically wrap large arguments with dask.delayed in map_blocks/partitions (:pr:4002) Matthew Rocklin_
  • Fuse linear chains of subgraphs (:pr:3979) Jim Crist_
  • Make multiprocessing context configurable (:pr:3763) Itamar Turner-Trauring_

Documentation ^^^^^^^^^^^^^

  • Extensive copy-editing (:pr:4049), (:pr:4034), (:pr:4031), (:pr:4020), (:pr:4021), (:pr:4022), (:pr:4023), (:pr:4016), (:pr:4017), (:pr:4010), (:pr:3997), (:pr:3996) Miguel Farrajota_
  • Update shuffle method selection docs (:pr:4048) James Bourbeau_
  • Remove docs/source/examples, point to examples.dask.org (:pr:4014) Matthew Rocklin_
  • Replace readthedocs links with dask.org (:pr:4008) Matthew Rocklin_
  • Updates DataFrame.to_hdf docstring for returned values (:pr:3992) James Bourbeau_

.. _v0.19.2 / 2018-09-17:

0.19.2 / 2018-09-17

Array ^^^^^

  • apply_gufunc now automatically infers the output dtypes of functions (:pr:3936) Markus Gonser_
  • Fix array histogram range error when array has nans (:pr:3980) James Bourbeau_
  • Issue 3937 follow up, int type checks. (:pr:3956) Yu Feng_
  • from_array: add @martindurant's explanation of how hashing is done for an array. (:pr:3965) Mark Harfouche_
  • Support gradient with coordinate (:pr:3949) Keisuke Fujii_

Core ^^^^

  • Fix use of has_keyword with partial in Python 2.7 (:pr:3966) Mark Harfouche_
  • Set pyarrow as default for HDFS (:pr:3957) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

  • Use dask_sphinx_theme (:pr:3963) Matthew Rocklin_
  • Use JupyterLab in Binder links from main page Matthew Rocklin_
  • DOC: fixed sphinx syntax (:pr:3960) Tom Augspurger_

.. _v0.19.1 / 2018-09-06:

0.19.1 / 2018-09-06

Array ^^^^^

  • Don't enforce dtype if result has no dtype (:pr:3928) Matthew Rocklin_
  • Fix NumPy issubtype deprecation warning (:pr:3939) Bruce Merry_
  • Fix arg reduction tokens to be unique with different arguments (:pr:3955) Tobias de Jong_
  • Coerce numpy integers to ints in slicing code (:pr:3944) Yu Feng_
  • Linalg.norm ndim along axis partial fix (:pr:3933) Tobias de Jong_

Dataframe ^^^^^^^^^

  • Deterministic DataFrame.set_index (:pr:3867) George Sakkis_
  • Fix divisions in read_parquet when dealing with filters #3831 #3930 (:pr:3923) (:pr:3931) @andrethrill_
  • Fix return type in categorical.as_known (:pr:3888) Sriharsha Hatwar_
  • Fix DataFrame.assign for callables (:pr:3919) Tom Augspurger_
  • Include partitions with no width in repartition (:pr:3941) Matthew Rocklin_
  • Don't constrict stage/k dtype in dataframe shuffle (:pr:3942) Matthew Rocklin_

Documentation ^^^^^^^^^^^^^

  • DOC: Add hint on how to render task graphs horizontally (:pr:3922) Uwe Korn_
  • Add try-now button to main landing page (:pr:3924) Matthew Rocklin_

.. _v0.19.0 / 2018-08-29:

0.19.0 / 2018-08-29

Array ^^^^^

  • Support coordinate in gradient (:pr:3949) Keisuke Fujii_
  • Fix argtopk split_every bug (:pr:3810) crusaderky_
  • Ensure computing dask.array.isnull() always returns a NumPy array (:pr:3825) Stephan Hoyer_
  • Support concatenate for scipy.sparse in dask array (:pr:3836) Matthew Rocklin_
  • Fix argtopk on 32-bit systems. (:pr:3823) Elliott Sales de Andrade_
  • Normalize keys in rechunk (:pr:3820) Matthew Rocklin_
  • Allow shape of dask.array to be a numpy array (:pr:3844) Mark Harfouche_
  • Fix numpy deprecation warning on tuple indexing (:pr:3851) Tobias de Jong_
  • Rename ghost module to overlap (:pr:3830) Robert Sare_
  • Re-add the ghost import to da init (:pr:3861) Jim Crist_
  • Ensure copy preserves masked arrays (:pr:3852) Tobias de Jong_

DataFrame ^^^^^^^^^^

  • Added dtype and sparse keywords to :func:dask.dataframe.get_dummies (:pr:3792) Tom Augspurger_
  • Added :meth:dask.dataframe.to_dask_array for converting a Dask Series or DataFrame to a Dask Array, possibly with known chunk sizes (:pr:3884) Tom Augspurger_
  • Changed the behavior for :meth:dask.array.asarray for dask dataframe and series inputs. Previously, the series was eagerly converted to an in-memory NumPy array before creating a dask array with known chunk sizes. This caused unexpectedly high memory usage. Now, no intermediate NumPy array is created, and a Dask array with unknown chunk sizes is returned (:pr:3884) Tom Augspurger_
  • DataFrame.iloc (:pr:3805) Tom Augspurger_
  • When reading multiple paths, expand globs. (:pr:3828) Irina Truong_
  • Added index column name after resample (:pr:3833) Eric Bonfadini_
  • Add (lazy) shape property to dataframe and series (:pr:3212) Henrique Ribeiro_
  • Fix failing hdfs test [test-hdfs] (:pr:3858) Jim Crist_
  • Fixes for pyarrow 0.10.0 release (:pr:3860) Jim Crist_
  • Rename to_csv keys for diagnostics (:pr:3890) Matthew Rocklin_
  • Match pandas warnings for concat sort (:pr:3897) Tom Augspurger_
  • Include filename in read_csv (:pr:3908) Julia Signell_

Core ^^^^

  • Better error message on import when missing common dependencies (:pr:3771) Danilo Horta_
  • Drop Python 3.4 support (:pr:3840) Jim Crist_
  • Remove expired deprecation warnings (:pr:3841) Jim Crist_
  • Add DASK_ROOT_CONFIG environment variable (:pr:3849) Joe Hamman_
  • Don't cull in local scheduler, do cull in delayed (:pr:3856) Jim Crist_
  • Increase conda download retries (:pr:3857) Jim Crist_
  • Add python_requires and Trove classifiers (:pr:3855) @hugovk_
  • Fix collections.abc deprecation warnings in Python 3.7.0 (:pr:3876) Jan Margeta_
  • Allow dot jpeg to xfail in visualize tests (:pr:3896) Matthew Rocklin_
  • Add Python 3.7 to travis.yml (:pr:3894) Matthew Rocklin_
  • Add expand_environment_variables to dask.config (:pr:3893) Joe Hamman_

Docs ^^^^

  • Fix typo in import statement of diagnostics (:pr:3826) John Mrziglod_
  • Add link to YARN docs (:pr:3838) Jim Crist_
  • Fix minor typos in landing page index.html (:pr:3746) Christoph Moehl_
  • Update delayed-custom.rst (:pr:3850) Anderson Banihirwe_
  • DOC: clarify delayed docstring (:pr:3709) Scott Sievert_
  • Add new presentations (:pr:3880) Javad_
  • Add dask array normalize_chunks to documentation (:pr:3878) Daniel Rothenberg_
  • Docs: Fix link to snakeviz (:pr:3900) Hans Moritz Günther_
  • Add missing to docstring (:pr:3915) @rtobar_

.. _v0.18.2 / 2018-07-23:

0.18.2 / 2018-07-23

Array ^^^^^

  • Reimplemented argtopk to make it release the GIL (:pr:3610) crusaderky_
  • Don't overlap on non-overlapped dimensions in map_overlap (:pr:3653) Matthew Rocklin_
  • Fix linalg.tsqr for dimensions of uncertain length (:pr:3662) Jeremy Chen_
  • Break apart uneven array-of-int slicing to separate chunks (:pr:3648) Matthew Rocklin_
  • Align auto chunks to provided chunks, rather than shape (:pr:3679) Matthew Rocklin_
  • Adds endpoint and retstep support for linspace (:pr:3675) James Bourbeau_
  • Implement .blocks accessor (:pr:3689) Matthew Rocklin_
  • Add block_info keyword to map_blocks functions (:pr:3686) Matthew Rocklin_
  • Slice by dask array of ints (:pr:3407) crusaderky_
  • Support dtype in arange (:pr:3722) crusaderky_
  • Fix argtopk with uneven chunks (:pr:3720) crusaderky_
  • Raise error when replace=False in da.choice (:pr:3765) James Bourbeau_
  • Update chunks in Array.__setitem__ (:pr:3767) Itamar Turner-Trauring_
  • Add a chunksize convenience property (:pr:3777) Jacob Tomlinson_
  • Fix and simplify array slicing behavior when step < 0 (:pr:3702) Ziyao Wei_
  • Ensure to_zarr with return_stored True returns a Dask Array (:pr:3786) John A Kirkham_
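
The `.blocks` accessor and `chunksize` property listed above can be illustrated with a minimal sketch. The array shape and chunking here are arbitrary choices for illustration, and the sketch assumes a Dask version at or after this release with NumPy installed.

```python
import dask.array as da

# A 4x6 array split into a 2x2 grid of (2, 3) chunks (illustrative sizes).
x = da.ones((4, 6), chunks=(2, 3))

# chunksize reports the largest chunk length along each axis.
print(x.chunksize)

# .blocks indexes by block position rather than by element position.
first_block_row = x.blocks[0]      # every block in the first row of the grid
single_block = x.blocks[1, 1]      # one specific block

print(first_block_row.shape, single_block.shape)
```

Indexing through `.blocks` stays lazy: it returns another Dask array, so nothing is computed until `.compute()` is called.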

Bag ^^^

  • Add last_endline optional parameter in to_textfiles (:pr:3745) George Sakkis_

Dataframe ^^^^^^^^^

  • Add aggregate function for rolling objects (:pr:3772) Gerome Pistre_
  • Properly tokenize cumulative groupby aggregations (:pr:3799) Cloves Almeida_

Delayed ^^^^^^^

  • Add the @ operator to the delayed objects (:pr:3691) Mark Harfouche_
  • Add delayed best practices to documentation (:pr:3737) Matthew Rocklin_
  • Fix @delayed decorator for methods and add tests (:pr:3757) Ziyao Wei_

Core ^^^^

  • Fix extra progressbar (:pr:3669) Mike Neish_
  • Allow tasks back onto ordering stack if they have one dependency (:pr:3652) Matthew Rocklin_
  • Prefer end-tasks with low numbers of dependencies when ordering (:pr:3588) Tom Augspurger_
  • Add assert_eq to top-level modules (:pr:3726) Matthew Rocklin_
  • Test that dask collections can hold scipy.sparse arrays (:pr:3738) Matthew Rocklin_
  • Fix setup of lz4 decompression functions (:pr:3782) Elliott Sales de Andrade_
  • Add datasets module (:pr:3780) Matthew Rocklin_

.. _v0.18.1 / 2018-06-22:

0.18.1 / 2018-06-22

Array ^^^^^

  • from_array now supports scalar types and nested lists/tuples in input, just like all numpy functions do; it also produces a simpler graph when the input is a plain ndarray (:pr:3568) crusaderky_
  • Fix slicing of big arrays due to cumsum dtype bug (:pr:3620) Marco Rossi_
  • Add Dask Array implementation of pad (:pr:3578) John A Kirkham_
  • Fix array random API examples (:pr:3625) James Bourbeau_
  • Add average function to dask array (:pr:3640) James Bourbeau_
  • Tokenize ghost_internal with axes (:pr:3643) Matthew Rocklin_
  • Add outer for Dask Arrays (:pr:3658) John A Kirkham_

DataFrame ^^^^^^^^^

  • Add Index.to_series method (:pr:3613) Henrique Ribeiro_
  • Fix missing partition columns in pyarrow-parquet (:pr:3636) Martin Durant_

Core ^^^^

  • Minor tweaks to CI (:pr:3629) crusaderky_
  • Add back dask.utils.effective_get (:pr:3642) Matthew Rocklin_
  • DASK_CONFIG dictates config write location (:pr:3621) Jim Crist_
  • Replace 'collections' key in unpack_collections with unique key (:pr:3632) Yu Feng_
  • Avoid deepcopy in dask.config.set (:pr:3649) Matthew Rocklin_

.. _v0.18.0 / 2018-06-14:

0.18.0 / 2018-06-14

Array ^^^^^

  • Add to/from_zarr for Zarr-format datasets and arrays (:pr:3460) Martin Durant_
  • Experimental addition of generalized ufunc support, apply_gufunc, gufunc, and as_gufunc (:pr:3109) (:pr:3526) (:pr:3539) Markus Gonser_
  • Avoid unnecessary rechunking tasks (:pr:3529) Matthew Rocklin_
  • Compute dtypes at runtime for fft (:pr:3511) Matthew Rocklin_
  • Generate UUIDs for all da.store operations (:pr:3540) Martin Durant_
  • Correct internal dimension of Dask's SVD (:pr:3517) John A Kirkham_
  • BUG: do not raise IndexError for identity slice in array.vindex (:pr:3559) Scott Sievert_
  • Adds isneginf and isposinf (:pr:3581) John A Kirkham_
  • Drop Dask Array's learn module (:pr:3580) John A Kirkham_
  • Added sfqr (short-and-fat) as a counterpart to tsqr… (:pr:3575) Jeremy Chen_
  • Allow 0-width chunks in dask.array.rechunk (:pr:3591) Marc Pfister_
  • Document Dask Array's nan_to_num in public API (:pr:3599) John A Kirkham_
  • Show block example (:pr:3601) John A Kirkham_
  • Replace token= keyword with name= in map_blocks (:pr:3597) Matthew Rocklin_
  • Disable locking in to_zarr (needed for using to_zarr in a distributed context) (:pr:3607) John A Kirkham_
  • Support Zarr Arrays in to_zarr/from_zarr (:pr:3561) John A Kirkham_
  • Added recursion to array/linalg/tsqr to better manage the single core bottleneck (:pr:3586) Jeremy Chan_ (:pr:3396) crusaderky_

Dataframe ^^^^^^^^^

  • Add to/read_json (:pr:3494) Martin Durant_
  • Adds index to unsupported arguments for DataFrame.rename method (:pr:3522) James Bourbeau_
  • Adds support to subset Dask DataFrame columns using numpy.ndarray, pandas.Series, and pandas.Index objects (:pr:3536) James Bourbeau_
  • Raise error if meta columns do not match dataframe (:pr:3485) Christopher Ren_
  • Add index to unsupported argument for DataFrame.rename (:pr:3522) James Bourbeau_
  • Adds support for subsetting DataFrames with pandas Index/Series and numpy ndarrays (:pr:3536) James Bourbeau_
  • Dataframe sample method docstring fix (:pr:3566) James Bourbeau_
  • Fix dd.read_json to infer file compression (:pr:3594) Matt Lee_
  • Adds n to sample method (:pr:3606) James Bourbeau_
  • Add fastparquet ParquetFile object support (:pr:3573) @andrethrill_

Bag ^^^

  • Rename method= keyword to shuffle= in bag.groupby (:pr:3470) Matthew Rocklin_

Core ^^^^

  • Replace get= keyword with scheduler= keyword (:pr:3448) Matthew Rocklin_
  • Add centralized dask.config module to handle configuration for all Dask subprojects (:pr:3432) (:pr:3513) (:pr:3520) Matthew Rocklin_
  • Add dask-ssh CLI Options and Description. (:pr:3476) @beomi_
  • Read whole files fix regardless of header for HTTP (:pr:3496) Martin Durant_
  • Adds synchronous scheduler syntax to debugging docs (:pr:3509) James Bourbeau_
  • Replace dask.set_options with dask.config.set (:pr:3502) Matthew Rocklin_
  • Update sphinx readthedocs-theme (:pr:3516) Matthew Rocklin_
  • Introduce "auto" value for normalize_chunks (:pr:3507) Matthew Rocklin_
  • Fix check in configuration with env=None (:pr:3562) Simon Perkins_
  • Update sizeof definitions (:pr:3582) Matthew Rocklin_
  • Remove --verbose flag from travis-ci (:pr:3477) Matthew Rocklin_
  • Remove "da.random" from random array keys (:pr:3604) Matthew Rocklin_
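
As a rough illustration of the `scheduler=` keyword and `dask.config.set` entries above, the following minimal sketch selects a scheduler per call and within a scoped block. The `add` task is hypothetical and exists only for this example; the sketch assumes a Dask version at or after this release.

```python
import dask
from dask import delayed


@delayed
def add(x, y):
    # Hypothetical task used only for illustration.
    return x + y


total = add(add(1, 2), 3)

# Per-call selection: scheduler= takes the place of the old get= keyword.
result_threads = total.compute(scheduler="threads")

# Scoped selection: dask.config.set takes the place of dask.set_options.
with dask.config.set(scheduler="synchronous"):
    result_sync = total.compute()

print(result_threads, result_sync)
```

The context-manager form restores the previous configuration on exit, which is convenient for temporarily switching to the synchronous scheduler while debugging.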

.. _v0.17.5 / 2018-05-16:

0.17.5 / 2018-05-16

Array ^^^^^

  • Fix rechunk with chunksize of -1 in a dict (:pr:3469) Stephan Hoyer_
  • einsum now accepts the split_every parameter (:pr:3471) crusaderky_
  • Improved slicing performance (:pr:3479) Yu Feng_

DataFrame ^^^^^^^^^

  • Compatibility with pandas 0.23.0 (:pr:3499) Tom Augspurger_

.. _v0.17.4 / 2018-05-03:

0.17.4 / 2018-05-03

Dataframe ^^^^^^^^^

  • Add support for indexing Dask DataFrames with string subclasses (:pr:3461) James Bourbeau_
  • Allow using both sorted_index and chunksize in read_hdf (:pr:3463) Pierre Bartet_
  • Pass filesystem to arrow piece reader (:pr:3466) Martin Durant_
  • Switches to using dask.compat string_types (:pr:3462) James Bourbeau_

.. _v0.17.3 / 2018-05-02:

0.17.3 / 2018-05-02

Array ^^^^^

  • Add einsum for Dask Arrays (:pr:3412) Simon Perkins_
  • Add piecewise for Dask Arrays (:pr:3350) John A Kirkham_
  • Fix handling of nan in broadcast_shapes (:pr:3356) John A Kirkham_
  • Add isin for dask arrays (:pr:3363) Stephan Hoyer_
  • Overhauled topk for Dask Arrays: faster algorithm, particularly for large k's; added support for multiple axes, recursive aggregation, and an option to pick the bottom k elements instead. (:pr:3395) crusaderky_
  • The topk API has changed from topk(k, array) to the more conventional topk(array, k). The legacy API still works but is now deprecated. (:pr:2965) crusaderky_
  • New function argtopk for Dask Arrays (:pr:3396) crusaderky_
  • Fix handling partial depth and boundary in map_overlap (:pr:3445) John A Kirkham_
  • Add gradient for Dask Arrays (:pr:3434) John A Kirkham_
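
The `topk` argument order and the new `argtopk` function noted above can be sketched as follows. The input values are made up for illustration, and the sketch assumes a Dask version at or after this release with NumPy installed.

```python
import numpy as np
import dask.array as da

# Illustrative data split into two chunks.
x = da.from_array(np.array([5, 1, 8, 3, 9, 2]), chunks=3)

# Array first, then k: the two largest values, largest first.
largest = da.topk(x, 2).compute()

# A negative k picks the smallest values instead, smallest first.
smallest = da.topk(x, -2).compute()

# argtopk returns the positions of the top-k values rather than the values.
largest_idx = da.argtopk(x, 2).compute()

print(largest, smallest, largest_idx)
```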

DataFrame ^^^^^^^^^

  • Allow t as shorthand for table in to_hdf for pandas compatibility (:pr:3330) Jörg Dietrich_
  • Added top level isna method for Dask DataFrames (:pr:3294) Christopher Ren_
  • Fix selection on partition column on read_parquet for engine="pyarrow" (:pr:3207) Uwe Korn_
  • Added DataFrame.squeeze method (:pr:3366) Christopher Ren_
  • Added infer_divisions option to read_parquet to specify whether read engines should compute divisions (:pr:3387) Jon Mease_
  • Added support for inferring division for engine="pyarrow" (:pr:3387) Jon Mease_
  • Provide more informative error message for meta= errors (:pr:3343) Matthew Rocklin_
  • Add ORC reader (:pr:3284) Martin Durant_
  • Default compression for parquet now always Snappy, in line with pandas (:pr:3373) Martin Durant_
  • Fixed bug in Dask DataFrame and Series comparisons with NumPy scalars (:pr:3436) James Bourbeau_
  • Remove outdated requirement from repartition docstring (:pr:3440) Jörg Dietrich_
  • Fixed bug in aggregation when only a Series is selected (:pr:3446) Jörg Dietrich_
  • Add default values to make_timeseries (:pr:3421) Matthew Rocklin_

Core ^^^^

  • Support traversing collections in persist, visualize, and optimize (:pr:3410) Jim Crist_
  • Add scheduler= keyword to compute and persist. This replaces common use of the get= keyword (:pr:3448) Matthew Rocklin_

.. _v0.17.2 / 2018-03-21:

0.17.2 / 2018-03-21

Array ^^^^^

  • Add broadcast_arrays for Dask Arrays (:pr:3217) John A Kirkham_
  • Add bitwise_* ufuncs (:pr:3219) John A Kirkham_
  • Add optional axis argument to squeeze (:pr:3261) John A Kirkham_
  • Validate inputs to atop (:pr:3307) Matthew Rocklin_
  • Avoid calls to astype in concatenate if all parts have the same dtype (:pr:3301) Martin Durant_

DataFrame ^^^^^^^^^

  • Fixed bug in shuffle due to aggressive truncation (:pr:3201) Matthew Rocklin_
  • Support specifying categorical columns on read_parquet with categories=[…] for engine="pyarrow" (:pr:3177) Uwe Korn_
  • Add dd.tseries.Resampler.agg (:pr:3202) Richard Postelnik_
  • Support operations that mix dataframes and arrays (:pr:3230) Matthew Rocklin_
  • Support extra Scalar and Delayed args in dd.groupby._Groupby.apply (:pr:3256) Gabriele Lanaro_

Bag ^^^

  • Support joining against single-partitioned bags and delayed objects (:pr:3254) Matthew Rocklin_

Core ^^^^

  • Fixed bug when using unexpected but hashable types for keys (:pr:3238) Daniel Collins_
  • Fix bug in task ordering so that we break ties consistently with the key name (:pr:3271) Matthew Rocklin_
  • Avoid sorting tasks in order when the number of tasks is very large (:pr:3298) Matthew Rocklin_

.. _v0.17.1 / 2018-02-22:

0.17.1 / 2018-02-22

Array ^^^^^

  • Corrected dimension chunking in indices (:issue:3166, :pr:3167) Simon Perkins_
  • Inline store_chunk calls for store's return_stored option (:pr:3153) John A Kirkham_
  • Compatibility with struct dtypes for NumPy 1.14.1 release (:pr:3187) Matthew Rocklin_

DataFrame ^^^^^^^^^

  • Bugfix to allow column assignment of pandas datetimes (:pr:3164) Max Epstein_

Core ^^^^

  • New file-system for HTTP(S), allowing direct loading from specific URLs (:pr:3160) Martin Durant_
  • Fix bug when tokenizing partials with no keywords (:pr:3191) Matthew Rocklin_
  • Use more recent LZ4 API (:pr:3157) Thrasibule_
  • Introduce output stream parameter for progress bar (:pr:3185) Dieter Weber_

.. _v0.17.0 / 2018-02-09:

0.17.0 / 2018-02-09

Array ^^^^^

  • Added support for object-type arrays in nansum, nanmin, and nanmax (:issue:3133) Keisuke Fujii_
  • Update error handling when len is called with empty chunks (:issue:3058) Xander Johnson_
  • Fixes a metadata bug with store's return_stored option (:pr:3064) John A Kirkham_
  • Fix a bug in optimization.fuse_slice to properly handle when first input is None (:pr:3076) James Bourbeau_
  • Support arrays with unknown chunk sizes in percentile (:pr:3107) Matthew Rocklin_
  • Tokenize scipy.sparse arrays and np.matrix (:pr:3060) Roman Yurchak_

DataFrame ^^^^^^^^^

  • Support month timedeltas in repartition(freq=...) (:pr:3110) Matthew Rocklin_
  • Avoid mutation in dataframe groupby tests (:pr:3118) Matthew Rocklin_
  • read_csv, read_table, and read_parquet accept iterables of paths (:pr:3124) Jim Crist_
  • Deprecates the dd.to_delayed function in favor of the existing method (:pr:3126) Jim Crist_
  • Return dask.arrays from df.map_partitions calls when the UDF returns a numpy array (:pr:3147) Matthew Rocklin_
  • Change handling of columns and index in dd.read_parquet to be more consistent, especially in handling of multi-indices (:pr:3149) Jim Crist_
  • fastparquet append=True allowed to create new dataset (:pr:3097) Martin Durant_
  • dtype rationalization for sql queries (:pr:3100) Martin Durant_

Bag ^^^

  • Document that the bag.map_partitions function may receive either a list or a generator. (:pr:3150) Nir_

Core ^^^^

  • Change default task ordering to prefer nodes with few dependents and then many downstream dependencies (:pr:3056) Matthew Rocklin_
  • Add color= option to visualize to color by task order (:pr:3057) (:pr:3122) Matthew Rocklin_
  • Deprecate dask.bytes.open_text_files (:pr:3077) Jim Crist_
  • Remove short-circuit hdfs reads handling due to maintenance costs. May be re-added in a more robust manner later (:pr:3079) Jim Crist_
  • Add dask.base.optimize for optimizing multiple collections without computing. (:pr:3071) Jim Crist_
  • Rename dask.optimize module to dask.optimization (:pr:3071) Jim Crist_
  • Change task ordering to do a full traversal (:pr:3066) Matthew Rocklin_
  • Adds an optimize_graph keyword to all to_delayed methods to allow controlling whether optimizations occur on conversion. (:pr:3126) Jim Crist_
  • Support using pyarrow for hdfs integration (:pr:3123) Jim Crist_
  • Move HDFS integration and tests into dask repo (:pr:3083) Jim Crist_
  • Remove write_bytes (:pr:3116) Jim Crist_

.. _v0.16.1 / 2018-01-09:

0.16.1 / 2018-01-09

Array ^^^^^

  • Fix handling of scalar percentile values in percentile (:pr:3021) James Bourbeau_
  • Prevent bool() coercion from calling compute (:pr:2958) Albert DeFusco_
  • Add matmul (:pr:2904) John A Kirkham_
  • Support N-D arrays with matmul (:pr:2909) John A Kirkham_
  • Add vdot (:pr:2910) John A Kirkham_
  • Explicit chunks argument for broadcast_to (:pr:2943) Stephan Hoyer_
  • Add meshgrid (:pr:2938) John A Kirkham_ and (:pr:3001) Markus Gonser_
  • Preserve singleton chunks in fftshift/ifftshift (:pr:2733) John A Kirkham_
  • Fix handling of negative indexes in vindex and raise errors for out of bounds indexes (:pr:2967) Stephan Hoyer_
  • Add flip, flipud, fliplr (:pr:2954) John A Kirkham_
  • Add float_power ufunc (:pr:2962) (:pr:2969) John A Kirkham_
  • Compatibility for changes to structured arrays in the upcoming NumPy 1.14 release (:pr:2964) Tom Augspurger_
  • Add block (:pr:2650) John A Kirkham_
  • Add frompyfunc (:pr:3030) Jim Crist_
  • Add the return_stored option to store for chaining stored results (:pr:2980) John A Kirkham_

DataFrame ^^^^^^^^^

  • Fixed naming bug in cumulative aggregations (:issue:3037) Martijn Arts_
  • Fixed dd.read_csv when names is given but header is not set to None (:issue:2976) Martijn Arts_
  • Fixed dd.read_csv so that passing instances of CategoricalDtype in dtype will result in known categoricals (:pr:2997) Tom Augspurger_
  • Prevent bool() coercion from calling compute (:pr:2958) Albert DeFusco_
  • DataFrame.read_sql() on an empty database table returns an empty dask dataframe (:pr:2928) Apostolos Vlachopoulos_
  • Compatibility for reading Parquet files written by PyArrow 0.8.0 (:pr:2973) Tom Augspurger_
  • Correctly handle the column name (df.columns.name) when reading in dd.read_parquet (:pr:2973) Tom Augspurger_
  • Fixed dd.concat losing the index dtype when the data contained a categorical (:issue:2932) Tom Augspurger_
  • Add dd.Series.rename (:pr:3027) Jim Crist_
  • DataFrame.merge() now supports merging on a combination of columns and the index (:pr:2960) Jon Mease_
  • Removed the deprecated dd.rolling* methods, in preparation for their removal in the next pandas release (:pr:2995) Tom Augspurger_
  • Fix metadata inference bug in which single-partition series were mistakenly special cased (:pr:3035) Jim Crist_
  • Add support for Series.str.cat (:pr:3028) Jim Crist_

Core ^^^^

  • Improve 32-bit compatibility (:pr:2937) Matthew Rocklin_
  • Change task prioritization to avoid upwards branching (:pr:3017) Matthew Rocklin_

.. _v0.16.0 / 2017-11-17:

0.16.0 / 2017-11-17

This is a major release. It includes breaking changes, new protocols, and a large number of bug fixes.

Array ^^^^^

  • Add atleast_1d, atleast_2d, and atleast_3d (:pr:2760) (:pr:2765) John A Kirkham_
  • Add allclose (:pr:2771) John A Kirkham_
  • Remove random.different_seeds from Dask Array API docs (:pr:2772) John A Kirkham_
  • Deprecate vnorm in favor of dask.array.linalg.norm (:pr:2773) John A Kirkham_
  • Reimplement unique to be lazy (:pr:2775) John A Kirkham_
  • Support broadcasting of Dask Arrays with 0-length dimensions (:pr:2784) John A Kirkham_
  • Add asarray and asanyarray to Dask Array API docs (:pr:2787) James Bourbeau_
  • Support unique's return_* arguments (:pr:2779) John A Kirkham_
  • Simplify _unique_internal (:pr:2850) (:pr:2855) John A Kirkham_
  • Avoid removing some getter calls in array optimizations (:pr:2826) Jim Crist_

DataFrame ^^^^^^^^^

  • Support pyarrow in dd.to_parquet (:pr:2868) Jim Crist_
  • Fixed DataFrame.quantile and Series.quantile returning nan when missing values are present (:pr:2791) Tom Augspurger_
  • Fixed DataFrame.quantile losing the result .name when q is a scalar (:pr:2791) Tom Augspurger_
  • Fixed dd.concat to return a dask.Dataframe when concatenating a single series along the columns, matching pandas' behavior (:pr:2800) James Munroe_
  • Fixed default inplace parameter for DataFrame.eval to match the pandas default for pandas >= 0.21.0 (:pr:2838) Tom Augspurger_
  • Fix exception when calling DataFrame.set_index on text column where one of the partitions was empty (:pr:2831) Jesse Vogt_
  • Do not raise exception when calling DataFrame.set_index on empty dataframe (:pr:2827) Jesse Vogt_
  • Fixed bug in DataFrame.fillna when filling with a Series value (:pr:2810) Tom Augspurger_
  • Deprecate old argument ordering in dd.to_parquet to better match convention of putting the dataframe first (:pr:2867) Jim Crist_
  • df.astype(categorical_dtype) -> known categoricals (:pr:2835) Jim Crist_
  • Test against Pandas release candidate (:pr:2814) Tom Augspurger_
  • Add more tests for read_parquet(engine='pyarrow') (:pr:2822) Uwe Korn_
  • Remove unnecessary map_partitions in aggregate (:pr:2712) Christopher Prohm_
  • Fix bug calling sample on empty partitions (:pr:2818) @xwang777_
  • Error nicely when parsing dates in read_csv (:pr:2863) Jim Crist_
  • Cleanup handling of passing filesystem objects to PyArrow readers (:pr:2527) @fjetter_
  • Support repartitioning even if there are no divisions (:pr:2873) @Ced4_
  • Support reading/writing to hdfs using pyarrow in dd.to_parquet (:pr:2894, :pr:2881) Jim Crist_

Core ^^^^

  • Allow tuples as sharedict keys (:pr:2763) Matthew Rocklin_
  • Calling compute within a dask.distributed task defaults to distributed scheduler (:pr:2762) Matthew Rocklin_
  • Auto-import gcsfs when gcs:// protocol is used (:pr:2776) Matthew Rocklin_
  • Fully remove dask.async module, use dask.local instead (:pr:2828) Thomas Caswell_
  • Compatibility with bokeh 0.12.10 (:pr:2844) Tom Augspurger_
  • Reduce test memory usage (:pr:2782) Jim Crist_
  • Add Dask collection interface (:pr:2748) Jim Crist_
  • Update Dask collection interface during XArray integration (:pr:2847) Matthew Rocklin_
  • Close resource profiler process on exit (:pr:2871) Jim Crist_
  • Fix S3 tests (:pr:2875) Jim Crist_
  • Fix port for bokeh dashboard in docs (:pr:2889) Ian Hopkinson_
  • Wrap Dask filesystems for PyArrow compatibility (:pr:2881) Jim Crist_

.. _v0.15.4 / 2017-10-06:

0.15.4 / 2017-10-06

Array ^^^^^

  • da.random.choice now works with array arguments (:pr:2781)
  • Support indexing in arrays with np.int (fixes regression) (:pr:2719)
  • Handle zero dimension with rechunking (:pr:2747)
  • Support -1 as an alias for "size of the dimension" in chunks (:pr:2749)
  • Call mkdir in array.to_npy_stack (:pr:2709)
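
As a rough illustration of the -1 alias in chunks noted above, the sketch below builds a small array and asks for a single chunk along the first axis. The array shape and chunk sizes are arbitrary choices for the example, and it assumes a recent dask with NumPy installed.

```python
import dask.array as da

# Shape and chunk sizes are illustrative only.
# -1 along axis 0 means "one chunk spanning the whole dimension".
x = da.ones((10, 6), chunks=(-1, 3))

# x.chunks lists the chunk sizes along each axis.
chunks = x.chunks
print(chunks)
```

Here axis 0 ends up as a single chunk of length 10, while axis 1 is split into chunks of length 3.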

DataFrame ^^^^^^^^^

  • Added the .str accessor to Categoricals with string categories (:pr:2743)
  • Support int96 (spark) datetimes in parquet writer (:pr:2711)
  • Pass on file scheme to fastparquet (:pr:2714)
  • Support Pandas 0.21 (:pr:2737)

Bag ^^^

  • Add tree reduction support for foldby (:pr:2710)

Core ^^^^

  • Drop s3fs from pip install dask[complete] (:pr:2750)

.. _v0.15.3 / 2017-09-24:

0.15.3 / 2017-09-24

Array ^^^^^

  • Add masked arrays (:pr:2301)
  • Add *_like array creation functions (:pr:2640)
  • Indexing with unsigned integer array (:pr:2647)
  • Improved slicing with boolean arrays of different dimensions (:pr:2658)
  • Support literals in top and atop (:pr:2661)
  • Optional axis argument in cumulative functions (:pr:2664)
  • Improve tests on scalars with assert_eq (:pr:2681)
  • Fix norm keepdims (:pr:2683)
  • Add ptp (:pr:2691)
  • Add apply_along_axis (:pr:2690) and apply_over_axes (:pr:2702)

DataFrame ^^^^^^^^^

  • Added Series.str[index] (:pr:2634)
  • Allow the groupby by param to handle columns and index levels (:pr:2636)
  • DataFrame.to_csv and Bag.to_textfiles now return the filenames to which they have written (:pr:2655)
  • Fix combination of partition_on and append in to_parquet (:pr:2645)
  • Fix for parquet file schemes (:pr:2667)
  • Repartition works with mixed categoricals (:pr:2676)

Core ^^^^

  • python setup.py test now runs tests (:pr:2641)
  • Added new cheatsheet (:pr:2649)
  • Remove resize tool in Bokeh plots (:pr:2688)

.. _v0.15.2 / 2017-08-25:

0.15.2 / 2017-08-25

Array ^^^^^

  • Remove spurious keys from map_overlap graph (:pr:2520)
  • where works with non-bool condition and scalar values (:pr:2543) (:pr:2549)
  • Improve compress (:pr:2541) (:pr:2545) (:pr:2555)
  • Add argwhere, _nonzero, and where(cond) (:pr:2539)
  • Generalize vindex in dask.array to handle multi-dimensional indices (:pr:2573)
  • Add choose method (:pr:2584)
  • Split code into reorganized files (:pr:2595)
  • Add linalg.norm (:pr:2597)
  • Add diff, ediff1d (:pr:2607), (:pr:2609)
  • Improve dtype inference and reflection (:pr:2571)

Bag ^^^

  • Remove deprecated Bag behaviors (:pr:2525)

DataFrame ^^^^^^^^^

  • Support callables in assign (:pr:2513)
  • Better error messages for read_csv (:pr:2522)
  • Add dd.to_timedelta (:pr:2523)
  • Verify metadata in from_delayed (:pr:2534) (:pr:2591)
  • Add DataFrame.isin (:pr:2558)
  • Read_hdf supports iterables of files (:pr:2547)

Core ^^^^

  • Remove bare except: blocks everywhere (:pr:2590)

.. _v0.15.1 / 2017-07-08:

0.15.1 / 2017-07-08

  • Add storage_options to to_textfiles and to_csv (:pr:2466)
  • Rechunk and simplify rfftfreq (:pr:2473), (:pr:2475)
  • Better support ndarray subclasses (:pr:2486)
  • Import star in dask.distributed (:pr:2503)
  • Threadsafe cache handling with tokenization (:pr:2511)

.. _v0.15.0 / 2017-06-09:

0.15.0 / 2017-06-09

Array ^^^^^

  • Add dask.array.stats submodule (:pr:2269)
  • Support ufunc.outer (:pr:2345)
  • Optimize fancy indexing by reducing graph overhead (:pr:2333) (:pr:2394)
  • Faster array tokenization using alternative hashes (:pr:2377)
  • Added the matmul @ operator (:pr:2349)
  • Improved coverage of the numpy.fft module (:pr:2320) (:pr:2322) (:pr:2327) (:pr:2323)
  • Support NumPy's __array_ufunc__ protocol (:pr:2438)

Bag ^^^

  • Fix bug where reductions on bags with no partitions would fail (:pr:2324)
  • Add broadcasting and variadic db.map top-level function. Also remove auto-expansion of tuples as map arguments (:pr:2339)
  • Rename Bag.concat to Bag.flatten (:pr:2402)

DataFrame ^^^^^^^^^

  • Parquet improvements (:pr:2277) (:pr:2422)

Core ^^^^

  • Move dask.async module to dask.local (:pr:2318)
  • Support callbacks with nested scheduler calls (:pr:2397)
  • Support pathlib.Path objects as uris (:pr:2310)

.. _v0.14.3 / 2017-05-05:

0.14.3 / 2017-05-05

DataFrame ^^^^^^^^^

  • Pandas 0.20.0 support

.. _v0.14.2 / 2017-05-03:

0.14.2 / 2017-05-03

Array ^^^^^

  • Add da.indices (:pr:2268), da.tile (:pr:2153), da.roll (:pr:2135)
  • Simultaneously support drop_axis and new_axis in da.map_blocks (:pr:2264)
  • Rechunk and concatenate work with unknown chunksizes (:pr:2235) and (:pr:2251)
  • Support non-numpy container arrays, notably sparse arrays (:pr:2234)
  • Tensordot contracts over multiple axes (:pr:2186)
  • Allow delayed targets in da.store (:pr:2181)
  • Support interactions against lists and tuples (:pr:2148)
  • Constructor plugins for debugging (:pr:2142)
  • Multi-dimensional FFTs (single chunk) (:pr:2116)

Bag ^^^

  • to_dataframe enforces consistent types (:pr:2199)

DataFrame ^^^^^^^^^

  • Set_index always fully sorts the index (:pr:2290)
  • Support compatibility with pandas 0.20.0 (:pr:2249), (:pr:2248), and (:pr:2246)
  • Support Arrow Parquet reader (:pr:2223)
  • Time-based rolling windows (:pr:2198)
  • Repartition can now create more partitions, not just fewer (:pr:2168)

Core ^^^^

  • Always use absolute paths when on POSIX file system (:pr:2263)
  • Support user provided graph optimizations (:pr:2219)
  • Refactor path handling (:pr:2207)
  • Improve fusion performance (:pr:2129), (:pr:2131), and (:pr:2112)

.. _v0.14.1 / 2017-03-22:

0.14.1 / 2017-03-22

Array ^^^^^

  • Micro-optimize optimizations (:pr:2058)
  • Change slicing optimizations to avoid fusing raw numpy arrays (:pr:2075) (:pr:2080)
  • Dask.array operations now work on numpy arrays (:pr:2079)
  • Reshape now works in a much broader set of cases (:pr:2089)
  • Support deepcopy python protocol (:pr:2090)
  • Allow user-provided FFT implementations in da.fft (:pr:2093)

DataFrame ^^^^^^^^^

  • Fix to_parquet with empty partitions (:pr:2020)
  • Optional npartitions='auto' mode in set_index (:pr:2025)
  • Optimize shuffle performance (:pr:2032)
  • Support efficient repartitioning along time windows like repartition(freq='12h') (:pr:2059)
  • Improve speed of categorize (:pr:2010)
  • Support single-row dataframe arithmetic (:pr:2085)
  • Automatically avoid shuffle when setting index with a sorted column (:pr:2091)
  • Improve integer-NA handling in read_csv (:pr:2098)

Delayed ^^^^^^^

  • Repeated attribute access on delayed objects uses the same key (:pr:2084)

Core ^^^^

  • Improve naming of nodes in dot visuals to avoid generic apply (:pr:2070)
  • Ensure that worker processes have different random seeds (:pr:2094)

.. _v0.14.0 / 2017-02-24:

0.14.0 / 2017-02-24

Array ^^^^^

  • Fix corner cases with zero shape and misaligned values in arange (:pr:1902), (:pr:1904), (:pr:1935), (:pr:1955), (:pr:1956)
  • Improve concatenation efficiency (:pr:1923)
  • Avoid hashing in from_array if name is provided (:pr:1972)

Bag ^^^

  • Repartition can now increase number of partitions (:pr:1934)
  • Fix bugs in some reductions with empty partitions (:pr:1939), (:pr:1950), (:pr:1953)

DataFrame ^^^^^^^^^

  • Support non-uniform categoricals (:pr:1877), (:pr:1930)
  • Groupby cumulative reductions (:pr:1909)
  • DataFrame.loc indexing now supports lists (:pr:1913)
  • Improve multi-level groupbys (:pr:1914)
  • Improved HTML and string repr for DataFrames (:pr:1637)
  • Parquet append (:pr:1940)
  • Add dd.demo.daily_stock function for teaching (:pr:1992)

Delayed ^^^^^^^

  • Add traverse= keyword to delayed to optionally avoid traversing nested data structures (:pr:1899)
  • Support Futures in from_delayed functions (:pr:1961)
  • Improve serialization of decorated delayed functions (:pr:1969)

Core ^^^^

  • Improve windows path parsing in corner cases (:pr:1910)
  • Rename tasks when fusing (:pr:1919)
  • Add top level persist function (:pr:1927)
  • Propagate errors= keyword in byte handling (:pr:1954)
  • Dask.compute traverses Python collections (:pr:1975)
  • Structural sharing between graphs in dask.array and dask.delayed (:pr:1985)

.. _v0.13.0 / 2017-01-02:

0.13.0 / 2017-01-02

Array ^^^^^

  • Mandatory dtypes on dask.array. All operations maintain dtype information and UDF functions like map_blocks now require a dtype= keyword if it can not be inferred. (:pr:1755)
  • Support arrays without known shapes, such as those that arise when slicing arrays with arrays or converting dataframes to arrays (:pr:1838)
  • Support mutation by setting one array with another (:pr:1840)
  • Tree reductions for covariance and correlations. (:pr:1758)
  • Add SerializableLock for better use with distributed scheduling (:pr:1766)
  • Improved atop support (:pr:1800)
  • Rechunk optimization (:pr:1737), (:pr:1827)

Bag ^^^

  • Avoid wrong results when recomputing the same groupby twice (:pr:1867)

DataFrame ^^^^^^^^^

  • Add map_overlap for custom rolling operations (:pr:1769)
  • Add shift (:pr:1773)
  • Add Parquet support (:pr:1782) (:pr:1792) (:pr:1810), (:pr:1843), (:pr:1859), (:pr:1863)
  • Add missing methods combine, abs, autocorr, sem, nsmallest, first, last, prod (:pr:1787)
  • Approximate nunique (:pr:1807), (:pr:1824)
  • Reductions with multiple output partitions (for operations like drop_duplicates) (:pr:1808), (:pr:1823) (:pr:1828)
  • Add delitem and copy to DataFrames, increasing mutation support (:pr:1858)

Delayed ^^^^^^^

  • Changed behaviour for delayed(nout=0) and delayed(nout=1): delayed(nout=1) no longer defaults to out=None, and delayed(nout=0) is now enabled. That is, functions that return tuples of length 1 or 0 are handled correctly. This is especially handy when functions with a variable number of outputs are wrapped by delayed. A trivial example: delayed(lambda *args: args, nout=len(vals))(*vals)
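
The nout behaviour described above can be sketched as follows. The helper name passthrough and the input values are made up for illustration, and the snippet assumes a recent dask release rather than 0.13.0 itself.

```python
from dask import delayed


def passthrough(*args):
    """Return the positional arguments unchanged, as a tuple."""
    return args


vals = [1, 2, 3]

# nout tells dask how many outputs to expect, so the Delayed result
# can be iterated and unpacked into one Delayed object per output.
parts = delayed(passthrough, nout=len(vals))(*vals)
results = [part.compute() for part in parts]

# nout=0 is also accepted: iterating yields no outputs at all.
empty = delayed(passthrough, nout=0)()
n_empty = len(list(empty))

print(results, n_empty)
```

Passing nout=len(vals) keeps the wrapper usable when the number of outputs is only known at call time.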

Core ^^^^

  • Refactor core byte ingest (:pr:1768), (:pr:1774)
  • Improve import time (:pr:1833)

.. _v0.12.0 / 2016-11-03:

0.12.0 / 2016-11-03

DataFrame ^^^^^^^^^

  • Return a series when functions given to dataframe.map_partitions return scalars (:pr:1515)
  • Fix type size inference for series (:pr:1513)
  • dataframe.DataFrame.categorize no longer includes missing values in the categories. This is for compatibility with a pandas change <https://github.com/pydata/pandas/pull/10929>_ (:pr:1565)
  • Fix head parser error in dataframe.read_csv when some lines have quotes (:pr:1495)
  • Add dataframe.reduction and series.reduction methods to apply generic row-wise reduction to dataframes and series (:pr:1483)
  • Add dataframe.select_dtypes, which mirrors the pandas method <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.select_dtypes.html>_ (:pr:1556)
  • dataframe.read_hdf now supports reading Series (:pr:1564)
  • Support Pandas 0.19.0 (:pr:1540)
  • Implement select_dtypes (:pr:1556)
  • String accessor works with indexes (:pr:1561)
  • Add pipe method to dask.dataframe (:pr:1567)
  • Add indicator keyword to merge (:pr:1575)
  • Support Series in read_hdf (:pr:1575)
  • Support Categories with missing values (:pr:1578)
  • Support inplace operators like df.x += 1 (:pr:1585)
  • Str accessor passes through args and kwargs (:pr:1621)
  • Improved groupby support for single-machine multiprocessing scheduler (:pr:1625)
  • Tree reductions (:pr:1663)
  • Pivot tables (:pr:1665)
  • Add clip (:pr:1667), align (:pr:1668), combine_first (:pr:1725), and any/all (:pr:1724)
  • Improved handling of divisions on dask-pandas merges (:pr:1666)
  • Add groupby.aggregate method (:pr:1678)
  • Add dd.read_table function (:pr:1682)
  • Improve support for multi-level columns (:pr:1697) (:pr:1712)
  • Support 2d indexing in loc (:pr:1726)
  • Extend resample to include DataFrames (:pr:1741)
  • Support dask.array ufuncs on dask.dataframe objects (:pr:1669)

Array ^^^^^

  • Add information about how the dask.array chunks argument works (:pr:1504)
  • Fix field access with non-scalar fields in dask.array (:pr:1484)
  • Add concatenate= keyword to atop to concatenate chunks of contracted dimensions
  • Optimized slicing performance (:pr:1539) (:pr:1731)
  • Extend atop with concatenate= (:pr:1609), new_axes= (:pr:1612), and adjust_chunks= (:pr:1716) keywords
  • Add clip (:pr:1610), swapaxes (:pr:1611), round (:pr:1708), and repeat
  • Automatically align chunks in atop-backed operations (:pr:1644)
  • Cull dask.arrays on slicing (:pr:1709)

Bag ^^^

  • Fix issue with callables in bag.from_sequence being interpreted as tasks (:pr:1491)
  • Avoid non-lazy memory use in reductions (:pr:1747)

Administration ^^^^^^^^^^^^^^

  • Added changelog (:pr:1526)
  • Create new threadpool when operating from thread (:pr:1487)
  • Unify example documentation pages into one (:pr:1520)
  • Add versioneer for git-commit based versions (:pr:1569)
  • Pass through node_attr and edge_attr keywords in dot visualization (:pr:1614)
  • Add continuous testing for Windows with Appveyor (:pr:1648)
  • Remove use of multiprocessing.Manager (:pr:1653)
  • Add global optimizations keyword to compute (:pr:1675)
  • Micro-optimize get_dependencies (:pr:1722)

.. _v0.11.0 / 2016-08-24:

0.11.0 / 2016-08-24

Major Points ^^^^^^^^^^^^

DataFrames now enforce knowing full metadata (columns, dtypes) everywhere. Previously we would operate in an ambiguous state when functions lost dtype information (such as apply). Now all dataframes always know their dtypes and raise errors asking for information if they are unable to infer (which they usually can). Some internal attributes like _pd and _pd_nonempty have been moved.

The internals of the distributed scheduler have been refactored to transition tasks between explicit states. This improves resilience, reasoning about scheduling, plugin operation, and logging. It also makes the scheduler code easier to understand for newcomers.

Breaking Changes ^^^^^^^^^^^^^^^^

  • The distributed.s3 and distributed.hdfs namespaces are gone. Use protocols in normal methods like read_text('s3://...') instead.
  • Dask.array.reshape now errs in some cases where previously it would have created a very large number of tasks

.. _v0.10.2 / 2016-07-27:

0.10.2 / 2016-07-27

  • More Dataframe shuffles now work in distributed settings, ranging from setting-index to hash joins, to sorted joins and groupbys.
  • Dask passes the full test suite when run under Python's optimized -OO mode.
  • On-disk shuffles were found to produce wrong results in some highly-concurrent situations, especially on Windows. This has been resolved by a fix to the partd library.
  • Fixed a growth of open file descriptors that occurred under large data communications
  • Support ports in the --bokeh-whitelist option of dask-scheduler to better route web interface messages behind non-trivial network settings
  • Some improvements to resilience to worker failure (though other known failures persist)
  • You can now start an IPython kernel on any worker for improved debugging and analysis
  • Improvements to dask.dataframe.read_hdf, especially when reading from multiple files and docs

.. _v0.10.0 / 2016-06-13:

0.10.0 / 2016-06-13

Major Changes ^^^^^^^^^^^^^

  • This version drops support for Python 2.6
  • Conda packages are built and served from conda-forge
  • The dask.distributed executables have been renamed from dfoo to dask-foo. For example dscheduler is renamed to dask-scheduler
  • Both Bag and DataFrame include a preliminary distributed shuffle.

Bag ^^^

  • Add task-based shuffle for distributed groupbys
  • Add accumulate for cumulative reductions

DataFrame ^^^^^^^^^

  • Add a task-based shuffle suitable for distributed joins, groupby-applys, and set_index operations. The single-machine shuffle remains untouched (and is much more efficient).
  • Add support for new Pandas rolling API with improved communication performance on distributed systems.
  • Add groupby.std/var
  • Pass through S3/HDFS storage options in read_csv
  • Improve categorical partitioning
  • Add eval, info, isnull, notnull for dataframes

Distributed ^^^^^^^^^^^

  • Rename executables like dscheduler to dask-scheduler
  • Improve scheduler performance in the many-fast-tasks case (important for shuffling)
  • Improve work stealing to be aware of expected function run-times and data sizes. This drastically increases the breadth of algorithms that can be efficiently run on the distributed scheduler without significant user expertise.
  • Support maximum buffer sizes in streaming queues
  • Improve Windows support when using the Bokeh diagnostic web interface
  • Support compression of very-large-bytestrings in protocol
  • Support clean cancellation of submitted futures in Joblib interface

Other ^^^^^

  • All dask-related projects (dask, distributed, s3fs, hdfs, partd) are now building conda packages on conda-forge.
  • Change credential handling in s3fs to only pass around delegated credentials if explicitly given secret/key. The default now is to rely on managed environments. This can be changed back by explicitly providing a keyword argument. Anonymous mode must be explicitly declared if desired.

.. _v0.9.0 / 2016-05-11:

0.9.0 / 2016-05-11

API Changes ^^^^^^^^^^^

  • dask.do and dask.value have been renamed to dask.delayed
  • dask.bag.from_filenames has been renamed to dask.bag.read_text
  • All S3/HDFS data ingest functions like db.from_s3 or distributed.s3.read_csv have been moved into the plain read_text, read_csv functions, which now support protocols, like dd.read_csv('s3://bucket/keys*.csv')

Array ^^^^^

  • Add support for scipy.LinearOperator
  • Improve optional locking to on-disk data structures
  • Change rechunk to expose the intermediate chunks

Bag ^^^

  • Rename from_filenames to read_text
  • Remove from_s3 in favor of read_text('s3://...')

DataFrame ^^^^^^^^^

  • Fixed numerical stability issue for correlation and covariance
  • Allow no-hash from_pandas for speedy round-trips to and from pandas objects
  • Generally reengineered read_csv to be more in line with Pandas behavior
  • Support fast set_index operations for sorted columns

Delayed ^^^^^^^

  • Rename do/value to delayed
  • Rename to/from_imperative to to/from_delayed

Distributed ^^^^^^^^^^^

  • Move s3 and hdfs functionality into the dask repository
  • Adaptively oversubscribe workers for very fast tasks
  • Improve PyPy support
  • Improve work stealing for unbalanced workers
  • Scatter data efficiently with tree-scatters

Other ^^^^^

  • Add lzma/xz compression support
  • Raise a warning when trying to split unsplittable compression types, like gzip or bz2
  • Improve hashing for single-machine shuffle operations
  • Add new callback method for start state
  • General performance tuning

.. _v0.8.1 / 2016-03-11:

0.8.1 / 2016-03-11

Array ^^^^^

  • Bugfix for range slicing that could periodically lead to incorrect results.
  • Improved support and resiliency of arg reductions (argmin, argmax, etc.)

Bag ^^^

  • Add zip function

DataFrame ^^^^^^^^^

  • Add corr and cov functions
  • Add melt function
  • Bugfixes for io to bcolz and hdf5

.. _v0.8.0 / 2016-02-20:

0.8.0 / 2016-02-20

Array ^^^^^

  • Changed default array reduction split from 32 to 4
  • Linear algebra: tril, triu, LU, inv, cholesky, solve, solve_triangular, eye, lstsq, diag, corrcoef.
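
The reduction split mentioned above controls how many partial results are combined at each level of a tree reduction. The following is a minimal pure-Python sketch of that idea, not dask's implementation; tree_reduce and its split_every parameter are hypothetical names chosen for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def tree_reduce(
    chunks: Sequence[float],
    combine: Callable[[Sequence[float]], float],
    split_every: int = 4,
) -> Tuple[float, int]:
    """Reduce per-chunk partial results in groups of ``split_every``.

    Returns the final value and the number of tree levels used.
    Illustrative sketch only; this is not dask's implementation.
    """
    if split_every < 2:
        raise ValueError("split_every must be at least 2")
    if not chunks:
        raise ValueError("need at least one chunk")
    level: List[float] = list(chunks)
    depth = 0
    while len(level) > 1:
        # Combine neighbouring groups of up to split_every partial results.
        level = [
            combine(level[i : i + split_every])
            for i in range(0, len(level), split_every)
        ]
        depth += 1
    return level[0], depth


partials = list(range(64))  # stand-ins for 64 per-chunk partial sums
total_4, depth_4 = tree_reduce(partials, sum, split_every=4)
total_32, depth_32 = tree_reduce(partials, sum, split_every=32)
print(total_4, depth_4, total_32, depth_32)
```

A smaller split gives a deeper tree with fewer inputs per combine step, while a larger split gives a shallower tree with more inputs per step.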

Bag ^^^

  • Add tree reductions
  • Add range function
  • Drop from_hdfs function (better functionality now exists in hdfs3 and distributed projects)

DataFrame ^^^^^^^^^

  • Refactor dask.dataframe to include a full empty pandas dataframe as metadata. Drop the .columns attribute on Series
  • Add Series categorical accessor and series.nunique
  • read_csv fixes (multi-column parse_dates, integer column names, etc.)
  • Internal changes to improve graph serialization

Other ^^^^^

  • Documentation updates
  • Add from_imperative and to_imperative functions for all collections
  • Aesthetic changes to profiler plots
  • Moved the dask project to a new dask organization

.. _v0.7.6 / 2016-01-05:

0.7.6 / 2016-01-05

Array ^^^^^

  • Improve thread safety
  • Tree reductions
  • Add view, compress, hstack, dstack, vstack methods
  • map_blocks can now remove and add dimensions

DataFrame ^^^^^^^^^

  • Improve thread safety
  • Extend sampling to include replacement options

Imperative ^^^^^^^^^^

  • Removed optimization passes that fused results.

Core ^^^^

  • Removed dask.distributed
  • Improved performance of blocked file reading
  • Serialization improvements
  • Test Python 3.5

.. _v0.7.4 / 2015-10-23:

0.7.4 / 2015-10-23

This was mostly a bugfix release. Some notable changes:

  • Fix minor bugs associated with the release of numpy 1.10 and pandas 0.17
  • Fixed a bug with random number generation that would cause repeated blocks due to the birthday paradox
  • Use locks in dask.dataframe.read_hdf by default to avoid concurrency issues
  • Change dask.get to point to dask.async.get_sync by default
  • Allow visualization functions to accept general graphviz graph options like rankdir='LR'
  • Add reshape and ravel to dask.array
  • Support the creation of dask.arrays from dask.imperative objects
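
To see why the birthday paradox matters for per-block random seeds, the sketch below estimates the chance that at least two blocks draw the same seed. It assumes seeds are drawn uniformly and independently from a 32-bit space, which is an assumption made for illustration and not a description of dask's actual seeding; collision_probability is a hypothetical helper.

```python
import math


def collision_probability(n_draws: int, space_size: int) -> float:
    """Approximate P(at least two of n_draws uniform draws coincide).

    Uses the standard approximation 1 - exp(-n(n-1) / (2N)).
    """
    if n_draws < 0 or space_size <= 0:
        raise ValueError("n_draws must be >= 0 and space_size must be > 0")
    exponent = n_draws * (n_draws - 1) / (2 * space_size)
    # -expm1(-x) equals 1 - exp(-x) but is more accurate for small x.
    return -math.expm1(-exponent)


SEED_SPACE = 2**32  # assumed seed space, for illustration only

for n_blocks in (1_000, 10_000, 100_000):
    p = collision_probability(n_blocks, SEED_SPACE)
    print(f"{n_blocks:>7} blocks: P(repeated seed) ~ {p:.4f}")
```

Under this assumption the collision chance reaches roughly one half at around 77,000 blocks, far fewer than the 2**32 possible seeds.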

Deprecation ^^^^^^^^^^^

This release also includes a deprecation warning for dask.distributed, which will be removed in the next version.

Future development in distributed computing for dask is happening here: https://distributed.dask.org . General feedback on that project is most welcome from this community.

.. _v0.7.3 / 2015-09-25:

0.7.3 / 2015-09-25

Diagnostics ^^^^^^^^^^^

  • A utility for profiling memory and cpu usage has been added to the dask.diagnostics module.

DataFrame ^^^^^^^^^

This release improves coverage of the pandas API. Among other things it includes nunique, nlargest, and quantile. It fixes encoding issues when reading non-ASCII CSV files, improves performance and fixes bugs in resample, and makes read_hdf more flexible with globbing. It also includes various bug fixes in dask.imperative and dask.bag.

.. _v0.7.0 / 2015-08-15:

0.7.0 / 2015-08-15

DataFrame ^^^^^^^^^

This release includes significant bugfixes and alignment with the Pandas API. This has resulted both from use and from recent involvement by Pandas core developers.

  • New operations: query, rolling operations, drop
  • Improved operations: quantiles, arithmetic on full dataframes, dropna, constructor logic, merge/join, elemwise operations, groupby aggregations

Bag ^^^

  • Fixed a bug in fold with a null default argument

Array ^^^^^

  • New operations: da.fft module, da.image.imread

Infrastructure ^^^^^^^^^^^^^^

  • The array and dataframe collections create graphs with deterministic keys. These tend to be longer (hash strings) but should be consistent between computations. This will be useful for caching in the future.
  • All collections (Array, Bag, DataFrame) inherit from a common base class
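
The idea behind deterministic keys can be sketched in plain Python: derive the key from a hash of the operation and its inputs, so that the same computation always yields the same key. This is only an illustration of the concept and not dask's own tokenization; deterministic_key is a hypothetical helper, and the choice of SHA-256 over a JSON encoding is an assumption.

```python
import hashlib
import json


def deterministic_key(op_name: str, *args) -> str:
    """Build a task key from the operation name and a hash of its arguments."""
    # Serialize the inputs in a stable way; fall back to repr() for
    # objects that JSON cannot encode directly.
    payload = json.dumps([op_name, list(args)], sort_keys=True, default=repr)
    digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
    return f"{op_name}-{digest[:32]}"


key_a = deterministic_key("add", "x", 1)
key_b = deterministic_key("add", "x", 1)
key_c = deterministic_key("add", "x", 2)
print(key_a)
print(key_a == key_b, key_a == key_c)
```

Because equal inputs map to equal keys, a cache keyed this way could recognise repeated computations, which is the use the entry above anticipates.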

.. _v0.6.1 / 2015-07-23:

0.6.1 / 2015-07-23

Distributed ^^^^^^^^^^^

  • Improved (though not yet sufficient) resiliency for dask.distributed when workers die

DataFrame ^^^^^^^^^

  • Improved writing to various formats, including to_hdf, to_castra, and to_csv
  • Improved creation of dask DataFrames from dask Arrays and Bags
  • Improved support for categoricals and various other methods

Array ^^^^^

  • Various bug fixes
  • Histogram function

Scheduling ^^^^^^^^^^

  • Added tie-breaking ordering of tasks within parallel workloads to better handle and clear intermediate results

Other ^^^^^

  • Added the dask.do function for explicit construction of graphs with normal python code
  • Traded pydot for graphviz library for graph printing to support Python 3
  • There is also a gitter chat room and a stackoverflow tag

.. _crusaderky: https://github.com/crusaderky
.. _John A Kirkham: https://github.com/jakirkham
.. _Matthew Rocklin: https://github.com/mrocklin
.. _Jim Crist: https://github.com/jcrist
.. _James Bourbeau: https://github.com/jrbourbeau
.. _James Munroe: https://github.com/jmunroe
.. _Thomas Caswell: https://github.com/tacaswell
.. _Tom Augspurger: https://github.com/tomaugspurger
.. _Uwe Korn: https://github.com/xhochy
.. _Christopher Prohm: https://github.com/chmp
.. _@xwang777: https://github.com/xwang777
.. _@fjetter: https://github.com/fjetter
.. _@Ced4: https://github.com/Ced4
.. _Ian Hopkinson: https://github.com/IanHopkinson
.. _Stephan Hoyer: https://github.com/shoyer
.. _Albert DeFusco: https://github.com/AlbertDeFusco
.. _Markus Gonser: https://github.com/magonser
.. _Martijn Arts: https://github.com/mfaafm
.. _Jon Mease: https://github.com/jonmmease
.. _Xander Johnson: https://github.com/metasyn
.. _Nir: https://github.com/nirizr
.. _Keisuke Fujii: https://github.com/fujiisoup
.. _Roman Yurchak: https://github.com/rth
.. _Max Epstein: https://github.com/MaxPowerWasTaken
.. _Simon Perkins: https://github.com/sjperkins
.. _Richard Postelnik: https://github.com/postelrich
.. _Daniel Collins: https://github.com/dancollins34
.. _Gabriele Lanaro: https://github.com/gabrielelanaro
.. _Jörg Dietrich: https://github.com/joergdietrich
.. _Christopher Ren: https://github.com/cr458
.. _Martin Durant: https://github.com/martindurant
.. _Thrasibule: https://github.com/thrasibule
.. _Dieter Weber: https://github.com/uellue
.. _Apostolos Vlachopoulos: https://github.com/avlahop
.. _Jesse Vogt: https://github.com/jessevogt
.. _Pierre Bartet: https://github.com/Pierre-Bartet
.. _Scott Sievert: https://github.com/stsievert
.. _Jeremy Chen: https://github.com/convexset
.. _Marc Pfister: https://github.com/drwelby
.. _Matt Lee: https://github.com/mathewlee11
.. _Yu Feng: https://github.com/rainwoodman
.. _@andrethrill: https://github.com/andrethrill
.. _@beomi: https://github.com/beomi
.. _Henrique Ribeiro: https://github.com/henriqueribeiro
.. _Marco Rossi: https://github.com/m-rossi
.. _Itamar Turner-Trauring: https://github.com/itamarst
.. _Mike Neish: https://github.com/neishm
.. _Mark Harfouche: https://github.com/hmaarrfk
.. _George Sakkis: https://github.com/gsakkis
.. _Ziyao Wei: https://github.com/ZiyaoWei
.. _Jacob Tomlinson: https://github.com/jacobtomlinson
.. _Elliott Sales de Andrade: https://github.com/QuLogic
.. _Gerome Pistre: https://github.com/GPistre
.. _Cloves Almeida: https://github.com/cjalmeida
.. _Tobias de Jong: https://github.com/tadejong
.. _Irina Truong: https://github.com/j-bennet
.. _Eric Bonfadini: https://github.com/eric-bonfadini
.. _Danilo Horta: https://github.com/horta
.. _@hugovk: https://github.com/hugovk
.. _Jan Margeta: https://github.com/jmargeta
.. _John Mrziglod: https://github.com/JohnMrziglod
.. _Christoph Moehl: https://github.com/cmohl2013
.. _Anderson Banihirwe: https://github.com/andersy005
.. _Javad: https://github.com/javad94
.. _Daniel Rothenberg: https://github.com/darothen
.. _Hans Moritz Günther: https://github.com/hamogu
.. _@rtobar: https://github.com/rtobar
.. _Julia Signell: https://github.com/jsignell
.. _Sriharsha Hatwar: https://github.com/Sriharsha-hatwar
.. _Bruce Merry: https://github.com/bmerry
.. _Joe Hamman: https://github.com/jhamman
.. _Robert Sare: https://github.com/rmsare
.. _Jeremy Chan: https://github.com/convexset
.. _Eric Wolak: https://github.com/epall
.. _Miguel Farrajota: https://github.com/farrajota
.. _Zhenqing Li: https://github.com/DigitalPig
.. _Matthias Bussonier: https://github.com/Carreau
.. _Jan Koch: https://github.com/datajanko
.. _Bart Broere: https://github.com/bartbroere
.. _Rahul Vaidya: https://github.com/rvaidya
.. _Justin Dennison: https://github.com/justin1dennison
.. _Antonino Ingargiola: https://github.com/tritemio
.. _TakaakiFuruse: https://github.com/TakaakiFuruse
.. _samc0de: https://github.com/samc0de
.. _Armin Berres: https://github.com/aberres
.. _Damien Garaud: https://github.com/geraud
.. _Jonathan Fraine: https://github.com/exowanderer
.. _Carlos Valiente: https://github.com/carletes
.. _@milesial: https://github.com/milesial
.. _Paul Vecchio: https://github.com/vecchp
.. _Johnnie Gray: https://github.com/jcmgray
.. _Diane Trout: https://github.com/detrout
.. _Marco Neumann: https://github.com/crepererum
.. _Mina Farid: https://github.com/minafarid
.. _@slnguyen: https://github.com/slnguyen
.. _Gábor Lipták: https://github.com/gliptak
.. _David Hoese: https://github.com/djhoese
.. _Daniel Li: https://github.com/li-dan
.. _Prabakaran Kumaresshan: https://github.com/nixphix
.. _Daniel Saxton: https://github.com/dsaxton
.. _Jendrik Jördening: https://github.com/jendrikjoe
.. _Takahiro Kojima: https://github.com/515hikaru
.. _Stuart Berg: https://github.com/stuarteberg
.. _Guillaume Eynard-Bontemps: https://github.com/guillaumeeb
.. _Adam Beberg: https://github.com/beberg
.. _Roma Sokolov: https://github.com/little-arhat
.. _Daniel Severo: https://github.com/dsevero
.. _Michał Jastrzębski: https://github.com/inc0
.. _Janne Vuorela: https://github.com/Dimplexion
.. _Ross Petchler: https://github.com/rpetchler
.. _Aploium: https://github.com/aploium
.. _Peter Andreas Entschev: https://github.com/pentschev
.. _@JulianWgs: https://github.com/JulianWgs
.. _Shyam Saladi: https://github.com/smsaladi
.. _Joe Corbett: https://github.com/jcorb
.. _@HSR05: https://github.com/HSR05
.. _Benjamin Zaitlen: https://github.com/quasiben
.. _Brett Naul: https://github.com/bnaul
.. _Justin Poehnelt: https://github.com/jpoehnelt
.. _Dan O'Donovan: https://github.com/danodonovan
.. _amerkel2: https://github.com/amerkel2
.. _Justin Waugh: https://github.com/bluecoconut
.. _Brian Chu: https://github.com/bchu
.. _Álvaro Abella Bascarán: https://github.com/alvaroabascar
.. _Aaron Fowles: https://github.com/aaronfowles
.. _Søren Fuglede Jørgensen: https://github.com/fuglede
.. _Hameer Abbasi: https://github.com/hameerabbasi
.. _Philipp Rudiger: https://github.com/philippjfr
.. _gregrf: https://github.com/gregrf
.. _Ian Rose: https://github.com/ian-r-rose
.. _Genevieve Buckley: https://github.com/GenevieveBuckley
.. _Michael Eaton: https://github.com/mpeaton
.. _Isaiah Norton: https://github.com/hnorton
.. _Nick Becker: https://github.com/beckernick
.. _Nathan Matare: https://github.com/nmatare
.. _@asmith26: https://github.com/asmith26
.. _Abhinav Ralhan: https://github.com/abhinavralhan
.. _Christian Hudon: https://github.com/chrish42
.. _Alistair Miles: https://github.com/alimanfoo
.. _Henry Pinkard: https://github.com/
.. _Ian Bolliger: https://github.com/bolliger32
.. _Mark Bell: https://github.com/MarkCBell
.. _Cody Johnson: https://github.com/codercody
.. _Endre Mark Borza: https://github.com/endremborza
.. _asmith26: https://github.com/asmith26
.. _Philipp S. Sommer: https://github.com/Chilipp
.. _mcsoini: https://github.com/mcsoini
.. _Ksenia Bobrova: https://github.com/almaleksia
.. _tpanza: https://github.com/tpanza
.. _Richard J Zamora: https://github.com/rjzamora
.. _Lijo Jose: https://github.com/lijose
.. _btw08: https://github.com/btw08
.. _Jorge Pessoa: https://github.com/jorge-pessoa
.. _Guillaume Lemaitre: https://github.com/glemaitre
.. _Bouwe Andela: https://github.com/bouweandela
.. _mbarkhau: https://github.com/mbarkhau
.. _Hugo: https://github.com/hugovk
.. _Paweł Kordek: https://github.com/kordek
.. _Ralf Gommers: https://github.com/rgommers
.. _Davis Bennett: https://github.com/d-v-b
.. _Willi Rath: https://github.com/willirath
.. _David Brochart: https://github.com/davidbrochart
.. _GALI PREM SAGAR: https://github.com/galipremsagar
.. _tshatrov: https://github.com/tshatrov
.. _Dustin Tindall: https://github.com/dustindall
.. _Sean McKenna: https://github.com/seanmck
.. _msbrown47: https://github.com/msbrown47
.. _Natalya Rapstine: https://github.com/natalya-patrikeeva
.. _Loïc Estève: https://github.com/lesteve
.. _Xavier Holt: https://github.com/xavi-ai
.. _Sarah Bird: https://github.com/birdsarah
.. _Doug Davis: https://github.com/douglasdavis
.. _Nicolas Hug: https://github.com/NicolasHug
.. _Blane: https://github.com/BlaneG
.. _Ivars Geidans: https://github.com/ivarsfg
.. _Scott Sievert: https://github.com/stsievert
.. _estebanag: https://github.com/estebanag
.. _Benoit Bovy: https://github.com/benbovy
.. _Gabe Joseph: https://github.com/gjoseph92
.. _therhaag: https://github.com/therhaag
.. _Arpit Solanki: https://github.com/arpit1997
.. _Oliver Hofkens: https://github.com/OliverHofkens
.. _Hongjiu Zhang: https://github.com/hongzmsft
.. _Wes Roach: https://github.com/WesRoach
.. _DomHudson: https://github.com/DomHudson
.. _Eugene Huang: https://github.com/eugeneh101
.. _Christopher J. Wright: https://github.com/CJ-Wright
.. _Mahmut Bulut: https://github.com/vertexclique
.. _Ben Jeffery: https://github.com/benjeffery
.. _Ryan Nazareth: https://github.com/ryankarlos
.. _garanews: https://github.com/garanews
.. _Vijayant: https://github.com/VijayantSoni
.. _Ryan Abernathey: https://github.com/rabernat
.. _Norman Barker: https://github.com/normanb
.. _darindf: https://github.com/darindf
.. _Ryan Grout: https://github.com/groutr
.. _Krishan Bhasin: https://github.com/KrishanBhasin
.. _Albert DeFusco: https://github.com/AlbertDeFusco
.. _Bruno Bonfils: https://github.com/asyd
.. _Petio Petrov: https://github.com/petioptrv
.. _Mads R. B. Kristensen: https://github.com/madsbk
.. _Prithvi MK: https://github.com/pmk21
.. _Eric Dill: https://github.com/ericdill
.. _Gina Helfrich: https://github.com/Dr-G
.. _ossdev07: https://github.com/ossdev07
.. _Nuno Gomes Silva: https://github.com/mgsnuno
.. _Ray Bell: https://github.com/raybellwaves
.. _Deepak Cherian: https://github.com/dcherian
.. _Matteo De Wint: https://github.com/mdwint
.. _Tim Gates: https://github.com/timgates42
.. _Erik Welch: https://github.com/eriknw
.. _Christian Wesp: https://github.com/ChrWesp
.. _Shiva Raisinghani: https://github.com/exemplary-citizen
.. _Thomas A Caswell: https://github.com/tacaswell
.. _Timost: https://github.com/Timost
.. _Maarten Breddels: https://github.com/maartenbreddels
.. _Devin Petersohn: https://github.com/devin-petersohn
.. _dfonnegra: https://github.com/dfonnegra
.. _Chris Roat: https://github.com/ChrisRoat
.. _H. Thomson Comer: https://github.com/thomcom
.. _Gerrit Holl: https://github.com/gerritholl
.. _Thomas Robitaille: https://github.com/astrofrog
.. _Yifan Gu: https://github.com/gyf304
.. _Surya Avala: https://github.com/suryaavala
.. _Cyril Shcherbin: https://github.com/shcherbin
.. _Ram Rachum: https://github.com/cool-RR
.. _Igor Gotlibovych: https://github.com/ig248
.. _K.-Michael Aye: https://github.com/michaelaye
.. _Yetunde Dada: https://github.com/yetudada
.. _Andrew Thomas: https://github.com/amcnicho
.. _rockwellw: https://github.com/rockwellw
.. _Gil Forsyth: https://github.com/gforsyth
.. _Thomas J. Fan: https://github.com/thomasjpfan
.. _Henrik Andersson: https://github.com/hnra
.. _James Lamb: https://github.com/jameslamb
.. _Corey J. Nolet: https://github.com/cjnolet
.. _Chuanzhu Xu: https://github.com/xcz011
.. _Lucas Rademaker: https://github.com/lr4d
.. _JulianWgs: https://github.com/JulianWgs
.. _psimaj: https://github.com/psimaj
.. _mlondschien: https://github.com/mlondschien
.. _petiop: https://github.com/petiop
.. _Richard (Rick) Zamora: https://github.com/rjzamora
.. _Mark Boer: https://github.com/mark-boer
.. _Florian Jetter: https://github.com/fjetter
.. _Adam Lewis: https://github.com/Adam-D-Lewis
.. _David Chudzicki: https://github.com/dchudz
.. _Nick Evans: https://github.com/nre
.. _Kai Mühlbauer: https://github.com/kmuehlbauer
.. _swapna: https://github.com/swapna-pg
.. _Antonio Ercole De Luca: https://github.com/eracle
.. _Amol Umbarkar: https://github.com/mindhash
.. _noreentry: https://github.com/noreentry
.. _Marius van Niekerk: https://github.com/mariusvniekerk
.. _Tung Dang: https://github.com/3cham
.. _Jim Crist-Harif: https://github.com/jcrist
.. _Brian Larsen: https://github.com/brl0
.. _Nils Braun: https://github.com/nils-braun
.. _Scott Sanderson: https://github.com/ssanderson
.. _Gaurav Sheni: https://github.com/gsheni
.. _Andrew Fulton: https://github.com/andrewfulton9
.. _Stephanie Gott: https://github.com/stephaniegott
.. _Huite: https://github.com/Huite
.. _Ryan Williams: https://github.com/ryan-williams
.. _Eric Czech: https://github.com/eric-czech
.. _Abdulelah Bin Mahfoodh: https://github.com/abduhbm
.. _Ben Shaver: https://github.com/bpshaver
.. _Matthias Bussonnier: https://github.com/Carreau
.. _johnomotani: https://github.com/johnomotani
.. _Roberto Panai: https://github.com/rpanai
.. _Clark Zinzow: https://github.com/clarkzinzow
.. _Tom McTiernan: https://github.com/tmct
.. _joshreback: https://github.com/joshreback
.. _Jun Han (Johnson) Ooi: https://github.com/tebesfinwo
.. _Jim Circadian: https://github.com/JimCircadian
.. _Jack Xiaosong Xu: https://github.com/jackxxu
.. _Mike McCarty: https://github.com/mmccarty
.. _michaelnarodovitch: https://github.com/michaelnarodovitch
.. _David Sheldon: https://github.com/davidsmf
.. _McToel: https://github.com/McToel
.. _Kilian Lieret: https://github.com/klieret
.. _Noah D. Brenowitz: https://github.com/nbren12
.. _Jon Thielen: https://github.com/jthielen
.. _Poruri Sai Rahul: https://github.com/rahulporuri
.. _Kyle Nicholson: https://github.com/kylejn27
.. _Rafal Wojdyla: https://github.com/ravwojdyla
.. _Sam Grayson: https://github.com/charmoniumQ
.. _Madhur Tandon: https://github.com/madhur-tandon
.. _Joachim B Haga: https://github.com/jobh
.. _Pav A: https://github.com/rs2
.. _GFleishman: https://github.com/GFleishman
.. _Shang Wang: https://github.com/shangw-nvidia
.. _Illviljan: https://github.com/Illviljan
.. _Jan Borchmann: https://github.com/jborchma
.. _Ruben van de Geer: https://github.com/rubenvdg
.. _Akira Naruse: https://github.com/anaruse
.. _Zhengnan Zhao: https://github.com/zzhengnan
.. _Greg Hayes: https://github.com/hayesgb
.. _RogerMoens: https://github.com/RogerMoens
.. _manuels: https://github.com/manuels
.. _Rockwell Weiner: https://github.com/rockwellw
.. _Devanshu Desai: https://github.com/devanshuDesai
.. _David Katz: https://github.com/DavidKatz-il
.. _Stephannie Jimenez Gacha: https://github.com/steff456
.. _Magnus Nord: https://github.com/magnunor
.. _Callum Noble: https://github.com/callumanoble
.. _Pascal Bourgault: https://github.com/aulemahal
.. _Joris Van den Bossche: https://github.com/jorisvandenbossche
.. _Mark: https://github.com/mchi
.. _Kumar Bharath Prabhu: https://github.com/kumarprabhu1988
.. _Rob Malouf: https://github.com/rmalouf
.. _sdementen: https://github.com/sdementen
.. _patquem: https://github.com/patquem
.. _Amit Kumar: https://github.com/aktech
.. _D-Stacks: https://github.com/D-Stacks
.. _Kyle Barron: https://github.com/kylebarron
.. _Julius Busecke: https://github.com/jbusecke
.. _Sinclair Target: https://github.com/sinclairtarget
.. _Ashwin Srinath: https://github.com/shwina
.. _David Hassell: https://github.com/davidhassell
.. _brandon-b-miller: https://github.com/brandon-b-miller
.. _Hristo Georgiev: https://github.com/hristog
.. _Trevor Manz: https://github.com/manzt
.. _Madhu94: https://github.com/Madhu94
.. _gerrymanoim: https://github.com/gerrymanoim
.. _rs9w33: https://github.com/rs9w33
.. _Tom White: https://github.com/tomwhite
.. _Eoin Shanaghy: https://github.com/eoinsha
.. _Nick Vazquez: https://github.com/nickvazz
.. _cameron16: https://github.com/cameron16
.. _Daniel Mesejo-León: https://github.com/mesejo
.. _Naty Clementi: https://github.com/ncclementi
.. _JSKenyon: https://github.com/jskenyon
.. _Freyam Mehta: https://github.com/freyam
.. _Jiaming Yuan: https://github.com/trivialfis
.. _c-thiel: https://github.com/c-thiel
.. _Andrew Champion: https://github.com/aschampion
.. _Justus Magin: https://github.com/keewis
.. _Maisie Marshall: https://github.com/maisiemarshall
.. _Vibhu Jawa: https://github.com/VibhuJawa
.. _Boaz Mohar: https://github.com/boazmohar
.. _Kristopher Overholt: https://github.com/koverholt
.. _tsuga: https://github.com/tsuga
.. _Gabriel Miretti: https://github.com/gmiretti
.. _Geoffrey Lentner: https://github.com/glentner
.. _Charles Blackmon-Luca: https://github.com/charlesbluca
.. _Bryan Van de Ven: https://github.com/bryevdv
.. _Fabian Gebhart: https://github.com/fgebhart
.. _Ross: https://github.com/rhjmoore
.. _gurunath: https://github.com/rajagurunath
.. _aa1371: https://github.com/aa1371
.. _Gregory R. Lee: https://github.com/grlee77
.. _Louis Maddox: https://github.com/lmmx
.. _Dahn: https://github.com/DahnJ
.. _Jordan Jensen: https://github.com/dotNomad
.. _Martin Fleischmann: https://github.com/martinfleis
.. _Robert Hales: https://github.com/robalar
.. _João Paulo Lacerda: https://github.com/jopasdev
.. _neel iyer: https://github.com/spiyer99
.. _SnkSynthesis: https://github.com/SnkSynthesis
.. _JoranDox: https://github.com/JoranDox
.. _Kinshuk Dua: https://github.com/kinshukdua
.. _Suriya Senthilkumar: https://github.com/suriya-it19
.. _Vũ Trung Đức: https://github.com/vutrungduc7593
.. _Nathan Danielsen: https://github.com/ndanielsen
.. _Wallace Reis: https://github.com/wreis
.. _German Shiklov: https://github.com/Jeremaiha-xmetix
.. _Pankaj Patil: https://github.com/Patil2099
.. _Samuel Gaist: https://github.com/sgaist
.. _Marcel Coetzee: https://github.com/marcelned
.. _Matthew Powers: https://github.com/MrPowers
.. _Vyas Ramasubramani: https://github.com/vyasr
.. _Ayush Dattagupta: https://github.com/ayushdg
.. _FredericOdermatt: https://github.com/FredericOdermatt
.. _mihir: https://github.com/ek234
.. _Sarah Charlotte Johnson: https://github.com/scharlottej13
.. _ofirr: https://github.com/ofirr
.. _kori73: https://github.com/kori73
.. _TnTo: https://github.com/TnTo
.. _ParticularMiner: https://github.com/ParticularMiner
.. _aeisenbarth: https://github.com/aeisenbarth
.. _Aneesh Nema: https://github.com/aneeshnema
.. _Deepyaman Datta: https://github.com/deepyaman
.. _Maren Westermann: https://github.com/marenwestermann
.. _Michael Delgado: https://github.com/delgadom
.. _abergou: https://github.com/abergou
.. _Pavithra Eswaramoorthy: https://github.com/pavithraes
.. _Maxim Lippeveld: https://github.com/MaximLippeveld
.. _Kirito1397: https://github.com/Kirito1397
.. _Xinrong Meng: https://github.com/xinrong-databricks
.. _Bryan Weber: https://github.com/bryanwweber
.. _Amir Kadivar: https://github.com/amirkdv
.. _Pedro Silva: https://github.com/ppsbs
.. _Knut Nordanger: https://github.com/nordange
.. _Ben Glossner: https://github.com/bglossner
.. _Dranaxel: https://github.com/Dranaxel
.. _Holden Karau: https://github.com/holdenk
.. _Peter: https://github.com/peterpandelidis
.. _Thomas Grainger: https://github.com/graingert
.. _Martin Thøgersen: https://github.com/th0ger
.. _Leo Gao: https://github.com/leogao2
.. _Paul Hobson: https://github.com/phobson
.. _LSturtew: https://github.com/LSturtew
.. _Michał Górny: https://github.com/mgorny
.. _lrjball: https://github.com/lrjball
.. _Davide Gavio: https://github.com/davidegavio
.. _Ben Greiner: https://github.com/bnavigator
.. _Roger Filmyer: https://github.com/rfilmyer
.. _Richard: https://github.com/richarms
.. _Francesco Andreuzzi: https://github.com/fAndreuzzi
.. _Nadiem Sissouno: https://github.com/sissnad
.. _Jorge López: https://github.com/jorloplaz
.. _Cheun Hong: https://github.com/cheunhong
.. _Eray Aslan: https://github.com/erayaslan
.. _Ben Beasley: https://github.com/musicinmybrain
.. _Ryan Russell: https://github.com/ryanrussell
.. _Angelos Omirolis: https://github.com/aomirolis
.. _Fabien Aulaire: https://github.com/faulaire
.. _Alex-JG3: https://github.com/Alex-JG3
.. _Christopher Akiki: https://github.com/cakiki
.. _Sultan Orazbayev: https://github.com/SultanOrazbayev
.. _Richard Pelgrim: https://github.com/rrpelgrim
.. _Ben: https://github.com/benjaminhduncan
.. _Angus Hollands: https://github.com/agoose77
.. _Lucas Miguel Ponce: https://github.com/lucasmsp
.. _Dylan Stewart: https://github.com/drstewart19
.. _geraninam: https://github.com/geraninam
.. _Michael Milton: https://github.com/multimeric
.. _Ruth Comer: https://github.com/rcomer
.. _Frédéric BRIOL: https://github.com/fbriol
.. _Jordan Yap: https://github.com/jjyap
.. _Logan Norman: https://github.com/lognorman20
.. _ivojuroro: https://github.com/ivojuroro
.. _Shaghayegh: https://github.com/Shadimrad
.. _Hendrik Makait: https://github.com/hendrikmakait
.. _Luke Conibear: https://github.com/lukeconibear
.. _Nicolas Grandemange: https://github.com/epizut
.. _Nat Tabris: https://github.com/ntabris
.. _Lawrence Mitchell: https://github.com/wence-
.. _nouman: https://github.com/noumxn
.. _Tim Paine: https://github.com/timkpaine
.. _ChrisJar: https://github.com/ChrisJar
.. _Shingo OKAWA: https://github.com/ognis1205
.. _qheuristics: https://github.com/qheuristics
.. _Jacob Hayes: https://github.com/JacobHayes
.. _Shawn: https://github.com/chaokunyang
.. _Erik Holmgren: https://github.com/Holmgren825
.. _aywandji: https://github.com/aywandji
.. _Chiara Marmo: https://github.com/cmarmo
.. _Jayesh Manani: https://github.com/jayeshmanani
.. _Patrick Hoefler: https://github.com/phofl
.. _Matthew Roeschke: https://github.com/mroeschke
.. _Miles: https://github.com/milesgranger
.. _Anton Loukianov: https://github.com/antonl
.. _Brian Phillips: https://github.com/bphillips-exos
.. _hotpotato: https://github.com/hotpotato
.. _Alexander Clausen: https://github.com/sk1p
.. _Swayam Patil: https://github.com/Swish78
.. _Johan Olsson: https://github.com/johanols
.. _wkrasnicki: https://github.com/wkrasnicki
.. _Michael Leslie: https://github.com/michaeldleslie
.. _Samantha Hughes: https://github.com/shughes-uk
.. _Mario Šaško: https://github.com/mariosasko
.. _joanrue: https://github.com/joanrue
.. _Andrew S. Rosen: https://github.com/Andrew-S-Rosen
.. _jochenott: https://github.com/jochenott
.. _FTang21: https://github.com/FTang21
.. _Erik Sundell: https://github.com/consideRatio
.. _Julian Gilbey: https://github.com/juliangilbey
.. _Charles Stern: https://github.com/cisaacstern
.. _templiert: https://github.com/templiert
.. _Lindsey Gray: https://github.com/lgray
.. _wim glenn: https://github.com/wimglenn
.. _Dimitri Papadopoulos Orfanos: https://github.com/DimitriPapadopoulos
.. _Quentin Lhoest: https://github.com/lhoestq
.. _Jonas Lähnemann: https://github.com/jlaehne
.. _Abel Aoun: https://github.com/bzah
.. _Simon Høxbro Hansen: https://github.com/Hoxbro
.. _M Bussonnier: https://github.com/Carreau
.. _Greg M. Fleishman: https://github.com/GFleishman
.. _Victor Stinner: https://github.com/vstinner
.. _alex-rakowski: https://github.com/alex-rakowski
.. _Adam Williamson: https://github.com/AdamWill
.. _Jonas Dedden: https://github.com/jonded94
.. _Bernhard Raml: https://github.com/SwamyDev
.. _Lucas Colley: https://github.com/lucascolley
.. _Tao Xin: https://github.com/Tao-VanJS
.. _David Stansby: https://github.com/dstansby
.. _Mario Linker: https://github.com/maldag
.. _Dmitry Balabka: https://github.com/dbalabka
.. _Martin Yeo: https://github.com/trexfeathers
.. _Ilan Gold: https://github.com/ilan-gold
.. _Jean-Baptiste Bayle: https://github.com/j2bbayle
.. _dchudz: https://github.com/dchudz
.. _Guido Imperiale: https://github.com/crusaderky
.. _Alexander: https://github.com/SalikovAlex
.. _Philipp A.: https://github.com/flying-sheep
.. _Sergey Kolesnikov: https://github.com/SCORE1387
.. _Taylor Braun-Jones: https://github.com/nocnokneo
.. _Isaac: https://github.com/icykip
.. _Sandro: https://github.com/penguinpee
.. _Brigitta Sipőcz: https://github.com/bsipocz
.. _Raúl Cumplido: https://github.com/raulcd
.. _Lukas Bindreiter: https://github.com/lukasbindreiter
.. _Marvin Albert: https://github.com/m-albert
.. _Peter Fackeldey: https://github.com/pfackeldey
.. _Marco Edward Gorelli: https://github.com/MarcoGorelli
.. _Peter A. Jonsson: https://github.com/pjonsson
.. _Florian Courtial: https://github.com/fcourtial
.. _Tony Ding: https://github.com/tonyyuyiding
.. _Oisin-M: https://github.com/Oisin-M
.. _Username46786: https://github.com/Username46786
.. _Maneesh Sutar: https://github.com/maneesh29s
.. _Jianyu Sun: https://github.com/csfldf
.. _DongWon: https://github.com/dongwonmoon
.. _Simon-Martin Schröder: https://github.com/moi90
.. _Wouter-Michiel Vierdag: https://github.com/melonora
.. _Clément Robert: https://github.com/neutrinoceros
.. _Gautham Hullikunte: https://github.com/batcity
.. _Vipin Kataria: https://github.com/vipinkataria2209
.. _Matthew Plough: https://github.com/mplough-kobold