.. note::

    This is not exhaustive. For an exhaustive list of changes, see the git log.

.. _v2026.3.0:

Highlights
^^^^^^^^^^

- (:pr:`12223`) `Guido Imperiale`_
- (:pr-distributed:`9205`) `Dimitri Papadopoulos Orfanos`_

.. dropdown:: Additional changes

    - docs: document approximate algorithm and Dask-specific params in ``describe()`` (:pr:`12300`) `Maxime Grenu`_
    - docs: clarify ``coarsen`` reduction function contract (:pr:`12314`) `monkeyjack123`_
    - Fix misleading ``TypeError`` for scalar overflow in ``dask.array`` elemwise (:pr:`12301`) `Maxime Grenu`_
    - Stricter warnings filter (:pr:`12274`) `Guido Imperiale`_
    - Clean up obsolete ``PANDAS_GE`` markers (:pr:`12279`) `Guido Imperiale`_
    - Bump ``actions/upload-artifact`` from 6 to 7 (:pr:`12311`) `dependabot[bot]`_
    - Remove mention of obsolete default value for ``boundary`` parameter (:pr:`12304`) `Marianne Corvellec`_
    - Pandas in 3.14t CI (:pr:`12284`) `Guido Imperiale`_
    - Quadratic definition time in ``xarray.DataArray.to_zarr(compute=False)`` (:pr:`12299`) `Guido Imperiale`_
    - Bump ``scientific-python/issue-from-pytest-log-action`` from 1.4.0 to 1.5.0 (:pr:`12294`) `dependabot[bot]`_
    - ``test_tokenize_range_index`` fails if ``cityhash`` is not installed (:pr:`12286`) `Guido Imperiale`_
    - Bump minimum version of scipy (:pr:`12271`) `Guido Imperiale`_
    - Fix flaky categorical concat test (:pr:`12276`) `Harshith J`_
    - Doc: document Zarr compression options for ``to_zarr`` (:pr:`12269`) `Harshith J`_
    - Disable the GIL on 3.14t Windows CI (:pr:`12280`) `Guido Imperiale`_
    - Update obsolete pandas URLs (:pr:`12278`) `Guido Imperiale`_
    - Suppress warning: Consolidated metadata is not part of Zarr 3 (:pr:`12273`) `Guido Imperiale`_
    - ``Pandas4Warning``: Copy-on-Write is always enabled with pandas >= 3.0 (:pr:`12272`) `Guido Imperiale`_
    - Disable the GIL in 3.14t CI (:pr:`12270`) `Guido Imperiale`_
    - Propagate ``contextvars`` to worker threads; catch warnings in 3.14t (:pr:`12224`) `Guido Imperiale`_
    - Fix bugs in ``env.yaml`` / ``pytest.xml`` upload (:pr:`12266`) `Guido Imperiale`_
    - Added ``full_matrices`` parameter to ``dask.array.linalg.svd`` (:pr:`12292`) `Ayan Bag`_
    - fix: ``zarr.create_array`` for better backward compatibility (:pr:`12291`) `Wouter-Michiel Vierdag`_
    - Silence deprecations in global config if local config overrides them (:pr:`12315`) `Guido Imperiale`_
    - Fix Total CPU % on ``/workers`` tab to normalize by total nthreads (:pr-distributed:`9195`) `Ernest Provo`_
    - ``setproctitle``: avoid being caught by ``dask.config``; add to test envs (:pr-distributed:`9202`) `Guido Imperiale`_
    - Add return type annotation for ``Client.register_plugin`` (:pr-distributed:`9201`) `Simon-Martin Schröder`_
    - Bump ``actions/upload-artifact`` from 6 to 7 (:pr-distributed:`9199`) `dependabot[bot]`_
    - docs: fix ``Scheduler.close`` docstring (:pr-distributed:`9198`) `Chase Naples`_
    - XFAIL ``test_handle_null_partitions_2`` (:pr-distributed:`9191`) `Guido Imperiale`_
    - Type hints for ``Future.status`` (:pr-distributed:`9188`) `Navid`_
    - Pin ``sphinx=8`` (:pr-distributed:`9190`) `Guido Imperiale`_

.. _v2026.2.0:

Highlights
^^^^^^^^^^

.. dropdown:: Additional changes

    - scipy bumped to 1.10.0 (was 1.7.2)

.. _v2026.1.2:

Highlights
^^^^^^^^^^

``**kwargs`` in ``to_zarr`` follow ``zarr-python`` API and add ``mode`` argument
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

(:pr:`12205`) `Wouter-Michiel Vierdag`_

.. note::

    Passing I/O-related arguments through ``**kwargs`` in ``to_zarr`` will be deprecated,
    and the ``read_kwargs`` argument as well as the ``zarr_array_kwargs`` (dict) argument
    introduced in 2025.12.0 have been removed.

    If you passed ``mode`` or ``read_only`` through ``**kwargs`` or ``read_kwargs`` in
    ``to_zarr``, please use the new ``mode`` argument instead. The ``read_only`` argument
    can still be passed, but it emits a warning and has no effect (since ``to_zarr``
    only writes, this should not be an issue). For now, no error is raised.

    ``**kwargs`` in ``to_zarr`` has been renamed to ``**zarr_array_kwargs`` to indicate
    that these arguments directly follow the ``zarr-python`` API of ``Group.create_array``
    when ``zarr>v3.0.0`` and of ``zarr.create`` for ``zarr<v3.0.0``. See
    :func:`dask.array.to_zarr` for more details.
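
The migration described in the note can be sketched as a small pure-Python helper that turns old-style keyword arguments into the new calling convention. This is only an illustration: ``split_legacy_to_zarr_kwargs`` is a hypothetical name that does not exist in Dask, and the sketch assumes that every key other than ``mode`` and ``read_only`` is an array-creation option meant for ``**zarr_array_kwargs`` (other I/O-related options would need separate handling).

.. code-block:: python

    import warnings


    def split_legacy_to_zarr_kwargs(kwargs, read_kwargs=None):
        """Hypothetical migration helper (not part of Dask).

        Splits keyword arguments that were passed to ``to_zarr`` before
        2026.1.2 into the new ``mode`` argument and the remaining
        ``**zarr_array_kwargs``. Returns ``(mode, zarr_array_kwargs)``.
        """
        # Merge the removed ``read_kwargs`` dict with ``**kwargs``; on a clash
        # the ``**kwargs`` value wins (an assumption made for this sketch).
        legacy = {**(read_kwargs or {}), **kwargs}

        # ``mode`` now has its own dedicated argument.
        mode = legacy.pop("mode", None)

        # ``read_only`` is still accepted by ``to_zarr`` but has no effect,
        # so the sketch drops it and warns, mirroring the note above.
        if "read_only" in legacy:
            legacy.pop("read_only")
            warnings.warn("read_only has no effect in to_zarr", UserWarning, stacklevel=2)

        # Everything else is assumed to be an array-creation option that is
        # forwarded unchanged as ``**zarr_array_kwargs``.
        return mode, legacy


    mode, zarr_array_kwargs = split_legacy_to_zarr_kwargs({"mode": "w", "fill_value": 0})
    # mode == "w", zarr_array_kwargs == {"fill_value": 0}
    # Assuming dask>=2026.1.2 and zarr are installed, the call would then be:
    #     arr.to_zarr("example.zarr", mode=mode, **zarr_array_kwargs)

The helper builds a new dict rather than mutating its inputs, so the caller's original keyword dictionaries stay untouched.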

.. dropdown:: Additional changes

    - h5py bumped to 3.7.0 (was 3.4.0)
    - python-snappy bumped to 0.7.1 (was 0.6.0)
    - tiledb bumped to 0.27.0 (was 0.12.0)

.. _v2026.1.1:

Highlights
^^^^^^^^^^

- `CVE-2026-23528 <https://github.com/dask/distributed/security/advisories/GHSA-c336-7962-wfj2>`_ `Jacob Tomlinson`_
- (:pr:`12213`) `Matthew Rocklin`_

.. dropdown:: Additional changes

    - Remove the Python 2 Comment (:pr:`12229`) `Vipin Kataria`_
    - Fix changelog: ``distributed-pr`` -> ``pr-distributed`` (:pr:`12227`) `Matthew Plough`_
    - Support duck-typed Futures in task graph processing (:pr:`12213`) `Matthew Rocklin`_
    - Relax ``test_serialization`` (:pr:`12226`) `Guido Imperiale`_
    - [cosmetic] Reorganise dependency groups in CI environment files (:pr:`12222`) `Guido Imperiale`_
    - Review ``_array_expr_enabled()`` (:pr:`12217`) `Guido Imperiale`_
    - Increase coverage; lower codecov threshold to pass (:pr:`12214`) `Guido Imperiale`_
    - Test array expr on mindeps (:pr:`12216`) `Guido Imperiale`_
    - Disable some Mac builds (:pr:`12218`) `Guido Imperiale`_
    - Typing tweaks (:pr:`12215`) `Guido Imperiale`_
    - [CI] unbreak codecov (:pr:`12211`) `Guido Imperiale`_
    - Test array expr on Python 3.14 (:pr:`12212`) `Guido Imperiale`_
    - Fix pickle compatibility for Python 3.14 (:pr:`12206`) `Matthew Rocklin`_
    - Remove deprecated ``dask._compatibility.entry_points`` (:pr:`12202`) `Guido Imperiale`_
    - Tweak MacOS CI (:pr:`12200`) `Guido Imperiale`_
    - Remove obsolete CI pins (:pr:`12199`) `Guido Imperiale`_
    - Fix XSS vulnerability `CVE-2026-23528 <https://github.com/dask/distributed/security/advisories/GHSA-c336-7962-wfj2>`_ `Jacob Tomlinson`_
    - Clean up obsolete pins in CI (:pr-distributed:`9172`) `Guido Imperiale`_
    - Fix incompatibility of pyparsing vs. packaging in mindeps CI (:pr-distributed:`9170`) `Guido Imperiale`_
    - Bump mypy; fix mypy failure (:pr-distributed:`9171`) `Guido Imperiale`_

.. _v2026.1.0:

This release was broken and has been yanked; please ignore it.

.. _v2025.12.0:

Highlights
^^^^^^^^^^

- `Tom Augspurger`_
- (:pr:`12153`) `Wouter-Michiel Vierdag`_
- `Dimitri Papadopoulos Orfanos`_
- (:pr:`12194`) `Richard (Rick) Zamora`_

.. dropdown:: Additional changes

    - Stable sort in ``Series.value_counts`` for pandas 3.x (:pr:`12191`) `Tom Augspurger`_
    - Add new ``"optimization.tune.active"`` configuration option to disable partition fusion (:pr:`12194`) `Richard (Rick) Zamora`_
    - Build ``llms.txt`` files in Sphinx documentation (:pr:`12192`) `Jacob Tomlinson`_
    - Support zarr sharding through ``create_array`` (:pr:`12153`) `Wouter-Michiel Vierdag`_
    - Support min/max of datetime (:pr:`12183`) `Julia Signell`_
    - pandas 3.x compatibility (:pr:`12180`) `Tom Augspurger`_
    - Minimal version of setuptools-scm (:pr:`12184`) `Dimitri Papadopoulos Orfanos`_
    - Update ``test_ufunc_meta`` for upstream-dev failure (:pr:`12170`) `Tom Augspurger`_
    - Upstream compat (:pr:`12165`) `Tom Augspurger`_
    - Enforce a few more ruff rules (:pr:`12157`) `Dimitri Papadopoulos Orfanos`_
    - Enforce ruff/refurb rules (FURB) (:pr:`12144`) `Dimitri Papadopoulos Orfanos`_
    - DEP: bump minimal requirement on toolz (0.10.0 -> 0.12.0) (:pr:`12163`) `Clément Robert`_
    - Fix execution stop in ``da.to_zarr`` due to (misleading) ``PerformanceWarning`` raised as exception (:pr:`12161`) `Marvin Albert`_
    - Use f-string interpolation where possible (:pr:`12140`) `Dimitri Papadopoulos Orfanos`_
    - pre-commit black hook: use implicit defaults (:pr:`12156`) `Dimitri Papadopoulos Orfanos`_
    - Enforce ruff/pygrep-hooks rules (PGH) (:pr:`12143`) `Dimitri Papadopoulos Orfanos`_
    - Apply Repo-Review rules (:pr:`12148`) `Dimitri Papadopoulos Orfanos`_
    - Document groupby: ``split_every``, ``split_out`` (:pr:`12135`) `Jayesh Manani`_
    - isort → ruff (:pr:`12149`) `Dimitri Papadopoulos Orfanos`_
    - Enforce ruff/pyupgrade rule UP031 (:pr:`12137`) `Dimitri Papadopoulos Orfanos`_
    - Replace pre-commit hook with ruff rule (:pr:`12142`) `Dimitri Papadopoulos Orfanos`_
    - Fix ``reify`` to handle sparse arrays and other objects without ``len`` (:pr:`12103`) `Gautham Hullikunte`_
    - Ruff supersedes absolufy-imports (:pr:`12141`) `Dimitri Papadopoulos Orfanos`_
    - Enforce ruff/pyupgrade rule UP032 (:pr:`12136`) `Dimitri Papadopoulos Orfanos`_
    - Typing fixes (:pr-distributed:`9159`) `Jacob Tomlinson`_
    - Explicit setuptools-scm minimum version (:pr-distributed:`9160`) `Jacob Tomlinson`_
    - Enforce ruff rules (RUF) (:pr-distributed:`9153`) `Dimitri Papadopoulos Orfanos`_
    - Clean up ``MANIFEST.in`` (:pr-distributed:`9149`) `Dimitri Papadopoulos Orfanos`_
    - isort → ruff (:pr-distributed:`9152`) `Dimitri Papadopoulos Orfanos`_
    - Ruff supersedes absolufy-imports (:pr-distributed:`9154`) `Dimitri Papadopoulos Orfanos`_
    - Bump minimum supported toolz to 0.12.0 (:pr-distributed:`9151`) `James Bourbeau`_
    - flake8, bugbear, pyupgrade → ruff (:pr-distributed:`9147`) `Dimitri Papadopoulos Orfanos`_
    - Fix typos found by codespell (:pr-distributed:`9145`) `Dimitri Papadopoulos Orfanos`_
    - Clean up setuptools-specific configuration (:pr-distributed:`9150`) `Dimitri Papadopoulos Orfanos`_
    - PEP 639 compliance (:pr-distributed:`9146`) `Dimitri Papadopoulos Orfanos`_
    - Update black (:pr-distributed:`9148`) `Dimitri Papadopoulos Orfanos`_
    - Fix empty progress bar (:pr-distributed:`9144`) `Jacob Tomlinson`_
    - Exclude broken tblib versions in CI (:pr-distributed:`9141`) `Jacob Tomlinson`_

.. _v2025.11.0:

Highlights
^^^^^^^^^^

- ``to_zarr`` (:pr:`12105`) `Davis Bennett`_
- (:pr-distributed:`9133`) `Jianyu Sun`_

.. dropdown:: Additional changes

    - Replace versioneer with setuptools-scm (:pr:`12133`) `Jacob Tomlinson`_
    - Apply ruff/Pylint Refactor rules (PLR) (:pr:`12010`) `Dimitri Papadopoulos Orfanos`_
    - Remove files from ``MANIFEST.in`` (:pr:`12041`) `Dimitri Papadopoulos Orfanos`_
    - Stabilize ``test_filter_nonpartition_columns`` (:pr:`12131`) `DongWon`_
    - Enforce ruff/pyupgrade rules UP007 and UP033 (:pr:`12125`) `Dimitri Papadopoulos Orfanos`_
    - Update ``np.accumulate`` workaround comment (:pr:`12129`) `Jacob Tomlinson`_
    - flake8, bugbear, pyupgrade → ruff (:pr:`12002`) `Dimitri Papadopoulos Orfanos`_
    - Adjust pyarrow version skip in ``test_parquet`` (:pr:`12124`) `Tom Augspurger`_
    - Fix ufunc in ``dask.array.cumreduction`` (:pr:`12119`) `Tony Ding`_
    - Fix docs footer (:pr:`12120`) `Jacob Tomlinson`_
    - Use integer multiple of shard shape when rechunking in ``to_zarr`` (:pr:`12106`) `Davis Bennett`_
    - Ensure that the shard shape is used as the default chunk shape for sharded Zarr arrays (:pr:`12104`) `Davis Bennett`_
    - Skip ``test_parquet`` for ``pyarrow==22.0`` (:pr:`12116`) `Tom Augspurger`_
    - Clean up setuptools-specific configuration (:pr:`12040`) `Dimitri Papadopoulos Orfanos`_
    - PEP 639 compliance (:pr:`12024`) `Dimitri Papadopoulos Orfanos`_
    - Fix deprecated quantile interpolation being passed to numpy (:pr:`12108`) `David Hoese`_
    - Add ``uv.lock`` to ``.gitignore`` (:pr:`12110`) `Jacob Tomlinson`_
    - Use shard shape when available in ``to_zarr`` (:pr:`12105`) `Davis Bennett`_
    - Add more optional dependencies to Python 3.13 CI builds (:pr:`12100`) `James Bourbeau`_
    - Remove pip pin for docs (:pr:`12102`) `James Bourbeau`_
    - Address collection-based meta arguments in ``GroupByApply`` (:pr:`12099`) `Richard (Rick) Zamora`_
    - Replace versioneer with setuptools-scm (:pr-distributed:`9137`) `Jacob Tomlinson`_
    - Improve worker and nanny support for ipv6 (:pr-distributed:`9133`) `Jianyu Sun`_
    - Fix CI Multiple aliased keys in file ``/Users/runner/.condarc`` (:pr-distributed:`9136`) `Jacob Tomlinson`_
    - Remove pip pin for docs (:pr-distributed:`9132`) `James Bourbeau`_
    - Remove UCX configuration schema (:pr-distributed:`9127`) `Peter Andreas Entschev`_
    - Add generic type support to ``Future`` and ``Client`` methods (:pr-distributed:`9123`) `Simon-Martin Schröder`_

.. _v2025.10.0:

Highlights
^^^^^^^^^^

:pr:`12097`, :pr:`12089`, :pr:`12088`, and :pr:`12090`.

.. dropdown:: Additional changes

    - Use updated docs theme (:pr:`12093`) `Jacob Tomlinson`_
    - Fix: ``dask.array.cumprod`` does not deal with ``dtype`` (:pr:`12097`) `Tony Ding`_
    - CuPy compatibility for percentile (:pr:`12098`) `Tom Augspurger`_
    - Avoid using ``methods.concat`` on empty lists (:pr:`12096`) `Tony Ding`_
    - Add distribution check for optional dependencies (:pr:`12087`) `James Bourbeau`_
    - Fix percentile inconsistencies (:pr:`12088`) `Oisin-M`_
    - Fix warning in ``test_ufunc_where_no_out`` (:pr:`12094`) `Tom Augspurger`_
    - Fix/choose trivial case (:pr:`12090`) `Oisin-M`_
    - Add input validation on ``dask.dataframe.read_sql_query()`` (:pr:`12091`) `Jacob Tomlinson`_
    - Numpy 2.2 updates for ``cov`` function with tests (:pr:`12079`) `Mike McCarty`_
    - Fix ``nanvar`` (:pr:`12089`) `Oisin-M`_
    - Document manually triggering the conda-forge bots (:pr:`12083`) `Jacob Tomlinson`_
    - Fix mixed HLG/Expr handling in ``_ExprSequence._simplify_down`` (:pr:`12081`) `Richard (Rick) Zamora`_
    - Add ``dask.tokenize`` to API docs (:pr:`12080`) `Username46786`_
    - ``CreateOverlappingPartitions``: Add ``before`` and ``after`` to prepend name (:pr:`11965`) `Fabien Aulaire`_
    - Fix ``scipy.sparse.csc_matrix`` scalar declaration in ``_array_like_safe`` (:pr:`12078`) `Ilan Gold`_
    - Update docs theme and remove docs env pins (:pr-distributed:`9125`) `Jacob Tomlinson`_
    - Add worker name as prefix to ``ThreadPoolExecutor`` name (:pr-distributed:`9120`) `Maneesh Sutar`_
    - Skip hanging SSH tests on Windows (:pr-distributed:`9115`) `Jacob Tomlinson`_
    - Fix macOS CI failure during job startup (:pr-distributed:`9113`) `Jacob Tomlinson`_
    - Prevent task stream dashboard showing 1970 date (:pr-distributed:`9109`) `Guillaume Eynard-Bontemps`_

.. _v2025.9.2:

This is a backport security release only.
See `CVE-2026-23528 <https://github.com/dask/distributed/security/advisories/GHSA-c336-7962-wfj2>`_ for more details.

.. _v2025.9.1:

Highlights
^^^^^^^^^^

- (:pr:`12075`) `Tom Augspurger`_
- .groups (:pr:`12071`) `Tom Augspurger`_

.. dropdown:: Additional changes

    - (:pr:`12075`) `Tom Augspurger`_
    - .groups (:pr:`12071`) `Tom Augspurger`_
    - (:pr-distributed:`9092`) `Taylor Braun-Jones`_
    - (:pr-distributed:`9111`) `Jacob Tomlinson`_

.. _v2025.9.0:

Highlights
^^^^^^^^^^

- (:pr:`12025`) `Tom Augspurger`_
- (:pr-distributed:`9105`) `Peter Andreas Entschev`_

.. dropdown:: Additional changes

    - Fix 0 scalar setting for ``scipy.sparse`` (:pr:`12027`) `Ilan Gold`_
    - Workaround failing upstream-dev tests (:pr:`12061`) `Tom Augspurger`_
    - avoid instantiating a potentially very large ``arange`` in ``take`` (:pr:`11998`) `Justus Magin`_
    - MAINT: address NumPy deprecation in ``np.minimum`` (:pr:`12059`) `Marco Edward Gorelli`_
    - CI fixes (:pr:`12058`) `Tom Augspurger`_
    - MAINT: Address NumPy ``DeprecationWarning`` (:pr:`12056`) `Marco Edward Gorelli`_
    - Fix ``test_enforce_columns`` on Python 3.14 (:pr:`12047`) `Elliott Sales de Andrade`_
    - Fix "th" --> "the" typo in DataFrame SQL docs (:pr:`12038`) `Peter A. Jonsson`_
    - Advance rng state in ``permutation`` (:pr:`12031`) `James Bourbeau`_
    - Fix pyarrow chunked array conversion (:pr:`12034`) `James Bourbeau`_
    - Fix xfail condition for pyarrow ``large_string`` issue (:pr:`12032`) `James Bourbeau`_
    - pandas 3.x compatibility (:pr:`12025`) `Tom Augspurger`_
    - Fix name not propagated correctly in ``map_blocks`` (:pr:`11952`) `Ilan Gold`_
    - Clean tuples dict keys from ``workers_info`` in ``/api/v1/retire_workers`` (:pr-distributed:`8996`) `Florian Courtial`_
    - Remove ``protocol="ucx"`` support in favor of distributed-ucxx (:pr-distributed:`9105`) `Peter Andreas Entschev`_

.. _v2025.7.0:

Highlights
^^^^^^^^^^

- ``__main__`` in pickle normalization (:pr:`11970`) `James Bourbeau`_
- ``MapPartitions`` (:pr:`11875`) `Richard (Rick) Zamora`_
- direct-to-workers (:pr-distributed:`9097`) `James Bourbeau`_

.. dropdown:: Additional changes

    - CI: update actions location (:pr:`12019`) `Brigitta Sipőcz`_
    - Apply ruff/flake8-comprehensions rules (C4) (:pr:`12004`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/flake8-pie rules (PIE) (:pr:`12006`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/Pylint Error rules (PLE) (:pr:`12013`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/Pylint Convention rules (PLC) (:pr:`12012`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/flake8-pyi rules (PYI) (:pr:`12007`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/flake8-simplify rules (SIM) (:pr:`12008`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/Pylint Warning rules (PLW) (:pr:`12011`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/flake8-implicit-str-concat rules (ISC) (:pr:`12005`) `Dimitri Papadopoulos Orfanos`_
    - Apply ruff/pycodestyle rule E714 (:pr:`12000`) `Dimitri Papadopoulos Orfanos`_
    - Fix typos found by codespell (:pr:`12001`) `Dimitri Papadopoulos Orfanos`_
    - Update PyPI URL for official nightly pyarrow repository (:pr:`11996`) `Raúl Cumplido`_
    - Fall-back to textual repr in case jinja2 is not installed (:pr:`11987`) `Lukas Bindreiter`_
    - Prevent ``builtins.any`` from being shadowed in ``dask.array.reductions`` (:pr:`11988`) `Marvin Albert`_
    - Bump ``conda-incubator/setup-miniconda`` from 3.1.1 to 3.2.0 (:pr:`11982`)
    - Skip groupby ``cov`` test for pandas 3.x (:pr:`11977`) `Tom Augspurger`_
    - Fix upstream CI installation (:pr:`11976`) `James Bourbeau`_
    - Make module name logic more resilient in ``Dispatch`` (:pr:`11974`) `James Bourbeau`_
    - Ensure memray profiler runs on all workers (:pr-distributed:`9095`) `James Bourbeau`_
    - Update ``def`` to ``class`` typo in actors docs (:pr-distributed:`9091`) `Peter Fackeldey`_
    - Bump ``conda-incubator/setup-miniconda`` from 3.1.1 to 3.2.0 (:pr-distributed:`9090`)
    - Update ``persist`` in tests for async clients (:pr-distributed:`9089`) `Tom Augspurger`_
    - Fix pyarrow ``FileInfo`` import (:pr-distributed:`9078`) `James Bourbeau`_
    - Make module name logic more resilient in ``_always_use_pickle_for`` (:pr-distributed:`9086`) `James Bourbeau`_
    - Temporarily pin pytest in CI to avoid coverage error (:pr-distributed:`9088`) `James Bourbeau`_
    - Remove s3fs from testing CI environment (:pr-distributed:`9087`) `James Bourbeau`_
    - Reuse ``Comm`` objects in ``Scheduler.broadcast`` (:pr-distributed:`9083`) `Tom Augspurger`_
    - Fix ``test_resubmit_nondeterministic_task_different_deps`` (:pr-distributed:`9085`) `James Bourbeau`_

.. _v2025.5.1:

Highlights
^^^^^^^^^^

Fixed a Dask Array slicing regression introduced in the 2025.5.0 release.
See :pr:`11947` from `Florian Jetter`_ for more details.

.. dropdown:: Additional changes

    - (:pr:`11945`) `Florian Jetter`_
    - task_spec.parse_input" (:pr:`11953`) `Florian Jetter`_
    - (:pr:`11946`) `Florian Jetter`_
    - xarray slicing regression (:pr:`11947`) `Florian Jetter`_
    - ``task_spec.parse_input`` (:pr:`11948`) `Florian Jetter`_

.. _v2025.5.0:

Highlights
^^^^^^^^^^

setitem when both the array and the indexer have unknown shape.
See :pr:`11753` from `Tom Augspurger`_ for more details.

delayed graph handling issues introduced in the 2025.4.0 release.
See :pr:`11917`, :pr:`11907`, and :pr-distributed:`9071` from `Florian Jetter`_ for more details.

.. dropdown:: Additional changes

    - Speed up slicing graph generation (:pr:`11945`) `Florian Jetter`_
    - Optimize dask order for worst case of ``get_target`` (:pr:`11935`) `Florian Jetter`_
    - Raise on local executor if tasks are missing dependency (:pr:`11944`) `Florian Jetter`_
    - Fix ``to_dask_array`` for single partition (:pr:`11931`) `James Bourbeau`_
    - Ensure parquet plan is fully cached during optimization (:pr:`11933`) `Florian Jetter`_
    - Better documentation for expression system (:pr:`11915`) `Florian Jetter`_
    - Simplify (and speed up) culling (:pr:`11899`) `Florian Jetter`_
    - Update pre-commit (:pr:`11926`) `Florian Jetter`_
    - Don't run post setup-miniconda step in CI (:pr:`11925`) `James Bourbeau`_
    - Try to pin pip for readthedocs (:pr:`11923`) `Florian Jetter`_
    - Fix windows CI (:pr:`11919`) `Florian Jetter`_
    - Use stable crick for py310 (:pr-distributed:`9072`) `Florian Jetter`_
    - Remove internal dependencies mapping in ``update_graph`` (:pr-distributed:`9036`) `Florian Jetter`_
    - Partially forgotten dependencies (:pr-distributed:`9068`) `Florian Jetter`_
    - Replace filesystem-spec in CI environment with fsspec (:pr-distributed:`9069`) `James Bourbeau`_
    - Ensure actors set erred state properly in case of worker failure (:pr-distributed:`9067`) `Florian Jetter`_
    - Refactor timeouts in start cluster (:pr-distributed:`9062`) `Florian Jetter`_
    - Fix workers / threads / memory displayed in client repr (:pr-distributed:`9066`) `James Bourbeau`_
    - Pin pip for readthedocs (:pr-distributed:`9063`) `Florian Jetter`_
    - Skip TLS functional tests (:pr-distributed:`9061`) `Florian Jetter`_
    - Ensure client submit does not serialize unnecessarily (:pr-distributed:`9057`) `Florian Jetter`_

.. _v2025.4.1:

Highlights
^^^^^^^^^^

This release contains several graph optimization fixes for issues introduced in the 2025.4.0 release.
See :pr:`11906`, :pr:`11898`, :pr:`11903`, and :pr:`11904` by `Florian Jetter`_ for more details.

.. dropdown:: Additional changes

    - ufuncs and gufunc for array-expr (:pr:`11818`) `Patrick Hoefler`_
    - ``map_overlap`` for array-expr (:pr:`11822`) `Patrick Hoefler`_

.. _v2025.4.0:

Highlights
^^^^^^^^^^

``force`` for ``DataFrame.shuffle`` which signals the optimizer to not
drop the shuffle during optimization.

Breaking changes
^^^^^^^^^^^^^^^^

- ``dask.optimize`` will now always trigger graph materialization.
  Previously this was not always the case. This also causes any low-level HLG
  annotations to be dropped.
- ``dask.compute``, ``DaskCollection.compute``, or ``Client.compute``).
- ``dask.base.collections_to_dsk`` has been renamed to ``collections_to_expr`` and
  no longer returns a ``HighLevelGraph`` or ``dict`` object but instead
  guarantees a ``dask._expr.Expr`` object. Further, it no longer performs low-level
  optimization immediately but delays it until the ``Expr`` instance
  is materialized. The returned object is therefore no longer a mapping, so
  converting it to ``dict`` or iterating over it is no longer possible.

.. dropdown:: Additional changes

    - Ensure ``Future`` value is in ``da.from_delayed`` task graph (:pr:`11896`) `Tom Augspurger`_
    - Fix annotations passed to ``delayed`` (:pr:`11893`) `Florian Jetter`_
    - Migrate delayed ``unpack_collections`` (:pr:`11881`) `Florian Jetter`_
    - Remove Pub / Sub references from docs (:pr:`11891`) `James Bourbeau`_
    - Ensure only classes without custom init are singletons (:pr:`11886`) `Florian Jetter`_
    - Remove custom initializers for delayed expressions (:pr:`11888`) `Florian Jetter`_
    - Fix persisting multiple DFs at the same time (:pr:`11887`) `Florian Jetter`_
    - Avoid always parsing list inputs to ``DataFrame.isin`` as object type numpy arrays (:pr:`11869`) `Matthew Roeschke`_
    - Unskip pandas-dev cov / corr tests (:pr:`11873`) `Tom Augspurger`_
    - HLG blockwise fix (:pr:`11871`) `Florian Jetter`_
    - Ensure annotations for HLG objects are properly generated (:pr:`11866`) `Florian Jetter`_
    - Factor out singleton logic from base ``Expr`` class (:pr:`11868`) `Florian Jetter`_
    - Ensure HLGs are using dependencies properly in optimization (:pr:`11859`) `Florian Jetter`_
    - Ensure dictionaries tokenize deterministically (:pr:`11867`) `Florian Jetter`_
    - Ensure default dask scheduler only computes what's needed (:pr:`11861`) `Florian Jetter`_
    - Faster tokenization of ``pd.RangeIndex`` (:pr:`11863`) `Florian Jetter`_
    - Update link to Quansight in community doc (:pr:`11860`) `Pavithra Eswaramoorthy`_
    - Relax tolerance in autocorr test (:pr:`11857`) `Tom Augspurger`_
    - Use ``map_blocks`` in ``array.store`` to avoid materialization and dropping of annotations (:pr:`11844`) `Florian Jetter`_
    - Ensure repartition does not trigger memory size computation during lowering (i.e. on the scheduler) (:pr:`11855`) `Florian Jetter`_
    - Support ``args`` and ``kwargs`` for rolling aggregations (:pr:`11856`) `Florian Jetter`_
    - Remove nightly h5py from upstream CI job (:pr:`11847`) `James Bourbeau`_
    - Ensure ``HLGExpr`` tokenize uniquely (:pr:`11849`) `Florian Jetter`_
    - Do not inject median in describe for pandas 3 (:pr:`11846`) `Florian Jetter`_
    - Fixed ``Expr.__setattr__`` for subclasses (:pr:`11845`) `Tom Augspurger`_
    - Wrap HLGs in an ``Expr`` to avoid Client side materialization (:pr:`11736`) `Florian Jetter`_
    - Improve error when submitting work from a closed client (:pr-distributed:`9049`) `James Bourbeau`_
    - Return a default value if address resolution fails (:pr-distributed:`9051`) `Sandro`_
    - Avoid deepcopy when submitting graph (:pr-distributed:`8633`) `Florian Jetter`_
    - Dynamically scale heartbeat and ``scheduler_info`` intervals (:pr-distributed:`9046`) `Florian Jetter`_
    - Speed up process startup time by avoiding importing packages on version check (:pr-distributed:`9048`) `Florian Jetter`_
    - Reduce size of ``scheduler_info`` (:pr-distributed:`9045`) `Florian Jetter`_
    - Cache ``WorkerState`` host property (:pr-distributed:`9044`) `Florian Jetter`_
    - Clear ci env cache (:pr-distributed:`9047`) `Florian Jetter`_
    - Remove deprecated Pub / Sub (:pr-distributed:`9039`) `Florian Jetter`_
    - Perform explicit culling step only if LLG is submitted (:pr-distributed:`9040`) `Florian Jetter`_
    - Do not fully materialize global annotations by type (:pr-distributed:`9035`) `Florian Jetter`_
    - Allow nested ``worker_client`` calls (:pr-distributed:`9038`) `George Sakkis`_
    - Dump ci cache (:pr-distributed:`9037`) `Florian Jetter`_
    - Scheduler type annotations (:pr-distributed:`9030`) `Florian Jetter`_
    - Reduce ``dask.order`` overhead by removing ``stripped_dep`` computation (:pr-distributed:`9031`) `Florian Jetter`_
    - Use ``Expr`` instead of HLG (:pr-distributed:`9008`) `Florian Jetter`_

.. _v2025.3.0:

Highlights
^^^^^^^^^^

Automatically adjust chunksizes in ``xarray.apply_ufunc``
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

``apply_ufunc`` requires each core dimension to consist of a single chunk (``chunksize=-1``).
The underlying rechunking operation merges the core dimension into one chunk but keeps the
chunk sizes of the other dimensions unchanged, which can make chunk sizes explode under the hood.
This release adds an intermediate step that shrinks the non-core dimensions by the same factor
by which the core dimension grows, keeping the maximum chunk size under control. This behavior
is enabled automatically when ``allow_rechunk=True`` is set.

.. code-block:: python

    import xarray as xr
    import dask.array as da

    arr = xr.DataArray(
        da.random.random((1, 750, 45910), chunks=(1, "auto", -1)),
        dims=["band", "y", "x"],
    )

    result = arr.interp(
        y=arr.coords["y"],
        method="linear",
    )

.. grid:: 2

    .. grid-item:: **Previously**

        Individual chunks explode to 25 GiB, likely causing out-of-memory errors.

        .. image:: images/changelog/gufunc_chunksizes_exploding.png
            :width: 100%
            :align: center
            :alt: Individual chunks are exploding to 25 GiB, likely causing out of memory errors.

    .. grid-item:: **Now**

        Dask now automatically splits individual chunks so that the resulting chunks stay
        roughly the same size as the input chunks, within a small tolerance.

        .. image:: images/changelog/gufunc_chunksizes_constant.png
            :width: 100%
            :align: center
            :alt: Individual chunks are now roughly the same size

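
The factor-based adjustment can be illustrated with plain chunk-size arithmetic. The following is a simplified sketch, not the code Dask uses: ``rebalance_chunks`` is a hypothetical helper, it handles a single core axis, and it re-splits the non-core axes into regular blocks without the small tolerance mentioned above. Chunks are given as Dask-style tuples of block sizes per axis.

.. code-block:: python

    import math


    def rebalance_chunks(chunks, core_axis):
        """Hypothetical sketch, not Dask's implementation.

        The core axis is merged into a single chunk; every other axis is
        re-split into regular blocks shrunk by the factor by which the core
        chunk grows, so the largest block keeps roughly the same number of
        elements.
        """
        core_total = sum(chunks[core_axis])
        # Growth factor of the core dimension: new chunk size / old largest chunk.
        factor = core_total / max(chunks[core_axis])

        new_chunks = []
        for axis, sizes in enumerate(chunks):
            total = sum(sizes)
            if axis == core_axis:
                new_chunks.append((total,))
                continue
            # Shrink the largest chunk by ``factor`` (never below one element),
            # then split the axis into blocks of that size plus a remainder.
            target = max(1, math.floor(max(sizes) / factor))
            full, rem = divmod(total, target)
            new_chunks.append((target,) * full + ((rem,) if rem else ()))
        return tuple(new_chunks)


    # Core axis 1 grows from blocks of 50 to a single block of 200 (factor 4),
    # so axis 0 shrinks from blocks of 100 to sixteen blocks of 25.
    print(rebalance_chunks(((100, 100, 100, 100), (50, 50, 50, 50)), core_axis=1))

In this example the largest block holds 100 × 50 = 5000 elements before and 25 × 200 = 5000 elements after, which is the invariant the adjustment aims for.
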

.. dropdown:: Additional changes

    - Fix dataset info cache assignment (:pr:`11840`) `Florian Jetter`_
    - ``Expr`` setattr (:pr:`11836`) `Florian Jetter`_
    - Follow up to expression tokenization caching (:pr:`11837`) `Florian Jetter`_
    - Consolidate getattr for expr classes (:pr:`11835`) `Florian Jetter`_
    - Reduce pickle size of ``ReadParquet`` expression (:pr:`11797`) `Florian Jetter`_
    - ``arange`` loses precision on ``~2**63`` (:pr:`11801`) `Guido Imperiale`_
    - Remove numbagg from upstream build (:pr:`11821`) `Patrick Hoefler`_
    - Dispatch to numbagg for ``nanmedian`` and ``nanquantile`` (:pr:`11817`) `Patrick Hoefler`_
    - Make missing meta warning more ergonomic (:pr:`11814`) `Patrick Hoefler`_
    - Remove name doc from ``from_pandas`` (:pr:`11812`) `Patrick Hoefler`_
    - Implement an Array Scalar (:pr:`11810`) `Patrick Hoefler`_
    - Added ``to_orc`` to DataFrame API (:pr:`11807`) `Tom Augspurger`_
    - Implement reverse indexing for DataFrames (:pr:`11803`) `Patrick Hoefler`_
    - Add lazy ``to_pandas_dispatch`` registration for cudf (:pr:`11799`) `Richard (Rick) Zamora`_
    - Fix missing imports in array-expr (:pr:`11796`) `Florian Jetter`_
    - Cache tokens on expressions and restore after pickle roundtrip (:pr:`11791`) `Florian Jetter`_
    - Use random dashboard ports for ``LocalCluster`` in distributed tests (:pr:`11795`) `Florian Jetter`_
    - Implement slicing for array-expr (:pr:`11783`) `Patrick Hoefler`_
    - Never use an asynchronous Client when calling top level compute function (:pr:`11790`) `Florian Jetter`_
    - Refactor import tests (:pr:`11794`) `Florian Jetter`_
    - Migrate ``base.unpack_collections`` to ``Task`` class (:pr:`11793`) `Florian Jetter`_
    - Ensure ``map_blocks`` generates unique tokens (:pr:`11792`) `Florian Jetter`_
    - Speed up ``normalize_pickle`` by 50 percent (:pr:`11788`) `Florian Jetter`_
    - Fix divisions calculation with duplicates (:pr:`11787`) `Patrick Hoefler`_
    - Fix assign align for duplicated divisions (:pr:`11786`) `Patrick Hoefler`_
    - Ensure concat optimize project does not raise (:pr:`11784`) `Florian Jetter`_
    - Add array-expr ``from_array`` (:pr:`11772`) `Patrick Hoefler`_
    - Keep chunksizes consistent in ``apply_gufunc`` (:pr:`11683`) `Patrick Hoefler`_
    - Test ``dask.dataframe.__all__`` (:pr:`11782`) `Philipp A.`_
    - Add ``__all__`` to ``dask.bag`` (:pr:`11781`) `Philipp A.`_
    - Add test for ``dask.array.__all__`` (:pr:`11780`) `Philipp A.`_
    - Bump ``JamesIves/github-pages-deploy-action`` from 4.7.2 to 4.7.3 (:pr:`11777`)
    - Export ``dask.array`` members (:pr:`11779`) `Philipp A.`_
    - Fix ``sorted_divisions_locations`` with duplicates (:pr:`11773`) `Tom Augspurger`_
    - Fix small typo in ``best-practices.rst`` (:pr:`11775`) `Sergey Kolesnikov`_
    - Allow unknown chunks in blockwise ``adjust_chunks`` (:pr:`11769`) `Lindsey Gray`_
    - Fix crash in ``asarray(..., like=...)`` vs. ``scipy.sparse`` objects (:pr:`11755`) `Guido Imperiale`_
    - Remove flaky optional dependency (:pr:`11771`) `Tom Augspurger`_
    - Add support for scipy ``sparray`` (:pr:`11750`) `Philipp A.`_
    - Added flaky to tests extra (:pr:`11770`) `Tom Augspurger`_
    - Ensure divisions are plain scalars (:pr:`11767`) `Tom Augspurger`_
    - Remove divisions code duplication (:pr:`11764`) `Florian Jetter`_
    - Ensure divisions not diverging from npartitions in ``Merge`` (:pr:`11762`) `Florian Jetter`_
    - Skip ``test_visualize_int_overflow`` on windows (:pr:`11761`) `Florian Jetter`_
    - Reduce pickle size for tasks (:pr:`11687`) `Florian Jetter`_
    - Implement ``unify_chunks`` and ``Rechunk`` (:pr:`11692`) `Patrick Hoefler`_
    - Fix expression getitem to avoid alignment (:pr:`11760`) `Patrick Hoefler`_
    - ``arange(..., like=x)`` embeds the graph of ``x`` (:pr:`11754`) `Guido Imperiale`_
    - Simplify ``assert_divisions`` (:pr:`11745`) `Florian Jetter`_
    - Fix Projection logic for Series objects (:pr:`11747`) `Patrick Hoefler`_
    - Remove bytes as keys (:pr:`11757`) `Florian Jetter`_
    - Ensure ``map_partitions`` returns Series object if function returns scalar (:pr:`11756`) `Florian Jetter`_
    - Don't upload env twice (:pr:`11748`) `Patrick Hoefler`_
    - Fix badges in readme (:pr-distributed:`9029`) `Florian Jetter`_
    - Properly forward cancellation reason (:pr-distributed:`9028`) `Florian Jetter`_
    - Fix bokeh circle (:pr-distributed:`9026`) `Florian Jetter`_
    - Ensure ``FileInfo`` can be serialized (:pr-distributed:`9025`) `Florian Jetter`_
    - Add ipykernel to skipped modules in code sampling (:pr-distributed:`9022`) `Matthew Rocklin`_
    - ``SpecCluster``: add option to not shut down the scheduler when the cluster is closed (:pr-distributed:`9021`) `Taylor Braun-Jones`_
    - Fix CI by using ``client.persist(collection)`` instead of ``collection.persist()`` (:pr-distributed:`9020`) `Hendrik Makait`_
    - Add redirect from prefix root to status (:pr-distributed:`9015`) `Isaac`_
    - Bump ``JamesIves/github-pages-deploy-action`` from 4.7.2 to 4.7.3 (:pr-distributed:`9018`)
    - Remove bytes keys from tests (:pr-distributed:`9017`) `Jacob Tomlinson`_

.. _v2025.2.0:

Highlights
^^^^^^^^^^

This release includes a critical fix for a deadlock that can arise when seceded tasks are rescheduled,
or cancelled and resubmitted, e.g. because a worker was lost.
See :pr-distributed:`8991` by `Hendrik Makait`_ for more details.

.. dropdown:: Additional changes

    - Add big array example (:pr:`11744`) `James Bourbeau`_
    - Fix exploding chunksizes in ``pad`` for constant padding (:pr:`11743`) `Patrick Hoefler`_
    - Move ``optimize`` method to base class (:pr:`11742`) `Florian Jetter`_
    - Add changelog entry for fixed deadlock (:pr:`11741`) `Hendrik Makait`_
    - Fix graph creation in dask-expr ``to_delayed`` (:pr:`11739`) `Patrick Hoefler`_
    - Remove culling from delayed optimisation (:pr:`11737`) `Patrick Hoefler`_
    - Compute meta for ``from_map`` on the cluster (:pr:`11738`) `Patrick Hoefler`_
    - Bugs in ``__setitem__`` with dask bool mask (:pr:`11728`) `Guido Imperiale`_
    - Implement infrastructure, random, blockwise and Elemwise (:pr:`11689`) `Patrick Hoefler`_
    - ``array`` / ``asarray`` with both ``like=`` and ``dtype=`` (:pr:`11733`) `Guido Imperiale`_
    - Fix annotations warnings test (:pr:`11734`) `Patrick Hoefler`_
    - Catch warnings when writing to remote storage with ``to_parquet`` (:pr:`11731`) `Patrick Hoefler`_
    - Remove ``LocalCluster`` from tests (:pr:`11729`) `Patrick Hoefler`_
    - Fix partition pruning when using ``from_array`` (:pr:`11725`) `Patrick Hoefler`_
    - Fix concatenation with mixed dtype columns (:pr:`11727`) `Patrick Hoefler`_
    - ``arange``: fix extreme values (:pr:`11707`) `Guido Imperiale`_
    - Graph corruption on scalar getitem -> setitem (:pr:`11723`) `Guido Imperiale`_
    - Never share buffers after ``compute()`` (:pr:`11697`) `Guido Imperiale`_
    - Extract Dask Array from xarray DataArray in ``from_array`` (:pr:`11712`) `Patrick Hoefler`_
    - ``arange``: support kwargs (:pr:`11710`) `Guido Imperiale`_
    - Ensure ``normalize_token`` is threadsafe (:pr:`11709`) `Florian Jetter`_
    - Expand advice for instance types and processes (:pr:`11705`) `Florian Jetter`_
    - Drop legacy timeseries implementation (:pr:`11704`) `Florian Jetter`_
    - Update Dask Cloud Provider documentation to include Nebius as a supported cloud option (:pr:`11703`) `Alexander`_
    - Fix ``normalize_chunks`` when squashing into a single chunk (:pr:`11702`) `Patrick Hoefler`_
    - Fix positional indexing with ``newaxis`` (:pr:`11699`) `Patrick Hoefler`_
    - Set array backend in scipy-sparse-indexing (:pr:`11700`) `Tom Augspurger`_
    - Fix ``value_counts`` shuffling strategy (:pr:`11698`) `Patrick Hoefler`_
    - Disentangle core expression class from dataframe specific code (:pr:`11688`) `Patrick Hoefler`_
    - Bump ``conda-incubator/setup-miniconda`` from 3.1.0 to 3.1.1 (:pr:`11685`)
    - Fixup dataframe conversion from array methods (:pr:`11684`) `Patrick Hoefler`_
    - Remove remaining artifacts of fastparquet (:pr:`11682`) `Patrick Hoefler`_
    - Remove traceback from ``sizeof`` failure warning (:pr-distributed:`9006`) `Jacob Tomlinson`_
    - Hotfix: Ignore negative occupancy (:pr-distributed:`9012`) `Hendrik Makait`_
    - Remove expensive tokenization for key uniqueness check (:pr-distributed:`9009`) `Patrick Hoefler`_
    - Fix CI for changes in ``from_map`` (:pr-distributed:`9011`) `Patrick Hoefler`_
    - Avoid handling stale long-running messages on scheduler (:pr-distributed:`8991`) `Hendrik Makait`_
    - Bump ``test_stress`` timeout (:pr-distributed:`9002`) `Tom Augspurger`_
    - Poll in ``test_rmm_metrics`` test (:pr-distributed:`9004`) `Tom Augspurger`_
    - Cache occupancy in ``WorkStealing.balance()`` (:pr-distributed:`9005`) `Hendrik Makait`_
    - Homogeneous balancing by accounting for in-flight requests (:pr-distributed:`9003`) `Hendrik Makait`_
    - Consistent estimation of task duration between stealing, adaptive and occupancy calculation (:pr-distributed:`9000`) `Hendrik Makait`_
    - Increase default work-stealing interval by 10x (:pr-distributed:`8997`) `Hendrik Makait`_
    - Remove occupancy plot from status dashboard (:pr-distributed:`8995`) `Hendrik Makait`_
    - Bump ``conda-incubator/setup-miniconda`` from 3.1.0 to 3.1.1 (:pr-distributed:`8990`)

.. _v2025.1.0:
Highlights
^^^^^^^^^^
Legacy Dask DataFrame Implementation removed
""""""""""""""""""""""""""""""""""""""""""""
This release drops the legacy Dask DataFrame implementation. The API with query planning is now the only available Dask DataFrame implementation.
This enforces the deprecation of the configuration:
.. code-block:: python

   dask.config.set({"dataframe.query-planning": False})

Dask-Expr has been merged into the ``dask`` package and the dask/dask repository, so it no longer needs to be installed separately.
Reducing Memory Pressure for Xarray Workloads
"""""""""""""""""""""""""""""""""""""""""""""
In 2022, Dask introduced a mechanism called `root task queuing <https://distributed.dask.org/en/stable/scheduling-policies.html#queuing>`_.
This mechanism allows Dask to detect tasks that read data from storage
and to schedule them defensively, so that overproduction of these tasks does not cause
memory pressure on the cluster. The underlying mechanism was fragile and failed for specific types of
computations, such as opening multiple Zarr stores or loading a large number of NetCDF files.
The recent changes in Dask's task graph representation allow for more robust detection of root tasks. Detection is now independent of the running workload, which is especially beneficial for Xarray workloads.
This results in significantly better memory stability and a reduced memory footprint for workloads where root task detection previously failed, and it makes the expected memory profile deterministic and independent of the topology of the task graph.
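
Queuing itself is configured on the scheduler through the ``distributed.scheduler.worker-saturation`` setting described on the queuing page linked above. The minimal sketch below only shows how that setting can be overridden with ``dask.config.set``; it assumes the value is read when the scheduler starts, so it has to be set before a cluster is created.

.. code-block:: python

   import dask

   # Assumption: the scheduler reads this value at startup, so set it before
   # creating a cluster. A finite value (the default is 1.1) enables root task
   # queuing; "inf" disables it.
   with dask.config.set({"distributed.scheduler.worker-saturation": "inf"}):
       saturation = dask.config.get("distributed.scheduler.worker-saturation")
       # A cluster created inside this block would run without queuing.

   print(saturation)  # inf
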
.. _v2024.12.1:
Highlights
^^^^^^^^^^
Improved scheduler responsiveness for large task graphs
"""""""""""""""""""""""""""""""""""""""""""""""""""""""
This release reduces the number of Python object references the Dask scheduler uses to track tasks. This makes the scheduler more responsive by reducing the time it spends on garbage collection.
See :issue:8958, :pr:11608, :pr:11600, :pr:11598,
:pr:11597, and :pr-distributed:8963 from Hendrik Makait_ for more details.
.. dropdown:: Additional changes
Fix map_overlap bug where rechunking and trim=False caused inconsistent chunkings (:pr:11605) Patrick Hoefler_
Avoid legacy implementation in read-csv (:pr:11603) Patrick Hoefler_
Remove legacy DataFrame import (:pr:11604) Patrick Hoefler_
asarray ignores dtype for array inputs (:pr:11586) crusaderky_
Add back LLM chatbot to Dask docs (:pr:11594) dchudz_
Bump JamesIves/github-pages-deploy-action from 4.6.9 to 4.7.2 (:pr:11593)
Migrate dask array creation routines to task spec (:pr:11582) James Bourbeau_
Migrate most of dask array random to task spec (:pr:11581) James Bourbeau_
Do not use local function in array.push (:pr:11576) Florian Jetter_
Bump conda-incubator/setup-miniconda from 3.0.3 to 3.1.0 (:pr-distributed:8922)
Pick random dashboard port in tests (:pr-distributed:8965) Hendrik Makait_
Fix formatting for NoValidWorkerException message (:pr-distributed:8967) Hendrik Makait_
Support pynvml>=11.5 in WSL (:pr-distributed:8962) Richard (Rick) Zamora_
Bump JamesIves/github-pages-deploy-action from 4.6.9 to 4.7.2 (:pr-distributed:8960)
.. _v2024.12.0:
Highlights
^^^^^^^^^^
Python 3.13 Support
"""""""""""""""""""
This release adds support for Python 3.13. Dask now supports Python 3.10-3.13.
See :pr:11456 and :pr-distributed:8904 from Patrick Hoefler_ and James Bourbeau_ for more details.
.. dropdown:: Additional changes
Revert "Add LLM chatbot to Dask docs (:pr:11556)" (:pr:11577) dchudz_
Automatically rechunk if array in to_zarr has irregular chunks (:pr:11553) Patrick Hoefler_
Blockwise uses Task class (:pr:11568) Florian Jetter_
Migrate rechunk and reshape to task spec (:pr:11555) Patrick Hoefler_
Cache svg-representation for arrays (:pr:11560) Deepak Cherian_
Fix empty input for containers (:pr:11571) Florian Jetter_
Convert Bag graphs to TaskSpec graphs during optimization (:pr:11569) Florian Jetter_
Add LLM chatbot to Dask docs (:pr:11556) dchudz_
Fuse data nodes in linear fusion too (:pr:11549) Patrick Hoefler_
Migrate slicing code to task spec (:pr:11548) Patrick Hoefler_
Speed up ArraySliceDep tokenization (:pr:11551) Patrick Hoefler_
Fix fusing of p2p barrier tasks (:pr:11543) Patrick Hoefler_
Remove infra/mentions of GPU CI (:pr:11546) Charles Blackmon-Luca_
Temporarily disable gpuCI update CI job (:pr:11545) James Bourbeau_
Use BlockwiseDep to implement map_blocks keywords (:pr:11542) Patrick Hoefler_
Remove optimize_slices (:pr:11538) Patrick Hoefler_
Make reshape_blockwise a noop if shape is the same (:pr:11541) Patrick Hoefler_
Remove read-only flag from open_array in open_zarr (:pr:11539) Patrick Hoefler_
Implement linear_fusion for task spec class (:pr:11525) Patrick Hoefler_
Remove recursion from TaskSpec (:pr:11477) Florian Jetter_
Fixup test after dask-expr change (:pr:11536) Patrick Hoefler_
Bump codecov/codecov-action from 3 to 5 (:pr:11532)
Create dask-expr frame directly without roundtripping (:pr:11529) Patrick Hoefler_
Add scikit-image nightly back to upstream CI (:pr:11530) James Bourbeau_
Remove from_dask_dataframe import (:pr:11528) Patrick Hoefler_
Ensure that from_array creates a copy (:pr:11524) Patrick Hoefler_
Simplify and improve performance of normalize chunks (:pr:11521) Patrick Hoefler_
Fix flaky nanquantile test (:pr:11518) Patrick Hoefler_
Fix tests for new read_only kwarg in zarr=3 (:pr:11516) Patrick Hoefler_
Fix test_jupyter.py::test_shutsdown_cleanly (:pr-distributed:8954) Hendrik Makait_
Install tornado from conda-forge in Python 3.13 CI (:pr-distributed:8951) James Bourbeau_
Restore retire workers API (:pr-distributed:8939) Florian Jetter_
Properly convert finalize dependencies to references (:pr-distributed:8949) Hendrik Makait_
Block fusion for barrier tasks (:pr-distributed:8944) Patrick Hoefler_
Remove infra/mentions of GPUCI (:pr-distributed:8946) Charles Blackmon-Luca_
Temporarily disable gpuCI update CI job (:pr-distributed:8945) James Bourbeau_
Remove recursion in task spec (:pr-distributed:8920) Florian Jetter_
Less verbose log messages for remove and register worker (:pr-distributed:8938) Florian Jetter_
Do not log full worker info in retire_workers (:pr-distributed:8935) Florian Jetter_
.. _v2024.11.2:
.. note:: Versions 2024.11.0 and 2024.11.1 included a critical performance regression and should be skipped by every user.
Highlights
^^^^^^^^^^
Legacy Dask DataFrame Deprecated
""""""""""""""""""""""""""""""""
This release deprecates the legacy Dask DataFrame implementation. The old implementation will be removed completely in a future release. Users are encouraged to switch to the new implementation now and to report any issues they encounter.
Users are also encouraged to check that they only import functions from ``dask.dataframe``
and not from any of its submodules.
New quantile methods for Dask Array API
"""""""""""""""""""""""""""""""""""""""
Dask Array added new ``quantile`` and ``nanquantile`` methods.
Previously, Dask dispatched to the NumPy implementation, which held the GIL
for long stretches. This caused large slowdowns on workers with more than one thread and could lead
to runtimes of over 200s per chunk.
The new quantile implementation avoids many of these problems and reduces the runtime
to around 1s per chunk, independent of the number of threads.
Consistent chunksize in Xarray rolling-construct
""""""""""""""""""""""""""""""""""""""""""""""""
Using Xarray's ``rolling(...).construct(...)`` with Dask Arrays led to very large
chunk sizes that rarely fit into memory on a single worker.
The underlying operation returns a view of the smaller NumPy array, but any operation that triggers a copy of the data leads to very large memory usage.
.. code-block:: python

   import xarray as xr
   import dask.array as da

   arr = xr.DataArray(
       da.ones((93504, 721, 1440), chunks=("auto", -1, -1)),
       dims=["time", "lat", "longitude"],
   )  # Initial chunks are ~128 MiB
   arr.rolling(time=30).construct("window_dim")

.. grid:: 2
.. grid-item:: **Previously**
Individual chunks explode to 10 GiB, likely causing out-of-memory errors.
.. image:: images/changelog/rolling-construct-exploding-chunks.png
:width: 100%
:align: center
:alt: Individual chunks are exploding to 10 GiB, likely causing out of memory errors.
.. grid-item:: **Now**
Dask now automatically splits individual chunks into smaller chunks that keep the
original chunk size, within a small tolerance.
.. image:: images/changelog/rolling-construct-constant-chunks.png
:width: 100%
:align: center
:alt: Individual chunks are now roughly the same size
Improved efficiency of map overlap
""""""""""""""""""""""""""""""""""
``map_overlap`` now creates smaller and more efficient task graphs.
The previous version injected many unnecessary tasks, increasing the number of tasks by a factor of 2-10x over what was actually necessary. This put a lot of stress on the scheduler.
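
As a reminder of what ``map_overlap`` computes: it pads every block with ``depth`` elements from its neighbours (or with the ``boundary`` value at the outer edges of the array), applies the function, and trims the padding off again. The sketch below uses a hypothetical ``moving_sum`` function, an arbitrary chunk size of 5 and a constant zero boundary, and checks the result against NumPy.

.. code-block:: python

   import numpy as np
   import dask.array as da

   x_np = np.arange(20, dtype=float)
   x = da.from_array(x_np, chunks=5)


   def moving_sum(block):
       """Hypothetical 3-point moving sum; needs one neighbour on each side."""
       out = block.copy()
       out[1:-1] = block[:-2] + block[1:-1] + block[2:]
       # out[0] and out[-1] are incomplete, but they fall into the padding
       # that map_overlap trims off after the function has run.
       return out


   # depth=1 shares one element with each neighbouring block; boundary=0 pads
   # the outer edges of the whole array with zeros.
   result = x.map_overlap(moving_sum, depth=1, boundary=0)

   expected = np.convolve(x_np, np.ones(3), mode="same")
   print(np.allclose(result.compute(), expected))  # True
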
Consistent chunksizes for Einstein summation
""""""""""""""""""""""""""""""""""""""""""""
Einstein summation historically led to very large chunk sizes when applied to more than one Dask Array. This behavior was inherited from NumPy but led to out-of-memory errors on workers:
.. code-block:: python

   import dask.array as da

   arr = da.random.random((1024, 64, 64, 64, 64), chunks=(256, 16, 16, 16, 16))  # Initial chunks are 128 MiB
   result = da.einsum("aijkl,amnop->ijklmnop", arr, arr)

.. grid:: 2
.. grid-item:: **Previously**
Individual chunks explode to 32 GiB, very likely causing out-of-memory errors.
.. image:: images/changelog/einstein-exploding-chunks.png
:width: 100%
:align: center
:alt: Individual chunks are exploding to 32 GiB, very likely causing out of memory errors
.. grid-item:: **Now**
The operation keeps individual chunksizes the same.
.. image:: images/changelog/einstein-constant-chunks.png
:width: 100%
:align: center
:alt: Individual chunks are now roughly the same size
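
The chunk sizes quoted in this example follow from simple arithmetic. The sketch below assumes ``float64`` elements (8 bytes, which is what ``da.random.random`` returns) and a simplified model of the previous behavior in which every output dimension of ``"aijkl,amnop->ijklmnop"`` keeps the input chunk length of 16.

.. code-block:: python

   from math import prod

   BYTES_PER_ELEMENT = 8  # float64
   MiB = 2**20
   GiB = 2**30

   # Input chunks from the example above.
   input_chunk = (256, 16, 16, 16, 16)
   input_chunk_bytes = prod(input_chunk) * BYTES_PER_ELEMENT
   print(input_chunk_bytes / MiB)  # 128.0

   # "a" is summed away; i, j, k, l come from the first operand and m, n, o, p
   # from the second, so (in this model) an output chunk spans 16 elements in
   # each of eight dimensions.
   output_chunk = (16,) * 8
   output_chunk_bytes = prod(output_chunk) * BYTES_PER_ELEMENT
   print(output_chunk_bytes / GiB)  # 32.0

Because the output chunk combines the non-contracted chunk dimensions of both operands, its size grows much faster than either input chunk.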
.. dropdown:: Additional changes
Add changelog for Dask release (:pr:11502) Patrick Hoefler_
Minor updates to optional dependencies table (:pr:11503) James Bourbeau_
Add push for ffill like operations (:pr:11501) Patrick Hoefler_
Remove func packing for TaskSpec (:pr:11496) Florian Jetter_
Make tokenization for vindex more efficient (:pr:11493) Patrick Hoefler_
Cut down runtime of einstein summation test (:pr:11499) Patrick Hoefler_
Improve test runtime for test_rot90 (:pr:11498) Florian Jetter_
Disable low level optimization for TaskSpec in Bags (:pr:11495) Florian Jetter_
Add automatic rechunking to sliding-window-view (:pr:11479) Patrick Hoefler_
Add load_stored kwarg to dask.array.store (:pr:11465) Deepak Cherian_
Fix quantile error in two dimensions (:pr:11489) Patrick Hoefler_
Bump conda-incubator/setup-miniconda from 3.0.4 to 3.1.0 (:pr:11490)
Update map_blocks docstring (:pr:11491) Patrick Hoefler_
Fix einsum with empty arrays (:pr:11488) Patrick Hoefler_
Implement non gil-blocking quantile method (:pr:11473) Patrick Hoefler_
Use internal keyword for trimming in map_overlap to reduce graph size (:pr:11486) Patrick Hoefler_
Minor dask order refactor (:pr:11467) Florian Jetter_
Remove empty tasks from map_overlap (:pr:11483) Patrick Hoefler_
Fixup auto chunks calculation if single chunk goes below 1 (:pr:11485) Patrick Hoefler_
Fix CI after pandas upstream changes (:pr:11482) Patrick Hoefler_
Make sure that block_id and block_info don't create extra tasks (:pr:11484) Patrick Hoefler_
Use repeat to build nearest boundary (:pr:9666) Jean-Baptiste Bayle_
Remove dead code from make_blockwise (:pr:11478) Florian Jetter_
Patch auto-chunks calculation for rioxarray (:pr:11480) Patrick Hoefler_
Skip legacy test because of flaky warning (:pr:11475) Patrick Hoefler_
Unskip a few dask-expr tests (:pr:11474) Patrick Hoefler_
Keep chunk sizes consistent in einsum (:pr:11464) Patrick Hoefler_
Improve how normalize_chunks squashes together chunks when "auto" is set (:pr:11468) Patrick Hoefler_
Fix resolve_aliases when multiple aliases are in graph (:pr:11469) Patrick Hoefler_
Avoid cyclic import in dask.array (:pr:11472) Hendrik Makait_
Unskip dataframe test (:pr:11471) Patrick Hoefler_
Improve dask.order performance for large graphs (:pr:11466) Florian Jetter_
Ensure that slice(None) just maps the keys (:pr:11450) Patrick Hoefler_
Fix Task.__repr__() of unpickled object (:pr:11463) Peter Andreas Entschev_
Use TaskSpec in local dask execution (:pr:11378) Florian Jetter_
Adjust accuracy in test_solve_triangular_vector (:pr:11461) Florian Jetter_
Update Aggregation docstring (:pr:11459) Guillaume Eynard-Bontemps_
Implement fuse option for delayed objects (:pr:11441) Patrick Hoefler_
Deprecate legacy dask dataframe implementation (:pr:11437) Patrick Hoefler_
Fix na casting behavior for groupby.agg with arrow dtypes (:pr:11118) Patrick Hoefler_
Fix behavior of keys_in_tasks for TaskSpec nodes (:pr:11445) Florian Jetter_
Convert dtype to int instead of np.uint8 for visualizing large task graphs (:pr:11440) Patrick Hoefler_
Ensure dependencies are not mutated (:pr:11438) Florian Jetter_
Full support for task spec in dask.order (:pr:11347) Florian Jetter_
Remove redundant methods in P2PBarrierTask (:pr-distributed:8924) Florian Jetter_
Fix skipif condition for test_tell_workers_when_peers_have_left (:pr-distributed:8929) Florian Jetter_
Ensure ConnectionPool is closed even if network stack swallows CancelledErrors (:pr-distributed:8928) Florian Jetter_
Fix flaky test_server_comms_mark_active_handlers (:pr-distributed:8927) Florian Jetter_
Make assumption in P2P's barrier mechanism explicit (:pr-distributed:8926) Hendrik Makait_
Adjust timeouts in Jupyter cli test (:pr-distributed:8925) Florian Jetter_
Add stimulus_id to update_graph plugin hook (:pr-distributed:8923) Hendrik Makait_
Reduce P2P transfer task overhead (:pr-distributed:8912) Hendrik Makait_
Disable profiler on Python 3.11 (:pr-distributed:8916) Florian Jetter_
Fix test_restarting_does_not_deadlock (:pr-distributed:8849) Florian Jetter_
Adjust popen timeouts for testing (:pr-distributed:8848) Florian Jetter_
Add retry to shuffle broadcast (:pr-distributed:8900) Florian Jetter_
Fix test_shuffle_with_array_conversion (:pr-distributed:8909) Florian Jetter_
Refactor some tests (:pr-distributed:8908) Florian Jetter_
Graduate dask-expr from contrib to core project (:pr-distributed:8911) Hendrik Makait_
Skip test_tell_workers_when_peers_have_left on py10 (:pr-distributed:8910) Florian Jetter_
Internal cleanup of P2P code (:pr-distributed:8907) Hendrik Makait_
Use Task class instead of tuple (:pr-distributed:8797) Florian Jetter_
Increase connect timeout for test_tell_workers_when_peers_have_left (:pr-distributed:8906) Florian Jetter_
Remove dispatching in TaskCollection (:pr-distributed:8903) Florian Jetter_
Deduplicate requests to scheduler in P2P (:pr-distributed:8899) Hendrik Makait_
Add configurations for rootish taskgroup threshold (:pr-distributed:8898) Patrick Hoefler_
.. _v2024.10.0:
Notable Changes
^^^^^^^^^^^^^^^
11388)
11423)
11419)
.. dropdown:: Additional changes
broadcast_shapes() returns integers, not NumPy scalars. (:pr:11434) Martin Yeo_
11430) Ilan Gold_
11431) Florian Jetter_
8469) Hendrik Makait_
8897) Jacob Tomlinson_
8893) Jacob Tomlinson_
8891) Patrick Hoefler_
8886) Hendrik Makait_
1150) Patrick Hoefler_
1149) Patrick Hoefler_
1145) Patrick Hoefler_
analyze and explain (:pr-expr:1146) Hendrik Makait_
1142) Patrick Hoefler_
1141) Patrick Hoefler_
.. _v2024.9.1:
Highlights
^^^^^^^^^^
Improved adaptive scaling resilience
""""""""""""""""""""""""""""""""""""
Adaptive scaling clusters now recover from spurious errors during scaling.
See :pr-distributed:8871 by Hendrik Makait_ for more details.
.. dropdown:: Additional changes
Improve error message for incorrect columns order in meta information (:pr:11393) Dmitry Balabka_
Update gpuCI RAPIDS_VER to 24.12 (:pr:11407)
Bump jacobtomlinson/gha-anaconda-package-version from 0.1.3 to 0.1.4 (:pr:11405)
Switch to using zarr.open_array instead of using the zarr.Array constructor (:pr:11387) Joe Hamman_
Update gpuCI RAPIDS_VER to 24.12 (:pr-distributed:8879)
Don't consider scheduler idle while executing Scheduler.update_graph (:pr-distributed:8877) Hendrik Makait_
Bump jacobtomlinson/gha-anaconda-package-version from 0.1.3 to 0.1.4 (:pr-distributed:8878)
Support P2P rechunking datetime arrays (:pr-distributed:8875) James Bourbeau_
.. _v2024.9.0:
Highlights
^^^^^^^^^^
Bump Bokeh minimum version to 3.1.0
"""""""""""""""""""""""""""""""""""
bokeh>=3.1.0 is now required for diagnostics and the distributed cluster dashboard.
See :pr:11375 and :pr-distributed:8861 by James Bourbeau_ for more details.
Introduce new Task class
""""""""""""""""""""""""
This release adds a ``Task`` class that replaces tuples for task specification.
See :pr:11248 by Florian Jetter_ for more details.
.. dropdown:: Additional changes
Bump peter-evans/create-pull-request from 6 to 7 (:pr:11380)
Reduce overhead in tokenize (:pr:11373) Florian Jetter_
Move tokenize to dedicated submodule (:pr:11371) Florian Jetter_
Ensure process_runnables is not too eager in the presence of multiple splits (:pr:11367) Florian Jetter_
Use np.min_scalar_type in shuffle (:pr:11369) James Bourbeau_
Write indexing arrays into dask graph to reduce size for multiple xarray variables (:pr:11362) Patrick Hoefler_
Cast indexer to minimal dtype in shuffle (:pr:11364) Patrick Hoefler_
Reduce memory usage of dask.order (:pr:11361) Florian Jetter_
Bump JamesIves/github-pages-deploy-action from 4.6.3 to 4.6.4 (:pr:11366)
precommit autoupdate (:pr:11360) Florian Jetter_
Homogeneously schedule P2P's unpack tasks (:pr-distributed:8873) Hendrik Makait_
Work/fix firewall for localhost (:pr-distributed:8868) Mario Linker_
Use new tokenize module (:pr-distributed:8858) James Bourbeau_
Point to user code with idempotent plugin warning (:pr-distributed:8856) James Bourbeau_
Fix test nanny timeout (:pr-distributed:8847) Florian Jetter_
Bump JamesIves/github-pages-deploy-action from 4.5.0 to 4.6.4 (:pr-distributed:8853)
Speed up Client.map by computing token only once for func and kwargs (:pr-distributed:8855) Florian Jetter_
Update pre-commit (:pr-distributed:8852) Florian Jetter_
.. _v2024.8.2:
Highlights
^^^^^^^^^^
Automatic selection of rechunking method
""""""""""""""""""""""""""""""""""""""""
To enable users to rechunk data at larger scales than before, Dask now automatically chooses an appropriate rechunking method when rechunking on a cluster. This requires no additional configuration and is enabled by default.
Specifically, Dask chooses between task-based and P2P rechunking. While task-based rechunking was the previous default, P2P rechunking is beneficial when rechunking requires almost all-to-all communication between the old and new chunks, e.g., when changing between spatial and temporal chunking. In these cases, P2P rechunking offers constant memory usage and creates smaller task graphs. As a result, it works for cases where task-based rechunking would have previously failed.
To disable automatic selection, users can select their preferred method via the configuration:
.. code-block:: python

   import dask.config

   # Choose either "tasks" or "p2p"
   dask.config.set({"array.rechunk.method": "tasks"})

or when calling ``rechunk``:

.. code-block:: python

   import dask.array as da

   arr = da.random.random(size=(1000, 1000, 365), chunks=(-1, -1, "auto"))
   # Choose either "tasks" or "p2p"
   arr = arr.rechunk(("auto", "auto", -1), method="tasks")

See :pr:11337 by Hendrik Makait_ for more details.
New shuffle API for Dask Arrays
"""""""""""""""""""""""""""""""
Dask Arrays now have a shuffle API. It shuffles the data along a single dimension
and ensures that every group of elements along this dimension ends up in exactly one
chunk. This is a very useful operation for GroupBy-Map patterns in Xarray.
See :py:func:`~dask.array.Array.shuffle` for more information and the API signature.
See :pr:11267, :pr:11311 and :pr:11326 by Patrick Hoefler_ for more details.
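
A minimal sketch of the API, assuming the ``Array.shuffle(indexer, axis)`` signature from the API reference, where ``indexer`` is a list of groups of positional indices, and assuming the output follows the order of the flattened indexer. The groups are arbitrary; the loop at the end only checks the guarantee stated above, that no group is split across chunks.

.. code-block:: python

   import numpy as np
   import dask.array as da

   x = da.arange(10, chunks=3)

   # Each inner list is one group of positions that must end up in a single chunk.
   groups = [[0, 5, 9], [1, 2], [3, 4, 6, 7, 8]]
   result = x.shuffle(groups, axis=0)

   values = result.compute()
   print(values)  # [0 5 9 1 2 3 4 6 7 8]

   # End offsets of every output chunk along axis 0.
   boundaries = np.cumsum(result.chunks[0])
   start = 0
   for group in groups:
       stop = start + len(group)
       # A chunk boundary strictly inside (start, stop) would split the group.
       assert not any(start < b < stop for b in boundaries)
       start = stop
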
New reshape_blockwise API for Dask Arrays
"""""""""""""""""""""""""""""""""""""""""
The new :py:func:`~dask.array.reshape_blockwise` function enables an embarrassingly parallel
reshaping operation for cases where the order of the underlying array doesn't matter.
It reshapes each block independently and doesn't trigger a rechunking operation
under the hood. This is useful when the order of the resulting Array is irrelevant,
e.g. if a reduction is applied to the array afterwards or if the reshaping is only temporary.
.. code-block:: python

   import dask.array as da

   arr = da.random.random(size=(100, 100, 48_000), chunks=(100, 100, 83))
   result = da.reshape_blockwise(arr, (10_000, 48_000))
   result.sum()

   # or: do something that preserves the shape of each chunk
   result = da.reshape_blockwise(result, (100, 100, 48_000), chunks=arr.chunks)

Dask will automatically calculate the resulting chunks if the number of dimensions is reduced, but you have to specify the resulting chunks if the number of dimensions is increased.
Reshaping a Dask Array often creates very complicated computations with rechunk
operations in between, because Dask respects the C ordering of the Array by default. This
ensures that the resulting Dask Array is returned in the same order as the
corresponding NumPy Array. However, this can lead to very inefficient computations.
``reshape_blockwise`` is a lot more efficient than the default implementation
if you don't care about the order.

.. warning::

   Blockwise reshape operations are more efficient than the default, but they
   return an Array that is ordered differently. Use with care!

See :pr:11328 by Patrick Hoefler_ for more details.
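
To see why the order differs, the NumPy-only sketch below emulates the idea on a 2 x 4 array split into two 2 x 2 blocks along the last axis: flattening each block independently and concatenating the pieces is not the same as a C-order ``reshape`` of the whole array, although order-insensitive reductions such as ``sum`` agree. This only illustrates the concept; it is not how Dask implements ``reshape_blockwise``.

.. code-block:: python

   import numpy as np

   x = np.arange(8).reshape(2, 4)
   # Emulate a Dask Array with chunks ((2,), (2, 2)): two blocks along axis 1.
   blocks = [x[:, 0:2], x[:, 2:4]]

   # Blockwise reshape: flatten every block on its own, then concatenate.
   blockwise = np.concatenate([block.reshape(-1) for block in blocks])
   # Default reshape: respects the C order of the full array.
   c_order = x.reshape(-1)

   print(blockwise)  # [0 1 4 5 2 3 6 7]
   print(c_order)    # [0 1 2 3 4 5 6 7]
   print(blockwise.sum() == c_order.sum())  # True
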
Multidimensional positional indexing keeping chunksizes consistent
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Indexing a Dask Array with :py:func:`~dask.array.vindex` previously created a single
output chunk along the dimensions that were indexed. ``vindex`` is commonly used in Xarray
when indexing multiple dimensions in a single step, e.g.:

.. code-block:: python

   import dask.array as da
   import xarray as xr

   arr = xr.DataArray(
       da.random.random((100, 100, 100), chunks=(5, 5, 50)),
       dims=["a", "b", "c"],
   )

Previously, this put the indexed dimensions into a single chunk:
.. image:: images/changelog/vindex-memory-increase.png
   :width: 75%
   :align: center
   :alt: Size of each individual chunk increases to over 1GB
Dask now uses an improved algorithm that ensures that the chunksizes are kept consistent:
.. image:: images/changelog/vindex-memory-constant.png
   :width: 75%
   :align: center
   :alt: Size of each individual chunk stays the same
See :pr:11330 by Patrick Hoefler_ for more details.
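
For comparison, the sketch below applies ``vindex`` directly to a Dask Array with the same shape and chunks as the Xarray example; the index values are arbitrary. Pointwise indexing along the first two dimensions collapses them into a single new dimension of length 4, followed by the untouched third dimension, and the values match NumPy's fancy indexing.

.. code-block:: python

   import numpy as np
   import dask.array as da

   x_np = np.random.default_rng(0).random((100, 100, 100))
   x = da.from_array(x_np, chunks=(5, 5, 50))

   # Pointwise indexing: selects x[3, 1], x[17, 1], x[42, 60] and x[98, 7].
   idx_a = np.array([3, 17, 42, 98])
   idx_b = np.array([1, 1, 60, 7])
   result = x.vindex[idx_a, idx_b]

   print(result.shape)   # (4, 100)
   print(result.chunks)  # inspect how Dask chunks the output
   print(np.array_equal(result.compute(), x_np[idx_a, idx_b]))  # True
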
.. dropdown:: Additional changes
Add changelog entries for shuffle, vindex and blockwise_reshape (:pr:11350) Patrick Hoefler_
Ensure persisted collections are released without GC (:pr:11348) Florian Jetter_
Update zoom link for dask meeting (:pr:11357) Sarah Charlotte Johnson_
Add more docstring examples for normalize_chunks (:pr:11271) Illviljan_
Choose automatically between tasks-based and p2p rechunking (:pr:11337) Hendrik Makait_
Implement blockwise reshaping API for arrays (:pr:11328) Patrick Hoefler_
Make rechunking in shuffle more intelligent to distribute unevenly if necessary (:pr:11326) Patrick Hoefler_
Increase visibility of GPU CI updates (:pr:11345) Charles Blackmon-Luca_
Update numpy and pyarrow versions in install docs (:pr:11340) James Bourbeau_
Fixup dask and distributed dependencies (:pr:11338) Patrick Hoefler_
Bump numpy>=1.24 and pyarrow>=14.0.1 minimum versions (:pr:11331) James Bourbeau_
Add crick back to Python 3.11+ CI builds (:pr:11335) James Bourbeau_
Preserve chunksizes in vindex (:pr:11330) Patrick Hoefler_
Fix dask.array.fft mismatch with Numpy's interface (add support for norm argument) (:pr:10665) joanrue_
Pass additional parameters to rechunk_p2p (:pr:11319) Hendrik Makait_
Fix docstring formatting for map_overlap (:pr:11332) Tao Xin_
Fix NumPy overflowing for prod on 2.0 (:pr:11327) Patrick Hoefler_
Ensure axes are positive / add tests for negative axes (:pr:10812) joanrue_
Fix map_overlap with new_axis (:pr:11128) David Stansby_
Avoid capturing code of xdist (:pr-distributed:8846) Florian Jetter_
Reduce memory footprint of culling P2P rechunking (:pr-distributed:8845) Hendrik Makait_
Add tests for choosing default rechunking method (:pr-distributed:8843) Hendrik Makait_
Increase visibility of GPU CI updates (:pr-distributed:8841) Charles Blackmon-Luca_
Bump test_pause_while_idle timeout (:pr-distributed:8844) Florian Jetter_
Concatenate small input chunks before P2P rechunking (:pr-distributed:8832) Hendrik Makait_
Remove dump cluster from gen_cluster (:pr-distributed:8823) Florian Jetter_
Bump numpy>=1.24 and pyarrow>=14.0.1 minimum versions (:pr-distributed:8837) James Bourbeau_
Fix PipInstall plugin on Worker (:pr-distributed:8839) Hendrik Makait_
Remove more Python 3.10 compatibility code (:pr-distributed:8824) James Bourbeau_
Use task-based rechunking to prechunk along partial boundaries (:pr-distributed:8831) Hendrik Makait_
Ensure client_desires_keys does not corrupt Scheduler state (:pr-distributed:8827) Florian Jetter_
Bump minimum cloudpickle to 3 (:pr-distributed:8836) James Bourbeau_
.. _v2024.8.1:
Highlights
^^^^^^^^^^
Improve output chunksizes for reshaping Dask Arrays
"""""""""""""""""""""""""""""""""""""""""""""""""""
Reshaping a Dask Array often squashed the reshaped dimensions into a single chunk. This caused very large output chunks and, as a result, many out-of-memory errors and performance issues.

.. code-block:: python

   import dask.array as da

   arr = da.ones(shape=(1000, 100, 48_000), chunks=(1000, 100, 83))
   arr.reshape(1000, 100, 4, 12_000)

Previously, this put the last dimension into a single chunk of size 12_000.

.. image:: images/changelog/reshape-memory-increase.png
   :width: 75%
   :align: center
   :alt: Size of each individual chunk increases to over 1GB

The new algorithm keeps the chunk size the same between input and output. This avoids large increases in chunk size as well as fragmentation of chunks.

.. image:: images/changelog/reshape-constant-memory.png
   :width: 75%
   :align: center
   :alt: Size of each individual chunk stays the same
Improve scheduling efficiency for Xarray Rechunk-GroupBy-Reduce patterns
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
The scheduler previously created an inefficient execution graph for Xarray GroupBy-Reduction patterns that use the cohorts strategy:

.. code-block:: python

   import xarray as xr
   from xarray.groupers import TimeResampler

   arr = xr.open_zarr(...)
   arr.chunk(time=TimeResampler("ME")).groupby("time.month").mean()

An issue in the algorithm that creates the execution order of the task graph
led to an inefficient execution strategy that accumulates a lot of unnecessary memory on
the cluster. The improvement is very similar to
:ref:`the previous ordering improvement in 2024.8.0 <label.xarray_groupby_ordering>`.
Drop support for Python 3.9
"""""""""""""""""""""""""""
This release drops support for Python 3.9 in accordance with NEP 29. Python 3.10 is now the required minimum version to run Dask.
See :pr:11245 and :pr-distributed:8793 by Patrick Hoefler_ for more details.
.. dropdown:: Additional changes
Ensure pickle does not change tokens (:pr:11320) Florian Jetter_
Add changelog entry for reshape and ordering improvements (:pr:11324) Patrick Hoefler_
Rename chunksize-tolerance option (:pr:11317) Patrick Hoefler_
Upgrade gpuCI and fix Dask Array failures with "cupy" backend (:pr:11309) Richard (Rick) Zamora_
Implement automatic rechunking for shuffle (:pr:11311) Patrick Hoefler_
Ensure we test against numpy 2 in CI (:pr:11182) James Bourbeau_
Revert "Test ordering on distributed scheduler (:pr:11310)" (:pr:11321) Florian Jetter_
Test ordering on distributed scheduler (:pr:11310) Florian Jetter_
Add tests to cover more cases of new reshape implementation (:pr:11313) Patrick Hoefler_
Order: Choose better target for branches with multiple leaf nodes (:pr:11303) Patrick Hoefler_
Order: Ensure runnable tasks are certainly runnable (:pr:11305) Florian Jetter_
Fix upstream numpy build (:pr:11304) Patrick Hoefler_
Make shuffle a no-op if possible (:pr:11291) Patrick Hoefler_
Keep chunksize consistent in reshape (:pr:11273) Patrick Hoefler_
Enable slicing with only one unknown chunk (:pr:11301) Patrick Hoefler_
Link to dask vs spark benchmarks on Dask docs (:pr:11289) Sarah Charlotte Johnson_
Fix slicing for masked arrays (:pr:11300) Patrick Hoefler_
Array: fix asarray for array input with dtype (:pr:11288) Lucas Colley_
Add numpy constants to array api (:pr:11287) Lucas Colley_
Ignore typing of return value (:pr:11286) Patrick Hoefler_
Remove automatic resizing in reshape (:pr:11269) Patrick Hoefler_
API: expose np dtypes in dask.array namespace (:pr:11178) Lucas Colley_
Reduce frequency of unmanaged memory use warning (:pr-distributed:8834) Patrick Hoefler_
Update gpuCI RAPIDS_VER to 24.10 (:pr-distributed:8786)
Avoid RuntimeError: dictionary changed size during iteration in Server._shift_counters() (:pr-distributed:8828) Hendrik Makait_
Improve concurrent close for scheduler (:pr-distributed:8829) Hendrik Makait_
MINOR: Extract truncation logic out of partial concatenation in P2P rechunking (:pr-distributed:8826) Hendrik Makait_
avoid excessive attribute access overhead for remove_from_task_prefix_count (:pr-distributed:8821) Florian Jetter_
Avoid key validation if validation is disabled (:pr-distributed:8822) Florian Jetter_
Log worker_client event (:pr-distributed:8819) James Bourbeau_
.. _v2024.8.0:
Highlights
^^^^^^^^^^
Improve efficiency and performance of slicing with positional indexers
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Slicing a Dask Array with a positional indexer is now faster. Random access patterns are more stable and produce easier-to-use results.

.. code-block:: python

   x[slice(None), [1, 1, 3, 6, 3, 4, 5]]

Using a positional indexer was previously prone to drastically increasing the number of output chunks and generating a very large task graph. This has been fixed with a more efficient algorithm.
The new algorithm keeps the chunk sizes along the indexed axis the same, to avoid fragmentation of chunks or a large increase in chunk size.
See :pr:11262 and :pr:11267 by Patrick Hoefler_ for more details and performance
benchmarks.
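
The snippet above leaves ``x`` undefined. A self-contained version, assuming a small 8 x 10 array with arbitrary chunks of (4, 5), shows that repeated and unsorted positions are allowed and that the result matches NumPy's.

.. code-block:: python

   import numpy as np
   import dask.array as da

   x_np = np.arange(80).reshape(8, 10)
   x = da.from_array(x_np, chunks=(4, 5))

   indexer = [1, 1, 3, 6, 3, 4, 5]  # positions along axis 1, with repeats
   result = x[slice(None), indexer]

   print(result.shape)   # (8, 7)
   print(result.chunks)  # chunking chosen by Dask along the indexed axis
   print(np.array_equal(result.compute(), x_np[:, indexer]))  # True
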
.. _label.xarray_groupby_ordering:
Improve scheduling efficiency for Xarray GroupBy-Reduce patterns
""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""
The scheduler previously created an inefficient execution graph for Xarray GroupBy-Reduction patterns like:

.. code-block:: python

   import xarray as xr

   arr = xr.open_zarr(...)
   arr.groupby("time.month").mean()

An issue in the algorithm that creates the execution order of the task graph led to an inefficient execution strategy that accumulates a lot of unnecessary memory on the cluster.

.. image:: images/changelog/dask-order-growing-memory.png
   :width: 75%
   :align: center
   :alt: Memory keeps accumulating on the cluster when running an embarrassingly parallel operation.

The operation itself is embarrassingly parallel. With the proper execution strategy, the scheduler can now execute the operation with constant memory, avoiding spilling and allowing us to scale to larger datasets.

.. image:: images/changelog/dask-order-constant-memory.png
   :width: 75%
   :align: center
   :alt: Same operation is running with constant memory usage for the whole computation and can scale for bigger datasets.
See :pr-distributed:8818 by Patrick Hoefler_ for more details and examples.
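
A self-contained variant of the pattern above, assuming ``xarray``, ``pandas`` and ``dask`` are installed: instead of ``xr.open_zarr(...)`` it builds a small synthetic, dask-backed DataArray with one year of daily values. The sizes, chunking and random data are illustrative assumptions and far too small to show the memory effect described here; the snippet only shows the GroupBy-Reduce shape of the computation.

.. code-block:: python

   import numpy as np
   import pandas as pd
   import xarray as xr

   # One (leap) year of synthetic daily data, chunked along time so that it
   # is backed by a dask array; this stands in for xr.open_zarr(...).
   time = pd.date_range("2000-01-01", periods=366, freq="D")
   values = np.random.default_rng(0).random((time.size, 4))
   arr = xr.DataArray(values, dims=("time", "x"), coords={"time": time})
   arr = arr.chunk({"time": 30})

   # GroupBy-Reduce: average all values that fall into the same month.
   monthly = arr.groupby("time.month").mean()

   # Nothing has been computed yet; compute() executes the task graph.
   result = monthly.compute()
   print(dict(result.sizes))
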
.. dropdown:: Additional changes
Add changelog for dask order patch (:pr:11278) Patrick Hoefler_
Add regression test for xarray map reduce (:pr:11277) Florian Jetter_
Add changelog entry for take (:pr:11274) Patrick Hoefler_
Revert "order: remove data task graph normalization" (:pr:11276) Patrick Hoefler_
Use the shuffle algorithm for take (:pr:11267) Patrick Hoefler_
Implement task-based array shuffle (:pr:11262) Patrick Hoefler_
Remove data task graph normalization (:pr:11263) Florian Jetter_
Update zoom link for monthly meeting (:pr:11265) Sarah Charlotte Johnson_
Update data loading section of best practices (:pr:11247) Patrick Hoefler_
Match default chunksize in docstring to actual default set in code (:pr:11254) Bernhard Raml_
Fixup casting error in pandas 3 (:pr:11250) Patrick Hoefler_
Skip new warning from pandas (:pr:11249) Patrick Hoefler_
Fix pandas nightly bugs (:pr:11244) Patrick Hoefler_
Run graph normalisation after dask order (:pr-distributed:8818) Patrick Hoefler_
Update large graph size warning to remove scatter recommendation (:pr-distributed:8815) Patrick Hoefler_
Fail tasks exceeding no-workers-timeout (:pr-distributed:8806) Hendrik Makait_
Fix exception handling for NannyPlugin.setup and NannyPlugin.teardown (:pr-distributed:8811) Hendrik Makait_
Fix exception handling for WorkerPlugin.setup and WorkerPlugin.teardown (:pr-distributed:8810) Hendrik Makait_
typo fix (:pr-distributed:8812) alex-rakowski_
Fix if / else for send_recv_from_rpc (:pr-distributed:8809) Patrick Hoefler_
Ensure that adaptive only stops once (:pr-distributed:8807) Hendrik Makait_
Reduce noise from GC-related logging (:pr-distributed:8804) Hendrik Makait_
Remove unused delete_interval and synchronize_worker_interval from Scheduler (:pr-distributed:8801) Hendrik Makait_
Change log level for Compute Failed log message (:pr-distributed:8802) Patrick Hoefler_
Add Prometheus metric for time spent on GC (:pr-distributed:8803) Hendrik Makait_
Add Prometheus metrics for dask_worker_{added|removed}_total (:pr-distributed:8798) Hendrik Makait_
Add log event for worker-ttl-timed-out (:pr-distributed:8800) Hendrik Makait_
Add Prometheus metrics for dask_client_connections_{added|removed}_total (:pr-distributed:8799) Hendrik Makait_
Fix PackageInstall plugin (:pr-distributed:8794) Hendrik Makait_
Make stealing more robust (:pr-distributed:8788) Hendrik Makait_
Leave a warning about future instantiation (:pr-distributed:8782) Florian Jetter_
.. _v2024.7.1:

Highlights
^^^^^^^^^^

More resilient distributed lock
"""""""""""""""""""""""""""""""

:py:class:distributed.Lock is now resilient to worker failures.
Previously, deadlocks were possible in cases where a lock-holding worker
was lost and/or failed to release the lock due to an error.
See :pr-distributed:8770 by Florian Jetter_ for more details.
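
A minimal usage sketch, assuming a local in-process cluster (``Client(processes=False)``): tasks that touch a shared resource can serialise access by acquiring a ``distributed.Lock`` with a common name. The lock name ``"shared-resource"`` and the ``guarded_double`` task are hypothetical, and the snippet does not simulate the worker loss that this release makes the lock resilient to.

.. code-block:: python

   from distributed import Client, Lock

   def guarded_double(x):
       # Hypothetical task: only one task at a time may hold the lock named
       # "shared-resource", e.g. while writing to a shared file.
       with Lock("shared-resource"):
           return 2 * x

   # In-process cluster with two threads, so tasks can actually contend.
   client = Client(processes=False, n_workers=1, threads_per_worker=2)
   try:
       results = client.gather(client.map(guarded_double, range(4)))
   finally:
       client.close()

   print(results)
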
.. dropdown:: Additional changes
Remove and warn of persist usage (:pr:11237) Patrick Hoefler_
Preserve timestamp unit during meta creation (:pr:11233) Patrick Hoefler_
Ensure that dask-expr DataFrames are optimized when put into delayed (:pr:11231) Patrick Hoefler_
Fixes for d freq deprecation in pandas=3 (:pr:11228) James Bourbeau_
bump approx threshold for test_quantile (:pr:10720) Florian Jetter_
Bump xarray-contrib/issue-from-pytest-log from 1.2.8 to 1.3.0 (:pr:11221)
Bump JamesIves/github-pages-deploy-action from 4.6.1 to 4.6.3 (:pr:11222)
Ensure Lock always register with scheduler (:pr-distributed:8781) Florian Jetter_
Temporarily pin setuptools < 71 (:pr-distributed:8785) James Bourbeau_
Restore len() on TaskPrefix (:pr-distributed:8783) Hendrik Makait_
Avoid false positives for p2p-failed log event (:pr-distributed:8777) Hendrik Makait_
Expose paused and retired workers separately in prometheus (:pr-distributed:8613) Patrick Hoefler_
Creating transitions-failures log event (:pr-distributed:8776) alex-rakowski_
Implement HLG layer for P2P rechunking (:pr-distributed:8751) Hendrik Makait_
Add another test for a possible deadlock scenario caused by (:pr-distributed:8703) (:pr-distributed:8769) Hendrik Makait_
Raise an error if compute on persisted collection with released futures (:pr-distributed:8764) Florian Jetter_
Re-raise P2PConsistencyError from failed P2P tasks (:pr-distributed:8748) Hendrik Makait_
Robuster faster tests memory sampler (:pr-distributed:8758) Florian Jetter_
Fix scheduler_bokeh::test_shuffling (:pr-distributed:8766) Florian Jetter_
Increase timeouts for pubsub::test_client_worker (:pr-distributed:8765) Florian Jetter_
Factor out async taskgroup (:pr-distributed:8756) Florian Jetter_
Don't sort keys lexicographically in worker table (:pr-distributed:8753) Florian Jetter_
Use functools.cache instead of functools.lru_cache for extremely often called functions (:pr-distributed:8762) Jonas Dedden_
Robuster deeply nested structures (:pr-distributed:8730) Florian Jetter_
Adding HLG to MAP (:pr-distributed:8740) alex-rakowski_
Add close worker button to worker info page (:pr-distributed:8742) James Bourbeau_
.. _v2024.7.0:

Highlights
^^^^^^^^^^

Drop support for pandas 1.x
"""""""""""""""""""""""""""

This release drops support for pandas<2. pandas 2.0
is now the required minimum version to run Dask DataFrame.
The minimum version of partd was also raised to 1.4.0. Versions before 1.4
are not compatible with pandas 2.
See :pr:11199 by Patrick Hoefler_ for more details.

Publish-subscribe APIs deprecated
"""""""""""""""""""""""""""""""""

:py:class:distributed.Pub and :py:class:distributed.Sub have been deprecated and will be removed
in a future release. Please switch to :py:func:distributed.Client.log_event and :py:func:distributed.Worker.log_event
instead.
See :pr-distributed:8724 by Hendrik Makait_ for more details.
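
A minimal migration sketch, assuming a local in-process cluster: rather than publishing with Pub and subscribing with Sub, messages are recorded with ``log_event`` (from the client or, inside a task, from the worker) and read back with ``Client.get_events``. The topic name ``"my-topic"``, the message contents, and the ``process`` task are placeholders.

.. code-block:: python

   from distributed import Client, get_worker

   def process(x):
       # Inside a task, emit an event through the worker that runs it.
       get_worker().log_event("my-topic", {"processed": x})
       return x + 1

   client = Client(processes=False, n_workers=1, threads_per_worker=1)
   try:
       # Events can also be logged directly from the client.
       client.log_event("my-topic", {"status": "started"})
       results = client.gather(client.map(process, range(3)))
       # Read back what the scheduler stored for this topic as
       # (timestamp, message) pairs. Worker events are forwarded to the
       # scheduler asynchronously, so they may arrive slightly later.
       messages = [msg for _, msg in client.get_events("my-topic")]
   finally:
       client.close()

   print(messages)
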
.. dropdown:: Additional changes
Only count data that is in memory for xarray sizeof (:pr:11206) Florian Jetter_
Fix botocore re-raising error (:pr:11209) Patrick Hoefler_
Update Coiled links in documentation (:pr:11211) Sarah Charlotte Johnson_
Add some array-expr methods (:pr:11210) Patrick Hoefler_
Fix quantile for arrow dtypes (:pr:11202) Patrick Hoefler_
Add utility to verify optional dependencies (:pr:11205) Patrick Hoefler_
Implement array expression switch (:pr:11203) Patrick Hoefler_
Remove no longer supported ipython reference (:pr:11196) Patrick Hoefler_
Remove from_delayed references (:pr:11195) Patrick Hoefler_
Add other IO connectors to docs (:pr:11189) Patrick Hoefler_
Fix assert_eq import from cudf (:pr-distributed:8747) James Bourbeau_
Log traceback upon task error (:pr-distributed:8746) Hendrik Makait_
Update system monitor when polling Prometheus metrics (:pr-distributed:8745) Hendrik Makait_
Bump pandas to 2.0 in mindeps build (:pr-distributed:8743) James Bourbeau_
Refactor event logging functionality into broker (:pr-distributed:8731) Hendrik Makait_
Drop support for pandas 1.X (:pr-distributed:8741) Hendrik Makait_
Remove is_python_shutting_down (:pr-distributed:8492) Hendrik Makait_
Fix test_task_state_instance_are_garbage_collected (:pr-distributed:8735) Hendrik Makait_
Fix floating-point inaccuracy (:pr-distributed:8736) Hendrik Makait_
Fix pynvml handles (:pr-distributed:8693) Benjamin Zaitlen_
get_ip: handle getting 0.0.0.0 (:pr-distributed:8712) Adam Williamson_
Remove FutureWarning in test_task_state_instance_are_garbage_collected (:pr-distributed:8734) Hendrik Makait_
Fix mindeps-testing on CI (:pr-distributed:8728) Hendrik Makait_
Extract tests related to event-logging into separate file (:pr-distributed:8733) Hendrik Makait_
Use safer context for ProcessPoolExecutor (:pr-distributed:8715) Elliott Sales de Andrade_
Cache URL encoding of worker addresses in dashboard (:pr-distributed:8725) Florian Jetter_
More robust bokeh test_shuffling (:pr-distributed:8727) Florian Jetter_
Fix type in actor docs (:pr-distributed:8711) Sultan Orazbayev_
More useful warning if a plugin type is provided instead of instance (:pr-distributed:8689) Florian Jetter_
Improve error on cancelled tasks due to disconnect (:pr-distributed:8705) Hendrik Makait_
Fix wait condition on test_forget_errors (:pr-distributed:8714) Elliott Sales de Andrade_
Skip test_deadlock_dependency_of_queued_released (:pr-distributed:8723) Hendrik Makait_
Fix test_quiet_client_close (:pr-distributed:8722) Hendrik Makait_
Fix cleanup iteration in save_sys_modules (:pr-distributed:8713) Elliott Sales de Andrade_
Add quotes to missing bokeh installation commands (:pr-distributed:8717) James Bourbeau_
.. _v2024.6.2:
This is a patch release to fix an issue with dask and distributed
version pinning in the 2024.6.1 release.
.. dropdown:: Additional changes
11184) James Bourbeau_
profile._f_lineno: handle next_line being None in Python 3.13 (:pr:8710) Adam Williamson_

.. _v2024.6.1:

Highlights
^^^^^^^^^^

This release includes a critical fix for a deadlock that can arise when dependencies of root-ish tasks are rescheduled, e.g. due to a worker being lost.
See :pr-distributed:8703 by Hendrik Makait_ for more details.
.. dropdown:: Additional changes
11183) Richard (Rick) Zamora_
11185) Adam Williamson_
test_map_freq_to_period_start for pandas=3 (:pr:11181) James Bourbeau_
8699)

.. _v2024.6.0:

Highlights
^^^^^^^^^^

memmap array tokenization
"""""""""""""""""""""""""
Tokenizing memmap arrays will now avoid materializing the array into memory.
See :pr:11161 by Florian Jetter_ for more details.
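
The sketch below, assuming ``numpy`` and ``dask`` are installed, tokenizes two independent memory maps of the same temporary file with ``dask.base.tokenize``. The file name and array size are arbitrary; the snippet only checks that equal inputs receive equal tokens and does not itself measure memory usage.

.. code-block:: python

   import os
   import tempfile

   import numpy as np
   from dask.base import tokenize

   with tempfile.TemporaryDirectory() as tmp:
       path = os.path.join(tmp, "data.bin")
       np.arange(1_000_000, dtype="float64").tofile(path)

       # Two independent read-only memory maps of the same file region.
       a = np.memmap(path, dtype="float64", mode="r", shape=(1_000_000,))
       b = np.memmap(path, dtype="float64", mode="r", shape=(1_000_000,))

       # Equal inputs produce equal tokens.
       same_token = tokenize(a) == tokenize(b)

       # Release the maps before the directory is removed (needed on Windows).
       del a, b

   print(same_token)
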
.. dropdown:: Additional changes
Fix test_dt_accessor with query planning disabled (:pr:11177) James Bourbeau_
Use packaging.version.Version (:pr:11171) James Bourbeau_
Remove deprecated dask.compatibility module (:pr:11172) James Bourbeau_
Ensure compatibility for xarray.NamedArray (:pr:11168) Hendrik Makait_
Estimate sizes of xarray collections (:pr:11166) Florian Jetter_
Add section about futures and variables (:pr:11164) Florian Jetter_
Update docs for combined Dask community meeting info (:pr:11159) Sarah Charlotte Johnson_
Avoid rounding error in test_prometheus_collect_count_total_by_cost_multipliers (:pr-distributed:8687) Hendrik Makait_
Log key collision count in update_graph log event (:pr-distributed:8692) Hendrik Makait_
Automate GitHub Releases when new tags are pushed (:pr-distributed:8626) Jacob Tomlinson_
Fix log event with multiple topics (:pr-distributed:8691) Hendrik Makait_
Rename safe to expected in Scheduler.remove_worker (:pr-distributed:8686) Hendrik Makait_
Log event during failure (:pr-distributed:8663) Hendrik Makait_
Eagerly update aggregate statistics for TaskPrefix instead of calculating them on-demand (:pr-distributed:8681) Hendrik Makait_
Improve graph submission time for P2P rechunking by avoiding unpack recursion into indices (:pr-distributed:8672) Florian Jetter_
Add safe keyword to remove-worker event (:pr-distributed:8647) alex-rakowski_
Improved errors and reduced logging for P2P RPC calls (:pr-distributed:8666) Hendrik Makait_
Adjust P2P tests for dask-expr (:pr-distributed:8662) Hendrik Makait_
Iterate over copy of Server.digests_total_since_heartbeat to avoid RuntimeError (:pr-distributed:8670) Hendrik Makait_
Log task state in Compute Failed (:pr-distributed:8668) Hendrik Makait_
Add Prometheus gauge for task groups (:pr-distributed:8661) Hendrik Makait_
Fix too strict assertion in shuffle code for pandas subclasses (:pr-distributed:8667) Joris Van den Bossche_
Reduce noise from erring tasks that are not supposed to be running (:pr-distributed:8664) Hendrik Makait_
.. _v2024.5.2:
This release primarily contains minor bug fixes.
.. dropdown:: Additional changes
Fix nightly Zarr installation in CI (:pr:11151) James Bourbeau_
Add python 3.11 build to GPU CI (:pr:11135) Charles Blackmon-Luca_
Update gpuCI RAPIDS_VER to 24.08 (:pr:11141)
Update test_groupby_grouper_dispatch (:pr:11144) Richard (Rick) Zamora_
Bump JamesIves/github-pages-deploy-action from 4.6.0 to 4.6.1 (:pr:11136)
Unskip test_array_function_sparse with new sparse release (:pr:11139) James Bourbeau_
Fix test_parse_dates_multi_column on pandas=3 (:pr:11132) James Bourbeau_
Don't draft release notes for tagged commits (:pr:11138) Jacob Tomlinson_
Reduce task group count for partial P2P rechunks (:pr-distributed:8655) Hendrik Makait_
Update gpuCI RAPIDS_VER to 24.08 (:pr-distributed:8652)
Submit collections metadata to scheduler (:pr-distributed:8612) Florian Jetter_
Fix indent in code example in task-launch.rst (:pr-distributed:8650) Ray Bell_
Avoid multiple WorkerState sphinx error (:pr-distributed:8643) James Bourbeau_
.. _v2024.5.1:

Highlights
^^^^^^^^^^

NumPy 2.0 support
"""""""""""""""""

This release contains compatibility updates for the upcoming NumPy 2.0 release.
See :pr:11096 by Benjamin Zaitlen_ and :pr:11106 by James Bourbeau_ for more details.
Increased Zarr store support
""""""""""""""""""""""""""""
This release adds support for MutableMapping-backed Zarr stores like
:py:class:zarr.storage.DirectoryStore, etc.
See :pr:10422 by Greg M. Fleishman_ for more details.
.. dropdown:: Additional changes
Minor updates to ML page (:pr:11129) James Bourbeau_
Skip failing sparse test on 0.15.2 (:pr:11131) James Bourbeau_
Make sure nightly pyarrow is installed in upstream CI build (:pr:11121) James Bourbeau_
Add initial draft of ML overview document (:pr:11114) Matthew Rocklin_
Test query-planning in gpuCI (:pr:11060) Richard (Rick) Zamora_
Avoid pytest error when skipping NumPy 2.0 tests (:pr:11110) James Bourbeau_
Use nightly h5py in upstream CI build (:pr:11108) James Bourbeau_
Use nightly scikit-image in upstream CI build (:pr:11107) James Bourbeau_
Bump actions/checkout from 4.1.4 to 4.1.5 (:pr:11105)
Enable parquet append tests after fix (:pr:11104) Patrick Hoefler_
Skip fastparquet tests for numpy 2 (:pr:11103) Patrick Hoefler_
Fix misspelling found by codespell (:pr:11097) Dimitri Papadopoulos Orfanos_
Fix doc build (:pr:11099) Patrick Hoefler_
Clean up percentiles_summary logic (:pr:11094) Richard (Rick) Zamora_
Apply ruff/flake8-implicit-str-concat rule ISC001 (:pr:11098) Dimitri Papadopoulos Orfanos_
Fix clocks on Windows with Python 3.13 (:pr-distributed:8642) Victor Stinner_
Fix "Print host info" CI step on Mac OS (arm64) (:pr-distributed:8638) Hendrik Makait_
.. _v2024.5.0:

Highlights
^^^^^^^^^^

This release primarily contains minor bugfixes.
.. dropdown:: Additional changes
Don't link to click intersphinx dev version (:pr:11091) M Bussonnier_
Fix API doc links for some dask-expr expressions (:pr:11092) Patrick Hoefler_
Add dask-expr to upstream build (:pr:11086) Patrick Hoefler_
Add melt support when query-planning is enabled (:pr:11088) Richard (Rick) Zamora_
Skip dataframe/product when in numpy 2 envs (:pr:11089) Benjamin Zaitlen_
Add plots to illustrate what the optimizer does (:pr:11072) Patrick Hoefler_
Fixup pandas upstream tests (:pr:11085) Patrick Hoefler_
Bump conda-incubator/setup-miniconda from 3.0.3 to 3.0.4 (:pr:11084)
Bump actions/checkout from 4.1.3 to 4.1.4 (:pr:11083)
Fix CI after pytest changes (:pr:11082) Patrick Hoefler_
Fixup tests for more efficient dask-expr implementation (:pr:11071) Patrick Hoefler_
Generalize clear_known_categories utility (:pr:11059) Richard (Rick) Zamora_
Bump JamesIves/github-pages-deploy-action from 4.5.0 to 4.6.0 (:pr:11062)
Bump release-drafter/release-drafter from 5 to 6 (:pr:11063)
Bump actions/checkout from 4.1.2 to 4.1.3 (:pr:11061)
Update GPU CI RAPIDS_VER to 24.06, disable query planning (:pr:11045) Charles Blackmon-Luca_
Move tests (:pr-distributed:8631) Hendrik Makait_
Bump actions/checkout from 4.1.2 to 4.1.3 (:pr-distributed:8628)
.. _v2024.4.2:

Highlights
^^^^^^^^^^

Trivial Merge Implementation
""""""""""""""""""""""""""""

The Query Optimizer will inspect queries to determine if a merge(...) or
groupby(...).apply(...) requires a shuffle. A shuffle can be avoided if the
DataFrame was shuffled on the same columns in a previous step without any operations
in between that change the partitioning layout or the relevant values in each
partition.

.. code-block:: python

   >>> result = df.merge(df2, on="a")
   >>> result = result.merge(df3, on="a")

The Query Optimizer will identify that result was already shuffled on "a" and
will thus only shuffle df3 in the second merge operation before doing a blockwise
merge.
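
A self-contained version of the example above, assuming ``pandas`` and ``dask[dataframe]`` with query planning enabled; the three frames are made-up illustration data. With inputs this small Dask may choose a different merge strategy than it would for large data, so the snippet demonstrates the pattern and checks the result rather than counting shuffles.

.. code-block:: python

   import pandas as pd
   import dask.dataframe as dd

   # Small illustrative frames that share the join column "a".
   df = dd.from_pandas(pd.DataFrame({"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]}), npartitions=2)
   df2 = dd.from_pandas(pd.DataFrame({"a": [1, 2, 3, 4], "c": [5, 6, 7, 8]}), npartitions=2)
   df3 = dd.from_pandas(pd.DataFrame({"a": [2, 3, 4, 5], "d": [0, 1, 2, 3]}), npartitions=2)

   # Two consecutive merges on the same column, as in the example above.
   result = df.merge(df2, on="a")
   result = result.merge(df3, on="a")

   # Row order across partitions is not guaranteed, so sort before use.
   out = result.compute().sort_values("a").reset_index(drop=True)
   print(out)
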
Auto-partitioning in read_parquet
"""""""""""""""""""""""""""""""""""""
The Query Optimizer will automatically repartition datasets read from Parquet files if individual partitions are too small. This reduces the number of partitions and consequently also the size of the task graph.

The Optimizer aims to produce partitions of at least 75MB and will combine multiple files together if necessary to reach this threshold. The threshold can be configured with:

.. code-block:: python

   >>> dask.config.set({"dataframe.parquet.minimum-partition-size": 100_000_000})

The value is given in bytes. The default threshold is relatively conservative to avoid memory issues on worker nodes with a relatively small amount of memory per thread.
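
Since ``dask.config.set`` also works as a context manager, the threshold can be scoped to a single block instead of being changed globally. A minimal sketch, where the commented ``read_parquet`` path is a placeholder:

.. code-block:: python

   import dask

   key = "dataframe.parquet.minimum-partition-size"
   before = dask.config.get(key, None)

   with dask.config.set({key: 100_000_000}):
       inside = dask.config.get(key)
       # A dd.read_parquet("path/to/dataset/") call placed here would see the
       # 100 MB threshold (the path is a placeholder).

   # Leaving the block restores the previous value.
   after = dask.config.get(key, None)
   print(inside, after == before)
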
.. dropdown:: Additional changes
Add GitHub Releases automation (:pr:11057) Jacob Tomlinson_
Add changelog entries for new release (:pr:11058) Patrick Hoefler_
Reinstate try/except block in _bind_property (:pr:11049) Lawrence Mitchell_
Fix link for query planning docs (:pr:11054) Patrick Hoefler_
Add config parameter for parquet file size (:pr:11052) Patrick Hoefler_
Update percentile docstring (:pr:11053) Abel Aoun_
Add docs for query optimizer (:pr:11043) Patrick Hoefler_
Assignment of np.ma.masked to object-type Array (:pr:9627) David Hassell_
Don't error if dask_expr is not installed (:pr:11048) Simon Høxbro Hansen_
Adjust test_set_index for "cudf" backend (:pr:11029) Richard (Rick) Zamora_
Use to/from_legacy_dataframe instead of to/from_dask_dataframe (:pr:11025) Richard (Rick) Zamora_
Tokenize bag groupby keys (:pr:10734) Charles Stern_
Add lazy "cudf" registration for p2p-related dispatch functions (:pr:11040) Richard (Rick) Zamora_
Collect memray profiles on exception (:pr-distributed:8625) Florian Jetter_
Ensure inproc properly emulates serialization protocol (:pr-distributed:8622) Florian Jetter_
Relax test stats profiling2 (:pr-distributed:8621) Florian Jetter_
Restart workers when worker-ttl expires (:pr-distributed:8538) crusaderky_
Use monotonic for deadline test (:pr-distributed:8620) Florian Jetter_
Fix race condition for published futures with annotations (:pr-distributed:8577) Florian Jetter_
Scatter by worker instead of worker -> nthreads (:pr-distributed:8590) Miles_
Send log-event if worker is restarted because of memory pressure (:pr-distributed:8617) Patrick Hoefler_
Do not print xfailed tests in CI (:pr-distributed:8619) Florian Jetter_
ensure workers are not downscaled when participating in p2p (:pr-distributed:8610) Florian Jetter_
Run against stable fsspec (:pr-distributed:8615) Florian Jetter_
.. _v2024.4.1:
This is a minor bugfix release that fixes an error when importing
dask.dataframe with Python 3.11.9.
See :pr:11035 and :pr:11039 from Richard (Rick) Zamora_ for details.
.. dropdown:: Additional changes
11036) Patrick Hoefler_
8609) crusaderky_
dask-expr to dask conda recipe (:pr-distributed:8601) Charles Blackmon-Luca_

.. _v2024.4.0:

Highlights
^^^^^^^^^^

Query planning fixes
""""""""""""""""""""

This release contains a variety of bugfixes in Dask DataFrame's new query planner.

GPU metric dashboard fixes
""""""""""""""""""""""""""

GPU memory and utilization dashboard functionality has been restored. Previously, these plots were unintentionally left blank.
See :pr-distributed:8572 from Benjamin Zaitlen_ for details.
.. dropdown:: Additional changes
Build nightlies on tag releases (:pr:11014) Charles Blackmon-Luca_
Remove xfail tracebacks from test suite (:pr:11028) Patrick Hoefler_
Fix CI for upstream pandas changes (:pr:11027) Patrick Hoefler_
Fix value_counts raising if branch exists of nans only (:pr:11023) Patrick Hoefler_
Enable custom expressions in dask_cudf (:pr:11013) Richard (Rick) Zamora_
Raise ImportError instead of ValueError when dask-expr cannot be imported (:pr:11007) James Lamb_
Add HyperSpy to ecosystem.rst (:pr:11008) Jonas Lähnemann_
Add Hugging Face hf:// to the list of fsspec compatible remote services (:pr:11012) Quentin Lhoest_
Bump actions/checkout from 4.1.1 to 4.1.2 (:pr:11009)
Refresh documentation for annotations and spans (:pr-distributed:8593) crusaderky_
Fixup deprecation warning from pandas (:pr-distributed:8564) Patrick Hoefler_
Add Python 3.11 to GPU CI matrix (:pr-distributed:8598) Charles Blackmon-Luca_
Deadline to use a monotonic timer (:pr-distributed:8597) crusaderky_
Update gpuCI RAPIDS_VER to 24.06 (:pr-distributed:8588)
Refactor restart() and restart_workers() (:pr-distributed:8550) crusaderky_
Bump actions/checkout from 4.1.1 to 4.1.2 (:pr-distributed:8587)
Fix bokeh deprecations (:pr-distributed:8594) Miles_
Fix flaky test: test_shutsdown_cleanly (:pr-distributed:8582) Miles_
Include type in failed sizeof warning (:pr-distributed:8580) James Bourbeau_
.. _v2024.3.1:
This is a minor release that primarily demotes an exception to a warning if
dask-expr is not installed when upgrading.
.. dropdown:: Additional changes
dask-expr is not installed (:pr:11003) Florian Jetter_
10993) Dimitri Papadopoulos Orfanos_
dask-expr disabled (:pr-distributed:8583) crusaderky_
8528) Miles_
test_restart_waits_for_new_workers (:pr-distributed:8573) crusaderky_
test_raise_on_incompatible_partitions (:pr-distributed:8571) crusaderky_

.. _v2024.3.0:
Released on March 11, 2024

Highlights
^^^^^^^^^^

Query planning
""""""""""""""

This release enables query planning by default for all users of
dask.dataframe.

The query planning functionality represents a rewrite of the DataFrame using
dask-expr. This is a drop-in replacement, and we expect that most users will
not have to adjust any of their code.

Any feedback can be reported on the Dask issue tracker <https://github.com/dask/dask/issues>_ or on the query planning feedback issue <https://github.com/dask/dask/issues/10995>_.

If you encounter any issues, you can still opt out by setting:

.. code-block:: python

   >>> import dask
   >>> dask.config.set({'dataframe.query-planning': False})

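
Because Dask's configuration system also reads ``DASK_``-prefixed environment variables (with ``__`` separating nested keys), the same opt-out can be expressed as ``DASK_DATAFRAME__QUERY_PLANNING=False``. It is safest to set it before ``dask.dataframe`` is first imported; the sketch below sets the variable from Python and calls ``dask.config.refresh()`` only so that the change is picked up even if ``dask`` was already imported in the same process.

.. code-block:: python

   import os

   # Equivalent to dask.config.set({"dataframe.query-planning": False});
   # "__" in the variable name separates the nested config levels.
   os.environ["DASK_DATAFRAME__QUERY_PLANNING"] = "False"

   import dask

   # Re-read config files and environment variables in case dask was
   # already imported earlier in this process.
   dask.config.refresh()
   print(dask.config.get("dataframe.query-planning"))

   # Import dask.dataframe only after this point so it sees the setting.
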

Sunset of Pandas 1.X support
""""""""""""""""""""""""""""

The new query planning backend requires at least pandas 2.0. This pandas
version will automatically be installed if you are installing from conda or if
you are installing using dask[complete] or dask[dataframe] from pip.

The legacy DataFrame implementation still supports pandas 1.X if you
install dask without extras.

.. dropdown:: Additional changes
10989) Patrick Hoefler_
10990) Patrick Hoefler_
10988) Patrick Hoefler_
to_delayed test (:pr:10985) Patrick Hoefler_
10978)
10977) Patrick Hoefler_
10976) Patrick Hoefler_
10929) David Hoese_
10967) Patrick Hoefler_
10970) Elliott Sales de Andrade_
10972) Elliott Sales de Andrade_
10966) Lindsey Gray
10968) Patrick Hoefler_
10973) Patrick Hoefler_
10971) Elliott Sales de Andrade_
10969) Elliott Sales de Andrade_
bag.to_dataframe (:pr:10963) Patrick Hoefler_
10964) Miles_
dask.config (:pr:10959) crusaderky_
importlib.metadata on Python 3.12+ (:pr:10955) wim glenn_
10953) Florian Jetter_
10952) Patrick Hoefler_
8569) Florian Jetter_
8568) crusaderky_
8563) crusaderky_
8558) crusaderky_
memory->erred (:pr-distributed:8549) Hendrik Makait_
8560) Miles_
8562) crusaderky_
test_flaky_connect_recover_with_retry (:pr-distributed:8556) Hendrik Makait_
8551) crusaderky_
8553)
8552) Hendrik Makait_
8531) Hendrik Makait_
8517) crusaderky_
8539) Patrick Hoefler_
8535)
8533) James Bourbeau_
8532) James Bourbeau_
8511) Hendrik Makait_
8524) crusaderky_
8509) crusaderky_
8521) crusaderky_
8523) crusaderky_

.. _v2024.2.1:
Released on February 23, 2024

Highlights
^^^^^^^^^^

Allow silencing dask.DataFrame deprecation warning
""""""""""""""""""""""""""""""""""""""""""""""""""

The last release contained a DeprecationWarning that alerts users to an
upcoming switch of dask.dataframe to use the new backend with support for
query planning (see also :issue:10934).
This DeprecationWarning is triggered on import of the dask.dataframe
module, and the community raised concerns about this being too verbose.
It is now possible to silence this warning:

.. code-block:: python

   # via Python
   >>> import dask
   >>> dask.config.set({'dataframe.query-planning-warning': False})

.. code-block:: bash

   # via CLI
   $ dask config set dataframe.query-planning-warning False

See :pr:10936 and :pr:10925 from Miles_ for details.

More robust distributed scheduler for rare key collisions
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Blockwise fusion optimization can cause a task key collision that is not being
handled properly by the distributed scheduler (see :issue:9888). Users will
typically notice this by seeing one of various internal exceptions that cause a
system deadlock or critical failure. While this issue could not be fixed, the
scheduler now implements a mechanism that should mitigate most occurrences and
issues a warning if the issue is detected.
See :pr-distributed:8185 from crusaderky_ and Florian Jetter_ for details.
Over the course of this, various improvements to tokenization have been
implemented. See :pr:10913, :pr:10884, :pr:10919, :pr:10896 and
primarily :pr:10883 from crusaderky_ for more details.

More robust adaptive scaling on large clusters
""""""""""""""""""""""""""""""""""""""""""""""

Adaptive scaling could previously lose data during downscaling if many tasks had to be moved. This typically, but not exclusively, occurred on large clusters and would manifest as a recomputation of tasks and could cause clusters to oscillate between up- and downscaling without ever finishing.
See :pr-distributed:8522 from crusaderky_ for more details.
.. dropdown:: Additional changes
10948) Patrick Hoefler_
10947) Patrick Hoefler_
10944) Patrick Hoefler_
10942) Patrick Hoefler_
10943) crusaderky_
10939) crusaderky_
10938) crusaderky_
dask config set and dask config find updates. (:pr:10930) Miles_
10932) crusaderky_
10926) crusaderky_
dask config get fix when printing None values (:pr:10927) crusaderky_
10928) crusaderky_
dask config set (:pr:10921) Miles_
10922) Patrick Hoefler_
10924) crusaderky_
10920)
8520) Florian Jetter_
8518) crusaderky_
RAPIDS_VER to 24.04 (:pr-distributed:8471)
8516) crusaderky_
8512) crusaderky_
8513) crusaderky_
8507) Florian Jetter_
8508) Florian Jetter_
8499) crusaderky_
update_graph (backport from #8185) (:pr-distributed:8498) crusaderky_
8501) crusaderky_
8505) crusaderky_
8504) James Bourbeau_
8458) Hendrik Makait_
8503)

.. _v2024.2.0:
Released on February 9, 2024

Highlights
^^^^^^^^^^

Deprecate Dask DataFrame implementation
"""""""""""""""""""""""""""""""""""""""

The current Dask DataFrame implementation is deprecated. In a future release, Dask DataFrame will use a new implementation that contains several improvements, including logical query planning. The user-facing DataFrame API will remain unchanged.

The new implementation is already available and can be enabled by
installing the dask-expr library:

.. code-block:: bash

   $ pip install dask-expr

and turning the query planning option on:

.. code-block:: python

   >>> import dask
   >>> dask.config.set({'dataframe.query-planning': True})
   >>> import dask.dataframe as dd

API documentation for the new implementation is available at https://docs.dask.org/en/stable/dataframe-api.html
Any feedback can be reported on the Dask issue tracker https://github.com/dask/dask/issues
See :pr:10912 from Patrick Hoefler_ for details.

Improved tokenization
"""""""""""""""""""""

This release contains several improvements to Dask's object tokenization logic. More objects now produce deterministic tokens, which can lead to improved performance through caching of intermediate results.
See :pr:10898, :pr:10904, :pr:10876, :pr:10874, and :pr:10865 from crusaderky_ for details.
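
A small illustration of deterministic tokenization, assuming ``numpy`` and ``dask`` are installed and using ``dask.base.tokenize``; the inputs are arbitrary examples:

.. code-block:: python

   import numpy as np
   from dask.base import tokenize

   a = np.arange(10)

   # Equal content yields equal tokens, even for distinct objects...
   assert tokenize(a) == tokenize(a.copy())
   assert tokenize({"x": [1, 2, 3]}) == tokenize({"x": [1, 2, 3]})

   # ...while different content yields a different token.
   assert tokenize(a) != tokenize(a + 1)
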
.. dropdown:: Additional changes
Fix inplace modification on read-only arrays for string conversion (:pr:10886) Patrick Hoefler_
Add changelog entry for dask-expr (:pr:10915) Patrick Hoefler_
Fix leftsemi merge for cudf (:pr:10914) Patrick Hoefler_
Slight update to dask-expr warning (:pr:10916) James Bourbeau_
Improve performance for groupby.nunique (:pr:10910) Patrick Hoefler_
Add configuration for leftsemi merges in dask-expr (:pr:10908) Patrick Hoefler_
Adjust assign test for dask-expr (:pr:10907) Patrick Hoefler_
Avoid pytest.warns in test_to_datetime for GPU CI (:pr:10902) Richard (Rick) Zamora_
Update deployment options in docs homepage (:pr:10901) James Bourbeau_
Fix typo in dataframe docs (:pr:10900) Matthew Rocklin_
Bump peter-evans/create-pull-request from 5 to 6 (:pr:10894)
Fix mimesis API >=13.1.0 - use random.randint (:pr:10888) Miles_
Adjust invalid test (:pr:10897) Patrick Hoefler_
Pickle da.argwhere and da.count_nonzero (:pr:10885) crusaderky_
Fix dask-expr tests after singleton pr (:pr:10892) Patrick Hoefler_
Set lower bound version for s3fs (:pr:10889) Miles_
Add a couple of dask-expr fixes for new parquet cache (:pr:10880) Florian Jetter_
Update deployment documentation (:pr:10882) Matthew Rocklin_
Start with dask-expr doc build (:pr:10879) Patrick Hoefler_
Test tokenization of static and class methods (:pr:10872) crusaderky_
Add distributed.print and distributed.warn to API docs (:pr:10878) James Bourbeau_
Run macos ci on M1 architecture (:pr:10877) Patrick Hoefler_
Update tests for dask-expr (:pr:10838) Patrick Hoefler_
Update parquet tests to align with dask-expr fixes (:pr:10851) Richard (Rick) Zamora_
Fix regression in test_graph_manipulation (:pr:10873) crusaderky_
Adjust pytest errors for dask-expr ci (:pr:10871) Patrick Hoefler_
Set upper bound version for numba when pandas<2.1 (:pr:10890) Miles_
Deprecate method parameter in DataFrame.fillna (:pr:10846) Miles_
Remove warning filter from pyproject.toml (:pr:10867) Patrick Hoefler_
Skip test_append_with_partition for fastparquet (:pr:10828) Patrick Hoefler_
Fix pytest 8 issues (:pr:10868) Patrick Hoefler_
Adjust test for support of median in Groupby.aggregate in dask-expr (2/2) (:pr:10870) Hendrik Makait_
Allow length of ascending to be larger than one in sort_values (:pr:10864) Florian Jetter_
Allow other message raised in Python 3.9 (:pr:10862) Hendrik Makait_
Don't crash when getting computation code in pathological cases (:pr-distributed:8502) James Bourbeau_
Bump peter-evans/create-pull-request from 5 to 6 (:pr-distributed:8494)
fix test of cudf spilling metrics (:pr-distributed:8478) Mads R. B. Kristensen_
Upgrade to pytest 8 (:pr-distributed:8482) crusaderky_
Fix test_two_consecutive_clients_share_results (:pr-distributed:8484) crusaderky_
Client word mix-up (:pr-distributed:8481) templiert_
.. _v2024.1.1:
Released on January 26, 2024

Highlights
^^^^^^^^^^

Pandas 2.2 and Scipy 1.12 support
"""""""""""""""""""""""""""""""""
This release contains compatibility updates for the latest pandas and scipy releases.
See :pr:10834, :pr:10849, :pr:10845, and :pr-distributed:8474 from crusaderky_ for details.

Deprecations
""""""""""""

convert_dtype in apply (:pr:10827) Miles_
axis in DataFrame.rolling (:pr:10803) Miles_
out= and dtype= parameter in most DataFrame methods (:pr:10800) crusaderky_
axis in groupby cumulative transformers (:pr:10796) Miles_
shuffle to shuffle_method in remaining methods (:pr:10797) Miles_

.. dropdown:: Additional changes
Add recommended deployment options to deployment docs (:pr:10866) James Bourbeau_
Improve _agg_finalize to conform to output expectation (:pr:10835) Hendrik Makait_
Implement deterministic tokenization for hlg (:pr:10817) Patrick Hoefler_
Refactor: move tests for tokenize() to its own module (:pr:10863) crusaderky_
Update DataFrame examples section (:pr:10856) James Bourbeau_
Temporarily pin mimesis<13.1.0 (:pr:10860) James Bourbeau_
Trivial cosmetic tweaks to _testing.py (:pr:10857) crusaderky_
Unskip and adjust tests for groupby-aggregate with median using dask-expr (:pr:10832) Hendrik Makait_
Fix test for sizeof(pd.MultiIndex) in upstream CI (:pr:10850) crusaderky_
numpy 2.0: fix slicing by uint64 array (:pr:10854) crusaderky_
Rename numpy version constants to match pandas (:pr:10843) crusaderky_
Bump actions/cache from 3 to 4 (:pr:10852)
Update gpuCI RAPIDS_VER to 24.04 (:pr:10841)
Fix deprecations in doctest (:pr:10844) crusaderky_
Changed dtype arithmetics in numpy 2.x (:pr:10831) crusaderky_
Adjust tests for median support in dask-expr (:pr:10839) Patrick Hoefler_
Adjust tests for median support in groupby-aggregate in dask-expr (:pr:10840) Hendrik Makait_
numpy 2.x: fix std() on MaskedArray (:pr:10837) crusaderky_
Fail dask-expr ci if tests fail (:pr:10829) Patrick Hoefler_
Activate query_planning when exporting tests (:pr:10833) Patrick Hoefler_
Expose dataframe tests (:pr:10830) Patrick Hoefler_
numpy 2: deprecations in n-dimensional fft functions (:pr:10821) crusaderky_
Generalize CreationDispatch for dask-expr (:pr:10794) Richard (Rick) Zamora_
Remove circular import when dask-expr enabled (:pr:10824) Miles_
Minor[CI]: publish-test-results not marked as failed (:pr:10825) Miles_
Fix more tests to use pytest.warns() (:pr:10818) Michał Górny_
np.unique(): inverse is shaped in numpy 2 (:pr:10819) crusaderky_
Pin test_split_adaptive_files to pyarrow engine (:pr:10820) Patrick Hoefler_
Adjust remaining tests in dask/dask (:pr:10813) Patrick Hoefler_
Restrict test to Arrow only (:pr:10814) Patrick Hoefler_
Filter warnings from std test (:pr:10815) Patrick Hoefler_
Adjust mostly indexing tests (:pr:10790) Patrick Hoefler_
Updates to deployment docs (:pr:10778) Sarah Charlotte Johnson_
Unblock documentation build (:pr:10807) Miles_
Adjust test_to_datetime for dask-expr compatibility Hendrik Makait_
Upstream CI tweaks (:pr:10806) crusaderky_
Improve tests for to_numeric (:pr:10804) Hendrik Makait_
Fix test-report cache key indent (:pr:10798) Miles_
Add test-report workflow (:pr:10783) Miles_
Handle matrix subclass serialization (:pr-distributed:8480) Florian Jetter_
Use smallest data type for partition column in P2P (:pr-distributed:8479) Florian Jetter_
pandas 2.2: fix test_dataframe_groupby_tasks (:pr-distributed:8475) crusaderky_
Bump actions/cache from 3 to 4 (:pr-distributed:8477)
pandas 2.2 vs. pyarrow 14: deprecated DatetimeTZBlock (:pr-distributed:8476) crusaderky_
pandas 2.2.0: Deprecated frequency alias M in favor of ME (:pr-distributed:8473) Hendrik Makait_
Fix docs build (:pr-distributed:8472) Hendrik Makait_
Fix P2P-based joins with explicit npartitions (:pr-distributed:8470) Hendrik Makait_
Ignore dask-expr in test_report.py script (:pr-distributed:8464) Miles_
Nit: hardcode Python version in test report environment (:pr-distributed:8462) crusaderky_
Change test_report.py - skip bad artifacts in dask/dask (:pr-distributed:8461) Miles_
Replace all occurrences of sys.is_finalizing (:pr-distributed:8449) Florian Jetter_
.. _v2024.1.0:

Released on January 12, 2024

Highlights
^^^^^^^^^^

Partial rechunks within P2P
"""""""""""""""""""""""""""

P2P rechunking now uses the relationships between input and output chunks. For situations that do not require all-to-all data transfer, this may significantly reduce the runtime and memory/disk footprint. It also enables task culling.

See :pr-distributed:`8330` from `Hendrik Makait`_ for details.
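
To make the idea concrete, here is a small sketch (not Dask's implementation) of how one axis of a rechunk can be analysed. The helper names ``chunk_bounds`` and ``input_dependencies`` are hypothetical; they list which input chunks overlap each output chunk, which is the kind of relationship that lets a rechunk skip all-to-all transfer.

.. code:: python

    from itertools import accumulate


    def chunk_bounds(chunks):
        """Turn chunk sizes such as (4, 4, 2) into half-open (start, stop) ranges."""
        stops = list(accumulate(chunks))
        starts = [0] + stops[:-1]
        return list(zip(starts, stops))


    def input_dependencies(old_chunks, new_chunks):
        """For each output chunk, list the indices of the input chunks overlapping it.

        Only overlapping input chunks contribute data to an output chunk;
        every other (input, output) pair needs no transfer at all.
        """
        old = chunk_bounds(old_chunks)
        deps = []
        for new_start, new_stop in chunk_bounds(new_chunks):
            deps.append(
                [
                    i
                    for i, (old_start, old_stop) in enumerate(old)
                    if old_start < new_stop and new_start < old_stop
                ]
            )
        return deps


    # Merging neighbouring chunks: each output needs two inputs, not all three.
    print(input_dependencies((4, 4, 4), (6, 6)))  # [[0, 1], [1, 2]]

For an N-dimensional array, an output block overlaps an input block only if they overlap along every axis, so the full dependency set is the Cartesian product of the per-axis lists.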

Fastparquet engine deprecated
"""""""""""""""""""""""""""""

The ``fastparquet`` Parquet engine has been deprecated. Users should migrate to the ``pyarrow``
engine by installing `PyArrow <https://arrow.apache.org/docs/python/install.html>`_ and removing
``engine="fastparquet"`` from ``read_parquet`` or ``to_parquet`` calls.

See :pr:`10743` from `crusaderky`_ for details.
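
A minimal migration sketch, assuming ``pandas``, ``dask[dataframe]`` and ``pyarrow`` are installed; the temporary directory and column names are placeholders. The only change from fastparquet-based code is dropping the ``engine=`` argument so the default PyArrow engine is used.

.. code:: python

    import tempfile
    from pathlib import Path

    import pandas as pd
    import dask.dataframe as dd

    pdf = pd.DataFrame({"x": range(6), "y": list("abcdef")})
    ddf = dd.from_pandas(pdf, npartitions=2)

    with tempfile.TemporaryDirectory() as tmp:
        path = str(Path(tmp) / "data.parquet")
        # Before: ddf.to_parquet(path, engine="fastparquet")
        # After: omit engine= so the default (PyArrow) engine is used.
        ddf.to_parquet(path)
        # Compute inside the block, before the temporary directory is removed.
        result = dd.read_parquet(path).compute()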

Improved serialization for arbitrary data
"""""""""""""""""""""""""""""""""""""""""

This release improves serialization robustness for arbitrary data. Previously, serialization
could fail in some cases for data that msgpack cannot serialize.
In those cases, we now fall back to using ``pickle``.

See :pr-distributed:`8447` from `Hendrik Makait`_ for details.
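
The fallback pattern itself can be sketched in a few lines. This is an illustration, not the code in ``distributed``: the standard-library ``json`` module stands in for msgpack so the example has no extra dependencies, and ``dumps_with_fallback`` / ``loads_with_fallback`` are hypothetical names.

.. code:: python

    import json
    import pickle


    def dumps_with_fallback(obj):
        """Serialize obj with a restricted fast format, falling back to pickle.

        json stands in for msgpack here. A (format_tag, payload) pair is
        returned so the receiving side knows how to decode the payload.
        """
        try:
            return "json", json.dumps(obj).encode()
        except (TypeError, ValueError):
            # json raises TypeError for unsupported types (e.g. sets) and
            # ValueError for circular references; pickle handles both.
            return "pickle", pickle.dumps(obj)


    def loads_with_fallback(tag, payload):
        if tag == "json":
            return json.loads(payload)
        return pickle.loads(payload)


    tag, payload = dumps_with_fallback({1, 2, 3})  # a set is not JSON-serializable
    print(tag)  # pickle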

Additional deprecations
"""""""""""""""""""""""

- ``shuffle`` keyword in favour of ``shuffle_method`` for DataFrame methods (:pr:`10738`) `Hendrik Makait`_
- ``repartition`` (:pr:`10691`) `Patrick Hoefler`_
- ``compute`` parameter in ``set_index`` (:pr:`10784`) `Miles`_
- ``inplace`` in ``eval`` (:pr:`10785`) `Miles`_
- ``Series.view`` (:pr:`10754`) `Miles`_
- ``npartitions="auto"`` for ``set_index`` & ``sort_values`` (:pr:`10750`) `Miles`_

.. dropdown:: Additional changes

  - Avoid shortcut in tasks shuffle that led to data loss (:pr:`10763`) `Patrick Hoefler`_
  - Ignore data tasks when ordering (:pr:`10706`) `Florian Jetter`_
  - Add ``get_dummies`` from ``dask-expr`` (:pr:`10791`) `Patrick Hoefler`_
  - Adjust IO tests for ``dask-expr`` migration (:pr:`10776`) `Patrick Hoefler`_
  - Remove deprecation warning about ``sort`` and ``split_out`` in ``groupby`` (:pr:`10788`) `Patrick Hoefler`_
  - Address pandas deprecations (:pr:`10789`) `Patrick Hoefler`_
  - Import ``distributed`` only once in ``get_scheduler`` (:pr:`10771`) `Florian Jetter`_
  - Simplify GitHub actions (:pr:`10781`) `crusaderky`_
  - Add unit test overview (:pr:`10769`) `Miles`_
  - Clean up redundant bits in CI (:pr:`10768`) `crusaderky`_
  - Update tests for ``ufunc`` (:pr:`10773`) `Patrick Hoefler`_
  - Use ``pytest.mark.skipif(DASK_EXPR_ENABLED)`` (:pr:`10774`) `crusaderky`_
  - Adjust shuffle tests for ``dask-expr`` (:pr:`10759`) `Patrick Hoefler`_
  - Fix some deprecation warnings from pandas (:pr:`10749`) `Patrick Hoefler`_
  - Adjust shuffle tests for ``dask-expr`` (:pr:`10762`) `Patrick Hoefler`_
  - Update pre-commit (:pr:`10767`) `Hendrik Makait`_
  - Clean up config switches in CI (:pr:`10766`) `crusaderky`_
  - Improve exception for ``validate_key`` (:pr:`10765`) `Hendrik Makait`_
  - Handle datetimeindexes in ``set_index`` with unknown divisions (:pr:`10757`) `Patrick Hoefler`_
  - Add hashing for decimals (:pr:`10758`) `Patrick Hoefler`_
  - Review tests for ``is_monotonic`` (:pr:`10756`) `crusaderky`_
  - Change argument order in ``value_counts_aggregate`` (:pr:`10751`) `Patrick Hoefler`_
  - Adjust some groupby tests for ``dask-expr`` (:pr:`10752`) `Patrick Hoefler`_
  - Restrict ``mimesis`` to ``< 12`` for 3.9 build (:pr:`10755`) `Patrick Hoefler`_
  - Don't evaluate config in skip condition (:pr:`10753`) `Patrick Hoefler`_
  - Adjust some tests to be compatible with ``dask-expr`` (:pr:`10714`) `Patrick Hoefler`_
  - Make ``dask.array.utils`` functions more generic to other Dask Arrays (:pr:`10676`) `Matthew Rocklin`_
  - Remove duplicate "single machine" section (:pr:`10747`) `Matthew Rocklin`_
  - Tweak ORC ``engine=`` parameter (:pr:`10746`) `crusaderky`_
  - Add pandas 3.0 deprecations and migration prep for ``dask-expr`` (:pr:`10723`) `Miles`_
  - Add task graph animation to docs homepage (:pr:`10730`) `Sarah Charlotte Johnson`_
  - Use new Xarray logo (:pr:`10729`) `James Bourbeau`_
  - Update tab styling on "10 Minutes to Dask" page (:pr:`10728`) `James Bourbeau`_
  - Update environment file upload step in CI (:pr:`10726`) `James Bourbeau`_
  - Don't duplicate unobserved categories in ``GroupBy.nunique`` if ``split_out>1`` (:pr:`10716`) `Patrick Hoefler`_
  - Changelog entry for ``dask.order`` update (:pr:`10715`) `Florian Jetter`_
  - Relax redundant-key check in ``_check_dsk`` (:pr:`10701`) `Richard (Rick) Zamora`_
  - Fix ``test_report.py`` (:pr-distributed:`8459`) `Miles`_
  - Revert pickle change (:pr-distributed:`8456`) `Florian Jetter`_
  - Adapt ``test_report.py`` to support ``dask/dask`` repository (:pr-distributed:`8450`) `Miles`_
  - Maintain stable ordering for P2P shuffling (:pr-distributed:`8453`) `Hendrik Makait`_
  - Add no worker timeout for scheduler (:pr-distributed:`8371`) `FTang21`_
  - Allow tests workflow to be dispatched manually by maintainers (:pr-distributed:`8445`) `Erik Sundell`_
  - Make scheduler-related transition functionality private (:pr-distributed:`8448`) `Hendrik Makait`_
  - Update pre-commit hooks (:pr-distributed:`8444`) `Hendrik Makait`_
  - Do not always check if ``__main__ in result`` when pickling (:pr-distributed:`8443`) `Florian Jetter`_
  - Delegate ``wait_for_workers`` to cluster instances only when implemented (:pr-distributed:`8441`) `Erik Sundell`_
  - Extend sleep in ``test_pandas`` (:pr-distributed:`8440`) `Julian Gilbey`_
  - Avoid deprecated ``shuffle`` keyword (:pr-distributed:`8439`) `Hendrik Makait`_
  - Shuffle metrics 4/4: Remove bespoke diagnostics (:pr-distributed:`8367`) `crusaderky`_
  - Do not run ``gilknocker`` in testsuite (:pr-distributed:`8423`) `Florian Jetter`_
  - Tweak abstractmethods (:pr-distributed:`8427`) `crusaderky`_
  - Shuffle metrics 3/4: Capture background metrics (:pr-distributed:`8366`) `crusaderky`_
  - Shuffle metrics 2/4: Add background metrics (:pr-distributed:`8365`) `crusaderky`_
  - Shuffle metrics 1/4: Add foreground metrics (:pr-distributed:`8364`) `crusaderky`_
  - Bump ``actions/upload-artifact`` from 3 to 4 (:pr-distributed:`8420`)
  - Fix ``test_merge_p2p_shuffle_reused_dataframe_with_different_parameters`` (:pr-distributed:`8422`) `Hendrik Makait`_
  - Expand ``Client.upload_file`` docs example (:pr-distributed:`8313`) `Miles`_
  - Improve logging in P2P's scheduler plugin (:pr-distributed:`8410`) `Hendrik Makait`_
  - Re-enable ``test_decide_worker_coschedule_order_neighbors`` (:pr-distributed:`8402`) `Florian Jetter`_
  - Add cuDF spilling statistics to RMM/GPU memory plot (:pr-distributed:`8148`) `Charles Blackmon-Luca`_
  - Fix inconsistent hashing for Nanny-spawned workers (:pr-distributed:`8400`) `Charles Stern`_
  - Do not allow workers to downscale if they are running long-running tasks (e.g. ``worker_client``) (:pr-distributed:`7481`) `Florian Jetter`_
  - Fix flaky ``test_subprocess_cluster_does_not_depend_on_logging`` (:pr-distributed:`8417`) `crusaderky`_

.. _v2023.12.1:

Released on December 15, 2023

Highlights
^^^^^^^^^^

Logical Query Planning now available for Dask DataFrames
""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Dask DataFrames are now much more performant by using a logical query planner. This feature is currently off by default, but can be turned on with:

.. code:: python

    import dask

    dask.config.set({"dataframe.query-planning": True})

You also need to have ``dask-expr`` installed:

.. code:: bash

    pip install dask-expr

We've seen promising performance improvements so far; see
`this blog post <https://blog.coiled.io/blog/dask-expr-tpch-dask.html>`__
and `these regularly updated benchmarks <https://tpch.coiled.io>`__ for more information.
A more detailed explanation of how the query optimizer works can be found in
`this blog post <https://blog.coiled.io/blog/dask-expr-introduction.html>`__.

This feature is still under active development
and the `API <https://github.com/dask-contrib/dask-expr#api-coverage>`__ isn't stable yet,
so breaking changes can occur. We expect to make the query optimizer the default early next year.

See :pr:`10634` from `Patrick Hoefler`_ for details.

Dtype inference in ``read_parquet``
"""""""""""""""""""""""""""""""""""

``read_parquet`` will now infer the Arrow types ``pa.date32()``, ``pa.date64()`` and
``pa.decimal()`` as an ``ArrowDtype`` in pandas. These dtypes are backed by the
original Arrow array, and thus avoid the conversion to NumPy object. Additionally,
``read_parquet`` will no longer infer nested and binary types as strings; they will
be stored in NumPy object arrays.

See :pr:`10698` and :pr:`10705` from `Patrick Hoefler`_ for details.

Scheduling improvements to reduce memory usage
""""""""""""""""""""""""""""""""""""""""""""""

This release includes a major rewrite of a core part of our scheduling logic. It
includes a new approach to the topological sorting algorithm in ``dask.order``,
which determines the order in which tasks are run. Improper ordering is known to
be a major contributor to excessive cluster memory pressure.

Updates in this release fix a couple of performance regressions that were introduced
in the 2023.10.0 release (see :pr:`10535`). Generally, computations should now
be much more eager to release data that is no longer required in memory.

See :pr:`10660` and :pr:`10697` from `Florian Jetter`_ for details.

Improved P2P-based merging robustness and performance
"""""""""""""""""""""""""""""""""""""""""""""""""""""

This release contains several updates that fix a possible deadlock introduced in 2023.9.2 and improve the robustness of P2P-based merging when the cluster is dynamically scaling up.

See :pr-distributed:`8415`, :pr-distributed:`8416`, and :pr-distributed:`8414` from `Hendrik Makait`_ for details.

Removed disabling pickle option
"""""""""""""""""""""""""""""""

The ``distributed.scheduler.pickle`` configuration option is no longer supported.
As of the 2023.4.0 release, ``pickle`` is used to transmit task graphs, so it can no
longer be disabled. We now raise an informative error when ``distributed.scheduler.pickle``
is set to ``False``.

See :pr-distributed:`8401` from `Florian Jetter`_ for details.

.. dropdown:: Additional changes

  - Add changelog entry for recent P2P merge fixes (:pr:`10712`) `Hendrik Makait`_
  - Update DataFrame page (:pr:`10710`) `Matthew Rocklin`_
  - Add changelog entry for ``dask-expr`` switch (:pr:`10704`) `Patrick Hoefler`_
  - Improve changelog entry for ``PipInstall`` changes (:pr:`10711`) `Hendrik Makait`_
  - Remove PR labeler (:pr:`10709`) `James Bourbeau`_
  - Add ``.__wrapped__`` to ``Delayed`` object (:pr:`10695`) `Andrew S. Rosen`_
  - Bump ``actions/labeler`` from 4.3.0 to 5.0.0 (:pr:`10689`)
  - Bump ``actions/stale`` from 8 to 9 (:pr:`10690`)
  - [Dask.order] Remove non-runnable leaf nodes from ordering (:pr:`10697`) `Florian Jetter`_
  - Update installation docs (:pr:`10699`) `Matthew Rocklin`_
  - Fix software environment link in docs (:pr:`10700`) `James Bourbeau`_
  - Avoid converting non-strings to arrow strings for ``read_parquet`` (:pr:`10692`) `Patrick Hoefler`_
  - Bump ``xarray-contrib/issue-from-pytest-log`` from 1.2.7 to 1.2.8 (:pr:`10687`)
  - Fix ``tokenize`` for ``pd.DateOffset`` (:pr:`10664`) `jochenott`_
  - Bugfix for writing empty array to zarr (:pr:`10506`) `Ben`_
  - Docs update, fixup styling, mention free (:pr:`10679`) `Matthew Rocklin`_
  - Update deployment docs (:pr:`10680`) `Matthew Rocklin`_
  - ``dask.order`` rewrite using a critical path approach (:pr:`10660`) `Florian Jetter`_
  - Avoid substituting keys that occur multiple times (:pr:`10646`) `Florian Jetter`_
  - Add missing image to docs (:pr:`10694`) `Matthew Rocklin`_
  - Bump ``actions/setup-python`` from 4 to 5 (:pr:`10688`)
  - Update landing page (:pr:`10674`) `Matthew Rocklin`_
  - Make meta check simpler in dispatch (:pr:`10638`) `Patrick Hoefler`_
  - Pin PR Labeler (:pr:`10675`) `Matthew Rocklin`_
  - Reorganize docs index a bit (:pr:`10669`) `Matthew Rocklin`_
  - Bump ``actions/setup-java`` from 3 to 4 (:pr:`10667`)
  - Bump ``conda-incubator/setup-miniconda`` from 2.2.0 to 3.0.1 (:pr:`10668`)
  - Bump ``xarray-contrib/issue-from-pytest-log`` from 1.2.6 to 1.2.7 (:pr:`10666`)
  - Fix ``test_categorize_info`` with nightly ``pyarrow`` (:pr:`10662`) `James Bourbeau`_
  - Rewrite ``test_subprocess_cluster_does_not_depend_on_logging`` (:pr-distributed:`8409`) `Hendrik Makait`_
  - Avoid ``RecursionError`` when failing to pickle key in ``SpillBuffer`` and using ``tblib=3`` (:pr-distributed:`8404`) `Hendrik Makait`_
  - Allow tasks to override ``is_rootish`` heuristic (:pr-distributed:`8412`) `Hendrik Makait`_
  - Remove GPU executor (:pr-distributed:`8399`) `Hendrik Makait`_
  - Do not rely on logging for subprocess cluster (:pr-distributed:`8398`) `Hendrik Makait`_
  - Update gpuCI ``RAPIDS_VER`` to ``24.02`` (:pr-distributed:`8384`)
  - Bump ``actions/setup-python`` from 4 to 5 (:pr-distributed:`8396`)
  - Ensure output chunks in P2P rechunking are distributed homogeneously (:pr-distributed:`8207`) `Florian Jetter`_
  - Trivial: fix typo (:pr-distributed:`8395`) `crusaderky`_
  - Bump ``JamesIves/github-pages-deploy-action`` from 4.4.3 to 4.5.0 (:pr-distributed:`8387`)
  - Bump ``conda-incubator/setup-miniconda`` from 3.0.0 to 3.0.1 (:pr-distributed:`8388`)

.. _v2023.12.0:

Released on December 1, 2023

Highlights
^^^^^^^^^^

PipInstall restart and environment variables
""""""""""""""""""""""""""""""""""""""""""""

The ``distributed.PipInstall`` plugin now has more robust restart logic and also supports
`environment variables <https://pip.pypa.io/en/stable/reference/requirements-file-format/#using-environment-variables>`_.

The example below shows how to use the ``distributed.PipInstall`` plugin and a ``TOKEN`` environment
variable to securely install a package from a private repository:

.. code:: python

    from dask.distributed import Client, PipInstall

    client = Client()  # or Client("<scheduler-address>") to connect to an existing cluster
    # ${TOKEN} stays a literal placeholder here; it is not filled in on the client.
    plugin = PipInstall(packages=["private_package@git+https://${TOKEN}@github.com/dask/private_package.git"])
    client.register_plugin(plugin)

See :pr-distributed:`8374`, :pr-distributed:`8357`, and :pr-distributed:`8343` from `Hendrik Makait`_ for details.

Bokeh 3.3.0 compatibility
"""""""""""""""""""""""""

This release contains compatibility updates for using ``bokeh>=3.3.0`` with proxied Dask dashboards.
Previously, the contents of dashboard plots wouldn't be displayed.

See :pr-distributed:`8347` and :pr-distributed:`8381` from `Jacob Tomlinson`_ for details.

.. dropdown:: Additional changes

  - ``network`` marker to ``test_pyarrow_filesystem_option_real_data`` (:pr:`10653`) `Richard (Rick) Zamora`_
  - (:pr:`10656`) `Charles Blackmon-Luca`_
  - ``pandas`` offsets deterministically (:pr:`10643`) `Patrick Hoefler`_
  - ``pd.NA`` functionality (:pr:`10640`) `Patrick Hoefler`_
  - ``RAPIDS_VER`` to ``24.02`` (:pr:`10636`)
  - ``array.linalg.norm`` (:pr:`10556`) `joanrue`_
  - ``axis`` argument to ``DataFrame.clip`` and ``Series.clip`` (:pr:`10616`) `Richard (Rick) Zamora`_
  - (:pr:`10630`) `Florian Jetter`_
  - ``test_resources_reset_after_cancelled_task`` (:pr-distributed:`8373`) `crusaderky`_
  - (:pr-distributed:`8376`) `Charles Blackmon-Luca`_
  - ``conda-incubator/setup-miniconda`` from 2.2.0 to 3.0.0 (:pr-distributed:`8372`)
  - (:pr-distributed:`8358`) `Hendrik Makait`_
  - ``O(1)`` access for ``/info/task/`` endpoint (:pr-distributed:`8363`) `crusaderky`_
  - (:pr-distributed:`8362`) `crusaderky`_
  - ``int`` metrics to ``float`` (:pr-distributed:`8361`) `crusaderky`_
  - (:pr-distributed:`8355`) `Florian Jetter`_
  - ``context_meter.add_callback`` (:pr-distributed:`8360`) `crusaderky`_
  - ``sync()`` propagates contextvars (:pr-distributed:`8354`) `crusaderky`_
  - ``captured_context_meter`` (:pr-distributed:`8352`) `crusaderky`_
  - ``context_meter.clear_callbacks`` (:pr-distributed:`8353`) `crusaderky`_
  - ``@log_errors`` decorator (:pr-distributed:`8351`) `crusaderky`_
  - ``test_statistical_profiling_cycle`` (:pr-distributed:`8356`) `Florian Jetter`_
  - (:pr-distributed:`8350`) `crusaderky`_
  - ``Client.register_plugin``'s ``idempotent`` argument with ``.idempotent`` attribute on plugins (:pr-distributed:`8342`) `Hendrik Makait`_
  - (:pr-distributed:`8346`) `Hendrik Makait`_
  - ``pyarrow-hotfix`` on ``mindeps-pandas`` CI (:pr-distributed:`8344`) `Hendrik Makait`_
  - ``scheduler.py::TaskState`` class (:pr-distributed:`8331`) `Miles`_
  - ``pre-commit`` linters (:pr-distributed:`8340`) `crusaderky`_
  - ``dtype=object`` (:pr-distributed:`8339`) `Peter Andreas Entschev`_
  - ``Cluster`` / ``SpecCluster`` calls to async close methods (:pr-distributed:`8327`) `Peter Andreas Entschev`_

.. _v2023.11.0:

Released on November 10, 2023

Highlights
^^^^^^^^^^

Zero-copy P2P Array Rechunking
""""""""""""""""""""""""""""""

Users should see significant performance improvements when using in-memory P2P array rechunking. This is because the underlying data buffers are no longer copied.

The example below shows the kind of workload used to compare the performance of different rechunking methods.

.. code:: python

    import dask
    import dask.array as da

    shape = (30_000, 6_000, 150)  # 201.17 GiB
    input_chunks = (60, -1, -1)  # 411.99 MiB
    output_chunks = (-1, 6, -1)  # 205.99 MiB

    arr = da.random.random(shape, chunks=input_chunks)
    with dask.config.set({
        "array.rechunk.method": "p2p",
        "distributed.p2p.disk": True,  # set to False for in-memory P2P rechunking
    }):
        (
            arr
            .rechunk(output_chunks)
            .sum()
            .compute()
        )

.. image:: images/changelog/2023110-rechunking-disk-perf.png
   :width: 75%
   :align: center
   :alt: A comparison of rechunking performance between the different methods (tasks, p2p with disk, and p2p without disk) on different cluster sizes. The graph shows that p2p without disk is up to 60% faster than the default task-based approach.

See :pr-distributed:`8282`, :pr-distributed:`8318`, :pr-distributed:`8321` from `crusaderky`_ and
:pr-distributed:`8322` from `Hendrik Makait`_ for details.

Deprecating PyArrow <14.0.1
"""""""""""""""""""""""""""

``pyarrow<14.0.1`` usage is deprecated starting in this release. All users are encouraged to upgrade their
version of ``pyarrow`` or install ``pyarrow-hotfix``. See `this CVE <https://www.cve.org/CVERecord?id=CVE-2023-47248>`_
for full details.

See :pr:`10622` from `Florian Jetter`_ for details.

Improved PyArrow filesystem for Parquet
"""""""""""""""""""""""""""""""""""""""

Using ``filesystem="arrow"`` when reading Parquet datasets now properly infers the correct cloud region
when accessing remote, cloud-hosted data.

See :pr:`10590` from `Richard (Rick) Zamora`_ for details.

Improve Type Reconciliation in P2P Shuffling
""""""""""""""""""""""""""""""""""""""""""""

See :pr-distributed:`8332` from `Hendrik Makait`_ for details.

.. dropdown:: Additional changes

  - Fix sporadic failure of ``test_dataframe::test_quantile`` (:pr:`10625`) `Miles`_
  - Bump minimum ``click`` to ``>=8.1`` (:pr:`10623`) `Jacob Tomlinson`_
  - Refactor ``test_quantile`` (:pr:`10620`) `Miles`_
  - Avoid ``PerformanceWarning`` for fragmented DataFrame (:pr:`10621`) `Patrick Hoefler`_
  - Generalize computation of ``NEW_*_VER`` in GPU CI updating workflow (:pr:`10610`) `Charles Blackmon-Luca`_
  - Switch to newer GPU CI images (:pr:`10608`) `Charles Blackmon-Luca`_
  - Remove double slash in ``fsspec`` tests (:pr:`10605`) `Mario Šaško`_
  - Reenable ``test_ucx_config_w_env_var`` (:pr-distributed:`8272`) `Peter Andreas Entschev`_
  - Don't share ``host_array`` when receiving from network (:pr-distributed:`8308`) `crusaderky`_
  - Generalize computation of ``NEW_*_VER`` in GPU CI updating workflow (:pr-distributed:`8319`) `Charles Blackmon-Luca`_
  - Switch to newer GPU CI images (:pr-distributed:`8316`) `Charles Blackmon-Luca`_
  - Minor updates to shuffle dashboard (:pr-distributed:`8315`) `Matthew Rocklin`_
  - Don't use ``bytearray().join`` (:pr-distributed:`8312`) `crusaderky`_
  - Reuse identical shuffles in P2P hash join (:pr-distributed:`8306`) `Hendrik Makait`_

.. _v2023.10.1:

Released on October 27, 2023

Highlights
^^^^^^^^^^

Python 3.12
"""""""""""

This release adds official support for Python 3.12.

See :pr:`10544` and :pr-distributed:`8223` from `Thomas Grainger`_ for details.

.. dropdown:: Additional changes

  - Avoid splitting parquet files to row groups as aggressively (:pr:`10600`) `Matthew Rocklin`_
  - Speed up ``normalize_chunks`` for common case (:pr:`10579`) `Martin Durant`_
  - Use Python 3.11 for upstream and doctests CI build (:pr:`10596`) `Thomas Grainger`_
  - Bump ``actions/checkout`` from 4.1.0 to 4.1.1 (:pr:`10592`)
  - Switch to PyTables ``HEAD`` (:pr:`10580`) `Thomas Grainger`_
  - Remove ``numpy.core`` warning filter, link to issue on ``pyarrow`` caused ``BlockManager`` warning (:pr:`10571`) `Thomas Grainger`_
  - Unignore and fix deprecated freq aliases (:pr:`10577`) `Thomas Grainger`_
  - Move ``register_assert_rewrite`` earlier in ``conftest`` to fix warnings (:pr:`10578`) `Thomas Grainger`_
  - Upgrade ``versioneer`` to 0.29 (:pr:`10575`) `Thomas Grainger`_
  - Change ``test_concat_categorical`` to be non-strict (:pr:`10574`) `Thomas Grainger`_
  - Enable SciPy tests with NumPy 2.0 `Thomas Grainger`_
  - Enable tests for scikit-image with NumPy 2.0 (:pr:`10569`) `Thomas Grainger`_
  - Fix upstream build (:pr:`10549`) `Thomas Grainger`_
  - Add optimized code paths for ``drop_duplicates`` (:pr:`10542`) `Richard (Rick) Zamora`_
  - Support ``cudf`` backend in ``dd.DataFrame.sort_values`` (:pr:`10551`) `Richard (Rick) Zamora`_
  - Rename "GIL Contention" to just GIL in chart labels (:pr-distributed:`8305`) `Matthew Rocklin`_
  - Bump ``actions/checkout`` from 4.1.0 to 4.1.1 (:pr-distributed:`8299`)
  - Fix dashboard (:pr-distributed:`8293`) `Hendrik Makait`_
  - ``@log_errors`` for async tasks (:pr-distributed:`8294`) `crusaderky`_
  - Annotations and better tests for ``serialize_bytes`` (:pr-distributed:`8300`) `crusaderky`_
  - Temporarily xfail ``test_decide_worker_coschedule_order_neighbors`` to unblock CI (:pr-distributed:`8298`) `James Bourbeau`_
  - Skip ``xdist`` and ``matplotlib`` in code samples (:pr-distributed:`8290`) `Matthew Rocklin`_
  - Use ``numpy._core`` on ``numpy>=2.dev0`` (:pr-distributed:`8291`) `Thomas Grainger`_
  - Fix calculation of ``MemoryShardsBuffer.bytes_read`` (:pr-distributed:`8289`) `crusaderky`_
  - Allow P2P to store data in-memory (:pr-distributed:`8279`) `Hendrik Makait`_
  - Upgrade ``versioneer`` to 0.29 (:pr-distributed:`8288`) `Thomas Grainger`_
  - Allow ``ResourceLimiter`` to be unlimited (:pr-distributed:`8276`) `Hendrik Makait`_
  - Run ``pre-commit`` autoupdate (:pr-distributed:`8281`) `Thomas Grainger`_
  - Annotate instance variables for P2P layers (:pr-distributed:`8280`) `Hendrik Makait`_
  - Remove worker gracefully should not mark tasks as suspicious (:pr-distributed:`8234`) `Thomas Grainger`_
  - Add signal handling to ``dask spec`` (:pr-distributed:`8261`) `Thomas Grainger`_
  - Add typing for ``sync`` (:pr-distributed:`8275`) `Hendrik Makait`_
  - Better annotations for shuffle offload (:pr-distributed:`8277`) `crusaderky`_
  - Test minimum versions for p2p shuffle (:pr-distributed:`8270`) `crusaderky`_
  - Run coverage on test failures (:pr-distributed:`8269`) `crusaderky`_
  - Use ``aiohttp`` with extensions (:pr-distributed:`8274`) `Thomas Grainger`_

.. _v2023.10.0:

Released on October 13, 2023

Highlights
^^^^^^^^^^

Reduced memory pressure for multi array reductions
""""""""""""""""""""""""""""""""""""""""""""""""""

This release contains major updates to Dask's task graph scheduling logic. The updates here significantly reduce memory pressure on array reductions. We anticipate this will have a strong impact on the array computing community.

See :pr:`10535` from `Florian Jetter`_ for details.

Improved P2P shuffling robustness
"""""""""""""""""""""""""""""""""

Several updates make P2P shuffling much more robust and less likely to fail.

See :pr-distributed:`8262`, :pr-distributed:`8264`, :pr-distributed:`8242`, :pr-distributed:`8244`,
and :pr-distributed:`8235` from `Hendrik Makait`_ and :pr-distributed:`8124` from
`Charles Blackmon-Luca`_ for details.

Reduced scheduler CPU load for large graphs
"""""""""""""""""""""""""""""""""""""""""""

Users should see reduced CPU load on their scheduler when computing large task graphs.

See :pr-distributed:`8238` and :pr:`10547` from `Florian Jetter`_ and
:pr-distributed:`8240` from `crusaderky`_ for details.

.. dropdown:: Additional changes

  - Dispatch the ``partd.Encode`` class used for disk-based shuffling (:pr:`10552`) `Richard (Rick) Zamora`_
  - Add documentation for hive partitioning (:pr:`10454`) `Richard (Rick) Zamora`_
  - Add typing to ``dask.order`` (:pr:`10553`) `Florian Jetter`_
  - Allow passing ``index_col=False`` in ``dd.read_csv`` (:pr:`9961`) `Michael Leslie`_
  - Tighten ``HighLevelGraph`` annotations (:pr:`10524`) `crusaderky`_
  - Support for latest ``ipykernel``/``ipywidgets`` (:pr-distributed:`8253`) `crusaderky`_
  - Check minimal ``pyarrow`` version for P2P merge (:pr-distributed:`8266`) `Hendrik Makait`_
  - Support for Python 3.12 (:pr-distributed:`8223`) `Thomas Grainger`_
  - Use ``memoryview.nbytes`` when warning on large graph send (:pr-distributed:`8268`) `crusaderky`_
  - Run tests without ``gilknocker`` (:pr-distributed:`8263`) `crusaderky`_
  - Disable ipv6 on MacOS CI (:pr-distributed:`8254`) `crusaderky`_
  - Clean up redundant minimum versions (:pr-distributed:`8251`) `crusaderky`_
  - Clean up use of ``BARRIER_PREFIX`` in scheduler plugin (:pr-distributed:`8252`) `crusaderky`_
  - Improve shuffle run handling in P2P's worker plugin (:pr-distributed:`8245`) `Hendrik Makait`_
  - Explicitly set ``charset=utf-8`` (:pr-distributed:`8250`) `crusaderky`_
  - Typing tweaks to :pr-distributed:`8239` (:pr-distributed:`8247`) `crusaderky`_
  - Simplify scheduler assertion (:pr-distributed:`8246`) `crusaderky`_
  - Improve typing (:pr-distributed:`8239`) `Hendrik Makait`_
  - Respect cgroups v2 "low" memory limit (:pr-distributed:`8243`) `Samantha Hughes`_
  - Fix ``PackageInstall`` by making it a scheduler plugin (:pr-distributed:`8142`) `Hendrik Makait`_
  - Xfail ``test_ucx_config_w_env_var`` (:pr-distributed:`8241`) `crusaderky`_
  - ``SpecCluster`` resilience to broken workers (:pr-distributed:`8233`) `crusaderky`_
  - Suppress ``SpillBuffer`` stack traces for cancelled tasks (:pr-distributed:`8232`) `crusaderky`_
  - Update annotations after stringification changes (:pr-distributed:`8195`) `crusaderky`_
  - Reduce max recursion depth of profile (:pr-distributed:`8224`) `crusaderky`_
  - Offload deeply nested objects (:pr-distributed:`8214`) `crusaderky`_
  - Fix flaky ``test_close_connections`` (:pr-distributed:`8231`) `crusaderky`_
  - Fix flaky ``test_popen_timeout`` (:pr-distributed:`8229`) `crusaderky`_
  - Fix flaky ``test_adapt_then_manual`` (:pr-distributed:`8228`) `crusaderky`_
  - Prevent collisions in ``SpillBuffer`` (:pr-distributed:`8226`) `crusaderky`_
  - Allow ``retire_workers`` to run concurrently (:pr-distributed:`8056`) `Florian Jetter`_
  - Fix HTML repr for ``TaskState`` objects (:pr-distributed:`8188`) `Florian Jetter`_
  - Fix ``AttributeError`` for ``builtin_function_or_method`` in ``profile.py`` (:pr-distributed:`8181`) `Florian Jetter`_
  - Fix flaky ``test_spans`` (v2) (:pr-distributed:`8222`) `crusaderky`_

.. _v2023.9.3:

Released on September 29, 2023

Highlights
^^^^^^^^^^

Restore previous configuration override behavior
""""""""""""""""""""""""""""""""""""""""""""""""

The 2023.9.2 release introduced an unintentional breaking change in
how configuration options are overridden in ``dask.config.get`` with
the ``override_with=`` keyword (see :issue:`10519`).
This release restores the previous behavior.

See :pr:`10521` from `crusaderky`_ for details.
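
A minimal sketch of the restored semantics, based on the documented behavior of ``dask.config.get``: with ``override_with=None`` the configured value is returned, and any other value is passed straight back. The ``array.chunk-size`` key and the values used are only examples.

.. code:: python

    import dask

    with dask.config.set({"array.chunk-size": "128MiB"}):
        # override_with=None: fall through to the configured value.
        configured = dask.config.get("array.chunk-size", override_with=None)
        # Any other value is returned as-is, ignoring the configuration.
        overridden = dask.config.get("array.chunk-size", override_with="64MiB")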

Complex dtypes in Dask Array reductions
"""""""""""""""""""""""""""""""""""""""

This release includes improved support for using common reductions
in Dask Array (e.g. ``var``, ``std``, ``moment``) with complex dtypes.

See :pr:`10009` from `wkrasnicki`_ for details.
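
One way to check this support is to compare a chunked reduction against NumPy. This sketch assumes NumPy and Dask are installed; the values are arbitrary, and ``chunks=2`` is chosen so the reduction has to combine partial results from several chunks.

.. code:: python

    import numpy as np
    import dask.array as da

    x = np.array([1 + 2j, 3 - 1j, -2 + 0.5j, 4j, 0.5 - 3j, 2 + 2j])
    dx = da.from_array(x, chunks=2)  # three chunks of two elements each

    var_dask = dx.var().compute()
    std_dask = dx.std().compute()

    # NumPy defines the variance of complex data as mean(abs(x - x.mean())**2).
    print(np.isclose(var_dask, x.var()), np.isclose(std_dask, x.std()))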

.. dropdown:: Additional changes

  - Bump ``actions/checkout`` from 4.0.0 to 4.1.0 (:pr:`10532`)
  - Match ``pandas`` reverting ``apply`` deprecation (:pr:`10531`) `James Bourbeau`_
  - Update gpuCI ``RAPIDS_VER`` to ``23.12`` (:pr:`10526`)
  - Temporarily skip failing tests with ``fsspec==2023.9.1`` (:pr:`10520`) `James Bourbeau`_

.. _v2023.9.2:

Released on September 15, 2023

Highlights
^^^^^^^^^^

P2P shuffling now raises when outdated PyArrow is installed
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

Previously, the default shuffling method would silently fall back from P2P
to task-based shuffling if an older version of ``pyarrow`` was installed.
Now we raise an informative error with the minimum required ``pyarrow``
version for P2P instead of silently falling back.

See :pr:`10496` from `Hendrik Makait`_ for details.

Deprecation cycle for admin.traceback.shorten
"""""""""""""""""""""""""""""""""""""""""""""

The 2023.9.0 release modified the ``admin.traceback.shorten`` configuration option
without introducing a deprecation cycle. This resulted in failures to create Dask
clusters in some cases. This release introduces a deprecation cycle for this configuration
change.

See :pr:`10509` from `crusaderky`_ for details.

.. dropdown:: Additional changes

  - Avoid materializing all iterators in ``delayed`` tasks (:pr:`10498`) `James Bourbeau`_
  - Overhaul deprecations system in ``dask.config`` (:pr:`10499`) `crusaderky`_
  - Remove unnecessary check in ``timeseries`` (:pr:`10447`) `Patrick Hoefler`_
  - Use ``register_plugin`` in tests (:pr:`10503`) `James Bourbeau`_
  - Make ``preserve_index`` explicit in ``pyarrow_schema_dispatch`` (:pr:`10501`) `Hendrik Makait`_
  - Add ``**kwargs`` support for ``pyarrow_schema_dispatch`` (:pr:`10500`) `Hendrik Makait`_
  - Centralize and type ``no_default`` (:pr:`10495`) `crusaderky`_

.. _v2023.9.1:

Released on September 6, 2023

.. note::

   This is a hotfix release that fixes a P2P shuffling bug introduced in the 2023.9.0
   release (see :pr:`10493`).

Enhancements
^^^^^^^^^^^^

- (:pr:`10485`) `crusaderky`_
- ``None`` in ``DASK_`` environment variables (:pr:`10487`) `crusaderky`_

Bug Fixes
^^^^^^^^^

- ``_partitions`` dtype in ``meta`` for ``DataFrame.set_index`` and ``DataFrame.sort_values`` (:pr:`10493`) `Hendrik Makait`_
- ``cached_property`` decorators in ``derived_from`` (:pr:`10490`) `Lawrence Mitchell`_

Maintenance
^^^^^^^^^^^

- ``actions/checkout`` from 3.6.0 to 4.0.0 (:pr:`10492`)
- ``import distributed`` (:pr:`10484`) `crusaderky`_

.. _v2023.9.0:

Released on September 1, 2023

Bug Fixes
^^^^^^^^^

- ``np.int64`` in keys (:pr:`10483`) `crusaderky`_
- ``_partitions`` dtype in ``meta`` for shuffling (:pr:`10462`) `Hendrik Makait`_
- (:pr:`10456`) `crusaderky`_

Documentation
^^^^^^^^^^^^^

- ``p2p`` shuffle option to DataFrame docs (:pr:`10477`) `Patrick Hoefler`_

Maintenance
^^^^^^^^^^^

- ``pandas=2.1.0`` (:pr:`10488`) `Patrick Hoefler`_
- ``pandas=2.1.0`` (:pr:`10439`) `Patrick Hoefler`_
- ``pytest-timeout`` (:pr:`10482`) `crusaderky`_
- ``actions/checkout`` from 3.5.3 to 3.6.0 (:pr:`10470`)

.. _v2023.8.1:

Released on August 18, 2023

Enhancements
^^^^^^^^^^^^

- ``cpu_count`` (:pr:`10419`) `Johan Olsson`_
- ``groupby`` with ``sort=True`` and ``split_out>1`` (:pr:`10425`) `Richard (Rick) Zamora`_
- ``DataFrame.enforce_runtime_divisions`` method (:pr:`10404`) `Richard (Rick) Zamora`_
- ``mode="x"`` with a ``single_file=True`` for Dask DataFrame ``to_csv`` (:pr:`10443`) `Genevieve Buckley`_

Bug Fixes
^^^^^^^^^

- ``ValueError`` when running ``to_csv`` in append mode with ``single_file`` as ``True`` (:pr:`10441`) `Ben`_

Maintenance
^^^^^^^^^^^

- ``types_mapper`` to ``from_pyarrow_table_dispatch`` for ``pandas`` (:pr:`10446`) `Richard (Rick) Zamora`_

.. _v2023.8.0:
Released on August 4, 2023

Enhancements
^^^^^^^^^^^^

- ``make_timeseries`` performance regression (:pr:`10428`) `Irina Truong`_

Documentation
^^^^^^^^^^^^^

- ``distributed.print`` to debugging docs (:pr:`10435`) `James Bourbeau`_
- … (:pr:`9941`) `Chiara Marmo`_

Maintenance
^^^^^^^^^^^

- ``license`` metadata (:pr:`10437`) `John A Kirkham`_
- ``dask[array]`` in ``dask[dataframe]`` (:pr:`10357`) `John A Kirkham`_
- ``RAPIDS_VER`` to 23.10 (:pr:`10427`)
- … (:pr:`10426`) `Hendrik Makait`_
- … (:pr:`10424`) `Hendrik Makait`_
- ``pandas`` and ``pyarrow`` (:pr:`10412`) `Irina Truong`_

.. _v2023.7.1:

Released on July 20, 2023

.. note::

   This release updates Dask DataFrame to automatically convert
   text data using ``object`` data types to ``string[pyarrow]``
   if ``pandas>=2`` and ``pyarrow>=12`` are installed.

   This should significantly reduce memory consumption and improve computation performance in many workflows that deal with text data.

   You can disable this change by setting the ``dataframe.convert-string``
   configuration value to ``False``:

   .. code-block:: python

       import dask

       dask.config.set({"dataframe.convert-string": False})
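
``dask.config.set`` can also be used as a context manager, which limits the change to a single block instead of the whole session. The following is a minimal sketch, assuming a Dask release that recognizes the ``dataframe.convert-string`` key; the variable names ``previous``, ``inside``, and ``after`` are illustrative only:

```python
import dask

# Record the current value of the option (None if the key is not set)
previous = dask.config.get("dataframe.convert-string", None)

# Inside the ``with`` block the option is False; Dask restores the
# previous value automatically when the block exits.
with dask.config.set({"dataframe.convert-string": False}):
    inside = dask.config.get("dataframe.convert-string")
    # ... build and compute DataFrame collections here ...

# After the block, the option is back to what it was before
after = dask.config.get("dataframe.convert-string", None)
```

Scoping the option this way leaves the automatic string conversion in place for any other code running in the same process.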

Enhancements
^^^^^^^^^^^^

- ``pyarrow`` strings if proper dependencies are installed (:pr:`10400`) `James Bourbeau`_
- ``repartition`` before shuffle for ``p2p`` (:pr:`10421`) `Patrick Hoefler`_
- … (:pr:`10392`) `Irina Truong`_
- ``dask.bag.Bag.random_sample`` (:pr:`10356`) `crusaderky`_
- ``ValueError`` for invalid time units (:pr:`10408`) `Nat Tabris`_
- ``repartition`` a no-op when divisions match (divisions provided as a list) (:pr:`10395`) `Nicolas Grandemange`_

Bug Fixes
^^^^^^^^^

- ``dataframe.convert-string`` in ``read_parquet`` token (:pr:`10411`) `James Bourbeau`_
- ``dtype`` is lost when concatenating ``MultiIndex`` (:pr:`10407`) `Irina Truong`_
- ``FutureWarning: The provided callable...`` (:pr:`10405`) `Irina Truong`_
- ``read_parquet`` (:pr:`10353`) `Richard (Rick) Zamora`_
- ``concat`` ignoring DataFrame without columns (:pr:`10359`) `Patrick Hoefler`_

.. _v2023.7.0:

Released on July 7, 2023

Enhancements
^^^^^^^^^^^^

- … (:pr:`10380`) `Jacob Tomlinson`_

Bug Fixes
^^^^^^^^^

- ``_clean_ipython_traceback`` (:pr:`10385`) `Alexander Clausen`_
- ``df`` is immutable after ``from_pandas`` (:pr:`10383`) `Patrick Hoefler`_
- ``inplace`` in ``Series.rename`` (:pr:`10313`) `Patrick Hoefler`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`10377`) `Swayam Patil`_

Maintenance
^^^^^^^^^^^

- ``astype`` implementation (:pr:`10393`) `Patrick Hoefler`_
- ``test_first_and_last`` to accommodate deprecated ``last`` (:pr:`10373`) `James Bourbeau`_
- ``level`` to ``create_merge_tree`` (:pr:`10391`) `Patrick Hoefler`_
- ``scipy.stats.chisquare`` docstring (:pr:`10382`) `Doug Davis`_

.. _v2023.6.1:

Released on June 26, 2023

Enhancements
^^^^^^^^^^^^

- ``clip_lower`` and ``clip_upper`` (:pr:`10371`) `Patrick Hoefler`_
- ``DataFrame.set_index(..., sort=False)`` (:pr:`10342`) `Miles`_
- … (:pr:`10354`) `Irina Truong`_
- ``pyarrow.Table`` conversion (:pr:`10312`) `Richard (Rick) Zamora`_
- … (:pr:`10344`) `Hendrik Makait`_
- … (:pr:`10336`) `Hendrik Makait`_

Bug Fixes
^^^^^^^^^

- ``header`` passed to ``read_csv`` (:pr:`10355`) `GALI PREM SAGAR`_
- ``dropna`` and ``observed`` in ``GroupBy.var`` and ``GroupBy.std`` (:pr:`10350`) `Patrick Hoefler`_
- ``H5FD_lock`` error when writing to hdf with distributed client (:pr:`10309`) `Irina Truong`_
- ``total_mem_usage`` of ``bag.map()`` (:pr:`10341`) `Irina Truong`_

Deprecations
^^^^^^^^^^^^

- ``DataFrame.fillna``/``Series.fillna`` with ``method`` (:pr:`10349`) `Irina Truong`_
- ``DataFrame.first`` and ``Series.first`` (:pr:`10352`) `Irina Truong`_

Maintenance
^^^^^^^^^^^

- ``numpy.compat`` (:pr:`10370`) `Irina Truong`_
- … (:pr:`10367`) `Irina Truong`_
- ``pyarrow_table_dispatch`` functions (:pr:`10364`) `Richard (Rick) Zamora`_
- ``try``/``except`` in ``isna`` (:pr:`10363`) `Patrick Hoefler`_
- ``mypy`` support for ``numpy`` 1.25 (:pr:`10362`) `crusaderky`_
- ``actions/checkout`` from 3.5.2 to 3.5.3 (:pr:`10348`)
- ``numba`` in upstream build (:pr:`10330`) `James Bourbeau`_
- ``pandas``/``numpy``/``scipy`` (:pr:`10346`) `Matthew Roeschke`_
- … (:pr:`10343`) `Hendrik Makait`_

.. _v2023.6.0:

Released on June 9, 2023

Enhancements
^^^^^^^^^^^^

- ``not in`` predicate support to ``read_parquet`` (:pr:`10320`) `Richard (Rick) Zamora`_

Bug Fixes
^^^^^^^^^

- ``value_counts`` (:pr:`10323`) `Irina Truong`_
- ``describe`` ``top`` and ``freq`` values (:pr:`10319`) `James Bourbeau`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`10332`) `Sarah Charlotte Johnson`_

Maintenance
^^^^^^^^^^^

- ``numba`` and ``sparse`` on Python 3.11 (:pr:`10329`) `Thomas Grainger`_
- ``numpy.find_common_type`` warning ignore (:pr:`10311`) `James Bourbeau`_
- ``RAPIDS_VER`` to 23.08 (:pr:`10310`)

.. _v2023.5.1:

Released on May 26, 2023

.. note::

   This release drops support for Python 3.8. As of this release,
   Dask supports Python 3.9, 3.10, and 3.11.
   See `this community issue <https://github.com/dask/community/issues/315>`_
   for more details.

Enhancements
^^^^^^^^^^^^

- … (:pr:`10295`) `Thomas Grainger`_
- … (:pr:`10294`) `Jacob Tomlinson`_
- ``dd.to_datetime`` for GPU-backed collections, introduce ``get_meta_library`` utility (:pr:`9881`) `Charles Blackmon-Luca`_
- ``na_action`` to ``DataFrame.map`` (:pr:`10305`) `Patrick Hoefler`_
- ``TypeError`` in ``DataFrame.nsmallest`` and ``DataFrame.nlargest`` when ``columns`` is not given (:pr:`10301`) `Patrick Hoefler`_
- ``sizeof`` for ``pd.MultiIndex`` (:pr:`10230`) `Patrick Hoefler`_
- ``DataFrame`` methods (:pr:`10261`) `Patrick Hoefler`_
- ``numeric_only`` support to ``DataFrame.idxmin`` and ``DataFrame.idxmax`` (:pr:`10253`) `Patrick Hoefler`_
- ``numeric_only`` support for ``DataFrame.quantile`` (:pr:`10259`) `Patrick Hoefler`_
- ``numeric_only=False`` in ``DataFrame.std`` (:pr:`10251`) `Patrick Hoefler`_
- ``numeric_only=False`` for ``GroupBy.cumprod`` and ``GroupBy.cumsum`` (:pr:`10262`) `Patrick Hoefler`_
- ``numeric_only`` for ``skew`` and ``kurtosis`` (:pr:`10258`) `Patrick Hoefler`_
- ``mask`` and ``where`` should accept a ``callable`` (:pr:`10289`) `Irina Truong`_
- ``Categorical`` to ``pa.dictionary`` in ``read_parquet`` (:pr:`10285`) `Patrick Hoefler`_

Bug Fixes
^^^^^^^^^

- … (:pr:`10318`) `crusaderky`_
- … (:pr:`10157`) `Hendrik Makait`_
- ``drop`` to support mismatched partitions (:pr:`10300`) `James Bourbeau`_
- ``divisions`` construction for ``to_timestamp`` (:pr:`10304`) `Patrick Hoefler`_
- ``ExtensionDtype`` raising in ``Series`` reduction operations (:pr:`10149`) `Patrick Hoefler`_
- ``da.random`` interface (:pr:`10247`) `Eray Aslan`_
- ``da.coarsen`` doesn't trim an empty chunk in meta (:pr:`10281`) `Irina Truong`_
- ``engine="pyarrow"`` in ``read_csv`` (:pr:`10280`) `Patrick Hoefler`_

Documentation
^^^^^^^^^^^^^

- ``meta_from_array`` to API docs (:pr:`10306`) `Ruth Comer`_
- … (:pr:`10296`) `Sarah Charlotte Johnson`_
- … (:pr:`10288`) `Matthew Rocklin`_

Maintenance
^^^^^^^^^^^

- ``anaconda-client`` from conda-forge when uploading conda nightlies (:pr:`10316`) `Charles Blackmon-Luca`_
- ``isort`` to add ``from __future__ import annotations`` (:pr:`10314`) `Thomas Grainger`_
- ``pandas`` ``Series.__getitem__`` deprecation in tests (:pr:`10308`) `James Bourbeau`_
- ``numpy.find_common_type`` warning from ``pandas`` (:pr:`10307`) `James Bourbeau`_
- ``DataFrame.__setitem__`` does not modify ``df`` inplace (:pr:`10223`) `Patrick Hoefler`_
- ``dropna`` in ``value_counts`` (:pr:`10299`) `Patrick Hoefler`_
- ``pytest-cov`` to ``test`` extra (:pr:`10271`) `James Bourbeau`_

.. _v2023.5.0:

Released on May 12, 2023

Enhancements
^^^^^^^^^^^^

- ``numeric_only=False`` for ``GroupBy.corr`` and ``GroupBy.cov`` (:pr:`10264`) `Patrick Hoefler`_
- ``numeric_only=False`` in ``DataFrame.var`` (:pr:`10250`) `Patrick Hoefler`_
- ``numeric_only`` support to ``DataFrame.mode`` (:pr:`10257`) `Patrick Hoefler`_
- ``DataFrame.map`` to ``dask.DataFrame`` API (:pr:`10246`) `Patrick Hoefler`_
- ``DataFrame.applymap`` deprecation and all NA ``concat`` behaviour change (:pr:`10245`) `Patrick Hoefler`_
- ``numeric_only=False`` for ``DataFrame.count`` (:pr:`10234`) `Patrick Hoefler`_
- … (:pr:`10163`) `Irina Truong`_
- ``numeric_only=True`` in ``GroupBy.corr`` and ``GroupBy.cov`` (:pr:`10227`) `Patrick Hoefler`_
- ``numeric_only`` support to ``GroupBy.median`` (:pr:`10236`) `Patrick Hoefler`_
- ``mimesis=9`` in ``dask.datasets`` (:pr:`10241`) `James Bourbeau`_
- ``numeric_only`` support to ``min``, ``max`` and ``prod`` (:pr:`10219`) `Patrick Hoefler`_
- ``numeric_only=True`` support for ``GroupBy.cumsum`` and ``GroupBy.cumprod`` (:pr:`10224`) `Patrick Hoefler`_
- ``numeric_only`` keyword (:pr:`10228`) `Patrick Hoefler`_

Bug Fixes
^^^^^^^^^

- ``clone`` + ``from_array`` failure (:pr:`10211`) `crusaderky`_
- … (:pr:`10150`) `Patrick Hoefler`_
- ``numpy=1.25`` (:pr:`10248`) `James Bourbeau`_
- … (:pr:`10184`) `Irina Truong`_
- ``corr`` and ``cov`` on a single-row partition (:pr:`9756`) `Irina Truong`_
- ``test_groupby_numeric_only_supported`` and ``test_groupby_aggregate_categorical_observed`` upstream errors (:pr:`10243`) `Irina Truong`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`10266`) `Matthew Rocklin`_
- ``Index`` API reference (:pr:`10263`) `hotpotato`_

Maintenance
^^^^^^^^^^^

- ``apply`` (:pr:`10256`) `Patrick Hoefler`_
- ``imageio`` version restriction in CI (:pr:`10260`) `Patrick Hoefler`_
- ``DataFrame`` variance methods (:pr:`10252`) `Patrick Hoefler`_
- ``xfail`` ``test_categories`` with ``pyarrow`` strings and ``pyarrow>=12`` (:pr:`10244`) `Irina Truong`_
- ``PYTHON_VER`` 3.8->3.9 (:pr:`10233`) `Charles Blackmon-Luca`_

.. _v2023.4.1:

Released on April 28, 2023

Enhancements
^^^^^^^^^^^^

- ``numeric_only`` support for ``DataFrame.sum`` (:pr:`10194`) `Patrick Hoefler`_
- ``numeric_only=True`` in ``GroupBy`` operations (:pr:`10222`) `Patrick Hoefler`_
- ``DataFrame.__setitem__`` for ``pandas`` 1.4 and up (:pr:`10221`) `Patrick Hoefler`_
- ``Series.apply`` with ``_meta_nonempty`` (:pr:`10212`) `Patrick Hoefler`_
- ``sqlalchemy`` and fix compatibility issues (:pr:`10140`) `Patrick Hoefler`_

Bug Fixes
^^^^^^^^^

- … (:pr:`10225`) `Florian Jetter`_
- ``Index`` meta creation (:pr:`10170`) `Patrick Hoefler`_
- … (:pr:`10169`) `Patrick Hoefler`_
- ``Index`` from ``fastparquet`` to ``object`` dtype (:pr:`10179`) `Patrick Hoefler`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`10232`) `Jacob Tomlinson`_
- ``DataFrame.reduction`` to API docs (:pr:`10229`) `James Bourbeau`_
- ``DataFrame.persist`` to docs and fix links (:pr:`10231`) `Patrick Hoefler`_
- ``GroupBy.transform`` (:pr:`10185`) `Irina Truong`_
- … (:pr:`10189`) `Eray Aslan`_

Maintenance
^^^^^^^^^^^

- ``<2.28`` (:pr:`10216`) `Patrick Hoefler`_
- ``importlib_metadata`` backport (:pr:`10207`) `James Bourbeau`_
- ``xarray`` back to Python 3.11 CI builds (:pr:`10200`) `James Bourbeau`_
- ``mindeps`` build with all optional dependencies (:pr:`10161`) `Charles Blackmon-Luca`_
- ``like`` value for ``array_safe`` in ``percentiles_summary`` (:pr:`10156`) `Charles Blackmon-Luca`_
- ``read_hdf`` (:pr:`10205`) `Thomas Grainger`_
- … (:pr:`10071`) `Charles Blackmon-Luca`_
- … (:pr:`10203`) `Thomas Grainger`_
- ``is_period_dtype`` and ``is_sparse_dtype`` (:pr:`10197`) `Patrick Hoefler`_
- ``actions/checkout`` from 3.5.0 to 3.5.2 (:pr:`10201`)
- ``is_categorical_dtype`` from ``pandas`` (:pr:`10180`) `Patrick Hoefler`_
- ``is_interval_dtype`` and ``is_datetime64tz_dtype`` (:pr:`10188`) `Patrick Hoefler`_

.. _v2023.4.0:

Released on April 14, 2023

Enhancements
^^^^^^^^^^^^

- ``update_defaults`` (:pr:`10159`) `Gabe Joseph`_
- ``list`` and ``get`` a value from dask config (:pr:`9936`) `Irina Truong`_
- ``read_json`` (:pr:`9947`) `Richard (Rick) Zamora`_
- ``GroupBy.dtypes`` (:pr:`10111`) `Irina Truong`_

Bug Fixes
^^^^^^^^^

- ``grouper``-related changes (:pr:`10182`) `Irina Truong`_
- ``GroupBy.cov`` raising for non-numeric grouping column (:pr:`10171`) `Patrick Hoefler`_
- ``Index`` supporting ``numpy`` numeric dtypes (:pr:`10154`) `Irina Truong`_
- ``dtype`` for partitioning columns when read with ``pyarrow`` (:pr:`10115`) `Patrick Hoefler`_
- ``to_hdf`` (:pr:`10123`) `Hendrik Makait`_
- ``None`` column name when checking if columns are all numeric (:pr:`10128`) `Lawrence Mitchell`_
- ``valid_divisions`` when passed a ``tuple`` (:pr:`10126`) `Brian Phillips`_
- ``DataFrame.categorize`` (:pr:`10120`) `Hendrik Makait`_
- … (:pr:`10042`) `Richard (Rick) Zamora`_

Deprecations
^^^^^^^^^^^^

- ``use_nullable_dtypes=`` and add ``dtype_backend=`` (:pr:`10076`) `Irina Truong`_
- ``convert_dtype`` in ``Series.apply`` (:pr:`10133`) `Irina Truong`_

Documentation
^^^^^^^^^^^^^

- ``Generator`` based random number generation (:pr:`10134`) `Eray Aslan`_

Maintenance
^^^^^^^^^^^

- ``dataframe.convert_string`` to ``dataframe.convert-string`` (:pr:`10191`) `Irina Truong`_
- ``python-cityhash`` to CI environments (:pr:`10190`) `Charles Blackmon-Luca`_
- ``scikit-image`` to fix Windows CI (:pr:`10186`) `Patrick Hoefler`_
- ``to_pydatetime`` and ``apply`` (:pr:`10168`) `Patrick Hoefler`_
- ``bokeh<3`` restriction (:pr:`10177`) `James Bourbeau`_
- … (:pr:`10173`) `Patrick Hoefler`_
- ``pyarrow`` CI to fail (:pr:`10176`) `James Bourbeau`_
- ``Generator`` for random number generation in ``dask.array`` (:pr:`10003`) `Eray Aslan`_
- ``peter-evans/create-pull-request`` from 4 to 5 (:pr:`10166`)
- ``modf`` operation in ``test_arithmetic`` (:pr:`10162`) `Irina Truong`_
- ``xarray`` from CI with ``pandas`` 2.0 (:pr:`10153`) `James Bourbeau`_
- ``update_graph`` counting logic in ``test_default_scheduler_on_worker`` (:pr:`10145`) `James Bourbeau`_
- ``pandas`` 2.0 (:pr:`10138`) `James Bourbeau`_
- ``dask/gpu`` from gpuCI update reviewers (:pr:`10135`) `Charles Blackmon-Luca`_
- ``RAPIDS_VER`` to 23.06 (:pr:`10129`)
- ``actions/stale`` from 6 to 8 (:pr:`10121`)
- ``setuptools`` (:pr:`10102`) `Thomas Grainger`_
- ``assert_eq`` checks on Scalar-like objects (:pr:`10125`) `Matthew Rocklin`_
- … (:pr:`10124`) `Thomas Grainger`_
- ``actions/checkout`` from 3.4.0 to 3.5.0 (:pr:`10122`)
- ``test_null_partition_pyarrow`` in ``pyarrow`` CI build (:pr:`10116`) `Irina Truong`_
- … (:pr:`9988`) `Florian Jetter`_
- ``dask.compatibility`` private (:pr:`10114`) `Jacob Tomlinson`_

.. _v2023.3.2:

Released on March 24, 2023

Enhancements
^^^^^^^^^^^^

- ``observed=False`` for ``groupby`` with categoricals (:pr:`10095`) `Irina Truong`_
- ``axis=`` for some ``groupby`` operations (:pr:`10094`) `James Bourbeau`_
- ``axis`` keyword in ``DataFrame.rolling``/``Series.rolling`` is deprecated (:pr:`10110`) `Irina Truong`_
- ``DataFrame._data`` deprecation in ``pandas`` (:pr:`10081`) `Irina Truong`_
- ``importlib_metadata`` backport to avoid CLI ``UserWarning`` (:pr:`10070`) `Thomas Grainger`_
- ``dask.dataframe.read_parquet`` to ``to_parquet`` (:pr:`9981`) `Anton Loukianov`_

Bug Fixes
^^^^^^^^^

- ``dd.shuffle`` in groupby-apply (:pr:`10043`) `Richard (Rick) Zamora`_
- ``pyarrow`` parquet engine (:pr:`10007`) `Richard (Rick) Zamora`_
- ``*_like`` functions (:pr:`10064`) `Doug Davis`_

Documentation
^^^^^^^^^^^^^

- ``to_backend`` methods to API docs (:pr:`10093`) `Lawrence Mitchell`_
- … (:pr:`10065`) `Charles Blackmon-Luca`_

Maintenance
^^^^^^^^^^^

- … (:pr:`10104`) `Thomas Grainger`_
- ``xfail`` ``test_division_or_partition`` with ``pyarrow`` strings active (:pr:`10108`) `Irina Truong`_
- ``xfail`` ``test_different_columns_are_allowed`` with ``pyarrow`` strings active (:pr:`10109`) `Irina Truong`_
- … (:pr:`10113`) `Jacob Tomlinson`_
- ``xfail`` ``test_to_dataframe_optimize_graph`` with ``pyarrow`` strings active (:pr:`10087`) `Irina Truong`_
- ``test_development_guidelines_matches_ci`` on editable install (:pr:`10106`) `Charles Blackmon-Luca`_
- ``xfail`` ``test_dataframe_cull_key_dependencies_materialized`` with ``pyarrow`` strings active (:pr:`10088`) `Irina Truong`_
- ``mimesis`` in CI environments (:pr:`10105`) `Charles Blackmon-Luca`_
- ``ipykernel`` (:pr:`10101`) `Irina Truong`_
- ``ipykernel`` (:pr:`10103`) `Thomas Grainger`_
- ``pyarrow`` build to continue on failures (:pr:`10097`) `James Bourbeau`_
- ``actions/checkout`` from 3.3.0 to 3.4.0 (:pr:`10096`)
- ``test_set_index_on_empty`` with ``pyarrow`` strings active (:pr:`10054`) `Irina Truong`_
- ``xfail`` ``pyarrow`` pickling tests (:pr:`10082`) `James Bourbeau`_
- … (:pr:`10078`) `James Bourbeau`_
- ``xfail`` more ``pyarrow`` tests (:pr:`10066`) `Irina Truong`_
- ``pyarrow_compat`` tests with ``pandas`` 2.0 (:pr:`10063`) `James Bourbeau`_
- ``test_melt`` with ``pyarrow`` strings active (:pr:`10052`) `Irina Truong`_
- ``test_str_accessor`` with ``pyarrow`` strings active (:pr:`10048`) `James Bourbeau`_
- ``test_better_errors_object_reductions`` with ``pyarrow`` strings active (:pr:`10051`) `James Bourbeau`_
- ``test_loc_with_non_boolean_series`` with ``pyarrow`` strings active (:pr:`10046`) `James Bourbeau`_
- ``test_values`` with ``pyarrow`` strings active (:pr:`10050`) `James Bourbeau`_
- ``xfail`` ``test_upstream_packages_installed`` (:pr:`10047`) `James Bourbeau`_

.. _v2023.3.1:

Released on March 10, 2023

Enhancements
^^^^^^^^^^^^

- ``MultiIndex`` (:pr:`10040`) `Irina Truong`_
- ``pyarrow`` strings (:pr:`10000`) `Irina Truong`_
- ``RuntimeWarning`` during array reductions (:pr:`10030`) `James Bourbeau`_
- ``complete`` extras (:pr:`10023`) `James Bourbeau`_
- ``dataframe.convert-string=True`` and ``pandas<2.0`` (:pr:`10033`) `Irina Truong`_
- ``method`` (:pr:`10013`) `James Bourbeau`_
- ``pandas`` extension dtypes to arrays (:pr:`10018`) `James Bourbeau`_
- ``randomgen`` support (:pr:`9987`) `Eray Aslan`_

Bug Fixes
^^^^^^^^^

- … (:pr:`10027`) `Hendrik Makait`_
- ``pyarrow`` expression (:pr:`9885`) `Richard (Rick) Zamora`_
- ``numpy`` scalars and 0d arrays as scalars when padding (:pr:`9653`) `Justus Magin`_
- ``read_parquet`` operation (:pr:`10002`) `Richard (Rick) Zamora`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`10022`) `Miles`_

Maintenance
^^^^^^^^^^^

- ``pyarrow`` parquet engine (:pr:`10039`) `Richard (Rick) Zamora`_
- ``pyarrow`` to 7.0 (:pr:`10024`) `James Bourbeau`_
- … (:pr:`9994`) (:pr:`10037`) `Florian Jetter`_
- … (:pr:`10031`) `James Bourbeau`_
- … (:pr:`9994`) `Florian Jetter`_
- ``pyarrow`` strings turned on (:pr:`10017`) `James Bourbeau`_
- ``test_groupby_dropna_with_agg`` for ``pandas`` 2.0 (:pr:`10001`) `Irina Truong`_
- ``test_pickle_roundtrip`` for ``pandas`` 2.0 (:pr:`10011`) `James Bourbeau`_

.. _v2023.3.0:

Released on March 1, 2023

Bug Fixes
^^^^^^^^^

- … (:pr:`10005`) `Florian Jetter`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`10008`) `James Bourbeau`_

Maintenance
^^^^^^^^^^^

- ``jinja2`` dependency (:pr:`9999`) `Charles Blackmon-Luca`_

.. _v2023.2.1:

Released on February 24, 2023

.. note::

   This release changes the default DataFrame shuffle algorithm to ``p2p``
   to improve stability and performance. `Learn more here <https://blog.coiled.io/blog/shuffling-large-data-at-constant-memory.html?utm_source=dask-docs&utm_medium=changelog>`_,
   and please provide any feedback `on this discussion <https://github.com/dask/distributed/discussions/7509>`_.

   If you encounter issues with this new algorithm, please see the :ref:`documentation <shuffle-methods>`
   for more information, including how to switch back to the old mode.

Enhancements
^^^^^^^^^^^^

- … (:pr:`9991`) `Florian Jetter`_
- … (:pr:`9939`) `Hendrik Makait`_
- ``dataframe.convert-string`` support for ``read_parquet`` (:pr:`9979`) `Irina Truong`_
- … (:pr:`9900`) `Florian Jetter`_
- ``split_row_groups`` default to ``"infer"`` (:pr:`9637`) `Richard (Rick) Zamora`_
- ``pyarrow`` strings (:pr:`9926`) `James Bourbeau`_
- ``sort_values`` (:pr:`8263`) `Charles Blackmon-Luca`_
- ``Generator`` based random-number generation in ``dask.array`` (:pr:`9038`) `Eray Aslan`_
- ``numeric_only`` for simple ``groupby`` aggregations for ``pandas`` 2.0 compatibility (:pr:`9889`) `Irina Truong`_

Bug Fixes
^^^^^^^^^

- … (:pr:`9739`) `David Hoese`_
- … (:pr:`9989`) `Matthew Rocklin`_
- ``describe`` compatibility for ``pandas`` 2.0 (:pr:`9982`) `James Bourbeau`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`9912`) `Sarah Charlotte Johnson`_
- ``DataFrame.partitions`` (:pr:`9976`) `Tom Augspurger`_
- … (:pr:`9903`) `Guillaume Eynard-Bontemps`_
- … (:pr:`9933`) `Gabe Joseph`_

Maintenance
^^^^^^^^^^^

- … (:pr:`9983`) `James Bourbeau`_
- … (:pr:`9990`) `Charles Blackmon-Luca`_
- ``pandas=1.3`` and ``numpy=1.21`` (:pr:`9950`) `James Bourbeau`_
- ``std`` to work with ``numeric_only`` for ``pandas`` 2.0 (:pr:`9960`) `Irina Truong`_
- ``xfail`` ``test_roundtrip_partitioned_pyarrow_dataset`` (:pr:`9977`) `James Bourbeau`_
- ``test_idxmaxmin`` (:pr:`9944`) `Patrick Hoefler`_
- ``pre-commit`` versions (:pr:`9955`) `crusaderky`_
- ``test_groupby_unaligned_index`` for ``pandas`` 2.0 (:pr:`9963`) `Irina Truong`_
- ``xfail`` ``test_set_index_overlap_2`` for ``pandas`` 2.0 (:pr:`9959`) `James Bourbeau`_
- ``test_merge_by_index_patterns`` for ``pandas`` 2.0 (:pr:`9930`) `Irina Truong`_
- … (:pr:`9953`) `James Bourbeau`_
- ``test_rolling_agg_aggregate`` for ``pandas`` 2.0 compatibility (:pr:`9948`) `Irina Truong`_
- ``black`` to 23.1.0 (:pr:`9956`) `crusaderky`_
- … (:pr:`9940`) `Charles Blackmon-Luca`_
- ``test_to_timestamp`` for ``pandas`` 2.0 (:pr:`9932`) `Irina Truong`_
- ``groupby`` ``value_counts`` for ``pandas`` 2.0 compatibility (:pr:`9928`) `Irina Truong`_
- … (:pr:`9945`) `Jacob Tomlinson`_
- … (:pr:`9873`) `Joris Van den Bossche`_

.. _v2023.2.0:

Released on February 10, 2023

Enhancements
^^^^^^^^^^^^

- ``numeric_only`` default in ``quantile`` for ``pandas`` 2.0 (:pr:`9854`) `Irina Truong`_
- ``repartition`` a no-op when divisions match (:pr:`9924`) `James Bourbeau`_
- ``datetime_is_numeric`` behavior in ``describe`` for ``pandas`` 2.0 (:pr:`9868`) `Irina Truong`_
- ``value_counts`` to return correct name in ``pandas`` 2.0 (:pr:`9919`) `Irina Truong`_
- ``axis=None`` behavior in ``pandas`` 2.0 for certain reductions (:pr:`9867`) `James Bourbeau`_
- ``RuntimeWarning`` at the chunk level for ``nanmin`` and ``nanmax`` (:pr:`9916`) `Julia Signell`_
- ``meta_nonempty`` index creation for ``pandas`` 2.0 (:pr:`9908`) `James Bourbeau`_
- ``DataFrame.info()`` tests for ``pandas`` 2.0 (:pr:`9909`) `James Bourbeau`_

Bug Fixes
^^^^^^^^^

- ``GroupBy.value_counts`` handling for multiple ``groupby`` columns (:pr:`9905`) `Charles Blackmon-Luca`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`9893`) `Patrick Hoefler`_
- ``keep=False`` in ``drop_duplicates`` docstring (:pr:`9887`) `Jayesh Manani`_
- ``meta`` details to dask Array (:pr:`9886`) `Jayesh Manani`_
- … (:pr:`9906`) `Gabe Joseph`_

Maintenance
^^^^^^^^^^^

- ``test_numeric_column_names`` for ``pandas`` 2.0 (:pr:`9937`) `Irina Truong`_
- ``dask/dataframe/tests/test_utils_dataframe.py`` tests for ``pandas`` 2.0 (:pr:`9788`) `James Bourbeau`_
- ``index.is_numeric`` with ``is_any_real_numeric_dtype`` for ``pandas`` 2.0 compatibility (:pr:`9918`) `Irina Truong`_
- ``pd.core`` import in dask utils (:pr:`9907`) `Matthew Roeschke`_
- ``upstream`` build on pull requests (:pr:`9910`) `James Bourbeau`_
- ``sqlalchemy.exc.RemovedIn20Warning`` (:pr:`9904`) `James Bourbeau`_
- ``sqlalchemy < 2`` in CI (:pr:`9897`) `James Bourbeau`_
- ``isort`` version to 5.12.0 (:pr:`9895`) `Lawrence Mitchell`_
- ``skiprows`` variable in ``read_csv`` (:pr:`9892`) `Patrick Hoefler`_

.. _v2023.1.1:

Released on January 27, 2023

Enhancements
^^^^^^^^^^^^

- ``to_backend`` method to ``Array`` and ``_Frame`` (:pr:`9758`) `Richard (Rick) Zamora`_
- ``pandas`` 2.0 (:pr:`9872`) `Irina Truong`_
- ``numeric_only`` to ``DataFrame.cov`` and ``DataFrame.corr`` (:pr:`9787`) `James Bourbeau`_
- ``group_keys`` default change in ``pandas`` 2.0 (:pr:`9855`) `Irina Truong`_
- ``infer_datetime_format`` compatibility for ``pandas`` 2.0 (:pr:`9783`) `James Bourbeau`_

Bug Fixes
^^^^^^^^^

- ``BroadcastJoinLayer`` (:pr:`9871`) `Richard (Rick) Zamora`_
- ``broadcast`` argument in ``DataFrame.merge`` (:pr:`9852`) `Richard (Rick) Zamora`_
- ``pyarrow`` parquet columns statistics computation (:pr:`9772`) `aywandji`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`9863`) `Chiara Marmo`_
- … (:pr:`9864`) `Chiara Marmo`_
- … (:pr:`9768`) `Jayesh Manani`_
- ``no-worker`` tasks (:pr:`9839`) `Florian Jetter`_

Maintenance
^^^^^^^^^^^

- ``distributed`` scheduler (:pr:`9890`) `James Bourbeau`_
- ``RAPIDS_VER`` to 23.04 (:pr:`9876`)
- ``distributed`` default (:pr:`9869`) `Florian Jetter`_
- ``xarray-contrib/issue-from-pytest-log`` to version 1.2.6 (:pr:`9865`) `James Bourbeau`_
- … (:pr:`9826`) `Florian Jetter`_
- ``xfail`` ``datetime64`` Parquet roundtripping tests for new ``fastparquet`` (:pr:`9811`) `James Bourbeau`_
- ``upstream`` CI build (:pr:`9853`) `James Bourbeau`_
- … (:pr:`9844`) `James Bourbeau`_
- ``kwargs`` from ``make_blockwise_graph`` (:pr:`9838`) `Florian Jetter`_
- ``persist`` call in ``test_setitem_extended_API_2d_mask`` (:pr:`9843`) `Charles Blackmon-Luca`_
- … (:pr:`9833`) `James Bourbeau`_

.. _v2023.1.0:

Released on January 13, 2023

Enhancements
^^^^^^^^^^^^

- ``distributed`` default clients even if no config is set (:pr:`9808`) `Florian Jetter`_
- ``ma.where`` and ``ma.nonzero`` (:pr:`9760`) `Erik Holmgren`_
- ``zarr`` store creation functions (:pr:`9790`) `Ryan Abernathey`_
- ``iteritems`` compatibility for ``pandas`` 2.0 (:pr:`9785`) `James Bourbeau`_
- ``sizeof`` for ``pandas`` ``string[python]`` dtype (:pr:`9781`) `crusaderky`_
- ``sizeof()`` of duplicate references to ``pandas`` object types (:pr:`9776`) `crusaderky`_
- ``GroupBy.__getitem__`` compatibility for ``pandas`` 2.0 (:pr:`9779`) `James Bourbeau`_
- ``append`` compatibility for ``pandas`` 2.0 (:pr:`9750`) `James Bourbeau`_
- ``get_dummies`` compatibility for ``pandas`` 2.0 (:pr:`9752`) `James Bourbeau`_
- ``is_monotonic`` compatibility for ``pandas`` 2.0 (:pr:`9751`) `James Bourbeau`_
- ``numpy=1.24`` compatibility (:pr:`9777`) `James Bourbeau`_

Documentation
^^^^^^^^^^^^^

- ``encoding`` kwarg in docstring for ``to_json`` (:pr:`9796`) `Sultan Orazbayev`_
- ``SubprocessCluster`` in ``LocalCluster`` documentation (:pr:`9784`) `Hendrik Makait`_
- ``dask/distributed`` (:pr:`9761`) `crusaderky`_

Maintenance
^^^^^^^^^^^

- ``RuntimeWarning`` in ``test_setitem_extended_API_2d_mask`` (:pr:`9828`) `James Bourbeau`_
- ``test_threaded.py::test_interrupt`` (:pr:`9827`) `Hendrik Makait`_
- ``xarray-contrib/issue-from-pytest-log`` in ``upstream`` report (:pr:`9822`) `James Bourbeau`_
- ``pip install dask`` on gpuCI builds (:pr:`9816`) `Charles Blackmon-Luca`_
- ``actions/checkout`` from 3.2.0 to 3.3.0 (:pr:`9815`)
- ``sqlalchemy`` import failures in ``mindeps`` testing (:pr:`9809`) `Charles Blackmon-Luca`_
- ``sqlalchemy.exc.RemovedIn20Warning`` (:pr:`9801`) `Thomas Grainger`_
- ``xfail`` ``datetime64`` Parquet roundtripping tests for ``pandas`` 2.0 (:pr:`9786`) `James Bourbeau`_
- ``sqlalchemy`` 1.3 compatibility (:pr:`9695`) `McToel`_
- … (:pr:`9775`) `Elliott Sales de Andrade`_
- ``dask/dataframe/io/orc/utils.py`` (:pr:`9774`) `Elliott Sales de Andrade`_

.. _v2022.12.1:

Released on December 16, 2022

Enhancements
^^^^^^^^^^^^

- ``dtype_backend="pandas|pyarrow"`` configuration (:pr:`9719`) `James Bourbeau`_
- ``cupy.ndarray`` to ``cudf.DataFrame`` dispatching in ``dask.dataframe`` (:pr:`9579`) `Richard (Rick) Zamora`_
- ``read_parquet`` (:pr:`9699`) `Richard (Rick) Zamora`_
- ``pyarrow`` extension arrays efficiently (:pr:`9740`) `James Bourbeau`_

Bug Fixes
^^^^^^^^^

- tz-aware datetime index (:pr:`9741`) `James Bourbeau`_
- … (:pr:`9724`) `Irina Truong`_
- ``pyarrow``-backed extension dtypes (:pr:`9717`) `James Bourbeau`_
- ``SeriesGroupby`` (:pr:`9716`) `Lawrence Mitchell`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`9748`) `Shawn`_
- … (:pr:`9696`) `Hendrik Makait`_

Maintenance
^^^^^^^^^^^

- ``zarr`` to Python 3.11 CI environment (:pr:`9771`) `James Bourbeau`_
- … (:pr:`9708`) `Thomas Grainger`_
- ``actions/checkout`` from 3.1.0 to 3.2.0 (:pr:`9753`)
- ``np.bool8`` deprecation warning (:pr:`9737`) `James Bourbeau`_
- ``upstream`` CI build (:pr:`9731`) `James Bourbeau`_
- ``data.h5`` and ``mydask.html`` files during tests (:pr:`9726`) `Thomas Grainger`_

.. _v2022.12.0:

Released on December 2, 2022

Enhancements
^^^^^^^^^^^^

- ``set_index`` logic from ``read_parquet`` (:pr:`9661`) `Richard (Rick) Zamora`_
- ``use_nullable_dtypes`` to ``dd.read_parquet`` (:pr:`9617`) `Ian Rose`_
- ``map_overlap`` in order to accept ``pandas`` arguments (:pr:`9571`) `Fabien Aulaire`_
- ``FutureWarning`` in ``.str.split(..., expand=True)`` (:pr:`9704`) `Jacob Hayes`_
- ``groupby`` slicing (:pr:`9667`) `Richard (Rick) Zamora`_
- … (:pr:`9685`) `Ben`_
- … (:pr:`9677`) `Richard (Rick) Zamora`_

Bug Fixes
^^^^^^^^^

- … (:pr:`9672`) `Richard (Rick) Zamora`_
- ``da.fft.fft`` for array-like inputs (:pr:`9688`) `James Bourbeau`_
- ``groupby``-aggregation when grouping on an index by name (:pr:`9646`) `Richard (Rick) Zamora`_

Maintenance
^^^^^^^^^^^

- ``PytestReturnNotNoneWarning`` in ``test_inheriting_class`` (:pr:`9707`) `Thomas Grainger`_
- ``test_dataframe_aggregations_multilevel`` (:pr:`9701`) `Richard (Rick) Zamora`_
- ``mypy`` version (:pr:`9697`) `crusaderky`_
- ``test_map_partitions_df_input`` (:pr:`9687`) `James Bourbeau`_
- ``xarray-contrib/issue-from-pytest-log`` in ``upstream`` build (:pr:`9682`) `James Bourbeau`_
- ``xfail`` ``ttest_1samp`` for upstream ``scipy`` (:pr:`9670`) `James Bourbeau`_
- ``RAPIDS_VER`` to 23.02 (:pr:`9678`)

.. _v2022.11.1:

Released on November 18, 2022

Enhancements
^^^^^^^^^^^^

- ``bokeh=3`` support (:pr:`9673`) `Gabe Joseph`_
- ``fastparquet`` evolution (:pr:`9650`) `Martin Durant`_

Maintenance
^^^^^^^^^^^

- ``ga-yaml-parser`` step in gpuCI updating workflow (:pr:`9675`) `Charles Blackmon-Luca`_
- ``importlib.metadata`` workaround (:pr:`9658`) `James Bourbeau`_
- ``mindeps-distributed`` CI build to handle ``numpy``/``pandas`` not being installed (:pr:`9668`) `James Bourbeau`_

.. _v2022.11.0:

Released on November 15, 2022

Enhancements
^^^^^^^^^^^^

- ``from_dict`` implementation to allow usage from other backends (:pr:`9628`) `GALI PREM SAGAR`_

Bug Fixes
^^^^^^^^^

- ``pandas`` constructors in ``dask.dataframe.core`` (:pr:`9570`) `Richard (Rick) Zamora`_
- ``sort_values`` with ``Timestamp`` data (:pr:`9642`) `James Bourbeau`_
- ``pd.Index`` call in ``_get_partitions`` (:pr:`9634`) `Benjamin Zaitlen`_
- ``read_csv`` behavior for ``header=0`` and ``names`` (:pr:`9614`) `Richard (Rick) Zamora`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`9660`) `Gabe Joseph`_
- ``import dask as d`` from docstrings (:pr:`9644`) `Matthew Rocklin`_
- ``read_parquet`` docstring (:pr:`9636`) `qheuristics`_
- ``array``/``bag``/``dataframe`` sections (:pr:`9630`) `Matthew Rocklin`_

Maintenance
^^^^^^^^^^^

- ``conda-incubator/setup-miniconda`` (:pr:`9662`) `John A Kirkham`_
- ``bokeh=3`` (:pr:`9659`) `James Bourbeau`_
- ``upstream`` build with Python 3.10 (:pr:`9655`) `James Bourbeau`_
- ``pyyaml`` version in ``mindeps`` testing (:pr:`9640`) `Charles Blackmon-Luca`_
- ``pre-commit`` to catch ``breakpoint()`` (:pr:`9638`) `James Bourbeau`_
- ``xarray-contrib/issue-from-pytest-log`` from 1.1 to 1.2 (:pr:`9635`)
- ``blosc`` references (:pr:`9625`) `Naty Clementi`_
- ``mypy`` and drop unused comments (:pr:`9616`) `Hendrik Makait`_
- ``test_repartition_npartitions`` (:pr:`9585`) `Richard (Rick) Zamora`_

.. _v2022.10.2:

Released on October 31, 2022

This was a hotfix release with no changes in this repository. The necessary fix was in ``dask/distributed``, but we bumped this version number for consistency.

.. _v2022.10.1:

Released on October 28, 2022

Enhancements
^^^^^^^^^^^^

- … (:pr:`9563`) `ChrisJar`_
- ``set_index`` (:pr:`9566`) `James Bourbeau`_
- … (:pr:`9519`) `Shingo OKAWA`_

Bug Fixes
^^^^^^^^^

- ``merge`` with empty left DataFrame (:pr:`9578`) `Ian Rose`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`9592`) `James Bourbeau`_
- ``sphinx-click`` for ``dask`` CLI (:pr:`9589`) `James Bourbeau`_
- … (:pr:`9584`) `James Bourbeau`_
- ``map_overlap`` docstring (:pr:`9568`) `James Bourbeau`_

Maintenance
^^^^^^^^^^^

- … (:pr:`9595`) `John A Kirkham`_
- ``bokeh<3`` (:pr:`9607`) `James Bourbeau`_
- ``importlib``-related failures in ``upstream`` CI (:pr:`9604`) `Charles Blackmon-Luca`_
- ``upstream`` CI report (:pr:`9603`) `James Bourbeau`_
- ``upstream`` CI report (:pr:`9602`) `James Bourbeau`_
- ``setuptools`` host dep, add CLI entrypoint (:pr:`9600`) `Charles Blackmon-Luca`_
- ``Backend`` dispatch class type annotations (:pr:`9573`) `Ian Rose`_

.. _v2022.10.0:

Released on October 14, 2022

New Features
^^^^^^^^^^^^

- … (:pr:`9475`) `Richard (Rick) Zamora`_
- … (:pr:`9283`) `Doug Davis`_

Enhancements
^^^^^^^^^^^^

- … (:pr:`9516`) `Ian Rose`_
- … (:pr:`9555`) `David Hoese`_
- ``map_overlap`` (:pr:`9559`) `Nicolas Grandemange`_
- … (:pr:`9504`) `Ian Rose`_
- ``datetime.datetime`` tokenize idempotently (:pr:`9532`) `Martin Durant`_
- ``datetime.time`` (:pr:`9528`) `Tim Paine`_

Bug Fixes
^^^^^^^^^

- … (:pr:`9545`) `James Bourbeau`_
- ``np.nan`` for ``int`` dtype (:pr:`9531`) `Doug Davis`_
- … (:pr:`9538`) `Ian Rose`_
- ``pickle``-able binops in ``delayed`` (:pr:`9540`) `Ian Rose`_
- … (:pr:`9534`) `Martin Durant`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`9537`) `Matthew Rocklin`_

Maintenance
^^^^^^^^^^^

- ``tiledb-py`` version to avoid CI failures (:pr:`9569`) `James Bourbeau`_
- ``actions/github-script`` from 3 to 6 (:pr:`9564`)
- ``actions/stale`` from 4 to 6 (:pr:`9551`)
- ``peter-evans/create-pull-request`` from 3 to 4 (:pr:`9550`)
- ``actions/checkout`` from 2 to 3.1.0 (:pr:`9552`)
- ``codecov/codecov-action`` from 1 to 3 (:pr:`9549`)
- ``the-coding-turtle/ga-yaml-parser`` from 0.1.1 to 0.1.2 (:pr:`9553`)
- … (:pr:`9547`) `James Bourbeau`_
- … (:pr:`9542`) `James Bourbeau`_
- … (:pr:`9530`) `crusaderky`_
- ``RAPIDS_VER`` to 22.12 (:pr:`9524`)

.. _v2022.9.2:

Released on September 30, 2022

Enhancements
^^^^^^^^^^^^

- … (:pr:`9507`) `James Bourbeau`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`9513`) `James Bourbeau`_
- … (:pr:`9511`) `nouman`_

Maintenance
^^^^^^^^^^^

- … (:pr:`9486`) `Ian Rose`_

.. _v2022.9.1:

Released on September 16, 2022

New Features
^^^^^^^^^^^^

- ``DataFrame`` and ``Series`` ``median`` methods (:pr:`9483`) `James Bourbeau`_

Enhancements
^^^^^^^^^^^^

- ``groupby`` default (:pr:`9453`) `Ian Rose`_
- … (:pr:`9419`) `Greg Hayes`_
- ``distributed.utils.key_split`` functionality to ``dask.utils.key_split`` (:pr:`9464`) `Luke Conibear`_

Bug Fixes
^^^^^^^^^

- ``set_index`` doesn't drop rows (:pr:`9423`) `Julia Signell`_
- ``Series`` to column when ``ddf.columns.min()`` raises (:pr:`9485`) `Erik Welch`_
- ``stack_partitions`` (:pr:`9481`) `James Bourbeau`_
- ``split_out`` (:pr:`9493`) `Lawrence Mitchell`_

Deprecations
^^^^^^^^^^^^

- ``split_out`` to be ``None``, which then defaults to 1 in ``groupby().aggregate()`` (:pr:`9491`) `Ian Rose`_

Documentation
^^^^^^^^^^^^^

- ``enforce_metadata`` documentation, not checking for dtypes (:pr:`9474`) `Nicolas Grandemange`_
- ``it's`` --> ``its`` typo (:pr:`9484`) `Nat Tabris`_

Maintenance
^^^^^^^^^^^

- … (:pr:`9500`) `Ian Rose`_
- ``numeric_only`` warnings from ``pandas`` (:pr:`9496`) `James Bourbeau`_
- ``set_index(..., inplace=True)`` where not necessary (:pr:`9472`) `James Bourbeau`_
- … (:pr:`9495`) `James Bourbeau`_
- ``test_groupby_dropna_cudf`` based on ``cudf`` support for ``group_keys`` (:pr:`9482`) `James Bourbeau`_
- ``dd.from_bcolz`` (:pr:`9479`) `James Bourbeau`_
- ``flake8-bugbear`` to ``pre-commit`` hooks (:pr:`9457`) `Luke Conibear`_
- … (B023) (:pr:`9461`) `Luke Conibear`_
- … (B015) (:pr:`9459`) `Luke Conibear`_
- … (:pr:`9469`) `James Bourbeau`_
- … (B007) (:pr:`9458`) `Luke Conibear`_
- ``getattr`` calls for constant attributes (B009) (:pr:`9460`) `Luke Conibear`_
- ``libprotobuf`` to allow nightly ``pyarrow`` in the upstream CI build (:pr:`9465`) `Joris Van den Bossche`_
- … (B006) (:pr:`9462`) `Luke Conibear`_
- ``flake8`` mirror and updated version (:pr:`9456`) `Luke Conibear`_

.. _v2022.9.0:

Released on September 2, 2022

Enhancements
^^^^^^^^^^^^

- ``groupby`` aggregations (:pr:`9442`) `Richard (Rick) Zamora`_
- … (:pr:`6710`) `Gabe Joseph`_

Bug Fixes
^^^^^^^^^

- ``by`` columns internally for cumulative operations on the same ``by`` columns (:pr:`9430`) `Pavithra Eswaramoorthy`_
- ``get_group`` with categoricals (:pr:`9436`) `Pavithra Eswaramoorthy`_
- ``MaterializedLayer.cull`` performance regression (:pr:`9413`) `Richard (Rick) Zamora`_

Documentation
^^^^^^^^^^^^^

- … (:pr:`9309`) `James Bourbeau`_

Maintenance
^^^^^^^^^^^

- … (:pr:`9439`) `Pavithra Eswaramoorthy`_
- ``tmpfile`` does not end files with period on empty extension (:pr:`9429`) `Hendrik Makait`_
- … (:pr:`9432`) `James Bourbeau`_

.. _v2022.8.1:

Released on August 19, 2022

New Features
^^^^^^^^^^^^

- ``ma.*_like`` functions (:pr:`9378`) `Ruth Comer`_

Enhancements
^^^^^^^^^^^^

- … (:pr:`9402`) `Ian Rose`_
- … (:pr:`9302`) `Richard (Rick) Zamora`_
- ``namedtuple`` (:pr:`9361`) `Hendrik Makait`_

Bug Fixes
^^^^^^^^^

- ``SeriesGroupBy`` cumulative functions with ``axis=1`` (:pr:`9377`) `Pavithra Eswaramoorthy`_
- … (:pr:`9342`) `Ian Rose`_
- ``make_meta`` while using categorical column with index (:pr:`9348`) `Pavithra Eswaramoorthy`_
- ``DataFrame.dropna`` (:pr:`9366`) `Naty Clementi`_
- ``set_index`` handle entirely empty dataframes (:pr:`8896`) `Julia Signell`_
- ``dataclass`` handling in ``unpack_collections`` (:pr:`9345`) `Hendrik Makait`_
- … (:pr:`9349`) `Ian Rose`_
- ``da.min``/``da.max`` functions (:pr:`9268`) `geraninam`_

Documentation
^^^^^^^^^^^^^

- ``bind()`` etc. regenerate the keys (:pr:`9385`) `crusaderky`_
- … (:pr:`9357`) `Sarah Charlotte Johnson`_
- ``meta`` information `Pavithra Eswaramoorthy`_

Maintenance
^^^^^^^^^^^

- ``entry_points`` utility in ``sizeof`` (:pr:`9390`) `James Bourbeau`_
- ``entry_points`` compatibility utility (:pr:`9388`) `Jacob Tomlinson`_
- … (:pr:`9372`) `James Bourbeau`_
- ``werkzeug`` pin in CI (:pr:`9371`) `James Bourbeau`_
- ``dd.from_pandas`` and ``dd.from_delayed`` (:pr:`9362`) `Jordan Yap`_

.. _v2022.8.0:

Released on August 5, 2022
Enhancements ^^^^^^^^^^^^
make_meta doesn't hold ref to data (:pr:9354) Jim Crist-Harif_divisions logic in from_pandas (:pr:9221) Richard (Rick) Zamora_9341) Julia Signell_keepdims keyword for da.average (:pr:9332) Ruth Comer_repr methods to avoid Layer materialization (:pr:9289) Richard (Rick) Zamora_Bug Fixes ^^^^^^^^^
order kwarg will not crash the astype method (:pr:9317) Genevieve Buckley_cumsum on cupy chunked dask arrays (:pr:9320) Genevieve Buckley__sample_reduce (:pr:9272) Pavithra Eswaramoorthy_meta in array serialization (:pr:9240) Frédéric BRIOL_Index.memory_usage (:pr:9290) James Bourbeau_dask.dataframe.io.from_dask_array (:pr:9282) Jordan Yap_Documentation ^^^^^^^^^^^^^
9322) Genevieve Buckley_da.from_array about how the order is not preserved (:pr:9346) Julia Signell_9326) Logan Norman_9340) Julia Signell_df and Dask ddf in dataframe-groupby.rst (:pr:9304) ivojuroro_js-yaml for yaml.js in config converter (:pr:9306) Jacob Tomlinson_Maintenance ^^^^^^^^^^^
da.linalg.solve for SciPy 1.9.0 compatibility (:pr:9350) Pavithra Eswaramoorthy_test_getitem_avoids_large_chunks_missing (:pr:9347) Pavithra Eswaramoorthy_sizeof" Doug Davis_loop_in_thread fixture in tests (:pr:9337) James Bourbeau_xfail test_solve_sym_pos (:pr:9336) Pavithra Eswaramoorthy_9329) Shaghayegh_werkzeug in CI to avoid test suite hanging (:pr:9325) James Bourbeau_cupy.angle() (:pr:9312) Peter Andreas Entschev_RAPIDS_VER to 22.10 (:pr:9314)pandas[test] to test extra (:pr:9110) Ben Beasley_bokeh and scipy to upstream CI build (:pr:9265) James Bourbeau_.. _v2022.7.1:
Released on July 22, 2022

Enhancements
^^^^^^^^^^^^

- (:pr:`9250`) `Pavithra Eswaramoorthy`_
- (:pr:`9068`) `Erik Welch`_
- (:pr:`9285`) `Naty Clementi`_

Bug Fixes
^^^^^^^^^

- HighLevelGraph.cull (:pr:`9267`) `Richard (Rick) Zamora`_
- (:pr:`9264`) `Pavithra Eswaramoorthy`_
- max (instead of sum) for calculating warnsize (:pr:`9235`) `Pavithra Eswaramoorthy`_
- (:pr:`9252`) `Richard (Rick) Zamora`_

Documentation
^^^^^^^^^^^^^

- partition_size (:pr:`9288`) `Dylan Stewart`_
- Array methods, just refer to module docs (:pr:`9244`) `Julia Signell`_
- (:pr:`9278`) `Pavithra Eswaramoorthy`_
- (:pr:`9270`) `Tim Gates`_
- (:pr:`9260`) `geraninam`_

Maintenance
^^^^^^^^^^^

- dd.from_pandas and dd.from_delayed (:pr:`9237`) `Michael Milton`_
- calculate_divisions docstring (:pr:`9275`) `Tom Augspurger`_
- test_plot_multiple for upcoming bokeh release (:pr:`9261`) `James Bourbeau`_
- (:pr:`9255`) `Illviljan`_

.. _v2022.7.0:

Released on July 8, 2022

Enhancements
^^^^^^^^^^^^

- pathlib.PurePath in normalize_token (:pr:`9229`) `Angus Hollands`_
- AttributeNotImplementedError for properties so IPython glob search works (:pr:`9231`) `Erik Welch`_
- map_overlap: multiple dataframe handling (:pr:`9145`) `Fabien Aulaire`_
- dask.sizeof (:pr:`7688`) `Angus Hollands`_

Bug Fixes
^^^^^^^^^

- TypeError: 'Serialize' object is not subscriptable when writing parquet dataset with Client(processes=False) (:pr:`9015`) `Lucas Miguel Ponce`_
- concat with an empty dataframe (:pr:`9193`) `Pavithra Eswaramoorthy`_

Documentation
^^^^^^^^^^^^^

- (:pr:`9234`) `Pavithra Eswaramoorthy`_
- (:pr:`9215`) `Julia Signell`_
- (:pr:`9217`) `Sarah Charlotte Johnson`_

Maintenance
^^^^^^^^^^^

- math.prod instead of np.prod on lists, tuples, and iters (:pr:`9232`) `crusaderky`_
- (:pr:`9230`) `Florian Jetter`_
- (:pr:`9206`) `crusaderky`_

.. _v2022.6.1:

Released on June 24, 2022

Enhancements
^^^^^^^^^^^^

- (:pr:`9053`) `Ian Rose`_
- dask.utils.show_versions (:pr:`9144`) `Sultan Orazbayev`_
- (:pr:`9201`) `Julia Signell`_
- allow_rechunk kwarg to dask.array.overlap function (:pr:`7776`) `Genevieve Buckley`_
- dask.utils.format_time (:pr:`9116`) `Matthew Rocklin`_
- (:pr:`9175`) `Ian Rose`_

Bug Fixes
^^^^^^^^^

- (:pr:`9213`) `Fabien Aulaire`_
- (:pr:`9212`) `Fabien Aulaire`_
- shuffle_group(): avoid converting to arrays (:pr:`9157`) `Mads R. B. Kristensen`_

Deprecations
^^^^^^^^^^^^

- format_time utility (:pr:`9184`) `James Bourbeau`_

Documentation
^^^^^^^^^^^^^

- (:pr:`9182`) `Sarah Charlotte Johnson`_
- (:pr:`9194`) `Sarah Charlotte Johnson`_
- str.split accessor docstring (:pr:`9177`) `Richard Pelgrim`_
- inconsistencies keyword to derived_from (:pr:`9192`) `Richard Pelgrim`_
- append in delayed best practices example (:pr:`9202`) `Ben`_
- (:pr:`9196`) `Sarah Charlotte Johnson`_
- Genevieve Buckley's blog on chunk sizes (:pr:`9199`) `Pavithra Eswaramoorthy`_
- to_csv docstring (:pr:`9094`) `Sarah Charlotte Johnson`_

Maintenance
^^^^^^^^^^^

- SafeConfigParser to ConfigParser (:pr:`9205`) `Thomas A Caswell`_
- (:pr:`9200`) `crusaderky`_

.. _v2022.6.0:

Released on June 10, 2022

Enhancements
^^^^^^^^^^^^

- (:pr:`9081`) `Angelos Omirolis`_
- (:pr:`9169`) `GALI PREM SAGAR`_
- sort_results argument to assert_eq (:pr:`9130`) `Pavithra Eswaramoorthy`_
- parse_timedelta (:pr:`9168`) `Matthew Rocklin`_
- (:pr:`9148`) `Pavithra Eswaramoorthy`_
- (:pr:`9140`) `Jim Crist-Harif`_
- _iLocIndexer / _LocIndexer (:pr:`9108`) `Fabien Aulaire`_
- to_parquet pyarrow (:pr:`9131`) `Jim Crist-Harif`_

Bug Fixes
^^^^^^^^^

- pyarrow.StringArray pickle (:pr:`9170`) `Jim Crist-Harif`_
- (:pr:`9165`) `Richard (Rick) Zamora`_
- pyarrow partitioning logic (:pr:`9147`) `James Bourbeau`_
- pyarrow 8.0 partitioning fix (:pr:`9143`) `James Bourbeau`_

Documentation
^^^^^^^^^^^^^

- (:pr:`9178`) `Sarah Charlotte Johnson`_
- (:pr:`9167`) `Sarah Charlotte Johnson`_
- map_partition docstring (:pr:`9161`) `Alex-JG3`_
- (:pr:`9160`) `Sarah Charlotte Johnson`_
- (:pr:`9128`) `Sarah Charlotte Johnson`_

Maintenance
^^^^^^^^^^^

- (:pr:`9171`) `Matthew Rocklin`_
- (:pr:`9156`) `Ian Rose`_
- (:pr:`9154`) `Ian Rose`_
- (:pr:`9150`) `Tom Augspurger`_
- map_partitions func parameter description (:pr:`9149`) `Christopher Akiki`_
- xfail test_groupby_grouper_dispatch (:pr:`9139`) `GALI PREM SAGAR`_
- (:pr:`9138`) `James Bourbeau`_
- (:pr:`9041`) `Richard (Rick) Zamora`_

.. _v2022.05.2:

Released on May 26, 2022

Enhancements
^^^^^^^^^^^^

- Grouper objects and use it in GroupBy (:pr:`9074`) `brandon-b-miller`_
- read_parquet & to_parquet files intersect (:pr:`9124`) `Jim Crist-Harif`_
- ipycytoscape (:pr:`9091`) `Ian Rose`_

Documentation
^^^^^^^^^^^^^

- (:pr:`9126`) `Ryan Russell`_

Maintenance
^^^^^^^^^^^

- test_filter_nonpartition_columns (:pr:`9127`) `Pavithra Eswaramoorthy`_
- RAPIDS_VER to 22.08 (:pr:`9120`)
- (:pr:`9115`) `Ben Beasley`_

.. _v2022.05.1:

Released on May 24, 2022

New Features
^^^^^^^^^^^^

- DataFrame.from_dict classmethod (:pr:`9017`) `Matthew Powers`_
- from_map function to Dask DataFrame (:pr:`8911`) `Richard (Rick) Zamora`_

Enhancements
^^^^^^^^^^^^

- to_parquet error for appended divisions overlap (:pr:`9102`) `Jim Crist-Harif`_
- (:pr:`9087`) `ParticularMiner`_
- align_dataframes=False option in map_partitions error (:pr:`9075`) `Gabe Joseph`_
- enforce_ndim to dask.array.map_blocks() (:pr:`8865`) `ParticularMiner`_
- Series.GroupBy.fillna / DataFrame.GroupBy.fillna methods (:pr:`8869`) `Pavithra Eswaramoorthy`_
- fillna with Dask DataFrame (:pr:`8950`) `Pavithra Eswaramoorthy`_
- (:pr:`9036`) `Pavithra Eswaramoorthy`_
- (:pr:`8674`) `Doug Davis`_
- pandas ArrowStringArray pickling (:pr:`9024`) `Jim Crist-Harif`_
- compute_as_if_collection (:pr:`8998`) `Ian Rose`_
- p2p shuffle option (:pr:`8836`) `Matthew Rocklin`_

Bug Fixes
^^^^^^^^^

- (:pr:`9106`) `Jim Crist-Harif`_
- dtype (:pr:`9100`) `Ian Rose`_
- from_map (:pr:`9078`) `Richard (Rick) Zamora`_
- (:pr:`8963`) `Jorge López`_
- is_monotonic methods for more than 8 partitions (:pr:`9019`) `Julia Signell`_
- from_map (:pr:`9066`) `Richard (Rick) Zamora`_
- is_dask_collection; back to previous implementation (:pr:`9062`) `Doug Davis`_
- Blockwise.clone does not handle iterable literal arguments correctly (:pr:`8979`) `JSKenyon`_
- setitem hardmask (:pr:`9027`) `David Hassell`_
- (:pr:`8997`) `Ian Rose`_

Deprecations
^^^^^^^^^^^^

- read_parquet kwargs chunksize and aggregate_files (:pr:`9052`) `Richard (Rick) Zamora`_

Documentation
^^^^^^^^^^^^^

- map_partitions handling of args vs kwargs, usage of partition_info (:pr:`9084`) `Charles Blackmon-Luca`_
- (:pr:`9097`) `Doug Davis`_
- (:pr:`9098`) `Sarah Charlotte Johnson`_
- imread docstring (:pr:`9082`) `Genevieve Buckley`_
- (:pr:`9001`) `Matthew Rocklin`_
- map_blocks() docstring for kwarg enforce_ndim (:pr:`9071`) `ParticularMiner`_
- (:pr:`9077`) `Charles Blackmon-Luca`_
- (:pr:`9025`) `Sarah Charlotte Johnson`_

Maintenance
^^^^^^^^^^^

- NUMPY_LICENSE.txt in license files (:pr:`9113`) `Ben Beasley`_
- pandas (:pr:`9103`) `James Bourbeau`_
- pyarrow in the upstream build (:pr:`9095`) `Joris Van den Bossche`_
- ensure_unicode (:pr:`9059`) `John A Kirkham`_
- pyarrow in the upstream build (:pr:`8993`) `Joris Van den Bossche`_
- is_dask_collection (:pr:`9054`) `Doug Davis`_
- ensure_bytes (:pr:`9050`) `John A Kirkham`_
- (:pr:`9045`) `James Bourbeau`_
- codespell pre-commit hook (:pr:`9040`) `James Bourbeau`_
- (:pr:`9039`) `Jim Crist-Harif`_
- test_reductions_2D (:pr:`9037`) `Jim Crist-Harif`_
- (:pr:`9031`) `Jim Crist-Harif`_
- (:pr:`9029`) `Jim Crist-Harif`_
- to_timedelta default unit (:pr:`9010`) `Pavithra Eswaramoorthy`_

.. _v2022.05.0:

Released on May 2, 2022

Highlights
^^^^^^^^^^

This is a bugfix release for `this issue <https://github.com/dask/distributed/issues/6255>`_.

Documentation
^^^^^^^^^^^^^

- (:pr:`9012`) `James Bourbeau`_

.. _v2022.04.2:

Released on April 29, 2022

Highlights
^^^^^^^^^^

This release includes several deprecations/breaking API changes to
``dask.dataframe.read_parquet`` and ``dask.dataframe.to_parquet``:

- ``to_parquet`` no longer writes ``_metadata`` files by default. If you want to
  write a ``_metadata`` file, you can pass in ``write_metadata_file=True``.
- ``read_parquet`` now defaults to ``split_row_groups=False``, which results in one
  Dask dataframe partition per parquet file when reading in a parquet dataset.
  If you're working with large parquet files you may need to set
  ``split_row_groups=True`` to reduce your partition size.
- ``read_parquet`` no longer calculates divisions by default. If you require
  ``read_parquet`` to return dataframes with known divisions, please set
  ``calculate_divisions=True``.
- ``read_parquet`` has deprecated the ``gather_statistics`` keyword argument.
  Please use the ``calculate_divisions`` keyword argument instead.
- ``read_parquet`` has deprecated the ``require_extensions`` keyword argument.
  Please use the ``parquet_file_extension`` keyword argument instead.

New Features
^^^^^^^^^^^^

- removeprefix and removesuffix as StringMethods (:pr:`8912`) `Jorge López`_

Enhancements
^^^^^^^^^^^^

- fs.invalidate_cache in to_parquet (:pr:`8994`) `Jim Crist-Harif`_
- to_parquet default to write_metadata_file=None (:pr:`8988`) `Jim Crist-Harif`_
- keepdims (:pr:`8926`) `Julia Signell`_
- split_row_groups default to False in read_parquet (:pr:`8981`) `Richard (Rick) Zamora`_
- NotImplementedError message for da.reshape (:pr:`8987`) `Jim Crist-Harif`_
- to_parquet compute path (:pr:`8982`) `Jim Crist-Harif`_
- vindex with a Dask object (:pr:`8945`) `Julia Signell`_
- pre_buffer=True when a precache method is specified (:pr:`8957`) `Richard (Rick) Zamora`_
- from_dask_array uses blockwise instead of merging graphs (:pr:`8889`) `Bryan Weber`_
- pre_buffer=True for "pyarrow" Parquet engine (:pr:`8952`) `Richard (Rick) Zamora`_

Bug Fixes
^^^^^^^^^

- dtype=None correctly in da.full (:pr:`8954`) `Tom White`_
- dask-sql bug caused by blockwise fusion (:pr:`8989`) `Richard (Rick) Zamora`_
- to_parquet errors for non-string column names (:pr:`8990`) `Jim Crist-Harif`_
- da.roll works even if shape is 0 (:pr:`8925`) `Julia Signell`_
- set_index (:pr:`8967`) `Paul Hobson`_
- BlockwiseDepDict mapping values when produces_keys=True (:pr:`8972`) `Richard (Rick) Zamora`_
- DataFrameIOLayer in ``DataFrame.from_delayed`` (:pr:`8852`) `Richard (Rick) Zamora`_
- in predicate in read_parquet are correct (:pr:`8846`) `Bryan Weber`_
- (:pr:`8930`) `Tom White`_
- dtype when deciding division using np.linspace in read_sql_query (:pr:`8940`) `Cheun Hong`_

Deprecations
^^^^^^^^^^^^

- gather_statistics from read_parquet (:pr:`8992`) `Richard (Rick) Zamora`_
- require_extension to top-level parquet_file_extension read_parquet kwarg (:pr:`8935`) `Richard (Rick) Zamora`_

Documentation
^^^^^^^^^^^^^

- write_metadata_file discussion in documentation (:pr:`8995`) `Richard (Rick) Zamora`_
- DataFrame.merge docstring (:pr:`8966`) `Pavithra Eswaramoorthy`_
- align_arrays in array.blockwise() (:pr:`8977`) `ParticularMiner`_
- map_block(drop_axis=...) on chunked axes of an array (:pr:`8921`) `ParticularMiner`_
- (:pr:`8956`) `James Bourbeau`_

Maintenance
^^^^^^^^^^^

- (:pr:`8961`) `Ian Rose`_
- pytest-timeout to distributed envs on CI (:pr:`8986`) `Julia Signell`_
- read_parquet docstring formatting (:pr:`8971`) `Bryan Weber`_
- pytest.warns(None) (:pr:`8924`) `Pavithra Eswaramoorthy`_
- (:pr:`8976`) `Eray Aslan`_
- parse_timedelta option to enforce explicit unit (:pr:`8969`) `crusaderky`_
- mypy compatibility (:pr:`8854`) `Paul Hobson`_
- (:pr:`8899`) `Jim Crist-Harif`_
- (:pr:`8933`) `Bryan Weber`_

.. _v2022.04.1:

Released on April 15, 2022

New Features
^^^^^^^^^^^^

- abs, left_shift, right_shift, positive. (:pr:`8920`) `Tom White`_

Enhancements
^^^^^^^^^^^^

- write_metadata_file=False (:pr:`8906`) `Richard (Rick) Zamora`_
- dd.read_csv() (fixes #8878) (:pr:`8908`) `Roger Filmyer`_
- da.Array rather than dd.Series for non-ufunc elementwise functions on dd.Series (:pr:`8558`) `Julia Signell`_
- get_dummies use meta computation in map_partitions (:pr:`8898`) `Julia Signell`_
- da.from_array (:pr:`8895`) `David Hassell`_
- ValueError in merge_asof for duplicate kwargs (:pr:`8861`) `Bryan Weber`_

Bug Fixes
^^^^^^^^^

- is_monotonic work when some partitions are empty (:pr:`8897`) `Julia Signell`_
- da.from_array when inline_array=False (:pr:`8903`) `Ian Rose`_
- (:pr:`8859`) `Richard`_
- merge_asof: drop index column if left_on == right_on (:pr:`8874`) `Gil Forsyth`_

Deprecations
^^^^^^^^^^^^

- engine='auto' will change in future (:pr:`8907`) `Jim Crist-Harif`_
- pyarrow-legacy engine from parquet API (:pr:`8835`) `Richard (Rick) Zamora`_

Documentation
^^^^^^^^^^^^^

- out for dask.array.dot (:pr:`8913`) `Francesco Andreuzzi`_
- DataFrame.query docstring (:pr:`8890`) `Pavithra Eswaramoorthy`_

Maintenance
^^^^^^^^^^^

- da.prod on large integer data (:pr:`8893`) `Jim Crist-Harif`_
- network marks to tests that fail without an internet connection (:pr:`8881`) `Paul Hobson`_
- (:pr:`8891`) `Charles Blackmon-Luca`_
- xfail/skip some flaky distributed tests (:pr:`8887`) `Jim Crist-Harif`_
- ArrowDatasetEngine (:pr:`8885`) `Richard (Rick) Zamora`_
- (:pr:`8867`) `crusaderky`_
- sample() (:pr:`8858`) `Nadiem Sissouno`_

.. _v2022.04.0:

Released on April 1, 2022

.. note::

   This is the first release with support for Python 3.10.

New Features
^^^^^^^^^^^^

- (:pr:`8566`) `James Bourbeau`_

Enhancements
^^^^^^^^^^^^

- dtype.itemsize in order to produce a useful error (:pr:`8860`) `Davide Gavio`_
- (:pr:`8848`) `Matthew Rocklin`_
- divisions setter (:pr:`8806`) `Jim Crist-Harif`_
- Blockwise and map_partitions for more tasks (:pr:`8831`) `Bryan Weber`_

Bug Fixes
^^^^^^^^^

- dataframe.merge_asof to preserve right_on column (:pr:`8857`) `Sarah Charlotte Johnson`_
- (:pr:`8851`) `Ben Greiner`_
- SubgraphCallable getter (:pr:`8827`) `Ian Rose`_

Deprecations
^^^^^^^^^^^^

- (:pr:`8863`) `James Bourbeau`_
- setuptools at runtime (:pr:`8855`) `crusaderky`_
- dataframe.tseries.resample.getnanos (:pr:`8834`) `Sarah Charlotte Johnson`_

Documentation
^^^^^^^^^^^^^

- (:pr:`8871`) `Naty Clementi`_
- drop_axis option of map_blocks (:pr:`8868`) `ParticularMiner`_

Maintenance
^^^^^^^^^^^

- RAPIDS_VER to 22.06 (:pr:`8828`)
- test_parquet in http (:pr:`8850`) `Bryan Weber`_
- (:pr:`8849`) `Charles Blackmon-Luca`_

.. _v2022.03.0:

Released on March 18, 2022

New Features
^^^^^^^^^^^^

- (:pr:`7636`) `Daniel Mesejo-León`_
- ma.count to Dask array (:pr:`8785`) `David Hassell`_
- to_parquet default to compression="snappy" (:pr:`8814`) `Jim Crist-Harif`_
- weights parameter to dask.array.reduction (:pr:`8805`) `David Hassell`_
- ddf.compute_current_divisions to get divisions on a sorted index or column (:pr:`8517`) `Julia Signell`_

Enhancements
^^^^^^^^^^^^

- ``__name__`` and ``__doc__`` through on DelayedLeaf (:pr:`8820`) `Leo Gao`_
- how option (:pr:`8818`) `Naty Clementi`_
- Bag.map_partitions to Blockwise (:pr:`8646`) `Richard (Rick) Zamora`_
- (:pr:`8801`) `Jim Crist-Harif`_
- (:pr:`8692`) `Richard (Rick) Zamora`_
- (:pr:`8789`) `Pavithra Eswaramoorthy`_
- (:pr:`8694`) `Julia Signell`_
- distributed (:pr:`8700`) `Pedro Silva`_

Bug Fixes
^^^^^^^^^

- read_parquet (:pr:`8824`) `Richard (Rick) Zamora`_
- set_index when directly passed a dask Index (:pr:`8680`) `Paul Hobson`_
- (:pr:`7980`) `Genevieve Buckley`_
- (:pr:`8809`) `Julia Signell`_
- clone_key("x") to retain prefix (:pr:`8792`) `crusaderky`_
- read_parquet (:pr:`8775`) `Richard (Rick) Zamora`_
- groupby.shift bug caused by unsorted partitions after shuffle (:pr:`8782`) `kori73`_
- (:pr:`8786`) `Richard (Rick) Zamora`_

Deprecations
^^^^^^^^^^^^

- (:pr:`8791`) `Charles Blackmon-Luca`_
- bcolz support (:pr:`8754`) `Pavithra Eswaramoorthy`_
- map_overlap default boundary kwarg 'none' (:pr:`8743`) `Genevieve Buckley`_

Documentation
^^^^^^^^^^^^^

- (:pr:`8807`) `Doug Davis`_
- Series.str, Series.dt, and Series.cat accessors to docs (:pr:`8757`) `Sarah Charlotte Johnson`_
- ddf.compute_current_divisions (:pr:`8793`) `Julia Signell`_
- (:pr:`8648`) `Naty Clementi`_
- kwarg in repartition docstring (:pr:`8781`) `Sarah Charlotte Johnson`_
- (:pr:`8774`) `Jacob Tomlinson`_

Maintenance
^^^^^^^^^^^

- pytest parallelism (:pr:`8826`) `GALI PREM SAGAR`_
- absolufy-imports - No relative imports - PEP8 (:pr:`8796`) `Julia Signell`_
- assert_eq calls in array tests (:pr:`8812`) `Julia Signell`_
- pytest.warns(None) (:pr:`8718`) `LSturtew`_
- test_describe_empty to work without global -Werror (:pr:`8291`) `Michał Górny`_
- (:pr:`8794`) `Jim Crist-Harif`_
- packaging.parse for md5 compatibility (:pr:`8763`) `James Bourbeau`_
- tokenize work in a FIPS 140-2 environment (:pr:`8762`) `Jim Crist-Harif`_
- (:pr:`8761`) `Julia Signell`_
- (:pr:`8302`) `lrjball`_
- pull_request_target to pull_request (:pr:`8767`) `Julia Signell`_
- kwarg pass through to sub functions in da.assert_eq (:pr:`8755`) `Julia Signell`_

.. _v2022.02.1:

Released on February 25, 2022

New Features
^^^^^^^^^^^^

- first and last to dask.dataframe.pivot_table (:pr:`8649`) `Knut Nordanger`_
- std() support for datetime64 dtype for pandas-like objects (:pr:`8523`) `Ben Glossner`_
- HighLevelGraph and Layer html reprs (:pr:`8589`) `kori73`_

Enhancements
^^^^^^^^^^^^

- DataFrameGroupBy (:pr:`8696`) `Bryan Weber`_
- info() call on empty DataFrame (:pr:`8727`) `Naty Clementi`_
- groupby.compute as a not implemented method (:pr:`8734`) `Dranaxel`_
- (:pr:`8740`) `Holden Karau`_
- bool type for Index (:pr:`8732`) `Naty Clementi`_
- ArrowDatasetEngine subclass to override pandas->arrow conversion also for partitioned write (:pr:`8741`) `Joris Van den Bossche`_
- da.diag() and da.diagonal() (:pr:`8689`) `ParticularMiner`_
- linspace creation to match numpy when num equal to 0 (:pr:`8676`) `Peter`_
- dataclasses (:pr:`8557`) `Gabe Joseph`_
- tokenize to treat dict and kwargs differently (:pr:`8655`) `James Bourbeau`_

Bug Fixes
^^^^^^^^^

- dask.array.roll() for roll-shifts that match the size of the input array (:pr:`8723`) `ParticularMiner`_
- normalize_function dataclass methods (:pr:`8527`) `Sarah Charlotte Johnson`_
- (:pr:`8703`) `ParticularMiner`_
- sqlalchemy connection for picklability (:pr:`8745`) `Julia Signell`_

Deprecations
^^^^^^^^^^^^

- (:pr:`8572`) `James Bourbeau`_
- iteritems (:pr:`8660`) `James Bourbeau`_
- dataframe.tseries.resample.getnanos (:pr:`8752`) `Sarah Charlotte Johnson`_
- (:pr:`8758`) `Richard (Rick) Zamora`_

Documentation
^^^^^^^^^^^^^

- (:pr:`8717`) `James Bourbeau`_
- dask.visualize docstring (:pr:`8710`) `Dranaxel`_
- (:pr:`8731`) `Jacob Tomlinson`_
- distributed.Client.preload (:pr:`8679`) `Bryan Weber`_
- (:pr:`8595`) `Thomas Grainger`_
- (:pr:`8748`) `Martin Thøgersen`_
- dask-sphinx-theme (:pr:`8751`) `Benjamin Zaitlen`_

Maintenance
^^^^^^^^^^^

- coverage in CI (:pr:`8690`) `James Bourbeau`_
- (:pr:`8716`) `James Bourbeau`_
- scheduler_HLG_unpack_import; flaky test (:pr:`8724`) `Mike McCarty`_
- scipy upstream CI build (:pr:`8725`) `James Bourbeau`_
- (:pr:`8728`) `Charles Blackmon-Luca`_
- sort_values (:pr:`8571`) `Charles Blackmon-Luca`_
- cloudpickle and scipy in docs requirements (:pr:`8737`) `Julia Signell`_
- (:pr:`8746`) `Julia Signell`_
- (:pr:`8432`) `Kristopher Overholt`_
- (:pr:`8747`) `James Bourbeau`_
- test_pandas_timestamp_overflow_pyarrow test (:pr:`8733`) `Joris Van den Bossche`_
- (:pr:`8756`) `Charles Blackmon-Luca`_

.. _v2022.02.0:

Released on February 11, 2022

.. note::

   This is the last release with support for Python 3.7.

New Features
^^^^^^^^^^^^

- region to to_zarr when using existing array (:pr:`8590`) `Chris Roat`_
- engine_kwargs support to dask.dataframe.to_sql (:pr:`8609`) `Amir Kadivar`_
- include_path_column arg to read_json (:pr:`8603`) `Bryan Weber`_
- expand_dims to Dask array (:pr:`8687`) `Tom White`_

Enhancements
^^^^^^^^^^^^

- assert_eq utilities (:pr:`8610`) `Xinrong Meng`_
- dtype=None (:pr:`8685`) `Tom White`_
- axis=None (:pr:`8686`) `Tom White`_
- (:pr:`8295`) `crusaderky`_
- meta (:pr:`8629`) `Julia Signell`_
- map_partitions (Blockwise) in to_parquet (:pr:`8487`) `Richard (Rick) Zamora`_

Bug Fixes
^^^^^^^^^

- (:pr:`8637`) `ParticularMiner`_
- map_partitions in ACA code path (:pr:`8643`) `Richard (Rick) Zamora`_

Deprecations
^^^^^^^^^^^^

- is_monotonic (:pr:`8653`) `James Bourbeau`_
- (:pr:`8605`) `James Bourbeau`_

Documentation
^^^^^^^^^^^^^

- (:pr:`8675`) `Ray Bell`_
- (:pr:`8715`) `Julia Signell`_
- (:pr:`8693`) `Matthias Bussonnier`_
- (:pr:`8483`) `Genevieve Buckley`_
- ProgressBar out parameter (:pr:`8604`) `Pedro Silva`_
- dask.config.set (:pr:`8705`) `crusaderky`_
- mypy among type checkers (:pr:`8699`) `crusaderky`_

Maintenance
^^^^^^^^^^^

- get_dummies tests (:pr:`8651`) `James Bourbeau`_
- (:pr:`8714`) `Julia Signell`_
- (:pr:`8665`) `David Hoese`_
- pre-commit version (:pr:`8691`) `James Bourbeau`_
- scipy in upstream CI build (:pr:`8681`) `James Bourbeau`_
- scipy < 1.8.0 in CI (:pr:`8683`) `James Bourbeau`_
- scipy to less than 1.8.0 in GPU CI (:pr:`8698`) `Julia Signell`_
- pytest.warns(None) in test_multi.py (:pr:`8678`) `James Bourbeau`_
- (:pr:`8652`) `James Bourbeau`_
- test__get_paths robust to site.PREFIXES being set (:pr:`8644`) `James Bourbeau`_
- (:pr:`8642`) `Charles Blackmon-Luca`_

.. _v2022.01.1:

Released on January 28, 2022

New Features
^^^^^^^^^^^^

- dask.dataframe.series.view() (:pr:`8533`) `Pavithra Eswaramoorthy`_

Enhancements
^^^^^^^^^^^^

- tz for fastparquet + pandas 1.4.0 (:pr:`8626`) `Martin Durant`_
- pandas compat (:pr:`8623`) `Julia Signell`_
- SQLAlchemy >= 1.4 (:pr:`8158`) `McToel`_
- (:pr:`8621`) `Julia Signell`_
- meta is not a pandas object (:pr:`8563`) `Julia Signell`_
- fsspec.parquet module for better remote-storage read_parquet performance (:pr:`8339`) `Richard (Rick) Zamora`_
- (:pr:`8468`) `Richard (Rick) Zamora`_
- DataFrameIOLayer (:pr:`8453`) `Richard (Rick) Zamora`_
- (:pr:`7417`) `Ian Rose`_
- (:pr:`8573`) `James Bourbeau`_
- optimize_graph flag to Bag.to_dataframe function (:pr:`8486`) `Maxim Lippeveld`_
- (:pr:`8498`) `Julia Signell`_
- to_frame name to not pass None (:pr:`8554`) `Julia Signell`_
- axis=None warning (:pr:`8555`) `Julia Signell`_
- (:pr:`8531`) `abergou`_

Bug Fixes
^^^^^^^^^

- groupby.cumsum with series grouped by index (:pr:`8588`) `Julia Signell`_
- derived_from for pandas methods (:pr:`8612`) `Thomas J. Fan`_
- ascending for sort_values (:pr:`8440`) `Charles Blackmon-Luca`_
- ``__setitem__`` indices (:pr:`8601`) `David Hassell`_
- (:pr:`8597`) `Doug Davis`_

Deprecations
^^^^^^^^^^^^

- meta error in (:pr:`8563`) to warning (:pr:`8628`) `Julia Signell`_
- append when pandas >= 1.4.0 (:pr:`8617`) `Julia Signell`_

Documentation
^^^^^^^^^^^^^

- columns argument with meta in DataFrame constructor (:pr:`8614`) `kori73`_
- (:pr:`8602`) `Jacob Tomlinson`_

Maintenance
^^^^^^^^^^^

- coverage in CI (:pr:`8631`) `James Bourbeau`_
- cached_cumsum imports to be from dask.utils (:pr:`8606`) `James Bourbeau`_
- RAPIDS_VER to 22.04 (:pr:`8600`)
- from_delayed function (:pr:`8576`) `Kirito1397`_
- plot_width / plot_height deprecations (:pr:`8544`) `Bryan Van de Ven`_
- pyyaml importorskip (:pr:`8562`) `James Bourbeau`_
- assert_eq (:pr:`8559`) `Gabe Joseph`_

.. _v2022.01.0:

Released on January 14, 2022

New Features
^^^^^^^^^^^^

- groupby.shift method (:pr:`8522`) `kori73`_
- DataFrame.nunique (:pr:`8479`) `Sarah Charlotte Johnson`_
- da.ndim to match np.ndim (:pr:`8502`) `Julia Signell`_

Enhancements
^^^^^^^^^^^^

- percentile interpolation= keyword warning if NumPy version >= 1.22 (:pr:`8564`) `Julia Signell`_
- PerformanceWarning when limit and "array.slicing.split-large-chunks" are None (:pr:`8511`) `Julia Signell`_
- normalize_seq function at import time (:pr:`8521`) `Illviljan`_
- (:pr:`8393`) `Charles Blackmon-Luca`_
- bag.groupby (:pr:`8492`) `Julia Signell`_
- (:pr:`8472`) `TnTo`_
- read_bytes (:pr:`8459`) `Martin Durant`_
- matmul() by completely removing concatenation (:pr:`8423`) `ParticularMiner`_
- (:pr:`8124`) `Genevieve Buckley`_
- (:pr:`8470`) `Martin Durant`_

Bug Fixes
^^^^^^^^^

- (:pr:`8538`) `David Hassell`_
- dtype on array-likes (:pr:`8501`) `aeisenbarth`_
- optimize_blockwise bug for duplicate dependency names (:pr:`8542`) `Richard (Rick) Zamora`_
- DataFrame.GroupBy.apply and transform (:pr:`8507`) `Sarah Charlotte Johnson`_
- Delayed (:pr:`8452`) `Gabe Joseph`_
- nanmin and nanmax reductions (:pr:`8484`) `Julia Signell`_
- read_csv with comment kwarg work even if there is a comment in the header (:pr:`8433`) `Julia Signell`_

Deprecations
^^^^^^^^^^^^

- interpolation with method and method with internal_method (:pr:`8525`) `Julia Signell`_
- (:pr:`8477`) `James Bourbeau`_

Documentation
^^^^^^^^^^^^^

- (:pr:`8520`) `kori73`_
- (:pr:`8510`) `Ray Bell`_
- (:pr:`8534`) `Aneesh Nema`_
- (:pr:`8519`) `Deepyaman Datta`_
- slicing.py (:pr:`8512`) `Maren Westermann`_
- (:pr:`8529`) `Michael Delgado`_
- (:pr:`8401`) `Sarah Charlotte Johnson`_
- pyarrow-only reference from split_row_groups in read_parquet docstring (:pr:`8490`) `Naty Clementi`_

Maintenance
^^^^^^^^^^^

- LocalFileSystem tests that fail for fsspec>=2022.1.0 (:pr:`8565`) `Richard (Rick) Zamora`_
- (:pr:`8561`) `crusaderky`_
- skipna=None for DataFrame.sem (:pr:`8556`) `Julia Signell`_
- PANDAS_GT_140 (:pr:`8552`) `Julia Signell`_
- ``__dask_layers__`` (:pr:`8548`) `crusaderky`_
- import llvmlite (:pr:`8550`) `crusaderky`_
- pyyaml (:pr:`8545`) `Gaurav Sheni`_
- nodefaults to environments to fix tiledb + mac issue (:pr:`8505`) `Julia Signell`_
- setuptools (:pr:`8509`) `Julia Signell`_
- (:pr:`8469`) `Charles Blackmon-Luca`_
- CUDA_VER to 11.5 (:pr:`8489`) `Charles Blackmon-Luca`_

.. _v2021.12.0:

Released on December 10, 2021

New Features
^^^^^^^^^^^^

- Series and Index is_monotonic* methods (:pr:`8304`) `Daniel Mesejo-León`_

Enhancements
^^^^^^^^^^^^

- map_partitions with partition_info (:pr:`8310`) `Gabe Joseph`_
- (:pr:`8436`) `Doug Davis`_
- by instead of index internally on the Groupby class (:pr:`8441`) `Julia Signell`_
- sort_values (:pr:`8345`) `Charles Blackmon-Luca`_
- read_parquet when statistics and partitions are misaligned (:pr:`8416`) `Richard (Rick) Zamora`_
- where argument in ufuncs (:pr:`8253`) `mihir`_
- (:pr:`8328`) `JSKenyon`_

Bug Fixes
^^^^^^^^^

- map_blocks not using own arguments in name generation (:pr:`8462`) `David Hoese`_
- (:pr:`8410`) `Sarah Charlotte Johnson`_
- (:pr:`8400`) `Richard (Rick) Zamora`_
- (:pr:`8413`) `Richard (Rick) Zamora`_
- nanmin/nanmax (:pr:`8375`) `Boaz Mohar`_

Deprecations
^^^^^^^^^^^^

- token keyword argument to map_blocks (:pr:`8464`) `James Bourbeau`_
- map_overlap (:pr:`8397`) `Genevieve Buckley`_

Documentation
^^^^^^^^^^^^^

- block_info documentation (:pr:`8425`) `Genevieve Buckley`_
- (:pr:`8456`) `Sarah Charlotte Johnson`_
- (:pr:`8370`) `Naty Clementi`_
- (:pr:`8427`) `Martin Durant`_
- dask-gateway link in ecosystem.rst (:pr:`8424`) `ofirr`_
- (:pr:`8412`) `Genevieve Buckley`_

Maintenance
^^^^^^^^^^^

- (:pr:`8431`) `Bryan Van de Ven`_
- fsspec=2021.11.1 release (:pr:`8428`) `Martin Durant`_
- dask/ml.py to pytest exclude list (:pr:`8414`) `Genevieve Buckley`_
- RAPIDS_VER to 22.02 (:pr:`8394`)
- graphviz and improve package management in environment-3.7 (:pr:`8411`) `Julia Signell`_

.. _v2021.11.2:

Released on November 19, 2021

- (:pr:`8404`) `Charles Blackmon-Luca`_
- assert_eq (:pr:`8396`) `Gabe Joseph`_
- divisions is tuple (:pr:`8389`) `Charles Blackmon-Luca`_
- (:pr:`8379`) `Julia Signell`_
- set_index partition_size parameter description (:pr:`8384`) `FredericOdermatt`_
- blockwise in single_partition_join (:pr:`8341`) `Gabe Joseph`_
- (:pr:`8354`) `Boaz Mohar`_
- .loc of DataFrame with nullable boolean dtype (:pr:`8368`) `Marco Rossi`_
- (:pr:`8250`) `Ian Rose`_
- (:pr:`8369`) `Boaz Mohar`_
- (:pr:`8356`) `Julia Signell`_
- (:pr:`8367`) `Julia Signell`_
- graphviz to avoid issue with windows and Python 3.7 (:pr:`8365`) `Julia Signell`_
- graphviz.Digraph from top of module, not from dot (:pr:`8363`) `Julia Signell`_

.. _v2021.11.1:

Released on November 8, 2021

Patch release to update distributed dependency to version 2021.11.1.

.. _v2021.11.0:

Released on November 5, 2021

- required_extension behavior in read_parquet (:pr:`8351`) `Richard (Rick) Zamora`_
- align_dataframes to map_partitions to broadcast a dataframe passed as an arg (:pr:`6628`) `Julia Signell`_
- dask.dataframe.loc (:pr:`8254`) `Julia Signell`_
- (:pr:`8332`) `Ian Rose`_
- name_function option to to_parquet (:pr:`7682`) `Matthew Powers`_
- environment-latest.yml and update to Python 3.9 (:pr:`8275`) `Julia Signell`_
- s3fs in CI (:pr:`8336`) `James Bourbeau`_
- (:pr:`8176`) `Julia Signell`_
- dask.visualize (:pr:`7992`) `Erik Welch`_
- HighLevelGraph optimizations for delayed (:pr:`8316`) `Ian Rose`_
- demo_tuples produces malformed HighLevelGraph (:pr:`8325`) `crusaderky`_
- (:pr:`8312`) `Genevieve Buckley`_
- test_interrupt (:pr:`8314`) `crusaderky`_
- AxisError (:pr:`8305`) `crusaderky`_
- (:pr:`8311`) `Vyas Ramasubramani`_
- (:pr:`8300`) `Ayush Dattagupta`_
- read_parquet (:pr:`8274`) `Richard (Rick) Zamora`_
- dask.ml module (:pr:`6384`) `Matthew Rocklin`_
- (:pr:`8298`) `James Bourbeau`_
- (:pr:`8248`) `Julia Signell`_
- (:pr:`8296`) `Julia Signell`_
- block property with blockview for array-like operations on blocks (:pr:`8242`) `Davis Bennett`_
- file_path and make it possible to save from within a notebook (:pr:`8283`) `Julia Signell`_

.. _v2021.10.0:

Released on October 22, 2021

- da.store to create well-formed HighLevelGraph (:pr:`8261`) `crusaderky`_
- pyarrow in the upstream build (:pr:`8281`) `Joris Van den Bossche`_
- chest (:pr:`8279`) `James Bourbeau`_
- (:pr:`8258`) `Genevieve Buckley`_
- tmpdir and tmpfile context manager docstrings (:pr:`8270`) `Daniel Mesejo-León`_
- (:pr:`8276`) `James Bourbeau`_
- (:pr:`8277`) `JoranDox`_
- (:pr:`8244`) `Genevieve Buckley`_
- (:pr:`8273`) `German Shiklov`_
- (:pr:`8257`) `Genevieve Buckley`_
- read_metadata in fastparquet engine (:pr:`8092`) `Richard (Rick) Zamora`_
- Path objects in from_zarr (:pr:`8266`) `Samuel Gaist`_
- (:pr:`8272`) `Julia Signell`_
- memory_usage to True if verbose is True in info (:pr:`8222`) `Kinshuk Dua`_
- (:pr:`8238`) `James Bourbeau`_
- signature (:pr:`8267`) `James Bourbeau`_
- (:pr:`8215`) `Charles Blackmon-Luca`_
- DataFrame.head shouldn't warn when there's one partition (:pr:`8091`) `Pankaj Patil`_
- pyarrow not installed (:pr:`8256`) `Genevieve Buckley`_
- debugging.html redirect (:pr:`8251`) `James Bourbeau`_
- (:pr:`8225`) `Charles Blackmon-Luca`_
- setup.html redirect (:pr:`8249`) `Florian Jetter`_
- pyupgrade in CI (:pr:`8246`) `crusaderky`_
- (:pr:`8237`) `James Bourbeau`_
- (:pr:`8086`) `Suriya Senthilkumar`_
- Array (:pr:`7922`) `Davis Bennett`_
- dask.multiprocessing import in docs (:pr:`8240`) `Ray Bell`_
- _max_workers from Executor (:pr:`8228`) `John A Kirkham`_
- delayed best practices docs (:pr:`8231`) `Vũ Trung Đức`_
- (:pr:`7984`) `Julia Signell`_
- df.quantile on all missing data (:pr:`8129`) `Julia Signell`_
- tokenize.ensure-deterministic config option (:pr:`7413`) `Hristo Georgiev`_
- inclusive rather than closed with pandas>=1.4.0 and pd.date_range (:pr:`8213`) `Julia Signell`_
- dask-gateway, Coiled, and Saturn-Cloud to list of Dask setup tools (:pr:`7814`) `Kristopher Overholt`_
- HighLevelGraph layers (:pr:`8199`) `Jim Crist-Harif`_
- (:pr:`8162`) `Julia Signell`_
- read_metadata in pyarrow parquet engines (:pr:`8072`) `Richard (Rick) Zamora`_
- drop_axis in map_blocks and map_overlap (:pr:`8192`) `Gregory R. Lee`_
- (:pr:`8205`) `Julia Signell`_
- (:pr:`8195`) `Charles Blackmon-Luca`_
- dask.bag all, any, count methods (:pr:`7630`) `Nathan Danielsen`_
- (:pr:`8202`) `James Bourbeau`_
- (:pr:`8200`) `James Bourbeau`_
- pytest.param to properly label param-specific GPU tests (:pr:`8197`) `Charles Blackmon-Luca`_
- test_set_index to tests ran on gpuCI (:pr:`8198`) `Charles Blackmon-Luca`_
- tmpfile OSError (:pr:`8191`) `James Bourbeau`_
- s.isna instead of pd.isna(s) in set_partitions_pre (fix cudf CI) (:pr:`8193`) `Charles Blackmon-Luca`_
- test-upstream failures (:pr:`8067`) `Wallace Reis`_
- to_parquet bug in call to pyarrow.parquet.read_metadata (:pr:`8186`) `Richard (Rick) Zamora`_
- sort_values (:pr:`8167`) `Charles Blackmon-Luca`_
- RAPIDS_VER for gpuCI (:pr:`8184`) `Charles Blackmon-Luca`_
- (:pr:`8185`) `Jim Crist-Harif`_
- (:pr:`8181`) `Ray Bell`_
- HighLevelGraphs in DataFrame.from_delayed (:pr:`8174`) `Gabe Joseph`_
- inplace argument for Dask series renaming (:pr:`8136`) `Marcel Coetzee`_
- pandas > 1.3.0 (:pr:`8150`) `Julia Signell`_
- setitem on unknown chunks (:pr:`8166`) `Julia Signell`_
- Index.to_series (:pr:`8165`) `Julia Signell`_

.. _v2021.09.1:

Released on September 21, 2021

- groupby for future pandas (:pr:`8151`) `Julia Signell`_
- (:pr:`8155`) `Julia Signell`_
- (:pr:`8157`) `David Hoese`_
- datetime_is_numeric to dataframe.describe (:pr:`7719`) `Julia Signell`_
- pd.Int64Index in anticipation of deprecation (:pr:`8144`) `Julia Signell`_
- loc if needed for series ``__getitem__`` (:pr:`7953`) `Julia Signell`_
- (:pr:`8125`) `Julia Signell`_
- groupby nunique test for pandas >= 1.3.3 (:pr:`8142`) `Julia Signell`_
- ascending arg for sort_values (:pr:`8130`) `Charles Blackmon-Luca`_
- operator.getitem (:pr:`8015`) `Naty Clementi`_
- zero_broadcast_dimensions and homogeneous_deepmap (:pr:`8134`) `SnkSynthesis`_
- drop_index is negative (:pr:`8064`) `neel iyer`_
- scheduler to be an Executor (:pr:`8112`) `John A Kirkham`_
- asarray/asanyarray cases where like is a dask.Array (:pr:`8128`) `Peter Andreas Entschev`_
- index_col duplication if index_col is type str (:pr:`7661`) `McToel`_
- dtype and order to asarray and asanyarray definitions (:pr:`8106`) `Julia Signell`_
- ``dask.dataframe.Series.__contains__`` (:pr:`7914`) `Julia Signell`_
- like-arrays in _wrapped_qr (:pr:`8122`) `Peter Andreas Entschev`_
- boundary_slice kwarg: kind for pandas compat (:pr:`8037`) `Julia Signell`_

.. _v2021.09.0:

Released on September 3, 2021

- (:pr:`7303`) `Julia Signell`_
- FileNotFound to expected http errors (:pr:`8109`) `Martin Durant`_
- DataFrame.sort_values to API docs (:pr:`8107`) `Benjamin Zaitlen`_
- dask.order: be more eager at times (:pr:`7929`) `Erik Welch`_
- (:pr:`8090`) `James Bourbeau`_
- make_people works with processes scheduler (:pr:`8103`) `Dahn`_
- deep param to Dataframe copy method and restrict it to False (:pr:`8068`) `João Paulo Lacerda`_
- (:pr:`8104`) `Robert Hales`_
- DataFrame.query docstring (:pr:`8100`) `James Bourbeau`_
- sparse tests for 0.13.0 release (:pr:`8102`) `James Bourbeau`_
- (:pr:`8069`) `Jordan Jensen`_
- da.unique (values only) (:pr:`8021`) `Peter Andreas Entschev`_
- sparse.zeros_like (xfailed) (:pr:`8093`) `crusaderky`_
- like kwarg support to array creation functions (:pr:`8054`) `Peter Andreas Entschev`_
- (:pr:`8079`) `James Bourbeau`_
- percentile_dispatch to dask.array (:pr:`8083`) `GALI PREM SAGAR`_
- filepath exists in to_parquet (:pr:`8057`) `James Bourbeau`_
- test_scheduler_highlevel_graph_unpack_import (:pr:`8080`) `James Bourbeau`_
- DataFrame.shuffle to API docs (:pr:`8076`) `Martin Fleischmann`_
- (:pr:`8073`) `John A Kirkham`_

.. _v2021.08.1:

Released on August 20, 2021

- ignore_metadata_file option to read_parquet (pyarrow-dataset and fastparquet support only) (:pr:`8034`) `Richard (Rick) Zamora`_
- pytest-xdist in dev docs (:pr:`8066`) `Julia Signell`_
- tz in meta from to_datetime (:pr:`8000`) `Julia Signell`_
- (:pr:`7985`) `Benjamin Zaitlen`_
- assert_eq check (:pr:`8061`) `James Bourbeau`_
- ``__class__`` when creating DataFrames (:pr:`8053`) `Mads R. B. Kristensen`_
- distributed in gpuCI build (:pr:`7976`) `James Bourbeau`_
- signature (:pr:`8049`) `James Bourbeau`_
- (:pr:`8055`) `GALI PREM SAGAR`_
- (:pr:`7974`) `Freyam Mehta`_
- (:pr:`8060`) `Jacob Tomlinson`_
- dask.widgets and migrate HTML reprs to jinja2 (:pr:`8019`) `Jacob Tomlinson`_
- wrap_func_like_safe, not required with NumPy >= 1.17 (:pr:`8052`) `Peter Andreas Entschev`_
- (:pr:`8040`) `David Hoese`_
- (:pr:`8029`) `GALI PREM SAGAR`_
- obj in groupby rather than private _selected_obj (:pr:`8038`) `GALI PREM SAGAR`_
- import rechunk from (:pr:`8039`) `Illviljan`_
- dict to store data for {nan,}arg{min,max} in certain cases (:pr:`8014`) `Peter Andreas Entschev`_
- blocksize description formatting in read_pandas (:pr:`8047`) `Louis Maddox`_
- (:pr:`8043`) `David Chudzicki`_

.. _v2021.08.0:

Released on August 13, 2021

- to_orc delayed compute behavior (:pr:8035) Richard (Rick) Zamora_
- compute_as_if_collection (:pr:7969) James Bourbeau_
- (:pr:8033) Julia Signell_
- distributed tests (:pr:8025) James Bourbeau_
- to_orc collection name (:pr:8024) James Bourbeau_
- skipfooter problem (:pr:7855) Ross_
- NotImplementedError for non-indexable arg passed to to_datetime (:pr:7989) Doug Davis_
- distributed (:pr:8002) James Bourbeau_
- dict format in to_bag accessories of DataFrame (:pr:7932) gurunath_
- (:pr:8016) aa1371_
- (:pr:7973) Freyam Mehta_
- (:pr:8007) Julia Signell_
- (:pr:8013) Peter Andreas Entschev_
- (:pr:7756) Richard (Rick) Zamora_
- enforce=False (:pr:7916) Julia Signell_
- map_overlap trimming behavior when drop_axis is not None (:pr:7894) Gregory R. Lee_
- (:pr:7994) Peter Andreas Entschev_
- Delayed in to_csv and to_parquet (:pr:7968) Matthew Rocklin_
- check_dtypes (:pr:7952) gurunath_
- pytest.warns instead of raises for checking parquet engine deprecation (:pr:7993) Joris Van den Bossche_
- RAPIDS_VER in gpuCI to 21.10 (:pr:7991) Charles Blackmon-Luca_
- pyarrow-legacy test coverage for pyarrow>=5 (:pr:7988) Richard (Rick) Zamora_
- pyarrow>=5 in to_parquet and read_parquet (:pr:7967) Richard (Rick) Zamora_
- (:pr:7982) Peter Andreas Entschev_
- tail and head to SeriesGroupby (:pr:7935) Daniel Mesejo-León_
- (:pr:7979) James Bourbeau_
- (:pr:7966) Charles Blackmon-Luca_
- daily_stock utility (:pr:7949) James Bourbeau_
- distributed.nanny to configuration reference docs (:pr:7955) James Bourbeau_
- (:pr:7939) John A Kirkham_

.. _v2021.07.2:

Released on July 30, 2021

.. note::

   This is the last release with support for NumPy 1.17 and pandas 0.25. Beginning with the next release, NumPy 1.18 and pandas 1.0 will be the minimum supported versions.

- dask.array SVG to the HTML Repr (:pr:7886) Freyam Mehta_
- Delayed in to_parquet (:pr:7958) Matthew Rocklin_
- pyarrow<5 in CI (:pr:7960) James Bourbeau_
- ucx and rmm config values (:pr:7956) James Bourbeau_
- (:pr:7865) Zhengnan Zhao_
- (:pr:7864) Zhengnan Zhao_
- da.diff (:pr:7946) Peter Andreas Entschev_
- (:pr:7931) Freyam Mehta_
- (:pr:7942) Julia Signell_
- ucx and rmm changes (:pr:7943) James Bourbeau_
- __setitem__ (:pr:7940) Peter Andreas Entschev_
- slice_with_int_dask_array (:pr:7927) Peter Andreas Entschev_
- (:pr:7928) James Bourbeau_
- (:pr:7872) Zhengnan Zhao_

.. _v2021.07.1:

Released on July 23, 2021

- assert_eq check dtype (:pr:7903) Julia Signell_
- (:pr:7863) Zhengnan Zhao_
- (:pr:7925) Matthew Rocklin_
- (:pr:7873) Zhengnan Zhao_
- (:pr:7917) Julia Signell_
- Array.__iter__ (:pr:7905) Julia Signell_
- (:pr:7913) Julia Signell_
- numeric_only kwarg to DataFrame reductions (:pr:7831) Julia Signell_
- (:pr:7876) Charles Blackmon-Luca_
- histogram2d in dask.array (:pr:7827) Doug Davis_
- (:pr:7874) Zhengnan Zhao_
- (:pr:7869) Freyam Mehta_
- (:pr:7915) Bryan Van de Ven_
- fastparquet in CI (:pr:7907) James Bourbeau_
- dask.array import to progress bar docs (:pr:7910) Fabian Gebhart_
- (:pr:7890) Julia Signell_
- pyarrow-dataset ordering bug (:pr:7902) Richard (Rick) Zamora_
- (:pr:7892) GALI PREM SAGAR_
- NotImplementedError when using pd.Grouper (:pr:7857) Ruben van de Geer_
- aggregate_files argument to enable multi-file partitions in read_parquet (:pr:7557) Richard (Rick) Zamora_
- xfail test_daily_stock (:pr:7895) James Bourbeau_
- (:pr:7837) Naty Clementi_
- (:pr:7820) Elliott Sales de Andrade_
- merge_asof (:pr:7842) gerrymanoim_

.. _v2021.07.0:

Released on July 9, 2021

- fastparquet in upstream CI build (:pr:7884) James Bourbeau_
- (:pr:7849) Mads R. B. Kristensen_
- fastparquet now supports new time types, including ns precision (:pr:7880) Martin Durant_
- ParquetDataset API when appending in ArrowDatasetEngine (:pr:7544) Richard (Rick) Zamora_
- test_shuffle_priority (:pr:7879) Richard (Rick) Zamora_
- (:pr:7878) James Bourbeau_
- dask.distributed imports (:pr:7866) Matthew Rocklin_
- (:pr:7856) Genevieve Buckley_
- (:pr:7875) Martin Durant_
- da.eye fix for chunks=-1 (:pr:7854) Naty Clementi_
- test_daily_stock (:pr:7858) James Bourbeau_
- SimpleShuffleLayer (:pr:7846) Richard (Rick) Zamora_
- (:pr:7838) Mads R. B. Kristensen_
- @guvectorize (:pr:6863) Julia Signell_
- (:pr:7834) Florian Jetter_
- (:pr:7841) Julia Signell_
- datetime.date (:pr:7836) James Bourbeau_
- sample_rows to read_csv-like (:pr:7825) Martin Durant_
- config.deserialize docstring (:pr:7830) Geoffrey Lentner_
- test_dataframe_picklable (:pr:7822) James Bourbeau_
- histogramdd (for handling inputs that are sequences-of-arrays). (:pr:7634) Doug Davis_
- PY_VERSION private (:pr:7824) James Bourbeau_

.. _v2021.06.2:

Released on June 22, 2021

- layers.py compare parts_out with set(self.parts_out) (:pr:7787) Genevieve Buckley_
- check_meta understand pandas dtypes better (:pr:7813) Julia Signell_
- (:pr:7818) James Bourbeau_

.. _v2021.06.1:

Released on June 18, 2021

- (:pr:7817) James Bourbeau_
- (:pr:7810) James Bourbeau_
- dtype= (:pr:7808) Doug Davis_
- (:pr:7811) Kristopher Overholt_
- Layer & HighLevelGraph (:pr:7812) Genevieve Buckley_
- (:pr:7809) Jacob Tomlinson_
- (:pr:7801) Elliott Sales de Andrade_
- HighLevelGraph layers (:pr:7763) Genevieve Buckley_
- blockwise token to avoid DataFrame column name clash (:pr:6546) James Bourbeau_
- concat for merge_asof (:pr:7806) Julia Signell_
- (:pr:7795) Julia Signell_
- (:pr:7796) James Bourbeau_
- (:pr:7802) Elliott Sales de Andrade_
- (:pr:7804) Elliott Sales de Andrade_
- (:pr:7799) James Bourbeau_
- ImportError catching from dask/__init__.py (:pr:7797) James Bourbeau_
- DataFrame.join() to take a list of DataFrames to merge with (:pr:7578) Krishan Bhasin_
- dask.array.linspace (:pr:7667) Daniel Mesejo-León_
- (:pr:7794) Julia Signell_
- da.select() implementation and test (:pr:7760) Gabriel Miretti_
- get_output_keys method (:pr:7790) Genevieve Buckley_
- freq in divisions (:pr:7785) Julia Signell_
- HighLevelGraph abstract layer for map_overlap (:pr:7595) Genevieve Buckley_
- drop (:pr:7784) Julia Signell_
- (:pr:7782) Julia Signell_
- add_(prefix|suffix) to DataFrame and Series (:pr:7745) tsuga_
- read_hdf to Blockwise (:pr:7625) Richard (Rick) Zamora_
- Layer.get_output_keys officially an abstract method (:pr:7775) Genevieve Buckley_
- ravel_multi_index (:pr:7594) Gabe Joseph_
- (:pr:7773) Martin Durant_
- .visualize() with filename=None (:pr:7740) Freyam Mehta_
- SubgraphCallable (:pr:7637) Bruce Merry_
- fsspec to 2021.5.0 in CI (:pr:7771) James Bourbeau_
- from_delayed (:pr:7769) Florian Jetter_
- meta support for DatetimeTZDtype (:pr:7627) gerrymanoim_
- (:pr:7701) James Bourbeau_
- (:pr:7752) Julia Signell_

.. _v2021.06.0:

Released on June 4, 2021

- rewrite_blockwise (:pr:7721) Richard (Rick) Zamora_
- project_columns (:pr:7761) Richard (Rick) Zamora_
- (:pr:7741) Boaz Mohar_
- to_zarr (:pr:7738) Chris Roat_
- apply_gufunc (:pr:7669) Gabe Joseph_
- da.fromfunction with da.blockwise (:pr:7704) John A Kirkham_
- make_meta_util to make_meta (:pr:7743) GALI PREM SAGAR_
- (:pr:7715) Vibhu Jawa_
- (:pr:7734) Mads R. B. Kristensen_
- apply_gufunc (:pr:7744) Boaz Mohar_
- (:pr:7735) Genevieve Buckley_
- sizeof sets in Python 3.9 (:pr:7739) Mads R. B. Kristensen_
- dataframe.__getitem__ (:pr:7749) Julia Signell_
- client.dashboard_link (:pr:7747) Genevieve Buckley_
- (:pr:7733) Genevieve Buckley_
- (:pr:7716) Genevieve Buckley_
- autofunction for unify_chunks in API docs (:pr:7730) James Bourbeau_

.. _v2021.05.1:

Released on May 28, 2021

- (:pr:7712) Julia Signell_
- optimize_dataframe_getitem bug (:pr:7698) Richard (Rick) Zamora_
- make_meta import in docs (:pr:7713) Benjamin Zaitlen_
- da.searchsorted (:pr:7696) Tom White_
- (:pr:7706) Jiaming Yuan_
- read_sql_table returning wrong result for single column loads (:pr:7572) c-thiel_
- support.rst (:pr:7679) Naty Clementi_
- (:pr:7700) James Bourbeau_
- object (:pr:7586) GALI PREM SAGAR_
- union_categoricals (:pr:7699) GALI PREM SAGAR_
- Dispatch objects (:pr:7505) James Bourbeau_
- dispatch.registers to their own file (:pr:7503) Julia Signell_
- dataclasses where init=False (:pr:7656) Julia Signell_
- divisions (:pr:7605) Julia Signell_
- (:pr:7562) Chris Roat_
- (:pr:7694) Genevieve Buckley_
- DataFrame.set_index() (:pr:7691) James Lamb_
- (:pr:7684) David Hoese_
- axis tuple for flip to be consistent with NumPy (:pr:7675) Andrew Champion_
- pre-commit hook versions (:pr:7676) James Bourbeau_
- to_zarr docstring (:pr:7683) David Hoese_
- read_orc (:pr:7678) Justus Magin_
- ipyparallel & mpi4py concurrent.futures (:pr:7665) John A Kirkham_
- (:pr:7671) Peter Andreas Entschev_
- HighLevelGraph documentation inaccuracies (:pr:7662) Mads R. B. Kristensen_
- getitem error message (:pr:7659) Maisie Marshall_

.. _v2021.05.0:

Released on May 14, 2021

- kind kwarg to comply with pandas 1.3.0 (:pr:7653) Julia Signell_
- (:pr:7645) Richard (Rick) Zamora_
- (:pr:7565) Mads R. B. Kristensen_
- inplace= in pandas set_categories (:pr:7633) James Bourbeau_
- False for Dask-Dataframe (:pr:7620) Richard (Rick) Zamora_
- RandomState (:pr:7487) Gabe Joseph_
- str.concat when others=None (:pr:7623) Daniel Mesejo-León_
- dask.dataframe in sandboxed environments (:pr:7601) Noah D. Brenowitz_
- cupyx.scipy.linalg (:pr:7563) Benjamin Zaitlen_
- timeseries and daily-stock to Blockwise (:pr:7615) Richard (Rick) Zamora_
- (:pr:7617) Richard (Rick) Zamora_
- Blockwise for DataFrame IO (parquet, csv, and orc) (:pr:7415) Richard (Rick) Zamora_
- HighLevelGraphs (:pr:7309) Genevieve Buckley_
- pyarrow sphinx intersphinx_mapping (:pr:7612) Ray Bell_
- (:pr:7608) Julia Signell_
- read_parquet parameters (:pr:7567) Ray Bell_
- ignore_abc_warning (:pr:7606) Julia Signell_
- (:pr:7575) Richard (Rick) Zamora_
- ignore_abc decorator (:pr:7604) Julia Signell_
- (:pr:7597) Julia Signell_
- loky example (:pr:7590) Naty Clementi_
- nout when arguments become tasks (:pr:7593) Gabe Joseph_
- (:pr:7602) James Bourbeau_
- (:pr:7541) Richard (Rick) Zamora_

.. _v2021.04.1:

Released on April 23, 2021

- Blockwise HLG pack/unpack for concatenate=True (:pr:7455) Richard (Rick) Zamora_
- map_partitions: use tokenized info as name of the SubgraphCallable (:pr:7524) Mads R. B. Kristensen_
- tmp_path and tmpdir to avoid temporary files and directories hanging in the repo (:pr:7592) Naty Clementi_
- (:pr:7591) Naty Clementi_
- (:pr:7588) James Bourbeau_
- (:pr:7508) Gabe Joseph_
- numpydoc (:pr:7569) Matthias Bussonnier_
- level= keyword deprecation (:pr:7577) James Bourbeau_
- .repartition(freq="M") to .repartition(freq="MS") (:pr:7504) Ruben van de Geer_
- (:pr:7128) Elliott Sales de Andrade_
- to_parquet (:pr:7564) Ray Bell_
- (:pr:7561) Julia Signell_
- ValueError in len(index_names) > 1 explicit it's using fastparquet (:pr:7556) Ray Bell_
- dict-column appending for pyarrow parquet engines (:pr:7527) Richard (Rick) Zamora_
- (:pr:7560) Doug Davis_
- dask.delayed.Delayed to docs so it can be referenced by other sphinx docs (:pr:7559) Doug Davis_
- idxmaxmin for uneven split_every (:pr:7538) Julia Signell_
- normalize_token for pandas Series/DataFrame future proof (no direct block access) (:pr:7318) Joris Van den Bossche_
- __setitem__ implementation (:pr:7393) David Hassell_
- histogram, histogramdd improvements (docs; return consistencies) (:pr:7520) Doug Davis_
- pyarrow in the upstream build (:pr:7530) Joris Van den Bossche_
- (:pr:7533) Benjamin Zaitlen_
- .to_parquet on dask.dataframe in doc string (:pr:7528) Ray Bell_
- msgpack serialization of HLGs (:pr:7525) Mads R. B. Kristensen_
- yaml.safe_load() in configuration doc (:pr:7529) Hristo Georgiev_
- reshape bug. Add relevant test. Fixes #7171. (:pr:7523) JSKenyon_
- custom_metadata= argument in to_parquet (:pr:7359) Richard (Rick) Zamora_
- (:pr:7518) Daniel Mesejo-León_
- (:pr:7426) Julia Signell_
- product (alias of prod) (:pr:7517) Freyam Mehta_
- __array_ufunc__ tests (:pr:7494) Julia Signell_
- map_overlap to map_blocks if depth is zero (:pr:7481) Genevieve Buckley_
- check_type to array assert_eq (:pr:7491) Julia Signell_

.. _v2021.04.0:

Released on April 2, 2021

- dask.array.histogramdd (:pr:7387) Doug Davis_
- LocalCluster (:pr:7497) cameron16_
- (:pr:7506) Julia Signell_
- ignore_order from kwargs (:pr:7500) GALI PREM SAGAR_
- (:pr:7498) Matthew Rocklin_
- isort (:pr:7370) Julia Signell_
- ignore_order parameter in dd.concat (:pr:7473) Daniel Mesejo-León_
- (:pr:7484) crusaderky_
- (:pr:7485) Tom Augspurger_
- (:pr:7227) crusaderky_
- (:pr:7478) James Lamb_
- concurrent.futures in local scheduler (:pr:6322) John A Kirkham_

.. _v2021.03.1:

Released on March 26, 2021

- is_categorical_dtype to handle non-pandas objects (:pr:7469) brandon-b-miller_
- multiprocessing.Pool in test_read_text (:pr:7472) John A Kirkham_
- meta kwarg to gufunc class (:pr:7423) Peter Andreas Entschev_
- (:pr:7380) Dieter Weber_
- xfail pandas and fastparquet failures (:pr:7441) Julia Signell_
- (:pr:7357) Ruben van de Geer_
- __array_function__ dispatching for tril/triu (:pr:7457) Peter Andreas Entschev_
- concurrent.futures.Executors in a few tests (:pr:7429) John A Kirkham_
- (:pr:7383) crusaderky_
- sort_values housekeeping (:pr:7462) Ryan Williams_
- (:pr:7249) Ryan Williams_
- test_config.py (:pr:7464) Hristo Georgiev_
- (:pr:7460) Gabe Joseph_
- rot90 (:pr:7440) Trevor Manz_
- (:pr:7454) Nick Vazquez_
- slice_array docstring (:pr:7453) Gabe Joseph_
- dask.utils.is_arraylike docstring (:pr:7445) Doug Davis_
- BlockwiseIODeps importing (:pr:7420) Richard (Rick) Zamora_
- (:pr:7430) James Bourbeau_
- test_describe_empty (:pr:7431) John A Kirkham_
- Series.dot method to dataframe module (:pr:7236) Madhu94_
- kurtosis-method and testing (:pr:7273) Jan Borchmann_
- (:pr:7403) Bruce Merry_
- sparse test (:pr:7421) James Bourbeau_
- (:pr:7422) James Bourbeau_
- (:pr:7418) Julia Signell_
- (:pr:7419) Julia Signell_
- value_counts (:pr:7342) Julia Signell_
- (:pr:7381) Richard (Rick) Zamora_
- (:pr:7391) Genevieve Buckley_
- sliding_window_view (:pr:7234) Deepak Cherian_
- docs/source/develop.rst (:pr:7414) Hristo Georgiev_
- (:pr:7397) James Bourbeau_
- sort_values to dask.DataFrame (:pr:7286) gerrymanoim_
- sqlalchemy<1.4.0 in CI (:pr:7405) James Bourbeau_
- (:pr:7215) Ryan Williams_
- (:pr:7388) Ryan Williams_
- pa.Table.from_pandas calls (:pr:7347) Richard (Rick) Zamora_
- 'container' with 'image' (:pr:7389) James Lamb_
- (:pr:7394) Ray Bell_
- fsspec in bag.read_text (:pr:7349) Martin Durant_
- read_hdf default mode to "r" (:pr:7039) rs9w33_
- SubgraphCallable when packing Blockwise (:pr:7353) Mads R. B. Kristensen_
- test_hdf.py to not reuse file handlers (:pr:7044) rs9w33_
- (:pr:7345) Julia Signell_
- Blockwise + IO infrastructure (:pr:7281) Richard (Rick) Zamora_
- test_slicing.py (:pr:7365) Hristo Georgiev_
- (:pr:7360) Julia Signell_
- (:pr:7364) Peter Andreas Entschev_
- (:pr:7348) James Bourbeau_
- dask.array.asarray should handle case where xarray class is in top-level namespace (:pr:7335) Tom White_
- HighLevelGraph length without materializing layers (:pr:7274) Gabe Joseph_
- (:pr:7006) James Bourbeau_
- create_metadata_file (:pr:7295) Richard (Rick) Zamora_
- (:pr:7198) Julia Signell_
- (:pr:7338) James Bourbeau_
- (:pr:7336) Eoin Shanaghy_
- (:pr:7329) James Bourbeau_
- pytest.register_assert_rewrite on util modules (:pr:7278) Bruce Merry_
- from_array() (:pr:7330) James Lamb_
- (:pr:7247) Julia Signell_

.. _v2021.03.0:

Released on March 5, 2021

.. note::

   This is the first release with support for Python 3.9 and the
   last release with support for Python 3.6.

- distributed (:pr:7328) James Bourbeau_
- percentiles_summary with dask_cudf (:pr:7325) Peter Andreas Entschev_
- Array.__setitem__ updates (:pr:7326) James Bourbeau_
- Blockwise.clone (:pr:7312) crusaderky_
- (:pr:7321) James Bourbeau_
- .name for array (:pr:7222) Julia Signell_
- (:pr:7305) Kyle Barron_
- exp with CuPy arrays (:pr:7322) John A Kirkham_
- (:pr:7277) Bruce Merry_
- pytest.mark.flaky (:pr:7319) crusaderky_
- (:pr:7308) Genevieve Buckley_
- (:pr:7289) crusaderky_
- (:pr:7143) Richard (Rick) Zamora_
- split_every to graph_manipulation (:pr:7282) crusaderky_
- (:pr:7306) Julius Busecke_
- dask.graph_manipulation support for xarray.Dataset (:pr:7276) crusaderky_
- (:pr:7297) James Bourbeau_
- tri, triu_indices, triu_indices_from, tril_indices, tril_indices_from (:pr:6997) Illviljan_
- (:pr:7260) Sinclair Target_
- distributed in CI (:pr:7279) James Bourbeau_
- (:pr:7179) Mads R. B. Kristensen_
- merge_percentiles (:pr:7172) Ashwin Srinath_
- dask-sql and fugue (:pr:7129) Ray Bell_
- (:pr:7085) McToel_
- bincount (:pr:7183) Thomas J. Fan_
- name in from_array (:pr:7264) Bruce Merry_
- cumsum for empty partitions (:pr:7230) Julia Signell_
- map_blocks example to dask array creation docs (:pr:7221) Julia Signell_
- dask.graph_manipulation.wait_on() (:pr:7258) crusaderky_
- (:pr:7246) crusaderky_
- black rev in pre-commit (:pr:7256) Julia Signell_
- array-chunks.rst (:pr:7254) Magnus Nord_
- Blockwise and ShuffleLayer (:pr:7213) Richard (Rick) Zamora_
- "pyarrow-dataset" with pyarrow-3.0.0 (:pr:7200) Richard (Rick) Zamora_
- graph_manipulation without NumPy (:pr:7243) crusaderky_
- (:pr:6738) Peter Andreas Entschev_
- (:pr:7240) James Bourbeau_
- (:pr:7238) Julia Signell_
- (:pr:7196) crusaderky_
- dask.array.delete (:pr:7125) Julia Signell_
- (:pr:7235) Julia Signell_
- (:pr:7211) crusaderky_
- map_overlap: Don't rechunk axes without overlap (:pr:7233) Deepak Cherian_
- (:pr:7232) Julia Signell_
- html_css_files in docs for custom CSS (:pr:7220) James Bourbeau_
- clone, bind, checkpoint, wait_on (:pr:7109) crusaderky_
- pyarrow-dataset engine (:pr:7186) Joris Van den Bossche_
- __setitem__ to more closely match numpy (:pr:7033) David Hassell_
- (:pr:7195) crusaderky_
- Delayed._length (:pr:7194) crusaderky_
- __dask_layers__() tests and tweaks (:pr:7177) crusaderky_
- HighLevelGraph in multiprocessing scheduler (:pr:7191) Jim Crist-Harif_
- (:pr:7188) James Bourbeau_

.. _v2021.02.0:

Released on February 5, 2021

- percentile support for NEP-35 (:pr:7162) Peter Andreas Entschev_
- Float64 in column assignment (:pr:7173) Nils Braun_
- (:pr:7127) Davis Bennett_
- (:pr:6896) Julia Signell_
- HighLevelGraph Mapping API (:pr:7160) crusaderky_
- (:pr:7163) James Bourbeau_
- (:pr:7142) crusaderky_
- (:pr:7130) Ray Bell_
- dask.array.append (:pr:7146) D-Stacks_
- dask.array.ravel to accept array_like argument (:pr:7138) D-Stacks_
- (:pr:7152) Thomas J. Fan_
- blockwise for an outer product (:pr:7119) Bruce Merry_
- HighlevelGraph.dicts in favor of .layers (:pr:7145) Amit Kumar_
- FastParquetEngine with pyarrow engines (:pr:7091) Richard (Rick) Zamora_
- (:pr:7102) Ian Rose_
- read_parquet (:pr:7066) Richard (Rick) Zamora_
- check_meta(): use __class__ when checking DataFrame types (:pr:7099) Mads R. B. Kristensen_
- (:pr:7104) Illviljan_
- getitem optimization (:pr:7106) Richard (Rick) Zamora_
- (:pr:7103) James Bourbeau_

.. _v2021.01.1:

Released on January 22, 2021

- cumprod (:pr:7089) Julia Signell_
- (:pr:6996) Joris Van den Bossche_
- SettingWithCopyWarning (:pr:7092) Julia Signell_
- 'mode' argument passed to bokeh.output_file() (:pr:7034) (:pr:7075) patquem_
- groupby.value_counts (:pr:7073) Julia Signell_
- assert_eq() (:pr:7083) James Lamb_
- (:pr:7077) Illviljan_

.. _v2021.01.0:

Released on January 15, 2021

- map_partitions with review comments (:pr:6776) Kumar Bharath Prabhu_
- population is a real list (:pr:7027) Julia Signell_
- storage_options in read_csv (:pr:7074) Richard (Rick) Zamora_
- BlockwiseIO code (:pr:7067) Richard (Rick) Zamora_
- (:pr:7069) James Bourbeau_
- reshape (:pr:6753) Tom Augspurger_
- linalg.lstsq for complex inputs (:pr:7056) Johnnie Gray_
- compression='infer' default to read_csv (:pr:6960) Richard (Rick) Zamora_
- svd_compressed #7003 (:pr:7004) Eric Czech_
- (:pr:7064) Martin Durant_
- BlockwiseIO (:pr:7048) Richard (Rick) Zamora_
- DataFrame.to_bag() and Series.to_bag() (:pr:7049) Rob Malouf_
- matmul as blockwise without contraction/concatenate (:pr:7000) Rafal Wojdyla_
- functools.cached_property in da.shape (:pr:7023) Illviljan_
- non_empty (:pr:6976) Julia Signell_
- (:pr:7002)" (:pr:7014) Rafal Wojdyla_
- python-graphviz pinning (:pr:7037) Julia Signell_
- (:pr:7038) Julia Signell_
- dropna and observed in agg (:pr:6992) Julia Signell_
- meta after .str.split with expand (:pr:7026) Ruben van de Geer_
- (:pr:7030) Joris Van den Bossche_
- python-graphviz in CI (:pr:7031) James Bourbeau_
- numpydoc (:pr:7013) Matthias Bussonnier_
- (:pr:7016) Matthew Rocklin_
- (:pr:7002) Rafal Wojdyla_
- (:pr:6998) Matthias Bussonnier_
- inline_array option to from_array (:pr:6773) Tom Augspurger_
- (:pr:6931)" (:pr:6995) James Bourbeau_
- npartitions in set_index (:pr:6978) Julia Signell_
- config serialization and inheritance (:pr:6987) Jacob Tomlinson_
- test_minimum_time (:pr:6988) Martin Durant_
- dtype inference for read_parquet (:pr:6985) Richard (Rick) Zamora_
- set_index with sorted=True (:pr:6980) Richard (Rick) Zamora_
- read_parquet for handling un-named indices with index=False (:pr:6969) Richard (Rick) Zamora_
- __class__ when comparing meta data (:pr:6981) Mads R. B. Kristensen_
- (:pr:6979) Rafal Wojdyla_
- 6925 (:pr:6982) sdementen_
- (:pr:6931) Ian Rose_
- has_parallel_type() (:pr:6927) Mads R. B. Kristensen_
- BlockwiseIO (:pr:6934) Simon Perkins_
- yield_fixture in test_sql.py (:pr:6968) Richard (Rick) Zamora_
- BlockwiseIO (:pr:6933) Richard (Rick) Zamora_
- None (:pr:6862) Jacob Tomlinson_
- from_pandas docstring (:pr:6957) Richard (Rick) Zamora_
- fuse_roots from clobbering annotations (:pr:6955) Simon Perkins_

.. _v2020.12.0:

Released on December 10, 2020

Highlights
^^^^^^^^^^

- `CalVer <https://calver.org/>`_ for versioning scheme.
- HighLevelGraph to enable sending high-level representations of
  task graphs to the distributed scheduler.
- HighLevelGraph layer objects including BasicLayer, Blockwise,
  BlockwiseIO, ShuffleLayer, and more.
- Layer-level annotations like priority, retries,
  etc. with the dask.annotations context manager.
- pyarrow.dataset API to read_parquet.

All changes
^^^^^^^^^^^

- observed kwarg optional (:pr:6952) Julia Signell_
- (:pr:6895) Julia Signell_
- (:pr:6949) Julia Signell_
- read_parquet (:pr:6918) Richard (Rick) Zamora_
- observed keyword to groupby (:pr:6854) Julia Signell_
- include_path_column works when there are multiple partitions per file (:pr:6911) Julia Signell_
- array.overlap and array.map_overlap block sizes are incorrect when depth is an unsigned bit type (:pr:6909) GFleishman_
- (:pr:6946) Mark_
- Bag from sample (:pr:6941) Shang Wang_
- ravel_multi_index (:pr:6939) Illviljan_
- (:pr:6921) Richard (Rick) Zamora_
- _file in progressbar if it is None (:pr:6938) Mark Harfouche_
- (:pr:6932) James Bourbeau_
- BlockwiseIO layer (:pr:6878) Richard (Rick) Zamora_
- Layer Annotations to Scheduler (:pr:6889) Simon Perkins_
- (:pr:6926) Timost_
- pyarrow >2.0.0 (:pr:6772) Richard (Rick) Zamora_
- pyarrow.dataset API for read_parquet (:pr:6534) Richard (Rick) Zamora_
- da.coarsen when coarsening factors do not divide shape (:pr:6908) Davis Bennett_
- dask/dask not forks (:pr:6905) Jacob Tomlinson_
- annotations to ShuffleLayers (:pr:6913) Matthew Rocklin_
- test_from_s3 (:pr:6915) James Bourbeau_
- skew method (:pr:6881) Jan Borchmann_
- dtype in array meta (:pr:6893) Julia Signell_
- name arg in helm install ... (:pr:6903) Ruben van de Geer_
- (:pr:6901) Martin Durant_
- cupyx sparse to dask.array.dot (:pr:6846) Akira Naruse_
- (:pr:6894) Julia Signell_
- (:pr:6888) Julia Signell_
- ArrowEngine bug in use of clear_known_categories (:pr:6887) Richard (Rick) Zamora_
- (:pr:6879) Zhengnan Zhao_
- (:pr:6883) Jacob Tomlinson_
- set_index issue (:pr:6866) Richard (Rick) Zamora_
- BasicLayer: remove dependency arguments (:pr:6859) Mads R. B. Kristensen_
- Blockwise (:pr:6848) Mads R. B. Kristensen_
- columns=[] bug (:pr:6871) Richard (Rick) Zamora_
- (:pr:6841) Richard (Rick) Zamora_
- create_metadata_file utility for existing parquet datasets (:pr:6851) Richard (Rick) Zamora_
- (:pr:6779) Tom Augspurger_
- (:pr:6852) Mads R. B. Kristensen_
- overwrite=True to to_parquet to remove dangling files when overwriting a pyarrow Dataset. (:pr:6825) Greg Hayes_
- map_tasks() and map_basic_layers() (:pr:6853) Mads R. B. Kristensen_
- svd_compressed (:pr:6813) RogerMoens_
- __dask_distributed_pack__() now takes a client argument (:pr:6850) Mads R. B. Kristensen_
- map_partitions instead of delayed in set_index (:pr:6837) Mads R. B. Kristensen_
- as_completed().update(futures) (:pr:6817) manuels_
- setup-miniconda version (:pr:6847) Jacob Tomlinson_
- (:pr:6829) Rockwell Weiner_
- (:pr:6799) RogerMoens_
- (:pr:6794) Jacob Tomlinson_
- currentmodule usage (:pr:6839) James Bourbeau_
- (:pr:6838) James Bourbeau_
- Blockwise culling (:pr:6815) Richard (Rick) Zamora_
- (:pr:6834) Devanshu Desai_
- HighLevelGraph.merge in collections_to_dsk (:pr:6836) Mads R. B. Kristensen_
- dtype in svd compression_matrix #2849 (:pr:6802) RogerMoens_
- (:pr:6818) Julia Signell_
- (:pr:6821) Rockwell Weiner_
- (:pr:6823) Martin Durant_
- DataFrame.join doesn't accept Series as other (:pr:6809) David Katz_
- to_delayed operations from to_parquet (:pr:6801) Richard (Rick) Zamora_
- (:pr:6806) Simon Perkins_
- (:pr:6780) Martin Durant_
- (:pr:6708) Julia Signell_
- (:pr:6767) Simon Perkins_
- (:pr:6793) manuels_
- Blockwise Layers (:pr:6715) Richard (Rick) Zamora_
- (:pr:6786) Mads R. B. Kristensen_
- (:pr:6789) Stephannie Jimenez Gacha_
- (:pr:6778) Martin Durant_
- get_all_external_keys() (:pr:6774) Mads R. B. Kristensen_
- chunksize=1 (:pr:6748) Tom Augspurger_
- (:pr:6205) Julia Signell_
- array-slice.rst (:pr:6771) Magnus Nord_
- (:pr:6741) Callum Noble_
- meta kwarg in map_blocks and map_overlap. (:pr:6763) Peter Andreas Entschev_
- cumsum and cumprod (:pr:6675) Erik Welch_
- (:pr:6764) Illviljan_
- (:pr:6760) James Bourbeau_
- (:pr:6751) Mads R. B. Kristensen_
- pyarrow<2 in CI (:pr:6759) James Bourbeau_
- min/max reductions (:pr:6736) Peter Andreas Entschev_
- da.linalg.lstsq - mirroring numpy (:pr:6749) Pascal Bourgault_
- (:pr:6752) Tom Augspurger_
- (:pr:6693) Mads R. B. Kristensen_
- attrs property to Series/Dataframe (:pr:6742) Illviljan_
- (:pr:6747) Mads R. B. Kristensen_
- ArrowEngine to allow more easy subclass for writing (:pr:6505) Joris Van den Bossche_
- ShuffleStage HLG Layer (:pr:6650) Richard (Rick) Zamora_
- meta_from_array (:pr:6731) Peter Andreas Entschev_
- (:pr:6735) Chris Roat_
- DataFrame.set_index (:pr:6739) Gil Forsyth_
- HighLevelGraph layers always contain Layer instances (:pr:6716) James Bourbeau_
- HighLevelGraph Layers (:pr:6689) Mads R. B. Kristensen_
- \*_like function calls and CuPy tests (:pr:6728) Peter Andreas Entschev_
- svd with __array_function__ (:pr:6727) Peter Andreas Entschev_
- (:pr:6397) Jim Circadian_
- (:pr:6724) John A Kirkham_
- (:pr:5628) Matthew Rocklin_
- az (:pr:6719) Ray Bell_
- get_dependencies() of single keys (:pr:6699) Mads R. B. Kristensen_
- (:pr:6510)" (:pr:6697)" (:pr:6707) Tom Augspurger_
- \*_like array creation functions to respect input array type (:pr:6680) Genevieve Buckley_
- dask-sphinx-theme version (:pr:6700) Gil Forsyth_

.. _v2.30.0 / 2020-10-06:


Array
^^^^^

- rechunk to evenly split into N chunks (:pr:6420) Scott Sievert_

.. _v2.29.0 / 2020-10-02:


Array
^^^^^

- _repr_html_: color sides darker instead of drawing all the lines (:pr:6683) Julia Signell_
- nanstd and nanvar (:pr:6667) Thomas J. Fan_
- map_overlap (:pr:6682) Julia Signell_
- np.searchsorted with bisect in indexing (:pr:6669) Joachim B Haga_

Bag
^^^

- groupby (:pr:6660) Itamar Turner-Trauring_

Core
^^^^

- HighLevelGraph layers everywhere in collections (:pr:6510)" (:pr:6697) Tom Augspurger_
- pandas.testing (:pr:6687) John A Kirkham_
- (:pr:6676) Elliott Sales de Andrade_

DataFrame
^^^^^^^^^

- (:pr:6608) Julia Signell_

Documentation
^^^^^^^^^^^^^

- (:pr:6692) garanews_
- (:pr:6678) Pav A_

.. _v2.28.0 / 2020-09-25:


Array
^^^^^

- Array indexing that produces large changes.
  This restores the behavior from Dask 2.25.0 and earlier, with a warning
  when large chunks are produced. A configuration option is provided
  to avoid creating the large chunks, see :ref:array.slicing.efficiency.
  (:pr:6665) Tom Augspurger_
- meta to to_dask_array (:pr:6651) Kyle Nicholson_
- :pr:6631 and :pr:6611 (:pr:6632) Rafal Wojdyla_
- (:pr:6629) Daniel Saxton_
- v_based flag for svd_flip (:pr:6658) Eric Czech_
- mean (:pr:6656) Sam Grayson_

Core
^^^^

- dsk equality check from SubgraphCallable.__eq__ (:pr:6666) Mads R. B. Kristensen_
- HighLevelGraph layers everywhere in collections (:pr:6510) Mads R. B. Kristensen_
- SubgraphCallable for caching purposes (:pr:6424) Andrew Fulton_
- (:pr:6647) Matthew Rocklin_

DataFrame
^^^^^^^^^

- agg API (:pr:6655) Madhur Tandon_
- (:pr:6657) Julia Signell_

.. _v2.27.0 / 2020-09-18:


Array
^^^^^

- dtype in svd (:pr:6643) Eric Czech_

Core
^^^^

- store(): create a single HLG layer (:pr:6601) Mads R. B. Kristensen_
- (:pr:6645) James Bourbeau_
- .pre-commit-config to latest black. (:pr:6641) Julia Signell_
- (:pr:6630) Poruri Sai Rahul_
- (:pr:6633) Poruri Sai Rahul_

DataFrame
^^^^^^^^^

- to_sql (:pr:6638) Julia Signell_
- (:pr:6626) Julia Signell_

Documentation
^^^^^^^^^^^^^

- autofunction to array api docs for more ufuncs (:pr:6644) James Bourbeau_
- dask.array docs (:pr:6642) Ralf Gommers_
- HelmCluster docs (:pr:6290) Jacob Tomlinson_

.. _v2.26.0 / 2020-09-11:


Array
^^^^^

- (:pr:6623) Eric Czech_
- array.reduction docstring match for dtype (:pr:6624) Martin Durant_
- svd_compressed using rows and cols (:pr:6622) Eric Czech_
- (:pr:6616) Eric Czech_
- svd_flip #6599 (:pr:6613) Eric Czech_
- (:pr:6595) Gabe Joseph_
- getitem with lists (:pr:6514) Tom Augspurger_
- from_array (:pr:6605) Deepak Cherian_
- (:pr:6594) Noah D. Brenowitz_
- (:pr:6591) Eric Czech_
- (:pr:6393) Jon Thielen_
- (:pr:6580) Deepak Cherian_
- (:pr:6578) Ryan Williams_

Core
^^^^

- HighLevelGraph dependencies (:pr:6588) Mads R. B. Kristensen_
- (:pr:6598) Tom Augspurger_
- bokeh version 2.0.0 (:pr:6572) John A Kirkham_

DataFrame
^^^^^^^^^

- (:pr:6585) McToel_
- min_count in Series.sum / prod (:pr:6618) Daniel Saxton_
- DataFrame.set_index docstring (:pr:6549) Timost_
- (:pr:6564) Erik Welch_
- (:pr:6573) Abdulelah Bin Mahfoodh_

Documentation
^^^^^^^^^^^^^

- (:pr:6215) Kilian Lieret_
- extraConfig example (:pr:6625) Tom Augspurger_
- (:pr:6609) Julia Signell_
- (:pr:6560) Tom Augspurger_

.. _v2.25.0 / 2020-08-28:


Core
^^^^

- subs() (:pr:6559) Mads R. B. Kristensen_
- black release (:pr:6568) James Bourbeau_
- (:pr:6554) Tom Augspurger_

DataFrame
^^^^^^^^^

- read_parquet example (:pr:6548) Ray Bell_

Documentation
^^^^^^^^^^^^^

- (:pr:6558) James Bourbeau_
- kubernetes-helm.rst (:pr:6523) David Sheldon_
- (:pr:6547) Tom Augspurger_

.. _v2.24.0 / 2020-08-22:


Array
^^^^^

- (:pr:6518) Elliott Sales de Andrade_
- (:pr:6521) joshreback_
- cupy.sparse with cupyx.scipy.sparse (:pr:6530) John A Kirkham_

Dataframe
^^^^^^^^^

- (:pr:6502) Julia Signell_
- (:pr:6515) Tom Augspurger_
- ) (:pr:6490) Richard (Rick) Zamora_
- (:pr:6524) Martin Durant_
- filter arguments in ArrowEngine (:pr:6527) Richard (Rick) Zamora_
- (:pr:6536) Richard (Rick) Zamora_

Core
^^^^

- (:pr:6517) Thomas J. Fan_
- (:pr:6529) Mads R. B. Kristensen_
- (:pr:6528) Martin Durant_

.. _v2.23.0 / 2020-08-14:


Array
^^^^^

- np.zeros, ones, and full array size with broadcasting (:pr:6491) Matthias Bussonnier_
- meta= for trim in map_overlap (:pr:6494) Peter Andreas Entschev_

Bag
^^^

- (:pr:6371) joshreback_

Core
^^^^

- Scalar.__dask_layers__() to return self._name instead of self.key (:pr:6507) Mads R. B. Kristensen_
- fuse_root optimization (:pr:6508) Mads R. B. Kristensen_

DataFrame
^^^^^^^^^

- items to dataframe (:pr:6503) Thomas J. Fan_
- write_table call (:pr:6499) Julia Signell_
- nonempty_series (:pr:6485) Tom Augspurger_
- (:pr:6479) Matthew Rocklin_
- mkdirs (:pr:6475) Julia Signell_
- to_parquet (:pr:6451) michaelnarodovitch_

Documentation
^^^^^^^^^^^^^

- da.histogram (:pr:6439) Roberto Panai_
- agg nunique example (:pr:6404) Ray Bell_
- (:pr:6489) Mike McCarty_
- (:pr:6453) Martin Durant_

.. _v2.22.0 / 2020-07-31:


Array
^^^^^

- (:pr:6430) Tom Augspurger_

Core
^^^^

- sizeof for some bytes-like objects (:pr:6457) John A Kirkham_
- fsspec (:pr:6446) Martin Durant_
- RecursionError is raised, return uuid from tokenize function (:pr:6437) Julia Signell_
- (:pr:6431) Tom Augspurger_
- setup.cfg (:pr:6426) Zhengnan Zhao_

DataFrame
^^^^^^^^^

- (:pr:6471) Gil Forsyth_
- ArrowEngine for better read_parquet performance (:pr:6346) Richard (Rick) Zamora_
- tolist dispatch (:pr:6444) GALI PREM SAGAR_
- (:pr:6429) Tom Augspurger_
- (:pr:6428) joshreback_
- to_csv docstring (:pr:6411) Jun Han (Johnson) Ooi_

Documentation
^^^^^^^^^^^^^

- (:pr:6472) Jacob Tomlinson_
- (:pr:6466) Scott Sievert_
- (:pr:6403) Jim Circadian_
- (:pr:6449) Scott Sievert_
- (:pr:6436) Jack Xiaosong Xu_

.. _v2.21.0 / 2020-07-17:


Array
^^^^^

- array.routines.gradient() (:pr:6417) johnomotani_
- dimension=1 (:pr:6342) Matthias Bussonnier_

Bag
^^^

- bag.take example (:pr:6418) Roberto Panai_

Core
^^^^

- (:pr:6409) Benjamin Zaitlen_
- kwargs provided (:pr:6382) Clark Zinzow_
- pickle5 for testing on Python 3.7 (:pr:6379) John A Kirkham_

DataFrame
^^^^^^^^^

- (:pr:6422) Tom McTiernan_
- pytest.warns to check for UserWarning (:pr:6378) Richard (Rick) Zamora_
- bytes_per_chunk keyword from string (:pr:6370) Matthew Rocklin_

Documentation
^^^^^^^^^^^^^

- (:pr:6421) Matthias Bussonnier_
- numpydoc following 1.1 release (:pr:6407) Gil Forsyth_
- (:pr:6402) Matthias Bussonnier_
- (:pr:6399) Ray Bell_
- visualize docstrings (:pr:6383) Zhengnan Zhao_

.. _v2.20.0 / 2020-07-02:


Array
^^^^^

- sizeof for numpy zero-strided arrays (:pr:6343) Matthias Bussonnier_
- concatenate_lookup in concatenate (:pr:6339) John A Kirkham_
- (:pr:6335) Matthias Bussonnier_

DataFrame
^^^^^^^^^

- iloc calls to getitem (:pr:6355) Gil Forsyth_
- RangeIndex in fastparquet engine (:pr:6350) Richard (Rick) Zamora_
- (:pr:6282) Richard (Rick) Zamora_
- ignore_index for pandas' group_split_dispatch (:pr:6251) Richard (Rick) Zamora_

Documentation
^^^^^^^^^^^^^

- (:pr:6318) asmith26_

.. _v2.19.0 / 2020-06-19:


Array
^^^^^

- dtype (:pr:6326) Gil Forsyth_
- shape=None to \*_like() array creation functions (:pr:6064) Anderson Banihirwe_

Core
^^^^

- (:pr:6331) Gil Forsyth_
- parse_bytes (:pr:6311) Gil Forsyth_
- (:pr:6308) Ram Rachum_
- (:pr:6303) James Lamb_
- (:pr:6304) James Lamb_

DataFrame
^^^^^^^^^

- (:pr:6262) Gil Forsyth_
- ValueError when merging an index-only 1-partition dataframe (:pr:6309) Krishan Bhasin_
- index.map clear divisions. (:pr:6285) Julia Signell_

Documentation
^^^^^^^^^^^^^

- (:pr:6328) Tom Augspurger_
- bag.rst (:pr:6317) Ben Shaver_

.. _v2.18.1 / 2020-06-09:


Array
^^^^^

- full (:pr:6299) Julia Signell_
- (:pr:6252) Gabe Joseph_

Core
^^^^

- utils.py (:pr:6302) Ram Rachum_
- HighLevelGraph construction (:pr:6293) Julia Signell_

Documentation
^^^^^^^^^^^^^

- (:pr:6295) Antonio Ercole De Luca_
- asyncssh intersphinx mappings (:pr:6298) Jacob Tomlinson_

.. _v2.18.0 / 2020-06-05:

Array ^^^^^
6273) Julia Signell_stack error message (:pr:6268) Stephanie Gott_full & full_like: error on non-scalar fill_value (:pr:6129) Huite_map_overlap (:pr:6165) Eric Czech_6255) Julia Signell_Bag ^^^
6239) Antonio Ercole De Luca_DataFrame ^^^^^^^^^
dropna, sort, and ascending to sort_values (:pr:5880) Julia Signell_from_dask_array (:pr:6263) GALI PREM SAGAR_SeriesGroupby.nunique (:pr:6284) Julia Signell_NotImplementedError in resample with rule (:pr:6274) Abdulelah Bin Mahfoodh_dd.to_sql (:pr:6038) Ryan Williams_Documentation ^^^^^^^^^^^^^
- (:pr:`6258`) `Ray Bell`_

.. _v2.17.2 / 2020-05-28:

2.17.2 / 2020-05-28
-------------------


Core
^^^^

- ``complete`` extra (:pr:`6257`) `Jim Crist-Harif`_

DataFrame
^^^^^^^^^

- ``resample`` isn't going to give right answer (:pr:`6244`) `Julia Signell`_

.. _v2.17.1 / 2020-05-28:

2.17.1 / 2020-05-28
-------------------


Array
^^^^^

- (:pr:`6233`) `Andrew Fulton`_

Core
^^^^

- ``pyyaml`` required (:pr:`6250`) `Jim Crist-Harif`_
- ``ImportError`` (:pr:`6238`) `Gaurav Sheni`_
- (:pr:`6249`) `Jacob Tomlinson`_

DataFrame
^^^^^^^^^

- ``ignore_index`` to ``dd_shuffle`` from ``DataFrame.shuffle`` (:pr:`6247`) `Richard (Rick) Zamora`_
- (:pr:`6204`) `Martin Durant`_
- ``describe`` & ``quantile`` APIs (:pr:`5137`) `GALI PREM SAGAR`_

.. _v2.17.0 / 2020-05-26:

2.17.0 / 2020-05-26
-------------------


Array
^^^^^

- ``da.pad`` (:pr:`6213`) `Mark Boer`_
- ``tuple`` if multiple outputs in ``dask.array.apply_gufunc``, add test to check for ``tuple`` (:pr:`6207`) `Kai Mühlbauer`_
- ``stack`` with unknown chunksizes (:pr:`6195`) `swapna`_

Bag
^^^

- (:pr:`6208`) `Antonio Ercole De Luca`_

Core
^^^^

- ``delayed.visualise()`` (:pr:`6216`) `Amol Umbarkar`_
- (:pr:`6229`) `John A Kirkham`_
- ``fuse()`` config (:pr:`6198`) `crusaderky`_
- ``dask.order.order`` to consider "next" nodes using both FIFO and LIFO (:pr:`5872`) `Erik Welch`_

DataFrame
^^^^^^^^^

- ``fill_value`` for more agg methods (:pr:`6245`) `Julia Signell`_
- ``rearrange_by_column_tasks`` and add ``DataFrame.shuffle`` (:pr:`6066`) `Richard (Rick) Zamora`_
- ``test_rolling_numba_engine`` for newer numba and older pandas (:pr:`6236`) `James Bourbeau`_
- ``fix_overlap`` (:pr:`6240`) `GALI PREM SAGAR`_
- ``DataFrame.shape`` with no columns (:pr:`6237`) `noreentry`_
- (:pr:`6226`) `Krishan Bhasin`_
- (:pr:`6211`) `Marius van Niekerk`_
- ``dd.merge_asof`` with ``left_on='col'`` & ``right_index=True`` (:pr:`6192`) `noreentry`_
- ``concat`` (:pr:`6210`) `Tung Dang`_
- ``AUTO_BLOCKSIZE`` out of ``read_csv`` signature (:pr:`6214`) `Jim Crist-Harif`_
- ``.loc`` indexing with callable (:pr:`6185`) `Endre Mark Borza`_
- ``_compute_sum_of_squares`` for groupby std agg (:pr:`6186`) `Richard (Rick) Zamora`_
- ``test_parquet`` (:pr:`6190`) `Brian Larsen`_
- (:pr:`6194`) `GALI PREM SAGAR`_
- ``test_to_parquet_with_get`` if no parquet libs available (:pr:`6188`) `Scott Sanderson`_

Documentation
^^^^^^^^^^^^^

- ``distributed.Event`` class (:pr:`6231`) `Nils Braun`_
- (:pr:`6124`) `Ray Bell`_

.. _v2.16.0 / 2020-05-08:

2.16.0 / 2020-05-08
-------------------


Array
^^^^^

- (:pr:`6176`) `Nick Evans`_
- ``dim`` with ``shape`` in ``unravel_index`` (:pr:`6155`) `Julia Signell`_
- (:pr:`5339`) `Gabe Joseph`_

Core
^^^^

- (:pr:`6137`) `GALI PREM SAGAR`_
- (:pr:`6159`) `Tom Augspurger`_
- ``sizeof`` of dict and sequences returns an integer (:pr:`6179`) `James Bourbeau`_
- (:pr:`6154`) `Florian Jetter`_
- (:pr:`6146`) `Tom Augspurger`_
- (:pr:`6144`) `Tom Augspurger`_
- (:pr:`4003`) `Itamar Turner-Trauring`_
- (:pr:`6140`) `Benjamin Zaitlen`_

DataFrame
^^^^^^^^^

- ``read_parquet`` (:pr:`6160`) `Richard (Rick) Zamora`_
- ``kwargs`` to methods that write data to disk (:pr:`6056`) `Krishan Bhasin`_
- ``unique`` returns an index like result from backends (:pr:`6153`) `GALI PREM SAGAR`_
- ``map_partitions`` with collections (:pr:`6103`) `Tom Augspurger`_

Documentation
^^^^^^^^^^^^^

- (:pr:`6157`) `Benjamin Zaitlen`_
- (:pr:`6138`) `James Lamb`_
- (:pr:`6147`) `Martin Durant`_
- (:pr:`6143`) `Martin Durant`_

.. _v2.15.0 / 2020-04-24:

2.15.0 / 2020-04-24
-------------------


Array
^^^^^

- ``dask.array.from_array`` to warn when passed a Dask collection (:pr:`6122`) `James Bourbeau`_
- ``dask.array.pad`` (:pr:`6042`) `Mark Boer`_
- ``repeats=0`` in ``da.repeat`` (:pr:`6080`) `James Bourbeau`_

Core
^^^^

- (:pr:`6132`) `Benjamin Zaitlen`_
- (:pr:`6069`) `Benjamin Zaitlen`_
- (:pr:`6087`) `Matthew Rocklin`_
- (:pr:`6094`) `Tom Augspurger`_
- (:pr:`6057`) `Lucas Rademaker`_
- (:pr:`6072`) `Martin Durant`_
- (:pr:`6065`) `James Bourbeau`_

DataFrame
^^^^^^^^^

- ``Categorical`` (:pr:`6113`) `GALI PREM SAGAR`_
- ``_metadata`` on every worker (:pr:`6017`) `Richard (Rick) Zamora`_
- ``group_split_dispatch`` and ``ignore_index`` in ``apply_concat_apply`` (:pr:`6119`) `Richard (Rick) Zamora`_
- (:pr:`6090`) `Richard (Rick) Zamora`_
- ``test_partition_on_cats_pyarrow`` if pyarrow is not installed (:pr:`6112`) `James Bourbeau`_
- (:pr:`6111`) `James Bourbeau`_
- ``ArrowEngine`` bug fixes and test coverage (:pr:`6047`) `Richard (Rick) Zamora`_
- (:pr:`5958`) `Adam Lewis`_

Documentation
^^^^^^^^^^^^^

- (:pr:`6130`) `JulianWgs`_
- (:pr:`6077`) `Matthew Rocklin`_
- ``map_partitions()`` docstring (:pr:`6115`) `Eugene Huang`_
- (:pr:`6091`) `David Chudzicki`_
- ``array.random.*`` docs (:pr:`6063`) `Martin Durant`_
- ``Semaphore`` in distributed (:pr:`6053`) `Florian Jetter`_

.. _v2.14.0 / 2020-04-03:

2.14.0 / 2020-04-03
-------------------


Array
^^^^^

- ``np.iscomplexobj`` implementation (:pr:`6045`) `Tom Augspurger`_

Core
^^^^

- ``test_rearrange_disk_cleanup_with_exception`` to pass without cloudpickle installed (:pr:`6052`) `James Bourbeau`_
- ``test-rearrange`` (:pr:`5977`) `Tom Augspurger`_

DataFrame
^^^^^^^^^

- ``_meta_nonempty`` for dtype casting in ``stack_partitions`` (:pr:`6061`) `mlondschien`_
- ``_metadata`` creation and filtering in parquet ``ArrowEngine`` (:pr:`6023`) `Richard (Rick) Zamora`_

Documentation
^^^^^^^^^^^^^

- (:pr:`6040`) `Tom Augspurger`_

.. _v2.13.0 / 2020-03-25:

2.13.0 / 2020-03-25
-------------------


Array
^^^^^

- ``dtype`` and other keyword arguments in ``da.random`` (:pr:`6030`) `Matthew Rocklin`_
- ``cupy`` sparse hstack/vstack (:pr:`5735`) `Corey J. Nolet`_
- ``self.name`` to ``str`` in ``dask.array`` (:pr:`6002`) `Chuanzhu Xu`_

Bag
^^^

- ``rename_fused_keys`` to ``None`` by default in ``bag.optimize`` (:pr:`6000`) `Lucas Rademaker`_

Core
^^^^

- ``to_graphviz`` to prevent overwriting (:pr:`5996`) `JulianWgs`_
- ``xfail`` (:pr:`6024`) `Tom Augspurger`_
- (:pr:`6013`) `James Bourbeau`_
- ``toolz`` to 0.8.2 and use ``tlz`` (:pr:`5997`) `Ryan Grout`_
- (:pr:`5862`) `James Bourbeau`_

DataFrame
^^^^^^^^^

- ``read_hdf`` (:pr:`6032`) `psimaj`_
- ``dtype`` handling in ``dd.concat`` (:pr:`6006`) `mlondschien`_
- (:pr:`6025`) `Richard J Zamora`_
- ``npartitions`` variable in ``dd.from_pandas`` (:pr:`6019`) `Daniel Saxton`_
- ``DataFrame.random_split`` (:pr:`5980`) `petiop`_

Documentation
^^^^^^^^^^^^^

- (:pr:`6022`) `Matthew Rocklin`_
- (:pr:`5928`) `Julia Signell`_
- (:pr:`5976`) `Julia Signell`_

.. _v2.12.0 / 2020-03-06:

2.12.0 / 2020-03-06
-------------------


Array
^^^^^

- (:pr:`5933`) `Bruce Merry`_
- ``map_blocks`` with ``block_info`` produce a ``Blockwise`` (:pr:`5896`) `Bruce Merry`_
- ``make_blockwise_graph`` (:pr:`5940`) `Bruce Merry`_
- ``da.tensordot`` (:pr:`5975`) `Gil Forsyth`_
- ``array.pad`` (:pr:`5931`) `Thomas J. Fan`_

Core
^^^^

- ``toolz.memoize`` dependency in ``dask.utils`` (:pr:`5978`) `Ryan Grout`_
- (:pr:`5979`) `Tom Augspurger`_
- ``numpydoc`` to 0.8.0 (fix double autoescape) (:pr:`5961`) `Gil Forsyth`_
- ``range`` objects (:pr:`5947`) `James Bourbeau`_
- ``msgpack`` in CI (:pr:`5930`) `James Bourbeau`_
- (:pr:`5937`) `Elliott Sales de Andrade`_
- (:pr:`5920`) `James Bourbeau`_

DataFrame
^^^^^^^^^

- ``getitem`` optimization for some keys (:pr:`5917`) `Tom Augspurger`_
- ``ignore_index`` argument to ``rearrange_by_column`` code path (:pr:`5973`) `Richard J Zamora`_
- ``memory_usage_per_partition`` methods (:pr:`5971`) `James Bourbeau`_
- ``xfail`` ``test_describe`` when using Pandas 0.24.2 (:pr:`5948`) `James Bourbeau`_
- ``dask.dataframe.to_numeric`` (:pr:`5929`) `Julia Signell`_
- (:pr:`5927`) `Julia Signell`_
- (:pr:`5740`) `Richard J Zamora`_

Documentation
^^^^^^^^^^^^^

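- ``dask.array.triu`` docs (:pr:`5984`) `Henrik Andersson`_
- ``slice_with_int_dask_array`` error message (:pr:`5981`) `Gabe Joseph`_
- (:pr:`5963`) `James Lamb`_
- (:pr:`5939`) `Ray Bell`_
- (:pr:`5954`) `James Bourbeau`_
- (:pr:`5962`) `James Lamb`_
- ``kwarg`` on ``_bind_*`` methods (:pr:`5946`) `Julia Signell`_
- (:pr:`5938`) `Ray Bell`_
- (:pr:`5926`) `Julia Signell`_

.. _v2.11.0 / 2020-02-19:

2.11.0 / 2020-02-19
-------------------
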

Array
^^^^^

- ``Array.shape`` (:pr:`5916`) `Bruce Merry`_
- ``estimate_graph_size`` for ``rechunk`` (:pr:`5907`) `Bruce Merry`_
- (:pr:`5909`) `Bruce Merry`_
- ``dtype`` and other kwargs in ``coarsen`` (:pr:`5903`) `Matthew Rocklin`_
- ``map_blocks`` into ``blockwise`` (:pr:`5895`) `Bruce Merry`_
- ``rewrite_blockwise`` for a singleton (:pr:`5890`) `Bruce Merry`_
- ``slices_from_chunks`` (:pr:`5891`) `Bruce Merry`_
- ``__getitem__`` in ``block()`` when chunks have correct dimensionality (:pr:`5884`) `Thomas Robitaille`_

Bag
^^^

- ``include_path`` option for ``dask.bag.read_text`` (:pr:`5836`) `Yifan Gu`_
- ``ValueError`` in delayed execution of bagged NumPy array (:pr:`5828`) `Surya Avala`_

Core
^^^^

- ``msgpack`` (:pr:`5923`) `Tom Augspurger`_
- ``test_inner`` to ``test_outer`` (:pr:`5922`) `Shiva Raisinghani`_
- ``quote`` should quote dicts too (:pr:`5905`) `Bruce Merry`_
- (:pr:`5898`) `Bruce Merry`_
- (:pr:`5888`) `Bruce Merry`_
- (:pr:`5892`) `Julia Signell`_
- (:pr:`5861`) `Cyril Shcherbin`_
- ``ThreadPool`` at exit (:pr:`5852`) `Tom Augspurger`_
- ``dask.dataframe`` import in tokenization code (:pr:`5855`) `James Bourbeau`_

DataFrame
^^^^^^^^^

- ``pandas>=0.23`` (:pr:`5883`) `Tom Augspurger`_
- (:pr:`5901`) `Matthew Rocklin`_
- ``dataframe/__init__.py`` (:pr:`5882`) `Ram Rachum`_
- (:pr:`5804`) `Shiva Raisinghani`_
- ``sort=`` argument for groupby (:pr:`5801`) `Richard J Zamora`_
- ``df.empty`` property (:pr:`5711`) `rockwellw`_
- ``fastparquet.api.paths_to_cats``. (:pr:`5821`) `Igor Gotlibovych`_

Documentation
^^^^^^^^^^^^^

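- ``doc_wraps`` (:pr:`5912`) `Tom Augspurger`_
- (:pr:`5889`) `Bruce Merry`_
- (:pr:`5877`) `Matthew Rocklin`_
- (:pr:`5876`) `Matthew Rocklin`_
- (:pr:`5878`) `K.-Michael Aye`_
- ``map_blocks`` see also (:pr:`5874`) `Tom Augspurger`_
- (:pr:`5871`) `Julia Signell`_
- (:pr:`5866`) `Yetunde Dada`_
- ``cloud.rst`` (:pr:`5860`) `Andrew Thomas`_
- (:pr:`5844`) `Matthew Rocklin`_

.. _v2.10.1 / 2020-01-30:

2.10.1 / 2020-01-30
-------------------
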

- (:pr:`5851`) `Tom Augspurger`_
- (:pr:`5841`) `Gerrit Holl`_

.. _v2.10.0 / 2020-01-28:

2.10.0 / 2020-01-28
-------------------


- ``BooleanDtype`` and ``StringDtype`` (:pr:`5815`) `Tom Augspurger`_
- (:pr:`5792`) `Tom Augspurger`_
- (:pr:`5813`) `Tom Augspurger`_
- (:pr:`5812`) `Matteo De Wint`_
- (:pr:`5807`) `dfonnegra`_
- (:pr:`5797`) `Chris Roat`_
- ``pyarrow`` engine (:pr:`5799`) `Richard J Zamora`_
- ``groupby.std()`` when some of the keys were large integers (:pr:`5737`) `H. Thomson Comer`_

.. _v2.9.2 / 2020-01-16:

2.9.2 / 2020-01-16
------------------


Array
^^^^^

- ``broadcast_arrays`` (:pr:`5765`) `Matthew Rocklin`_

Core
^^^^

- ``xfail`` CSV encoding tests (:pr:`5791`) `Tom Augspurger`_
- (:pr:`5789`) `James Bourbeau`_
- ``dask.order.order`` (:pr:`5646`) `Erik Welch`_

DataFrame
^^^^^^^^^

- ``partd`` (:pr:`5786`) `Christian Wesp`_
- ``repr`` for empty dataframes (:pr:`5781`) `Shiva Raisinghani`_
- (:pr:`5784`) `Tom Augspurger`_
- (:pr:`5783`) `Tom Augspurger`_
- (:pr:`5782`) `Tom Augspurger`_
- ``read_parquet`` on partitioned datasets (:pr:`5777`) `Richard J Zamora`_
- (:pr:`5779`) `Tom Augspurger`_
- (:pr:`5776`) `Richard J Zamora`_
- (:pr:`5730`) `Matthew Rocklin`_
- ``set_index`` accepts single-item unnested list (:pr:`5760`) `Wes Roach`_
- ``Categorical`` (:pr:`5715`) `Tom Augspurger`_

Documentation
^^^^^^^^^^^^^

- ``normalize_token.register`` (:pr:`5766`) `Thomas A Caswell`_
- ``repartition`` docstring (:pr:`5772`) `Timost`_
- (:pr:`5771`) `Maarten Breddels`_
- (:pr:`5767`) `James Bourbeau`_
- (:pr:`5764`) `Devin Petersohn`_

.. _v2.9.1 / 2019-12-27:

2.9.1 / 2019-12-27
------------------


Array
^^^^^

- (:pr:`5736`) `Anderson Banihirwe`_
- (:pr:`5684`) `Deepak Cherian`_

Core
^^^^

- (:pr:`5734`) `James Bourbeau`_
- (:pr:`5603`) `James Bourbeau`_
- (:pr:`5696`) `Jim Crist`_

DataFrame
^^^^^^^^^

- (:pr:`5743`) `James Bourbeau`_
- (:pr:`5728`) `Matthew Rocklin`_
- (:pr:`5719`) `Tom Augspurger`_
- (:pr:`5656`) `Matteo De Wint`_
- (:pr:`5690`) `Richard J Zamora`_
- (:pr:`5697`) `James Bourbeau`_
- (:pr:`5613`) `Tom Augspurger`_

Documentation
^^^^^^^^^^^^^

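- (:pr:`5750`) `Ray Bell`_
- (:pr:`5733`) `Tom Augspurger`_
- (:pr:`5724`) `James Bourbeau`_
- (:pr:`5685`) `James Bourbeau`_
- (:pr:`5713`) `Benjamin Zaitlen`_
- (:pr:`5710`) `Julia Signell`_
- (:pr:`5708`) `Tim Gates`_
- (:pr:`5694`) `James Bourbeau`_

.. _v2.9.0 / 2019-12-06:

2.9.0 / 2019-12-06
------------------
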

Array
^^^^^

- ``da.std`` to work with NumPy arrays (:pr:`5681`) `James Bourbeau`_

Core
^^^^

- ``sizeof`` functions for Numba and RMM (:pr:`5668`) `John A Kirkham`_
- (:pr:`5682`) `Tom Augspurger`_

DataFrame
^^^^^^^^^

- ``dd.DataFrame.drop`` to use shallow copy (:pr:`5675`) `Richard J Zamora`_
- ``_get_md_row_groups`` (:pr:`5673`) `Richard J Zamora`_
- (:pr:`5629`) `Krishan Bhasin`_
- ``dd.map_partitions`` to not enforce meta (:pr:`5660`) `Matthew Rocklin`_
- ``concat_unindexed_dataframes`` to support cudf-backend (:pr:`5659`) `Richard J Zamora`_
- (:pr:`5636`) `Benjamin Zaitlen`_
- (:pr:`5635`) `Matthew Rocklin`_

Documentation
^^^^^^^^^^^^^

- (:pr:`5665`) `James Bourbeau`_
- (:pr:`5640`) `James Bourbeau`_
- (:pr:`5639`) `Ray Bell`_
- (:pr:`5617`) `James Bourbeau`_

.. _v2.8.1 / 2019-11-22:

2.8.1 / 2019-11-22
------------------


Array
^^^^^

- ``da.rechunk`` if no value given (:pr:`5605`) `Matthew Rocklin`_

Core
^^^^

- (:pr:`5619`) `James Bourbeau`_

DataFrame
^^^^^^^^^

- ``aggregate_row_groups`` (:pr:`5627`) `Richard J Zamora`_
- ``chunksize`` argument to ``read_parquet`` (:pr:`5607`) `Richard J Zamora`_
- ``test_repartition_npartitions`` to support aarch64 architecture (:pr:`5620`) `ossdev07`_
- (:pr:`5423`) `Oliver Hofkens`_
- (:pr:`5608`) `Nuno Gomes Silva`_
- (:pr:`5597`) `Richard J Zamora`_

Documentation
^^^^^^^^^^^^^

- (:pr:`5616`) `James Bourbeau`_
- (:pr:`5609`) `Tom Augspurger`_
- ``html_extra_path`` (:pr:`5614`) `James Bourbeau`_
- (:pr:`5612`) `Tom Augspurger`_

.. _v2.8.0 / 2019-11-14:

2.8.0 / 2019-11-14
------------------


Array
^^^^^

- (:pr:`5574`) `Bouwe Andela`_
- (:pr:`5575`) `Matthew Rocklin`_
- (:pr:`5586`) `Matthew Rocklin`_

Bag
^^^

- (:pr:`5571`) `Matthew Rocklin`_

Core
^^^^

- (:pr:`5573`) `Ryan Nazareth`_
- (:pr:`5576`) `Matthew Rocklin`_
- (:pr:`5451`) `Matthew Rocklin`_
- (:pr:`5578`) `Matthew Rocklin`_
- (:pr:`5588`) `Tom Augspurger`_

DataFrame
^^^^^^^^^

- (:pr:`5579`) `Richard J Zamora`_
- (:pr:`5568`)" (:pr:`5590`) `Matthew Rocklin`_

Documentation
^^^^^^^^^^^^^

- (:pr:`5583`) `Matthew Rocklin`_
- (:pr:`5587`) `Gina Helfrich`_
- (:pr:`5593`) `Matthew Rocklin`_
- (:pr:`5589`) `Tom Augspurger`_
- (:pr:`5569`) `Tom Augspurger`_

.. _v2.7.0 / 2019-11-08:

2.7.0 / 2019-11-08
------------------


This release drops support for Python 3.5.

Array
^^^^^

- (:pr:`5496`) `Vijayant`_
- (:pr:`5510`) `James Bourbeau`_
- (:pr:`5523`) `Ryan Abernathey`_
- (:pr:`5527`) `James Bourbeau`_
- (:pr:`5545`) `Norman Barker`_
- (:pr:`5556`) `James Bourbeau`_

Core
^^^^

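- (:pr:`5528`) `James Bourbeau`_
- (:pr:`5497`) `Matthew Rocklin`_
- (:pr:`5507`) `darindf`_
- (:pr:`5501`) `James Bourbeau`_
- (:pr:`5516`) `Tom Augspurger`_
- (:pr:`5479`) `Ryan Grout`_
- (:pr:`5534`) `James Bourbeau`_
- (:pr:`5499`) `Albert DeFusco`_
- (:pr:`5562`) `James Bourbeau`_
- (:pr:`5511`) `crusaderky`_

DataFrame
^^^^^^^^^
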
- (:pr:`3072`) `Bruno Bonfils`_
- (:pr:`5500`) `Krishan Bhasin`_
- (:pr:`5224`) `Henrique Ribeiro`_
- (:pr:`5358`) `Scott Sievert`_
- (:pr:`5522`) `Richard J Zamora`_
- (:pr:`5508`) `Richard J Zamora`_
- (:pr:`5531`) `James Bourbeau`_
- (:pr:`5530`) `Tom Augspurger`_
- (:pr:`5553`) `Petio Petrov`_
- (:pr:`5568`) `Mads R. B. Kristensen`_

Documentation
^^^^^^^^^^^^^

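- (:pr:`5512`) `Matthew Rocklin`_
- (:pr:`5513`) `Jacob Tomlinson`_
- (:pr:`5456`) `Prithvi MK`_
- (:pr:`5539`) `Jacob Tomlinson`_
- (:pr:`5551`) `James Bourbeau`_
- (:pr:`5554`) `Eric Dill`_
- (:pr:`5566`) `Matthew Rocklin`_

.. _v2.6.0 / 2019-10-15:

2.6.0 / 2019-10-15
------------------
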

Core
^^^^

- ``ensure_dict`` on graphs before entering ``toolz.merge`` (:pr:`5486`) `Matthew Rocklin`_
- (:pr:`5476`) `Richard J Zamora`_

DataFrame
^^^^^^^^^

- (:pr:`5491`) `Benjamin Zaitlen`_
- ``warn_dtype_mismatch`` (:pr:`5489`) `Tom Augspurger`_
- (:pr:`3480`) `Jörg Dietrich`_
- (:pr:`5484`) `Matthew Rocklin`_
- ``read_parquet`` (:pr:`5453`) `Tom Augspurger`_
- ``_constructor_sliced`` method to determine Series type (:pr:`5480`) `Richard J Zamora`_
- (:pr:`5459`) `Justin Waugh`_
- ``KeyError`` with Groupby label (:pr:`5467`) `Ryan Nazareth`_

Documentation
^^^^^^^^^^^^^

- (:pr:`5494`) `Matthew Rocklin`_
- (:pr:`5460`) `Javad`_
- ``SSHCluster`` (:pr:`5482`) `Matthew Rocklin`_
- (:pr:`5473`) `Matthew Rocklin`_
- (:pr:`5469`) `garanews`_

.. _v2.5.2 / 2019-10-04:

2.5.2 / 2019-10-04
------------------


Array
^^^^^

- (:pr:`5449`) `Ben Jeffery`_
- (:pr:`5443`) `Matthew Rocklin`_

DataFrame
^^^^^^^^^

- (:pr:`5463`) `Zhenqing Li`_

Documentation
^^^^^^^^^^^^^

- (:pr:`5445`) `Matthew Rocklin`_
- (:pr:`5446`) `Javad`_
- (:pr:`5444`) `Matthew Rocklin`_

.. _v2.5.0 / 2019-09-27:

2.5.0 / 2019-09-27
------------------


Core
^^^^

- (:pr:`5420`) `James Bourbeau`_
- (:pr:`5415`) `Matthew Rocklin`_
- (:pr:`5400`) `Jim Crist`_

DataFrame
^^^^^^^^^

- (:pr:`5436`) `Christopher J. Wright`_
- (:pr:`5421`) `Richard J Zamora`_
- (:pr:`5391`) `Richard J Zamora`_
- (:pr:`5433`) `amerkel2`_
- (:pr:`5430`) `Tom Augspurger`_
- (:pr:`5422`) `Matthew Rocklin`_
- ``values`` (:pr:`5322`) `Richard J Zamora`_
- (:pr:`5410`) `Wes Roach`_

Documentation
^^^^^^^^^^^^^

- (:pr:`5429`) (:pr:`5424`) `Matthew Rocklin`_
- (:pr:`5428`) `Mahmut Bulut`_
- (:pr:`5404`) `James Bourbeau`_

.. _v2.4.0 / 2019-09-13:

2.4.0 / 2019-09-13
------------------


Array
^^^^^

- ``h5py.File`` mode (:pr:`5390`) `James Bourbeau`_
- (:pr:`5312`) `Scott Sievert`_
- ``compute_meta`` (:pr:`5356`) `estebanag`_
- ``_meta`` to ``Array.__dask_postpersist__`` (:pr:`5353`) `Benoit Bovy`_
- ``da.asarray`` and ``da.asanyarray`` for datetime64 dtype and xarray objects (:pr:`5334`) `Stephan Hoyer`_
- (:pr:`5293`) `Tom Augspurger`_
- (:pr:`5289`) `James Bourbeau`_
- (:pr:`5283`) `Gabe Joseph`_

Core
^^^^

- (:pr:`5401`) `Jim Crist`_
- ``funcname`` when vectorized func has no ``__name__`` (:pr:`5399`) `James Bourbeau`_
- ``funcname`` to avoid long key names (:pr:`5383`) `Matthew Rocklin`_
- ``numpy.vectorize`` in ``funcname`` (:pr:`5396`) `James Bourbeau`_
- (:pr:`5395`) `Tom Augspurger`_
- ``parse_bytes/timedelta`` (:pr:`5384`) `Matthew Rocklin`_
- (:pr:`5351`) `Henry Pinkard`_
- (:pr:`5300`) `Tom Augspurger`_

DataFrame
^^^^^^^^^

- (:pr:`5402`) `Richard J Zamora`_
- ``dd.pivot_table`` (:pr:`5385`) `therhaag`_
- (:pr:`5381`) `Arpit Solanki`_
- ``set_index`` on categorical fails with fewer categories than partitions (:pr:`5354`) `Oliver Hofkens`_
- (:pr:`5304`) `Hongjiu Zhang`_
- ``groupby().transform()`` (:pr:`5327`) `Oliver Hofkens`_
- (:pr:`5348`) `Richard J Zamora`_
- (:pr:`5335`) `Sarah Bird`_
- (:pr:`5332`) `Arpit Solanki`_
- (:pr:`5307`) `Richard J Zamora`_
- ``groupby().idxmin/max()`` (:pr:`5273`) `Oliver Hofkens`_
- (:pr:`5296`) `Benjamin Zaitlen`_

Documentation
^^^^^^^^^^^^^

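- (:pr:`5405`) `Wes Roach`_
- (:pr:`5403`) `Wes Roach`_
- (:pr:`5372`) `Scott Sievert`_
- (:pr:`5387`) `Tom Augspurger`_
- (:pr:`5374`) `Matthew Rocklin`_
- (:pr:`5375`) `Matthew Rocklin`_
- (:pr:`5369`) `Matthew Rocklin`_
- ``institutional-faq.rst`` (:pr:`5345`) `DomHudson`_
- (:pr:`5340`) `Matthew Rocklin`_
- (:pr:`5328`) `James Bourbeau`_
- (:pr:`5311`) `Eugene Huang`_
- (:pr:`5297`) `James Bourbeau`_

.. _v2.3.0 / 2019-08-16:

2.3.0 / 2019-08-16
------------------
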

Array
^^^^^

- ``from_array`` is given a dask array (:pr:`5280`) `David Hoese`_
- (:pr:`5274`) `Peter Andreas Entschev`_
- ``meta=`` keyword to ``map_blocks`` and add test with sparse (:pr:`5269`) `Matthew Rocklin`_
- (:pr:`4822`) `Tobias de Jong`_
- (:pr:`5256`) `James Bourbeau`_
- (:pr:`3901`) `Tom Augspurger`_
- (:pr:`5151`) `James Bourbeau`_

Bag
^^^

- (:pr:`5208`) `Marco Neumann`_

Core
^^^^

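- (:pr:`5220`) `James Bourbeau`_
- (:pr:`5267`) `Tom Augspurger`_
- (:pr:`5234`) `Tom Augspurger`_
- (:pr:`5231`) `Jim Crist`_
- ``config.set`` (:pr:`5226`) `Jim Crist`_
- (:pr:`5227`) `Jim Crist`_
- (:pr:`5228`) `Jim Crist`_
- (:pr:`5217`) `James Bourbeau`_
- (:pr:`5207`) `Matthew Rocklin`_
- (:pr:`5179`) `John A Kirkham`_

DataFrame
^^^^^^^^^
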
- ``DataFrame.query`` docstring (incorrect numexpr API) (:pr:`5271`) `Doug Davis`_
- (:pr:`5218`) `Richard J Zamora`_
- (:pr:`5265`) `Martin Durant`_
- ``rearrange_by_divisions`` and ``set_index`` support for cudf (:pr:`5205`) `Richard J Zamora`_
- ``groupby.std()`` with integer column names (:pr:`5096`) `Nicolas Hug`_
- ``Series.__iter__`` (:pr:`5071`) `Blane`_
- ``hash_pandas_object`` to work for non-pandas backends (:pr:`5184`) `GALI PREM SAGAR`_
- (:pr:`5154`) `Ivars Geidans`_
- (:pr:`5223`) `Henrique Ribeiro`_

Documentation
^^^^^^^^^^^^^

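- (:pr:`5277`) `Matthew Rocklin`_
- (:pr:`5214`) `Matthew Rocklin`_
- (:pr:`5249`) `Martin Durant`_
- (:pr:`5213`) `Matthew Rocklin`_
- (:pr:`5246`) `Martin Durant`_
- (:pr:`5242`) `Martin Durant`_
- (:pr:`5248`) `Matthew Rocklin`_
- (:pr:`5247`) `Matthew Rocklin`_
- (:pr:`5243`) `Matthew Rocklin`_
- (:pr:`5245`) `Martin Durant`_
- (:pr:`5241`) `Matthew Rocklin`_
- (:pr:`5236`) `James Bourbeau`_
- (:pr:`5244`) `James Bourbeau`_
- (:pr:`5240`) `James Bourbeau`_
- (:pr:`5238`) `Matthew Rocklin`_
- (:pr:`5235`) `James Bourbeau`_
- (:pr:`5239`) `Martin Durant`_
- (:pr:`5237`) `James Bourbeau`_
- (:pr:`5232`) `Matthew Rocklin`_

.. _v2.2.0 / 2019-08-01:

2.2.0 / 2019-08-01
------------------
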

Array
^^^^^

- (:pr:`5074`) `Matthew Rocklin`_
- (:pr:`5108`) `Peter Andreas Entschev`_
- (:pr:`5035`) `Peter Andreas Entschev`_
- (:pr:`5148`) `James Bourbeau`_
- (:pr:`5122`) `Tom Augspurger`_
- (:pr:`5103`) `Peter Andreas Entschev`_
- (:pr:`5177`) `@andrethrill`_
- (:pr:`5192`) `Matthew Rocklin`_

Bag
^^^

- (:pr:`5172`) `Tom Augspurger`_

Core
^^^^

- (:pr:`5064`) (:pr:`5121`) `Martin Durant`_
- (:pr:`5056`) `Tom Augspurger`_
- (:pr:`5128`) `Elliott Sales de Andrade`_
- (:pr:`5130`) `Martin Durant`_
- (:pr:`5140`) `Elliott Sales de Andrade`_

DataFrame
^^^^^^^^^

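- (:pr:`5066`) `Brett Naul`_
- (:pr:`5090`) `GALI PREM SAGAR`_
- (:pr:`4995`) `Richard J Zamora`_
- (:pr:`5094`) `msbrown47`_
- (:pr:`5111`) `Tom Augspurger`_
- (:pr:`5143`) `Tom Augspurger`_
- (:pr:`5149`) `Nick Becker`_
- (:pr:`5173`) `Daniel Saxton`_
- (:pr:`5180`) `Matthew Rocklin`_
- (:pr:`5150`) `Sarah Bird`_
- (:pr:`5182`) `Jim Crist`_
- (:pr:`5157`) `Richard J Zamora`_

Documentation
^^^^^^^^^^^^^
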
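- (:pr:`5086`) `James Bourbeau`_
- (:pr:`5093`) `Jim Crist`_
- `Natalya Rapstine`_
- (:pr:`5139`) `Matthias Bussonnier`_
- (:pr:`5147`) `Tom Augspurger`_
- (:pr:`5155`) `Loïc Estève`_
- (:pr:`5171`) `James Bourbeau`_
- (:pr:`5170`) `Martin Durant`_
- (:pr:`5164`) `Xavier Holt`_
- (:pr:`5163`) `Matthew Rocklin`_
- (:pr:`5165`) `Matthew Rocklin`_

.. _v2.1.0 / 2019-07-08:

2.1.0 / 2019-07-08
------------------
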

Array
^^^^^

- ``recompute=`` keyword to ``svd_compressed`` for lower-memory use (:pr:`5041`) `Matthew Rocklin`_
- ``__array_function__`` implementation for backwards compatibility (:pr:`5043`) `Ralf Gommers`_
- ``dtype`` and ``shape`` kwargs to ``apply_along_axis`` (:pr:`3742`) `Davis Bennett`_
- (:pr:`5025`) `Peter Andreas Entschev`_
- ``stack`` (:pr:`4978`) `John A Kirkham`_

Core
^^^^

- ``to_parquet`` call (:pr:`5075`) `James Bourbeau`_
- (:pr:`5072`) `James Bourbeau`_
- (:pr:`5058`) `Willi Rath`_
- (:pr:`5038`) `Tom Augspurger`_
- (:pr:`5033`) `Tom Augspurger`_
- (:pr:`5027`) `Tom Augspurger`_

DataFrame
^^^^^^^^^

- ``compute_meta`` recursion in ``blockwise`` (:pr:`5048`) `Peter Andreas Entschev`_
- ``get_dummies`` (:pr:`5057`) `GALI PREM SAGAR`_
- ``DataFrame.assign`` (:pr:`5047`) `asmith26`_
- (:pr:`5034`) `tshatrov`_
- (:pr:`5013`) `George Sakkis`_
- ``preserve_index`` changes in pyarrow (:pr:`5018`) `Richard J Zamora`_
- ``meta`` for ``str.split(expand=False)`` (:pr:`5022`) `Brett Naul`_
- ``merge_asof`` (:pr:`5011`) `Cody Johnson`_
- (:pr:`4992`) `Matthew Rocklin`_
- ``melt`` as a method of Dask DataFrame (:pr:`4984`) `Dustin Tindall`_
- ``to_hdf`` (:pr:`5003`) `James Bourbeau`_

Documentation
^^^^^^^^^^^^^

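- (:pr:`5065`) `Sean McKenna`_
- (:pr:`5061`) `David Brochart`_
- ``from_sequence`` typo in delayed best practices (:pr:`5045`) `James Bourbeau`_
- (:pr:`5026`) `James Bourbeau`_
- (:pr:`5015`) `James Bourbeau`_
- (:pr:`5006`) `Tom Augspurger`_

.. _v2.0.0 / 2019-06-25:

2.0.0 / 2019-06-25
------------------
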

Array
^^^^^

- (:pr:`4981`) `James Bourbeau`_
- (:pr:`4975`) `John A Kirkham`_
- (:pr:`4863`) `Michael Eaton`_
- (:pr:`4669`) `Hameer Abbasi`_
- (:pr:`4931`) `Henry Pinkard`_
- (:pr:`4945`) `Alistair Miles`_
- (:pr:`4946`) `Peter Andreas Entschev`_
- (:pr:`4914`) `Peter Andreas Entschev`_
- (:pr:`4921`) `Matthew Rocklin`_
- (:pr:`4938`) `Matthew Rocklin`_
- (:pr:`4923`) `Bruce Merry`_
- (:pr:`4167`) `John A Kirkham`_
- (:pr:`4927`) `John A Kirkham`_
- ``concatenate`` using ``_meta`` (:pr:`4925`) `John A Kirkham`_
- (:pr:`4895`) `Matthew Rocklin`_
- (:pr:`4543`) `Peter Andreas Entschev`_
- (:pr:`4912`) `Peter Andreas Entschev`_
- (:pr:`4937`) `Peter Andreas Entschev`_
- (:pr:`4944`) `Matthew Rocklin`_
- (:pr:`4972`) `Matthew Rocklin`_
- (:pr:`4977`) `John A Kirkham`_
- (:pr:`4976`) `John A Kirkham`_
- (:pr:`4954`) `Matthew Rocklin`_
- (:pr:`4853`) `Genevieve Buckley`_
- ``numpy_compat`` functions (:pr:`4850`) `John A Kirkham`_
- (:pr:`4834`) `Anderson Banihirwe`_
- (:pr:`4805`) `Tom Augspurger`_
- (:pr:`4831`) `Bruce Merry`_
- (:pr:`4794`) `Matthew Rocklin`_
- (:pr:`4708`) `Peter Andreas Entschev`_
- (:pr:`4990`) `John A Kirkham`_
- ``da.block`` with 0-size arrays (:pr:`4991`) `John A Kirkham`_

Core
^^^^

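- (:pr:`4919`) `Jim Crist`_
- (:pr:`4960`) `Tom Augspurger`_
- (:pr:`4916`) `Tom Augspurger`_
- (:pr:`4924`) `John A Kirkham`_
- (:pr:`4935`) `btw08`_
- (:pr:`4918`) `James Bourbeau`_
- (:pr:`4901`) `Matthew Rocklin`_
- (:pr:`4908`) `Mark Bell`_
- (:pr:`4903`) `Ian Bolliger`_
- (:pr:`4890`) `Tom Augspurger`_
- (:pr:`4836`) `James Bourbeau`_
- (:pr:`4806`) `Justin Poehnelt`_
- (:pr:`4798`) `Tom Augspurger`_
- (:pr:`4793`) `Matthew Rocklin`_
- (:pr:`4983`) `James Bourbeau`_
- (:pr:`4988`) `John A Kirkham`_

DataFrame
^^^^^^^^^
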
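- (:pr:`4416`) `George Sakkis`_
- (:pr:`4877`) `Cody Johnson`_
- (:pr:`4882`) `Endre Mark Borza`_
- (:pr:`4962`) `James Bourbeau`_
- (:pr:`4515`) `asmith26`_
- (:pr:`4955`) `Matthew Rocklin`_
- (:pr:`4934`) `Philipp S. Sommer`_
- (:pr:`4872`) `Justin Waugh`_
- (:pr:`4917`) `mcsoini`_
- (:pr:`4889`) `Benjamin Zaitlen`_
- (:pr:`4905`) `Ian Bolliger`_
- (:pr:`4865`) `Ksenia Bobrova`_
- (:pr:`4884`) `Henrique Ribeiro`_
- (:pr:`4896`) `tpanza`_
- (:pr:`4881`) `Tom Augspurger`_
- (:pr:`4829`) `Tom Augspurger`_
- (:pr:`4810`) `Richard J Zamora`_
- (:pr:`4819`) `Matthew Rocklin`_
- (:pr:`4807`) `Lijo Jose`_
- (:pr:`4791`) `Ksenia Bobrova`_
- (:pr:`4459`) `Tom Augspurger`_
- (:pr:`4802`) `Matthew Rocklin`_
- (:pr:`4796`) `Jorge Pessoa`_
- (:pr:`4800`) `Tom Augspurger`_
- ``_maybe_slice`` (:pr:`4786`) `Benjamin Zaitlen`_
- (:pr:`4745`) `Matthew Rocklin`_
- (:pr:`4792`) `Matthew Rocklin`_
- (:pr:`4788`) `James Bourbeau`_
- (:pr:`4692`) `Guillaume Lemaitre`_
- (:pr:`4989`) `James Bourbeau`_
- (:pr:`3335`) `Jörg Dietrich`_

Documentation
^^^^^^^^^^^^^
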
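- (:pr:`4980`) `James Bourbeau`_
- `Matthew Rocklin`_
- (:pr:`4970`) `Bouwe Andela`_
- (:pr:`4969`) `James Bourbeau`_
- (:pr:`4968`) `mbarkhau`_
- (:pr:`4932`) `Hugo`_
- (:pr:`4915`) `Matthew Rocklin`_
- (:pr:`4887`) `Tom Augspurger`_
- (:pr:`4886`) `Tom Augspurger`_
- (:pr:`4868`) `Paweł Kordek`_
- (:pr:`3821`) `Martin Durant`_
- (:pr:`2528`) `Tom Augspurger`_
- (:pr:`4838`) `Martin Durant`_
- (:pr:`4833`) `Matthew Rocklin`_
- ``utils.derive_from`` to accept functions, apply across array (:pr:`4804`) `Martin Durant`_
- (:pr:`4808`) `Matthew Rocklin`_
- (:pr:`4816`) `Christian Hudon`_

.. _v1.2.2 / 2019-05-08:

1.2.2 / 2019-05-08
------------------
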

Array
^^^^^

- (:pr:`4759`) `Martin Durant`_
- (:pr:`4753`) `Matthew Rocklin`_
- (:pr:`4452`) `@asmith26`_
- (:pr:`4756`) `Matthew Rocklin`_
- (:pr:`4755`) `Matthew Rocklin`_

Bag
^^^

- (:pr:`4423`) `Daniel Severo`_

Core
^^^^

- (:pr:`4774`) `Matthew Rocklin`_
- (:pr:`4780`) `James Bourbeau`_
- (:pr:`3926`) `Martin Durant`_
- (:pr:`4751`) `Martin Durant`_
- (:pr:`4742`) `Jim Crist`_

DataFrame
^^^^^^^^^

- (:pr:`4778`) `Matthew Rocklin`_
- (:pr:`4771`) `Brian Chu`_
- (:pr:`4762`) `Martin Durant`_
- (:pr:`4765`) `Nick Becker`_
- (:pr:`4588`) `Abhinav Ralhan`_
- (:pr:`4760`) `Martin Durant`_
- (:pr:`4757`) `Jim Crist`_
- (:pr:`4744`) `Matthew Rocklin`_

Documentation
^^^^^^^^^^^^^

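- (:pr:`4772`) `Christian Hudon`_
- (:pr:`4766`) `Matthew Rocklin`_
- (:pr:`4770`) `Matthew Rocklin`_
- (:pr:`4768`) `Martin Durant`_
- (:pr:`4025`) `John A Kirkham`_
- (:pr:`4764`) `James Bourbeau`_
- (:pr:`4705`) `Matthew Rocklin`_
- (:pr:`4752`) `Matthew Rocklin`_

.. _v1.2.1 / 2019-04-29:

1.2.1 / 2019-04-29
------------------
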

Array
^^^^^

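- (:pr:`4737`) `Bruce Merry`_
- (:pr:`4684`) `Genevieve Buckley`_
- (:pr:`4713`) `Bruce Merry`_
- (:pr:`4717`) `Danilo Horta`_
- (:pr:`4715`) `Peter Andreas Entschev`_
- (:pr:`4663`) `Michael Eaton`_
- (:pr:`4704`) `Matthew Rocklin`_
- (:pr:`4707`) `Genevieve Buckley`_
- (:pr:`4679`) `Isaiah Norton`_

Core
^^^^
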
- (:pr:`4735`) `Matthew Rocklin`_
- (:pr:`4720`) `Jim Crist`_
- (:pr:`4710`) `James Bourbeau`_
- (:pr:`4696`) `Peter Andreas Entschev`_
- (:pr:`4699`) `Matthew Rocklin`_

DataFrame
^^^^^^^^^

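- (:pr:`4736`) `Nick Becker`_
- (:pr:`4738`) `Jim Crist`_
- (:pr:`4677`) `Janne Vuorela`_
- (:pr:`4725`) `Matthew Rocklin`_
- (:pr:`4727`) `Jim Crist`_
- (:pr:`4719`) `Nick Becker`_
- (:pr:`4695`) `Jim Crist`_
- (:pr:`4714`) `Matthew Rocklin`_
- (:pr:`4625`) `Nathan Matare`_

Documentation
^^^^^^^^^^^^^
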
- (:pr:`4716`) `Matthew Rocklin`_
- (:pr:`4703`) `Matthew Rocklin`_
- (:pr:`4728`) `James Bourbeau`_
- (:pr:`4709`) `Matthew Rocklin`_
- (:pr:`4698`) `James Bourbeau`_

.. _v1.2.0 / 2019-04-12:

1.2.0 / 2019-04-12
------------------


Array
^^^^^

- (:pr:`4525`) `Peter Andreas Entschev`_
- (:pr:`4675`) `Hameer Abbasi`_
- (:pr:`4656`) `Matthew Rocklin`_
- (:pr:`4645`) `Matthew Rocklin`_

Core
^^^^

- (:pr:`4680`) `Philipp Rudiger`_
- (:pr:`4671`) `Peter Andreas Entschev`_
- (:pr:`4673`) `Martin Durant`_

DataFrame
^^^^^^^^^

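- (:pr:`4686`) `Henrique Ribeiro`_
- (:pr:`4647`) `gregrf`_
- (:pr:`4674`) `Matthew Rocklin`_
- (:pr:`4667`) `Peter Andreas Entschev`_
- (:pr:`4668`) `Martin Durant`_
- (:pr:`4634`) `Ian Rose`_
- (:pr:`4655`) `Martin Durant`_
- (:pr:`4657`) `Matthew Rocklin`_
- (:pr:`4648`) `Henrique Ribeiro`_
- (:pr:`4650`) `gregrf`_

Documentation
^^^^^^^^^^^^^
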
- (:pr:`4660`) `James Bourbeau`_

.. _v1.1.5 / 2019-03-29:

1.1.5 / 2019-03-29
------------------


Array
^^^^^

- (:pr:`4646`) `Matthew Rocklin`_

Core
^^^^

- (:pr:`4186`) `Brett Naul`_
- (:pr:`4603`) (:pr:`4605`) `James Bourbeau`_
- (:pr:`4602`) (:pr:`4623`) `Justin Poehnelt`_
- (:pr:`4628`) `James Bourbeau`_
- (:pr:`4604`) `James Bourbeau`_
- (:pr:`4631`) `Peter Andreas Entschev`_

DataFrame
^^^^^^^^^

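- (:pr:`4565`) (:pr:`4593`) `Dan O'Donovan`_
- (:pr:`4599`) `Benjamin Zaitlen`_
- (:pr:`4533`) (:pr:`4576`) (:pr:`4606`) (:pr:`4613`) `amerkel2`_
- (:pr:`4624`) `Julia Signell`_
- (:pr:`4498`) `Henrique Ribeiro`_
- (:pr:`4636`) `Justin Waugh`_
- (:pr:`4600`) `Brian Chu`_
- (:pr:`4637`) `Matthew Rocklin`_
- (:pr:`4638`) `Matthew Rocklin`_
- (:pr:`4651`) `Álvaro Abella Bascarán`_

Documentation
^^^^^^^^^^^^^
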
- (:pr:`4571`) (:pr:`4569`) (:pr:`4619`) `James Bourbeau`_
- (:pr:`4641`) `Aaron Fowles`_
- (:pr:`4649`) `Søren Fuglede Jørgensen`_

.. _v1.1.4 / 2019-03-08:

1.1.4 / 2019-03-08
------------------


Array
^^^^^

- (:pr:`4548`) `John A Kirkham`_
- ``asarray`` in ``extract`` (:pr:`4549`) `John A Kirkham`_
- (:pr:`4539`) `Elliott Sales de Andrade`_
- (:pr:`4564`) `Peter Andreas Entschev`_

Core
^^^^

- (:pr:`4542`) `Yu Feng`_
- (:pr:`4554`) `Matthew Rocklin`_

DataFrame
^^^^^^^^^

- (:pr:`4541`) `Matthew Rocklin`_
- (:pr:`4551`) `Tom Augspurger`_
- (:pr:`4557`) `Matthew Rocklin`_
- (:pr:`4560`) `@JulianWgs`_

Documentation
^^^^^^^^^^^^^

- (:pr:`4516`) `Scott Sievert`_
- (:pr:`4566`) `Matthew Rocklin`_
- (:pr:`4572`) `Shyam Saladi`_

.. _v1.1.3 / 2019-03-01:

1.1.3 / 2019-03-01
------------------


Array
^^^^^

- (:pr:`4513`) `Matthew Rocklin`_
- (:pr:`4537`) `Matthew Rocklin`_

DataFrame
^^^^^^^^^

- (:pr:`4522`) `Matthew Rocklin`_
- (:pr:`4474`) `Joe Corbett`_
- (:pr:`4531`) `Benjamin Zaitlen`_
- (:pr:`4535`) `Matthew Rocklin`_
- (:pr:`4530`) `@HSR05`_

Documentation
^^^^^^^^^^^^^

- (:pr:`4528`) `Bart Broere`_

.. _v1.1.2 / 2019-02-25:

1.1.2 / 2019-02-25
------------------


Array
^^^^^

- (:pr:`4489`) `Marco Neumann`_
- (:pr:`4431`) `Danilo Horta`_
- (:pr:`4506`) `Jim Crist`_
- (:pr:`4519`) `Peter Andreas Entschev`_

Bag
^^^

- (:pr:`4464`) `Jim Crist`_
- (:pr:`4475`) `Anderson Banihirwe`_
- (:pr:`4502`) `Matthew Rocklin`_
- (:pr:`4500`) `Matthew Rocklin`_
- (:pr:`4507`) `Matthew Rocklin`_

DataFrame
^^^^^^^^^

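- (:pr:`4445`) `Janne Vuorela`_
- (:pr:`4453`) (:pr:`4455`) `Michał Jastrzębski`_
- (:pr:`4466`) `Jim Crist`_
- (:pr:`4470`) `Matthew Rocklin`_
- (:pr:`4482`) `Matthew Rocklin`_
- (:pr:`4485`) `Marco Neumann`_
- (:pr:`4494`) `Daniel Saxton`_
- (:pr:`4501`) `Matthew Rocklin`_
- (:pr:`4505`) `Jim Crist`_
- (:pr:`4499`) `Matthew Rocklin`_
- (:pr:`4504`) `Jim Crist`_
- (:pr:`4509`) `Jim Crist`_

Documentation
^^^^^^^^^^^^^
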
- ``from_zarr`` (:pr:`4472`) `John A Kirkham`_
- Using Other S3-Compatible Services for remote-data-services (:pr:`4405`) `Aploium`_
- (:pr:`4483`) `Bruce Merry`_
- (:pr:`4508`) `James Bourbeau`_

Core
^^^^

- (:pr:`4460`) `Marco Neumann`_
- (:pr:`4479`) (:pr:`4480`) `Ross Petchler`_
- (:pr:`4492`) `Matthew Rocklin`_

.. _v1.1.1 / 2019-01-31:

1.1.1 / 2019-01-31
------------------


Array
^^^^^

- (:pr:`4402`) `Johnnie Gray`_
- (:pr:`4434`) `Adam Beberg`_
- (:pr:`4430`) `James Bourbeau`_

DataFrame
^^^^^^^^^

- (:pr:`4396`) `Matthew Rocklin`_
- (:pr:`4413`) `Jim Crist`_
- (:pr:`4414`) `George Sakkis`_
- (:pr:`4415`) `George Sakkis`_
- (:pr:`4418`) `Matthew Rocklin`_
- (:pr:`4438`) `Roma Sokolov`_

Delayed
^^^^^^^

- (:pr:`4440`) `Matthew Rocklin`_

Documentation
^^^^^^^^^^^^^

- (:pr:`4406`) `John A Kirkham`_
- (:pr:`4427`) `Daniel Severo`_

Core
^^^^

- `Janne Vuorela`_

.. _v1.1.0 / 2019-01-18:

1.1.0 / 2019-01-18
------------------


Array
^^^^^

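- (:pr:`4236`) `Damien Garaud`_
- (:pr:`4287`) `Paul Vecchio`_
- (:pr:`4304`) `Johnnie Gray`_
- (:pr:`4301`) `Tom Augspurger`_
- (:pr:`4346`) `Mark Harfouche`_
- (:pr:`4354`) `Matthew Rocklin`_
- (:pr:`4363`) `Stephan Hoyer`_
- (:pr:`4357`) `Diane Trout`_
- (:pr:`4387`) `Jim Crist`_
- (:pr:`4312`) `Marco Neumann`_

DataFrame
^^^^^^^^^
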
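- (:pr:`4250`) `James Bourbeau`_
- (:pr:`4268`) `Mina Farid`_
- (:pr:`4308`) `Tom Augspurger`_
- (:pr:`4316`) `@slnguyen`_
- (:pr:`4229`) `Matthew Rocklin`_
- (:pr:`4331`) `Matthew Rocklin`_
- (:pr:`4330`) `Matthew Rocklin`_
- (:pr:`4338`) `Gábor Lipták`_
- (:pr:`4359`) `Matthew Rocklin`_
- (:pr:`4375`) `Matthew Rocklin`_
- (:pr:`4379`) `Tom Augspurger`_
- (:pr:`4374`) `Tom Augspurger`_

Documentation
^^^^^^^^^^^^^
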
- (:pr:`4258`) `David Hoese`_
- (:pr:`4260`) `Jim Crist`_
- (:pr:`4267`), (:pr:`4263`), (:pr:`4262`), (:pr:`4277`), (:pr:`4271`), (:pr:`4279`), (:pr:`4265`), (:pr:`4295`), (:pr:`4293`), (:pr:`4296`), (:pr:`4302`), (:pr:`4306`), (:pr:`4318`), (:pr:`4314`), (:pr:`4309`), (:pr:`4317`), (:pr:`4326`), (:pr:`4325`), (:pr:`4322`), (:pr:`4332`), (:pr:`4333`) `Miguel Farrajota`_
- (:pr:`4272`) `Daniel Li`_
- (:pr:`4259`) (:pr:`4282`) `Prabakaran Kumaresshan`_
- (:pr:`4266`) `Guillaume Eynard-Bontemps`_
- (:pr:`4313`) `Prabakaran Kumaresshan`_
- (:pr:`4350`) `Matthew Rocklin`_
- (:pr:`4376`) `Daniel Saxton`_
- (:pr:`4382`) `Jendrik Jördening`_

Core
^^^^

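- (:pr:`4219`) `Matthew Rocklin`_
- (:pr:`4281`) `Matthew Rocklin`_
- (:pr:`4280`) `Takahiro Kojima`_
- (:pr:`4092`) `Matthew Rocklin`_
- (:pr:`4294`) `Stephan Hoyer`_
- (:pr:`4276`) `Martin Durant`_
- (:pr:`4324`) `Matthew Rocklin`_
- (:pr:`4337`) `Gábor Lipták`_
- (:pr:`4339`) `Matthew Rocklin`_
- (:pr:`4342`) `Mark Harfouche`_
- (:pr:`4351`) `Matthew Rocklin`_
- (:pr:`4356`) `Stuart Berg`_
- (:pr:`4348`) `Matthew Rocklin`_
- (:pr:`4395`) `Matthew Rocklin`_
- (:pr:`4381`) `Tom Augspurger`_
- (:pr:`4388`) `Jim Crist`_

.. _v1.0.0 / 2018-11-28:

1.0.0 / 2018-11-28
------------------
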

Array
^^^^^

- (:pr:`4215`) `crusaderky`_

DataFrame
^^^^^^^^^

- (:pr:`4232`) `James Bourbeau`_
- (:pr:`4245`) `Martin Durant`_
- (:pr:`4247`) `Martin Durant`_

Documentation
^^^^^^^^^^^^^

- (:pr:`4222`) (:pr:`4224`) (:pr:`4228`) (:pr:`4231`) (:pr:`4230`) (:pr:`4234`) (:pr:`4235`) (:pr:`4254`) `Miguel Farrajota`_
- (:pr:`4251`) `@milesial`_

Core
^^^^

- (:pr:`4223`) `Matthew Rocklin`_
- (:pr:`4221`) `Matthew Rocklin`_
- `Jim Crist`_

.. _v0.20.2 / 2018-11-15:

0.20.2 / 2018-11-15
-------------------


Array
^^^^^

- (:pr:`4207`) `Matthew Rocklin`_

Dataframe
^^^^^^^^^

- (:pr:`4193`) `Damien Garaud`_
- (:pr:`4212`) `James Bourbeau`_

Documentation
^^^^^^^^^^^^^

- (:pr:`4197`) (:pr:`4204`) (:pr:`4198`) (:pr:`4199`) (:pr:`4200`) (:pr:`4202`) (:pr:`4209`) `Miguel Farrajota`_
- (:pr:`4206`) `James Bourbeau`_
- (:pr:`4208`) `James Bourbeau`_

.. _v0.20.1 / 2018-11-09:

0.20.1 / 2018-11-09
-------------------


Array
^^^^^

- (:pr:`4153`) `John A Kirkham`_
- (:pr:`4150`) `John A Kirkham`_
- (:pr:`4162`) `John A Kirkham`_
- (:pr:`4168`) `samc0de`_
- ``pad`` to add only new chunks (:pr:`4152`) `John A Kirkham`_
- (:pr:`4182`) `Matthew Rocklin`_

Core
^^^^

- (:pr:`4143`) `James Bourbeau`_
- (:pr:`4159`) `Matthew Rocklin`_
- (:pr:`4171`) `Martin Durant`_
- (:pr:`4165`) `Armin Berres`_
- (:pr:`4181`) `Matthew Rocklin`_
- (:pr:`4189`) `Damien Garaud`_
- (:pr:`4160`) `Matthew Rocklin`_

Dataframe
^^^^^^^^^

- (:pr:`4151`) `Anderson Banihirwe`_
- (:pr:`4174`) `Martin Durant`_
- (:pr:`4187`) `Damien Garaud`_

Documentation
^^^^^^^^^^^^^

- (:pr:`4147`) `Jonathan Fraine`_
- (:pr:`4164`) (:pr:`4175`) (:pr:`4185`) (:pr:`4192`) (:pr:`4191`) (:pr:`4190`) (:pr:`4180`) `Miguel Farrajota`_
- (:pr:`4183`) `Carlos Valiente`_

.. _v0.20.0 / 2018-10-26:

0.20.0 / 2018-10-26
-------------------


Array
^^^^^

- (:pr:`3998`), (:pr:`4081`) `Matthew Rocklin`_
- (:pr:`4080`) `Matthew Rocklin`_
- (:pr:`4113`) `Elliott Sales de Andrade`_
- (:pr:`4116`) `Matthew Rocklin`_
- (:pr:`4121`) `Matthew Rocklin`_
- (:pr:`4125`) `Stephan Hoyer`_
- (:pr:`4127`), (:pr:`4131`) `Anderson Banihirwe`_
- (:pr:`4128`), (:pr:`4135`) `Matthew Rocklin`_
- (:pr:`4126`) `Matthew Rocklin`_

Bag
^^^

- (:pr:`4076`) `Martin Durant`_

Core
^^^^

- (:pr:`4086`), (:pr:`4093`) `James Bourbeau`_
- (:pr:`4112`) `Elliott Sales de Andrade`_
- (:pr:`4077`) `Matthew Rocklin`_
- (:pr:`4132`) `Martin Durant`_
- (:pr:`4138`) `Matthew Rocklin`_

Dataframe
^^^^^^^^^

- (:pr:`4071`) `Matthew Rocklin`_
- (:pr:`4087`) `Jan Koch`_
- (:pr:`4090`) `Bart Broere`_
- (:pr:`4095`) `Matthew Rocklin`_
- (:pr:`3909`) `Rahul Vaidya`_
- (:pr:`4104`) `Justin Dennison`_
- (:pr:`4115`) `Matthew Rocklin`_
- (:pr:`4130`) `Martin Durant`_

Documentation
^^^^^^^^^^^^^

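- (:pr:`4073`), (:pr:`4074`), (:pr:`4094`), (:pr:`4097`), (:pr:`4107`), (:pr:`4124`), (:pr:`4133`), (:pr:`4139`) `Miguel Farrajota`_
- (:pr:`4089`) `Antonino Ingargiola`_
- (:pr:`4102`) `Javad`_
- (:pr:`4109`) `Martin Durant`_
- (:pr:`4114`) `TakaakiFuruse`_
- (:pr:`4136`) `Matthew Rocklin`_

.. _v0.19.4 / 2018-10-09:

0.19.4 / 2018-10-09
-------------------
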

Array
^^^^^

- ``apply_gufunc(..., axes=..., keepdims=...)`` (:pr:`3985`) `Markus Gonser`_

Bag
^^^

- (:pr:`4069`) `Matthew Rocklin`_

Dataframe
^^^^^^^^^

- ``percentiles`` options for ``dask.dataframe.describe`` method (:pr:`4067`) `Zhenqing Li`_
- (:pr:`4066`) `Matthew Rocklin`_

Core
^^^^

- (:pr:`4062`) `Matthew Rocklin`_

Documentation
^^^^^^^^^^^^^

- ``=`` in kwarg). (:pr:`4068`) `Matthias Bussonnier`_
- (:pr:`4065`), (:pr:`4064`), (:pr:`4063`) `Miguel Farrajota`_

.. _v0.19.3 / 2018-10-05:

0.19.3 / 2018-10-05
-------------------


Array
^^^^^

- (:pr:`4041`) `Matthew Rocklin`_
- (:pr:`4055`) `Jim Crist`_
- (:pr:`4019`) `Matthew Rocklin`_
- (:pr:`4044`) `Matthew Rocklin`_
- ``corrcoef`` to global imports (:pr:`4030`) `John A Kirkham`_
- ``indices`` import to global import (:pr:`4029`) `John A Kirkham`_
- (:pr:`4028`) `John A Kirkham`_
- (:pr:`3964`) `Mark Harfouche`_
- (:pr:`3958`) `John A Kirkham`_

Bag
^^^

- (:pr:`4033`) `Matthew Rocklin`_
- (:pr:`4050`) `James Bourbeau`_
- (:pr:`4018`) `Matthew Rocklin`_
- (:pr:`4013`) `Eric Wolak`_
- (:pr:`4000`) (:pr:`4007`) `Martin Durant`_

Dataframe
^^^^^^^^^

- ``index`` parameter to :meth:`dask.dataframe.from_dask_array` for creating a dask DataFrame from a dask Array with a given index. (:pr:`3991`) `Tom Augspurger`_
- (:pr:`4015`) `Matthew Rocklin`_
- (:pr:`4046`) `Jim Crist`_
- (:pr:`4042`) `Jim Crist`_
- (:pr:`3978`) `Martin Durant`_
- (:pr:`3991`) `Tom Augspurger`_
- (:pr:`3975`) `Julia Signell`_
- (:pr:`3989`) `Martin Durant`_

Core
^^^^

- (:pr:`4050`) `James Bourbeau`_
- (:pr:`4002`) `Matthew Rocklin`_
- (:pr:`3979`) `Jim Crist`_
- (:pr:`3763`) `Itamar Turner-Trauring`_

Documentation
^^^^^^^^^^^^^

- (:pr:`4049`), (:pr:`4034`), (:pr:`4031`), (:pr:`4020`), (:pr:`4021`), (:pr:`4022`), (:pr:`4023`), (:pr:`4016`), (:pr:`4017`), (:pr:`4010`), (:pr:`3997`), (:pr:`3996`) `Miguel Farrajota`_
- (:pr:`4048`) `James Bourbeau`_
- (:pr:`4014`) `Matthew Rocklin`_
- (:pr:`4008`) `Matthew Rocklin`_
- (:pr:`3992`) `James Bourbeau`_

.. _v0.19.2 / 2018-09-17:

0.19.2 / 2018-09-17
-------------------


Array
^^^^^

- ``apply_gufunc`` implements automatic inference of function output dtypes (:pr:`3936`) `Markus Gonser`_
- (:pr:`3980`) `James Bourbeau`_
- (:pr:`3956`) `Yu Feng`_
- (:pr:`3965`) `Mark Harfouche`_
- (:pr:`3949`) `Keisuke Fujii`_

Core
^^^^

- (:pr:`3966`) `Mark Harfouche`_
- (:pr:`3957`) `Matthew Rocklin`_

Documentation
^^^^^^^^^^^^^

- (:pr:`3963`) `Matthew Rocklin`_
- `Matthew Rocklin`_
- (:pr:`3960`) `Tom Augspurger`_

.. _v0.19.1 / 2018-09-06:

0.19.1 / 2018-09-06
-------------------


Array
^^^^^

- (:pr:`3928`) `Matthew Rocklin`_
- (:pr:`3939`) `Bruce Merry`_
- (:pr:`3955`) `Tobias de Jong`_
- (:pr:`3944`) `Yu Feng`_
- (:pr:`3933`) `Tobias de Jong`_

Dataframe
^^^^^^^^^

- (:pr:`3867`) `George Sakkis`_
- (:pr:`3923`) (:pr:`3931`) `@andrethrill`_
- (:pr:`3888`) `Sriharsha Hatwar`_
- (:pr:`3919`) `Tom Augspurger`_
- (:pr:`3941`) `Matthew Rocklin`_
- (:pr:`3942`) `Matthew Rocklin`_

Documentation
^^^^^^^^^^^^^

- (:pr:`3922`) `Uwe Korn`_
- (:pr:`3924`) `Matthew Rocklin`_

.. _v0.19.0 / 2018-08-29:

0.19.0 / 2018-08-29
-------------------


Array
^^^^^

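- (:pr:`3949`) `Keisuke Fujii`_
- (:pr:`3810`) `crusaderky`_
- (:pr:`3825`) `Stephan Hoyer`_
- (:pr:`3836`) `Matthew Rocklin`_
- (:pr:`3823`) `Elliott Sales de Andrade`_
- (:pr:`3820`) `Matthew Rocklin`_
- (:pr:`3844`) `Mark Harfouche`_
- (:pr:`3851`) `Tobias de Jong`_
- (:pr:`3830`) `Robert Sare`_
- (:pr:`3861`) `Jim Crist`_
- (:pr:`3852`) `Tobias de Jong`_

DataFrame
^^^^^^^^^
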
- ``dtype`` and ``sparse`` keywords to :func:`dask.dataframe.get_dummies` (:pr:`3792`) `Tom Augspurger`_
- ``dask.dataframe.to_dask_array`` for converting a Dask Series or DataFrame to a
  Dask Array, possibly with known chunk sizes (:pr:`3884`) `Tom Augspurger`_
- ``dask.array.asarray`` for dask dataframe and series inputs. Previously,
  the series was eagerly converted to an in-memory NumPy array before creating a dask array with known
  chunk sizes. This caused unexpectedly high memory usage. Now, no intermediate NumPy array is created,
  and a Dask array with unknown chunk sizes is returned (:pr:`3884`) `Tom Augspurger`_
- (:pr:`3805`) `Tom Augspurger`_
- (:pr:`3828`) `Irina Truong`_
- (:pr:`3833`) `Eric Bonfadini`_
- (:pr:`3212`) `Henrique Ribeiro`_
- (:pr:`3858`) `Jim Crist`_
- (:pr:`3860`) `Jim Crist`_
- (:pr:`3890`) `Matthew Rocklin`_
- (:pr:`3897`) `Tom Augspurger`_
- (:pr:`3908`) `Julia Signell`_

Core
^^^^

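- (:pr:`3771`) `Danilo Horta`_
- (:pr:`3840`) `Jim Crist`_
- (:pr:`3841`) `Jim Crist`_
- (:pr:`3849`) `Joe Hamman`_
- (:pr:`3856`) `Jim Crist`_
- (:pr:`3857`) `Jim Crist`_
- (:pr:`3855`) `@hugovk`_
- (:pr:`3876`) `Jan Margeta`_
- (:pr:`3896`) `Matthew Rocklin`_
- (:pr:`3894`) `Matthew Rocklin`_
- (:pr:`3893`) `Joe Hamman`_

Docs
^^^^
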
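- (:pr:`3826`) `John Mrziglod`_
- (:pr:`3838`) `Jim Crist`_
- (:pr:`3746`) `Christoph Moehl`_
- (:pr:`3850`) `Anderson Banihirwe`_
- (:pr:`3709`) `Scott Sievert`_
- (:pr:`3880`) `Javad`_
- (:pr:`3878`) `Daniel Rothenberg`_
- (:pr:`3900`) `Hans Moritz Günther`_
- to docstring (:pr:`3915`) `@rtobar`_

.. _v0.18.2 / 2018-07-23:

0.18.2 / 2018-07-23
-------------------
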

Array
^^^^^

- ``argtopk`` to make it release the GIL (:pr:`3610`) `crusaderky`_
- ``map_overlap`` (:pr:`3653`) `Matthew Rocklin`_
- ``linalg.tsqr`` for dimensions of uncertain length (:pr:`3662`) `Jeremy Chen`_
- (:pr:`3648`) `Matthew Rocklin`_
- (:pr:`3679`) `Matthew Rocklin`_
- (:pr:`3675`) `James Bourbeau`_
- ``.blocks`` accessor (:pr:`3689`) `Matthew Rocklin`_
- ``block_info`` keyword to ``map_blocks`` functions (:pr:`3686`) `Matthew Rocklin`_
- (:pr:`3407`) `crusaderky`_
- ``dtype`` in ``arange`` (:pr:`3722`) `crusaderky`_
- ``argtopk`` with uneven chunks (:pr:`3720`) `crusaderky`_
- ``replace=False`` in ``da.choice`` (:pr:`3765`) `James Bourbeau`_
- ``Array.__setitem__`` (:pr:`3767`) `Itamar Turner-Trauring`_
- ``chunksize`` convenience property (:pr:`3777`) `Jacob Tomlinson`_
- ``step < 0`` (:pr:`3702`) `Ziyao Wei`_
- ``to_zarr`` with ``return_stored`` ``True`` returns a Dask Array (:pr:`3786`) `John A Kirkham`_

Bag
^^^

last_endline optional parameter in to_textfiles (:pr:3745) George Sakkis_Dataframe ^^^^^^^^^
3772) Gerome Pistre_3799) Cloves Almeida_Delayed ^^^^^^^
@ operator to the delayed objects (:pr:3691) Mark Harfouche_3737) Matthew Rocklin_@delayed decorator for methods and add tests (:pr:3757) Ziyao Wei_Core ^^^^
3669) Mike Neish_3652) Matthew Rocklin_3588) Tom Augspurger_assert_eq to top-level modules (:pr:3726) Matthew Rocklin_scipy.sparse arrays (:pr:3738) Matthew Rocklin_3782) Elliott Sales de Andrade_3780) Matthew Rocklin_.. _v0.18.1 / 2018-06-22:
Array
^^^^^

- ``from_array`` now supports scalar types and nested lists/tuples in input,
  just like all NumPy functions do; it also produces a simpler graph when the
  input is a plain ndarray (:pr:`3568`) `crusaderky`_

3620) Marco Rossi_3578) John A Kirkham_3625) James Bourbeau_3640) James Bourbeau_3643) Matthew Rocklin_3658) John A Kirkham_

DataFrame
^^^^^^^^^

3613) Henrique Ribeiro_3636) Martin Durant_

Core
^^^^

3629) crusaderky_3642) Matthew Rocklin_3621) Jim Crist_3632) Yu Feng_3649) Matthew Rocklin_

.. _v0.18.0 / 2018-06-14:

Array
^^^^^

3460) Martin Durant_

- ``apply_gufunc``, ``gufunc``, and
  ``as_gufunc`` (:pr:`3109`) (:pr:`3526`) (:pr:`3539`) `Markus Gonser`_

3529) Matthew Rocklin_3511) Matthew Rocklin_3540) Martin Durant_3517) John A Kirkham_3559) Scott Sievert_

- ``isneginf`` and ``isposinf`` (:pr:`3581`) `John A Kirkham`_
- ``learn`` module (:pr:`3580`) `John A Kirkham`_

3575) Jeremy Chen_3591) Marc Pfister_

- ``nan_to_num`` in public API (:pr:`3599`) `John A Kirkham`_

3601) John A Kirkham_3597) Matthew Rocklin_3607) John A Kirkham_

- ``to_zarr``/``from_zarr`` (:pr:`3561`) `John A Kirkham`_

3586) Jeremy Chan_
(:pr:3396) crusaderky_

Dataframe
^^^^^^^^^

3494) Martin Durant_

- ``index`` to unsupported arguments for ``DataFrame.rename`` method (:pr:`3522`) `James Bourbeau`_
- ``numpy.ndarray``, ``pandas.Series``, and
  ``pandas.Index`` objects (:pr:`3536`) `James Bourbeau`_

3485) Christopher Ren_3522) James Bourbeau_3536) James Bourbeau_3566) James Bourbeau_3594) Matt Lee_3606) James Bourbeau_3573) @andrethrill_

Bag
^^^

3470) Matthew Rocklin_

Core
^^^^

3448) Matthew Rocklin_3432) (:pr:3513) (:pr:3520) Matthew Rocklin_

- ``dask-ssh`` CLI Options and Description. (:pr:`3476`) `@beomi`_

3496) Martin Durant_3509) James Bourbeau_3502) Matthew Rocklin_3516) Matthew Rocklin_3507) Matthew Rocklin_3562) Simon Perkins_3582) Matthew Rocklin_3477) Matthew Rocklin_3604) Matthew Rocklin_

.. _v0.17.5 / 2018-05-16:

Array
^^^^^

- ``rechunk`` with chunksize of -1 in a dict (:pr:`3469`) `Stephan Hoyer`_
- ``einsum`` now accepts the ``split_every`` parameter (:pr:`3471`) `crusaderky`_

3479) Yu Feng_

DataFrame
^^^^^^^^^

3499) Tom Augspurger_

.. _v0.17.4 / 2018-05-03:

Dataframe
^^^^^^^^^

3461) James Bourbeau_3463) Pierre Bartet_3466) Martin Durant_3462) James Bourbeau_

.. _v0.17.3 / 2018-05-02:

Array
^^^^^

- ``einsum`` for Dask Arrays (:pr:`3412`) `Simon Perkins`_
- ``piecewise`` for Dask Arrays (:pr:`3350`) `John A Kirkham`_
- ``nan`` in ``broadcast_shapes`` (:pr:`3356`) `John A Kirkham`_
- ``isin`` for Dask Arrays (:pr:`3363`) `Stephan Hoyer`_
- ``topk`` for Dask Arrays: faster algorithm, particularly for large k's; added support
  for multiple axes, recursive aggregation, and an option to pick the bottom k elements instead.
  (:pr:`3395`) `crusaderky`_
- The ``topk`` API has changed from ``topk(k, array)`` to the more conventional ``topk(array, k)``.
  The legacy API still works but is now deprecated. (:pr:`2965`) `crusaderky`_
- ``argtopk`` for Dask Arrays (:pr:`3396`) `crusaderky`_
- ``map_overlap`` (:pr:`3445`) `John A Kirkham`_
- ``gradient`` for Dask Arrays (:pr:`3434`) `John A Kirkham`_

DataFrame
^^^^^^^^^

- ``t`` as shorthand for ``table`` in ``to_hdf`` for pandas compatibility (:pr:`3330`) `Jörg Dietrich`_
- ``isna`` method for Dask DataFrames (:pr:`3294`) `Christopher Ren`_
- ``read_parquet`` for ``engine="pyarrow"`` (:pr:`3207`) `Uwe Korn`_

3366) Christopher Ren_

- ``infer_divisions`` option to ``read_parquet`` to specify whether read engines should compute divisions (:pr:`3387`) `Jon Mease`_
- ``engine="pyarrow"`` (:pr:`3387`) `Jon Mease`_

3343) Matthew Rocklin_3284) Martin Durant_3373) Martin Durant_3436) James Bourbeau_3440) Jörg Dietrich_3446) Jörg Dietrich_3421) Matthew Rocklin_

Core
^^^^

3410) Jim Crist_3448) Matthew Rocklin_

.. _v0.17.2 / 2018-03-21:

Array
^^^^^

- ``broadcast_arrays`` for Dask Arrays (:pr:`3217`) `John A Kirkham`_
- ``bitwise_*`` ufuncs (:pr:`3219`) `John A Kirkham`_
- ``axis`` argument to ``squeeze`` (:pr:`3261`) `John A Kirkham`_

3307) Matthew Rocklin_3301) Martin Durant_

DataFrame
^^^^^^^^^

3201) Matthew Rocklin_

- ``read_parquet`` with ``categories=[…]`` for ``engine="pyarrow"`` (:pr:`3177`) `Uwe Korn`_
- ``dd.tseries.Resampler.agg`` (:pr:`3202`) `Richard Postelnik`_

3230) Matthew Rocklin_

- ``dd.groupby._Groupby.apply`` (:pr:`3256`) `Gabriele Lanaro`_

Bag
^^^

3254) Matthew Rocklin_

Core
^^^^

3238) Daniel Collins_3271) Matthew Rocklin_3298) Matthew Rocklin_

.. _v0.17.1 / 2018-02-22:

Array
^^^^^

3166, :pr:3167) Simon Perkins_

- ``store_chunk`` calls for ``store``'s ``return_stored`` option (:pr:`3153`) `John A Kirkham`_

3187) Matthew Rocklin_

DataFrame
^^^^^^^^^

3164) Max Epstein_

Core
^^^^

3160) Martin Durant_3191) Matthew Rocklin_3157) Thrasibule_3185) Dieter Weber_

.. _v0.17.0 / 2018-02-09:

Array
^^^^^

3133) Keisuke Fujii_3058) Xander Johnson_

- ``store``'s ``return_stored`` option (:pr:`3064`) `John A Kirkham`_
- ``optimization.fuse_slice`` to properly handle when first input is ``None`` (:pr:`3076`) `James Bourbeau`_

3107) Matthew Rocklin_3060) Roman Yurchak_

DataFrame
^^^^^^^^^

3110) Matthew Rocklin_3118) Matthew Rocklin_

- ``read_csv``, ``read_table``, and ``read_parquet`` accept iterables of paths
  (:pr:`3124`) `Jim Crist`_
- ``dd.to_delayed`` function in favor of the existing method
  (:pr:`3126`) `Jim Crist`_

3147) Matthew Rocklin_

- ``columns`` and ``index`` in ``dd.read_parquet`` to be more
  consistent, especially in handling of multi-indices (:pr:`3149`) `Jim Crist`_

3097) Martin Durant_3100) Martin Durant_

Bag
^^^

- ``bag.map_partitions`` function may receive either a list or generator. (:pr:`3150`) `Nir`_

Core
^^^^

3056) Matthew Rocklin_3057) (:pr:3122) Matthew Rocklin_

- ``dask.bytes.open_text_files`` (:pr:`3077`) `Jim Crist`_

3079) Jim Crist_

- ``dask.base.optimize`` for optimizing multiple collections without
  computing. (:pr:`3071`) `Jim Crist`_
- ``dask.optimize`` module to ``dask.optimization`` (:pr:`3071`) `Jim Crist`_

3066) Matthew Rocklin_

- ``optimize_graph`` keyword to all ``to_delayed`` methods to allow
  controlling whether optimizations occur on conversion. (:pr:`3126`) `Jim Crist`_
- ``pyarrow`` for HDFS integration (:pr:`3123`) `Jim Crist`_

3083) Jim Crist_3116) Jim Crist_

.. _v0.16.1 / 2018-01-09:

Array
^^^^^

- ``percentile`` (:pr:`3021`) `James Bourbeau`_
- ``bool()`` coercion from calling compute (:pr:`2958`) `Albert DeFusco`_
- ``matmul`` (:pr:`2904`) `John A Kirkham`_
- ``matmul`` (:pr:`2909`) `John A Kirkham`_
- ``vdot`` (:pr:`2910`) `John A Kirkham`_
- ``chunks`` argument for ``broadcast_to`` (:pr:`2943`) `Stephan Hoyer`_
- ``meshgrid`` (:pr:`2938`) `John A Kirkham`_ and (:pr:`3001`) `Markus Gonser`_
- ``fftshift``/``ifftshift`` (:pr:`2733`) `John A Kirkham`_
- ``vindex`` and raise errors for out-of-bounds indexes (:pr:`2967`) `Stephan Hoyer`_
- ``flip``, ``flipud``, ``fliplr`` (:pr:`2954`) `John A Kirkham`_
- ``float_power`` ufunc (:pr:`2962`) (:pr:`2969`) `John A Kirkham`_

2964) Tom Augspurger_

- ``block`` (:pr:`2650`) `John A Kirkham`_
- ``frompyfunc`` (:pr:`3030`) `Jim Crist`_
- ``return_stored`` option to ``store`` for chaining stored results (:pr:`2980`) `John A Kirkham`_

DataFrame
^^^^^^^^^

3037) Martijn Arts_

- ``dd.read_csv`` when ``names`` is given but ``header`` is not set to ``None`` (:issue:`2976`) `Martijn Arts`_
- ``dd.read_csv`` so that passing instances of ``CategoricalDtype`` in ``dtype`` will result in known categoricals (:pr:`2997`) `Tom Augspurger`_
- ``bool()`` coercion from calling compute (:pr:`2958`) `Albert DeFusco`_
- ``DataFrame.read_sql()`` (:pr:`2928`) to an empty database table returns an empty Dask DataFrame `Apostolos Vlachopoulos`_

2973) Tom Augspurger_

- (``df.columns.name``) when reading in ``dd.read_parquet`` (:pr:`2973`) `Tom Augspurger`_
- ``dd.concat`` losing the index dtype when the data contained a categorical (:issue:`2932`) `Tom Augspurger`_
- ``dd.Series.rename`` (:pr:`3027`) `Jim Crist`_
- ``DataFrame.merge()`` now supports merging on a combination of columns and the index (:pr:`2960`) `Jon Mease`_
- ``dd.rolling*`` methods, in preparation for their removal in the next pandas release (:pr:`2995`) `Tom Augspurger`_

3035) Jim Crist_

- ``Series.str.cat`` (:pr:`3028`) `Jim Crist`_

Core
^^^^

2937) Matthew Rocklin_3017) Matthew Rocklin_

.. _v0.16.0 / 2017-11-17:

This is a major release. It includes breaking changes, new protocols, and a large number of bug fixes.

Array
^^^^^

- ``atleast_1d``, ``atleast_2d``, and ``atleast_3d`` (:pr:`2760`) (:pr:`2765`) `John A Kirkham`_
- ``allclose`` (:pr:`2771`) `John A Kirkham`_
- ``random.different_seeds`` from Dask Array API docs (:pr:`2772`) `John A Kirkham`_
- ``vnorm`` in favor of ``dask.array.linalg.norm`` (:pr:`2773`) `John A Kirkham`_
- ``unique`` to be lazy (:pr:`2775`) `John A Kirkham`_

2784) John A Kirkham_

- ``asarray`` and ``asanyarray`` to Dask Array API docs (:pr:`2787`) `James Bourbeau`_
- ``unique``'s ``return_*`` arguments (:pr:`2779`) `John A Kirkham`_
- ``_unique_internal`` (:pr:`2850`) (:pr:`2855`) `John A Kirkham`_

2826) Jim Crist_

DataFrame
^^^^^^^^^

- ``pyarrow`` in ``dd.to_parquet`` (:pr:`2868`) `Jim Crist`_
- ``DataFrame.quantile`` and ``Series.quantile`` returning ``nan`` when missing values are present (:pr:`2791`) `Tom Augspurger`_
- ``DataFrame.quantile`` losing the result ``.name`` when ``q`` is a scalar (:pr:`2791`) `Tom Augspurger`_
- ``dd.concat`` return a Dask DataFrame when concatenating a single series along the columns, matching pandas' behavior (:pr:`2800`) `James Munroe`_
- ``DataFrame.eval`` to match the pandas default for pandas >= 0.21.0 (:pr:`2838`) `Tom Augspurger`_
- ``DataFrame.set_index`` on text column where one of the partitions was empty (:pr:`2831`) `Jesse Vogt`_
- ``DataFrame.set_index`` on empty dataframe (:pr:`2827`) `Jesse Vogt`_
- ``DataFrame.fillna`` when filling with a Series value (:pr:`2810`) `Tom Augspurger`_
- ``dd.to_parquet`` to better match convention of putting the dataframe first (:pr:`2867`) `Jim Crist`_

2835) Jim Crist_2814) Tom Augspurger_2822) Uwe Korn_2712) Christopher Prohm_2818) @xwang777_2863) Jim Crist_2527) @fjetter_2873) @Ced4_

- ``pyarrow`` in ``dd.to_parquet`` (:pr:`2894`, :pr:`2881`) `Jim Crist`_

Core
^^^^

2763) Matthew Rocklin_2762) Matthew Rocklin_2776) Matthew Rocklin_2828) Thomas Caswell_2844) Tom Augspurger_2782) Jim Crist_2748) Jim Crist_2847) Matthew Rocklin_2871) Jim Crist_2875) Jim Crist_2889) Ian Hopkinson_2881) Jim Crist_

.. _v0.15.4 / 2017-10-06:

Array
^^^^^

- ``da.random.choice`` now works with array arguments (:pr:`2781`)

2719)2747)

- ``chunks`` (:pr:`2749`)

2709)

DataFrame
^^^^^^^^^

- ``.str`` accessor to Categoricals with string categories (:pr:`2743`)

2711)2714)2737)

Bag
^^^

2710)

Core
^^^^

- ``pip install dask[complete]`` (:pr:`2750`)

.. _v0.15.3 / 2017-09-24:

Array
^^^^^

2301)

- ``*_like`` array creation functions (:pr:`2640`)

2647)2658)

- ``top`` and ``atop`` (:pr:`2661`)

2664)

- ``assert_eq`` (:pr:`2681`)

2683)

- ``ptp`` (:pr:`2691`)

2690) and apply_over_axes (:pr:2702)

DataFrame
^^^^^^^^^

- ``Series.str[index]`` (:pr:`2634`)

2636)

- ``DataFrame.to_csv`` and ``Bag.to_textfiles`` now return the filenames to
  which they have written (:pr:`2655`)
- ``partition_on`` and ``append`` in ``to_parquet``
  (:pr:`2645`)

2667)2676)

Core
^^^^

- ``python setup.py test`` now runs tests (:pr:`2641`)

2649)2688)

.. _v0.15.2 / 2017-08-25:

Array
^^^^^

2520)2543) (:pr:2549)2541) (:pr:2545) (:pr:2555)2539)2573)2584)2595)2597)2607), (:pr:2609)2571)

Bag
^^^

2525)

DataFrame
^^^^^^^^^

2513)2522)2523)2534) (:pr:2591)2558)2547)

Core
^^^^

- ``except:`` blocks everywhere (:pr:`2590`)

.. _v0.15.1 / 2017-07-08:

2466)2473), (:pr:2475)2486)2503)2511)

.. _v0.15.0 / 2017-06-09:

Array
^^^^^

2269)

- ``ufunc.outer`` (:pr:`2345`)

2333) (:pr:2394)2377)

- ``@`` operator (:pr:`2349`)
- ``numpy.fft`` module (:pr:`2320`) (:pr:`2322`) (:pr:`2327`) (:pr:`2323`)
- ``__array_ufunc__`` protocol (:pr:`2438`)

Bag
^^^

2324)

- ``db.map`` top-level function. Also remove
  auto-expansion of tuples as map arguments (:pr:`2339`)
- ``Bag.concat`` to ``Bag.flatten`` (:pr:`2402`)

DataFrame
^^^^^^^^^

2277) (:pr:2422)

Core
^^^^

2318)2397)2310)

.. _v0.14.3 / 2017-05-05:

DataFrame
^^^^^^^^^

.. _v0.14.2 / 2017-05-03:

Array
^^^^^

2268), da.tile (:pr:2153), da.roll (:pr:2135)2264)2235) and (:pr:2251)2234)2186)2181)2148)2142)2116)

Bag
^^^

2199)

DataFrame
^^^^^^^^^

2290)2249), (:pr:2248), and (:pr:2246)2223)2198)2168)

Core
^^^^

2263)2219)2207)2129), (:pr:2131), and (:pr:2112)

.. _v0.14.1 / 2017-03-22:

Array
^^^^^

2058)2075) (:pr:2080)2079)2089)2090)

- ``da.fft`` (:pr:`2093`)

DataFrame
^^^^^^^^^

2020)

- ``npartitions='auto'`` mode in ``set_index`` (:pr:`2025`)

2032)

- ``repartition(freq='12h')`` (:pr:`2059`)

2010)2085)2091)2098)

Delayed
^^^^^^^

2084)

Core
^^^^

- ``apply``
  (:pr:`2070`)

2094)

.. _v0.14.0 / 2017-02-24:

Array
^^^^^

- ``arange``
  (:pr:`1902`), (:pr:`1904`), (:pr:`1935`), (:pr:`1955`), (:pr:`1956`)

1923)

- ``from_array`` if ``name`` is provided (:pr:`1972`)

Bag
^^^

1934)1939), (:pr:1950), (:pr:1953)

DataFrame
^^^^^^^^^

1877), (:pr:1930)1909)1913)1914)1637)1940)

- ``dd.demo.daily_stock`` function for teaching (:pr:`1992`)

Delayed
^^^^^^^

- ``traverse=`` keyword to ``delayed`` to optionally avoid traversing nested
  data structures (:pr:`1899`)

1961)1969)

Core
^^^^

1910)1919)

- ``persist`` function (:pr:`1927`)
- ``errors=`` keyword in byte handling (:pr:`1954`)

1975)1985)

.. _v0.13.0 / 2017-01-02:

Array
^^^^^

1755)1838)1840)1758)1766)1800)1737), (:pr:1827)

Bag
^^^

1867)

DataFrame
^^^^^^^^^

- ``map_overlap`` for custom rolling operations (:pr:`1769`)
- ``shift`` (:pr:`1773`)

1782) (:pr:1792) (:pr:1810), (:pr:1843), (:pr:1859), (:pr:1863)1787)1807), (:pr:1824)1808), (:pr:1823) (:pr:1828)1858)

Delayed
^^^^^^^
- ``delayed(nout=0)`` and ``delayed(nout=1)``:
  ``delayed(nout=1)`` does not default to ``nout=None`` anymore, and
  ``delayed(nout=0)`` is also enabled. That is, functions that return
  tuples of length 1 or 0 are now handled correctly. This is especially
  handy if functions with a variable number of outputs are wrapped by
  ``delayed``. For example:
  ``delayed(lambda *args: args, nout=len(vals))(*vals)``

Core
^^^^

1768), (:pr:1774)1833)

.. _v0.12.0 / 2016-11-03:

DataFrame
^^^^^^^^^

- ``dataframe.map_partitions`` return
  scalars (:pr:`1515`)

1513)

- ``dataframe.DataFrame.categorize`` no longer includes missing values
  in the categories. This is for compatibility with a `pandas change <https://github.com/pydata/pandas/pull/10929>`_ (:pr:`1565`)
- ``dataframe.read_csv`` when some lines have quotes
  (:pr:`1495`)
- ``dataframe.reduction`` and ``series.reduction`` methods to apply generic
  row-wise reduction to dataframes and series (:pr:`1483`)
- ``dataframe.select_dtypes``, which mirrors the `pandas method <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.select_dtypes.html>`_ (:pr:`1556`)
- ``dataframe.read_hdf`` now supports reading ``Series`` (:pr:`1564`)

1540)

- ``select_dtypes`` (:pr:`1556`)

1561)1567)

- ``indicator`` keyword to ``merge`` (:pr:`1575`)
- ``read_hdf`` (:pr:`1575`)

1578)

- ``df.x += 1`` (:pr:`1585`)

1621)1625)1663)1665)1667), align (:pr:1668), combine_first (:pr:1725), and any/all (:pr:1724)1666)

- ``groupby.aggregate`` method (:pr:`1678`)
- ``dd.read_table`` function (:pr:`1682`)

1697) (:pr:1712)

- ``loc`` (:pr:`1726`)
- ``resample`` to include DataFrames (:pr:`1741`)

1669)

Array
^^^^^

- ``dask.array`` chunks argument work (:pr:`1504`)
- ``dask.array`` (:pr:`1484`)

1539) (:pr:1731)

- ``atop`` with a ``concatenate=`` (:pr:`1609`), ``new_axes=``
  (:pr:`1612`), and ``adjust_chunks=`` (:pr:`1716`) keywords

1610) swapaxes (:pr:1611) round (:pr:1708) repeat

- ``atop``-backed operations (:pr:`1644`)

1709)

Bag
^^^

- ``bag.from_sequence`` being interpreted as
  tasks (:pr:`1491`)

1747)

Administration
^^^^^^^^^^^^^^

1526)1487)1520)1569)1614)1648)1653)1675)1722)

.. _v0.11.0 / 2016-08-24:

Major Points
^^^^^^^^^^^^

DataFrames now enforce knowing full metadata (columns, dtypes) everywhere.
Previously we would operate in an ambiguous state when functions lost dtype
information (such as ``apply``). Now all dataframes always know their dtypes
and raise errors asking for information if they are unable to infer (which
they usually can). Some internal attributes like ``_pd`` and
``_pd_nonempty`` have been moved.

The internals of the distributed scheduler have been refactored to transition tasks between explicit states. This improves resilience, reasoning about scheduling, plugin operation, and logging. It also makes the scheduler code easier to understand for newcomers.

Breaking Changes
^^^^^^^^^^^^^^^^

- The ``distributed.s3`` and ``distributed.hdfs`` namespaces are gone. Use
  protocols in normal methods like ``read_text('s3://...')`` instead.
- ``dask.array.reshape`` now errs in some cases where previously it would have
  created a very large number of tasks.

.. _v0.10.2 / 2016-07-27:

- ``--bokeh-whitelist`` option to ``dask-scheduler`` to better
  route web interface messages behind non-trivial network settings
- ``dask.dataframe.read_hdf``, especially when reading from
  multiple files and docs

.. _v0.10.0 / 2016-06-13:

Major Changes
^^^^^^^^^^^^^

- ``dask.distributed`` executables have been renamed from ``dfoo`` to ``dask-foo``.
  For example, ``dscheduler`` is renamed to ``dask-scheduler``.

Bag
^^^

DataFrame
^^^^^^^^^

- ``groupby.std/var``
- ``read_csv``

Distributed
^^^^^^^^^^^

Other
^^^^^

.. _v0.9.0 / 2016-05-11:

API Changes
^^^^^^^^^^^

- ``dask.do`` and ``dask.value`` have been renamed to ``dask.delayed``
- ``dask.bag.from_filenames`` has been renamed to ``dask.bag.read_text``
- ``db.from_s3`` or
  ``distributed.s3.read_csv`` have been moved into the plain ``read_text``,
  ``read_csv`` functions, which now support protocols, like
  ``dd.read_csv('s3://bucket/keys*.csv')``

Array
^^^^^

- ``scipy.LinearOperator``

Bag
^^^

- ``from_filename``\ s to ``read_text``
- ``from_s3`` in favor of ``read_text('s3://...')``

DataFrame
^^^^^^^^^

- ``from_pandas`` for speedy round-trips to and from pandas
  objects
- ``read_csv`` to be more in line with pandas behavior
- ``set_index`` operations for sorted columns

Delayed
^^^^^^^

- ``do``/``value`` to ``delayed``
- ``to/from_imperative`` to ``to/from_delayed``

Distributed
^^^^^^^^^^^

Other
^^^^^
.. _v0.8.1 / 2016-03-11:

Array
^^^^^

- ``arg`` reductions (``argmin``, ``argmax``, etc.)

Bag
^^^

- ``zip`` function

DataFrame
^^^^^^^^^

- ``corr`` and ``cov`` functions
- ``melt`` function

.. _v0.8.0 / 2016-02-20:

Array
^^^^^

- ``tril``, ``triu``, ``LU``, ``inv``, ``cholesky``,
  ``solve``, ``solve_triangular``, ``eye``, ``lstsq``, ``diag``, ``corrcoef``.

Bag
^^^

- ``from_hdfs`` function (better functionality now exists in hdfs3 and
  distributed projects)

DataFrame
^^^^^^^^^

- ``dask.dataframe`` to include a full empty pandas dataframe as
  metadata. Drop the ``.columns`` attribute on Series
- ``.columns`` attribute for series.
- ``read_csv`` fixes (multi-column ``parse_dates``, integer column names, etc.)

Other
^^^^^
.. _v0.7.6 / 2016-01-05:

Array
^^^^^

- ``view``, ``compress``, ``hstack``, ``dstack``, ``vstack`` methods
- ``map_blocks`` can now remove and add dimensions

DataFrame
^^^^^^^^^

Imperative
^^^^^^^^^^

Core
^^^^

- ``dask.distributed``

.. _v0.7.4 / 2015-10-23:

This was mostly a bugfix release. Some notable changes:

- ``dask.dataframe.read_hdf`` by default to avoid concurrency
  issues
- ``dask.get`` to point to ``dask.async.get_sync`` by default
- ``dask.array``
- ``dask.array``\ s from ``dask.imperative`` objects

Deprecation
^^^^^^^^^^^

This release also includes a deprecation warning for ``dask.distributed``, which
will be removed in the next version.

Future development in distributed computing for dask is happening here: https://distributed.dask.org. General feedback on that project is most welcome from this community.

.. _v0.7.3 / 2015-09-25:

Diagnostics
^^^^^^^^^^^

- ``dask.diagnostics`` module.

DataFrame
^^^^^^^^^

This release improves coverage of the pandas API. Among other things,
it includes ``nunique``, ``nlargest``, and ``quantile``. It fixes encoding issues
when reading non-ASCII CSV files, brings performance improvements and bug fixes
to ``resample``, makes ``read_hdf`` more flexible with globbing, and much more.
It also includes various bug fixes in ``dask.imperative`` and ``dask.bag``.

.. _v0.7.0 / 2015-08-15:

DataFrame
^^^^^^^^^

This release includes significant bug fixes and alignment with the pandas API. This has resulted both from use and from recent involvement by pandas core developers.

Bag
^^^

Array
^^^^^

Infrastructure
^^^^^^^^^^^^^^
.. _v0.6.1 / 2015-07-23:

Distributed
^^^^^^^^^^^

- ``dask.distributed``
  when workers die

DataFrame
^^^^^^^^^

Array
^^^^^

Scheduling
^^^^^^^^^^

Other
^^^^^
.. _crusaderky: https://github.com/crusaderky
.. _John A Kirkham: https://github.com/jakirkham
.. _Matthew Rocklin: https://github.com/mrocklin
.. _Jim Crist: https://github.com/jcrist
.. _James Bourbeau: https://github.com/jrbourbeau
.. _James Munroe: https://github.com/jmunroe
.. _Thomas Caswell: https://github.com/tacaswell
.. _Tom Augspurger: https://github.com/tomaugspurger
.. _Uwe Korn: https://github.com/xhochy
.. _Christopher Prohm: https://github.com/chmp
.. _@xwang777: https://github.com/xwang777
.. _@fjetter: https://github.com/fjetter
.. _@Ced4: https://github.com/Ced4
.. _Ian Hopkinson: https://github.com/IanHopkinson
.. _Stephan Hoyer: https://github.com/shoyer
.. _Albert DeFusco: https://github.com/AlbertDeFusco
.. _Markus Gonser: https://github.com/magonser
.. _Martijn Arts: https://github.com/mfaafm
.. _Jon Mease: https://github.com/jonmmease
.. _Xander Johnson: https://github.com/metasyn
.. _Nir: https://github.com/nirizr
.. _Keisuke Fujii: https://github.com/fujiisoup
.. _Roman Yurchak: https://github.com/rth
.. _Max Epstein: https://github.com/MaxPowerWasTaken
.. _Simon Perkins: https://github.com/sjperkins
.. _Richard Postelnik: https://github.com/postelrich
.. _Daniel Collins: https://github.com/dancollins34
.. _Gabriele Lanaro: https://github.com/gabrielelanaro
.. _Jörg Dietrich: https://github.com/joergdietrich
.. _Christopher Ren: https://github.com/cr458
.. _Martin Durant: https://github.com/martindurant
.. _Thrasibule: https://github.com/thrasibule
.. _Dieter Weber: https://github.com/uellue
.. _Apostolos Vlachopoulos: https://github.com/avlahop
.. _Jesse Vogt: https://github.com/jessevogt
.. _Pierre Bartet: https://github.com/Pierre-Bartet
.. _Scott Sievert: https://github.com/stsievert
.. _Jeremy Chen: https://github.com/convexset
.. _Marc Pfister: https://github.com/drwelby
.. _Matt Lee: https://github.com/mathewlee11
.. _Yu Feng: https://github.com/rainwoodman
.. _@andrethrill: https://github.com/andrethrill
.. _@beomi: https://github.com/beomi
.. _Henrique Ribeiro: https://github.com/henriqueribeiro
.. _Marco Rossi: https://github.com/m-rossi
.. _Itamar Turner-Trauring: https://github.com/itamarst
.. _Mike Neish: https://github.com/neishm
.. _Mark Harfouche: https://github.com/hmaarrfk
.. _George Sakkis: https://github.com/gsakkis
.. _Ziyao Wei: https://github.com/ZiyaoWei
.. _Jacob Tomlinson: https://github.com/jacobtomlinson
.. _Elliott Sales de Andrade: https://github.com/QuLogic
.. _Gerome Pistre: https://github.com/GPistre
.. _Cloves Almeida: https://github.com/cjalmeida
.. _Tobias de Jong: https://github.com/tadejong
.. _Irina Truong: https://github.com/j-bennet
.. _Eric Bonfadini: https://github.com/eric-bonfadini
.. _Danilo Horta: https://github.com/horta
.. _@hugovk: https://github.com/hugovk
.. _Jan Margeta: https://github.com/jmargeta
.. _John Mrziglod: https://github.com/JohnMrziglod
.. _Christoph Moehl: https://github.com/cmohl2013
.. _Anderson Banihirwe: https://github.com/andersy005
.. _Javad: https://github.com/javad94
.. _Daniel Rothenberg: https://github.com/darothen
.. _Hans Moritz Günther: https://github.com/hamogu
.. _@rtobar: https://github.com/rtobar
.. _Julia Signell: https://github.com/jsignell
.. _Sriharsha Hatwar: https://github.com/Sriharsha-hatwar
.. _Bruce Merry: https://github.com/bmerry
.. _Joe Hamman: https://github.com/jhamman
.. _Robert Sare: https://github.com/rmsare
.. _Jeremy Chan: https://github.com/convexset
.. _Eric Wolak: https://github.com/epall
.. _Miguel Farrajota: https://github.com/farrajota
.. _Zhenqing Li: https://github.com/DigitalPig
.. _Matthias Bussonier: https://github.com/Carreau
.. _Jan Koch: https://github.com/datajanko
.. _Bart Broere: https://github.com/bartbroere
.. _Rahul Vaidya: https://github.com/rvaidya
.. _Justin Dennison: https://github.com/justin1dennison
.. _Antonino Ingargiola: https://github.com/tritemio
.. _TakaakiFuruse: https://github.com/TakaakiFuruse
.. _samc0de: https://github.com/samc0de
.. _Armin Berres: https://github.com/aberres
.. _Damien Garaud: https://github.com/geraud
.. _Jonathan Fraine: https://github.com/exowanderer
.. _Carlos Valiente: https://github.com/carletes
.. _@milesial: https://github.com/milesial
.. _Paul Vecchio: https://github.com/vecchp
.. _Johnnie Gray: https://github.com/jcmgray
.. _Diane Trout: https://github.com/detrout
.. _Marco Neumann: https://github.com/crepererum
.. _Mina Farid: https://github.com/minafarid
.. _@slnguyen: https://github.com/slnguyen
.. _Gábor Lipták: https://github.com/gliptak
.. _David Hoese: https://github.com/djhoese
.. _Daniel Li: https://github.com/li-dan
.. _Prabakaran Kumaresshan: https://github.com/nixphix
.. _Daniel Saxton: https://github.com/dsaxton
.. _Jendrik Jördening: https://github.com/jendrikjoe
.. _Takahiro Kojima: https://github.com/515hikaru
.. _Stuart Berg: https://github.com/stuarteberg
.. _Guillaume Eynard-Bontemps: https://github.com/guillaumeeb
.. _Adam Beberg: https://github.com/beberg
.. _Roma Sokolov: https://github.com/little-arhat
.. _Daniel Severo: https://github.com/dsevero
.. _Michał Jastrzębski: https://github.com/inc0
.. _Janne Vuorela: https://github.com/Dimplexion
.. _Ross Petchler: https://github.com/rpetchler
.. _Aploium: https://github.com/aploium
.. _Peter Andreas Entschev: https://github.com/pentschev
.. _@JulianWgs: https://github.com/JulianWgs
.. _Shyam Saladi: https://github.com/smsaladi
.. _Joe Corbett: https://github.com/jcorb
.. _@HSR05: https://github.com/HSR05
.. _Benjamin Zaitlen: https://github.com/quasiben
.. _Brett Naul: https://github.com/bnaul
.. _Justin Poehnelt: https://github.com/jpoehnelt
.. _Dan O'Donovan: https://github.com/danodonovan
.. _amerkel2: https://github.com/amerkel2
.. _Justin Waugh: https://github.com/bluecoconut
.. _Brian Chu: https://github.com/bchu
.. _Álvaro Abella Bascarán: https://github.com/alvaroabascar
.. _Aaron Fowles: https://github.com/aaronfowles
.. _Søren Fuglede Jørgensen: https://github.com/fuglede
.. _Hameer Abbasi: https://github.com/hameerabbasi
.. _Philipp Rudiger: https://github.com/philippjfr
.. _gregrf: https://github.com/gregrf
.. _Ian Rose: https://github.com/ian-r-rose
.. _Genevieve Buckley: https://github.com/GenevieveBuckley
.. _Michael Eaton: https://github.com/mpeaton
.. _Isaiah Norton: https://github.com/hnorton
.. _Nick Becker: https://github.com/beckernick
.. _Nathan Matare: https://github.com/nmatare
.. _@asmith26: https://github.com/asmith26
.. _Abhinav Ralhan: https://github.com/abhinavralhan
.. _Christian Hudon: https://github.com/chrish42
.. _Alistair Miles: https://github.com/alimanfoo
.. _Henry Pinkard: https://github.com/
.. _Ian Bolliger: https://github.com/bolliger32
.. _Mark Bell: https://github.com/MarkCBell
.. _Cody Johnson: https://github.com/codercody
.. _Endre Mark Borza: https://github.com/endremborza
.. _asmith26: https://github.com/asmith26
.. _Philipp S. Sommer: https://github.com/Chilipp
.. _mcsoini: https://github.com/mcsoini
.. _Ksenia Bobrova: https://github.com/almaleksia
.. _tpanza: https://github.com/tpanza
.. _Richard J Zamora: https://github.com/rjzamora
.. _Lijo Jose: https://github.com/lijose
.. _btw08: https://github.com/btw08
.. _Jorge Pessoa: https://github.com/jorge-pessoa
.. _Guillaume Lemaitre: https://github.com/glemaitre
.. _Bouwe Andela: https://github.com/bouweandela
.. _mbarkhau: https://github.com/mbarkhau
.. _Hugo: https://github.com/hugovk
.. _Paweł Kordek: https://github.com/kordek
.. _Ralf Gommers: https://github.com/rgommers
.. _Davis Bennett: https://github.com/d-v-b
.. _Willi Rath: https://github.com/willirath
.. _David Brochart: https://github.com/davidbrochart
.. _GALI PREM SAGAR: https://github.com/galipremsagar
.. _tshatrov: https://github.com/tshatrov
.. _Dustin Tindall: https://github.com/dustindall
.. _Sean McKenna: https://github.com/seanmck
.. _msbrown47: https://github.com/msbrown47
.. _Natalya Rapstine: https://github.com/natalya-patrikeeva
.. _Loïc Estève: https://github.com/lesteve
.. _Xavier Holt: https://github.com/xavi-ai
.. _Sarah Bird: https://github.com/birdsarah
.. _Doug Davis: https://github.com/douglasdavis
.. _Nicolas Hug: https://github.com/NicolasHug
.. _Blane: https://github.com/BlaneG
.. _Ivars Geidans: https://github.com/ivarsfg
.. _estebanag: https://github.com/estebanag
.. _Benoit Bovy: https://github.com/benbovy
.. _Gabe Joseph: https://github.com/gjoseph92
.. _therhaag: https://github.com/therhaag
.. _Arpit Solanki: https://github.com/arpit1997
.. _Oliver Hofkens: https://github.com/OliverHofkens
.. _Hongjiu Zhang: https://github.com/hongzmsft
.. _Wes Roach: https://github.com/WesRoach
.. _DomHudson: https://github.com/DomHudson
.. _Eugene Huang: https://github.com/eugeneh101
.. _Christopher J. Wright: https://github.com/CJ-Wright
.. _Mahmut Bulut: https://github.com/vertexclique
.. _Ben Jeffery: https://github.com/benjeffery
.. _Ryan Nazareth: https://github.com/ryankarlos
.. _garanews: https://github.com/garanews
.. _Vijayant: https://github.com/VijayantSoni
.. _Ryan Abernathey: https://github.com/rabernat
.. _Norman Barker: https://github.com/normanb
.. _darindf: https://github.com/darindf
.. _Ryan Grout: https://github.com/groutr
.. _Krishan Bhasin: https://github.com/KrishanBhasin
.. _Bruno Bonfils: https://github.com/asyd
.. _Petio Petrov: https://github.com/petioptrv
.. _Mads R. B. Kristensen: https://github.com/madsbk
.. _Prithvi MK: https://github.com/pmk21
.. _Eric Dill: https://github.com/ericdill
.. _Gina Helfrich: https://github.com/Dr-G
.. _ossdev07: https://github.com/ossdev07
.. _Nuno Gomes Silva: https://github.com/mgsnuno
.. _Ray Bell: https://github.com/raybellwaves
.. _Deepak Cherian: https://github.com/dcherian
.. _Matteo De Wint: https://github.com/mdwint
.. _Tim Gates: https://github.com/timgates42
.. _Erik Welch: https://github.com/eriknw
.. _Christian Wesp: https://github.com/ChrWesp
.. _Shiva Raisinghani: https://github.com/exemplary-citizen
.. _Thomas A Caswell: https://github.com/tacaswell
.. _Timost: https://github.com/Timost
.. _Maarten Breddels: https://github.com/maartenbreddels
.. _Devin Petersohn: https://github.com/devin-petersohn
.. _dfonnegra: https://github.com/dfonnegra
.. _Chris Roat: https://github.com/ChrisRoat
.. _H. Thomson Comer: https://github.com/thomcom
.. _Gerrit Holl: https://github.com/gerritholl
.. _Thomas Robitaille: https://github.com/astrofrog
.. _Yifan Gu: https://github.com/gyf304
.. _Surya Avala: https://github.com/suryaavala
.. _Cyril Shcherbin: https://github.com/shcherbin
.. _Ram Rachum: https://github.com/cool-RR
.. _Igor Gotlibovych: https://github.com/ig248
.. _K.-Michael Aye: https://github.com/michaelaye
.. _Yetunde Dada: https://github.com/yetudada
.. _Andrew Thomas: https://github.com/amcnicho
.. _rockwellw: https://github.com/rockwellw
.. _Gil Forsyth: https://github.com/gforsyth
.. _Thomas J. Fan: https://github.com/thomasjpfan
.. _Henrik Andersson: https://github.com/hnra
.. _James Lamb: https://github.com/jameslamb
.. _Corey J. Nolet: https://github.com/cjnolet
.. _Chuanzhu Xu: https://github.com/xcz011
.. _Lucas Rademaker: https://github.com/lr4d
.. _JulianWgs: https://github.com/JulianWgs
.. _psimaj: https://github.com/psimaj
.. _mlondschien: https://github.com/mlondschien
.. _petiop: https://github.com/petiop
.. _Richard (Rick) Zamora: https://github.com/rjzamora
.. _Mark Boer: https://github.com/mark-boer
.. _Florian Jetter: https://github.com/fjetter
.. _Adam Lewis: https://github.com/Adam-D-Lewis
.. _David Chudzicki: https://github.com/dchudz
.. _Nick Evans: https://github.com/nre
.. _Kai Mühlbauer: https://github.com/kmuehlbauer
.. _swapna: https://github.com/swapna-pg
.. _Antonio Ercole De Luca: https://github.com/eracle
.. _Amol Umbarkar: https://github.com/mindhash
.. _noreentry: https://github.com/noreentry
.. _Marius van Niekerk: https://github.com/mariusvniekerk
.. _Tung Dang: https://github.com/3cham
.. _Jim Crist-Harif: https://github.com/jcrist
.. _Brian Larsen: https://github.com/brl0
.. _Nils Braun: https://github.com/nils-braun
.. _Scott Sanderson: https://github.com/ssanderson
.. _Gaurav Sheni: https://github.com/gsheni
.. _Andrew Fulton: https://github.com/andrewfulton9
.. _Stephanie Gott: https://github.com/stephaniegott
.. _Huite: https://github.com/Huite
.. _Ryan Williams: https://github.com/ryan-williams
.. _Eric Czech: https://github.com/eric-czech
.. _Abdulelah Bin Mahfoodh: https://github.com/abduhbm
.. _Ben Shaver: https://github.com/bpshaver
.. _Matthias Bussonnier: https://github.com/Carreau
.. _johnomotani: https://github.com/johnomotani
.. _Roberto Panai: https://github.com/rpanai
.. _Clark Zinzow: https://github.com/clarkzinzow
.. _Tom McTiernan: https://github.com/tmct
.. _joshreback: https://github.com/joshreback
.. _Jun Han (Johnson) Ooi: https://github.com/tebesfinwo
.. _Jim Circadian: https://github.com/JimCircadian
.. _Jack Xiaosong Xu: https://github.com/jackxxu
.. _Mike McCarty: https://github.com/mmccarty
.. _michaelnarodovitch: https://github.com/michaelnarodovitch
.. _David Sheldon: https://github.com/davidsmf
.. _McToel: https://github.com/McToel
.. _Kilian Lieret: https://github.com/klieret
.. _Noah D. Brenowitz: https://github.com/nbren12
.. _Jon Thielen: https://github.com/jthielen
.. _Poruri Sai Rahul: https://github.com/rahulporuri
.. _Kyle Nicholson: https://github.com/kylejn27
.. _Rafal Wojdyla: https://github.com/ravwojdyla
.. _Sam Grayson: https://github.com/charmoniumQ
.. _Madhur Tandon: https://github.com/madhur-tandon
.. _Joachim B Haga: https://github.com/jobh
.. _Pav A: https://github.com/rs2
.. _GFleishman: https://github.com/GFleishman
.. _Shang Wang: https://github.com/shangw-nvidia
.. _Illviljan: https://github.com/Illviljan
.. _Jan Borchmann: https://github.com/jborchma
.. _Ruben van de Geer: https://github.com/rubenvdg
.. _Akira Naruse: https://github.com/anaruse
.. _Zhengnan Zhao: https://github.com/zzhengnan
.. _Greg Hayes: https://github.com/hayesgb
.. _RogerMoens: https://github.com/RogerMoens
.. _manuels: https://github.com/manuels
.. _Rockwell Weiner: https://github.com/rockwellw
.. _Devanshu Desai: https://github.com/devanshuDesai
.. _David Katz: https://github.com/DavidKatz-il
.. _Stephannie Jimenez Gacha: https://github.com/steff456
.. _Magnus Nord: https://github.com/magnunor
.. _Callum Noble: https://github.com/callumanoble
.. _Pascal Bourgault: https://github.com/aulemahal
.. _Joris Van den Bossche: https://github.com/jorisvandenbossche
.. _Mark: https://github.com/mchi
.. _Kumar Bharath Prabhu: https://github.com/kumarprabhu1988
.. _Rob Malouf: https://github.com/rmalouf
.. _sdementen: https://github.com/sdementen
.. _patquem: https://github.com/patquem
.. _Amit Kumar: https://github.com/aktech
.. _D-Stacks: https://github.com/D-Stacks
.. _Kyle Barron: https://github.com/kylebarron
.. _Julius Busecke: https://github.com/jbusecke
.. _Sinclair Target: https://github.com/sinclairtarget
.. _Ashwin Srinath: https://github.com/shwina
.. _David Hassell: https://github.com/davidhassell
.. _brandon-b-miller: https://github.com/brandon-b-miller
.. _Hristo Georgiev: https://github.com/hristog
.. _Trevor Manz: https://github.com/manzt
.. _Madhu94: https://github.com/Madhu94
.. _gerrymanoim: https://github.com/gerrymanoim
.. _rs9w33: https://github.com/rs9w33
.. _Tom White: https://github.com/tomwhite
.. _Eoin Shanaghy: https://github.com/eoinsha
.. _Nick Vazquez: https://github.com/nickvazz
.. _cameron16: https://github.com/cameron16
.. _Daniel Mesejo-León: https://github.com/mesejo
.. _Naty Clementi: https://github.com/ncclementi
.. _JSKenyon: https://github.com/jskenyon
.. _Freyam Mehta: https://github.com/freyam
.. _Jiaming Yuan: https://github.com/trivialfis
.. _c-thiel: https://github.com/c-thiel
.. _Andrew Champion: https://github.com/aschampion
.. _Justus Magin: https://github.com/keewis
.. _Maisie Marshall: https://github.com/maisiemarshall
.. _Vibhu Jawa: https://github.com/VibhuJawa
.. _Boaz Mohar: https://github.com/boazmohar
.. _Kristopher Overholt: https://github.com/koverholt
.. _tsuga: https://github.com/tsuga
.. _Gabriel Miretti: https://github.com/gmiretti
.. _Geoffrey Lentner: https://github.com/glentner
.. _Charles Blackmon-Luca: https://github.com/charlesbluca
.. _Bryan Van de Ven: https://github.com/bryevdv
.. _Fabian Gebhart: https://github.com/fgebhart
.. _Ross: https://github.com/rhjmoore
.. _gurunath: https://github.com/rajagurunath
.. _aa1371: https://github.com/aa1371
.. _Gregory R. Lee: https://github.com/grlee77
.. _Louis Maddox: https://github.com/lmmx
.. _Dahn: https://github.com/DahnJ
.. _Jordan Jensen: https://github.com/dotNomad
.. _Martin Fleischmann: https://github.com/martinfleis
.. _Robert Hales: https://github.com/robalar
.. _João Paulo Lacerda: https://github.com/jopasdev
.. _neel iyer: https://github.com/spiyer99
.. _SnkSynthesis: https://github.com/SnkSynthesis
.. _JoranDox: https://github.com/JoranDox
.. _Kinshuk Dua: https://github.com/kinshukdua
.. _Suriya Senthilkumar: https://github.com/suriya-it19
.. _Vũ Trung Đức: https://github.com/vutrungduc7593
.. _Nathan Danielsen: https://github.com/ndanielsen
.. _Wallace Reis: https://github.com/wreis
.. _German Shiklov: https://github.com/Jeremaiha-xmetix
.. _Pankaj Patil: https://github.com/Patil2099
.. _Samuel Gaist: https://github.com/sgaist
.. _Marcel Coetzee: https://github.com/marcelned
.. _Matthew Powers: https://github.com/MrPowers
.. _Vyas Ramasubramani: https://github.com/vyasr
.. _Ayush Dattagupta: https://github.com/ayushdg
.. _FredericOdermatt: https://github.com/FredericOdermatt
.. _mihir: https://github.com/ek234
.. _Sarah Charlotte Johnson: https://github.com/scharlottej13
.. _ofirr: https://github.com/ofirr
.. _kori73: https://github.com/kori73
.. _TnTo: https://github.com/TnTo
.. _ParticularMiner: https://github.com/ParticularMiner
.. _aeisenbarth: https://github.com/aeisenbarth
.. _Aneesh Nema: https://github.com/aneeshnema
.. _Deepyaman Datta: https://github.com/deepyaman
.. _Maren Westermann: https://github.com/marenwestermann
.. _Michael Delgado: https://github.com/delgadom
.. _abergou: https://github.com/abergou
.. _Pavithra Eswaramoorthy: https://github.com/pavithraes
.. _Maxim Lippeveld: https://github.com/MaximLippeveld
.. _Kirito1397: https://github.com/Kirito1397
.. _Xinrong Meng: https://github.com/xinrong-databricks
.. _Bryan Weber: https://github.com/bryanwweber
.. _Amir Kadivar: https://github.com/amirkdv
.. _Pedro Silva: https://github.com/ppsbs
.. _Knut Nordanger: https://github.com/nordange
.. _Ben Glossner: https://github.com/bglossner
.. _Dranaxel: https://github.com/Dranaxel
.. _Holden Karau: https://github.com/holdenk
.. _Peter: https://github.com/peterpandelidis
.. _Thomas Grainger: https://github.com/graingert
.. _Martin Thøgersen: https://github.com/th0ger
.. _Leo Gao: https://github.com/leogao2
.. _Paul Hobson: https://github.com/phobson
.. _LSturtew: https://github.com/LSturtew
.. _Michał Górny: https://github.com/mgorny
.. _lrjball: https://github.com/lrjball
.. _Davide Gavio: https://github.com/davidegavio
.. _Ben Greiner: https://github.com/bnavigator
.. _Roger Filmyer: https://github.com/rfilmyer
.. _Richard: https://github.com/richarms
.. _Francesco Andreuzzi: https://github.com/fAndreuzzi
.. _Nadiem Sissouno: https://github.com/sissnad
.. _Jorge López: https://github.com/jorloplaz
.. _Cheun Hong: https://github.com/cheunhong
.. _Eray Aslan: https://github.com/erayaslan
.. _Ben Beasley: https://github.com/musicinmybrain
.. _Ryan Russell: https://github.com/ryanrussell
.. _Angelos Omirolis: https://github.com/aomirolis
.. _Fabien Aulaire: https://github.com/faulaire
.. _Alex-JG3: https://github.com/Alex-JG3
.. _Christopher Akiki: https://github.com/cakiki
.. _Sultan Orazbayev: https://github.com/SultanOrazbayev
.. _Richard Pelgrim: https://github.com/rrpelgrim
.. _Ben: https://github.com/benjaminhduncan
.. _Angus Hollands: https://github.com/agoose77
.. _Lucas Miguel Ponce: https://github.com/lucasmsp
.. _Dylan Stewart: https://github.com/drstewart19
.. _geraninam: https://github.com/geraninam
.. _Michael Milton: https://github.com/multimeric
.. _Ruth Comer: https://github.com/rcomer
.. _Frédéric BRIOL: https://github.com/fbriol
.. _Jordan Yap: https://github.com/jjyap
.. _Logan Norman: https://github.com/lognorman20
.. _ivojuroro: https://github.com/ivojuroro
.. _Shaghayegh: https://github.com/Shadimrad
.. _Hendrik Makait: https://github.com/hendrikmakait
.. _Luke Conibear: https://github.com/lukeconibear
.. _Nicolas Grandemange: https://github.com/epizut
.. _Nat Tabris: https://github.com/ntabris
.. _Lawrence Mitchell: https://github.com/wence-
.. _nouman: https://github.com/noumxn
.. _Tim Paine: https://github.com/timkpaine
.. _ChrisJar: https://github.com/ChrisJar
.. _Shingo OKAWA: https://github.com/ognis1205
.. _qheuristics: https://github.com/qheuristics
.. _Jacob Hayes: https://github.com/JacobHayes
.. _Shawn: https://github.com/chaokunyang
.. _Erik Holmgren: https://github.com/Holmgren825
.. _aywandji: https://github.com/aywandji
.. _Chiara Marmo: https://github.com/cmarmo
.. _Jayesh Manani: https://github.com/jayeshmanani
.. _Patrick Hoefler: https://github.com/phofl
.. _Matthew Roeschke: https://github.com/mroeschke
.. _Miles: https://github.com/milesgranger
.. _Anton Loukianov: https://github.com/antonl
.. _Brian Phillips: https://github.com/bphillips-exos
.. _hotpotato: https://github.com/hotpotato
.. _Alexander Clausen: https://github.com/sk1p
.. _Swayam Patil: https://github.com/Swish78
.. _Johan Olsson: https://github.com/johanols
.. _wkrasnicki: https://github.com/wkrasnicki
.. _Michael Leslie: https://github.com/michaeldleslie
.. _Samantha Hughes: https://github.com/shughes-uk
.. _Mario Šaško: https://github.com/mariosasko
.. _joanrue: https://github.com/joanrue
.. _Andrew S. Rosen: https://github.com/Andrew-S-Rosen
.. _jochenott: https://github.com/jochenott
.. _FTang21: https://github.com/FTang21
.. _Erik Sundell: https://github.com/consideRatio
.. _Julian Gilbey: https://github.com/juliangilbey
.. _Charles Stern: https://github.com/cisaacstern
.. _templiert: https://github.com/templiert
.. _Lindsey Gray: https://github.com/lgray
.. _wim glenn: https://github.com/wimglenn
.. _Dimitri Papadopoulos Orfanos: https://github.com/DimitriPapadopoulos
.. _Quentin Lhoest: https://github.com/lhoestq
.. _Jonas Lähnemann: https://github.com/jlaehne
.. _Abel Aoun: https://github.com/bzah
.. _Simon Høxbro Hansen: https://github.com/Hoxbro
.. _M Bussonnier: https://github.com/Carreau
.. _Greg M. Fleishman: https://github.com/GFleishman
.. _Victor Stinner: https://github.com/vstinner
.. _alex-rakowski: https://github.com/alex-rakowski
.. _Adam Williamson: https://github.com/AdamWill
.. _Jonas Dedden: https://github.com/jonded94
.. _Bernhard Raml: https://github.com/SwamyDev
.. _Lucas Colley: https://github.com/lucascolley
.. _Tao Xin: https://github.com/Tao-VanJS
.. _David Stansby: https://github.com/dstansby
.. _Mario Linker: https://github.com/maldag
.. _Dmitry Balabka: https://github.com/dbalabka
.. _Martin Yeo: https://github.com/trexfeathers
.. _Ilan Gold: https://github.com/ilan-gold
.. _Jean-Baptiste Bayle: https://github.com/j2bbayle
.. _dchudz: https://github.com/dchudz
.. _Guido Imperiale: https://github.com/crusaderky
.. _Alexander: https://github.com/SalikovAlex
.. _Philipp A.: https://github.com/flying-sheep
.. _Sergey Kolesnikov: https://github.com/SCORE1387
.. _Taylor Braun-Jones: https://github.com/nocnokneo
.. _Isaac: https://github.com/icykip
.. _Sandro: https://github.com/penguinpee
.. _Brigitta Sipőcz: https://github.com/bsipocz
.. _Raúl Cumplido: https://github.com/raulcd
.. _Lukas Bindreiter: https://github.com/lukasbindreiter
.. _Marvin Albert: https://github.com/m-albert
.. _Peter Fackeldey: https://github.com/pfackeldey
.. _Marco Edward Gorelli: https://github.com/MarcoGorelli
.. _Peter A. Jonsson: https://github.com/pjonsson
.. _Florian Courtial: https://github.com/fcourtial
.. _Tony Ding: https://github.com/tonyyuyiding
.. _Oisin-M: https://github.com/Oisin-M
.. _Username46786: https://github.com/Username46786
.. _Maneesh Sutar: https://github.com/maneesh29s
.. _Jianyu Sun: https://github.com/csfldf
.. _DongWon: https://github.com/dongwonmoon
.. _Simon-Martin Schröder: https://github.com/moi90
.. _Wouter-Michiel Vierdag: https://github.com/melonora
.. _Clément Robert: https://github.com/neutrinoceros
.. _Gautham Hullikunte: https://github.com/batcity
.. _Vipin Kataria: https://github.com/vipinkataria2209
.. _Matthew Plough: https://github.com/mplough-kobold