.. toctree::
   :maxdepth: 1
   :hidden:

   array-best-practices.rst
   array-chunks.rst
   array-creation.rst
   array-overlap.rst
   array-design.rst
   array-sparse.rst
   array-stats.rst
   array-slicing.rst
   array-assignment.rst
   array-stack.rst
   array-gufunc.rst
   array-random.rst
   array-api.rst
   array-numpy-compatibility.rst
Dask Array implements a subset of the NumPy ndarray interface using blocked algorithms, cutting up the large array into many small arrays. This lets us compute on arrays larger than memory using all of our cores. We coordinate these blocked algorithms using Dask graphs.
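The blocked approach can be illustrated with a minimal sketch (the array size and chunk shape below are arbitrary choices for illustration, not recommendations):

```python
import dask.array as da

# A 10,000 x 10,000 array of ones, split into 1,000 x 1,000 blocks.
# Nothing is computed yet; Dask only records a task graph.
x = da.ones((10_000, 10_000), chunks=(1_000, 1_000))

total = x.sum()         # still lazy: a graph of per-block sums plus a final merge
print(total.compute())  # executes the blocked algorithm, possibly in parallel
```

Each block is an ordinary NumPy array small enough to fit in memory, and `.compute()` walks the graph to produce the final result.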
.. raw:: html

   <iframe width="560"
           height="315"
           src="https://www.youtube.com/embed/9h_61hXCDuI"
           style="margin: 0 auto 20px auto; display: block;"
           frameborder="0"
           allow="accelerometer; autoplay; encrypted-media; gyroscope; picture-in-picture"
           allowfullscreen></iframe>

Dask Array is used across a wide variety of applications, anywhere people work with large array datasets.
This
`Analyzing the National Water Model with Xarray, Dask, and Coiled <https://docs.coiled.io/user_guide/xarray.html?utm_source=dask-docs&utm_medium=array>`_
example processes 6 TB of geospatial data on a cluster using Xarray and Dask Array.
The cluster in this example is deployed with
`Coiled <https://coiled.io/?utm_source=dask-docs&utm_medium=array>`_,
but there are many options for managing and deploying Dask.
See our :doc:`deploying` documentation for more information on deployment options.
You can also visit https://examples.dask.org/array.html for a collection of additional examples.
.. image:: images/dask-array.svg
   :alt: Dask arrays coordinate many numpy arrays
   :align: right
   :scale: 35%
Dask arrays coordinate many NumPy arrays (or "duck arrays" that are sufficiently NumPy-like in API such as CuPy or Sparse arrays) arranged into a grid. These arrays may live on disk or on other machines.
New duck array chunk types (types below Dask on
`NEP-13's type-casting hierarchy`_) can be registered via
:func:`~dask.array.register_chunk_type`. Any other duck array types that are
not registered will be deferred to in binary operations and NumPy
ufuncs/functions (that is, Dask will return ``NotImplemented``). Note, however,
that any ndarray-like type can be inserted into a Dask Array using
:func:`~dask.array.from_array`.
Dask Array is used in fields like atmospheric and oceanographic science, large scale imaging, genomics, numerical algorithms for optimization or statistics, and more.
Dask arrays support most of the NumPy interface, including the following:

-  Arithmetic and scalar mathematics: ``+``, ``*``, ``exp``, ``log``, ...
-  Reductions along axes: ``sum()``, ``mean()``, ``std()``, ``sum(axis=0)``, ...
-  Tensor contractions / dot products / matrix multiply: ``tensordot``
-  Axis reordering / transpose: ``transpose``
-  Slicing: ``x[:100, 500:100:-2]``
-  Fancy indexing along single axes with lists or NumPy arrays: ``x[:, [10, 1, 5]]``
-  Array protocols like ``__array__`` and ``__array_ufunc__``
-  Some linear algebra: ``svd``, ``qr``, ``solve``, ``solve_triangular``, ``lstsq``

However, Dask Array does not implement the entire NumPy interface.  Users
expecting this will be disappointed.  Notably, Dask Array lacks the following
features:

-  Much of ``np.linalg`` has not been implemented.
   This has been done by a number of excellent BLAS/LAPACK implementations,
   and is the focus of numerous ongoing academic research projects
-  Operations like ``sort`` which are notoriously
   difficult to do in parallel, and are of somewhat diminished value on very
   large data (you rarely actually need a full sort).
   Often we include parallel-friendly alternatives like ``topk``
-  Operations like ``tolist`` that would be very
   inefficient for larger datasets.  Likewise, it is very inefficient to iterate
   over a Dask array with for loops

See :doc:`the dask.array API <array-api>` for a more extensive list of
functionality.
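A short sketch of a few supported operations from the list above, including ``topk`` as a parallel-friendly stand-in for a full sort (the array contents and chunk shapes are arbitrary illustrations):

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(100).reshape(10, 10), chunks=(5, 5))

y = (x + 1) * 2               # elementwise arithmetic, still lazy
m = y.mean(axis=0)            # reduction along an axis
s = x[::2, [0, 3, 7]]         # slicing plus fancy indexing on one axis
t = x.T                       # transpose

# Instead of sorting 100 elements to find the largest three:
top3 = da.topk(x.ravel(), 3)
print(top3.compute())         # [99 98 97]
```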
By default, Dask Array uses the threaded scheduler in order to avoid data
transfer costs, and because NumPy releases the GIL well. It is also quite
effective on a cluster using the dask.distributed_ scheduler.
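The scheduler can also be chosen per call via the ``scheduler`` keyword to ``compute``; a small sketch (the array here is an arbitrary illustration):

```python
import dask.array as da

x = da.ones((4_000, 4_000), chunks=(1_000, 1_000))

# The threaded scheduler is the default for Dask Array.
r1 = x.std().compute(scheduler="threads")

# The synchronous scheduler runs everything in a single thread,
# which is handy for debugging with pdb or profilers.
r2 = x.std().compute(scheduler="synchronous")

print(r1, r2)  # both 0.0 for an array of ones
```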
.. _dask.distributed: https://distributed.dask.org/en/latest/
.. _NEP-13's type-casting hierarchy: https://numpy.org/neps/nep-0013-ufunc-overrides.html#type-casting-hierarchy