doc/neps/nep-0044-restructuring-numpy-docs.rst
.. _NEP44:
:Author: Ralf Gommers
:Author: Melissa Mendonça
:Author: Mars Lee
:Status: Accepted
:Type: Process
:Created: 2020-02-11
:Resolution: https://mail.python.org/pipermail/numpy-discussion/2020-March/080467.html

This document proposes a restructuring of the NumPy Documentation, both in form and content, with the goal of making it more organized and discoverable for beginners and experienced users.
See `here <https://numpy.org/devdocs/>`_ for the front page of the latest docs.
The organization is quite confusing and illogical (e.g. user and developer docs
are mixed). We propose the following:
The documentation is a fundamental part of any software project, especially open source projects. In the case of NumPy, many beginners might feel demotivated by the current structure of the documentation, since it is difficult to discover what to learn (unless the user has a clear view of what to look for in the Reference docs, which is not always the case).
Looking at the results of a "NumPy Tutorial" search on any search engine also gives an idea of the demand for this kind of content. Having official high-level documentation written using up-to-date content and techniques will certainly mean more users (and developers/contributors) are involved in the NumPy community.
The restructuring will effectively demand a complete rewrite of links and some of the current content. Input from the community will be useful for identifying key links and pages that should not be broken.
As discussed in the article [1]_, there are four categories of doc content:

- Tutorials
- How-to guides
- Explanations
- Reference guide

We propose to use those categories (for writing and reviewing) whenever we add a new documentation section.
The reasoning for this is that it makes clearer, both for developers/documentation writers and for users, where each piece of information should go, as well as the scope and tone of each document. For example, if explanations are mixed with basic tutorials, beginners might be overwhelmed and alienated. On the other hand, if the reference guide contains basic how-tos, it might be difficult for experienced users to quickly find the information they need.
Currently, there are many blogs and tutorials on the internet about NumPy or using NumPy. One of the issues with this is that users searching for this information may end up in an outdated (unofficial) tutorial before they find the current official documentation. This can be confusing, especially for beginners. Having a better infrastructure for the documentation also aims to solve this problem by giving users high-level, up-to-date official documentation that can be easily updated.

Reference guide
^^^^^^^^^^^^^^^

NumPy has a quite complete reference guide. All functions are documented, most have examples, and most are cross-linked well with *See Also* sections. Further improving the reference guide is incremental work that can be done (and is being done) by many people. There are, however, many explanations in the reference guide. These can be moved to a more dedicated Explanations section in the docs.

How-to guides
^^^^^^^^^^^^^

NumPy does not have many how-to's. The subclassing and array ducktyping section may be an example of a how-to. Others that could be added are:

- ``threadpoolctl``, using multiprocessing, random number generation, etc.
- ``.npy``/``.npz`` format, text formats, Zarr, HDF5, Bloscpack, etc.

Explanations
^^^^^^^^^^^^

There is a reasonable amount of content on fundamental NumPy concepts such as indexing, vectorization, broadcasting, (g)ufuncs, and dtypes. This could be organized better and clarified to ensure it really explains the concepts and is not mixed with tutorial or how-to-like content.
There are few explanations about anything other than those fundamental NumPy concepts.
Some examples of concepts that could be expanded:
In addition, there are many explanations in the Reference Guide, which should be moved to this new dedicated Explanations section.

Tutorials
^^^^^^^^^

There's a lot of scope for writing better tutorials. We have a new *NumPy for
absolute beginners tutorial* [3]_ (GSoD project of Anne Bonner). In addition we
need a number of tutorials addressing different levels of experience with Python
and NumPy. This could be done using engaging data sets, ideas or stories. For
example, curve fitting with polynomials and functions in ``numpy.linalg`` could
be done with the Keeling curve (decades' worth of CO2 concentration in air
measurements) rather than with synthetic random data.
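
As a rough illustration of the kind of step such a tutorial might contain, the sketch below fits a quadratic trend with ``numpy.linalg.lstsq``. The data here are a synthetic placeholder: the trend and seasonal-cycle numbers are invented for illustration and are not measured CO2 values. An actual tutorial would load the real Keeling curve record instead.

```python
import numpy as np

# ASSUMPTION: synthetic stand-in for a CO2 record (invented numbers, not real
# measurements): a quadratic trend plus an annual cycle, sampled monthly.
years = np.linspace(0.0, 40.0, 481)  # years since the start of the record
co2 = 315.0 + 0.8 * years + 0.012 * years**2 + 3.0 * np.sin(2 * np.pi * years)

# Design matrix for a degree-2 polynomial: columns are 1, t, t**2.
A = np.vander(years, N=3, increasing=True)

# Least-squares solution of A @ coeffs ~= co2.
coeffs, residuals, rank, _ = np.linalg.lstsq(A, co2, rcond=None)

# Fitted long-term trend; what remains is (mostly) the seasonal cycle.
trend = A @ coeffs
seasonal = co2 - trend
```

Building the design matrix explicitly, rather than calling a ready-made polynomial fitting helper, keeps the linear algebra visible, which is the part such a tutorial would aim to teach.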
Ideas for tutorials (these capture the types of things that make sense; they're not necessarily the exact topics we propose to implement):

- `Nicolas Rougier's book <https://www.labri.fr/perso/nrougier/from-python-to-numpy/#the-game-of-life>`_
- `gridMet data <http://www.climatologylab.org/gridmet.html>`_
- ``(n_speech, n_sentences, n_words)``

The *Preparing to Teach* document [2]_ from the Software Carpentry Instructor Training materials is a nice summary of how to write effective lesson plans (and tutorials would be very similar). In addition to adding new tutorials, we also propose a *How to write a tutorial* document, which would help users contribute new high-quality content to the documentation.


Data sets
---------

Using interesting data in the NumPy docs requires giving all users access to
that data, either inside NumPy or in a separate package. The former is not the
best idea, since it's hard to do without increasing the size of NumPy
significantly.
Whenever possible, documentation pages should use examples from the
:mod:`scipy.datasets` package.
Related work
============
Some examples of documentation organization in other projects:

- `Documentation for Jupyter <https://jupyter.org/documentation>`_
- `Documentation for Python <https://docs.python.org/3/>`_
- `Documentation for TensorFlow <https://www.tensorflow.org/learn>`_

These projects make the intended audience for each part of the documentation
more explicit, as well as previewing some of the content in each section.

Implementation
==============
Currently, the `documentation for NumPy <https://numpy.org/devdocs/>`_ can be
confusing, especially for beginners. Our proposal is to reorganize the docs in
the following structure:

- For users:

  - Absolute Beginners Tutorial
  - main Tutorials section
  - How Tos for common tasks with NumPy
  - Reference Guide (API Reference)
  - Explanations
  - F2Py Guide
  - Glossary

- For developers/contributors:

  - Contributor's Guide
  - Under-the-hood docs
  - Building and extending the documentation
  - Benchmarking
  - NumPy Enhancement Proposals

- Meta information

  - Reporting bugs
  - Release Notes
  - About NumPy
  - License

Ideas for follow-up
-------------------
Besides rewriting the current documentation to some extent, it would be ideal
to have a technical infrastructure that would allow more contributions from the
community. For example, if Jupyter Notebooks could be submitted as-is as
tutorials or How-Tos, this might create more contributors and broaden the NumPy
community.
Similarly, if people could download some of the documentation in Notebook
format, they would likely rely less on outdated material for learning NumPy.
It would also be interesting if the new structure for the documentation makes
translations easier.
Discussion
==========
Discussion around this NEP can be found on the NumPy mailing list:
- https://mail.python.org/pipermail/numpy-discussion/2020-February/080419.html
References and footnotes
========================
.. [1] `Diátaxis - A systematic framework for technical documentation authoring <https://diataxis.fr/>`_
.. [2] `Preparing to Teach <https://carpentries.github.io/instructor-training/15-lesson-study/index.html>`_ (from the `Software Carpentry <https://software-carpentry.org/>`_ Instructor Training materials)
.. [3] `NumPy for absolute beginners Tutorial <https://numpy.org/devdocs/user/absolute_beginners.html>`_ by Anne Bonner
Copyright
=========
This document has been placed in the public domain.