This is an overview of how our `continuous integration`_ setup works. It
includes a quick introduction to the tasks it runs, and the later sections
detail the process of setting up these tasks.

Our continuous integration is currently hosted at `Shining Panda`_, free thanks
to their FLOSS program. The setup is not specific to their solutions; it could
be moved to any Jenkins_ instance. The URL of our current instance is
https://jenkins.shiningpanda.com/nltk/

.. _continuous integration: https://en.wikipedia.org/wiki/Continuous_integration
.. _Shining Panda: http://shiningpanda.com
.. _Jenkins: https://jenkins-ci.org

The base tasks of the CI instance are as follows:

#. Check out the NLTK repos from Github
#. Build the project
#. Run the test suite
#. Build the packages
#. Build the web page and push it to the nltk.github.com repo

Because the NLTK build environment is highly customized, we only run tests on
one configuration: the lowest version supported. NLTK 2 supports Python down to
version 2.5, so all tests are run using a python2.5 virtualenv. The virtualenv
configuration is slightly simpler on ShiningPanda machines, because they have
compiled all relevant Python versions and their custom virtualenv builders use
these versions.

All operations are done against the `NLTK repos on Github`_. The Jenkins
instance on ShiningPanda has a daily limit on the build time it can use.
Because of this, it only polls the main NLTK repo once a day, using the
Poll SCM option in Jenkins. Against the main code repo it uses public access
only, and for pushing to the nltk.github.com repo it uses the key of the user
nltk-webdeploy.

.. _NLTK repos on Github: https://github.com/nltk/

To build the project, the following tasks are run:

#. ``git describe --tags --match '*.*.*' > nltk/VERSION``

   This makes the most recent VCS tag available in nltk.version etc.

#. ``python setup.py build``

   This essentially copies the files that are required to run NLTK into build/

The tests require that all dependencies be installed. These have all been
installed beforehand, and to make them run, a series of extra environment
variables are initialized. These dependencies will not be detailed until the
last section.

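
The VERSION file step above can be illustrated with a small sketch of how a
package might read such a file at import time. This is not taken from the NLTK
source: the ``read_version`` helper and the ``"unknown"`` fallback are
assumptions made for this example.

.. code-block:: python

    from pathlib import Path


    def read_version(package_dir):
        """Return the contents of <package_dir>/VERSION, or "unknown".

        Hypothetical helper: the file is expected to hold the output of
        ``git describe --tags --match '*.*.*'``.
        """
        version_file = Path(package_dir) / "VERSION"
        try:
            # git describe ends its output with a newline; strip it.
            return version_file.read_text(encoding="utf-8").strip()
        except OSError:
            # No readable VERSION file, e.g. a checkout that was never built.
            return "unknown"

A package could then expose the result as its version string, so that a build
made from a tagged commit reports that tag.
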
The test suite itself consists of doctests and unittests. Doctests are found in
each module as docstrings, and in all the .doctest files under the test folder in
the nltk repo. We run these tests using pytest_, find code coverage using
pytest-cov_ and check for PEP-8_ etc. standard violations using pylint_.
All these tools are easily installable through pip or your favourite OS'
software packaging system. For testing, you can install the requirements with
``pip install -r requirements-test.txt``.

The results of these programs are parsed and published by the Jenkins instance,
giving us pretty graphs :)

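
To make the doctest part concrete, here is a minimal sketch, using only the
standard library ``doctest`` module, of what running the contents of a
.doctest file amounts to. The sample text and the ``run_doctest_text`` helper
are invented for illustration; the CI setup itself runs the real files through
pytest.

.. code-block:: python

    import doctest

    # Stand-in for the contents of a .doctest file (made up for this example).
    SAMPLE = """
    >>> words = "the quick brown fox".split()
    >>> len(words)
    4
    >>> words[0].upper()
    'THE'
    """


    def run_doctest_text(text, name="sample.doctest"):
        """Parse the examples in ``text``, run them, and return the results."""
        parser = doctest.DocTestParser()
        test = parser.get_doctest(text, {}, name, name, 0)
        runner = doctest.DocTestRunner(verbose=False)
        runner.run(test)
        # summarize() returns a TestResults tuple with failed and attempted.
        return runner.summarize(verbose=False)


    results = run_doctest_text(SAMPLE)
    print(results.failed, results.attempted)

Each ``>>>`` line counts as one example, and an example fails when the printed
result differs from the text written below it.
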
.. _pytest: https://docs.pytest.org/
.. _pytest-cov: https://pytest-cov.readthedocs.io/
.. _PEP-8: https://www.python.org/dev/peps/pep-0008/
.. _pylint: https://pylint.org/

The packages are built using ``make dist``. The resulting builds are all placed
`in our jenkins workspace`_ and should be safe to distribute. Builds
specifically for Mac are not available. File names are based on the
``__version__`` string, so they change with every build.

.. _in our jenkins workspace: https://example.com/

The web page is built using Sphinx_. It fetches all code documentation directly
from the code's docstrings. After building the page using ``make web``, the
instance pushes it to the `nltk.github.com repo on github`_. Pushing requires
access to the repo; because this cannot be done using a deploy key, the
instance has the ssh key of the nltk-webdeploy user.

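
As a rough sketch of the push step, the snippet below shows one way a script
could make git use a specific ssh key, through git's ``GIT_SSH_COMMAND``
environment variable (git 2.3 and later). It does not describe how the Jenkins
job is actually configured: the function names, the key path and the branch
name are all assumptions made for this example.

.. code-block:: python

    import os
    import shlex
    import subprocess


    def deploy_env(key_path, base_env=None):
        """Return an environment in which git uses the given ssh key."""
        env = dict(os.environ if base_env is None else base_env)
        # git reads GIT_SSH_COMMAND whenever it needs to start ssh.
        env["GIT_SSH_COMMAND"] = "ssh -i {} -o IdentitiesOnly=yes".format(
            shlex.quote(key_path)
        )
        return env


    def push_site(repo_dir, key_path, branch="master"):
        """Push the already committed site in repo_dir.

        Hypothetical helper; the branch name is an assumption.
        """
        subprocess.run(
            ["git", "push", "origin", branch],
            cwd=repo_dir,
            env=deploy_env(key_path),
            check=True,
        )

Passing the key through the environment keeps it scoped to this one command
instead of changing the ssh configuration of the whole build user.
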
.. _Sphinx: https://www.sphinx-doc.org
.. _nltk.github.com repo on github: https://github.com/nltk/nltk.github.com