This section shows a way to configure a development environment that allows you to run tests and build documentation.
virtualenv env
source env/bin/activate
pip install -U pip setuptools
pip install -e .[opencv,tf,test,torch]
Additionally, you can use the Dockerized Linux workspace via the Makefile provided at docker/Makefile. The following will build the Docker image, start a running container with petastorm source mounted into it from the host, and open a BASH shell into it (you must have GNU Make and Docker installed beforehand):
make build run shell
Within the Dockerized workspace, you can find the Python virtual environments at /petastorm_venv2.7 and /petastorm_venv3.6, and the local petastorm/ mounted at /petastorm.
To run unit tests:
pytest -v petastorm
NOTE: Java 1.8 must be installed for the tests to pass (it is a dependency of Spark).
pytest has multiple useful plugins. Consider installing the following plugins:
pip install pytest-xdist pytest-repeat pytest-pycharm
These plugins let you run tests in parallel (the -n switch) and repeat tests multiple times (the --count switch).
Some unit tests rely on mock data. Generating these datasets is slow because it spins up a local Spark instance. Use the -Y switch to cache these datasets. Be careful: dataset generation exercises Petastorm code, so in some cases you need to invalidate the cache for the tests to take all code changes into account. Use the --cache-clear switch to do so.
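The switches mentioned above can be combined in a single run. The following is a minimal sketch that only assembles the argument list; the helper name `build_pytest_args` and its parameters are hypothetical and not part of petastorm. It assumes `-n` comes from pytest-xdist, `--count` from pytest-repeat, and `-Y` from the petastorm test configuration, as described in the text.

```python
def build_pytest_args(target="petastorm", workers=None, count=None,
                      cache_datasets=False, clear_cache=False):
    """Assemble pytest arguments for the switches described above.

    Hypothetical helper, for illustration only.
    """
    args = ["-v"]
    if workers is not None:
        args += ["-n", str(workers)]     # parallel run (needs pytest-xdist)
    if count is not None:
        args += ["--count", str(count)]  # repeat each test (needs pytest-repeat)
    if cache_datasets:
        args.append("-Y")                # reuse cached mock datasets
    if clear_cache:
        args.append("--cache-clear")     # invalidate the cache before running
    args.append(target)
    return args


# Cache the datasets, but regenerate them first to pick up code changes.
command = " ".join(["pytest"] + build_pytest_args(cache_datasets=True, clear_cache=True))
print(command)  # prints "pytest -v -Y --cache-clear petastorm"
```

The resulting list could also be handed to `pytest.main(...)` from a Python session instead of being joined into a shell command.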
The petastorm project uses Sphinx autodoc capabilities, along with free documentation hosting by ReadTheDocs.org (RTD), to serve auto-generated API docs at http://petastorm.rtfd.io .
The RTD site is configured via webhooks to trigger Sphinx doc builds from changes in the petastorm GitHub repo. Documents are configured to build the same way locally and on RTD.
All the source files needed to generate the autodocs reside under docs/autodoc/.
To make documents locally:
pip install -e .[docs]
cd docs/autodoc
# To nuke all generated HTMLs
make clean
# Each run incrementally updates HTML based on file changes
make html
Once the HTML build process completes successfully, navigate your browser to file:///tmp/autodocs/_build/html/index.html.
Some changes may require build and deployment to see, including:
- Changes to readthedocs.yml
- Changes to docs/autodoc/conf.py
- A change that makes RTD build different from a local build
To see the above documentation changes:
- Create a petastorm branch and push it
- Configure RTD to activate a version for that branch
- A project maintainer will need to perform this version activation
- The status of the built version, as well as the resulting docs, can then be viewed
By default, RTD defines the latest version, which can be pointed at master or another branch. Additionally, each release may have an associated RTD build version, which must be explicitly activated in the Versions settings page.
As with any source file, once a release is tagged, it is essentially immutable, so be sure that all the desired documentation changes are in place before tagging a release.
Note that conf.py defines a release and a version property. For ease of maintenance, we've set both to the same version string as defined in petastorm/__init__.py.
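One way to keep the two properties in sync is to read the version string out of the package source rather than duplicating it. The sketch below is not taken from the project's conf.py; it assumes the version is stored as a plain string assignment named `__version__`, and both the `read_version` helper and the sample source text are hypothetical.

```python
import re


def read_version(init_source):
    """Return the string assigned to __version__ in the given module source."""
    match = re.search(r"^__version__\s*=\s*['\"]([^'\"]+)['\"]",
                      init_source, re.MULTILINE)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


# Hypothetical file contents standing in for petastorm/__init__.py.
example_source = "__version__ = '0.0.0'\n"

# Sphinx reads both names from conf.py; here they share one value.
version = read_version(example_source)
release = version
```

Parsing the text avoids importing the package at doc-build time, which matters when its dependencies are not installed in the build environment.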
- Due to RTD's build resource limitations, we are unable to pip install any of the petastorm extra-required library packages.
- Since Sphinx must be able to load a Python module to read its docstrings, the doc page for any module that imports cv2, tensorflow, or torch will, unfortunately, fail to build.
- The alabaster Sphinx theme defaults to using travis-ci.org for the Travis CI build badge, whereas the petastorm project is served on .com, so we don't currently have a working Travis CI build status.
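For the import limitation above, a quick local check can show which of the optional libraries are absent from the current environment, and therefore which imports would fail when Sphinx loads a module. This is a minimal sketch; the helper name `missing_modules` is hypothetical and not part of petastorm or Sphinx.

```python
import importlib.util

# Top-level modules named in the limitation above.
OPTIONAL_MODULES = ("cv2", "tensorflow", "torch")


def missing_modules(names=OPTIONAL_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Any name listed here would raise ImportError when a module importing it is loaded.
    print(missing_modules())
```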
Sphinx has the ability to auto-generate the entire API, either via the autosummary extension, or the sphinx-apidoc tool.
The following sphinx-apidoc invocation will autogenerate an api/ subdirectory of rST files for each of the petastorm modules. Those files can then be glob'd into a TOC tree.
cd docs/autodoc
sphinx-apidoc -fTo api ../.. ../../setup.py
The apidoc_experiment branch and RTD output demonstrate the outcome of vanilla usage. Actually leveraging this approach to produce uncluttered auto-generated API docs will require:
- Code package reorganization
- Experimentation with Sphinx settings, if available, to shorten link names
- Configuration change to auto-run sphinx-apidoc in the RTD build, as opposed to committing the api/*.rst files