This is a project to create reference code in NumPy for aperture synthesis imaging. It is structured to match the SDP processing architecture.
In many software packages, the only function specification is the application code itself. Although the underlying algorithm may be documented (e.g. published), the implementation tends to diverge over time, making this method of documentation less effective.
The Algorithm Reference Library is designed to present calibration and imaging algorithms in a simple Python-based form. This is so that the implemented functions can be seen and understood without resorting to interpreting source code shaped by real-world concerns such as optimisations.
The actual executable code may be accessed directly from the documentation.
The ARL has a few dependencies:
- Python 3.6+
- git 2.11.0+
- git-lfs 2.2.1+
- Python package dependencies as defined in requirements.txt
The Python dependencies will install (amongst other things) Jupyter, numpy, scipy, scikit, and Dask. Because of this it may be advantageous to set up a virtualenv to contain the project - instructions can be found here.
Note that git-lfs is required for some data files. Complete platform-dependent installation instructions can be found here.
Finally, add the location of the installation (e.g. ~/algorithm-reference-library/) to PYTHONPATH.
Install the required packages for git-lfs:
sudo apt-get install software-properties-common python-software-properties build-essential curl
sudo add-apt-repository ppa:git-core/ppa
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt-get install git-lfs
# the following will update your ~/.gitconfig with the command filter for lfs
git lfs install
Install the required packages for Python 3.6:
sudo apt-get install python3.6-dev python3-pip python3-tk virtualenv virtualenvwrapper
# note that pip will be installed as pip3 and python will be python3
# so be mindful to make the correct substitutions below
# Optional for simple Dask test notebook - simple-dask.ipynb
sudo apt-get install graphviz
Optional: for those wishing to use pipenv and virtualenv
sudo pip3 install pipenv
virtualenv -p python3.6 /path/to/where/you/want/to/keep/arlvenv
# activate the virtualenv with:
. /path/to/where/you/want/to/keep/arlvenv/bin/activate
# optionally install the Bokeh server to enable the Dask diagnostics web interface
pip install bokeh
# if you want to use pytest
pip install pytest
# if you want to do the lint checking
pip install pylint
# permanently fix up the ARL lib path in the virtualenv with the following:
add2virtualenv /path/to/checked/out/algorithm-reference-library
# this updates arlvenv/lib/python3.x/site-packages/_virtualenv_path_extensions.pth
# to turn off/deactivate the virtualenv with:
deactivate
Pipenv (with its Pipfile) now provides integrated virtual environment and requirements tracking, which seems to work well. You can run it simply as:
pipenv shell
This will leave you in a shell with the environment fully set up.
- Use git to make a local clone of the Github repository::
git clone https://github.com/SKA-ScienceDataProcessor/algorithm-reference-library
- Change into that directory::
cd algorithm-reference-library
- Install the required Python packages::
pip install -r requirements.txt
- Set up ARL::
python setup.py install
- Get the data files from Git LFS::
git-lfs pull
- You may need to fix up the Python search path so that Jupyter can find the ARL, with something like the export below (a quick import check is sketched after this list):
export PYTHONPATH=/path/to/checked/out/algorithm-reference-library
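Once PYTHONPATH is set, a quick way to confirm that Python (and hence Jupyter) can see the ARL is to check that the top-level packages can be found. This is only a minimal sketch, assuming nothing beyond the package names listed in the project layout below:

# Minimal check that the ARL top-level packages are importable once
# PYTHONPATH has been set (package names taken from the project layout below)
import importlib.util

for pkg in ("processing_library", "processing_components", "workflows"):
    found = importlib.util.find_spec(pkg) is not None
    print(pkg, "found" if found else "NOT found")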
The prime focus of the ARL is on learning and experimentation, not usage. If you are here to learn about the process of imaging, here is a quick guide to the project:
- processing_library: Algorithm-independent library code
- processing_components: Processing functions used in algorithms
- workflows: Serial and distributed processing workflows
- tests: Unit and regression tests
- docs: Complete documentation. Includes non-interactive output of examples.
- data: Data used
- tools: Package requirements, and Docker image building recipe
Jupyter notebooks end with .ipynb and can be run from the command line as follows:
$ jupyter-notebook workflows/notebooks/imaging_serial.ipynb
Documentation from the last build is at:
http://www.mrao.cam.ac.uk/projects/jenkins/algorithm-reference-library/docs/build/html/index.html
For building the documentation you will need Sphinx as well as Pandoc. This will extract docstrings from the ARL source code, evaluate all notebooks, and compose the result to form the documentation package.
You can build it as follows:
$ make -C docs [format]
Omit [format] to view a list of documentation formats that Sphinx can generate for you. You probably want dirhtml.
Testing and code analysis require nosetests3 and flake8 to be installed.
Install flake8, nose, and pylint:
sudo apt-get install flake8 python3-nose pylint3
All unit tests can be run with:
make unittest
or nose:
make nosetests
or pytest:
make pytest
Code analysis can be run with:
make code-analysis
In the tools/ directory is a Dockerfile that enables the notebooks, tests, and lint checks to be run in a container.
This has been tested with Docker-CE 17.09+ on Ubuntu 17.04. Docker-CE can be installed by following the instructions here.
It is likely that the Makefile commands will not work on anything other than modern Linux systems (e.g. Ubuntu 17.04), as they rely on command-line tools to discover the host system's IP address.
The project source code directory needs to be checked out wherever the containers are to be run, as the complete project is mounted into the container as a volume at /arl.
For example - to launch the notebook server, the general recipe is:
docker run --name arl_notebook --hostname arl_notebook --volume /path/to/repo:/arl \
-e IP=my.ip.address --net=host -p 8888:8888 -p 8787:8787 -p 8788:8788 -p 8789:8789 \
-d arl_img
After a few seconds, check the logs for the Jupyter URL with:
docker logs arl_notebook
See the Makefile for more examples.
To build the container image arl_img required to launch the dockerised notebooks, tests, and lint checks, run the following from the root directory of this checked-out project, passing in the PYTHON variable to specify which build of Python to use (python3 or python3.6):
make docker_build PYTHON=python3
Then push to a given Docker repository:
make docker_push PYTHON=python3 DOCKER_REPO=localhost:5000
To run the Jupyter notebooks:
make docker_notebook
Wait for the command to complete; it will print out the URL, including the access token, to use for the notebooks (example output):
...
Successfully built 8a4a7b55025b
...
docker run --name arl_notebook --hostname arl_notebook --volume $(pwd):/arl -e IP=${IP} \
--net=host -p 8888:8888 -p 8787:8787 -p 8788:8788 -p 8789:8789 -d arl_img
Launching at IP: 10.128.26.15
da4fcfac9af117c92ac63d4228087a05c5cfbb2fc55b2e281f05ccdbbe3ca0be
sleep 3
docker logs arl_notebook
[I 02:25:37.803 NotebookApp] Writing notebook server cookie secret to /root/.local/share/jupyter/runtime/notebook_cookie_secret
[I 02:25:37.834 NotebookApp] Serving notebooks from local directory: /arl/examples/arl
[I 02:25:37.834 NotebookApp] 0 active kernels
[I 02:25:37.834 NotebookApp] The Jupyter Notebook is running at:
[I 02:25:37.834 NotebookApp] http://10.128.26.15:8888/?token=2c9f8087252ea67b4c09404dc091563b16f154f3906282b7
[I 02:25:37.834 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 02:25:37.835 NotebookApp]
Copy/paste this URL into your browser when you connect for the first time,
to login with a token:
http://10.128.26.15:8888/?token=2c9f8087252ea67b4c09404dc091563b16f154f3906282b7
To run the tests:
make docker_tests
To run the lint checks:
make docker_lint
If you have a Docker Swarm cluster then the Dask cluster can be launched as follows:
Assuming that the Docker image for the ARL has been built and pushed to a repository tag, e.g.:
10.128.26.15:5000/arl_img:latest
And the Swarm cluster master resides on:
10.101.1.23
And the contents of arl/data are available on every worker at:
/home/ubuntu/arldata
Then launch the Dask scheduler and workers with the following:
docker -H 10.101.1.23 service create --detach=true \
--constraint 'node.role == manager' \
--name dask_scheduler --network host --mode=global \
10.128.26.15:5000/arl_img:latest \
dask-scheduler --host 0.0.0.0 --bokeh --show
docker -H 10.101.1.23 service create --detach=true \
--name dask_worker --network host --mode=global \
--mount type=bind,source=/home/ubuntu/arldata,destination=/arl/data \
10.128.26.15:5000/arl_img:latest \
dask-worker --host 0.0.0.0 --bokeh --bokeh-port 8788 --nprocs 4 --nthreads 1 --reconnect 10.101.1.23:8786
Now you can point the Dask client at the cluster with:
export ARL_DASK_SCHEDULER=10.101.1.23:8786
python examples/performance/pipelines-timings.py 4 4
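For reference, the sketch below shows how a client can pick up that scheduler address. It uses dask.distributed directly rather than the ARL's own wrapper, and the environment-variable handling is only an illustration:

# Minimal sketch: connect a Dask client to the scheduler named in
# ARL_DASK_SCHEDULER, falling back to a local cluster if it is unset
import os
from dask.distributed import Client

scheduler = os.environ.get("ARL_DASK_SCHEDULER")
client = Client(scheduler) if scheduler else Client()
print(client)  # reports the scheduler address and the attached workers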
It is possible to use daliuge as an (experimental) backend of the arlexecute module. This works by using daliuge's delayed function, which accepts the same parameters as dask's; the change is therefore simple and transparent to the rest of the ARL code. See Integration of ARL with the DALiuGE Execution Framework, part 2.
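As a rough illustration of the pattern this relies on, the sketch below uses dask.delayed with placeholder functions rather than real ARL calls; per the above, daliuge's delayed accepts the same parameters, so only the import would change:

# Sketch of the delayed-execution pattern used by the arlexecute backends.
# process_chunk and combine are placeholders, not real ARL functions.
from dask import delayed  # daliuge provides an equivalent delayed()

def process_chunk(x):
    return x * x

def combine(results):
    return sum(results)

graph = delayed(combine)([delayed(process_chunk)(i) for i in range(4)])
print(graph.compute())  # evaluates the graph: 0 + 1 + 4 + 9 = 14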