Usage with Docker
To use Docker on your system, you need to have Docker Engine installed. Step-by-step installation instructions for Windows 10, macOS, and Linux distributions can be found in the Docker documentation.
These instructions were written for Linux, but most of them should also work on Windows or macOS. Windows and macOS users should make sure that at least 8 GB of memory is available to Docker (click the whale icon in the notification area and select Settings -> Advanced). On Windows, use a Command Prompt or PowerShell terminal window for entering the commands.
"Installation" is very easy: the following command downloads the Docker image for Annif from the quay.io registry (if the image does not yet exist locally) and starts a Bash shell in a container:
docker run -it --rm quay.io/natlibfi/annif bash
In the shell it is possible to run Annif commands (here the -it option enables interactive mode and --rm removes the container when it is exited). The container can be exited with the exit command. (Note that by default the Docker image with the latest tag is used, which in the case of Annif is built from the current master git branch; to use an image of a specific release, append a colon and the release version number to the image name, e.g. quay.io/natlibfi/annif:0.42. The first release with a Docker image is 0.42.)
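As a minimal sketch of the tag syntax described above, the helper below builds the image reference for a given release. The function name annif_image is hypothetical and only serves as an illustration.

```shell
# Hypothetical helper: print the Annif image reference for a release tag.
# With no argument it falls back to the "latest" tag.
annif_image() {
    echo "quay.io/natlibfi/annif:${1:-latest}"
}

# Usage (requires Docker; starts a shell in the 0.42 release image):
#   docker run -it --rm "$(annif_image 0.42)" bash
```

Pinning a release tag instead of relying on latest keeps the environment reproducible, since latest follows the master branch.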
However, the Annif image itself does not contain any vocabulary or training data. A directory containing these can be bind mounted from the host file system to the container using the syntax -v /absolute_path/on/host:/path/in/container after the docker run command. (Alternatively, it is possible to create and mount a named volume, which is initially empty, and get data into it by copying from the host or fetching from the internet, e.g. by using wget in a running container to download the Annif-corpora GitHub repository.)
Also, the user in a Docker container is by default not the same as on the host system, so files created in the container are not owned by the host user; with bind mounts this can lead to problems with file permissions. It is therefore best to make the user in the container the same as on the host, using -u $(id -u):$(id -g) (on Windows this is not possible and the option can be omitted).
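The named-volume alternative mentioned above could look roughly like the following sketch. The volume name annif-data, the function name, and the DOCKER variable (a dry-run hook that lets the commands be printed instead of executed) are all assumptions made for illustration.

```shell
# Sketch only: "annif-data" is a hypothetical volume name.
# Set DOCKER=echo to print the commands instead of running them.
start_annif_with_volume() {
    volume="${1:-annif-data}"
    # Create the named volume (it starts out empty).
    ${DOCKER:-docker} volume create "$volume"
    # Mount the volume at /annif-projects and start a shell in the container.
    ${DOCKER:-docker} run -it --rm -v "${volume}:/annif-projects" quay.io/natlibfi/annif bash
}

# Usage (requires Docker):
#   start_annif_with_volume
```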
With the bind-mount and user-setting options the command to run bash in a container with Annif looks like this:
docker run \
-v ~/annif-projects:/annif-projects \
-u $(id -u):$(id -g) \
-it quay.io/natlibfi/annif bash
Here the annif-projects/ directory is assumed to exist in the home directory on the host (it is mounted with the same name at the root of the container file system). From here on, the post-installation steps for using Annif in Getting Started can be followed.
Specifically, the template configuration file projects.cfg.dist can be placed in ~/annif-projects/ on the host system under the name projects.cfg, alongside the vocabulary and training data (e.g. Annif-corpora).
Note that data should not be stored anywhere in the container other than the mounted directory, because once the container has stopped it is not convenient to access that data again.
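Placing the configuration template can be sketched as a small host-side helper. The function name install_projects_cfg is hypothetical, and the location of projects.cfg.dist is an assumption (for example, a local checkout of the Annif repository).

```shell
# Copy the configuration template into the projects directory as projects.cfg.
# $1: path to projects.cfg.dist (location is an assumption, e.g. an Annif checkout)
# $2: projects directory on the host (defaults to ~/annif-projects)
install_projects_cfg() {
    template="$1"
    target_dir="${2:-$HOME/annif-projects}"
    mkdir -p "$target_dir"
    # Leave an existing projects.cfg untouched so local edits are not lost.
    if [ -e "$target_dir/projects.cfg" ]; then
        echo "projects.cfg already exists in $target_dir, leaving it untouched" >&2
        return 0
    fi
    cp "$template" "$target_dir/projects.cfg"
}

# Usage:
#   install_projects_cfg /path/to/projects.cfg.dist
```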
If the web UI started by annif run is used from within the container, the option --network="host" also needs to be included in the docker run command.
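Combined with the bind-mount and user options shown earlier, starting the web UI directly might look like the sketch below. Running annif run as the container command instead of bash is an assumption, as are the function name and the DOCKER dry-run hook.

```shell
# Sketch only: start the Annif web UI in a container that shares the host network.
# Set DOCKER=echo to print the command instead of running it.
start_annif_webui() {
    ${DOCKER:-docker} run \
        -v "$HOME/annif-projects:/annif-projects" \
        -u "$(id -u):$(id -g)" \
        --network="host" \
        -it quay.io/natlibfi/annif annif run
}

# Usage (requires Docker):
#   start_annif_webui
```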
If the pre-built image does not suit your needs, you can customize the Dockerfile as you wish and build your own image. However, if you just want to reduce the image size by dropping some optional features or backends, the default Dockerfile can be used straight from the GitHub repository; the list of optional Python dependencies to install can be given using the --build-arg option of the build command. For example, to install only the Omikuji and Voikko dependencies, the command is
docker build \
--build-arg optional_dependencies=omikuji,voikko \
--tag annif-custom https://github.com/NatLibFi/Annif.git
If you have chosen to include the spaCy analyzer optional feature (included by default), you can also customize the selection of spaCy models included in the Docker image by adjusting the spacy_models build argument. It takes a comma-separated list of spaCy model names. The default is to include only the English small model (en_core_web_sm). For example, if you want to include both English and German models, use this command:
docker build \
--build-arg spacy_models=en_core_web_sm,de_core_news_sm \
--tag annif-custom https://github.com/NatLibFi/Annif.git
Different containerized services can be conveniently linked together using docker-compose. The instructions for setting up the services are in docker-compose.yml, which in this case instructs Docker to start separate containers for
- Gunicorn server running Annif Web UI
- NGINX proxy server
To start these services, run the following in the directory that contains docker-compose.yml (only this file is necessary, not the whole Annif repository):
ANNIF_PROJECTS=~/annif-projects MY_UID=$(id -u) MY_GID=$(id -g) docker-compose up
Here the environment variables are needed for mounting the directory for vocabulary and training data files and for making the user in the container the same as on the host. On Windows, these variables should not be set and the lines including MY_UID and MY_GID should be removed from docker-compose.yml; the path of the directory to be mounted should also be given there directly in place of ${ANNIF_PROJECTS} (e.g. c:/users/example.user/annif/annif-projects/). Once the services have started, the Annif web UI, served by NGINX, is accessible at http://localhost/ (see this in case of problems accessing localhost on Windows).
Note that the NGINX configuration file for proxying requests to Annif is created when NGINX starts; the configuration is contained inline in docker-compose.yml, which avoids the need to mount that file from the host.
To connect to the already running annif_app container for using Annif commands, run
docker exec -it -u $(id -u):$(id -g) annif_annif_app_1 bash
In the shell all the Annif commands can now be used.
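The exact container name may differ from annif_annif_app_1, since it depends on the directory name and the Compose version in use. A minimal sketch for looking the name up is shown below; the function name and the DOCKER dry-run hook are assumptions for illustration.

```shell
# Print the names of running containers whose name contains "annif_app".
# Set DOCKER=echo to print the command instead of running it.
find_annif_app_container() {
    ${DOCKER:-docker} ps --filter "name=annif_app" --format "{{.Names}}"
}

# Usage (requires Docker and a running annif_app container):
#   docker exec -it -u "$(id -u):$(id -g)" "$(find_annif_app_container | head -n 1)" bash
```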
Note also that the docker run and docker-compose up commands do not automatically fetch a new version of an image, even if one is available in the repository. To update to the most recent image or images, you must run docker pull IMAGE_NAME or docker-compose pull.
The docker-compose command is mostly intended for local development, not for production. For production, a more suitable approach is to run containers in swarm mode. For an example, see this compose file, which shows how the stack for api.annif.org was set up until 2022.
Currently the containers for api.annif.org and Finto AI are run in an OpenShift environment; see the Finto AI repository.
It is also possible to mount the Annif source code into the container, which allows editing it on the host system with your favourite editor while running and testing it in the container. To run the tests, use the following command in the directory containing the Annif source:
docker run \
-v ~/annif-projects:/annif-projects \
-v $(pwd):/Annif \
-u $(id -u):$(id -g) \
-w /Annif \
-it quay.io/natlibfi/annif pytest