Usage with Docker
To use Docker on your system, you need to have Docker Engine installed. Step-by-step installation instructions for Windows, Mac, and Linux distributions can be found in the Docker documentation.
If you are using Linux, you can pull the Annif Docker image from the registry at https://quay.io/ with:
docker pull quay.io/natlibfi/annif
A bash shell can then be started in a container with a ready-to-use Annif installation:
docker run -it quay.io/natlibfi/annif bash
In the shell it is possible to run Annif Commands (here the -it flag enables interactive mode). The container can be exited with exit.
However, the Annif image itself does not contain any vocabulary or training data. A directory containing these can be bind mounted as a volume from the host file system to the container using the syntax -v /absolute_path/on/host:/path/in/container after the docker run command (see note 1). Also, the user in a Docker container is by default not the same as on the host system, so files created in a container are not owned by the host user; with bind mounts this can lead to file permission issues. Therefore it is best to make the user in the container the same as on the host, using -u $(id -u):$(id -g). With these flags the command to run bash in a container with Annif looks like this:
docker run \
-v ~/annif-projects:/annif-projects \
-u $(id -u):$(id -g) \
-it quay.io/natlibfi/annif bash
Here the annif-projects/ directory is assumed to exist in the home directory on the host (it is mounted under the same name at the root of the container filesystem). From here on, the post-installation steps for using Annif in Getting Started can be followed.
Specifically, the template configuration file projects.cfg.dist can be placed in ~/annif-projects/ on the host system under the name projects.cfg, alongside the vocabulary and training data (e.g. Annif-corpora).
Note that data should be stored only in the mounted directory and not elsewhere in the container, because once the container has stopped it is not convenient to access the data again.
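The host-side preparation described above can be sketched as a pair of small shell helpers. This is only an illustration: the function names prepare_projects_dir and list_foreign_files are hypothetical, and the example paths assume that the Annif source (which contains projects.cfg.dist) has been checked out to ~/Annif.

```shell
#!/bin/sh
# Sketch only: helper names and example paths are assumptions, not part of Annif.

# Create the host directory and copy the template configuration into it,
# without overwriting an existing projects.cfg.
prepare_projects_dir() {
    src_dir="$1"        # directory that contains projects.cfg.dist
    projects_dir="$2"   # host directory that will be bind mounted
    mkdir -p "$projects_dir" || return 1
    if [ ! -f "$src_dir/projects.cfg.dist" ]; then
        echo "template not found in $src_dir" >&2
        return 1
    fi
    if [ -e "$projects_dir/projects.cfg" ]; then
        echo "keeping existing $projects_dir/projects.cfg" >&2
        return 0
    fi
    cp "$src_dir/projects.cfg.dist" "$projects_dir/projects.cfg"
}

# List files under a directory that are not owned by the current user,
# e.g. files created by a container that was started without the -u flag.
list_foreign_files() {
    find "$1" ! -user "$(id -u)" -print
}

# Example calls (paths are assumptions):
# prepare_projects_dir "$HOME/Annif" "$HOME/annif-projects"
# list_foreign_files "$HOME/annif-projects"
```

If list_foreign_files prints anything, those files were probably created by a different user inside a container and may need their ownership fixed on the host.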
If the web UI started by annif run is used from within the container, the flag --network="host" also needs to be included in the docker run command.
Different containerized services can be conveniently linked together using docker-compose. The instructions for setting up the services are in docker-compose.yml, which in this case instructs Docker to start separate containers for
- bash shell to run Annif commands
- Gunicorn server running Annif
- NGINX proxy server
- Mauiservice to access Maui backend
To start these services, run the following while in the Annif/ directory:
ANNIF_PROJECTS=~/annif-projects UID=${UID} GID=${GID} docker-compose up
Here the environment variables are needed for mounting the directory containing the vocabulary and training data files and for setting the user in the container to be the same as on the host. Once the services have started, the Annif web UI, served by NGINX, is accessible at http://localhost/.
To connect to the already running bash service for using Annif commands, run
docker exec -it -u $(id -u):$(id -g) annif_bash_1 bash
To create a model for the Maui backend (see here for details), run
docker exec -u $(id -u):$(id -g) annif_mauiservice_1 \
java -Xmx4G -cp maui-1.4.5-jar-with-dependencies.jar com.entopix.maui.main.MauiModelBuilder -l /annif-projects/Annif-corpora/fulltext/kirjastonhoitaja/maui-train/ -m /annif-projects/kirjastonhoitaja -v /annif-projects/Annif-corpora/vocab/yso-skos.rdf -f skos -i fi -s StopwordsFinnish -t CachingFinnishStemmer
To connect to the Maui backend while running via docker-compose, the default localhost in the endpoint entries of the projects.cfg file needs to be replaced by mauiservice. When the model is trained as above, its name is kirjastonhoitaja and the full entry is then endpoint=http://mauiservice:8080/mauiservice/kirjastonhoitaja/analyze. Note also that the services need to be restarted before a new model can be used.
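The endpoint change described above can be applied with a small sed-based helper. This is a minimal sketch: the function name use_mauiservice_host is hypothetical, and it assumes that the endpoint entries start at the beginning of a line and use localhost as the host part of the URL.

```shell
#!/bin/sh
# Sketch: point the endpoint entries of a projects.cfg at the mauiservice container.
# The function name is an assumption; only lines starting with "endpoint" are changed.
use_mauiservice_host() {
    cfg="$1"
    tmp="$(mktemp)" || return 1
    # Replace the host part of the URL and keep the port and path as they are.
    sed '/^endpoint[[:space:]]*=/ s#//localhost\([:/]\)#//mauiservice\1#' "$cfg" > "$tmp" \
        && cat "$tmp" > "$cfg"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example call (path is an assumption):
# use_mauiservice_host "$HOME/annif-projects/projects.cfg"
```

Writing the result back with cat instead of mv keeps the ownership and permissions of the original file, which matters when the directory is shared with a container.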
A custom Mauiservice configuration file can be used by changing the path in the environment variable JAVA_OPTS="-DMAUISERVICE_CONFIGURATION=/srv/maui/mauiservice.ini" in docker-compose.yml to the path of the customized file, e.g. to /annif-projects/mauiservice.ini, which is then mounted from the host system and can be conveniently edited.
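As an illustration, the path can also be switched without opening an editor. The sketch below uses a hypothetical helper, set_mauiservice_config, and assumes that the -DMAUISERVICE_CONFIGURATION option appears in docker-compose.yml as quoted above and that the new path contains neither # nor & characters.

```shell
#!/bin/sh
# Sketch: switch the Mauiservice configuration path in a compose file.
# The function name is an assumption, not part of Annif or Mauiservice.
set_mauiservice_config() {
    compose_file="$1"
    new_path="$2"    # e.g. /annif-projects/mauiservice.ini
    tmp="$(mktemp)" || return 1
    # Replace everything after the option name up to the next quote or whitespace.
    sed "s#-DMAUISERVICE_CONFIGURATION=[^\"[:space:]]*#-DMAUISERVICE_CONFIGURATION=${new_path}#" \
        "$compose_file" > "$tmp" && cat "$tmp" > "$compose_file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example call, run in the directory that contains docker-compose.yml:
# set_mauiservice_config docker-compose.yml /annif-projects/mauiservice.ini
```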
The docker-compose.yml file can be edited to remove unnecessary services, e.g. if one only wants to use the Maui backend. Note that the mauiservice container can also be run without docker-compose; in that case the container needs to be started with the --network="host" flag so that it is accessible from the host system.
It is also possible to mount the Annif source code into the container, which allows editing the code on the host system while running Annif and tests (included in the annif-dev image) in the container:
docker run \
-v ~/annif-projects:/annif-projects \
-v $(pwd):/Annif \
-u $(id -u):$(id -g) \
-it quay.io/natlibfi/annif:dev bash
Here it is assumed that the current working directory is the one containing the source code (thus the use of $(pwd)).
- Train models and store projects.cfg and the data/ directory to ~/annif-projects.
- Build a data image (which could also be versioned with a custom tag, the default tag is latest):
  docker build -t quay.io/natlibfi/annif-data -f Dockerfile-data ~/annif-projects
  Here the data for models are included in the image, but the corpora are not (even if they happen to reside in ~/annif-projects).
- Push the image to the https://quay.io/repository/natlibfi/annif-data repository:
  docker push quay.io/natlibfi/annif-data
- In the Services view of Portainer first select the data service (annif-test_data) and update it using the GUI button. Select to pull the latest image version when asked. Then, to make Annif use the new data, similarly update the Gunicorn_server service (now pulling the latest image is not necessary).
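The build and push steps above can be combined into one helper that also makes the tag explicit. This is a hedged sketch: the names publish_data_image, run and DRY_RUN are hypothetical, and the script is assumed to be run from the directory that contains Dockerfile-data. By default it only prints the commands so they can be checked first.

```shell
#!/bin/sh
# Sketch: build and push the data image under an explicit tag.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

publish_data_image() {
    tag="${1:-latest}"    # "latest" is the default tag
    image="quay.io/natlibfi/annif-data:${tag}"
    # Push only if the build succeeded.
    run docker build -t "$image" -f Dockerfile-data "$HOME/annif-projects" \
        && run docker push "$image"
}

publish_data_image "${1:-latest}"
```

Passing a tag as the first argument, for example a date, keeps earlier data images available in the repository under their own tags.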
1: Alternatively, it is possible to create and bind a named volume, which is initially empty, and get data into it by copying from the host or fetching from the internet, e.g. using wget in a running container to download Annif-corpora from its GitHub page:
wget -O - https://github.com/NatLibFi/Annif-corpora/tarball/master | tar xz