Docker on Windows

Király Péter edited this page Apr 28, 2023 · 7 revisions

On Windows the Docker setup is a bit different from the one on Linux. The biggest problem is that I/O operations between the host and the container file systems are quite slow, and QA catalogue constantly reads from, and frequently writes to, files, which makes working directly on a simple mount slow. It also improves speed if you use a virtual Ubuntu within Windows and run the commands within that environment.

Set up virtual Linux (optional)

  1. Open the command line or PowerShell and enter:
wsl --install -d Ubuntu
wsl --set-default-version 2
wsl --set-default ubuntu
  2. In Docker Desktop go to Settings > Resources > WSL Integration and enable integration with Ubuntu, then restart Docker Desktop.

  3. In Windows search, enter Ubuntu and click on the Ubuntu icon, or in the command line/PowerShell enter

ubuntu

When you enter this virtual Ubuntu for the first time, you have to choose a user name (which may be the same as or different from your Windows user name) and a password.
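
Once inside the virtual Ubuntu, a quick check like the following can confirm that the shell really runs under WSL. It is only a minimal sketch and not part of QA catalogue: the function name is_wsl is hypothetical, and it assumes that WSL kernels identify themselves with the word "microsoft" in /proc/version.

```shell
# Hypothetical helper: succeeds if the given kernel version file
# (default: /proc/version) mentions "microsoft" (assumed WSL marker).
is_wsl() {
  grep -qi 'microsoft' "${1:-/proc/version}"
}

# Example usage inside the virtual Ubuntu:
#   is_wsl && echo "running under WSL"
#   docker --version   # prints a version once the WSL integration is enabled
```

If docker --version fails here, the WSL integration in Docker Desktop is probably not enabled for this distribution yet.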

You can find more details and troubleshooting in the following documentation pages:

Mount and copy data

As mentioned before, the bottleneck on Windows is slow I/O, so an extra step is needed. After mounting the data, copy it to the container's marc directory, which is not mounted, so I/O operations on it are much faster.

Suppose you have the MARC/PICA files in a subdirectory of the current directory called data. Most importantly, the Windows command line uses %cd% to refer to the current directory. PowerShell does not expand %cd%, so in PowerShell write ${PWD} in its place.

Steps:

  1. Remove the existing container (if it exists).
  2. Recreate the container with one of the following commands.

a. If you are in Windows command line or PowerShell:

docker run -d -v %cd%\data:/opt/metadata-qa-marc/mounted -p 8983:8983 -p 80:80 --name metadata-qa-marc pkiraly/metadata-qa-marc:0.6.0

b. If you are in the virtual Ubuntu:

docker run -d -v $(pwd)/data:/opt/metadata-qa-marc/mounted -p 8983:8983 -p 80:80 --name metadata-qa-marc pkiraly/metadata-qa-marc:0.6.0
  3. Find the container identifier:
docker ps -aqf "name=metadata-qa-marc"

It will give you a string such as bc0388a936c4. Use the actual value you receive in place of bc0388a936c4 in the following commands.

  4. Copy your files from the just-mounted mounted directory (which holds the host's data) to the marc directory (/opt/metadata-qa-marc/marc) of the container:
docker exec -it bc0388a936c4 cp mounted/[your file(s)] marc

Check the marc directory within the container:

docker exec -it bc0388a936c4 ls -la /opt/metadata-qa-marc/marc

If the list contains the files from your host's data directory, you can move on to the analysis phase.

Now your files are in a fast disk location, so you can launch the data analysis.
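
For the virtual Ubuntu, the steps above can be sketched as two small shell functions. This is an illustration rather than an official script: the function names recreate_container and copy_to_marc are hypothetical, the container name, image tag, ports and paths are copied from the commands on this page, and it assumes that sh is available inside the container.

```shell
# Values taken from the commands on this page.
CONTAINER=metadata-qa-marc
IMAGE=pkiraly/metadata-qa-marc:0.6.0

# Steps 1-2: remove the old container (errors are ignored if there is none),
# then recreate it with the given host directory mounted.
recreate_container() {
  data_dir="$1"
  docker rm -f "$CONTAINER" >/dev/null 2>&1 || true
  docker run -d -v "$data_dir:/opt/metadata-qa-marc/mounted" \
    -p 8983:8983 -p 80:80 --name "$CONTAINER" "$IMAGE"
}

# Steps 3-4: look up the container id, copy the files matching a pattern
# from the mounted directory to the marc directory, then list the result.
copy_to_marc() {
  pattern="$1"
  cid=$(docker ps -aqf "name=$CONTAINER")
  # sh -c makes the wildcard expand inside the container, not on the host
  docker exec "$cid" sh -c \
    "cp /opt/metadata-qa-marc/mounted/$pattern /opt/metadata-qa-marc/marc/"
  docker exec "$cid" ls -la /opt/metadata-qa-marc/marc
}
```

A possible use, from the directory that contains data, would be recreate_container "$(pwd)/data" followed by copy_to_marc '*.xml'. The single quotes keep the host shell from expanding the wildcard.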

Run the analysis

  1. If you run it from the Windows command line or PowerShell, you cannot use the Linux shell's \ at the end of a line, which tells the shell that the command continues on the next line. Instead, put all the arguments on a single line:
docker container exec -ti metadata-qa-marc ./metadata-qa.sh --params "--marcxml --fixAlma --emptyLargeCollectors" --mask "*.xml" --catalogue gent all

Important: --mask "*.xml" is a file mask that matches every file ending in .xml. If you have a thousand such files, all of them will be analysed. You might want to start with a single file or a small number of files to make sure that everything works:

docker container exec -ti metadata-qa-marc ./metadata-qa.sh --params "--marcxml --fixAlma" --mask "BIB01.xml" --catalogue gent all
  2. To check if the process is still running:

This is the same on Windows and Linux:

docker container exec -it metadata-qa-marc ps -aux

It will return a number of lines, one of which looks like the following:

root         131  0.0  0.0   4352  3248 pts/0    Ss+  13:15   0:00 /bin/bash ./metadata-qa.sh --params --marcxml --fixAlma --mask *.xml --catalogue gent all

where the parameters after ./metadata-qa.sh reflect the values you gave.
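
In the virtual Ubuntu the trailing \ does work, so the analysis command can be split across lines and wrapped in a function that takes the file mask as an argument. The following is a sketch only: the function names run_analysis and analysis_running are hypothetical, and the flags are copied from the single-file example above.

```shell
# Hypothetical wrapper around the analysis command shown above.
# $1 is the file mask; quote it so the host shell does not expand the wildcard.
run_analysis() {
  mask="$1"
  docker container exec -ti metadata-qa-marc ./metadata-qa.sh \
    --params "--marcxml --fixAlma" \
    --mask "$mask" \
    --catalogue gent all
}

# Hypothetical helper: succeeds while metadata-qa.sh still shows up in the
# container's process list. -it is left out because the output is piped
# into grep instead of being shown on a terminal.
analysis_running() {
  docker container exec metadata-qa-marc ps -aux | grep -q 'metadata-qa\.sh'
}

# Example: poll once a minute from a second terminal.
#   while analysis_running; do sleep 60; done; echo "analysis finished"
```

A cautious first run would be run_analysis "BIB01.xml", followed by run_analysis "*.xml" once the single-file result looks right.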
