Docker on Windows
On Windows the Docker setup differs somewhat from Linux. The biggest problem is that I/O operations between the host and the container file systems are quite slow, while QA catalogue constantly reads from and frequently writes to files, so working directly on a mounted directory is slow. Running a virtual Ubuntu within Windows and issuing the commands from that environment also improves speed.
- Open command line or PowerShell and enter:
wsl --install -d Ubuntu
wsl --set-default-version 2
wsl --set-default ubuntu
- In Docker Desktop go to Settings > Resources > WSL Integration, enable integration with Ubuntu, then restart Docker Desktop.
- Start the virtual Ubuntu: in Windows search enter Ubuntu and click on the Ubuntu icon, or in the command line/PowerShell enter
ubuntu
The first time you enter the virtual Ubuntu, you have to set a user name (which may be the same as or different from your Windows user name) and a password.
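Once these steps are done, a quick sanity check from inside the virtual Ubuntu can confirm that it runs under WSL 2 and that the docker command reaches Docker Desktop. This is only a heuristic sketch: it assumes that WSL 2 kernels report microsoft-standard in uname -r (WSL 1 kernels report Microsoft with a capital M), and the function names is_wsl2 and docker_reachable are hypothetical.

```shell
# Sanity checks to run inside the virtual Ubuntu (hypothetical helper names).

# Heuristic (assumption): WSL 2 kernel release strings contain
# "microsoft-standard", e.g. 5.15.133.1-microsoft-standard-WSL2.
is_wsl2() {
  case "$(uname -r)" in
    *microsoft-standard*) return 0 ;;
    *) return 1 ;;
  esac
}

# Succeeds only if the docker CLI can talk to a running Docker engine,
# which requires the WSL integration for Ubuntu to be enabled in Docker Desktop.
docker_reachable() {
  docker info >/dev/null 2>&1
}
```

For example, is_wsl2 && docker_reachable && echo ready prints ready only when both checks pass.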
You can find more details and troubleshooting in the following documentation pages:
- WSL Installation
- Get started with Docker remote containers on WSL 2
- Docker Desktop WSL 2 backend
- Install and run Docker natively on Windows 10 Home
As mentioned before, the bottleneck on Windows is slow I/O on mounted directories, so an extra step is needed: after mounting the data, copy it to the marc directory inside the container. That directory is not mounted, so I/O operations on it are much faster.
Suppose you have the MARC/PICA files in a subdirectory of the current directory called data. Note how the current directory is referred to: the Windows command line uses %cd%, PowerShell uses ${PWD}, and the virtual Ubuntu uses $(pwd).
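Because the --mask parameter of the analysis selects which files are processed, it can help to check beforehand how many files in data a given mask matches. A minimal sketch for the virtual Ubuntu, assuming the mask behaves like an ordinary shell wildcard; count_matching is a hypothetical helper name.

```shell
# Count the files in a directory that match a shell wildcard (hypothetical helper).
# Usage: count_matching data '*.xml'
count_matching() {
  local dir="$1" mask="$2" n=0
  # The directory part is quoted; the mask is left unquoted so it is expanded.
  for f in "$dir"/$mask; do
    # When nothing matches, the loop sees the unexpanded pattern itself; skip it.
    if [ -e "$f" ]; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}
```

For example, running count_matching data '*.xml' in the directory that contains data prints the number of XML files there; quote the mask so the calling shell does not expand it first.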
Steps:
- remove the existing container (if it exists):
docker rm -f metadata-qa-marc
- recreate the container with
a. If you are in the Windows command line (in PowerShell, replace %cd% with ${PWD}):
docker run -d -v %cd%\data:/opt/metadata-qa-marc/mounted -p 8983:8983 -p 80:80 --name metadata-qa-marc pkiraly/metadata-qa-marc:0.6.0
b. If you are in the virtual Ubuntu:
docker run -d -v $(pwd)/data:/opt/metadata-qa-marc/mounted -p 8983:8983 -p 80:80 --name metadata-qa-marc pkiraly/metadata-qa-marc:0.6.0
- find the container identifier
docker ps -aqf "name=metadata-qa-marc"
It returns a string such as bc0388a936c4. Use the actual value you receive in the following commands; the container name, metadata-qa-marc, works as well.
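- copy your files from the just mounted directory (mounted) to the marc directory (/opt/metadata-qa-marc/marc) of the container:
docker exec -it bc0388a936c4 cp mounted/[your file(s)] marc
docker exec does not run the command through a shell inside the container, so a wildcard such as mounted/*.xml is not expanded there. Either list the file names explicitly or wrap the command in sh -c:
docker exec -it bc0388a936c4 sh -c "cp mounted/*.xml marc"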
Check the marc directory within the container:
docker exec -it bc0388a936c4 ls -la /opt/metadata-qa-marc/marc
If the list contains the files from your host's data directory, you can move on to the analysis phase.
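The copy and check steps can be wrapped in small functions when working from the virtual Ubuntu. This sketch rests on a few assumptions: the container is addressed by its name, metadata-qa-marc, instead of the ID; the container provides sh (the image runs bash scripts, so this is likely); and copy_to_marc and list_marc are hypothetical helper names. The cp command is passed to sh -c so that a wildcard is expanded inside the container rather than by the host shell, and the -it flags are dropped because these commands need no interactive terminal.

```shell
# Helpers for the virtual Ubuntu (hypothetical names; adjust CONTAINER if needed).
CONTAINER=metadata-qa-marc
BASE=/opt/metadata-qa-marc

# Copy files matching a mask from mounted/ to marc/ inside the container.
# The whole command goes through `sh -c`, so the wildcard in "$1" is
# expanded by the container's shell against the container's file system.
copy_to_marc() {
  docker exec "$CONTAINER" sh -c "cp $BASE/mounted/$1 $BASE/marc/"
}

# List the contents of marc/ to verify the copy.
list_marc() {
  docker exec "$CONTAINER" ls -la "$BASE/marc"
}
```

For example, copy_to_marc '*.xml' followed by list_marc copies all XML files and shows the result; the single quotes keep the host shell from touching the wildcard.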
Now that your files are in a fast disk location, you can launch the data analysis.
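In the virtual Ubuntu, where bash is the shell, the analysis command can be split across lines with a trailing \. The sketch below wraps it in a function so the mask is easy to change; run_analysis is a hypothetical name, and the remaining arguments are the ones used in the single-line Windows example.

```shell
# Hypothetical wrapper: run the analysis for the "gent" catalogue.
# The first argument is the file mask; it defaults to "*.xml".
run_analysis() {
  local mask="${1:-*.xml}"
  # Each trailing backslash tells bash that the command continues on the next line.
  docker container exec -ti metadata-qa-marc ./metadata-qa.sh \
    --params "--marcxml --fixAlma --emptyLargeCollectors" \
    --mask "$mask" \
    --catalogue gent \
    all
}
```

For example, run_analysis BIB01.xml analyses only that file, which is a cheap way to test the setup before processing everything.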
- If you run it from the Windows command line or PowerShell, you cannot use the Linux line continuation character \ at the end of a line (which tells the shell that the next line belongs to the same command; cmd.exe uses ^ and PowerShell uses a backtick for this instead). The simplest solution is to put all the arguments on a single line:
docker container exec -ti metadata-qa-marc ./metadata-qa.sh --params "--marcxml --fixAlma --emptyLargeCollectors" --mask "*.xml" --catalogue gent all
Important: --mask "*.xml" is a file mask. If you have a thousand files, all of them will be analysed. You might want to start with a single file or a small number of files to make sure that everything is working:
docker container exec -ti metadata-qa-marc ./metadata-qa.sh --params "--marcxml --fixAlma" --mask "BIB01.xml" --catalogue gent all
- To check whether the process is still running (this is the same in Windows and Linux):
docker container exec -it metadata-qa-marc ps -aux
It returns a number of lines; one of them looks like this:
root 131 0.0 0.0 4352 3248 pts/0 Ss+ 13:15 0:00 /bin/bash ./metadata-qa.sh --params --marcxml --fixAlma --mask *.xml --catalogue gent all
where the parameters after ./metadata-qa.sh reflect the values you gave.
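For long runs it can be convenient to wait in a loop instead of checking by hand. A minimal sketch for the virtual Ubuntu, assuming that the metadata-qa.sh line disappears from the ps output once the analysis has finished; analysis_running and wait_for_analysis are hypothetical names, and the 60-second interval is arbitrary. The -it flags are left out because the output is piped to grep on the host.

```shell
# Hypothetical helper: succeeds while metadata-qa.sh is still running
# in the container, based on the same `ps -aux` output as above.
analysis_running() {
  docker container exec metadata-qa-marc ps -aux | grep -q 'metadata-qa\.sh'
}

# Poll once a minute until the analysis process is gone.
wait_for_analysis() {
  while analysis_running; do
    sleep 60
  done
  echo "metadata-qa.sh is no longer running"
}
```

Since grep runs on the host, its own process never shows up in the container's ps output, so no extra filtering is needed.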