migrate remaining galaxy-central, added dive tests, some fixes and optimizations
jyotipm29 committed Nov 11, 2024
1 parent d2dc319 commit 4739bb8
Showing 14 changed files with 117 additions and 103 deletions.
13 changes: 13 additions & 0 deletions .dive-ci
@@ -0,0 +1,13 @@
rules:
# If the efficiency is measured below X%, mark as failed.
# Expressed as a ratio between 0-1.
lowestEfficiency: 0.95

# If the amount of wasted space is at least X or larger than X, mark as failed.
# Expressed in B, KB, MB, and GB.
# highestWastedBytes: 20MB

# If the amount of wasted space makes up for X% or more of the image, mark as failed.
# Note: the base image layer is NOT included in the total image size.
# Expressed as a ratio between 0-1; fails if the threshold is met or crossed.
highestUserWastedPercent: 0.10
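
A minimal sketch of how dive consumes a rules file like this in CI — `CI=true` makes it run non-interactively and exit non-zero when a threshold is crossed (`--ci-config` is spelled out here, though `.dive-ci` is also dive's default; the image name matches the workflow test below):

```bash
CI=true dive --ci-config .dive-ci quay.io/bgruening/galaxy
```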
37 changes: 20 additions & 17 deletions .github/workflows/single.sh
@@ -12,6 +12,11 @@ sudo apt-get update -qq
#sudo apt-get install docker-ce --no-install-recommends -y -o Dpkg::Options::="--force-confmiss" -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confnew"
sudo apt-get install sshpass --no-install-recommends -y

DIVE_VERSION=$(curl -sL "https://api.github.com/repos/wagoodman/dive/releases/latest" | grep '"tag_name":' | sed -E 's/.*"v([^"]+)".*/\1/')
curl -OL https://github.com/wagoodman/dive/releases/download/v${DIVE_VERSION}/dive_${DIVE_VERSION}_linux_amd64.deb
sudo apt install ./dive_${DIVE_VERSION}_linux_amd64.deb
rm ./dive_${DIVE_VERSION}_linux_amd64.deb

pip3 install ephemeris

docker --version
@@ -100,38 +105,36 @@ date > time.txt
# Test FTP Server get
#curl -v --fail ftp://localhost:8021 --user $GALAXY_USER:$GALAXY_USER_PASSWD

# Test SFTP Server
sshpass -p $GALAXY_USER_PASSWD sftp -v -P 8022 -o User=$GALAXY_USER -o "StrictHostKeyChecking no" localhost <<< $'put time.txt'
# Test CVMFS
docker_exec bash -c "service autofs start"
docker_exec bash -c "cvmfs_config chksetup"
docker_exec bash -c "ls /cvmfs/data.galaxyproject.org/byhand"
# Test SFTP Server
sshpass -p $GALAXY_USER_PASSWD sftp -v -P 8022 -o User=$GALAXY_USER -o "StrictHostKeyChecking no" localhost <<< $'put time.txt'
# Run a ton of BioBlend test against our servers.
cd "$WORKING_DIR/test/bioblend/" && . ./test.sh && cd "$WORKING_DIR/"
# not working anymore in 18.01
# executing: /galaxy_venv/bin/uwsgi --yaml /etc/galaxy/galaxy.yml --master --daemonize2 galaxy.log --pidfile2 galaxy.pid --log-file=galaxy_install.log --pid-file=galaxy_install.pid
# [uWSGI] getting YAML configuration from /etc/galaxy/galaxy.yml
# /galaxy_venv/bin/python: unrecognized option '--log-file=galaxy_install.log'
# getopt_long() error
# cat: galaxy_install.pid: No such file or directory
# tail: cannot open ‘galaxy_install.log’ for reading: No such file or directory
#- |
# if [ "${COMPOSE_SLURM}" ] || [ "${KUBE}" ] || [ "${COMPOSE_CONDOR_DOCKER}" ] || [ "${COMPOSE_SLURM_SINGULARITY}" ]
# then
# # Test without install-repository wrapper
# sleep 10
# docker_exec_run bash -c 'cd $GALAXY_ROOT_DIR && python ./scripts/api/install_tool_shed_repositories.py --api admin -l http://localhost:80 --url https://toolshed.g2.bx.psu.edu -o devteam --name cut_columns --panel-section-name BEDTools'
# fi
# Test without install-repository wrapper
curl -v --fail -X POST -H "Content-Type: application/json" -H "x-api-key: fakekey" -d \
'{
"tool_shed_url": "https://toolshed.g2.bx.psu.edu",
"name": "cut_columns",
"owner": "devteam",
"changeset_revision": "cec635fab700",
"new_tool_panel_section_label": "BEDTools"
}' \
"http://localhost:8080/api/tool_shed_repositories"
# Test the 'new' tool installation script
docker_exec install-tools "$SAMPLE_TOOLS"
# Test the Conda installation
docker_exec_run bash -c 'export PATH=$GALAXY_CONFIG_TOOL_DEPENDENCY_DIR/_conda/bin/:$PATH && conda --version && conda install samtools -c bioconda --yes'
# analyze image using dive tool
CI=true dive quay.io/bgruening/galaxy
docker stop galaxy
docker rm -f galaxy
72 changes: 39 additions & 33 deletions README.md
@@ -176,7 +176,7 @@ docker run -p 8080:80 -v /data/galaxy-data:/export --name <new_container_name> b

```sh
cd /data/galaxy-data/.distribution_config
for f in *; do echo $f; diff $f ../galaxy-central/config/$f; read; done
for f in *; do echo $f; diff $f ../galaxy/config/$f; read; done
```

4. Upgrade the database schema
@@ -239,19 +239,19 @@ With this method, you keep a backup in case you decide to downgrade, but require

```
$ cd /data/galaxy-data/.distribution_config
$ for f in *; do echo $f; diff $f ../../galaxy-data-old/galaxy-central/config/$f; read; done
$ for f in *; do echo $f; diff $f ../../galaxy-data-old/galaxy/config/$f; read; done
```
8. Copy all the users' datasets to the new instance

```
$ sudo rsync -var /data/galaxy-data-old/galaxy-central/database/files/* /data/galaxy-data/galaxy-central/database/files/
$ sudo rsync -var /data/galaxy-data-old/galaxy/database/files/* /data/galaxy-data/galaxy/database/files/
```
9. Copy all the installed tools

```
$ sudo rsync -var /data/galaxy-data-old/tool_deps/* /data/galaxy-data/tool_deps/
$ sudo rsync -var /data/galaxy-data-old/galaxy-central/database/shed_tools/* /data/galaxy-data/galaxy-central/database/shed_tools/
$ sudo rsync -var /data/galaxy-data-old/galaxy-central/database/config/* /data/galaxy-data/galaxy-central/database/config/
$ sudo rsync -var /data/galaxy-data-old/galaxy/database/shed_tools/* /data/galaxy-data/galaxy/database/shed_tools/
$ sudo rsync -var /data/galaxy-data-old/galaxy/database/config/* /data/galaxy-data/galaxy/database/config/
```
10. Copy the welcome page and all its files.

@@ -328,7 +328,7 @@ exit

```sh
cd /data/galaxy-data/.distribution_config
for f in *; do echo $f; diff $f ../galaxy-central/config/$f; read; done
for f in *; do echo $f; diff $f ../galaxy/config/$f; read; done
```

7. Upgrade the database schema (= step 4 of "The quick upgrade method" above)
@@ -470,11 +470,11 @@ docker run -p 8080:80 \
bgruening/galaxy-stable
```

Note that if you would like to run any of the [cleanup scripts](https://galaxyproject.org/admin/config/performance/purge-histories-and-datasets/), you will need to add the following to `/export/galaxy-central/config/galaxy.yml`:
Note that if you would like to run any of the [cleanup scripts](https://galaxyproject.org/admin/config/performance/purge-histories-and-datasets/), you will need to add the following to `/export/galaxy/config/galaxy.yml`:

```
database_connection = postgresql://galaxy:galaxy@localhost:5432/galaxy
file_path = /export/galaxy-central/database/files
file_path = /export/galaxy/database/files
```

## Security Configuration
@@ -492,7 +492,7 @@ Additionally Galaxy encodes various internal values that can be part of output u
id_secret: d5c910cc6e32cad08599987ab64dcfae
```
You should change all three configuration variables above manually in `/export/galaxy-central/config/galaxy.yml`.
You should change all three configuration variables above manually in `/export/galaxy/config/galaxy.yml`.
Alternatively, you can pass the security configuration when running the image, but note that this is a security risk: e.g., if a tool exposes all environment variables, your secret API key will also be exposed.
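
One way to generate fresh random values — the 32-character hex format matches the example above; using `openssl` here is an assumption, any strong random source works:

```bash
# Generate a 32-character hex secret, e.g. for id_secret in galaxy.yml
openssl rand -hex 16
```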

@@ -598,47 +598,47 @@ The easiest way is to create a `/export` mount point on the cluster and mount th
#### Not using the /export mount point on the cluster.
The docker container sets up all its files on the /export directory, but this directory may not exist on the cluster filesystem. This can be solved with symbolic links on the cluster filesystem but it can also be solved within the container itself.
In this example configuration the cluster file system has a directory `/cluster_storage/galaxy` which is accessible for the galaxy user in the container (UID 1450) and the user starting the container.
In this example configuration the cluster file system has a directory `/cluster_storage/galaxy_data` which is accessible for the galaxy user in the container (UID 1450) and the user starting the container.
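
A minimal preparation sketch for that directory (the exact `mkdir`/`chown` steps are assumptions; UID 1450 comes from the sentence above):

```bash
sudo mkdir -p /cluster_storage/galaxy_data/galaxy_export
sudo chown -R 1450 /cluster_storage/galaxy_data   # galaxy user inside the container
```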
The container should be started with the following settings configured:
```bash
docker run -d -p 8080:80 -p 8021:21 \
-v /cluster_storage/galaxy/galaxy_export:/export \ # This makes sure all galaxy files are on the cluster filesystem
-v /cluster_storage/galaxy:/cluster_storage/galaxy \ # This ensures the links within the docker container and on the cluster fs are the same
-v /cluster_storage/galaxy_data/galaxy_export:/export \ # This makes sure all galaxy files are on the cluster filesystem
-v /cluster_storage/galaxy_data:/cluster_storage/galaxy_data \ # This ensures the links within the docker container and on the cluster fs are the same
# The following settings make sure that each job is configured with the paths on the cluster fs instead of /export
-e GALAXY_CONFIG_TOOL_DEPENDENCY_DIR="/cluster_storage/galaxy/galaxy_export/tool_deps" \
-e GALAXY_CONFIG_TOOL_DEPENDENCY_CACHE_DIR="/cluster_storage/galaxy/galaxy_export/tool_deps/_cache" \
-e GALAXY_CONFIG_FILE_PATH="/cluster_storage/galaxy/galaxy_export/galaxy-central/database/files" \
-e GALAXY_CONFIG_TOOL_PATH="/cluster_storage/galaxy/galaxy_export/galaxy-central/tools" \
-e GALAXY_CONFIG_TOOL_DATA_PATH="/cluster_storage/galaxy/galaxy_export/galaxy-central/tool-data" \
-e GALAXY_CONFIG_SHED_TOOL_DATA_PATH="/cluster_storage/galaxy/galaxy_export/galaxy-central/tool-data" \
-e GALAXY_CONFIG_TOOL_DEPENDENCY_DIR="/cluster_storage/galaxy_data/galaxy_export/tool_deps" \
-e GALAXY_CONFIG_TOOL_DEPENDENCY_CACHE_DIR="/cluster_storage/galaxy_data/galaxy_export/tool_deps/_cache" \
-e GALAXY_CONFIG_FILE_PATH="/cluster_storage/galaxy_data/galaxy_export/galaxy/database/files" \
-e GALAXY_CONFIG_TOOL_PATH="/cluster_storage/galaxy_data/galaxy_export/galaxy/tools" \
-e GALAXY_CONFIG_TOOL_DATA_PATH="/cluster_storage/galaxy_data/galaxy_export/galaxy/tool-data" \
-e GALAXY_CONFIG_SHED_TOOL_DATA_PATH="/cluster_storage/galaxy_data/galaxy_export/galaxy/tool-data" \
# The following settings are for directories that can be anywhere on the cluster fs.
-e GALAXY_CONFIG_JOB_WORKING_DIRECTORY="/cluster_storage/galaxy/galaxy_export/galaxy-central/database/job_working_directory" \ #IMPORTANT: needs to be created manually. Can also be placed elsewhere, but is originally located here
-e GALAXY_CONFIG_NEW_FILE_PATH="/cluster_storage/galaxy/tmp" \ # IMPORTANT: needs to be created manually. This needs to be writable by UID=1450 and have its flippy bit set (chmod 1777 for world-writable with flippy bit)
-e GALAXY_CONFIG_JOB_WORKING_DIRECTORY="/cluster_storage/galaxy_data/galaxy_export/galaxy/database/job_working_directory" \ #IMPORTANT: needs to be created manually. Can also be placed elsewhere, but is originally located here
-e GALAXY_CONFIG_NEW_FILE_PATH="/cluster_storage/galaxy_data/tmp" \ # IMPORTANT: needs to be created manually. This needs to be writable by UID=1450 and have its flippy bit set (chmod 1777 for world-writable with flippy bit)
-e GALAXY_CONFIG_OUTPUTS_TO_WORKING_DIRECTORY=False \ # Writes Job scripts, stdout and stderr to job_working_directory.
-e GALAXY_CONFIG_RETRY_JOB_OUTPUT_COLLECTION=5 \ # If your cluster fs uses NFS this may introduce latency. You can set galaxy to retry if a job output is not yet created.
# Conda settings. IMPORTANT!
-e GALAXY_CONFIG_CONDA_PREFIX="/cluster_storage/galaxy/_conda" \ # Can be anywhere EXCEPT cluster_storage/galaxy/galaxy_export!
-e GALAXY_CONFIG_CONDA_PREFIX="/cluster_storage/galaxy_data/_conda" \ # Can be anywhere EXCEPT cluster_storage/galaxy/galaxy_export!
# Conda uses $PWD to determine where the virtual environment is. If placed inside the export directory conda will determine $PWD to be a subirectory of the /export folder which does not exist on the cluster!
-e GALAXY_CONFIG_CONDA_AUTO_INIT=True # When the necessary environment can not be found a new one will automatically be created
```
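
As the comments above flag, two of these directories must exist before the first start; a hedged prep sketch using the paths from this example:

```bash
mkdir -p /cluster_storage/galaxy_data/galaxy_export/galaxy/database/job_working_directory
mkdir -p /cluster_storage/galaxy_data/tmp
chmod 1777 /cluster_storage/galaxy_data/tmp   # world-writable with the sticky bit set
```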
### Setting up a Python virtual environment on the cluster <a name="Setting-up-a-python-virtual-environment-on-the-cluster" />[[toc]](#toc)
The Python environment in the container is not accessible from the cluster, so it needs to be created beforehand.
In this example configuration the Python virtual environment is created on `/cluster_storage/galaxy/galaxy_venv` and the export folder on `/cluster_storage/galaxy/galaxy_export`. To create the virtual environment:
1. Create the virtual environment `virtualenv /cluster_storage/galaxy/galaxy_venv`
2. Activate the virtual environment `source /cluster_storage/galaxy/galaxy_venv/bin/activate`
3. Install the galaxy requirements `pip install --index-url https://wheels.galaxyproject.org/simple --only-binary all -r /cluster_storage/galaxy/galaxy-central//lib/galaxy/dependencies/pinned-requirements.txt`
In this example configuration the Python virtual environment is created on `/cluster_storage/galaxy_data/galaxy_venv` and the export folder on `/cluster_storage/galaxy_data/galaxy_export`. To create the virtual environment:
1. Create the virtual environment `virtualenv /cluster_storage/galaxy_data/galaxy_venv`
2. Activate the virtual environment `source /cluster_storage/galaxy_data/galaxy_venv/bin/activate`
3. Install the galaxy requirements `pip install --index-url https://wheels.galaxyproject.org/simple --only-binary all -r /cluster_storage/galaxy_data/galaxy/lib/galaxy/dependencies/pinned-requirements.txt`
* Make sure to upgrade the environment with the new requirements when a new version of galaxy is released.
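
Put together, the three steps above look like this (paths exactly as in this example):

```bash
virtualenv /cluster_storage/galaxy_data/galaxy_venv
source /cluster_storage/galaxy_data/galaxy_venv/bin/activate
pip install --index-url https://wheels.galaxyproject.org/simple --only-binary all \
    -r /cluster_storage/galaxy_data/galaxy/lib/galaxy/dependencies/pinned-requirements.txt
```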
To make the Python environment usable on the cluster, create your custom `job_conf.xml` file and put it in `/cluster_storage/galaxy/galaxy_export/galaxy-central/config`.
To make the Python environment usable on the cluster, create your custom `job_conf.xml` file and put it in `/cluster_storage/galaxy_data/galaxy_export/galaxy/config`.
In the destination section the following code should be added:
```xml
<destinations default="cluster">
<destination id="cluster" runner="your_cluster_runner">
<env file="/cluster_storage/galaxy/galaxy_venv/bin/activate"/>
<env id="GALAXY_ROOT_DIR">/cluster_storage/galaxy/galaxy_export/galaxy-central</env>
<env id="GALAXY_LIB">/cluster_storage/galaxy/galaxy_export/galaxy-central/lib</env>
<env id="PYTHONPATH">/cluster_storage/galaxy/galaxy_export/galaxy-central/lib</env>
<env file="/cluster_storage/galaxy_data/galaxy_venv/bin/activate"/>
<env id="GALAXY_ROOT_DIR">/cluster_storage/galaxy_data/galaxy_export/galaxy</env>
<env id="GALAXY_LIB">/cluster_storage/galaxy_data/galaxy_export/galaxy/lib</env>
<env id="PYTHONPATH">/cluster_storage/galaxy_data/galaxy_export/galaxy/lib</env>
<param id="embed_metadata_in_job">True</param>
</destination>
</destinations>
```
@@ -655,7 +655,7 @@ It is often convenient to configure Galaxy to use a high-performance cluster for
1. munge.key
2. slurm.conf
These files from the cluster must be copied to the `/export` mount point (i.e., `/cluster_storage/galaxy/galaxy_export/` on the host if using the command below) accessible to Galaxy before starting the container. This must be done regardless of which Slurm daemons are running within Docker. At start, symbolic links to these files will be created in `/etc` within the container, allowing the various Slurm functions to communicate properly with your cluster. In such cases, there is no reason to run `slurmctld`, the Slurm controller daemon, from within Docker, so specify `-e "NONUSE=slurmctld"`. Unless you would also like to use Slurm (rather than the local job runner) to run jobs within the Docker container, specify `-e "NONUSE=slurmctld,slurmd"` instead.
These files from the cluster must be copied to the `/export` mount point (i.e., `/cluster_storage/galaxy_data/galaxy_export/` on the host if using the command below) accessible to Galaxy before starting the container. This must be done regardless of which Slurm daemons are running within Docker. At start, symbolic links to these files will be created in `/etc` within the container, allowing the various Slurm functions to communicate properly with your cluster. In such cases, there is no reason to run `slurmctld`, the Slurm controller daemon, from within Docker, so specify `-e "NONUSE=slurmctld"`. Unless you would also like to use Slurm (rather than the local job runner) to run jobs within the Docker container, specify `-e "NONUSE=slurmctld,slurmd"` instead.

Importantly, Slurm relies on a shared filesystem between the Docker container and the execution nodes. To allow things to function correctly, check out the basic filesystem setup above.
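
A hedged sketch of the copy step plus a matching container start — the source locations of `munge.key` and `slurm.conf` depend on your cluster, and the `docker run` line only shows the Slurm-relevant flags:

```bash
cp /etc/munge/munge.key  /cluster_storage/galaxy_data/galaxy_export/munge.key
cp /etc/slurm/slurm.conf /cluster_storage/galaxy_data/galaxy_export/slurm.conf
# Disable the Slurm controller inside the container; add slurmd too if jobs
# should not run inside the container either:
docker run -d -p 8080:80 -e "NONUSE=slurmctld" \
    -v /cluster_storage/galaxy_data/galaxy_export:/export bgruening/galaxy-stable
```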

@@ -804,7 +804,7 @@ ENV GALAXY_CONFIG_BRAND deepTools
ENV http_proxy 'http://yourproxyIP:8080'
ENV https_proxy 'http://yourproxyIP:8080'
WORKDIR /galaxy-central
WORKDIR /galaxy
RUN add-tool-shed --url 'http://testtoolshed.g2.bx.psu.edu/' --name 'Test Tool Shed'
@@ -933,7 +933,7 @@ RabbitMQ is configured with:
You can clone this repository with:
```sh
git clone --recursive https://github.com/bgruening/docker-galaxy-stable.git
git clone https://github.com/bgruening/docker-galaxy-stable.git
```
This repository uses various [Ansible](http://www.ansible.com/) roles as specified in [requirements.yml](galaxy/ansible/requirements.yml) to manage configurations and dependencies. You can install these roles with the following command:
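
The diff collapses the command itself at this point; for an Ansible requirements file it is typically the following (path taken from the link above, exact invocation assumed):

```bash
ansible-galaxy install -r galaxy/ansible/requirements.yml
```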
@@ -949,6 +949,12 @@ If you simply want to change the Galaxy repository and/or the Galaxy branch, fro
--build-arg GALAXY_REPO=https://github.com/manabuishii/galaxy
```
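
The fold above hides the start of this build command; a full invocation presumably looks like the sketch below (the `-t` tag, the `GALAXY_RELEASE` value, and the `galaxy/` build context are assumptions):

```bash
docker build -t my-galaxy-stable \
    --build-arg GALAXY_RELEASE=release_24.1 \
    --build-arg GALAXY_REPO=https://github.com/manabuishii/galaxy \
    galaxy/
```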
To keep Docker images lean and optimize storage, we recommend using [Dive](https://github.com/wagoodman/dive). It provides an interactive UI that lets you explore each layer of the image, helping you quickly identify files and directories that take up significant space. To install Dive, follow the instructions in the [Dive GitHub repository](https://github.com/wagoodman/dive?tab=readme-ov-file#installation). After building your Docker image, use Dive to analyze it:
```bash
dive <your-docker-image-name>
```
# Requirements <a name="Requirements" /> [[toc]](#toc)
- [Docker](https://www.docker.io/gettingstarted/#h_installation)