Infrastructure playbooks

Overview

The Ansible playbook used to set up the UseGalaxy.no infrastructure was inspired by the UseGalaxy.eu playbook, but our repository is not a fork of the European playbook. Rather, our playbooks were written from scratch based on lessons learned at the Galaxy Admin Training course in Barcelona 2020.

The playbooks can be found under the env directory in the repository. There are separate playbooks for the production stack in env/main and the test stack in env/test. However, most of the files are common for the two setups and are just symlinks to files residing under env/common.

To execute a playbook, enter either the env/main directory (for production) or the env/test directory (for test) and run the playbook from there. The playbook will then run against either the production servers listed in the env/main/hosts file or the test servers in env/test/hosts. You will also need our Ansible vault password, which should be saved to a file named vault_password inside the directory. Ask another admin if you don't have the password already.
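
As a minimal sketch, running the galaxy.yml playbook against the production stack could look like the following (this assumes the vault password has already been saved to env/main/vault_password; if the environment's ansible.cfg already points to that file, the explicit flag is unnecessary):

```bash
# Run a playbook against the production stack (sketch)
cd env/main
ansible-playbook galaxy.yml

# If the vault password file is not picked up automatically, pass it explicitly:
ansible-playbook galaxy.yml --vault-password-file vault_password
```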

Hosts

The UseGalaxy.no services are split across multiple nodes with different purposes. The original production setup consisted of 3 VMs (Main, Database and Compute) created on a hypervisor running on hardware that we purchased specifically to host the Norwegian Galaxy server (located at UiB). More nodes from NREC were added later to increase the compute capacity. The test stack is almost identical to the production stack, but it only runs on NREC VMs (usually smaller), and the domain names are of the form *.test.usegalaxy.no.

| Node | Domain name | Ansible hosts group |
|------|-------------|---------------------|
| Main | usegalaxy.no | galaxyserver |
| Database | db.usegalaxy.no | database |
| Compute | slurm.usegalaxy.no + nrec2.usegalaxy.no | slurm |
| Dynamic compute | eccN.usegalaxy.no (N is a number between 1 and 6?) | slurm |
| Squid proxies | cvmfsproxy01.usegalaxy.no and cvmfsproxy02.usegalaxy.no | cvmfsproxy |
| CVMFS Stratum 0 | cvmfs0.usegalaxy.no (same as Database node!) | cvmfsstratum0servers |

Playbooks

The system.yml playbook performs general tasks that should be applied to all nodes irrespective of their intended purpose. Most of the other playbooks, like galaxy.yml, database.yml, slurm.yml and cvmfsproxy.yml, target specific nodes (or host groups) and will set up all the services that are needed on those nodes. However, a few playbooks, like ecc.yml, nga.yml and cvmfs-server.yml, only set up a single service on a selected node.

| Playbook | Description | Code |
|----------|-------------|------|
| galaxy.yml | Sets up Galaxy itself, plus an NGINX reverse proxy and some other services on the Main node | galaxy.yml |
| database.yml | Sets up Postgres and some other services on the Database node | database.yml |
| system.yml | General bootstrapping of all servers | system.yml |
| slurm.yml | Sets up the Slurm Workload Manager (controller and compute nodes) | slurm.yml |
| ecc.yml | Sets up the Elastic Compute Cloud service (ECC) | ecc.yml |
| cvmfsproxy.yml | Sets up caching proxies (Squid) for CVMFS and configures the CVMFS clients to use these | cvmfsproxy.yml |
| cvmfs-server.yml | Sets up a local CVMFS Stratum 0 server for our /cvmfs/data.usegalaxy.no reference data repository | cvmfs-server.yml |
| nga.yml | Sets up the NeLS Galaxy API service (NGA) | nga.yml |
| workflows.yml | Automatic installation of workflows on the Galaxy server | workflows.yml |

Unused or deprecated playbooks:

| Playbook | Description | Code |
|----------|-------------|------|
| dns.yml | For setting up CloudFlare DNS entries? (not used) | dns.yml |
| pulsar.yml | Sets up Pulsar service at NMBU (for testing) | pulsar.yml |
| jenkins.yml | Sets up Jenkins on the Main node (not used) | jenkins.yml |
| logrotate.yml | Sets up logrotate? (test) | logrotate.yml |

Requirements

All the Ansible roles required by the playbooks should already be included in the repository, but if you get complaints about missing roles or undefined variables when running the playbook, you can try to reinstall them by running the following command inside either env/main or env/test.

ansible-galaxy install -r requirements.yml

Note that the versions of the roles are pinned in the env/common/requirements.yml file. Future upgrades of UseGalaxy.no should probably also move to more recent versions of these roles.
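
For illustration, a pinned entry in requirements.yml looks roughly like this (the role name is taken from the tables further down; the version number is just a placeholder, the real pins live in env/common/requirements.yml):

```yaml
# Sketch of a pinned role entry in requirements.yml
- src: galaxyproject.galaxy
  version: 0.0.0   # placeholder; see env/common/requirements.yml for the actual pinned version
```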

If you get complaints about missing modules when running a playbook, you may have to install some Python dependencies by running the following command in the repository's root directory. (You can create a virtual environment to install the modules in if you want.)

pip install -r requirements.txt

Playbooks

galaxy.yml

The galaxy.yml playbook installs Galaxy, the NGINX web proxy and some other stuff on the Main node. This is the playbook that should be run whenever we need to upgrade Galaxy to a new version. Most of the work in the playbook is handled by "official" ansible roles developed by the Galaxy community.

Service Installed by Ansible role Code
Galaxy galaxyproject.galaxy galaxyproject.galaxy
NGINX galaxyproject.nginx galaxyproject.nginx
CVMFS galaxyproject.cvmfs galaxyproject.cvmfs

Galaxy

The galaxyproject.galaxy role installs Galaxy itself (plus dependencies in a virtual environment) guided by the configuration in group_vars/galaxy.yml.

The role will fetch the Galaxy codebase from GitHub, using the branch specified by the galaxy_commit_id variable. Unlike most other Galaxy-related settings, we decided to place this particular variable in the group_vars/env.yml file. This file is one of very few that have different versions for test and production in our playbook.

Besides galaxy_commit_id, one of the most important variables used by the role is galaxy_layout. This is a high-level setting that controls where all the files are placed on the server. We use the "root-dir" layout, which means that all the Galaxy code and configuration files by default will be installed into subdirectories of the galaxy_root directory. In our setup, the root directory is /srv/galaxy/. The Galaxy codebase will end up under /srv/galaxy/server/, and static configuration files are placed in /srv/galaxy/config/.

For security reasons, these files are owned by the "root" OS user, whereas the Galaxy service is run by the "galaxy" OS user. This means that Galaxy does not have permission to change these files itself, so if a hacker somehow manages to run custom code through the Galaxy interface, they cannot destroy these files. Configuration files that Galaxy needs to modify dynamically are placed in /srv/galaxy/var/ and are owned by the "galaxy" OS user. These include the HTML display "whitelist" file and anything related to tools installed from tool sheds, such as the tool wrappers themselves in /srv/galaxy/var/shed_tools/.

Data files uploaded or created by users in their Galaxy histories are placed under the directory specified by the galaxy_file_path setting, which points to /data/part0/ on our server.
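
A minimal sketch of the layout-related variables described above is shown below. The values are illustrative only (galaxy_layout, galaxy_root, galaxy_file_path and galaxy_commit_id are named in the text; the other variable name and the branch value are assumptions), and the authoritative settings are in group_vars/galaxy.yml and group_vars/env.yml:

```yaml
# Illustrative excerpt, not copied from the repository
galaxy_layout: root-dir
galaxy_root: /srv/galaxy                    # code in /srv/galaxy/server, static config in /srv/galaxy/config
galaxy_mutable_data_dir: /srv/galaxy/var    # assumed variable name for the dynamic "var" directory
galaxy_file_path: /data/part0               # where user datasets are stored
galaxy_commit_id: release_22.05             # placeholder branch; the real value lives in group_vars/env.yml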

All the settings listed under galaxy_config in group_vars/galaxy.yml are copied directly into the main Galaxy configuration file /srv/galaxy/config/galaxy.yml (after substituting Ansible variables, of course). These settings also refer to several other configuration files, which are copied from the env/common/files/galaxy/config and env/common/templates/galaxy/config directories in the playbook.
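
The structure is roughly as follows; this is only a sketch, and the option names and values shown here are illustrative rather than taken from the repository:

```yaml
# Sketch: settings nested under galaxy_config end up in /srv/galaxy/config/galaxy.yml
galaxy_config:
  galaxy:
    brand: UseGalaxy.no                                          # illustrative value
    tool_config_file: "{{ galaxy_config_dir }}/tool_conf.xml"    # illustrative reference to another config file
    job_config_file: "{{ galaxy_config_dir }}/job_conf.xml"      # illustrative
```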

Our playbook also installs some basic dependencies on the Main node that are needed by the Galaxy role, and after the Galaxy setup is complete, it installs Singularity and the Slurm DRMAA package. Finally, the playbook ensures that certain files and directories have the right access permissions and ownerships, so that they can be accessed by the services that need them. Starting and stopping the Galaxy service and mule handlers is handled by systemd, as set up by the usegalaxy_eu.galaxy_systemd role.

NeLS Galaxy customizations

UseGalaxy.no has a few customizations compared to the out-of-the-box Galaxy server installed by the galaxyproject.galaxy role, including a few files that replace standard Galaxy codebase files. Most of this customization is handled by tasks defined in tasks/galaxy.yml and executed here. This includes changes to the welcome page and visual style, adding the NeLS OIDC backend and changing the login page, adding webhooks for history import/export between Galaxy and NeLS Storage and adding the NeLS Storage remote data source plugin (configured here).

Previously, we also used the usegalaxy-no.nels_storage role to install two tools in the Tools Panel to import and export datasets between NeLS Storage and Galaxy. These tools are not used anymore, but the role also creates a configuration file for the NeLS Storage API which is used by the NeLS Storage plugin (and possibly NGA). The values used for the settings in this configuration file are encrypted in the vault.

The tasks/galaxy.yml file also performs a few miscellaneous tasks, for instance setting up log directories and log rotation (via tasks/logrotate.yml), configuring some environment variables used by the gxadmin utility (previously installed by the usegalaxy_eu.gxadmin role, which no longer exists?), installing some Telegraf plugins, setting up cronjobs to remove deleted Galaxy history files and other tmp-files, and some SELinux-related stuff.

Galaxy Reports server

The Galaxy codebase comes bundled with a separate reports server which can optionally be deployed alongside Galaxy to summarize and visualize statistics about users, data, tools and jobs obtained from the Galaxy database. We have deployed this Reports server at https://usegalaxy.no/reports. The configuration template for the server can be found in env/common/templates/galaxy/config/reports.yml, and the resulting file is placed in /srv/galaxy/config/reports.yml. We use systemd to manage this service instead of running the run_reports.sh script. This is handled by the usegalaxy_eu.galaxy_systemd role.

NGINX

The Galaxy service running on port 8080 on the Main node is reverse proxied by an NGINX web server set up on the same node by the galaxyproject.nginx role. Most of the configuration settings for this role are defined in group_vars/galaxy.yml. The role is configured to set up 3 web servers (virtual hosts):

| Virtual host | Port | Configuration template | Local file destination |
|--------------|------|------------------------|------------------------|
| galaxy | 443 | env/common/templates/nginx/galaxy.j2 | /etc/nginx/sites-enabled/galaxy |
| galaxy-gie-proxy | 443 | env/common/templates/nginx/galaxy-gie-proxy.j2 | /etc/nginx/sites-enabled/galaxy-gie-proxy |
| redirect-ssl | 80 | env/common/templates/nginx/redirect-ssl.j2 | /etc/nginx/sites-enabled/redirect-ssl |

The insecure redirect-ssl host simply redirects all plain HTTP traffic arriving on port 80 to the secure port 443. The galaxy-gie-proxy host enables support for interactive tools and will redirect requests coming to *.interactivetool.usegalaxy.no to the "GIE proxy server", which is set up by the usegalaxy_eu.gie_proxy role. Note that we do not currently support interactive tools!

All other HTTPS requests are handled by the galaxy host, which mostly just redirects them to Galaxy's internal uWSGI server on port 8080. However, "static" files located in the /srv/galaxy/server/static/ directory on the server and accessed from the https://usegalaxy.no/static/ subdirectory are served directly by NGINX itself. A few other subdirectories are also recognized and redirected to other services. For instance, requests referring to the https://usegalaxy.no/reports subdirectory will be redirected to the Reports server on port 9001, and requests referring to https://usegalaxy.no/nga will be redirected to the NGA server on port 8888.

SSL settings and certificates

We have configured the NGINX role to use Certbot (via the usegalaxy_eu.certbot role) to automatically obtain, install and renew SSL certificates from Let's Encrypt. The certificates are placed under /etc/letsencrypt/live/ on the server. Additional configuration settings for Certbot can be found in the group_vars/galaxy.yml file.

Other SSL-related settings for NGINX, such as ciphers and the location of certificates, are inserted into the galaxyproject.nginx role's ssl.conf.j2 template file and copied into /etc/nginx/conf.d/ssl.conf.

CVMFS

The Galaxy community has set up CVMFS servers to share a large number of reference datasets, including full genome sequences for several standard model organisms and corresponding indexes for many popular sequence mapping tools. Installing and configuring a CVMFS client on any machine allows this repository to be mounted locally as a read-only FUSE file system under /cvmfs/data.galaxyproject.org/. Files and metadata are fetched on-the-fly (and only when needed) from the CVMFS server using the HTTP protocol and then stored in a small local cache behind the scenes. This gives the appearance of having direct access to a large file repository (>4 TB) on the local machine without actually spending lots of resources.
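
For example, once a client is configured, the repository can be browsed like any other directory; the mount is created on first access and contents are fetched on demand:

```bash
# Browse the Galaxy reference data repository on a node with a configured CVMFS client
ls /cvmfs/data.galaxyproject.org/
```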

CVMFS clients are installed on both the Main node (in the galaxy.yml playbook) and every Compute node (in the slurm.yml playbook) using the galaxyproject.cvmfs ansible role.

Configuration in galaxy.yml

CVMFS configuration for Compute nodes in "group_vars/slurm.yml": https://github.com/usegalaxy-no/infrastructure-playbook/blob/e37cdcd8b36f503af0bc995333f194ee81eda5b7/env/common/group_vars/slurm.yml#L91-L108

By setting the "galaxy_cvmfs_repos_enabled" variable to "config-repo", the "galaxyproject.cvmfs" role will automatically configure the CVMFS client with the repositories for "data.galaxyproject.org" using a config repository (the files can be found under /cvmfs/cvmfs-config.galaxyproject.org).
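
As an illustration, the client-side variables look something like this; only galaxy_cvmfs_repos_enabled is quoted from the text above, while the other names and values are assumptions about the galaxyproject.cvmfs role's defaults:

```yaml
# Sketch of CVMFS client configuration via the galaxyproject.cvmfs role
galaxy_cvmfs_repos_enabled: config-repo   # let the Galaxy config repository define the repositories
cvmfs_role: client                        # assumed variable name
cvmfs_quota_limit: 4000                   # assumed: local cache size in MB
```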

Also on the Main node: https://github.com/usegalaxy-no/infrastructure-playbook/blob/0f2cd5486dc4cbc4e1348e4654add6fe97e58ab1/env/common/cvmfsproxy.yml#L99-L134

And on compute nodes ("slurm" hosts group): https://github.com/usegalaxy-no/infrastructure-playbook/blob/0f2cd5486dc4cbc4e1348e4654add6fe97e58ab1/env/common/cvmfsproxy.yml#L136-L240

And on the Database node: https://github.com/usegalaxy-no/infrastructure-playbook/blob/0f2cd5486dc4cbc4e1348e4654add6fe97e58ab1/env/common/cvmfsproxy.yml#L242-L277

Both the Main node and the Compute nodes run the command cvmfs_config probe against all the CVMFS repositories to check that they work: https://github.com/usegalaxy-no/infrastructure-playbook/blob/0f2cd5486dc4cbc4e1348e4654add6fe97e58ab1/env/common/cvmfsproxy.yml#L279-L298
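
For reference, the equivalent manual check on a node is simply:

```bash
# Probe a repository to verify that it can be mounted and reached
cvmfs_config probe data.galaxyproject.org
```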

In addition to the data.galaxyproject.org repositories, we have also set up our own Stratum 0 server on the Database node to host our /cvmfs/data.usegalaxy.no reference data repository. This is done by the cvmfs-server.yml playbook described below.

database.yml

The database.yml playbook installs a PostgreSQL database, which is used by Galaxy, on the Database node db.usegalaxy.no. The main work of this playbook is handled by the galaxyproject.postgresql Ansible role, and the database_connection setting in Galaxy's config/galaxy.yml file is specified by the db_connection variable defined in group_vars/global.yml. The connection string combines the database user defined in the same file, the database host taken from the hosts file, the name of the database defined in group_vars/env.yml (which differs between the test and production setups), and the database password encrypted in the vault in secret_group_vars/global.vault.
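
Conceptually, the resulting connection string has the standard PostgreSQL URL form. The sketch below is illustrative only: db_connection is named in the text, but the other variable names are placeholders and not necessarily the ones used in group_vars/global.yml:

```yaml
# Sketch: how the pieces combine into Galaxy's database_connection setting
db_connection: "postgresql://{{ db_user }}:{{ db_password }}@{{ db_host }}/{{ db_name }}"
```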

The playbook also installs a MySQL server using the geerlingguy.mysql ansible role, an Apache server using the geerlingguy.apache role, and a RabbitMQ server via tasks defined in tasks/rabbitmq.yml. The MySQL database is used by Slurm for accounting (via the "slurmdbd" daemon), the Apache server is used to proxy the RabbitMQ service and RabbitMQ is used by NGA.

system.yml

The system.yml playbook is meant to be run against all the nodes in our infrastructure to perform some common setup tasks.

For instance, it sets the hostname of the VM, installs some basic utilities/dependencies, disables IPv6 in the kernel, sets the SELinux policy and also sets the time zone.

OS users and groups are created in tasks/users_groups.yml (which also includes tasks/_system_users.yml). The configuration in group_vars/global.yml specifies which users/groups to create, viz. galaxy, galaxyadmin, sysadmin, slurm and docker (group only). The playbook also installs the sudoers file and adds SSH keys for galaxyadmins and sysadmins to the ~/.ssh/authorized_keys files for these users. The SSH keys are stored in vault-encrypted files that can be found under env/common/files/ssh.

The playbook also performs miscellaneous tasks in tasks/system.yml that, among other things, install some scripts used by Telegraf to obtain status information from the system. Some of these tasks are node-specific, for instance those related to the Postfix mail server on the Database node or NFS stuff on the Main node.

Via ready-made ansible roles the playbook installs Fail2ban (an intrusion prevention software framework), Telegraf for monitoring and metrics collection (configured here with extra plugins here and here), the Postfix mail server (configured here), the EPEL repository (Extra Packages for Enterprise Linux) and dhclient (Dynamic Host Configuration Protocol Client, which is configured here). The weareinteractive.sudo role is used to give sudo access to the telegraf user.

Note that firewall installation and configuration (using Firewalld) is not handled by this playbook but is usually done by including tasks/firewall.yml in other playbooks that target more specific host groups, such as galaxy.yml, database.yml and slurm.yml.
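
As a sketch, a playbook that needs a firewall would pull in the shared tasks roughly like this; the exact include style and relative path used in the repository may differ:

```yaml
# Illustrative: including the shared firewall tasks from a host-specific playbook
- name: Configure Firewalld
  include_tasks: tasks/firewall.yml
```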

slurm.yml

ecc.yml

cvmfsproxy.yml

For clusters of nodes with CVMFS clients, the CVMFS documentation recommends setting up two or more forward proxy servers to act as caching proxies. This will "reduce the latency for the local worker nodes, which is critical for cold cache performance [and] also reduce the load on the Stratum 1 servers".

The cvmfsproxy.yml playbook installs Squid proxies on all the nodes in the "cvmfsproxy" hosts group (which currently contains the two NREC VMs "cvmfsproxy01.(test).usegalaxy.no" and "cvmfsproxy02.(test).usegalaxy.no"). It also configures the other nodes running CVMFS clients to use these proxies simply by changing the CVMFS_HTTP_PROXY directive in the /etc/cvmfs/default.local configuration files to point to these proxies. (The default value for this setting is DIRECT, which means that no proxies are used and the CVMFS clients should contact the Stratum server directly.)
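
After the playbook has run, the relevant line in /etc/cvmfs/default.local on a client node looks roughly like the sketch below (hostnames from the table above, port 3128 as mentioned in the firewall section below; the exact value in the repository, e.g. any DIRECT fallback, may differ):

```bash
# Load-balance between the two Squid proxies ('|' separates equivalent proxies in CVMFS syntax)
CVMFS_HTTP_PROXY="http://cvmfsproxy01.usegalaxy.no:3128|http://cvmfsproxy02.usegalaxy.no:3128"
```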

Setting up Squid proxies

The playbook installs the packages for Squid and Firewalld via standard package managers. These services should be managed by systemd and are thus enabled and started by the playbook. The squid -z command will trigger Squid to create any missing cache directories, if necessary. The configuration file for Squid is copied from the template in env/common/templates/cvmfs/squid.conf.j2 and installed into /etc/squid/squid.conf on the proxy node. Some of the settings used in the configuration are defined in the group_vars/cvmfsproxy.yml file (which is different for test and production since it contains hardcoded references!).
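
The manual equivalent of what the playbook automates on a proxy node would be roughly the following; the package manager is an assumption (the nodes are RHEL/CentOS-like, given the use of EPEL and SELinux elsewhere in this document):

```bash
# Sketch of the steps the playbook performs on a proxy node
yum install -y squid firewalld           # assumed package manager; packages as described above
systemctl enable --now squid firewalld   # both services are managed by systemd
squid -z                                 # create any missing Squid cache directories
```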

The "ssh" port (22) is added to the "public zone" of the firewall to allow admins (and Ansible) to login via SSH. The "squid" port (3128) is next added to the "internal zone" to allow communication between Squid and the CVMFS clients. (Firewalld already knows about both of these services, which is why they can be referenced by name instead of port number. See /usr/lib/firewalld/services/.) The IP ranges of potential CVMFS clients are also added as sources to the "internal zone".

cvmfs-server.yml

nga.yml

workflows.yml
