Infrastructure playbooks
The Ansible playbook used to set up the UseGalaxy.no infrastructure was inspired by the UseGalaxy.eu playbook, but our repository is not a fork of the European playbook. Rather, our playbooks were written from scratch based on lessons learned at the Galaxy Admin Training course in Barcelona 2020.
The playbooks can be found under the `env` directory in the repository. There are separate playbooks for the production stack in `env/main` and the test stack in `env/test`. However, most of the files are common to the two setups and are just symlinks to files residing under `env/common`.
To execute a playbook, enter either the `env/main` directory (for production) or the `env/test` directory (for test) and run the playbook from there. The playbook will then run against either the production servers listed in the `env/main/hosts` file or the test servers in `env/test/hosts`. You will also need our Ansible vault password, which should be saved to a file named `vault_password` inside the directory. Ask another admin if you don't have the password already.
The UseGalaxy.no services are split across multiple nodes with different purposes. The original production setup consisted of 3 VMs (Main, Database and Compute) created on a hypervisor running on hardware that we purchased specifically to host the Norwegian Galaxy server (located at UiB). More nodes from NREC were added later to increase the compute capacity. The test stack is almost identical to the production stack, but it only runs on NREC VMs (usually smaller ones), and the domain names are of the form `*.test.usegalaxy.no`.
Node | Domain name | Ansible hosts group |
---|---|---|
Main | `usegalaxy.no` | `galaxyserver` |
Database | `db.usegalaxy.no` | `database` |
Compute | `slurm.usegalaxy.no` + `nrec2.usegalaxy.no` | `slurm` |
Dynamic compute | `eccN.usegalaxy.no` (N is a number between 1 and 6?) | `slurm` |
Squid proxies | `cvmfsproxy01.usegalaxy.no` and `cvmfsproxy02.usegalaxy.no` | `cvmfsproxy` |
CVMFS Stratum 0 | `cvmfs0.usegalaxy.no` (same as the Database node!) | `cvmfsstratum0servers` |
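As a rough illustration, the same grouping can be expressed as an Ansible inventory. The sketch below uses the YAML inventory format with the hostnames from the table; the actual `env/main/hosts` and `env/test/hosts` files may well use a different (e.g. INI-style) layout.

```yaml
# Illustrative YAML-format inventory mirroring the table above (not the actual hosts file)
all:
  children:
    galaxyserver:
      hosts:
        usegalaxy.no:
    database:
      hosts:
        db.usegalaxy.no:
    slurm:
      hosts:
        slurm.usegalaxy.no:
        nrec2.usegalaxy.no:
    cvmfsproxy:
      hosts:
        cvmfsproxy01.usegalaxy.no:
        cvmfsproxy02.usegalaxy.no:
    cvmfsstratum0servers:
      hosts:
        cvmfs0.usegalaxy.no:
```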
The `system.yml` playbook performs general tasks that should be applied to all nodes irrespective of their intended purpose. Most of the other playbooks, like `galaxy.yml`, `database.yml`, `slurm.yml` and `cvmfsproxy.yml`, target specific nodes (or host groups) and will set up all the services that are needed on those nodes. However, a few playbooks, like `ecc.yml`, `nga.yml` and `cvmfs-server.yml`, only set up a single service on a selected node.
Playbook | Description | Code |
---|---|---|
galaxy.yml | Sets up Galaxy itself, plus an NGINX reverse proxy and some other services on the Main node | galaxy.yml |
database.yml | Sets up Postgres and some other services on the Database node | database.yml |
system.yml | General bootstrapping of all servers | system.yml |
slurm.yml | Sets up the Slurm Workload Manager (controller and compute nodes) | slurm.yml |
ecc.yml | Sets up the Elastic Compute Cloud service (ECC) | ecc.yml |
cvmfsproxy.yml | Sets up caching proxies (Squid) for CVMFS and configures the CVMFS clients to use these | cvmfsproxy.yml |
cvmfs-server.yml | Sets up a local CVMFS Stratum 0 server for our /cvmfs/data.usegalaxy.no reference data repository | cvmfs-server.yml |
nga.yml | Sets up the NeLS Galaxy API service (NGA) | nga.yml |
workflows.yml | Automatic installation of workflows to Galaxy server | workflows.yml |
Unused or deprecated playbooks:
Playbook | Description | Code |
---|---|---|
dns.yml | For setting up CloudFlare DNS entries? (not used) | dns.yml |
pulsar.yml | Sets up Pulsar service at NMBU (for testing) | pulsar.yml |
jenkins.yml | Sets up Jenkins on the Main node (not used) | jenkins.yml |
logrotate.yml | Sets up logrotate? (test) | logrotate.yml |
All the Ansible roles required by the playbooks should already be included in the repository, but if you get complaints about missing roles or undefined variables when running the playbook, you can try to reinstall them by running the following command inside either `env/main` or `env/test`:
ansible-galaxy install -r requirements.yml
Note that the versions of the roles are pinned in the `env/common/requirements.yml` file. Future upgrades of UseGalaxy.no should probably also move to more recent versions of the roles.
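For illustration, pinned entries in a role requirements file look roughly like the sketch below; the role names are ones used by our playbooks, but the version numbers here are made up and not the actual pins.

```yaml
# Illustrative requirements.yml entries (version numbers are examples only)
- src: galaxyproject.galaxy
  version: 0.9.9
- src: galaxyproject.nginx
  version: 0.7.0
```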
If you get complaints about missing modules when running a playbook, you may have to install some Python dependencies by running the following command in the repository's root directory. (You can create a virtual environment to install the modules in if you want.)
pip install -r requirements.txt
The `galaxy.yml` playbook installs Galaxy, the NGINX web proxy and some other stuff on the Main node. This is the playbook that should be run whenever we need to upgrade Galaxy to a new version. Most of the work in the playbook is handled by "official" Ansible roles developed by the Galaxy community.
Service | Installed by Ansible role | Code |
---|---|---|
Galaxy | galaxyproject.galaxy | galaxyproject.galaxy |
NGINX | galaxyproject.nginx | galaxyproject.nginx |
CVMFS | galaxyproject.cvmfs | galaxyproject.cvmfs |
The `galaxyproject.galaxy` role installs Galaxy itself (plus dependencies in a virtual environment), guided by the configuration in `group_vars/galaxy.yml`.
The role will fetch the Galaxy codebase from GitHub, using the branch specified by the `galaxy_commit_id` variable. Unlike most other Galaxy-related settings, we decided to place this particular variable in the `group_vars/env.yml` file. This file is one of very few that have different versions for test and production in our playbook.
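For illustration, the variable is just a plain key in that file, along the lines of the sketch below (the value shown is an example, not the branch we actually track):

```yaml
# Illustrative entry in group_vars/env.yml (example value only)
galaxy_commit_id: release_21.09
```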
Besides `galaxy_commit_id`, one of the most important variables used by the role is `galaxy_layout`. This is a high-level setting that controls where all the files are placed on the server. We use the "root-dir" layout, which means that all the Galaxy code and configuration files will by default be installed into subdirectories of the `galaxy_root` directory. In our setup, the root directory is `/srv/galaxy/`. The Galaxy codebase will end up under `/srv/galaxy/server/`, and static configuration files are placed in `/srv/galaxy/config/`. For security reasons, these files are owned by the "root" OS user, whereas the Galaxy service is run by the "galaxy" OS user. This means that Galaxy does not have permission to change these files itself, so if a hacker somehow manages to run custom code through the Galaxy interface, they cannot destroy these files. Configuration files that Galaxy needs to modify dynamically are placed in `/srv/galaxy/var/` and are owned by the "galaxy" OS user. These include the HTML display "whitelist" file and anything related to tools installed from tool sheds, such as the tool wrappers themselves in `/srv/galaxy/var/shed_tools/`. Data files uploaded or created by users in their Galaxy histories are placed under the directory specified by the `galaxy_file_path` setting, which points to `/data/part0/` on our server.
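A minimal sketch of how these layout-related settings might look in `group_vars/galaxy.yml`, assuming the `galaxyproject.galaxy` role's usual variable names (check the actual file for the exact form):

```yaml
# Sketch of the layout settings described above (not a verbatim copy of group_vars/galaxy.yml)
galaxy_layout: root-dir        # high-level layout preset
galaxy_root: /srv/galaxy       # code under server/, static config under config/, mutable files under var/
galaxy_file_path: /data/part0  # where user datasets are stored (as referenced in the text above)
```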
All the settings listed under `galaxy_config` in `group_vars/galaxy.yml` are copied directly into the main Galaxy configuration file `/srv/galaxy/config/galaxy.yml` (after substituting Ansible variables, of course). These settings also refer to several other configuration files, which are copied from the `env/common/files/galaxy/config` and `env/common/templates/galaxy/config` directories in the playbook.
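For illustration, the `galaxy_config` variable mirrors the structure of Galaxy's own configuration file, roughly as sketched below; the individual settings and values shown here are placeholders, not the actual UseGalaxy.no configuration.

```yaml
# Illustrative shape of the galaxy_config variable (example settings only)
galaxy_config:
  galaxy:
    database_connection: "{{ db_connection }}"  # see the database.yml section below
    file_path: /data/part0                      # user datasets, as described above
    brand: UseGalaxy.no                         # example of a simple setting
```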
Our playbook also installs some basic dependencies on the Main node that are needed by the Galaxy role. After the Galaxy setup is complete, it also installs Singularity and the Slurm DRMAA package. Finally, the playbook ensures that certain files and directories have the right access permissions and ownerships, so that they can be accessed by the services that need them. Starting and stopping the Galaxy service and its mule handlers is handled by systemd, as set up by the `usegalaxy_eu.galaxy_systemd` role.
UseGalaxy.no has a few customizations compared to the out-of-the-box Galaxy server installed by the `galaxyproject.galaxy` role, including a few files that replace standard Galaxy codebase files. Most of this customization is handled by tasks defined in `tasks/galaxy.yml` and executed here. This includes changes to the welcome page and visual style, adding the NeLS OIDC backend and changing the login page, adding webhooks for history import/export between Galaxy and NeLS Storage, and adding the NeLS Storage remote data source plugin (configured here).
Previously, we also used the `usegalaxy-no.nels_storage` role to install two tools in the Tools Panel to import and export datasets between NeLS Storage and Galaxy. These tools are not used anymore, but the role also creates a configuration file for the NeLS Storage API, which is used by the NeLS Storage plugin (and possibly NGA). The values used for the settings in this configuration file are encrypted in the vault.
The `tasks/galaxy.yml` file also performs a few miscellaneous tasks, for instance setting up log directories and log rotation (via `tasks/logrotate.yml`), configuring some environment variables used by the gxadmin utility (previously installed by the `usegalaxy_eu.gxadmin` role, which no longer exists?), installing some Telegraf plugins, setting up cron jobs to remove deleted Galaxy history files and other tmp files, and some SELinux-related stuff.
The Galaxy codebase comes bundled with a separate Reports server, which can optionally be deployed alongside Galaxy to summarize and visualize statistics about users, data, tools and jobs obtained from the Galaxy database. We have deployed this Reports server at https://usegalaxy.no/reports. The configuration template for the server can be found in `env/common/templates/galaxy/config/reports.yml`, and the resulting file is placed in `/srv/galaxy/config/reports.yml`. We use systemd to manage this service instead of running the `run_reports.sh` script. This is also handled by the `usegalaxy_eu.galaxy_systemd` role.
The Galaxy service running on port 8080 on the Main node is reverse proxied by an NGINX web server, which is set up on the same node by the `galaxyproject.nginx` role.
Most of the configuration settings for this role are defined in `group_vars/galaxy.yml`. The role is configured to set up 3 web servers (virtual hosts), listed in the table below (a sketch of the corresponding role variables follows the table):
Virtual host | Port | Configuration template | Local file destination |
---|---|---|---|
galaxy | 443 | env/common/templates/nginx/galaxy.j2 | /etc/nginx/sites-enabled/galaxy |
galaxy-gie-proxy | 443 | env/common/templates/nginx/galaxy-gie-proxy.j2 | /etc/nginx/sites-enabled/galaxy-gie-proxy |
redirect-ssl | 80 | env/common/templates/nginx/redirect-ssl.j2 | /etc/nginx/sites-enabled/redirect-ssl |
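A minimal sketch of how this is typically expressed with the `galaxyproject.nginx` role's variables, assuming its usual `nginx_servers`/`nginx_ssl_servers` convention (check `group_vars/galaxy.yml` for the actual definition):

```yaml
# Sketch only: which virtual-host templates the NGINX role deploys (per the table above)
nginx_ssl_servers:
  - galaxy            # templates/nginx/galaxy.j2, served on port 443
  - galaxy-gie-proxy  # templates/nginx/galaxy-gie-proxy.j2, served on port 443
nginx_servers:
  - redirect-ssl      # templates/nginx/redirect-ssl.j2, served on port 80
```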
The insecure `redirect-ssl` host just redirects all plain HTTP traffic coming in on port 80 to the secure port 443. The `galaxy-gie-proxy` host enables support for interactive tools and redirects requests coming to `*.interactivetool.usegalaxy.no` to the "GIE proxy server", which is set up by the `usegalaxy_eu.gie_proxy` role. Note that we do not currently support interactive tools!
All other HTTPS requests are handled by the `galaxy` host, which mostly just forwards them to Galaxy's internal uWSGI server on port 8080. However, "static" files located in the `/srv/galaxy/server/static/` directory on the server and accessed from the https://usegalaxy.no/static/ subdirectory are served directly by NGINX itself. A few other subdirectories are also recognized and redirected to other services. For instance, requests referring to the https://usegalaxy.no/reports subdirectory will be redirected to the Reports server on port 9001, and requests referring to https://usegalaxy.no/nga will be redirected to the NGA server on port 8888.
We have configured the NGINX role to use Certbot (via the `usegalaxy_eu.certbot` role) to automatically obtain, install and renew SSL certificates from Let's Encrypt. The certificates are placed under `/etc/letsencrypt/live/` on the server. Additional configuration settings for Certbot can be found in the `group_vars/galaxy.yml` file.
Other SSL-related settings for NGINX, such as ciphers and the location of certificates, are inserted into the `galaxyproject.nginx` role's `ssl.conf.j2` template file and copied into `/etc/nginx/conf.d/ssl.conf`.
The Galaxy community has set up CVMFS servers to share a large number of reference datasets, including full genome sequences for several standard model organisms and corresponding indexes for many popular sequence mapping tools. Installing and configuring a CVMFS client on any machine allows this repository to be mounted locally as a read-only FUSE file system under `/cvmfs/data.galaxyproject.org/`. Files and metadata are fetched on the fly (and only when needed) from the CVMFS server using the HTTP protocol and then stored in a small local cache behind the scenes. This gives the appearance of having direct access to a large file repository (>4 TB) on the local machine without actually spending lots of resources.
CVMFS clients are installed on both the Main node (in the `galaxy.yml` playbook) and every Compute node (in the `slurm.yml` playbook) using the `galaxyproject.cvmfs` Ansible role.
The CVMFS configuration for the Compute nodes can be found in `group_vars/slurm.yml`: https://github.com/usegalaxy-no/infrastructure-playbook/blob/e37cdcd8b36f503af0bc995333f194ee81eda5b7/env/common/group_vars/slurm.yml#L91-L108
By setting the "galaxy_cvmfs_repos_enabled" variable to "config-repo" the "galaxyproject.cvmfs" role will automatically configure the CVMFS client with repositores for "data.galaxyproject.org" using a config repository (files can be found under /cvmfs/cvmfs-config.galaxyproject.org
)
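In other words, the client side boils down to a couple of role variables, roughly as sketched below (see the linked `group_vars/slurm.yml` for the actual values):

```yaml
# Sketch of the CVMFS client configuration described above (illustrative, not a verbatim copy)
cvmfs_role: client                        # this node is a CVMFS client, not a Stratum server
galaxy_cvmfs_repos_enabled: config-repo   # configure the galaxyproject.org repositories via the config repository
```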
Corresponding configuration for the different node types can be found in `cvmfsproxy.yml`:
- Main node: https://github.com/usegalaxy-no/infrastructure-playbook/blob/0f2cd5486dc4cbc4e1348e4654add6fe97e58ab1/env/common/cvmfsproxy.yml#L99-L134
- Compute nodes (`slurm` hosts group): https://github.com/usegalaxy-no/infrastructure-playbook/blob/0f2cd5486dc4cbc4e1348e4654add6fe97e58ab1/env/common/cvmfsproxy.yml#L136-L240
- Database node: https://github.com/usegalaxy-no/infrastructure-playbook/blob/0f2cd5486dc4cbc4e1348e4654add6fe97e58ab1/env/common/cvmfsproxy.yml#L242-L277
Both the Main node and the Compute nodes run the command `cvmfs_config probe` against all the CVMFS repositories to check that they work: https://github.com/usegalaxy-no/infrastructure-playbook/blob/0f2cd5486dc4cbc4e1348e4654add6fe97e58ab1/env/common/cvmfsproxy.yml#L279-L298
In addition to the Galaxy project repositories, we have also set up our own Stratum 0 server on the Database node to host the `/cvmfs/data.usegalaxy.no` reference data repository. This is done by the `cvmfs-server.yml` playbook described below.
The `database.yml` playbook installs a PostgreSQL database, which is used by Galaxy, on the Database node `db.usegalaxy.no`. The main work of this playbook is handled by the `galaxyproject.postgresql` Ansible role, and the `database_connection` setting in Galaxy's `config/galaxy.yml` file is specified by the `db_connection` variable defined in `group_vars/global.yml`. The connection string includes the database user defined in the same file, the database host taken from the hosts file, the name of the database defined in `group_vars/env.yml` (different for the test and production setups) and the database password encrypted in the vault in `secret_group_vars/global.vault`.
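Schematically, the assembled connection string has the usual PostgreSQL URL form. The sketch below is illustrative only, and the variable names in it are placeholders rather than the actual names used in our group_vars.

```yaml
# Illustrative only: placeholder variable names standing in for the real ones described above
db_connection: "postgresql://{{ galaxy_db_user }}:{{ galaxy_db_password }}@{{ groups['database'][0] }}/{{ galaxy_db_name }}"
```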
The playbook also installs a MySQL server using the `geerlingguy.mysql` Ansible role, an Apache server using the `geerlingguy.apache` role, and a RabbitMQ server via tasks defined in `tasks/rabbitmq.yml`. The MySQL database is used by Slurm for accounting (via the "slurmdbd" daemon), the Apache server is used to proxy the RabbitMQ service, and RabbitMQ is used by NGA.
The `system.yml` playbook is meant to be run against all the nodes in our infrastructure to perform some common setup tasks. For instance, it sets the hostname of the VM, installs some basic utilities/dependencies, disables IPv6 in the kernel, sets the SELinux policy and also sets the time zone.
OS users and groups are created in `tasks/users_groups.yml` (which also includes `tasks/_system_users.yml`). The configuration in `group_vars/global.yml` specifies which users/groups to create, viz. galaxy, galaxyadmin, sysadmin, slurm and docker (group only). The playbook also installs the `sudoers` file and adds SSH keys for galaxyadmins and sysadmins to the `~/.ssh/authorized_keys` files for these users. The SSH keys are stored in vault-encrypted files that can be found under `env/common/files/ssh`.
The playbook also performs miscellaneous tasks in `tasks/system.yml` that, among other things, install some scripts that are used by Telegraf to obtain status information from the system. Some of those tasks are node-specific, for instance related to the Postfix mail server on the Database node or NFS on the Main node.
Via ready-made Ansible roles, the playbook installs Fail2ban (an intrusion prevention software framework), Telegraf for monitoring and metrics collection (configured here, with extra plugins here and here), the Postfix mail server (configured here), the EPEL repository (Extra Packages for Enterprise Linux) and dhclient (the Dynamic Host Configuration Protocol client, which is configured here). The `weareinteractive.sudo` role is used to give sudo access to the telegraf user.
Note that firewall installation and configuration (using Firewalld) is not handled by this playbook but is usually done by including `tasks/firewall.yml` in other playbooks that target more specific host groups, such as `galaxy.yml`, `database.yml` and `slurm.yml`.
For clusters of nodes with CVMFS clients, the CVMFS documentation recommends setting up two or more forward proxy servers to act as caching proxies. This will "reduce the latency for the local worker nodes, which is critical for cold cache performance [and] also reduce the load on the Stratum 1 servers".
The `cvmfsproxy.yml` playbook installs Squid proxies on all the nodes in the "cvmfsproxy" hosts group (which currently contains the two NREC VMs `cvmfsproxy01.(test).usegalaxy.no` and `cvmfsproxy02.(test).usegalaxy.no`). It also configures the other nodes running CVMFS clients to use these proxies, simply by changing the `CVMFS_HTTP_PROXY` directive in the `/etc/cvmfs/default.local` configuration files to point to them. (The default value for this setting is DIRECT, which means that no proxies are used and the CVMFS clients should contact the Stratum server directly.)
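The end result on a client is a single directive in `/etc/cvmfs/default.local`. A task achieving this could look roughly like the sketch below; this is illustrative only, and the actual task in the playbook may be written differently.

```yaml
# Illustrative sketch: point the CVMFS clients at the two Squid proxies (port 3128, see below)
- name: Set CVMFS_HTTP_PROXY in /etc/cvmfs/default.local
  ansible.builtin.lineinfile:
    path: /etc/cvmfs/default.local
    regexp: '^CVMFS_HTTP_PROXY='
    line: 'CVMFS_HTTP_PROXY="http://cvmfsproxy01.usegalaxy.no:3128|http://cvmfsproxy02.usegalaxy.no:3128"'
```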
The playbook installs the packages for Squid and Firewalld via standard package managers. These services should be managed by systemd and are thus enabled and started by the playbook. The `squid -z` command will trigger Squid to create any missing cache directories, if necessary. The configuration file for Squid is copied from the template in `env/common/templates/cvmfs/squid.conf.j2` and installed into `/etc/squid/squid.conf` on the proxy node. Some of the settings used in the configuration are defined in the `group_vars/cvmfsproxy.yml` file (which is different for test and production since it contains hardcoded references!).
The "ssh" port (22) is added to the "public zone" of the firewall to allow admins (and Ansible) to login via SSH. The "squid" port (3128) is next added to the "internal zone" to allow communication between Squid and the CVMFS clients. (Firewalld already knows about both of these services, which is why they can be referenced by name instead of port number. See /usr/lib/firewalld/services/
.) The IP ranges of potential CVMFS clients are also added as sources to the "internal zone".
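A hedged sketch of what such firewall tasks typically look like with the `ansible.posix.firewalld` module; the actual tasks in the playbook may differ, and the client source range below is a made-up example rather than our real network.

```yaml
# Illustrative firewalld tasks for the proxy nodes (zones and services per the text above)
- name: Allow SSH in the public zone
  ansible.posix.firewalld:
    service: ssh
    zone: public
    permanent: true
    state: enabled

- name: Allow Squid (port 3128) in the internal zone
  ansible.posix.firewalld:
    service: squid
    zone: internal
    permanent: true
    state: enabled

- name: Add CVMFS client networks as sources in the internal zone
  ansible.posix.firewalld:
    source: 10.0.0.0/24  # example range only
    zone: internal
    permanent: true
    state: enabled
```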