Workshop: running longitudinal GWAS with Nextflow's longgwas

Authors: Alejandro Martinez-Carrasco & Michael Ta (UCL & datatecnica)
Date: 6 November 2023
Output: html_document (df_print: kable, keep_md: true)

Workshop prerequisites

Before you start, make sure you meet the following prerequisites so that you can take part in the hands-on sections of this workshop.

- Access to an Ubuntu Linux distribution, in any of these forms:
  - Your personal Ubuntu Linux installation (e.g. Ubuntu on a VM or on WSL)
  - Ubuntu Linux on a High Performance Computing (HPC) cluster
  - An Ubuntu Linux virtual machine in Terra
- Bash version 3.2 or later.
- Java Runtime Environment (JRE) 11 or higher. longgwas now uses Nextflow DSL2, so JRE 11 or higher must be available on your Ubuntu Linux to run the longgwas workflows.
- Docker or Docker Desktop. Nextflow runs each process of a workflow inside a Docker container, so all workflow dependencies and their versions are fixed beforehand. This makes the tool portable and the analyses reproducible.
- Nextflow.
- Git. We need git to copy the tool from its GitHub remote to our local directory.
- A modified /etc/default/grub. To let Nextflow manage memory resources within Docker containers, set the following line in /etc/default/grub, then regenerate the GRUB configuration (on Ubuntu, typically with sudo update-grub) and reboot:

GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"

- A clone of the longgwas tool in your working directory:

git clone --single-branch --branch modularize git@github.com:michael-ta/longitudinal-GWAS-pipeline.git
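The prerequisites above can be checked quickly from the terminal. The following is a minimal sketch, not part of longgwas: the helper names version_ge and check_tool are introduced here for illustration, and the version comparison assumes GNU sort (the default on Ubuntu). It only reports whether each tool is on the PATH; the Java version still needs to be confirmed by hand with java -version.

```shell
# Sketch: report which workshop prerequisites are available on this machine.

# Succeeds when version "$1" is greater than or equal to version "$2".
# Relies on GNU sort's -V (version sort) option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Print OK or MISSING for a command that should be on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "OK       $1"
    else
        echo "MISSING  $1"
    fi
}

# Bash must be version 3.2 or later.
bash_version=$(bash -c 'echo "${BASH_VERSINFO[0]}.${BASH_VERSINFO[1]}"')
if version_ge "$bash_version" "3.2"; then
    echo "OK       bash $bash_version"
else
    echo "TOO OLD  bash $bash_version (3.2 or later is required)"
fi

# The remaining tools only need to be installed and on the PATH.
for tool in java docker nextflow git; do
    check_tool "$tool"
done
```

Any line starting with MISSING points at a prerequisite that still has to be installed before the hands-on sections.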

Outline

Introduction to Nextflow and longgwas
Get familiar with longgwas components
longgwas workflow summary
Building your first nextflow process
Running longitudinal GWASs
Summary

1. Introduction to Nextflow

To get started, we are going to follow a brief presentation to understand what Nextflow is.
Then, we will introduce the longgwas tool. Finally, we will review all longgwas components and capabilities.

2. Get familiar with longgwas components

longgwas is hosted in a GitHub remote repository, which is where all of its development takes place.

2.1 Modules, subworkflows and workflows

longgwas has three main components everyone should be familiar with:

Modules: These are the different processes that together make up a subworkflow.
Subworkflow: A subworkflow combines processes from one or several modules. Based on user inputs, the subworkflow selects which of the available modules to include.
Workflow: A workflow encloses the invocation of one or more processes and operators. In our case, these processes and operators are wrapped up in subworkflows.

As an example, we can ask how the GWAS analysis is run within the longgwas workflow once all the QC has been performed.

Which are the modules? We can look at the content of the gwasrun module. There are three process definitions, one per Nextflow file.

In which subworkflow are the modules included? These three modules are included in the rungwas subworkflow. Based on user inputs, this subworkflow lets us run any of the three main models currently available through longgwas (GLM, CPH, GALLOP-LMM).

Do we see the GWAS subworkflow in the workflow? Finally, the subworkflow is included in the main workflow as one step.

2.2 config and yml file

In addition, there are other key components that allow the workflow to run:

.config file: We use the config file to set general specifications for each of the executors available to run longgwas.
.yml file: We use this file to set the longgwas arguments so that the run matches our needs.
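To make the split between the two files concrete, the sketch below writes a toy config file and a toy parameters file into a scratch directory. It is an illustration under assumptions, not a copy of the longgwas files: the profile name standard matches the one used when running the workflow, but the contents of the profile and every parameter name in params.yml are hypothetical. The real parameter names are listed in the longgwas documentation.

```shell
# Sketch: write a toy config file and a toy parameters file to a scratch directory.
config_sketch_dir=$(mktemp -d)

# The config file holds executor-level settings, grouped into profiles.
cat > "$config_sketch_dir/nextflow.config" <<'EOF'
// Assumed contents: one profile that runs tasks locally inside Docker.
profiles {
    standard {
        process.executor = 'local'
        docker.enabled   = true
    }
}
EOF

# The yml file holds the arguments of one particular analysis.
cat > "$config_sketch_dir/params.yml" <<'EOF'
# Hypothetical parameter names, for illustration only.
input_genotypes: "/path/to/genotypes"
covariates_file: "/path/to/covariates.tsv"
model: "glm"
EOF

echo "Wrote sketch files to $config_sketch_dir"
```

Keeping executor settings in the config file and analysis arguments in the yml file means the same analysis can be moved to another executor by switching profile, without touching the arguments.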

2.3 Dockerfile and docker image hosted on Docker Hub

There is a Dockerfile that specifies all the software, dependencies and versions longgwas uses to run the main workflow.
However, you no longer need to build the Docker image yourself. We currently host the longgwas Docker image on Docker Hub, which means that, as long as you have Docker installed, the tool will automatically pull the image from the Hub when you run longgwas.

2.4 Documentation pages

longgwas has a very good online documentation resource. It explains how to run longgwas and gives a thorough description of all the parameters the tool supports.

3. Workflow summary

The workflow to run longgwas can be thought of as two fairly simple steps:

Terra step. We filter the genetic data for the cohorts we want to include. We also process the clinical data to generate a covariates file, obtain our model outcomes, and perform some data QC if needed.
longgwas step. longgwas can be run in two different ways:
- Using a local executor
- Using the google-batch executor

We won't go through the data preparation step in Terra today, as it is outside the scope of this workshop, but I have added an example notebook to illustrate it. It is available on GitHub, so you can download and reuse it.

4. Building your first nextflow process

We are going to go through three examples of running Nextflow workflows, getting hands-on experience with some Nextflow components. Please clone the GitHub repository if you have not done so yet.

git clone git@github.com:AMCalejandro/longgwas_workshop.git

Example 1

A very simple example to get familiar with the process and workflow keywords in Nextflow.
It is a convenient way to familiarise yourself with dataflow: where does my data go after the process has run in the workflow?
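As a rough idea of what such a script looks like, the sketch below writes a minimal DSL2 script with one process and one workflow block. It is not the workshop's Example 1: the file name hello.nf and the process name SAY_HELLO are made up for illustration. It shows where the data goes: whatever the process prints is emitted on its output channel, SAY_HELLO.out, which the workflow then views.

```shell
# Sketch: write a minimal Nextflow DSL2 script to a scratch directory.
# hello.nf and SAY_HELLO are hypothetical names, not part of the workshop repository.
hello_dir=$(mktemp -d)

cat > "$hello_dir/hello.nf" <<'EOF'
// A process wraps one task; its script section here is plain Bash.
process SAY_HELLO {
    output:
    stdout

    script:
    """
    echo 'Hello from a Nextflow process'
    """
}

// The workflow block invokes the process and prints what it emits.
workflow {
    SAY_HELLO()
    SAY_HELLO.out.view()
}
EOF

echo "Wrote $hello_dir/hello.nf"
# With Nextflow installed, the script can be run with:
#   nextflow run "$hello_dir/hello.nf"
```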

Example 2

This example introduces Nextflow channels and how they are used together with processes.
We will then try to add an extra process that makes use of the data coming out of the first process.
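The idea of chaining two processes through channels can be sketched as follows. This is not the workshop's Example 2: the file name chain.nf and the process names TO_UPPER and ADD_SUFFIX are hypothetical. A channel feeds three values into the first process, and the output channel of the first process becomes the input of the second.

```shell
# Sketch: write a two-process DSL2 script to a scratch directory.
# chain.nf, TO_UPPER and ADD_SUFFIX are hypothetical names for illustration.
chain_dir=$(mktemp -d)

cat > "$chain_dir/chain.nf" <<'EOF'
// First process: runs once per value received on its input channel.
process TO_UPPER {
    input:
    val word

    output:
    stdout

    script:
    """
    printf '%s' '${word}' | tr 'a-z' 'A-Z'
    """
}

// Second process: consumes whatever the first process emitted.
process ADD_SUFFIX {
    input:
    val shout

    output:
    stdout

    script:
    """
    echo '${shout}!'
    """
}

workflow {
    words = Channel.of('glm', 'cph', 'gallop')
    TO_UPPER(words)
    ADD_SUFFIX(TO_UPPER.out)
    ADD_SUFFIX.out.view()
}
EOF

echo "Wrote $chain_dir/chain.nf"
# With Nextflow installed, the script can be run with:
#   nextflow run "$chain_dir/chain.nf"
```

Because each value travels through the channel independently, the three tasks may finish in any order.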

Example 3

A very complete example provided by the Nextflow training team. It is a great one because it makes it easy to understand all the components that make up a Nextflow script.

5. Running longitudinal GWASs

Before getting started, please clone the modularize branch of the longgwas GitHub repository if you have not done so yet.

git clone --single-branch --branch modularize git@github.com:michael-ta/longitudinal-GWAS-pipeline.git

Now that we have seen some basic examples of running Nextflow, we are going to run our very first job with longgwas. To do so, we are going to go through the following steps together.

Define arguments and data paths in the yml file
Choose one of the executors available in the config file
Run longgwas with one simple command

5.1 Run longgwas analysis

Once we have applied all the changes to the yml file, we can run longgwas with a local executor. We are going to apply several changes so that:

We modify the filtering parameters based on our needs
We specify whether to run GLM, CPH, or GALLOP-LMM
We correctly define the input data for the tool

To run the analysis, we will repeatedly use the following command

nextflow run workflows/main.nf \
  -params-file params.yml \
  -profile standard
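Because the same command is launched repeatedly, it can be convenient to wrap it in a small shell function. The sketch below is an assumption-laden convenience, not part of longgwas or Nextflow: the function name run_longgwas and the DRY_RUN variable are introduced here for illustration. Note that Nextflow's own options, such as -params-file and -profile, take a single dash, whereas double-dash options are passed to the pipeline as parameters.

```shell
# Sketch: launch longgwas with a given params file and profile.
# Usage: run_longgwas PARAMS_FILE [PROFILE]
# Set DRY_RUN=1 to print the command instead of launching it.
run_longgwas() {
    params_file="$1"
    profile="${2:-standard}"

    # Fail early when the params file is missing.
    if [ ! -f "$params_file" ]; then
        echo "params file not found: $params_file" >&2
        return 1
    fi

    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is set and non-empty.
    ${DRY_RUN:+echo} nextflow run workflows/main.nf \
        -params-file "$params_file" \
        -profile "$profile"
}

# Example calls, to be run from the root of the cloned pipeline repository:
#   DRY_RUN=1 run_longgwas params.yml standard
#   run_longgwas params.yml
```

The dry-run mode is a cheap way to confirm which params file and profile are about to be used before starting a long analysis.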

5.2 Demo longitudinal GWASs with cloud batch

Now I am going to give a quick demonstration of how we can connect to Google Cloud and run our analyses using Nextflow's Google Cloud Batch executor.

6. Summary

Nextflow is cool.
longgwas allows us to run different types of longitudinal analyses from one simple command.
There are two steps that need to be performed to run longgwas:
- Prepare the data in Terra.
- Run the longgwas job in the background using Google Cloud or HPC resources.
Thanks to the abstraction provided by Nextflow executors, we can run our portable pipeline on other systems such as Google Cloud Platform.
