title | date
---|---
Workshop: running longitudinal GWAS with the Nextflow pipeline longgwas | 6 November 2023
Before starting this workshop, make sure you meet the following prerequisites so that you can follow the hands-on parts.
- Have access to an Ubuntu Linux distribution, through one of:
  - A personal Ubuntu installation (e.g. Ubuntu on a VM or Ubuntu on WSL)
  - Ubuntu on a High Performance Computing (HPC) cluster
  - An Ubuntu virtual machine in Terra
- Bash version 3.2 or later on your Linux distribution.
- Java Runtime Environment (JRE) 11 or higher. longgwas now uses Nextflow DSL2, and running its workflows requires JRE 11 or higher on your Ubuntu system.
- Install Docker or Docker Desktop. Nextflow runs the processes that make up a workflow inside Docker containers, so that all workflow dependencies and their versions are fixed beforehand. This makes the tool portable and its results reproducible.
- Install Nextflow.
- Install git on Ubuntu. We need git to copy the tool from its GitHub remote to a local directory.
- Modify /etc/default/grub. To let Nextflow manage memory resources within Docker containers, set the following line in that file, then run sudo update-grub and reboot so that the change takes effect:
  GRUB_CMDLINE_LINUX="cgroup_enable=memory swapaccount=1"
- Clone the longgwas tool into your working directory:
  git clone --single-branch --branch modularize git@github.com:michael-ta/longitudinal-GWAS-pipeline.git
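The prerequisites above can be checked quickly from a terminal. The following is a minimal sketch, not part of longgwas: the helper names (check_tool, bash_version_ok, grub_memory_cgroup_set) are made up for illustration, and the script only tests whether each tool is on the PATH. It does not verify the Java version, so it is still worth running java -version by hand.

```shell
# Sketch of a prerequisite check; all helper names are illustrative only.

# Print "<tool>: found" or "<tool>: missing" without aborting the script.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Succeed when the installed Bash is version 3.2 or later.
bash_version_ok() {
    command -v bash >/dev/null 2>&1 || return 1
    major=$(bash -c 'echo "${BASH_VERSINFO[0]}"')
    minor=$(bash -c 'echo "${BASH_VERSINFO[1]}"')
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 2 ]; }
}

# Succeed when the given grub defaults file enables the memory cgroup.
grub_memory_cgroup_set() {
    grep -q '^GRUB_CMDLINE_LINUX=.*cgroup_enable=memory' "$1" 2>/dev/null
}

for tool in java docker nextflow git; do
    check_tool "$tool"
done

if bash_version_ok; then
    echo "bash: 3.2 or later"
else
    echo "bash: too old or missing"
fi

if grub_memory_cgroup_set /etc/default/grub; then
    echo "grub: memory cgroup enabled"
else
    echo "grub: memory cgroup not set in /etc/default/grub"
fi
```

Any line reporting "missing" points at the prerequisite that still needs to be installed before the hands-on part.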
- Introduction to Nextflow and longgwas
- Get familiar with longgwas components
- longgwas workflow summary
- Building your first nextflow process
- Running longitudinal GWASs
- Summary
To get started, we are going to follow a brief presentation to understand what Nextflow is.
Then, we will introduce the longgwas tool.
Finally, we will review all longgwas components and capabilities.
longgwas is hosted on GitHub, which is also where its development takes place.
It has three main components everyone should be familiar with:
- Modules: the individual processes that together make up a subworkflow.
- Subworkflows: combinations of processes from one or several modules. Based on user inputs, a subworkflow selects which of the available modules to include.
- Workflow: a workflow encloses the invocation of one or more processes and operators. In our case, these are multiple processes and operators wrapped up in subworkflows.
As an example, consider how the GWAS analysis is run within the longgwas workflow once all the QC has been performed.
Which are the modules? Look at the content of the gwasrun module. There are three process definitions, one per Nextflow file.
In which subworkflow are the modules included? These three modules are included in the rungwas subworkflow. Based on user inputs, this subworkflow runs one of the three main models currently available through longgwas (GLM, CPH, GALLOP-LMM).
Do we see the GWAS subworkflow in the workflow? Yes: the subworkflow is included in the main workflow as one step.
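To make the module, subworkflow and workflow layering concrete, here is a toy DSL2 project written from the shell. It is a sketch under stated assumptions and not the longgwas code: the file names, the process names GLM and CPH, the subworkflow name RUN_MODEL and the params.model parameter are all hypothetical, and the processes only echo a message instead of running a model.

```shell
# Write a toy DSL2 project into a scratch directory (all names hypothetical).
sketch_dir=$(mktemp -d)
mkdir -p "$sketch_dir/modules" "$sketch_dir/subworkflows"

# Modules: one process per file, each standing in for one model.
cat > "$sketch_dir/modules/glm.nf" <<'EOF'
process GLM {
    input:
    val chunk

    output:
    stdout

    script:
    """
    echo "GLM on ${chunk}"
    """
}
EOF

cat > "$sketch_dir/modules/cph.nf" <<'EOF'
process CPH {
    input:
    val chunk

    output:
    stdout

    script:
    """
    echo "CPH on ${chunk}"
    """
}
EOF

# Subworkflow: includes both modules and picks one based on a parameter.
cat > "$sketch_dir/subworkflows/run_model.nf" <<'EOF'
include { GLM } from '../modules/glm'
include { CPH } from '../modules/cph'

workflow RUN_MODEL {
    take:
    chunks

    main:
    if (params.model == 'cph') {
        CPH(chunks)
        results = CPH.out
    }
    else {
        GLM(chunks)
        results = GLM.out
    }

    emit:
    results
}
EOF

# Main workflow: the subworkflow is just one step.
cat > "$sketch_dir/main.nf" <<'EOF'
params.model = 'glm'

include { RUN_MODEL } from './subworkflows/run_model'

workflow {
    RUN_MODEL(Channel.of('chr21', 'chr22'))
    RUN_MODEL.out.results.view()
}
EOF

echo "Toy project written to $sketch_dir"
```

With Nextflow installed, running nextflow run main.nf --model cph from inside that directory should exercise the CPH branch, while omitting --model falls back to the GLM branch.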
In addition, there are other key components that allow the workflow to run:
- .config file: We use the config file to set up general specifications for each of the executors we have available to run longgwas
- .yml file: We use this file to modify the longgwas arguments so that we can run longgwas based on our needs.
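As an illustration of how a config file can describe several executors, the sketch below writes a small nextflow.config with two profiles. It is not the config shipped with longgwas: the standard profile name comes from the run command used in this workshop, while the gbatch profile name, the bucket, the project ID and the region are placeholders chosen for this example.

```shell
# Write an example nextflow.config with one profile per executor.
config_dir=$(mktemp -d)

cat > "$config_dir/nextflow.config" <<'EOF'
profiles {
    // Run every process on the local machine, inside Docker containers.
    standard {
        process.executor = 'local'
        docker.enabled   = true
    }

    // Hypothetical profile name; bucket, project and region are placeholders.
    gbatch {
        process.executor = 'google-batch'
        workDir          = 'gs://my-bucket/work'
        google.project   = 'my-project-id'
        google.location  = 'us-central1'
    }
}
EOF

echo "Wrote $config_dir/nextflow.config"
```

A profile is then selected at launch time with the -profile option, so the same workflow code can run locally or on Google Cloud Batch without being edited.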
There is a Dockerfile that contains all software, dependencies and versions longgwas uses to run the main workflow.
However, you no longer need to build the Docker image yourself. The longgwas Docker image is currently hosted on Docker Hub, which means that, as long as you have Docker installed, the tool will automatically pull the image when you run longgwas.
longgwas has thorough online documentation. It explains how to run longgwas and describes all the parameters the tool supports.
Running longgwas can be thought of as two fairly simple steps:
- Terra step. We filter the genetic data for the cohorts we want to include. We also process the clinical data to generate a covariates file and the model outcomes, and do some data QC if needed.
- longgwas step. longgwas can be run in two different ways:
  - Using a local executor
  - Using the google-batch executor
We won't go through the data preparation step in Terra today, as it is outside the scope of this workshop, but I have added an example notebook as a quick reference. It is available on GitHub, so you can download and reuse it.
We are going to go through three examples of running Nextflow workflows, getting hands-on experience with some Nextflow components. Please clone the GitHub repository if you have not done so yet.
git clone git@github.com:AMCalejandro/longgwas_workshop.git
A very simple example to get familiar with the Nextflow process and workflow keywords.
It is a convenient way to familiarise yourself with dataflow: where does my data go after a process runs in the workflow?
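For readers who want something to type before opening the workshop repository, here is a minimal sketch of such a script. It is not the file from the workshop repository, and the process name SAY_HELLO and the file name hello.nf are made up for this example.

```shell
# Write a one-process Nextflow script into a scratch directory.
hello_dir=$(mktemp -d)

cat > "$hello_dir/hello.nf" <<'EOF'
// A process wraps one task; here it writes a small text file.
process SAY_HELLO {
    output:
    path 'hello.txt'

    script:
    """
    echo 'Hello from Nextflow' > hello.txt
    """
}

// The workflow calls the process and prints what comes out of it.
workflow {
    SAY_HELLO()
    SAY_HELLO.out.view()
}
EOF

echo "Run it with: cd $hello_dir && nextflow run hello.nf"
```

The view operator prints the path of hello.txt, which sits inside Nextflow's work directory rather than next to the script. That is the answer to the dataflow question above: process outputs live in a task-specific work folder and travel to the next step through an output channel.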
This example introduces Nextflow channels and how they are used together with processes.
We will then try to add an extra process that uses the data coming out of the first process.
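The idea of feeding one process with the output of another can be sketched as follows. This is an illustrative script rather than the workshop's own example: the process names TO_UPPER and COUNT_CHARS, the channel contents and the file name channels.nf are assumptions made for this sketch.

```shell
# Write a two-process Nextflow script that chains processes through channels.
channels_dir=$(mktemp -d)

cat > "$channels_dir/channels.nf" <<'EOF'
// First process: receives one value per task and upper-cases it.
process TO_UPPER {
    input:
    val word

    output:
    stdout

    script:
    """
    printf '%s' "${word}" | tr '[:lower:]' '[:upper:]'
    """
}

// Second process: consumes whatever the first process emitted.
process COUNT_CHARS {
    input:
    val word

    output:
    stdout

    script:
    """
    printf '%s' "${word}" | wc -c
    """
}

workflow {
    // A channel emitting three values; each one becomes its own task.
    words = Channel.of('glm', 'cph', 'gallop')
    TO_UPPER(words)
    COUNT_CHARS(TO_UPPER.out)
    COUNT_CHARS.out.view()
}
EOF

echo "Run it with: cd $channels_dir && nextflow run channels.nf"
```

Because the channel emits three items, Nextflow launches three TO_UPPER tasks and then three COUNT_CHARS tasks, and the order in which results are printed is not guaranteed.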
A very complete example provided by the Nextflow training team. It is a great resource because it makes it easy to understand all the components of a Nextflow script.
Before getting started, please clone the modularize branch of the longgwas GitHub remote if you have not done so yet.
git clone --single-branch --branch modularize git@github.com:michael-ta/longitudinal-GWAS-pipeline.git
Now that we have seen some basic examples of running Nextflow, we are going to run our very first job with longgwas. To do so, we will go through the following steps together.
- Define arguments and data paths in the yml file
- Choose one of the executors available in the config file
- Run longgwas with one simple command
Once we have applied all the changes on the yml file, we can run with a local executor. We are going to apply several changes so that:
- We modify the filtering parameters based on our needs
- We specify to run either GLM, CPH, or GALLOP-LMM
- We correctly define the input data for the tool
To run the analysis, we will repeatedly use the following command:
nextflow run workflows/main.nf \
-params-file params.yml \
-profile standard
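Since the same command is run many times while the yml file is being edited, it can be convenient to wrap it in a small shell function. The sketch below is only a suggestion: the function name run_longgwas is made up, and it adds Nextflow's -resume option so that tasks whose inputs did not change are reused from the cache.

```shell
# Hypothetical convenience wrapper around the workshop's run command.
# Usage: run_longgwas [params-file] [profile]
run_longgwas() {
    params_file=${1:-params.yml}
    profile=${2:-standard}

    # Fail early with a clear message instead of a Nextflow stack trace.
    if [ ! -f "$params_file" ]; then
        echo "params file not found: $params_file" >&2
        return 1
    fi
    if ! command -v nextflow >/dev/null 2>&1; then
        echo "nextflow not found on PATH" >&2
        return 127
    fi

    # -resume reuses cached results from previous runs where possible.
    nextflow run workflows/main.nf \
        -params-file "$params_file" \
        -profile "$profile" \
        -resume
}
```

Called from the root of the cloned repository, run_longgwas with no arguments launches the workflow with params.yml and the standard profile, and a different yml file or profile can be passed as the first and second argument.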
Now I am going to give a quick demonstration of how we can connect to Google Cloud and run our analyses using the Nextflow Google Cloud Batch executor.
- Nextflow is cool.
- longgwas allows us to run different types of longitudinal analyses from one simple command.
- There are two steps that need to be performed to run longgwas:
  - Prepare the data in Terra.
  - Run the longgwas job in the background using Google or HPC resources.
- Thanks to the abstraction provided by Nextflow executors, we can run our portable pipeline on other systems such as Google Cloud Platform.