[WiP] add nextflow-draft #45

@@ -0,0 +1,8 @@
---
title: "Singapore"
chapter: false
disableToc: true
hidden: true
---

Create a Cloud9 Environment: [https://ap-southeast-1.console.aws.amazon.com/cloud9/home?region=ap-southeast-1](https://ap-southeast-1.console.aws.amazon.com/cloud9/home?region=ap-southeast-1)

@@ -0,0 +1,8 @@
---
title: "Ireland"
chapter: false
disableToc: true
hidden: true
---

Create a Cloud9 Environment: [https://eu-west-1.console.aws.amazon.com/cloud9/home?region=eu-west-1](https://eu-west-1.console.aws.amazon.com/cloud9/home?region=eu-west-1)

@@ -0,0 +1,8 @@
---
title: "N. Virginia"
chapter: false
disableToc: true
hidden: true
---

Create a Cloud9 Environment: [https://us-east-1.console.aws.amazon.com/cloud9/home?region=us-east-1](https://us-east-1.console.aws.amazon.com/cloud9/home?region=us-east-1)

@@ -0,0 +1,8 @@
---
title: "Ohio"
chapter: false
disableToc: true
hidden: true
---

Create a Cloud9 Environment: [https://us-east-2.console.aws.amazon.com/cloud9/home?region=us-east-2](https://us-east-2.console.aws.amazon.com/cloud9/home?region=us-east-2)

@@ -0,0 +1,8 @@
---
title: "Oregon"
chapter: false
disableToc: true
hidden: true
---

Create a Cloud9 Environment: [https://us-west-2.console.aws.amazon.com/cloud9/home?region=us-west-2](https://us-west-2.console.aws.amazon.com/cloud9/home?region=us-west-2)

@@ -0,0 +1,43 @@
---
title: "Create a Workspace"
chapter: false
weight: 30
---

{{% notice warning %}}
If you are running the workshop on your own, the Cloud9 workspace should be built by an IAM user with Administrator privileges, not the root account user. Please ensure you are logged in as an IAM user, not the root
account user.
{{% /notice %}}

{{% notice info %}}
If you are at an AWS hosted event (such as re:Invent, KubeCon, Immersion Day, or any other event hosted by
an AWS employee), follow the instructions on which region should be used to launch resources.
{{% /notice %}}

{{% notice tip %}}
Ad blockers, JavaScript disablers, and tracking blockers should be disabled for
the Cloud9 domain; otherwise, connecting to the workspace might fail.
Cloud9 requires third-party cookies. You can allowlist the [specific domains](https://docs.aws.amazon.com/cloud9/latest/user-guide/troubleshooting.html#troubleshooting-env-loading).
{{% /notice %}}

### Launch Cloud9 in your closest region:

{{< tabs name="Region" >}}
{{< tab name="Oregon" include="30_us-west-2.md" />}}
{{< tab name="Ireland" include="30_eu-west-1.md" />}}
{{< tab name="N. Virginia" include="30_us-east-1.md" />}}
{{< tab name="Ohio" include="30_us-east-2.md" />}}
{{< tab name="Singapore" include="30_ap-southeast-1.md" />}}
{{< /tabs >}}

- Select **Create environment**
- Name it **nextflowworkshop**, and take all other defaults
- When it opens, customize the environment by closing the **welcome tab**
and **lower work area**, and opening a new **terminal** tab in the main work area:
![c9before](/images/nextflow-on-aws-batch/prerequisites/c9before.png)

- Your workspace should now look like this:
![c9after](/images/nextflow-on-aws-batch/prerequisites/c9after.png)

- If you like this theme, you can choose it yourself by selecting **View / Themes / Solarized / Solarized Dark**
in the Cloud9 workspace menu.
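
All five region tabs follow the same Cloud9 console URL pattern: the region code appears both in the hostname and in the `region` query parameter. As a small sketch (the function name `cloud9_console_url` is hypothetical, not part of the workshop), the link for any region code can be generated like this:

```bash
# Hypothetical helper: print the Cloud9 console URL for a region code.
# The region appears twice: in the hostname and in the ?region= parameter.
cloud9_console_url() {
  local region="$1"
  if [[ -z "$region" ]]; then
    echo "usage: cloud9_console_url <region-code>" >&2
    return 1
  fi
  printf 'https://%s.console.aws.amazon.com/cloud9/home?region=%s\n' "$region" "$region"
}
```

For example, `cloud9_console_url us-east-1` prints the N. Virginia link.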

@@ -0,0 +1,55 @@
---
title: "Resize Root Volume"
chapter: false
weight: 31
---

## Resize Cloud9 EBS

The default 10 GB root volume is too small for the Docker images used in genomics workflows.
Resize the EBS volume attached to the Cloud9 instance.

To change the EBS volume:

1. Select the Cloud9 instance in the EC2 console ([deep link to get there](https://console.aws.amazon.com/ec2/v2/home)).
2. Click the root device link.
3. Click the EBS ID in the box that appears.

![](/images/nextflow-on-aws-batch/prerequisites/resize_ebs_0.png)

Then modify the EBS volume.

![](/images/nextflow-on-aws-batch/prerequisites/resize_ebs_1.png)

Choose a new volume size (e.g. 100 GB).

![](/images/nextflow-on-aws-batch/prerequisites/resize_ebs_2.png)

{{% notice info %}}
Please make sure that the changes went through and that the EBS volume now shows its new size.
{{% /notice %}}
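
If you prefer the terminal, the console steps above roughly correspond to two AWS CLI calls. This is a sketch under assumptions, not a workshop step: it expects credentials that may call `ec2:DescribeInstances` and `ec2:ModifyVolume`, a reachable instance metadata service (IMDSv1, as used elsewhere in this workshop), and a default Cloud9 instance whose first (and only) block device is the root volume. `resize_root_volume` is a hypothetical name.

```bash
# Hypothetical helper: grow the root EBS volume of the Cloud9 instance this runs on.
# Assumes IMDSv1 is reachable and the root volume is the first block-device mapping.
resize_root_volume() {
  local size_gb="${1:-100}"   # new size in GiB (100 matches the console example)
  local instance_id volume_id

  # ID of the EC2 instance behind the Cloud9 workspace
  instance_id=$(curl --silent http://169.254.169.254/latest/meta-data/instance-id)

  # Volume ID of the first (on a default Cloud9 instance, only) EBS volume
  volume_id=$(aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].BlockDeviceMappings[0].Ebs.VolumeId' \
    --output text)

  echo "Resizing ${volume_id} on ${instance_id} to ${size_gb} GiB"
  aws ec2 modify-volume --volume-id "$volume_id" --size "$size_gb"
}
```

Run `resize_root_volume 100`, then follow progress with `aws ec2 describe-volumes-modifications` before growing the file system.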

## Resize FS

Changing the block device does not increase the size of the file system.

To grow the file system, head back to the Cloud9 terminal and run the following commands:

```bash
# Grow partition 1 of the root disk to fill the resized EBS volume
sudo growpart /dev/xvda 1
# Grow the ext4 file system mounted at / to fill the partition
sudo resize2fs "$(findmnt -n -o SOURCE /)"
```

The root file system should now show 99 GB:

```bash
df -h
```

```
Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        483M   60K  483M   1% /dev
tmpfs           493M     0  493M   0% /dev/shm
/dev/xvda1       99G  8.0G   91G   9% /
```
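
The commands above assume the Xen-style device name `/dev/xvda` and an ext4 root file system, which fits the Cloud9 instance used here. On Nitro-based instance types the root disk typically appears as `/dev/nvme0n1`, and Amazon Linux 2 formats its root volume as XFS, which needs `xfs_growfs` instead of `resize2fs`. A minimal sketch that handles both cases (the function names are hypothetical):

```bash
# Hypothetical helpers: grow the root partition and file system after an EBS resize.

# Split a partition device into "<disk> <partition-number>",
# e.g. /dev/xvda1 -> "/dev/xvda 1", /dev/nvme0n1p1 -> "/dev/nvme0n1 1".
split_partition() {
  local dev="$1"
  if [[ $dev =~ ^(/dev/nvme[0-9]+n[0-9]+)p([0-9]+)$ ]]; then
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  elif [[ $dev =~ ^(/dev/[a-z]+)([0-9]+)$ ]]; then
    echo "${BASH_REMATCH[1]} ${BASH_REMATCH[2]}"
  else
    echo "unrecognized device name: $dev" >&2
    return 1
  fi
}

grow_root_fs() {
  local src fstype disk part parts
  src=$(findmnt -n -o SOURCE /)     # e.g. /dev/xvda1
  fstype=$(findmnt -n -o FSTYPE /)  # e.g. ext4 or xfs
  parts=$(split_partition "$src") || return 1
  read -r disk part <<<"$parts"

  sudo growpart "$disk" "$part"     # grow the partition to fill the disk
  case "$fstype" in
    ext2|ext3|ext4) sudo resize2fs "$src" ;;   # ext* is grown via the block device
    xfs) sudo xfs_growfs -d / ;;               # XFS is grown via the mount point
    *) echo "unsupported file system: $fstype" >&2; return 1 ;;
  esac
}
```

Calling `grow_root_fs` once replaces the two commands above; `df -h` should then report the new size either way.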

@@ -0,0 +1,22 @@
---
title: "Attach the IAM role to your Workspace"
chapter: false
weight: 40
---

## Create an IAM role for your Workspace

1. Follow [this deep link to create an IAM role with Administrator access.](https://console.aws.amazon.com/iam/home#/roles$new?step=review&commonUseCase=EC2%2BEC2&selectedUseCase=EC2&policies=arn:aws:iam::aws:policy%2FAdministratorAccess)
1. Confirm that **AWS service** and **EC2** are selected, then click **Next** to view permissions.
1. Confirm that **AdministratorAccess** is checked, then click **Next: Tags** to assign tags.
1. Take the defaults, and click **Next: Review** to review.
1. Enter **nextflow-workshop-admin** for the Name, and click **Create role**.
![createrole](/images/nextflow-on-aws-batch/prerequisites/createrole.png)
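
For reference, a rough AWS CLI equivalent of these console steps is sketched below. It is an assumption, not a workshop step: run it from a shell whose credentials can manage IAM (AWS managed temporary credentials in Cloud9 may not allow IAM changes). Unlike the console, the CLI does not create the EC2 instance profile for you, so the sketch creates one with the same name. `create_workshop_role` is a hypothetical name.

```bash
# Hypothetical sketch: create the nextflow-workshop-admin role with the AWS CLI.
ROLE_NAME="nextflow-workshop-admin"

create_workshop_role() {
  # Trust policy that lets EC2 instances (the Cloud9 host) assume the role
  local trust_policy='{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "ec2.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

  aws iam create-role --role-name "$ROLE_NAME" \
    --assume-role-policy-document "$trust_policy" || return 1
  aws iam attach-role-policy --role-name "$ROLE_NAME" \
    --policy-arn arn:aws:iam::aws:policy/AdministratorAccess || return 1

  # The console creates a same-named instance profile automatically; the CLI does not.
  aws iam create-instance-profile --instance-profile-name "$ROLE_NAME" || return 1
  aws iam add-role-to-instance-profile \
    --instance-profile-name "$ROLE_NAME" --role-name "$ROLE_NAME"
}
```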

## Attach the IAM role to your Workspace

1. Follow [this deep link to find your Cloud9 EC2 instance](https://console.aws.amazon.com/ec2/v2/home?#Instances:tag:Name=aws-cloud9-.*workshop.*;sort=desc:launchTime)
1. Select the instance, then choose **Actions / Instance Settings / Attach/Replace IAM Role**
![c9instancerole](/images/nextflow-on-aws-batch/prerequisites/c9instancerole.png)
1. Choose **nextflow-workshop-admin** from the **IAM Role** drop-down, and select **Apply**
![c9attachrole](/images/nextflow-on-aws-batch/prerequisites/c9attachrole.png)
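
The attachment can also be scripted. This sketch is an assumption rather than a workshop step: it picks the most recently launched running instance whose `Name` tag starts with `aws-cloud9-` (the prefix the deep link above filters on) and associates the `nextflow-workshop-admin` instance profile, which must already exist (the console creates it together with the role). Run it with credentials that may call these EC2 and IAM APIs; `attach_workshop_role` is a hypothetical name.

```bash
# Hypothetical sketch: attach the workshop instance profile to the Cloud9 instance.
PROFILE_NAME="nextflow-workshop-admin"

attach_workshop_role() {
  local instance_id
  # Most recently launched running instance tagged like a Cloud9 environment
  instance_id=$(aws ec2 describe-instances \
    --filters "Name=tag:Name,Values=aws-cloud9-*" "Name=instance-state-name,Values=running" \
    --query 'sort_by(Reservations[].Instances[], &LaunchTime)[-1].InstanceId' \
    --output text) || return 1

  if [[ -z "$instance_id" || "$instance_id" == "None" ]]; then
    echo "no running Cloud9 instance found" >&2
    return 1
  fi

  aws ec2 associate-iam-instance-profile \
    --instance-id "$instance_id" \
    --iam-instance-profile Name="$PROFILE_NAME"
}
```

If you have several Cloud9 environments, check the selected instance ID before relying on the sort by launch time.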

@@ -0,0 +1,9 @@
---
title: "Disable AWS Credential"
chapter: false
weight: 45
---

## Disable AWS managed temporary credentials

In the Cloud9 workspace, open **Preferences**, go to **AWS Settings**, and turn off **AWS managed temporary credentials** so that the workspace uses the IAM role attached in the previous step.

![](/images/nextflow-on-aws-batch/prerequisites/disable_cred.png)
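
To check that the workspace now uses the attached role rather than your user's credentials, inspect the caller identity with `aws sts get-caller-identity`. The check below is a minimal sketch; it assumes that credentials from the instance role show up as an `assumed-role/nextflow-workshop-admin/...` ARN. `check_workshop_role` is a hypothetical name.

```bash
# Hypothetical check: is the AWS CLI using the nextflow-workshop-admin instance role?
check_workshop_role() {
  local role="${1:-nextflow-workshop-admin}" arn
  arn=$(aws sts get-caller-identity --query Arn --output text) || return 1
  if [[ "$arn" == *":assumed-role/${role}/"* ]]; then
    echo "OK: using ${arn}"
  else
    echo "Unexpected identity: ${arn} (are AWS managed temporary credentials still enabled?)" >&2
    return 1
  fi
}
```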

@@ -0,0 +1,51 @@
---
title: "Install Tools"
chapter: false
weight: 50
---

## Install Java and Nextflow

The Nextflow command-line tool runs on the JVM, so we install [Amazon Corretto](https://docs.aws.amazon.com/corretto), Amazon's no-cost, production-ready distribution of OpenJDK.

Review comment: Perhaps adding a quick {% Notice %} with a bit of info on why Corretto is so cool and why we prefer it over other JDKs may bring attention to what our AWS teams are doing :)
Reply: I added the abstract from their website.

### Amazon Corretto

To [install Corretto](https://docs.aws.amazon.com/corretto/latest/corretto-11-ug/generic-linux-install.html), first add the Corretto yum repository:

```bash
sudo rpm --import https://yum.corretto.aws/corretto.key
sudo curl -L -o /etc/yum.repos.d/corretto.repo https://yum.corretto.aws/corretto.repo
```

Then install Java 11 and check the installation:

```bash
sudo yum install -y java-11-amazon-corretto-devel
java --version
```

### Nextflow

Install Nextflow using the online installer.

Review comment: At this stage we are installing Nextflow, but we have not introduced why Nextflow is so cool and what makes it a great tool to manage genomic workflows, for example why DSLs make pipeline and workflow declaration more effective. Perhaps the link comes further in the workshop, but I reckon that at this stage an introductory narrative would help the users doing this workshop understand some of the concepts they will see later on: (a) what Nextflow is, (b) DSLs and pipelines/workflows, (c) why this is required in genomics and more generally. The narrative does not need to be huge, but pointers backing up these concepts will definitely help.
Reply: Added a distinct "Install Nextflow" page with some praise.
The installer creates the `nextflow` launcher in the current directory. Move it to `/usr/local/bin` so that it can be run from anywhere:

```bash
curl -s https://get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
```

### Graphviz

Nextflow can render workflow graphs, which requires `graphviz` to be installed. `jq` will help us deal with JSON files.

```bash
sudo yum install -y graphviz jq
```
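
Before moving on, you can confirm that every tool from this page is on the `PATH`. This is a minimal sketch with hypothetical function names; it only checks that the commands exist (`dot` is the executable installed by the `graphviz` package), not their versions.

```bash
# Hypothetical helpers: report any workshop tool missing from the PATH.
check_tool() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "missing: $1" >&2
    return 1
  fi
}

verify_tools() {
  local tool missing=0
  # java (Corretto), nextflow, dot (Graphviz) and jq
  for tool in java nextflow dot jq; do
    check_tool "$tool" || missing=1
  done
  return "$missing"
}
```

Usage: `verify_tools && echo "all tools installed"`. The loop keeps going after a failure so that every missing tool is reported at once.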

### AWS Region

Even though we rely on an IAM role rather than local credentials, some tools expect `AWS_REGION` to be defined as an environment variable. Add it to your login shell configuration:

```bash
# Look up the region from the instance identity document
export AWS_REGION=$(curl --silent http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region)
# Persist it for new shells; "export" makes it visible to child processes such as the AWS CLI
echo "export AWS_REGION=${AWS_REGION}" | tee -a ~/.bashrc
```
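
The `curl` call above uses IMDSv1, which some accounts disable by requiring IMDSv2 session tokens. A minimal token-based variant is sketched below; it assumes the `meta-data/placement/region` metadata path (so `jq` is not needed), and the function names and the `IMDS` variable are hypothetical.

```bash
# Hypothetical helpers: read the region via IMDSv2 and persist it for new shells.
IMDS="http://169.254.169.254/latest"

imds_get() {
  local path="$1" token
  # IMDSv2: obtain a session token first, then send it with the request
  token=$(curl --silent -X PUT "${IMDS}/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 21600") || return 1
  curl --silent -H "X-aws-ec2-metadata-token: ${token}" "${IMDS}/${path}"
}

persist_aws_region() {
  local rc_file="${1:-$HOME/.bashrc}" region
  region=$(imds_get meta-data/placement/region) || return 1
  [[ -n "$region" ]] || return 1
  export AWS_REGION="$region"
  echo "export AWS_REGION=${region}" >> "$rc_file"
}
```

Call `persist_aws_region` without arguments in the Cloud9 terminal; the optional argument only exists so the target file can be changed.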

@@ -0,0 +1,13 @@
---
title: "Start the workshop..."
chapter: true
weight: 10
---

# Getting Started
To start the workshop, follow one of the following depending on whether you are...

* ...[running the workshop on your own (in your own account)](/nextflow-on-aws-batch/10_prerequisites/nf_self_paced.html), or
* ...[attending an AWS hosted event (using AWS provided hashes)](/nextflow-on-aws-batch/10_prerequisites/nf_aws_event.html)

Once you have completed either setup, continue with **[Create a Workspace]({{< relref "30_workspace.md" >}})**

@@ -0,0 +1,26 @@
---
title: "...at an AWS event"
chapter: false
weight: 20
---

### Running the workshop at an AWS Event

{{% notice warning %}}
Only complete this section if you are at an AWS hosted event (such as re:Invent,
KubeCon, Immersion Day, or any other event hosted by an AWS employee). If you
are running the workshop on your own, go to: [Start the workshop on your own]({{< relref "self_paced.md" >}}).
{{% /notice %}}

### Log in to the AWS Workshop Portal

If you are at an AWS event, an AWS account was created for you to use throughout the workshop. You will need the **Participant Hash** provided to you by the event's organizers.

1. Connect to the portal by browsing to [https://dashboard.eventengine.run/](https://dashboard.eventengine.run/).
2. Enter the Hash in the text box, and click **Proceed**.
3. In the User Dashboard screen, click **AWS Console**.
4. In the popup page, click **Open Console**.

You are now logged in to the AWS console in an account that was created for you and that will be available only for the duration of the workshop.

Once you have completed the steps above, **you can head straight to [Create a Workspace]({{< relref "30_workspace.md" >}})**

@@ -0,0 +1,37 @@
---
title: "...on your own"
chapter: false
weight: 10
---

{{% notice warning %}}
Only complete this section if you are running the workshop on your own. If you are at an AWS hosted event (such as re:Invent, KubeCon, Immersion Day, etc.), go to [Start the workshop at an AWS event]({{< relref "aws_event.md" >}}).
{{% /notice %}}

### Running the workshop on your own

{{% notice warning %}}
Your account must have the ability to create new IAM roles and scope other IAM permissions.
{{% /notice %}}

1. If you don't already have an AWS account with Administrator access: [create
one now by clicking here](https://aws.amazon.com/getting-started/)

1. Once you have an AWS account, ensure you are following the remaining workshop steps
as an IAM user with administrator access to the AWS account:
[Create a new IAM user to use for the workshop](https://console.aws.amazon.com/iam/home?#/users$new)

1. Enter the user details:
![Create User](/images/using_ec2_spot_instances_with_eks/prerequisites/iam-1-create-user.png)

1. Attach the AdministratorAccess IAM Policy:
![Attach Policy](/images/using_ec2_spot_instances_with_eks/prerequisites/iam-2-attach-policy.png)

1. Click to create the new user:
![Confirm User](/images/using_ec2_spot_instances_with_eks/prerequisites/iam-3-create-user.png)

1. Take note of the login URL and save it:
![Login URL](/images/using_ec2_spot_instances_with_eks/prerequisites/iam-4-save-url.png)

Once you have completed the steps above, **you can head straight to [Create a Workspace]({{< relref "30_workspace.md" >}})**

@@ -0,0 +1,85 @@
---
title: "Small Workflow Example"
chapter: false
weight: 10
---

## Local Run

Nextflow allows the execution of any command or user script by using a process definition.

A process is defined by providing three main declarations: the process [inputs](https://www.nextflow.io/docs/latest/process.html#inputs), the process [outputs](https://www.nextflow.io/docs/latest/process.html#outputs) and finally the command [script](https://www.nextflow.io/docs/latest/process.html#script).

The example workflow implements a simple [RNA-seq](https://en.wikipedia.org/wiki/RNA-Seq) pipeline that:

Review comment: A description of what RNA-seq means, with references, will help people a lot to get context: https://en.wikipedia.org/wiki/RNA-Seq. The workshop is also public, and other people with little context of the genomics domain will be doing it to understand this type of thing. Help them with visual aids. If anything, refer at least to Wikipedia for people to have fun :)
Reply: @plample can you help here?

1. Indexes a transcriptome file.
2. Performs quality control.
3. Performs quantification.
4. Creates a MultiQC report.
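
To make the four steps concrete, here is roughly what they amount to for a single read pair when run by hand inside the `nextflow/rnaseq-nf` container. This is a hedged sketch, not a copy of `script7.nf`: it assumes the tutorial uses Salmon for indexing and quantification, FastQC for quality control and MultiQC for the report (check the `script` blocks of `script7.nf` for the exact options), and `run_one_sample` is a hypothetical name.

```bash
# Hypothetical sketch of the four pipeline steps for a single read pair.
# Assumes salmon, fastqc and multiqc are on the PATH (as in nextflow/rnaseq-nf).
run_one_sample() {
  local transcriptome="$1" sample="$2" read1="$3" read2="$4"

  # 1. Index the transcriptome
  salmon index -t "$transcriptome" -i index || return 1
  # 2. Quality control of the raw reads
  mkdir -p "fastqc_${sample}_logs"
  fastqc -o "fastqc_${sample}_logs" -f fastq -q "$read1" "$read2" || return 1
  # 3. Quantify transcript abundance for the read pair
  salmon quant --libType=U -i index -1 "$read1" -2 "$read2" -o "$sample" || return 1
  # 4. Aggregate all logs in the current directory into one MultiQC report
  multiqc .
}
```

For example: `run_one_sample data/ggal/transcriptome.fa gut data/ggal/gut_1.fq data/ggal/gut_2.fq` (the `gut` file names are assumed from the `*_{1,2}.fq` pattern). Nextflow additionally isolates each task in its own work directory and runs independent tasks in parallel; this sketch does neither.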

### Pull Image and run

Nextflow runs the workflow's processes in containers based on the `nextflow/rnaseq-nf` image. Because the image is almost 3 GB and Nextflow shows no progress while it downloads, we pull the image first so that we can see what Docker is doing.

```bash
docker pull nextflow/rnaseq-nf
```

The output will look like this:

```bash
$ docker pull nextflow/rnaseq-nf
Using default tag: latest
latest: Pulling from nextflow/rnaseq-nf
b8f262c62ec6: Pull complete
fa9712f20293: Pull complete
6ec1e76960c6: Pull complete
fe231f126300: Pull complete
b5060e108b58: Pull complete
ba0e69f9489f: Pull complete
248da7e19707: Pull complete
Digest: sha256:0ac11ff903d39ad7db18e63c8958fb11864192840b3d9ece823007a54f3703e0
Status: Downloaded newer image for nextflow/rnaseq-nf:latest
```
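
If you come back to this page later, there is no need to pull the image again. A minimal sketch that pulls only when the image is not yet in the local Docker cache; `ensure_image` is a hypothetical name, and the check relies on `docker image inspect` failing for images that are not present locally.

```bash
# Hypothetical helper: pull a Docker image only if it is not already present locally.
ensure_image() {
  local image="${1:-nextflow/rnaseq-nf}"
  if docker image inspect "$image" >/dev/null 2>&1; then
    echo "already present: $image"
  else
    docker pull "$image"
  fi
}
```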

Then start the script, which launches containers from the image you just pulled.

```bash
nextflow run script7.nf --reads 'data/ggal/*_{1,2}.fq'
```

Review comment: Before executing this file, it would help to explain how users can read what's in the file, and to give some attribution to the documentation and syntax (although perhaps that comes later on). The idea at this stage is that people who are curious and want to know what's going on will look at script7.nf and, through repetition and re-emphasis, get the point of the DSL: inputs, outputs and scripts.
Reply: I added a paragraph pointing out how UNIX-esque the DSL is and referred to documentation and tutorials to dive deeper.

The output will look like this:

```bash
$ nextflow run script7.nf --reads 'data/ggal/*_{1,2}.fq'
N E X T F L O W ~ version 20.01.0
Launching `script7.nf` [admiring_edison] - revision: ce58523d1d
R N A S E Q - N F P I P E L I N E
===================================
transcriptome: /home/ec2-user/environment/nextflow-tutorial/data/ggal/transcriptome.fa
reads : data/ggal/*_{1,2}.fq
outdir : results
executor > local (8)
[62/dfabf8] process > index [100%] 1 of 1 ✔
[c7/aa994c] process > quantification [100%] 3 of 3 ✔
[86/c377f4] process > fastqc [100%] 3 of 3 ✔
[08/3c2c49] process > multiqc [100%] 1 of 1 ✔
Done! Open the following report in your browser --> results/multiqc_report.html
$
```

You can preview the report within Cloud9: right-click the file (**[1]**) and choose `Preview` (**[2]**) from the context menu.

![multiqc_report](/images/nextflow-on-aws-batch/nextflow101/multiqc_report.png)

Review comment: I was going to ask "So what am I looking at?" in the report, but after seeing the results below, I'm probably looking at some really mashed-up, fragmented parts of a body (with gut, liver and lungs overrepresented) 👍... Jokes aside, if there's anything in particular to call out in the outputs, it would be nice to do; otherwise the above was intended as a joke... :P

Nextflow can also create reports about the execution itself:

```bash
nextflow run script7.nf -with-report -with-trace -with-timeline -with-dag dag.png
```

This creates an execution report, a trace file, a timeline, and the workflow graph (`dag.png`), for example:

![dag](/images/nextflow-on-aws-batch/nextflow101/dag.png)
![timeline](/images/nextflow-on-aws-batch/nextflow101/timeline.png)
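
The trace file is a tab-separated table with one row per task. As a hedged sketch, assuming the trace file is named `trace.txt` (the default name may differ between Nextflow versions) and contains the default `name`, `status` and `realtime` columns, you can extract those columns by header name, so column order does not matter. `summarize_trace` is a hypothetical name.

```bash
# Hypothetical helper: print name, status and realtime for every task in a Nextflow trace file.
summarize_trace() {
  local trace="${1:-trace.txt}"
  awk -F'\t' '
    NR == 1 {                       # header row: remember each column position by name
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("name" in col) || !("status" in col) || !("realtime" in col)) {
        print "trace file lacks name/status/realtime columns" > "/dev/stderr"
        exit 1
      }
      next
    }
    { printf "%s\t%s\t%s\n", $(col["name"]), $(col["status"]), $(col["realtime"]) }
  ' "$trace"
}
```

Looking columns up by name rather than by position keeps the helper working if you customize the trace fields in `nextflow.config`.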

Review comment: I'm super biased toward workshops with guided (but not solved) extra activities and exercises that help people find their own understanding of a topic in a structured way. You probably know them well from the exercises at the bottom of https://ec2spotworkshops.com/using_ec2_spot_instances_with_eks/scaling/test_hpa.html. Could you think of any relevant optional exercise that could help people going through this workshop learn something new about genomics, Nextflow, or AWS Batch and AWS?
Reply: Removed; it was just some eye candy to show the DAG.

Review comment: This title seems misplaced, as it was copied and pasted from the previous exercise; there might be some text missing that explains what the PNG is pointing at. The picture refers to a set of steps that may need to be described (references to steps 1, 2, 3).

Review comment: I'm assuming that, unlike the EKS workshop with kubectl, we don't need to refresh credentials in this workshop, but I'd suggest adding a validation entry where people have to run
To validate that the workshop credentials are what you expected to run. This will also help AWS SAs attending the workshop to understand if someone skipped a step.

Review comment: OK, during the workshop I purposely left the AWS managed temporary credentials enabled, and it went back to the credentials of the user who created the Cloud9 instance instead of those from the acquired Cloud9 role. More instructions are definitely needed if the workshop takes more than 60 minutes.

Reply: Changed the headline and included get-caller-identity. Good catch!