[WiP] add nextflow-draft #45

Merged
merged 31 commits into from
Apr 27, 2020
Changes from 24 commits
Commits
62d2fb2
[WiP] add nextflow-draft
kniec Mar 30, 2020
0aa0169
update IAM AWSBatch role
kniec Mar 31, 2020
054a4b4
Finished batch-squared
kniec Mar 31, 2020
3b16aba
add learnings
kniec Mar 31, 2020
91d6f89
Adjust wording within intro
kniec Apr 2, 2020
88aeabb
more intro on what we will do within the workshop
kniec Apr 2, 2020
ae03368
included @ranshn feadback
kniec Apr 2, 2020
dc79f09
typo
kniec Apr 20, 2020
f2a7c1c
Update _index.md
plample Apr 21, 2020
6d0731d
update the role to attach to the instance
plample Apr 21, 2020
0204f0e
slide adjustment to job queue build
kniec Apr 22, 2020
bab090c
add sudo within packer build
kniec Apr 22, 2020
7eb6b21
typo in code
plample Apr 22, 2020
7359e07
Suggestions to guide and allow copy/paste of value
plample Apr 22, 2020
61f61fd
More content for the conclusion
plample Apr 22, 2020
445394f
Merge pull request #5 from plample/patch-5
kniec Apr 23, 2020
8126e9c
tweaked the what-we-learned page
kniec Apr 23, 2020
2224be0
Merge pull request #4 from plample/patch-4
kniec Apr 23, 2020
bb3af5a
adjustments to batched-squared-run
kniec Apr 23, 2020
e185659
Merge pull request #3 from plample/patch-1
kniec Apr 23, 2020
f9e43e0
Merge pull request #2 from plample/patch-3
kniec Apr 23, 2020
b788f95
Merge pull request #1 from plample/patch-2
kniec Apr 23, 2020
9b26681
start removing AMI creation
kniec Apr 23, 2020
83e51dc
update complete to run on base AMI + CleanUp
kniec Apr 24, 2020
35710af
fixed some markdown style errors
kniec Apr 24, 2020
1ed876c
incorperated feedback by Carlos (PR #45)
kniec Apr 27, 2020
8488bfa
fixed some markdown style errors
kniec Apr 27, 2020
bea547a
adding more feedback from Carlos
kniec Apr 27, 2020
b916201
fixed some markdown style errors
kniec Apr 27, 2020
742aab2
fixed some markdown style errors v2
kniec Apr 27, 2020
9c7e67d
fixed some markdown style errors v3
kniec Apr 27, 2020
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
title: "Singapore"
chapter: false
disableToc: true
hidden: true
---

Create a Cloud9 Environment: [https://ap-southeast-1.console.aws.amazon.com/cloud9/home?region=ap-southeast-1](https://ap-southeast-1.console.aws.amazon.com/cloud9/home?region=ap-southeast-1)
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
title: "Ireland"
chapter: false
disableToc: true
hidden: true
---

Create a Cloud9 Environment: [https://eu-west-1.console.aws.amazon.com/cloud9/home?region=eu-west-1](https://eu-west-1.console.aws.amazon.com/cloud9/home?region=eu-west-1)
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
title: "N. Virginia"
chapter: false
disableToc: true
hidden: true
---

Create a Cloud9 Environment: [https://us-east-1.console.aws.amazon.com/cloud9/home?region=us-east-1](https://us-east-1.console.aws.amazon.com/cloud9/home?region=us-east-1)
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
title: "Ohio"
chapter: false
disableToc: true
hidden: true
---

Create a Cloud9 Environment: [https://us-east-2.console.aws.amazon.com/cloud9/home?region=us-east-2](https://us-east-2.console.aws.amazon.com/cloud9/home?region=us-east-2)
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
---
title: "Oregon"
chapter: false
disableToc: true
hidden: true
---

Create a Cloud9 Environment: [https://us-west-2.console.aws.amazon.com/cloud9/home?region=us-west-2](https://us-west-2.console.aws.amazon.com/cloud9/home?region=us-west-2)
43 changes: 43 additions & 0 deletions content/nextflow-on-aws-batch/10_prerequisites/30_workspace.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
---
title: "Create a Workspace"
chapter: false
weight: 30
---

{{% notice warning %}}
If you are running the workshop on your own, the Cloud9 workspace should be built by an IAM user with Administrator privileges, not the root account user. Please ensure you are logged in as an IAM user, not the root
account user.
{{% /notice %}}

{{% notice info %}}
If you are at an AWS hosted event (such as re:Invent, Kubecon, Immersion Day, or any other event hosted by
an AWS employee), follow the organizers' instructions on which region to use to launch resources.
{{% /notice %}}

{{% notice tip %}}
Ad blockers, JavaScript disablers, and tracking blockers should be disabled for
the Cloud9 domain; otherwise connecting to the workspace might be impacted.
Cloud9 requires third-party cookies. You can whitelist the [specific domains](https://docs.aws.amazon.com/cloud9/latest/user-guide/troubleshooting.html#troubleshooting-env-loading).
{{% /notice %}}

### Launch Cloud9 in your closest region:

{{< tabs name="Region" >}}
{{< tab name="Oregon" include="30_us-west-2.md" />}}
{{< tab name="Ireland" include="30_eu-west-1.md" />}}
{{< tab name="N. Virginia" include="30_us-east-1.md" />}}
{{< tab name="Ohio" include="30_us-east-2.md" />}}
{{< tab name="Singapore" include="30_ap-southeast-1.md" />}}
{{< /tabs >}}

- Select **Create environment**
- Name it **nextflowworkshop**, and take all other defaults
- When it comes up, customize the environment by closing the **welcome tab**
and **lower work area**, and opening a new **terminal** tab in the main work area:
![c9before](/images/nextflow-on-aws-batch/prerequisites/c9before.png)

- Your workspace should now look like this:
![c9after](/images/nextflow-on-aws-batch/prerequisites/c9after.png)

- If you like this theme, you can choose it yourself by selecting **View / Themes / Solarized / Solarized Dark**
in the Cloud9 workspace menu.
55 changes: 55 additions & 0 deletions content/nextflow-on-aws-batch/10_prerequisites/31_resize_ebs.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
---
title: "Resize Root Volume"
chapter: false
weight: 31
---

## Resize Cloud9 EBS

The default 10GB root volume is quite small when working with Docker images for genomics,
so let us resize the EBS volume used by the Cloud9 instance.

To resize the EBS volume:

1. Select the Cloud9 instance in the EC2 console ([deep link to get there](https://console.aws.amazon.com/ec2/v2/home)).
2. Click the root-device link.
3. Click the EBS ID in the box that appears.

![](/images/nextflow-on-aws-batch/prerequisites/resize_ebs_0.png)

Afterward, modify the EBS volume.

![](/images/nextflow-on-aws-batch/prerequisites/resize_ebs_1.png)

Then choose a new volume size (e.g. 100GB).

![](/images/nextflow-on-aws-batch/prerequisites/resize_ebs_2.png)

{{% notice info %}}
Please make sure that the changes went through and the EBS volume now reflects the new size of the volume.
{{% /notice %}}


## Resize FS

Changing the size of the block device does not increase the size of the file system.

To grow the file system as well, head back to the Cloud9 instance and run the following commands.

```bash
sudo growpart /dev/xvda 1
sudo resize2fs $(df -h |awk '/^\/dev/{print $1}')
```

The root file system should now show a size of about 99G.

```bash
df -h
```

```bash
Filesystem Size Used Avail Use% Mounted on
devtmpfs 483M 60K 483M 1% /dev
tmpfs 493M 0 493M 0% /dev/shm
/dev/xvda1 99G 8.0G 91G 9% /
```
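To confirm the resize from a script rather than by reading the table, a minimal sketch along these lines could compare the root file system size against an expected minimum. The function name `root_fs_size_gb` and the 90G threshold are assumptions chosen for illustration, not part of the workshop.

```bash
# Print the size of the root file system in whole GiB.
# `df -P -k` prints POSIX-format output in 1024-byte blocks; line 2 describes "/".
root_fs_size_gb() {
  df -P -k / | awk 'NR==2 { printf "%d\n", $2 / 1024 / 1024 }'
}

# Hypothetical threshold: a 100GB volume shows up as slightly less than 100G.
min_gb=90
size_gb=$(root_fs_size_gb)
if [ "$size_gb" -ge "$min_gb" ]; then
  echo "root file system is ${size_gb}G: resize looks fine"
else
  echo "root file system is only ${size_gb}G: re-run growpart and resize2fs"
fi
```

If the second message appears, the block device was most likely resized but the partition or file system was not grown yet.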
22 changes: 22 additions & 0 deletions content/nextflow-on-aws-batch/10_prerequisites/40_updateiam.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
---
title: "Attach the IAM role to your Workspace"
chapter: false
weight: 40
---

## Create an IAM role for your Workspace

1. Follow [this deep link to create an IAM role with Administrator access.](https://console.aws.amazon.com/iam/home#/roles$new?step=review&commonUseCase=EC2%2BEC2&selectedUseCase=EC2&policies=arn:aws:iam::aws:policy%2FAdministratorAccess)
1. Confirm that **AWS service** and **EC2** are selected, then click **Next** to view permissions.
1. Confirm that **AdministratorAccess** is checked, then click **Next: Tags** to assign tags.
1. Take the defaults, and click **Next: Review** to review.
1. Enter **nextflow-workshop-admin** for the Name, and click **Create role**.
![createrole](/images/nextflow-on-aws-batch/prerequisites/createrole.png)

## Attach the IAM role to your Workspace

1. Follow [this deep link to find your Cloud9 EC2 instance](https://console.aws.amazon.com/ec2/v2/home?#Instances:tag:Name=aws-cloud9-.*workshop.*;sort=desc:launchTime)
1. Select the instance, then choose **Actions / Instance Settings / Attach/Replace IAM Role**
![c9instancerole](/images/nextflow-on-aws-batch/prerequisites/c9instancerole.png)
1. Choose **nextflow-workshop-admin** from the **IAM Role** drop down, and select **Apply**
![c9attachrole](/images/nextflow-on-aws-batch/prerequisites/c9attachrole.png)
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
---
title: "Disable AWS Credential"
chapter: false
weight: 45
---

## Attach the IAM role to your Workspace
Contributor

This title seems misplaced as the copy&paste from the previous exercise, there might be some text lacking, explaining what does it mean the things the png is pointing; The picture refers to a set of steps that may need to be described (reference to steps 1 , 2, 3)

Contributor

I'm assuming that unlike the EKS and with Kubectl in this workshop we don't need to refresh credentials , but I'd suggest adding a validation entry where people have to run

aws sts get-caller-identity

To validate that the workshop credentials are what you expected to run. This also will help to AWS SA's attending the workshop to understand if someone skipped a step

Contributor

Ok, during the workshop on purpose I left enabled the AWS managed temporary credentials and it came back to the credentials of my user creating cloud-9 instead of the ones from the acquired Cloud-9 role. Definitely more instructions are needed if it takes more than 60mins to do the workshop.

Contributor Author

Changed the headline and included get-caller-identity. good catch!


![](/images/nextflow-on-aws-batch/prerequisites/disable_cred.png)
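
A validation step along the lines suggested in the review could look like the sketch below. It checks whether the caller ARN belongs to the `nextflow-workshop-admin` role created earlier. The helper name `is_workshop_role` and the sample account ID and instance ID are made up for illustration; on the workspace the ARN would come from `aws sts get-caller-identity`.

```bash
# Return success if the given caller ARN belongs to the workshop role.
# The role name defaults to the one created earlier in this workshop.
is_workshop_role() {
  local arn="$1"
  local role="${2:-nextflow-workshop-admin}"
  case "$arn" in
    arn:aws:sts::*:assumed-role/"$role"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# On the Cloud9 instance the ARN would come from the AWS CLI:
#   arn=$(aws sts get-caller-identity --query Arn --output text)
# The value below is a made-up example so the sketch runs anywhere.
arn="arn:aws:sts::123456789012:assumed-role/nextflow-workshop-admin/i-0123456789abcdef0"

if is_workshop_role "$arn"; then
  echo "OK: using the workshop role"
else
  echo "WARNING: unexpected identity: $arn"
fi
```

A warning here would suggest that the AWS managed temporary credentials are still enabled or that the role was not attached to the instance.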
51 changes: 51 additions & 0 deletions content/nextflow-on-aws-batch/10_prerequisites/50_install_tools.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,51 @@
---
title: "Install Tools"
chapter: false
weight: 50
---

## Install Java and Nextflow

The Nextflow command-line tool runs on the JVM. Thus, we will install [Amazon Corretto](https://docs.aws.amazon.com/corretto), the OpenJDK distribution provided by AWS.
Contributor

Perhaps adding a quick {% Notice %} with a bit of info of why coretto is so cool and why we prefer to use it rather than other JDK's may bring attention to stuff that our AWS teams are doing :)

Contributor Author

I added the abstract from their website


### Amazon Corretto

To [install Corretto](https://docs.aws.amazon.com/corretto/latest/corretto-11-ug/generic-linux-install.html), we first add the repository.

```bash
sudo rpm --import https://yum.corretto.aws/corretto.key
sudo curl -L -o /etc/yum.repos.d/corretto.repo https://yum.corretto.aws/corretto.repo
```

Afterwards, install Java 11 and check the installation.

```bash
sudo yum install -y java-11-amazon-corretto-devel
java --version
```
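
To check the installed version from a script, a small sketch could extract the major version from the first line of the `java --version` output. It assumes that line looks like `openjdk 11.0.7 2020-04-14 LTS`, which is the usual shape for OpenJDK builds; the helper name `java_major_version` and the sample line are illustrative.

```bash
# Extract the major version from `java --version` output read on stdin.
# Assumes the first line looks like "openjdk 11.0.7 2020-04-14 LTS".
java_major_version() {
  awk 'NR==1 { split($2, parts, "."); print parts[1] }'
}

# Sample line so the sketch runs without Java installed; on the instance use:
#   java --version | java_major_version
printf 'openjdk 11.0.7 2020-04-14 LTS\n' | java_major_version
```

For the sample line this prints `11`.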

### Nextflow

Install Nextflow using the online installer.
Contributor

At this stage we are installing nextflow but we have not introduced why nextflow is so cool and what makes it a great tool to manage genomic flows. For example explaining to other why DSL's make pipeline and workflow declaration more effective, etc. Perhaps the link is further in the workshop, but I reckon at this stage a narrative on introduction that explain why nextflow would help to specific users that want to do this workshop and understand some of the concepts they will see later on. (concepts (a) Nextflow, what is it (b) DSLs and Pipelines/workflows (c)why this is required in genomics and more generally).

Narrative does not necessarily need to be huge, but pointers to back up concepts will definitely help

Contributor Author

added a destinct 'install nextflow' page with some praises

The snippet below creates the `nextflow` launcher in the current directory. We then move it to `/usr/local/bin` so that it can be executed from anywhere.

```bash
curl -s https://get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
```

### Graphviz

Nextflow can render workflow graphs, for which it needs `graphviz` to be installed. `jq` will help us deal with JSON files.

```bash
sudo yum install -y graphviz jq
```

### AWS Region

Even though we rely on an IAM role rather than locally configured credentials, some tools depend on `AWS_REGION` being defined as an environment variable, so let's add it to our login shell configuration.

```bash
export AWS_REGION=$(curl --silent http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region)
# Write the line with `export` so that new shells pass the variable on to child processes.
echo "export AWS_REGION=${AWS_REGION}" | tee -a ~/.bashrc
```
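
Running the snippet more than once appends a new line to `~/.bashrc` each time. A minimal sketch of an idempotent variant is shown here; the helper name `append_once`, the temporary file, and the placeholder region are assumptions for illustration only.

```bash
# Append a line to a file only if that exact line is not already present.
append_once() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo against a temporary file; on the workspace the target would be ~/.bashrc
# and the region would come from the instance metadata lookup.
rc_file=$(mktemp)
region="eu-west-1"   # placeholder value for illustration
append_once "export AWS_REGION=${region}" "$rc_file"
append_once "export AWS_REGION=${region}" "$rc_file"
cat "$rc_file"
rm -f "$rc_file"
```

Although `append_once` is called twice, the file ends up with a single `export AWS_REGION=eu-west-1` line.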
13 changes: 13 additions & 0 deletions content/nextflow-on-aws-batch/10_prerequisites/_index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
---
title: "Start the workshop..."
chapter: true
weight: 10
---

# Getting Started
To start the workshop, follow one of the links below, depending on whether you are...

* ...[running the workshop on your own (in your own account)](/nextflow-on-aws-batch/10_prerequisites/nf_self_paced.html), or
* ...[attending an AWS hosted event (using AWS provided hashes)](/nextflow-on-aws-batch/10_prerequisites/nf_aws_event.html)

Once you have completed either setup, continue with **[Create a Workspace]({{< relref "30_workspace.md" >}})**
26 changes: 26 additions & 0 deletions content/nextflow-on-aws-batch/10_prerequisites/nf_aws_event.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
---
title: "...at an AWS event"
chapter: false
weight: 20
---

### Running the workshop at an AWS Event

{{% notice warning %}}
Only complete this section if you are at an AWS hosted event (such as re:Invent,
Kubecon, Immersion Day, or any other event hosted by an AWS employee). If you
are running the workshop on your own, go to: [Start the workshop on your own]({{< relref "nf_self_paced.md" >}}).
{{% /notice %}}

### Login to the AWS Workshop Portal

If you are at an AWS event, an AWS account was created for you to use throughout the workshop. You will need the **Participant Hash** provided to you by the event's organizers.

1. Connect to the portal by browsing to [https://dashboard.eventengine.run/](https://dashboard.eventengine.run/).
2. Enter the Hash in the text box, and click **Proceed**
3. In the User Dashboard screen, click **AWS Console**
4. In the popup page, click **Open Console**

You are now logged in to the AWS console in an account that was created for you and that will be available only for the duration of the workshop.

Once you have completed the step above, **you can head straight to [Create a Workspace]({{< relref "30_workspace.md" >}})**
37 changes: 37 additions & 0 deletions content/nextflow-on-aws-batch/10_prerequisites/nf_self_paced.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
---
title: "...on your own"
chapter: false
weight: 10
---

{{% notice warning %}}
Only complete this section if you are running the workshop on your own. If you are at an AWS hosted event (such as re:Invent, Kubecon, Immersion Day, etc.), go to [Start the workshop at an AWS event]({{< relref "nf_aws_event.md" >}}).
{{% /notice %}}

### Running the workshop on your own

{{% notice warning %}}
Your account must have the ability to create new IAM roles and scope other IAM permissions.
{{% /notice %}}

1. If you don't already have an AWS account with Administrator access: [create
one now by clicking here](https://aws.amazon.com/getting-started/)

1. Once you have an AWS account, ensure you are following the remaining workshop steps
as an IAM user with administrator access to the AWS account:
[Create a new IAM user to use for the workshop](https://console.aws.amazon.com/iam/home?#/users$new)

1. Enter the user details:
![Create User](/images/using_ec2_spot_instances_with_eks/prerequisites/iam-1-create-user.png)

1. Attach the AdministratorAccess IAM Policy:
![Attach Policy](/images/using_ec2_spot_instances_with_eks/prerequisites/iam-2-attach-policy.png)

1. Click to create the new user:
![Confirm User](/images/using_ec2_spot_instances_with_eks/prerequisites/iam-3-create-user.png)

1. Take note of the login URL and save:
![Login URL](/images/using_ec2_spot_instances_with_eks/prerequisites/iam-4-save-url.png)


Once you have completed the step above, **you can head straight to [Create a Workspace]({{< relref "30_workspace.md" >}})**
85 changes: 85 additions & 0 deletions content/nextflow-on-aws-batch/20_nextflow101/10_small.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,85 @@
---
title: "Small Workflow Example"
chapter: false
weight: 10
---

## Local Run


Nextflow allows the execution of any command or user script by using a process definition.

A process is defined by providing three main declarations: the process [inputs](https://www.nextflow.io/docs/latest/process.html#inputs), the process [outputs](https://www.nextflow.io/docs/latest/process.html#outputs) and finally the command [script](https://www.nextflow.io/docs/latest/process.html#script).

The example workflow implements a simple RNA-seq pipeline which:
Contributor

A decoration of what RNA-Seq means with references will help a lot for people to get context. https://en.wikipedia.org/wiki/RNA-Seq

Think the workshop is also public and there will be other people with little context of genomic domain that actually are doing the workshop to understand this type of things, Help them with visual aids . If anything refer at least to wikipedia for people to have fun :)

Contributor Author

@plample can you help here?


1. indexes a transcriptome file
2. performs quality controls
3. performs quantification
4. creates a MultiQC report
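
To make the three declarations of a process (inputs, outputs, script) concrete, the sketch below writes a tiny workflow file from the shell. It is a hypothetical example and not the workshop's `script7.nf`; the file name, the process name `announce`, and the channel names are made up. It assumes the original channel-based syntax (DSL1) that Nextflow 20.01 uses by default; newer Nextflow releases expect a different syntax.

```bash
# Write a tiny, hypothetical workflow (not the workshop's script7.nf) to a temp directory.
# The quoted 'EOF' keeps the shell from expanding $name inside the file.
workdir=$(mktemp -d)
cat > "$workdir/hello_process.nf" <<'EOF'
names = Channel.from('index', 'fastqc', 'quantification', 'multiqc')

process announce {
    input:
    val name from names

    output:
    stdout into messages

    script:
    """
    echo "running step: $name"
    """
}

messages.view()
EOF

echo "wrote $workdir/hello_process.nf"
# With Nextflow 20.01 installed, it could be started with:
#   nextflow run "$workdir/hello_process.nf"
```

The `input:` block takes one value per task from a channel, the `script:` block is the command that is executed, and the `output:` block captures its standard output into another channel.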

### Pull Image and run

Nextflow will run the image `nextflow/rnaseq-nf`, which is almost 3GB in size, and it shows no progress while downloading. We therefore pull the image first so that we can see what Docker is doing.

```bash
docker pull nextflow/rnaseq-nf
```

The output will look like this:

```bash
$ docker pull nextflow/rnaseq-nf
Using default tag: latest
latest: Pulling from nextflow/rnaseq-nf
b8f262c62ec6: Pull complete
fa9712f20293: Pull complete
6ec1e76960c6: Pull complete
fe231f126300: Pull complete
b5060e108b58: Pull complete
ba0e69f9489f: Pull complete
248da7e19707: Pull complete
Digest: sha256:0ac11ff903d39ad7db18e63c8958fb11864192840b3d9ece823007a54f3703e0
Status: Downloaded newer image for nextflow/rnaseq-nf:latest
```

Afterwards we can start the script, which will in turn start a container using the image we just pulled.

```bash
nextflow run script7.nf --reads 'data/ggal/*_{1,2}.fq'
Contributor

Before executing this file, it would help to explain how users can read what's in the file, and get some attribution to the documentation and syntax (although perhaps that comes later on).

The idea at this stage is that people that are curious and want to know what's going on will look for the script7.nf and get through repetition and re-emphasis the point on DSL. inputs outputs and scripts.

Contributor Author

I added a paragraph pointing out how UNIX-esce the DSL is and referred to documentation and tutorials to dive deeper..

```

The output will look like this.

```bash
$ nextflow run script7.nf --reads 'data/ggal/*_{1,2}.fq'
N E X T F L O W ~ version 20.01.0
Launching `script7.nf` [admiring_edison] - revision: ce58523d1d
R N A S E Q - N F P I P E L I N E
===================================
transcriptome: /home/ec2-user/environment/nextflow-tutorial/data/ggal/transcriptome.fa
reads : data/ggal/*_{1,2}.fq
outdir : results
executor > local (8)
[62/dfabf8] process > index [100%] 1 of 1 ✔
[c7/aa994c] process > quantification [100%] 3 of 3 ✔
[86/c377f4] process > fastqc [100%] 3 of 3 ✔
[08/3c2c49] process > multiqc [100%] 1 of 1 ✔
Done! Open the following report in your browser --> results/multiqc_report.html
$
```
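
The `--reads` pattern is wrapped in single quotes, so the shell hands it over unchanged and the pipeline can presumably resolve the read pairs itself. The sketch below illustrates the difference between the quoted and the unquoted form in bash. The temporary directory and the sample names are made up for illustration.

```bash
# Create empty stand-in files; the sample names are made up for illustration.
demo_dir=$(mktemp -d)
for sample in sampleA sampleB sampleC; do
  touch "$demo_dir/${sample}_1.fq" "$demo_dir/${sample}_2.fq"
done

# Single quotes: the pattern reaches the program unchanged.
quoted='data/ggal/*_{1,2}.fq'
echo "quoted:   $quoted"

# Unquoted: bash expands the braces and the wildcard before the program runs.
expanded=("$demo_dir"/*_{1,2}.fq)
echo "unquoted: ${#expanded[@]} separate arguments"

rm -rf "$demo_dir"
```

Without the quotes, the program would receive six separate file names instead of one pattern.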

The report can be previewed within Cloud9. Right-click (**[1]**) on the file and choose `Preview` (**[2]**) from the context menu.

![multiqc_report](/images/nextflow-on-aws-batch/nextflow101/multiqc_report.png)
Contributor

I was going to ask: "So what am I looking?" at the report, but definitely after seeing the results below, I'm probably looking at a really mashed up fragmented parts of a body (with gut liver and lungs overrepresented ) 👍 )... Jokes aside, if there's anything in particular to call out in outputs, nice to do, otherwise that above was intended to be a joke... :P



With additional flags, Nextflow can create more reports.

```bash
nextflow run script7.nf -with-report -with-trace -with-timeline -with-dag dag.png
```

This creates several more reports about the workflow, e.g.:

![dag](/images/nextflow-on-aws-batch/nextflow101/dag.png)
![timeline](/images/nextflow-on-aws-batch/nextflow101/timeline.png)
Contributor

I'm super biased on workshop that guide Extra Activities/Execises that are guided but not solved and help others with pointers and in a structured way to find their understanding on a topic. You probably know it well from the exercises at the bottom of https://ec2spotworkshops.com/using_ec2_spot_instances_with_eks/scaling/test_hpa.html.

Could you think of any relevant Optional exercise that could help people going through this workshop to go through this and learn something new either on Genomics, Nextflow or AWS Batch & AWS ?

Contributor Author

removed - just some eyecandy to show the DAG
