
Commit

included @ranshn feedback
kniec committed Apr 2, 2020
1 parent 97b3f53 commit 0f29d57
Showing 17 changed files with 95 additions and 108 deletions.
@@ -1,10 +1,10 @@
---
title: "Resize Root Volume"
chapter: false
-weight: 60
+weight: 31
---

-## Resize EBS
+## Resize Cloud9 EBS

The default 10GB volume is quite small when working with Docker images for genomics.
Thus, let us resize the EBS volume used by the Cloud9 instance.
@@ -21,25 +21,33 @@ Afterward modify the EBS volume.

![](/images/nextflow-on-aws-batch/prerequisites/resize_ebs_1.png)

-And chose a new volume size (e.g. 100GB)
+And choose a new volume size (e.g. 100GB).

![](/images/nextflow-on-aws-batch/prerequisites/resize_ebs_2.png)

{{% notice info %}}
Please make sure that the change went through and the EBS volume now reflects the new size.
{{% /notice %}}
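If you prefer the command line over the console screenshots above, the same resize can be scripted. This is a sketch rather than part of the original workshop; it assumes the Cloud9 instance has a single attached EBS volume and that your credentials allow `ec2:ModifyVolume`.

```bash
# Find the instance ID of this Cloud9 environment via the metadata service
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)

# Look up the EBS volume attached to the instance
VOLUME_ID=$(aws ec2 describe-volumes \
  --filters Name=attachment.instance-id,Values=${INSTANCE_ID} \
  --query 'Volumes[0].VolumeId' --output text)

# Grow the volume to 100GB
aws ec2 modify-volume --volume-id ${VOLUME_ID} --size 100
```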


## Resize FS

Changing the block device does not increase the size of the file system.

To do so, head back to the Cloud9 instance and use the following commands.

-```
+```bash
sudo growpart /dev/xvda 1
sudo resize2fs $(df -h |awk '/^\/dev/{print $1}')
```
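As a quick sanity check (not part of the original page), `lsblk` shows the device and partition sizes after `growpart` has run:

```bash
# The partition should now span the full 100GB device
lsblk /dev/xvda
```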

The root file-system should now show 99GB.

-$ df -h
+```bash
+df -h
+```
+
+```bash
Filesystem Size Used Avail Use% Mounted on
devtmpfs 483M 60K 483M 1% /dev
tmpfs 493M 0 493M 0% /dev/shm
@@ -6,10 +6,11 @@ weight: 50

## Install Java and Nextflow

+The nextflow command-line tool runs on the JVM. Thus, we will install the AWS open-source variant [Amazon Corretto](https://docs.aws.amazon.com/corretto).

### Amazon Corretto

-As a JVM we install [Amazon Corretto](https://docs.aws.amazon.com/corretto/latest/corretto-11-ug/generic-linux-install.html).
-Adding the repository first.
+To [install Corretto](https://docs.aws.amazon.com/corretto/latest/corretto-11-ug/generic-linux-install.html), we first add the repository.

```
sudo rpm --import https://yum.corretto.aws/corretto.key
@@ -26,24 +27,24 @@ java --version
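The hunk above truncates the middle of this block. Based on the linked Corretto generic-Linux guide, the elided steps presumably look like this (a reconstruction, not the verbatim workshop content):

```bash
# Register the Corretto yum repository and install the JDK
sudo curl -Lo /etc/yum.repos.d/corretto.repo https://yum.corretto.aws/corretto.repo
sudo yum install -y java-11-amazon-corretto-devel

# Verify the installation
java --version
```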
### Nextflow

Installing Nextflow using the online installer.

+The snippet creates the nextflow launcher in the current directory, so we move the command to `/usr/local/bin` to have it available anywhere.
```
curl -s https://get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
```

-The above snippet creates the nextflow launcher in the current directory.
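A quick way to confirm the launcher is installed and on the `PATH` (not in the original text, but standard Nextflow usage):

```bash
nextflow -version
```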

### Graphviz

-To create svg we need to install graphviz.
+Nextflow can render workflow graphs, for which it needs `graphviz` installed. `jq` will help us deal with JSON files.

```
sudo yum install -y graphviz jq
```

### AWS Region

Even though we rely on an IAM role rather than local credentials, some tools expect the `AWS_REGION` environment variable to be defined - let's add it to our login shell configuration.

```
export AWS_REGION=$(curl --silent http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region)
echo "AWS_REGION=${AWS_REGION}" |tee -a ~/.bashrc
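As an aside (not covered by the original page): if the instance enforces IMDSv2, the metadata lookup needs a session token first. A sketch:

```bash
# Request a short-lived IMDSv2 token, then query the identity document with it
TOKEN=$(curl -sX PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  http://169.254.169.254/latest/dynamic/instance-identity/document | jq -r .region
```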
4 changes: 2 additions & 2 deletions content/nextflow-on-aws-batch/10_prerequisites/_index.md
@@ -7,7 +7,7 @@ weight: 10
# Getting Started
To start the workshop, follow one of the paths below depending on whether you are...

-* ...[running the workshop on your own (in your own account)]({{< relref "nf_self_paced.md" >}}), or
-* ...[attending an AWS hosted event (using AWS provided hashes)]({{< relref "nf_aws_event" >}})
+* ...[running the workshop on your own (in your own account)](/nextflow-on-aws-batch/10_prerequisites/nf_self_paced.html), or
+* ...[attending an AWS hosted event (using AWS provided hashes)](/nextflow-on-aws-batch/10_prerequisites/nf_aws_event.html)

Once you have completed either setup, continue with **[Create a Workspace]({{< relref "30_workspace.md" >}})**
13 changes: 11 additions & 2 deletions content/nextflow-on-aws-batch/20_nextflow101/10_small.md
@@ -18,8 +18,17 @@ The example workflow implements a simple RNA-seq pipeline which:
3. performs quantification
4. creates a MultiQC report


+### Pull Image and run
+
+As nextflow runs the image `nextflow/rnaseq-nf`, it would have to download almost 3GB before making any visible progress. Thus, we pull the image first so that we can see what docker is doing.
+
+```
+docker pull nextflow/rnaseq-nf
+```
+
+Afterwards we can start the script, which will launch a container from the freshly pulled image.
+
+```
+nextflow run script7.nf --reads 'data/ggal/*_{1,2}.fq'
+```

@@ -55,7 +64,7 @@ Done! Open the following report in your browser --> results/multiqc_report.html
$
```

-The report can be previewed within Cloud9.
+The report can be previewed within Cloud9. Right-click (**[1]**) on the file and choose `Preview` (**[2]**) from the context menu.

![](/images/nextflow-on-aws-batch/nextflow101/multiqc_report.png)

@@ -66,7 +75,7 @@ With more elaborate output nextflow can create more reports.
nextflow run script7.nf -with-report -with-trace -with-timeline -with-dag dag.png
```

-This creates a bnuch more reports about the workflow. E.g.
+This creates a bunch more reports about the workflow. E.g.:

![](/images/nextflow-on-aws-batch/nextflow101/dag.png)
![](/images/nextflow-on-aws-batch/nextflow101/timeline.png)
46 changes: 18 additions & 28 deletions content/nextflow-on-aws-batch/30_batch/00_ami.md
@@ -1,10 +1,10 @@
---
title: "Create AMI"
title: "Create Custom AMI"
chapter: false
weight: 01
---

-AWS Batch uses ECS as an execution host and as such uses the official ECS-optimized image as a default.
+AWS Batch uses Amazon ECS to schedule containers and as such uses the official ECS-optimized image as a default.
As the nextflow container needs to run the AWS CLI, we need to update the AMI so that it includes everything required.

<!--
@@ -18,10 +18,10 @@ Click [1] to copy the credentials in your clipboard and paste them into your Clo

## Install Packer

-To update the image we use Hashicorps [packer]().
+To update the image we use [HashiCorp Packer](https://packer.io/). First we install the tool `bsdtar`, which lets us download and unzip the file in one go, before we change the permissions so that the binary can be executed.

```
-sudo yum install -y bsdtar jq
+sudo yum install -y bsdtar
curl -sLo - \
https://releases.hashicorp.com/packer/1.5.4/packer_1.5.4_linux_amd64.zip \
| sudo bsdtar xfz - -C /usr/bin/
@@ -30,29 +30,18 @@ sudo chmod +x /usr/bin/packer

### Build image

-First we need to fetch the source AMI-ID.
+We need to fetch the AMI-ID of the official ECS-optimized image and store the ID in an environment variable for later use.

```
-export SOURCE_AMI=$(aws ec2 --region=us-east-1 describe-images --owners amazon \
+export SOURCE_AMI=$(aws ec2 --region=${AWS_REGION} describe-images --owners amazon \
--filters 'Name=name,Values=amzn-ami-????.??.???????-amazon-ecs-optimized' 'Name=state,Values=available' \
--query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text)
echo $SOURCE_AMI
```
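As a hedged aside (not in the original): the same AMI-ID is also published as a public SSM parameter, which avoids the name-pattern filter entirely:

```bash
# Recommended ECS-optimized Amazon Linux AMI for the current region
aws ssm get-parameter --region ${AWS_REGION} \
  --name /aws/service/ecs/optimized-ami/amazon-linux/recommended/image_id \
  --query 'Parameter.Value' --output text
```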

After that we create a `packer.json` file with the instructions on how to update the source AMI.

-```
-mkdir packer
-cd packer
-cat << \EOF > install-tools.sh
-#!/bin/bash
-set -x
-yum install -y wget
-wget -q https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
-bash ./Miniconda3-latest-Linux-x86_64.sh -b -f -p /home/ec2-user/miniconda
-/home/ec2-user/miniconda/bin/conda install -c conda-forge awscli
-/home/ec2-user/miniconda/bin/aws --version
-EOF
-```
+**Please CHECK: Changed to inline, to reduce the number of manual steps...**

```
cat << \EOF > packer.json
@@ -68,8 +57,13 @@
"provisioners": [
{
"type": "shell",
"execute_command": "echo 'vagrant' | {{.Vars}} sudo -S -E bash '{{.Path}}'",
"script": "install-tools.sh"
"inline": [
"yum install -y wget",
"wget -q https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh",
"bash ./Miniconda3-latest-Linux-x86_64.sh -b -f -p /home/ec2-user/miniconda",
"/home/ec2-user/miniconda/bin/conda install -c conda-forge awscli",
"/home/ec2-user/miniconda/bin/aws --version"
]
}
],
"builders": [{
@@ -86,15 +80,11 @@ cat << \EOF > packer.json
}
EOF
```
+Once the file is created, we overwrite the `source_ami` with the gathered AMI-ID and start a build.
+This process will take 5 to 10 minutes.

```
packer build -var "source_ami=${SOURCE_AMI}" packer.json
```

-Fetch the AMI-ID.
-
-```
-aws ec2 --region=us-east-1 describe-images --owners $(aws sts get-caller-identity |jq -r '.Account') \
-  --filters 'Name=name,Values=ecs-batch-ami*' \
-  --query 'reverse(sort_by(Images, &CreationDate))[:1].ImageId' --output text
-```
+Please copy the resulting AMI-ID into your clipboard; we will need it in the next step.
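An alternative worth noting (an aside, not the workshop's method): packer can emit machine-readable output, so the new AMI-ID can be captured directly from the build instead of copied by hand. The `artifact,0,id` record carries a `region:ami-id` pair:

```bash
# Hypothetical variant: run the build and keep the resulting AMI-ID in a variable
AMI_ID=$(packer build -machine-readable -var "source_ami=${SOURCE_AMI}" packer.json \
  | awk -F, '$3 == "artifact" && $5 == "id" {print $6}' | cut -d: -f2)
echo ${AMI_ID}
```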
15 changes: 0 additions & 15 deletions content/nextflow-on-aws-batch/30_batch/10_wizard.md

This file was deleted.

8 changes: 8 additions & 0 deletions content/nextflow-on-aws-batch/30_batch/20_dashboard.md
@@ -6,6 +6,14 @@ weight: 20

## AWS Batch Dashboard

+Follow this [deep link to get to AWS Batch](https://console.aws.amazon.com/batch/home) and you will be greeted by the landing page.
+
+![landingpage](/images/nextflow-on-aws-batch/batch/1_landingpage.png)
+
+Click on 'get started' and skip the wizard.
+
+![wizard](/images/nextflow-on-aws-batch/batch/2_wizard.png)

Now we are at the AWS Batch Dashboard, which allows us to create

1. **Compute Environments**
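If you want to verify the same state from the terminal (an aside, not part of the original page), AWS Batch exposes it via the CLI:

```bash
# List the names of existing compute environments and job queues
aws batch describe-compute-environments --query 'computeEnvironments[].computeEnvironmentName'
aws batch describe-job-queues --query 'jobQueues[].jobQueueName'
```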
21 changes: 0 additions & 21 deletions content/nextflow-on-aws-batch/30_batch/50_iam_batchrole.md

This file was deleted.

14 changes: 6 additions & 8 deletions content/nextflow-on-aws-batch/30_batch/_index.md
@@ -4,17 +4,15 @@ chapter: true
weight: 30
---

-# Setup AWS Batch as a Backend
+# Setup AWS Batch

+Nextflow uses **process** definitions to define what script or command to execute, while an executor determines **how** the process is executed on the target system.

-## Architecture
-
-The workshop will use two queues to submit and execute jobs.
-
-### workflow queue
-
-As the nextflow process is supervising the execution of a job it needs to run continuesly. Thus, a workflow queue will hold this job and executes them on rather small instances with 2vCPUs.
-
-### job queue
-
-The nextflow process will compute a execution flow and submit jobs for individual tasks into the *job-queue*. Those tasks do the actual computation.
+The [nextflow documentation](https://www.nextflow.io/docs/latest/basic.html#execution-abstraction) explains it nicely:
+
+> In other words, Nextflow provides an abstraction between the pipeline's functional logic and the underlying execution system. Thus it is possible to write a pipeline once and to seamlessly run it on your computer, a grid platform, or the cloud, without modifying it, by simply defining the target execution platform in the configuration file.
+
+Within this workshop we already used a local **Docker** executor in the small example - for the remainder of the workshop we are going to use the [awsbatch executor](https://www.nextflow.io/docs/latest/awscloud.html#aws-batch) to submit jobs to [AWS Batch](https://aws.amazon.com/batch/).
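A minimal sketch of what that switch looks like at the configuration level - the queue name, bucket, and region here are illustrative placeholders, not values taken from this page:

```bash
# Hypothetical snippet: append awsbatch executor settings to the Nextflow config
cat << 'EOF' >> $HOME/.nextflow/config
process.executor = 'awsbatch'
process.queue = 'job-queue'
workDir = 's3://my-nextflow-bucket'
aws.region = 'us-east-1'
EOF
```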
@@ -1,12 +1,12 @@
---
title: "ComputeEnv: Spot"
title: "Spot"
chapter: false
-weight: 30
+weight: 10
---

## Create EC2 Spot Compute Environment

-TO run the actual genomics tasks, we create a compute environment (CE) using EC2 Spot instances.
+To run the actual genomics tasks, we create a compute environment (CE) using EC2 Spot instances.

![](/images/nextflow-on-aws-batch/batch/4_create_ce_0.png?classes=shadow)

@@ -1,5 +1,5 @@
---
title: "ComputeEnv: OD"
title: "On-Demand"
chapter: false
weight: 31
---
@@ -1,12 +1,12 @@
---
title: "Job Queue"
title: "Create Job Queues"
chapter: false
weight: 40
---

## Create Job Queue

-Two queues need to be created. Both are created via the consol.
+Two queues need to be created. Both are created via the console.
![](/images/nextflow-on-aws-batch/batch/5_queue_workflow_0.png)

To create both queues we choose a name (**workflow-queue** / **job-queue**), a priority of 1, and then map them to the correct compute environment.
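For reference, a CLI equivalent of these console steps might look as follows - the On-Demand compute environment name `ondemand-ce` is a hypothetical placeholder; only `spot-ce` is named elsewhere in this commit:

```bash
# Create both queues with priority 1 and map each to its compute environment
aws batch create-job-queue --job-queue-name workflow-queue --priority 1 --state ENABLED \
  --compute-environment-order order=1,computeEnvironment=ondemand-ce
aws batch create-job-queue --job-queue-name job-queue --priority 1 --state ENABLED \
  --compute-environment-order order=1,computeEnvironment=spot-ce
```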
17 changes: 17 additions & 0 deletions content/nextflow-on-aws-batch/31_ce/_index.md
@@ -0,0 +1,17 @@
+---
+title: "Compute Environments & Queues"
+chapter: true
+weight: 31
+---
+
+### Setup AWS Batch Compute Environments and Job Queues
+
+**needs some improvements**
+
+AWS Batch defines Compute Environments to execute jobs and Job Queues to submit jobs to.
+
+The workshop will use two queues to submit and execute jobs.
+
+- **job-queue**: The nextflow process will compute an execution flow and submit jobs for individual tasks into the *job-queue*. Those tasks do the actual computation and use the Compute Environment `spot-ce`, leveraging EC2 Spot instances.
+- **workflow-queue**: As the nextflow process supervises the execution of a job it needs to run continuously. Thus, the *workflow-queue* will hold this job and execute it on rather small, On-Demand instances.

6 changes: 3 additions & 3 deletions content/nextflow-on-aws-batch/40_nextflow202/10_setup.md
@@ -30,8 +30,8 @@ EOF
## Create S3 Bucket

```
-export BUCKET_NAME=nextflow-spot-batch-$(date +%s)
-aws s3 mb s3://${BUCKET_NAME}
+export BUCKET_NAME=nextflow-spot-batch--${RANDOM}-$(date +%s)
+aws --region ${AWS_REGION} s3 mb s3://${BUCKET_NAME}
sed -i -e "s#workDir =.*#workDir = 's3://${BUCKET_NAME}'#" $HOME/.nextflow/config
sed -i -e "s/aws.region =.*/aws.region = '${AWS_REGION}'/" $HOME/.nextflow/config
```
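To sanity-check the result (an extra step, not in the original), list the bucket and print the rewritten config lines:

```bash
aws s3 ls | grep nextflow-spot-batch
grep -E "workDir|aws.region" $HOME/.nextflow/config
```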

6 changes: 4 additions & 2 deletions content/nextflow-on-aws-batch/40_nextflow202/_index.md
@@ -4,8 +4,10 @@ chapter: true
weight: 40
---

-Now that we setup AWS Batch we can use Nextflow to submit jobs we are getting closer to our architecture.
-To aproach it slowly we will do two steps here.
+# Nextflow on AWS Batch
+
+Now that we have set up AWS Batch, we can use Nextflow to submit jobs - we are getting closer to our target architecture.<br>
+To approach it slowly we will do two steps here.

### Local Run

