This repository has been archived by the owner on Aug 9, 2023. It is now read-only.

Release Update #65

Merged: 33 commits, Sep 25, 2019
Commits
8a1300a
WIP add test/prod deployment
wleepang Jul 11, 2019
4a845f7
WIP: simplify nextflow container entrypoint
wleepang Jul 18, 2019
f693e9b
Merge branch 'update/multistage-deployment' into update/nextflow-cont…
wleepang Jul 30, 2019
97cf3e3
functionalize s3_uri creation
wleepang Jul 30, 2019
eabba8f
fix typo
wleepang Jul 31, 2019
94c2eff
Update README
wleepang Sep 9, 2019
e4c3aa1
refactor nextflow assets
wleepang Sep 9, 2019
6d763e1
update generated config
wleepang Sep 9, 2019
6d28550
optimize codebuid speed
wleepang Sep 10, 2019
94fd5a9
Merge pull request #60 from wleepang/update/sfn-build
wleepang Sep 10, 2019
900cd4e
Merge branch 'master' into update/nextflow-container
wleepang Sep 10, 2019
5f6be5e
increase size of build instance
wleepang Sep 10, 2019
fad3834
Merge branch 'master' into update/nextflow-container
wleepang Sep 10, 2019
a450d49
add apt-get update
wleepang Sep 11, 2019
474d1a7
increase thread count
wleepang Sep 11, 2019
7ff1e03
increase thread count
wleepang Sep 11, 2019
a437408
handle bam index file staging
wleepang Sep 11, 2019
3eff0f1
handle non-standard paired read file names
wleepang Sep 11, 2019
6246cac
update example workflow
wleepang Sep 11, 2019
5a0db47
add more logging output to entrypoint
wleepang Sep 13, 2019
5ac47b3
update nextflow guide
wleepang Sep 13, 2019
f8e0840
update branches for two stage deployment
wleepang Sep 13, 2019
f9e1985
use simplified parameters requirements
wleepang Sep 13, 2019
d518f65
Merge pull request #61 from wleepang/update/nextflow-container
wleepang Sep 13, 2019
24e3934
fix typo
wleepang Sep 13, 2019
90e90d7
add log output for config
wleepang Sep 15, 2019
4b32352
update aio templates
wleepang Sep 21, 2019
82cd60f
remove deprecated keypairname parameter
wleepang Sep 21, 2019
4304e11
Merge pull request #63 from wleepang/master
wleepang Sep 21, 2019
7382f60
implement ebs-autoscale on docker datavolume for sfn
wleepang Sep 21, 2019
3cd57a3
Merge branch 'master' of github.com:aws-samples/aws-genomics-workflows
wleepang Sep 24, 2019
e19f52b
update details to core-environment
wleepang Sep 25, 2019
e5e1430
Merge pull request #64 from wleepang/master
wleepang Sep 25, 2019
18 changes: 12 additions & 6 deletions .travis.yml
@@ -27,9 +27,15 @@ before_deploy:
- bash _scripts/configure-deploy.sh

deploy:
-  provider: script
-  script: bash _scripts/deploy.sh
-  skip_cleanup: true
-  on:
-    repo: aws-samples/aws-genomics-workflows
-    branch: master
+  - provider: script
+    script: bash _scripts/deploy.sh production
+    skip_cleanup: true
+    on:
+      repo: aws-samples/aws-genomics-workflows
+      branch: release
+  - provider: script
+    script: bash _scripts/deploy.sh test
+    skip_cleanup: true
+    on:
+      repo: aws-samples/aws-genomics-workflows
+      branch: master
2 changes: 1 addition & 1 deletion README.md
@@ -11,7 +11,7 @@ The documentation is built using mkdocs.
Install dependencies:

```bash
-$ conda env create --file enviroment.yaml
+$ conda env create --file environment.yaml
```

This will create a `conda` environment called `mkdocs`
87 changes: 66 additions & 21 deletions _scripts/deploy.sh
@@ -5,30 +5,75 @@ set -e
bash _scripts/make-artifacts.sh
mkdocs build

-echo "publishing artifacts:"
-aws s3 sync \
-    --profile asset-publisher \
-    --acl public-read \
-    --delete \
-    ./artifacts \
-    s3://aws-genomics-workflows/artifacts
-
-echo "publishing templates:"
-aws s3 sync \
-    --profile asset-publisher \
-    --acl public-read \
-    --delete \
-    --metadata commit=$(git rev-parse HEAD) \
-    ./src/templates \
-    s3://aws-genomics-workflows/templates
-
-echo "publishing site"
-aws s3 sync \
-    --acl public-read \
-    --delete \
-    ./site \
-    s3://docs.opendata.aws/genomics-workflows
+ASSET_BUCKET=s3://aws-genomics-workflows
+ASSET_STAGE=${1:-production}
+
+function s3_uri() {
+    BUCKET=$1
+    shift
+
+    IFS=""
+    PREFIX_PARTS=("$@")
+    PREFIX_PARTS=(${PREFIX_PARTS[@]})
+    PREFIX=$(printf '/%s' "${PREFIX_PARTS[@]%/}")
+
+    echo "${BUCKET%/}/${PREFIX:1}"
+}
+
+function artifacts() {
+    S3_URI=$(s3_uri $ASSET_BUCKET $ASSET_STAGE_PATH "artifacts")
+
+    echo "publishing artifacts: $S3_URI"
+    aws s3 sync \
+        --profile asset-publisher \
+        --acl public-read \
+        --delete \
+        ./artifacts \
+        $S3_URI
+}
+
+function templates() {
+    S3_URI=$(s3_uri $ASSET_BUCKET $ASSET_STAGE_PATH "templates")
+
+    echo "publishing templates: $S3_URI"
+    aws s3 sync \
+        --profile asset-publisher \
+        --acl public-read \
+        --delete \
+        --metadata commit=$(git rev-parse HEAD) \
+        ./src/templates \
+        $S3_URI
+}
+
+function site() {
+    echo "publishing site"
+    aws s3 sync \
+        --acl public-read \
+        --delete \
+        ./site \
+        s3://docs.opendata.aws/genomics-workflows
+}
+
+function all() {
+    artifacts
+    templates
+    site
+}
+
+echo "DEPLOYMENT STAGE: $ASSET_STAGE"
+case $ASSET_STAGE in
+    production)
+        ASSET_STAGE_PATH=""
+        all
+        ;;
+    test)
+        ASSET_STAGE_PATH="test"
+        artifacts
+        templates
+        ;;
+    *)
+        echo "unsupported staging level - $ASSET_STAGE"
+        exit 1
+esac
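The `s3_uri` helper above is the core of the staged deployment: it joins a bucket and any number of prefix parts into an S3 URI, dropping empty parts (the production stage path is `""`) and stray slashes. A standalone sketch of its behavior, with illustrative bucket and part names:

```shell
#!/bin/bash
# Sketch of the s3_uri helper from _scripts/deploy.sh: joins a bucket
# and prefix parts into an S3 URI. Empty parts (e.g. the production
# stage path) and trailing slashes are dropped.
function s3_uri() {
    BUCKET=$1
    shift

    IFS=""
    PREFIX_PARTS=("$@")
    # Unquoted re-expansion with empty IFS drops empty elements
    # without word-splitting the non-empty ones.
    PREFIX_PARTS=(${PREFIX_PARTS[@]})
    # Strip a trailing slash from each part, then join with '/'.
    PREFIX=$(printf '/%s' "${PREFIX_PARTS[@]%/}")

    echo "${BUCKET%/}/${PREFIX:1}"
}

s3_uri "s3://aws-genomics-workflows" "test" "artifacts"
# s3://aws-genomics-workflows/test/artifacts

s3_uri "s3://aws-genomics-workflows" "" "artifacts"
# s3://aws-genomics-workflows/artifacts
```

This is why the `production` branch of the `case` statement can simply set `ASSET_STAGE_PATH=""` and reuse the same publishing functions as the `test` stage.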
138 changes: 109 additions & 29 deletions docs/core-env/create-custom-compute-resources.md
@@ -1,20 +1,17 @@
-# Creating Custom Compute Resources
+# Custom Compute Resources

Genomics is a data-heavy workload and requires some modification to the defaults
-used for batch job processing. In particular, instances running the Tasks/Jobs
-need scalable storage to meet unpredictable runtime demands.
+used by AWS Batch for job processing. To efficiently use resources, AWS Batch places multiple jobs on a worker instance. The data requirements for individual jobs can range from a few MB to 100s of GB. Instances running workflow jobs will not know beforehand how much space is required, and need scalable storage to meet unpredictable runtime demands.

-By default, AWS Batch relies upon the [Amazon ECS-Optimized AMI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html)
-to launch container instances for running jobs. This is sufficient in most cases, but specialized needs, such as the large
-storage requirements noted above, require customization of the base AMI.
-
-This section provides two methods for customizing the base ECS-Optimized AMI
-that adds an expandable working directory for jobs to write data.
-A process will monitor the directory and add more EBS volumes on the fly to expand the free space
-based on the capacity threshold, like so:
+To handle this use case, we can use a process that monitors a scratch directory on an instance and expands free space as needed based on capacity thresholds. This can be done using logical volume management and attaching EBS volumes as needed to the instance like so:

![Autoscaling EBS storage](images/ebs-autoscale.png)

+The above process - "EBS autoscaling" - requires a few small dependencies and a simple daemon installed on the host instance.
+
+By default, AWS Batch uses the [Amazon ECS-Optimized AMI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html)
+to launch instances for running jobs. This is sufficient in most cases, but specialized needs, such as the large storage requirements noted above, require customization of the base AMI. Because the provisioning requirements for EBS autoscaling are fairly simple and lightweight, one can use an EC2 Launch Template to customize instances.

## EC2 Launch Template

The simplest method for customizing an instance is to use an EC2 Launch Template.
@@ -43,11 +40,13 @@ packages:
- python27-pip
- sed
- wget
# add more package names here if you need them

runcmd:
- pip install -U awscli boto3
- cd /opt && wget https://aws-genomics-workflows.s3.amazonaws.com/artifacts/aws-ebs-autoscale.tgz && tar -xzf aws-ebs-autoscale.tgz
- sh /opt/ebs-autoscale/bin/init-ebs-autoscale.sh /scratch /dev/sdc 2>&1 > /var/log/init-ebs-autoscale.log
# you can add more commands here if you have additional provisioning steps

--==BOUNDARY==--
```
@@ -58,23 +57,113 @@ If you want this volume to be larger initially, you can specify a bigger one
mapped to `/dev/sdc` in the Launch Template.

!!! note
-    The mount point is specific to what orchestration method / engine you intend
-    to use. `/scratch` is considered the default for AWS Step Functions. If you
-    are using a 3rd party workflow orchestration engine this mount point will need
-    to be adjusted to fit that engine's expectations.
+    The mount point is specific to what orchestration method / engine you intend to use. `/scratch` is considered a generic default. If you are using a 3rd party workflow orchestration engine this mount point will need to be adjusted to fit that engine's expectations.

Also note that the script has MIME multi-part boundaries. This is because AWS Batch will combine this script with others that it uses to provision instances.

## Creating an EC2 Launch Template

Instructions on how to create a launch template are below. Once your Launch Template is created, you can reference it when you set up resources in AWS Batch to ensure that jobs run therein have your customizations available to them.

### Automated via CloudFormation

You can use the following CloudFormation template to create a Launch Template
suitable for your needs.

| Name | Description | Source | Launch Stack |
| -- | -- | :--: | :--: |
-{{ cfn_stack_row("EC2 Launch Template", "GenomicsWorkflow-LT", "aws-genomics-launch-template.template.yaml", "Creates an EC2 Launch Template that provisions instances on first boot for processing genomics workflow tasks.") }}
+{{ cfn_stack_row("EC2 Launch Template", "GWFCore-LT", "aws-genomics-launch-template.template.yaml", "Creates an EC2 Launch Template that provisions instances on first boot for processing genomics workflow tasks.") }}

### Manually via the AWS CLI

In most cases, EC2 Launch Templates can be created using the AWS EC2 Console. In this case, however, we need to use the AWS CLI.

Create a file named `launch-template-data.json` with the following contents:

```json
{
"TagSpecifications": [
{
"ResourceType": "instance",
"Tags": [
{
"Key": "architecture",
"Value": "genomics-workflow"
},
{
"Key": "solution",
"Value": "nextflow"
}
]
}
],
"BlockDeviceMappings": [
{
"Ebs": {
"DeleteOnTermination": true,
"VolumeSize": 50,
"VolumeType": "gp2"
},
"DeviceName": "/dev/xvda"
},
{
"Ebs": {
"Encrypted": true,
"DeleteOnTermination": true,
"VolumeSize": 75,
"VolumeType": "gp2"
},
"DeviceName": "/dev/xvdcz"
},
{
"Ebs": {
"Encrypted": true,
"DeleteOnTermination": true,
"VolumeSize": 20,
"VolumeType": "gp2"
},
"DeviceName": "/dev/sdc"
}
],
"UserData": "...base64-encoded-string..."
}
```

The above template will create an instance with three attached EBS volumes.

* `/dev/xvda`: will be used for the root volume
* `/dev/xvdcz`: will be used for the Docker metadata volume
* `/dev/sdc`: will be the initial volume used for scratch space (more on this below)

The `UserData` value should be the `base64` encoded version of the UserData script used to provision instances.

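For example, if the MIME multi-part script shown earlier is saved to a file, it can be encoded like so (the file name `user-data.txt` and the abbreviated MIME header are illustrative; use your full provisioning script):

```shell
# Write an (abbreviated) MIME multi-part UserData script to a file, then
# base64-encode it for the "UserData" field of launch-template-data.json.
# GNU base64: -w 0 disables line wrapping so the result is a single
# line suitable for a JSON string value.
cat > user-data.txt <<'EOF'
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="
EOF

base64 -w 0 user-data.txt
```

The single-line output replaces the `...base64-encoded-string...` placeholder above.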
Use the command below to create the corresponding launch template:

```bash
aws ec2 \
create-launch-template \
--launch-template-name genomics-workflow-template \
--launch-template-data file://launch-template-data.json
```

You should get something like the following as a response:

```json
{
"LaunchTemplate": {
"LatestVersionNumber": 1,
"LaunchTemplateId": "lt-0123456789abcdef0",
"LaunchTemplateName": "genomics-workflow-template",
"DefaultVersionNumber": 1,
"CreatedBy": "arn:aws:iam::123456789012:user/alice",
"CreateTime": "2019-01-01T00:00:00.000Z"
}
}
```
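Once the launch template exists, it can be referenced by name or ID when you create AWS Batch compute environments, so that instances launched for jobs pick up these customizations. A sketch of the relevant fragment of the compute resources input to `aws batch create-compute-environment` (all other required fields are elided; `$Latest` resolves to the newest template version):

```json
{
    "launchTemplate": {
        "launchTemplateName": "genomics-workflow-template",
        "version": "$Latest"
    }
}
```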

## Custom AMIs

A slightly more involved method for customizing an instance is
to create a new AMI based on the ECS Optimized AMI. This is good if you have
@@ -83,14 +172,5 @@
datasets preloaded that will be needed by all your jobs.

You can learn more about how to [create your own AMIs in the EC2 userguide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html).

-The CloudFormation template below automates the tasks needed to create an AMI and should take about 10-15min to complete.
-
-| Name | Description | Source | Launch Stack |
-| -- | -- | :--: | :--: |
-{{ cfn_stack_row("Custom AMI (Existing VPC)", "GenomicsWorkflow-AMI", "deprecated/aws-genomics-ami.template.yaml", "Creates a custom AMI that EC2 instances can be based on for processing genomics workflow tasks. The creation process will happen in a VPC you specify") }}
-
-Once your AMI is created, you will need to jot down its unique AMI Id. You will
-need this when creating compute resources in AWS Batch.

!!! note
    This is considered advanced use. All documentation and CloudFormation templates hereon assume use of EC2 Launch Templates.