
Release Update (#65)
* Update README

* update nextflow assets
  * update container entrypoint script
    * handle s3 uris as projects
    * sync session cache for resume
    * use logdir and workdir as defined in environment variables
  * update job definition to match container entrypoint script
    * create logdir and workdir environment variables
  * update s3 paths to create / use logdir and workdir
  * update generated config
  * use the new (19.07) config syntax for specifying path to awscli

* update step-functions assets
  * optimize codebuild speed
    * increase size of build instance
  * update container builds
    * add apt-get update
    * increase thread counts
    * handle bam index file staging
    * handle non-standard paired read file names
  * update example workflow
    * use a smaller dataset so that the demo runs faster
  * implement ebs-autoscale on docker datavolume for sfn

* implement two stage deployment

* update aio templates
  * remove deprecated parameters
  * make default values for parameters consistent
  * automatically pick 2 AZs in VPCs
  * make stack exports consistent

* update details to core-environment
wleepang authored Sep 25, 2019
1 parent 73ab29f commit b874046
Showing 27 changed files with 980 additions and 546 deletions.
18 changes: 12 additions & 6 deletions .travis.yml
@@ -27,9 +27,15 @@ before_deploy:
- bash _scripts/configure-deploy.sh

deploy:
  provider: script
  script: bash _scripts/deploy.sh
  skip_cleanup: true
  on:
    repo: aws-samples/aws-genomics-workflows
    branch: master
  - provider: script
    script: bash _scripts/deploy.sh production
    skip_cleanup: true
    on:
      repo: aws-samples/aws-genomics-workflows
      branch: release
  - provider: script
    script: bash _scripts/deploy.sh test
    skip_cleanup: true
    on:
      repo: aws-samples/aws-genomics-workflows
      branch: master
2 changes: 1 addition & 1 deletion README.md
@@ -11,7 +11,7 @@ The documentation is built using mkdocs.
Install dependencies:

```bash
$ conda env create --file enviroment.yaml
$ conda env create --file environment.yaml
```

This will create a `conda` environment called `mkdocs`
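
To use the environment after it is created, the usual next steps would be along these lines (assuming the environment is named `mkdocs` as noted above; `mkdocs serve` starts a local preview server):

```bash
# activate the environment and preview the documentation locally
conda activate mkdocs
mkdocs serve
```
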
87 changes: 66 additions & 21 deletions _scripts/deploy.sh
@@ -5,30 +5,75 @@ set -e
bash _scripts/make-artifacts.sh
mkdocs build

ASSET_BUCKET=s3://aws-genomics-workflows
ASSET_STAGE=${1:-production}

echo "publishing artifacts:"
aws s3 sync \
--profile asset-publisher \
--acl public-read \
--delete \
./artifacts \
s3://aws-genomics-workflows/artifacts

function s3_uri() {
BUCKET=$1
shift

echo "publishing templates:"
aws s3 sync \
--profile asset-publisher \
--acl public-read \
--delete \
--metadata commit=$(git rev-parse HEAD) \
./src/templates \
s3://aws-genomics-workflows/templates
IFS=""
PREFIX_PARTS=("$@")
PREFIX_PARTS=(${PREFIX_PARTS[@]})
PREFIX=$(printf '/%s' "${PREFIX_PARTS[@]%/}")

echo "${BUCKET%/}/${PREFIX:1}"
}


echo "publishing site"
aws s3 sync \
--acl public-read \
--delete \
./site \
s3://docs.opendata.aws/genomics-workflows
function artifacts() {
S3_URI=$(s3_uri $ASSET_BUCKET $ASSET_STAGE_PATH "artifacts")

echo "publishing artifacts: $S3_URI"
aws s3 sync \
--profile asset-publisher \
--acl public-read \
--delete \
./artifacts \
$S3_URI
}

function templates() {
S3_URI=$(s3_uri $ASSET_BUCKET $ASSET_STAGE_PATH "templates")

echo "publishing templates: $S3_URI"
aws s3 sync \
--profile asset-publisher \
--acl public-read \
--delete \
--metadata commit=$(git rev-parse HEAD) \
./src/templates \
$S3_URI
}

function site() {
echo "publishing site"
aws s3 sync \
--acl public-read \
--delete \
./site \
s3://docs.opendata.aws/genomics-workflows
}

function all() {
artifacts
templates
site
}

echo "DEPLOYMENT STAGE: $ASSET_STAGE"
case $ASSET_STAGE in
production)
ASSET_STAGE_PATH=""
all
;;
test)
ASSET_STAGE_PATH="test"
artifacts
templates
;;
*)
echo "unsupported staging level - $ASSET_STAGE"
exit 1
esac
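
For reference, the Travis configuration shown earlier invokes this script with the deployment stage as its first argument; run by hand, the equivalent would be something like:

```bash
# publish artifacts and templates under the test prefix
bash _scripts/deploy.sh test
```
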
138 changes: 109 additions & 29 deletions docs/core-env/create-custom-compute-resources.md
@@ -1,20 +1,17 @@
# Creating Custom Compute Resources
# Custom Compute Resources

Genomics is a data-heavy workload and requires some modification to the defaults
used for batch job processing. In particular, instances running the Tasks/Jobs
need scalable storage to meet unpredictable runtime demands.
used by AWS Batch for job processing. To efficiently use resources, AWS Batch places multiple jobs on a worker instance. The data requirements for individual jobs can range from a few MB to 100s of GB. Instances running workflow jobs will not know beforehand how much space is required, and need scalable storage to meet unpredictable runtime demands.

By default, AWS Batch relies upon the [Amazon ECS-Optimized AMI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html)
to launch container instances for running jobs. This is sufficient in most cases, but specialized needs, such as the large
storage requirements noted above, require customization of the base AMI.

This section provides two methods for customizing the base ECS-Optimized AMI
that adds an expandable working directory for jobs to write data.
A process will monitor the directory and add more EBS volumes on the fly to expand the free space
based on the capacity threshold, like so:
To handle this use case, we can use a process that monitors a scratch directory on an instance and expands free space as needed based on capacity thresholds. This can be done using logical volume management and attaching EBS volumes to the instance as needed, like so:

![Autoscaling EBS storage](images/ebs-autoscale.png)

The above process - "EBS autoscaling" - requires a few small dependencies and a simple daemon installed on the host instance.
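
As a rough illustration of what such a daemon does (a conceptual sketch only, not the actual `aws-ebs-autoscale` code; the mount point, threshold, and volume size are assumptions), the core loop watches free space on the scratch mount and provisions more EBS storage when a threshold is crossed:

```bash
#!/bin/bash
# Conceptual sketch only -- NOT the actual aws-ebs-autoscale implementation.
MOUNTPOINT=/scratch   # scratch directory to watch (assumption)
THRESHOLD=80          # expand when the filesystem is this percent full (assumption)

while true; do
    USED=$(df --output=pcent "$MOUNTPOINT" | tail -n 1 | tr -dc '0-9')
    if [ "${USED:-0}" -ge "$THRESHOLD" ]; then
        # create a new EBS volume in this instance's availability zone ...
        AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
        VOLUME_ID=$(aws ec2 create-volume \
            --availability-zone "$AZ" \
            --size 100 \
            --volume-type gp2 \
            --query VolumeId --output text)
        # ... then attach it (aws ec2 attach-volume), add it to the logical
        # volume backing $MOUNTPOINT, and grow the filesystem
    fi
    sleep 60
done
```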

By default, AWS Batch uses the [Amazon ECS-Optimized AMI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html)
to launch instances for running jobs. This is sufficient in most cases, but specialized needs, such as the large storage requirements noted above, require customization of the base AMI. Because the provisioning requirements for EBS autoscaling are fairly simple and lightweight, one can use an EC2 Launch Template to customize instances.

## EC2 Launch Template

The simplest method for customizing an instance is to use an EC2 Launch Template.
@@ -43,11 +40,13 @@ packages:
- python27-pip
- sed
- wget
# add more package names here if you need them
runcmd:
- pip install -U awscli boto3
- cd /opt && wget https://aws-genomics-workflows.s3.amazonaws.com/artifacts/aws-ebs-autoscale.tgz && tar -xzf aws-ebs-autoscale.tgz
- sh /opt/ebs-autoscale/bin/init-ebs-autoscale.sh /scratch /dev/sdc 2>&1 > /var/log/init-ebs-autoscale.log
# you can add more commands here if you have additional provisioning steps
--==BOUNDARY==--
```
@@ -58,23 +57,113 @@ If you want this volume to be larger initially, you can specify a bigger one
mapped to `/dev/sdc` in the Launch Template.

!!! note
    The mount point is specific to what orchestration method / engine you intend
    to use. `/scratch` is considered the default for AWS Step Functions. If you
    are using a 3rd party workflow orchestration engine this mount point will need
    to be adjusted to fit that engine's expectations.
    The mount point is specific to what orchestration method / engine you intend to use. `/scratch` is considered a generic default. If you are using a 3rd party workflow orchestration engine this mount point will need to be adjusted to fit that engine's expectations.

Also note that the script has MIME multi-part boundaries. This is because AWS Batch will combine this script with others that it uses to provision instances.
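
For context, a complete multi-part UserData document wraps a cloud-config like the one above in MIME headers roughly as follows (abbreviated sketch; the cloud-config body is elided):

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/cloud-config; charset="us-ascii"

#cloud-config
# ... packages and runcmd entries as shown above ...

--==BOUNDARY==--
```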

## Creating an EC2 Launch Template

Instructions on how to create a launch template are below. Once your Launch Template is created, you can reference it when you set up resources in AWS Batch to ensure that jobs running there have your customizations available to them.

### Automated via CloudFormation

You can use the following CloudFormation template to create a Launch Template
suitable for your needs.

| Name | Description | Source | Launch Stack |
| -- | -- | :--: | :--: |
{{ cfn_stack_row("EC2 Launch Template", "GenomicsWorkflow-LT", "aws-genomics-launch-template.template.yaml", "Creates an EC2 Launch Template that provisions instances on first boot for processing genomics workflow tasks.") }}
{{ cfn_stack_row("EC2 Launch Template", "GWFCore-LT", "aws-genomics-launch-template.template.yaml", "Creates an EC2 Launch Template that provisions instances on first boot for processing genomics workflow tasks.") }}

### Manually via the AWS CLI

In most cases, EC2 Launch Templates can be created using the AWS EC2 Console.
In this case, however, we need to use the AWS CLI.

Create a file named `launch-template-data.json` with the following contents:

```json
{
    "TagSpecifications": [
        {
            "ResourceType": "instance",
            "Tags": [
                {
                    "Key": "architecture",
                    "Value": "genomics-workflow"
                },
                {
                    "Key": "solution",
                    "Value": "nextflow"
                }
            ]
        }
    ],
    "BlockDeviceMappings": [
        {
            "Ebs": {
                "DeleteOnTermination": true,
                "VolumeSize": 50,
                "VolumeType": "gp2"
            },
            "DeviceName": "/dev/xvda"
        },
        {
            "Ebs": {
                "Encrypted": true,
                "DeleteOnTermination": true,
                "VolumeSize": 75,
                "VolumeType": "gp2"
            },
            "DeviceName": "/dev/xvdcz"
        },
        {
            "Ebs": {
                "Encrypted": true,
                "DeleteOnTermination": true,
                "VolumeSize": 20,
                "VolumeType": "gp2"
            },
            "DeviceName": "/dev/sdc"
        }
    ],
    "UserData": "...base64-encoded-string..."
}
```

Once your Launch Template is created, you can reference it when you setup resources
in AWS Batch to ensure that jobs run therein have your customizations available
to them.
The above template will create an instance with three attached EBS volumes.

* `/dev/xvda`: will be used for the root volume
* `/dev/xvdcz`: will be used for the Docker metadata volume
* `/dev/sdc`: will be the initial volume used for scratch space (more on this below)

## Custom AMI
The `UserData` value should be the `base64`-encoded version of the UserData script used to provision instances.
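
One way to produce this value is to `base64`-encode the provisioning script; here it is assumed to be saved locally as `userdata.txt` (an illustrative filename):

```bash
# emit a single-line base64 string suitable for the UserData field
base64 userdata.txt | tr -d '\n'
```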

Use the command below to create the corresponding launch template:

```bash
aws ec2 \
    create-launch-template \
    --launch-template-name genomics-workflow-template \
    --launch-template-data file://launch-template-data.json
```

You should get something like the following as a response:

```json
{
    "LaunchTemplate": {
        "LatestVersionNumber": 1,
        "LaunchTemplateId": "lt-0123456789abcdef0",
        "LaunchTemplateName": "genomics-workflow-template",
        "DefaultVersionNumber": 1,
        "CreatedBy": "arn:aws:iam::123456789012:user/alice",
        "CreateTime": "2019-01-01T00:00:00.000Z"
    }
}
```
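
For illustration, a managed AWS Batch compute environment created later could reference this launch template so that job instances include the customizations above. This is a sketch only; the account IDs, subnet, security group, and role ARNs are placeholders you would replace with your own:

```bash
aws batch create-compute-environment \
    --compute-environment-name genomics-workflow-ce \
    --type MANAGED \
    --compute-resources '{
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 256,
        "instanceTypes": ["optimal"],
        "instanceRole": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
        "subnets": ["subnet-11111111"],
        "securityGroupIds": ["sg-22222222"],
        "launchTemplate": {
            "launchTemplateName": "genomics-workflow-template",
            "version": "$Latest"
        }
    }' \
    --service-role arn:aws:iam::123456789012:role/service-role/AWSBatchServiceRole
```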

## Custom AMIs

A slightly more involved method for customizing an instance is
to create a new AMI based on the ECS Optimized AMI. This is good if you have
@@ -83,14 +172,5 @@ datasets preloaded that will be needed by all your jobs.

You can learn more about how to [create your own AMIs in the EC2 userguide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html).

The CloudFormation template below automates the tasks needed to create an AMI and should take about 10-15min to complete.

| Name | Description | Source | Launch Stack |
| -- | -- | :--: | :--: |
{{ cfn_stack_row("Custom AMI (Existing VPC)", "GenomicsWorkflow-AMI", "deprecated/aws-genomics-ami.template.yaml", "Creates a custom AMI that EC2 instances can be based on for processing genomics workflow tasks. The creation process will happen in a VPC you specify") }}

Once your AMI is created, you will need to jot down its unique AMI Id. You will
need this when creating compute resources in AWS Batch.

!!! note
    This is considered advanced use. All documentation and CloudFormation templates from here on assume the use of EC2 Launch Templates.