
Release Update (#65)
* Update README

* update nextflow assets
  * update container entrypoint script
    * handle s3 uris as projects
    * sync session cache for resume
    * use logdir and workdir as defined in environment variables
  * update job definition to match container entrypoint script
    * create logdir and workdir environment variables
  * update s3 paths to create / use logdir and workdir
  * update generated config
  * use the new (19.07) config syntax for specifying path to awscli

* update step-functions assets
  * optimize codebuild speed
    * increase size of build instance
  * update container builds
    * add apt-get update
    * increase thread counts
    * handle bam index file staging
    * handle non-standard paired read file names
  * update example workflow
    * use a smaller dataset so that the demo runs faster
  * implement ebs-autoscale on docker datavolume for sfn

* implement two stage deployment

* update aio templates
  * remove deprecated parameters
  * make default values for parameters consistent
  * automatically pick 2 AZs in VPCs
  * make stack exports consistent

* update details to core-environment
wleepang authored Sep 25, 2019
1 parent 73ab29f commit b874046
Showing 27 changed files with 980 additions and 546 deletions.
18 changes: 12 additions & 6 deletions .travis.yml
@@ -27,9 +27,15 @@ before_deploy:
- bash _scripts/configure-deploy.sh

deploy:
  provider: script
  script: bash _scripts/deploy.sh
  skip_cleanup: true
  on:
    repo: aws-samples/aws-genomics-workflows
    branch: master
  - provider: script
    script: bash _scripts/deploy.sh production
    skip_cleanup: true
    on:
      repo: aws-samples/aws-genomics-workflows
      branch: release
  - provider: script
    script: bash _scripts/deploy.sh test
    skip_cleanup: true
    on:
      repo: aws-samples/aws-genomics-workflows
      branch: master
2 changes: 1 addition & 1 deletion README.md
@@ -11,7 +11,7 @@ The documentation is built using mkdocs.
Install dependencies:

```bash
$ conda env create --file enviroment.yaml
$ conda env create --file environment.yaml
```

This will create a `conda` environment called `mkdocs`
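
To use the environment after it is created, the usual next steps would be along these lines (assuming the environment is named `mkdocs` as noted above; `mkdocs serve` starts a local preview server):

```bash
# activate the environment and preview the documentation locally
conda activate mkdocs
mkdocs serve
```
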
87 changes: 66 additions & 21 deletions _scripts/deploy.sh
@@ -5,30 +5,75 @@ set -e
bash _scripts/make-artifacts.sh
mkdocs build

ASSET_BUCKET=s3://aws-genomics-workflows
ASSET_STAGE=${1:-production}

echo "publishing artifacts:"
aws s3 sync \
--profile asset-publisher \
--acl public-read \
--delete \
./artifacts \
s3://aws-genomics-workflows/artifacts

function s3_uri() {
BUCKET=$1
shift

echo "publishing templates:"
aws s3 sync \
--profile asset-publisher \
--acl public-read \
--delete \
--metadata commit=$(git rev-parse HEAD) \
./src/templates \
s3://aws-genomics-workflows/templates
IFS=""
PREFIX_PARTS=("$@")
PREFIX_PARTS=(${PREFIX_PARTS[@]})
PREFIX=$(printf '/%s' "${PREFIX_PARTS[@]%/}")

echo "${BUCKET%/}/${PREFIX:1}"
}


echo "publishing site"
aws s3 sync \
--acl public-read \
--delete \
./site \
s3://docs.opendata.aws/genomics-workflows
function artifacts() {
S3_URI=$(s3_uri $ASSET_BUCKET $ASSET_STAGE_PATH "artifacts")

echo "publishing artifacts: $S3_URI"
aws s3 sync \
--profile asset-publisher \
--acl public-read \
--delete \
./artifacts \
$S3_URI
}

function templates() {
S3_URI=$(s3_uri $ASSET_BUCKET $ASSET_STAGE_PATH "templates")

echo "publishing templates: $S3_URI"
aws s3 sync \
--profile asset-publisher \
--acl public-read \
--delete \
--metadata commit=$(git rev-parse HEAD) \
./src/templates \
$S3_URI
}

function site() {
echo "publishing site"
aws s3 sync \
--acl public-read \
--delete \
./site \
s3://docs.opendata.aws/genomics-workflows
}

function all() {
artifacts
templates
site
}

echo "DEPLOYMENT STAGE: $ASSET_STAGE"
case $ASSET_STAGE in
production)
ASSET_STAGE_PATH=""
all
;;
test)
ASSET_STAGE_PATH="test"
artifacts
templates
;;
*)
echo "unsupported staging level - $ASSET_STAGE"
exit 1
esac
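
For reference, the Travis configuration shown earlier invokes this script with the deployment stage as its first argument; run by hand, the equivalent would be something like:

```bash
# publish artifacts and templates under the test prefix
bash _scripts/deploy.sh test
```
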
138 changes: 109 additions & 29 deletions docs/core-env/create-custom-compute-resources.md
@@ -1,20 +1,17 @@
# Creating Custom Compute Resources
# Custom Compute Resources

Genomics is a data-heavy workload and requires some modification to the defaults
used for batch job processing. In particular, instances running the Tasks/Jobs
need scalable storage to meet unpredictable runtime demands.
used by AWS Batch for job processing. To efficiently use resources, AWS Batch places multiple jobs on a worker instance. The data requirements for individual jobs can range from a few MB to 100s of GB. Instances running workflow jobs will not know beforehand how much space is required, and need scalable storage to meet unpredictable runtime demands.

By default, AWS Batch relies upon the [Amazon ECS-Optimized AMI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html)
to launch container instances for running jobs. This is sufficient in most cases, but specialized needs, such as the large
storage requirements noted above, require customization of the base AMI.

This section provides two methods for customizing the base ECS-Optimized AMI
that adds an expandable working directory for jobs to write data.
A process will monitor the directory and add more EBS volumes on the fly to expand the free space
based on the capacity threshold, like so:
To handle this use case, we can use a process that monitors a scratch directory on an instance and expands free space as needed based on capacity thresholds. This can be done using logical volume management and attaching EBS volumes to the instance as needed, like so:

![Autoscaling EBS storage](images/ebs-autoscale.png)

The above process - "EBS autoscaling" - requires a few small dependencies and a simple daemon installed on the host instance.
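
As a rough illustration of what such a daemon does (a conceptual sketch only, not the actual `aws-ebs-autoscale` code; the mount point, threshold, and volume size are assumptions), the core loop watches free space on the scratch mount and provisions more EBS storage when a threshold is crossed:

```bash
#!/bin/bash
# Conceptual sketch only -- NOT the actual aws-ebs-autoscale implementation.
MOUNTPOINT=/scratch   # scratch directory to watch (assumption)
THRESHOLD=80          # expand when the filesystem is this percent full (assumption)

while true; do
    USED=$(df --output=pcent "$MOUNTPOINT" | tail -n 1 | tr -dc '0-9')
    if [ "${USED:-0}" -ge "$THRESHOLD" ]; then
        # create a new EBS volume in this instance's availability zone ...
        AZ=$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)
        VOLUME_ID=$(aws ec2 create-volume \
            --availability-zone "$AZ" \
            --size 100 \
            --volume-type gp2 \
            --query VolumeId --output text)
        # ... then attach it (aws ec2 attach-volume), add it to the logical
        # volume backing $MOUNTPOINT, and grow the filesystem
    fi
    sleep 60
done
```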

By default, AWS Batch uses the [Amazon ECS-Optimized AMI](https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-optimized_AMI.html)
to launch instances for running jobs. This is sufficient in most cases, but specialized needs, such as the large storage requirements noted above, require customization of the base AMI. Because the provisioning requirements for EBS autoscaling are fairly simple and lightweight, one can use an EC2 Launch Template to customize instances.

## EC2 Launch Template

The simplest method for customizing an instance is to use an EC2 Launch Template.
@@ -43,11 +40,13 @@ packages:
- python27-pip
- sed
- wget
# add more package names here if you need them
runcmd:
- pip install -U awscli boto3
- cd /opt && wget https://aws-genomics-workflows.s3.amazonaws.com/artifacts/aws-ebs-autoscale.tgz && tar -xzf aws-ebs-autoscale.tgz
- sh /opt/ebs-autoscale/bin/init-ebs-autoscale.sh /scratch /dev/sdc 2>&1 > /var/log/init-ebs-autoscale.log
# you can add more commands here if you have additional provisioning steps
--==BOUNDARY==--
```
@@ -58,23 +57,113 @@ If you want this volume to be larger initially, you can specify a bigger one
mapped to `/dev/sdc` in the Launch Template.

!!! note
    The mount point is specific to what orchestration method / engine you intend
    to use. `/scratch` is considered the default for AWS Step Functions. If you
    are using a 3rd party workflow orchestration engine this mount point will need
    to be adjusted to fit that engine's expectations.
    The mount point is specific to what orchestration method / engine you intend to use. `/scratch` is considered a generic default. If you are using a 3rd party workflow orchestration engine this mount point will need to be adjusted to fit that engine's expectations.

Also note that the script has MIME multi-part boundaries. This is because AWS Batch will combine this script with others that it uses to provision instances.
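
For context, a complete multi-part UserData document wraps a cloud-config like the one above in MIME headers roughly as follows (abbreviated sketch; the cloud-config body is elided):

```
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==BOUNDARY=="

--==BOUNDARY==
Content-Type: text/cloud-config; charset="us-ascii"

#cloud-config
# ... packages and runcmd entries as shown above ...

--==BOUNDARY==--
```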

## Creating an EC2 Launch Template

Instructions on how to create a launch template are below. Once your Launch Template is created, you can reference it when you set up resources in AWS Batch to ensure that jobs running there have your customizations available to them.

### Automated via CloudFormation

You can use the following CloudFormation template to create a Launch Template
suitable for your needs.

| Name | Description | Source | Launch Stack |
| -- | -- | :--: | :--: |
{{ cfn_stack_row("EC2 Launch Template", "GenomicsWorkflow-LT", "aws-genomics-launch-template.template.yaml", "Creates an EC2 Launch Template that provisions instances on first boot for processing genomics workflow tasks.") }}
{{ cfn_stack_row("EC2 Launch Template", "GWFCore-LT", "aws-genomics-launch-template.template.yaml", "Creates an EC2 Launch Template that provisions instances on first boot for processing genomics workflow tasks.") }}

### Manually via the AWS CLI

In most cases, EC2 Launch Templates can be created using the AWS EC2 Console.
In this case, however, we need to use the AWS CLI.

Create a file named `launch-template-data.json` with the following contents:

```json
{
    "TagSpecifications": [
        {
            "ResourceType": "instance",
            "Tags": [
                {
                    "Key": "architecture",
                    "Value": "genomics-workflow"
                },
                {
                    "Key": "solution",
                    "Value": "nextflow"
                }
            ]
        }
    ],
    "BlockDeviceMappings": [
        {
            "Ebs": {
                "DeleteOnTermination": true,
                "VolumeSize": 50,
                "VolumeType": "gp2"
            },
            "DeviceName": "/dev/xvda"
        },
        {
            "Ebs": {
                "Encrypted": true,
                "DeleteOnTermination": true,
                "VolumeSize": 75,
                "VolumeType": "gp2"
            },
            "DeviceName": "/dev/xvdcz"
        },
        {
            "Ebs": {
                "Encrypted": true,
                "DeleteOnTermination": true,
                "VolumeSize": 20,
                "VolumeType": "gp2"
            },
            "DeviceName": "/dev/sdc"
        }
    ],
    "UserData": "...base64-encoded-string..."
}
```

Once your Launch Template is created, you can reference it when you setup resources
in AWS Batch to ensure that jobs run therein have your customizations available
to them.
The above template will create an instance with three attached EBS volumes.

* `/dev/xvda`: will be used for the root volume
* `/dev/xvdcz`: will be used for the Docker metadata volume
* `/dev/sdc`: will be the initial volume used for scratch space (more on this below)

## Custom AMI
The `UserData` value should be the `base64`-encoded version of the UserData script used to provision instances.
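
One way to produce this value is to `base64`-encode the provisioning script; here it is assumed to be saved locally as `userdata.txt` (an illustrative filename):

```bash
# emit a single-line base64 string suitable for the UserData field
base64 userdata.txt | tr -d '\n'
```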

Use the command below to create the corresponding launch template:

```bash
aws ec2 \
    create-launch-template \
    --launch-template-name genomics-workflow-template \
    --launch-template-data file://launch-template-data.json
```

You should get something like the following as a response:

```json
{
    "LaunchTemplate": {
        "LatestVersionNumber": 1,
        "LaunchTemplateId": "lt-0123456789abcdef0",
        "LaunchTemplateName": "genomics-workflow-template",
        "DefaultVersionNumber": 1,
        "CreatedBy": "arn:aws:iam::123456789012:user/alice",
        "CreateTime": "2019-01-01T00:00:00.000Z"
    }
}
```
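
For illustration, a managed AWS Batch compute environment created later could reference this launch template so that job instances include the customizations above. This is a sketch only; the account IDs, subnet, security group, and role ARNs are placeholders you would replace with your own:

```bash
aws batch create-compute-environment \
    --compute-environment-name genomics-workflow-ce \
    --type MANAGED \
    --compute-resources '{
        "type": "EC2",
        "minvCpus": 0,
        "maxvCpus": 256,
        "instanceTypes": ["optimal"],
        "instanceRole": "arn:aws:iam::123456789012:instance-profile/ecsInstanceRole",
        "subnets": ["subnet-11111111"],
        "securityGroupIds": ["sg-22222222"],
        "launchTemplate": {
            "launchTemplateName": "genomics-workflow-template",
            "version": "$Latest"
        }
    }' \
    --service-role arn:aws:iam::123456789012:role/service-role/AWSBatchServiceRole
```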

## Custom AMIs

A slightly more involved method for customizing an instance is
to create a new AMI based on the ECS Optimized AMI. This is good if you have
@@ -83,14 +172,5 @@ datasets preloaded that will be needed by all your jobs.

You can learn more about how to [create your own AMIs in the EC2 userguide](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AMIs.html).

The CloudFormation template below automates the tasks needed to create an AMI and should take about 10-15min to complete.

| Name | Description | Source | Launch Stack |
| -- | -- | :--: | :--: |
{{ cfn_stack_row("Custom AMI (Existing VPC)", "GenomicsWorkflow-AMI", "deprecated/aws-genomics-ami.template.yaml", "Creates a custom AMI that EC2 instances can be based on for processing genomics workflow tasks. The creation process will happen in a VPC you specify") }}

Once your AMI is created, you will need to jot down its unique AMI Id. You will
need this when creating compute resources in AWS Batch.

!!! note
    This is considered advanced use. All documentation and CloudFormation templates from here on assume the use of EC2 Launch Templates.