Merge branch 'master' into autogluon-tabular
metrizable authored Oct 29, 2020
2 parents da97c4b + ffee0c8 commit b0d3568
Showing 26 changed files with 103 additions and 3,142 deletions.
@@ -11,6 +11,10 @@
"\n",
"For the inference container to serve multiple models in a multi-model endpoint, it must implement [additional APIs](https://docs.aws.amazon.com/sagemaker/latest/dg/build-multi-model-build-container.html) in order to load, list, get, unload and invoke specific models. This notebook demonstrates how to build your own inference container that implements these APIs.\n",
"\n",
"**Note**: Because this notebook builds a Docker container, it does not run in Amazon SageMaker Studio.\n",
"\n",
"This notebook was tested with the `conda_mxnet_p36` kernel running SageMaker Python SDK version 2.15.3 on an Amazon SageMaker notebook instance.\n",
"\n",
"---\n",
"\n",
"### Contents\n",
@@ -553,7 +557,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.7.6"
}
},
"nbformat": 4,
33 changes: 33 additions & 0 deletions buildspec.yml
@@ -0,0 +1,33 @@
version: 0.2

env:
  variables:
    INSTANCE_TYPE: 'ml.c4.8xlarge'
    REGION: 'us-west-2'

phases:
  pre_build:
    commands:
      - PR_NUM=$(echo $CODEBUILD_SOURCE_VERSION | grep -o '[0-9]\+')
      - NOTEBOOKS="$(pr-notebook-filenames --pr $PR_NUM)"

  build:
    commands:
      - |-
        if [ -z "$NOTEBOOKS" ]; then
          echo "No notebooks to test in this pull request."
        else
          echo "Testing $NOTEBOOKS"
          aws s3 cp s3://sagemaker-mead-cli/mead-nb-test.tar.gz mead-nb-test.tar.gz
          tar -xzf mead-nb-test.tar.gz
          export JAVA_HOME=$(get-java-home)
          echo "set JAVA_HOME=$JAVA_HOME"
          export SAGEMAKER_ROLE_ARN=$(aws iam list-roles --output text --query "Roles[?RoleName == 'SageMakerRole'].Arn")
          echo "set SAGEMAKER_ROLE_ARN=$SAGEMAKER_ROLE_ARN"
          ./runtime/bin/mead-run-nb-test \
            --instance-type $INSTANCE_TYPE \
            --region $REGION \
            --notebook-instance-role-arn $SAGEMAKER_ROLE_ARN \
            $NOTEBOOKS
        fi
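
Note on the `pre_build` step: `grep -o '[0-9]\+'` pulls the digits out of `CODEBUILD_SOURCE_VERSION`. A minimal Python sketch of the same extraction follows. It assumes the variable looks like `pr/<number>` for pull request builds, which the buildspec does not state, and `extract_pr_number` is a hypothetical helper, not part of this commit. Unlike `grep -o`, which prints every run of digits, the sketch returns only the first one.

```python
import re
from typing import Optional


def extract_pr_number(source_version: str) -> Optional[str]:
    """Return the first run of digits in source_version, or None if there is none.

    Hypothetical helper mirroring `grep -o '[0-9]\\+'` for the assumed
    `pr/<number>` form of CODEBUILD_SOURCE_VERSION.
    """
    match = re.search(r"[0-9]+", source_version)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(extract_pr_number("pr/1234"))
    print(extract_pr_number("master"))
```

When no digits are present the helper returns `None`, which corresponds to the empty `PR_NUM` the shell version would produce for a non-PR build.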
@@ -30,6 +30,10 @@
"region = boto3.Session().region_name\n",
"sage_client = boto3.Session().client('sagemaker')\n",
"\n",
"## You must have already run a hyperparameter tuning job to analyze it here.\n",
"## The Hyperparameter tuning jobs you have run are listed in the Training section on your SageMaker dashboard.\n",
"## Copy the name of a completed job you want to analyze from that list.\n",
"## For example: tuning_job_name = 'mxnet-training-201007-0054'.\n",
"tuning_job_name = 'YOUR-HYPERPARAMETER-TUNING-JOB-NAME'"
]
},
@@ -4,25 +4,22 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Object Detection using Managed Spot Training\n",
"# Object detection using managed spot training\n",
"\n",
"The example here is almost the same as [Amazon SageMaker Object Detection using the RecordIO format](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb).\n",
"This notebook shows how to use [Amazon SageMaker managed spot training](https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html) to run training jobs at potentially lower cost. Managed spot training uses [Amazon EC2 Spot instances](https://aws.amazon.com/ec2/spot/) and manages the Spot interruptions on your behalf.\n",
"\n",
"This notebook tackles the exact same problem with the same solution, but it has been modified to be able to run using SageMaker Managed Spot infrastructure. SageMaker Managed Spot uses [EC2 Spot Instances](https://aws.amazon.com/ec2/spot/) to run Training at a lower cost.\n",
"\n",
"Please read the original notebook and try it out to gain an understanding of the ML use-case and how it is being solved. We will not delve into that here in this notebook.\n",
"To highlight the differences between on-demand and Spot instances, this notebook is the same as [Amazon SageMaker Object Detection using the RecordIO format](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb), but has been updated to use managed spot training. For a full description of the ML use case and how it is being solved, see the original notebook.\n",
"\n",
"## Setup\n",
"Again, we won't go into detail explaining the code below, it has been lifted verbatim from [Amazon SageMaker Object Detection using the RecordIO format](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"!pip install -qU awscli boto3 sagemaker"
"\n",
"See [Amazon SageMaker Object Detection using the RecordIO format](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/object_detection_pascalvoc_coco/object_detection_recordio_format.ipynb) for a description of the code.\n",
"\n",
"### Prerequisites\n",
"\n",
"This notebook has been tested with:\n",
"* SageMaker Python SDK 1.72.1\n",
"* Python 3.6\n",
"* Kernel: conda_mxnet_p36"
]
},
{
@@ -48,10 +45,10 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Download And Prepare Data\n",
"Note: this notebook downloads and uses the Pascal VOC dateset, please be aware of the database usage rights:\n",
"\"The VOC data includes images obtained from the \"flickr\" website. Use of these images must respect the corresponding terms of use: \n",
"* \"flickr\" terms of use (https://www.flickr.com/help/terms)\""
"### Download and prepare data\n",
"This notebook downloads and uses the Pascal VOC dataset, which has the following database usage rights:\n",
"> The VOC data includes images obtained from the Flickr website. Use of these images must respect the corresponding terms of use: \n",
"> * Flickr terms of use (https://www.flickr.com/help/terms)"
]
},
{
@@ -81,7 +78,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Upload data to S3"
"### Upload data to Amazon Simple Storage Service (Amazon S3)"
]
},
{
@@ -105,15 +102,16 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Object Detection using Managed Spot Training\n",
"## Managed spot training\n",
"\n",
"For Managed Spot Training using Object Detection we need to configure two things:\n",
"1. Enable the `train_use_spot_instances` constructor arg - a simple self-explanatory boolean.\n",
"2. Set the `train_max_wait` constructor arg - this is an int arg representing the amount of time you are willing to wait for Spot infrastructure to become available. Some instance types are harder to get at Spot prices and you may have to wait longer. You are not charged for time spent waiting for Spot infrastructure to become available, you're only charged for actual compute time spent once Spot instances have been successfully procured.\n",
"Managed spot training is controlled by two arguments to the `sagemaker.estimator.Estimator` constructor:\n",
"\n",
"Feel free to toggle the `train_use_spot_instances` variable to see the effect of running the same job using regular (a.k.a. \"On Demand\") infrastructure.\n",
"* `train_use_spot_instances`: Set to `True` to use Spot instances for training jobs.\n",
"* `train_max_wait`: Represents the amount of time to wait for a Spot instance to become available. Be aware that some Spot instance types take longer to get. You are charged only for actual compute time spent once Spot instances have been acquired, and not for time spent waiting for Spot instances to become available.\n",
"\n",
"Note that `train_max_wait` can be set if and only if `train_use_spot_instances` is enabled and **must** be greater than or equal to `train_max_run`."
"Note that `train_max_wait` can be set only if `train_use_spot_instances` is `True` and **must** be greater than or equal to `train_max_run`.\n",
"\n",
"Toggle `train_use_spot_instances` in the following code to see the effect of running the same job using on-demand instances."
]
},
{
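
Note on the rewritten "Managed spot training" cell: the constraints it describes can be sketched as a small pre-flight check. This is an illustration only. `spot_training_kwargs` is a hypothetical helper that does not appear in the notebook, the argument names are the SDK 1.x names quoted in the cell, and the durations are assumed to be in seconds. SageMaker performs its own validation, so this sketch only mirrors the rule stated in the text.

```python
from typing import Any, Dict, Optional


def spot_training_kwargs(
    use_spot: bool, max_run: int, max_wait: Optional[int] = None
) -> Dict[str, Any]:
    """Build estimator keyword arguments, enforcing the rules from the notebook text.

    Hypothetical helper: train_max_wait may be set only when Spot is enabled,
    and it must be greater than or equal to train_max_run.
    """
    kwargs: Dict[str, Any] = {
        "train_use_spot_instances": use_spot,
        "train_max_run": max_run,
    }
    if max_wait is not None:
        if not use_spot:
            raise ValueError("train_max_wait requires train_use_spot_instances=True")
        if max_wait < max_run:
            raise ValueError("train_max_wait must be >= train_max_run")
        kwargs["train_max_wait"] = max_wait
    return kwargs


if __name__ == "__main__":
    # Spot run: allow up to one hour of training and one hour of total waiting.
    print(spot_training_kwargs(True, 3600, 3600))
    # On-demand run: no train_max_wait is passed at all.
    print(spot_training_kwargs(False, 3600))
```

The returned dictionary could be unpacked into the estimator constructor alongside the other arguments the notebook already passes.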
@@ -131,8 +129,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Training\n",
"Now that we are done with all the setup that is needed, we are ready to train our object detector. To begin, let us create a ``sageMaker.estimator.Estimator`` object. This estimator will launch the training job."
"### Training\n",
"\n",
"Train the object detector by creating a `sagemaker.estimator.Estimator` object and launching the training job."
]
},
{
@@ -182,16 +181,21 @@
},
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"pycharm": {
"name": "#%% md\n"
}
},
"source": [
"# Savings\n",
"Towards the end of the job you should see two lines of output printed:\n",
"### Savings\n",
"At the end of the job output, two lines are printed:\n",
"\n",
"* `Training seconds: X` : The actual compute time spent on the training job.\n",
"* `Billable seconds: Y` : The time you will be billed for after Spot discounting is applied.\n",
"\n",
"- `Training seconds: X` : This is the actual compute-time your training job spent\n",
"- `Billable seconds: Y` : This is the time you will be billed for after Spot discounting is applied.\n",
"When `train_use_spot_instances` is `True`, you should see a notable difference between training and billable seconds. This shows the cost savings when managed spot training is used, and is summarized in the final output:\n",
"\n",
"If you enabled the `train_use_spot_instances` var then you should see a notable difference between `X` and `Y` signifying the cost savings you will get for having chosen Managed Spot Training. This should be reflected in an additional line:\n",
"- `Managed Spot Training savings: (1-Y/X)*100 %`"
"* `Managed Spot Training savings: (1-Y/X)*100 %`"
]
}
],
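
Note on the "Savings" cell: the formula `(1-Y/X)*100` can be checked by hand from the two printed values. A minimal sketch, where `spot_savings_percent` is a hypothetical helper and the numbers in the usage example are made up for illustration, not taken from a real job:

```python
def spot_savings_percent(training_seconds: float, billable_seconds: float) -> float:
    """Compute (1 - Y/X) * 100, with X = training seconds and Y = billable seconds."""
    if training_seconds <= 0:
        raise ValueError("training_seconds must be positive")
    return (1 - billable_seconds / training_seconds) * 100


if __name__ == "__main__":
    # Illustrative values only: 1000 training seconds, 250 billable seconds.
    print(spot_savings_percent(1000, 250))  # 75.0
```

With on-demand training the two values should match, so the computed savings would be 0.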
@@ -212,9 +216,9 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.5"
"version": "3.6.10"
},
"notice": "Copyright 2018 Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
"notice": "Copyright Amazon.com, Inc. or its affiliates. All Rights Reserved. Licensed under the Apache License, Version 2.0 (the \"License\"). You may not use this file except in compliance with the License. A copy of the License is located at http://aws.amazon.com/apache2.0/ or in the \"license\" file accompanying this file. This file is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License."
},
"nbformat": 4,
"nbformat_minor": 4