Integrate SageMaker Automatic Model Tuning (HPO) with one XGBoost Abalone notebook. (aws#3623)

* Integrate SageMaker Automatic Model Tuning (HPO) with one XGBoost Abalone notebook.

* Addressed comments for HPO integration.

Co-authored-by: Aaron Markham <[email protected]>
2 people authored and atqy committed Oct 28, 2022
1 parent 8aedf58 commit 9afa94a
Showing 1 changed file with 175 additions and 17 deletions.
192 changes: 175 additions & 17 deletions introduction_to_amazon_algorithms/xgboost_abalone/xgboost_abalone.ipynb
@@ -2,7 +2,9 @@
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"metadata": {
"tags": []
},
"source": [
"# Regression with Amazon SageMaker XGBoost algorithm\n",
"_**Single machine training for regression with Amazon SageMaker XGBoost algorithm**_\n",
@@ -102,7 +104,13 @@
"source": [
"## Training the XGBoost model\n",
"\n",
"After setting training parameters, we kick off training, and poll for status until training is completed, which in this example, takes between 5 and 6 minutes."
"After setting training parameters, we kick off training, and poll for status until training is completed, which in this example, takes between 5 and 6 minutes.\n",
"\n",
"Training can be done by either calling SageMaker Training with a set of hyperparameters values to train with, or by leveraging SageMaker Automatic Model Tuning ([AMT](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning.html)). AMT, also known as hyperparameter tuning (HPO), finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose.\n",
"\n",
"In this notebook, both methods are used for demonstration purposes, but the model that the HPO job creates is the one that is eventually hosted. You can instead choose to deploy the model created by the standalone training job by changing the below variable `deploy_amt_model` to False.\n",
"\n",
"### Initiliazing common variables "
]
},
{
@@ -111,7 +119,16 @@
"metadata": {},
"outputs": [],
"source": [
"container = sagemaker.image_uris.retrieve(\"xgboost\", region, \"1.5-1\")"
"container = sagemaker.image_uris.retrieve(\"xgboost\", region, \"1.5-1\")\n",
"client = boto3.client(\"sagemaker\", region_name=region)\n",
"deploy_amt_model = True"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Training with SageMaker Training"
]
},
{
@@ -123,9 +140,9 @@
"%%time\n",
"import boto3\n",
"from time import gmtime, strftime\n",
"import time\n",
"\n",
"job_name = f\"DEMO-xgboost-regression-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}\"\n",
"print(\"Training job\", job_name)\n",
"training_job_name = f\"DEMO-xgboost-regression-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}\"\n",
"\n",
"# Ensure that the training and validation data folders generated above are reflected in the \"InputDataConfig\" parameter below.\n",
"\n",
@@ -134,7 +151,7 @@
" \"RoleArn\": role,\n",
" \"OutputDataConfig\": {\"S3OutputPath\": f\"{output_bucket_path}/{output_prefix}/single-xgboost\"},\n",
" \"ResourceConfig\": {\"InstanceCount\": 1, \"InstanceType\": \"ml.m5.2xlarge\", \"VolumeSizeInGB\": 5},\n",
" \"TrainingJobName\": job_name,\n",
" \"TrainingJobName\": training_job_name,\n",
" \"HyperParameters\": {\n",
" \"max_depth\": \"5\",\n",
" \"eta\": \"0.2\",\n",
@@ -174,17 +191,13 @@
" ],\n",
"}\n",
"\n",
"\n",
"client = boto3.client(\"sagemaker\", region_name=region)\n",
"print(f\"Creating a training job with name: {training_job_name}. It will take between 5 and 6 minutes to complete.\")\n",
"client.create_training_job(**create_training_params)\n",
"\n",
"import time\n",
"\n",
"status = client.describe_training_job(TrainingJobName=job_name)[\"TrainingJobStatus\"]\n",
"status = client.describe_training_job(TrainingJobName=training_job_name)[\"TrainingJobStatus\"]\n",
"print(status)\n",
"while status != \"Completed\" and status != \"Failed\":\n",
" time.sleep(60)\n",
" status = client.describe_training_job(TrainingJobName=job_name)[\"TrainingJobStatus\"]\n",
" status = client.describe_training_job(TrainingJobName=training_job_name)[\"TrainingJobStatus\"]\n",
" print(status)"
]
},
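As an aside, boto3 also provides a built-in waiter that can replace this manual polling loop; a minimal sketch, reusing the `client` and `training_job_name` defined above:

```python
# A sketch: boto3's built-in waiter polls DescribeTrainingJob until the job
# reaches a terminal state (Completed, Stopped, or Failed), replacing the
# manual sleep loop above.
waiter = client.get_waiter("training_job_completed_or_stopped")
waiter.wait(TrainingJobName=training_job_name)
```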
@@ -195,6 +208,146 @@
"Note that the \"validation\" channel has been initialized too. The SageMaker XGBoost algorithm actually calculates RMSE and writes it to the CloudWatch logs on the data passed to the \"validation\" channel."
]
},
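The final value of that metric is also returned by `DescribeTrainingJob`, so it can be checked without opening CloudWatch; a minimal sketch, again reusing `client` and `training_job_name`:

```python
# A sketch: the final metrics SageMaker emitted for the training job are
# returned in FinalMetricDataList by DescribeTrainingJob.
metrics = client.describe_training_job(TrainingJobName=training_job_name)[
    "FinalMetricDataList"
]
for metric in metrics:
    if metric["MetricName"] == "validation:rmse":
        print(f"Final validation RMSE: {metric['Value']}")
```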
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Tuning with SageMaker Automatic Model Tuning\n",
"\n",
"To create a tuning job using the AWS SageMaker Automatic Model Tuning API, you need to define 3 attributes. \n",
"\n",
"1. the tuning job name (string)\n",
"2. the tuning job config (to specify settings for the hyperparameter tuning job - JSON object)\n",
"3. training job definition (to configure the training jobs that the tuning job launches - JSON object).\n",
"\n",
"To learn more about that, refer to the [Configure and Launch a Hyperparameter Tuning Job](https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-ex-tuning-job.html) documentation.\n",
"\n",
"Note that the tuning job will 12-17 minutes to complete."
]
},
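For comparison, the same tuning job can also be expressed with the higher-level SageMaker Python SDK; a rough sketch, assuming the `container`, `role`, `output_bucket_path`, and `output_prefix` variables defined earlier (the notebook itself uses the low-level boto3 API shown in the next cell):

```python
# A rough sketch of the same tuning job via the SageMaker Python SDK; the
# notebook itself uses the low-level boto3 API shown in the next cell.
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    output_path=f"{output_bucket_path}/{output_prefix}/single-xgboost",
    sagemaker_session=sagemaker.Session(),
)
# Static hyperparameters are fixed across all training jobs the tuner launches.
estimator.set_hyperparameters(objective="reg:linear", verbosity="2")

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:rmse",
    objective_type="Minimize",
    # A subset of the ranges used in the boto3 config below.
    hyperparameter_ranges={
        "eta": ContinuousParameter(0.1, 0.5),
        "max_depth": IntegerParameter(0, 10),
        "num_round": IntegerParameter(1, 4000),
    },
    max_jobs=6,
    max_parallel_jobs=2,  # strategy defaults to Bayesian
)

tuner.fit(
    {
        "train": TrainingInput(
            f"{output_bucket_path}/{output_prefix}/train", content_type="libsvm"
        ),
        "validation": TrainingInput(
            f"{output_bucket_path}/{output_prefix}/validation", content_type="libsvm"
        ),
    }
)
```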
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from time import gmtime, strftime, sleep\n",
"\n",
"tuning_job_name = \"DEMO-xgboost-reg-\" + strftime(\"%d-%H-%M-%S\", gmtime())\n",
"\n",
"tuning_job_config = {\n",
" \"ParameterRanges\": {\n",
" \"CategoricalParameterRanges\": [],\n",
" \"ContinuousParameterRanges\": [\n",
" {\n",
" \"MaxValue\": \"0.5\",\n",
" \"MinValue\": \"0.1\",\n",
" \"Name\": \"eta\",\n",
" },\n",
" {\n",
" \"MaxValue\": \"5\",\n",
" \"MinValue\": \"0\",\n",
" \"Name\": \"gamma\",\n",
" },\n",
" {\n",
" \"MaxValue\": \"120\",\n",
" \"MinValue\": \"0\",\n",
" \"Name\": \"min_child_weight\",\n",
" },\n",
" {\n",
" \"MaxValue\": \"1\",\n",
" \"MinValue\": \"0.5\",\n",
" \"Name\": \"subsample\",\n",
" },\n",
" {\n",
" \"MaxValue\": \"2\",\n",
" \"MinValue\": \"0\",\n",
" \"Name\": \"alpha\",\n",
" },\n",
" ],\n",
" \"IntegerParameterRanges\": [\n",
" {\n",
" \"MaxValue\": \"10\",\n",
" \"MinValue\": \"0\",\n",
" \"Name\": \"max_depth\",\n",
" },\n",
" {\n",
" \"MaxValue\": \"4000\",\n",
" \"MinValue\": \"1\",\n",
" \"Name\": \"num_round\",\n",
" }\n",
" ],\n",
" },\n",
" # SageMaker sets the following default limits for resources used by automatic model tuning:\n",
" # https://docs.aws.amazon.com/sagemaker/latest/dg/automatic-model-tuning-limits.html\n",
" \"ResourceLimits\": {\n",
" # Increase the max number of training jobs for increased accuracy (and training time).\n",
" \"MaxNumberOfTrainingJobs\": 6, \n",
" # Change parallel training jobs run by AMT to reduce total training time. Constrained by your account limits.\n",
" # if max_jobs=max_parallel_jobs then Bayesian search turns to Random.\n",
" \"MaxParallelTrainingJobs\": 2\n",
" },\n",
" \"Strategy\": \"Bayesian\",\n",
" \"HyperParameterTuningJobObjective\": {\"MetricName\": \"validation:rmse\", \"Type\": \"Minimize\"},\n",
"}\n",
"\n",
"training_job_definition = {\n",
" \"AlgorithmSpecification\": {\"TrainingImage\": container, \"TrainingInputMode\": \"File\"},\n",
" \"InputDataConfig\": [\n",
" {\n",
" \"ChannelName\": \"train\",\n",
" \"DataSource\": {\n",
" \"S3DataSource\": {\n",
" \"S3DataType\": \"S3Prefix\",\n",
" \"S3Uri\": f\"{output_bucket_path}/{output_prefix}/train\",\n",
" \"S3DataDistributionType\": \"FullyReplicated\",\n",
" }\n",
" },\n",
" \"ContentType\": \"libsvm\",\n",
" \"CompressionType\": \"None\",\n",
" },\n",
" {\n",
" \"ChannelName\": \"validation\",\n",
" \"DataSource\": {\n",
" \"S3DataSource\": {\n",
" \"S3DataType\": \"S3Prefix\",\n",
" \"S3Uri\": f\"{output_bucket_path}/{output_prefix}/validation\",\n",
" \"S3DataDistributionType\": \"FullyReplicated\",\n",
" }\n",
" },\n",
" \"ContentType\": \"libsvm\",\n",
" \"CompressionType\": \"None\",\n",
" },\n",
" ],\n",
" \"OutputDataConfig\": {\"S3OutputPath\": f\"{output_bucket_path}/{output_prefix}/single-xgboost\"},\n",
" \"ResourceConfig\": {\"InstanceCount\": 1, \"InstanceType\": \"ml.m5.2xlarge\", \"VolumeSizeInGB\": 5},\n",
" \"RoleArn\": role,\n",
" \"StaticHyperParameters\": {\n",
" \"objective\": \"reg:linear\",\n",
" \"verbosity\": \"2\",\n",
" },\n",
" \"StoppingCondition\": {\"MaxRuntimeInSeconds\": 43200},\n",
"}\n",
"\n",
"print(f\"Creating a tuning job with name: {tuning_job_name}. It will take between 12 and 17 minutes to complete.\")\n",
"client.create_hyper_parameter_tuning_job(\n",
" HyperParameterTuningJobName=tuning_job_name,\n",
" HyperParameterTuningJobConfig=tuning_job_config,\n",
" TrainingJobDefinition=training_job_definition,\n",
")\n",
"\n",
"status = client.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)[\n",
" \"HyperParameterTuningJobStatus\"\n",
"]\n",
"print(status)\n",
"while status != \"Completed\" and status != \"Failed\":\n",
" time.sleep(60)\n",
" status = client.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)[\n",
" \"HyperParameterTuningJobStatus\"\n",
" ]\n",
" print(status)\n"
]
},
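Once the tuning job finishes, its per-job results can be pulled into a pandas DataFrame for inspection; a minimal sketch, assuming the SageMaker Python SDK is available:

```python
# A sketch: pull per-training-job results of the finished tuning job into a
# pandas DataFrame, sorted so the best (lowest) validation:rmse comes first.
import sagemaker

tuner_analytics = sagemaker.HyperparameterTuningJobAnalytics(tuning_job_name)
results_df = tuner_analytics.dataframe()
print(results_df.sort_values("FinalObjectiveValue").head())
```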
{
"cell_type": "markdown",
"metadata": {},
Expand All @@ -217,10 +370,15 @@
"import boto3\n",
"from time import gmtime, strftime\n",
"\n",
"model_name = f\"{job_name}-model\"\n",
"if deploy_amt_model == True:\n",
" training_of_model_to_be_hosted = client.describe_hyper_parameter_tuning_job(HyperParameterTuningJobName=tuning_job_name)[\"BestTrainingJob\"][\"TrainingJobName\"]\n",
"else:\n",
" training_of_model_to_be_hosted = training_job_name\n",
" \n",
"model_name = f\"{training_of_model_to_be_hosted}-model\"\n",
"print(model_name)\n",
"\n",
"info = client.describe_training_job(TrainingJobName=job_name)\n",
"info = client.describe_training_job(TrainingJobName=training_of_model_to_be_hosted)\n",
"model_data = info[\"ModelArtifacts\"][\"S3ModelArtifacts\"]\n",
"print(model_data)\n",
"\n",
@@ -251,7 +409,7 @@
"from time import gmtime, strftime\n",
"\n",
"endpoint_config_name = f\"DEMO-XGBoostEndpointConfig-{strftime('%Y-%m-%d-%H-%M-%S', gmtime())}\"\n",
"print(endpoint_config_name)\n",
"print(f\"Creating endpoint config with name: {endpoint_config_name}.\")\n",
"create_endpoint_config_response = client.create_endpoint_config(\n",
" EndpointConfigName=endpoint_config_name,\n",
" ProductionVariants=[\n",
@@ -288,7 +446,7 @@
"import time\n",
"\n",
"endpoint_name = f'DEMO-XGBoostEndpoint-{strftime(\"%Y-%m-%d-%H-%M-%S\", gmtime())}'\n",
"print(endpoint_name)\n",
"print(f\"Creating endpoint with name: {endpoint_name}. This will take between 9 and 11 minutes to complete.\")\n",
"create_endpoint_response = client.create_endpoint(\n",
" EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name\n",
")\n",
