Reverting to v22.07 (aws#3637)

* reverting to v22.07
* fixed formatting issue
* added images to fix format issue
atqy · Oct 28, 2022 · 34e5855
1 parent 3c2f3f1

Showing 4 changed files with 45 additions and 34 deletions.
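This commit moves two container tags back together: the SageMaker Triton serving image (`sagemaker-tritonserver`) and the NGC PyTorch container (`nvcr.io/nvidia/pytorch`) used to export the models both go from `22.09-py3` to `22.07-py3`. Below is a minimal sketch of deriving both image references from a single tag. It assumes the notebook's `account_id_map` is a plain region-to-account dict. The helper names and the account ID in the example are hypothetical placeholders. One reason to keep the tags in step is that a TensorRT plan is generally tied to the TensorRT version that built it, so exporting with one release and serving with another can fail to load.

```python
# Sketch only: helper names and the account ID below are hypothetical placeholders.
NGC_TAG = "22.07-py3"  # single source of truth for both containers


def triton_image_uri(region: str, account_id_map: dict, tag: str = NGC_TAG) -> str:
    """Build the SageMaker Triton image URI the same way the notebook does."""
    # China regions use a different domain suffix.
    base = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account_id_map[region]}.dkr.ecr.{region}.{base}/sagemaker-tritonserver:{tag}"


def ngc_pytorch_image(tag: str = NGC_TAG) -> str:
    """NGC PyTorch container used to export model.pt / model.plan."""
    return f"nvcr.io/nvidia/pytorch:{tag}"


if __name__ == "__main__":
    accounts = {"us-east-1": "123456789012"}  # hypothetical account ID
    print(triton_image_uri("us-east-1", accounts))
    print(ngc_pytorch_image())
```

Bumping or reverting the version then becomes a one-line change to `NGC_TAG` instead of three separate string edits.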
diff --git a/multi-model-endpoints/mme-on-gpu/cv/images/pyt-model-repo.png b/multi-model-endpoints/mme-on-gpu/cv/images/pyt-model-repo.png
diff --git a/multi-model-endpoints/mme-on-gpu/cv/images/trt-model-repo.png b/multi-model-endpoints/mme-on-gpu/cv/images/trt-model-repo.png
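These two PNGs replace the text trees that the notebook previously embedded for the model repositories: `resnet/1/model.pt` (PyTorch) or `resnet/1/model.plan` (TensorRT), plus `resnet/config.pbtxt`. As a text-only fallback, here is a minimal sketch that lays out the same structure. It writes a `config.pbtxt` with the fields the notebook lists: `resnet`, `pytorch_libtorch`, `max_batch_size` 128, `TYPE_FP32`, and the optional `instance_group` and `dynamic_batching`. The tensor names `INPUT__0`/`OUTPUT__0` follow the naming convention of Triton's PyTorch backend and are an assumption here. The empty model file is only a placeholder for the real export.

```python
import tempfile
from pathlib import Path

# Assumed tensor names: Triton's PyTorch backend convention is <name>__<index>.
# dims exclude the batch dimension because max_batch_size > 0.
PYTORCH_CONFIG = """name: "resnet"
platform: "pytorch_libtorch"
max_batch_size: 128
input [
  {
    name: "INPUT__0"
    data_type: TYPE_FP32
    dims: [ 3, 224, 224 ]
  }
]
output [
  {
    name: "OUTPUT__0"
    data_type: TYPE_FP32
    dims: [ 1000 ]
  }
]
instance_group [ { kind: KIND_GPU } ]
dynamic_batching { }
"""


def write_triton_repo(root: Path, model_name: str, artifact: str,
                      config_text: str, version: int = 1) -> Path:
    """Create <root>/<model_name>/<version>/<artifact> and <root>/<model_name>/config.pbtxt."""
    model_dir = root / model_name
    version_dir = model_dir / str(version)
    version_dir.mkdir(parents=True, exist_ok=True)
    # Empty placeholder; the real model.pt / model.plan comes from the export scripts.
    (version_dir / artifact).touch()
    (model_dir / "config.pbtxt").write_text(config_text)
    return model_dir


with tempfile.TemporaryDirectory() as tmp:
    write_triton_repo(Path(tmp), "resnet", "model.pt", PYTORCH_CONFIG)
    files = sorted(p.relative_to(tmp).as_posix()
                   for p in Path(tmp).rglob("*") if p.is_file())
    print(files)  # ['resnet/1/model.pt', 'resnet/config.pbtxt']
```

For the TensorRT repository, the same helper applies with `model.plan` as the artifact and a config whose platform is `tensorrt_plan`.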
diff --git a/multi-model-endpoints/mme-on-gpu/cv/resnet50_mme_with_gpu.ipynb b/multi-model-endpoints/mme-on-gpu/cv/resnet50_mme_with_gpu.ipynb
@@ -9,7 +9,7 @@
     "\n",
     "[Amazon SageMaker](https://aws.amazon.com/sagemaker/) multi-model endpoints (MME) provide a scalable and cost-effective way to deploy a large number of deep learning models. Previously, customers had limited options to deploy 100s of deep learning models that need accelerated compute with GPUs. Now customers can deploy 1000s of deep learning models behind one SageMaker endpoint. MME runs multiple models on a GPU core, shares GPU instances behind an endpoint across multiple models, and dynamically loads and unloads models based on incoming traffic. As a result, customers can significantly reduce cost and achieve the best price performance.\n",
     "\n",
-    "<div class=\"alert alert-info\"> 💡 <strong> Note </strong>\n",
+    "<div class=\"alert alert-info\"> <strong> Note </strong>\n",
     "This notebook was tested with the `conda_python3` kernel on an Amazon SageMaker notebook instance of type `g4dn.4xlarge`\n",
     "</div>"
    ]
@@ -119,7 +119,7 @@
     "runtime_sm_client = boto3.client(\"sagemaker-runtime\")\n",
     "sagemaker_session = sagemaker.Session(boto_session=boto3.Session())\n",
     "bucket = sagemaker_session.default_bucket()\n",
-    "prefix = \"resnet-mme-gpu-v1\"\n",
+    "prefix = \"resnet50-mme-gpu\"\n",
     "\n",
     "# endpoint variables\n",
     "sm_model_name = f\"{prefix}-mdl-{ts}\"\n",
@@ -159,7 +159,7 @@
     "\n",
     "base = \"amazonaws.com.cn\" if region.startswith(\"cn-\") else \"amazonaws.com\"\n",
     "mme_triton_image_uri = (\n",
-    "    \"{account_id}.dkr.ecr.{region}.{base}/sagemaker-tritonserver:22.09-py3\".format(\n",
+    "    \"{account_id}.dkr.ecr.{region}.{base}/sagemaker-tritonserver:22.07-py3\".format(\n",
     "        account_id=account_id_map[region], region=region, base=base\n",
     "    )\n",
     ")"
@@ -175,7 +175,7 @@
     "This section presents an overview of the steps to prepare a pre-trained ResNet-50 model for deployment on SageMaker MME using Triton Inference Server model configurations.\n",
     "\n",
     "\n",
-    "<div class=\"alert alert-info\"> 💡 <strong> Note </strong>\n",
+    "<div class=\"alert alert-info\"><strong> Note </strong>\n",
     "We are demonstrating deployment with 2 models. However, customers can prepare and deploy 100s of models. The models may or may not share the same framework.\n",
     "</div>"
    ]
@@ -187,7 +187,7 @@
    "source": [
     "#### Prepare PyTorch Model \n",
     "\n",
-    "`generate_model_pytorch.sh` file in the `workspace` directory contains scripts to generate a PyTorch model. First, we load a pre-trained ResNet50 model using torchvision models package. We save the model as model.pt file in TorchScript optimized and serialized format. TorchScript needs an example inputs to do a model forward pass, so we pass one instance of a RGB image with 3 color channels of dimension 224X224."
+    "The `generate_model_pytorch.sh` file in the `workspace` directory contains the commands to generate a PyTorch model. First, we load a pre-trained ResNet50 model using the torchvision models package. We save the model as a `model.pt` file in optimized, serialized TorchScript format. TorchScript needs an example input to run a forward pass, so we pass one RGB image with 3 color channels and dimensions 224x224. The script for exporting this model can be found [here](./workspace/generate_model_pytorch.sh)."
    ]
   },
   {
@@ -200,25 +200,28 @@
    "outputs": [],
    "source": [
     "!docker run --gpus=all --rm -it \\\n",
-    "            -v `pwd`/workspace:/workspace nvcr.io/nvidia/pytorch:22.09-py3 \\\n",
+    "            -v `pwd`/workspace:/workspace nvcr.io/nvidia/pytorch:22.07-py3 \\\n",
     "            /bin/bash generate_model_pytorch.sh"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "32082d09",
+   "id": "16351596",
    "metadata": {},
    "source": [
     "#### PyTorch Model Repository\n",
     "\n",
-    "The model repository contains model to serve, in our case it will be the model.pt and configuration file with input/output specifications and metadata.\n",
-    "\n",
-    "```\n",
-    "resnet\n",
-    "├── 1\n",
-    "│   └── model.pt\n",
-    "└── config.pbtxt\n",
-    "```"
+    "The model repository contains the model to serve. In our case, this is `model.pt` plus a configuration file with input/output specifications and metadata."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "50ed759b",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.Image(\"images/pyt-model-repo.png\")"
    ]
   },
   {
@@ -228,7 +231,7 @@
    "source": [
     "#### PyTorch Model configuration\n",
     "\n",
-    "Model configuration file config.pbtxt must specify name of the model(resnet), the platform and backend properties (pytorch_libtorch), max_batch_size(128) and the input and output tensors along with the data type(TYPE_FP32) information. Additionally, you can specify instance_group and dynamic_batching properties to achieve high performance inference."
+    "The model configuration file `config.pbtxt` must specify the name of the model (`resnet`), the platform and backend properties (`pytorch_libtorch`), `max_batch_size` (128), and the input and output tensors along with their data type (`TYPE_FP32`). Additionally, you can specify `instance_group` and `dynamic_batching` properties to achieve high-performance inference."
    ]
   },
   {
@@ -273,7 +276,7 @@
    "source": [
     "#### Prepare TensorRT Model\n",
     "\n",
-    "1. We export the pre-trained ResNet model into an ONNX file, which runs the model once to trace its execution and then export the traced model to the specified file. It is one of the better options in terms model conversion and deployment when converting using ONNX.\n",
+    "1. We export the pre-trained ResNet50 model to an ONNX file. The export runs the model once to trace its execution and then writes the traced model to the specified file. This is one of the better options for model conversion and deployment when converting through ONNX.\n",
     "\n",
     "2. We use `trtexec` to automatically convert the ONNX model to a TensorRT plan. Because ONNX is framework-agnostic, this works with models from TF, PyTorch, and more: you export the weights of your model from the framework and load them into your TensorRT network.\n",
     "\n",
@@ -289,25 +292,28 @@
    "outputs": [],
    "source": [
     "!docker run --gpus=all --rm -it \\\n",
-    "            -v `pwd`/workspace:/workspace nvcr.io/nvidia/pytorch:22.09-py3 \\\n",
+    "            -v `pwd`/workspace:/workspace nvcr.io/nvidia/pytorch:22.07-py3 \\\n",
     "            /bin/bash generate_model_trt.sh"
    ]
   },
   {
    "cell_type": "markdown",
-   "id": "8289b3a4",
+   "id": "0a76c963",
    "metadata": {},
    "source": [
     "#### TensorRT Model Repository\n",
     "\n",
-    "The model repository contains model to serve, for TensorRT model it will be the model.plan(created in above steps) and configuration file with input/output specifications and metadata.\n",
-    "\n",
-    "```\n",
-    "resnet\n",
-    "├── 1\n",
-    "│   └── model.plan\n",
-    "└── config.pbtxt\n",
-    "```"
+    "The model repository contains the model to serve. For the TensorRT model, this is the `model.plan` file (created in the steps above) plus a configuration file with input/output specifications and metadata."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "81fefa31",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "display.Image(\"images/trt-model-repo.png\")"
    ]
   },
   {
@@ -317,7 +323,7 @@
    "source": [
     "#### TensorRT Model configuration\n",
     "\n",
-    " For the TensorRT model, we specify tensorrt_plan as platform, input tensor specification of the image of dimension 224X224 which has 3 color channels. Output tensor with 1000 dimensions of type TYPE_FP32 corresponding the different object categories."
+    "For the TensorRT model, we specify `tensorrt_plan` as the platform and an input tensor for a 224x224 image with 3 color channels. The output tensor has 1000 dimensions of type `TYPE_FP32`, corresponding to the different object categories."
    ]
   },
   {
@@ -424,7 +430,7 @@
     "\n",
     "\n",
     "\n",
-    "<div class=\"alert alert-info\"> 💡 <strong> Note </strong>\n",
+    "<div class=\"alert alert-info\"> <strong> Note </strong>\n",
     "You can deploy 100s of models. The models can use the same framework, or different frameworks as shown in this notebook.\n",
     "</div>\n",
     "\n",
@@ -490,7 +496,7 @@
     "\n",
     "Create a multi-model endpoint configuration using the [create_endpoint_config](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_endpoint_config) boto3 API. Specify an accelerated GPU computing instance in `InstanceType`; in this notebook we use a `g4dn.4xlarge` instance. We recommend configuring your endpoints with at least two instances. This allows SageMaker to provide a highly available set of predictions across multiple Availability Zones for the models.\n",
     "\n",
-    "<div class=\"alert alert-info\"> 💡 <strong> Note </strong>\n",
+    "<div class=\"alert alert-info\"> <strong> Note </strong>\n",
     "Based on our findings, customers get the best price performance on ML-optimized instances with a single GPU core. Hence, this feature is only enabled for single GPU core instances. For the full list of supported instances, see the [SageMaker multi-model endpoints documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/multi-model-endpoints.html#multi-model-support).\n",
     "</div>\n"
    ]
@@ -862,6 +868,14 @@
     "sm_client.delete_endpoint_config(EndpointConfigName=endpoint_config_name)\n",
     "sm_client.delete_endpoint(EndpointName=endpoint_name)"
    ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "11047290",
+   "metadata": {},
+   "outputs": [],
+   "source": []
   }
  ],
  "metadata": {

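The notebook's endpoint-configuration cell recommends a `g4dn.4xlarge` instance and at least two instances for availability. Below is a minimal sketch of the request payloads for boto3's `create_model` and `create_endpoint_config` in multi-model mode. It only builds the dictionaries, so it runs without AWS credentials. The `-epc-` config-name suffix, the `AllTraffic` variant name, the S3 prefix, the timestamp, and the role ARN are illustrative assumptions, not values taken from this commit. The `{prefix}-mdl-{ts}` model-name pattern and the `resnet50-mme-gpu` prefix come from the notebook.

```python
def mme_requests(prefix: str, ts: str, image_uri: str, model_data_url: str,
                 role_arn: str, instance_count: int = 2) -> tuple:
    """Return (create_model kwargs, create_endpoint_config kwargs) for a GPU MME."""
    sm_model_name = f"{prefix}-mdl-{ts}"  # naming pattern from the notebook
    endpoint_config_name = f"{prefix}-epc-{ts}"  # hypothetical suffix
    create_model = {
        "ModelName": sm_model_name,
        "ExecutionRoleArn": role_arn,
        "PrimaryContainer": {
            "Image": image_uri,
            # S3 prefix holding one archive per model; MME loads them on demand.
            "ModelDataUrl": model_data_url,
            "Mode": "MultiModel",
        },
    }
    create_endpoint_config = {
        "EndpointConfigName": endpoint_config_name,
        "ProductionVariants": [{
            "VariantName": "AllTraffic",  # assumed variant name
            "ModelName": sm_model_name,
            "InstanceType": "ml.g4dn.4xlarge",  # single-GPU instance, as in the notebook
            "InitialInstanceCount": instance_count,  # >= 2 recommended for availability
            "InitialVariantWeight": 1,
        }],
    }
    return create_model, create_endpoint_config


model_kwargs, epc_kwargs = mme_requests(
    prefix="resnet50-mme-gpu",
    ts="2022-10-28-00-00-00",  # hypothetical timestamp
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/sagemaker-tritonserver:22.07-py3",
    model_data_url="s3://example-bucket/resnet50-mme-gpu/models/",  # hypothetical
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",  # hypothetical
)
# With credentials: boto3.client("sagemaker").create_model(**model_kwargs), and so on.
print(epc_kwargs["ProductionVariants"][0]["InstanceType"])  # ml.g4dn.4xlarge
```

Keeping the payloads as plain dicts makes it easy to inspect or unit-test them before any call reaches AWS.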
diff --git a/multi-model-endpoints/mme-on-gpu/cv/workspace/generate_model_pytorch.sh b/multi-model-endpoints/mme-on-gpu/cv/workspace/generate_model_pytorch.sh
@@ -6,7 +6,4 @@ python pt_exporter.py
 
 # Optional Scripts
 # use this script to convert Pytorch model to ONNX format
-# python onnx_exporter.py
-
-#use this command to generate a model plan that will be used to host SageMaker Endpoint
-#trtexec --onnx=model.onnx --saveEngine=model.plan --explicitBatch --minShapes=input:1x3x224x224 --optShapes=input:128x3x224x224 --maxShapes=input:128x3x224x224 --fp16 --verbose | tee conversion.txt
+# python onnx_exporter.py
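This hunk removes the commented-out `trtexec` invocation from `generate_model_pytorch.sh`. For readers who still want that conversion step, the following sketch rebuilds the same command line from its shape parameters. This makes the min/opt/max batch sizes explicit: 1, 128, and 128 for a 3x224x224 input named `input`, with every flag taken from the removed line. The helper name is hypothetical. It only assembles arguments and does not run TensorRT.

```python
import shlex


def trtexec_args(onnx: str = "model.onnx", engine: str = "model.plan",
                 input_name: str = "input", chw: tuple = (3, 224, 224),
                 min_batch: int = 1, opt_batch: int = 128, max_batch: int = 128,
                 fp16: bool = True) -> list:
    """Rebuild the removed trtexec command as an argument list (hypothetical helper)."""
    if not (1 <= min_batch <= opt_batch <= max_batch):
        raise ValueError("need 1 <= min_batch <= opt_batch <= max_batch")

    def shape(batch: int) -> str:
        # e.g. input:128x3x224x224 (batch first, then C, H, W)
        return f"{input_name}:" + "x".join(str(d) for d in (batch, *chw))

    args = ["trtexec", f"--onnx={onnx}", f"--saveEngine={engine}", "--explicitBatch",
            f"--minShapes={shape(min_batch)}",
            f"--optShapes={shape(opt_batch)}",
            f"--maxShapes={shape(max_batch)}"]
    if fp16:
        args.append("--fp16")  # enable FP16 precision, as the original line did
    args.append("--verbose")
    return args


print(shlex.join(trtexec_args()))
```

To mirror the original `| tee conversion.txt`, the list can be passed to `subprocess.run(args, capture_output=True, text=True)` and the resulting `stdout` written to a file.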