
Commit

Boto3 version notebook (aws#3597)
* CLI upgrade

* reformat

* grammatical changes

* boto3 version

* boto3 version-with minor change

* serving.properties remove empty line

* set env variable for tensor_parallel_degree

* grammar fix

* black-nb

* grammatical change

* endpoint_name fix

* "By" cap

* minor change

Co-authored-by: Qingwei Li <[email protected]>
Co-authored-by: atqy <[email protected]>
4 people committed Oct 28, 2022
1 parent 8664dc3 commit 0332284
Showing 1 changed file with 69 additions and 56 deletions.
@@ -22,6 +22,16 @@
"In this notebook, we deploy a PyTorch GPT-J model from Hugging Face with 6 billion parameters across two GPUs on an Amazon SageMaker ml.g5.48xlarge instance. DeepSpeed is used for tensor parallelism inference while DJLServing handles inference requests and the distributed workers. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6ed354b",
"metadata": {},
"outputs": [],
"source": [
"!pip install boto3==1.24.68"
]
},
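The pinned `boto3==1.24.68` install above only takes effect after a kernel restart. As a sketch (not part of the notebook), a small hypothetical helper can check the active version numerically, since lexical string comparison gets versions like "1.9" vs "1.24" wrong:

```python
# Hypothetical helper (not in the notebook): compare dotted version
# strings numerically, field by field.
def at_least(version: str, minimum: str) -> bool:
    to_tuple = lambda v: tuple(int(p) for p in v.split("."))
    return to_tuple(version) >= to_tuple(minimum)

# In the notebook kernel one could then run (assuming boto3 is importable):
# import boto3
# assert at_least(boto3.__version__, "1.24.68"), "restart the kernel after pip install"
```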
{
"cell_type": "markdown",
"id": "81c2bdf4",
@@ -179,13 +189,13 @@
"source": [
"### Setup serving.properties\n",
"\n",
"User needs to add engine Rubikon as shown below. If you would like to control how many worker groups, you can set\n",
"The user needs to add the engine Rubikon as shown below. If you would like to control how many worker groups are created, you can do so by adding these lines to the file below.\n",
"\n",
"```\n",
"gpu.minWorkers=1\n",
"gpu.maxWorkers=1\n",
"```\n",
"by adding these lines in the below file. By default, we will create as much worker group as possible based on `gpu_numbers/tensor_parallel_degree`."
"By default, we will create as many worker groups as possible based on `gpu_numbers/tensor_parallel_degree`."
]
},
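Combining the engine line with the optional worker-group settings quoted above gives a complete file. A minimal sketch (assuming the `gpu.minWorkers`/`gpu.maxWorkers` keys exactly as shown in the cell above):

```python
# Sketch: write a serving.properties that pins exactly one worker group.
lines = [
    "engine = Rubikon",
    "gpu.minWorkers=1",
    "gpu.maxWorkers=1",
]
with open("serving.properties", "w") as f:
    f.write("\n".join(lines) + "\n")
```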
{
@@ -196,7 +206,6 @@
"outputs": [],
"source": [
"%%writefile serving.properties\n",
"\n",
"engine = Rubikon"
]
},
@@ -221,10 +230,9 @@
"account = session.account_id()\n",
"region = session.boto_region_name\n",
"img = \"djl_deepspeed\"\n",
"fullname = account + \".dkr.ecr.\" + region + \"amazonaws.com/\" + img + \":latest\"\n",
"\n",
"fullname = account + \".dkr.ecr.\" + region + \".amazonaws.com/\" + img + \":latest\"\n",
"bucket = session.default_bucket()\n",
"path = \"s3://\" + bucket + \"/DEMO-djl-big-model/\""
"path = \"s3://\" + bucket + \"/DEMO-djl-big-model\""
]
},
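One of the fixes in this diff is the missing dot before `amazonaws.com` in the image URI. A sketch with placeholder account and region values shows the corrected pattern; an f-string makes a dropped "." easier to spot:

```python
# Placeholder values for illustration only.
account = "123456789012"
region = "us-east-1"
img = "djl_deepspeed"

# Concatenation, as in the corrected cell above:
fullname = account + ".dkr.ecr." + region + ".amazonaws.com/" + img + ":latest"

# Equivalent f-string form of the same URI:
fullname_f = f"{account}.dkr.ecr.{region}.amazonaws.com/{img}:latest"
```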
{
@@ -253,7 +261,9 @@
"metadata": {},
"outputs": [],
"source": [
"!aws s3 cp gpt-j.tar.gz {path}"
"model_s3_url = sagemaker.s3.S3Uploader.upload(\n",
" \"gpt-j.tar.gz\", path, kms_key=None, sagemaker_session=session\n",
")"
]
},
{
@@ -266,77 +276,82 @@
},
{
"cell_type": "markdown",
"id": "f96c494a",
"id": "32589338",
"metadata": {},
"source": [
"First let us make sure we have the latest awscli"
"Now we create our [SageMaker model](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker.html#SageMaker.Client.create_model). Make sure your execution role has access to your model artifacts and ECR image. Please check out our SageMaker Roles [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) for more details. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "0b665515",
"id": "026d27d2",
"metadata": {},
"outputs": [],
"source": [
"!pip3 install --upgrade --user awscli"
"from datetime import datetime\n",
"\n",
"sm_client = boto3.client(\"sagemaker\")\n",
"\n",
"time_stamp = datetime.now().strftime(\"%Y-%m-%d-%H-%M-%S\")\n",
"model_name = \"gpt-j-\" + time_stamp\n",
"\n",
"create_model_response = sm_client.create_model(\n",
" ModelName=model_name,\n",
" ExecutionRoleArn=session.get_caller_identity_arn(),\n",
" PrimaryContainer={\n",
" \"Image\": fullname,\n",
" \"ModelDataUrl\": model_s3_url,\n",
" \"Environment\": {\"TENSOR_PARALLEL_DEGREE\": \"2\"},\n",
" },\n",
")"
]
},
{
"cell_type": "markdown",
"id": "32589338",
"id": "22d2fc2b",
"metadata": {},
"source": [
"You should see two images from code above. Please note the image name similar to`<AWS_account_ID>.dkr.ecr.us-east-1.amazonaws.com/djl_deepspeed`. This is the ECR image URL that we need for later use. \n",
"\n",
"Now we create our [SageMaker model](https://docs.aws.amazon.com/cli/latest/reference/sagemaker/create-model.html). Make sure you provide an IAM role that SageMaker can assume to access model artifacts and docker image for deployment on ML compute hosting instances. In addition, you also use the IAM role to manage permissions the inference code needs. Please check out our SageMaker Roles [documentation](https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-roles.html) for more details. \n",
"\n",
" <span style=\"color:red\"> You must enter ECR image name, S3 path for the model file, and an execution-role-arn</span> in the code below."
"Now we create an endpoint configuration that SageMaker hosting services uses to deploy models. Note that we configured `ModelDataDownloadTimeoutInSeconds` and `ContainerStartupHealthCheckTimeoutInSeconds` to accommodate the large size of our model. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "026d27d2",
"id": "84e25dd4",
"metadata": {},
"outputs": [],
"source": [
"!aws sagemaker create-model \\\n",
"--model-name gpt-j \\\n",
"--primary-container \\\n",
"Image=<ECR image>,ModelDataUrl={path},Environment={TENSOR_PARALLEL_DEGREE=2} \\\n",
"--execution-role-arn <your execution-role-arn>"
"initial_instance_count = 1\n",
"instance_type = \"ml.g5.48xlarge\"\n",
"variant_name = \"AllTraffic\"\n",
"endpoint_config_name = \"gpt-j-config-\" + time_stamp\n",
"\n",
"production_variants = [\n",
" {\n",
" \"VariantName\": variant_name,\n",
" \"ModelName\": model_name,\n",
" \"InitialInstanceCount\": initial_instance_count,\n",
" \"InstanceType\": instance_type,\n",
" \"ModelDataDownloadTimeoutInSeconds\": 1800,\n",
" \"ContainerStartupHealthCheckTimeoutInSeconds\": 3600,\n",
" }\n",
"]\n",
"\n",
"endpoint_config = {\n",
" \"EndpointConfigName\": endpoint_config_name,\n",
" \"ProductionVariants\": production_variants,\n",
"}\n",
"\n",
"ep_conf_res = sm_client.create_endpoint_config(**endpoint_config)"
]
},
{
"cell_type": "markdown",
"id": "22d2fc2b",
"id": "e4b3bc26",
"metadata": {},
"source": [
"Note that we configure `ModelDataDownloadTimeoutInSeconds` and `ContainerStartupHealthCheckTimeoutInSeconds` to accommodate the large size of our model. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "84e25dd4",
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"aws sagemaker create-endpoint-config \\\n",
" --region $(aws configure get region) \\\n",
" --endpoint-config-name gpt-j-config \\\n",
" --production-variants '[\n",
" {\n",
" \"ModelName\": \"gpt-j\",\n",
" \"VariantName\": \"AllTraffic\",\n",
" \"InstanceType\": \"ml.g5.48xlarge\",\n",
" \"InitialInstanceCount\": 1,\n",
" \"ModelDataDownloadTimeoutInSeconds\": 1800,\n",
" \"ContainerStartupHealthCheckTimeoutInSeconds\": 3600\n",
" }\n",
" ]'"
"We are ready to create an endpoint using the model and the endpoint configuration created in the steps above. "
]
},
{
@@ -346,10 +361,10 @@
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"aws sagemaker create-endpoint \\\n",
"--endpoint-name gpt-j \\\n",
"--endpoint-config-name gpt-j-config"
"endpoint_name = \"gpt-j-\" + time_stamp\n",
"ep_res = sm_client.create_endpoint(\n",
" EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name\n",
")"
]
},
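The cells above derive the model, endpoint-config, and endpoint names from a single timestamp. A sketch of that naming convention (the names are illustrative):

```python
from datetime import datetime

# One timestamp shared across the resource names keeps them unique per
# run and visibly related to each other.
time_stamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
model_name = "gpt-j-" + time_stamp
endpoint_config_name = "gpt-j-config-" + time_stamp
endpoint_name = "gpt-j-" + time_stamp
```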
{
@@ -367,11 +382,10 @@
"metadata": {},
"outputs": [],
"source": [
"import boto3, json\n",
"import json\n",
"\n",
"client = boto3.client(\"sagemaker-runtime\")\n",
"\n",
"endpoint_name = \"gpt-j\" # Your endpoint name.\n",
"content_type = \"text/plain\" # The MIME type of the input data in the request body.\n",
"payload = \"Amazon.com is the best\" # Payload for inference.\n",
"response = client.invoke_endpoint(\n",
@@ -395,8 +409,7 @@
"metadata": {},
"outputs": [],
"source": [
"%%sh\n",
"aws sagemaker delete-endpoint --endpoint-name gpt-j"
"sm_client.delete_endpoint(EndpointName=endpoint_name)"
]
},
{
