
Commit

grammatical changes
Qingwei Li committed Sep 9, 2022
1 parent 875c110 commit f211fb2
Showing 1 changed file with 2 additions and 3 deletions.
@@ -313,7 +313,7 @@
"id": "22d2fc2b",
"metadata": {},
"source": [
"Note that we configured `ModelDataDownloadTimeoutInSeconds` and `ContainerStartupHealthCheckTimeoutInSeconds` to acommodate the large size of our model. "
"Note that we configure `ModelDataDownloadTimeoutInSeconds` and `ContainerStartupHealthCheckTimeoutInSeconds` to acommodate the large size of our model. "
]
},
{
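For the timeout note in the cell above: `ModelDataDownloadTimeoutInSeconds` and `ContainerStartupHealthCheckTimeoutInSeconds` are fields on the endpoint configuration's production variant. The following is a minimal boto3 sketch of where they would be set; the config name, model name, instance type, and timeout values are illustrative assumptions, not taken from this notebook.

```python
import boto3

sm_client = boto3.client("sagemaker")

# Hypothetical names and values for illustration only; the notebook's real
# endpoint configuration is outside this diff hunk.
sm_client.create_endpoint_config(
    EndpointConfigName="gpt-j-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "gpt-j-model",
            "InstanceType": "ml.g5.12xlarge",
            "InitialInstanceCount": 1,
            # Give the endpoint extra time to download and load a multi-GB model.
            "ModelDataDownloadTimeoutInSeconds": 2400,
            "ContainerStartupHealthCheckTimeoutInSeconds": 2400,
        }
    ],
)
```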
@@ -373,7 +373,6 @@
"\n",
"endpoint_name = \"gpt-j\" # Your endpoint name.\n",
"content_type = \"text/plain\" # The MIME type of the input data in the request body.\n",
"# accept = \"...\" # The desired MIME type of the inference in the response.\n",
"payload = \"Amazon.com is the best\" # Payload for inference.\n",
"response = client.invoke_endpoint(\n",
" EndpointName=endpoint_name, ContentType=content_type, Body=payload\n",
@@ -407,7 +406,7 @@
"source": [
"## Conclusion\n",
"\n",
"In this notebook, you used tensor parallelism to partition a large language model across multiple GPUs for low latency inference. With tensor parallelism, multiple GPUs work on the same model layer at once allowing for faster inference latency when a low batch size is used. Here, we used open source DeepSpeed as the model parallel library to partition the model and open source Deep Java Library Serving as the model serving solution.\n",
"In this notebook, you use tensor parallelism to partition a large language model across multiple GPUs for low latency inference. With tensor parallelism, multiple GPUs work on the same model layer at once allowing for faster inference latency when a low batch size is used. Here, we use open source DeepSpeed as the model parallel library to partition the model and open source Deep Java Library Serving as the model serving solution.\n",
"\n",
"As a next step, you can experiment with larger models from Hugging Face such as GPT-NeoX. You can also adjust the tensor parallel degree to see the impact to latency with models of different sizes."
]
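To experiment with the tensor parallel degree mentioned in the conclusion, one option is to regenerate the `serving.properties` file consumed by DJL Serving with a different `option.tensor_parallel_degree` and redeploy. This is a minimal sketch only; the full property set (model location, entry point, and so on) comes from the notebook and is not reproduced here.

```python
# Hedged sketch: write a serving.properties with a different tensor parallel
# degree before re-packaging and re-deploying the model. Property names follow
# DJL Serving conventions; the values shown are illustrative.
serving_properties = """\
engine=DeepSpeed
option.tensor_parallel_degree=4
"""

with open("serving.properties", "w") as f:
    f.write(serving_properties)
```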
