Boto3 version notebook #3597

Merged · 14 commits · Sep 12, 2022
grammatical changes
Qingwei Li committed Sep 9, 2022
commit f211fb239efca196b7ab0d49ad67f431dd021889
@@ -313,7 +313,7 @@
"id": "22d2fc2b",
"metadata": {},
"source": [
"Note that we configured `ModelDataDownloadTimeoutInSeconds` and `ContainerStartupHealthCheckTimeoutInSeconds` to acommodate the large size of our model. "
"Note that we configure `ModelDataDownloadTimeoutInSeconds` and `ContainerStartupHealthCheckTimeoutInSeconds` to acommodate the large size of our model. "
]
},
{
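For reference, here is a minimal sketch of where these two timeouts are typically passed when creating the endpoint configuration with boto3; the endpoint config name, model name, instance type, and timeout values below are illustrative assumptions, not taken from the notebook.

```python
import boto3

sm_client = boto3.client("sagemaker")

# Hypothetical names and values for illustration only.
sm_client.create_endpoint_config(
    EndpointConfigName="gpt-j-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "gpt-j",
            "InstanceType": "ml.g5.12xlarge",
            "InitialInstanceCount": 1,
            # Give the endpoint extra time to download and load a multi-GB model.
            "ModelDataDownloadTimeoutInSeconds": 2400,
            "ContainerStartupHealthCheckTimeoutInSeconds": 2400,
        }
    ],
)
```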
@@ -373,7 +373,6 @@
"\n",
"endpoint_name = \"gpt-j\" # Your endpoint name.\n",
"content_type = \"text/plain\" # The MIME type of the input data in the request body.\n",
"# accept = \"...\" # The desired MIME type of the inference in the response.\n",
"payload = \"Amazon.com is the best\" # Payload for inference.\n",
"response = client.invoke_endpoint(\n",
" EndpointName=endpoint_name, ContentType=content_type, Body=payload\n",
@@ -407,7 +406,7 @@
"source": [
"## Conclusion\n",
"\n",
"In this notebook, you used tensor parallelism to partition a large language model across multiple GPUs for low latency inference. With tensor parallelism, multiple GPUs work on the same model layer at once allowing for faster inference latency when a low batch size is used. Here, we used open source DeepSpeed as the model parallel library to partition the model and open source Deep Java Library Serving as the model serving solution.\n",
"In this notebook, you use tensor parallelism to partition a large language model across multiple GPUs for low latency inference. With tensor parallelism, multiple GPUs work on the same model layer at once allowing for faster inference latency when a low batch size is used. Here, we use open source DeepSpeed as the model parallel library to partition the model and open source Deep Java Library Serving as the model serving solution.\n",
"\n",
"As a next step, you can experiment with larger models from Hugging Face such as GPT-NeoX. You can also adjust the tensor parallel degree to see the impact to latency with models of different sizes."
]
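To illustrate the suggestion about adjusting the tensor parallel degree: with DJL Serving's DeepSpeed engine the degree is usually set in the model's serving.properties file. The sketch below writes such a file; the engine name, property key, and value of 4 are assumptions based on common DJL Serving configurations, not settings taken from this notebook.

```python
# Minimal sketch: write a serving.properties file asking DJL Serving to
# shard the model across 4 GPUs with the DeepSpeed engine (assumed settings).
properties = [
    "engine=DeepSpeed",
    "option.tensor_parallel_degree=4",  # number of GPUs that share each layer
]

with open("serving.properties", "w") as f:
    f.write("\n".join(properties) + "\n")
```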