Merge branch 'main' into local-pyspark
Sean Morgan authored Jun 2, 2022
2 parents 9e5493d + 5c1bf79 commit 225c924
Showing 34 changed files with 1,188 additions and 1,106 deletions.
34 changes: 34 additions & 0 deletions CONTRIBUTING.md
@@ -217,6 +217,40 @@ Please remember to:
* Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.


## Writing Sequential Notebooks

Most examples are self-contained: a single notebook (.ipynb file) is all that is needed to run them. In a few cases, however, an example may be split into multiple notebooks. These are called sequential notebooks, because the example is split across a sequence of notebooks. For an example, see [this series of sequential notebooks that demonstrate how to build a music recommender](https://github.com/aws/amazon-sagemaker-examples/tree/main/end_to_end/music_recommendation).

### When should Sequential Notebooks be used?

You may want to consider using sequential notebooks to write your example if any of the following conditions apply:

* Your example takes over two hours to execute.
* You want to cover the different steps of the example in great detail and depth (e.g., one notebook goes deep into data exploration, the next thoroughly describes the model training process, and so on).
* You want customers to be able to run only part of your example if they wish (e.g., just the training portion).

### What are the guidelines for writing Sequential Notebooks?

If you determine that sequential notebooks are the most suitable format to write your examples, please follow these guidelines:

* *Each notebook in the series must independently run end-to-end so that it can be tested in the daily CI (i.e. the CI test amazon-sagemaker-example-pr must pass).*
* This may require generating intermediate artifacts that later notebooks can load directly; a short sketch of this pattern follows this list. Depending on their size, intermediate artifacts can be stored in the following places:
  * The repo, in the same folder where your notebook is stored: for very small files (on the order of KB).
  * The sagemaker-sample-files S3 bucket: for larger files (on the order of MB or larger).
* Each notebook must have a 'Background' section clearly stating that the notebook is part of a notebook sequence. It must contain the elements listed below. You can look at the 'Background' section in [Music Recommender Data Exploration](https://github.com/aws/amazon-sagemaker-examples/blob/main/end_to_end/music_recommendation/01_data_exploration.ipynb) for an example.
  * The objective and/or a short summary of the notebook series.
  * A statement that the notebook is part of a notebook series.
  * A statement that the customer can choose to run the notebook by itself or as part of the series.
  * A list of links to the other notebooks in the series.
  * A clear indication of where the current notebook fits in relation to the other notebooks (e.g., it is the third notebook in the series).
  * If you have a README that contains more introductory information about the notebook series as a whole, a link to it. For example, an architecture diagram showing how the services interact across the different notebooks fits well in a README. You can look at this [README.md](https://github.com/aws/amazon-sagemaker-examples/blob/main/end_to_end/music_recommendation/README.md) as an example.
* If you have a lot of introductory material for your series, put it in a README located in the same directory as your notebook series rather than in an introductory notebook. You can look at this [README.md](https://github.com/aws/amazon-sagemaker-examples/blob/main/end_to_end/music_recommendation/README.md) as an example.
* When you first use an intermediate artifact in a notebook, link to the notebook that generated it, so that customers can easily look up how the artifact was created.
* Use links to keep your notebooks short, simple, and organized. Instead of writing a long passage about how a feature works (e.g., Batch Transform), link to its documentation.
* Design your notebook series so that customers benefit from both the individual notebooks and the whole series. For example, each notebook should have clear takeaway points (e.g., one notebook teaches data preparation and feature engineering, the next teaches training, and so on).
* Put the sequence order in the notebook file name. For example, the first notebook should start with "1_", the second notebook with "2_", etc.
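
As a rough illustration of the intermediate-artifact pattern described above, the first notebook in a series might persist a processed dataset to S3 so that the next notebook can load it directly. This is only a sketch; the bucket, prefix, and file names below are hypothetical placeholders, not paths used by this repository:

```python
import boto3
import pandas as pd

# Hypothetical locations -- replace with your own bucket and prefix.
bucket = "my-example-bucket"
prefix = "music-recommender/intermediate"
s3 = boto3.client("s3")

# In the first notebook (e.g., 1_data_exploration.ipynb):
# persist the processed dataset for use by later notebooks.
df = pd.DataFrame({"feature": [0.1, 0.7, 0.3], "label": [0, 1, 0]})
df.to_csv("processed.csv", index=False)
s3.upload_file("processed.csv", bucket, f"{prefix}/processed.csv")

# In the second notebook (e.g., 2_train.ipynb): load the artifact,
# and link back in markdown to the notebook that generated it.
s3.download_file(bucket, f"{prefix}/processed.csv", "processed.csv")
df = pd.read_csv("processed.csv")
```

Because the artifact is stored durably rather than only in notebook memory, the later notebook can still run end-to-end on its own in CI.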


## Example Notebook Best Practices

Here are some general guidelines to follow when writing example notebooks:
59 changes: 20 additions & 39 deletions async-inference/Async-Inference-Walkthrough.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Amazon SageMaker Asynchronous Inference\n",
"# Amazon SageMaker Asynchronous Inference\n",
"_**A new near real-time Inference option for generating machine learning model predictions**_"
]
},
@@ -34,7 +34,7 @@
"Asynchronous inference is a new inference option for near real-time inference needs. Requests can take up to 15 minutes to process and have payload sizes of up to 1 GB. Asynchronous inference is suitable for workloads that do not have sub-second latency requirements and have relaxed latency requirements. For example, you might need to process an inference on a large image of several MBs within 5 minutes. In addition, asynchronous inference endpoints let you control costs by scaling down endpoints instance count to zero when they are idle, so you only pay when your endpoints are processing requests. \n",
"\n",
"### Notebook scope <a id='scope'></a> \n",
"This notebook provides an introduction to the SageMaker Asynchronous inference capability. This notebook will cover the steps required to create an asynchonous inference endpoint and test it with some sample requests. \n",
"This notebook provides an introduction to the SageMaker Asynchronous inference capability. This notebook will cover the steps required to create an asynchronous inference endpoint and test it with some sample requests. \n",
"\n",
"### Overview and sample end to end flow <a id='overview'></a>\n",
"Asynchronous inference endpoints have many similarities (and some key differences) compared to real-time endpoints. The process to create asynchronous endpoints is similar to real-time endpoints. You need to create: a model, an endpoint configuration, and then an endpoint. However, there are specific configuration parameters specific to asynchronous inference endpoints which we will explore below. \n",
@@ -66,13 +66,7 @@
"\n",
"> The original Titanic dataset, describing the survival status of individual passengers on the Titanic. The titanic data does not contain information from the crew, but it does contain actual ages of half of the passengers. The principal source for data about Titanic passengers is the Encyclopedia Titanica. The datasets used here were begun by a variety of researchers. One of the original sources is Eaton & Haas (1994) Titanic: Triumph and Tragedy, Patrick Stephens Ltd, which includes a passenger list created by many researchers and edited by Michael A. Findlay.\n",
">\n",
"> Thomas Cason of UVa has greatly updated and improved the Titanic data frame using the Encyclopedia Titanica and created the dataset here. Some duplicate passengers have been dropped, many errors corrected, many missing ages filled in, and new variables created.\n",
">\n",
"> For more information about how this dataset was constructed:\n",
"http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3info.txt\n",
">\n",
"> [1] Author: Frank E. Harrell Jr., Thomas Cason\n",
"Source: [Vanderbilt Biostatistics](http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.html)"
"> Thomas Cason of UVa has greatly updated and improved the Titanic data frame using the Encyclopedia Titanica and created the dataset here. Some duplicate passengers have been dropped, many errors corrected, many missing ages filled in, and new variables created.\n"
]
},
{
@@ -116,6 +110,7 @@
"import sagemaker\n",
"import boto3\n",
"from time import gmtime, strftime\n",
"from datetime import datetime\n",
"\n",
"boto_session = boto3.session.Session()\n",
"sm_session = sagemaker.session.Session()\n",
@@ -129,28 +124,11 @@
"metadata": {},
"source": [
"Specify your IAM role. Go the AWS IAM console (https://console.aws.amazon.com/iam/home) and add the following policies to your IAM Role:\n",
"* SageMakerFullAccessPolicy\n",
"* Amazon S3 access: Apply this to get and put objects in your Amazon S3 bucket. Replace `bucket_name` with the name of your Amazon S3 bucket: \n",
"\n",
"```json\n",
"{\n",
" \"Version\": \"2012-10-17\",\n",
" \"Statement\": [\n",
" {\n",
" \"Action\": [\n",
" \"s3:GetObject\",\n",
" \"s3:PutObject\",\n",
" \"s3:AbortMultipartUpload\",\n",
" \"s3:ListBucket\"\n",
" ],\n",
" \"Effect\": \"Allow\",\n",
" \"Resource\": \"arn:aws:s3:::bucket_name/*\"\n",
" }\n",
" ]\n",
"}\n",
"```\n",
" * SageMakerFullAccessPolicy\n",
"\n",
"\n",
"* (Optional) Amazon SNS access: Add `sns:Publish` on the topics you define. Apply this if you plan to use Amazon SNS to receive notifications.\n",
" * (Optional) Amazon SNS access: Add `sns:Publish` on the topics you define. Apply this if you plan to use Amazon SNS to receive notifications.\n",
"\n",
"```json\n",
"{\n",
@@ -161,7 +139,7 @@
" \"sns:Publish\"\n",
" ],\n",
" \"Effect\": \"Allow\",\n",
" \"Resource\": \"arn:aws:sns:us-east-2:123456789012:MyTopic\"\n",
" \"Resource\": \"arn:aws:sns:<aws-region>:<account-id>:<topic-name>\"\n",
" }\n",
" ]\n",
"}\n",
@@ -209,7 +187,7 @@
"outputs": [],
"source": [
"bucket_prefix = \"async-inference-demo\"\n",
"resource_name = \"AsyncInferenceDemo\""
"resource_name = \"AsyncInferenceDemo-{}-{}\""
]
},
{
@@ -273,7 +251,7 @@
"metadata": {},
"outputs": [],
"source": [
"model_name = resource_name.format(\"Model\")\n",
"model_name = resource_name.format(\"Model\", datetime.now().strftime(\"%Y-%m-%d-%H-%M-%S\"))\n",
"create_model_response = sm_client.create_model(\n",
" ModelName=model_name,\n",
" ExecutionRoleArn=sm_role,\n",
@@ -306,7 +284,9 @@
"metadata": {},
"outputs": [],
"source": [
"endpoint_config_name = resource_name.format(\"EndpointConfig\")\n",
"endpoint_config_name = resource_name.format(\n",
" \"EndpointConfig\", datetime.now().strftime(\"%Y-%m-%d-%H-%M-%S\")\n",
")\n",
"create_endpoint_config_response = sm_client.create_endpoint_config(\n",
" EndpointConfigName=endpoint_config_name,\n",
" ProductionVariants=[\n",
@@ -322,8 +302,8 @@
" \"S3OutputPath\": f\"s3://{s3_bucket}/{bucket_prefix}/output\",\n",
" # Optionally specify Amazon SNS topics\n",
" # \"NotificationConfig\": {\n",
" # \"SuccessTopic\": \"arn:aws:sns:us-east-2:123456789012:MyTopic\",\n",
" # \"ErrorTopic\": \"arn:aws:sns:us-east-2:123456789012:MyTopic\",\n",
" # \"SuccessTopic\": \"arn:aws:sns:<aws-region>:<account-id>:<topic-name>\",\n",
" # \"ErrorTopic\": \"arn:aws:sns:<aws-region>:<account-id>:<topic-name>\",\n",
" # }\n",
" },\n",
" \"ClientConfig\": {\"MaxConcurrentInvocationsPerInstance\": 4},\n",
@@ -352,7 +332,8 @@
"metadata": {},
"outputs": [],
"source": [
"endpoint_name = resource_name.format(\"Endpoint\")\n",
"endpoint_name = resource_name.format(\"Endpoint\", datetime.now().strftime(\"%Y-%m-%d-%H-%M-%S\"))\n",
"\n",
"create_endpoint_response = sm_client.create_endpoint(\n",
" EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name\n",
")\n",
@@ -623,14 +604,14 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"response = client.deregister_scalable_target(\n",
" ServiceNamespace='sagemaker',\n",
" ServiceNamespace=\"sagemaker\",\n",
" ResourceId=resource_id,\n",
" ScalableDimension='sagemaker:variant:DesiredInstanceCount'\n",
" ScalableDimension=\"sagemaker:variant:DesiredInstanceCount\",\n",
")"
]
},
@@ -341,9 +341,7 @@
},
"outputs": [],
"source": [
"# Install updated version of SageMaker\n",
"# !pip install -q sagemaker==2.49\n",
"!pip install sagemaker --upgrade\n",
"!pip install -q sagemaker==2.91.1\n",
"\n",
"!pip install transformers\n",
"!pip install typing\n",
@@ -1520,4 +1518,4 @@
},
"nbformat": 4,
"nbformat_minor": 4
}
}
@@ -175,7 +175,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"When you run the following commend, you will see an error that you cannot list policies if `IAMFullAccess` policy is not attached to your role. Please follow the steps above to attach the IAMFullAccess policy to your role if you see an error."
"When you run the following command, you will see an error that you cannot list policies if `IAMFullAccess` policy is not attached to your role. Please follow the steps above to attach the IAMFullAccess policy to your role if you see an error."
]
},
{
