Merge branch 'aws:main' into main

aws · Aug 16, 2022 · 71f921b
2 parents 9c368b1 + 1c5da89
commit 71f921b
Showing 38 changed files with 74,709 additions and 70,339 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -11,6 +11,6 @@ _Put an `x` in the boxes that apply. You can also fill these out after creating
 - [ ] I have read the [CONTRIBUTING](https://github.com/aws/amazon-sagemaker-examples/blob/master/CONTRIBUTING.md) doc and adhered to the example notebook best practices
 - [ ] I have updated any necessary documentation, including [READMEs](https://github.com/aws/amazon-sagemaker-examples/blob/master/README.md)
 - [ ] I have tested my notebook(s) and ensured it runs end-to-end
-- [ ] I have linted my notebook(s) and code using `tox -e black-format,black-nb-format`
+- [ ] I have linted my notebook(s) and code using `black-nb -l 100 {path}/{notebook-name}.ipynb`
 
 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -36,12 +36,13 @@ Before sending us a pull request, please ensure that:
 1. Clone your fork of the repository: `git clone https://github.com/<username>/amazon-sagemaker-examples` where `<username>` is your github username.
 
 
-### Run the Linters
+### Run the Linter
 
-1. Install tox using `pip install tox`
-1. cd into the amazon-sagemaker-examples folder: `cd amazon-sagemaker-examples` or `cd /environment/amazon-sagemaker-examples`
-1. Run the following tox command and verify that all linters pass: `tox -e black-check,black-nb-check`
-1. If the linters did not pass, run the following tox command to fix the issues: `tox -e black-format,black-nb-format`
+Apply Python code formatting to Jupyter notebook files using [black-nb](https://pypi.org/project/black-nb/).
+
+1. Install black-nb using `pip install black-nb`
+1. Run the following black-nb command on each of your ipynb notebook files and verify that the linter passes: `black-nb -l 100 {path}/{notebook-name}.ipynb`
+1. Some notebook features, such as `%` line magic commands or `%%` cell magics, can cause black-nb to fail. Running the above command to format as much as possible is sufficient, even if the check fails.
 
 
 ### Test Your Notebook End-to-End
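As a rough illustration of the revised linter step, the sketch below applies the `black-nb -l 100` command from CONTRIBUTING.md to every notebook under a directory. The helper names `black_nb_command` and `format_notebooks` are hypothetical and not part of the repository; only the command and its `-l 100` argument come from the diff. It assumes `black-nb` is already installed and on the `PATH`.

```python
import subprocess
from pathlib import Path
from typing import Dict, List

LINE_LENGTH = 100  # value passed to -l in CONTRIBUTING.md


def black_nb_command(notebook: Path, line_length: int = LINE_LENGTH) -> List[str]:
    """Build the black-nb invocation for one notebook (hypothetical helper)."""
    return ["black-nb", "-l", str(line_length), str(notebook)]


def format_notebooks(root: Path) -> Dict[str, int]:
    """Run black-nb on each .ipynb file under root and collect exit codes.

    A non-zero exit code is recorded rather than raised, because the
    contributing guide accepts failures caused by notebook magics.
    """
    results: Dict[str, int] = {}
    for notebook in sorted(root.rglob("*.ipynb")):
        # Skip Jupyter's autosave copies.
        if ".ipynb_checkpoints" in notebook.parts:
            continue
        completed = subprocess.run(black_nb_command(notebook), check=False)
        results[str(notebook)] = completed.returncode
    return results
```

Calling `format_notebooks(Path("."))` from the repository root would return a mapping from notebook path to exit code, which makes it easy to see which notebooks black-nb could not fully format.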

diff --git a/...text_classification_20_newsgroups/hpo_huggingface_text_classification_20_newsgroups.ipynb b/...text_classification_20_newsgroups/hpo_huggingface_text_classification_20_newsgroups.ipynb
@@ -14,7 +14,7 @@
     "Text Classification can be used to solve various use-cases like sentiment analysis, spam detection, hashtag prediction etc. \n",
     "\n",
     "\n",
-    "This notebook demonstrates the use of the [HuggingFace `transformers` library](https://huggingface.co/transformers/) together with a custom Amazon sagemaker-sdk extension to fine-tune a pre-trained transformer on multi class text classification. In particular, the pre-trained model will be fine-tuned using the [`20 newsgroups dataset`](http://qwone.com/~jason/20Newsgroups/). To get started, we need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on."
+    "This notebook demonstrates the use of the [HuggingFace Transformers library](https://huggingface.co/transformers/) together with a custom Amazon sagemaker-sdk extension to fine-tune a pre-trained transformer on multi-class text classification. In particular, the pre-trained model will be fine-tuned using the [20 Newsgroups dataset](http://qwone.com/~jason/20Newsgroups/). To get started, we need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on."
    ]
   },
   {
@@ -107,7 +107,7 @@
     "\n",
     "Now we'll download a dataset from the web on which we want to train the text classification model.\n",
     "\n",
-    "In this example, let us train the text classification model on the [`20 newsgroups dataset`](http://qwone.com/~jason/20Newsgroups/). The `20 newsgroups dataset` consists of 20000 messages taken from 20 Usenet newsgroups."
+    "In this example, let us train the text classification model on the [20 Newsgroups dataset](http://qwone.com/~jason/20Newsgroups/). The 20 Newsgroups dataset consists of 20000 messages taken from 20 Usenet newsgroups."
    ]
   },
   {
@@ -1040,7 +1040,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "Now, let's define the SageMaker `HuggingFace` estimator with resource configurations and hyperparameters to train Text Classification on `20 newsgroups` dataset, running on a `p3.2xlarge` instance."
+    "Now, let's define the SageMaker `HuggingFace` estimator with resource configurations and hyperparameters to train Text Classification on the 20 Newsgroups dataset, running on a `p3.2xlarge` instance."
    ]
   },
   {

diff --git a/introduction_to_amazon_algorithms/forecasting_services_comparison/README.md b/introduction_to_amazon_algorithms/forecasting_services_comparison/README.md
@@ -0,0 +1,71 @@
+# Time Series Modeling with Amazon Forecast and DeepAR on SageMaker
+## Overview
+Amazon offers customers a multitude of time series prediction services, including [DeepAR on SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html) and the fully managed service [Amazon Forecast](https://aws.amazon.com/forecast/). The purpose of this notebook series is to compare the two services and highlight their features through two notebooks that demonstrate how to use each service:
+ 1. [DeepAR on SageMaker Example](./deepar_example.ipynb)
+ 2. [Amazon Forecast Example](./forecast_example.ipynb)
+
+ This README offers a high-level comparison of the two services, while each notebook serves as a guide to using its respective service and understanding its features. Both notebooks use the UCI [Beijing Multi-Site Air-Quality Data Data Set](https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data), which makes it easier to observe how the services differ.
+
+## Introduction
+ ![DeepAR vs Amazon Forecast Comparison Graphic](./images/readme_1.png)
+
+**DeepAR** is a proprietary supervised learning algorithm developed by Amazon Research for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNN). While classical forecasting methods such as autoregressive integrated moving average (ARIMA) and exponential smoothing (ETS) fit a single model to each individual time series, DeepAR trains a single model jointly over all of the time series. This is beneficial when there is a set of related time series, and DeepAR begins to outperform classical methods when datasets contain hundreds of related time series. **DeepAR** can be accessed through **Amazon SageMaker**, allowing users to prepare data, train and deploy models, and create forecasts.
+
+**Amazon Forecast** is a fully managed deep learning service dedicated to time series forecasting. It offers a no-code solution, allowing users to prepare data, train and deploy models, and create forecasts with just a few clicks. This means that **Amazon Forecast** can be used without any prior ML knowledge. It currently uses multiple algorithms under the hood to provide increased accuracy, and can be used in a variety of business-related domains such as inventory planning, web traffic forecasting, EC2 capacity forecasting, and workforce planning.
+
+## Comparison
+### Prior Knowledge Requirements
+The most obvious difference between the two services is that **Amazon Forecast** requires no coding, so little to no machine learning or programming knowledge is needed to use the service. However, **Amazon Forecast** can also be accessed through the AWS Command Line Interface (AWS CLI) or [Boto3](https://aws.amazon.com/sdk-for-python/) (the AWS SDK for Python), as demonstrated in this notebook series.
+
+In contrast, **DeepAR on SageMaker** is accessed through the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/), requiring some programming and ML knowledge to use the service.
+### Models
+
+**DeepAR on SageMaker** offers one algorithm: DeepAR. **Amazon Forecast** offers six algorithms:
+ - CNN-QR
+ - Autoregressive Integrated Moving Average (ARIMA)
+ - DeepAR+
+ - Exponential Smoothing (ETS)
+ - Non-Parametric Time Series (NPTS)
+ - Prophet
+
+Use cases differ by algorithm, but a general comparison of the algorithms offered by **Amazon Forecast** can be found in the table below:
+![Amazon Forecast Algorithms Comparison Table](./images/readme_2.png)
+More information can be found here: [Comparing Forecast Algorithms](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-choosing-recipes.html#comparing-algos)
+
+When deploying a predictor in **Amazon Forecast**, three options are offered:
+- **Manual Selection** - Manually select a single algorithm to apply to the entire dataset
+- **AutoML** - Service finds and applies the best-performing algorithm to the entire dataset
+- **AutoPredictor** - Service runs all models and blends predictions with the goal of improving accuracy
+
+AutoPredictor is the recommended option, because Manual Selection and AutoML are considered legacy options and new features will only be supported by AutoPredictor. More information on **Amazon Forecast**'s AutoPredictor can be found here: [Amazon Forecast AutoPredictor](https://github.com/aws-samples/amazon-forecast-samples/blob/main/library/content/AutoPredictor.md).
+
+### Training Time
+With **DeepAR on SageMaker**, your model is trained on a dedicated **EC2** instance. In contrast, **Amazon Forecast** uses fully managed computing resources. This affects training time: **DeepAR on SageMaker** instances begin training immediately, whereas **Amazon Forecast** resources go through a pending phase and the speed of training cannot be adjusted. In addition, training an AutoPredictor is recommended in **Amazon Forecast**. Because this type of predictor trains every available algorithm, the training time of **Amazon Forecast** is much greater than that of **DeepAR on SageMaker**.
+
+In this specific notebook series, the [DeepAR](./deepar_example.ipynb) model took approximately `25 minutes` to train, while Forecast's [AutoPredictor](./forecast_example.ipynb) took approximately `6 hours and 30 minutes`.
+### Accuracy
+**Amazon Forecast** is generally more accurate than **DeepAR** on its own, since its **AutoPredictor** option runs all available algorithms and blends their predictions. However, users can still choose to apply a single algorithm to their entire dataset if desired.
+### Pricing
+#### DeepAR on SageMaker
+There are two main costs to consider when using DeepAR on SageMaker: Training and Inference. Both of these costs are charged hourly, and pricing for each instance type can be found at the link below:
+ - [Available SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/)
+
+In this particular notebook series, an `ml.c4.2xlarge` instance is used for training, while an `ml.c5.large` instance is used for inference.
+#### Amazon Forecast
+Amazon Forecast has four cost types:
+ - Imported Data
+ - Training a predictor
+ - Generated forecast data points
+ - Forecast Explanations
+
+For more detailed pricing information, please consult the link below:
+ - [Amazon Forecast Pricing](https://aws.amazon.com/forecast/pricing/)
+
+
+*References:*
+- Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
+- https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data
+- [DeepAR: Probabilistic Forecasting with  
+Autoregressive Recurrent Networks](https://arxiv.org/pdf/1704.04110.pdf)
+- https://github.com/aws-samples/amazon-forecast-samples/blob/main/library/content/AutoPredictor.md
+- `util` library borrowed from [amazon-forecast-samples](https://github.com/aws-samples/amazon-forecast-samples) library
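To make the README's training-time and hourly-pricing remarks concrete, here is a small worked sketch. The two durations are the ones reported in the README. The hourly rate is a hypothetical placeholder, not an actual AWS price, and the `hourly_cost` helper assumes the charge is simply prorated at the quoted hourly rate. Real rates should be taken from the pricing pages linked in the README.

```python
from datetime import timedelta


def hourly_cost(duration: timedelta, usd_per_hour: float) -> float:
    """Cost of a resource billed at a flat hourly rate, prorated by duration."""
    return duration.total_seconds() / 3600 * usd_per_hour


# Training times reported in the README for this notebook series.
deepar_training = timedelta(minutes=25)
forecast_training = timedelta(hours=6, minutes=30)

# How many times longer the AutoPredictor took to train than DeepAR.
slowdown = forecast_training / deepar_training

# HYPOTHETICAL rate for illustration only; look up the real price
# for your instance type on the SageMaker pricing page.
ASSUMED_TRAINING_USD_PER_HOUR = 0.60

deepar_training_cost = hourly_cost(deepar_training, ASSUMED_TRAINING_USD_PER_HOUR)

print(f"AutoPredictor training took {slowdown:.1f}x as long as DeepAR")
print(f"DeepAR training cost at the assumed rate: ${deepar_training_cost:.2f}")
```

The same helper can be reused for the inference endpoint by passing the time the endpoint stays up and the rate for the inference instance type.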