Commit

Merge branch 'aws:main' into main
marckarp authored Aug 16, 2022
2 parents 9c368b1 + 1c5da89 commit 71f921b
Showing 38 changed files with 74,709 additions and 70,339 deletions.
2 changes: 1 addition & 1 deletion .github/PULL_REQUEST_TEMPLATE.md
@@ -11,6 +11,6 @@ _Put an `x` in the boxes that apply. You can also fill these out after creating
- [ ] I have read the [CONTRIBUTING](https://github.com/aws/amazon-sagemaker-examples/blob/master/CONTRIBUTING.md) doc and adhered to the example notebook best practices
- [ ] I have updated any necessary documentation, including [READMEs](https://github.com/aws/amazon-sagemaker-examples/blob/master/README.md)
- [ ] I have tested my notebook(s) and ensured it runs end-to-end
- [ ] I have linted my notebook(s) and code using `tox -e black-format,black-nb-format`
- [ ] I have linted my notebook(s) and code using `black-nb -l 100 {path}/{notebook-name}.ipynb`

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
11 changes: 6 additions & 5 deletions CONTRIBUTING.md
@@ -36,12 +36,13 @@ Before sending us a pull request, please ensure that:
1. Clone your fork of the repository: `git clone https://github.com/<username>/amazon-sagemaker-examples` where `<username>` is your github username.


### Run the Linters
### Run the Linter

1. Install tox using `pip install tox`
1. cd into the amazon-sagemaker-examples folder: `cd amazon-sagemaker-examples` or `cd /environment/amazon-sagemaker-examples`
1. Run the following tox command and verify that all linters pass: `tox -e black-check,black-nb-check`
1. If the linters did not pass, run the following tox command to fix the issues: `tox -e black-format,black-nb-format`
Apply Python code formatting to Jupyter notebook files using [black-nb](https://pypi.org/project/black-nb/).

1. Install black-nb using `pip install black-nb`
1. Run the following black-nb command on each of your ipynb notebook files and verify that the linter passes: `black-nb -l 100 {path}/{notebook-name}.ipynb`
1. Some notebook features such as `%` bash commands or `%%` cell magic cause black-nb to fail. As long as you run the above command to format as much as possible, that is sufficient, even if the check fails


### Test Your Notebook End-to-End
@@ -14,7 +14,7 @@
"Text Classification can be used to solve various use-cases like sentiment analysis, spam detection, hashtag prediction etc. \n",
"\n",
"\n",
"This notebook demonstrates the use of the [HuggingFace `transformers` library](https://huggingface.co/transformers/) together with a custom Amazon sagemaker-sdk extension to fine-tune a pre-trained transformer on multi class text classification. In particular, the pre-trained model will be fine-tuned using the [`20 newsgroups dataset`](http://qwone.com/~jason/20Newsgroups/). To get started, we need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on."
"This notebook demonstrates the use of the [HuggingFace Transformers library](https://huggingface.co/transformers/) together with a custom Amazon sagemaker-sdk extension to fine-tune a pre-trained transformer on multi class text classification. In particular, the pre-trained model will be fine-tuned using the [20 Newsgroups dataset](http://qwone.com/~jason/20Newsgroups/). To get started, we need to set up the environment with a few prerequisite steps, for permissions, configurations, and so on."
]
},
{
@@ -107,7 +107,7 @@
"\n",
"Now we'll download a dataset from the web on which we want to train the text classification model.\n",
"\n",
"In this example, let us train the text classification model on the [`20 newsgroups dataset`](http://qwone.com/~jason/20Newsgroups/). The `20 newsgroups dataset` consists of 20000 messages taken from 20 Usenet newsgroups."
"In this example, let us train the text classification model on the [20 Newsgroups dataset](http://qwone.com/~jason/20Newsgroups/). The 20 Newsgroups dataset consists of 20000 messages taken from 20 Usenet newsgroups."
]
},
{
@@ -1040,7 +1040,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, let's define the SageMaker `HuggingFace` estimator with resource configurations and hyperparameters to train Text Classification on `20 newsgroups` dataset, running on a `p3.2xlarge` instance."
"Now, let's define the SageMaker `HuggingFace` estimator with resource configurations and hyperparameters to train Text Classification on 20 Newsgroups dataset, running on a `p3.2xlarge` instance."
]
},
{
@@ -0,0 +1,71 @@
# Time Series Modeling with Amazon Forecast and DeepAR on SageMaker
## Overview
AWS offers customers several time series forecasting services, including [DeepAR on SageMaker](https://docs.aws.amazon.com/sagemaker/latest/dg/deepar.html) and the fully managed service [Amazon Forecast](https://aws.amazon.com/forecast/). This notebook series compares these two services and highlights their features through two notebooks, each demonstrating one service:
1. [DeepAR on SageMaker Example](./deepar_example.ipynb)
2. [Amazon Forecast Example](./forecast_example.ipynb)

This README gives a top-level comparison of the two services. Each notebook serves as a guide to using its service and understanding its features. Both notebooks use the UCI [Beijing Multi-Site Air-Quality Data Data Set](https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data), which makes it easier to see how the two services differ.

## Introduction
![DeepAR vs Amazon Forecast Comparison Graphic](./images/readme_1.png)

**DeepAR** is a proprietary supervised learning algorithm developed by Amazon Research for forecasting scalar (one-dimensional) time series using recurrent neural networks (RNN). While classical forecasting methods such as autoregressive integrated moving average (ARIMA) and exponential smoothing (ETS) fit a single model to each individual time series, DeepAR trains a single model jointly over all of the time series. This is beneficial when there is a set of related time series, and DeepAR begins to outperform classical methods when datasets contain hundreds of related time series. **DeepAR** can be accessed through **Amazon SageMaker**, allowing users to prepare data, train and deploy models, and create forecasts.
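
DeepAR trains a single model over many related series. You can see this in its training input. The SageMaker DeepAR algorithm reads its training channel as JSON Lines, with one time series per line. Each line holds a `start` timestamp and a `target` array, plus an optional `cat` feature that identifies the series. The sketch below writes several series into one such file. The function name, the station names and the readings are hypothetical and are not taken from the notebook. The sketch also assumes that every series shares the same start time, although in general each series can have its own.

```python
import json
from typing import Dict, List, Optional


def to_deepar_jsonlines(series: Dict[str, List[Optional[float]]], start: str) -> str:
    """Serialize several related time series into DeepAR-style JSON Lines.

    One line per series: a "start" timestamp, a "target" array (missing values
    written as the string "NaN"), and a "cat" index identifying the series.
    """
    lines = []
    for index, values in enumerate(series.values()):
        record = {
            "start": start,
            "target": ["NaN" if v is None else v for v in values],
            "cat": [index],  # categorical feature: which station this series comes from
        }
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"


# Hypothetical PM2.5 readings for two monitoring stations (not real data).
pm25 = {
    "station_a": [79.0, 85.0, None, 92.0],
    "station_b": [64.0, 70.0, 71.0, 68.0],
}
print(to_deepar_jsonlines(pm25, start="2013-03-01 00:00:00"))
```

Every line lands in the same training file, so one DeepAR model learns from all stations at once. A classical method such as ARIMA would instead fit a separate model to each line.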

**Amazon Forecast** is a fully managed machine learning service dedicated to time series forecasting. It offers a no-code solution that lets users prepare data, train and deploy models, and create forecasts with just a few clicks, so **Amazon Forecast** can be used without any prior ML knowledge. It uses multiple algorithms under the hood to improve accuracy, and it can be applied in a variety of business domains such as inventory planning, web traffic forecasting, EC2 capacity forecasting, and workforce planning.

## Comparison
### Prior Knowledge Requirements
The most obvious difference between the two services is that **Amazon Forecast** requires no coding, so little to no machine learning or even programming knowledge is needed to use it. However, **Amazon Forecast** can also be used programmatically through the AWS Command Line Interface (AWS CLI) or [Boto3](https://aws.amazon.com/sdk-for-python/) (the AWS SDK for Python), as demonstrated in this notebook series.

In contrast, **DeepAR on SageMaker** is used through the [Amazon SageMaker Python SDK](https://sagemaker.readthedocs.io/en/stable/), which requires some programming and ML knowledge.

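
The following is a rough sketch of what "accessed through the SageMaker Python SDK" looks like in practice. It builds DeepAR's four required hyperparameters and launches a training job on the `ml.c4.2xlarge` instance type that this README mentions under Pricing. Several details are assumptions and may not match the notebook's actual settings:

- the hourly `time_freq` of `"H"`
- the prediction length, context length and epoch values
- the helper names and the S3 URIs

The AWS calls sit inside `train_deepar`, so importing this sketch does not require credentials.

```python
from typing import Dict


def deepar_hyperparameters(
    prediction_length: int, context_length: int, epochs: int, time_freq: str = "H"
) -> Dict[str, str]:
    """Build DeepAR's required hyperparameters; values are passed as strings."""
    return {
        "time_freq": time_freq,  # assumption: hourly observations
        "prediction_length": str(prediction_length),
        "context_length": str(context_length),
        "epochs": str(epochs),
    }


def train_deepar(role_arn: str, train_s3_uri: str, test_s3_uri: str, output_s3_uri: str):
    """Launch a DeepAR training job (needs AWS credentials and the `sagemaker` package)."""
    import sagemaker
    from sagemaker.estimator import Estimator

    session = sagemaker.Session()
    # Look up the DeepAR container image for the session's Region.
    image_uri = sagemaker.image_uris.retrieve("forecasting-deepar", session.boto_region_name)
    estimator = Estimator(
        image_uri=image_uri,
        role=role_arn,
        instance_count=1,
        instance_type="ml.c4.2xlarge",
        output_path=output_s3_uri,
        sagemaker_session=session,
    )
    # Hypothetical values: 24 hours ahead, 72 hours of context, 20 epochs.
    estimator.set_hyperparameters(
        **deepar_hyperparameters(prediction_length=24, context_length=72, epochs=20)
    )
    estimator.fit({"train": train_s3_uri, "test": test_s3_uri})
    return estimator
```

After training, the estimator can be hosted with `estimator.deploy(initial_instance_count=1, instance_type="ml.c5.large")`. That is the inference instance type this README lists.
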
### Models

**DeepAR on SageMaker** offers one algorithm: DeepAR. **Amazon Forecast** offers six algorithms:
- Convolutional Neural Network - Quantile Regression (CNN-QR)
- Autoregressive Integrated Moving Average (ARIMA)
- DeepAR+
- Exponential Smoothing (ETS)
- Non-Parametric Time Series (NPTS)
- Prophet

Each algorithm suits different use cases. The table below gives a general comparison of the algorithms offered by **Amazon Forecast**:

![Amazon Forecast Algorithms Comparison Table](./images/readme_2.png)

More information can be found here: [Comparing Forecast Algorithms](https://docs.aws.amazon.com/forecast/latest/dg/aws-forecast-choosing-recipes.html#comparing-algos)

When creating a predictor in **Amazon Forecast**, three options are offered:
- **Manual Selection** - Manually select a single algorithm to apply to the entire dataset
- **AutoML** - The service finds the best-performing algorithm and applies it to the entire dataset
- **AutoPredictor** - The service runs all algorithms and blends their predictions with the goal of improving accuracy
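
The next example shows why blending can help. It is a toy contrast between picking the single best model, as AutoML does, and averaging several models, as AutoPredictor does. The model names, forecasts and equal weights are invented. This is not how Amazon Forecast implements either option internally. The example only shows why a blend can beat each of its members.

```python
from typing import Dict, List


def mean_abs_error(pred: List[float], actual: List[float]) -> float:
    """Mean absolute error between a forecast and the observed values."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)


def select_best(preds: Dict[str, List[float]], actual: List[float]) -> str:
    """Select-one idea: keep the single model with the lowest validation error."""
    return min(preds, key=lambda name: mean_abs_error(preds[name], actual))


def blend(preds: Dict[str, List[float]], weights: Dict[str, float]) -> List[float]:
    """Ensemble idea: weighted average of several models' forecasts at each time step."""
    total = sum(weights.values())
    horizon = len(next(iter(preds.values())))
    return [sum(weights[m] * preds[m][t] for m in preds) / total for t in range(horizon)]


# Invented validation data and forecasts from two hypothetical models.
actual = [10.0, 12.0, 14.0]
preds = {"arima": [8.0, 13.0, 15.0], "deepar": [11.0, 11.0, 15.0]}

best = select_best(preds, actual)
blended = blend(preds, {"arima": 0.5, "deepar": 0.5})
print(best, mean_abs_error(preds[best], actual))
print(blended, mean_abs_error(blended, actual))
```

In this made-up case, the errors of the two models partly cancel out, so the blend's error is lower than that of the best single model.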

AutoPredictor is the recommended option. Manual Selection and AutoML are considered legacy options, and new features will only be supported by AutoPredictor. More information on **Amazon Forecast**'s AutoPredictor can be found here: [Amazon Forecast AutoPredictor](https://github.com/aws-samples/amazon-forecast-samples/blob/main/library/content/AutoPredictor.md).
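
The following is a minimal sketch of starting an AutoPredictor through Boto3 instead of the console. It uses the Forecast `create_auto_predictor` API. Several details are assumptions:

- the hourly `ForecastFrequency`
- the 24-step horizon
- the quantiles
- the predictor name
- the placeholder dataset group ARN

The sketch also assumes the dataset group has already been created and populated. The notebook itself may use the borrowed `util` helpers instead.

```python
from typing import Any, Dict, List, Optional


def auto_predictor_request(
    name: str,
    dataset_group_arn: str,
    horizon: int,
    frequency: str = "H",
    quantiles: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for Forecast's CreateAutoPredictor call."""
    return {
        "PredictorName": name,
        "ForecastHorizon": horizon,  # number of future time steps to predict
        "ForecastFrequency": frequency,  # assumption: hourly data
        "ForecastTypes": quantiles or ["0.1", "0.5", "0.9"],
        "DataConfig": {"DatasetGroupArn": dataset_group_arn},
    }


def create_auto_predictor(request: Dict[str, Any]) -> str:
    """Start AutoPredictor training (needs AWS credentials and boto3); returns the predictor ARN."""
    import boto3

    forecast = boto3.client("forecast")
    response = forecast.create_auto_predictor(**request)
    return response["PredictorArn"]


# Placeholder ARN: replace with the ARN of your own dataset group.
request = auto_predictor_request(
    "air_quality_auto",
    "arn:aws:forecast:us-east-1:123456789012:dataset-group/example",
    horizon=24,
)
```

Passing `request` to `create_auto_predictor` starts training. Training then runs asynchronously, and the predictor's status has to be polled until it becomes active.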

### Training Time
With **DeepAR on SageMaker**, your model is trained on a dedicated **EC2** instance of a type you choose. In contrast, **Amazon Forecast** uses fully managed compute resources. This affects training time. **DeepAR** training jobs on **SageMaker** start training as soon as their instance is provisioned. **Amazon Forecast** resources, by contrast, go through a pending phase, and the training speed cannot be changed. In addition, training an AutoPredictor is recommended in **Amazon Forecast**. Because this type of predictor trains every available algorithm, training in **Amazon Forecast** takes much longer than in **DeepAR on SageMaker**.

In this specific notebook series, the [DeepAR](./deepar_example.ipynb) model took approximately `25 minutes` to train, while Forecast's [AutoPredictor](./forecast_example.ipynb) took approximately `6 hours and 30 minutes`.

### Accuracy
**Amazon Forecast** is generally more accurate than **DeepAR** on its own, because its **AutoPredictor** option combines several algorithms instead of relying on one. However, users can still choose to apply a single algorithm to their entire dataset if desired.

### Pricing
#### DeepAR on SageMaker
There are two main costs to consider when using DeepAR on SageMaker: training and inference. Both are priced at an hourly rate that depends on the instance type:
- [Amazon SageMaker Pricing](https://aws.amazon.com/sagemaker/pricing/)

In this particular notebook series, an `ml.c4.2xlarge` instance is used for training, while an `ml.c5.large` instance is used for inference.

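
A rough way to reason about DeepAR costs is to multiply the instance-hour rate by the time used. For example, the 25-minute training run reported above corresponds to 25/60 of an hour on the training instance. A real-time endpoint keeps billing for as long as it stays deployed. The sketch below uses placeholder hourly rates and a hypothetical two hours of endpoint uptime. These are not actual prices, so look up the real `ml.c4.2xlarge` and `ml.c5.large` rates for your Region on the pricing page.

```python
from dataclasses import dataclass


@dataclass
class InstanceUsage:
    """Billable time on one instance type. Rates used below are placeholders, not real prices."""

    hourly_rate_usd: float
    minutes: float
    instance_count: int = 1

    def cost(self) -> float:
        # Price per instance-hour x hours used x number of instances.
        return self.hourly_rate_usd * (self.minutes / 60.0) * self.instance_count


def deepar_cost(training: InstanceUsage, endpoint: InstanceUsage) -> float:
    """Training job cost plus real-time endpoint cost.

    The endpoint accrues charges for as long as it is running, so
    `endpoint.minutes` should cover the whole time it stays deployed.
    """
    return training.cost() + endpoint.cost()


# Placeholder rates and a hypothetical 2-hour endpoint lifetime.
training = InstanceUsage(hourly_rate_usd=0.60, minutes=25)
endpoint = InstanceUsage(hourly_rate_usd=0.10, minutes=120)
print(f"Estimated DeepAR cost: ${deepar_cost(training, endpoint):.2f}")
```

Because endpoint time usually dominates once a model is in use, deleting the endpoint after experimenting is the main lever for keeping DeepAR costs low.
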
#### Amazon Forecast
Amazon Forecast has four cost types:
- Imported data
- Predictor training
- Generated forecast data points
- Forecast explanations

For more detailed pricing information, please consult the link below:
- [Amazon Forecast Pricing](https://aws.amazon.com/forecast/pricing/)


*References:*
- Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.
- [Beijing Multi-Site Air-Quality Data Data Set](https://archive.ics.uci.edu/ml/datasets/Beijing+Multi-Site+Air-Quality+Data)
- [DeepAR: Probabilistic Forecasting with Autoregressive Recurrent Networks](https://arxiv.org/pdf/1704.04110.pdf)
- [Amazon Forecast AutoPredictor](https://github.com/aws-samples/amazon-forecast-samples/blob/main/library/content/AutoPredictor.md)
- The `util` library is borrowed from the [amazon-forecast-samples](https://github.com/aws-samples/amazon-forecast-samples) repository