Merge branch 'main' into main

aws · Oct 13, 2022 · 8179f22
2 parents 5a71a37 + 50cee68
Showing 466 changed files with 365,302 additions and 81,706 deletions.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -11,6 +11,6 @@ _Put an `x` in the boxes that apply. You can also fill these out after creating
 - [ ] I have read the [CONTRIBUTING](https://github.com/aws/amazon-sagemaker-examples/blob/master/CONTRIBUTING.md) doc and adhered to the example notebook best practices
 - [ ] I have updated any necessary documentation, including [READMEs](https://github.com/aws/amazon-sagemaker-examples/blob/master/README.md)
 - [ ] I have tested my notebook(s) and ensured it runs end-to-end
-- [ ] I have linted my notebook(s) and code using `tox -e black-format,black-nb-format`
+- [ ] I have linted my notebook(s) and code using `black-nb -l 100 {path}/{notebook-name}.ipynb`
 
 By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
diff --git a/.gitignore b/.gitignore
@@ -6,3 +6,4 @@
 
 **/_build
 *.iml
+tox.ini
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -36,12 +36,13 @@ Before sending us a pull request, please ensure that:
 1. Clone your fork of the repository: `git clone https://github.com/<username>/amazon-sagemaker-examples` where `<username>` is your github username.
 
 
-### Run the Linters
+### Run the Linter
 
-1. Install tox using `pip install tox`
-1. cd into the amazon-sagemaker-examples folder: `cd amazon-sagemaker-examples` or `cd /environment/amazon-sagemaker-examples`
-1. Run the following tox command and verify that all linters pass: `tox -e black-check,black-nb-check`
-1. If the linters did not pass, run the following tox command to fix the issues: `tox -e black-format,black-nb-format`
+Apply Python code formatting to Jupyter notebook files using [black-nb](https://pypi.org/project/black-nb/).
+
+1. Install black-nb using `pip install black-nb`
+1. Run the following black-nb command on each of your ipynb notebook files and verify that the linter passes: `black-nb -l 100 {path}/{notebook-name}.ipynb`
+1. Some notebook features, such as `%` magic commands or `%%` cell magics, cause black-nb to fail. As long as you run the above command to format as much as possible, that is sufficient, even if the check fails.
 
 
 ### Test Your Notebook End-to-End
@@ -217,6 +218,40 @@ Please remember to:
 * Pay attention to any automated CI failures reported in the pull request, and stay involved in the conversation.
 
 
+## Writing Sequential Notebooks
+
+Most examples are self-contained: only one notebook (.ipynb file) is needed to run the example. However, in a few cases an example may be split across multiple notebooks. These are called sequential notebooks, because the steps of the example are divided among them in order. For an example, see [this series of sequential notebooks that demonstrate how to build a music recommender](https://github.com/aws/amazon-sagemaker-examples/tree/main/end_to_end/music_recommendation).
+
+### When should Sequential Notebooks be used?
+
+You may want to consider using sequential notebooks to write your example if the following conditions apply:
+
+* Your example takes over two hours to execute.
+* You want to cover the different steps of the example in great detail and depth (e.g. one notebook goes into detail about data exploration, the next notebook thoroughly describes the model training process, etc).
+* You want customers to be able to run only part of your example if they wish (e.g. only the training portion).
+
+### What are the guidelines for writing Sequential Notebooks?
+
+If you determine that sequential notebooks are the most suitable format to write your examples, please follow these guidelines:
+
+* *Each notebook in the series must independently run end-to-end so that it can be tested in the daily CI (i.e. the CI test amazon-sagemaker-example-pr must pass).*
+    * This may include generating intermediate artifacts which can be immediately loaded up for use in later notebooks, etc. Depending on the situation, intermediate artifacts can be stored in the following places: 
+        * The repo in the same folder where your notebook is stored: This is possible for very small files (on the order of KB)
+        * The sagemaker-sample-files S3 bucket: This is for larger files (on or above the order of MB).
+* Each notebook must have a 'Background' section clearly stating that the notebook is part of a notebook sequence. It must contain the elements listed below. You can look at the 'Background' section in [Music Recommender Data Exploration](https://github.com/aws/amazon-sagemaker-examples/blob/main/end_to_end/music_recommendation/01_data_exploration.ipynb) for an example.
+    * The objective and/or short summary of the notebook series.
+    * A statement that the notebook is part of a notebook series.
+    * A statement communicating that the customer can choose to run the notebook by itself or as part of the series.
+    * List and link to the other notebooks in the series.
+    * Clearly display where the current notebook fits in relation to the other notebooks (e.g. it is the 3rd notebook in the series).
+    * If you have a README that contains more introductory information about the notebook series as a whole, link to it. For example, it is nice to have an architecture diagram showing how the services interact across different notebooks, and the README is a good place to put such information. For an example of such a README, see this [README.md](https://github.com/aws/amazon-sagemaker-examples/blob/main/end_to_end/music_recommendation/README.md).
+* If you have a lot of introductory material for your series, please put it in a README located in the same directory as your notebook series instead of in an introductory notebook. You can look at this [README.md](https://github.com/aws/amazon-sagemaker-examples/blob/main/end_to_end/music_recommendation/README.md) as an example.
+* When you first use an intermediate artifact in a notebook, add a link to the notebook that is responsible for generating that artifact. That way, customers can easily look up how that artifact was created if they want to.
+* Use links to shorten the length of your notebook and keep it simple and organized. Instead of writing a long passage about how a feature works (e.g. Batch Transform), it is better to link to the documentation for it.
+* Design your notebook series such that the customer can benefit from both the individual notebooks and the whole series. For example, each notebook should have clear takeaway points for the customer (e.g. one notebook teaches data preparation and feature engineering, the next notebook teaches training, etc).
+* Put the sequence order in the notebook file name. For example, the first notebook should start with "1_", the second notebook with "2_", etc.
+
+
 ## Example Notebook Best Practices
 
 Here are some general guidelines to follow when writing example notebooks:
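
The file-name ordering guideline for sequential notebooks added to CONTRIBUTING.md above can be checked mechanically. The following is a minimal sketch, assuming notebooks carry a numeric prefix followed by an underscore (`1_`, `2_`, or zero-padded as in `01_data_exploration.ipynb`). The helper `sequence_gaps` is hypothetical and is not part of the repository or its CI.

```python
import re
from pathlib import PurePath

# Hypothetical helper for illustration only; not part of the repository or its CI.
# Matches a leading run of digits followed by an underscore, e.g. "1_" or "01_".
_PREFIX = re.compile(r"^(\d+)_")


def sequence_gaps(filenames):
    """Return the sequence numbers missing from 1..N, where N is the highest prefix found.

    Files without a numeric prefix (for example README.md) are ignored.
    """
    found = set()
    for name in filenames:
        match = _PREFIX.match(PurePath(name).name)
        if match:
            found.add(int(match.group(1)))
    if not found:
        return []
    return [n for n in range(1, max(found) + 1) if n not in found]
```

An empty result means the series is numbered without gaps; a non-empty result lists the positions that have no notebook.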

diff --git a/LICENSE.txt b/LICENSE.txt
@@ -200,3 +200,21 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
+
+   ======================================================================================
+    Amazon SageMaker Examples Subcomponents:
+
+    The Amazon SageMaker Examples project contains subcomponents with separate
+    copyright notices and license terms. Your use of the source code for
+    these subcomponents is subject to the terms and conditions of the following
+    licenses. See licenses/ for text of these licenses.
+
+    If a folder hierarchy is listed as subcomponent, separate listings of
+    further subcomponents (files or folder hierarchies) part of the hierarchy
+    take precedence.
+
+    =======================================================================================
+    2-clause BSD license
+    =======================================================================================
+    _static/kendrasearchtools.js
+    _templates/search.html
diff --git a/README.md b/README.md
@@ -54,6 +54,7 @@ These examples provide a gentle introduction to machine learning concepts as the
 - [Population Segmentation of US Census Data using PCA and Kmeans](introduction_to_applying_machine_learning/US-census_population_segmentation_PCA_Kmeans) analyzes US census data and reduces dimensionality using PCA then clusters US counties using KMeans to identify segments of similar counties.
 - [Document Embedding using Object2Vec](introduction_to_applying_machine_learning/object2vec_document_embedding) is an example to embed a large collection of documents in a common low-dimensional space, so that the semantic distances between these documents are preserved.
 - [Traffic violations forecasting using DeepAR](introduction_to_applying_machine_learning/deepar_chicago_traffic_violations) is an example that uses daily traffic violation data to predict patterns and seasonality with the Amazon DeepAR algorithm.
+- [Visual Inspection Automation with Pre-trained Amazon SageMaker Models](introduction_to_applying_machine_learning/visual_object_detection) is an example of fine-tuning pre-trained Amazon SageMaker models on a target dataset.
 
 ### SageMaker Automatic Model Tuning
 
@@ -75,6 +76,7 @@ These examples introduce SageMaker Autopilot. Autopilot automatically performs f
 - [Customer Churn AutoML](autopilot/) shows how to use SageMaker Autopilot to automatically train a model for the [Predicting Customer Churn](introduction_to_applying_machine_learning/xgboost_customer_churn) task.
 - [Targeted Direct Marketing AutoML](autopilot/) shows how to use SageMaker Autopilot to automatically train a model.
 - [Housing Prices AutoML](sagemaker-autopilot/housing_prices) shows how to use SageMaker Autopilot for a linear regression problem (predict housing prices).
+- [Portfolio Churn Prediction with Amazon SageMaker Autopilot and Neo4j](autopilot/sagemaker_autopilot_neo4j_portfolio_churn.ipynb) shows how to use SageMaker Autopilot with graph embeddings to predict investment portfolio churn.
 
 ### Introduction to Amazon Algorithms
 
@@ -185,6 +187,7 @@ These examples showcase unique functionality available in Amazon SageMaker. They
 - [Host Multiple Models with SKLearn](advanced_functionality/multi_model_sklearn_home_value) shows how to deploy multiple models to a realtime hosted endpoint using a multi-model enabled SKLearn container.
 - [SageMaker Training and Inference with Script Mode](sagemaker-script-mode) shows how to use custom training and inference scripts, similar to those you would use outside of SageMaker, with SageMaker's prebuilt containers for various frameworks like Scikit-learn, PyTorch, and XGBoost.
 - [Host Models with NVidia Triton Server](sagemaker-triton) shows how to deploy models to a realtime hosted endpoint using [Triton](https://developer.nvidia.com/nvidia-triton-inference-server) as the model inference server.
+- [Heterogeneous Clusters Training in TensorFlow or PyTorch](training/heterogeneous-clusters/README.md) shows how to train using TensorFlow tf.data.service (distributed data pipeline) or PyTorch (with gRPC) on top of Amazon SageMaker heterogeneous clusters to overcome CPU bottlenecks by including different instance types (GPU/CPU) in the same training job.
 
 ### Amazon SageMaker Neo Compilation Jobs
 
@@ -212,6 +215,7 @@ These examples show you how to use [SageMaker Pipelines](https://aws.amazon.com/
 
 - [Amazon Comprehend with SageMaker Pipelines](sagemaker-pipelines/nlp/amazon_comprehend_sagemaker_pipeline) shows how to deploy a custom text classification using Amazon Comprehend and SageMaker Pipelines.
 - [Amazon Forecast with SageMaker Pipelines](sagemaker-pipelines/time_series_forecasting/amazon_forecast_pipeline) shows how you can create a dataset, dataset group and predictor with Amazon Forecast and SageMaker Pipelines.
+- [Multi-model SageMaker Pipeline with Hyperparameter Tuning and Experiments](sagemaker-pipeline-multi-model) shows how you can generate a regression model by training on real estate data from Athena using Data Wrangler, with multiple algorithms from both a custom container and a SageMaker container in a single pipeline.
 
 ### Amazon SageMaker Pre-Built Framework Containers and the Python SDK