diff --git a/LICENSE.txt b/LICENSE.txt
index d645695673..6ff2c6fd00 100644
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -200,3 +200,21 @@
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
+
+ ======================================================================================
+ Amazon SageMaker Examples Subcomponents:
+
+ The Amazon SageMaker Examples project contains subcomponents with separate
+ copyright notices and license terms. Your use of the source code for
+ these subcomponents is subject to the terms and conditions of the following
+ licenses. See licenses/ for the text of these licenses.
+
+ If a folder hierarchy is listed as a subcomponent, separate listings of
+ further subcomponents (files or folder hierarchies) that are part of the hierarchy
+ take precedence.
+
+ =======================================================================================
+ 2-clause BSD license
+ =======================================================================================
+ _static/kendrasearchtools.js
+ _templates/search.html
diff --git a/README.md b/README.md
index 14d69da747..97e48030be 100644
--- a/README.md
+++ b/README.md
@@ -54,6 +54,7 @@ These examples provide a gentle introduction to machine learning concepts as the
 - [Population Segmentation of US Census Data using PCA and Kmeans](introduction_to_applying_machine_learning/US-census_population_segmentation_PCA_Kmeans) analyzes US census data and reduces dimensionality using PCA then clusters US counties using KMeans to identify segments of similar counties.
 - [Document Embedding using Object2Vec](introduction_to_applying_machine_learning/object2vec_document_embedding) is an example to embed a large collection of documents in a common low-dimensional space, so that the semantic distances between these documents are preserved.
 - [Traffic violations forecasting using DeepAR](introduction_to_applying_machine_learning/deepar_chicago_traffic_violations) is an example to use daily traffic violation data to predict pattern and seasonality to use Amazon DeepAR alogorithm.
+- [Visual Inspection Automation with Pre-trained Amazon SageMaker Models](introduction_to_applying_machine_learning/visual_object_detection) is an example of fine-tuning pre-trained Amazon SageMaker models on a target dataset.
 
 ### SageMaker Automatic Model Tuning
 
@@ -75,6 +76,8 @@ These examples introduce SageMaker Autopilot. Autopilot automatically performs f
 - [Customer Churn AutoML](autopilot/) shows how to use SageMaker Autopilot to automatically train a model for the [Predicting Customer Churn](introduction_to_applying_machine_learning/xgboost_customer_churn) task.
 - [Targeted Direct Marketing AutoML](autopilot/) shows how to use SageMaker Autopilot to automatically train a model.
 - [Housing Prices AutoML](sagemaker-autopilot/housing_prices) shows how to use SageMaker Autopilot for a linear regression problem (predict housing prices).
+- [Portfolio Churn Prediction with Amazon SageMaker Autopilot and Neo4j](autopilot/sagemaker_autopilot_neo4j_portfolio_churn.ipynb) shows how to use SageMaker Autopilot with graph embeddings to predict investment portfolio churn.
+- [Move Amazon SageMaker Autopilot ML models from experimentation to production using Amazon SageMaker Pipelines](autopilot/sagemaker-autopilot-pipelines) shows how to use SageMaker Autopilot in combination with SageMaker Pipelines for end-to-end AutoML training automation.
 
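The Autopilot entries above all follow the same basic workflow, so a short sketch may help orient first-time readers. The snippet below is an editorial illustration, not part of this change set: the bucket, label column, and candidate count are placeholder assumptions, and the linked notebooks remain the authoritative walkthroughs.

```python
# Minimal sketch (not taken from the repository): launching a SageMaker Autopilot
# job with the SageMaker Python SDK v2. Bucket, label column, and candidate
# count are placeholders; see the linked notebooks for the tested workflows.
import sagemaker
from sagemaker.automl.automl import AutoML

session = sagemaker.Session()
role = sagemaker.get_execution_role()

automl = AutoML(
    role=role,
    target_attribute_name="churn",            # hypothetical label column
    output_path="s3://my-bucket/autopilot/",  # hypothetical output location
    max_candidates=10,
    sagemaker_session=session,
)

# Train on a CSV that includes the label column; Autopilot explores
# preprocessing, algorithms, and hyperparameters automatically.
automl.fit(inputs="s3://my-bucket/train/churn.csv", wait=False)

# With wait=False the call returns immediately; the job can be monitored from
# the SageMaker console or via automl.describe_auto_ml_job(), and the best
# candidate can later be deployed with automl.deploy(...).
```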
 ### Introduction to Amazon Algorithms
 
@@ -149,6 +152,15 @@ These examples provide and introduction to SageMaker Debugger which allows debug
 - [Reacting to CloudWatch Events from Rules to take an action based on status with TensorFlow](sagemaker-debugger/tensorflow_action_on_rule/)
 - [Using SageMaker Debugger with a custom PyTorch container](sagemaker-debugger/pytorch_custom_container/)
 
+### Amazon SageMaker Distributed Training
+
+These examples provide an introduction to the SageMaker Distributed Training Libraries for data parallelism and model parallelism. The libraries are optimized for the SageMaker training environment, help adapt your distributed training jobs to SageMaker, and improve training speed and throughput.
+More examples for models such as BERT and YOLOv5 can be found in [distributed_training/](https://github.com/aws/amazon-sagemaker-examples/tree/main/training/distributed_training).
+
+- [Train GPT-2 with Sharded Data Parallel](https://github.com/aws/amazon-sagemaker-examples/tree/main/training/distributed_training/pytorch/model_parallel/gpt2/smp-train-gpt-simple-sharded-data-parallel.ipynb) shows how to train GPT-2 with near-linear scaling using the sharded data parallelism technique in the SageMaker Model Parallelism Library.
+- [Train EleutherAI GPT-J with Model Parallel](https://github.com/aws/amazon-sagemaker-examples/blob/main/training/distributed_training/pytorch/model_parallel/gpt-j/11_train_gptj_smp_tensor_parallel_notebook.ipynb) shows how to train EleutherAI GPT-J with PyTorch and the tensor parallelism technique in the SageMaker Model Parallelism Library.
+- [Train MaskRCNN with Data Parallel](https://github.com/aws/amazon-sagemaker-examples/blob/main/training/distributed_training/pytorch/data_parallel/maskrcnn/pytorch_smdataparallel_maskrcnn_demo.ipynb) shows how to train MaskRCNN with PyTorch and the SageMaker Data Parallelism Library.
+
 ### Amazon SageMaker Clarify
 
 These examples provide an introduction to SageMaker Clarify which provides machine learning developers with greater visibility into their training data and models so they can identify and limit bias and explain predictions.
@@ -185,6 +197,7 @@ These examples showcase unique functionality available in Amazon SageMaker. They
 - [Host Multiple Models with SKLearn](advanced_functionality/multi_model_sklearn_home_value) shows how to deploy multiple models to a realtime hosted endpoint using a multi-model enabled SKLearn container.
 - [SageMaker Training and Inference with Script Mode](sagemaker-script-mode) shows how to use custom training and inference scripts, similar to those you would use outside of SageMaker, with SageMaker's prebuilt containers for various frameworks like Scikit-learn, PyTorch, and XGBoost.
 - [Host Models with NVidia Triton Server](sagemaker-triton) shows how to deploy models to a realtime hosted endpoint using [Triton](https://developer.nvidia.com/nvidia-triton-inference-server) as the model inference server.
+- [Heterogeneous Clusters Training in TensorFlow or PyTorch](training/heterogeneous-clusters/README.md) shows how to train using TensorFlow tf.data.service (a distributed data pipeline) or PyTorch (with gRPC) on top of Amazon SageMaker Heterogeneous clusters to overcome CPU bottlenecks by including different instance types (GPU/CPU) in the same training job.
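Because several of the new entries above revolve around the SageMaker distributed training libraries, a minimal sketch of how the data parallelism library is switched on may be useful context. It is not taken from the repository: the entry-point script, framework version, and instance settings are placeholder assumptions, and the linked notebooks show the complete, tested configurations.

```python
# Minimal sketch (not taken from the repository): enabling the SageMaker
# distributed data parallel library through the PyTorch estimator's
# `distribution` argument. Script name, versions, and instance settings
# below are placeholders.
import sagemaker
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",            # hypothetical training script
    role=sagemaker.get_execution_role(),
    framework_version="1.12",
    py_version="py38",
    instance_type="ml.p4d.24xlarge",   # data parallelism targets multi-GPU instances
    instance_count=2,
    # This flag turns on the smdistributed data parallel backend; the model
    # parallelism library is configured analogously under
    # "smdistributed": {"modelparallel": {...}}.
    distribution={"smdistributed": {"dataparallel": {"enabled": True}}},
)

estimator.fit({"training": "s3://my-bucket/train/"})  # hypothetical dataset location
```

The `distribution` argument is also where the model parallelism library is enabled, which is broadly the mechanism the GPT-2 and GPT-J notebooks above rely on.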
 
 ### Amazon SageMaker Neo Compilation Jobs
 
@@ -213,6 +226,8 @@ These examples show you how to use [SageMaker Pipelines](https://aws.amazon.com/
 - [Amazon Comprehend with SageMaker Pipelines](sagemaker-pipelines/nlp/amazon_comprehend_sagemaker_pipeline) shows how to deploy a custom text classification using Amazon Comprehend and SageMaker Pipelines.
 - [Amazon Forecast with SageMaker Pipelines](sagemaker-pipelines/time_series_forecasting/amazon_forecast_pipeline) shows how you can create a dataset, dataset group and predictor with Amazon Forecast and SageMaker Pipelines.
 - [Multi-model SageMaker Pipeline with Hyperparamater Tuning and Experiments](sagemaker-pipeline-multi-model) shows how you can generate a regression model by training real estate data from Athena using Data Wrangler, and uses multiple algorithms both from a custom container and a SageMaker container in a single pipeline.
+- [SageMaker Pipeline Local Mode with FrameworkProcessor and BYOC for PyTorch with sagemaker-training-toolkit](sagemaker-pipelines/tabular/local-mode/framework-processor-byoc)
+- [SageMaker Pipeline Step Caching](sagemaker-pipelines/tabular/caching) shows how you can leverage pipeline step caching while building pipelines and shows expected cache hit / cache miss behavior.
 
 ### Amazon SageMaker Pre-Built Framework Containers and the Python SDK
 
diff --git a/_static/kendrasearchtools.js b/_static/kendrasearchtools.js
new file mode 100644
index 0000000000..4920607010
--- /dev/null
+++ b/_static/kendrasearchtools.js
@@ -0,0 +1,700 @@
+/*
+ * kendrasearchtools.js
+ * ~~~~~~~~~~~~~~~~
+ *
+ * A modification of searchtools.js (https://github.com/sphinx-doc/sphinx/blob/275d9/sphinx/themes/basic/static/searchtools.js)
+ * where the default full-text search implemented in searchtools.js is replaced with AWS Kendra searching over multiple
+ * websites. The default full-text search is still kept and implemented as a fallback in the case that the Kendra search doesn't work.
+ *
+ * :copyright: Copyright 2007-2021 by the Sphinx team, see AUTHORS.
+ * :license: BSD, see LICENSE for details.
+ *
+ */
+
+if (!Scorer) {
+  /**
+   * Simple result scoring code.
+   */
+  var Scorer = {
+    // Implement the following function to further tweak the score for each result
+    // The function takes a result array [filename, title, anchor, descr, score]
+    // and returns the new score.
+    /*
+    score: function(result) {
+      return result[4];
+    },
+    */
+
+    // query matches the full name of an object
+    objNameMatch: 11,
+    // or matches in the last dotted part of the object name
+    objPartialMatch: 6,
+    // Additive scores depending on the priority of the object
+    objPrio: {0:  15,   // used to be importantResults
+              1:   5,   // used to be objectResults
+              2:  -5},  // used to be unimportantResults
+    //  Used when the priority is not in the mapping.
+    objPrioDefault: 0,
+
+    // query found in title
+    title: 15,
+    partialTitle: 7,
+    // query found in terms
+    term: 5,
+    partialTerm: 2
+  };
+}
+
+if (!splitQuery) {
+  function splitQuery(query) {
+    return query.split(/\s+/);
+  }
+}
+
+/**
+ * default rtd search (used as fallback)
+ */
+var Search = {
+
+  _index : null,
+  _queued_query : null,
+  _pulse_status : -1,
+
+  htmlToText : function(htmlString) {
+    var virtualDocument = document.implementation.createHTMLDocument('virtual');
+    var htmlElement = $(htmlString, virtualDocument);
+    htmlElement.find('.headerlink').remove();
+    docContent = htmlElement.find('[role=main]')[0];
+    if(docContent === undefined) {
+      console.warn("Content block not found. Sphinx search tries to obtain it " +
+        "via '[role=main]'. Could you check your theme or template.");
+      return "";
+    }
+    return docContent.textContent || docContent.innerText;
+  },
+
+  init : function() {
+    var params = $.getQueryParameters();
+    if (params.q) {
+      var query = params.q[0];
+      $('input[name="q"]')[0].value = query;
+      // this.performSearch(query);
+    }
+  },
+
+  loadIndex : function(url) {
+    $.ajax({type: "GET", url: url, data: null,
+            dataType: "script", cache: true,
+            complete: function(jqxhr, textstatus) {
+              if (textstatus != "success") {
+                document.getElementById("searchindexloader").src = url;
+              }
+            }});
+  },
+
+  setIndex : function(index) {
+    var q;
+    this._index = index;
+    if ((q = this._queued_query) !== null) {
+      this._queued_query = null;
+      Search.query(q);
+    }
+  },
+
+  hasIndex : function() {
+    return this._index !== null;
+  },
+
+  deferQuery : function(query) {
+    this._queued_query = query;
+  },
+
+  stopPulse : function() {
+    this._pulse_status = 0;
+  },
+
+  startPulse : function() {
+    if (this._pulse_status >= 0)
+      return;
+    function pulse() {
+      var i;
+      Search._pulse_status = (Search._pulse_status + 1) % 4;
+      var dotString = '';
+      for (i = 0; i < Search._pulse_status; i++)
+        dotString += '.';
+      Search.dots.text(dotString);
+      if (Search._pulse_status > -1)
+        window.setTimeout(pulse, 500);
+    }
+    pulse();
+  },
+
+  /**
+   * perform a search for something (or wait until index is loaded)
+   */
+  performSearch : function(query) {
+    // create the required interface elements
+    this.out = $('#search-results');
+    this.title = $('#search-results h2:first'); // $('<h2>' + _('Searching') + '</h2>').appendTo(this.out);
+    this.dots = $('#search-results span:first'); //$('<span></span>').appendTo(this.title);
+    this.status = $('#search-results p:first'); // $('<p class="search-summary">&nbsp;</p>').appendTo(this.out);
+    this.output = $('#search-results ul:first'); //$('