Updating Training Compiler Single Node Multi GPU notebook to use HF-PT 1.11 #3593

Merged
merged 9 commits into from
Sep 12, 2022

Contributor

@Lokiiiiii Lokiiiiii commented Sep 8, 2022

Description of changes:
This PR is part of a set of PRs that update the Training Compiler notebooks to our latest release for PT 1.11.

Testing done:
Local testing completed in SageMaker Notebook:
language-modeling-multi-gpu-single-node.pdf

Merge Checklist

Put an x in the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.

  • I have read the CONTRIBUTING doc and adhered to the example notebook best practices
  • I have updated any necessary documentation, including READMEs
  • I have tested my notebook(s) and ensured it runs end-to-end
  • I have linted my notebook(s) and code using black-nb -l 100 {path}/{notebook-name}.ipynb

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@review-notebook-app

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: ab5389c
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

Powered by github-codebuild-logs, available on the AWS Serverless Application Repository

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: ab5389c
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: ab5389c
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: 0a61dfc
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: 0a61dfc
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: 0a61dfc
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: ef28339
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: ef28339
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: ef28339
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: 8d84c72
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: 8d84c72
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: 8d84c72
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: 8d84c72
  • Result: FAILED
  • Build Logs (available for 30 days)

@Lokiiiiii Lokiiiiii requested a review from mchoi8739 September 8, 2022 21:21
@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: 973a413
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: 973a413
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: 973a413
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: 973a413
  • Result: FAILED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: adb2def
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: adb2def
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: adb2def
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: adb2def
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

atqy
atqy previously approved these changes Sep 9, 2022
@@ -2,15 +2,15 @@
"cells": [
Contributor

@mchoi8739 mchoi8739 Sep 9, 2022

I haven't been informed that native PyTorch support became available. We added the new pytorchxla distribution strategy only for the HuggingFace estimator. That's also how we documented it in the dev guide and in the SageMaker HuggingFace estimator's docstring. We didn't add pytorchxla to the SageMaker PyTorch estimator.

Is this statement true?

Contributor Author

Yes, pytorchxla support is restricted to the HuggingFace estimator with Training Compiler enabled. On an unrelated note, pytorchddp support is currently available only for the PyTorch estimator.

Training Compiler does not support the PyTorch estimator.
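
For illustration, the pairing described in this reply can be sketched as the keyword arguments one might pass to the SageMaker Python SDK's `HuggingFace` estimator. This is a minimal sketch, not the notebook's actual code: the helper name, entry point, instance type, and version strings are assumptions, and the dict is built without importing `sagemaker` so it can be inspected on its own.

```python
# Sketch only: keyword arguments for a HuggingFace estimator that combines
# SageMaker Training Compiler with the pytorchxla distribution strategy.
# The helper name, entry_point, instance_type, and version strings are
# assumed values for illustration; they are not taken from the notebook.


def huggingface_compiler_kwargs(entry_point="run_clm.py", instance_type="ml.p3.16xlarge"):
    """Return estimator kwargs for single-node, multi-GPU compiled training."""
    return {
        "entry_point": entry_point,      # hypothetical training script
        "instance_type": instance_type,  # assumed multi-GPU instance type
        "instance_count": 1,             # single node
        "transformers_version": "4.21",  # assumed HF release paired with PT 1.11
        "pytorch_version": "1.11",
        "py_version": "py38",            # assumed Python version of the container
        # Per the reply above, pytorchxla is accepted only by the HuggingFace estimator.
        "distribution": {"pytorchxla": {"enabled": True}},
    }


kwargs = huggingface_compiler_kwargs()

# With the SageMaker Python SDK installed and an IAM role available,
# these would be passed roughly as:
#   from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig
#   estimator = HuggingFace(role=role, compiler_config=TrainingCompilerConfig(), **kwargs)
```

Keeping the arguments in a plain dict makes it easy to compare a compiled run against a native baseline by changing only `distribution` and dropping `compiler_config`.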

Contributor

Then the text here should be fixed or reverted.

Contributor Author

The text already reflects the use of the PyTorch estimator for native PyTorch and the HuggingFace estimator with Training Compiler. Can you make a suggestion?

Contributor

@mchoi8739 mchoi8739 Sep 10, 2022

I think the statement "We use a PyTorch estimator for native PyTorch ... for SageMaker Training Compiler" is wrong in any case.

Contributor

Even without the pytorchxla distribution strategy, we do not support native PyTorch, as far as I understand.

Contributor

@mchoi8739 mchoi8739 Sep 10, 2022

The SageMaker PyTorch estimator class does not have the argument for SageMaker Training Compiler (SMTC).

Contributor

Currently we support native TF through the SM TensorFlow estimator, and HF-PT and HF-TF through the SM HuggingFace estimator, but not native PyTorch.
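
The support rules stated across this thread can be collected into a small lookup, sketched below. It only encodes what the reviewers wrote here in September 2022 and may not match later SDK releases; the dictionary and function names are hypothetical and are not part of the SageMaker SDK.

```python
# Support rules as stated in this review thread (September 2022).
# All names here are illustrative; none of them come from the SageMaker SDK.

# Frameworks that each estimator can run with SageMaker Training Compiler.
TRAINING_COMPILER_SUPPORT = {
    "TensorFlow": {"native TF"},
    "HuggingFace": {"HF-PT", "HF-TF"},
    "PyTorch": set(),  # native PyTorch is not supported, per the thread
}

# Estimators that accept each distribution strategy.
DISTRIBUTION_SUPPORT = {
    "pytorchxla": {"HuggingFace"},  # requires Training Compiler to be enabled
    "pytorchddp": {"PyTorch"},
}


def supports_training_compiler(estimator):
    """True if the thread says this estimator can use Training Compiler."""
    return bool(TRAINING_COMPILER_SUPPORT.get(estimator))


def distribution_allowed(estimator, strategy):
    """True if the thread says this estimator accepts the strategy."""
    return estimator in DISTRIBUTION_SUPPORT.get(strategy, set())
```

Under these rules, `distribution_allowed("PyTorch", "pytorchxla")` is false, which is the point the reviewer raises about the notebook text.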

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: 67f4f17
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: 67f4f17
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: 67f4f17
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: 67f4f17
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-code-formatting
  • Commit ID: 0a2069a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-grammar
  • Commit ID: 0a2069a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: sagemaker-examples-link-check
  • Commit ID: 0a2069a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@sagemaker-bot
Collaborator

AWS CodeBuild CI Report

  • CodeBuild project: amazon-sagemaker-examples-pr
  • Commit ID: 0a2069a
  • Result: SUCCEEDED
  • Build Logs (available for 30 days)

@atqy atqy merged commit 81ee21f into aws:main Sep 12, 2022
atqy added a commit to atqy/amazon-sagemaker-examples that referenced this pull request Oct 28, 2022
* Initial files to show Triton fil example with Training using RAPIDS a… (aws#3524)

* Initial files to show Triton fil example with Training using RAPIDS and deploying ensemble for inference time using Conda

* Applied review suggestions and corrected spelling, grammar, link references, and code to call proper wait method instead of creating our own

* Fixed URL for when this will be posted to proper repo

* Refined endpoint waiting logic

* Changed wording of informational paragraphs

* Update wait=True to ensure training job completes before tuning job is launched (aws#3538)

* Deep ar forecast comparison notebooks (aws#3533)

* Initial Draft of Forecasting Service Comparison Notebook

* added DeepAR example

* Cleaned up Example

* DeepAR and Forecast Examples

* Added util in response to comments

* Added Notebook Series and Markdown

* Edited Example Files

* Changed README due to comments, modified util files by removing unnecessary functions and commented util files

Co-authored-by: Jiang <[email protected]>

* Added Model Registry Code (aws#3534)

* added model registry code

Added model registry code and updated the model deployment from model registry.

* Black formatting completed

* Black formatting completed. Resolved the comments

Co-authored-by: Mani Khanuja <[email protected]>

* Fix scikit_learn_data_processing_and_model_evaluation.ipynb (aws#3539)

* enable optional steps to avoid error being raised in scikit_learn_data_processing_and_model_evaluation.ipynb

* edit markdown

* reformat

* fix working-with-tfrecords.ipynb (aws#3542)

* fix advanced_functionality/causal-inference/causal-inference-container.ipynb (aws#3544)

* fix advanced_functionality/causal-inference/causal-inference-container.ipynb

* fix login command

* fix login

* fix login

* fix login

Co-authored-by: EC2 Default User <[email protected]>

* fix pipe_bring_your_own.ipynb (aws#3547)

* fix pipe_bring_your_own.ipynb

* login before pushing to docker

* login before pushing to docker

* fix login issues

* fix login issues

* revert login fix code

Co-authored-by: EC2 Default User <[email protected]>

* fix sagemaker-pipelines/time_series_forecasting/amazon_forecast_pipeline/sm_pipeline_with_amazon_forecast.ipynb (aws#3548)

Co-authored-by: EC2 Default User <[email protected]>

* rename FastAPI Example.ipynb (aws#3550)

Co-authored-by: EC2 Default User <[email protected]>

* fix RestRServe Example (aws#3553)

* rename Plumber Example.ipynb (aws#3551)

Co-authored-by: EC2 Default User <[email protected]>

* change: Update callback step notebook as per recent sdk changes and fix existing issues (aws#3516)

Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Julia Kroll <[email protected]>

* Implement Kendra search in RTD website (aws#3537)

* implement unified search in RTD website

* add sagemaker-debugger rtd to unified search

* add licensing information

* add licensing information

* add licensing information

* add licensing information

* Added local mode notebook (aws#3549)

* Added local mode notebook

* Updated local mode notebook

* Updated sklearn version. Added conclusion

* Fixed whitespace issue

Co-authored-by: Julia Kroll <[email protected]>

* Fix 'JSONLines' -> 'JSON Lines' (aws#3554)

Co-authored-by: atqy <[email protected]>

* fix multi_model_catboost.ipynb (aws#3561)

Co-authored-by: EC2 Default User <[email protected]>

* fix scikit_bring_your_own.ipynb (aws#3552)

* fix scikit_bring_your_own.ipynb

* debug

* debug

* debug

* debug

* cleanup

* cleanup

* cleanup

Co-authored-by: EC2 Default User <[email protected]>

* fix tune_r_bring_your_own.ipynb (aws#3562)

* delete r_examples/r_api_serving_examples (aws#3564)

* delete paddlepaddle_sentiment_analysis_byo_mms (aws#3565)

* Fix 'JSONLines' -> 'JSON Lines' (aws#3558)

Co-authored-by: atqy <[email protected]>

* Fix 'JSONLines' -> 'JSON Lines' (aws#3555)

Co-authored-by: atqy <[email protected]>

* Fix 'JSONLines' -> 'JSON Lines' (aws#3556)

Co-authored-by: atqy <[email protected]>

* Update the studio kernal notebook to TF 2.6 (aws#3568)

Changed the studio notebook TF 2.6

Verified the changes by local testing

* update pytorch DLC version to 1.11 in pytorch mnist sample (aws#3574)

* update pytorch DLC version to 1.11

The notebook fails with the current PyTorch 1.8 container. I think it's a problem with the torchvision version installed in the container.

```
AlgorithmError: ExecuteUserScriptError:
Command "/opt/conda/bin/python3.6 mnist.py --backend gloo --epochs 1"
INFO:__main__:Initialized the distributed environment: 'gloo' backend on 2 nodes. Current host rank is 0. Number of gpus: 0
INFO:__main__:Get train data loader
Traceback (most recent call last):
  File "mnist.py", line 257, in <module>
    train(parser.parse_args())
  File "mnist.py", line 114, in train
    train_loader = _get_train_data_loader(args.batch_size, args.data_dir, is_distributed, **kwargs)
  File "mnist.py", line 48, in _get_train_data_loader
    [transforms.ToTensor(), transforms.Normalize((0.1307,), (0.3081,))]
  File "/opt/conda/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 83, in __init__
    ' You can use download=True to download it')
RuntimeError: Dataset not found. You can use download=True to download it, exit code: 1
```

* formatting

* l = 100

* fix rapids_sagemaker_hpo.ipynb (aws#3545)

* fix batch_transform_pca_dbscan_movie_clusters_notebook.ipynb (aws#3566)

* fix batch_transform_pca_dbscan_movie_clusters.ipynb

* lower test sample

* cleanup

* lower test percentage

* lower test percentage

* lower test percentage

Co-authored-by: EC2 Default User <[email protected]>

* add new example notebook to compare sagemaker lightgbm catboost autogluon and tabtransformer with AMT on customer churn dataset (aws#3573)

* add new example notebook to compare sagemaker lightgbm catboost autogluon and tabtransformer with AMT on customer churn dataset

* add new example notebook to compare sagemaker lightgbm catboost autogluon and tabtransformer with AMT on customer churn dataset

* Add SageMaker Autopilot and Neo4j portfolio churn notebook. (aws#3505)

* Add SageMaker Autopilot and Neo4j portfolio churn notebook.

* update table of contents for graph embedding notebook

* correct link

* newline

* note on edgar, s3

* notes on ASG

* url anonymized

* spelling

* use s3

* spelling

* name for link

* comment drop

* formatting

* 20 minutes

* more descriptive va name

* branding issues

* remove extra comment

* note on validation

* conclusion

* no more '

* brackets on URL

* black-nb -l 100 sagemaker_autopilot_neo4j_portfolio_churn.ipynb

* incorporate Julia changes to downloadNotebook function

* performance issue

* working with large notebook

* clear outputs.  run linter one more time

* typo

* render link

* format

* remove link

* insert link

* no dash

* fiddling w link

* maybe it's a bad character escape?

* AutoPilot caps

* camel case SageMaker

* bucket specfics

* Bump version to 4.4.9 from 4.4.8

* add stack name, disk size

* add note per Aramide on stack delete.

* note

* typos

Co-authored-by: Julia Kroll <[email protected]>

* Updated the serialisation function for CSV (aws#3580)

Fixed string formatting issue for inference

* Built-in Algorithm: TensorFlow Image Classification (aws#3579)

* TF IC notebook

* TF IC notebook

* TF IC notebook

Co-authored-by: username <[email protected]>
Co-authored-by: atqy <[email protected]>

* Add RTD Search Filters (aws#3581)

* add filters

* correct search url

* change search textbox

* change search box text

* remove AWS in AWS Dev Guide

* cleanup

* more cleanup

* built-in algorithm - tensorflow image classification: Pull Cloudwatch logs (aws#3590)

Co-authored-by: Vivek Madan <[email protected]>

* Pipeline local mode (aws#3587)

* Add notebook that transitions back to SageMaker managed pipeline after valid local mode pipeline.

* Added comments about how to locate CloudWatch logs for Training step output.

* Added optional lookup of SageMaker Execution Role for local laptop runs.

* Renamed new notebook to name of pre-existing local-mode notebook.

* Re-formatted code cells with black-nb; removed cell output.

* Changed SKLearnProcessor framework version back to 1.0-1

* reformat

Co-authored-by: atqy <[email protected]>
Co-authored-by: atqy <[email protected]>

* Add GPT large inference notebook (aws#3594)

* CLI upgrade

* reformat

* grammatical changes

Co-authored-by: Qingwei Li <[email protected]>
Co-authored-by: atqy <[email protected]>

* Updating Training Compiler Single Node Multi GPU notebook to use HF-PT 1.11  (aws#3593)

* Adding new CV notebook for distributed training with PT 1.11

* Upgrading notebook to demonstrate PT 1.11 capabilities

* Removing stale files

* Renaming notebook

* Retry tests

* Upgrading numpy and pandas installation

* Minor correction in wording

* Boto3 version notebook (aws#3597)

* CLI upgrade

* reformat

* grammatical changes

* boto3 version

* boto3 version-with minor change

* serving.perperties remove empty line

* set env variable for tensor_parallel_degree

* grammatic fix

* black-nb

* grammatical change

* endpoint_name fix

* "By" cap

* minor change

Co-authored-by: Qingwei Li <[email protected]>
Co-authored-by: atqy <[email protected]>
Co-authored-by: atqy <[email protected]>

* Add TensorFlow Triton example (aws#3543)

* Add CatBoost MME BYOC example

* formatted

* Resolving comment # 1 and 2

* Resolving comment # 1 and 2

* Resolving comment # 4

* Resolving clean up comment

* Added comments about CatBoost and usage for MME

* Reformatted the jupyter file

* Added the container with the relevant py files

* Added formatting using Black. Also fixed the comments from the Jupyter file

* Added formatting using Black. Also fixed the comments from the Jupyter file

* Added formatting using Black. Also fixed the comments from the Jupyter file

* Add TensorFlow Triton example

* format TensorFlow Triton example

* Action feedback

* Fix link(s) to be descriptive

* Formatted

* Update delete cell

Co-authored-by: rsgrewal <[email protected]>
Co-authored-by: atqy <[email protected]>

* SageMaker-Debugger PT zcc deprecation (aws#3591)

* Updated CNN class activation example for PT 1.12 ZCC deprecation

* Updated PyTorch MNIST script change example

* updated iterative model pruning examples to PT 1.12

* Updated profiler examples to be nonzcc

* Changed nll_loss to NLLLoss

* Fixed build issues

* Removed vscode metadata from notebooks

* renamed experiments to be model specific

* Add standalone visual object detection notebook. (aws#3586)

* Add standalone visual object detection notebook.

* Debug the upload issue

- previously the CI test failed at uploading .rec to s3.
- use absolute path instead

* Debug code change

* Debug

* Use aws s3 cp to upload data to s3

* Use aws s3 cp to upload data to s3

* Test will small number of training epochs.

* Try to fix the opencv issue by using python3.8

* Try to fix the opencv issue

- remove the 'opencv-python-headless<4.3' restriction

* Downgrade opencv try to resolve the opencv issue.

- ref: https://stackoverflow.com/a/72812857

* Update opencv version trying to resolve the AttributeError issue.

* opendv-python 4.6.0.66 not working, change to 4.5.5.64

* Change to pytorch 1.8 python 3.6 kernel

* Address all comments from the reviewer

- move all behind-the-scene package installation to the beginning of the
  notebook
- polish the README file and address all concerns from the reviewer

* Change to pytorch 1.8 and python 3.6 kernel

* Remove most outputs in the notebook.

Co-authored-by: Tao Sun <[email protected]>

* Add visual object detection notebook to README (aws#3605)

Co-authored-by: atqy <[email protected]>

* Sagemaker DataWrangler Samples addition (aws#3510)

* Create readme.md

* Add files via upload

Joined flow added

* Add files via upload

* Add files via upload

* Add files via upload

* Delete TS-Workshop-Advanced.ipynb

* Delete TS-Workshop-Cleanup.ipynb

* Delete TS-Workshop.ipynb

* Add files via upload

Updated after the CI errors

* Create test.txt

* Add files via upload

* Delete sagemaker-datawrangler/timeseries-dataflow/pictures directory

* Delete timeseries.flow

* Add files via upload

* Add files via upload

* Add files via upload

* Update index.rst

* Add files via upload

Added rst file for joined

* Add files via upload

added tabular index.rst file

* Add files via upload

Uploaded index.rst for time series data

* Delete sagemaker-datawrangler/tabular-dataflow/img directory

Images are now in S3 bucket so deleting this

* Update README.md

updating image links with s3 links

* Update and rename sagemaker-datawrangler/tabular-dataflow/Data-Exploration.md to sagemaker-datawrangler/tabular-dataflow/data-exploration/Data-Exploration.md

updating image link and folder

* Add files via upload

uploading index.rst

* Update and rename sagemaker-datawrangler/tabular-dataflow/Data-Import.md to sagemaker-datawrangler/tabular-dataflow/data-import/Data-Import.md

updated image links

* Add files via upload

index.rst for data import

* Update Data-Transformations.md

* Rename sagemaker-datawrangler/tabular-dataflow/Data-Transformations.md to sagemaker-datawrangler/tabular-dataflow/data-transformations/Data-Transformations.md

* Add files via upload

* Update readme.md

* Delete sagemaker-datawrangler/joined-dataflow/img directory

* Update readme.md

* Delete sagemaker-datawrangler/timeseries-dataflow/img directory

* Update index.rst

* Update index.rst

Updated index.rst to link to other files

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update README.md

referring to /readme.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update README.md

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Add files via upload

* Add files via upload

* Update index.rst

* Create index.rst

* Update index.rst

* Update index.rst

* Add files via upload

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Delete sagemaker-datawrangler/import-flow directory

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

* Update index.rst

added data wrangler to the prep section

* Update index.rst

* Update index.rst

* Add files via upload

Updated per comments from aqyt

* Update explore_data.ipynb

Updated per Amelia comment - present tense

* Update index.rst

Grammer

* Update index.rst

Grammer

* Update index.rst

* Update import-flow.md

Co-authored-by: atqy <[email protected]>
Co-authored-by: Aaron Markham <[email protected]>

* Updated instructions to mention streamings jobs are not supported on GT Console (aws#3608)

Co-authored-by: atqy <[email protected]>

* "docker tag" call improvement (aws#3604)

* CLI upgrade

* reformat

* grammatical changes

* boto3 version

* boto3 version-with minor change

* serving.perperties remove empty line

* set env variable for tensor_parallel_degree

* grammatic fix

* black-nb

* grammatical change

* endpoint_name fix

* "By" cap

* minor change

* docker tag call improvement

Co-authored-by: Qingwei Li <[email protected]>
Co-authored-by: atqy <[email protected]>
Co-authored-by: atqy <[email protected]>
Co-authored-by: Aaron Markham <[email protected]>

* Update SageMaker Training Compiler Example Notebooks for PT1.11 (aws#3592)

* update pytorch_single_gpu_single_node example notebooks

* edit estimator from PyTorch to HuggingFace

* update parameters and fix grammar for roberta-base and bert-base-cased notebook

* update parameters for albert-base-v2 notebook and reformat it

* fix grammar mistake

* fix syntax errors and update albert-base-v2 analysis part

* fix panda and numpy version

* rerun tests

* edit code format

Co-authored-by: Bruce Zhang <[email protected]>
Co-authored-by: Aaron Markham <[email protected]>
Co-authored-by: atqy <[email protected]>

* Add ContainerConfig example comment to ir notebooks (aws#3600)

* Add ContainerConfig example comment to ir notebooks

* adding containerConfig md to rest of the notebooks

* add containerConfig md and handle missing variantName

* rerun pr tests

* rerun pr tests

* rerun pr tests

* rerun pr tests

Co-authored-by: Gary Wang <[email protected]>

* Added Structure for Inferencing examples (aws#3602)

* Inference recommender fix typos (aws#3226)

* Changed FailedReason to FailureReason in JSON query

* Fixed inference typo in failure print statements

* replaced client with inference_client

Co-authored-by: Aaron Markham <[email protected]>

* Adding Heterogeneous Clusters example for TensorFlow and PyTorch (aws#3599)

* initial commit

* notebook fix and misspelling

* add link from root readme.md

* switching cifar-10 to artificial dataset for TF

* adding retries to fit()

* grammer fixes

* remove cifar references

* Removing local tf and pt execution exmaples

* Add security group info for private VPC use case

* Adding index.rst for heterogeneous clusters

* fix PT notebook heading for rst

* fix rst and notebook tables for rst

* Adding programmatic kernel restart

* removing programmatic kernel restart - breaks CI

* Remove tables that don't render in RST

* [Feature]Add Online Explainability notebooks for SageMaker Clarify (aws#3613)

* Add Online Explainability notebooks for SageMaker Clarify

* Correcting text in clean-up sections of online explainability example notebooks

* Updating install commands for captum and sagemaker pypy packages

* debug captum installation

* change instance type

Co-authored-by: Aaron Markham <[email protected]>
Co-authored-by: atqy <[email protected]>
Co-authored-by: atqy <[email protected]>

* updating rst files (aws#3619)

* Added  sentence transformers example with TensorRT and Triton Ensemble (aws#3615)

* Added  sentence transformers example with TensorRT and Triton Ensemble

* Notebook changes to pass CI build

* Grammar fixes and installing torch for CI build

* Installing torch to pass CI build

Co-authored-by: atqy <[email protected]>

* Bump protobuf (aws#3616)

Bumps [protobuf](https://github.com/protocolbuffers/protobuf) from 3.20.1 to 3.20.2.
- [Release notes](https://github.com/protocolbuffers/protobuf/releases)
- [Changelog](https://github.com/protocolbuffers/protobuf/blob/main/generate_changelog.py)
- [Commits](protocolbuffers/protobuf@v3.20.1...v3.20.2)

---
updated-dependencies:
- dependency-name: protobuf
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Aaron Markham <[email protected]>

* Fixing out-of-date readme.md for heterogeneous clusters feature (aws#3617)

* initial commit

* notebook fix and misspelling

* add link from root readme.md

* switching cifar-10 to artificial dataset for TF

* adding retries to fit()

* grammar fixes

* remove cifar references

* Removing local tf and pt execution examples

* Add security group info for private VPC use case

* Adding index.rst for heterogeneous clusters

* fix PT notebook heading for rst

* fix rst and notebook tables for rst

* Adding programmatic kernel restart

* removing programmatic kernel restart - breaks CI

* Remove tables that don't render in RST

* updating out-of-date readme.md

* Fix 'JSONLines' -> 'JSON Lines' (aws#3557)

* Fix 'JSONLines' -> 'JSON Lines'

* Open a subset of ~10k S3 files to reduce runtime

Co-authored-by: Aaron Markham <[email protected]>

* Update SMMP GPT sample (aws#3433)

* update smp

* update smp

* fp16 change

* minor fix

* minor fix

* pin transformer version

* Update SMMP notebooks

* update gpt2 script

* update notebook

* minor fix

* minor fix

* minor fix

* minor fix

* fix

* update gptj script and notebook

* update memory tracker

* minor fix

* fix

* fix gptj notebook

* Update training/distributed_training/pytorch/model_parallel/gpt-j/11_train_gptj_smp_tensor_parallel_notebook.ipynb

Co-authored-by: Miyoung <[email protected]>

* Fix typos & expressions

* reformat

Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Aaron Markham <[email protected]>

* Add Sharded Data Parallel notebook (aws#3622)

* add sdp notebook

* minor fix

Co-authored-by: Miyoung <[email protected]>

* minor fix

Co-authored-by: Miyoung <[email protected]>

* minor fix

Co-authored-by: Miyoung <[email protected]>

* minor fix

Co-authored-by: Miyoung <[email protected]>

* review & add additional references

* revert the title fix

* Update README.md

* run black-nb formatting

* incorporate feedback

* Update training/distributed_training/pytorch/model_parallel/gpt2/smp-train-gpt-simple-sharded-data-parallel.ipynb

Co-authored-by: erinho <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Miyoung Choi <[email protected]>

* JumpStart Tensorflow Object Detection algorithm notebook (aws#3624)

* JumpStart Tensorflow Object Detection algorithm notebook

* JumpStart Amazon Tensorflow notebook

* typo fix

* Update SageMaker Training Compiler MNMG Example Notebook for PT1.11 (aws#3611)

* update mnmg notebook and test file

* edit parameters for estimators

* fix format

* edit by comments and update learning rate

* turn off amp

* change dataset from sst2 to wikitext

* edit package install and add comments for ptxla

* fix comments

* fix grammar

Co-authored-by: BruceZhang@eitug <[email protected]>

* Creating SageMaker Autopilot/Pipelines example. (aws#3627)

* Creating SageMaker Autopilot/Pipelines example.

* Applying black code formatter to notebook.

Co-authored-by: atqy <[email protected]>

* Integrate SageMaker Automatic Model Tuning (HPO) with one XGBoost Abalone notebook. (aws#3623)

* Integrate SageMaker Automatic Model Tuning (HPO) with one XGBoost Abalone notebook.

* Addressed comments for HPO integration.

Co-authored-by: Aaron Markham <[email protected]>

* Launch Feature - SageMaker Multi-model endpoints on GPU (aws#3625)

* added MME with GPU code

* added mme on gpu code

* removed mme on gpu code

* removed outputs from the notebook

* added notebook metadata with gpu instance type

* test

* test

* test

* test

* test

* correct folder spelling

Co-authored-by: atqy <[email protected]>
Co-authored-by: atqy <[email protected]>

* updated autoscaling metrics (aws#3633)

* change the job names to be unified with all the other jobs in JumpStart (aws#3631)

Co-authored-by: atqy <[email protected]>

* [FEATURE] Add SageMaker Pipeline local mode example with BYOC and FrameworkProcessor (aws#3614)

* added framework-processor-local-pipelines

* black-nb on notebook

* updated README.md

* solving problems for commit id fc80e0d

* solved formatting problem in notebook

* reviewed notebook content, added dataset description, download dataset from public sagemaker s3 bucket

* grammar check

* changed dataset to synthetic transactions dataset

* removed reference to dataset origin

* updated to main branch

* fixing grammar and spelling

Co-authored-by: Aaron Markham <[email protected]>

* updated sagemaker triton to v22.09 (aws#3634)

* updated sagemaker triton to v22.09

* black nb format notebook

Co-authored-by: atqy <[email protected]>

* Reverting to v22.07 (aws#3637)

* reverting to v22.07

* fixed formatting issue

* added images to fix format issue

* Pipeline Step Caching Example Notebook (aws#3638)

* feature: pipeline caching notebook example

* change: initialize notebook

* feature: pipeline caching notebook example and tuning notebook adjustment

* fix: example notebook

* change: README

* fix: notebook code

* fix: grammar

* fix: more grammar

* fix: pr syntax and remove dataset

* fix: updated paths

* fix: tuning notebook formatting

* fix: more path corrections

Co-authored-by: Brock Wade <[email protected]>

* change: Pipeline Caching Example Notebook Improvements (aws#3640)

* feature: pipeline caching notebook example

* change: initialize notebook

* feature: pipeline caching notebook example and tuning notebook adjustment

* fix: example notebook

* change: README

* fix: notebook code

* fix: grammar

* fix: more grammar

* fix: pr syntax and remove dataset

* fix: updated paths

* fix: tuning notebook formatting

* fix: more path corrections

* feature: more commentary, notebook improvements

* fix: grammar

* fix: use present tense

Co-authored-by: Brock Wade <[email protected]>

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: James Park <[email protected]>
Co-authored-by: Shreya Pandit <[email protected]>
Co-authored-by: byj-aws <[email protected]>
Co-authored-by: Jiang <[email protected]>
Co-authored-by: rsgrewal-aws <[email protected]>
Co-authored-by: Mani Khanuja <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: EC2 Default User <[email protected]>
Co-authored-by: qidewenwhen <[email protected]>
Co-authored-by: Dewen Qi <[email protected]>
Co-authored-by: Julia Kroll <[email protected]>
Co-authored-by: Kirit Thadaka <[email protected]>
Co-authored-by: Mohan Gandhi <[email protected]>
Co-authored-by: Suraj Kota <[email protected]>
Co-authored-by: Xin Huang <[email protected]>
Co-authored-by: Ben Lackey <[email protected]>
Co-authored-by: duk-amz <[email protected]>
Co-authored-by: khetan2 <[email protected]>
Co-authored-by: username <[email protected]>
Co-authored-by: vivekmadan2 <[email protected]>
Co-authored-by: Vivek Madan <[email protected]>
Co-authored-by: Paul Hargis <[email protected]>
Co-authored-by: Qingwei Li <[email protected]>
Co-authored-by: Qingwei Li <[email protected]>
Co-authored-by: Loki <[email protected]>
Co-authored-by: Marc Karp <[email protected]>
Co-authored-by: rsgrewal <[email protected]>
Co-authored-by: Jihyeong Lee <[email protected]>
Co-authored-by: Tao Sun <[email protected]>
Co-authored-by: Tao Sun <[email protected]>
Co-authored-by: neelamkoshiya <[email protected]>
Co-authored-by: Aaron Markham <[email protected]>
Co-authored-by: Parth Brahmbhatt <[email protected]>
Co-authored-by: Dingheng (Bruce) Zhang <[email protected]>
Co-authored-by: Bruce Zhang <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: Gary Wang <[email protected]>
Co-authored-by: Noah Luna <[email protected]>
Co-authored-by: Gili Nachum <[email protected]>
Co-authored-by: Aman Malhotra <[email protected]>
Co-authored-by: AnushaVelumani <[email protected]>
Co-authored-by: João Moura <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Gili Nachum <[email protected]>
Co-authored-by: haohanchen-yagao <[email protected]>
Co-authored-by: Miyoung <[email protected]>
Co-authored-by: Erin <[email protected]>
Co-authored-by: erinho <[email protected]>
Co-authored-by: Miyoung Choi <[email protected]>
Co-authored-by: Marcelo Aberle <[email protected]>
Co-authored-by: Choucri Bechir <[email protected]>
Co-authored-by: evikram <[email protected]>
Co-authored-by: Bruno Pistone <[email protected]>
Co-authored-by: Brock Wade <[email protected]>
Co-authored-by: Brock Wade <[email protected]>
atqy pushed a commit to atqy/amazon-sagemaker-examples that referenced this pull request Oct 28, 2022
…T 1.11 (aws#3593)

* Adding new CV notebook for distributed training with PT 1.11

* Upgrading notebook to demonstrate PT 1.11 capabilities

* Removing stale files

* Renaming notebook

* Retry tests

* Upgrading numpy and pandas installation

* Minor correction in wording