Replace s3_input with s3_data parameter in xgboost_customer_churn #75
Merged
Related: #74. I have not run this notebook, but I ran the related notebook from #74. The output of `help(sagemaker.session.s3_input)` shows that the constructor's first parameter is named `s3_data`:

Help on class s3_input in module sagemaker.session:

class s3_input(builtins.object)
 |  Amazon SageMaker channel configurations for S3 data sources.
 |
 |  Attributes:
 |      config (dict[str, dict]): A SageMaker ``DataSource`` referencing a SageMaker ``S3DataSource``.
 |
 |  Methods defined here:
 |
 |  __init__(self, s3_data, distribution='FullyReplicated', compression=None, content_type=None, record_wrapping=None, s3_data_type='S3Prefix')
 |      Create a definition for input data used by an SageMaker training job.
 |
 |      See AWS documentation on the ``CreateTrainingJob`` API for more details on the parameters.
 |
 |      Args:
 |          s3_data (str): Defines the location of s3 data to train on.
 |          distribution (str): Valid values: 'FullyReplicated', 'ShardedByS3Key'
 |              (default: 'FullyReplicated').
 |          compression (str): Valid values: 'Gzip', 'Bzip2', 'Lzop' (default: None).
 |          content_type (str): MIME type of the input data (default: None).
 |          record_wrapping (str): Valid values: 'RecordIO' (default: None).
 |          s3_data_type (str): Value values: 'S3Prefix', 'ManifestFile'. If 'S3Prefix', ``s3_data`` defines
 |              a prefix of s3 objects to train on. All objects with s3 keys beginning with ``s3_data`` will
 |              be used to train. If 'ManifestFile', then ``s3_data`` defines a single s3 manifest file, listing
 |              each s3 object to train on. The Manifest file format is described in the SageMaker API documentation:
 |              https://aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html
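The PR title suggests the notebook passed the data location under the name `s3_input`, which the signature in the help output above does not accept. The sketch below assumes that reading. `S3InputSketch` and `uris_under_prefix` are hypothetical names, not SageMaker SDK APIs: the class only copies the documented parameter names and defaults, builds a dictionary using the `DataSource`/`S3DataSource` field names from the `CreateTrainingJob` API, and adds value checks for illustration (the help output does not say whether the real class validates its arguments). It is not the SDK's implementation.

```python
"""Sketch of the s3_input parameters documented in the help output.

S3InputSketch and uris_under_prefix are hypothetical illustration names;
they are NOT part of the SageMaker SDK.
"""

VALID_DISTRIBUTIONS = {"FullyReplicated", "ShardedByS3Key"}
VALID_DATA_TYPES = {"S3Prefix", "ManifestFile"}


class S3InputSketch:
    """Same parameter names and defaults as s3_input.__init__ above."""

    def __init__(self, s3_data, distribution="FullyReplicated", compression=None,
                 content_type=None, record_wrapping=None, s3_data_type="S3Prefix"):
        # Illustrative checks against the "Valid values" listed in the docstring.
        if distribution not in VALID_DISTRIBUTIONS:
            raise ValueError(f"unsupported distribution: {distribution!r}")
        if s3_data_type not in VALID_DATA_TYPES:
            raise ValueError(f"unsupported s3_data_type: {s3_data_type!r}")
        # Keys follow the DataSource/S3DataSource shapes of CreateTrainingJob.
        self.config = {
            "DataSource": {
                "S3DataSource": {
                    "S3DataType": s3_data_type,
                    "S3Uri": s3_data,
                    "S3DataDistributionType": distribution,
                }
            }
        }
        # Optional channel fields are added only when a value is supplied.
        if compression is not None:
            self.config["CompressionType"] = compression
        if content_type is not None:
            self.config["ContentType"] = content_type
        if record_wrapping is not None:
            self.config["RecordWrapperType"] = record_wrapping


def uris_under_prefix(uris, s3_prefix):
    """'S3Prefix' semantics: keep every object whose URI starts with the prefix.

    This is plain string matching, so "s3://bucket/train" (no trailing slash)
    also matches "s3://bucket/train2/...".
    """
    return [uri for uri in uris if uri.startswith(s3_prefix)]


if __name__ == "__main__":
    # The parameter is named s3_data, so a keyword named s3_input is rejected.
    try:
        S3InputSketch(s3_input="s3://my-bucket/train/")
    except TypeError as err:
        print("TypeError:", err)

    # Passing the location as s3_data (positionally or by keyword) works.
    train = S3InputSketch(s3_data="s3://my-bucket/train/", content_type="text/csv")
    print(train.config)
```

The prefix helper is included because `S3Prefix` selection is literal string matching on keys, so a missing trailing slash can silently pull in sibling prefixes.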
Ran the notebook from #74 and the job completed:

INFO:sagemaker:Creating training-job with name: xgboost-2017-11-27-19-22-23-984
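The job name in the log line has the shape `<base>-YYYY-MM-DD-HH-MM-SS-mmm` (base name `xgboost`, then a timestamp down to milliseconds). A minimal sketch that reproduces that pattern; `job_name_from_base` is a hypothetical helper, not the SageMaker SDK's own naming code, and the UTC default is an assumption since the log does not state a time zone.

```python
from datetime import datetime, timezone


def job_name_from_base(base, now=None):
    """Hypothetical helper: '<base>-YYYY-MM-DD-HH-MM-SS-mmm'.

    Mirrors the name seen in the log; defaulting to UTC is an assumption.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    millis = now.microsecond // 1000  # truncate microseconds to milliseconds
    return f"{base}-{now:%Y-%m-%d-%H-%M-%S}-{millis:03d}"


if __name__ == "__main__":
    print(job_name_from_base("xgboost", datetime(2017, 11, 27, 19, 22, 23, 984000)))
    # prints xgboost-2017-11-27-19-22-23-984
```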
djarpin approved these changes on Nov 27, 2017
yuanzhua pushed a commit that referenced this pull request on Dec 4, 2019:
* Sagemaker debugger example notebook (#38)
* add files for sagemaker debugger notebook
* updated notebook
* added subfolder
* update files
* Notebook to enable monitoring for existing Endpoints (#43)
* MNIST training with TensorFlow and CloudWatch monitoring of Rule jobs (#49)
* MNIST training with TensorFlow and CloudWatch monitoring of Rule jobs
* Model monitor end to end notebook (#53) This notebook demonstrates the end to end flow of setting up an Endpoint with data capture, creating a baseline and setting up a monitoring schedule.
* Fixing minor issues (#51)
* SMDebugger notebook for plotting tensors in realtime (#39)
* SMDebugger notebook for plotting tensors in realtime This shows how to run training job asynchronously and plot output of first convolutional layer activations and weights.
* Update mxnet-realtime-analysis.ipynb
* Adding TensorFlow MNIST analysis example with a custom gradient rule (#41)
* Adding TensorFlow MNIST analysis example with a custom gradient rule
* Addressing comments
* Apply suggestions from code review
* Correcting errors in the TensorFlow MNIST Custom Rule example (#47)
* Correcting errors in the TensorFlow MNIST Custom Rule example
* Correcting typos
* Tornasole -> Debugger
* Editorial change Co-Authored-By: John Andrilla
* Edit pass for grammar in code review Simple editorial pass for readability.
* Editorial change Co-Authored-By: John Andrilla
* SMDebugger notebook editorial review Edit pass for readability.
* Update to use some new methods in smdebug and pysdk
* First version or README for SM Debugger (#56)
* First version or README for SM Debugger
* Update README.md
* Model monitoring visualization example (#52) Notebook for visualizing the results generated by a monitoring execution.
* Debugger readme brief edit (#60) Terminology udpate
* Adding example of MNIST TensorFlow analysis with Rules and reacting o… (#55)
* Adding example of MNIST TensorFlow analysis with Rules and reacting off Rule statuses with CloudWatch Events
* Correcting Typos and more instructions for Lambda setup
* One more Typo fix
* fixed example (#61)
* fixed example
* Add installing section
* Modified realtime analysis notebook for MEAD/Loosleaf (#58)
* Modified realtime analysis notebook for MEAD/Loosleaf As a result of this notebook will only be runnable after GA as setup section is missing.
* Update mxnet version
* Update install smdebug section
* Notebook demonstrating how to enable spot training with sagemaker debugger (#44)
* Notebook demonstrating the usage of spot training with SageMaker Debugger
* Fixed the notebook to add the checkpoint s3 uri
* Moved the files to right location.
* Updated the notebook for GA
* Added the description regarding how SageMaker works with Spot Training.
* Making some content changes
* Restructure notebooks, and update the rules notebooks (#57)
* Move tensorflow debugging to own folder
* Rename folder
* Rename files add readme
* Move and update custom rule notebook
* Update links to notebooks in README
* Test and update for GA, the rule notebooks for TF
* Brought cloudwatch notebook to using_rules folder, and updated script name and import line, Also added to readme
* Update main readme
* Ishan's comments addressed
* fix train_volume and mxnet version - train_volume_size should be 400GB - mxnet version should be 1.6.0
* fixed kernel name
* fixed kernel to conda_mxnet_p36
* Add SageMaker Debugger XGBoost Rules notebook (#54)
* Add SageMaker Debugger XGBoost Rules notebook
* Rename notebook
* Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla
* XGBoost Rules notebook edit review Simple edit pass for grammar and style.
* Address comments by rahul003 and john-andrilla
* Address comments by john-andrilla
* Add debughookconfig and rules description
* Final test and add one more feature importance plot
* Add description for SHAP
* Rename 1p rules to builtin rules
* Updated to imports
* Add SageMaker Debugger XGBoost realtime analysis notebook (#50)
* Add SageMaker Debugger XGBoost realtime analysis notebook
* Remove pre-GA setup
* Change all Debugger to Amazon SageMaker Debugger
* clean up output of cells
* Add an example notebook for data preprocessing using SageMaker Processing and Spark (#59) Add an example notebook for data preprocessing using SageMaker Processing and Spark
* Add processing sklearn example (#63) Add processing sklearn example
* tf-mnist-custom-rule.ipynb editorial review (#65) Simple edit pass for grammar and style.
* XGBoost realtime analysis text edit (#64)
* Start new review
* Move notebook to a temp file for text review
* Produce text diff
* Copy to another file for edit
* Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla
* XGBoost Rules notebook edit review Simple edit pass for grammar and readability.
* Change temp file with edits back to original
* Changes to catch up with SDK (#66)
* Add keras example, and rename folders to make it clear that they are tensorflow (#67)
* Rename to tensorflow_using_rules
* Add keras example and split notebooks into three different folders
* Rename notebook
* Incorporate editorial changes for custom rule notebook
* Another rename
* Editorial changes
* Address review comments by adding description at top of scripts, and making tf_keras tensorflow_keras
* Update tf_keras_resnet_byoc.py
* Doc update and bug fixes in TF MNIST stop training job example (#70)
* Doc update and bug fixes in TF MNIST stop training job example
* Cleanup
* Updated the text in notebook from Sagemaker-dbugger to Amazon SageMak… (#68)
* Updated the text in notebook from Sagemaker-dbugger to Amazon SageMaker Debugger
* Added try-except block around invoke_rule to catch NoMoreData exception
* Removed the empty cell
* Modified SageMaker Debugger to Amazon SageMaker Debugger
* TF MNIST buildin rule doc fix and bug fix (#69)
* TF MNIST buildin rule doc fix and bug fix
* remove output selectively
* Remove output selectively
* Fix typos (#72)
* Name Change (#73)
* name change
* fix
* Update tf-mnist-builtin-rule.ipynb Remove the in front of Amazon SageMaker
* Update tf-keras-custom-rule.ipynb
* Update links in readme (#74)
* add sagemaker experiment management sample notebook (#40)
* add sagemaker experiment management sample notebook
* Downloading sdks and service2.json from a different s3 uri
* Address comments and update text.
* remove pre-GA cmds
* remove cleanup and rename the directory
* delete old dir
* remove cell output
* Fix intro text
* Removed unnencessary imports and used the right pip command to install smdebug (#77)
* Clear outputs except plots in xgboost debugger notebooks (#75)
* Add regression plots
* Add confusion matrix gif
* Add plots back in
* Use python -m instead of pip and use latest pysdk api
* Remove extra exclamation point
* Fix typos
* SMDebugger Example with BYOC (#45)
* Example with BYOC
* Addressed review comments
* Updates to text of notebook
* Added a rule and hook from config
* Change folder and notebook name
* Remove old notebook and script
* Remove extra files
* Update README
* Dont use import all
* Updated notebook
* Add Deep Graph Library Amazon examples for SageMaker (#48)
* Add gcn_tox21 example
* Add DGL examples for sagemaker Including GCMC, KGE and GCN
* Add KGE hypertune example
* add more details
* Add sample notebook for AP (#62)
* Create sagemaker_automl_direct_marketing.ipynb
* Add files via upload
* Notebook for AP editorial review Simple edit pass for grammar and style.
* Apply suggestions from code review Applied changes from editor review. Co-Authored-By: John Andrilla
* Add files via upload Updated to show the results of batch transform from Tanya
* Delete sagemaker_automl_direct_marketing.ipynb Remove incorrectly named file.
* Add files via upload Removed output
* rebase staging repo with public repo (#78)
* Adding a link to DeepRacer ArXiv paper (#915)
* adding a video that shows deepracer working in various tracks
* adding deepracer paper link to readme
* fixing line breaks
* fixing back slash
* Add example notebooks for AWS Step Functions Data Science SDK (#917)
* Support restoring checkpoint in ray; Add packages in dockerfile (#919)
* Support restoring checkpoint in ray; Add packages in dockerfile
* Minor formatting fix
* Allow user to specify checkpoint in two ways; move notice to launch()
* Remove extra space
* Add dict and list in _autotype()
* Recursively looking for checkpoint path
* Fixed broken links (#897)
* Adding a new sample notebook demonstrating using data from AWS Data Exchange for training a machine learning model" (#930)
* Renamed sample notebook file (#932)
* Adding a new sample notebook demonstrating using data from AWS Data Exchange for training a machine learning model"
* Rename file
* Sagemaker public notebook consistant with the AWS Deepracer console code (#927)
* Sagemaker public notebook consistant with the AWS Deepracer console code 1. Fix Exception being Thrown By 4XX Errors 2. Separate User and System Errors for S3 Calls 3. Fix Out of Memory Issue in Training Worker 4. Reduced the time to upload sim traces to S3 to 1minute befor job exits 5. Remove sending training metrics to cloudwatch as reward graphs uses training metrics from the s3 bucket 6. Upload SIM_TRACE data to s3 bucket of the launched job for training and evaluation 7. Modified mameory backend system errors as inofrmation logs as SIMAPP is not halted by these errors 8. Don't allow the car to move backwards 7. Call os._exit when simapp encounters system or user error leading to exit
* Changing the S3 bucket for the robomaker simapp
* Add sample notebooks for multi-model endpoints functionality (#935)
* Add sample notebooks for multi-model endpoints functionality.
* Add batch RL example notebook (#926)
* adding mxnet embedding serving notebook (#882)
* adding ImagNet embedding notebook from gluon vision zoo
* modified the resnet embedding following PR review
* removed the training API part, updated to mxnet 1.4, enriched the markdown doc.
* Embedding demo fix (#938)
* using a more robust artifact finding mechanism using an environment variable to explicitely call model file by name instead of using os.listdir()
* added the working demo to the fix branch for the PR
* Add three ORL examples (#934)
* Initial commit for ORL examples
* Address comments on bin packing nb; Add news vendor
* Finish three notebooks; Add README
* Delete duplicated noteboks; Clean up cells
* Fix typo & address comments
* Add README for RL directory; Typo fix in network compression README file (#940)
* Add README for RL directory; Typo fix in network compression README file
* Modify README for batch example
* Add readme for dgl_kge example (#79) Add Readme for dgl_kge example
* Add sagemaker experiments LL sample notebook. (#80)
* rebase staging with public repo (#81)
Remove sending training metrics to cloudwatch as reward graphs uses training metrics from the s3 bucket 6. Upload SIM_TRACE data to s3 bucket of the launched job for training and evaluation 7. Modified mameory backend system errors as inofrmation logs as SIMAPP is not halted by these errors 8. Don't allow the car to move backwards 7. Call os._exit when simapp encounters system or user error leading to exit * Changing the S3 bucket for the robomaker simapp * Add sample notebooks for multi-model endpoints functionality (#935) * Add sample notebooks for multi-model endpoints functionality. * Add batch RL example notebook (#926) * adding mxnet embedding serving notebook (#882) * adding ImagNet embedding notebook from gluon vision zoo * modified the resnet embedding following PR review * removed the training API part, updated to mxnet 1.4, enriched the markdown doc. * Embedding demo fix (#938) * using a more robust artifact finding mechanism using an environment variable to explicitely call model file by name instead of using os.listdir() * added the working demo to the fix branch for the PR * Add three ORL examples (#934) * Initial commit for ORL examples * Address comments on bin packing nb; Add news vendor * Finish three notebooks; Add README * Delete duplicated noteboks; Clean up cells * Fix typo & address comments * Add README for RL directory; Typo fix in network compression README file (#940) * Add README for RL directory; Typo fix in network compression README file * Modify README for batch example * Sagemaker debugger example notebook (#38) * add files for sagemaker debugger notebook * updated notebook * added subfolder * update files * Notebook to enable monitoring for existing Endpoints (#43) * MNIST training with TensorFlow and CloudWatch monitoring of Rule jobs (#49) * MNIST training with TensorFlow and CloudWatch monitoring of Rule jobs * Model monitor end to end notebook (#53) This notebook demonstrates the end to end flow of setting up an Endpoint with 
data capture, creating a baseline and setting up a monitoring schedule. * Fixing minor issues (#51) * SMDebugger notebook for plotting tensors in realtime (#39) * SMDebugger notebook for plotting tensors in realtime This shows how to run training job asynchronously and plot output of first convolutional layer activations and weights. * Update mxnet-realtime-analysis.ipynb * Adding TensorFlow MNIST analysis example with a custom gradient rule (#41) * Adding TensorFlow MNIST analysis example with a custom gradient rule * Addressing comments * Apply suggestions from code review * Correcting errors in the TensorFlow MNIST Custom Rule example (#47) * Correcting errors in the TensorFlow MNIST Custom Rule example * Correcting typos * Tornasole -> Debugger * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Edit pass for grammar in code review Simple editorial pass for readability. 
* Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * SMDebugger notebook editorial review Edit pass for readability. * Update to use some new methods in smdebug and pysdk * First version or README for SM Debugger (#56) * First version or README for SM Debugger * Update README.md * Model monitoring visualization example (#52) Notebook for visualizing the results generated by a monitoring execution. * Debugger readme brief edit (#60) Terminology udpate * Adding example of MNIST TensorFlow analysis with Rules and reacting o… (#55) * Adding example of MNIST TensorFlow analysis with Rules and reacting off Rule statuses with CloudWatch Events * Correcting Typos and more instructions for Lambda setup * One more Typo fix * fixed example (#61) * fixed example * Add installing section * Modified realtime analysis notebook for MEAD/Loosleaf (#58) * Modified realtime analysis notebook for MEAD/Loosleaf As a result of this notebook will only be runnable after GA as setup section is missing. * Update mxnet version * Update mxnet version * Update install smdebug section * Notebook demonstrating how to enable spot training with sagemaker debugger (#44) * Notebook demonstrating the usage of spot training with SageMaker Debugger * Fixed the notebook to add the checkpoint s3 uri * Moved the files to right location. 
* Updated the notebook for GA * Added the description regarding how SageMaker works with Spot Training. * Making some content changes * Restructure notebooks, and update the rules notebooks (#57) * Move tensorflow debugging to own folder * Rename folder * Rename files add readme * Move and update custom rule notebook * Update links to notebooks in README * Test and update for GA, the rule notebooks for TF * Brought cloudwatch notebook to using_rules folder, and updated script name and import line, Also added to readme * Update main readme * Ishan's comments addressed * fix train_volume and mxnet version - train_volume_size should be 400GB - mxnet version should be 1.6.0 * fixed kernel name * fixed kernel to conda_mxnet_p36 * Add SageMaker Debugger XGBoost Rules notebook (#54) * Add SageMaker Debugger XGBoost Rules notebook * Rename notebook * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update 
sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_1p_rules/xgboost-regression-debugger-rules.ipynb Co-Authored-By: John Andrilla <[email protected]> * XGBoost Rules notebook edit review Simple edit pass for grammar and style. 
* Address comments by rahul003 and john-andrilla * Address comments by john-andrilla * Add debughookconfig and rules description * Final test and add one more feature importance plot * Add description for SHAP * Rename 1p rules to builtin rules * Updated to imports * Add SageMaker Debugger XGBoost realtime analysis notebook (#50) * Add SageMaker Debugger XGBoost realtime analysis notebook * Remove pre-GA setup * Change all Debugger to Amazon SageMaker Debugger * clean up output of cells * Add an example notebook for data preprocessing using SageMaker Processing and Spark (#59) Add an example notebook for data preprocessing using SageMaker Processing and Spark * Add processing sklearn example (#63) Add processing sklearn example * tf-mnist-custom-rule.ipynb editorial review (#65) Simple edit pass for grammar and style. * XGBoost realtime analysis text edit (#64) * Start new review * Move notebook to a temp file for text review * Produce text diff * Copy to another file for edit * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update 
sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * Update 
sagemaker-debugger/xgboost_realtime_analysis/temp-xgboost-realtime-analysis.ipynb Co-Authored-By: John Andrilla <[email protected]> * XGBoost Rules notebook edit review Simple edit pass for grammar and readability. * Change temp file with edits back to original * Changes to catch up with SDK (#66) * Add keras example, and rename folders to make it clear that they are tensorflow (#67) * Rename to tensorflow_using_rules * Add keras example and split notebooks into three different folders * Rename notebook * Incorporate editorial changes for custom rule notebook * Another rename * Editorial changes * Address review comments by adding description at top of scripts, and making tf_keras tensorflow_keras * Update tf_keras_resnet_byoc.py * Doc update and bug fixes in TF MNIST stop training job example (#70) * Doc update and bug fixes in TF MNIST stop training job example * Cleanup * Updated the text in notebook from Sagemaker-dbugger to Amazon SageMak… (#68) * Updated the text in notebook from Sagemaker-dbugger to Amazon SageMaker Debugger * Added try-except block around invoke_rule to catch NoMoreData exception * Removed the empty cell * Modified SageMaker Debugger to Amazon SageMaker Debugger * TF MNIST buildin rule doc fix and bug fix (#69) * TF MNIST buildin rule doc fix and bug fix * remove output selectively * Remove output selectively * Fix typos (#72) * Name Change (#73) * name change * fix * fix * Update tf-mnist-builtin-rule.ipynb Remove the in front of Amazon SageMaker * Update tf-keras-custom-rule.ipynb * Update links in readme (#74) * add sagemaker experiment management sample notebook (#40) * add sagemaker experiment management sample notebook * Downloading sdks and service2.json from a different s3 uri * Address comments and update text. 
* remove pre-GA cmds * remove cleanup and rename the directory * delete old dir * remove cell output * Fix intro text * Removed unnencessary imports and used the right pip command to install smdebug (#77) * Clear outputs except plots in xgboost debugger notebooks (#75) * Add regression plots * Add confusion matrix gif * Add plots back in * Use python -m instead of pip and use latest pysdk api * Remove extra exclamation point * Fix typos * SMDebugger Example with BYOC (#45) * Example with BYOC * Addressed review comments * Updates to text of notebook * Added a rule and hook from config * Change folder and notebook name * Remove old notebook and script * Remove extra files * Update README * Dont use import all * Updated notebook * Add Deep Graph Library Amazon examples for SageMaker (#48) * Add gcn_tox21 example * Add DGL examples for sagemaker Including GCMC, KGE and GCN * Add KGE hypertune example * add more details * Add sample notebook for AP (#62) * Create sagemaker_automl_direct_marketing.ipynb * Add files via upload * Notebook for AP editorial review Simple edit pass for grammar and style. * Apply suggestions from code review Applied changes from editor review. Co-Authored-By: John Andrilla <[email protected]> * Add files via upload Updated to show the results of batch transform from Tanya * Delete sagemaker_automl_direct_marketing.ipynb Remove incorrectly named file. 
* Add files via upload Removed output * rebase staging repo with public repo (#78) * Adding a link to DeepRacer ArXiv paper (#915) * adding a video that shows deepracer working in various tracks * adding deepracer paper link to readme * fixing line breaks * fixing line breaks * fixing line breaks * fixing back slash * Add example notebooks for AWS Step Functions Data Science SDK (#917) * Support restoring checkpoint in ray; Add packages in dockerfile (#919) * Support restoring checkpoint in ray; Add packages in dockerfile * Minor formatting fix * Allow user to specify checkpoint in two ways; move notice to launch() * Remove extra space * Add dict and list in _autotype() * Recursively looking for checkpoint path * Fixed broken links (#897) * Adding a new sample notebook demonstrating using data from AWS Data Exchange for training a machine learning model" (#930) * Renamed sample notebook file (#932) * Adding a new sample notebook demonstrating using data from AWS Data Exchange for training a machine learning model" * Rename file * Sagemaker public notebook consistant with the AWS Deepracer console code (#927) * Sagemaker public notebook consistant with the AWS Deepracer console code 1. Fix Exception being Thrown By 4XX Errors 2. Separate User and System Errors for S3 Calls 3. Fix Out of Memory Issue in Training Worker 4. Reduced the time to upload sim traces to S3 to 1minute befor job exits 5. Remove sending training metrics to cloudwatch as reward graphs uses training metrics from the s3 bucket 6. Upload SIM_TRACE data to s3 bucket of the launched job for training and evaluation 7. Modified mameory backend system errors as inofrmation logs as SIMAPP is not halted by these errors 8. Don't allow the car to move backwards 7. 
Call os._exit when simapp encounters system or user error leading to exit * Changing the S3 bucket for the robomaker simapp * Add sample notebooks for multi-model endpoints functionality (#935) * Add sample notebooks for multi-model endpoints functionality. * Add batch RL example notebook (#926) * adding mxnet embedding serving notebook (#882) * adding ImagNet embedding notebook from gluon vision zoo * modified the resnet embedding following PR review * removed the training API part, updated to mxnet 1.4, enriched the markdown doc. * Embedding demo fix (#938) * using a more robust artifact finding mechanism using an environment variable to explicitely call model file by name instead of using os.listdir() * added the working demo to the fix branch for the PR * Add three ORL examples (#934) * Initial commit for ORL examples * Address comments on bin packing nb; Add news vendor * Finish three notebooks; Add README * Delete duplicated noteboks; Clean up cells * Fix typo & address comments * Add README for RL directory; Typo fix in network compression README file (#940) * Add README for RL directory; Typo fix in network compression README file * Modify README for batch example * Sagemaker debugger example notebook (#38) * add files for sagemaker debugger notebook * updated notebook * added subfolder * update files * Notebook to enable monitoring for existing Endpoints (#43) * MNIST training with TensorFlow and CloudWatch monitoring of Rule jobs (#49) * MNIST training with TensorFlow and CloudWatch monitoring of Rule jobs * Model monitor end to end notebook (#53) This notebook demonstrates the end to end flow of setting up an Endpoint with data capture, creating a baseline and setting up a monitoring schedule. * Fixing minor issues (#51) * SMDebugger notebook for plotting tensors in realtime (#39) * SMDebugger notebook for plotting tensors in realtime This shows how to run training job asynchronously and plot output of first convolutional layer activations and weights. 
* Update mxnet-realtime-analysis.ipynb * Adding TensorFlow MNIST analysis example with a custom gradient rule (#41) * Adding TensorFlow MNIST analysis example with a custom gradient rule * Addressing comments * Apply suggestions from code review * Correcting errors in the TensorFlow MNIST Custom Rule example (#47) * Correcting errors in the TensorFlow MNIST Custom Rule example * Correcting typos * Tornasole -> Debugger * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Edit pass for grammar in code review Simple editorial pass for readability. 
* Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * Editorial change Co-Authored-By: John Andrilla <[email protected]> * SMDebugger notebook editorial review Edit pass for readability. * Update to use some new methods in smdebug and pysdk * First version or README for SM Debugger (#56) * First version or README for SM Debugger * Update README.md * Model monitoring visualization example (#52) Notebook for visualizing the results g…
atqy pushed a commit to atqy/amazon-sagemaker-examples that referenced this pull request on Aug 16, 2022
test for training_end functions
atqy pushed a commit to atqy/amazon-sagemaker-examples that referenced this pull request on Aug 16, 2022
* rotation policy * fix tests * fix write event call * add comments in code * add a test through hook * fix rotation * some fixes * delete file if empty * enable multi-process test * fix multi-process test * add pt distrib test * Revert "add pt distrib test" This reverts commit a8fc661a02ba29e6fdc49019006b2dafc3cbd67d. * enable write to s3 * address some review comments * address some more review comments * cleanup * some fixes * make timestamp mandatory * filename timestamp matches 1st event * more cleanup and fixes * consolidate classes * timestamp in UTC * address review comments * edit base_start_time * remove delete if empty * default queue size and flush secs * Add timestamp test * add abs and rel timestamp in record * save default values to constants file * Cached the names of parsed files to avoid parsing them everytime. * address review comments * lazy file creation * drop events if file creation fails * rename file to event end ts * correct s3 bucket name * test timestamp with file rotation check if timestamp of all events in a file are lesser than timestamp in file name * remove ref to s3 * remove changes to s3.py * add checks for healthy writer * test file open failure * Cleanup hook * Added the buffer for looking up trace file, removed the get_events_at_time function, updated the implementation of get_events to return the active events * make timestamp mandatory everywhere * fix mxnet test * Corrected the multiplier for microseconds * remove flush_secs * Updating the tests directory with new file format. * Simplify class structure * save base_start_time in record * Updated the test directories to the updated YYYYMMDDHR format * init env variables once * Renamed the function and added function comments * address some review comments * cleanup * Fixed the trace file look for start and end time events * Truncating the trace files and updating the test file. 
* fix pt test * fallback node ID * Removed the functionality to cap the upper_bound_timestamp * Optimize the refreshing the file list based on the last available timestamp in the datasource viz. local or S3 * Correctly named the file suffix. Truncated the horovod timeline file * Added the functionality to download the S3 files in parallel * Addressed the review comments * address review comments * Trace events writer - part 2 (#6) * ensure there's a dir for the new file * add .tmp * handle the case when events are far apart * fix a mistake in cur_hour * updated last_file_close_time to now Co-authored-by: Vikas-kum <[email protected]> * Record step duration in keras hook (#8) * add step duration to keras hook Co-authored-by: Vikas-kum <[email protected]> * test TF step time with timeline writer (#9) * Read node ID from Resource config (#10) * read host ID from resource config * use timeline writer directly (#11) * Added functionality to record node_id in the events (#7) * Added functionality to record node_id in the events * Added the test to verify node id from file * Moved the functions to extract node id and timestamp to utils directory. 
Related: #74
I have not run this notebook, but I ran the related notebook above.
```
Help on class s3_input in module sagemaker.session:

class s3_input(builtins.object)
 |  Amazon SageMaker channel configurations for S3 data sources.
 |
 |  Attributes:
 |      config (dict[str, dict]): A SageMaker ``DataSource`` referencing a SageMaker ``S3DataSource``.
 |
 |  Methods defined here:
 |
 |  __init__(self, s3_data, distribution='FullyReplicated', compression=None, content_type=None, record_wrapping=None, s3_data_type='S3Prefix')
 |      Create a definition for input data used by an SageMaker training job.
 |
 |      See AWS documentation on the ``CreateTrainingJob`` API for more details on the parameters.
 |
 |      Args:
 |          s3_data (str): Defines the location of s3 data to train on.
 |          distribution (str): Valid values: 'FullyReplicated', 'ShardedByS3Key'
 |              (default: 'FullyReplicated').
 |          compression (str): Valid values: 'Gzip', 'Bzip2', 'Lzop' (default: None).
 |          content_type (str): MIME type of the input data (default: None).
 |          record_wrapping (str): Valid values: 'RecordIO' (default: None).
 |          s3_data_type (str): Value values: 'S3Prefix', 'ManifestFile'. If 'S3Prefix', ``s3_data`` defines
 |              a prefix of s3 objects to train on. All objects with s3 keys beginning with ``s3_data`` will
 |              be used to train. If 'ManifestFile', then ``s3_data`` defines a single s3 manifest file, listing
 |              each s3 object to train on. The Manifest file format is described in the SageMaker API documentation:
 |              https://aws.amazon.com/sagemaker/latest/dg/API_S3DataSource.html
 |
```
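To make the parameter name concrete, here is a minimal sketch of the channel configuration the docstring describes. `make_s3_channel` is a hypothetical pure-Python stand-in that mirrors the documented signature; it is not the SDK's implementation and not the call the notebook makes. The dictionary keys are assumed to follow the `CreateTrainingJob` channel shape (`DataSource` wrapping an `S3DataSource`), and `s3://my-bucket/prefix/train` is a placeholder path. The sketch shows that the data location is passed as `s3_data`. An unknown keyword such as `s3_input` is rejected with a `TypeError`, as it would be for any Python function whose parameter has a different name.

```python
from typing import Dict, Optional

# Allowed values as listed in the s3_input docstring above.
VALID_DISTRIBUTIONS = {"FullyReplicated", "ShardedByS3Key"}
VALID_S3_DATA_TYPES = {"S3Prefix", "ManifestFile"}


def make_s3_channel(
    s3_data: str,
    distribution: str = "FullyReplicated",
    compression: Optional[str] = None,
    content_type: Optional[str] = None,
    record_wrapping: Optional[str] = None,
    s3_data_type: str = "S3Prefix",
) -> Dict[str, object]:
    """Hypothetical stand-in for sagemaker.session.s3_input.

    Builds a CreateTrainingJob-style channel dict (assumed key names)
    from the same parameters the docstring lists.
    """
    if distribution not in VALID_DISTRIBUTIONS:
        raise ValueError(f"unsupported distribution: {distribution}")
    if s3_data_type not in VALID_S3_DATA_TYPES:
        raise ValueError(f"unsupported s3_data_type: {s3_data_type}")
    config: Dict[str, object] = {
        "DataSource": {
            "S3DataSource": {
                "S3DataDistributionType": distribution,
                "S3DataType": s3_data_type,
                "S3Uri": s3_data,  # the S3 location goes in s3_data
            }
        }
    }
    # Optional fields are added only when given, matching the None defaults.
    if compression is not None:
        config["CompressionType"] = compression
    if content_type is not None:
        config["ContentType"] = content_type
    if record_wrapping is not None:
        config["RecordWrapperType"] = record_wrapping
    return config


# Pass the training data location by its documented name, s3_data (placeholder path).
train_channel = make_s3_channel(s3_data="s3://my-bucket/prefix/train", content_type="csv")

# A keyword that is not in the signature is rejected by Python itself.
try:
    make_s3_channel(s3_input="s3://my-bucket/prefix/train")
except TypeError as err:
    print(err)
```

Because `s3_data` is the first positional parameter, passing the path positionally would also work; naming it explicitly keeps the call aligned with the docstring.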