ResNet CIFAR 10 generates scalar data faster #154

Merged (1 commit) on Dec 25, 2017



@mvsusp mvsusp commented Dec 22, 2017

This change is explained in detail in aws/sagemaker-python-sdk#26.

@mvsusp mvsusp requested review from djarpin, JunLyu and owen-t and removed request for JunLyu December 22, 2017 17:45
@@ -137,7 +138,8 @@
 "\n",
 "It takes a few minutes to provision containers and start the training job.**TensorBoard** will start to display metrics shortly after that.\n",
 "\n",
-"You can access **TensorBoard** locally at [http://localhost:6006](http://localhost:6006) or using your SageMaker notebook instance [proxy/6006/](/proxy/6006/)(TensorBoard will not work if forget to put the slash, '/', in end of the url). If TensorBoard started on a different port, adjust these URLs to match."
+"You can access **TensorBoard** locally at [http://localhost:6006](http://localhost:6006) or using your SageMaker notebook instance [proxy/6006/](/proxy/6006/)(TensorBoard will not work if forget to put the slash, '/', in end of the url). If TensorBoard started on a different port, adjust these URLs to match.",
+"This example uses the optional hyperparameter **```min_eval_frequency```** to generate training evaluations more often, allowing to visualize **TensorBoard** scalar data faster. You can find the available optional hyperparameters [here](https://github.com/aws/sagemaker-python-sdk#optional-hyperparameters)**."
Are the trailing "**" at the end of line 142 intentional? It's fine either way, but thought it might just be imbalanced markdown bolding.
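
For reference, one quick way to confirm the imbalance is to count the `**` markers in the added notebook line. This is a minimal sketch, not part of the PR: the helper name `has_unbalanced_bold` is hypothetical, and the check is a rough heuristic that assumes every `**` is meant as a bold delimiter (it ignores escaped asterisks, asterisks inside code spans, and `***` bold-italic runs).

```python
def has_unbalanced_bold(text: str) -> bool:
    """Return True if `text` contains an odd number of `**` markers.

    Heuristic only: every non-overlapping `**` is treated as a bold delimiter.
    """
    return text.count("**") % 2 == 1


# The line added by this PR, copied from the diff above.
added_line = (
    "This example uses the optional hyperparameter **```min_eval_frequency```** "
    "to generate training evaluations more often, allowing to visualize "
    "**TensorBoard** scalar data faster. You can find the available optional "
    "hyperparameters "
    "[here](https://github.com/aws/sagemaker-python-sdk#optional-hyperparameters)**."
)

print(has_unbalanced_bold(added_line))  # prints True: five `**` markers
```

The count is five because two pairs wrap `min_eval_frequency` and `TensorBoard`, and the fifth marker is the trailing one after the link.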

@djarpin djarpin merged commit 0636049 into master Dec 25, 2017
@djarpin djarpin deleted the mvs-scalar-data branch August 17, 2018 19:57
atqy pushed a commit to atqy/amazon-sagemaker-examples that referenced this pull request Aug 16, 2022
* Make outdir optional arg, use default path in sagemaker environment, also change temp location when writing local files

* remove is_s3 import

* add tests and fix case when / is at the front of filepath

* add comments

* change to .tmp suffix

* update testing script to take a tag
atqy pushed a commit to atqy/amazon-sagemaker-examples that referenced this pull request Aug 16, 2022
* Add custom rule

* updated notebook

* Add rule script

* Expanding rule monitoring section and improving BYOR notebook (aws#180)

* Adding sagemaker example notebook

* Removing unused training script

* Tornasole hook from config json (aws#104)

* creating tornasole hook from config

* making a quick variance fix (aws#99)

* Adding the change to convert ndarray to np.ndarray when operator is not available in mxnet.

* Cleanup and tests for TF and mxnet

* remove rmtree from s3 test

* Fixed the function invocation of get_numpy_reduction

* Changes to read from hardcoded path

* fixing pytorch test

* Setting SaveConfig per mode (aws#94)

* add doc for passing saveconfig specific to modes

* add save config for collection

* Create an option to build tornasole with no framework, TORNASOLE_FOR_RULES=1 (aws#95)

* add option to build only for rules

* Adding support to set save config per mode through json, also copying load collection method to all frameworks as that was missed

* remove set -ex from tests script since it prevents upload of reports

* move json config out of hooks

* Adding tests to create hook from tornasole configs for pytorch

* Change link of latest tornasole binaries (aws#120)

* change link to binary and introduce latest

* make container scripts working again

* remove -U

* fix path to ts binary in docker

* log when single process is to stdout

* Addressed the review comments. Added the correct asserts to check the reduction values. Added the test to test the training mode.

* Setup versioning (aws#119)

* added _verion.py and support

* fixed __init__.py

* Improve PR template (aws#128)

* Setup versioning (aws#134)

* added _verion.py and support

* fixed __init__.py

* using PEP 440 standard versioning it.

* Json Config Hook Tests (aws#129)

* added json config hook tests

* Add LossNotDecreasing rule and change how required tensors API works (aws#126)

* add loss rule and tests. refactoring rules api.

* Adding mxnet tests for hook_from_json (aws#143)

* Adding config file for reduce and save_all test scripts

* Fixing bug in mxnet reduction util
solved issue aws#142

* Update build script for PT container

- modified S3 path to pick up from PT folder
- added parameter to enable installation of sagemaker_pytorch_container.whl into image

* mode writer support (aws#144)

* Add sagemaker docs and notebooks (aws#133)

* Changing link of latest binaries for 0.3 (aws#122)

* change link to binary and introduce latest

* make container scripts working again

* remove -U

* fix path to ts binary in docker

* log when single process is to stdout

* uploaded sagemaker docs

update analysis docs

remove sagemaker docs

update TF doc

add sagemaker docs

update api docs

change link for rules binary

add files from s3 bucket

* refactor positions

* minor changes

* fix links in old examples

* fix paths in integration tests

* Update test_training_end.py

* Update test_training_end.py

* Update integration_testing_rules.py

* bring back examples section in analysis readme

* create sagemaker-notebooks directory

* fix links

* remove accidental include of key

* update links, and update dev guide rules after changes in alpha

* Add new regions for container images (aws#147)

* update regions

* add check for tag

* add regions

* Make required tensors optional (aws#148)

* make required tensors optional

* Update README.md

* add a directory to clean in build binaries script

* Updating the notebooks to include good and bad examples.

* Update scripts to build containers (aws#153)

* Update scripts to build containers

add a directory to clean in build binaries script

add policy

working container scripts for TF now added along with other frameworks

fix binary in container script

* Add script to tag as latest

* Sagemaker TF notebook (aws#145)

* Changing link of latest binaries for 0.3 (aws#122)

* change link to binary and introduce latest

* make container scripts working again

* remove -U

* fix path to ts binary in docker

* log when single process is to stdout

* uploaded sagemaker docs

update analysis docs

remove sagemaker docs

update TF doc

add sagemaker docs

update api docs

change link for rules binary

add files from s3 bucket

* refactor positions

* minor changes

* fix links in old examples

* fix paths in integration tests

* Update test_training_end.py

* Update test_training_end.py

* Update integration_testing_rules.py

* bring back examples section in analysis readme

* create sagemaker-notebooks directory

* fix links

* updated notebook for tf

* fix name of rule

* Delete README.md

* remove rules scripts

* Update tensorflow-simple.ipynb

* Update tensorflow-simple.ipynb

* add pytorch notebook from s3 (aws#156)

* Changes for temp location and out_dir with Sagemaker in mind (aws#154)

* Make outdir optional arg, use default path in sagemaker environment, also change temp location when writing local files

* remove is_s3 import

* add tests and fix case when / is at the front of filepath

* add comments

* change to .tmp suffix

* update testing script to take a tag

* Updated the uploader script to include pytorch scripts

* Updating the paths to the examples in the notebooks.

* Removed unnecessary copy

* resolving warning mesg of loading yaml (aws#149)

* Fix out dir bug (aws#160)

* fix out dir bug

* print mode.name instead of mode

* print mode.name instead of mode

* print mode.name instead of mode

* parallelize builds for pytorch and mxnet (aws#162)

* TF notebook (aws#163)

* Changing link of latest binaries for 0.3 (aws#122)

* change link to binary and introduce latest

* make container scripts working again

* remove -U

* fix path to ts binary in docker

* log when single process is to stdout

* uploaded sagemaker docs

update analysis docs

remove sagemaker docs

update TF doc

add sagemaker docs

update api docs

change link for rules binary

add files from s3 bucket

* refactor positions

* minor changes

* fix links in old examples

* fix paths in integration tests

* Update test_training_end.py

* Update test_training_end.py

* Update integration_testing_rules.py

* bring back examples section in analysis readme

* create sagemaker-notebooks directory

* fix links

* updated notebook for tf

* fix name of rule

* Delete README.md

* remove rules scripts

* Update tensorflow-simple.ipynb

* Update tensorflow-simple.ipynb

* add sagemaker args

* add model dir to resnet

* remove action style args in script and reindent

* update resnet example

* make num epochs take priority over num_batches

* change name of tf notebook

* Add updated sagemaker tf notebook

* change scripts to include all scripts in tf examples

* change names of estimators

* update files

* Updating the mxnet notebook

* Updating the mxnet notebook.

* Updated notebook as per review.

* Update mxnet.ipynb

* Update mxnet.ipynb

* Fixed the type of container from TensorFlow to MXNet.

* Pytorch Notebook Updates (aws#170)

* pytorch notebook

* Update pytorch.ipynb

* Update pytorch.ipynb

* Pytorch (aws#171)

* pytorch notebook

* Update pytorch.ipynb

* Update pytorch.ipynb

* Heading fix

* Expanding rule section and modifying BYOR

* make tf notebook same as alpha

* undo changes for rules, as that's now going into a different PR

* Revert "Expanding rule monitoring section and improving BYOR notebook (aws#180)"

This reverts commit 7f7c17c0f73b95f614859fa9ed05b29e50166eec.

* Add first party rules file

* update cloudwatch section