Tensorboard not displaying scalars #153
Comments
This issue is duplicated here: aws/sagemaker-python-sdk#26
The notebook example with Tensorboard,
amazon-sagemaker-examples/sagemaker-python-sdk/tensorflow_resnet_cifar10_with_tensorboard/tensorflow_resnet_cifar10_with_tensorboard.ipynb,
is not displaying scalars or images. Only the graph and projector are displayed.

If one run is terminated and a new one is started (using the same base_job_name, so that it starts from the previously saved checkpoint) by running again:

estimator.fit(inputs, run_tensorboard_locally=True)

then the scalars and images of the previous run are displayed on Tensorboard, but they are not updated as training continues.
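One way to narrow this down is to check whether the event files that Tensorboard reads are still being written while training runs. The sketch below is a diagnostic aid only, not part of the notebook or the SageMaker SDK. It assumes you know the local directory Tensorboard was pointed at (the issue does not say where that is), and the helper names find_event_files and stale_event_files are hypothetical. It relies only on the fact that TensorFlow summary writers produce files whose names contain "tfevents".

```python
import os
import time
from typing import List, Optional, Tuple


def find_event_files(logdir: str) -> List[Tuple[str, int, float]]:
    """Return (path, size_in_bytes, mtime) for each event file under logdir."""
    found = []
    for root, _dirs, files in os.walk(logdir):
        for name in files:
            # TensorFlow names summary files like "events.out.tfevents.<time>.<host>".
            if "tfevents" in name:
                path = os.path.join(root, name)
                stat = os.stat(path)
                found.append((path, stat.st_size, stat.st_mtime))
    return sorted(found)


def stale_event_files(
    logdir: str, max_age_seconds: float, now: Optional[float] = None
) -> List[str]:
    """Return event files that have not been modified within max_age_seconds."""
    current = time.time() if now is None else now
    return [
        path
        for path, _size, mtime in find_event_files(logdir)
        if current - mtime > max_age_seconds
    ]
```

Calling stale_event_files(logdir, 300) a few times during training would list event files that have not changed in the last five minutes. If every file is stale while the job is still running, new summaries are probably not reaching the directory Tensorboard watches. If the files keep growing but the dashboard does not change, the problem is more likely on the display side.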