fix: Tensorboard Profilers #240

Merged (1 commit) on Jan 16, 2024

Conversation

MikhailKardash
Contributor

Description

Reinstate the tensorboard profilers after they were removed in a PR.

Checklist

  • Bump VERSION to make sure the pushed images are tagged with the right version.
  • Licenses should be included for new code which was copied and/or modified from any external code.
  • Test the images by running the test bumpenvs procedure in the determined repo. See README.

cla-bot (bot) added the cla-signed label Jan 16, 2024
MikhailKardash marked this pull request as ready for review January 16, 2024 17:44
MikhailKardash merged commit f66cbce into main Jan 16, 2024
2 checks passed
MikhailKardash added a commit that referenced this pull request Jan 31, 2024
* Add tensorboard profilers back into images
rb-determined-ai pushed a commit that referenced this pull request Feb 2, 2024
* Add tensorboard profilers back into images
MikhailKardash deleted the tensorboard_profilers branch March 21, 2024 16:20
soohoonchoi added a commit that referenced this pull request Mar 29, 2024
* fix: Tensorboard Profilers (#240)

* Add tensorboard profilers back into images

* we don't need 3.9 yet

* wrong tag

* build hpc/ngc together and update makefile

* version matrix and update comment

* profiler arg relocation

* address some duplicates

* formatting and libnss

* yaml formatting

* use actual yaml linter

* relocate again

* backport additional-requirements-torch and bump VERSION

* additional-requirements for tf

* bash syntax

* cleanup dockerfiles, remove duplicate publishing steps, correct a dockerfile

* try different syntax

* semicolons

* version pin and revert

* pip

* try python 3.10

* maybe it's a concurrency thing

* no more version pin

* ngc dockerfile cleanup

* bump version file, minor formatting, publish artifacts

* debian frontend google

* google_cloud_cli...

* cloud cli?

* minor cleanup

* version-matrix update and lots of formatting

* unparametrize deepspeed

* oops

* update nvidia drivers to 535.161.07 (#246)

minor version upgrade

* version bump

* feat: NGC+ Image Template (#235)

* add templates for NGC+ images

* add image matrix

* backport lots of improvements to scripts

* remove tf2.8 images

* Removed a duplicate line

* Added WITH_NCCL option to the Dockerfile-ngc-tf

---------

Co-authored-by: Michael Kardash
Co-authored-by: Hamid Zare
soohoonchoi added a commit that referenced this pull request Apr 30, 2024
* fix: Tensorboard Profilers (#240)

* Add tensorboard profilers back into images

* update nvidia drivers to 535.161.07 (#246)

minor version upgrade

* feat: NGC+ Image Template (#235)

* add templates for NGC+ images

* add image matrix

* backport lots of improvements to scripts

* remove tf2.8 images

* fix: dependabot alert for `jupyterlab-3.6.7`. (#241)

* Add support to build the tf2-gpu image for Libfabric(OFI) (#251)

* Add support to build the tf2-gpu image for Libfabric(OFI), which incorporates the AWS libfabric plug-in for NCCL to use on Slingshot 11(SS11) networks.

* Increment the VERSION to 0.31.1

* feat: update ngc version (#253)

* feat: update ngc version

* feat: Update naming (#252)

* renaming a bunch of stuff, removing py3.8 and old cuda, changes to CI job names

---------

Co-authored-by: Michael Kardash
Co-authored-by: Hamid Zare
Co-authored-by: Ilia Glazkov
Co-authored-by: Jerry G