-
Notifications
You must be signed in to change notification settings - Fork 357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Framework splitting #9318
Conversation
✅ Deploy Preview for determined-ui canceled.
|
7f220e2
to
89cd926
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #9318 +/- ##
=======================================
Coverage 45.28% 45.29%
=======================================
Files 1227 1227
Lines 154048 154051 +3
Branches 2404 2404
=======================================
+ Hits 69766 69776 +10
+ Misses 84090 84083 -7
Partials 192 192
Flags with carried forward coverage won't be shown. Click here to find out more.
|
815074b
to
a55c9cb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CI config changes seemed reasonable to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggested edits
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am new to determined and new to this code set, but the changes look good to me! I made one minor comment about an import within a test method instead of at the module level. Feel free to take it or leave it :). Thanks for the changes!
a55c9cb
to
f940f44
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Confirming that experiment image gets updated
"cpu": "determinedai/pytorch-tensorflow-cpu-dev:8b3bea3", | ||
"cuda": "determinedai/pytorch-tensorflow-cuda-dev:8b3bea3", | ||
"cpu": "determinedai/tensorflow-ngc-dev:8b3bea3", | ||
"cuda": "determinedai/tensorflow-ngc-dev:8b3bea3\"", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo?
do we install tensorboard in the pytorch-only images? IIRC, launching a tensorboard from an experiment will default to the image defined in an experiment. another thing to test would be if you launch a tensorboard with multiple experiments, where one has pytorch-only and another is tf-only. after this change, what image do we expect to be used to launch NTSC? |
Yes we do, that's done in environments: https://github.com/determined-ai/environments/blob/8b3bea3b81bd3934ad885bdf159320f11f6ea1ba/Dockerfile-pytorch-ngc#L21
I'll verify this and then add it to testing instructions.
After this change, we expect to launch in |
We have this page in our docs: https://docs.determined.ai/latest/model-dev-guide/prepare-container/set-environment-images.html#set-environment-images |
seems like that environments image only installs the profilers? where does actual also, if we only install |
They do all load in tensorboard right now. See the following image, where trial 3 is a pytorch mnist trial while trials 4-6 are iris tfkeras trials. Note that the tensorboard inherits the image used in the trial it is associated with. I tried with |
7c952f7
to
3d3503e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
* make ngc-pytorch images default * fix a profiler bug * update docs for ngc image changes
This reverts commit a96cafd.
Ticket
MD-272
Description
Makes pytorch-ngc our default image. Sets tensorflow-ngc in tests when required.
Moves legacy PyTorchTrial launcher from
horovod
totorch.distributed.launch
.Lots of docs updates.
Update master restart test to include a configurable timeout. In this case, we have to pull the pytorch image, so we timeout. This will be fixed in a later
environments
PR, but this has to be merged first.Also splits our tests a bit better at the directory level to make a clearer distinction between PyTorch and TensorFlow.
Test Plan
Unit tests pass, experiments default to pytorch-ngc images.
Verify that Tensorboard launches from ngc-based image experiments.
Checklist
docs/release-notes/
.See Release Note for details.