FP8 + FSDP2 + torch.compile examples for PyTorch Lightning and Fabric #20440
Codecov Report
All modified and coverable lines are covered by tests ✅

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #20440   +/-   ##
=======================================
  Coverage      88%      88%
=======================================
  Files         267      267
  Lines       23203    23274    +71
=======================================
+ Hits        20313    20381    +68
- Misses       2890     2893     +3
Investigating the CI failures in isolation: the same code succeeds when run standalone, but something goes wrong in CI.
Also, updates to the docs at https://lightning.ai/docs/pytorch/stable/advanced/model_parallel/ will come as a follow-up.
I think it would be easier and more robust to refer to the examples directly in the docs rather than copy the code, which may result in accidentally updating only one of them...
That's fair. Will the code examples expand on the page? Also, the docs skip the torchao part, since it's an optional, cutting-edge dependency (it depends on triton nightly), so it only appears in the examples.
OK, CI has been fixed. Merging; the only failing tests are doc links to master (which won't exist until this is merged).
What does this PR do?
This PR adds:
- FP8 + FSDP2 + torch.compile examples for both PyTorch Lightning and Fabric, together with the respective READMEs
- torch.compile from configure_module and parallelize_fn when using ModelParallelStrategy
- torch.compile called from configure_module and parallelize_fn in PyTorch Lightning and Fabric

This PR unearthed several issues with integration tests, which are addressed here to get CI green.
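As a rough illustration of the `parallelize_fn` route mentioned above, here is a minimal sketch of a function that shards a model with FSDP2 and then wraps it in `torch.compile`. It is not the code from this PR: `TinyBlock`, `build_model`, and `parallelize` are hypothetical names, the shard-then-compile ordering is an assumption, and the FP8 conversion is left out because torchao is an optional dependency. The `"data_parallel"` mesh dimension name follows the Lightning model-parallel docs.

```python
import torch
import torch.nn as nn


class TinyBlock(nn.Module):
    """Hypothetical stand-in for a transformer block (not from the PR)."""

    def __init__(self, dim: int = 16) -> None:
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Residual connection around a small feed-forward layer.
        return x + self.ff(x)


def build_model() -> nn.Module:
    """Hypothetical toy model: two blocks in sequence."""
    return nn.Sequential(TinyBlock(), TinyBlock())


def parallelize(model: nn.Module, device_mesh) -> nn.Module:
    """Sketch of a parallelize_fn: FSDP2 sharding followed by torch.compile.

    Assumes it is called by ModelParallelStrategy inside an initialized
    distributed run, with a mesh that has a "data_parallel" dimension.
    """
    # Imported lazily so this file can be imported without a process group.
    try:
        from torch.distributed.fsdp import fully_shard  # newer PyTorch releases
    except ImportError:
        from torch.distributed._composable.fsdp import fully_shard  # older releases

    dp_mesh = device_mesh["data_parallel"]
    # Shard each block first, then the root module.
    for module in model.modules():
        if isinstance(module, TinyBlock):
            fully_shard(module, mesh=dp_mesh)
    fully_shard(model, mesh=dp_mesh)
    # Assumption: compile after sharding; the PR's examples may order this differently.
    return torch.compile(model)
```

Under these assumptions, the function would be handed to Fabric as `ModelParallelStrategy(parallelize_fn=parallelize)` and run on multiple GPUs; in PyTorch Lightning the same body would live in the module's configuration hook instead.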
cc @Borda @awaelchli @justusschock