Add fp8 (torchao)/fsdp2/torch_compile handlers and tests #20445
What does this PR do?
[Closed the previous PR and opened a new one as an example]
Add FP8 (torchao) / FSDP2 / torch.compile handlers and tests. FP8 has to be bundled with torch.compile on the same dedicated layers to achieve real memory reduction and speedup, so the two are handled together. FSDP2 is a plus, since the FSDP1 path needs somewhat hacky gradient unrolling when combined with torch.compile. A sketch of the intended composition follows.
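The snippet below is a minimal sketch of that composition using torchao's float8 training API, not the handler API this PR adds. `MyTransformer`, its layer layout, and the `lm_head` filter are hypothetical stand-ins; the FSDP2 `fully_shard` import path is version dependent (still private in some torch releases).

```python
import torch
import torch.nn as nn
from torchao.float8 import Float8LinearConfig, convert_to_float8_training
# FSDP2 entry point; import path varies with the torch version.
from torch.distributed._composable.fsdp import fully_shard

# Hypothetical stand-in for a HF-style model with a `model.layers` block list.
class Block(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.ff = nn.Linear(dim, dim)

    def forward(self, x):
        return self.ff(x)

class Inner(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList(Block() for _ in range(4))

class MyTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = Inner()
        self.lm_head = nn.Linear(1024, 32000)

model = MyTransformer()

# Swap eligible nn.Linear layers for Float8Linear; skip the output head,
# whose shape/accuracy profile usually doesn't benefit from fp8.
convert_to_float8_training(
    model,
    config=Float8LinearConfig(enable_fsdp_float8_all_gather=True),
    module_filter_fn=lambda mod, fqn: fqn != "lm_head",
)

# Compile per transformer block so the fp8 cast/scale ops fuse with the
# matmuls -- this pairing is where the real memory/speed win comes from.
for block in model.model.layers:
    block.compile()

# FSDP2: shard each block, then the root module
# (assumes an initialized process group / device mesh).
for block in model.model.layers:
    fully_shard(block)
fully_shard(model)
```

Compiling and sharding per block, rather than over the whole model, is the usual pattern here: it keeps the fp8 cast/scale ops inside each compiled region and lets FSDP2 overlap the fp8 all-gather with compute.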
Note: `precompute_float8_dynamic_scale_for_fsdp` and the syncing of amax/scale history via `sync_float8_amax_and_scale_history` (iterating over e.g. `model.model.layers`) may need custom callbacks or changes to model hooks; an example will be shared in the integration example (see the callback sketch below).
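As a hedged illustration of that custom-callback route, the sketch below assumes a plain Lightning `Callback` and that the converted model is reachable from the `LightningModule`; the hook choice and the `delayed_scaling` flag are assumptions, not this PR's final API.

```python
from lightning.pytorch import Callback
from torchao.float8 import (
    precompute_float8_dynamic_scale_for_fsdp,
    sync_float8_amax_and_scale_history,
)

class Float8ScaleMaintenance(Callback):
    """Keeps torchao float8 scales up to date during FSDP2 training (sketch)."""

    def __init__(self, delayed_scaling: bool = False):
        # Hypothetical switch; recent torchao versions support
        # dynamic scaling only.
        self.delayed_scaling = delayed_scaling

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # Runs after the optimizer step in Lightning's default loop.
        if self.delayed_scaling:
            # Delayed scaling: all-reduce the amax/scale history across ranks.
            sync_float8_amax_and_scale_history(pl_module)
        else:
            # Dynamic scaling + fp8 all-gather: compute next step's scales
            # once per step instead of per parameter.
            precompute_float8_dynamic_scale_for_fsdp(pl_module)
```

Usage would then be along the lines of `Trainer(callbacks=[Float8ScaleMaintenance()])`.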
TODO:
- Add examples with Lightning training (FSDP1/2 + FP8 + torch.compile)
Related PR: #20440
Fixes #<issue_number>
Before submitting
PR review
Anyone in the community is welcome to review the PR.
Before you start reviewing, make sure you have read the review guidelines.
Reviewer checklist
📚 Documentation preview 📚: https://pytorch-lightning--20445.org.readthedocs.build/en/20445/