float8 training: fix bug with AC + compile #1329

Merged 1 commit into main on Nov 22, 2024

Conversation

vkuzo (Contributor) commented Nov 22, 2024

Summary:

In #1306 I accidentally broke torchtitan + float8 + AC + compile.

I don't have a non-torchtitan repro yet, so I'm putting up the fix first to ensure torchtitan works again; we should follow up later by adding test coverage to torchao to prevent similar breakages in the future.

What broke:

* in the forward of `Float8Linear`, we were setting an attribute on the module
* setting module state in `forward` is not supported with compile combined with the specific way torchtitan calls AC (see the sketch after this list)
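
To make the failure mode concrete, here is a minimal sketch of the general pattern, assuming a recent PyTorch with `torch.compile`. The module and attribute names are made up for illustration; this is not the actual `Float8Linear` code and it is not guaranteed to reproduce the exact torchtitan failure.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint


class LinearWithForwardMutation(nn.Module):
    """Illustration only: a module that mutates its own state in forward()."""

    def __init__(self, dim: int):
        super().__init__()
        self.linear = nn.Linear(dim, dim)
        self.saw_float16_input = False  # hypothetical flag, not in torchao

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Setting an attribute on the module inside forward() is the kind of
        # stateful side effect that interacted badly with compile + AC.
        self.saw_float16_input = x.dtype == torch.float16
        return self.linear(x)


model = torch.compile(LinearWithForwardMutation(16))
x = torch.randn(2, 16, requires_grad=True)
# Activation checkpointing recomputes forward() during backward, re-running
# the attribute mutation inside the compiled region.
out = checkpoint(model, x, use_reentrant=False)
out.sum().backward()
```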

The fix: remove this attribute setting altogether. Unfortunately, this breaks an edge case feature that ensured scales are representable in `float16`. Since `float16` training is not commonly used with `float8` and this feature was added during very early testing, removing it for now is fine.

If we need to add this feature back in the future, I'd advocate for doing it via explicit configuration, such as `config.set_scale_upper_bound`, and avoiding stateful hacks, which are usually not compiler friendly. A hypothetical sketch of the config-driven approach follows.
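
None of the names below are the real torchao API; the snippet is only meant to show what an explicit, stateless configuration could look like, and it assumes a PyTorch build that exposes float8 dtypes.

```python
from dataclasses import dataclass
from typing import Optional

import torch


@dataclass(frozen=True)
class MyFloat8Config:
    # Optional upper bound so scales stay representable in float16;
    # None disables the clamp entirely.
    scale_upper_bound: Optional[float] = None


def amax_to_scale_with_clamp(
    amax: torch.Tensor, float8_dtype: torch.dtype, config: MyFloat8Config
) -> torch.Tensor:
    # Standard amax-based scaling: scale = float8_max / amax.
    scale = torch.finfo(float8_dtype).max / torch.clamp(amax, min=1e-12)
    if config.scale_upper_bound is not None:
        # The clamp comes from explicit configuration rather than from an
        # attribute toggled inside forward(), so the function stays pure
        # and compiler friendly.
        scale = torch.clamp(scale, max=config.scale_upper_bound)
    return scale


# Example: keep scales representable in float16.
cfg = MyFloat8Config(scale_upper_bound=torch.finfo(torch.float16).max)
print(amax_to_scale_with_clamp(torch.tensor([1e-8]), torch.float8_e4m3fn, cfg))
```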

Test Plan:

```
# this repo
./test/float8/test_everything.sh

# torchtitan - broken before this PR, works after this PR
with-proxy CONFIG_FILE="./train_configs/llama3_8b.toml" ./run_llama_train.sh --float8.enable_float8_linear --training.compile
```

Reviewers:

Subscribers:

Tasks:

Tags:


pytorch-bot bot commented Nov 22, 2024

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/ao/1329

✅ No failures as of commit a8ccff4 with merge base 7489c7d.

facebook-github-bot added the CLA Signed label Nov 22, 2024
vkuzo added the topic: bug fix label Nov 22, 2024
vkuzo merged commit f3c1a00 into main Nov 22, 2024
22 checks passed
sunjiweiswift pushed a commit to sunjiweiswift/ao that referenced this pull request Nov 25, 2024