Merge OpenAI Triton commit 86a2ac7 #2630

Merged
merged 6 commits into from
Nov 5, 2024


@anmyachev (Contributor) commented Nov 5, 2024

This PR changes the Triton base from 1d5fdfe to 86a2ac7 (Oct 28).
Pass rate: 99.83%->97.41%

Please do not squash and merge this PR.

pawelszczerbuk and others added 6 commits October 28, 2024 08:58
…odegen bug (#4873)" (#4973)

After investigating the differences caused by
triton-lang/triton#4774 in the internal tests,
we concluded that they were introduced by a change in the layouts selected
for the reduce operations. This commit re-introduces that change, as it is
functionally correct and should be beneficial for performance.
This commit adds initial support for scaled_dot with
mxfp8 LHS and fp8 RHS. It supports both mfma32
and mfma16 intrinsic variants.

Right now we are missing software emulation for the
`Float8E4M3FN` type, so this is only enabled for
`Float8E5M2`.
…`interpreter.cc` (#4976)

`#include <atomic>` is already used in other Triton files, so I believe
this is not a drastic change.

Changes come from triton-lang/triton#4045
@anmyachev marked this pull request as ready for review November 5, 2024 14:29
@anmyachev
Contributor Author

@whitneywhtsang I am ending this and stopping this activity for now as agreed with you offline :)

@anmyachev
Contributor Author

@whitneywhtsang ready for review

@whitneywhtsang
Contributor

Is the pass rate degradation solely due to test_scaled_dot? Can we open an issue to fix that?

@anmyachev
Contributor Author

> Is the pass rate degradation solely due to test_scaled_dot? Can we open an issue to fix that?

Yes, simply because the number of parameter combinations has increased. Before this PR, this test also did not work on XPU. Will open.

@anmyachev
Contributor Author

#2633

@whitneywhtsang
Contributor

> > Is the pass rate degradation solely due to test_scaled_dot? Can we open an issue to fix that?
>
> Yes, simply because the number of parameter combinations has increased. Before this PR, this test also did not work on XPU. Will open.

It looks like the number of test cases is unchanged, but this PR marks the failures as skipped instead of xfailed; that is why the pass rate is affected.

@anmyachev
Contributor Author

> > > Is the pass rate degradation solely due to test_scaled_dot? Can we open an issue to fix that?
> >
> > Yes, simply because the number of parameter combinations has increased. Before this PR, this test also did not work on XPU. Will open.
>
> It looks like the number of test cases is unchanged, but this PR marks the failures as skipped instead of xfailed; that is why the pass rate is affected.

Ah, it increased only for AMD, I see.
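
To illustrate the point made in this thread about skipped versus xfailed tests, here is a minimal sketch of how reclassifying the same failing cases can move a pass rate without any change in the number of test cases. The formula is an assumption for illustration only (xfailed cases are left out of the denominator, skipped cases are counted); it is not confirmed to be the formula used by this project's CI. The `Counts` class, the `pass_rate` function, and all numbers are hypothetical and are not the data behind the 99.83% and 97.41% figures.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical per-run test outcome counts."""
    passed: int
    failed: int = 0
    skipped: int = 0
    xfailed: int = 0


def pass_rate(c: Counts) -> float:
    """Pass rate in percent.

    ASSUMPTION: xfailed cases are excluded from the denominator,
    while skipped cases are counted. The real CI script may differ.
    """
    denominator = c.passed + c.failed + c.skipped
    return 100.0 * c.passed / denominator if denominator else 0.0


# Same total number of cases in both runs; only the label changes.
before = Counts(passed=9000, xfailed=250)  # failures marked xfail
after = Counts(passed=9000, skipped=250)   # same failures marked skip

print(f"before: {pass_rate(before):.2f}%")
print(f"after:  {pass_rate(after):.2f}%")
```

Under this assumed formula, the total case count is identical in both runs, yet the reported rate drops once the failing cases are labeled as skipped.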

@anmyachev merged commit 1fc59f6 into main Nov 5, 2024
4 checks passed
@anmyachev deleted the amyachev/merge4 branch November 5, 2024 16:03
4 participants