Merge OpenAI Triton commit 86a2ac7
#2630
Conversation
…odegen bug (#4873)" (#4973) After investigating the differences caused by triton-lang/triton#4774 in the internal tests, we concluded that they were introduced by a change in the layouts selected for the reduce operations. Re-introducing that change, as it is functionally correct and should be beneficial for performance.
This commit adds initial support for scaled_dot with an mxfp8 LHS and fp8 RHS. It supports both the mfma32 and mfma16 intrinsic variants. Right now we are missing software emulation for the `Float8E4M3FN` type, so this is only enabled for `Float8E5M2`.
…`interpreter.cc` (#4976) `#include <atomic>` is already used in other Triton files, so I believe this is not a drastic change. Changes come from triton-lang/triton#4045
Signed-off-by: Anatoly Myachev <[email protected]>
@whitneywhtsang I am ending this and stopping this activity for now, as agreed with you offline :)
@whitneywhtsang ready for review
Is the pass rate degradation solely due to test_scaled_dot? Can we open an issue to fix that?
Yes, simply because the number of parameter combinations has increased; before this PR this test also did not work on XPU. Will open one.
Looks like the number of test cases is unchanged, but this PR marks the failures as skipped instead of xfailed; that's why the pass rate is affected.
Ah, it increased only for AMD, I see. |
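To make the skip-vs-xfail point above concrete, here is a small sketch of one plausible way a CI dashboard's pass rate shifts when failing tests are marked as skipped instead of xfailed. The formula and all of the test counts below are hypothetical illustrations, not taken from this PR's actual CI; the assumption is that xfailed tests count toward the passing number while skipped tests do not.

```python
def pass_rate(passed, failed, xfailed, skipped, xfail_counts_as_pass=True):
    """Hypothetical pass-rate formula (an assumption, not this CI's real one):
    xfailed tests count as 'good' outcomes, skipped tests do not, and both
    remain in the total."""
    total = passed + failed + xfailed + skipped
    good = passed + (xfailed if xfail_counts_as_pass else 0)
    return 100.0 * good / total

# Hypothetical counts chosen only to illustrate how the same failures,
# reclassified from xfail to skip, lower the reported rate.
before = pass_rate(passed=9700, failed=17, xfailed=283, skipped=0)
after = pass_rate(passed=9700, failed=17, xfailed=41, skipped=242)
print(f"{before:.2f} -> {after:.2f}")
```

Under this assumption, moving a test from xfail to skip removes it from the "good" count without shrinking the total, so the rate drops even though no test outcome actually changed.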
This PR changes the Triton base from 1d5fdfe to 86a2ac7 (Oct 28).
Pass rate: 99.83% -> 97.41%
Please do not squash and merge this PR.