warp_group_dot lowering crashes for specific instruction shape #5102
Comments
I can confirm that the crash goes away if I bring back this line: cfddb09#diff-c05cf3aed297bf0c5f1296cc40c522b00fb300c7a4340a1f6be5b0bbe2c42039L2048. I have no idea at the moment whether what we then end up producing is correct at all, but that does seem to strongly imply that #5009 is the culprit.
It shouldn't be possible for f16 MMA to have instrShape K=32, since instrShapeK is calculated as As long as the above relationships are true, the current logic in But code quality wise I think it makes sense to have
Thanks for looking into this so quickly! Yes, the TTGIR came from an even worse example where the result layout doesn't match the LHS's parent layout. There's a slack thread where I asked about this. Somehow Triton got into an edge case when trying to optimize an internal implementation of flash attention we have. The first step here would be to implement a better verifier for warp_group_dot / dot_op layout, which would make it easier to root cause the offending optimization pass, and then we can fix it. I was planning to try getting some time to do that in the next couple of weeks (though I'm also happy to get help if anyone else has spare cycles).
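To make the verifier idea concrete, here is a minimal Python sketch of the kind of checks discussed above. It is not Triton's actual verifier: `MmaLayout`, `DotOperandLayout`, `expected_instr_k`, and `verify_warp_group_dot` are hypothetical names, and the rule that instrShape K equals 256 bits divided by the element bit width is an assumption based on the WGMMA instruction shapes (K=16 for f16/bf16, K=32 for 8-bit types).

```python
from dataclasses import dataclass
from typing import List, Tuple

# Element bit widths for a few types (illustrative subset, not exhaustive).
ELEM_BITS = {"f16": 16, "bf16": 16, "f8E4M3FN": 8, "f8E5M2": 8, "i8": 8}


@dataclass(frozen=True)
class MmaLayout:
    """Hypothetical stand-in for an MMA (version 3) result layout."""
    version_major: int
    instr_shape: Tuple[int, int, int]  # (M, N, K)


@dataclass(frozen=True)
class DotOperandLayout:
    """Hypothetical stand-in for a dot_op operand layout."""
    op_idx: int
    parent: MmaLayout
    k_width: int


def expected_instr_k(elem_bits: int) -> int:
    # Assumption: one warp-group MMA instruction covers 256 bits along K.
    return 256 // elem_bits


def verify_warp_group_dot(lhs: DotOperandLayout,
                          result: MmaLayout,
                          elem_type: str) -> List[str]:
    """Return a list of human-readable layout errors (empty if none)."""
    errors = []
    # The LHS operand's parent layout should be the result layout.
    if lhs.parent != result:
        errors.append("LHS parent layout does not match result layout")
    # instrShape K should be consistent with the operand element type.
    k = result.instr_shape[2]
    want = expected_instr_k(ELEM_BITS[elem_type])
    if k != want:
        errors.append(f"instrShape K={k} but {elem_type} expects K={want}")
    # kWidth has to be non-zero on recent Triton versions.
    if lhs.k_width <= 0:
        errors.append("kWidth must be non-zero")
    return errors


if __name__ == "__main__":
    bad = MmaLayout(3, (16, 64, 32))
    for err in verify_warp_group_dot(DotOperandLayout(0, bad, 2), bad, "f16"):
        print(err)
```

Under these assumptions, the failing `instrShape = [16, 64, 32]` with f16 operands would be rejected up front with a readable message, while `[16, 64, 16]` passes.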
Fixes #5102 The logic in `getTotalElemsPerThreadForOperand` should now directly match that in `SharedToDotOperandMMAv2OrV3`
…n-lang#5105) Fixes triton-lang#5102 The logic in `getTotalElemsPerThreadForOperand` should now directly match that in `SharedToDotOperandMMAv2OrV3`
Somewhere between 68aa962 and 8aedb5e the following IR started to crash:
(Note that on earlier Triton versions we need to replace `kWidth = 2` with `kWidth = 0` in the example, since this parameter changed from having to be 0 to having to be non-0.)
It seems to be related to the instruction shape, since it works with `instrShape = [16, 64, 16]`, but fails with `instrShape = [16, 64, 32]`.
This is the stack trace:
My hunch is that it could be related to #5009. CC @ggengnv