[LoopUnroll] Do not pipeline epilog loops generated by loop unrolling #5027
That's concerning. Do you know why?
Still getting to the bottom of it. The problem went away with …
Ah, I see, so potentially a ptxas bug :(
Yeah, I think so. We may file a bug with NVIDIA once we are able to give them a repro. For now I'm disabling pipelining for epilog loops, as it may not be profitable anyway.
…#114573) The resulting epilog loop needs to be accessible from the SCF loop unroller. It would be cleaner and more convenient to get it directly from the loop unroller than to rescan the whole function, as discussed in triton-lang/triton#5027. I'm changing the result type of `loopUnrollByFactor` for that.
The epilog loop created by the loop unroller may not run at all if the main unrolled loop covers all of the original loop's iterations, so pipelining it non-speculatively may not be beneficial. It can also cause correctness issues when combined with the downstream PTXAS optimizer.
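To make the "epilog may not run" point concrete, here is a minimal Python sketch of how unrolling by a factor splits a loop into a main unrolled loop and an epilog loop. It is only an illustration under stated assumptions: the helper names `unroll_by_factor`, `epilog_trip_count`, and `visited_indices` are hypothetical, the step is assumed to be positive, and none of this is the MLIR `loopUnrollByFactor` implementation.

```python
def unroll_by_factor(lower, upper, step, factor):
    """Split range(lower, upper, step) into (main, epilog) loop bounds.

    Hypothetical helper for illustration; assumes step > 0 and factor >= 1.
    Each bound is a (lower, upper, step) triple.
    """
    # Ceiling division gives the original trip count (0 if the range is empty).
    trip_count = max(0, -(-(upper - lower) // step))
    # The main loop only handles a multiple of `factor` iterations.
    main_trips = trip_count - trip_count % factor
    split = lower + main_trips * step
    main = (lower, split, step * factor)  # each trip does `factor` bodies
    epilog = (split, upper, step)         # leftover iterations, possibly none
    return main, epilog


def epilog_trip_count(lower, upper, step, factor):
    """Number of iterations the epilog loop actually executes."""
    _, (lo, hi, st) = unroll_by_factor(lower, upper, step, factor)
    return len(range(lo, hi, st))


def visited_indices(lower, upper, step, factor):
    """Induction values visited by the main loop followed by the epilog."""
    (m_lo, m_hi, m_step), (e_lo, e_hi, e_step) = unroll_by_factor(
        lower, upper, step, factor
    )
    seen = []
    for base in range(m_lo, m_hi, m_step):
        for k in range(factor):  # the unrolled copies of the body
            seen.append(base + k * step)
    for i in range(e_lo, e_hi, e_step):  # the epilog loop
        seen.append(i)
    return seen
```

When the trip count is a multiple of the unroll factor, `epilog_trip_count` is zero, so any prologue work a pipeliner hoists ahead of the epilog loop would be done for nothing. That is the profitability concern described above; the sketch does not model the PTXAS correctness issue.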