Revert edf18ce and fix launch dimension triplet #19582
base: main
Conversation
…than the hardware limit per dim." This reverts commit edf18ce.
Launch dimensions should be of the form ((block.x, 1, 1), (thread.x, thread.y, 1)) to accommodate the checks in [parallel_loop_emitter.cc](https://github.com/openxla/xla/blob/main/xla/service/gpu/parallel_loop_emitter.cc#L169-L171).
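For reference, here is a minimal sketch (not the actual XLA code) of the invariant the linked check enforces; the types and names are illustrative only:

```cpp
#include <cstdint>

// Illustrative stand-in for XLA's launch-dimension triplets.
struct Dim3 { uint64_t x, y, z; };

// The parallel loop emitter linearizes indices assuming a 1-D grid, so the
// only supported shape is ((block.x, 1, 1), (thread.x, thread.y, 1)).
bool IsSupportedLaunchShape(const Dim3& blocks, const Dim3& threads) {
  return blocks.y == 1 && blocks.z == 1 && threads.z == 1;
}
```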
071c2ba: Revert edf18ce and fix launch dimension triplet
@olegshyshkov This patch tries to address some of the violations introduced in edf18ce.
I would prefer that we don't revert edf18ce. parallel_loop_emitter.cc is a very old part of the emitter that is used only for a handful of special instructions, so I wouldn't use it as the ground truth. That change was aimed at Nvidia GPUs, but I understand that ROCm has different requirements. I think the solution here is to have backend-specific logic for the check and for distributing blocks.
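A rough sketch of what such backend-specific logic could look like; `GpuBackend` and the helper below are assumptions for illustration, not the actual XLA API. The CUDA grid x-limit of 2^31 - 1 comes from vendor documentation, and the 32-bit ROCm product limit from this discussion:

```cpp
#include <cstdint>

enum class GpuBackend { kNvidia, kRocm };

// Hypothetical helper: the largest block.x a backend can accept for a given
// block size. ROCm must keep block.x * thread.x within 32 bits; on Nvidia
// the grid x-dimension itself is capped at 2^31 - 1.
uint64_t MaxBlocksX(GpuBackend backend, uint64_t threads_per_block) {
  constexpr uint64_t kRocmProductLimit = 0xFFFFFFFFull;
  constexpr uint64_t kCudaGridXLimit = 2147483647ull;  // 2^31 - 1
  if (backend == GpuBackend::kRocm) {
    return kRocmProductLimit / threads_per_block;
  }
  return kCudaGridXLimit;
}
```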
@olegshyshkov I encountered a failure in the jax maxtext model at parallel_loop_emitter where block.y > 1. The same would fail on Nvidia as well. That was the reason for reverting.
Interesting. Could you share an HLO snippet with the fusion that causes the failure?
@olegshyshkov I am trying to get a minimal working HLO snippet to reproduce this error. I am facing problems with hlo_bisect, as it is aborting for various reasons. For now I am attaching the stack trace from the run_hlo_module utility.
Let me know if it helps; I will continue working on a minimal HLO snippet.
@olegshyshkov The following snippet can cause the error.
But I am not able to reproduce this on this branch. When I run run_hlo_module it gives an LLVM fingerprint error.
But it can be reproduced with …
Also, on NVIDIA GPUs you may have to increase the tensor size so as to exceed block_dim_limit().x in https://github.com/openxla/xla/blob/main/xla/service/gpu/launch_dimensions.cc#L45
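As a back-of-the-envelope check (assuming 128 threads per block and the CUDA grid x-limit of 2^31 - 1; both values are assumptions for illustration), the tensor has to be very large before block.x overflows:

```cpp
#include <cstdint>
#include <cstdio>

int main() {
  constexpr uint64_t kGridXLimit = 2147483647ull;  // 2^31 - 1 (CUDA)
  constexpr uint64_t kThreadsPerBlock = 128;       // assumed block size
  // With one element per thread, exceeding the x-limit requires:
  uint64_t min_elements = (kGridXLimit + 1) * kThreadsPerBlock;
  std::printf("elements needed: %llu (~2.7e11)\n",
              static_cast<unsigned long long>(min_elements));
  return 0;
}
```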
Owing to the checks in https://github.com/openxla/xla/blob/main/xla/service/gpu/parallel_loop_emitter.cc#L169-L171, the launch dimensions can only be of the form ((block.x, 1, 1), (thread.x, thread.y, 1)). On ROCm it is additionally expected that block.x * thread.x <= 0xFFFFFFFF.
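A minimal, overflow-safe sketch of the ROCm constraint stated above (a hypothetical helper, not the actual XLA check):

```cpp
#include <cstdint>

// ROCm expects the total x-extent, block.x * thread.x, to fit in 32 bits.
// Dividing instead of multiplying avoids overflow in the check itself.
bool FitsRocmXExtent(uint64_t block_x, uint64_t thread_x) {
  return thread_x != 0 && block_x <= 0xFFFFFFFFull / thread_x;
}
```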