Running iree-run-module-multi.mlir on a cpu device and a gpu device #19483
Comments
In our understanding of the code, it's trying to copy a CPU device buffer to a HIP device buffer using a The command buffer being used is set up in During execution of the The problem seems to be that the compiler doesn't know how to handle the situation where a buffer from one device needs to be transferred to another device and improperly codes the transfer as a
Copy buffer is what should be used, but there's an epic feature sprint's worth of work or more (1+ quarters) to make buffer allocation, import/export, and lifetime management capable of supporting these kinds of fine-grained cross-device transfers. Heterogeneous devices with discrete memory are not expected to work today. There's shorter-term work that could make heterogeneous devices that share the host address space and that do not require registration (most devices do require it) work by having a shared allocator via iree_hal_device_replace_allocator and implementing dyn_cast for host buffers, but there are gotchas there. For at least the next quarter the focus is on homogeneous devices, as there's a significant amount of work across the stack to make that efficient, and that work acts as the foundation for the future heterogeneous support.
(if you're interested in this area we should chat about what you're looking to do - there may be easier options than multi device!)
Thanks for the feedback, Ben! I take it that you already have a plan for how copy_buffer should do cross-device transfers in general. Yes, I think we probably need to sync up soon. :-)
Here is an issue reported regarding two GPU devices: #19507. But this issue appears to be fixed when I use git hash ce659488d2ed07cc944d05e096b1f91b93f28709
What happened?
Compile the MLIR file:
iree-compile iree-run-module-multi.mlir --iree-execution-model=async-external --iree-hal-target-device=device_a=local[0] --iree-hal-target-device=device_b=hip[0] --iree-hal-local-target-device-backends=vmvx --iree-hip-target=gfx1103 -o cpu_gpu.vmfb
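For anyone trying to reproduce this on different hardware, here is a minimal sketch that assembles the same compile command as an argument list. It uses only the flags quoted above; the helper name `build_compile_command` and its parameters are hypothetical and not part of IREE.

```python
import shlex


def build_compile_command(source, output, hip_target="gfx1103"):
    """Return the iree-compile argument list from this report.

    Hypothetical helper: only the flags shown in the issue are used,
    and the HIP target is the one value exposed as a parameter.
    """
    return [
        "iree-compile",
        source,
        "--iree-execution-model=async-external",
        # device_a is the local CPU device, device_b is the HIP GPU device.
        "--iree-hal-target-device=device_a=local[0]",
        "--iree-hal-target-device=device_b=hip[0]",
        "--iree-hal-local-target-device-backends=vmvx",
        f"--iree-hip-target={hip_target}",
        "-o",
        output,
    ]


cmd = build_compile_command("iree-run-module-multi.mlir", "cpu_gpu.vmfb")
# shlex.join quotes arguments such as local[0] so the printed line is
# safe to paste into a shell that would otherwise glob the brackets.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would run the compile step, assuming `iree-compile` is on the PATH.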
Run the generated vmfb file:
The output on screen:
Steps to reproduce your issue
The mlir file is inside the repo:
iree/tools/test/iree-run-module-multi.mlir
Line 10 in dea512b
What component(s) does this issue relate to?
Runtime
Version information
Git hash: 63cdc7d
Additional context
Compiling for two CPU devices works fine.