Compiling iree-run-module-multi.mlir on two different devices may need a way to specify iree-hal-local-target-device-backends for each device #19509

Open
dezhiAmd opened this issue Dec 18, 2024 · 3 comments
Labels
bug 🐞 Something isn't working


@dezhiAmd

What happened?

Compile command:
./iree-compile ~/iree/tools/test/iree-run-module-multi.mlir --iree-execution-model=async-external --iree-hal-target-device=device_a=local[0] --iree-hal-target-device=device_b=hip[0] --iree-hal-local-target-device-backends=vmvx --iree-hip-target=gfx942 -o cpu_gpu.vmfb

Output:

iree-compile: iree/third_party/llvm-project/llvm/include/llvm/ADT/STLExtras.h:483: decltype(auto) llvm::filter_iterator_base<mlir::Region::OpIterator, bool (*)(mlir::Operation &), std::forward_iterator_tag>::operator*() const [WrappedIteratorT = mlir::Region::OpIterator, PredicateT = bool (*)(mlir::Operation &), IterTag = std::forward_iterator_tag]: Assertion `BaseT::wrapped() != End && "Cannot dereference end iterator!"' failed.
Please report issues to https://github.com/iree-org/iree/issues and include the crash backtrace.
Stack dump:
0.      Program arguments: ./iree-compile ~/iree/tools/test/iree-run-module-multi.mlir --iree-execution-model=async-external --iree-hal-target-device=device_a=local[0] --iree-hal-target-device=device_b=hip[0] --iree-hal-local-target-device-backends=vmvx --iree-hip-target=gfx942 -o cpu_gpu.vmfb
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  libIREECompiler.so 0x00007fe087f96f67 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 39
1  libIREECompiler.so 0x00007fe087f951a0 llvm::sys::RunSignalHandlers() + 80
2  libIREECompiler.so 0x00007fe087f9762a
3  libc.so.6          0x00007fe081926520
4  libc.so.6          0x00007fe08197a9fc pthread_kill + 300
5  libc.so.6          0x00007fe081926476 raise + 22
6  libc.so.6          0x00007fe08190c7f3 abort + 211
7  libc.so.6          0x00007fe08190c71b
8  libc.so.6          0x00007fe08191de96
9  libIREECompiler.so 0x00007fe08a4b6892
10 libIREECompiler.so 0x00007fe08a7f9b05
11 libIREECompiler.so 0x00007fe08a4b64a9
12 libIREECompiler.so 0x00007fe0881429f5 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 629
13 libIREECompiler.so 0x00007fe088143168 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 328
14 libIREECompiler.so 0x00007fe0881475eb
15 libIREECompiler.so 0x00007fe08970372a
16 libIREECompiler.so 0x00007fe0881429f5 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 629
17 libIREECompiler.so 0x00007fe088143168 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 328
18 libIREECompiler.so 0x00007fe0881475eb
19 libIREECompiler.so 0x00007fe089704054
20 libIREECompiler.so 0x00007fe0881429f5 mlir::detail::OpToOpPassAdaptor::run(mlir::Pass*, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int) + 629
21 libIREECompiler.so 0x00007fe088143168 mlir::detail::OpToOpPassAdaptor::runPipeline(mlir::OpPassManager&, mlir::Operation*, mlir::AnalysisManager, bool, unsigned int, mlir::PassInstrumentor*, mlir::PassInstrumentation::PipelineParentInfo const*) + 328
22 libIREECompiler.so 0x00007fe088145529 mlir::PassManager::run(mlir::Operation*) + 969
23 libIREECompiler.so 0x00007fe087eeaea0 ireeCompilerInvocationPipeline + 3408
24 libIREECompiler.so 0x00007fe088108cd8
25 libIREECompiler.so 0x00007fe088108511
26 libc.so.6          0x00007fe08190dd90
27 libc.so.6          0x00007fe08190de40 __libc_start_main + 128
28 iree-compile       0x000055da58cdb6c5
Aborted (core dumped)


@dezhiAmd dezhiAmd added the bug 🐞 Something isn't working label Dec 18, 2024
@dezhiAmd (Author)

With --iree-hal-local-target-device-backends=rocm, the module now compiles:

./iree-compile ~/iree/tools/test/iree-run-module-multi.mlir --iree-execution-model=async-external --iree-hal-target-device=device_a=local[0] --iree-hal-target-device=device_b=hip[0] --iree-hal-local-target-device-backends=rocm --iree-hip-target=gfx942 -o cpu_gpu.vmfb

But running it fails:
./iree-run-module --module=cpu_gpu.vmfb --function=mutli_device_mul --input=4xf32=10,11,12,13 --device=local-task --device=hip:0 --trace_execution=true --task_topology_group_count=1

The result:

[module.__init+00000000]    <block>
[module.__init+00000001]    %i0 = vm.const.i32 13  // 0x0000000D
[module.__init+00000008]    %i1 = vm.const.i32 28  // 0x0000001C
[module.__init+0000000F]    %r0 = vm.const.ref.zero
[module.__init+00000012]    %i2 = vm.const.i32 2  // 0x00000002
[module.__init+00000019]    %i3 = vm.const.i32 3  // 0x00000003
[module.__init+00000020]    %r1 = vm.const.ref.zero
[module.__init+00000023]    %i4 = vm.const.i32 1  // 0x00000001
[module.__init+0000002A]    %i6:7 = vm.const.i64 64  // 0x0000000000000040
[module.__init+00000035]    %i8:9 = vm.const.i64 16  // 0x0000000000000010
[module.__init+00000040]    %i10:11 = vm.const.i64 -1  // 0xFFFFFFFFFFFFFFFF
[module.__init+0000004B]    %i5 = vm.const.i32 14  // 0x0000000E
[module.__init+00000052]    %i12 = vm.const.i32 5  // 0x00000005
[module.__init+00000059]    %i13 = vm.const.i32.zero
[module.__init+0000005C]    %i14:15 = vm.const.i64.zero
[module.__init+0000005F]    %i16:17 = vm.const.i64 1  // 0x0000000000000001
[module.__init+0000006A]    %r2 = vm.const.ref.zero
[module.__init+0000006D]    %i18 = vm.call @hal.devices.count()
[module.__init+00000078]    %i18:19 = vm.ext.i32.i64.s %i18(2)
[module.__init+0000007D]    vm.br ^00000098(%r2(null)->%r3, %i14(0)->%i20, %i15(0)->%i21, %i14(0)->%i22, %i15(0)->%i23)
[module.__init+00000099]    %i24 = vm.cmp.nz.ref %r3(null)
[module.__init+0000009E]    %i24 = vm.xor.i32 %i24(0), %i4(1)
[module.__init+000000A5]    %i25 = vm.cmp.lt.i64.s %i20:21(0), %i18:19(2)
[module.__init+000000AC]    %i25 = vm.and.i32 %i24(1), %i25(1)
[module.__init+000000B3]    vm.cond_br %i25(1), ^000000C2(), ^0000018C()
[module.__init+000000C3]    %i24 = vm.trunc.i64.i32 %i20:21(0)
[module.__init+000000C8]    %r3 = vm.call @hal.devices.get(%i24(0))
[module.__init+000000D6]    %r4 = vm.const.ref.rodata 0  // 0x0x55fd296f4010 13b
[module.__init+000000DD]    %r5 = vm.const.ref.rodata 1  // 0x0x55fd296f402c 6b
[module.__init+000000E4]    %i24, %i26 = vm.call @hal.device.query.i64(%r3(!hal.device/0x0x55fd29704690), %r4(!vm.buffer/0x0x55fd296f1458), %r5(!vm.buffer/0x0x55fd296f1480))
[module.__init+000000F8]    %i25 = vm.cmp.nz.i64 %i26:27(1)
[module.__init+000000FD]    %i24 = vm.select.i32 %i24(1) ? %i25(1) : %i13(0)
[module.__init+00000106]    vm.cond_br %i24(1), ^0000011A(), ^00000152(%i13(0)->%i24)
[module.__init+0000011B]    %r4 = vm.const.ref.rodata 2  // 0x0x55fd296f4040 21b
[module.__init+00000122]    %r5 = vm.const.ref.rodata 3  // 0x0x55fd296f4064 13b
[module.__init+00000129]    %i24, %i26 = vm.call @hal.device.query.i64(%r3(!hal.device/0x0x55fd29704690), %r4(!vm.buffer/0x0x55fd296f14a8), %r5(!vm.buffer/0x0x55fd296f14d0))
[module.__init+0000013C]    %i25 = vm.cmp.nz.i64 %i26:27(0)
[module.__init+00000141]    %i24 = vm.select.i32 %i24(1) ? %i25(0) : %i13(0)
[module.__init+0000014A]    vm.br ^00000152()
[module.__init+00000153]    %i25 = vm.cmp.eq.i64 %i22:23(0), %i14:15(0)
[module.__init+0000015A]    %i26:27 = vm.select.i64 %i24(0) ? %i16:17(1) : %i14:15(0)
[module.__init+00000163]    %i22:23 = vm.add.i64 %i22:23(0), %i26:27(0)
[module.__init+0000016A]    %i24 = vm.and.i32 %i24(0), %i25(1)
[module.__init+00000171]    %r3 = vm.select.ref %i24(0) ? %r3(!hal.device/0x0x55fd29704690) : %r2(null) -> !hal.device
[module.__init+0000017E]    %i20:21 = vm.add.i64 %i20:21(0), %i16:17(1)
[module.__init+00000185]    vm.br ^00000098()
[module.__init+00000099]    %i24 = vm.cmp.nz.ref %r3(null)
[module.__init+0000009E]    %i24 = vm.xor.i32 %i24(0), %i4(1)
[module.__init+000000A5]    %i25 = vm.cmp.lt.i64.s %i20:21(1), %i18:19(2)
[module.__init+000000AC]    %i25 = vm.and.i32 %i24(1), %i25(1)
[module.__init+000000B3]    vm.cond_br %i25(1), ^000000C2(), ^0000018C()
[module.__init+000000C3]    %i24 = vm.trunc.i64.i32 %i20:21(1)
[module.__init+000000C8]    %r3 = vm.call @hal.devices.get(%i24(1))
[module.__init+000000D6]    %r4 = vm.const.ref.rodata 0  // 0x0x55fd296f4010 13b
[module.__init+000000DD]    %r5 = vm.const.ref.rodata 1  // 0x0x55fd296f402c 6b
[module.__init+000000E4]    %i24, %i26 = vm.call @hal.device.query.i64(%r3(!hal.device/0x0x55fd29f9f200), %r4(!vm.buffer/0x0x55fd296f1458), %r5(!vm.buffer/0x0x55fd296f1480))
[module.__init+000000F8]    %i25 = vm.cmp.nz.i64 %i26:27(0)
[module.__init+000000FD]    %i24 = vm.select.i32 %i24(1) ? %i25(0) : %i13(0)
[module.__init+00000106]    vm.cond_br %i24(0), ^0000011A(), ^00000152(%i13(0)->%i24)
[module.__init+00000153]    %i25 = vm.cmp.eq.i64 %i22:23(0), %i14:15(0)
[module.__init+0000015A]    %i26:27 = vm.select.i64 %i24(0) ? %i16:17(1) : %i14:15(0)
[module.__init+00000163]    %i22:23 = vm.add.i64 %i22:23(0), %i26:27(0)
[module.__init+0000016A]    %i24 = vm.and.i32 %i24(0), %i25(1)
[module.__init+00000171]    %r3 = vm.select.ref %i24(0) ? %r3(!hal.device/0x0x55fd29f9f200) : %r2(null) -> !hal.device
[module.__init+0000017E]    %i20:21 = vm.add.i64 %i20:21(1), %i16:17(1)
[module.__init+00000185]    vm.br ^00000098()
[module.__init+00000099]    %i24 = vm.cmp.nz.ref %r3(null)
[module.__init+0000009E]    %i24 = vm.xor.i32 %i24(0), %i4(1)
[module.__init+000000A5]    %i25 = vm.cmp.lt.i64.s %i20:21(2), %i18:19(2)
[module.__init+000000AC]    %i25 = vm.and.i32 %i24(1), %i25(0)
[module.__init+000000B3]    vm.cond_br %i25(0), ^000000C2(), ^0000018C()
[module.__init+0000018D]    vm.cond_br %i24(1), ^0000019C(), ^0000048A()
[module.__init+0000019D]    vm.fail %i12(5), "HAL device `device_a` not found or unavailable: #hal.device.target<"local", {ordinal = 0 : index}, [#hal.executable.target<"rocm", "rocm-hsaco-fb", {iree.gpu.target = #iree_gpu.target<arch = "gfx942", features = "", wgp = <compute =  fp64|fp32|fp16|int64|int32|int16|int8, storage =  b64|b32|b16|b8, subgroup =  shuffle|arithmetic, dot =  dp4xi8toi32, mma = [<MFMA_F32_16x16x4_F32>, <MFMA_F32_16x16x16_F16>, <MFMA_F32_32x32x8_F16>, <MFMA_F32_16x16x32_F8E4M3FNUZ>, <MFMA_I32_16x16x32_I8>, <MFMA_I32_32x32x16_I8>], subgroup_size_choices = [64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647]>>, ukernels = "none"}>]>"
<vm>:0: NOT_FOUND; HAL device `device_a` not found or unavailable: #hal.device.target<"local", {ordinal = 0 : index}, [#hal.executable.target<"rocm", "rocm-hsaco-fb", {iree.gpu.target = #iree_gpu.target<arch = "gfx942", features = "", wgp = <compute =  fp64|fp32|fp16|int64|int32|int16|int8, storage =  b64|b32|b16|b8, subgroup =  shuffle|arithmetic, dot =  dp4xi8toi32, mma = [<MFMA_F32_16x16x4_F32>, <MFMA_F32_16x16x16_F16>, <MFMA_F32_32x32x8_F16>, <MFMA_F32_16x16x32_F8E4M3FNUZ>, <MFMA_I32_16x16x32_I8>, <MFMA_I32_32x32x16_I8>], subgroup_size_choices = [64], max_workgroup_sizes = [1024, 1024, 1024], max_thread_count_per_workgroup = 1024, max_workgroup_memory_bytes = 65536, max_workgroup_counts = [2147483647, 2147483647, 2147483647]>>, ukernels = "none"}>]>; ; creating VM context; creating run context
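Reading the trace, module.__init loops over the runtime devices (hal.devices.count() returns 2) and queries each one. The rodata strings are not printed, but their byte lengths (13, 6, 21, 13) are consistent with hal.device.id / local* and hal.executable.format / rocm-hsaco-fb; that is an inference, not something the log states. Device 0 passes the id query (result 1) but fails the format query (result 0), and device 1 fails the id query, so device_a is never bound and vm.fail fires. The minimal Python sketch below models that selection loop. The device ids and supported formats in RUNTIME_DEVICES, and the select_device helper itself, are hypothetical and only illustrate the logic visible in the trace.

```python
from fnmatch import fnmatchcase

# Hypothetical runtime devices, in the order of the --device flags
# (--device=local-task --device=hip:0). Ids and formats are assumptions
# for illustration, not values queried from a real IREE runtime.
RUNTIME_DEVICES = [
    {"id": "local-task", "formats": {"embedded-elf-x86_64", "vmvx-bytecode-fb"}},
    {"id": "hip", "formats": {"rocm-hsaco-fb"}},
]


def select_device(devices, id_pattern, executable_format, ordinal=0):
    """Model of the loop in module.__init: return the ordinal-th device whose
    id matches id_pattern and which supports executable_format, else None."""
    matches = 0
    for device in devices:
        # First query: does the device id match the target's pattern?
        if not fnmatchcase(device["id"], id_pattern):
            continue
        # Second query (only reached when the id matched): is the format supported?
        if executable_format not in device["formats"]:
            continue
        if matches == ordinal:
            return device
        matches += 1
    return None


# device_a = local[0] carrying a rocm-hsaco-fb executable: no device qualifies,
# which corresponds to the "HAL device `device_a` not found" failure above.
print(select_device(RUNTIME_DEVICES, "local*", "rocm-hsaco-fb"))  # None
```

In this model, a local device only binds when it is paired with an executable format that the local runtime can load, which is why compiling a rocm executable for the local device still fails at run time.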

@benvanik (Collaborator)

As I mentioned before, --iree-hal-local-target-device-backends is not the flag you are looking for and does nothing in this case.

@benvanik (Collaborator) commented Dec 18, 2024

(We should probably at least verify it, but that's hard. You really should not be trying to use rocm executables with the local executor.)
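The verification suggested in the comment could, in spirit, reject any executable target that a device's backend cannot run. The minimal Python sketch below shows that idea only; it is not IREE's implementation. COMPATIBLE_BACKENDS and verify_device_targets are hypothetical names, and the compatibility table (local runs llvm-cpu/vmvx, hip runs rocm) is an assumption. The example config mirrors the second compile command, where the vm.fail message shows the local device carrying a rocm executable target.

```python
# Hypothetical compatibility table: which executable-target backends each
# device backend can load. This is an assumption for illustration only.
COMPATIBLE_BACKENDS = {
    "local": {"llvm-cpu", "vmvx"},
    "hip": {"rocm"},
}


def verify_device_targets(device_targets):
    """Return one error string per executable target a device cannot run.

    device_targets maps a device name to (device backend, [executable backends]).
    """
    errors = []
    for name, (device_backend, executable_backends) in device_targets.items():
        allowed = COMPATIBLE_BACKENDS.get(device_backend, set())
        for backend in executable_backends:
            if backend not in allowed:
                errors.append(
                    f"device `{name}` ({device_backend}) cannot run `{backend}` executables"
                )
    return errors


# Assumed shape of the second compile: device_a (local) was given a rocm target;
# device_b (hip) is assumed to get rocm via --iree-hip-target=gfx942.
config = {
    "device_a": ("local", ["rocm"]),
    "device_b": ("hip", ["rocm"]),
}
for error in verify_device_targets(config):
    print(error)  # device `device_a` (local) cannot run `rocm` executables
```

A check like this would turn the run-time NOT_FOUND into a compile-time error that names the mismatched device.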
