Running iree-run-module-multi.mlir on a cpu device and a gpu device #19483

Open
dezhiAmd opened this issue Dec 13, 2024 · 5 comments
Labels
bug 🐞 Something isn't working

Comments

@dezhiAmd

What happened?

Compile the MLIR file:
iree-compile iree-run-module-multi.mlir --iree-execution-model=async-external --iree-hal-target-device=device_a=local[0] --iree-hal-target-device=device_b=hip[0] --iree-hal-local-target-device-backends=vmvx --iree-hip-target=gfx1103 -o cpu_gpu.vmfb

Run the generated vmfb file:

iree-run-module --module=cpu_gpu.vmfb ^
--function=mutli_device_mul ^
--input=4xf32=10,11,12,13 ^
--device=local-task --device=hip:0 ^
--trace_execution=true ^
--task_topology_group_count=1

The output on screen:

[module.__init+00000000]    <block>
[module.__init+00000001]    %i0 = vm.const.i32 1  // 0x00000001
[module.__init+00000008]    %r0 = vm.const.ref.zero
[module.__init+0000000B]    %i1 = vm.const.i32 14  // 0x0000000E
[module.__init+00000012]    %i2:3 = vm.const.i64 -1  // 0xFFFFFFFFFFFFFFFF
[module.__init+0000001D]    %i4 = vm.const.i32 18  // 0x00000012
[module.__init+00000024]    %i5 = vm.const.i32.zero
[module.__init+00000027]    %i6:7 = vm.const.i64.zero
[module.__init+0000002A]    %i8:9 = vm.const.i64 1  // 0x0000000000000001
[module.__init+00000035]    %r1 = vm.const.ref.zero
[module.__init+00000038]    %i10 = vm.call @hal.devices.count()
[module.__init+00000044]    %i10:11 = vm.ext.i32.i64.s %i10(2)
[module.__init+00000049]    vm.br ^00000064(%r1(null)->%r2, %i6(0)->%i12, %i7(0)->%i13, %i6(0)->%i14, %i7(0)->%i15)
[module.__init+00000065]    %i16 = vm.cmp.nz.ref %r2(null)
[module.__init+0000006A]    %i16 = vm.xor.i32 %i16(0), %i0(1)
[module.__init+00000071]    %i17 = vm.cmp.lt.i64.s %i12:13(0), %i10:11(2)
[module.__init+00000078]    %i17 = vm.and.i32 %i16(1), %i17(1)
[module.__init+0000007F]    vm.cond_br %i17(1), ^0000008E(), ^00000158()
[module.__init+0000008F]    %i16 = vm.trunc.i64.i32 %i12:13(0)
[module.__init+00000094]    %r2 = vm.call @hal.devices.get(%i16(0))
[module.__init+000000A2]    %r3 = vm.const.ref.rodata 0  // 0x0000012FC38E0108 13b
[module.__init+000000A9]    %r4 = vm.const.ref.rodata 1  // 0x0000012FC38E0124 6b
[module.__init+000000B0]    %i16, %i18 = vm.call @hal.device.query.i64(%r2(!hal.device/0x0000012FC3906970), %r3(!vm.buffer/0x0000012FC38E3C10), %r4(!vm.buffer/0x0000012FC38E3C38))
[module.__init+000000C4]    %i17 = vm.cmp.nz.i64 %i18:19(1)
[module.__init+000000C9]    %i16 = vm.select.i32 %i16(1) ? %i17(1) : %i5(0)
[module.__init+000000D2]    vm.cond_br %i16(1), ^000000E6(), ^0000011E(%i5(0)->%i16)
[module.__init+000000E7]    %r3 = vm.const.ref.rodata 2  // 0x0000012FC38E0138 21b
[module.__init+000000EE]    %r4 = vm.const.ref.rodata 3  // 0x0000012FC38E015C 16b
[module.__init+000000F5]    %i16, %i18 = vm.call @hal.device.query.i64(%r2(!hal.device/0x0000012FC3906970), %r3(!vm.buffer/0x0000012FC38E3C60), %r4(!vm.buffer/0x0000012FC38E3C88))
[module.__init+00000108]    %i17 = vm.cmp.nz.i64 %i18:19(1)
[module.__init+0000010D]    %i16 = vm.select.i32 %i16(1) ? %i17(1) : %i5(0)
[module.__init+00000116]    vm.br ^0000011E()
[module.__init+0000011F]    %i17 = vm.cmp.eq.i64 %i14:15(0), %i6:7(0)
[module.__init+00000126]    %i18:19 = vm.select.i64 %i16(1) ? %i8:9(1) : %i6:7(0)
[module.__init+0000012F]    %i14:15 = vm.add.i64 %i14:15(0), %i18:19(1)
[module.__init+00000136]    %i16 = vm.and.i32 %i16(1), %i17(1)
[module.__init+0000013D]    %r2 = vm.select.ref %i16(1) ? %r2(!hal.device/0x0000012FC3906970) : %r1(null) -> !hal.device
[module.__init+0000014A]    %i12:13 = vm.add.i64 %i12:13(0), %i8:9(1)
[module.__init+00000151]    vm.br ^00000064()
[module.__init+00000065]    %i16 = vm.cmp.nz.ref %r2(!hal.device/0x0000012FC3906970)
[module.__init+0000006A]    %i16 = vm.xor.i32 %i16(1), %i0(1)
[module.__init+00000071]    %i17 = vm.cmp.lt.i64.s %i12:13(1), %i10:11(2)
[module.__init+00000078]    %i17 = vm.and.i32 %i16(0), %i17(1)
[module.__init+0000007F]    vm.cond_br %i17(0), ^0000008E(), ^00000158()
[module.__init+00000159]    vm.cond_br %i16(0), ^00000168(), ^0000021B()
[module.__init+0000021C]    %r3 = vm.const.ref.rodata 2  // 0x0000012FC38E0138 21b
[module.__init+00000223]    %r4 = vm.const.ref.rodata 3  // 0x0000012FC38E015C 16b
[module.__init+0000022A]    %i12, %i14 = vm.call @hal.device.query.i64(%r2(!hal.device/0x0000012FC3906970), %r3(!vm.buffer/0x0000012FC38E3C60), %r4(!vm.buffer/0x0000012FC38E3C88))
[module.__init+0000023E]    %i13 = vm.cmp.nz.i64 %i14:15(1)
[module.__init+00000243]    %i12 = vm.select.i32 %i12(1) ? %i13(1) : %i5(0)
[module.__init+0000024C]    %i12:13 = vm.select.i64 %i12(1) ? %i6:7(0) : %i2:3(-1)
[module.__init+00000255]    %i12 = vm.cmp.eq.i64 %i12:13(0), %i6:7(0)
[module.__init+0000025C]    vm.global.store.ref %r2(!hal.device/0x0000012FC3906970), .refs[0] : !hal.device
[module.__init+00000267]    vm.cond_br %i12(1), ^00000276(), ^000002B8()
[module.__init+00000277]    %r5 = vm.const.ref.rodata 4  // 0x0000012FC38E10F0 1397b
[module.__init+0000027E]    %r2 = vm.call @hal.executable.create(%r2(!hal.device/0x0000012FC3906970), %r4(!vm.buffer/0x0000012FC38E3C88), %r5(!vm.buffer/0x0000012FC38E3CB0), %r0(null))
[module.__init+00000292]    vm.global.store.ref %r2(!hal.executable/0x0000012FD0077ED0), .refs[1] : !hal.executable
[module.__init+0000029D]    vm.br ^00000343(%r1(null)->%r2, %i6(0)->%i12, %i7(0)->%i13, %i6(0)->%i14, %i7(0)->%i15)
[module.__init+00000344]    %i16 = vm.cmp.nz.ref %r2(null)
[module.__init+00000349]    %i16 = vm.xor.i32 %i16(0), %i0(1)
[module.__init+00000350]    %i17 = vm.cmp.lt.i64.s %i12:13(0), %i10:11(2)
[module.__init+00000357]    %i17 = vm.and.i32 %i16(1), %i17(1)
[module.__init+0000035E]    vm.cond_br %i17(1), ^0000036E(), ^00000432()
[module.__init+0000036F]    %i16 = vm.trunc.i64.i32 %i12:13(0)
[module.__init+00000374]    %r2 = vm.call @hal.devices.get(%i16(0))
[module.__init+00000382]    %r4 = vm.const.ref.rodata 0  // 0x0000012FC38E0108 13b
[module.__init+00000389]    %r5 = vm.const.ref.rodata 5  // 0x0000012FC38E018C 3b
[module.__init+00000390]    %i16, %i18 = vm.call @hal.device.query.i64(%r2(!hal.device/0x0000012FC3906970), %r4(!vm.buffer/0x0000012FC38E3C10), %r5(!vm.buffer/0x0000012FC38E3CD8))
[module.__init+000003A4]    %i17 = vm.cmp.nz.i64 %i18:19(0)
[module.__init+000003A9]    %i16 = vm.select.i32 %i16(1) ? %i17(0) : %i5(0)
[module.__init+000003B2]    vm.cond_br %i16(0), ^000003C6(), ^000003F8(%i5(0)->%i16)
[module.__init+000003F9]    %i17 = vm.cmp.eq.i64 %i14:15(0), %i6:7(0)
[module.__init+00000400]    %i18:19 = vm.select.i64 %i16(0) ? %i8:9(1) : %i6:7(0)
[module.__init+00000409]    %i14:15 = vm.add.i64 %i14:15(0), %i18:19(0)
[module.__init+00000410]    %i16 = vm.and.i32 %i16(0), %i17(1)
[module.__init+00000417]    %r2 = vm.select.ref %i16(0) ? %r2(!hal.device/0x0000012FC3906970) : %r1(null) -> !hal.device
[module.__init+00000424]    %i12:13 = vm.add.i64 %i12:13(0), %i8:9(1)
[module.__init+0000042B]    vm.br ^00000343()
[module.__init+00000344]    %i16 = vm.cmp.nz.ref %r2(null)
[module.__init+00000349]    %i16 = vm.xor.i32 %i16(0), %i0(1)
[module.__init+00000350]    %i17 = vm.cmp.lt.i64.s %i12:13(1), %i10:11(2)
[module.__init+00000357]    %i17 = vm.and.i32 %i16(1), %i17(1)
[module.__init+0000035E]    vm.cond_br %i17(1), ^0000036E(), ^00000432()
[module.__init+0000036F]    %i16 = vm.trunc.i64.i32 %i12:13(1)
[module.__init+00000374]    %r2 = vm.call @hal.devices.get(%i16(1))
[module.__init+00000382]    %r4 = vm.const.ref.rodata 0  // 0x0000012FC38E0108 13b
[module.__init+00000389]    %r5 = vm.const.ref.rodata 5  // 0x0000012FC38E018C 3b
[module.__init+00000390]    %i16, %i18 = vm.call @hal.device.query.i64(%r2(!hal.device/0x0000012FD026D920), %r4(!vm.buffer/0x0000012FC38E3C10), %r5(!vm.buffer/0x0000012FC38E3CD8))
[module.__init+000003A4]    %i17 = vm.cmp.nz.i64 %i18:19(1)
[module.__init+000003A9]    %i16 = vm.select.i32 %i16(1) ? %i17(1) : %i5(0)
[module.__init+000003B2]    vm.cond_br %i16(1), ^000003C6(), ^000003F8(%i5(0)->%i16)
[module.__init+000003C7]    %r4 = vm.const.ref.rodata 6  // 0x0000012FC38E019C 13b
[module.__init+000003CE]    %i16, %i18 = vm.call @hal.device.query.i64(%r2(!hal.device/0x0000012FD026D920), %r3(!vm.buffer/0x0000012FC38E3C60), %r4(!vm.buffer/0x0000012FC38E3D00))
[module.__init+000003E2]    %i17 = vm.cmp.nz.i64 %i18:19(1)
[module.__init+000003E7]    %i16 = vm.select.i32 %i16(1) ? %i17(1) : %i5(0)
[module.__init+000003F0]    vm.br ^000003F8()
[module.__init+000003F9]    %i17 = vm.cmp.eq.i64 %i14:15(0), %i6:7(0)
[module.__init+00000400]    %i18:19 = vm.select.i64 %i16(1) ? %i8:9(1) : %i6:7(0)
[module.__init+00000409]    %i14:15 = vm.add.i64 %i14:15(0), %i18:19(1)
[module.__init+00000410]    %i16 = vm.and.i32 %i16(1), %i17(1)
[module.__init+00000417]    %r2 = vm.select.ref %i16(1) ? %r2(!hal.device/0x0000012FD026D920) : %r1(null) -> !hal.device
[module.__init+00000424]    %i12:13 = vm.add.i64 %i12:13(1), %i8:9(1)
[module.__init+0000042B]    vm.br ^00000343()
[module.__init+00000344]    %i16 = vm.cmp.nz.ref %r2(!hal.device/0x0000012FD026D920)
[module.__init+00000349]    %i16 = vm.xor.i32 %i16(1), %i0(1)
[module.__init+00000350]    %i17 = vm.cmp.lt.i64.s %i12:13(2), %i10:11(2)
[module.__init+00000357]    %i17 = vm.and.i32 %i16(0), %i17(0)
[module.__init+0000035E]    vm.cond_br %i17(0), ^0000036E(), ^00000432()
[module.__init+00000433]    vm.cond_br %i16(0), ^00000442(), ^0000074A()
[module.__init+0000074B]    %r1 = vm.const.ref.rodata 6  // 0x0000012FC38E019C 13b
[module.__init+00000752]    %i0, %i8 = vm.call @hal.device.query.i64(%r2(!hal.device/0x0000012FD026D920), %r3(!vm.buffer/0x0000012FC38E3C60), %r1(!vm.buffer/0x0000012FC38E3D00))
[module.__init+00000766]    %i4 = vm.cmp.nz.i64 %i8:9(1)
[module.__init+0000076B]    %i0 = vm.select.i32 %i0(1) ? %i4(1) : %i5(0)
[module.__init+00000774]    %i2:3 = vm.select.i64 %i0(1) ? %i6:7(0) : %i2:3(-1)
[module.__init+0000077D]    %i0 = vm.cmp.eq.i64 %i2:3(0), %i6:7(0)
[module.__init+00000784]    vm.global.store.ref %r2(!hal.device/0x0000012FD026D920), .refs[2] : !hal.device
[module.__init+0000078F]    vm.cond_br %i0(1), ^0000079E(), ^000007F4()
[module.__init+0000079F]    %r3 = vm.const.ref.rodata 7  // 0x0000012FC38E16D0 4728b
[module.__init+000007A6]    %r0 = vm.call @hal.executable.create(%r2(!hal.device/0x0000012FD026D920), %r1(!vm.buffer/0x0000012FC38E3D00), %r3(!vm.buffer/0x0000012FC38E3D28), %r0(null))
[module.__init+000007BA]    vm.global.store.ref %r0(!hal.executable/0x0000012FD0120230), .refs[3] : !hal.executable
[module.__init+000007C5]    %r0 = vm.call @module.__mutli_device_mul_memoize_apply()
[module.__mutli_device_mul_memoize_apply+00000000]    <block>
[module.__mutli_device_mul_memoize_apply+00000001]    %i0 = vm.const.i32 13  // 0x0000000D
[module.__mutli_device_mul_memoize_apply+00000008]    %i1 = vm.const.i32 28  // 0x0000001C
[module.__mutli_device_mul_memoize_apply+0000000F]    %r0 = vm.const.ref.zero
[module.__mutli_device_mul_memoize_apply+00000012]    %i2 = vm.const.i32 1  // 0x00000001
[module.__mutli_device_mul_memoize_apply+00000019]    %i3 = vm.const.i32 2  // 0x00000002
[module.__mutli_device_mul_memoize_apply+00000020]    %i4 = vm.const.i32 3  // 0x00000003
[module.__mutli_device_mul_memoize_apply+00000027]    %i5 = vm.const.i32.zero
[module.__mutli_device_mul_memoize_apply+0000002A]    %i6:7 = vm.const.i64 16  // 0x0000000000000010
[module.__mutli_device_mul_memoize_apply+00000035]    %i8:9 = vm.const.i64.zero
[module.__mutli_device_mul_memoize_apply+00000038]    %i10:11 = vm.const.i64 -1  // 0xFFFFFFFFFFFFFFFF
[module.__mutli_device_mul_memoize_apply+00000043]    %r1 = vm.global.load.ref .refs[0](!hal.device/0x0000012FC3906970) : !hal.device
[module.__mutli_device_mul_memoize_apply+0000004E]    %r2 = vm.global.load.ref .refs[1](!hal.executable/0x0000012FD0077ED0) : !hal.executable
[module.__mutli_device_mul_memoize_apply+00000059]    %r1 = vm.call @hal.command_buffer.create(%r1(!hal.device/0x0000012FC3906970), %i5(0), %i4(3), %i10(4294967295), %i3(2))
[module.__mutli_device_mul_memoize_apply+0000006E]    vm.call.varadic @hal.command_buffer.dispatch(%r1(!hal.command_buffer/0x0000012FD092CE90), %r2(!hal.executable/0x0000012FD0077ED0), %i5(0), %i2(1), %i2(1), %i2(1), %i8(0), %i5(0), %i5(0), %r0(null), %i8(0), %i6(16), %i5(0), %i2(1), %r0(null), %i8(0), %i6(16))
[module.__mutli_device_mul_memoize_apply+000000AE]    vm.call @hal.command_buffer.execution_barrier(%r1(!hal.command_buffer/0x0000012FD092CE90), %i1(28), %i0(13), %i5(0))
[module.__mutli_device_mul_memoize_apply+000000C0]    vm.call @hal.command_buffer.finalize(%r1(!hal.command_buffer/0x0000012FD092CE90))
[module.__mutli_device_mul_memoize_apply+000000CC]    vm.return %r1(!hal.command_buffer/0x0000012FD092CE90)
[module.__init+000007D0]    vm.global.store.ref %r0(!hal.command_buffer/0x0000012FD092CE90), .refs[4] : !hal.command_buffer
[module.__init+000007DB]    %r0 = vm.call @module.__mutli_device_mul_memoize_apply_0()
[module.__mutli_device_mul_memoize_apply_0+00000000]    <block>
[module.__mutli_device_mul_memoize_apply_0+00000001]    %i0 = vm.const.i32 1  // 0x00000001
[module.__mutli_device_mul_memoize_apply_0+00000008]    %i1 = vm.const.i32 2  // 0x00000002
[module.__mutli_device_mul_memoize_apply_0+0000000F]    %r0 = vm.const.ref.zero
[module.__mutli_device_mul_memoize_apply_0+00000012]    %i2 = vm.const.i32 13  // 0x0000000D
[module.__mutli_device_mul_memoize_apply_0+00000019]    %i3 = vm.const.i32 28  // 0x0000001C
[module.__mutli_device_mul_memoize_apply_0+00000020]    %i4 = vm.const.i32 3  // 0x00000003
[module.__mutli_device_mul_memoize_apply_0+00000027]    %i5 = vm.const.i32.zero
[module.__mutli_device_mul_memoize_apply_0+0000002A]    %i6:7 = vm.const.i64 64  // 0x0000000000000040
[module.__mutli_device_mul_memoize_apply_0+00000035]    %i8:9 = vm.const.i64 16  // 0x0000000000000010
[module.__mutli_device_mul_memoize_apply_0+00000040]    %i10:11 = vm.const.i64.zero
[module.__mutli_device_mul_memoize_apply_0+00000043]    %i12:13 = vm.const.i64 -1  // 0xFFFFFFFFFFFFFFFF
[module.__mutli_device_mul_memoize_apply_0+0000004E]    %r1 = vm.global.load.ref .refs[2](!hal.device/0x0000012FD026D920) : !hal.device
[module.__mutli_device_mul_memoize_apply_0+00000059]    %r2 = vm.global.load.ref .refs[3](!hal.executable/0x0000012FD0120230) : !hal.executable
[module.__mutli_device_mul_memoize_apply_0+00000064]    %r1 = vm.call @hal.command_buffer.create(%r1(!hal.device/0x0000012FD026D920), %i5(0), %i4(3), %i12(4294967295), %i4(3))
[module.__mutli_device_mul_memoize_apply_0+0000007A]    vm.call @hal.command_buffer.execution_barrier(%r1(!hal.command_buffer/0x0000012FD0DD1F80), %i3(28), %i2(13), %i5(0))
[module.__mutli_device_mul_memoize_apply_0+0000008C]    vm.call @hal.command_buffer.copy_buffer(%r1(!hal.command_buffer/0x0000012FD0DD1F80), %i5(0), %i1(2), %r0(null), %i10(0), %r0(null), %i10(0), %i8(16))
[module.__mutli_device_mul_memoize_apply_0+000000A6]    vm.call @hal.command_buffer.execution_barrier(%r1(!hal.command_buffer/0x0000012FD0DD1F80), %i3(28), %i2(13), %i5(0))
[module.__mutli_device_mul_memoize_apply_0+000000B8]    vm.call.varadic @hal.command_buffer.dispatch(%r1(!hal.command_buffer/0x0000012FD0DD1F80), %r2(!hal.executable/0x0000012FD0120230), %i5(0), %i0(1), %i0(1), %i0(1), %i10(0), %i5(0), %i1(2), %r0(null), %i10(0), %i6(64), %i5(0), %i0(1), %r0(null), %i10(0), %i8(16))
[module.__mutli_device_mul_memoize_apply_0+000000F8]    vm.call @hal.command_buffer.execution_barrier(%r1(!hal.command_buffer/0x0000012FD0DD1F80), %i3(28), %i2(13), %i5(0))
[module.__mutli_device_mul_memoize_apply_0+0000010A]    vm.call @hal.command_buffer.finalize(%r1(!hal.command_buffer/0x0000012FD0DD1F80))
[module.__mutli_device_mul_memoize_apply_0+00000116]    vm.return %r1(!hal.command_buffer/0x0000012FD0DD1F80)
[module.__init+000007E6]    vm.global.store.ref %r0(!hal.command_buffer/0x0000012FD0DD1F80), .refs[5] : !hal.command_buffer
[module.__init+000007F1]    vm.return
EXEC @mutli_device_mul
[module.mutli_device_mul+00000000]    <block>
[module.mutli_device_mul+00000001]    %i0 = vm.const.i32 48  // 0x00000030
[module.mutli_device_mul+00000008]    %i1 = vm.const.i32.zero
[module.mutli_device_mul+0000000B]    %i2 = vm.const.i32 3075  // 0x00000C03
[module.mutli_device_mul+00000012]    %i3 = vm.const.i32 16  // 0x00000010
[module.mutli_device_mul+00000019]    %i4 = vm.const.i32 1  // 0x00000001
[module.mutli_device_mul+00000020]    %i5 = vm.const.i32 553648160  // 0x21000020
[module.mutli_device_mul+00000027]    %i6:7 = vm.const.i64 4  // 0x0000000000000004
[module.mutli_device_mul+00000032]    %i8:9 = vm.const.i64.zero
[module.mutli_device_mul+00000035]    %i10:11 = vm.const.i64 16  // 0x0000000000000010
[module.mutli_device_mul+00000040]    %i12:13 = vm.const.i64 64  // 0x0000000000000040
[module.mutli_device_mul+0000004B]    %i14:15 = vm.const.i64 -1  // 0xFFFFFFFFFFFFFFFF
[module.mutli_device_mul+00000056]    %i16 = vm.const.i32 -1  // 0xFFFFFFFF
[module.mutli_device_mul+0000005D]    %r3 = vm.const.ref.zero
[module.mutli_device_mul+00000060]    %r4 = vm.global.load.ref .refs[0](!hal.device/0x0000012FC3906970) : !hal.device
[module.mutli_device_mul+0000006B]    %r5 = vm.global.load.ref .refs[2](!hal.device/0x0000012FD026D920) : !hal.device
[module.mutli_device_mul+00000076]    %r6 = vm.global.load.ref .refs[4](!hal.command_buffer/0x0000012FD092CE90) : !hal.command_buffer
[module.mutli_device_mul+00000081]    %r7 = vm.global.load.ref .refs[5](!hal.command_buffer/0x0000012FD0DD1F80) : !hal.command_buffer
[module.mutli_device_mul+0000008C]    %r8 = vm.const.ref.rodata 8  // 0x0000012FC38E01CC 6b
[module.mutli_device_mul+00000093]    vm.call.varadic @hal.buffer_view.assert(%r0(!hal.buffer_view/0x0000012FC38DDB60), %r8(!vm.buffer/0x0000012FC38E3D50), %i5(553648160), %i4(1), %i6(4))
[module.mutli_device_mul+000000B2]    %r0 = vm.call @hal.buffer_view.buffer(%r0(!hal.buffer_view/0x0000012FC38DDB60))
[module.mutli_device_mul+000000C0]    %r8 = vm.call @hal.device.allocator(%r4(!hal.device/0x0000012FC3906970))
[module.mutli_device_mul+000000CE]    %r9 = vm.const.ref.rodata 9  // 0x0000012FC38E01E0 6b
[module.mutli_device_mul+000000D5]    vm.call @hal.buffer.assert(%r0(!hal.buffer/0x0000012FD092CFC0), %r9(!vm.buffer/0x0000012FC38E3D78), %r8(!hal.allocator/0x0000012FC38BEA30), %i10(16), %i3(16), %i2(3075))
[module.mutli_device_mul+000000EA]    %r8 = vm.call @hal.fence.create(%r4(!hal.device/0x0000012FC3906970), %i1(0))
[module.mutli_device_mul+000000FA]    %r1 = vm.call @hal.device.queue.alloca(%r4(!hal.device/0x0000012FC3906970), %i14(4294967295), %r1(null), %r8(!hal.fence/0x0000012FD021E9B0), %i1(0), %i0(48), %i2(3075), %i10(16))
[module.mutli_device_mul+00000116]    %r9 = vm.call @hal.fence.create(%r4(!hal.device/0x0000012FC3906970), %i1(0))
[module.mutli_device_mul+00000126]    vm.call.varadic @hal.device.queue.execute.indirect(%r4(!hal.device/0x0000012FC3906970), %i14(4294967295), %r8(!hal.fence/0x0000012FD021E9B0), %r9(!hal.fence/0x0000012FD021ED30), %r6(!hal.command_buffer/0x0000012FD092CE90), %r0(!hal.buffer/0x0000012FD092CFC0), %i8(0), %i10(16), %r1(!hal.buffer/0x0000012FD092D100), %i8(0), %i10(16))
[module.mutli_device_mul+00000154]    %r0 = vm.call @hal.fence.create(%r5(!hal.device/0x0000012FD026D920), %i1(0))
[module.mutli_device_mul+00000164]    %i3 = vm.call.varadic @hal.fence.await(%i16(4294967295), %r9(!hal.fence/0x0000012FD021ED30))
[module.mutli_device_mul+0000017A]    %r6 = vm.call @hal.device.queue.alloca(%r5(!hal.device/0x0000012FD026D920), %i14(4294967295), %r3(null), %r0(!hal.fence/0x0000012FD021EB70), %i1(0), %i0(48), %i2(3075), %i10(16))
[module.mutli_device_mul+00000196]    %i3 = vm.call.varadic @hal.fence.await(%i16(4294967295), %r0(!hal.fence/0x0000012FD021EB70))
[module.mutli_device_mul+000001AC]    %r8 = vm.call @hal.fence.create(%r5(!hal.device/0x0000012FD026D920), %i1(0))
[module.mutli_device_mul+000001BC]    %i3 = vm.call.varadic @hal.fence.await(%i16(4294967295), %r9(!hal.fence/0x0000012FD021ED30))
[module.mutli_device_mul+000001D2]    %r9 = vm.call @hal.device.queue.alloca(%r5(!hal.device/0x0000012FD026D920), %i14(4294967295), %r3(null), %r8(!hal.fence/0x0000012FD021E860), %i1(0), %i0(48), %i2(3075), %i12(64))
[module.mutli_device_mul+000001EE]    %i3 = vm.call.varadic @hal.fence.await(%i16(4294967295), %r8(!hal.fence/0x0000012FD021E860))
[module.mutli_device_mul+00000204]    %r0 = vm.call.varadic @hal.fence.join(%r0(!hal.fence/0x0000012FD021EB70), %r8(!hal.fence/0x0000012FD021E860))
[module.mutli_device_mul+00000218]    %r8 = vm.call @hal.fence.create(%r5(!hal.device/0x0000012FD026D920), %i1(0))
[module.mutli_device_mul+00000228]    vm.call.varadic @hal.device.queue.execute.indirect(%r5(!hal.device/0x0000012FD026D920), %i14(4294967295), %r0(!hal.fence/0x0000012FC38DD8E0), %r8(!hal.fence/0x0000012FD021EB70), %r7(!hal.command_buffer/0x0000012FD0DD1F80), %r1(!hal.buffer/0x0000012FD092D100), %i8(0), %i10(16), %r6(!hal.buffer/0x0000012FD0224C40), %i8(0), %i10(16), %r9(!hal.buffer/0x0000012FD0223C80), %i8(0), %i12(64))
[module.mutli_device_mul+0000025C]    %r0 = vm.call @hal.fence.create(%r5(!hal.device/0x0000012FD026D920), %i1(0))
Assertion failed: !!(iree_hal_resource_is(base_value, &iree_hal_hip_buffer_vtable)), file C:\develop\iree-two-devices\third_party\iree\runtime\src\iree\hal\drivers\hip\hip_buffer.c, line 34
[module.mutli_device_mul+0000026C]    %i3 = vm.call.varadic @hal.fence.await(%i16(4294967295), %r8(!hal.fence/0x0000012FD021EB70))

Steps to reproduce your issue

The MLIR file is in the repo:

// RUN: --iree-hal-target-device=device_b=local[1] \

What component(s) does this issue relate to?

Runtime

Version information

Git hash: 63cdc7d

Additional context

Compiling for two CPU devices works fine.

@dezhiAmd dezhiAmd added the bug 🐞 Something isn't working label Dec 13, 2024
@daveliddell
Contributor

In our understanding of the code, it's trying to copy a CPU device buffer to a HIP device buffer using a copy_buffer command on the HIP device queue.

The command buffer being used is set up in module.__mutli_device_mul_memoize_apply_0, line +008C.
The source (data) buffer is created at module.mutli_device_mul+000000FA.
The command buffer (and source buffer) is enqueued at module.mutli_device_mul+00000228.
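
The three trace locations above can be cross-checked mechanically. The following is a minimal sketch, not part of IREE: `find_cross_device_buffers` and its regular expressions are hypothetical helpers that assume the `--trace_execution` line format shown in this issue. It records which device each `hal.device.queue.alloca` result register was allocated on, then flags buffers passed to `hal.device.queue.execute.indirect` on a different device. The sample lines are copied from the trace above.

```python
import re

# "[module.fn+OFFSET]" prefix printed at the start of every traced op.
LOC = re.compile(r"^\[([^\]+]+)\+([0-9A-Fa-f]+)\]")
# "%rN(!hal.<kind>/0xADDR)" operands, e.g. %r4(!hal.device/0x...).
OPERAND = re.compile(r"%(r\d+)\(!hal\.(\w+)/(0x[0-9A-Fa-f]+)\)")
# "%rN = ..." result assignment directly after the location prefix.
ASSIGN = re.compile(r"\]\s+%(r\d+) = ")


def find_cross_device_buffers(trace_lines):
    """Return (location, buffer, alloc_device, queue_device) tuples for
    buffers submitted on a device other than the one that allocated them."""
    owner = {}  # (function, register) -> address of the allocating device
    findings = []
    for line in trace_lines:
        loc = LOC.match(line)
        if not loc:
            continue  # Skip non-trace lines such as "EXEC @..." or asserts.
        fn, offset = loc.groups()
        operands = OPERAND.findall(line)
        devices = [addr for _, kind, addr in operands if kind == "device"]

        if "@hal.device.queue.execute.indirect(" in line and devices:
            queue_device = devices[0]
            for reg, kind, addr in operands:
                alloc = owner.get((fn, reg))
                if kind == "buffer" and alloc and alloc != queue_device:
                    findings.append((f"{fn}+{offset}", addr, alloc, queue_device))

        assigned = ASSIGN.search(line)
        if assigned:
            key = (fn, assigned.group(1))
            if "@hal.device.queue.alloca(" in line and devices:
                owner[key] = devices[0]
            else:
                owner.pop(key, None)  # Register reused for something else.
    return findings


SAMPLE = [
    "[module.mutli_device_mul+000000FA]    %r1 = vm.call @hal.device.queue.alloca(%r4(!hal.device/0x0000012FC3906970), %i14(4294967295), %r1(null), %r8(!hal.fence/0x0000012FD021E9B0), %i1(0), %i0(48), %i2(3075), %i10(16))",
    "[module.mutli_device_mul+0000017A]    %r6 = vm.call @hal.device.queue.alloca(%r5(!hal.device/0x0000012FD026D920), %i14(4294967295), %r3(null), %r0(!hal.fence/0x0000012FD021EB70), %i1(0), %i0(48), %i2(3075), %i10(16))",
    "[module.mutli_device_mul+00000228]    vm.call.varadic @hal.device.queue.execute.indirect(%r5(!hal.device/0x0000012FD026D920), %i14(4294967295), %r0(!hal.fence/0x0000012FC38DD8E0), %r8(!hal.fence/0x0000012FD021EB70), %r7(!hal.command_buffer/0x0000012FD0DD1F80), %r1(!hal.buffer/0x0000012FD092D100), %i8(0), %i10(16), %r6(!hal.buffer/0x0000012FD0224C40), %i8(0), %i10(16), %r9(!hal.buffer/0x0000012FD0223C80), %i8(0), %i12(64))",
]

for finding in find_cross_device_buffers(SAMPLE):
    print(finding)
```

On these sample lines the sketch reports a single buffer, 0x0000012FD092D100: it was allocated on device 0x0000012FC3906970 at +000000FA but submitted to the queue of device 0x0000012FD026D920 at +00000228. Buffers whose origin is not an alloca in the scanned lines are ignored rather than guessed at.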

During execution of the copy_buffer command on the HIP device, the device's copy_buffer implementation discovers that the source buffer isn't a HIP device buffer and asserts out.
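
As a toy model of that failure mode (a sketch only, with hypothetical class and function names that do not correspond to IREE's C API), a device-specific copy that first casts its operands to its own buffer type will reject a buffer owned by another device before any data moves:

```python
class Buffer:
    """Stand-in for a generic HAL buffer (hypothetical, not IREE's API)."""

    def __init__(self, nbytes):
        self.data = bytearray(nbytes)


class HostBuffer(Buffer):
    """Buffer allocated by the CPU device in this toy model."""


class HipBuffer(Buffer):
    """Buffer allocated by the HIP device in this toy model."""


def hip_buffer_cast(buffer):
    # Models the idea behind the failing check: the HIP code path only
    # accepts buffers of its own type and asserts otherwise.
    assert isinstance(buffer, HipBuffer), "buffer is not a HIP buffer"
    return buffer


def hip_copy_buffer(source, target, length):
    # Both operands are cast before the copy, so a host source buffer
    # trips the assertion even though the target is a valid HIP buffer.
    src = hip_buffer_cast(source)
    dst = hip_buffer_cast(target)
    dst.data[:length] = src.data[:length]


# HIP-to-HIP copies pass the check.
a, b = HipBuffer(16), HipBuffer(16)
a.data[:4] = b"\x01\x02\x03\x04"
hip_copy_buffer(a, b, 4)

# A host source buffer is rejected by the cast.
try:
    hip_copy_buffer(HostBuffer(16), HipBuffer(16), 16)
except AssertionError as error:
    print("copy rejected:", error)
```

In this model the check is on the buffer's type, not its contents, which matches the observation that the assert fires as soon as the HIP copy path sees the CPU-allocated source buffer.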

The problem seems to be that the compiler doesn't know how to handle the situation where a buffer from one device needs to be transferred to another device and improperly codes the transfer as a copy_buffer.

@benvanik
Collaborator

Copy buffer is what should be used, but there's an epic feature sprint's worth of work or more (1+ quarters) to make buffer allocation, import/export, and lifetime management capable of supporting these kinds of fine-grained cross-device transfers. Heterogeneous devices with discrete memory are not expected to work today. There's shorter-term work that could make heterogeneous devices that share the host address space and do not require registration (as most devices do) work, by having a shared allocator via iree_hal_device_replace_allocator and implementing dyn_cast for host buffers, but there are gotchas there. For at least the next quarter the focus is on homogeneous devices, as there's a significant amount of work across the stack to make that efficient, and it acts as the foundation for future heterogeneous support.

@benvanik
Collaborator

(if you're interested in this area we should chat about what you're looking to do - there may be easier options than multi device!)

@daveliddell
Contributor

Thanks for the feedback, Ben! I take it that you already have a plan for how copy_buffer should do cross-device transfers in general. Yes, I think we probably need to sync up soon. :-)

@dezhiAmd
Author

dezhiAmd commented Dec 19, 2024

focus is on homogeneous devices

Here is an issue reported regarding two GPU devices: #19507

But this issue appears to be fixed when I use git hash ce659488d2ed07cc944d05e096b1f91b93f28709
