[numeric][cpu]: numeric error for ONNX Gather operator element at index 200 (0.420379) does not match the expected (0.642927); #18273

pdhirajkumarprasad · 2024-08-19T06:19:44Z

What happened?

For the given IR

module {
  func.func @main(%arg0: !torch.vtensor<[10,200],f32>, %arg1: !torch.vtensor<[35,1],si64>) -> !torch.vtensor<[35,1,200],f32> attributes {torch.onnx_meta.ir_version = 10 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "", torch.onnx_meta.producer_version = ""} {
    %none = torch.constant.none
    %0 = torch.operator "onnx.Gather"(%arg0, %arg1) : (!torch.vtensor<[10,200],f32>, !torch.vtensor<[35,1],si64>) -> !torch.vtensor<[35,1,200],f32> 
    return %0 : !torch.vtensor<[35,1,200],f32>
  }
}
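For reference, the intended result of this Gather can be sketched in plain Python. This is a minimal sketch, assuming the default `axis = 0` (the IR above sets no axis attribute); `gather_axis0` is a hypothetical helper name, not an IREE or ONNX API, and the tiny shapes are illustrative only.

```python
def gather_axis0(data, indices):
    """Reference Gather along axis 0: out[i][j] = data[indices[i][j]].

    data    : list of rows, e.g. shape [10, 200] in the issue
    indices : list of lists of ints, e.g. shape [35, 1] in the issue
    returns : nested list of shape [len(indices), len(indices[0]), row_width]
    """
    rows = len(data)
    out = []
    for index_row in indices:
        out_row = []
        for idx in index_row:
            # ONNX Gather accepts indices in [-rows, rows - 1].
            if not -rows <= idx < rows:
                raise IndexError(f"index {idx} out of range for {rows} rows")
            out_row.append(list(data[idx]))
        out.append(out_row)
    return out


# Illustrative 3x2 table gathered with a 2x1 index tensor.
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(gather_axis0(table, [[2], [0]]))  # [[[5.0, 6.0]], [[1.0, 2.0]]]
```

Each output row is a copy of one input row, so a wrong value at flat output index 200 (row width 200) points at the second gathered row.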

we are seeing a numeric mismatch.

IREE version:
IREE compiler version 20240819.990 @ aeda149
LLVM version 20.0.0git

Steps to reproduce your issue

Command to reproduce the issue:

iree-compile model.torch_onnx.mlir --iree-hal-target-backends=llvm-cpu -o out.vmfb --iree-input-demote-i64-to-i32
iree-run-module --module=out.vmfb --device="local-task" --input="[email protected]" --input="[email protected]"  --expected_output="35x1x200xf32=@golden_output.0.bin"

This issue is caused by the presence of --iree-input-demote-i64-to-i32. If I remove this flag, the output matches 100%.
golden_output.0.bin.txt
input.0.bin.txt
input.1.bin.txt

What component(s) does this issue relate to?

Runtime

Version information

No response

Additional context

No response


MaheshRavishankar · 2024-08-20T17:28:12Z

There might be a codegen issue here, but the fact that not dropping to i32 makes the error go away suggests this is not a codegen issue. In general it is not "safe" to truncate fully like this, but it can be done if we know the inputs are within the 32-bit range.

@lialan start by seeing if there is any IR difference between the 32-bit and 64-bit compilation paths for this example.
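To make the "within the 32-bit range" condition concrete, here is a minimal sketch of what truncating an i64 to i32 does to a value. The helper names `fits_in_i32` and `truncate_to_i32` and the sample indices are hypothetical; this only illustrates the arithmetic and does not show how IREE implements the demotion.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def fits_in_i32(values):
    """Return True if every integer survives an i64 -> i32 truncation unchanged."""
    return all(INT32_MIN <= v <= INT32_MAX for v in values)


def truncate_to_i32(value):
    """Keep the low 32 bits and reinterpret them as a signed 32-bit integer."""
    low = value & 0xFFFFFFFF
    return low - 2**32 if low >= 2**31 else low


gather_indices = [0, 3, 9]           # hypothetical indices into a 10-row table
print(fits_in_i32(gather_indices))   # True: demotion keeps these values intact
print(truncate_to_i32(2**32 + 5))    # 5: the high bits are silently dropped
```

Indices into a 10-row table fit easily in 32 bits, so for this model the value range alone would not explain a mismatch.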

lialan · 2024-08-21T01:33:45Z

Both paths generate LLVM IR with the same structure, with one exception. In the demote path, the generated IR contains a load of i32:

%60 = llvm.getelementptr %31[%59] : (!llvm.ptr, i64) -> !llvm.ptr, i32

while in the normal path:

%59 = llvm.getelementptr %30[%58] : (!llvm.ptr, i64) -> !llvm.ptr, i64

In both paths, the base pointer (%31 in the demote path, %30 in the normal path) comes directly from the ABI:

    %29 = llvm.extractvalue %28[10] : !llvm.struct<"iree_hal_executable_dispatch_state_v0_t", (i32, i32, i16, i16, i32, i32, i16, i8, i8, ptr, ptr, ptr)>
    %30 = llvm.getelementptr %29[1] : (!llvm.ptr) -> !llvm.ptr, !llvm.ptr
    %31 = llvm.load %30 : !llvm.ptr -> !llvm.ptr

vs

    %28 = llvm.extractvalue %27[10] : !llvm.struct<"iree_hal_executable_dispatch_state_v0_t", (i32, i32, i16, i16, i32, i32, i16, i8, i8, ptr, ptr, ptr)>
    %29 = llvm.getelementptr %28[1] : (!llvm.ptr) -> !llvm.ptr, !llvm.ptr
    %30 = llvm.load %29 : !llvm.ptr -> !llvm.ptr

I suspect this difference caused the issue.
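One way such an element-type difference could show up at runtime can be sketched as follows. This assumes, hypothetically, that the index buffer handed to the module still holds 64-bit little-endian values while the demoted code reads 32-bit elements; the thread does not confirm that this is what happened, and the index values below are made up.

```python
import struct

indices = [7, 2, 9, 4]                       # hypothetical gather indices
raw = struct.pack("<4q", *indices)           # buffer written as little-endian i64


def byte_offset(element_index, element_size):
    """Byte offset a getelementptr computes for a given element type size."""
    return element_index * element_size


# Element 1 lives at byte 8 for i64, but an i32 read looks at byte 4.
print(byte_offset(1, 8), byte_offset(1, 4))  # 8 4

as_i64 = list(struct.unpack("<4q", raw))
as_i32 = list(struct.unpack("<4i", raw[:16]))  # first four i32 elements
print(as_i64)  # [7, 2, 9, 4]
print(as_i32)  # [7, 0, 2, 0]
```

Under that assumption the first index would still read correctly and the second would read as the high word of the first. With an output of shape 35x1x200, flat index 200 is the first element of the second gathered row, which is where the reported mismatch begins.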

MaheshRavishankar · 2024-08-21T03:05:11Z

Can you post the two IRs?

lialan · 2024-08-22T03:33:33Z

Attaching dumped IR files.

This one enables --iree-input-demote-i64-to-i32, which causes the value mismatch:
demote_to_i32.mlir.txt

This one does not have the option and works fine:
no_demote_to_i32.mlir.txt

MaheshRavishankar · 2024-08-22T16:48:51Z

Looking at the IR, it doesn't look like a compilation failure. I don't know what input values you are sending in here. You should make sure that it is safe to demote from i64 to i32. At this point, I don't really see a codegen issue.
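A quick way to check the input values is to inspect the raw index file directly. The sketch below assumes the file is a headerless dump of little-endian integers, which the thread does not state; `describe_index_file` is a hypothetical helper, and the demo writes its own temporary file rather than touching the attached inputs.

```python
import os
import struct
import tempfile


def describe_index_file(path, element_count):
    """Guess the element width from the file size and report the value range.

    Returns (bit_width, min_value, max_value).
    """
    size = os.path.getsize(path)
    if size == element_count * 8:
        fmt, width = "q", 64
    elif size == element_count * 4:
        fmt, width = "i", 32
    else:
        raise ValueError(
            f"{path}: {size} bytes matches neither i32 nor i64 "
            f"for {element_count} elements"
        )
    with open(path, "rb") as f:
        values = struct.unpack(f"<{element_count}{fmt}", f.read())
    return width, min(values), max(values)


# Demo with a synthetic 35-element i64 file (values are made up).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo_indices.bin")
    with open(demo_path, "wb") as f:
        f.write(struct.pack("<35q", *[i % 10 for i in range(35)]))
    print(describe_index_file(demo_path, 35))  # (64, 0, 9)
```

For the 35x1 index tensor in this issue, the call would be `describe_index_file("input.1.bin", 35)`: a 280-byte file would indicate 64-bit elements and a 140-byte file 32-bit elements.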

MaheshRavishankar · 2024-08-28T06:01:25Z

@pdhirajkumarprasad up signalling. This doesn't look like a compiler error. Could we close it?

pdhirajkumarprasad · 2024-09-04T04:42:37Z

The model works fine without the flag.

pdhirajkumarprasad added the bug 🐞 Something isn't working label Aug 19, 2024

ScottTodd added the integrations/onnx ONNX integration work label Aug 19, 2024

pdhirajkumarprasad mentioned this issue Aug 20, 2024

MiGraphx CPU/GPU Status Tracking nod-ai/SHARK-TestSuite#325

Open

nirvedhmeshram assigned lialan and MaheshRavishankar Aug 20, 2024

lialan added the hal/cpu Runtime Host/CPU-based HAL backend label Aug 20, 2024

lialan added this to IREE Compilation Errors Aug 20, 2024

MaheshRavishankar unassigned lialan Aug 20, 2024

lialan moved this to Triage in IREE Compilation Errors Aug 20, 2024

lialan moved this from Triage to Todo in IREE Compilation Errors Aug 20, 2024

MaheshRavishankar assigned lialan and unassigned MaheshRavishankar Aug 22, 2024

pdhirajkumarprasad closed this as completed Sep 4, 2024

github-project-automation bot moved this from Todo to Done in IREE Compilation Errors Sep 4, 2024
