[numeric][cpu]: numeric error for ONNX Gather operator element at index 200 (0.420379) does not match the expected (0.642927); #18273

Closed
pdhirajkumarprasad opened this issue Aug 19, 2024 · 7 comments
Assignees
Labels
bug 🐞 Something isn't working hal/cpu Runtime Host/CPU-based HAL backend integrations/onnx ONNX integration work

Comments

@pdhirajkumarprasad

pdhirajkumarprasad commented Aug 19, 2024

What happened?

For the given IR

module {
  func.func @main(%arg0: !torch.vtensor<[10,200],f32>, %arg1: !torch.vtensor<[35,1],si64>) -> !torch.vtensor<[35,1,200],f32> attributes {torch.onnx_meta.ir_version = 10 : si64, torch.onnx_meta.opset_version = 21 : si64, torch.onnx_meta.producer_name = "", torch.onnx_meta.producer_version = ""} {
    %none = torch.constant.none
    %0 = torch.operator "onnx.Gather"(%arg0, %arg1) : (!torch.vtensor<[10,200],f32>, !torch.vtensor<[35,1],si64>) -> !torch.vtensor<[35,1,200],f32> 
    return %0 : !torch.vtensor<[35,1,200],f32>
  }
}

we are seeing a numeric mismatch.
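For reference, the expected result of the op above can be sketched in plain Python. This is a minimal illustration of ONNX Gather with its default axis=0, not IREE's lowering; the helper name `gather_axis0` and the data and index values are made up for the example, and only the shapes (10x200 data, 35x1 indices) come from the IR.

```python
# Reference semantics of ONNX Gather with the default axis=0, written in
# plain Python so it runs without third-party packages.

def gather_axis0(data, indices):
    """Replace every index in the nested list `indices` by a row of `data`.

    Negative indices count from the end, as ONNX Gather allows.
    """
    if isinstance(indices, list):
        return [gather_axis0(data, i) for i in indices]
    return list(data[indices])

# Shapes from the issue: data is 10x200, indices is 35x1.
# The values themselves are hypothetical.
data = [[row * 1000 + col for col in range(200)] for row in range(10)]
indices = [[(3 * i) % 10] for i in range(35)]

out = gather_axis0(data, indices)  # nested list of shape 35x1x200

# Row-major flat index 200 of a 35x1x200 tensor is element [1][0][0],
# i.e. the first value of the row selected by the second index.
flat = [v for block in out for row in block for v in row]
```

Because each gathered row holds 200 values, the flat index 200 named in the issue title is the first element produced by the second entry of the index tensor.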

IREE version:
IREE compiler version 20240819.990 @ aeda149
LLVM version 20.0.0git

Steps to reproduce your issue

Command to reproduce the issue:

iree-compile model.torch_onnx.mlir --iree-hal-target-backends=llvm-cpu -o out.vmfb --iree-input-demote-i64-to-i32
iree-run-module --module=out.vmfb --device="local-task" --input="[email protected]" --input="[email protected]"  --expected_output="35x1x200xf32=@golden_output.0.bin"

This issue is caused by the presence of --iree-input-demote-i64-to-i32. If I remove this flag, the output matches 100%.
golden_output.0.bin.txt
input.0.bin.txt
input.1.bin.txt

What component(s) does this issue relate to?

Runtime

Version information

No response

Additional context

No response

@MaheshRavishankar
Contributor

There might be a codegen issue here, but the fact that not dropping to i32 makes the error go away suggests this is not a codegen issue. In general it is not "safe" to truncate fully like this; it can be done only if we know the inputs are within the 32-bit range.

@lialan start by seeing if there is any IR difference between the 32-bit and 64-bit compilation paths for this example.
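The range condition mentioned above can be checked on the host before choosing to demote. The following is a minimal sketch using only the Python standard library; `fits_in_i32` and `truncate_to_i32` are hypothetical helper names, not IREE APIs, and the wrap-around shown is plain two's-complement truncation, assumed here to be what a full i64 to i32 truncation amounts to.

```python
import struct

I32_MIN, I32_MAX = -2**31, 2**31 - 1

def fits_in_i32(values):
    """True if every value survives an i64 -> i32 truncation unchanged."""
    return all(I32_MIN <= v <= I32_MAX for v in values)

def truncate_to_i32(value):
    """Keep the low 32 bits and reinterpret them as a signed 32-bit integer."""
    low = value & 0xFFFFFFFF
    return struct.unpack("<i", struct.pack("<I", low))[0]

# Gather indices into a 10-row tensor are tiny, so they fit easily.
small_indices_ok = fits_in_i32([0, 9, -1])

# A value just outside the range wraps to a different number.
wrapped = truncate_to_i32(2**31)
```

Indices into a 10x200 tensor are far inside the 32-bit range, so a value-range overflow alone would not be expected to explain this particular mismatch.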

@lialan lialan moved this to Triage in IREE Compilation Errors Aug 20, 2024
@lialan lialan moved this from Triage to Todo in IREE Compilation Errors Aug 20, 2024
@lialan
Contributor

lialan commented Aug 21, 2024

Both paths generate the same LLVM IR structure, with one exception. In the demote path, the generated IR contains an i32 load:

%60 = llvm.getelementptr %31[%59] : (!llvm.ptr, i64) -> !llvm.ptr, i32

while in the normal path:

%59 = llvm.getelementptr %30[%58] : (!llvm.ptr, i64) -> !llvm.ptr, i64

In both paths, %31/%30 come directly from the ABI:

    %29 = llvm.extractvalue %28[10] : !llvm.struct<"iree_hal_executable_dispatch_state_v0_t", (i32, i32, i16, i16, i32, i32, i16, i8, i8, ptr, ptr, ptr)>
    %30 = llvm.getelementptr %29[1] : (!llvm.ptr) -> !llvm.ptr, !llvm.ptr
    %31 = llvm.load %30 : !llvm.ptr -> !llvm.ptr

vs

    %28 = llvm.extractvalue %27[10] : !llvm.struct<"iree_hal_executable_dispatch_state_v0_t", (i32, i32, i16, i16, i32, i32, i16, i8, i8, ptr, ptr, ptr)>
    %29 = llvm.getelementptr %28[1] : (!llvm.ptr) -> !llvm.ptr, !llvm.ptr
    %30 = llvm.load %29 : !llvm.ptr -> !llvm.ptr

I suspect this caused the issue.
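To illustrate why the element type on the getelementptr matters, here is a small sketch of what happens if a buffer that physically holds 64-bit integers is addressed with a 32-bit element size. This is only an assumption about one possible failure mode; the thread does not confirm that the index buffer passed at runtime held 64-bit data. The index values and the `load` helper are hypothetical, and little-endian layout is assumed.

```python
import struct

# Hypothetical index values, stored as little-endian 64-bit integers.
indices_i64 = [7, 3, 9, 1]
raw = struct.pack("<4q", *indices_i64)  # 4 elements * 8 bytes = 32 bytes

def load(raw, element_index, fmt):
    """Mimic getelementptr + load: byte offset = element_index * element size."""
    size = struct.calcsize(fmt)
    return struct.unpack_from(fmt, raw, element_index * size)[0]

# Element type i64: stride of 8 bytes, values come back as stored.
as_i64 = [load(raw, k, "<q") for k in range(4)]

# Element type i32: stride of 4 bytes, so odd elements hit the zero
# high words of the 64-bit values.
as_i32 = [load(raw, k, "<i") for k in range(4)]
```

Under that assumption the first index would still read correctly and the second would read as 0, which would be consistent with the first reported mismatch appearing at flat output index 200. Comparing the byte size of input.1.bin against 35 elements of 4 versus 8 bytes would be one way to test this.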

@MaheshRavishankar
Contributor

Can you post the two IRs?

@lialan
Contributor

lialan commented Aug 22, 2024

Attaching dumped IR files.

This one enables --iree-input-demote-i64-to-i32, which causes the value mismatch:
demote_to_i32.mlir.txt

This one does not have the option and works fine:
no_demote_to_i32.mlir.txt

@MaheshRavishankar
Contributor

Looking at the IR, it doesn't look like a compilation failure. I don't know what input values you are sending in here. You should make sure that it is safe to demote from i64 to i32. At this point, I don't really see a codegen issue.

@MaheshRavishankar
Contributor

@pdhirajkumarprasad up signalling. This doesn't look like a compiler error. Could we close it?

@pdhirajkumarprasad
Author

The model works fine without the flag.
