ONNX "resize" op test failures #17345

Open
ScottTodd opened this issue May 10, 2024 · 7 comments
Labels
bug 🐞 Something isn't working codegen/spirv SPIR-V code generation compiler backend integrations/onnx ONNX integration work integrations/pytorch PyTorch integration work

Comments

@ScottTodd
Member

What happened?

#17330 updates our LLVM and torch-mlir commits, pulling in llvm/torch-mlir#3013. Some tests are newly passing, many are still failing (either in the compiler or with numerical mismatches at runtime), and a few are hanging on certain platforms.

At least CUDA is hanging on test_resize_downsample_scales_linear:
https://github.com/iree-org/iree/actions/runs/9034897378/job/24828864270?pr=17330#step:9:1813
I can't reproduce that on Windows though.

Steps to reproduce your issue

Generally follow the instructions at https://github.com/nod-ai/SHARK-TestSuite/tree/main/iree_tests and pull the config files from this repo.

For example, to run on CUDA:

pytest onnx/ -k test_resize -rA \
  --config-files=D:\dev\projects\iree\build_tools\pkgci\external_test_suite\onnx_gpu_cuda.json \
  --ignore-xfails

or Vulkan:

pytest onnx/ -k test_resize -rA \
  --config-files=D:\dev\projects\iree\build_tools\pkgci\external_test_suite\onnx_gpu_vulkan.json \
  --ignore-xfails
Config Logs
CPU https://gist.github.com/ScottTodd/0778165b2d31a54bfefbb9fa2b2662d6
CUDA https://gist.github.com/ScottTodd/dd34be6577da489f3d5b6b0a0a65ed0d
Vulkan https://gist.github.com/ScottTodd/b2f509585bee804ebd900e2144258241

Note that Vulkan fails to compile with: model.mlir:4:10: error: failed to legalize operation 'arith.fptosi' that was explicitly marked illegal

What component(s) does this issue relate to?

Frontends, Compiler, Runtime

Version information

No response

Additional context

No response

@bjacob
Contributor

bjacob commented May 10, 2024

FYI @AmosLewis this is the reason why llvm/torch-mlir#3013 was ultimately dropped from the integrate #17330.

@AmosLewis
Contributor

> FYI @AmosLewis this is the reason why llvm/torch-mlir#3013 was ultimately dropped from the integrate #17330.

Will you start a new PR to bump it next? Do you have any idea whether it is a torch-mlir bug or an IREE bug?

@ScottTodd
Member Author

  • I suspect the Vulkan "failed to legalize operation 'arith.fptosi'" error comes from upstream MLIR SPIR-V (a missing lowering).
  • The numerical errors in tests could be issues in the torch-mlir lowerings.
  • CUDA hang: no idea. I couldn't get much from the CI logs and couldn't reproduce it on Windows. If compilation succeeded and the hang happened at runtime, it may be a miscompile (torch-mlir lowering) or a runtime issue (IREE CUDA HAL).

@AmosLewis
Contributor

See nod-ai/SHARK-ModelDev#616; the model and the failing resize MLIR are listed in its description.

@bjacob
Contributor

bjacob commented May 10, 2024

> Will you start a new PR to bump it next?

I don't plan to do it myself. We have an integration rotation schedule and the integrates of this week were already done out-of-schedule :-)

@ScottTodd
Member Author

We have a separate rotation for updating torch-mlir (in fact, @AmosLewis is up for next week 🤔). They are usually updated separately but needed to be updated together in this case.

@AmosLewis
Contributor

#17358

AmosLewis added a commit that referenced this issue May 14, 2024
Solve iree issue: ONNX "resize" op test failures #17345
bangtianliu pushed a commit to bangtianliu/iree that referenced this issue Jun 5, 2024
LLITCHEV pushed a commit to LLITCHEV/iree that referenced this issue Jul 30, 2024
Solve iree issue: ONNX "resize" op test failures iree-org#17345

Signed-off-by: Lubo Litchev
@ScottTodd ScottTodd added the integrations/onnx ONNX integration work label Aug 19, 2024