Deflake some pkgci jobs. #19472

ScottTodd · 2024-12-12T00:02:09Z

Increase real weight test timeouts from 4 minutes to 10 minutes to work around https://github.com/iree-org/iree/actions/runs/12281522213/job/34271200734#step:9:1461

 ============================== slowest durations ===============================
240.00s call     SHARK-TestSuite/iree_tests/sharktank/punet/int8/test_cases.json::sdxl_unet_int8_export.mlir::gpu_rocm::real_weights
31.44s call     SHARK-TestSuite/iree_tests/sharktank/punet/fp16/test_cases.json::sdxl_unet_fp16_export.mlir::gpu_rocm::real_weights
11.22s call     SHARK-TestSuite/iree_tests/sharktank/llama/open-llama-3b-v2-f16/test_cases.json::open-llama-3b-v2-f16.mlirbc::gpu_rocm::real_weights_prefill
0.08s call     SHARK-TestSuite/iree_tests/pytorch/models/resnet50/test_cases.json::resnet50.mlirbc::gpu_rocm::real_weights
0.07s call     SHARK-TestSuite/iree_tests/pytorch/models/opt-125M/test_cases.json::opt-125M.mlirbc::gpu_rocm::real_weights

(10 durations < 0.005s hidden.  Use -vv to show these durations.)
=========================== short test summary info ============================
PASSED SHARK-TestSuite/iree_tests/sharktank/llama/open-llama-3b-v2-f16/test_cases.json::open-llama-3b-v2-f16.mlirbc::gpu_rocm::real_weights_prefill
PASSED SHARK-TestSuite/iree_tests/sharktank/punet/fp16/test_cases.json::sdxl_unet_fp16_export.mlir::gpu_rocm::real_weights
XFAIL SHARK-TestSuite/iree_tests/pytorch/models/opt-125M/test_cases.json::opt-125M.mlirbc::gpu_rocm::real_weights - Expected compilation to fail (included in 'expected_compile_failures')
XFAIL SHARK-TestSuite/iree_tests/pytorch/models/resnet50/test_cases.json::resnet50.mlirbc::gpu_rocm::real_weights - Expected compilation to fail (included in 'expected_compile_failures')
FAILED SHARK-TestSuite/iree_tests/sharktank/punet/int8/test_cases.json::sdxl_unet_int8_export.mlir::gpu_rocm::real_weights - Failed: Timeout >240.0s
======= 1 failed, 2 passed, 2 deselected, 2 xfailed in 282.99s (0:04:42) =======
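To judge how much headroom a new timeout leaves, the "slowest durations" table can be parsed and compared against the limit. The following is a minimal sketch, not part of this PR: the helper names `parse_durations` and `near_timeout` and the 80% margin are assumptions made for illustration, and the report rows are copied from the log above.

```python
import re

# Matches pytest "slowest durations" rows such as "240.00s call     path::to::test".
DURATION_ROW = re.compile(r"^\s*(\d+(?:\.\d+)?)s\s+(setup|call|teardown)\s+(\S+)\s*$")


def parse_durations(report):
    """Return (seconds, phase, test_id) for each duration row in the report."""
    rows = []
    for line in report.splitlines():
        match = DURATION_ROW.match(line)
        if match:
            rows.append((float(match.group(1)), match.group(2), match.group(3)))
    return rows


def near_timeout(rows, timeout_s, margin=0.8):
    """Return test ids whose duration is at least margin * timeout_s.

    The 0.8 default is an arbitrary illustrative threshold (hypothetical).
    """
    return [test_id for seconds, _phase, test_id in rows if seconds >= margin * timeout_s]


# Rows copied from the CI log above.
REPORT = """\
240.00s call     SHARK-TestSuite/iree_tests/sharktank/punet/int8/test_cases.json::sdxl_unet_int8_export.mlir::gpu_rocm::real_weights
31.44s call     SHARK-TestSuite/iree_tests/sharktank/punet/fp16/test_cases.json::sdxl_unet_fp16_export.mlir::gpu_rocm::real_weights
11.22s call     SHARK-TestSuite/iree_tests/sharktank/llama/open-llama-3b-v2-f16/test_cases.json::open-llama-3b-v2-f16.mlirbc::gpu_rocm::real_weights_prefill
"""

rows = parse_durations(REPORT)
print(near_timeout(rows, timeout_s=240.0))  # old 4 minute limit
print(near_timeout(rows, timeout_s=600.0))  # new 10 minute limit
```

With the old 240 s limit only the int8 punet test is flagged; with the 600 s limit nothing in this run is. Since the 240.00 s figure is the timeout itself, the log does not show how long that test would actually take to finish.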

Skip flaky test_gridsample_zeros_padding op test to work around https://github.com/iree-org/iree/actions/runs/12286576807/job/34287344921#step:8:59

 _ IREE compile and run: test_gridsample_zeros_padding::model.mlir::model.mlir::cpu_llvm_sync _
[gw3] linux -- Python 3.11.10 /home/runner/work/iree/iree/venv/bin/python
Error invoking iree-run-module
Error code: 1
Stderr diagnostics:

Stdout diagnostics:
EXEC @test_gridsample_zeros_padding
[FAILED] result[0]: element at index 3 (2.80544E+13) does not match the expected (0); expected that the view is equal to contents of a view of 1x1x2x4xf32
  expected:
1x1x2x4xf32=[[[0 0 1.7 0][0 1.7 0 0]]]
  actual:
1x1x2x4xf32=[[[0 0 1.7 2.80544E+13][2.80544E+13 1.7 0 2.80544E+13]]]

and https://github.com/iree-org/iree/actions/runs/12285879922/job/34285283119#step:8:51

_ IREE compile and run: test_gridsample_zeros_padding::model.mlir::model.mlir::cpu_llvm_sync _
[gw3] linux -- Python 3.11.11 /home/runner/work/iree/iree/venv/bin/python
Error invoking iree-run-module
Error code: 1
Stderr diagnostics:

Stdout diagnostics:
EXEC @test_gridsample_zeros_padding
[FAILED] result[0]: element at index 3 (39529.7) does not match the expected (0); expected that the view is equal to contents of a view of 1x1x2x4xf32
  expected:
1x1x2x4xf32=[[[0 0 1.7 0][0 1.7 0 0]]]
  actual:
1x1x2x4xf32=[[[0 0 1.7 39529.7][39529.7 1.7 0 39529.7]]]

(This test seems to be failing consistently as of ea9176a, but with different outputs on each run. We could either mark it as failing or skip it.)
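The two failing runs can be compared element by element. The sketch below is only an illustration: the helper `mismatched_indices` and its tolerance are assumptions, and the values are the flattened 1x1x2x4 tensors copied from the two logs above.

```python
# Flattened 1x1x2x4xf32 values copied from the two CI logs above.
EXPECTED = [0.0, 0.0, 1.7, 0.0, 0.0, 1.7, 0.0, 0.0]
RUN_A = [0.0, 0.0, 1.7, 2.80544e13, 2.80544e13, 1.7, 0.0, 2.80544e13]
RUN_B = [0.0, 0.0, 1.7, 39529.7, 39529.7, 1.7, 0.0, 39529.7]


def mismatched_indices(expected, actual, tol=1e-6):
    """Return flat indices where actual differs from expected by more than tol."""
    return [i for i, (e, a) in enumerate(zip(expected, actual)) if abs(e - a) > tol]


bad_a = mismatched_indices(EXPECTED, RUN_A)
bad_b = mismatched_indices(EXPECTED, RUN_B)

print(bad_a, bad_b)
# Check whether both runs fail at the same positions.
print(bad_a == bad_b)
# Check whether every failing position was expected to be zero.
print(all(EXPECTED[i] == 0.0 for i in bad_a))
```

Both runs mismatch at the same three flat indices, all of which are expected to be 0, while the wrong value changes between runs. That pattern is consistent with those elements not being reliably filled with the zero padding value, though the logs alone do not confirm the cause.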

amd-chrissosa

Thanks for doing this - do we usually file bugs to re-enable skipped tests after we skip them?

ScottTodd · 2024-12-12T15:56:34Z

> Thanks for doing this - do we usually file bugs to re-enable skipped tests after we skip them?

There are a few such bugs but they can get lost in the noise. When we do file a bug, we leave a comment like

# TODO(#1234): re-enable when this test isn't flaky

The JSON file here also isn't a great place for comments about individual tests though, since it is partly auto-generated and there are just so many test cases.
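One way to keep such tracking comments from getting lost is to scan for them periodically. This is a rough sketch, not an existing IREE tool: the function name `find_tracked_todos` and the sample text are hypothetical, and `#1234` is the same placeholder issue number used in the comment above.

```python
import re

# Matches comments of the form "# TODO(#1234): some note".
TODO_PATTERN = re.compile(r"#\s*TODO\(#(\d+)\):\s*(.+)")


def find_tracked_todos(text):
    """Return (line_number, issue_number, note) for each tracked TODO comment."""
    results = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = TODO_PATTERN.search(line)
        if match:
            results.append((lineno, int(match.group(1)), match.group(2).strip()))
    return results


# Hypothetical file contents, for illustration only.
SAMPLE = """\
skip_tests = [
    # TODO(#1234): re-enable when this test isn't flaky
    "test_gridsample_zeros_padding",
]
"""

for lineno, issue, note in find_tracked_todos(SAMPLE):
    print(f"line {lineno}: issue #{issue}: {note}")
```

This only works for files whose format allows `#` comments, which is part of why the JSON config is an awkward home for these notes.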

Follow-up to #19472. CI is still showing timeouts: https://github.com/iree-org/iree/actions/runs/12300081495/job/34328004297#step:6:390 ci-exactly: build_packages, regression_test

ScottTodd added 2 commits December 11, 2024 15:55

Increase real weight test timeouts from 4 minutes to 10 minutes.

b3c595a

Skip flaky test_gridsample_zeros_padding op test.

4a99f2a

ScottTodd added the infrastructure Relating to build systems, CI, or testing label Dec 12, 2024

ScottTodd requested a review from amd-chrissosa December 12, 2024 00:02

amd-chrissosa approved these changes Dec 12, 2024

ScottTodd merged commit 27742f6 into iree-org:main Dec 12, 2024
36 of 39 checks passed

ScottTodd deleted the ci-deflake branch December 12, 2024 15:56

ScottTodd mentioned this pull request Dec 12, 2024

Increase all timeouts in pkgci_regression_test.yml. #19477

Merged
