Deflake some pkgci jobs. #19472

Merged: 2 commits, Dec 12, 2024

ScottTodd
Member

  • Increase real weight test timeouts from 4 minutes to 10 minutes to work around the timeout in https://github.com/iree-org/iree/actions/runs/12281522213/job/34271200734#step:9:1461:

     ============================== slowest durations ===============================
    240.00s call     SHARK-TestSuite/iree_tests/sharktank/punet/int8/test_cases.json::sdxl_unet_int8_export.mlir::gpu_rocm::real_weights
    31.44s call     SHARK-TestSuite/iree_tests/sharktank/punet/fp16/test_cases.json::sdxl_unet_fp16_export.mlir::gpu_rocm::real_weights
    11.22s call     SHARK-TestSuite/iree_tests/sharktank/llama/open-llama-3b-v2-f16/test_cases.json::open-llama-3b-v2-f16.mlirbc::gpu_rocm::real_weights_prefill
    0.08s call     SHARK-TestSuite/iree_tests/pytorch/models/resnet50/test_cases.json::resnet50.mlirbc::gpu_rocm::real_weights
    0.07s call     SHARK-TestSuite/iree_tests/pytorch/models/opt-125M/test_cases.json::opt-125M.mlirbc::gpu_rocm::real_weights
    
    (10 durations < 0.005s hidden.  Use -vv to show these durations.)
    =========================== short test summary info ============================
    PASSED SHARK-TestSuite/iree_tests/sharktank/llama/open-llama-3b-v2-f16/test_cases.json::open-llama-3b-v2-f16.mlirbc::gpu_rocm::real_weights_prefill
    PASSED SHARK-TestSuite/iree_tests/sharktank/punet/fp16/test_cases.json::sdxl_unet_fp16_export.mlir::gpu_rocm::real_weights
    XFAIL SHARK-TestSuite/iree_tests/pytorch/models/opt-125M/test_cases.json::opt-125M.mlirbc::gpu_rocm::real_weights - Expected compilation to fail (included in 'expected_compile_failures')
    XFAIL SHARK-TestSuite/iree_tests/pytorch/models/resnet50/test_cases.json::resnet50.mlirbc::gpu_rocm::real_weights - Expected compilation to fail (included in 'expected_compile_failures')
    FAILED SHARK-TestSuite/iree_tests/sharktank/punet/int8/test_cases.json::sdxl_unet_int8_export.mlir::gpu_rocm::real_weights - Failed: Timeout >240.0s
    ======= 1 failed, 2 passed, 2 deselected, 2 xfailed in 282.99s (0:04:42) =======
    
  • Skip the flaky test_gridsample_zeros_padding op test to work around the failure in https://github.com/iree-org/iree/actions/runs/12286576807/job/34287344921#step:8:59:

     _ IREE compile and run: test_gridsample_zeros_padding::model.mlir::model.mlir::cpu_llvm_sync _
    [gw3] linux -- Python 3.11.10 /home/runner/work/iree/iree/venv/bin/python
    Error invoking iree-run-module
    Error code: 1
    Stderr diagnostics:
    
    Stdout diagnostics:
    EXEC @test_gridsample_zeros_padding
    [FAILED] result[0]: element at index 3 (2.80544E+13) does not match the expected (0); expected that the view is equal to contents of a view of 1x1x2x4xf32
      expected:
    1x1x2x4xf32=[[[0 0 1.7 0][0 1.7 0 0]]]
      actual:
    1x1x2x4xf32=[[[0 0 1.7 2.80544E+13][2.80544E+13 1.7 0 2.80544E+13]]]
    

    and https://github.com/iree-org/iree/actions/runs/12285879922/job/34285283119#step:8:51

    _ IREE compile and run: test_gridsample_zeros_padding::model.mlir::model.mlir::cpu_llvm_sync _
    [gw3] linux -- Python 3.11.11 /home/runner/work/iree/iree/venv/bin/python
    Error invoking iree-run-module
    Error code: 1
    Stderr diagnostics:
    
    Stdout diagnostics:
    EXEC @test_gridsample_zeros_padding
    [FAILED] result[0]: element at index 3 (39529.7) does not match the expected (0); expected that the view is equal to contents of a view of 1x1x2x4xf32
      expected:
    1x1x2x4xf32=[[[0 0 1.7 0][0 1.7 0 0]]]
      actual:
    1x1x2x4xf32=[[[0 0 1.7 39529.7][39529.7 1.7 0 39529.7]]]
    

    (This test seems to be failing consistently as of ea9176a, but with differing outputs, so we could either mark it as failing or skip it.)
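To make the "differing outputs" observation concrete, here is a minimal sketch that compares the expected and actual tensors from the two logs above. The values are copied from those logs and flattened; the function name and the tolerance are assumptions made for illustration, not the comparison that iree-run-module performs.

```python
# Flattened 1x1x2x4xf32 values copied from the two CI logs above.
EXPECTED = [0.0, 0.0, 1.7, 0.0, 0.0, 1.7, 0.0, 0.0]
ACTUAL_RUN_A = [0.0, 0.0, 1.7, 2.80544e13, 2.80544e13, 1.7, 0.0, 2.80544e13]
ACTUAL_RUN_B = [0.0, 0.0, 1.7, 39529.7, 39529.7, 1.7, 0.0, 39529.7]


def mismatched_indices(expected, actual, tolerance=1e-4):
    """Return the flat indices where actual differs from expected.

    The tolerance is an arbitrary choice for this sketch, not the value
    used by IREE's own result comparison.
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same length")
    return [
        index
        for index, (want, got) in enumerate(zip(expected, actual))
        if abs(want - got) > tolerance
    ]


bad_a = mismatched_indices(EXPECTED, ACTUAL_RUN_A)
bad_b = mismatched_indices(EXPECTED, ACTUAL_RUN_B)

# Same positions fail in both runs; only the unexpected value differs.
print("run A mismatches:", bad_a)
print("run B mismatches:", bad_b)
print("same positions in both runs:", bad_a == bad_b)
```

Both runs mismatch at the same flat indices (3, 4, and 7), each of which is expected to be 0, while the incorrect value itself changes from run to run.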

@ScottTodd ScottTodd added the infrastructure Relating to build systems, CI, or testing label Dec 12, 2024
Contributor

@amd-chrissosa amd-chrissosa left a comment

Thanks for doing this - do we usually file bugs to re-enable skipped tests after we skip them?

@ScottTodd
Member Author

> Thanks for doing this - do we usually file bugs to re-enable skipped tests after we skip them?

There are a few such bugs, but they can get lost in the noise. When we do file a bug, we leave a comment like:

# TODO(#1234): re-enable when this test isn't flaky

The JSON file here also isn't a great place for comments about individual tests, though, since it is partly auto-generated and there are so many test cases.
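One possible way to keep tracking information without putting comments in the JSON file is a small side table that a check can validate. The sketch below is only an illustration: the `skip_run_tests` key, the `TRACKING_ISSUES` table, and the `untracked_skips` helper are hypothetical names, and `#1234` is the placeholder from the comment above, not a real issue. Only `expected_compile_failures` appears in the logs in this PR.

```python
import json

# Hypothetical config contents. Only 'expected_compile_failures' is taken
# from the logs in this PR; 'skip_run_tests' is an assumed key name.
CONFIG_JSON = """
{
  "skip_run_tests": ["test_gridsample_zeros_padding"],
  "expected_compile_failures": []
}
"""

# Hypothetical side table mapping a skipped test to its tracking issue.
# "#1234" is a placeholder, not a real issue number.
TRACKING_ISSUES = {
    "test_gridsample_zeros_padding": "#1234",
}


def untracked_skips(config, tracking, key="skip_run_tests"):
    """Return skipped test names that have no tracking issue recorded."""
    return sorted(name for name in config.get(key, []) if name not in tracking)


config = json.loads(CONFIG_JSON)
missing = untracked_skips(config, TRACKING_ISSUES)

if missing:
    print("Skipped tests without a tracking issue:", ", ".join(missing))
else:
    print("Every skipped test has a tracking issue.")
```

A check like this could run in CI so that a skip added without a tracking entry is flagged, which keeps the auto-generated JSON free of hand-written annotations.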

@ScottTodd ScottTodd merged commit 27742f6 into iree-org:main Dec 12, 2024
36 of 39 checks passed
@ScottTodd ScottTodd deleted the ci-deflake branch December 12, 2024 15:56
ScottTodd added a commit that referenced this pull request Dec 12, 2024