
i#3320 AArch64: Add tool.drcachesim.TLB-threads to ignore list. #7222

Closed · wants to merge 3 commits

Conversation

egrimley-arm (Contributor)

It frequently fails in the same way as other drcachesim tests on SVE hardware.

Issue: #3320

@egrimley-arm (Contributor, Author)

Ironically, when I first uploaded this I got some other bogus failures on aarch64-precommit:

debug-internal-64:
    code_api|tool.drcacheoff.multiproc
    code_api|tool.record_filter_bycore_multi

But that would presumably be a different change to a different list.

@derekbruening (Contributor) left a comment

You can see from #3320's title and all the references to it from PRs that it shows up in many tests, so listing every test it can occur in under the ignore list would end up covering perhaps every single memtrace test. I don't think this is the best way to handle #3320.

@egrimley-arm (Contributor, Author)

It's not just #3320. There's a general problem with test flakiness. Currently the chances of a commit with no bugs getting through the CI seem to be about 50%.[*] It's inefficient for people to have to download the enormous test output, inspect it, decide that the failures are probably uninteresting, then run the entire test suite again. Would it make sense to run ctest --rerun-failed (once or perhaps even twice) before failing?

[*] Someone could probably run a script over historical test results to get a better estimate than that! The chances of a test failing depend on the platform and precise environment so it's not something someone could usefully measure on their own machine.
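
A minimal sketch of the rerun idea, assuming a generic driver around ctest rather than DynamoRIO's actual suite/runsuite_wrapper.pl logic; --rerun-failed and --output-on-failure are standard ctest flags, the rest is illustrative:

    # Illustrative only: rerun just the failed tests up to two more times
    # before declaring the suite run a failure.
    import subprocess
    import sys

    def run_suite_with_retries(build_dir, max_reruns=2):
        # Initial full run of the suite.
        result = subprocess.run(["ctest", "--output-on-failure"], cwd=build_dir)
        reruns = 0
        while result.returncode != 0 and reruns < max_reruns:
            reruns += 1
            # Re-execute only the tests that failed in the previous attempt.
            result = subprocess.run(
                ["ctest", "--rerun-failed", "--output-on-failure"], cwd=build_dir)
        return result.returncode

    if __name__ == "__main__":
        sys.exit(run_suite_with_retries(sys.argv[1] if len(sys.argv) > 1 else "."))

Under this scheme a genuine flake usually passes on one of the reruns, while a real regression keeps failing and still blocks the merge.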

@derekbruening (Contributor)

To avoid listing every online memtrace test on AArch64 SVE and then never detecting other failures or regressions there, how about ignoring this particular assert only, so we don't ignore other types of failures?

Another proposal is to make the pipe size a parameter and set it to a smaller value on the 2nd and 3rd runs of the retry-on-failure. The smaller value makes the error less likely to happen.
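
A rough sketch of that schedule, with an invented -simulator_pipe_size option standing in for whatever the real parameter would be:

    # Hypothetical: default pipe size on the first attempt, smaller sizes on the
    # retries, on the theory that a smaller pipe makes the flaky assert rarer.
    # "-simulator_pipe_size" is a placeholder, not a real DynamoRIO option.
    import subprocess

    PIPE_SIZE_SCHEDULE = [65536, 16384, 4096]  # attempt 1, retry 1, retry 2

    def run_test_with_shrinking_pipe(test_cmd):
        for attempt, pipe_size in enumerate(PIPE_SIZE_SCHEDULE, start=1):
            cmd = test_cmd + ["-simulator_pipe_size", str(pipe_size)]
            if subprocess.run(cmd).returncode == 0:
                return True
            print(f"attempt {attempt} failed with pipe size {pipe_size}")
        return False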

@egrimley-arm (Contributor, Author)

(It's an interesting problem in applied statistics, perhaps, how to derive a fair estimate of the flakiness rate from past results. You could take all runs of a particular test suite, restrict them to the subset whose source tree has both passed and failed, and compute the proportion of failures within that subset. But would that be a fair estimate? I could argue it both ways. Probability and statistics are full of apparent paradoxes.)
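
For what it's worth, the estimator sketched above is simple to compute once historical results are in hand; a rough sketch, assuming a list of (revision, passed) records scraped from CI history (the input format is invented):

    # Restrict to revisions whose suite runs both passed and failed (so the
    # source itself was presumably fine) and report the failure proportion
    # within that subset.
    from collections import defaultdict

    def estimate_flakiness(runs):
        """runs: iterable of (revision, passed) pairs, one per suite run."""
        by_revision = defaultdict(list)
        for revision, passed in runs:
            by_revision[revision].append(passed)
        mixed = [outcomes for outcomes in by_revision.values()
                 if any(outcomes) and not all(outcomes)]
        total = sum(len(outcomes) for outcomes in mixed)
        failures = sum(outcomes.count(False) for outcomes in mixed)
        return failures / total if total else None

    # One revision with 1 failure in 3 runs, one all-passing revision -> 1/3.
    print(estimate_flakiness([("r1", True), ("r1", False), ("r1", True),
                              ("r2", True), ("r2", True)]))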

@egrimley-arm (Contributor, Author)

> To avoid listing every online memtrace test on AArch64 SVE and then never detecting other failures or regressions there, how about ignoring this particular assert only, so we don't ignore other types of failures?

I suppose the question is whether the test would pass if the ASSERT were replaced with an ASSERT_CURIOSITY. I'll see if I can answer that question, and I'll update #3320 if I discover anything.

Review threads on suite/runsuite_wrapper.pl: resolved.
@derekbruening (Contributor)

> To avoid listing every online memtrace test on AArch64 SVE and then never detecting other failures or regressions there, how about ignoring this particular assert only, so we don't ignore other types of failures?
>
> I suppose the question is whether the test would pass if the ASSERT were replaced with an ASSERT_CURIOSITY. I'll see if I can answer that question, and I'll update #3320 if I discover anything.

No, something is messed up, so a curiosity is too light. I'm suggesting it only be ignored in a test setting, probably done after the fact (or possibly through -ignore_assert_list). But please see my review comment on the retry-on-failure setting: that is the highest-priority thing to check.
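
One way to read "ignored in a test setting, probably done after the fact" is a log filter that only forgives failures whose sole error is the known assert; a sketch with a placeholder pattern, since the exact message from #3320 is not reproduced here:

    # Placeholder pattern: the real one would match the exact assert text from
    # issue #3320.  A failure is forgiven only if every error line matches it.
    import re

    KNOWN_FLAKY_ASSERT = re.compile(r"ASSERT\b.*", re.IGNORECASE)  # placeholder

    def failure_is_only_known_flake(test_output):
        error_lines = [line for line in test_output.splitlines()
                       if "ASSERT" in line or "Error" in line]
        return bool(error_lines) and all(
            KNOWN_FLAKY_ASSERT.search(line) for line in error_lines)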

@egrimley-arm (Contributor, Author)

Closing this as #7238 provides a better work-around and makes AArch64 consistent with other architectures.

(The underlying flakiness, much of which is caused by #3320, still exists, of course.)
