
i#3320 AArch64: Add tool.drcachesim.TLB-threads to ignore list. #7222

Closed · wants to merge 3 commits

Conversation

egrimley-arm (Contributor)

It frequently fails in the same way as other drcachesim tests on SVE hardware.

Issue: #3320

@egrimley-arm (Contributor, Author)

Ironically, when I first uploaded this I got some other bogus failures on aarch64-precommit:

debug-internal-64:
    code_api|tool.drcacheoff.multiproc
    code_api|tool.record_filter_bycore_multi

But that would presumably be a different change to a different list.

@derekbruening (Contributor) left a comment

You can see from #3320's title and all the references to it from PRs that it shows up in many tests, so listing every test it can occur in under the ignore list would end up covering perhaps every single memtrace test. I don't think this is the best way to handle #3320.

@egrimley-arm (Contributor, Author)

It's not just #3320. There's a general problem with test flakiness. Currently the chances of a commit with no bugs getting through the CI seem to be about 50%.[*] It's inefficient for people to have to download the enormous test output, inspect it, decide that the failures are probably uninteresting, then run the entire test suite again. Would it make sense to run ctest --rerun-failed (once or perhaps even twice) before failing?

[*] Someone could probably run a script over historical test results to get a better estimate than that! The chances of a test failing depend on the platform and precise environment so it's not something someone could usefully measure on their own machine.
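
A minimal sketch of the rerun idea, assuming a generic driver around ctest rather than DynamoRIO's actual suite/runsuite_wrapper.pl logic; --rerun-failed and --output-on-failure are standard ctest flags, the rest is illustrative:

    # Illustrative only: rerun just the failed tests up to two more times
    # before declaring the suite run a failure.
    import subprocess
    import sys

    def run_suite_with_retries(build_dir, max_reruns=2):
        # Initial full run of the suite.
        result = subprocess.run(["ctest", "--output-on-failure"], cwd=build_dir)
        reruns = 0
        while result.returncode != 0 and reruns < max_reruns:
            reruns += 1
            # Re-execute only the tests that failed in the previous attempt.
            result = subprocess.run(
                ["ctest", "--rerun-failed", "--output-on-failure"], cwd=build_dir)
        return result.returncode

    if __name__ == "__main__":
        sys.exit(run_suite_with_retries(sys.argv[1] if len(sys.argv) > 1 else "."))

Under this scheme a genuine flake usually passes on one of the reruns, while a real regression keeps failing and still blocks the merge.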

@derekbruening (Contributor)

To avoid listing every online memtrace test on AArch64 SVE and then never detecting other failures or regressions there, how about ignoring this particular assert only, so we don't ignore other types of failures?

Another proposal is to make the pipe size a parameter and set it to a smaller value on the 2nd and 3rd runs of the retry-on-failure. The smaller value makes the error less likely to happen.
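
A rough sketch of that schedule, with an invented -simulator_pipe_size option standing in for whatever the real parameter would be:

    # Hypothetical: default pipe size on the first attempt, smaller sizes on the
    # retries, on the theory that a smaller pipe makes the flaky assert rarer.
    # "-simulator_pipe_size" is a placeholder, not a real DynamoRIO option.
    import subprocess

    PIPE_SIZE_SCHEDULE = [65536, 16384, 4096]  # attempt 1, retry 1, retry 2

    def run_test_with_shrinking_pipe(test_cmd):
        for attempt, pipe_size in enumerate(PIPE_SIZE_SCHEDULE, start=1):
            cmd = test_cmd + ["-simulator_pipe_size", str(pipe_size)]
            if subprocess.run(cmd).returncode == 0:
                return True
            print(f"attempt {attempt} failed with pipe size {pipe_size}")
        return False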

@egrimley-arm (Contributor, Author)

(It's an interesting problem in applied statistics, perhaps, how to derive a fair estimate of the flakiness rate from past results. You could take all runs of a particular test suite, restrict them to the subset whose source tree has both passed and failed, and compute the proportion of failures within that subset. But would that be a fair estimate? I could argue it both ways. Probability and statistics are full of apparent paradoxes.)
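
For what it's worth, the estimator sketched above is simple to compute once historical results are in hand; a rough sketch, assuming a list of (revision, passed) records scraped from CI history (the input format is invented):

    # Restrict to revisions whose suite runs both passed and failed (so the
    # source itself was presumably fine) and report the failure proportion
    # within that subset.
    from collections import defaultdict

    def estimate_flakiness(runs):
        """runs: iterable of (revision, passed) pairs, one per suite run."""
        by_revision = defaultdict(list)
        for revision, passed in runs:
            by_revision[revision].append(passed)
        mixed = [outcomes for outcomes in by_revision.values()
                 if any(outcomes) and not all(outcomes)]
        total = sum(len(outcomes) for outcomes in mixed)
        failures = sum(outcomes.count(False) for outcomes in mixed)
        return failures / total if total else None

    # One revision with 1 failure in 3 runs, one all-passing revision -> 1/3.
    print(estimate_flakiness([("r1", True), ("r1", False), ("r1", True),
                              ("r2", True), ("r2", True)]))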

@egrimley-arm (Contributor, Author)

> To avoid listing every online memtrace test on AArch64 SVE and then never detecting other failures or regressions there, how about ignoring this particular assert only, so we don't ignore other types of failures?

I suppose the question is whether the test would pass if the ASSERT were replaced with an ASSERT_CURIOSITY. I'll see if I can answer that question, and I'll update #3320 if I discover anything.

Review threads on suite/runsuite_wrapper.pl: resolved.
@derekbruening (Contributor)

> To avoid listing every online memtrace test on AArch64 SVE and then never detecting other failures or regressions there, how about ignoring this particular assert only, so we don't ignore other types of failures?
>
> I suppose the question is whether the test would pass if the ASSERT were replaced with an ASSERT_CURIOSITY. I'll see if I can answer that question, and I'll update #3320 if I discover anything.

No, something is messed up, so a curiosity is too light. I'm suggesting it only be ignored in a test setting, probably done after the fact (or possibly through -ignore_assert_list). But please see my review comment on the retry-on-failure setting: that is the highest-priority thing to check.
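
One way to read "ignored in a test setting, probably done after the fact" is a log filter that only forgives failures whose sole error is the known assert; a sketch with a placeholder pattern, since the exact message from #3320 is not reproduced here:

    # Placeholder pattern: the real one would match the exact assert text from
    # issue #3320.  A failure is forgiven only if every error line matches it.
    import re

    KNOWN_FLAKY_ASSERT = re.compile(r"ASSERT\b.*", re.IGNORECASE)  # placeholder

    def failure_is_only_known_flake(test_output):
        error_lines = [line for line in test_output.splitlines()
                       if "ASSERT" in line or "Error" in line]
        return bool(error_lines) and all(
            KNOWN_FLAKY_ASSERT.search(line) for line in error_lines)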

@egrimley-arm (Contributor, Author)

Closing this as #7238 provides a better work-around and makes AArch64 consistent with other architectures.

(The underlying flakiness, much of which is caused by #3320, still exists, of course.)
