i#3320 AArch64: Add tool.drcachesim.TLB-threads to ignore list. #7222
Conversation
It frequently fails in the same way as other drcachesim tests on SVE hardware.

Issue: #3320
Change-Id: I2383af3ca8af584f769ebd8e68fc9a0a82928ed1
Ironically, when I first uploaded this I got some other bogus failures on
But that would presumably be a different change to a different list.
It's not just #3320. There's a general problem with test flakiness: currently the chances of a commit with no bugs getting through the CI seem to be about 50%.[*] It's inefficient for people to have to download the enormous test output, inspect it, decide that the failures are probably uninteresting, and then run the entire test suite again. Would it make sense to run

[*] Someone could probably run a script over historical test results to get a better estimate than that! The chances of a test failing depend on the platform and the precise environment, so it's not something someone could usefully measure on their own machine.
To avoid listing every online memtrace test on AArch64 SVE, and then never detecting other failures or regressions there, how about ignoring only this particular assert, so that we don't ignore other kinds of failures? Another proposal is to make the pipe size a parameter and set it to a smaller value on the second and third runs of the retry-on-failure; the smaller value makes the error less likely to happen. A rough sketch of that idea follows.
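A minimal sketch of the second proposal, assuming a harness that can rerun a failing test. The option name `-trace_pipe_size` and the concrete sizes are placeholders, not existing drcachesim flags, since the comment proposes making the pipe size a parameter in the first place:

```python
# Sketch: retry a failing test with a progressively smaller pipe size so
# the failure mode becomes less likely on the retries.
import subprocess

# Default size on the first attempt, then smaller values on retries.
PIPE_SIZES = [None, 64 * 1024, 16 * 1024]

def run_test_with_retries(base_cmd):
    for size in PIPE_SIZES:
        cmd = list(base_cmd)
        if size is not None:
            # Hypothetical option; the real parameter name would come from
            # the change proposed above.
            cmd += ["-trace_pipe_size", str(size)]
        if subprocess.run(cmd).returncode == 0:
            return True
    return False
```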
(It's an interesting problem in applied statistics, perhaps, how to derive a fair estimate of the flakiness rate from past results. You could take all runs of a particular test suite, restrict to the subset of runs for which the same source tree has both passed and failed, and take the proportion of failures within that subset. But would that be a fair estimate? I could argue it both ways. Probability and statistics are full of apparent paradoxes.)
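As a minimal sketch of that estimator, assuming historical results are available as (revision, passed) pairs; how to fetch them from the CI provider is left out, and the data format is an assumption for illustration:

```python
from collections import defaultdict

def estimate_flakiness(runs):
    """runs: iterable of (revision, passed) pairs, with passed a bool."""
    by_rev = defaultdict(list)
    for rev, passed in runs:
        by_rev[rev].append(passed)
    # Keep only revisions that both passed and failed at least once: for
    # these, every failure is presumably spurious, since the identical
    # source tree was also able to pass.
    mixed = [r for r in by_rev.values() if any(r) and not all(r)]
    total = sum(len(r) for r in mixed)
    failures = sum(r.count(False) for r in mixed)
    return failures / total if total else float("nan")

# Made-up data: revision "b" passed twice and failed once, so the
# estimate is 1/3.
print(estimate_flakiness([("a", True), ("b", True), ("b", False), ("b", True)]))
```

Whether the resulting figure is fair is exactly the question raised above: revisions that attract reruns may not be representative of runs in general.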
I suppose the question is whether the test would pass if the
No, something is messed up, so labeling it a curiosity is too light. I'm suggesting it only be ignored in a test setting, probably done after the fact (or possibly through -ignore_assert_list). But please see my review comment on the retry-on-failure setting: that is the highest-priority thing to check.
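As a rough illustration of the assert-ignore-list idea (not DynamoRIO's actual implementation, which lives in the C core; the file:line format and wildcard matching here are assumptions for illustration):

```python
# Sketch: decide whether a failing assert at a given source location
# should be ignored, matching against a list of file:line patterns.
import fnmatch

IGNORE_ASSERT_LIST = ["tracer.cpp:*"]  # hypothetical entry for this assert

def should_ignore_assert(source_file, line):
    location = f"{source_file}:{line}"
    return any(fnmatch.fnmatch(location, pat) for pat in IGNORE_ASSERT_LIST)

print(should_ignore_assert("tracer.cpp", 123))  # True: ignored in tests
print(should_ignore_assert("other.c", 7))       # False: still fatal
```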