Failing tests on KNL #1904
info registers in the emergency debugger:
Looking through KNL errata, I see:
It seems like we should be able to synthesize DR6 by looking at the last instruction executed and seeing which memory it would have accessed. Might be a bit of work, but should be doable. @rocallahan thoughts?
So you're sure that the issue here is that a watchpoint was set and should have triggered but the CPU did not trigger it? That erratum does not apply to this particular code, right? Are you assuming this is an undocumented hardware bug?
It looks that way (note that the watchpoint is still triggered, just not reported). I'll have to take a more careful look in the morning. I do suspect that the erratum applies, but is just too narrowly worded. I'm talking to Intel to get some more insight into this erratum.
I'm a little confused here. Our internal watchpoint has
You mean for all instructions or just for the fast-forward string instructions? The former would be a pretty large amount of work and maintenance burden. If we had to do it, I think we'd want to import DynamoRIO or something similar instead of rolling it from scratch. But I don't think we have to do it. I think just handling the string instructions would be OK. At this stage, though, I'm not sure that the kind of bug in the erratum is what you're seeing.
You're right. I was tired and jumped to conclusions too quickly. The actual problem is shown by this test case:
Passes on Xeon, fails on KNL with
In particular, it looks like single stepping now suffers from the same quirk watchpoints do:
OK. That means we can't use hardware singlestepping to advance to the precise stopping point. We could emulate string-instruction singlestepping from rr instead, but I just thought of a crazy hack that should work and be easier: when we want to single-step the string instruction, temporarily adjust the IP to be after the REP prefix!
Hmm, there might be edge cases when the instruction has another prefix before the REP prefix...
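To make the prefix edge case concrete, here is a minimal Python sketch (not rr's code) of locating the REP prefix in the instruction bytes. The function name `find_rep_prefix` and its return shape are hypothetical. It reports the offset just past the REP/REPNE byte, plus any legacy prefixes that would be skipped if the IP were moved there, which is exactly the case where the hack would change the instruction's meaning (for example, losing a `0x66` operand-size override).

```python
# x86 legacy prefix bytes: LOCK, REPNE, REP, segment overrides,
# operand-size override, address-size override.
LEGACY_PREFIXES = {0xF0, 0xF2, 0xF3, 0x2E, 0x36, 0x3E, 0x26, 0x64, 0x65, 0x66, 0x67}
REP_PREFIXES = {0xF2, 0xF3}


def find_rep_prefix(code: bytes):
    """Hypothetical helper: scan the legacy prefixes at the start of `code`.

    Returns (offset_after_rep, skipped_prefixes) if a REP/REPNE byte is
    found among the legacy prefixes, else None. `skipped_prefixes` holds
    the prefixes that precede REP and would be lost if the IP were moved
    to `offset_after_rep`.

    This does not check that the opcode is a string instruction; 0xF3 and
    0xF2 are also mandatory prefixes for some SSE instructions.
    """
    skipped = []
    for i, b in enumerate(code):
        if b in REP_PREFIXES:
            return i + 1, bytes(skipped)
        if b not in LEGACY_PREFIXES:
            return None  # reached the opcode (or a REX prefix) without seeing REP
        skipped.append(b)
    return None


# rep movsb: REP is the first byte, nothing is lost by skipping it.
print(find_rep_prefix(bytes([0xF3, 0xA4])))
# 0x66 before REP: jumping past REP would drop the operand-size override.
print(find_rep_prefix(bytes([0x66, 0xF3, 0xA5])))
```

A REX prefix has to sit immediately before the opcode, so it always comes after REP and survives the IP adjustment; only legacy prefixes placed before REP are at risk.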
Rewrite
Yeah, that works.
Will try and let you know how it goes.
Another issue is the very first singlestep performed by |
Ah, true.
And really we should fix this for non-fast-forward singlestep too, since it would be bad if a normal singlestep could progress further than a fast-forward. So it's a bit horrible but I think |
It seems like there's a problem with doing this, because we won't be able to detect the early loop-exit as we do now. I guess we can look at flags and see if they changed. |
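For reference, a rough Python sketch of the bookkeeping this implies, based only on the architectural REP semantics and not on rr's implementation. When the string instruction is stepped without its REP prefix, the CPU does not decrement the count register, so the tracer would have to do that itself and then decide from the count and ZF whether the loop has ended. The function name and parameters are hypothetical, and the sketch assumes a nonzero starting count and 64-bit address size (RCX as the counter).

```python
# Opcodes whose REP loop can also exit early on ZF: CMPS (A6/A7), SCAS (AE/AF).
CMPS_SCAS_OPCODES = {0xA6, 0xA7, 0xAE, 0xAF}
REPE, REPNE = 0xF3, 0xF2
MASK64 = 0xFFFFFFFFFFFFFFFF


def after_bare_step(rep_prefix, opcode, rcx_before, zf_after, insn_start, insn_end):
    """Hypothetical helper: fix up state after one prefix-less iteration.

    Assumes rcx_before != 0 (a REP instruction with a zero count runs no
    iterations and must be handled by the caller) and no 0x67 prefix.
    Returns (new_rcx, new_ip): new_ip is insn_start if the loop continues,
    insn_end if it has terminated.
    """
    rcx = (rcx_before - 1) & MASK64  # the bare instruction leaves RCX alone
    done = rcx == 0
    if opcode in CMPS_SCAS_OPCODES:
        # REPE stops once ZF is clear; REPNE stops once ZF is set.
        if rep_prefix == REPE and not zf_after:
            done = True
        if rep_prefix == REPNE and zf_after:
            done = True
    return rcx, (insn_end if done else insn_start)


# rep movsb with 3 bytes left: loop continues at the same instruction.
print(after_bare_step(REPE, 0xA4, 3, False, 0x1000, 0x1002))
# repe cmpsb hitting a mismatch (ZF clear): early exit with count remaining.
print(after_bare_step(REPE, 0xA6, 5, False, 0x1000, 0x1002))
```

For MOVS, STOS, LODS, INS and OUTS the `0xF3` prefix is a plain REP, so only the count matters; the ZF checks apply to CMPS and SCAS alone.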
The approach discussed above basically works. However, there's a performance mystery:
Not sure how to go about debugging this. Thoughts? Gonna play with processor affinity a bit to see if I can make something happen.
And since this is the same recording, they are by necessity bound to the same CPU core. Yet they're still faster. Something is very wrong here.
I had a hunch. The following diff makes rr 5 times faster (in this particular test case):
Good news! Once all the pending pull requests are merged, the only remaining failure is |
Mostly for my own reference (though feel free to jump in and help out), these are the tests that are failing on KNL (after updating gdb to a reasonably recent version):
Fixed after #1907:
Unrelated to CPU architecture but seen on the same machine:
fork_exec_info_thr (fixed by #1910, "Various changes having to do with unwinding")
Fixed by #1909:
Will be fixed by #1911 (as long as #1907 is applied):