-
Notifications
You must be signed in to change notification settings - Fork 573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
trace signal placement in rseq region is wrong due to abort handler redirect #4041
Comments
Update: the invariant test is actually not complaining about the ud2a: it has no logic where it knows that instruction raises a signal. Rather, it is complaining about the signal handler returning to the abort handler instead of the interruption point:
That raises another question: should DR raise yet another kernel xfer event for that, since it is surprising? Though the signal handler could longjmp anywhere (this invariant test assumes an app that doesn't do that). |
When a migration or context switch happens during rseq native execution, we now raise a kernel xfer event. The event is of a new type DR_XFER_RSEQ_ABORT. To implement this, the native abort handler cannot be linked and must return to dispatch. The special-exit-reason feature is used for this purpose. Adds a test. To force a migration we use a system call, which we do not normally allow inside an rseq region. I added a debug-build exception for this particular test by executable name, along with a syscall discovery workaround for the attach test. Adds a client via static DR to api.rseq to test that the event is raised. Adds handling to drmemtrace in the tracer and raw2trace. For raw2trace we walk backward to undo the committing store that was recorded, since a real rseq abort would happen before the final store. I would like to add on offline trace rseq regression test, but it hits Issue; #2350, #4019, #4041 Fixes #4019
My proposal is to keep the 1st translation around somewhere and pass it to the kernel xfer event, but not the signal event (since that should match the kernel). Then memtrace can use the kernel xfer event source to place the signal marker. I would also add logic to the trace_invariants that ud2a should be immediately followed by a signal marker. This issue covers including a regression test for #4019, since it's blocking doing so and they both involve an rseq memtrace. I don't think that will require any extra work: once an rseq abort is present, the existing marker checks should cover it. |
When a migration or context switch happens during rseq native execution, we now raise a kernel xfer event. The event is of a new type DR_XFER_RSEQ_ABORT. To implement this, the native abort handler cannot be linked and must return to dispatch. The special-exit-reason feature is used for this purpose. Adds a test. To force a migration we use a system call, which we do not normally allow inside an rseq region. I added a debug-build exception for this particular test by executable name, along with a syscall discovery workaround for the attach test. Adds a client via static DR to api.rseq to test that the event is raised. Adds handling to drmemtrace in the tracer and raw2trace. For raw2trace we walk backward to undo the committing store that was recorded, since a real rseq abort would happen before the final store. I would like to add on offline trace rseq regression test, but it hits #4041 and so the test will be added as part of that issue. Issue; #2350, #4019, #4041 Fixes #4019
When a signal occurs inside an rseq region, DR now provides the actual interruption PC to the kernel xfer event (but not the signal event, to match kernel behavior). This is required to accurately place the signal in a drmemtrace offline trace. What this looks like: Before: 0x00007fe36b1ea8f1 74 02 jz $0x00007fe36b1ea8f5 0x00007fe36b1ea8f3 0f 0b ud2a 0x00007fe36b1ea8f5 83 05 74 38 00 00 01 addl $0x01, <rel> 0x00007fe36b1ee170 <marker: kernel xfer to handler> <marker: timestamp 13224196441528495> <marker: tid 74400 on core 0> 0x00007fe36b1ea85c 55 push %rbp After: 0x00007f15e71d78f1 74 02 jz $0x00007f15e71d78f5 0x00007f15e71d78f3 0f 0b ud2a <marker: kernel xfer to handler> <marker: timestamp 13224196672834337> <marker: tid 78727 on core 1> 0x00007f15e71d785c 55 push %rbp Adds a trace_invariants check for a signal immediately after UD2A. I first went through and added module_mapper and decoding to trace_invariants: but then we have to link in static DR (even if we do our own raw decoding as module_mapper uses DR) and thus we can't run the test anymore on AArch64 (#2007) or Mac. Instead, I went with a simpler solution of putting the annotations that signal_invariants.c uses into rseq.c. I had to fix 2 bugs to get the marker in the right place and to get this invariants check to fire: 1) raw2trace was adding to cur_modoffs before memrefs and then having an extra check for where to put the the signal, which doesn't make sense. It ended up putting the marker too early. After fixing, the other tests still pass, so it is not clear why I had it that way. 2) op_offline was off inside trace_invariants, so the assert on a signal after ud2a was not firing. I now have analyzer_multi setting op_offline up front as a synthetic option for post-processing usage. I also added a direct jump as an annotation as the start of the abort handler in rseq.c to avoid trace_invariants complaining about a signal not returning to the interruption point. I added a drcachesim offline regression test tracing the rseq test's code. This same test will be separately used for #4038. Issue: #4038, #4041 Fixes #4041
When a signal occurs inside an rseq region, DR now provides the actual interruption PC to the kernel xfer event (but not the signal event, to match kernel behavior). This is required to accurately place the signal in a drmemtrace offline trace. What this looks like: Before: 0x00007fe36b1ea8f1 74 02 jz $0x00007fe36b1ea8f5 0x00007fe36b1ea8f3 0f 0b ud2a 0x00007fe36b1ea8f5 83 05 74 38 00 00 01 addl $0x01, <rel> 0x00007fe36b1ee170 <marker: kernel xfer to handler> <marker: timestamp 13224196441528495> <marker: tid 74400 on core 0> 0x00007fe36b1ea85c 55 push %rbp After: 0x00007f15e71d78f1 74 02 jz $0x00007f15e71d78f5 0x00007f15e71d78f3 0f 0b ud2a <marker: kernel xfer to handler> <marker: timestamp 13224196672834337> <marker: tid 78727 on core 1> 0x00007f15e71d785c 55 push %rbp Adds a trace_invariants check for a signal immediately after UD2A. I first went through and added module_mapper and decoding to trace_invariants: but then we have to link in static DR (even if we do our own raw decoding as module_mapper uses DR) and thus we can't run the test anymore on AArch64 (#2007) or Mac. Instead, I went with a simpler solution of putting the annotations that signal_invariants.c uses into rseq.c. I had to fix 2 bugs to get the marker in the right place and to get this invariants check to fire: 1) raw2trace was adding to cur_modoffs before memrefs and then having an extra check for where to put the the signal, which doesn't make sense. It ended up putting the marker too early. After fixing, the other tests still pass, so it is not clear why I had it that way. 2) op_offline was off inside trace_invariants, so the assert on a signal after ud2a was not firing. I now have analyzer_multi setting op_offline up front as a synthetic option for post-processing usage. I also added a direct jump as an annotation as the start of the abort handler in rseq.c to avoid trace_invariants complaining about a signal not returning to the interruption point. I added a drcachesim offline regression test tracing the rseq test's code. This same test will be separately used for #4038. Issue: #4038, #4041 Fixes #4041
Makes the following changes to ensure the drcachesim rseq test from + Confirms that the gap is found by trace_invariants. It wasn't in HEAD: here I fix a bug to only honor pref_xfer_marker if it was not cleared. + Adds ending of a bb right after an rseq committing store, to simplify the native-rseq-mangled bb and drmemtrace's handling: raw2trace wants to include the subsequent instructions in the bb; they are removed by #4038's rollback but it seems cleaner to just not have them there in the bb in the first place. Issue: #4038, #4041
When a signal occurs inside an rseq region, DR now provides the actual interruption PC to the kernel xfer event (but not the signal event, to match kernel behavior). This is required to accurately place the signal in a drmemtrace offline trace. What this looks like: Before: 0x00007fe36b1ea8f1 74 02 jz $0x00007fe36b1ea8f5 0x00007fe36b1ea8f3 0f 0b ud2a 0x00007fe36b1ea8f5 83 05 74 38 00 00 01 addl $0x01, <rel> 0x00007fe36b1ee170 <marker: kernel xfer to handler> <marker: timestamp 13224196441528495> <marker: tid 74400 on core 0> 0x00007fe36b1ea85c 55 push %rbp After: 0x00007f15e71d78f1 74 02 jz $0x00007f15e71d78f5 0x00007f15e71d78f3 0f 0b ud2a <marker: kernel xfer to handler> <marker: timestamp 13224196672834337> <marker: tid 78727 on core 1> 0x00007f15e71d785c 55 push %rbp Adds a trace_invariants check for a signal immediately after UD2A. I first went through and added module_mapper and decoding to trace_invariants: but then we have to link in static DR (even if we do our own raw decoding as module_mapper uses DR) and thus we can't run the test anymore on AArch64 (#2007) or Mac. Instead, I went with a simpler solution of putting the annotations that signal_invariants.c uses into rseq.c. I had to fix 2 bugs to get the marker in the right place and to get this invariants check to fire: 1) raw2trace was adding to cur_modoffs before memrefs and then having an extra check for where to put the the signal, which doesn't make sense. It ended up putting the marker too early. After fixing, the other tests still pass, so it is not clear why I had it that way. 2) op_offline was off inside trace_invariants, so the assert on a signal after ud2a was not firing. I now have analyzer_multi setting op_offline up front as a synthetic option for post-processing usage. I also added a direct jump as an annotation as the start of the abort handler in rseq.c to avoid trace_invariants complaining about a signal not returning to the interruption point. I added a drcachesim offline regression test tracing the rseq test's code. This same test will be separately used for #4019. Issue: #4019, #4041 Fixes #4041
Makes the following changes to ensure the drcachesim rseq test from + Confirms that the gap is found by trace_invariants. It wasn't in HEAD: here I fix a bug to only honor pref_xfer_marker if it was not cleared. + Adds ending of a bb right after an rseq committing store, to simplify the native-rseq-mangled bb and drmemtrace's handling: raw2trace wants to include the subsequent instructions in the bb; they are removed by #4019's rollback but it seems cleaner to just not have them there in the bb in the first place. Issue: #4019, #4041
Note that the work here solved this problem for offline traces only, not for online: probably we should reopen for online. |
Re-opening as the solution here is not ideal:
So the new proposal is to revert what was done here, and then:
|
With our run-twice rseq solution, we first run an instrumented path through the rseq region which goes into the recorded drmemtrace trace. But the native rseq execution might take a side exit or abort. We already had code to try to roll back the committing store on an abort, but it was fragile as a timestamp in between thwarted it. We had no code to make a side exit match and as a result we'd see PC discontinuities. Here, we solve both problems with the following approach: First, we have DR add a new DR_NOTE_RSEQ_ENTRY label to the top of the first block in an rseq region. This label's data stores the end PC and the abort handler PC (we do not currently use the latter in drmemtrace: once we revert #4041 we may need to do so). Next, the tracer turns that label into a new TRACE_MARKER_TYPE_RSEQ_ENTRY marker with the end PC as the marker value. Finally, rawtrace uses the new label to fix up the trace. When it sees the entry marker, it starts buffering instead of writing out records. While buffering, it remembers the decode PC of each insruction (for encoding entries), plus the PC + target of each branch. When it sees the rseq end PC, it sets a flag, and on the next instruction it has the rseq exit PC. If it sees an rseq abort marker or a signal marker with the flag set, it uses the interrupted PC as the rseq exit PC (this avoids having to wait for the other side of the signal handler). With this rseq exit PC, raw2trace takes the following actions depending on the PC value: + End PC fall-through: completed; emit entire buffer. + Committing store PC: roll back final instr in buffer. (Once #4041 reverts this to the handler PC we'll use the handler PC here and will add a new marker to hold that data.) + Target of a branch in the buffer: roll back to that branch. If multiple: take most recent. + None of the above: there cannot be an indirect branch we observed so we assume this is an inverted cbr. If there is a forward direct cbr which was taken and stayed in the region, we synthesize a direct jump in the gap it jumped over. This is not perfect but is the best we can readily do. We leave the old rollback method (which didn't handle a timestamp in between) in place but it is only enabled if we see an rseq abort marker without the new rseq entry marker first. We decided to not bump the trace version nor add a new filetype to indicate the presence of rseq entry markers as we can support legacy traces this way (at least as well as they were supported before). We give up on identifying which of multiple indirect rseq side exits were taken. We give up on distinguishing a side exit whose target is the end PC fall-through from a completion. For emitting the accumulated rseq buffer, the delayed branch code added for #5941 will handle chunk boundaries. A side improvements along the way: we add a branch target pc to instr_summary_t and avoided increasing its space by removing the previously-stored next_pc and computing it from the length. Adds a number of unit tests for rseq rollback and side exits, along with a more-real instance in the linux.rseq test. This was also tested on a large proprietary application with multiple real instances of rseq aborts and side exits. Issue: #5953, #5953, #4041 Fixes #5953 Fixes #5954
The Linux kernel presents the rseq abort handler PC as the interrupted (really, the resume point) for a signal generated while inside an rseq region. Yet DR was modifying that to instead show clients the rseq committing store PC. Similarly, DR presented that commit PC for its rseq abort event if an rseq region aborted through some other non-signal means like a thread migration. Since this is a little misleading and doesn't match what the app sees, and results in the signal interrutped PC not matching the later resume PC, we change that behavior here to have both signal and rseq events use the handler PC. There is no way to obtain the precise interruption PC from the kernel so we follow that lead and do not try to present it from DR. Furthemore, DR was not previously generating an rseq abort event if a signal was the cause of the abort. We change that here, raising the event prior to the signal event. Adds a new drmemtrace label TRACE_MARKER_TYPE_RSEQ_HANDLER to communicate the handler PC to raw2trace. Updates raw2trace to use this to detect an rseq abort. Adds a new test case to raw2trace_unit_tests. Updates the api.rseq test. Fixes #4041
With our run-twice rseq solution, we first run an instrumented path through the rseq region which goes into the recorded drmemtrace trace. But the native rseq execution might take a side exit or abort. We already had code to try to roll back the committing store on an abort, but it was fragile as a timestamp in between thwarted it. We had no code to make a side exit match and as a result we'd see PC discontinuities. Here, we solve both problems with the following approach: First, we have DR add a new DR_NOTE_RSEQ_ENTRY label to the top of the first block in an rseq region. This label's data stores the end PC and the abort handler PC (we do not currently use the latter in drmemtrace: once we revert #4041 we may need to do so). Next, the tracer turns that label into a new TRACE_MARKER_TYPE_RSEQ_ENTRY marker with the end PC as the marker value. Finally, rawtrace uses the new label to fix up the trace. When it sees the entry marker, it starts buffering instead of writing out records. While buffering, it remembers the decode PC of each insruction (for encoding entries), plus the PC + target of each branch. When it sees the rseq end PC, it sets a flag, and on the next instruction it has the rseq exit PC. If it sees an rseq abort marker or a signal marker with the flag set, it uses the interrupted PC as the rseq exit PC (this avoids having to wait for the other side of the signal handler). With this rseq exit PC, raw2trace takes the following actions depending on the PC value: + End PC fall-through: completed; emit entire buffer. + Committing store PC: roll back final instr in buffer. (Once #4041 reverts this to the handler PC we'll use the handler PC here and will add a new marker to hold that data.) + Target of a branch in the buffer: roll back to that branch. If multiple: take most recent. + None of the above: there cannot be an indirect branch we observed so we assume this is an inverted cbr. If there is a forward direct cbr which was taken and stayed in the region, we synthesize a direct jump in the gap it jumped over. This is not perfect but is the best we can readily do. We leave the old rollback method (which didn't handle a timestamp in between) in place but it is only enabled if we see an rseq abort marker without the new rseq entry marker first. We decided to not bump the trace version nor add a new filetype to indicate the presence of rseq entry markers as we can support legacy traces this way (at least as well as they were supported before). We give up on identifying which of multiple indirect rseq side exits were taken. We give up on distinguishing a side exit whose target is the end PC fall-through from a completion. For emitting the accumulated rseq buffer, the delayed branch code added for #5941 will handle chunk boundaries. A side improvements along the way: we add a branch target pc to instr_summary_t and avoided increasing its space by removing the previously-stored next_pc and computing it from the length. Adds a number of unit tests for rseq rollback and side exits, along with a more-real instance in the linux.rseq test. This was also tested on a large proprietary application with multiple real instances of rseq aborts and side exits. Issue: #5953, #5953, #4041 Fixes #5953 Fixes #5954
The Linux kernel presents the rseq abort handler PC as the interrupted (really, the resume point) for a signal generated while inside an rseq region. Yet DR was modifying that to instead show clients the rseq committing store PC. Similarly, DR presented that commit PC for its rseq abort event if an rseq region aborted through some other non-signal means like a thread migration. Since this is a little misleading and doesn't match what the app sees, and results in the signal interrupted PC not matching the later resume PC, we change that behavior here to have both signal and rseq events use the handler PC. There is no way to obtain the precise interruption PC from the kernel so we follow that lead and do not try to present it from DR. Furthemore, DR was not previously generating an rseq abort event if a signal was the cause of the abort. We change that here, raising the event prior to the signal event. Updates raw2trace to use the handler PC that is now in the rseq abort marker to detect an rseq abort, rather than looking for the commit PC. Adds a new test case to raw2trace_unit_tests and updates others. Updates the api.rseq test. Fixes #4041
We have a problem: a fault in the instrumented rseq run is no longer placed correctly! Trace invariant failure in T1724466 at ref # 94702: Interruption marker mis-placed
The rseq test raises SIGILL the first time through that code (then in the rseq abort But why did this pass in PR #5970 and is only failing now on the filter PR #5971? Unclear. The proposed solution is to have the rseq abort event contain the precise interrupted PC |
Fixes problems with the new rseq adjustment code with filtered traces: + Load and update the buffer pointer when emitting the rseq entry label + Do not try to adjust for i-filtered Adds 2 new tests for filtering the linux.rseq app. The new tests revealed a problem with the instr counts in the schedule file: raw2trace was counting the PC-only entries as instructions; we fix that here. Reverts the rseq abort event and marker value back to the interrupted PC for a signal in the instrumented execution. Leaves all other cases (all aborts in the native execution) as the handler. This lets us properly place the abort+signal for the instrumented interruption case. Adjusts the invariant checker abort check to use the new end PC for a more accurate test to avoid false positives and negatives. Fixes #5953, #4041
Fixes 3 more problems with rseq in DR and in drmemtraces: 1) Side exit in instrumented run: avoids unbounded buffer growth by detecting by looking at the region bounds. Adds a buffer large size sanity check. Adds a test to linux.rseq. 2) Try to handle double-abort: couldn't repro in test though so the fix of disabling a later signal from reporting an abort when we report an abort fragment exit is not fully vetted. 3) Asynch signal in instrumented run: this was not being adjusted so the interruption/continuation PC is the abort handler. Adds a test to linux.rseq. Adds more invariant checks around rseq region exits. Issue: #5953, #4041, #5954
Fixes 3 more problems with rseq in DR and in drmemtraces related to exits and signals: 1) Side exit in instrumented run: avoids unbounded buffer growth by detecting by looking at the region bounds. Adds a buffer large size sanity check. Adds a test to linux.rseq. 2) Try to handle double-abort: couldn't repro in test though so the fix of disabling a later signal from reporting an abort when we report an abort fragment exit is not fully vetted. 3) Asynch signal in instrumented run: this was not being adjusted so the interruption/continuation PC is the abort handler. Adds a test to linux.rseq. Adds more invariant checks around rseq region exits. Issue: #5953, #4041, #5954, #5980 Fixes #5980
Gathering an offline drcachesim trace on the rseq test, for #4019, shows a signal marker too late -- it should be right after the ud2a:
This is caught by the invariants test, which I was hoping to add as a regression test for #4019:
Looks like the recorded offset is wrong:
Ah here we go:
Hmm, how can we solve that? The kernel does not record the actual signal
PC (unlike rseq v1), so DR does not either, and thus the tracer does not
know when the signal was triggered.
The text was updated successfully, but these errors were encountered: