-
Notifications
You must be signed in to change notification settings - Fork 566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support drreg use outside of insertion phase combined with use in insertion phase #3823
Comments
… sequences. Adds the functions drx_expand_scatter_gather() and drx_expand_scatter_gather_ex() that are capable of expanding AVX-512 scatter and gather and AVX2 gather instructions into equivalent sequences of scalar code. WARNING: Do not use the new functions yet, support is incomplete. In particular, * Certain registers may be clobbered by the sequence and will not be restored. * Application state may not be properly restored if the sequence is getting interrupted. * drreg may fail if used in other phases in addition to app2app (xref #3823). Adds a limited test for the expanded sequences. The test works with above limitations, because the clobbered registers are not live in the test application and the application does not check the mask state until the instruction is complete. In the test application this may result in occasional duplicated scalar loads and stores, but will not break the test. Adds a check for the mask registers to test client.avx512ctx. Issue: #2985
…o scalar sequences. (#3831) Adds the function drx_expand_scatter_gather() that is capable of expanding AVX-512 scatter and gather and AVX2 gather instructions into equivalent sequences of scalar code. WARNING: Do not use yet, support is incomplete. In particular, * Certain registers may be clobbered by the sequence and will not be restored. * Application state may not be properly restored if the sequence is getting interrupted. * drreg may fail if used in other phases in addition to app2app (xref #3823). Furthermore, * AArch64 support is missing (xref #3837). * The qword index and value versions are not supported in 32-bit mode. Adds a limited test for the expanded sequences. The test works with above limitations, because the clobbered registers are not live in the test application and the application does not check the mask state until the instruction is complete. In the test application this may result in occasional duplicated scalar loads and stores, but will not break the test. Adds a check for the mask registers to test client.avx512ctx. Issue: #2985, #3837
What about, instead of having global state and these setter/clear methods (still not clear whether this can rely on that we are globally synchronized or would need a lock/mutex), we tie the drreg_reserve_register() and drreg_reserve_aflags() etc. calls to phase? Starting with some default phase (NOPHASE) in case drreg is used outside of drmgr (is this even supported?). Any call to drreg_reserve_register() and similar API will tie the slot it is using to the current drmgr phase. In automatic management and optimizations in insertion phase, only slots of NOPHASE or insertion phase would be considered. Yes this would not support several users in the same phase. But makes it simpler from an API point of view. If somebody would every need this, one could extend the scheme to passing custom phase types in drreg_reserve_register() versions that take in a custom phase. It should just work then and no big changes would need to be made. |
We had an offline discussion and this is the proposal I'm pushing: Proposal: have drreg-inserted spills+restores added in the app2app phase marked drreg should expose a way to query the label, in case some clients want to This does not address multiple app2app events using drreg but I'm |
Adds to several comments in drreg following analysis with respect to #3823. Issue #3823's title will be renamed to cover the multiple users aspect. Single user app2app insertion should be ok as is. we're leaving the door open to potentially introduce special labels mentioned in #3823 with respect to restore state event in drreg. Minor renaming drreg_forward_analysis to drreg_forward_liveness_analysis. Issue: #3823
…3899) Adds to several comments in drreg following analysis with respect to #3823. Issue #3823's title will be renamed to cover the multiple users aspect. Single user app2app insertion should be ok as is. We're leaving the door open to potentially introduce special labels mentioned in #3823 with respect to restore state event in drreg. Issue: #3823
AFAICS (xref #3899) this issue here should now cover specifically mixed use of drreg from in and outside of the insertion phase and multiple users. |
There can still be conflicts in spill slots. In #4863 (comment), I'm seeing that the spilled value of a register reserved in app2app phase (by |
Was the mark-as-"app" proposal supposed to be combined with the disjoint slot spaces or "arenas" proposal? |
An alternative to the disjoint slot spaces design: When picking a spill slot, I'm not complete sure yet how this will affect drreg analyses/optimisations though. E.g., in previous phases, if drreg skipped spilling a reg value on finding that it was already spilled previously to some spill slot, then we shouldn't use that spill slot in later phases even if we didn't see a prior spill for it. Maybe we can add a new |
This is needed when there are multiple instrumentation passes that use drreg to spill reg values to slots. In this case, we need to avoid spill slots that were already used in prior instrumentation passes. The added logic avoids any spill slot used by prior passes in the current ilist. This is not completely optimal as a spill slot may be available for use at some point in the ilist even if it is already used elsewhere in the ilist. But we live with this sub-optimality. Issue: #3823
Adds a stricter free spill slot check in drreg. This is needed when there are multiple instrumentation passes that use drreg to spill reg values to slots. In this case, we need to avoid spill slots that are in-use by prior instrumentation passes. The added logic looks at the remaining instrs in the given ilist after 'where'. Any spill slot usage after 'where' must have been added by a prior instrumentation pass and not the current one. To avoid spill slot conflicts, we should avoid using such spill slots with pending usage. This is not completely optimal as a spill slot may be available for use at some point in the ilist even if it is used later in the ilist. But we live with this sub-optimality. Adds new drreg_bb_properties_t (DRREG_HANDLE_MULTI_PHASE_SLOT_RESERVATIONS) that enables the stricter free spill slot check. This is currently set by drx_expand_scatter_gather. Uses data area of label instrs marked with a special note to store information about spill slot usage, which is used in the new stricter free spill slot check. Adds documentation about the new option, highlighting that it's necessary to enable this for multi-phase drreg reg reservations to work as intended, and that it is not enabled by default due to performance cost. Delays resetting bb_props to late in the instru2instru stage, in case stages besides insertion need to read it. Extends drreg-test to verify that spill slots used in later phases do not conflict with those from earlier phases. Fixes: #3823
Adds a stricter free spill slot check for GPR reservations in drreg. This is needed when there are multiple instrumentation passes that use drreg to spill GPR values to slots. In this case, we need to avoid slots that are in-use by prior instrumentation passes. The added logic looks at the remaining instrs in the given ilist after 'where'. Any spill slot usage after 'where' must have been added by a prior instrumentation pass and not the current one. To avoid spill slot conflicts, we should avoid using such spill slots with pending usage. This is not completely optimal as a spill slot may be available for use at some point in the ilist even if it is used later in the ilist. But we live with this sub-optimality. Adds new drreg_bb_properties_t that enables the stricter free spill slot check (DRREG_HANDLE_MULTI_PHASE_SLOT_RESERVATIONS). This is currently set by drx_expand_scatter_gather. Uses data area of label instrs marked with a special note to store information about spill slot usage, which is used in the new stricter free spill slot check. Adds documentation about the new option, highlighting that it's necessary to enable this for multi-phase drreg reg reservations to work as intended, and that it is not enabled by default due to performance cost. Also documents that using this option may require adjustment to spill slot reservations made by the client. Delays resetting bb_props to late in the instru2instru stage, in case stages besides insertion need to read it. Extends drreg-test to verify that spill slots used in later phases do not conflict with those from earlier phases, with the new option enabled. Adjust TLS slots reserved in drx_init. The existing logic of updating slots reserved by drx if drx_expand_scatter_gather is invoked is racy. Instead, now we reserve 4 slots upfront in `drx_init`. This strategy over-provisions drx if no scatter gather instr is present, so we avoid it on Windows which is more sensitive to slot usage. On Linux, slots are more plentiful so this is okay. drx will fall back to using DR slots if it runs out of TLS slots, which may be slower. All these effects are limited to scatter/gather instrs that are split into their own bbs. Adding support for multi-phase reserve/restore for aflags is follow-up work. Issue: #3823
Extends drreg support for multi-phase use by avoiding aflags spill slot conflict across multiple phases. Removes existing restriction of aflags spill slot always being the slot 0. Allows selection of aflags spill slot at spills, and releasing of the slot at unreservations. Adjusts reg state restoration logic to detect aflags slot based on xax spills and lahf. May need some more work for non-X86. Adds drreg test to verify absence of aflags spill slot conflict when aflags are reserved in app2app and insertion phases. Issue: #3823 Issue: #2985 Fixes: #2985
Moves label that contains slot id for register spill/restore instrs to after the instr instead of before. The free spill slot selection logic that makes use of these labels scans instrs *after* the given one, so we may miss the label if it is placed before. Fixes order of app val spill and tool val restore instrs after an instr that reads and writes a spilled reg. This was to take into account the label which is now after the tool restore instr. Adds test to verify restoration of reg that was reserved in multiple phases on a fault, for X86 and AARCHXX. Also adds AARCHXX variant of the multi-phase slot conflict test. Adds a note to the label instrs added by drreg-test to mark instrumentation locations. This is to avoid conflicts with other lqbel instrs. Issue: #3823, #2985
Moves label that contains slot id for register spill/restore instrs to after the instr instead of before. The free spill slot selection logic that makes use of these labels scans instrs after the given one, so we may miss the label if it is placed before. Fixes order of app val spill and tool val restore instrs after an instr that reads and writes a spilled reg. This was to take into account the label which is now after the tool val restore instr. Adds test to verify restoration of reg that was reserved in multiple phases on a fault, for X86 and AARCHXX. Also adds AARCHXX variant of the multi-phase slot conflict test, and extends it to also check proper restoration of app val (under normal operation, as opposed to under a fault which is done by the above test). The existing test only verified whether the slot used in different phases is different. Sets a new signal handler for the part of drreg-test that doesn't expect any signal. It adds a log message in case a signal is seen due to some test failure. Adds a note to the label instrs added by drreg-test to mark instrumentation locations. This is to avoid conflicts with other label instrs. Issue: #3823, #2985
I'm working on adding drreg support for multi-phase aflags reservation. In the existing implementation, aflags are always spilled to slot 0. So the first part is to instead choose a slot every time we need to spill aflags (done in PR#4917). The next step is to enhance the reg state restoration logic to recognise the slot that aflags were spilled to. This is more complex due to the following:
On AArchXX as it exists today, we can probably fix all these by ensuring that we use the same spill slot id for re-spills, and detecting aflags spill |
Found another existing bug that affects
Line 1897 in e8fc651
This is demonstrated by |
Revamps the drreg state restoration logic in drreg_event_restore_state to correctly handle various corner cases like multiple restores (i#4939), aflags in reg (i#4933) and multi-phase use (i#3823). Makes the reconstructed ilist available to drreg_event_restore_state. This is needed for determining whether some instr is app or tool. The new logic looks at the complete ilist to track the movement of the app value of gpr/aflags across spills and restores. We do not forget a spill slot on seeing one restore (which caused i#4939), but remember that the app value can still be found in the spill slot, in addition to the gpr itself. Instead, a spill slot is forgetten when it is used to spill some other reg. We also track whether gprs/aflags contain their native app value or not. This helps in handling cases of overlapping or nested spill regions arising out of multi-phase use of drreg. Tool writes (except restores) to gprs/aflags clobbers the native value. App writes to gprs/aflags installs a native value and clobbers the value in the reg's spill slot. We also track movement of aflags to registers more robustly (to fix i#4933). Adds tests for the i#4939 and i#4933 cases. Test for i#3823 will be added in a separate PR. Issue: #4933, #4939, #3823 Fixes: #4933 Fixes: #4939
Revamps the drreg state restoration logic in drreg_event_restore_state to correctly handle various corner cases like multiple restores (i#4939), multiple spills, aflags in reg (i#4933), multi-phase use (i#3823) leading to nested or otherwise overlapping spill regions, and app instrs that look like drreg spill/restore instrs. Makes the reconstructed ilist available to drreg_event_restore_state (i#3801). The additional metadata that is used by the new logic is only whether an instr is an app or meta instr. For cases when the reconstructed ilist is not available, we continue to use the old best-effort logic. Highlighted its caveats in comments. The new logic looks at the complete ilist to track the movement of the app value of gprs/aflags across spills and restores. The native value can be in the gpr/aflags itself or one or more spill slots. We do not forget a spill slot on seeing one restore (which caused i#4939), but remember that the app value can still be found in the spill slot, in addition to the gpr itself. Instead, a spill slot is forgotten when it is used to spill some other reg or if there's an app instr that writes to the spilled gpr/aflags. We also track whether gprs/aflags contain their native app value or not. This helps in handling cases of overlapping or nested spill regions arising out of multi-phase use of drreg. Tool writes (except restores) to gprs/aflags clobber the native value. App writes to gprs/aflags install a native value and invalidate the value in the gpr's/aflags' spill slot. Restores from spill slots also install a native value. Adds tests for multi-phase gpr reservation cases (nested/overlapping spill regions), multiple spills for gpr and multiple restores for gpr. Tests for these scenarios for aflags will be added when multi-phase drreg support is added for aflags. Tracks movement of aflags to registers more robustly to fix i#4933. Also adds a test for this case. Avoids confusing app instrs that write using the stolen reg on AArch64 with drreg spill or restore instrs. Also adds a test that verifies this fix. Adds documentation about faulting fragment's reconstructed ilist being available in restore state event callbacks. Issue: #4933, #4939, #3823, #3801 Fixes: #4933 Fixes: #4939
Modifies free spill slot selection logic to use is_our_spill_or_restore instead of the labels with spill slot use information added for this purpose. We do not need the extra information in the latter and can simply use the former routine. Removes the extra spill slot use metadata added in form of labels from the instrlist. Issue: #3823
Modifies free spill slot selection logic to use is_our_spill_or_restore instead of the labels with spill slot use information added for this purpose. We do not need the extra information in the latter and can simply use the former routine. Removes the extra spill slot use metadata added in form of labels from the instrlist. Also adds documentation about possibility of DR slot conflicts if DR APIs are mixed with drreg ones. Issue: #3823
Extends drreg support for multi-phase use by avoiding aflags spill slot conflict across different phases. Removes existing restriction of aflags spill slot always being the slot 0. Allows selection of aflags spill slot at spills, and releasing of the slot at unreservations. Unlike gprs, we need to reset pt->aflags.slot everytime we release the slot, because this field is used to determine whether we are spilling/restoring aflags in various routines. Adjusts fault restoration logic in drreg_event_restore_state_without_ilist to detect aflags spill slot dynamically. This is not perfect as we need the ilist for robust restoration, which is done in drreg_event_restore_state_with_ilist if ilist is available. Adds drreg test to verify absence of aflags spill slot conflict when aflags are reserved in app2app and insertion phases. Also adds drreg test to verify correct restoration of aflags on fault, for various cases: when the spill regions are nested, when they overlap (but are not nested), when the aflags value is restored once before the fault, when the native aflags value is spilled to multiple slots by different phases, and tests for some restoration cases when the faulting fragment's ilist isn't available. Adds a small fix for subtest detection in drreg-test, which didn't work if the duplicate mov immediates occurred later in a sequence of mov instrs. Renames drreg-test and signal handlers to a more descriptive name rather than just incrementing numbers/alphabets. Modifies the path where xax is already reserved and in-use when aflags are reserved: instead of writing aflags value to the spill slot using xax_swap, we now use xax itself. This is for a simpler drreg state restoration logic, which can now correctly assume that aflags are always written to slot using xax. Also added a test for this scenario. Documents limitation of drreg_statelessly_restore_app_value that there should not be any new reservations that cause a new spill slot selection between the given `where_restore` and `where_respill` instrs. Also adds a test that exercises drreg_statelessly_restore_app_value for xax when aflags are in xax. Issue: #3823, #2985 Fixes: #3823
Revamps drreg's state restoration logic (again) to not need to observe non-drreg spills and restores. The existing strategy of tracking the app value across spills and restores required drreg to be aware of non-drreg spilling mechanisms like spilling to mcontext, to stack, or any other possible methods. This makes it non-ideal. Implementes a new strategy for state restoration that does not need to be aware of non-drreg spills, and also handles the various tricky drreg cases like multi-phase use. The key observation behind this is that it is easier to find the matching spill for a given restore, than to find the matching restore for a given spill. This is because there may be other restores besides the final restore (e.g. restores for app read, user prompted restores, etc.). This makes it hard to find exactly where the spill region for a reg/aflags ends. Now we walk the ilist backwards from the last instr to the faulting instr. The first restore (or last restore in the list) found is expected to be an app value restore, we find the matching spill for that restore which uses the same slot which has to be an app value spill too. We continue doing this until we reach the faulting instr and if there's a spill-restore pair where the spill is before the faulting instr, we use that slot to restore the reg/aflags. Adds multiple new test cases that use non-drreg spilling mechanisms: spilling to mcontext, clean call instrumentation. Also modifies some previous tests so that the interesting part of the test happens after the faulting instr; this is required because we now walk backwards in the ilist, not forwards. Issue: #3823 Fixes: #4963
Revamps drreg's state restoration logic (again) to obviate the need to observe non-drreg spills and restores. The previous strategy of tracking the app value across spills and restores required drreg to be aware of non-drreg spilling mechanisms like spilling to mcontext, to stack, or any other possible methods. This made it non-ideal. Implements a new strategy for state restoration that does not need to be aware of non-drreg spills, and also handles the various tricky drreg cases like multi-phase use. The key observation behind this is that it is easier to find the matching spill for a given restore, than to find the matching restore for a given spill. This is because there may be other restores besides the final restore (e.g. restores for app read, user prompted restores, etc.). This makes it hard to find exactly where the spill region for a reg/aflags ends. Additional complexities include the fact that aflags re-spills may not use the same slot, which makes differentiating spills from multiple phases difficult. Now we walk the ilist backwards from the last instr to the faulting instr. The first restore (or last restore in the list) found is expected to be an app value restore. We find the matching spill for that restore which uses the same slot which has to be an app value spill too. We continue doing this until we reach the faulting instr and if there's a spill-restore pair where the spill is before the faulting instr, we use that slot to restore the reg/aflags. This way we see only app value spills/restores and skip over tool value spills/restores. Also makes a small fix in instr_is_reg_spill_or_restore_ex to not return true for mcontext base loads to regs, which is not strictly a reg restore operation. Adds multiple new test cases that use non-drreg spilling mechanisms: spilling to mcontext, clean call instrumentation. Also modifies some previous tests so that the interesting part of the test happens after the faulting instr; this is required because we now walk backwards in the ilist, not forwards. Verified on a GA runner that the original burst_threads failure didn't occur in 500 runs. Issue: #3823 Fixes: #4963
Revamps drreg's state restoration logic (again) to obviate the need to observe non-drreg spills and restores. The previous strategy of tracking the app value across spills and restores required drreg to be aware of non-drreg spilling mechanisms like spilling to mcontext, to stack, or any other possible methods. This made it non-ideal. Implements a new strategy for state restoration that does not need to be aware of non-drreg spills, and also handles the various tricky drreg cases like multi-phase use. The key observation behind this is that it is easier to find the matching spill for a given restore, than to find the matching restore for a given spill. This is because there may be other restores besides the final restore (e.g. restores for app read, user prompted restores, etc.). This makes it hard to find exactly where the spill region for a reg/aflags ends. Additional complexities include the fact that aflags re-spills may not use the same slot, which makes differentiating spills from multiple phases difficult. Now we walk the ilist backwards from the last instr to the faulting instr. The first restore (or last restore in the list) found is expected to be an app value restore. We find the matching spill for that restore which uses the same slot which has to be an app value spill too. We continue doing this until we reach the faulting instr and if there's a spill-restore pair where the spill is before the faulting instr, we use that slot to restore the reg/aflags. This way we see only app value spills/restores and skip over tool value spills/restores. Also makes a small fix in instr_is_reg_spill_or_restore_ex to not return true for mcontext base loads to regs, which is not strictly a reg restore operation. Adds multiple new test cases that use non-drreg spilling mechanisms: spilling to mcontext, clean call instrumentation. Also modifies some previous tests so that the interesting part of the test happens after the faulting instr; this is required because we now walk backwards in the ilist, not forwards. Verified on a GA runner that the original burst_threads failure didn't occur in 500 runs. Issue: #3823 Fixes: #4963
drreg claims to support bare-bones use outside of the insertion phase, and it does, except it does not do the right thing if there are also drreg users in the insertion phase: it will not keep the reservations in the two phases distinct. The same problem happens with multiple users in the same app2app or instru2instru phase.
If we do want to support use in other phases (the one being hit now is #2985 ), my proposal would be to add disjoint slot spaces or "arenas". This may have to be done w/ global state to avoid breaking existing interfaces by adding an arena parameter:
and drreg_init w/ some DRREG_ flag creating a separate arena of the asked-for-number of slots.
We would need to work through interactions w/ regular insertion drreg and make sure it's going to work in other respects -- reg spilled in app2app will be spilled again (nested) but restored to 1st spill on app barrier. Other client will not be able to get app's native value: ok to live w/?
The text was updated successfully, but these errors were encountered: