i#4134 drbbdup: Avoid flags preservation for 2 cases #5323
Merged
When there are just 2 drbbdup cases and one has an encoding of zero, we can use a flags-free jump-if-register-is-zero for our dispatch, avoiding flags preservation costs.

Applies this to x86 as well by switching to the xcx scratch register and using JECXZ. JECXZ is relatively slow on modern processors. I measured its performance, and whether it outperforms saving the flags depends on the application. I left it as the default in the hope that it will help more often than not on larger clients and applications, but we can remove it if that is not borne out in future evaluations.

The existing no-encode-test meets the criteria and serves as a test.

Before:
--------------------------------------------------
after instrumentation:
TAG  0x0000ffff868340c0
 +0    m4 @0x0000fffd428950e8  f900b781   str    %x1 -> +0x0168(%x28)[8byte]
 +4    m4 @0x0000fffd42894da0  d53b4200   mrs    %nzcv -> %x0
 +8    m4 @0x0000fffd42894cd8  f900af80   str    %x0 -> +0x0158(%x28)[8byte]
 +12   m4 @0x0000fffd42894c58  d28c1000   movz   $0x6080 lsl $0x00 -> %x0
 +16   m4 @0x0000fffd42894bd8  f2a85000   movk   %x0 $0x4280 lsl $0x10 -> %x0
 +20   m4 @0x0000fffd42894b10  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
 +24   m4 @0x0000fffd42894a48  f9400000   ldr    (%x0)[8byte] -> %x0
 +28   m4 @0x0000fffd42894e20  f9400000   <label>
 +28   m4 @0x0000fffd42894980  f100041f   subs   %x0 $0x0000000000000001 lsl $0x0000000000000000 -> %xzr
 +32   m4 @0x0000fffd42894900  54000001   b.ne   @0x0000fffd42894fa0[8byte]
--------------------------------------------------

After:
--------------------------------------------------
after instrumentation:
TAG  0x0000ffffa53f20c0
 +0    m4 @0x0000fffd614530e8  f900b781   str    %x1 -> +0x0168(%x28)[8byte]
 +4    m4 @0x0000fffd61452da0  d28a1000   movz   $0x5080 lsl $0x00 -> %x0
 +8    m4 @0x0000fffd61452cd8  f2ac2780   movk   %x0 $0x613c lsl $0x10 -> %x0
 +12   m4 @0x0000fffd61452c58  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
 +16   m4 @0x0000fffd61452bd8  f9400000   ldr    (%x0)[8byte] -> %x0
 +20   m4 @0x0000fffd61452e20  f9400000   <label>
 +20   m4 @0x0000fffd61452b10  b4000000   cbz    @0x0000fffd61452fa0[8byte] %x0
--------------------------------------------------

Issue: #4134
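The condition described above can be sketched as a small standalone model. This is only an illustration of the decision rule, not drbbdup's actual code: the names `dispatch_kind_t`, `choose_dispatch`, and `needs_flags_preservation` are hypothetical, and the model assumes the zero encoding may belong to either of the two cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the dispatch choice; these names are illustrative
 * and do not appear in drbbdup itself. */
typedef enum {
    DISPATCH_CMP_BRANCH,  /* compare + conditional branch: writes the flags */
    DISPATCH_ZERO_BRANCH, /* CBZ / JECXZ style: leaves the flags untouched */
} dispatch_kind_t;

/* Selects the flags-free dispatch only when there are exactly two cases and
 * at least one of them has an encoding of zero. */
static dispatch_kind_t
choose_dispatch(const uintptr_t *encodings, size_t num_cases)
{
    assert(encodings != NULL);
    if (num_cases == 2 && (encodings[0] == 0 || encodings[1] == 0))
        return DISPATCH_ZERO_BRANCH;
    return DISPATCH_CMP_BRANCH;
}

/* The flags need saving and restoring only when the dispatch writes them. */
static bool
needs_flags_preservation(dispatch_kind_t kind)
{
    return kind == DISPATCH_CMP_BRANCH;
}
```

In this model, encodings `{0, 1}` and `{1, 0}` both select the zero-branch dispatch, while `{1, 2}` or any set of three cases falls back to the compare-based dispatch, which is the path that pays for flags preservation.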
Would it be possible to add another variant of no-encode-test where zero is not the default case but another case?
johnfxgalea
reviewed
Feb 3, 2022
johnfxgalea
approved these changes
Feb 3, 2022
… test our cmp opts
Yes, done.