i#4134 drbbdup: Avoid flags preservation for 2 cases #5323

Merged: derekbruening merged 5 commits into master from i4134-drbbdup-aarch-cbz on Feb 4, 2022

derekbruening
Contributor

When there are just 2 drbbdup cases and one has an encoding of zero,
we can use a flags-free jump-if-register-is-zero for our dispatch,
avoiding flags preservation costs.
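
The criteria above can be sketched as a small decision function. This is only an illustrative sketch, not the code in drbbdup.c: the names `dispatch_kind_t` and `choose_dispatch` are hypothetical, and it assumes the case encodings are available as a plain array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names for illustration; these are not part of the drbbdup API. */
typedef enum {
    DISPATCH_COMPARE,   /* compare-and-branch: arithmetic flags must be preserved */
    DISPATCH_ZERO_TEST, /* cbz/jecxz-style branch: arithmetic flags are untouched */
} dispatch_kind_t;

/* Returns DISPATCH_ZERO_TEST only when there are exactly two cases and
 * one of them has an encoding of zero; otherwise falls back to a compare.
 */
static dispatch_kind_t
choose_dispatch(const uintptr_t *encodings, size_t num_cases)
{
    assert(encodings != NULL);
    if (num_cases != 2)
        return DISPATCH_COMPARE;
    if (encodings[0] == 0 || encodings[1] == 0)
        return DISPATCH_ZERO_TEST;
    return DISPATCH_COMPARE;
}
```

With encodings {0, 1} this sketch selects the zero test, which corresponds to the cbz form in the "After" listing; with {1, 2}, or with three or more cases, it keeps the compare form that requires saving the flags.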

Applies this to x86 as well by switching to the xcx scratch register
and using JECXZ. JECXZ is relatively slow on modern processors. I
measured its performance, and whether it outperforms saving the flags
depends on the application. I left it as the default in the hope
that it will help more often than not on larger clients and
applications, but we can remove it if that is not borne out in future
evaluations.

The existing no-encode-test meets the criteria and serves as a test.

Before:

  --------------------------------------------------
  after instrumentation:
  TAG  0x0000ffff868340c0
   +0    m4 @0x0000fffd428950e8  f900b781   str    %x1 -> +0x0168(%x28)[8byte]
   +4    m4 @0x0000fffd42894da0  d53b4200   mrs    %nzcv -> %x0
   +8    m4 @0x0000fffd42894cd8  f900af80   str    %x0 -> +0x0158(%x28)[8byte]
   +12   m4 @0x0000fffd42894c58  d28c1000   movz   $0x6080 lsl $0x00 -> %x0
   +16   m4 @0x0000fffd42894bd8  f2a85000   movk   %x0 $0x4280 lsl $0x10 -> %x0
   +20   m4 @0x0000fffd42894b10  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
   +24   m4 @0x0000fffd42894a48  f9400000   ldr    (%x0)[8byte] -> %x0
   +28   m4 @0x0000fffd42894e20  f9400000   <label>
   +28   m4 @0x0000fffd42894980  f100041f   subs   %x0 $0x0000000000000001 lsl $0x0000000000000000 -> %xzr
   +32   m4 @0x0000fffd42894900  54000001   b.ne   @0x0000fffd42894fa0[8byte]
  --------------------------------------------------

After:

  --------------------------------------------------
  after instrumentation:
  TAG  0x0000ffffa53f20c0
   +0    m4 @0x0000fffd614530e8  f900b781   str    %x1 -> +0x0168(%x28)[8byte]
   +4    m4 @0x0000fffd61452da0  d28a1000   movz   $0x5080 lsl $0x00 -> %x0
   +8    m4 @0x0000fffd61452cd8  f2ac2780   movk   %x0 $0x613c lsl $0x10 -> %x0
   +12   m4 @0x0000fffd61452c58  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
   +16   m4 @0x0000fffd61452bd8  f9400000   ldr    (%x0)[8byte] -> %x0
   +20   m4 @0x0000fffd61452e20  f9400000   <label>
   +20   m4 @0x0000fffd61452b10  b4000000   cbz    @0x0000fffd61452fa0[8byte] %x0
  --------------------------------------------------

Issue: #4134

@johnfxgalea
Contributor

johnfxgalea commented Feb 3, 2022

Would it be possible to add another variant of no-encode-test where zero is not the default case but another case?

Review comment on ext/drbbdup/drbbdup.c (outdated, resolved)
@derekbruening
Contributor Author

> Would it be possible to add another variant of no-encode-test where zero is not the default case but another case?

Yes, done.

@derekbruening derekbruening merged commit 4d8c837 into master Feb 4, 2022
@derekbruening derekbruening deleted the i4134-drbbdup-aarch-cbz branch February 4, 2022 01:30