Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

i#4134 drbbdup: Avoid 2nd scratch via max_case_encoding #5322

Merged
merged 5 commits into from
Feb 3, 2022

Conversation

derekbruening
Copy link
Contributor

@derekbruening derekbruening commented Feb 2, 2022

Adds a new drbbdup option max_case_encoding. When the value is small,
we can avoid a 2nd scratch register for aarchxx, since we can fit all
encodings in immediates inside compare instructions.

Adds a sanity check.
Confirmed visually in the debug logs that the 2nd scratch is gone:
Before:

  --------------------------------------------------
  after instrumentation:
  TAG  0x0000ffff868340c0
   +0    m4 @0x0000fffd428950e8  f900b781   str    %x1 -> +0x0168(%x28)[8byte]
   +4    m4 @0x0000fffd42894da0  d53b4200   mrs    %nzcv -> %x0
   +8    m4 @0x0000fffd42894cd8  f900af80   str    %x0 -> +0x0158(%x28)[8byte]
   +12   m4 @0x0000fffd42894c58  d28c1000   movz   $0x6080 lsl $0x00 -> %x0
   +16   m4 @0x0000fffd42894bd8  f2a85000   movk   %x0 $0x4280 lsl $0x10 -> %x0
   +20   m4 @0x0000fffd42894b10  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
   +24   m4 @0x0000fffd42894a48  f9400000   ldr    (%x0)[8byte] -> %x0
   +28   m4 @0x0000fffd42894e20  f9400000   <label>
   +28   m4 @0x0000fffd42894980  f100041f   subs   %x0 $0x0000000000000001 lsl $0x0000000000000000 -> %xzr
   +32   m4 @0x0000fffd42894900  54000001   b.ne   @0x0000fffd42894fa0[8byte]
  --------------------------------------------------

After:

  --------------------------------------------------
  after instrumentation:
  TAG  0x0000ffff9aedf0c0
   +0    m4 @0x0000fffd56f400e8  d53b4200   mrs    %nzcv -> %x0
   +4    m4 @0x0000fffd56f3fda0  f900af80   str    %x0 -> +0x0158(%x28)[8byte]
   +8    m4 @0x0000fffd56f3fcd8  d2821000   movz   $0x1080 lsl $0x00 -> %x0
   +12   m4 @0x0000fffd56f3fc58  f2aadd60   movk   %x0 $0x56eb lsl $0x10 -> %x0
   +16   m4 @0x0000fffd56f3fbd8  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
   +20   m4 @0x0000fffd56f3fb10  f9400000   ldr    (%x0)[8byte] -> %x0
   +24   m4 @0x0000fffd56f3fe20  f9400000   <label>
   +24   m4 @0x0000fffd56f3fa48  f100041f   subs   %x0 $0x0000000000000001 lsl $0x0000000000000000 -> %xzr
   +28   m4 @0x0000fffd56f3f980  54000001   b.ne   @0x0000fffd56f3ffa0[8byte]
  --------------------------------------------------

Issue: #4134

Adds a new drbbdup option max_case_encoding.  When the value is small,
we can avoid a 2nd scratch register for aarchxx, since we can fit all
encodings in immediates inside compare instructions.

Adds a sanity check.
Confirmed visually in the debug logs that the 2nd scratch is gone:
Before:
  --------------------------------------------------
  after instrumentation:
  TAG  0x0000ffff868340c0
   +0    m4 @0x0000fffd428950e8  f900b781   str    %x1 -> +0x0168(%x28)[8byte]
   +4    m4 @0x0000fffd42894da0  d53b4200   mrs    %nzcv -> %x0
   +8    m4 @0x0000fffd42894cd8  f900af80   str    %x0 -> +0x0158(%x28)[8byte]
   +12   m4 @0x0000fffd42894c58  d28c1000   movz   $0x6080 lsl $0x00 -> %x0
   +16   m4 @0x0000fffd42894bd8  f2a85000   movk   %x0 $0x4280 lsl $0x10 -> %x0
   +20   m4 @0x0000fffd42894b10  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
   +24   m4 @0x0000fffd42894a48  f9400000   ldr    (%x0)[8byte] -> %x0
   +28   m4 @0x0000fffd42894e20  f9400000   <label>
   +28   m4 @0x0000fffd42894980  f100041f   subs   %x0 $0x0000000000000001 lsl $0x0000000000000000 -> %xzr
   +32   m4 @0x0000fffd42894900  54000001   b.ne   @0x0000fffd42894fa0[8byte]
  --------------------------------------------------
After:
  --------------------------------------------------
  after instrumentation:
  TAG  0x0000ffff9aedf0c0
   +0    m4 @0x0000fffd56f400e8  d53b4200   mrs    %nzcv -> %x0
   +4    m4 @0x0000fffd56f3fda0  f900af80   str    %x0 -> +0x0158(%x28)[8byte]
   +8    m4 @0x0000fffd56f3fcd8  d2821000   movz   $0x1080 lsl $0x00 -> %x0
   +12   m4 @0x0000fffd56f3fc58  f2aadd60   movk   %x0 $0x56eb lsl $0x10 -> %x0
   +16   m4 @0x0000fffd56f3fbd8  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
   +20   m4 @0x0000fffd56f3fb10  f9400000   ldr    (%x0)[8byte] -> %x0
   +24   m4 @0x0000fffd56f3fe20  f9400000   <label>
   +24   m4 @0x0000fffd56f3fa48  f100041f   subs   %x0 $0x0000000000000001 lsl $0x0000000000000000 -> %xzr
   +28   m4 @0x0000fffd56f3f980  54000001   b.ne   @0x0000fffd56f3ffa0[8byte]
  --------------------------------------------------

Issue: #4134
@johnfxgalea
Copy link
Contributor

@derekbruening Is this ready for review?

@derekbruening
Copy link
Contributor Author

@derekbruening Is this ready for review?

Yes, forgot to request it. Thanks for noticing.

ext/drbbdup/drbbdup.c Outdated Show resolved Hide resolved
@johnfxgalea
Copy link
Contributor

You can punt the x86_64 optimisation for drbbdup_insert_compare_encoding_and_branch to another PR, but it is also suitable here as it also leverages the assumption of a max encoding case (32-bit size in this case).

@derekbruening
Copy link
Contributor Author

You can punt the x86_64 optimisation for drbbdup_insert_compare_encoding_and_branch to another PR, but it is also suitable here as it also leverages the assumption of a max encoding case (32-bit size in this case).

You mean adding the use of an immediate instead of a load? OK. Doesn't it not depend on the max encoding: it only depends on the current case value, which never changes?

@johnfxgalea
Copy link
Contributor

You can punt the x86_64 optimisation for drbbdup_insert_compare_encoding_and_branch to another PR, but it is also suitable here as it also leverages the assumption of a max encoding case (32-bit size in this case).

You mean adding the use of an immediate instead of a load? OK. Doesn't it not depend on the max encoding: it only depends on the current case value, which never changes?

You can punt the x86_64 optimisation for drbbdup_insert_compare_encoding_and_branch to another PR, but it is also suitable here as it also leverages the assumption of a max encoding case (32-bit size in this case).

You mean adding the use of an immediate instead of a load? OK. Doesn't it not depend on the max encoding: it only depends on the current case value, which never changes?

Yeah. Okay, max encoding is not needed for that optimisation; we can just check the size of each encoding.

@johnfxgalea
Copy link
Contributor

LGTM. One thing which I noticed is that drbbdup_prepare_redirect does not restore reg2 before redirection when handling dynamic cases. Shouldn't reg2 be restored there?

@derekbruening
Copy link
Contributor Author

LGTM. One thing which I noticed is that drbbdup_prepare_redirect does not restore reg2 before redirection when handling dynamic cases. Shouldn't reg2 be restored there?

Good catch. Done.

Note that you didn't actually mark "Approve" (the "LGTM" string doesn't do it in this system).

@derekbruening derekbruening merged commit e222439 into master Feb 3, 2022
@derekbruening derekbruening deleted the i4134-drbbdup-aarch-cmp-opt branch February 3, 2022 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants