i#4134 drbbdup: Avoid 2nd scratch via max_case_encoding #5322

derekbruening · 2022-02-02T17:58:40Z

Adds a new drbbdup option max_case_encoding. When the value is small,
we can avoid a 2nd scratch register for aarchxx, since we can fit all
encodings in immediates inside compare instructions.

Adds a sanity check.
Confirmed visually in the debug logs that the 2nd scratch is gone:
Before:

  --------------------------------------------------
  after instrumentation:
  TAG  0x0000ffff868340c0
   +0    m4 @0x0000fffd428950e8  f900b781   str    %x1 -> +0x0168(%x28)[8byte]
   +4    m4 @0x0000fffd42894da0  d53b4200   mrs    %nzcv -> %x0
   +8    m4 @0x0000fffd42894cd8  f900af80   str    %x0 -> +0x0158(%x28)[8byte]
   +12   m4 @0x0000fffd42894c58  d28c1000   movz   $0x6080 lsl $0x00 -> %x0
   +16   m4 @0x0000fffd42894bd8  f2a85000   movk   %x0 $0x4280 lsl $0x10 -> %x0
   +20   m4 @0x0000fffd42894b10  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
   +24   m4 @0x0000fffd42894a48  f9400000   ldr    (%x0)[8byte] -> %x0
   +28   m4 @0x0000fffd42894e20  f9400000   <label>
   +28   m4 @0x0000fffd42894980  f100041f   subs   %x0 $0x0000000000000001 lsl $0x0000000000000000 -> %xzr
   +32   m4 @0x0000fffd42894900  54000001   b.ne   @0x0000fffd42894fa0[8byte]
  --------------------------------------------------

After:

  --------------------------------------------------
  after instrumentation:
  TAG  0x0000ffff9aedf0c0
   +0    m4 @0x0000fffd56f400e8  d53b4200   mrs    %nzcv -> %x0
   +4    m4 @0x0000fffd56f3fda0  f900af80   str    %x0 -> +0x0158(%x28)[8byte]
   +8    m4 @0x0000fffd56f3fcd8  d2821000   movz   $0x1080 lsl $0x00 -> %x0
   +12   m4 @0x0000fffd56f3fc58  f2aadd60   movk   %x0 $0x56eb lsl $0x10 -> %x0
   +16   m4 @0x0000fffd56f3fbd8  f2dfffe0   movk   %x0 $0xffff lsl $0x20 -> %x0
   +20   m4 @0x0000fffd56f3fb10  f9400000   ldr    (%x0)[8byte] -> %x0
   +24   m4 @0x0000fffd56f3fe20  f9400000   <label>
   +24   m4 @0x0000fffd56f3fa48  f100041f   subs   %x0 $0x0000000000000001 lsl $0x0000000000000000 -> %xzr
   +28   m4 @0x0000fffd56f3f980  54000001   b.ne   @0x0000fffd56f3ffa0[8byte]
  --------------------------------------------------

Issue: #4134

Adds a new drbbdup option max_case_encoding. When the value is small, we can avoid a 2nd scratch register for aarchxx, since we can fit all encodings in immediates inside compare instructions. Adds a sanity check. Confirmed visually in the debug logs that the 2nd scratch is gone: Before: -------------------------------------------------- after instrumentation: TAG 0x0000ffff868340c0 +0 m4 @0x0000fffd428950e8 f900b781 str %x1 -> +0x0168(%x28)[8byte] +4 m4 @0x0000fffd42894da0 d53b4200 mrs %nzcv -> %x0 +8 m4 @0x0000fffd42894cd8 f900af80 str %x0 -> +0x0158(%x28)[8byte] +12 m4 @0x0000fffd42894c58 d28c1000 movz $0x6080 lsl $0x00 -> %x0 +16 m4 @0x0000fffd42894bd8 f2a85000 movk %x0 $0x4280 lsl $0x10 -> %x0 +20 m4 @0x0000fffd42894b10 f2dfffe0 movk %x0 $0xffff lsl $0x20 -> %x0 +24 m4 @0x0000fffd42894a48 f9400000 ldr (%x0)[8byte] -> %x0 +28 m4 @0x0000fffd42894e20 f9400000 <label> +28 m4 @0x0000fffd42894980 f100041f subs %x0 $0x0000000000000001 lsl $0x0000000000000000 -> %xzr +32 m4 @0x0000fffd42894900 54000001 b.ne @0x0000fffd42894fa0[8byte] -------------------------------------------------- After: -------------------------------------------------- after instrumentation: TAG 0x0000ffff9aedf0c0 +0 m4 @0x0000fffd56f400e8 d53b4200 mrs %nzcv -> %x0 +4 m4 @0x0000fffd56f3fda0 f900af80 str %x0 -> +0x0158(%x28)[8byte] +8 m4 @0x0000fffd56f3fcd8 d2821000 movz $0x1080 lsl $0x00 -> %x0 +12 m4 @0x0000fffd56f3fc58 f2aadd60 movk %x0 $0x56eb lsl $0x10 -> %x0 +16 m4 @0x0000fffd56f3fbd8 f2dfffe0 movk %x0 $0xffff lsl $0x20 -> %x0 +20 m4 @0x0000fffd56f3fb10 f9400000 ldr (%x0)[8byte] -> %x0 +24 m4 @0x0000fffd56f3fe20 f9400000 <label> +24 m4 @0x0000fffd56f3fa48 f100041f subs %x0 $0x0000000000000001 lsl $0x0000000000000000 -> %xzr +28 m4 @0x0000fffd56f3f980 54000001 b.ne @0x0000fffd56f3ffa0[8byte] -------------------------------------------------- Issue: #4134

johnfxgalea · 2022-02-02T22:36:27Z

@derekbruening Is this ready for review?

derekbruening · 2022-02-02T22:47:28Z

@derekbruening Is this ready for review?

Yes, forgot to request it. Thanks for noticing.

ext/drbbdup/drbbdup.c

johnfxgalea · 2022-02-03T13:47:17Z

You can punt the x86_64 optimisation for drbbdup_insert_compare_encoding_and_branch to another PR, but it is also suitable here as it also leverages the assumption of a max encoding case (32-bit size in this case).

derekbruening · 2022-02-03T16:49:22Z

You can punt the x86_64 optimisation for drbbdup_insert_compare_encoding_and_branch to another PR, but it is also suitable here as it also leverages the assumption of a max encoding case (32-bit size in this case).

You mean adding the use of an immediate instead of a load? OK. Doesn't it not depend on the max encoding: it only depends on the current case value, which never changes?

johnfxgalea · 2022-02-03T17:03:06Z

You can punt the x86_64 optimisation for drbbdup_insert_compare_encoding_and_branch to another PR, but it is also suitable here as it also leverages the assumption of a max encoding case (32-bit size in this case).

You mean adding the use of an immediate instead of a load? OK. Doesn't it not depend on the max encoding: it only depends on the current case value, which never changes?

Yeah. Okay, max encoding is not needed for that optimisation; we can just check the size of each encoding.

…bbdup-aarch-cmp-opt

johnfxgalea · 2022-02-03T17:34:18Z

LGTM. One thing which I noticed is that drbbdup_prepare_redirect does not restore reg2 before redirection when handling dynamic cases. Shouldn't reg2 be restored there?

derekbruening · 2022-02-03T18:51:49Z

LGTM. One thing which I noticed is that drbbdup_prepare_redirect does not restore reg2 before redirection when handling dynamic cases. Shouldn't reg2 be restored there?

Good catch. Done.

Note that you didn't actually mark "Approve" (the "LGTM" string doesn't do it in this system).

derekbruening requested a review from johnfxgalea February 2, 2022 22:47

johnfxgalea reviewed Feb 3, 2022

View reviewed changes

ext/drbbdup/drbbdup.c Outdated Show resolved Hide resolved

Use immed on x86_64 if cur encoding is 32-bit

d09858f

derekbruening added 2 commits February 3, 2022 12:19

Switch to dedicated flag; move max to ==; fix assert

f2d5dbc

Merge branch 'master' of github.com:DynamoRIO/dynamorio into i4134-dr…

6648c13

…bbdup-aarch-cmp-opt

derekbruening requested a review from johnfxgalea February 3, 2022 17:20

Restore scratch2 for redirect; improve assert by passing in manager

f769ea7

johnfxgalea approved these changes Feb 3, 2022

View reviewed changes

derekbruening merged commit e222439 into master Feb 3, 2022

derekbruening deleted the i4134-drbbdup-aarch-cmp-opt branch February 3, 2022 21:34

derekbruening mentioned this pull request Feb 15, 2022

Collect multiple trace samples from a single benchmark run #3995

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

i#4134 drbbdup: Avoid 2nd scratch via max_case_encoding #5322

i#4134 drbbdup: Avoid 2nd scratch via max_case_encoding #5322

derekbruening commented Feb 2, 2022 •

edited

Loading

johnfxgalea commented Feb 2, 2022

derekbruening commented Feb 2, 2022

johnfxgalea commented Feb 3, 2022

derekbruening commented Feb 3, 2022

johnfxgalea commented Feb 3, 2022

johnfxgalea commented Feb 3, 2022

derekbruening commented Feb 3, 2022

i#4134 drbbdup: Avoid 2nd scratch via max_case_encoding #5322

i#4134 drbbdup: Avoid 2nd scratch via max_case_encoding #5322

Conversation

derekbruening commented Feb 2, 2022 • edited Loading

johnfxgalea commented Feb 2, 2022

derekbruening commented Feb 2, 2022

johnfxgalea commented Feb 3, 2022

derekbruening commented Feb 3, 2022

johnfxgalea commented Feb 3, 2022

johnfxgalea commented Feb 3, 2022

derekbruening commented Feb 3, 2022

derekbruening commented Feb 2, 2022 •

edited

Loading