This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Using struct for VectorPacket in PacketTracer benchmark #19662

Merged (1 commit) on Aug 29, 2018

fiigii commented Aug 24, 2018

This PR changes the PacketTracer benchmark to use struct instead of class for VectorPacket.

With the struct promotion JIT change in #19663, this makes the benchmark 31% faster (without that JIT change it is 2x slower), because it reduces the GC overhead mentioned in https://github.com/dotnet/coreclr/issues/19116.
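
As a rough sketch of where the GC overhead comes from: with a class, every intermediate result is a new heap object, while a struct result is returned by value. The types below (PacketClass, PacketStruct, AllocationDemo) are hypothetical stand-ins with plain float fields, not the benchmark's VectorPacket256, and the allocation figures printed depend on the runtime.

```csharp
using System;

// Hypothetical stand-in for a class-based packet: each '+' allocates a new object.
sealed class PacketClass
{
    public readonly float X, Y, Z;
    public PacketClass(float x, float y, float z) { X = x; Y = y; Z = z; }
    public static PacketClass operator +(PacketClass a, PacketClass b)
        => new PacketClass(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
}

// Hypothetical stand-in for a struct-based packet: the result is returned by value.
struct PacketStruct
{
    public float X, Y, Z;
    public PacketStruct(float x, float y, float z) { X = x; Y = y; Z = z; }
    public static PacketStruct operator +(PacketStruct a, PacketStruct b)
        => new PacketStruct(a.X + b.X, a.Y + b.Y, a.Z + b.Z);
}

static class AllocationDemo
{
    public static float SumWithClass(int n)
    {
        var acc = new PacketClass(0, 0, 0);
        var one = new PacketClass(1, 2, 3);
        for (int i = 0; i < n; i++) acc = acc + one;   // one heap object per iteration
        return acc.X + acc.Y + acc.Z;
    }

    public static float SumWithStruct(int n)
    {
        var acc = new PacketStruct(0, 0, 0);
        var one = new PacketStruct(1, 2, 3);
        for (int i = 0; i < n; i++) acc = acc + one;   // no heap object for the result
        return acc.X + acc.Y + acc.Z;
    }

    public static void Main()
    {
        long start = GC.GetAllocatedBytesForCurrentThread();
        float classSum = SumWithClass(1000);
        long afterClass = GC.GetAllocatedBytesForCurrentThread();
        float structSum = SumWithStruct(1000);
        long afterStruct = GC.GetAllocatedBytesForCurrentThread();

        Console.WriteLine($"class:  sum={classSum}, bytes allocated={afterClass - start}");
        Console.WriteLine($"struct: sum={structSum}, bytes allocated={afterStruct - afterClass}");
    }
}
```

Both loops compute the same sum; only the class version creates garbage for the collector to reclaim.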

fiigii commented Aug 28, 2018

Does this PR also look good to you? @tannergooding @CarolEidt @AndyAyersMS
We need this software change to leverage the struct promotion improvement.

@@ -10,7 +10,7 @@
 using System.Runtime.CompilerServices;
 using System;
 
-internal class VectorPacket256
+internal struct VectorPacket256
Member

If you are using in, this should probably be a readonly struct to ensure you don't incur any hidden copies.
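
To illustrate the hidden-copy concern, here is a minimal sketch with hypothetical types (MutableCounter and HiddenCopyDemo are not part of the benchmark). When a non-readonly member is called through an in parameter of a non-readonly struct, the C# compiler invokes it on a defensive copy, so the caller's value is left unchanged and the copy costs time for large structs.

```csharp
using System;

// Hypothetical mutable struct used only for illustration.
struct MutableCounter
{
    public int Value;
    public void Increment() => Value++;   // non-readonly member
}

static class HiddenCopyDemo
{
    // 'c' is a read-only reference. Because Increment() is not a readonly member,
    // the compiler calls it on a defensive copy of 'c' rather than on 'c' itself.
    public static int IncrementViaIn(in MutableCounter c)
    {
        c.Increment();    // mutates the hidden copy
        return c.Value;   // reads the original, which is unchanged
    }

    public static void Main()
    {
        var counter = new MutableCounter { Value = 1 };
        Console.WriteLine(IncrementViaIn(in counter));  // prints 1
        Console.WriteLine(counter.Value);               // prints 1
    }
}
```

Declaring the type as a readonly struct tells the compiler that no member can mutate it, so no defensive copy is needed when it is passed as in.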

Author

Hmm, seems we can optimize here, will try.

Member

Then you may be incurring hidden copies when passing the value as in. I would recommend checking this with the ErrorProne.NET analyzer.

Author

> Then you may be incurring hidden copies when passing the value as in.

Probably not, that loop just updates a local struct. Will try to eliminate the field update.

Author (fiigii, Aug 28, 2018)

I tried to eliminate the field updates on local structs by updating local vectors and then constructing the readonly VectorPacket from those local vectors.
This change is slightly slower (0.75s vs. 0.74s) and causes a small code size regression because of the callee-saved SIMD register prolog.

So I think a mutable struct is okay for this program.

Author

BTW, I have updated this PR to use pass-by-value.

@AndyAyersMS
Member

What happens if you just change from class to struct but don't change the parameters to in?

fiigii commented Aug 28, 2018

@AndyAyersMS Will try, thanks.

fiigii commented Aug 28, 2018

> What happens if you just change from class to struct but don't change the parameters to in?

@AndyAyersMS The pass-by-value version is a little bit slower than using in. Execution time is 0.74s vs. 0.72s on a Core i7.

@tannergooding
Member

> The pass-by-value version is a little bit slower than using in. Execution time 0.74s v.s 0.72s on Core i7.

Do you have a diff to share? This seems like the kind of thing that should be handled by the first class struct work (or by supporting __vectorcall on Windows).

fiigii commented Aug 28, 2018

PMI jit-diff shows 15.39% code size shrink from the struct promotion change (the pass-by-ref version gets 16.26%). The code size regression of ConvertToIntRGB is from callee-saved SIMD register code.

PMI Diffs for PacketTracer.dll for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -6605 (-15.39% of base)
    diff is an improvement.
Top file improvements by size (bytes):
       -6605 : PacketTracer.dasm (-15.39% of base)
1 total files with size differences (1 improved, 0 regressed), 0 unchanged.
Top method regressions by size (bytes):
          74 : PacketTracer.dasm - ColorPacket256Helper:ConvertToIntRGB(struct):struct
Top method improvements by size (bytes):
       -1539 : PacketTracer.dasm - Packet256Tracer:GetNaturalColor(struct,struct,struct,struct,ref):struct:this
       -1222 : PacketTracer.dasm - Camera:Create(struct,struct):ref
       -1022 : PacketTracer.dasm - Packet256Tracer:Shade(struct,ref,ref,int):struct:this
        -821 : PacketTracer.dasm - Packet256Tracer:GetPoints(struct,struct,ref):struct:this
        -508 : PacketTracer.dasm - SpherePacket256:Intersect(ref):struct:this
26 total methods with size differences (25 improved, 1 regressed), 183 unchanged.

fiigii commented Aug 28, 2018

Can I use jit-diff to show codegen diff between two different C# programs (pass-by-value vs pass-by-ref) with the same JIT compiler?

@AndyAyersMS
Member

Yes. It requires a few steps, something like:

  • jit-diff diff --pmi --diff -t V1 --assembly <test version 1>
  • in-place update test sources to version 2 and recompile
  • jit-diff diff --pmi --diff -t V2 --assembly <test version 2>
  • jit-analyze --base bin\diffs\V1\diff --diff bin\diffs\V2\diff

and/or run your own diff tool on the two files

fiigii commented Aug 28, 2018

@AndyAyersMS Thanks for the explanation. The pass-by-value version has a 1.64% code size regression.

Summary:
(Lower is better)

Total bytes of diff: 587 (1.64% of base)
    diff is a regression.

Total byte diff includes 452 bytes from reconciling methods
        Base had    5 unique methods,     7129 unique bytes
        Diff had    5 unique methods,     7581 unique bytes

Top file regressions by size (bytes):
         587 : PacketTracer.dasm (1.64% of base)

1 total files with size differences (0 improved, 1 regressed), 0 unchanged.

Top method regressions by size (bytes):
        4141 : PacketTracer.dasm - Packet256Tracer:GetNaturalColor(struct,struct,struct,struct,ref):struct:this (0/1 methods)
        2780 : PacketTracer.dasm - Packet256Tracer:Shade(struct,ref,ref,int):struct:this (0/1 methods)
         381 : PacketTracer.dasm - ColorPacket256Helper:ConvertToIntRGB(struct):struct (0/1 methods)
         248 : PacketTracer.dasm - SpherePacket256:Normals(struct):struct:this (0/1 methods)
          83 : PacketTracer.dasm - Scene:Normals(struct,struct):struct:this

Top method improvements by size (bytes):
       -3989 : PacketTracer.dasm - Packet256Tracer:GetNaturalColor(struct,byref,byref,byref,ref):struct:this (1/0 methods)
       -2594 : PacketTracer.dasm - Packet256Tracer:Shade(byref,ref,ref,int):struct:this (1/0 methods)
        -267 : PacketTracer.dasm - ColorPacket256Helper:ConvertToIntRGB(byref):struct (1/0 methods)
        -248 : PacketTracer.dasm - SpherePacket256:Normals(byref):struct:this (1/0 methods)
         -31 : PacketTracer.dasm - PlanePacket256:Normals(byref):struct:this (1/0 methods)

@AndyAyersMS
Member

The changes modified signatures so the methods don't quite line up -- but you can still diff the corresponding versions manually. Might be interesting to see if there's a theme that emerges from the diffs.

What about perf?

fiigii commented Aug 28, 2018

@AndyAyersMS The pass-by-value version is a little bit slower than using in. Execution time is 0.74s vs. 0.72s on a Core i7.

I will look into the codegen diff and get VTune data later.

@AndyAyersMS
Member

Ah, right -- you already mentioned that. My sense is we should just go with the simpler version.

fiigii commented Aug 28, 2018

> My sense is we should just go with the simpler version.

@AndyAyersMS Do you mean just using pass-by-value in this PR?

@AndyAyersMS
Member

Yes, just the change from class to struct.

fiigii commented Aug 28, 2018

Thanks, will do.

fiigii changed the title from "Using struct for VectorPacket and pass-by-ref in PacketTracer benchmark" to "Using struct for VectorPacket in PacketTracer benchmark" on Aug 28, 2018

@CarolEidt

> Do you have a diff to share? This seems like the kind of thing that should be handled by the first class struct work (or by supporting __vectorcall on Windows).

I agree with @AndyAyersMS that we should go with the pass-by-value version. That said, it would be great to determine whether the main additional cost is due to the lack of __vectorcall, or some other inefficiency of codegen.

CarolEidt left a comment

LGTM

@tannergooding
Member

@AndyAyersMS, did you have any additional feedback here, or are we good to merge?

AndyAyersMS (Member) left a comment

Good to merge.

tannergooding merged commit fb4b1f7 into dotnet:master on Aug 29, 2018
fiigii deleted the usingstruct branch on August 30, 2018, 20:41