Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: change basic block weight to float #45052

Merged
merged 5 commits into from
Nov 24, 2020

Conversation

AndyAyersMS
Copy link
Member

Change the core data type for basic block weights from unsigned to float,
to simplify overall calculations and allow for a wider dynamic range.

Many changes are straightforward, but a few are worth noting:

  • LSRA needs a true max weight, so had to introduce infinity
  • I removed some of the overflow checking as floats naturally saturate.
  • The simple geometric loop weight scaling (*8 per loop nest level) leads
    to some very large counts in some tests (15 level loop nests). We may
    want to rethink this and scale less aggressively in deep nests.
  • Morph's use of the weighted ref counts for RCS_EARLY is nonstandard
    and the values are not actually weights, so I just added a cast back to unsigned.
  • Several places in the jit seem to try and compare or combine unweighted
    and weighted counts; I don't think this makes sense. But have left as is.
  • Lower, LIR, and Decompose were passing around weights but never using them.
  • I had to introduce a special new weight for the inline projection we do
    for the prejit root.

These changes lead to small numbers of diffs, mostly places where small rounding
changes have altered heuristics; notably:

  • cse weights
  • LSRA's initial take on whether a parameter should be enregistered

Overall diff impact is a wash.

There are almost no diffs without PGO/IBC data. Diffs are slightly more
prominient in the Roslyn assemblies prejitted with some IBC.

I've tried to keep the format of weights the same in dumps (in most places)
and see minimal diffs in dumps too.

Change the core data type for basic block weights from unsigned to float,
to simplify overall calculations and allow for a wider dynamic range.

Many changes are straightforward, but a few are worth noting:
* LSRA needs a true max weight, so had to introduce infinity
* I removed some of the overflow checking as floats naturally saturate.
* The simple geometric loop weight scaling (*8 per loop nest level) leads
  to some very large counts in some tests (15 level loop nests). We may
  want to rethink this and scale less aggressively in deep nests.
* Morph's use of the weighted ref counts for RCS_EARLY is nonstandard
  and the values are not actually weights, so I just added a cast back to unsigned.
* Several places in the jit seem to try and compare or combine unweighted
  and weighted counts; I don't think this makes sense. But have left as is.
* Lower, LIR, and Decompose were passing around weights but never using them.
* I had to introduce a special new weight for the inline projection we do
  for the prejit root.

These changes lead to small numbers of diffs, mostly places where small rounding
changes have altered heuristics; notably:
* cse weights
* LSRA's initial take on whether a parameter should be enregistered

Overall diff impact is a wash.

There are almost no diffs without PGO/IBC data. Diffs are slightly more
prominient in the Roslyn assemblies prejitted with some IBC.

I've tried to keep the format of weights the same in dumps (in most places)
and see minimal diffs in dumps too.
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Nov 21, 2020
@AndyAyersMS
Copy link
Member Author

cc @dotnet/jit-contrib

Still todo: some kind of TP assessment, broader testing.

Crossgen CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit

Total bytes of delta: 652 (0.00% of base)
    diff is a regression.

Top file regressions (bytes):
        1119 : Microsoft.CodeAnalysis.CSharp.dasm (0.05% of base)

Top file improvements (bytes):
        -225 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.01% of base)
        -201 : Microsoft.CodeAnalysis.dasm (-0.03% of base)
         -29 : Newtonsoft.Json.dasm (-0.00% of base)
         -12 : System.Private.CoreLib.dasm (-0.00% of base)

5 total files with Code Size differences (4 improved, 1 regressed), 263 unchanged.

424 total methods with Code Size differences (217 improved, 207 regressed), 195737 unchanged.
PMI CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit

Total bytes of delta: -1169 (-0.00% of base)
    diff is an improvement.

Top file regressions (bytes):
          73 : System.Data.Common.dasm (0.00% of base)
           1 : Microsoft.CodeAnalysis.VisualBasic.dasm (0.00% of base)

Top file improvements (bytes):
       -1120 : System.Text.Json.dasm (-0.19% of base)
         -40 : System.DirectoryServices.dasm (-0.01% of base)
         -29 : Newtonsoft.Json.dasm (-0.00% of base)
         -27 : System.DirectoryServices.AccountManagement.dasm (-0.01% of base)
         -27 : System.Private.CoreLib.dasm (-0.00% of base)

7 total files with Code Size differences (5 improved, 2 regressed), 261 unchanged.

11 total methods with Code Size differences (9 improved, 2 regressed), 258788 unchanged.

#define BB_ZERO_WEIGHT 0.0f
#define BB_HOT_WEIGHT 1000000.0f
#define BB_MAX_WEIGHT FLT_MAX // maximum finite weight -- needs rethinking.
#define BB_VERY_HOT_WEIGHT 256.0f // how many average hits a BB has (per BBT scenario run) for this block
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is BB_VERY_HOT_WEIGHT smaller than the new BB_HOT_WEIGHT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I should rename these for what they are, they're each only used once.

BB_HOT_WEIGHT is used when we're prejtting and assessing the inlinability of the root method -- we assume there might be at least one call site with this weight.

BB_VERY_HOT_WEIGHT is used by the x86 emitter to decide if a method should be 16 byte aligned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed an update.

unsigned sumSplitEdgeCount = 0;
BasicBlock::weight_t wtdSpillCount = 0;
BasicBlock::weight_t wtdCopyRegCount = 0;
BasicBlock::weight_t wtdResolutionMovCount = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These variables ending in Count should probably be renamed as Cost at some point in the future

{
return m_defCount;
}
unsigned UseCount()
BasicBlock::weight_t UseCount()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Likewise I will consider renaming Count to be Cost in a future change

@AndyAyersMS
Copy link
Member Author

Crossgen of SPC seemingly just a bit faster with these changes:

(instrs via pin; msec via Measure-Command)

metric base diff
instrs 19.75 B 19.74 B
msec 4014 3998

@AndyAyersMS
Copy link
Member Author

@dotnet/jit-contrib anyone else planning to look at this one?

If you are and haven't gotten to it yet just reply here and I'll hold off merging.

@BruceForstall
Copy link
Member

I'll look...

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor notes, questions

src/coreclr/src/jit/block.h Outdated Show resolved Hide resolved
src/coreclr/src/jit/block.h Outdated Show resolved Hide resolved
#define BB_UNITY_WEIGHT 100.0f // how much a normal execute once block weights
#define BB_UNITY_WEIGHT_UNSIGNED 100 // how much a normal execute once block weights
#define BB_LOOP_WEIGHT 8.0f // how much more loops are weighted
#define BB_ZERO_WEIGHT 0.0f
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an equality check against zero somewhere that isn't just checking to see if it's this assigned value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be? Might be tricky to find these...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have any issues with negative zero as previously the weight was an unsigned.

src/coreclr/src/jit/compiler.h Outdated Show resolved Hide resolved
src/coreclr/src/jit/compiler.h Show resolved Hide resolved
@@ -299,7 +299,7 @@ void Compiler::fgComputeProfileScale()
//
if (calleeWeight < callSiteWeight)
{
JITDUMP(" ... callee entry count %d is less than call site count %d\n", calleeWeight, callSiteWeight);
JITDUMP(" ... callee entry count %f is less than call site count %f\n", calleeWeight, callSiteWeight);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it's worthwhile to introduce some FMT_WEIGHT define to ensure uniformity in dumping of weight data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. A lot of the dumping is already stylized (via refCntWtd2str).

// Note that for CLR v2.0 and earlier our
// block weights were stored using unsigned shorts
// Type used to hold block and edge weights
typedef float weight_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worthwhile to keep weight_t as a BasicBlock member, or should we just make it a global type? (Or even its own class?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably vote to get rid of it entirely and just use float. I don't find these sorts of typedefs helpful; most anywhere you work with these values you also need to know the actual underlying type.

In past compilers I've seen people wrap these floats in structs to get stronger type checking but that is a whole lot of uninteresting boilerplate.

@@ -1836,6 +1847,18 @@ unsigned CountDigits(unsigned num, unsigned base /* = 10 */)
return count;
}

unsigned CountDigits(float num, unsigned base /* = 10 */)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems semantically dubious. What about the fractional component? If the caller doesn't care, they should cast to unsigned and call the unsigned version.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already anticipate printing up to 2 fractional digits because of our old "fixed point" scheme.

Was trying to keep formatting comparable because there are lots of computed printf layouts in the jit I'd rather not mess with just yet.

@AndyAyersMS
Copy link
Member Author

@BruceForstall I followed up on some of your feedback. Will probably leave the rest as is unless you object.

@BruceForstall
Copy link
Member

Will probably leave the rest as is unless you object.

That's fine.

@AndyAyersMS AndyAyersMS merged commit 7031456 into dotnet:master Nov 24, 2020
@AndyAyersMS AndyAyersMS deleted the FloatWeights branch November 24, 2020 22:25
@nathan-moore nathan-moore mentioned this pull request Nov 29, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 24, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

4 participants