Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: change basic block weight to float #45052

Merged
merged 5 commits into from
Nov 24, 2020
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 5 additions & 4 deletions src/coreclr/src/jit/assertionprop.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -143,7 +143,8 @@ void Compiler::optAddCopies()
BlockSet paramImportantUseDom(BlockSetOps::MakeFull(this));

// This will be threshold for determining heavier-than-average uses
unsigned paramAvgWtdRefDiv2 = (varDsc->lvRefCntWtd() + varDsc->lvRefCnt() / 2) / (varDsc->lvRefCnt() * 2);
BasicBlock::weight_t paramAvgWtdRefDiv2 =
(varDsc->lvRefCntWtd() + varDsc->lvRefCnt() / 2) / (varDsc->lvRefCnt() * 2);

bool paramFoundImportantUse = false;

Expand Down Expand Up @@ -306,9 +307,9 @@ void Compiler::optAddCopies()
/* dominates all the uses of the local variable */

/* Our default is to use the first block */
BasicBlock* bestBlock = fgFirstBB;
unsigned bestWeight = bestBlock->getBBWeight(this);
BasicBlock* block = bestBlock;
BasicBlock* bestBlock = fgFirstBB;
BasicBlock::weight_t bestWeight = bestBlock->getBBWeight(this);
BasicBlock* block = bestBlock;

#ifdef DEBUG
if (verbose)
Expand Down
18 changes: 8 additions & 10 deletions src/coreclr/src/jit/block.h
Original file line number Diff line number Diff line change
Expand Up @@ -514,16 +514,14 @@ struct BasicBlock : private LIR::Range
const char* dspToString(int blockNumPadding = 0);
#endif // DEBUG

typedef unsigned weight_t; // Type used to hold block and edge weights
// Note that for CLR v2.0 and earlier our
// block weights were stored using unsigned shorts
// Type used to hold block and edge weights
typedef float weight_t;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it worthwhile to keep weight_t as a BasicBlock member, or should we just make it a global type? (Or even its own class?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd probably vote to get rid of it entirely and just use float. I don't find these sorts of typedefs helpful; most anywhere you work with these values you also need to know the actual underlying type.

In past compilers I've seen people wrap these floats in structs to get stronger type checking but that is a whole lot of uninteresting boilerplate.


#define BB_UNITY_WEIGHT 100 // how much a normal execute once block weights
#define BB_LOOP_WEIGHT 8 // how much more loops are weighted
#define BB_ZERO_WEIGHT 0
#define BB_MAX_WEIGHT UINT32_MAX // we're using an 'unsigned' for the weight
#define BB_VERY_HOT_WEIGHT 256 // how many average hits a BB has (per BBT scenario run) for this block
// to be considered as very hot
#define BB_UNITY_WEIGHT 100.0f // how much a normal execute once block weights
AndyAyersMS marked this conversation as resolved.
Show resolved Hide resolved
#define BB_UNITY_WEIGHT_UNSIGNED 100 // how much a normal execute once block weights
#define BB_LOOP_WEIGHT 8.0f // how much more loops are weighted
AndyAyersMS marked this conversation as resolved.
Show resolved Hide resolved
#define BB_ZERO_WEIGHT 0.0f
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there an equality check against zero somewhere that isn't just checking to see if it's this assigned value?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be? Might be tricky to find these...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have any issues with negative zero as previously the weight was an unsigned.

#define BB_MAX_WEIGHT FLT_MAX // maximum finite weight -- needs rethinking.

weight_t bbWeight; // The dynamic execution weight of this block

Expand Down Expand Up @@ -551,7 +549,7 @@ struct BasicBlock : private LIR::Range
}

// setBBProfileWeight -- Set the profile-derived weight for a basic block
void setBBProfileWeight(unsigned weight)
void setBBProfileWeight(weight_t weight)
{
this->bbFlags |= BBF_PROF_WEIGHT;
this->bbWeight = weight;
Expand Down
2 changes: 1 addition & 1 deletion src/coreclr/src/jit/codegencommon.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2203,7 +2203,7 @@ void CodeGen::genGenerateMachineCode()

if (compiler->fgHaveProfileData())
{
printf("; with IBC profile data, edge weights are %s, and fgCalledCount is %u\n",
printf("; with IBC profile data, edge weights are %s, and fgCalledCount is %.0f\n",
compiler->fgHaveValidEdgeWeights ? "valid" : "invalid", compiler->fgCalledCount);
}

Expand Down
35 changes: 18 additions & 17 deletions src/coreclr/src/jit/compiler.h
Original file line number Diff line number Diff line change
@@ -1,4 +1,5 @@
// Licensed to the .NET Foundation under one or more agreements.
// Licensed to the .NET Foundation under one or more agreements.
AndyAyersMS marked this conversation as resolved.
Show resolved Hide resolved
// The .NET Foundation licenses this file to you under the MIT license.

/*XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Expand Down Expand Up @@ -5526,7 +5527,7 @@ class Compiler

bool fgHaveProfileData();
void fgComputeProfileScale();
bool fgGetProfileWeightForBasicBlock(IL_OFFSET offset, unsigned* weight);
bool fgGetProfileWeightForBasicBlock(IL_OFFSET offset, BasicBlock::weight_t* weight);
void fgInstrumentMethod();

public:
Expand All @@ -5541,7 +5542,7 @@ class Compiler
// or BB_UNITY_WEIGHT when we aren't using profile data.
unsigned fgProfileRunsCount()
{
return fgIsUsingProfileWeights() ? fgNumProfileRuns : BB_UNITY_WEIGHT;
return fgIsUsingProfileWeights() ? fgNumProfileRuns : BB_UNITY_WEIGHT_UNSIGNED;
AndyAyersMS marked this conversation as resolved.
Show resolved Hide resolved
}

//-------- Insert a statement at the start or end of a basic block --------
Expand Down Expand Up @@ -6080,7 +6081,7 @@ class Compiler
// non-loop predecessors other than the head entry, create a new, empty block that goes (only) to the entry,
// and redirects the preds of the entry to this new block.) Sets the weight of the newly created block to
// "ambientWeight".
void optEnsureUniqueHead(unsigned loopInd, unsigned ambientWeight);
void optEnsureUniqueHead(unsigned loopInd, BasicBlock::weight_t ambientWeight);

void optUnrollLoops(); // Unrolls loops (needs to have cost info)

Expand Down Expand Up @@ -6485,8 +6486,8 @@ class Compiler
unsigned short csdDefCount; // definition count
unsigned short csdUseCount; // use count (excluding the implicit uses at defs)

unsigned csdDefWtCnt; // weighted def count
unsigned csdUseWtCnt; // weighted use count (excluding the implicit uses at defs)
BasicBlock::weight_t csdDefWtCnt; // weighted def count
BasicBlock::weight_t csdUseWtCnt; // weighted use count (excluding the implicit uses at defs)

GenTree* csdTree; // treenode containing the 1st occurrence
Statement* csdStmt; // stmt containing the 1st occurrence
Expand Down Expand Up @@ -6599,13 +6600,13 @@ class Compiler
#endif // FEATURE_VALNUM_CSE

#if FEATURE_ANYCSE
bool optDoCSE; // True when we have found a duplicate CSE tree
bool optValnumCSE_phase; // True when we are executing the optValnumCSE_phase
unsigned optCSECandidateTotal; // Grand total of CSE candidates for both Lexical and ValNum
unsigned optCSECandidateCount; // Count of CSE's candidates, reset for Lexical and ValNum CSE's
unsigned optCSEstart; // The first local variable number that is a CSE
unsigned optCSEcount; // The total count of CSE's introduced.
unsigned optCSEweight; // The weight of the current block when we are doing PerformCSE
bool optDoCSE; // True when we have found a duplicate CSE tree
bool optValnumCSE_phase; // True when we are executing the optValnumCSE_phase
unsigned optCSECandidateTotal; // Grand total of CSE candidates for both Lexical and ValNum
unsigned optCSECandidateCount; // Count of CSE's candidates, reset for Lexical and ValNum CSE's
unsigned optCSEstart; // The first local variable number that is a CSE
unsigned optCSEcount; // The total count of CSE's introduced.
BasicBlock::weight_t optCSEweight; // The weight of the current block when we are doing PerformCSE

bool optIsCSEcandidate(GenTree* tree);

Expand Down Expand Up @@ -7712,11 +7713,11 @@ class Compiler
return codeGen->doDoubleAlign();
}
DWORD getCanDoubleAlign();
bool shouldDoubleAlign(unsigned refCntStk,
unsigned refCntReg,
unsigned refCntWtdReg,
unsigned refCntStkParam,
unsigned refCntWtdStkDbl);
bool shouldDoubleAlign(unsigned refCntStk,
unsigned refCntReg,
BasicBlock::weight_t refCntWtdReg,
unsigned refCntStkParam,
BasicBlock::weight_t refCntWtdStkDbl);
#endif // DOUBLE_ALIGN

bool IsFullPtrRegMapRequired()
Expand Down
14 changes: 4 additions & 10 deletions src/coreclr/src/jit/compiler.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -856,7 +856,7 @@ inline unsigned int genCSEnum2bit(unsigned index)

#ifdef DEBUG
const char* genES2str(BitVecTraits* traits, EXPSET_TP set);
const char* refCntWtd2str(unsigned refCntWtd);
const char* refCntWtd2str(BasicBlock::weight_t refCntWtd);
#endif

/*
Expand Down Expand Up @@ -1841,15 +1841,9 @@ inline void LclVarDsc::incRefCnts(BasicBlock::weight_t weight, Compiler* comp, R
weight *= 2;
}

unsigned newWeight = lvRefCntWtd(state) + weight;
if (newWeight >= lvRefCntWtd(state))
{ // lvRefCntWtd is an "unsigned". Don't overflow it
setLvRefCntWtd(newWeight, state);
}
else
{ // On overflow we assign UINT32_MAX
setLvRefCntWtd(UINT32_MAX, state);
}
BasicBlock::weight_t newWeight = lvRefCntWtd(state) + weight;
assert(newWeight >= lvRefCntWtd(state));
setLvRefCntWtd(newWeight, state);
}
}

Expand Down
31 changes: 13 additions & 18 deletions src/coreclr/src/jit/decomposelongs.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -61,9 +61,7 @@ void DecomposeLongs::DecomposeBlock(BasicBlock* block)
{
assert(block == m_compiler->compCurBB); // compCurBB must already be set.
assert(block->isEmpty() || block->IsLIR());

m_blockWeight = block->getBBWeight(m_compiler);
m_range = &LIR::AsRange(block);
m_range = &LIR::AsRange(block);
DecomposeRangeHelper();
}

Expand All @@ -75,20 +73,17 @@ void DecomposeLongs::DecomposeBlock(BasicBlock* block)
//
// Arguments:
// compiler - The compiler context.
// blockWeight - The weight of the block into which the range will be
// inserted.
// range - The range to decompose.
//
// Return Value:
// None.
//
void DecomposeLongs::DecomposeRange(Compiler* compiler, unsigned blockWeight, LIR::Range& range)
void DecomposeLongs::DecomposeRange(Compiler* compiler, LIR::Range& range)
{
assert(compiler != nullptr);

DecomposeLongs decomposer(compiler);
decomposer.m_blockWeight = blockWeight;
decomposer.m_range = ⦥
decomposer.m_range = ⦥

decomposer.DecomposeRangeHelper();
}
Expand Down Expand Up @@ -626,7 +621,7 @@ GenTree* DecomposeLongs::DecomposeCast(LIR::Use& use)
else
{
LIR::Use src(Range(), &(cast->AsOp()->gtOp1), cast);
unsigned lclNum = src.ReplaceWithLclVar(m_compiler, m_blockWeight);
unsigned lclNum = src.ReplaceWithLclVar(m_compiler);

loResult = src.Def();

Expand Down Expand Up @@ -768,22 +763,22 @@ GenTree* DecomposeLongs::DecomposeStoreInd(LIR::Use& use)

// Save address to a temp. It is used in storeIndLow and storeIndHigh trees.
LIR::Use address(Range(), &tree->AsOp()->gtOp1, tree);
address.ReplaceWithLclVar(m_compiler, m_blockWeight);
address.ReplaceWithLclVar(m_compiler);
JITDUMP("[DecomposeStoreInd]: Saving address tree to a temp var:\n");
DISPTREERANGE(Range(), address.Def());

if (!gtLong->AsOp()->gtOp1->OperIsLeaf())
{
LIR::Use op1(Range(), &gtLong->AsOp()->gtOp1, gtLong);
op1.ReplaceWithLclVar(m_compiler, m_blockWeight);
op1.ReplaceWithLclVar(m_compiler);
JITDUMP("[DecomposeStoreInd]: Saving low data tree to a temp var:\n");
DISPTREERANGE(Range(), op1.Def());
}

if (!gtLong->AsOp()->gtOp2->OperIsLeaf())
{
LIR::Use op2(Range(), &gtLong->AsOp()->gtOp2, gtLong);
op2.ReplaceWithLclVar(m_compiler, m_blockWeight);
op2.ReplaceWithLclVar(m_compiler);
JITDUMP("[DecomposeStoreInd]: Saving high data tree to a temp var:\n");
DISPTREERANGE(Range(), op2.Def());
}
Expand Down Expand Up @@ -841,7 +836,7 @@ GenTree* DecomposeLongs::DecomposeInd(LIR::Use& use)
GenTree* indLow = use.Def();

LIR::Use address(Range(), &indLow->AsOp()->gtOp1, indLow);
address.ReplaceWithLclVar(m_compiler, m_blockWeight);
address.ReplaceWithLclVar(m_compiler);
JITDUMP("[DecomposeInd]: Saving addr tree to a temp var:\n");
DISPTREERANGE(Range(), address.Def());

Expand Down Expand Up @@ -1151,7 +1146,7 @@ GenTree* DecomposeLongs::DecomposeShift(LIR::Use& use)
// x = x << 32

LIR::Use loOp1Use(Range(), &gtLong->AsOp()->gtOp1, gtLong);
loOp1Use.ReplaceWithLclVar(m_compiler, m_blockWeight);
loOp1Use.ReplaceWithLclVar(m_compiler);

hiResult = loOp1Use.Def();
Range().Remove(gtLong);
Expand Down Expand Up @@ -1434,10 +1429,10 @@ GenTree* DecomposeLongs::DecomposeRotate(LIR::Use& use)
{
// If the rotate amount is 32, then swap hi and lo
LIR::Use loOp1Use(Range(), &gtLong->AsOp()->gtOp1, gtLong);
loOp1Use.ReplaceWithLclVar(m_compiler, m_blockWeight);
loOp1Use.ReplaceWithLclVar(m_compiler);

LIR::Use hiOp1Use(Range(), &gtLong->AsOp()->gtOp2, gtLong);
hiOp1Use.ReplaceWithLclVar(m_compiler, m_blockWeight);
hiOp1Use.ReplaceWithLclVar(m_compiler);

hiResult = loOp1Use.Def();
loResult = hiOp1Use.Def();
Expand Down Expand Up @@ -1821,7 +1816,7 @@ GenTree* DecomposeLongs::StoreNodeToVar(LIR::Use& use)
}

// Otherwise, we need to force var = call()
unsigned varNum = use.ReplaceWithLclVar(m_compiler, m_blockWeight);
unsigned varNum = use.ReplaceWithLclVar(m_compiler);
m_compiler->lvaTable[varNum].lvIsMultiRegRet = true;

// Decompose the new LclVar use
Expand All @@ -1848,7 +1843,7 @@ GenTree* DecomposeLongs::RepresentOpAsLocalVar(GenTree* op, GenTree* user, GenTr
else
{
LIR::Use opUse(Range(), edge, user);
opUse.ReplaceWithLclVar(m_compiler, m_blockWeight);
opUse.ReplaceWithLclVar(m_compiler);
return *edge;
}
}
Expand Down
3 changes: 1 addition & 2 deletions src/coreclr/src/jit/decomposelongs.h
Original file line number Diff line number Diff line change
Expand Up @@ -25,7 +25,7 @@ class DecomposeLongs
void PrepareForDecomposition();
void DecomposeBlock(BasicBlock* block);

static void DecomposeRange(Compiler* compiler, unsigned blockWeight, LIR::Range& range);
static void DecomposeRange(Compiler* compiler, LIR::Range& range);

private:
inline LIR::Range& Range() const
Expand Down Expand Up @@ -69,7 +69,6 @@ class DecomposeLongs

// Data
Compiler* m_compiler;
unsigned m_blockWeight;
LIR::Range* m_range;
};

Expand Down
5 changes: 3 additions & 2 deletions src/coreclr/src/jit/emit.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4755,7 +4755,7 @@ unsigned emitter::emitEndCodeGen(Compiler* comp,
// code to be 16-byte aligned.
//
// 1. For ngen code with IBC data, use 16-byte alignment if the method
// has been called more than BB_VERY_HOT_WEIGHT times.
// has been called more than ScenarioHotWeight times.
// 2. For JITed code and ngen code without IBC data, use 16-byte alignment
// when the code is 16 bytes or smaller. We align small getters/setters
// because of they are penalized heavily on certain hardware when not 16-byte
Expand All @@ -4764,7 +4764,8 @@ unsigned emitter::emitEndCodeGen(Compiler* comp,
//
if (emitComp->fgHaveProfileData())
{
if (emitComp->fgCalledCount > (BB_VERY_HOT_WEIGHT * emitComp->fgProfileRunsCount()))
const float scenarioHotWeight = 256.0f;
if (emitComp->fgCalledCount > (scenarioHotWeight * emitComp->fgProfileRunsCount()))
{
allocMemFlag = CORJIT_ALLOCMEM_FLG_16BYTE_ALIGN;
}
Expand Down
Loading