[WIP] try VARSET_TP iterator without arguments. #11945
@dotnet-bot test windows_nt x64 throughput
Updated results (an average of 25 runs)
So it is roughly a 9% improvement according to the JitCompilation time report (total time, ms).
src/jit/gentree.h
@@ -149,6 +149,18 @@ struct InlineCandidateInfo;
typedef unsigned short AssertionIndex;
//------------------------------------------------------------------------
// GetAssertionIndex: return 1-based AssertionIndex from 0-based int index.
I do not see why we can't make AssertionIndex 0-based with -1 as NO_ASSERTION_INDEX. It would allow us to delete these confusing conversions, which are easy to miss.
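To make the two proposals concrete, here is a minimal sketch of the 1-based conversion that GetAssertionIndex documents, next to the 0-based alternative suggested above. Only the AssertionIndex typedef comes from the diff; the helper names and the use of 0 as the "no assertion" value in the 1-based scheme are assumptions for illustration.

```cpp
#include <cassert>

typedef unsigned short AssertionIndex; // from the gentree.h diff above

// Current 1-based scheme: bit i of an assertion bitmap is assertion #(i + 1),
// so code must convert with +1/-1 whenever it moves between the two.
// Assumption: 0 is reserved as "no assertion" in this scheme.
const AssertionIndex kNoAssertionIndex1Based = 0;

// Hypothetical helper mirroring the documented GetAssertionIndex behavior.
inline AssertionIndex ToAssertionIndex1Based(unsigned bitIndex)
{
    return (AssertionIndex)(bitIndex + 1);
}

// Hypothetical inverse: the -1 that is easy to forget at a call site.
inline unsigned ToBitIndex1Based(AssertionIndex index)
{
    assert(index != kNoAssertionIndex1Based);
    return (unsigned)index - 1;
}

// Proposed 0-based scheme: bit i is assertion #i, so no conversion is needed.
// AssertionIndex is unsigned, so a "-1" sentinel wraps to 0xFFFF.
const AssertionIndex kNoAssertionIndex0Based = (AssertionIndex)-1;
```

Note that with an unsigned short index, the 0-based scheme gives up the top value (0xFFFF) as the sentinel rather than giving up index 0.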
Why do we do this? Is it to pack assertion numbers in the assertion bitmaps, so that bit 0 == assertion #1, bit 1 == assertion #2, etc.? We should stop doing this and just "burn" bit 0, so we don't need to do this math: bit 0 == unused, bit 1 == assertion #1, bit 2 == assertion #2, etc. We already do it this way for BasicBlock numbers. It's simpler and doesn't require this logic. There will possibly be diffs, since one fewer assertion would fit in the bitmaps (assuming they are fixed-size).
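A minimal sketch of the "burn bit 0" layout, using a single 64-bit word as a stand-in for the assertion bitmap. The JIT's assertion sets are BitVecs (see BitVecOps::Iter and apTraits in the assertionprop.cpp diff), so the struct below is hypothetical. Bit positions are assertion numbers directly, at the cost of one unusable bit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical "burn bit 0" assertion set: bit k is assertion #k and bit 0 is
// never set, so no +1/-1 conversion is needed. One 64-bit word holds
// assertions 1..63, one fewer than a packed layout (bit k == assertion #(k + 1)).
struct BurnedBitZeroAssertionSet
{
    uint64_t bits = 0;

    void Add(unsigned assertionNum)
    {
        assert(assertionNum >= 1 && assertionNum < 64); // bit 0 stays unused
        bits |= (uint64_t)1 << assertionNum;
    }

    bool Contains(unsigned assertionNum) const
    {
        return ((bits >> assertionNum) & 1) != 0;
    }

    // Bit positions are assertion numbers, so members are reported as-is.
    std::vector<unsigned> Members() const
    {
        std::vector<unsigned> result;
        for (unsigned bitNum = 1; bitNum < 64; bitNum++)
        {
            if (Contains(bitNum))
            {
                result.push_back(bitNum);
            }
        }
        return result;
    }
};
```

The trade-off matches the discussion: simpler indexing in exchange for one bit of capacity per bitmap.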
@briansull Couldn't we change AssertionProp to stop doing this +1/-1 dance everywhere, and simply burn bit 0, as I've suggested here?
No, we shouldn't waste a bit.
PTAL @dotnet/jit-contrib
Should this still be marked [WIP]? (If you get positive reviews, will you merge it?) What platform/architecture are the timings for? (x64 release Windows? What about x86 release Windows?)
src/jit/assertionprop.cpp
@@ -2852,16 +2852,15 @@ GenTreePtr Compiler::optAssertionProp_LclVar(ASSERT_VALARG_TP assertions, const
}
BitVecOps::Iter iter(apTraits, assertions);
-unsigned index = 0;
-while (iter.NextElem(&index))
+for (int i = iter.NextElem(); i != -1; i = iter.NextElem())
{
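The diff above replaces the out-parameter form of NextElem with one that returns the element directly. A toy, hypothetical iterator over a single 64-bit word (not the JIT's BitSetOps::Iter) that supports both shapes shows that the two loop styles visit the same elements:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Toy stand-in for BitSetOps::Iter over one 64-bit word (not the JIT's class).
// It offers both shapes of NextElem so the two loop styles can be compared.
class ToyBitIter
{
    uint64_t m_bits;   // remaining bits; bit 0 corresponds to element m_bitNum
    unsigned m_bitNum;

public:
    explicit ToyBitIter(uint64_t bits) : m_bits(bits), m_bitNum(0) {}

    // Old style: returns false when done, otherwise stores the element in *pElem.
    bool NextElem(unsigned* pElem)
    {
        int elem = NextElem();
        if (elem == -1)
        {
            return false;
        }
        *pElem = (unsigned)elem;
        return true;
    }

    // New style (as in this PR): returns the next element, or -1 when done.
    int NextElem()
    {
        while (m_bits != 0 && (m_bits & 1) == 0) // skip zero bits
        {
            m_bits >>= 1;
            m_bitNum++;
        }
        if (m_bits == 0)
        {
            return -1;
        }
        m_bits >>= 1;
        return (int)m_bitNum++; // index of the bit just consumed
    }
};

std::vector<unsigned> CollectOldStyle(uint64_t bits)
{
    std::vector<unsigned> out;
    ToyBitIter iter(bits);
    unsigned index = 0;
    while (iter.NextElem(&index))
    {
        out.push_back(index);
    }
    return out;
}

std::vector<unsigned> CollectNewStyle(uint64_t bits)
{
    std::vector<unsigned> out;
    ToyBitIter iter(bits);
    for (int i = iter.NextElem(); i != -1; i = iter.NextElem())
    {
        out.push_back((unsigned)i);
    }
    return out;
}
```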
Another pattern would be:
    int i;
    while ((i = iter.NextElem()) != -1)
which, IMO, might be preferable, to avoid duplicating iter.NextElem().
But in this case we declare i in the enclosing scope.
True, but I'm not worried about that, especially if a more descriptive name than i is used.
I would prefer that we not use assignment as an expression. If we want to avoid the visual noise of the repeated calls to NextElem(), we should consider using C++11 iterators.
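One way to get the C++11 iterators suggested here is a small range adapter with begin()/end(), so call sites can use a range-based for loop with a descriptive variable name and neither repeat NextElem() nor assign inside a condition. SetBits is a hypothetical name, and this sketch covers only a single 64-bit word:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical C++11-style adapter (not an existing JIT API) enabling:
//     for (unsigned varIndex : SetBits(bits)) { ... }
class SetBits
{
    uint64_t m_bits;

public:
    explicit SetBits(uint64_t bits) : m_bits(bits) {}

    class iterator
    {
        uint64_t m_remaining; // members not yet yielded

    public:
        explicit iterator(uint64_t remaining) : m_remaining(remaining) {}

        // Index of the lowest remaining set bit; only valid while != end().
        unsigned operator*() const
        {
            unsigned index = 0;
            uint64_t bits = m_remaining;
            while ((bits & 1) == 0)
            {
                bits >>= 1;
                index++;
            }
            return index;
        }

        // Advance by clearing the lowest set bit.
        iterator& operator++()
        {
            m_remaining &= m_remaining - 1;
            return *this;
        }

        bool operator!=(const iterator& other) const
        {
            return m_remaining != other.m_remaining;
        }
    };

    iterator begin() const { return iterator(m_bits); }
    iterator end() const { return iterator(0); } // exhausted == no bits left
};

std::vector<unsigned> CollectRangeFor(uint64_t bits)
{
    std::vector<unsigned> out;
    for (unsigned varIndex : SetBits(bits))
    {
        out.push_back(varIndex);
    }
    return out;
}
```

A production version would presumably find the lowest set bit with a bit-scan intrinsic such as BitScanForward rather than the shifting loop in operator*; the loop only keeps the sketch portable.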
Can you explain why it's faster? 9% seems hard to believe. Assuming you are measuring locally, you should build both the baseline and diff release builds without PGO, to avoid PGO data degradation effects (although perhaps PGO retraining would make your change faster than without it).
src/jit/bitset.h
@@ -147,8 +147,8 @@ FORCEINLINE unsigned BitSetSupport::CountBitsInIntegral<unsigned>(unsigned c)
// In addition to implementing the method signatures here, an instantiation of BitSetOps must also export a
// BitSetOps::Iter type, which supports the following operations:
// Iter(BitSetValueArgType): a constructor
-// bool NextElem(unsigned* pElem): returns true if the iteration is not complete, and sets *pElem to the next
-// yielded member.
+// int NextElem(): returns true the next yielded member if the iteration is not complete,
"returns true the next yielded member" => "returns the next yielded member"
Fixed
Did you verify no diffs? How? What scenarios/builds/architectures?
src/jit/bitsetasshortlong.h
@@ -513,11 +513,10 @@ class BitSetOps</*BitSetType*/ BitSetShortLongRep,
#endif
// If there's a bit, doesn't matter if we're short or long.
-if (hasBit)
+if (m_bits != 0)
{
It would be useful to add a comment here about why you're checking m_bits != 0 instead of hasBit.
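The commit message for this change gives the reason: the m_bits != 0 test can be scheduled before hasBit = BitScanForward(&nextBit, m_bits), because the branch depends only on m_bits and not on the result of the bit scan. A simplified single-word sketch of that ordering follows. ShortIterSketch is hypothetical (the real BitSetShortLongRep iterator also walks multi-word "long" sets), and it substitutes the GCC/Clang builtin __builtin_ctzll for MSVC's BitScanForward as a portability assumption. Since that builtin is undefined for 0, the emptiness check has to come first in any case.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical single-word iterator step; not the JIT's implementation.
class ShortIterSketch
{
    uint64_t m_bits; // members not yet yielded

public:
    explicit ShortIterSketch(uint64_t bits) : m_bits(bits) {}

    int NextElem()
    {
        // Test the word itself, not the bit-scan result, so this branch does
        // not have to wait for the scan to complete.
        if (m_bits != 0)
        {
            // __builtin_ctzll (GCC/Clang) stands in for BitScanForward here.
            unsigned nextBit = (unsigned)__builtin_ctzll(m_bits);
            m_bits &= m_bits - 1; // clear the bit being returned
            return (int)nextBit;
        }
        return -1;
    }
};
```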
Fixed
src/jit/bitsetasuint64inclass.h
m_bitNum++;
m_bits >>= 1;
-return true;
+return m_bitNum++;
}
This should be just return m_bitNum. Currently, m_bitNum is getting incremented twice. I prefer the increment before the return, and not using ++ in the return clause.
OK, never mind, since you need to return the pre-increment value. But you need to delete the m_bitNum++ line above.
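To see why the extra m_bitNum++ matters, here is an illustrative sketch comparing the version under review with the agreed fix. The free functions and IterState struct are hypothetical (only the member names follow the diff), and the zero-skipping prologue is an assumption about surrounding code the diff doesn't show.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative iterator state; member names follow the diff above.
struct IterState
{
    uint64_t m_bits;   // remaining bits; bit 0 corresponds to element m_bitNum
    unsigned m_bitNum;
};

// Assumed prologue: skip zero bits. Returns false once the set is exhausted.
static bool SkipZeros(IterState& s)
{
    while (s.m_bits != 0 && (s.m_bits & 1) == 0)
    {
        s.m_bits >>= 1;
        s.m_bitNum++;
    }
    return s.m_bits != 0;
}

// As first written: m_bitNum is incremented twice per yielded element.
int NextElemDoubleIncrement(IterState& s)
{
    if (!SkipZeros(s)) return -1;
    s.m_bitNum++;
    s.m_bits >>= 1;
    return (int)s.m_bitNum++;
}

// After review: a single post-increment, which yields the pre-increment value.
int NextElemFixed(IterState& s)
{
    if (!SkipZeros(s)) return -1;
    s.m_bits >>= 1;
    return (int)s.m_bitNum++;
}

// Drains an iterator step function into a vector.
template <typename NextFn>
std::vector<int> Collect(uint64_t bits, NextFn next)
{
    IterState s = {bits, 0};
    std::vector<int> out;
    for (int i = next(s); i != -1; i = next(s))
    {
        out.push_back(i);
    }
    return out;
}
```

For the set {0, 3, 5} (0x29), the fixed version yields 0, 3, 5, while the double increment yields 1, 5, 8: every index is off, and the error grows as iteration proceeds.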
Fixed
src/jit/compiler.cpp
-VARSET_ITER_INIT(comp, iter, vars, varIndex);
-while (iter.NextElem(&varIndex))
+VarSetOps::Iter iter(comp, vars);
+for (int i = iter.NextElem(); i != -1; i = iter.NextElem())
{
IMO, you should keep the descriptive varIndex name and not replace it with i. Same for other cases where you've done this.
It'd be nice to see the lab CoreCLR-throughput job for this. I hate quoting numbers off a dev box for things we want to merge.
src/jit/bitset.h
@@ -279,7 +279,7 @@ class BitSetOps
class Iter {
public:
Iter(Env env, BitSetValueArgType bs) {}
-bool NextElem(unsigned* pElem) { return false; }
+int NextElem() { return -1; }
};
Instead of int, I wonder if this should be unsigned, and then there should be a const unsigned BitSetOps::Iter::Done = (unsigned)-1 value that callers compare against, instead of comparing against -1. I'm worried that by changing the return value type from unsigned to int, you've introduced a ton of signed/unsigned type mismatches in the code (perhaps hidden because that warning is suppressed).
Fixed, but maybe there is a better way to declare it than static const unsigned Done = (unsigned)-1; and use it as BlockSetOps::Iter::Done.
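A minimal sketch of the suggested shape: NextElem() returns unsigned and callers compare against a named Iter::Done sentinel instead of -1. DoneSentinelIter and CollectUntilDone are hypothetical names; the declaration of Done is the one quoted in the comment above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical iterator following the review suggestion: unsigned results and
// a named sentinel, so callers never mix signed and unsigned indices.
class DoneSentinelIter
{
    uint64_t m_bits;   // members not yet yielded
    unsigned m_bitNum; // element index corresponding to bit 0 of m_bits

public:
    // Before C++17, an out-of-class definition is also required if Done is
    // odr-used (e.g. bound to a const reference); comparing against it or
    // returning it by value, as below, is not an odr-use.
    static const unsigned Done = (unsigned)-1;

    explicit DoneSentinelIter(uint64_t bits) : m_bits(bits), m_bitNum(0) {}

    unsigned NextElem()
    {
        while (m_bits != 0 && (m_bits & 1) == 0) // skip zero bits
        {
            m_bits >>= 1;
            m_bitNum++;
        }
        if (m_bits == 0)
        {
            return Done;
        }
        m_bits >>= 1;
        return m_bitNum++; // index of the bit just consumed
    }
};

std::vector<unsigned> CollectUntilDone(uint64_t bits)
{
    std::vector<unsigned> out;
    DoneSentinelIter iter(bits);
    for (unsigned varIndex = iter.NextElem(); varIndex != DoneSentinelIter::Done;
         varIndex = iter.NextElem())
    {
        out.push_back(varIndex);
    }
    return out;
}
```

In C++11, static constexpr unsigned Done = (unsigned)-1; is an equivalent spelling with the same pre-C++17 rule for odr-use; from C++17 on, static constexpr data members are implicitly inline, so no separate definition is needed.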
I love the throughput improvement, although I'm very skeptical it can be as good as reported.
It is WIP because I have test failures, and I want to discuss the invalid values for the iterator and for assertions here before the final review.
I think the improvement is significant only in crossgen, where we have dominators and more bitset operations in O(n^2) algorithms. I plan to fix the failures and run other tests.
FWIW, I see no improvement in instructions retired as measured by
I am double-checking now that PGO isn't enabled for the baseline.
I believe that I have confirmed that neither build of the JIT I used for the instructions-retired (IR) measurement above was built using PGO.
Also, this is not to say that we should not take this change, just to help quantify the performance impact. I hope that we can take it, as I think that it's a nice improvement over the current API.
I fixed the bugs, and now the results are the same. Also, we have not decided whether we can get rid of GetAssertionIndex. For example, with the same
@dotnet-bot test Windows_NT arm64 Cross Debug Build
@dotnet-bot test Windows_NT x64 Debug Build and Test
I disagree somewhat. I prefer the existing:
over the proposed:
So if there is no performance improvement, I wouldn't be in favor of this change.
Ah, I was referring to the removal of the macro for constructing these iterators. I don't really feel one way or the other about the iteration API itself. It's likely that the performance ceiling for the original API is higher than the proposed API in any case, so perhaps we should retain that aspect.
I will update the PR with the
without defines and with GetAssertionIndex commit.
We usually use bbNum (basic block number) rather than blkNum (block number). This change makes it easier to grep for iterators, etc.
@dotnet-bot test OSX10.12 x64 Checked Build and Test |
Useful parts were merged as separate PRs.
Use return arg to keep both values.
Check if (m_bits != 0) instead of hasBit, because the check can be scheduled before hasBit = BitScanForward(&nextBit, m_bits).
Local runs on System.Private.CoreLib crossgen (Time: total value from the JitTimeLogFile), averaged over 20 runs: before 4614 ms, after 4445 ms.
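For reference, the relative reduction implied by these two averages (plain arithmetic on the numbers above, not an additional measurement):

$$\frac{4614 - 4445}{4614} = \frac{169}{4614} \approx 3.7\%$$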
As usual, delete two more defines.