Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[WIP] try VARSET_TP iterator without arguments. #11945

Closed
wants to merge 6 commits into from

Conversation

sandreenko
Copy link

@sandreenko sandreenko commented May 27, 2017

Use return arg to keep both values.
Check if (m_bits != 0) , because it can be scheduled before hasBit = BitScanForward(&nextBit, m_bits);.
Local runs showed on System.Private.Corelib crossgen (Time: total value from the JitTimeLogFile)

before: 4614 ms, after 4445 ms. It is an average for 20 runs.

As usual delete two more defines.

@sandreenko
Copy link
Author

@dotnet-bot test windows_nt x64 throughput

@sandreenko
Copy link
Author

@dotnet-bot test windows_nt x64 throughput

@sandreenko
Copy link
Author

Updated results (an average for 25 runs)

old 4421
new 3987

so it is near 9% advantage according to JitCompilation time report (all time, ms).

results.xlsx

@@ -149,6 +149,18 @@ struct InlineCandidateInfo;

typedef unsigned short AssertionIndex;

//------------------------------------------------------------------------
// GetAssertionIndex: return 1-based AssertionIndex from 0-based int index.
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not see why we can't make AssertionIndex 0-based with -1 as NO_ASSERTION_INDEX. It will allows us to delete this confusing conversions, that are easy to miss.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this? Is it to pack Assertion Numbers in the assertion bitmaps? Thus, bit 0 == assertion# 1, bit 1 == assertion # 2, etc.? We should stop doing this and just "burn" bit 0, so we don't need to do this math. Thus, bit0 == unused, bit 1 == assertion# 1, bit 2 == assertion #2, etc. We do it this way for BasicBlock numbers already. It's simpler and doesn't require this logic. There will possibly be diffs as one fewer assertion would be allowed in the bitmaps (assuming those are fixed sized)


In reply to: 122082167 [](ancestors = 122082167)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@briansull Couldn't we change AssertionProp to stop doing this +1/-1 dance everywhere, and simply burn bit 0, as I've suggested here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No we shouldn't waste a bit

@sandreenko
Copy link
Author

@dotnet-bot test windows_nt x64 throughput

@sandreenko
Copy link
Author

PTAL @dotnet/jit-contrib

@BruceForstall
Copy link
Member

Should this still be marked [WIP]? (If you get positive reviews will you merge it?)

What platform/architecture are the timings for? (x64 release Windows? What about x86 release Windows?)

@@ -2852,16 +2852,15 @@ GenTreePtr Compiler::optAssertionProp_LclVar(ASSERT_VALARG_TP assertions, const
}

BitVecOps::Iter iter(apTraits, assertions);
unsigned index = 0;
while (iter.NextElem(&index))
for (int i = iter.NextElem(); i != -1; i = iter.NextElem())
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another pattern would be:

int i;
while ((i = iter.NextElem()) != -1)

which, IMO, might be preferable, to avoid duplicating iter.NextElem().

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but in this case we define I in the external area of visibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, but I'm not worried about that. Especially if a more descriptive name than i is used.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer that we not use assignment as an expression. If we want to avoid the visual noise of the repeated calls to NextElem(), we should consider using `C++11 iterators.

@BruceForstall
Copy link
Member

Can you explain why it's faster? 9% seems hard to believe.

Assuming you are measuring locally, you should build both baseline and diff release builds without PGO, to avoid PGO data degradation effects (although perhaps PGO retraining would make your change faster than without it):

--- a/pgosupport.cmake
+++ b/pgosupport.cmake
@@ -36,8 +36,8 @@ function(add_pgo TargetName)
         # If we don't have profile data availble, gracefully fall back to a non-PGO opt build
         if(EXISTS ${ProfilePath})
             if(WIN32)
-                set_property(TARGET ${TargetName} APPEND_STRING PROPERTY LINK_FLAGS_RELEASE        " /LTCG /USEPROFILE:PGD=${ProfilePath}")
-                set_property(TARGET ${TargetName} APPEND_STRING PROPERTY LINK_FLAGS_RELWITHDEBINFO " /LTCG /USEPROFILE:PGD=${ProfilePath}")
+                # set_property(TARGET ${TargetName} APPEND_STRING PROPERTY LINK_FLAGS_RELEASE        " /LTCG /USEPROFILE:PGD=${ProfilePath}")
+                # set_property(TARGET ${TargetName} APPEND_STRING PROPERTY LINK_FLAGS_RELWITHDEBINFO " /LTCG /USEPROFILE:PGD=${ProfilePath}")
             else(WIN32)
                 if(UPPERCASE_CMAKE_BUILD_TYPE STREQUAL RELEASE OR UPPERCASE_CMAKE_BUILD_TYPE STREQUAL RELWITHDEBINFO)
                     if(HAVE_LTO)

src/jit/bitset.h Outdated
@@ -147,8 +147,8 @@ FORCEINLINE unsigned BitSetSupport::CountBitsInIntegral<unsigned>(unsigned c)
// In addition to implementing the method signatures here, an instantiation of BitSetOps must also export a
// BitSetOps::Iter type, which supports the following operations:
// Iter(BitSetValueArgType): a constructor
// bool NextElem(unsigned* pElem): returns true if the iteration is not complete, and sets *pElem to the next
// yielded member.
// int NextElem(): returns true the next yielded member if the iteration is not complete,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"returns true the next yielded member" => "returns the next yielded member"

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@BruceForstall
Copy link
Member

Did you verify no diffs? How? What scenarios/builds/architectures?

@@ -513,11 +513,10 @@ class BitSetOps</*BitSetType*/ BitSetShortLongRep,
#endif

// If there's a bit, doesn't matter if we're short or long.
if (hasBit)
if (m_bits != 0)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to add a comment here about why you're checking m_bits != 0 instead of hasBits.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe you could also assert(hasBit)


In reply to: 122267573 [](ancestors = 122267573)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

m_bitNum++;
m_bits >>= 1;
return true;
return m_bitNum++;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be just return m_bitNum. Currently, m_bitNum is getting incremented twice. I prefer the increment before the return, and not using ++ in the return clause.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, never mind, since you need to return the before-incremented value.

But, you need to delete the m_bitNum++ line above


In reply to: 122268396 [](ancestors = 122268396)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

VARSET_ITER_INIT(comp, iter, vars, varIndex);
while (iter.NextElem(&varIndex))
VarSetOps::Iter iter(comp, vars);
for (int i = iter.NextElem(); i != -1; i = iter.NextElem())
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, you should leave the descriptive varIndex name and not replace it with i. Same for other cases where you've done this.

@russellhadley
Copy link

It'd be nice to see the the lab CoreCLR-througput job for this. I hate quoting numbers off a dev box for things we want to merge.

src/jit/bitset.h Outdated
@@ -279,7 +279,7 @@ class BitSetOps
class Iter {
public:
Iter(Env env, BitSetValueArgType bs) {}
bool NextElem(unsigned* pElem) { return false; }
int NextElem() { return -1; }
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of int I wonder if this should be unsigned and then there should be a const unsigned BitSetOps::Iter::Done = (unsigned)-1 value that callers compare against, instead of comparing against -1. I'm worried that by changing the "value return" type from unsigned to int, you've introduced a ton of signed/unsigned type mismatches in the code (perhaps hidden because that warning is suppressed).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, but maybe there is a better way to declare it, rather than static const unsigned Done = (unsigned)-1; and use with BlockSetOps::Iter::Done;

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love the throughput improvement, although I'm very skeptical it can be as good as reported.

@sandreenko
Copy link
Author

sandreenko commented Jun 15, 2017

Should this still be marked [WIP]? (If you get positive reviews will you merge it?)
What platform/architecture are the timings for? (x64 release Windows? What about x86 release Windows?)

It is WIP because I have the test failures and I want to discuss invalid values for iterator and assertions here before the final review.

I love the throughput improvement, although I'm very skeptical it can be as good as reported.

I think the improvement is significant only in crossgen where we have dominators and more bitsets operations in O(n^2) algorithms.

I am planning to fix the failures and do others tests.

@pgavlin
Copy link

pgavlin commented Jun 15, 2017

FWIW, I see no improvement in instructions retired as measured by callgrind when crosgen'ing S.P.C.dll on Linux. Indeed, there is a very small increase in instructions retired (about .03%).

@pgavlin
Copy link

pgavlin commented Jun 15, 2017

I am double-checking now that PGO isn't enabled for the baseline.

@pgavlin
Copy link

pgavlin commented Jun 15, 2017

I believe that I have confirmed that neither build of the JIT I used for the IR measurement above was built using PGO.

@pgavlin
Copy link

pgavlin commented Jun 15, 2017

Also, this is not to say that we should not take this change, just to help quantify the performance impact. I hope that we can take it, as I think that it's a nice improvement over the current API.

@sandreenko
Copy link
Author

I fixed bugs and now the results are the same.
I like that we do not return value in input arguments and do not have defines, but I don't like very long for expressions and BlockSetOps::Iter::Done that is defined in each iterator.

Also we have not decided could we get rid of GetAssertionIndex ? For example with the same ILEGAL =(unsigned char)-1.

@sandreenko
Copy link
Author

@dotnet-bot test Windows_NT arm64 Cross Debug Build

@dotnet-bot test Windows_NT x64 Debug Build and Test

@BruceForstall
Copy link
Member

I hope that we can take it, as I think that it's a nice improvement over the current API.

I disagree somewhat. I prefer the existing:

int i;
while (iter.NextElem(&i))

over the proposed:

for (int i = iter.NextElem(); i != BitVecOps::Iter::Done; i = iter.NextElem())

So if there is no performance improvement, I wouldn't be in favor of this change.

@pgavlin
Copy link

pgavlin commented Jun 19, 2017

I hope that we can take it, as I think that it's a nice improvement over the current API.
I disagree somewhat. I prefer the existing:

int i;
while (iter.NextElem(&i))

over the proposed:

for (int i = iter.NextElem(); i != BitVecOps::Iter::Done; i = iter.NextElem())

So if there is no performance improvement, I wouldn't be in favor of this change.

Ah, I was referring to the removal of the macro for constructing these iterators. I don't really feel one way or the other about the iteration API itself. It's likely that the performance ceiling for the original API is higher than the proposed API in any case, so perhaps we should retain that aspect.

@sandreenko
Copy link
Author

I will update PR with the

int i;
while (iter.NextElem(&i))

without defines and with GetAssertionIndex commit.

Sergey Andreenko added 5 commits July 3, 2017 14:14
We usually use bbNum (basic block number), rather than blkNum(block
number).
This change allows to grep for iterator and etc easier.
@sandreenko
Copy link
Author

@dotnet-bot test OSX10.12 x64 Checked Build and Test
@dotnet-bot test Ubuntu x64 Checked Build and Test

@sandreenko
Copy link
Author

Useful parts were merged as separate PRs.

@sandreenko sandreenko closed this Jul 11, 2017
@karelz karelz modified the milestone: 2.1.0 Aug 28, 2017
@sandreenko sandreenko deleted the iteratorPerf branch November 2, 2018 22:28
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants