SE: Maintenance: Cache GetHashCode for ProgramState #6983
Conversation
Before: 45.7 GB allocations. Comparison (left: before, right: after): Enumerator halved in allocation count and GBs.
Let's try to wrap it to remove duplications.
private int? exceptionsHashCode;
private int? hashCode;
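One possible shape for that wrapper, keeping each value together with its lazily computed hash so a separate `...HashCode` field per member is no longer needed, could look like this (a sketch only; the type name and the hash delegate are assumptions, not code from this PR):

```csharp
using System;

// Hypothetical wrapper: stores a value together with its lazily computed hash.
internal sealed class CachedHash<T>
{
    private readonly T value;
    private readonly Func<T, int> computeHash;
    private int? hashCode;

    public CachedHash(T value, Func<T, int> computeHash)
    {
        this.value = value;
        this.computeHash = computeHash;
    }

    public T Value => value;

    // Computed on first access, reused afterwards.
    public int ContentHash => hashCode ??= computeHash(value);
}
```

A struct would avoid the extra allocation per wrapped member, at the cost of copy-semantics subtleties.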
// Current SymbolicValue result of a given operation
You can remove this. I don't find it useful anymore now that the engine has evolved and the ProgramState is well understood.
analyzers/src/SonarAnalyzer.Common/SymbolicExecution/Roslyn/ProgramState.cs
HashCode.DictionaryContentHash(CaptureOperation),
HashCode.EnumerableContentHash(PreservedSymbols),
HashCode.EnumerableContentHash(Exceptions));
hashCode ??= HashCode.Combine(
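For context, this is roughly how the lazily cached total hash fits into GetHashCode (assembled from the snippet above; the class declaration and the elided arguments are assumptions):

```csharp
public sealed partial class ProgramState   // assumed declaration, shown for context only
{
    private int? hashCode;

    // Compute the combined hash once, on first use, and reuse it on every later call.
    public override int GetHashCode() =>
        hashCode ??= HashCode.Combine(
            // ...hashes of the remaining ProgramState members go here...
            HashCode.DictionaryContentHash(CaptureOperation),
            HashCode.EnumerableContentHash(PreservedSymbols),
            HashCode.EnumerableContentHash(Exceptions));
}
```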
Can you measure how many times we create a ProgramState without ever calling GetHashCode? In theory, we could do this math at creation time, or at least inside the wrapped type for the sub-items.
This is the result for the NullPointerDereference_Roslyn_CS test case:
Count of GetHashCode calls | ProgramState unique objects count | Description
---|---|---
0 | 5805 | GetHashCode was never called
1 | 4282 | GetHashCode was called exactly once
145 | 1 | GetHashCode was called 145 times for one instance
So for more than half of the objects GetHashCode is never called, and for the other half it is called exactly once. There is one case where GetHashCode is called 145 times; that is most likely the ProgramState.Empty instance.
This shows that
- it is more important to cache the individual hash codes than the total hash code
- it is important to evaluate the hash code lazily.
Here is the set of values from NullPointerDereference_Roslyn_CSharp8. It shows about the same results:
Count of GetHashCode calls | ProgramState unique objects count
---|---
0 | 1298
1 | 772
32 | 1
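For reference, per-instance call counts like these can be collected with a temporary counter inside GetHashCode. A minimal sketch, not the instrumentation actually used for the numbers above, with all members other than GetHashCode being illustrative:

```csharp
using System.Runtime.CompilerServices;

public sealed partial class ProgramState
{
    // Temporary measurement aid: counts GetHashCode calls per ProgramState instance
    // without keeping any instance alive longer than usual.
    private static readonly ConditionalWeakTable<ProgramState, StrongBox<int>> GetHashCodeCalls = new();

    private int? hashCode;

    public override int GetHashCode()
    {
        GetHashCodeCalls.GetOrCreateValue(this).Value++;   // record one more call for this instance
        return hashCode ??= ComputeHashCode();
    }

    private int ComputeHashCode() => 0;   // placeholder for the real combined hash
}
```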
I did similar stats against the Nancy project:
- 0x: 145696
- 1x: 124010
- 2x: 13
- 3x: 18
- 3520x: 1 (I confirmed it's the "Empty" one)

To me, it does not make sense to cache it. There's nothing to win, there's no cache hit.
The only shortcut we can take is to return 0 when the instance is reference-equal to Empty.
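That shortcut would be a one-liner along these lines (a sketch; ComputeHashCode stands in for the existing calculation):

```csharp
// Skip the calculation for the shared Empty instance, which is hashed far more often
// than any other ProgramState.
public override int GetHashCode() =>
    ReferenceEquals(this, Empty) ? 0 : ComputeHashCode();
```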
The saving is that in 124,010 cases we most likely calculate a single hash code for a single ImmutableDictionary instead of five.
Ok. If the main gain is on the partial sub-results, what about removing hashCode itself? Its calculation should be rather cheap, it's not reused much, and removing it would also simplify the inits.
As an alternative solution, we can make the hash code implementation allocation-free (my main concern here). I did so in #7012, where you can find an allocation comparison between the baseline, this PR (caching), and the allocation-free implementation (#7012). I would recommend merging #7012 in any case, and we may also merge this PR on top of it (which would additionally save some CPU cycles).
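For illustration only, not the #7012 implementation: a content hash can be made allocation-conscious by replacing LINQ pipelines with a plain loop and an integer accumulator, for example:

```csharp
using System.Collections.Generic;
using System.Collections.Immutable;

// Illustrative helper, not the repo's HashCode class.
internal static class ContentHash
{
    // Order-insensitive hash over dictionary contents: per-entry hashes are added up, so the
    // result does not depend on enumeration order, and no LINQ iterators or intermediate
    // collections are allocated along the way.
    public static int DictionaryContentHash<TKey, TValue>(ImmutableDictionary<TKey, TValue> dictionary)
    {
        var hash = 0;
        foreach (var pair in dictionary)
        {
            var entryHash = unchecked((EqualityComparer<TKey>.Default.GetHashCode(pair.Key) * 397)
                ^ EqualityComparer<TValue>.Default.GetHashCode(pair.Value));
            hash = unchecked(hash + entryHash);
        }
        return hash;
    }
}
```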
Let's merge the other one and measure the impact of this one afterwards.
Given the measurements, this change doesn't make sense. We can take a shortcut for the Empty instance.
Probably the last set of questions:
HashCode.DictionaryContentHash(CaptureOperation),
HashCode.EnumerableUnorderedContentHash(PreservedSymbols),
HashCode.EnumerableOrderedContentHash(Exceptions));
hashCode ??= HashCode.Combine(
We'd need a performance comment here, explaining the overall conclusion of this PR... and why every new field should have its HashCode cached.
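As an illustration of what such a comment could say (the wording is a suggestion derived from the measurements in this thread, not text from the PR):

```csharp
// Performance: ProgramState is immutable and most new states are near-copies of an existing
// one where only a single member changed. Caching a hash per member lets unchanged members
// reuse their already computed hashes, so a typical GetHashCode call hashes the contents of
// one collection instead of all of them. Any new field should follow the same pattern and
// get its own cached hash.
```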
private int? hashCode;

private ImmutableDictionary<IOperation, SymbolicValue> OperationValue { get => operationValue; init => SetCachedHashCodeField(value, ref operationValue, ref operationValueHashCode); }
private ImmutableDictionary<ISymbol, SymbolicValue> SymbolValue { get => symbolValue; init => SetCachedHashCodeField(value, ref symbolValue, ref symbolValueHashCode); }
private ImmutableDictionary<int, int> VisitCount { get; init; }
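A possible shape for that helper, based on the property snippet above (the actual signature and behavior in the PR may differ):

```csharp
// Assigns the backing field and clears its cached content hash, so the hash is recomputed
// lazily the next time it is needed. Intended to be called from the init accessors above.
private static void SetCachedHashCodeField<T>(T value, ref T field, ref int? fieldHashCode)
{
    field = value;
    fieldHashCode = null;
}
```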
Let's put VisitCount last, to make it more regular. Also in the copy constructor.
Kudos, SonarCloud Quality Gate passed!
SonarCloud Quality Gate failed.
Based on the merged allocation-free hash code calculation, we see these figures: the changes are minimal and within the margin of error.
I haven't really measured the impact on runtime, as this requires more than one run, but the analyzer runner reported these numbers for the two runs:
I would suggest closing the PR (as the changes are really ugly) and instead paying more attention to
Makes sense to close it. The root of the problem is gone.
Part of #6964