SE: Maintenance: Cache GetHashCode for ProgramState #6983

martin-strecker-sonarsource
Contributor

Part of #6964

@martin-strecker-sonarsource
Contributor Author

martin-strecker-sonarsource commented Mar 23, 2023

Before: 45.7 GB allocations
After: 44.4 GB allocations
Note: This PR and the baseline for comparison both don't have the cache of the HashCode of SymbolicValue included (#6968).

Comparison (left: before, right: after)
image

Enumerator allocations halved, both in object count and in GB.

Contributor

@pavel-mikula-sonarsource pavel-mikula-sonarsource left a comment

Let's try to wrap it to remove duplications.

private int? exceptionsHashCode;
private int? hashCode;

// Current SymbolicValue result of a given operation
Contributor

You can remove this. I don't find it useful anymore when the engine has evolved and the ProgramState is well understood.

Suggested change
// Current SymbolicValue result of a given operation

HashCode.DictionaryContentHash(CaptureOperation),
HashCode.EnumerableContentHash(PreservedSymbols),
HashCode.EnumerableContentHash(Exceptions));
hashCode ??= HashCode.Combine(
Contributor

Can you measure how many times we create a ProgramState without ever calling GetHashCode? In theory, we could do this math at creation time, or at least inside the wrapped type for the sub-items.

Contributor Author

This is the result for the NullPointerDereference_Roslyn_CS test case:

| Count of GetHashCode calls | ProgramState unique objects count | Description |
| 0 | 5805 | GetHashCode was never called |
| 1 | 4282 | GetHashCode was called exactly once |
| 145 | 1 | GetHashCode was called 145 times for 1 instance |

So GetHashCode is never called for more than half of the objects, and it is called exactly once for nearly all of the rest. There is one instance for which GetHashCode is called 145 times. This is most likely the ProgramState.Empty instance.
This shows that

  • it is more important to cache the hash codes of the individual fields than the combined hash code
  • it is important to evaluate the hash code lazily.
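
The two points above can be illustrated with a minimal sketch. It is not the code from this PR: `LazyHashState`, its two fields, `WithSymbol`, and the `ContentHashCalls` counter are hypothetical stand-ins for ProgramState, and it uses `System.HashCode` rather than the analyzer's own `HashCode` helper. Each field's content hash is computed only on the first GetHashCode call, and a copy that changes one field carries over the cached hash of the untouched field.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

// Hypothetical stand-in for ProgramState with two dictionary fields.
public sealed class LazyHashState
{
    // Instrumentation for this example only: counts content-hash computations.
    public static int ContentHashCalls;

    // Per-field caches; null means "not computed yet" (lazy evaluation).
    private int? symbolsHash;
    private int? operationsHash;

    public ImmutableDictionary<string, int> Symbols { get; }
    public ImmutableDictionary<string, int> Operations { get; }

    public LazyHashState(ImmutableDictionary<string, int> symbols, ImmutableDictionary<string, int> operations)
        : this(symbols, null, operations, null) { }

    private LazyHashState(ImmutableDictionary<string, int> symbols, int? symbolsHash,
                          ImmutableDictionary<string, int> operations, int? operationsHash)
    {
        Symbols = symbols;
        this.symbolsHash = symbolsHash;
        Operations = operations;
        this.operationsHash = operationsHash;
    }

    // Changing Symbols invalidates only its cache; the Operations cache is carried over.
    public LazyHashState WithSymbol(string name, int value) =>
        new LazyHashState(Symbols.SetItem(name, value), null, Operations, operationsHash);

    // The combined hash is cheap and is not cached; only the sub-hashes are.
    public override int GetHashCode() =>
        HashCode.Combine(
            symbolsHash ??= ContentHash(Symbols),
            operationsHash ??= ContentHash(Operations));

    public override bool Equals(object obj) =>
        obj is LazyHashState other
        && SameContent(Symbols, other.Symbols)
        && SameContent(Operations, other.Operations);

    private static bool SameContent(ImmutableDictionary<string, int> a, ImmutableDictionary<string, int> b) =>
        a.Count == b.Count && a.All(p => b.TryGetValue(p.Key, out var v) && v == p.Value);

    // XOR makes the result independent of enumeration order.
    private static int ContentHash(ImmutableDictionary<string, int> dictionary)
    {
        ContentHashCalls++;
        var hash = 0;
        foreach (var pair in dictionary)
        {
            hash ^= HashCode.Combine(pair.Key, pair.Value);
        }
        return hash;
    }
}
```

With this shape, an instance whose GetHashCode is never called pays nothing, and a derived instance that is hashed once recomputes only the field that changed.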

Here is the set of values from NullPointerDereference_Roslyn_CSharp8. It shows about the same results:

| Count of GetHashCode calls | ProgramState unique objects count |
| 0 | 1298 |
| 1 | 772 |
| 32 | 1 |

Contributor

I gathered similar stats against the Nancy project:
0x: 145696
1x: 124010
2x: 13
3x: 18
3520x: 1 (I confirmed it's the "Empty" one.)

To me, it does not make sense to cache it. There's nothing to win because there's no cache hit.

The only shortcut we can take is to return 0 when the instance is reference-equal to Empty.
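
A minimal sketch of that shortcut, under assumptions: `EmptyShortcutState` and its single `Symbols` field are hypothetical, `System.HashCode` is used in place of the analyzer's helper, and Equals is omitted for brevity. The content hash is built with XOR starting from 0, so any other instance with empty content also hashes to 0 and the shortcut stays consistent with content equality.

```csharp
using System;
using System.Collections.Immutable;

// Hypothetical stand-in for ProgramState with a shared Empty instance.
public sealed class EmptyShortcutState
{
    public static readonly EmptyShortcutState Empty =
        new EmptyShortcutState(ImmutableDictionary<string, int>.Empty);

    public ImmutableDictionary<string, int> Symbols { get; }

    private EmptyShortcutState(ImmutableDictionary<string, int> symbols) => Symbols = symbols;

    public EmptyShortcutState WithSymbol(string name, int value) =>
        new EmptyShortcutState(Symbols.SetItem(name, value));

    public override int GetHashCode()
    {
        // Shortcut for the one instance that is hashed many times.
        if (ReferenceEquals(this, Empty))
        {
            return 0;
        }
        // Everything else is hashed at most a few times, so it is computed on demand.
        var hash = 0;
        foreach (var pair in Symbols)
        {
            hash ^= HashCode.Combine(pair.Key, pair.Value);
        }
        return hash;
    }
}
```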

Contributor Author

The saving is that in 124,010 cases we most likely calculate the hash code of a single ImmutableDictionary instead of all 5.

Contributor

Ok. If the main gain is on the partial sub-results, what about removing hashCode itself?

Its calculation should be rather cheap. And it's not reused much. It will also simplify the inits.

Contributor Author

As an alternative solution, we can make the hash code implementation allocation-free (my main concern here). I did so in #7012. There you can find an allocation comparison between the baseline, this PR (caching), and the allocation-free implementation (#7012). I would recommend merging #7012 in any case, and we may also merge this PR on top of it (which would additionally save some CPU cycles).
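
As an illustration of what "allocation-free" can mean here, the sketch below shows one common source of allocations in content hashing and one way to avoid it. This is an assumption about the general technique, not the implementation in #7012; the `ContentHash` class and both method names are hypothetical. Enumerating through `IEnumerable<T>` boxes the struct enumerator of `ImmutableDictionary<TKey, TValue>`, while a `foreach` over the concrete type binds to that struct enumerator directly.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

// Hypothetical helper, for illustration only.
public static class ContentHash
{
    // The interface call returns the struct enumerator boxed as IEnumerator<T>,
    // which costs one heap allocation per call.
    public static int ViaInterface<TKey, TValue>(IEnumerable<KeyValuePair<TKey, TValue>> items)
    {
        var hash = 0;
        foreach (var pair in items)
        {
            hash ^= HashCode.Combine(pair.Key, pair.Value);
        }
        return hash;
    }

    // foreach binds to ImmutableDictionary<TKey, TValue>.GetEnumerator(),
    // which returns a struct, so no enumerator object is boxed.
    public static int ViaConcreteType<TKey, TValue>(ImmutableDictionary<TKey, TValue> items)
    {
        var hash = 0;
        foreach (var pair in items)
        {
            hash ^= HashCode.Combine(pair.Key, pair.Value);
        }
        return hash;
    }
}
```

Both methods return the same value for the same dictionary; they differ only in whether the enumerator is boxed.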

Contributor

Let's merge the other one. And measure the impact of this after.

Contributor

@pavel-mikula-sonarsource pavel-mikula-sonarsource left a comment

Given the measurements, this change doesn't make sense. We can take a shortcut on the Empty instance.

Contributor

@pavel-mikula-sonarsource pavel-mikula-sonarsource left a comment

Probably the last set of questions.

HashCode.DictionaryContentHash(CaptureOperation),
HashCode.EnumerableUnorderedContentHash(PreservedSymbols),
HashCode.EnumerableOrderedContentHash(Exceptions));
hashCode ??= HashCode.Combine(
Contributor

We'd need a performance comment here, explaining the overall conclusion of this PR and why every new field should have its own HashCode.

private int? hashCode;

private ImmutableDictionary<IOperation, SymbolicValue> OperationValue { get => operationValue; init => SetCachedHashCodeField(value, ref operationValue, ref operationValueHashCode); }
private ImmutableDictionary<ISymbol, SymbolicValue> SymbolValue { get => symbolValue; init => SetCachedHashCodeField(value, ref symbolValue, ref symbolValueHashCode); }
private ImmutableDictionary<int, int> VisitCount { get; init; }
Contributor

Let's put VisitCount last, to make it more regular. Also in the copy constructor

@martin-strecker-sonarsource martin-strecker-sonarsource force-pushed the Martin/SE_6964_04_CacheGetHashCodeProgramState branch from 2dce265 to 8567e1c Compare March 31, 2023 16:58
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information
No Duplication information

@sonarqubecloud

SonarCloud Quality Gate failed.

Bug B 1 Bug
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 5 Code Smells

100.0% Coverage
0.0% Duplication

@martin-strecker-sonarsource
Contributor Author

martin-strecker-sonarsource commented Mar 31, 2023

Based on the merged allocation-free hash code calculation, we see these figures:
image

ProgramState has more fields now and is bigger and therefore takes more MB.

The changes are minimal and within the margin of error

| Run | Byte | Object count |
| Baseline | 39.451.830 | 692.333 |
| Cached | 39.886.055 | 691.585 |

I haven't really measured the impact on runtime, as this requires more than one run, but the analyzer runner reported these numbers for the two runs:

  • With Cache: 159365,7515 ms
  • Base line: 165859,5391 ms

I would suggest closing the PR (as the changes are really ugly) and instead paying more attention to System.Collections.Immutable.SortedInt32KeyNode<>, which takes the crown in allocations (object count and MB). In #6964, under "Rethink ProgramState.AddVisit", I described what we can do there.

Base automatically changed from feature/SE to master April 4, 2023 13:05
@pavel-mikula-sonarsource
Contributor

Makes sense to close it. The root of the problem is gone.

@pavel-mikula-sonarsource pavel-mikula-sonarsource deleted the Martin/SE_6964_04_CacheGetHashCodeProgramState branch April 6, 2023 07:41