Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Address message aggregation overhead #203

Merged

Conversation

richardstartin
Copy link
Member

@richardstartin richardstartin commented Sep 28, 2022

We frequently see high overhead in message aggregation:

Screenshot 2022-09-28 at 13 26 45

This PR consists of 3 parts:

  1. Use Map.putIfAbsent to avoid doubling the work done in Map.get then Map.put (note the almost identical stack traces next to eachother). This has to be done via MethodHandles but this can be greatly simplified when JDK7 support is dropped.
  2. Tweak Message for better hashing (note the TreeMap frames below HashMap which indicate hash collisions):
    • include the type in hashCode to avoid collisions in more cases
    • Use Arrays.hashCode instead of Objects.hash for the tags because the latter triggers an IDE warning for ambiguous varargs parameters
    • Remove the mutable done field from equals which can lead to memory leaks if the value changes after inserting a key into the map
    • short circuit equals when compared with this
    • Implement Comparable for better degradation when has collisions do occur
  3. Change the aggregate types to EnumMap and make it static since it can be. The profile doesn't justify the change and I can drop the commit, but it's a generally worthwhile change.

@richardstartin
Copy link
Member Author

This has led to a really good improvement after deploying to the same service

Screenshot 2022-09-29 at 12 48 06

@richardstartin
Copy link
Member Author

It appears that the defective equals was already partially addressed in #194 but hasn't been released yet. The changes here are still necessary though:

  1. In AlphaNumericMessage.equals there is no need to call super.equals twice
  2. Message.equals must not depend on mutable done if it is to be used as a Map key
  3. equals(this) can be short-cut
  4. Implementing Comparable for map keys limits how bad things can get if hashes collide
  5. Using putIfAbsent halves the cost which halves how bad things can get when hashes collide

However, the improvement we see after deploying this PR cannot be solely attributed to it and will be partially because of #194

@richardstartin
Copy link
Member Author

I can confirm that the majority of the improvement we see is explained by the changes already made in #194 - it would be great if that could be released and pushed out to users. It's not critical to merge the changes made here, though I do still consider them beneficial.

@richardstartin
Copy link
Member Author

Having monitored the service on the master branch of dogstatsd overnight, we have seen cases where implementing comparable and using Map.putIfAbsent would have been beneficial:
Screenshot 2022-09-30 at 09 38 38
Since this takes place within a lock, it can be seen to contribute to the contention profile, so it's worth minimising time spent holding the lock
Screenshot 2022-09-30 at 09 39 02

@richardstartin richardstartin force-pushed the rgs/message-aggregation-overhead branch from 01650c1 to 6674963 Compare October 27, 2022 16:36
@richardstartin
Copy link
Member Author

richardstartin commented Oct 28, 2022

I created a very simple benchmark here to demonstrate the benefits of this change. The benchmark aggregates numeric messages using 4 benchmark threads, contending on an aggregator with 4 shards, measuring aggregation throughput at saturation. Increasing the number of shards doesn’t appear to change much in this set up, but may well do in highly concurrent environments.

The benchmark is parameterised:

  1. it has a set of tag values which produce hash collisions
  2. it has another set of tag values which do not produce hash collisions

These values may look silly, but hash collisions are a fact of life with polynomial hash codes, even if contrived cases look... contrived.

Running on the released 4.1.0 (make sure to clear out your .m2 cache before building to make sure it is downloaded from Maven Central though because the project doesn't use a SNAPSHOT version), when there are hash collisions, the throughput roughly halves:

Benchmark                                       (numAspects)  (numTags)                                (rawTags)   Mode  Cnt      Score     Error   Units
DogStatsDAggregationBenchmark.aggregateCounter             1         10  C}~,C~_,D^~,D__,D`@,Da!,E?~,E@_,EA@,EB!  thrpt    5  20778.006 ± 259.774  ops/ms
DogStatsDAggregationBenchmark.aggregateCounter             1         10                      0,1,2,3,4,5,6,7,8,9  thrpt    5  39062.510 ± 778.435  ops/ms

On this branch, the throughput still degrades under collisions, but is improved in both cases

Benchmark                                       (numAspects)  (numTags)                                (rawTags)   Mode  Cnt      Score      Error   Units
DogStatsDAggregationBenchmark.aggregateCounter             1         10  C}~,C~_,D^~,D__,D`@,Da!,E?~,E@_,EA@,EB!  thrpt    5  26633.658 ±  850.338  ops/ms
DogStatsDAggregationBenchmark.aggregateCounter             1         10                      0,1,2,3,4,5,6,7,8,9  thrpt    5  48218.414 ± 1379.077  ops/ms

However, the case with collisions here isn’t pathological enough to cause degradation to HashMap$TreeNode storage as was seen happening on the master branch (see screenshot in comment), which is where implementing Comparable comes in because Comparable keys produce more balanced trees. If I had more time for this I would construct benchmark parameters where there are enough collisions (there need to be more than 8 keys in a bucket in a HashMap with size larger than 64 for this to happen) but I feel the benchmark already justifies the simple changes made here.

…rable contract, add test to verify expected aggregation behaviour
@richardstartin richardstartin force-pushed the rgs/message-aggregation-overhead branch from d4971de to 46ffe13 Compare October 31, 2022 15:22
… to delete when increasing the language level
@richardstartin richardstartin force-pushed the rgs/message-aggregation-overhead branch from 46ffe13 to c7cf1cb Compare October 31, 2022 15:27
src/main/java/com/timgroup/statsd/AlphaNumericMessage.java Outdated Show resolved Hide resolved
src/main/java/com/timgroup/statsd/Message.java Outdated Show resolved Hide resolved
if (MAP_PUT_IF_ABSENT != null) {
try {
return (Message) (Object) MAP_PUT_IF_ABSENT.invokeExact(map, (Object) message, (Object) message);
} catch (Throwable ignore) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of exceptions can be expected here? Not propagating exceptions further means if this code breaks, it will break silently.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No exception can be thrown here, it's just necessary to keep the compiler happy.

src/main/java/com/timgroup/statsd/Message.java Outdated Show resolved Hide resolved
@richardstartin richardstartin force-pushed the rgs/message-aggregation-overhead branch from b61b169 to e687607 Compare November 2, 2022 14:55
@richardstartin richardstartin force-pushed the rgs/message-aggregation-overhead branch from e687607 to 9238fbe Compare November 2, 2022 14:57
@jbachorik
Copy link

Hi @vickenty - I wonder, are there any fundamental issues left for this PR to be resolved?
If not, could we get it approved and merged? This would be a nice 'performance win' :)

@vickenty vickenty merged commit 76e0e71 into DataDog:master Nov 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants