Self-organizing hash table to improve the performance of bucket aggregations #7652

ketanv3 · 2023-05-22T09:52:28Z

Description

Some aggregations know the number of buckets and the owning ordinals ahead of time (e.g., range aggregations) and can therefore map keys densely and directly. Most other aggregations cannot predict this and rely on a hash table lookup to identify the owning bucket ordinal.

Aggregations in the "nyc_taxis" and "http_logs" workloads revealed that a major share of the CPU time is spent in these lookups. As a lookup is performed for each document matched by the query, even a small improvement in the hash table can yield a large overall improvement.

Latest Approach: `ReorganizingLongHash`

This PR introduces ReorganizingLongHash, a drop-in replacement for LongHash. It organizes itself by moving keys around dynamically in order to reduce the longest probe sequence length (PSL), which makes lookups faster as keys are likely to be found in the same CPU cache line. It also optimizes lookups for recently added keys, making it useful for aggregations where keys are correlated across consecutive hits.

- The expected longest PSL in a full table is Θ(ln n).
- Inserts are slightly more expensive due to additional bookkeeping, but lookups are faster. As the number of hits >> the number of buckets, most keys only require a lookup (i.e., unique inserts are comparatively rare). This makes the "add" operation faster overall.
- Keys are likely to be correlated across consecutive hits (e.g., belonging to the same minute/hour/day in a date histogram aggregation). By placing a "recent" key closer to its ideal home slot, we can reduce the number of probes on such workloads.
- Faster bit-mixers were evaluated (#7345), but their distributions were unacceptable.
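To make the key-rearrangement idea concrete, here is a minimal sketch of a Robin Hood style table that maps long keys to dense ordinals. It is not the code from this PR: the class name `RobinHoodLongHash`, the mixer, the fixed capacity, and the separate `keys`/`table` arrays are assumptions for illustration, and it omits the fingerprint and recency information that the commit messages say are embedded in the table itself. The `add` return convention (new ordinal, or `-1 - ordinal` when the key already exists) is assumed to mirror LongHash.

```java
import java.util.Arrays;

/** Minimal Robin Hood hash table mapping long keys to dense ordinals (no resizing, no removals). */
final class RobinHoodLongHash {
    private final long[] keys;  // keys[id] is the key that owns ordinal id
    private final int[] table;  // table[slot] holds an ordinal, or -1 when the slot is empty
    private final int mask;
    private int size;

    RobinHoodLongHash(int capacity) { // capacity must be a power of two
        keys = new long[capacity];
        table = new int[capacity];
        Arrays.fill(table, -1);
        mask = capacity - 1;
    }

    /** Illustrative bit mixer (MurmurHash3 finalizer); not necessarily the one used in the PR. */
    private static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    private int home(long key) {
        return (int) mix(key) & mask;
    }

    /** Distance of the key stored at the given (occupied) slot from its home slot. */
    private int distance(int slot) {
        return (slot - home(keys[table[slot]])) & mask;
    }

    /** Returns the ordinal of the key, or -1 if it is absent. */
    long find(long key) {
        int slot = home(key);
        for (int dist = 0; ; dist++, slot = (slot + 1) & mask) {
            int id = table[slot];
            if (id == -1) return -1;
            if (keys[id] == key) return id;
            // A resident closer to its home than we are to ours means the key cannot be further along.
            if (distance(slot) < dist) return -1;
        }
    }

    /** Returns the new ordinal, or -1 - ordinal if the key was already present. */
    long add(long key) {
        long existing = find(key);
        if (existing >= 0) return -1 - existing;
        if (size == mask) throw new IllegalStateException("table is full; this sketch does not resize");
        int id = size++;
        keys[id] = key;
        int cur = id;
        int slot = home(key);
        for (int dist = 0; ; dist++, slot = (slot + 1) & mask) {
            if (table[slot] == -1) {
                table[slot] = cur;
                return id;
            }
            int residentDist = distance(slot);
            if (residentDist < dist) { // take the slot from the "richer" resident and keep probing for it
                int evicted = table[slot];
                table[slot] = cur;
                cur = evicted;
                dist = residentDist;
            }
        }
    }

    public static void main(String[] args) {
        RobinHoodLongHash hash = new RobinHoodLongHash(2048);
        int inserts = 0, lookups = 0;
        for (int hit = 0; hit < 1_000_000; hit++) {
            long ord = hash.add(hit % 1000); // 1000 distinct bucket keys
            if (ord >= 0) inserts++; else lookups++;
        }
        System.out.println("inserts=" + inserts + " lookups=" + lookups);
    }
}
```

The `main` method mimics the "hits >> buckets" situation: only the first occurrence of each key pays the insertion cost, and every later hit is a pure lookup.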

Probe Sequence Length

The probe sequence length (PSL) is the number of probes needed to find the desired key in the hash table.

The average PSL is nearly identical. This is expected as the sum of PSLs is preserved after keys are rearranged.

The longest PSL is reduced significantly, thus improving the worst-case performance of certain keys.
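The two observations above can be checked on a small, self-contained model. The sketch below builds the same key set into a linear-probing table twice, once with first-come-first-served placement and once with Robin Hood style swaps, and reports the sum and the maximum of the PSLs. The class `PslComparison`, the mixer, and the sentinel value are assumptions for illustration; this is not the measurement code behind the PR's charts.

```java
import java.util.Arrays;

/** Compares probe sequence lengths for two placement policies over the same linear-probing table. */
final class PslComparison {
    static final long EMPTY = Long.MIN_VALUE; // sentinel; assumes this value is never used as a key

    /** Illustrative bit mixer (MurmurHash3 finalizer). */
    static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    static int home(long key, int mask) {
        return (int) mix(key) & mask;
    }

    /**
     * Inserts distinct keys into a table whose capacity is a power of two.
     * With robinHood set, a probing key evicts any resident that sits closer to its own home slot.
     */
    static long[] build(long[] keys, int capacity, boolean robinHood) {
        int mask = capacity - 1;
        long[] table = new long[capacity];
        Arrays.fill(table, EMPTY);
        for (long key : keys) {
            long cur = key;
            int slot = home(cur, mask);
            for (int dist = 0; ; dist++, slot = (slot + 1) & mask) {
                if (table[slot] == EMPTY) {
                    table[slot] = cur;
                    break;
                }
                int residentDist = (slot - home(table[slot], mask)) & mask;
                if (robinHood && residentDist < dist) {
                    long evicted = table[slot];
                    table[slot] = cur;
                    cur = evicted;
                    dist = residentDist;
                }
            }
        }
        return table;
    }

    /** Returns {sum of PSLs, longest PSL}, where PSL = distance from the home slot + 1. */
    static long[] pslStats(long[] table) {
        int mask = table.length - 1;
        long sum = 0, max = 0;
        for (int slot = 0; slot < table.length; slot++) {
            if (table[slot] == EMPTY) continue;
            long psl = ((slot - home(table[slot], mask)) & mask) + 1;
            sum += psl;
            max = Math.max(max, psl);
        }
        return new long[] { sum, max };
    }

    public static void main(String[] args) {
        long[] keys = new long[900];
        for (int i = 0; i < keys.length; i++) keys[i] = i;
        long[] plain = pslStats(build(keys, 1024, false));
        long[] robin = pslStats(build(keys, 1024, true));
        System.out.println("plain:      sum=" + plain[0] + " max=" + plain[1]);
        System.out.println("robin hood: sum=" + robin[0] + " max=" + robin[1]);
    }
}
```

Each swap hands one extra probe from the incoming key to the evicted key, so the total stays the same while the longest sequence can only shrink or stay equal.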

Memory Usage

The memory usage is identical. This is expected as the initial capacity, growth rate, and the underlying table implementation remain the same. Only the encoding and placement strategy have changed.

JMH Benchmarks

Uncorrelated Adds

This represents workloads where keys arrive in random order. Performance is similar with a small number of keys, but gets better with a large number of keys, mainly due to the reduced number of memory lookups.

Correlated Adds

This represents workloads where adjacent keys are likely to be similar (e.g., bucketing on a timestamp where two consecutive hits are likely to share the same "hour"). Performance is significantly better as keys are likely to be found at their ideal home slot.
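As a rough illustration of what "correlated" means here, the sketch below rounds timestamps down to the hour and measures how often a hit produces the same bucket key as the previous hit. The class `CorrelatedKeys` and its synthetic one-hit-per-second input are assumptions; this is not the key generator used by the JMH benchmark.

```java
/** Shows how rounding timestamps to the hour yields the same bucket key for most consecutive hits. */
final class CorrelatedKeys {
    static final long HOUR_MILLIS = 3_600_000L;

    /** Rounds an epoch-millisecond timestamp down to the start of its hour. */
    static long hourBucket(long epochMillis) {
        return Math.floorDiv(epochMillis, HOUR_MILLIS) * HOUR_MILLIS;
    }

    /** Fraction of hits whose bucket key equals the key of the previous hit. */
    static double repeatFraction(long[] timestamps) {
        if (timestamps.length < 2) return 0.0;
        int repeats = 0;
        for (int i = 1; i < timestamps.length; i++) {
            if (hourBucket(timestamps[i]) == hourBucket(timestamps[i - 1])) repeats++;
        }
        return (double) repeats / (timestamps.length - 1);
    }

    public static void main(String[] args) {
        // Synthetic input: one hit per second over ten hours.
        long[] timestamps = new long[36_000];
        for (int i = 0; i < timestamps.length; i++) timestamps[i] = i * 1_000L;
        System.out.println("repeat fraction = " + repeatFraction(timestamps));
    }
}
```

When almost every hit repeats the previous key, a table that keeps recent keys at or near their home slot answers most lookups with a single probe.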

Distinct Adds

This represents workloads with high cardinality, i.e., each hit maps to a different key. Performance gets better due to the reduced number of memory lookups.

OSB Benchmarks

nyc_taxis

Latency improves by around 15-23% on Intel and 8-12% on Graviton.

c6i.2xlarge

Metric	Task	Baseline	Contender	Diff	Unit
Min Throughput	autohisto_agg	1.5042	1.50298	-0.00122	ops/s
Mean Throughput	autohisto_agg	1.50684	1.50485	-0.00199	ops/s
Median Throughput	autohisto_agg	1.50625	1.50443	-0.00182	ops/s
Max Throughput	autohisto_agg	1.51209	1.50857	-0.00351	ops/s
50th percentile latency	autohisto_agg	345.824	267.783	-78.041	ms
90th percentile latency	autohisto_agg	354.344	274.595	-79.7489	ms
99th percentile latency	autohisto_agg	357.427	292.599	-64.8274	ms
100th percentile latency	autohisto_agg	359.794	296.043	-63.7512	ms
50th percentile service time	autohisto_agg	344.754	266.345	-78.4085	ms
90th percentile service time	autohisto_agg	353.373	273.745	-79.6275	ms
99th percentile service time	autohisto_agg	356.595	291.844	-64.7506	ms
100th percentile service time	autohisto_agg	358.902	294.78	-64.1225	ms
error rate	autohisto_agg	0	0	0	%
Min Throughput	date_histogram_agg	1.50502	1.50575	0.00072	ops/s
Mean Throughput	date_histogram_agg	1.50819	1.50949	0.0013	ops/s
Median Throughput	date_histogram_agg	1.50746	1.50865	0.00119	ops/s
Max Throughput	date_histogram_agg	1.51452	1.51704	0.00252	ops/s
50th percentile latency	date_histogram_agg	330.51	281.653	-48.8575	ms
90th percentile latency	date_histogram_agg	337.787	287.68	-50.1066	ms
99th percentile latency	date_histogram_agg	347.023	307.381	-39.6416	ms
100th percentile latency	date_histogram_agg	353.087	324.471	-28.6164	ms
50th percentile service time	date_histogram_agg	329.366	280.401	-48.9651	ms
90th percentile service time	date_histogram_agg	336.458	286.567	-49.8908	ms
99th percentile service time	date_histogram_agg	345.665	305.908	-39.7566	ms
100th percentile service time	date_histogram_agg	352.358	323.602	-28.7562	ms
error rate	date_histogram_agg	0	0	0	%

c6g.2xlarge

Metric	Task	Baseline	Contender	Diff	Unit
Min Throughput	autohisto_agg	1.50132	1.50246	0.00114	ops/s
Mean Throughput	autohisto_agg	1.50213	1.50401	0.00188	ops/s
Median Throughput	autohisto_agg	1.50195	1.50365	0.0017	ops/s
Max Throughput	autohisto_agg	1.50376	1.50705	0.00329	ops/s
50th percentile latency	autohisto_agg	558.64	493.889	-64.7505	ms
90th percentile latency	autohisto_agg	560.574	503.981	-56.5928	ms
99th percentile latency	autohisto_agg	561.542	519.913	-41.6294	ms
100th percentile latency	autohisto_agg	562.369	519.959	-42.4109	ms
50th percentile service time	autohisto_agg	557.821	492.454	-65.3662	ms
90th percentile service time	autohisto_agg	559.619	502.671	-56.9488	ms
99th percentile service time	autohisto_agg	560.985	518.241	-42.7436	ms
100th percentile service time	autohisto_agg	561.36	518.883	-42.4771	ms
error rate	autohisto_agg	0	0	0	%
Min Throughput	date_histogram_agg	1.50196	1.50266	0.00071	ops/s
Mean Throughput	date_histogram_agg	1.5032	1.50433	0.00113	ops/s
Median Throughput	date_histogram_agg	1.50293	1.50395	0.00102	ops/s
Max Throughput	date_histogram_agg	1.50562	1.50764	0.00202	ops/s
50th percentile latency	date_histogram_agg	531.506	486.349	-45.157	ms
90th percentile latency	date_histogram_agg	532.87	488.923	-43.9464	ms
99th percentile latency	date_histogram_agg	533.273	511.675	-21.598	ms
100th percentile latency	date_histogram_agg	534.678	514.076	-20.6019	ms
50th percentile service time	date_histogram_agg	530.393	485.487	-44.906	ms
90th percentile service time	date_histogram_agg	531.645	487.663	-43.9819	ms
99th percentile service time	date_histogram_agg	532.286	510.785	-21.5009	ms
100th percentile service time	date_histogram_agg	532.896	513.15	-19.7462	ms
error rate	date_histogram_agg	0	0	0	%

http_logs

Latency improves by around 23% on Intel and 11% on Graviton.

c6i.2xlarge

Metric	Task	Baseline	Contender	Diff	Unit
Min Throughput	hourly_agg	0.20066	0.200747	9e-05	ops/s
Mean Throughput	hourly_agg	0.200913	0.201033	0.00012	ops/s
Median Throughput	hourly_agg	0.200878	0.200994	0.00012	ops/s
Max Throughput	hourly_agg	0.201311	0.201485	0.00017	ops/s
50th percentile latency	hourly_agg	1675.88	1284.36	-391.512	ms
90th percentile latency	hourly_agg	1726.11	1336.14	-389.966	ms
99th percentile latency	hourly_agg	1834.83	1373.69	-461.141	ms
100th percentile latency	hourly_agg	1866.72	1412.86	-453.857	ms
50th percentile service time	hourly_agg	1672.5	1280.77	-391.733	ms
90th percentile service time	hourly_agg	1721.53	1332.07	-389.458	ms
99th percentile service time	hourly_agg	1830.75	1370.11	-460.638	ms
100th percentile service time	hourly_agg	1862.45	1408.22	-454.229	ms
error rate	hourly_agg	0	0	0	%

c6g.2xlarge

Metric	Task	Baseline	Contender	Diff	Unit
Min Throughput	hourly_agg	0.200508	0.200526	2e-05	ops/s
Mean Throughput	hourly_agg	0.200702	0.200727	2e-05	ops/s
Median Throughput	hourly_agg	0.200676	0.200699	2e-05	ops/s
Max Throughput	hourly_agg	0.201008	0.201043	3e-05	ops/s
50th percentile latency	hourly_agg	2445.65	2167.05	-278.599	ms
90th percentile latency	hourly_agg	2448.21	2168.8	-279.413	ms
99th percentile latency	hourly_agg	2462.3	2173.48	-288.816	ms
100th percentile latency	hourly_agg	2543.93	2187.5	-356.429	ms
50th percentile service time	hourly_agg	2442.78	2163.35	-279.429	ms
90th percentile service time	hourly_agg	2444.92	2165.02	-279.904	ms
99th percentile service time	hourly_agg	2459.01	2170.67	-288.345	ms
100th percentile service time	hourly_agg	2540.64	2183.31	-357.334	ms
error rate	hourly_agg	0	0	0	%
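The summary percentages for the latest approach can be reproduced from the tables above. The sketch below assumes they are computed as the relative reduction in 50th percentile latency; the class `LatencyImprovement` is hypothetical, and the numbers are copied from the Baseline and Contender columns.

```java
/** Recomputes relative latency improvements from the p50 rows of the benchmark tables. */
final class LatencyImprovement {
    /** Relative reduction of the contender versus the baseline, in percent. */
    static double percentImprovement(double baselineMs, double contenderMs) {
        return (baselineMs - contenderMs) / baselineMs * 100.0;
    }

    public static void main(String[] args) {
        String[] labels = {
            "nyc_taxis autohisto_agg (c6i.2xlarge)",
            "nyc_taxis date_histogram_agg (c6i.2xlarge)",
            "nyc_taxis autohisto_agg (c6g.2xlarge)",
            "nyc_taxis date_histogram_agg (c6g.2xlarge)",
            "http_logs hourly_agg (c6i.2xlarge)",
            "http_logs hourly_agg (c6g.2xlarge)",
        };
        // 50th percentile latency in ms: {baseline, contender}
        double[][] p50 = {
            { 345.824, 267.783 },
            { 330.51, 281.653 },
            { 558.64, 493.889 },
            { 531.506, 486.349 },
            { 1675.88, 1284.36 },
            { 2445.65, 2167.05 },
        };
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-45s %.1f%%%n", labels[i], percentImprovement(p50[i][0], p50[i][1]));
        }
    }
}
```

For nyc_taxis on c6i.2xlarge this gives roughly 22.6% and 14.8%, in line with the quoted range. Throughput is essentially unchanged in every table, so the gains show up in latency and service time only.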

Previous Approach: `LongRHHash`

This PR introduces LongRHHash which is a drop-in replacement for LongHash. It combines ideas from Robin Hood hashing and Cuckoo hashing to improve performance. Here's a summary:

- Keys are moved around dynamically in order to reduce the probe sequence length (PSL). The expected longest PSL in a full table is Θ(ln n). Inserts are slightly more expensive due to additional bookkeeping, but lookups are faster.
  - This reduces the worst-case lookup time of keys with a large PSL.
  - Keys are also more likely to be found in the same CPU cache line (typically 64 bytes).
- As the number of hits >> the number of buckets, most keys only require a lookup (i.e., unique inserts are comparatively rare). This makes the "add" operation faster overall.
- Keys are likely to be correlated across consecutive hits (e.g., belonging to the same minute/hour/day in a date histogram aggregation). A cheap hash function is used to look up recent keys first, falling back to a good hash function with linear probing.
- Faster bit-mixers were evaluated (#7345), but their distributions were unacceptable.
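The "recent keys first, full lookup second" idea can be sketched as a small front cache. This is only an illustration under stated assumptions: the class `RecentKeyCache` is hypothetical, the cheap hash is taken to be the low 8 bits of the key, and a `java.util.HashMap` stands in for the main table with its good hash function and linear probing. Only the fixed size of 256 comes from the description of variant (C) further down.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Two-level lookup: a small direct-mapped "recent" table in front of a full hash table. */
final class RecentKeyCache {
    private static final int RECENT_SIZE = 256;               // fixed size, power of two
    private final long[] recentKeys = new long[RECENT_SIZE];
    private final int[] recentIds = new int[RECENT_SIZE];     // -1 marks an unused entry
    private final Map<Long, Integer> main = new HashMap<>();  // stand-in for the main table
    int recentHits;                                            // hits served by the recent table

    RecentKeyCache() {
        Arrays.fill(recentIds, -1);
    }

    /** Returns the dense ordinal for the key, assigning a new one on first sight. */
    int ordinalFor(long key) {
        int r = (int) key & (RECENT_SIZE - 1); // cheap hash (assumption): low bits of the key
        if (recentIds[r] != -1 && recentKeys[r] == key) {
            recentHits++;
            return recentIds[r];
        }
        Integer id = main.get(key);            // fall back to the full lookup
        if (id == null) {
            id = main.size();
            main.put(key, id);
        }
        recentKeys[r] = key;                   // remember the key for the next hit
        recentIds[r] = id;
        return id;
    }

    public static void main(String[] args) {
        RecentKeyCache cache = new RecentKeyCache();
        int hits = 36_000;
        for (int i = 0; i < hits; i++) {
            cache.ordinalFor(i / 3_600);       // hour index of a one-hit-per-second stream
        }
        System.out.println("recent hits: " + cache.recentHits + " of " + hits);
    }
}
```

Once the number of distinct keys approaches or exceeds the size of the recent table, uncorrelated keys keep evicting each other and most lookups pay for both levels, which matches the behaviour reported for (C) on larger tables.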

Probe Sequence Length

The probe sequence length (PSL) is the number of probes needed to find the desired key in the hash table.

The average PSL is nearly identical. This is expected as the sum of PSLs is preserved after keys are rearranged.

The longest PSL is reduced significantly, thus improving the worst-case performance of certain keys.

JMH Benchmarks

These benchmarks were performed on c6i.2xlarge (Intel) / c6g.2xlarge (Graviton) instances running OpenJDK 17.0.6. The results represent the time to perform 16.7M "adds" with a varying number of distinct buckets.

Uncorrelated Adds

This represents workloads where keys arrive in random order. "Recent" lookups are ineffective for large hash tables, so the improvement solely comes from the rearrangement of keys.

(A) is the baseline, (B) is the contender without "recent" lookups, and (C) is the contender with "recent" lookups.

- (B) was better than (A) throughout the spectrum.
- (C) was much better at low occupancy, but performed poorly with larger table sizes (~150+) due to a cache miss followed by a full linear probe.
- (C) uses a fixed-size (256) "recent" table, which performs well until 60% occupancy. It is possible to grow this table or switch to (B) if the hash table grows larger. While that may improve performance on synthetic benchmarks, it may not translate to any real-world performance benefit.

Correlated Adds

This represents workloads where adjacent keys are likely to be similar (e.g., bucketing on a timestamp where two consecutive hits are likely to share the same "hour").

(A) is the baseline, (B) is the contender with "recent" lookups. Performance is consistent (no sawtooth pattern) and keeps improving as the number of buckets grows. This is because, with two independent hash functions, it is unlikely that both collide.

OSB Benchmarks

nyc_taxis

Latency improves by 14-17% on Intel and 8-12% on Graviton.

c6i.2xlarge details

Metric	Task	Baseline	Contender	Diff	Unit
Min Throughput	autohisto_agg	1.5042	1.50502	0.00082	ops/s
Mean Throughput	autohisto_agg	1.50684	1.50818	0.00134	ops/s
Median Throughput	autohisto_agg	1.50625	1.50746	0.00121	ops/s
Max Throughput	autohisto_agg	1.51209	1.51452	0.00243	ops/s
50th percentile latency	autohisto_agg	345.824	298.608	-47.2159	ms
90th percentile latency	autohisto_agg	354.344	307.344	-46.9998	ms
99th percentile latency	autohisto_agg	357.427	316.633	-40.7939	ms
100th percentile latency	autohisto_agg	359.794	326.847	-32.9466	ms
50th percentile service time	autohisto_agg	344.754	297.394	-47.3594	ms
90th percentile service time	autohisto_agg	353.373	305.879	-47.4936	ms
99th percentile service time	autohisto_agg	356.595	315.587	-41.0082	ms
100th percentile service time	autohisto_agg	358.902	326.042	-32.8601	ms
error rate	autohisto_agg	0	0	0	%
Min Throughput	date_histogram_agg	1.50502	1.50586	0.00084	ops/s
Mean Throughput	date_histogram_agg	1.50819	1.50967	0.00148	ops/s
Median Throughput	date_histogram_agg	1.50746	1.5088	0.00134	ops/s
Max Throughput	date_histogram_agg	1.51452	1.51738	0.00286	ops/s
50th percentile latency	date_histogram_agg	330.51	275.304	-55.2067	ms
90th percentile latency	date_histogram_agg	337.787	285.986	-51.8007	ms
99th percentile latency	date_histogram_agg	347.023	307.172	-39.8502	ms
100th percentile latency	date_histogram_agg	353.087	310.265	-42.8225	ms
50th percentile service time	date_histogram_agg	329.366	273.975	-55.3916	ms
90th percentile service time	date_histogram_agg	336.458	285.122	-51.3364	ms
99th percentile service time	date_histogram_agg	345.665	306.428	-39.2366	ms
100th percentile service time	date_histogram_agg	352.358	308.634	-43.7241	ms
error rate	date_histogram_agg	0	0	0	%

c6g.2xlarge details

Metric	Task	Baseline	Contender	Diff	Unit
Min Throughput	autohisto_agg	1.50132	1.50251	0.00119	ops/s
Mean Throughput	autohisto_agg	1.50213	1.50408	0.00195	ops/s
Median Throughput	autohisto_agg	1.50195	1.50373	0.00179	ops/s
Max Throughput	autohisto_agg	1.50376	1.50722	0.00346	ops/s
50th percentile latency	autohisto_agg	558.64	491.169	-67.471	ms
90th percentile latency	autohisto_agg	560.574	492.553	-68.021	ms
99th percentile latency	autohisto_agg	561.542	494.173	-67.3688	ms
100th percentile latency	autohisto_agg	562.369	510.678	-51.6918	ms
50th percentile service time	autohisto_agg	557.821	490.021	-67.7991	ms
90th percentile service time	autohisto_agg	559.619	490.908	-68.7118	ms
99th percentile service time	autohisto_agg	560.985	492.691	-68.2933	ms
100th percentile service time	autohisto_agg	561.36	508.64	-52.7199	ms
error rate	autohisto_agg	0	0	0	%
Min Throughput	date_histogram_agg	1.50196	1.50268	0.00072	ops/s
Mean Throughput	date_histogram_agg	1.5032	1.50435	0.00115	ops/s
Median Throughput	date_histogram_agg	1.50293	1.50397	0.00103	ops/s
Max Throughput	date_histogram_agg	1.50562	1.50768	0.00205	ops/s
50th percentile latency	date_histogram_agg	531.506	487.024	-44.4823	ms
90th percentile latency	date_histogram_agg	532.87	488.058	-44.8119	ms
99th percentile latency	date_histogram_agg	533.273	489.811	-43.4617	ms
100th percentile latency	date_histogram_agg	534.678	496.169	-38.5092	ms
50th percentile service time	date_histogram_agg	530.393	486.082	-44.311	ms
90th percentile service time	date_histogram_agg	531.645	486.96	-44.6849	ms
99th percentile service time	date_histogram_agg	532.286	488.544	-43.7412	ms
100th percentile service time	date_histogram_agg	532.896	495.547	-37.3498	ms
error rate	date_histogram_agg	0	0	0	%

http_logs

Latency improves by 25% on Intel and 12% on Graviton.

c6i.2xlarge details

Metric	Task	Baseline	Contender	Diff	Unit
Min Throughput	hourly_agg	0.20066	0.200759	0.0001	ops/s
Mean Throughput	hourly_agg	0.200913	0.20105	0.00014	ops/s
Median Throughput	hourly_agg	0.200878	0.20101	0.00013	ops/s
Max Throughput	hourly_agg	0.201311	0.201509	0.0002	ops/s
50th percentile latency	hourly_agg	1675.88	1249.67	-426.202	ms
90th percentile latency	hourly_agg	1726.11	1294.06	-432.045	ms
99th percentile latency	hourly_agg	1834.83	1346.8	-488.03	ms
100th percentile latency	hourly_agg	1866.72	1362.41	-504.306	ms
50th percentile service time	hourly_agg	1672.5	1244.76	-427.744	ms
90th percentile service time	hourly_agg	1721.53	1289.75	-431.78	ms
99th percentile service time	hourly_agg	1830.75	1342.43	-488.317	ms
100th percentile service time	hourly_agg	1862.45	1358.18	-504.272	ms
error rate	hourly_agg	0	0	0	%

c6g.2xlarge details

Metric	Task	Baseline	Contender	Diff	Unit
Min Throughput	hourly_agg	0.200508	0.20057	6e-05	ops/s
Mean Throughput	hourly_agg	0.200702	0.200788	9e-05	ops/s
Median Throughput	hourly_agg	0.200676	0.200758	8e-05	ops/s
Max Throughput	hourly_agg	0.201008	0.201131	0.00012	ops/s
50th percentile latency	hourly_agg	2445.65	2141.59	-304.058	ms
90th percentile latency	hourly_agg	2448.21	2145.17	-303.046	ms
99th percentile latency	hourly_agg	2462.3	2149.36	-312.941	ms
100th percentile latency	hourly_agg	2543.93	2159	-384.936	ms
50th percentile service time	hourly_agg	2442.78	2138.38	-304.401	ms
90th percentile service time	hourly_agg	2444.92	2141.43	-303.493	ms
99th percentile service time	hourly_agg	2459.01	2145.47	-313.546	ms
100th percentile service time	hourly_agg	2540.64	2155.71	-384.934	ms
error rate	hourly_agg	0	0	0	%

Check List

New functionality includes testing.
- All tests pass
New functionality has been documented.
- New functionality has javadoc added
Commits are signed per the DCO using --signoff
Commit changes are listed out in CHANGELOG.md file (See: Changelog)

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

github-actions · 2023-05-22T09:56:11Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/15839/
CommitID: a222e8b
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-05-24T08:17:20Z

Gradle Check (Jenkins) Run Completed with:

RESULT: FAILURE ❌
URL: https://build.ci.opensearch.org/job/gradle-check/16162/
CommitID: f59172f
Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green.
Is the failure a flaky test unrelated to your change?

github-actions · 2023-05-24T21:52:28Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕
TEST FAILURES:

      1 org.opensearch.cluster.allocation.AwarenessAllocationIT.testThreeZoneOneReplicaWithForceZoneValueAndLoadAwareness

URL: https://build.ci.opensearch.org/job/gradle-check/16245/
CommitID: cfb9ac8
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

codecov · 2023-05-24T21:56:08Z

Codecov Report

Merging #7652 (e3d3915) into main (0c7ba94) will increase coverage by 0.00%.
The diff coverage is 43.66%.

@@            Coverage Diff             @@
##               main    #7652    +/-   ##
==========================================
  Coverage     70.96%   70.96%            
- Complexity    56780    56790    +10     
==========================================
  Files          4733     4735     +2     
  Lines        268140   268281   +141     
  Branches      39287    39308    +21     
==========================================
+ Hits         190285   190387   +102     
- Misses        61816    61853    +37     
- Partials      16039    16041     +2

Impacted Files	Coverage Δ
.../org/opensearch/common/util/LongHashBenchmark.java	`0.00% <0.00%> (ø)`
...g/opensearch/common/util/ReorganizingLongHash.java	`95.31% <95.31%> (ø)`
...aggregations/bucket/terms/LongKeyedBucketOrds.java	`82.75% <100.00%> (ø)`

... and 447 files with indirect coverage changes

github-actions · 2023-05-24T22:01:44Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕
TEST FAILURES:

      1 org.opensearch.cluster.routing.allocation.decider.DiskThresholdDeciderIT.testIndexCreateBlockIsRemovedWhenAnyNodesNotExceedHighWatermarkWithAutoReleaseEnabled

URL: https://build.ci.opensearch.org/job/gradle-check/16246/
CommitID: b18dfcb
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

github-actions · 2023-06-16T08:31:24Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕
TEST FAILURES:

      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu

URL: https://build.ci.opensearch.org/job/gradle-check/17747/
CommitID: bc6f294
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

ketanv3 · 2023-06-16T09:29:51Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕

TEST FAILURES:
      1 org.opensearch.search.backpressure.SearchBackpressureIT.testSearchShardTaskCancellationWithHighCpu
URL: https://build.ci.opensearch.org/job/gradle-check/17747/

CommitID: bc6f294
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

Unrelated to this PR. This flaky test is being fixed in #8063.

Codecov/patch failed as it's expecting test coverage in the benchmark file. This should probably be excluded.

backslasht

Looks good to me! Thanks for introducing ReorganizingLongHash.

ketanv3 · 2023-06-19T10:10:19Z

Hi @dblock, can you check the latest revision and comments?

dblock · 2023-06-26T23:15:36Z

I'm good with this. There's a weird changelog conflict, care to resolve? I tried to use the GitHub UI and I didn't like its suggestion that had nothing to do with this PR :)

github-actions · 2023-06-27T04:23:34Z

Gradle Check (Jenkins) Run Completed with:

RESULT: UNSTABLE ❕
TEST FAILURES:

      1 org.opensearch.remotestore.SegmentReplicationUsingRemoteStoreIT.testNodeDropWithOngoingReplication

URL: https://build.ci.opensearch.org/job/gradle-check/18426/
CommitID: e3d3915
Please review all flaky tests that succeeded after retry and create an issue if one does not already exist to track the flaky failure.

ketanv3 · 2023-06-27T04:24:36Z

@dblock Updated!

dblock · 2023-06-28T22:15:18Z

Good work. Watch the backport to 2.x, might need some manual work.

opensearch-trigger-bot · 2023-06-28T22:15:53Z

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/backport-2.x
# Create a new branch
git switch --create backport/backport-7652-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 b7cace575c84d94b49c02cd8328835d3f8b1a0d0
# Push it to GitHub
git push --set-upstream origin backport/backport-7652-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-7652-to-2.x.

dblock · 2023-06-28T23:07:21Z

@ketanv3 want to backport manually to 2.x?

ketanv3 · 2023-06-29T02:28:23Z

@dblock Sure! Working on it.

…gations (opensearch-project#7652) * Add self-organizing hash table to improve the performance of bucket aggregations Signed-off-by: Ketan Verma <[email protected]> * Updated approach: PSL, fingerprint and recency information are embedded in the hash table itself Signed-off-by: Ketan Verma <[email protected]> * Updated tests and added microbenchmarks Signed-off-by: Ketan Verma <[email protected]> * Renamed FastLongHash to ReorganizingLongHash and updated the default initial capacity Signed-off-by: Ketan Verma <[email protected]> --------- Signed-off-by: Ketan Verma <[email protected]>

ketanv3 · 2023-06-29T04:41:30Z

@dblock Here's the backport PR: #8337

…gations (#7652) (#8337) * Add self-organizing hash table to improve the performance of bucket aggregations * Updated approach: PSL, fingerprint and recency information are embedded in the hash table itself * Updated tests and added microbenchmarks * Renamed FastLongHash to ReorganizingLongHash and updated the default initial capacity --------- Signed-off-by: Ketan Verma <[email protected]>

…gations (opensearch-project#7652) * Add self-organizing hash table to improve the performance of bucket aggregations Signed-off-by: Ketan Verma <[email protected]> * Updated approach: PSL, fingerprint and recency information are embedded in the hash table itself Signed-off-by: Ketan Verma <[email protected]> * Updated tests and added microbenchmarks Signed-off-by: Ketan Verma <[email protected]> * Renamed FastLongHash to ReorganizingLongHash and updated the default initial capacity Signed-off-by: Ketan Verma <[email protected]> --------- Signed-off-by: Ketan Verma <[email protected]> Signed-off-by: Shivansh Arora <[email protected]>

ketanv3 force-pushed the performance/long-hash branch from a222e8b to f59172f Compare May 24, 2023 07:49

ketanv3 force-pushed the performance/long-hash branch 2 times, most recently from cfb9ac8 to b18dfcb Compare May 24, 2023 21:25

ketanv3 changed the title ~~[WIP] Self-organizing hash table to improve the performance of bucket aggregations~~ Self-organizing hash table to improve the performance of bucket aggregations May 24, 2023

ketanv3 marked this pull request as ready for review May 25, 2023 04:46

ketanv3 requested review from reta, anasalkouz, andrross, Bukhtawar, CEHENKLE, dblock, gbbafna, setiah, kartg, kotwanikunal, mch2, nknize, owaiskazi19, Rishikesh1159, ryanbogan, saratvemulapalli, shwetathareja, dreamer-89, tlfeng and VachaShah as code owners May 25, 2023 04:46

backslasht approved these changes Jun 16, 2023

dblock approved these changes Jun 26, 2023

ketanv3 added 4 commits June 27, 2023 08:43

- 36857d3 Add self-organizing hash table to improve the performance of bucket aggregations
- 66334ae Updated approach: PSL, fingerprint and recency information are embedded in the hash table itself
- 80a9c6d Updated tests and added microbenchmarks
- e3d3915 Renamed FastLongHash to ReorganizingLongHash and updated the default initial capacity

All four commits are signed off by Ketan Verma.

ketanv3 force-pushed the performance/long-hash branch from bc6f294 to e3d3915 Compare June 27, 2023 03:49

dblock approved these changes Jun 28, 2023

dblock merged commit b7cace5 into opensearch-project:main Jun 28, 2023

dblock added the backport 2.x Backport to 2.x branch label Jun 28, 2023

ketanv3 mentioned this pull request Jun 29, 2023

[Backport 2.x] Self-organizing hash table to improve the performance of bucket aggregations (#7652) #8337

Merged
