[Performance] Potential regression in 1.0.0-Beta1 #673

Closed
nknize opened this issue May 7, 2021 · 7 comments
Labels
benchmarking Issues related to benchmarking or performance. Priority-High Severity-Major v1.2.0 Issues related to version 1.2.0

Comments

@nknize
Collaborator

nknize commented May 7, 2021

Describe the issue
Benchmarking has been set up and the following numbers were captured today:

| Product/Version | Architecture | Description | Instance Type | Workload Details | Index Latency (ms) p50 | Index Latency (ms) p90 | Index Latency (ms) p99 | Index Latency (ms) p100 | Index Throughput (docs/s) p0 | Index Throughput (docs/s) p50 | Index Throughput (docs/s) p100 | Index Ops Count | Index Op Error Count | Index Error Rate | Query Latency (ms) p50 | Query Latency (ms) p90 | Query Latency (ms) p99 | Query Latency (ms) p100 | Query Throughput (docs/s) p0 | Query Throughput (docs/s) p50 | Query Throughput (docs/s) p100 | Query Ops Count | Query Op Error Count | Query Error Rate | CPU (%) p50 | CPU (%) p90 | CPU (%) p99 | CPU (%) p100 | Memory (%) p50 | Memory (%) p90 | Memory (%) p99 | Memory (%) p100 | GC (ms) Old | GC (ms) Young |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ODFE 1.13.2 | X64 | With security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 2,371.4 | 3,077.4 | 4,294.3 | 9,076.4 | 32,124.7 | 33,120.6 | 38,174.6 | 46,871 | - | - | 175.2 | 203.5 | 216.6 | 273.3 | 1.748 | 1.753 | 1.767 | 1,510 | - | - | 97 | 97 | 98 | 98 | 50 | 64 | 67 | 70 | - | 1,678,812 |
| OpenSearch-1.0.0-beta1 | X64 | With security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 3,198.3 | 4,432.5 | 7,910.1 | 10,011.5 | 22,784.5 | 23,332.2 | 27,295.6 | 47,649 | 58 | 0.10% | 261 | 305.1 | 381.9 | 411.6 | 2.009 | 2.015 | 2.031 | 906 | - | - | 98 | 98 | 98 | 98 | 51 | 64 | 68 | 70 | - | 1,794,217 |
| ODFE 1.13.2 | X64 | Without security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 1,915.7 | 2,592.7 | 3,620.7 | 6,655.2 | 39,218.4 | 40,990.2 | 47,736.1 | 46,193 | - | - | 266.5 | 280.5 | 289.5 | 294 | 1.747 | 1.751 | 1.764 | 1,510 | - | - | 98 | 98 | 99 | 100 | 48 | 62 | 67 | 75 | - | 1,281,143 |
| OpenSearch-1.0.0-beta1 | X64 | Without security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 2,986 | 4,170.5 | 7,673.3 | 13,117.2 | 24,271.4 | 25,024.7 | 29,637.7 | 47,475 | - | - | 316.7 | 335.4 | 424.2 | 466.2 | 2.002 | 2.017 | 2.033 | 906 | - | - | 99 | 99 | 99 | 99 | 49 | 62 | 67 | 69 | - | 1,543,353 |

Initial results indicate a possible regression. Further root-cause analysis, from benchmark server configuration to dependency performance, needs to be done to confirm whether these are valid indicators of a performance regression. If the behavior is confirmed, the regression will be tracked down and a patch will be opened and merged in advance of the Beta 2 release. This issue tracks that progress and will transparently communicate updates as they become available.

@nknize nknize added Severity-Major Priority-High benchmarking Issues related to benchmarking or performance. labels May 7, 2021
@CEHENKLE
Member

TL;DR: The original large performance degradation we saw was caused by an issue in the performance tool. Now that we've cleared that out, we're still seeing a (smaller) performance degradation, but we have a smoking gun showing that the version changes are causing the issue. So we believe this is ultimately related to #693, and should be resolved with that issue.

We've been digging into this, and here's what we've found:

The results above are from an internal tool based on Rally. We had previously run Rally against OpenSearch Alpha, so in order to get an apples-to-apples comparison we wanted to rerun Rally against OpenSearch Beta. We did that and saw similar results:
osbeta1_0_0_vs_es_&_10_2_nyc_taxi.pdf

BUT! (and the plot thickens!)
Then we figured out what was going on with Rally and the internal tool. Rally picks the branch of the track based on the version reported by the cluster: for ODFE that's '7.10.2', for OpenSearch that's '1.0.0-beta1'. This means that Rally used the 5.0 branch of nyc_taxis when running OpenSearch instead of the 7.1 branch it used for ODFE. So from our POV, that was (relatively) good news, since it meant these results have no meaning: the track and operations were different.
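
For reference, Rally can be pinned to a specific revision of the track repository, which is one way to force both clusters onto the same nyc_taxis track. A minimal sketch (the target host here is a placeholder, and this is not necessarily the exact workaround we used):

```sh
# Pin the rally-tracks repository to the 7.1 branch so the same track definition
# is used regardless of the version string the cluster reports.
esrally race --track=nyc_taxis --track-revision=7.1 \
  --pipeline=benchmark-only --challenge=append-no-conflicts \
  --target-hosts=localhost:9200
```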

BUT! (more plot!)
Once we'd figured out a way to get Rally to use the same tracks, we ran again... and we're still seeing some degradation, but not nearly as bad.

| Product/Version | Architecture | Description | Instance Type | Workload Details | Index Latency (ms) p50 | Index Latency (ms) p90 | Index Latency (ms) p99 | Index Latency (ms) p100 | Index Throughput (docs/s) p0 | Index Throughput (docs/s) p50 | Index Throughput (docs/s) p100 | Ops Count | Op Error Count | Error Rate | Query Latency (ms) p50 | Query Latency (ms) p90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ODFE 1.13.2 | X64 | With security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 2,371.4 | 3,077.4 | 4,294.3 | 9,076.4 | 32,124.7 | 33,120.6 | 38,174.6 | 46,871 | - | - | 175.2 | 203.5 |
| OpenSearch-1.0.0-beta1 | X64 | With security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 2,530.2 | 3,251.7 | 4,833.3 | 7,956.3 | 30,336.8 | 31,283.1 | 36,216.9 | 47,046 | - | - | 178.3 | 206.9 |
| Comparison | | | | | -6.70% | -5.66% | -12.55% | 12.34% | -5.57% | 5.55% | 5.13% | | | | -1.77% | -1.67% |
| ODFE 1.13.2 | X64 | Without security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 1,915.7 | 2,592.7 | 3,620.7 | 6,655.2 | 39,218.4 | 40,990.2 | 47,736.1 | 46,193 | - | - | 266.5 | 280.5 |
| OpenSearch-1.0.0-beta1 | X64 | Without security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 2,252 | 2,984 | 4,472.3 | 7,873 | 33,929.4 | 35,021.5 | 41,034.1 | 46,697 | - | - | 204.4 | 217.9 |
| Comparison | | | | | -17.55% | -15.09% | -23.52% | -18.30% | -13.49% | -14.56% | -14.04% | | | | 23.30% | 22.32% |

BUT! (still more!)
So if we believe (and we do) that the original performance impact was because we were using different tracks, where is the new issue coming from? @harold-wang has been digging in, running tests in Rally with different builds and removing different PRs. We definitely have a smoking gun showing that the issue started with our version changes. We have a couple of theories about what's causing the degradation, but I'll let @harold-wang post more about it.

Thanks very much to @harold-wang, @cmanning09, @nknize and Phil Treddenick for digging in.

@nknize
Collaborator Author

nknize commented May 15, 2021

An initial PR has been opened to fix BWC for legacy transport clients.

@harold-wang
Contributor

harold-wang commented Jun 7, 2021

The performance issue is gone after I changed the version id from 1.0.0 to 7.10.3 in the ESRally test command. This issue is now closed.

| Product | Architecture | Task | Race | Min Throughput | Mean Throughput | Median Throughput | Max Throughput | 50th percentile latency | 90th percentile latency | 99th percentile latency | 99.9th percentile latency | 99.99th percentile latency | 100th percentile latency | 50th percentile service time | 90th percentile service time | 99th percentile service time | 99.9th percentile service time | 99.99th percentile service time | 100th percentile service time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Elasticsearch 7.10.3 | X86 | Index | nyc_taxi | 112993 | 114159 | 114064 | 115787 | 622.197 | 1031.09 | 1657.14 | 2432.13 | 2758.52 | 3137.58 | 622.197 | 1031.09 | 1657.14 | 2432.13 | 2758.52 | 3137.58 |
| OpenSearch 1.0.0 | X86 | Index | nyc_taxi | 114724 | 115716 | 115726 | 116100 | 611.568 | 1023.96 | 1561.83 | 2414.98 | 2824.62 | 3124.74 | 611.568 | 1023.96 | 1561.83 | 2414.98 | 2824.62 | 3124.74 |
| Diff | | | | 1731.3 | 1556.98 | 1662.52 | 313.289 | -10.6289 | -7.13577 | -95.3073 | -17.152 | 66.0953 | -12.8442 | -10.6289 | -7.13577 | -95.3073 | -17.152 | 66.0953 | -12.8442 |

```sh
nohup esrally race --distribution-version=1.0.0 --track=nyc_taxis --target-hosts=ec2-52-26-172-17.us-west-2.compute.amazonaws.com:9200 --pipeline=benchmark-only --challenge=append-no-conflicts --report-format=csv --report-file=~/es_benchmarks/result.csv --user-tag="intention:OpenSearch100_main_lucence_882" > esrally.log 2>&1 &
```

We should use 7.10.3 even for OpenSearch 1.0.0, since ESRally could not interpret version 1.0.0 properly.
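
For reference, this is the same invocation with only that flag changed (host, track, and tags are carried over from the command above):

```sh
# Same benchmark run, but with a distribution version ESRally can interpret,
# so it selects the matching track branch and client behavior.
nohup esrally race --distribution-version=7.10.3 --track=nyc_taxis \
  --target-hosts=ec2-52-26-172-17.us-west-2.compute.amazonaws.com:9200 \
  --pipeline=benchmark-only --challenge=append-no-conflicts \
  --report-format=csv --report-file=~/es_benchmarks/result.csv \
  --user-tag="intention:OpenSearch100_main_lucence_882" > esrally.log 2>&1 &
```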

@harold-wang
Contributor

harold-wang commented Jun 7, 2021

Re-opening this issue: we are running the same test with Mensor against a cluster with OpenSearch BWC enabled. Will update.

| Product | Architecture | Task | Race | Version | Index Latency (ms) P50 | Index Latency (ms) P90 | Index Latency (ms) P99 | Index Latency (ms) P100 | Query Latency (ms) P50 | Query Latency (ms) P90 | Query Latency (ms) P99 | Query Latency (ms) P100 | Index Throughput (req/s) P0 | Index Throughput (req/s) P50 | Index Throughput (req/s) P100 | Query Throughput (req/s) P0 | Query Throughput (req/s) P50 | Query Throughput (req/s) P100 | CPU (%) P0 | CPU (%) P50 | CPU (%) P90 | CPU (%) P99 | Memory (%) P0 | Memory (%) P50 | Memory (%) P90 | Memory (%) P99 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenSearch BWC Enabled | X86 | ... | nyc_taxi | 7.10.3 | 339.4 | 456.3 | 661 | 1,629.1 | 112.4 | 118.3 | 154.4 | 179.7 | 213,291 | 218,335.8 | 225,436.7 | 1.749 | 1.755 | 1.769 | 9 | 37 | 40 | 40 | 42.5 | 61 | 67.8 | 75 |
| Elasticsearch | X86 | ... | nyc_taxi | 1.0.0 | 389.4 | 509.8 | 915.7 | 2,096.7 | 74.609 | 77.823 | 92.948 | 108.1 | 182,861.3 | 188,277.1 | 193,464.71 | 1.751 | 1.756 | 1.772 | 10 | 30 | 31 | 32 | 57 | 75 | 84 | 89 |
| (OS-ES)/ES | | | | | -12.8% | -10.5% | -27.8% | -22.3% | 50.7% | 52.0% | 66.1% | 66.2% | 16.6% | 16.0% | 16.5% | -0.1% | -0.1% | -0.2% | -10.0% | 23.3% | 29.0% | 25.0% | -25.4% | -18.7% | -19.3% | -15.7% |
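
For clarity, the (OS-ES)/ES row is just the relative difference between the two runs; for example, the query latency P50 cell works out like this:

```sh
# (OS - ES) / ES * 100 for query latency P50: (112.4 - 74.609) / 74.609 * 100 ≈ 50.7%
awk 'BEGIN { os = 112.4; es = 74.609; printf "%.1f%%\n", (os - es) / es * 100 }'
```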

@harold-wang
Contributor

harold-wang commented Jun 10, 2021

Summary:

  1. OpenSearch shows higher CPU usage at P50, P90, and P99 (23% to 25%)
  2. OpenSearch shows lower memory usage (-15% to -25%)
  3. Query throughput is almost the same
  4. OpenSearch has 16% higher index throughput
  5. OpenSearch has 50% to 66% higher query latency at P50, P90, P99, and P100

Mensor test links:
Opensearch 1.0.0: https://prod.mensor.searchservices.aws.dev/test/491d7695-6ffb-46db-b3bb-1c323a03dd4f
Elasticsearch 7.10.3: https://prod.mensor.searchservices.aws.dev/test/996da8b8-2747-4f42-9603-fbffbfbf7de8

@dblock dblock added the v1.1.0 Issues, PRs, related to the 1.1.0 release label Aug 3, 2021
@nknize
Collaborator Author

nknize commented Sep 13, 2021

I think this was resolved as a versioning discrepancy due to the versioning framework using old versions of Elasticsearch? Bumping to v1.2.0, but I think this can be closed?

@nknize nknize added v1.2.0 Issues related to version 1.2.0 and removed v1.1.0 Issues, PRs, related to the 1.1.0 release labels Sep 13, 2021