[Performance] Potential regression in 1.0.0-Beta1 #673

Closed
nknize opened this issue May 7, 2021 · 7 comments
Labels
benchmarking Issues related to benchmarking or performance. Priority-High Severity-Major v1.2.0 Issues related to version 1.2.0

Comments

@nknize
Collaborator

nknize commented May 7, 2021

Describe the issue
Benchmarking has been set up and the following numbers were captured today:

| Product/Version | Architecture | Description | Instance Type | Workload Details | Index Latency (ms) p50 | Index Latency (ms) p90 | Index Latency (ms) p99 | Index Latency (ms) p100 | Index Throughput (docs/s) p0 | Index Throughput (docs/s) p50 | Index Throughput (docs/s) p100 | Index Ops Count | Index Op Error Count | Index Error Rate | Query Latency (ms) p50 | Query Latency (ms) p90 | Query Latency (ms) p99 | Query Latency (ms) p100 | Query Throughput (docs/s) p0 | Query Throughput (docs/s) p50 | Query Throughput (docs/s) p100 | Query Ops Count | Query Op Error Count | Query Error Rate | CPU (%) p50 | CPU (%) p90 | CPU (%) p99 | CPU (%) p100 | Memory (%) p50 | Memory (%) p90 | Memory (%) p99 | Memory (%) p100 | GC (ms) Old | GC (ms) Young |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ODFE 1.13.2 | X64 | With security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 2,371.4 | 3,077.4 | 4,294.3 | 9,076.4 | 32,124.7 | 33,120.6 | 38,174.6 | 46,871 | - | - | 175.2 | 203.5 | 216.6 | 273.3 | 1.748 | 1.753 | 1.767 | 1,510 | - | - | 97 | 97 | 98 | 98 | 50 | 64 | 67 | 70 | - | 1,678,812 |
| OpenSearch-1.0.0-beta1 | X64 | With security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 3,198.3 | 4,432.5 | 7,910.1 | 10,011.5 | 22,784.5 | 23,332.2 | 27,295.6 | 47,649 | 58 | 0.10% | 261 | 305.1 | 381.9 | 411.6 | 2.009 | 2.015 | 2.031 | 906 | - | - | 98 | 98 | 98 | 98 | 51 | 64 | 68 | 70 | - | 1,794,217 |
| ODFE 1.13.2 | X64 | Without security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 1,915.7 | 2,592.7 | 3,620.7 | 6,655.2 | 39,218.4 | 40,990.2 | 47,736.1 | 46,193 | - | - | 266.5 | 280.5 | 289.5 | 294 | 1.747 | 1.751 | 1.764 | 1,510 | - | - | 98 | 98 | 99 | 100 | 48 | 62 | 67 | 75 | - | 1,281,143 |
| OpenSearch-1.0.0-beta1 | X64 | Without security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 2,986 | 4,170.5 | 7,673.3 | 13,117.2 | 24,271.4 | 25,024.7 | 29,637.7 | 47,475 | - | - | 316.7 | 335.4 | 424.2 | 466.2 | 2.002 | 2.017 | 2.033 | 906 | - | - | 99 | 99 | 99 | 99 | 49 | 62 | 67 | 69 | - | 1,543,353 |

Initial results indicate a possible regression. Further root-cause analysis, from benchmark server configuration to dependency performance, needs to be done to confirm whether these are valid indicators of a performance regression. If the behavior is confirmed, the regression will be tracked down and a patch will be opened and merged in advance of the Beta 2 release. This issue tracks that progress and will transparently communicate updates as they become available.

@nknize nknize added Severity-Major Priority-High benchmarking Issues related to benchmarking or performance. labels May 7, 2021
@CEHENKLE
Member

TL;DR: The original large performance degradation we saw was caused by an issue in the performance tool. Now that we've cleared that out, we're still seeing a (smaller) performance degradation, but we have a smoking gun showing that the version changes are causing the issue. So we believe this is ultimately related to #693, and should be resolved with that issue.

We've been digging into this, and here's what we've found:

The results above are from an internal tool based on Rally. We had previously run Rally against OpenSearch Alpha, so in order to get an apples-to-apples comparison we wanted to rerun Rally against OpenSearch Beta. We did that and saw similar results:
osbeta1_0_0_vs_es_&_10_2_nyc_taxi.pdf

BUT! (and the plot thickens!)
Then we figured out what was going on with Rally and the internal tool. Rally picks the branch of the track based on the version reported by the cluster: for ODFE that's '7.10.2', for OpenSearch that's '1.0.0-beta1'. This means that Rally used the 5.0 branch of nyc_taxis when running OpenSearch instead of the 7.1 branch it used for ODFE. So from our POV, that was (relatively) good news, since it meant these results have no meaning: the track and operations were different.
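
For reference, Rally can be pinned to a specific revision of the track repository, which is one way to force both clusters onto the same nyc_taxis track. A minimal sketch (the target host here is a placeholder, and this is not necessarily the exact workaround we used):

```sh
# Pin the rally-tracks repository to the 7.1 branch so the same track definition
# is used regardless of the version string the cluster reports.
esrally race --track=nyc_taxis --track-revision=7.1 \
  --pipeline=benchmark-only --challenge=append-no-conflicts \
  --target-hosts=localhost:9200
```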

BUT! (more plot!)
Once we'd figured out a way to get Rally to use the same tracks, we ran again... and we're still seeing some degradation, but not nearly as bad.

| Product/Version | Architecture | Description | Instance Type | Workload Details | Index Latency (ms) p50 | Index Latency (ms) p90 | Index Latency (ms) p99 | Index Latency (ms) p100 | Index Throughput (docs/s) p0 | Index Throughput (docs/s) p50 | Index Throughput (docs/s) p100 | Ops Count | Op Error Count | Error Rate | Query Latency (ms) p50 | Query Latency (ms) p90 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ODFE 1.13.2 | X64 | With security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 2,371.4 | 3,077.4 | 4,294.3 | 9,076.4 | 32,124.7 | 33,120.6 | 38,174.6 | 46,871 | - | - | 175.2 | 203.5 |
| OpenSearch-1.0.0-beta1 | X64 | With security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 2,530.2 | 3,251.7 | 4,833.3 | 7,956.3 | 30,336.8 | 31,283.1 | 36,216.9 | 47,046 | - | - | 178.3 | 206.9 |
| Comparison | | | | | -6.70% | -5.66% | -12.55% | 12.34% | -5.57% | 5.55% | 5.13% | | | | -1.77% | -1.67% |
| ODFE 1.13.2 | X64 | Without security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 1,915.7 | 2,592.7 | 3,620.7 | 6,655.2 | 39,218.4 | 40,990.2 | 47,736.1 | 46,193 | - | - | 266.5 | 280.5 |
| OpenSearch-1.0.0-beta1 | X64 | Without security | m5.xlarge | nyc_taxis / 2 warmupIterations / 3 testIterations | 2,252 | 2,984 | 4,472.3 | 7,873 | 33,929.4 | 35,021.5 | 41,034.1 | 46,697 | - | - | 204.4 | 217.9 |
| Comparison | | | | | -17.55% | -15.09% | -23.52% | -18.30% | -13.49% | -14.56% | -14.04% | | | | 23.30% | 22.32% |

BUT! (still more!)
So if we believe (and we do) that the original performance impact was because we were using different tracks, where is the new issue coming from? @harold-wang has been digging in, running tests in Rally with different builds and removing different PRs. We definitely have a smoking gun showing that the issue started with our version changes. We have a couple of theories about what's causing the degradation, but I'll let @harold-wang post more about it.

Thanks very much to @harold-wang, @cmanning09, @nknize and Phil Treddenick for digging in.

@nknize
Collaborator Author

nknize commented May 15, 2021

An initial PR has been opened to fix BWC for legacy transport clients.

@harold-wang
Contributor

harold-wang commented Jun 7, 2021

The performance issue is gone after I changed the version id from 1.0.0 to 7.10.3 in the ESRally test command. This issue is now closed.

| Product | Architecture | Task | Race | Min Throughput | Mean Throughput | Median Throughput | Max Throughput | 50th percentile latency | 90th percentile latency | 99th percentile latency | 99.9th percentile latency | 99.99th percentile latency | 100th percentile latency | 50th percentile service time | 90th percentile service time | 99th percentile service time | 99.9th percentile service time | 99.99th percentile service time | 100th percentile service time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Elasticsearch 7.10.3 | X86 | Index | nyc_taxi | 112993 | 114159 | 114064 | 115787 | 622.197 | 1031.09 | 1657.14 | 2432.13 | 2758.52 | 3137.58 | 622.197 | 1031.09 | 1657.14 | 2432.13 | 2758.52 | 3137.58 |
| OpenSearch 1.0.0 | X86 | Index | nyc_taxi | 114724 | 115716 | 115726 | 116100 | 611.568 | 1023.96 | 1561.83 | 2414.98 | 2824.62 | 3124.74 | 611.568 | 1023.96 | 1561.83 | 2414.98 | 2824.62 | 3124.74 |
| Diff | | | | 1731.3 | 1556.98 | 1662.52 | 313.289 | -10.6289 | -7.13577 | -95.3073 | -17.152 | 66.0953 | -12.8442 | -10.6289 | -7.13577 | -95.3073 | -17.152 | 66.0953 | -12.8442 |

```sh
nohup esrally race --distribution-version=1.0.0 --track=nyc_taxis --target-hosts=ec2-52-26-172-17.us-west-2.compute.amazonaws.com:9200 --pipeline=benchmark-only --challenge=append-no-conflicts --report-format=csv --report-file=~/es_benchmarks/result.csv --user-tag="intention:OpenSearch100_main_lucence_882" > esrally.log 2>&1 &
```

We should use 7.10.3 even for OpenSearch 1.0.0, since ESRally could not interpret version 1.0.0 properly.
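
For reference, this is the same invocation with only that flag changed (host, track, and tags are carried over from the command above):

```sh
# Same benchmark run, but with a distribution version ESRally can interpret,
# so it selects the matching track branch and client behavior.
nohup esrally race --distribution-version=7.10.3 --track=nyc_taxis \
  --target-hosts=ec2-52-26-172-17.us-west-2.compute.amazonaws.com:9200 \
  --pipeline=benchmark-only --challenge=append-no-conflicts \
  --report-format=csv --report-file=~/es_benchmarks/result.csv \
  --user-tag="intention:OpenSearch100_main_lucence_882" > esrally.log 2>&1 &
```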

@harold-wang
Contributor

harold-wang commented Jun 7, 2021

Re-opening this issue: we are running the same test with Mensor against a cluster with OpenSearch BWC enabled. Will update.

| Product | Architecture | Task | Race | Version | Index Latency (ms) P50 | Index Latency (ms) P90 | Index Latency (ms) P99 | Index Latency (ms) P100 | Query Latency (ms) P50 | Query Latency (ms) P90 | Query Latency (ms) P99 | Query Latency (ms) P100 | Index Throughput (req/s) P0 | Index Throughput (req/s) P50 | Index Throughput (req/s) P100 | Query Throughput (req/s) P0 | Query Throughput (req/s) P50 | Query Throughput (req/s) P100 | CPU (%) P0 | CPU (%) P50 | CPU (%) P90 | CPU (%) P99 | Memory (%) P0 | Memory (%) P50 | Memory (%) P90 | Memory (%) P99 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| OpenSearch BWC Enabled | X86 | ... | nyc_taxi | 7.10.3 | 339.4 | 456.3 | 661 | 1,629.1 | 112.4 | 118.3 | 154.4 | 179.7 | 213,291 | 218,335.8 | 225,436.7 | 1.749 | 1.755 | 1.769 | 9 | 37 | 40 | 40 | 42.5 | 61 | 67.8 | 75 |
| Elasticsearch | X86 | ... | nyc_taxi | 1.0.0 | 389.4 | 509.8 | 915.7 | 2,096.7 | 74.609 | 77.823 | 92.948 | 108.1 | 182,861.3 | 188,277.1 | 193,464.71 | 1.751 | 1.756 | 1.772 | 10 | 30 | 31 | 32 | 57 | 75 | 84 | 89 |
| (OS-ES)/ES | | | | | -12.8% | -10.5% | -27.8% | -22.3% | 50.7% | 52.0% | 66.1% | 66.2% | 16.6% | 16.0% | 16.5% | -0.1% | -0.1% | -0.2% | -10.0% | 23.3% | 29.0% | 25.0% | -25.4% | -18.7% | -19.3% | -15.7% |
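
For clarity, the (OS-ES)/ES row is just the relative difference between the two runs; for example, the query latency P50 cell works out like this:

```sh
# (OS - ES) / ES * 100 for query latency P50: (112.4 - 74.609) / 74.609 * 100 ≈ 50.7%
awk 'BEGIN { os = 112.4; es = 74.609; printf "%.1f%%\n", (os - es) / es * 100 }'
```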

@harold-wang
Contributor

harold-wang commented Jun 10, 2021

Summary:

  1. OpenSearch shows higher CPU usage at P50, P90, and P99 (23% to 25%)
  2. OpenSearch shows lower memory usage (-15% to -25%)
  3. Query throughput is almost the same
  4. OpenSearch has 16% higher index throughput
  5. OpenSearch has 50% to 66% higher query latency at P50, P90, P99, and P100

Mensor test links:
Opensearch 1.0.0: https://prod.mensor.searchservices.aws.dev/test/491d7695-6ffb-46db-b3bb-1c323a03dd4f
Elasticsearch 7.10.3: https://prod.mensor.searchservices.aws.dev/test/996da8b8-2747-4f42-9603-fbffbfbf7de8

@dblock dblock added the v1.1.0 Issues, PRs, related to the 1.1.0 release label Aug 3, 2021
@nknize
Collaborator Author

nknize commented Sep 13, 2021

I think this was resolved as a versioning discrepancy due to the versioning framework using old versions of Elasticsearch? Bumping to v1.2.0, but I think this can be closed?

@nknize nknize added v1.2.0 Issues related to version 1.2.0 and removed v1.1.0 Issues, PRs, related to the 1.1.0 release labels Sep 13, 2021