[fix](runtime_profile) fix race condition in to_thrift #45047

Merged
yiguolei merged 2 commits into apache:master on Dec 6, 2024


kaijchen
Contributor

@kaijchen kaijchen commented Dec 5, 2024

What problem does this PR solve?

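Fix a race condition in `RuntimeProfile::to_thrift()`. The backtrace below shows `std::vector::reserve` throwing `std::length_error` inside the recursive `to_thrift()` calls made from `LoadChannel::_report_profile`: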

#6  0x000055bce5a78bbf in std::__throw_length_error (__s=0x55bca1eb7880 <str> "vector::reserve") at ../../../../../libstdc++-v3/src/c++11/functexcept.cc:82
#7  0x000055bcafbbbc8f in std::vector<doris::TRuntimeProfileNode, std::allocator<doris::TRuntimeProfileNode> >::reserve (this=this@entry=0x7f2e69c39f48, __n=<optimized out>)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/vector.tcc:70
#8  0x000055bcafbb6e34 in doris::RuntimeProfile::to_thrift (this=<optimized out>, nodes=0x7f2e69c39f48) at /root/doris/be/src/util/runtime_profile.cpp:577
#9  0x000055bcafbb7780 in doris::RuntimeProfile::to_thrift (this=<optimized out>, nodes=0x7f2e69c39f48) at /root/doris/be/src/util/runtime_profile.cpp:612
#10 0x000055bcafbb7780 in doris::RuntimeProfile::to_thrift (this=<optimized out>, nodes=0x7f2e69c39f48) at /root/doris/be/src/util/runtime_profile.cpp:612
#11 0x000055bcafbb7780 in doris::RuntimeProfile::to_thrift (this=<optimized out>, nodes=0x7f2e69c39f48) at /root/doris/be/src/util/runtime_profile.cpp:612
#12 0x000055bcaf32ee52 in doris::LoadChannel::_report_profile (this=this@entry=0x6150116fca80, response=response@entry=0x61201b768340)
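
The trace is consistent with `to_thrift()` sizing its output from a child list that another thread is modifying at the same time, although the PR text does not spell out the cause or the fix. As a rough illustration only, one common way to avoid this kind of race is to copy the child list under a lock and then reserve and recurse from that copy. Everything in this sketch (`Profile`, `Node`, `add_child`, `to_nodes`, `children_lock_`) is hypothetical and is not the Doris implementation:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a serialized node (not doris::TRuntimeProfileNode).
struct Node {
    std::string name;
    std::size_t num_children = 0;
};

// Hypothetical profile tree; the names are illustrative, not the Doris API.
class Profile {
public:
    explicit Profile(std::string name) : name_(std::move(name)) {}

    // Writer side: may be called while another thread runs to_nodes().
    Profile* add_child(const std::string& name) {
        auto child = std::make_shared<Profile>(name);
        std::lock_guard<std::mutex> guard(children_lock_);
        children_.push_back(child);
        return child.get();
    }

    // Reader side: flatten the tree into `out` in pre-order.
    void to_nodes(std::vector<Node>* out) const {
        assert(out != nullptr);
        // Copy the child list while holding the lock, so the size passed to
        // reserve() and the children we recurse into come from the same state.
        std::vector<std::shared_ptr<Profile>> snapshot;
        {
            std::lock_guard<std::mutex> guard(children_lock_);
            snapshot = children_;
        }
        out->reserve(out->size() + 1 + snapshot.size());
        out->push_back(Node{name_, snapshot.size()});
        // The lock is already released here, so recursion cannot deadlock.
        for (const auto& child : snapshot) {
            child->to_nodes(out);
        }
    }

private:
    std::string name_;
    mutable std::mutex children_lock_;
    std::vector<std::shared_ptr<Profile>> children_;
};
```

With this shape, a reporting thread can call `to_nodes()` while other threads keep calling `add_child()`: each call sees a consistent snapshot of one level of the tree, at the cost of copying a vector of shared pointers per node.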

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@kaijchen
Contributor Author

kaijchen commented Dec 5, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39935 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 1f9fec992ad309a87c1ff7d7c25fd751c3d75e2c, data reload: false

------ Round 1 ----------------------------------
q1	17585	7470	7227	7227
q2	2072	180	176	176
q3	10538	1121	1141	1121
q4	10219	761	785	761
q5	7625	2729	2636	2636
q6	229	144	150	144
q7	1001	637	604	604
q8	9264	1824	1890	1824
q9	6757	6551	6503	6503
q10	7116	2368	2311	2311
q11	470	253	262	253
q12	428	234	237	234
q13	17803	3033	3026	3026
q14	247	209	214	209
q15	559	522	523	522
q16	673	587	590	587
q17	968	548	466	466
q18	7542	6642	6654	6642
q19	1318	985	1028	985
q20	468	183	184	183
q21	3994	3317	3209	3209
q22	388	312	322	312
Total cold run time: 107264 ms
Total hot run time: 39935 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7221	7233	7183	7183
q2	338	235	239	235
q3	2917	2805	2926	2805
q4	2047	1784	1781	1781
q5	5522	5568	5649	5568
q6	236	142	136	136
q7	2239	1810	1797	1797
q8	3341	3543	3579	3543
q9	9051	9008	8997	8997
q10	3597	3552	3553	3552
q11	632	506	514	506
q12	838	603	607	603
q13	14322	3247	3265	3247
q14	308	274	269	269
q15	586	521	515	515
q16	684	656	654	654
q17	1850	1631	1623	1623
q18	8186	7692	7607	7607
q19	1720	1595	1327	1327
q20	2147	1892	1926	1892
q21	5677	5471	5429	5429
q22	633	573	575	573
Total cold run time: 74092 ms
Total hot run time: 59842 ms

@doris-robot

TPC-DS: Total hot run time: 191841 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 1f9fec992ad309a87c1ff7d7c25fd751c3d75e2c, data reload: false

query1	1166	378	386	378
query2	6522	2075	2041	2041
query3	6690	220	218	218
query4	34044	23668	23539	23539
query5	4358	484	448	448
query6	286	187	187	187
query7	4635	301	308	301
query8	297	238	238	238
query9	9809	2732	2723	2723
query10	457	262	262	262
query11	18118	15170	15326	15170
query12	157	115	103	103
query13	1630	394	399	394
query14	11797	7462	7357	7357
query15	246	177	183	177
query16	8024	527	452	452
query17	1858	582	577	577
query18	1567	310	291	291
query19	369	162	158	158
query20	120	108	110	108
query21	246	106	103	103
query22	4867	4354	4475	4354
query23	35143	34585	34462	34462
query24	10622	2534	2446	2446
query25	635	396	388	388
query26	1191	150	151	150
query27	2597	287	283	283
query28	8022	2495	2460	2460
query29	790	414	402	402
query30	288	155	143	143
query31	1027	820	813	813
query32	92	61	60	60
query33	753	308	299	299
query34	958	520	544	520
query35	901	748	742	742
query36	1119	929	956	929
query37	148	109	69	69
query38	4308	4325	4270	4270
query39	1487	1429	1407	1407
query40	198	97	97	97
query41	45	42	43	42
query42	111	100	100	100
query43	550	504	500	500
query44	1271	817	792	792
query45	187	167	167	167
query46	1183	704	737	704
query47	1923	1874	1846	1846
query48	410	310	319	310
query49	1057	392	393	392
query50	834	386	441	386
query51	7221	7122	6931	6931
query52	100	89	92	89
query53	261	186	182	182
query54	1078	406	404	404
query55	82	84	80	80
query56	260	229	233	229
query57	1248	1111	1091	1091
query58	225	225	209	209
query59	3240	3024	2996	2996
query60	265	249	258	249
query61	108	104	101	101
query62	908	673	678	673
query63	221	185	188	185
query64	4099	721	652	652
query65	3296	3200	3214	3200
query66	861	312	311	311
query67	15999	15791	15624	15624
query68	4277	588	561	561
query69	430	251	262	251
query70	1214	1139	1155	1139
query71	325	259	250	250
query72	6482	4111	4047	4047
query73	766	353	364	353
query74	10159	9075	8906	8906
query75	3413	2661	2671	2661
query76	2475	1166	1032	1032
query77	386	289	270	270
query78	10455	9467	9394	9394
query79	1145	617	599	599
query80	829	424	456	424
query81	513	226	247	226
query82	1222	120	118	118
query83	317	142	154	142
query84	255	70	74	70
query85	1035	307	293	293
query86	326	304	300	300
query87	4756	4551	4574	4551
query88	3472	2221	2193	2193
query89	405	292	366	292
query90	2072	189	189	189
query91	135	102	103	102
query92	68	48	49	48
query93	1070	532	547	532
query94	827	279	279	279
query95	348	252	253	252
query96	617	285	288	285
query97	2839	2683	2615	2615
query98	211	195	197	195
query99	1625	1317	1308	1308
Total cold run time: 299054 ms
Total hot run time: 191841 ms

@doris-robot

ClickBench: Total hot run time: 32.66 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 1f9fec992ad309a87c1ff7d7c25fd751c3d75e2c, data reload: false

query1	0.03	0.03	0.03
query2	0.08	0.04	0.03
query3	0.23	0.07	0.06
query4	1.61	0.10	0.11
query5	0.42	0.42	0.42
query6	1.16	0.67	0.65
query7	0.02	0.02	0.02
query8	0.05	0.04	0.03
query9	0.58	0.52	0.50
query10	0.55	0.55	0.58
query11	0.15	0.10	0.11
query12	0.14	0.11	0.10
query13	0.61	0.60	0.61
query14	2.80	2.82	2.82
query15	0.91	0.83	0.83
query16	0.38	0.39	0.37
query17	1.06	1.04	0.97
query18	0.22	0.22	0.22
query19	1.93	1.78	2.04
query20	0.02	0.01	0.01
query21	15.38	0.62	0.60
query22	2.60	2.81	1.62
query23	17.00	0.86	0.83
query24	2.74	1.27	1.68
query25	0.25	0.10	0.07
query26	0.49	0.15	0.14
query27	0.04	0.04	0.04
query28	10.17	1.10	1.07
query29	12.86	3.21	3.22
query30	0.25	0.07	0.08
query31	2.84	0.38	0.37
query32	3.29	0.47	0.47
query33	2.99	3.04	3.06
query34	16.88	4.46	4.44
query35	4.52	4.45	4.49
query36	0.68	0.49	0.50
query37	0.09	0.05	0.06
query38	0.05	0.03	0.04
query39	0.03	0.03	0.03
query40	0.17	0.12	0.12
query41	0.08	0.03	0.02
query42	0.03	0.02	0.03
query43	0.03	0.02	0.03
Total cold run time: 106.41 s
Total hot run time: 32.66 s

dataroaring
dataroaring previously approved these changes Dec 5, 2024
Contributor

@dataroaring dataroaring left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 5, 2024
Contributor

github-actions bot commented Dec 5, 2024

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Dec 5, 2024

PR approved by anyone and no changes requested.

@yiguolei yiguolei added the p0_c label Dec 5, 2024
@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Dec 6, 2024
@kaijchen
Contributor Author

kaijchen commented Dec 6, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39913 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a00628aa7b0fe2893370381529d8c718883b9682, data reload: false

------ Round 1 ----------------------------------
q1	17586	7406	7214	7214
q2	2050	174	177	174
q3	10602	1064	1143	1064
q4	10582	790	721	721
q5	7609	2692	2661	2661
q6	240	153	154	153
q7	998	628	603	603
q8	9249	1893	1930	1893
q9	6666	6653	6508	6508
q10	7047	2285	2360	2285
q11	466	260	265	260
q12	435	224	224	224
q13	17774	3020	3088	3020
q14	234	210	224	210
q15	563	521	522	521
q16	663	598	606	598
q17	983	544	564	544
q18	7270	6636	6729	6636
q19	1349	1086	949	949
q20	469	180	182	180
q21	4071	3325	3183	3183
q22	376	312	319	312
Total cold run time: 107282 ms
Total hot run time: 39913 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7230	7227	7281	7227
q2	333	222	225	222
q3	2884	2811	2905	2811
q4	2076	1782	1859	1782
q5	5708	5689	5665	5665
q6	223	146	140	140
q7	2262	1805	1815	1805
q8	3441	3515	3456	3456
q9	9007	9045	8974	8974
q10	3626	3553	3573	3553
q11	584	512	491	491
q12	802	584	600	584
q13	12461	3217	3237	3217
q14	304	291	271	271
q15	565	534	507	507
q16	711	644	642	642
q17	1836	1615	1598	1598
q18	8304	7871	7627	7627
q19	1693	1610	1500	1500
q20	2135	1908	1858	1858
q21	5610	5525	5377	5377
q22	663	557	576	557
Total cold run time: 72458 ms
Total hot run time: 59864 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.48% (10006/26003)
Line Coverage: 29.52% (83915/284304)
Region Coverage: 28.62% (43125/150694)
Branch Coverage: 25.21% (21920/86954)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a00628aa7b0fe2893370381529d8c718883b9682_a00628aa7b0fe2893370381529d8c718883b9682/report/index.html

Contributor

@yiguolei yiguolei left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Dec 6, 2024
Contributor

github-actions bot commented Dec 6, 2024

PR approved by at least one committer and no changes requested.

@doris-robot

TPC-DS: Total hot run time: 197218 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a00628aa7b0fe2893370381529d8c718883b9682, data reload: false

query1	1475	980	973	973
query2	6264	2078	2015	2015
query3	10989	4502	4467	4467
query4	33363	23651	23582	23582
query5	3517	466	476	466
query6	289	183	173	173
query7	3992	292	301	292
query8	290	226	220	220
query9	9613	2707	2711	2707
query10	452	243	246	243
query11	18013	15332	15332	15332
query12	153	102	99	99
query13	1541	394	390	390
query14	9825	7228	7415	7228
query15	287	200	193	193
query16	8084	505	514	505
query17	1701	605	593	593
query18	2250	320	294	294
query19	377	152	157	152
query20	119	112	108	108
query21	201	106	103	103
query22	4810	4475	4389	4389
query23	35317	34359	34567	34359
query24	10706	2698	2570	2570
query25	635	400	400	400
query26	1658	162	155	155
query27	2634	290	287	287
query28	7594	2484	2476	2476
query29	970	408	398	398
query30	248	150	159	150
query31	1063	854	846	846
query32	96	53	53	53
query33	768	296	270	270
query34	1077	491	534	491
query35	913	758	760	758
query36	1115	941	963	941
query37	127	72	78	72
query38	4533	4332	4426	4332
query39	1482	1449	1471	1449
query40	262	100	97	97
query41	45	43	45	43
query42	113	100	95	95
query43	547	492	487	487
query44	1230	829	825	825
query45	185	172	169	169
query46	1195	723	720	720
query47	2035	1989	1925	1925
query48	449	310	316	310
query49	1164	413	386	386
query50	839	428	403	403
query51	7302	7135	7296	7135
query52	102	89	89	89
query53	259	180	182	180
query54	1364	415	424	415
query55	80	82	77	77
query56	253	227	238	227
query57	1330	1185	1166	1166
query58	231	209	223	209
query59	3205	2974	3012	2974
query60	277	248	260	248
query61	137	150	139	139
query62	854	682	683	682
query63	218	203	187	187
query64	4984	670	648	648
query65	3304	3215	3201	3201
query66	1396	304	294	294
query67	16115	15808	15575	15575
query68	4936	546	554	546
query69	423	243	253	243
query70	1177	1149	1185	1149
query71	362	242	242	242
query72	6223	4041	4177	4041
query73	783	366	371	366
query74	10494	9066	8939	8939
query75	3388	2667	2651	2651
query76	2647	1046	1013	1013
query77	427	271	324	271
query78	10463	9458	9328	9328
query79	1410	591	585	585
query80	876	428	446	428
query81	543	238	228	228
query82	565	124	125	124
query83	238	146	142	142
query84	242	75	69	69
query85	1356	294	287	287
query86	448	300	294	294
query87	4684	4617	4511	4511
query88	3806	2211	2182	2182
query89	429	300	292	292
query90	2066	187	185	185
query91	140	101	104	101
query92	65	48	49	48
query93	2310	549	538	538
query94	1099	279	299	279
query95	369	257	252	252
query96	627	275	279	275
query97	2914	2739	2659	2659
query98	219	198	188	188
query99	1526	1325	1338	1325
Total cold run time: 305271 ms
Total hot run time: 197218 ms

@doris-robot

ClickBench: Total hot run time: 32.91 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a00628aa7b0fe2893370381529d8c718883b9682, data reload: false

query1	0.03	0.03	0.05
query2	0.07	0.03	0.03
query3	0.23	0.07	0.08
query4	1.62	0.10	0.11
query5	0.41	0.41	0.42
query6	1.17	0.66	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.60	0.51	0.50
query10	0.56	0.56	0.58
query11	0.14	0.10	0.10
query12	0.14	0.11	0.12
query13	0.62	0.62	0.61
query14	2.72	2.73	2.84
query15	0.92	0.83	0.84
query16	0.39	0.38	0.38
query17	1.07	1.05	1.06
query18	0.23	0.21	0.21
query19	1.88	1.81	1.98
query20	0.01	0.01	0.01
query21	15.38	0.58	0.57
query22	2.26	2.51	2.05
query23	17.11	0.94	0.80
query24	2.64	2.02	0.95
query25	0.29	0.18	0.04
query26	0.41	0.14	0.13
query27	0.04	0.06	0.04
query28	10.46	1.11	1.07
query29	12.53	3.34	3.34
query30	0.25	0.07	0.07
query31	2.87	0.39	0.37
query32	3.27	0.47	0.47
query33	3.03	3.05	3.04
query34	17.21	4.44	4.48
query35	4.50	4.50	4.60
query36	0.66	0.50	0.49
query37	0.09	0.06	0.06
query38	0.04	0.03	0.04
query39	0.03	0.03	0.02
query40	0.16	0.12	0.12
query41	0.08	0.02	0.03
query42	0.03	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 106.25 s
Total hot run time: 32.91 s

@yiguolei yiguolei merged commit 4512cb0 into apache:master Dec 6, 2024
23 of 27 checks passed
github-actions bot pushed a commit that referenced this pull request Dec 6, 2024
### What problem does this PR solve?

Fix race condition in `RuntimeProfile::to_thrift()`.

```
#6  0x000055bce5a78bbf in std::__throw_length_error (__s=0x55bca1eb7880 <str> "vector::reserve") at ../../../../../libstdc++-v3/src/c++11/functexcept.cc:82
#7  0x000055bcafbbbc8f in std::vector<doris::TRuntimeProfileNode, std::allocator<doris::TRuntimeProfileNode> >::reserve (this=this@entry=0x7f2e69c39f48, __n=<optimized out>)
    at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/vector.tcc:70
#8  0x000055bcafbb6e34 in doris::RuntimeProfile::to_thrift (this=<optimized out>, nodes=0x7f2e69c39f48) at /root/doris/be/src/util/runtime_profile.cpp:577
#9  0x000055bcafbb7780 in doris::RuntimeProfile::to_thrift (this=<optimized out>, nodes=0x7f2e69c39f48) at /root/doris/be/src/util/runtime_profile.cpp:612
#10 0x000055bcafbb7780 in doris::RuntimeProfile::to_thrift (this=<optimized out>, nodes=0x7f2e69c39f48) at /root/doris/be/src/util/runtime_profile.cpp:612
#11 0x000055bcafbb7780 in doris::RuntimeProfile::to_thrift (this=<optimized out>, nodes=0x7f2e69c39f48) at /root/doris/be/src/util/runtime_profile.cpp:612
#12 0x000055bcaf32ee52 in doris::LoadChannel::_report_profile (this=this@entry=0x6150116fca80, response=response@entry=0x61201b768340)
```
github-actions bot pushed a commit that referenced this pull request Dec 6, 2024
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.8-merged dev/3.0.4-merged p0_c reviewed