[fix](memory) Disable Arrow Jemalloc #37528

Merged
merged 2 commits into from
Jul 9, 2024

Conversation

xinyiZzz
Contributor

@xinyiZzz xinyiZzz commented Jul 9, 2024

Proposed changes

Currently, Arrow uses its own bundled Jemalloc and calls the non-standard functions mallocx, sdallocx, and rallocx in memory_pool_jemalloc.cc to optimize memory allocation.

However, this may be incompatible with older versions of the Linux kernel. When we use Arrow on Arm Kylin v10 or CentOS 7.4, it gets stuck on a Jemalloc lock when calling arrow::RecordBatch::MakeEmpty, with the stack shown below. The kernel version of Arm Kylin v10 is 4.19.90, and the kernel version of CentOS 7.4 is 4.14.

#0  0x0000ffffae3ceff8 in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x0000ffffae3c9b50 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x0000ffffae61c834 in pthread_mutex_lock () from /lib64/libc.so.6
#3  0x0000aaaac99bc1e0 in je_arrow_private_je_malloc_mutex_lock_slow ()
#4  0x0000aaaac99af3a4 in ?? ()
#5  0x0000aaaac99b576c in je_arrow_mallocx ()
#6  0x0000aaaac99a8aec in ?? ()
#7  0x0000aaaac99a9858 in arrow::AllocateResizableBuffer(long, arrow::MemoryPool*) ()
#8  0x0000aaaac399f8b8 in arrow::BufferBuilder::Resize(long, bool) ()
#9  0x0000aaaac983715c in arrow::BaseBinaryBuilder<arrow::BinaryType>::Resize(long) ()
#10 0x0000aaaac39a47e0 in arrow::BaseBinaryBuilder<arrow::BinaryType>::Append(unsigned char const*, int) ()

After disabling the separate Jemalloc when compiling Arrow, the error above disappears, and Arrow uses the default memory allocator, which is Doris's Jemalloc.
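
For readers unfamiliar with how the bundled allocator is switched off: Arrow's C++ build exposes a CMake option named ARROW_JEMALLOC that controls whether its private Jemalloc is built. The sketch below only illustrates assembling such a configure command. The helper name `arrow_cmake_args` and the other flags are hypothetical, and the exact change made to the Doris third-party build script is not shown in this conversation.

```python
import shlex


def arrow_cmake_args(use_bundled_jemalloc):
    """Build an illustrative CMake command line for configuring Arrow.

    Only ARROW_JEMALLOC is the option discussed in this PR; the build type
    and the source directory ("..") are placeholders, not the real flags
    used by the Doris third-party build.
    """
    flag = "ON" if use_bundled_jemalloc else "OFF"
    return [
        "cmake",
        "-DCMAKE_BUILD_TYPE=Release",   # placeholder build type
        f"-DARROW_JEMALLOC={flag}",     # OFF: do not build Arrow's private Jemalloc
        "..",                           # placeholder path to the Arrow C++ sources
    ]


args = arrow_cmake_args(use_bundled_jemalloc=False)
print(shlex.join(args))
```

With the option off, Arrow's memory pool no longer goes through the `je_arrow_*` symbols seen in the stack above, so allocations fall back to whatever allocator the process links, which the description says is Doris's Jemalloc.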

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@xinyiZzz
Contributor Author

xinyiZzz commented Jul 9, 2024

run buildall

@xinyiZzz xinyiZzz force-pushed the 20240707_fix_arrow_compile branch from 4417a23 to fc2fc02 Compare July 9, 2024 07:42
@xinyiZzz
Contributor Author

xinyiZzz commented Jul 9, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 39834 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit fc2fc027aaf942a5f4f2b681375296ab67275fdd, data reload: false

------ Round 1 ----------------------------------
q1	18059	4462	4388	4388
q2	3043	201	194	194
q3	12408	1161	1132	1132
q4	10644	749	805	749
q5	7849	2704	2622	2622
q6	227	141	141	141
q7	959	602	603	602
q8	9249	2084	2064	2064
q9	8998	6502	6459	6459
q10	8951	3704	3777	3704
q11	468	242	249	242
q12	414	232	236	232
q13	17774	2974	2985	2974
q14	285	224	223	223
q15	529	475	494	475
q16	506	383	374	374
q17	969	701	667	667
q18	8057	7507	7397	7397
q19	7454	1576	1365	1365
q20	703	342	341	341
q21	4928	3142	4049	3142
q22	412	352	347	347
Total cold run time: 122886 ms
Total hot run time: 39834 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4389	4265	4269	4265
q2	376	263	270	263
q3	3005	2783	2811	2783
q4	1845	1587	1639	1587
q5	5236	5262	5285	5262
q6	226	132	134	132
q7	2127	1782	1724	1724
q8	3212	3364	3322	3322
q9	8381	8354	8396	8354
q10	3848	3664	3653	3653
q11	596	491	469	469
q12	783	620	604	604
q13	16453	2992	3028	2992
q14	293	265	252	252
q15	521	475	495	475
q16	473	415	416	415
q17	1798	1507	1473	1473
q18	7752	7584	7438	7438
q19	1715	1512	1451	1451
q20	1969	1811	1769	1769
q21	5001	4768	4821	4768
q22	660	556	573	556
Total cold run time: 70659 ms
Total hot run time: 54007 ms
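
The column layout of these tables is not labeled. Judging from the totals, each row appears to be: query, cold run, two hot runs, and the faster of the two hot runs, with "Total hot run time" being the sum of the last column. The sketch below checks that reading against the Round 1 rows of the first TPC-H report; the column interpretation is an inference from the numbers, not something the bot states.

```python
# Round 1 rows copied from the TPC-H report on commit fc2fc02 (times in ms).
ROUND_1 = """\
q1 18059 4462 4388 4388
q2 3043 201 194 194
q3 12408 1161 1132 1132
q4 10644 749 805 749
q5 7849 2704 2622 2622
q6 227 141 141 141
q7 959 602 603 602
q8 9249 2084 2064 2064
q9 8998 6502 6459 6459
q10 8951 3704 3777 3704
q11 468 242 249 242
q12 414 232 236 232
q13 17774 2974 2985 2974
q14 285 224 223 223
q15 529 475 494 475
q16 506 383 374 374
q17 969 701 667 667
q18 8057 7507 7397 7397
q19 7454 1576 1365 1365
q20 703 342 341 341
q21 4928 3142 4049 3142
q22 412 352 347 347
"""


def summarize(table):
    """Return (cold_total, hot_total) under the assumed column layout."""
    cold_total = 0
    hot_total = 0
    for line in table.strip().splitlines():
        name, cold, hot1, hot2, best = line.split()
        cold, hot1, hot2, best = int(cold), int(hot1), int(hot2), int(best)
        # Assumed: the last column is the faster of the two hot runs.
        if best != min(hot1, hot2):
            raise ValueError(f"{name}: last column is not min(hot1, hot2)")
        cold_total += cold
        hot_total += best
    return cold_total, hot_total


cold_total, hot_total = summarize(ROUND_1)
print(cold_total, hot_total)  # 122886 39834, matching the reported totals
```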

@doris-robot

TPC-DS: Total hot run time: 169964 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit fc2fc027aaf942a5f4f2b681375296ab67275fdd, data reload: false

query1	908	368	369	368
query2	6904	2310	2378	2310
query3	6656	216	229	216
query4	24843	17556	17358	17358
query5	4244	497	494	494
query6	269	195	177	177
query7	4596	298	307	298
query8	325	308	303	303
query9	8675	2465	2480	2465
query10	449	295	299	295
query11	10866	10009	10011	10009
query12	137	89	81	81
query13	1637	366	372	366
query14	10352	6998	8416	6998
query15	235	192	181	181
query16	7201	316	314	314
query17	1809	550	532	532
query18	1186	298	271	271
query19	195	150	146	146
query20	90	81	81	81
query21	208	133	134	133
query22	4306	3981	3954	3954
query23	33534	32903	33040	32903
query24	11243	2863	2807	2807
query25	605	364	369	364
query26	1369	150	150	150
query27	2612	274	279	274
query28	7198	2095	2084	2084
query29	922	653	632	632
query30	287	152	150	150
query31	966	740	744	740
query32	98	55	55	55
query33	795	318	319	318
query34	887	483	493	483
query35	726	567	572	567
query36	1110	958	971	958
query37	138	79	80	79
query38	2881	2726	2716	2716
query39	847	782	811	782
query40	283	120	117	117
query41	52	52	51	51
query42	117	96	103	96
query43	563	563	570	563
query44	1190	735	726	726
query45	193	161	160	160
query46	1102	726	705	705
query47	1824	1747	1752	1747
query48	370	300	294	294
query49	1100	415	411	411
query50	774	404	406	404
query51	6873	6751	6878	6751
query52	100	100	99	99
query53	357	299	291	291
query54	963	491	448	448
query55	75	73	75	73
query56	290	262	268	262
query57	1138	1026	1060	1026
query58	247	239	259	239
query59	3204	3379	3200	3200
query60	309	275	272	272
query61	93	94	91	91
query62	839	658	679	658
query63	334	289	288	288
query64	10449	2176	1630	1630
query65	3175	3098	3118	3098
query66	1302	338	330	330
query67	15430	14938	14942	14938
query68	4614	556	552	552
query69	480	349	329	329
query70	1209	1120	1106	1106
query71	387	285	282	282
query72	7287	2767	2623	2623
query73	749	322	332	322
query74	5956	5595	5475	5475
query75	3472	2687	2686	2686
query76	2837	903	922	903
query77	467	309	311	309
query78	9446	8928	8783	8783
query79	2115	514	512	512
query80	1487	471	463	463
query81	566	221	228	221
query82	778	144	137	137
query83	202	167	166	166
query84	280	87	91	87
query85	1443	310	299	299
query86	454	334	329	329
query87	3273	3081	3117	3081
query88	3830	2449	2463	2449
query89	484	398	385	385
query90	1858	195	196	195
query91	132	103	102	102
query92	69	48	52	48
query93	2318	495	502	495
query94	1227	218	215	215
query95	500	326	318	318
query96	585	271	269	269
query97	3186	3044	2997	2997
query98	221	210	198	198
query99	1588	1300	1248	1248
Total cold run time: 278555 ms
Total hot run time: 169964 ms

@doris-robot

ClickBench: Total hot run time: 30.97 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit fc2fc027aaf942a5f4f2b681375296ab67275fdd, data reload: false

query1	0.04	0.03	0.03
query2	0.07	0.04	0.04
query3	0.22	0.05	0.05
query4	1.67	0.07	0.07
query5	0.50	0.49	0.47
query6	1.14	0.72	0.72
query7	0.02	0.01	0.01
query8	0.05	0.05	0.04
query9	0.54	0.50	0.48
query10	0.54	0.54	0.55
query11	0.16	0.11	0.11
query12	0.15	0.13	0.12
query13	0.59	0.58	0.58
query14	0.78	0.79	0.78
query15	0.86	0.81	0.83
query16	0.36	0.36	0.37
query17	1.03	0.99	1.01
query18	0.24	0.23	0.22
query19	1.85	1.76	1.75
query20	0.01	0.01	0.01
query21	15.39	0.74	0.65
query22	4.69	6.58	2.22
query23	18.31	1.38	1.30
query24	2.14	0.22	0.21
query25	0.15	0.08	0.08
query26	0.30	0.22	0.22
query27	0.46	0.23	0.23
query28	13.33	1.01	1.00
query29	12.61	3.36	3.33
query30	0.26	0.06	0.05
query31	2.87	0.40	0.38
query32	3.25	0.49	0.48
query33	2.85	2.91	2.91
query34	17.14	4.35	4.37
query35	4.42	4.37	4.40
query36	0.65	0.45	0.49
query37	0.18	0.15	0.16
query38	0.15	0.15	0.14
query39	0.04	0.03	0.04
query40	0.15	0.12	0.12
query41	0.09	0.04	0.04
query42	0.05	0.04	0.04
query43	0.05	0.04	0.04
Total cold run time: 110.35 s
Total hot run time: 30.97 s

@xinyiZzz
Contributor Author

xinyiZzz commented Jul 9, 2024

run buildall

@morningman
Contributor

run buildall

@doris-robot

TPC-H: Total hot run time: 40009 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 569cde15e08897015e31cf3e2c532ed6683e6e9e, data reload: false

------ Round 1 ----------------------------------
q1	18164	4388	4376	4376
q2	2460	200	194	194
q3	11029	1172	1170	1170
q4	10696	793	736	736
q5	7800	2672	2706	2672
q6	227	140	144	140
q7	962	620	635	620
q8	9416	2098	2105	2098
q9	8847	6539	6459	6459
q10	8782	3690	3736	3690
q11	476	243	248	243
q12	401	237	237	237
q13	17767	2982	2993	2982
q14	269	230	237	230
q15	527	484	499	484
q16	510	379	373	373
q17	979	720	716	716
q18	8143	7507	7361	7361
q19	3956	1528	1314	1314
q20	689	321	342	321
q21	4950	3249	3244	3244
q22	409	349	360	349
Total cold run time: 117459 ms
Total hot run time: 40009 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4586	4290	4212	4212
q2	370	253	272	253
q3	3002	2768	2728	2728
q4	1869	1605	1592	1592
q5	5274	5290	5255	5255
q6	225	132	135	132
q7	2123	1720	1688	1688
q8	3182	3368	3312	3312
q9	8358	8425	8348	8348
q10	3848	3634	3634	3634
q11	581	495	477	477
q12	795	596	630	596
q13	16318	2971	3031	2971
q14	291	269	269	269
q15	519	480	470	470
q16	479	418	420	418
q17	1800	1473	1455	1455
q18	7745	7480	7377	7377
q19	1733	1647	1653	1647
q20	1985	1797	1791	1791
q21	4828	4801	4670	4670
q22	607	547	568	547
Total cold run time: 70518 ms
Total hot run time: 53842 ms

@doris-robot

TPC-DS: Total hot run time: 172741 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 569cde15e08897015e31cf3e2c532ed6683e6e9e, data reload: false

query1	924	377	357	357
query2	6451	2577	2375	2375
query3	6654	213	226	213
query4	28656	17476	17263	17263
query5	4187	474	484	474
query6	271	191	167	167
query7	4600	284	288	284
query8	338	306	300	300
query9	8426	2402	2378	2378
query10	444	284	277	277
query11	12951	9955	10046	9955
query12	135	84	82	82
query13	1642	375	376	375
query14	10178	7566	7707	7566
query15	237	184	190	184
query16	7928	322	298	298
query17	2035	539	526	526
query18	1965	279	275	275
query19	205	149	153	149
query20	91	83	83	83
query21	206	128	124	124
query22	4357	4085	4022	4022
query23	33712	32976	33237	32976
query24	12069	2835	2753	2753
query25	654	370	370	370
query26	1784	150	150	150
query27	2944	270	276	270
query28	7779	2061	2052	2052
query29	1127	622	606	606
query30	288	146	148	146
query31	1003	761	738	738
query32	93	54	54	54
query33	767	321	322	321
query34	919	484	478	478
query35	715	581	555	555
query36	1100	931	915	915
query37	297	82	78	78
query38	2863	2742	2726	2726
query39	869	797	803	797
query40	277	121	119	119
query41	56	53	50	50
query42	123	96	102	96
query43	621	537	536	536
query44	1177	749	738	738
query45	188	160	162	160
query46	1070	713	737	713
query47	1875	1753	1772	1753
query48	387	300	302	300
query49	1193	425	413	413
query50	775	405	391	391
query51	6709	6736	6717	6717
query52	103	89	103	89
query53	365	297	292	292
query54	1006	459	456	456
query55	76	73	72	72
query56	293	271	266	266
query57	1198	1048	1027	1027
query58	269	251	251	251
query59	3468	3397	3195	3195
query60	297	291	280	280
query61	97	96	98	96
query62	810	636	666	636
query63	327	292	296	292
query64	10461	2196	1613	1613
query65	3178	3145	3096	3096
query66	1363	337	337	337
query67	15272	15039	14984	14984
query68	4466	540	545	540
query69	469	327	335	327
query70	1060	1082	1100	1082
query71	386	292	282	282
query72	7149	5430	5055	5055
query73	743	325	327	325
query74	5946	5449	5463	5449
query75	3420	2682	2688	2682
query76	2704	936	908	908
query77	460	306	314	306
query78	9460	8953	8864	8864
query79	2672	507	510	507
query80	1922	489	480	480
query81	586	216	221	216
query82	800	140	136	136
query83	287	168	177	168
query84	267	96	86	86
query85	2170	316	312	312
query86	486	309	311	309
query87	3266	3117	3063	3063
query88	4328	2440	2445	2440
query89	498	402	376	376
query90	1857	198	187	187
query91	132	104	104	104
query92	58	47	48	47
query93	2895	508	509	508
query94	1208	210	210	210
query95	401	314	316	314
query96	597	279	268	268
query97	3149	3056	3102	3056
query98	234	200	192	192
query99	1638	1262	1268	1262
Total cold run time: 290612 ms
Total hot run time: 172741 ms

@doris-robot

ClickBench: Total hot run time: 31.12 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 569cde15e08897015e31cf3e2c532ed6683e6e9e, data reload: false

query1	0.04	0.04	0.03
query2	0.07	0.04	0.04
query3	0.22	0.06	0.05
query4	1.67	0.10	0.10
query5	0.50	0.49	0.49
query6	1.14	0.73	0.73
query7	0.02	0.01	0.02
query8	0.04	0.04	0.04
query9	0.54	0.48	0.48
query10	0.54	0.53	0.54
query11	0.16	0.12	0.12
query12	0.15	0.13	0.13
query13	0.60	0.59	0.59
query14	0.76	0.78	0.78
query15	0.86	0.81	0.83
query16	0.35	0.37	0.35
query17	1.01	1.01	1.03
query18	0.22	0.22	0.23
query19	1.73	1.76	1.65
query20	0.01	0.01	0.01
query21	15.43	0.76	0.67
query22	4.14	6.14	2.37
query23	18.30	1.41	1.27
query24	2.18	0.22	0.23
query25	0.16	0.09	0.09
query26	0.30	0.21	0.20
query27	0.46	0.23	0.22
query28	13.22	1.03	1.01
query29	12.65	3.33	3.31
query30	0.25	0.06	0.06
query31	2.86	0.38	0.38
query32	3.28	0.48	0.47
query33	2.93	2.88	2.92
query34	16.87	4.38	4.35
query35	4.40	4.42	4.41
query36	0.65	0.48	0.47
query37	0.18	0.15	0.17
query38	0.16	0.14	0.15
query39	0.05	0.03	0.04
query40	0.15	0.12	0.12
query41	0.09	0.05	0.05
query42	0.06	0.05	0.05
query43	0.04	0.04	0.03
Total cold run time: 109.44 s
Total hot run time: 31.12 s
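
To put the two benchmark rounds side by side, the sketch below computes the relative change in total hot run time between the reports on commit fc2fc02 and commit 569cde1. Both commits belong to this PR, so this is only a rough look at run-to-run variation, not a comparison against a baseline with Arrow's Jemalloc enabled. The numbers are the totals reported by the bot; the dictionary layout and function name are illustrative.

```python
# Total hot run times reported for the two benchmarked commits
# (first value: fc2fc02, second value: 569cde1).
RUNS = {
    "TPC-H (ms)": (39834, 40009),
    "TPC-DS (ms)": (169964, 172741),
    "ClickBench (s)": (30.97, 31.12),
}


def pct_change(first, second):
    """Relative change from first to second, in percent."""
    return (second - first) / first * 100.0


changes = {name: round(pct_change(a, b), 2) for name, (a, b) in RUNS.items()}

for name, pct in changes.items():
    print(f"{name}: {pct:+.2f}%")
```

All three differences come out below 2 percent.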

Contributor

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jul 9, 2024
Contributor

github-actions bot commented Jul 9, 2024

PR approved by at least one committer and no changes requested.

Contributor

github-actions bot commented Jul 9, 2024

PR approved by anyone and no changes requested.

Contributor

@luzhijing luzhijing left a comment

LGTM

@morningman morningman merged commit e142d02 into apache:master Jul 9, 2024
26 of 29 checks passed
dataroaring pushed a commit that referenced this pull request Jul 9, 2024
hello-stephen pushed a commit that referenced this pull request Jul 10, 2024
## Proposed changes

after #37528 update third-party library
morningman pushed a commit to morningman/doris that referenced this pull request Jul 15, 2024
zhiqiang-hhhh added a commit to zhiqiang-hhhh/doris that referenced this pull request Aug 10, 2024
zhiqiang-hhhh added a commit to zhiqiang-hhhh/doris that referenced this pull request Aug 18, 2024
Gabriel39 pushed a commit to Gabriel39/incubator-doris that referenced this pull request Sep 14, 2024
Gabriel39 added a commit to Gabriel39/incubator-doris that referenced this pull request Sep 14, 2024
* [PipelineX](improvement) Prepare tasks in parallel (apache#40270)

Issue Number: close #xxx

<!--Describe your changes.-->

* [Improvement](agg) Improve count distinct distribute keys

* 2-phase agg for count distinct int and 4-phase for string

* Disable streaming agg for count distinct distribute key

* [branch-2.1](memory) Disable Arrow Jemalloc (apache#37529)

pick apache#37528

* [branch-2.1](memory) Disable Arrow Jemalloc step 2 (apache#37556)

pick apache#37533

* [api](cache) Add HTTP API to clear data cache

* [Improvement](pipeline) Do parallel preparation for multiple fragments

* fix compiling after cherry-pick

---------

Co-authored-by: Xinyi Zou <[email protected]>
Labels
approved Indicates a PR has been approved by one committer. dev/2.1.5-merged dev/3.0.0-merged reviewed
5 participants