[opt](hms table)Some optimizations for hms external table #44909

Open: wuwenchi wants to merge 4 commits into master from opt-hudi-scan

wuwenchi
Contributor

wuwenchi commented Dec 3, 2024

What problem does this PR solve?

Problem Summary:

  1. Add a schema cache to reduce the time spent fetching the table schema.
  2. Store HoodieTableMetaClient in HMSExternalTable so that it is not created repeatedly.
  3. Cache HoodieTableFileSystemView to speed up retrieving FileGroup and FileSlice objects.
  4. Fix path analysis for paths of the form file:/abc.
  5. Add FSDataInputStreamWrapper to resolve a class conflict with Hudi.
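Items 1 to 3 share one idea: build an expensive per-table object once and reuse it, instead of recreating it for every query. A minimal sketch of that pattern in plain Java (standard library only), assuming nothing about Doris or Hudi internals: `TableMetaHolder`, `ExpensiveMetaClient` and `ViewCache` are hypothetical stand-ins for HMSExternalTable, HoodieTableMetaClient and a cache of HoodieTableFileSystemView instances. They do not reproduce the real APIs, and the PR may key, bound or expire its caches differently.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class CachingSketch {
    /** Hypothetical stand-in for an expensive per-table client (e.g. a table meta client). */
    static final class ExpensiveMetaClient {
        static final AtomicInteger CREATED = new AtomicInteger();
        final String basePath;

        ExpensiveMetaClient(String basePath) {
            this.basePath = basePath;
            CREATED.incrementAndGet(); // counts constructions so reuse is observable
        }
    }

    /** Hypothetical table object that owns its client and creates it at most once. */
    static final class TableMetaHolder {
        private final String basePath;
        private volatile ExpensiveMetaClient client;

        TableMetaHolder(String basePath) {
            this.basePath = basePath;
        }

        ExpensiveMetaClient getClient() {
            ExpensiveMetaClient c = client;
            if (c == null) {
                synchronized (this) { // double-checked locking on a volatile field
                    c = client;
                    if (c == null) {
                        c = new ExpensiveMetaClient(basePath);
                        client = c;
                    }
                }
            }
            return c;
        }
    }

    /** Hypothetical keyed cache: one view per (table, snapshot instant), built on first use. */
    static final class ViewCache<V> {
        private final Map<String, V> views = new ConcurrentHashMap<>();
        private final Function<String, V> loader;

        ViewCache(Function<String, V> loader) {
            this.loader = loader;
        }

        V get(String table, String instant) {
            return views.computeIfAbsent(table + "@" + instant, loader);
        }

        /** Drops every cached view of one table, e.g. after its metadata changes. */
        void invalidate(String table) {
            String prefix = table + "@";
            views.keySet().removeIf(k -> k.startsWith(prefix));
        }

        int size() {
            return views.size();
        }
    }

    public static void main(String[] args) {
        TableMetaHolder table = new TableMetaHolder("hdfs://nn/warehouse/t1");
        for (int i = 0; i < 3; i++) {
            table.getClient(); // three planning calls share one client
        }
        System.out.println("clients created: " + ExpensiveMetaClient.CREATED.get());

        AtomicInteger loads = new AtomicInteger();
        ViewCache<String> cache = new ViewCache<>(key -> {
            loads.incrementAndGet();
            return "view:" + key;
        });
        cache.get("t1", "20241203");
        cache.get("t1", "20241203"); // hit: the loader is not called again
        cache.get("t1", "20241204"); // new instant: a new view is built
        System.out.println("views loaded: " + loads.get());
    }
}
```

Double-checked locking keeps the common path lock-free once the client exists, and ConcurrentHashMap.computeIfAbsent applies the loader at most once per key even when several planning threads ask for the same view. A production cache would also need a size bound and expiry, which this sketch leaves out.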

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor or code-format change; no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what the possible impacts are.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and how they differ before and after the optimization.

@wuwenchi
Contributor Author

wuwenchi commented Dec 3, 2024

run buildall

wuwenchi marked this pull request as draft on December 3, 2024 07:39
@doris-robot

TPC-H: Total hot run time: 40334 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit f5120edd70db2f580c199ffa66abcfe9eed14a56, data reload: false

------ Round 1 ----------------------------------
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	17600	7578	7388	7388
q2	2048	173	166	166
q3	10578	1066	1161	1066
q4	10505	764	726	726
q5	7610	2744	2733	2733
q6	238	146	148	146
q7	1018	631	599	599
q8	9218	1848	1948	1848
q9	6665	6556	6533	6533
q10	7001	2302	2281	2281
q11	467	265	283	265
q12	419	225	223	223
q13	17767	3030	3044	3030
q14	253	215	212	212
q15	585	540	527	527
q16	668	585	595	585
q17	979	599	714	599
q18	7287	6735	6693	6693
q19	1331	1052	1032	1032
q20	447	177	179	177
q21	4066	3196	3268	3196
q22	372	323	309	309
Total cold run time: 107122 ms
Total hot run time: 40334 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	7290	7273	7269	7269
q2	332	228	237	228
q3	2936	2790	2968	2790
q4	2108	1832	1854	1832
q5	5751	5694	5669	5669
q6	229	139	138	138
q7	2259	1800	1855	1800
q8	3450	3539	3487	3487
q9	8980	8998	9100	8998
q10	3636	3592	3570	3570
q11	599	517	496	496
q12	822	599	589	589
q13	10554	3258	3201	3201
q14	297	275	270	270
q15	579	532	530	530
q16	681	641	634	634
q17	1850	1641	1597	1597
q18	8308	7843	7569	7569
q19	1672	1620	1579	1579
q20	2123	1874	1911	1874
q21	5681	5454	5540	5454
q22	649	582	580	580
Total cold run time: 70786 ms
Total hot run time: 60154 ms
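The bot comments do not label their columns. Reading each Round 1 TPC-H row as {cold run, hot run 1, hot run 2, best hot run}, both printed totals follow directly: Total cold run time is the sum of the cold column, and Total hot run time is the sum of each query's faster hot run. A small Java check of that reading; the column interpretation is inferred from the numbers rather than documented by the tpch-tools scripts, and `TpchTotals` is a hypothetical name.

```java
public class TpchTotals {
    // Round 1 rows from the TPC-H comment above: {cold, hot run 1, hot run 2} in ms, q1..q22.
    static final int[][] ROUND1 = {
        {17600, 7578, 7388}, {2048, 173, 166}, {10578, 1066, 1161}, {10505, 764, 726},   // q1-q4
        {7610, 2744, 2733}, {238, 146, 148}, {1018, 631, 599}, {9218, 1848, 1948},       // q5-q8
        {6665, 6556, 6533}, {7001, 2302, 2281}, {467, 265, 283}, {419, 225, 223},        // q9-q12
        {17767, 3030, 3044}, {253, 215, 212}, {585, 540, 527}, {668, 585, 595},          // q13-q16
        {979, 599, 714}, {7287, 6735, 6693}, {1331, 1052, 1032}, {447, 177, 179},        // q17-q20
        {4066, 3196, 3268}, {372, 323, 309}                                              // q21-q22
    };

    /** Returns {sum of cold runs, sum of the best (minimum) hot run per query}. */
    static long[] totals(int[][] rows) {
        long cold = 0;
        long hot = 0;
        for (int[] r : rows) {
            cold += r[0];
            hot += Math.min(r[1], r[2]);
        }
        return new long[] {cold, hot};
    }

    public static void main(String[] args) {
        long[] t = totals(ROUND1);
        System.out.println("Total cold run time: " + t[0] + " ms"); // 107122
        System.out.println("Total hot run time: " + t[1] + " ms");  // 40334
    }
}
```

The same reading reproduces the ClickBench totals as well: there the first column is the cold run, and the hot total is the sum of the faster of the two remaining runs.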

@doris-robot

TPC-DS: Total hot run time: 198338 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit f5120edd70db2f580c199ffa66abcfe9eed14a56, data reload: false

query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
query1	1263	965	924	924
query2	6230	2083	2064	2064
query3	10999	4448	4609	4448
query4	67375	28402	23596	23596
query5	4934	462	453	453
query6	406	191	180	180
query7	5530	307	296	296
query8	322	239	249	239
query9	8527	2707	2706	2706
query10	442	264	265	264
query11	17248	15382	15890	15382
query12	157	111	108	108
query13	1460	443	446	443
query14	10130	7920	7626	7626
query15	219	188	191	188
query16	7163	504	503	503
query17	1092	641	550	550
query18	1816	303	303	303
query19	211	152	144	144
query20	114	111	109	109
query21	236	105	104	104
query22	4803	4547	4692	4547
query23	35598	34805	34751	34751
query24	5547	2582	2525	2525
query25	518	383	395	383
query26	672	155	149	149
query27	2046	281	283	281
query28	4554	2481	2465	2465
query29	688	428	422	422
query30	225	152	148	148
query31	1026	831	817	817
query32	68	62	62	62
query33	474	284	310	284
query34	945	538	541	538
query35	906	794	753	753
query36	1089	961	951	951
query37	118	80	71	71
query38	4664	4443	4482	4443
query39	1543	1490	1460	1460
query40	208	103	107	103
query41	45	47	43	43
query42	114	107	100	100
query43	554	501	506	501
query44	1201	827	832	827
query45	201	178	166	166
query46	1182	727	732	727
query47	2057	1963	1936	1936
query48	424	322	316	316
query49	720	397	417	397
query50	865	391	384	384
query51	7450	7251	6975	6975
query52	99	86	88	86
query53	252	180	187	180
query54	519	387	419	387
query55	74	76	78	76
query56	262	223	243	223
query57	1258	1105	1115	1105
query58	205	204	206	204
query59	3231	3010	2917	2917
query60	282	246	245	245
query61	114	107	101	101
query62	793	665	641	641
query63	214	185	192	185
query64	1710	644	640	640
query65	3328	3192	3187	3187
query66	718	301	326	301
query67	16147	15748	15539	15539
query68	3955	580	564	564
query69	433	262	251	251
query70	1196	1158	1148	1148
query71	356	247	254	247
query72	6198	4036	4085	4036
query73	804	362	365	362
query74	10316	9067	9069	9067
query75	3382	2637	2664	2637
query76	1900	1087	1052	1052
query77	494	274	268	268
query78	10411	9502	9464	9464
query79	1134	597	590	590
query80	844	530	446	446
query81	495	238	230	230
query82	214	122	117	117
query83	168	148	148	148
query84	282	74	70	70
query85	939	290	302	290
query86	367	291	296	291
query87	4706	4597	4679	4597
query88	3956	2211	2168	2168
query89	418	289	292	289
query90	2111	186	193	186
query91	138	101	104	101
query92	60	54	51	51
query93	1772	538	545	538
query94	872	298	263	263
query95	343	257	244	244
query96	623	278	280	278
query97	2905	2660	2720	2660
query98	221	192	200	192
query99	1593	1346	1321	1321
Total cold run time: 319191 ms
Total hot run time: 198338 ms

@doris-robot

ClickBench: Total hot run time: 32.34 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit f5120edd70db2f580c199ffa66abcfe9eed14a56, data reload: false

query	cold_s	hot1_s	hot2_s
query1	0.03	0.03	0.04
query2	0.06	0.03	0.03
query3	0.23	0.08	0.06
query4	1.63	0.11	0.11
query5	0.43	0.40	0.41
query6	1.17	0.66	0.65
query7	0.02	0.01	0.02
query8	0.04	0.03	0.03
query9	0.59	0.51	0.50
query10	0.55	0.57	0.55
query11	0.13	0.10	0.11
query12	0.14	0.11	0.11
query13	0.61	0.61	0.60
query14	2.82	2.73	2.85
query15	0.90	0.83	0.82
query16	0.38	0.37	0.37
query17	1.05	1.06	1.00
query18	0.23	0.21	0.21
query19	1.95	1.70	1.95
query20	0.02	0.00	0.02
query21	15.35	0.60	0.62
query22	2.72	1.90	2.44
query23	16.95	1.00	0.90
query24	3.38	1.13	0.79
query25	0.26	0.15	0.05
query26	0.58	0.14	0.14
query27	0.04	0.04	0.06
query28	10.71	1.10	1.09
query29	12.54	3.24	3.25
query30	0.25	0.06	0.06
query31	2.85	0.39	0.39
query32	3.26	0.46	0.46
query33	2.99	3.07	3.01
query34	16.92	4.44	4.50
query35	4.54	4.44	4.46
query36	0.69	0.50	0.48
query37	0.09	0.06	0.06
query38	0.04	0.03	0.03
query39	0.04	0.02	0.02
query40	0.16	0.13	0.13
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 107.49 s
Total hot run time: 32.34 s

wuwenchi force-pushed the opt-hudi-scan branch 2 times, most recently from f3636a1 to 201ca19 on December 4, 2024 07:44
@wuwenchi
Contributor Author

wuwenchi commented Dec 4, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 40246 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 201ca19bbe4fd95b191a46a9a8c17c463b722173, data reload: false

------ Round 1 ----------------------------------
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	17616	7525	7275	7275
q2	2055	173	169	169
q3	10597	1129	1205	1129
q4	10569	759	772	759
q5	7629	2755	2756	2755
q6	240	148	146	146
q7	1004	637	602	602
q8	9234	1894	1944	1894
q9	6719	6569	6476	6476
q10	6982	2333	2331	2331
q11	455	271	265	265
q12	432	227	226	226
q13	17771	3026	3066	3026
q14	249	210	218	210
q15	569	536	529	529
q16	656	578	615	578
q17	976	522	519	519
q18	7335	6615	6703	6615
q19	1326	1009	1087	1009
q20	484	187	184	184
q21	4036	3258	3226	3226
q22	397	327	323	323
Total cold run time: 107331 ms
Total hot run time: 40246 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
q1	7246	7207	7349	7207
q2	325	222	224	222
q3	2952	2891	2969	2891
q4	2083	1887	2006	1887
q5	5668	5708	5704	5704
q6	232	141	141	141
q7	2250	1822	1854	1822
q8	3391	3563	3532	3532
q9	9001	9135	9046	9046
q10	3631	3555	3559	3555
q11	608	516	488	488
q12	829	638	630	630
q13	10777	3318	3248	3248
q14	311	271	287	271
q15	596	529	527	527
q16	690	655	656	655
q17	1863	1642	1611	1611
q18	8353	7818	7708	7708
q19	1705	1581	1556	1556
q20	2154	1900	1884	1884
q21	5585	5565	5603	5565
q22	650	600	613	600
Total cold run time: 70900 ms
Total hot run time: 60750 ms

@doris-robot

TPC-DS: Total hot run time: 197319 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 201ca19bbe4fd95b191a46a9a8c17c463b722173, data reload: false

query	cold_ms	hot1_ms	hot2_ms	best_hot_ms
query1	1249	961	941	941
query2	6239	2059	1999	1999
query3	10972	4445	4468	4445
query4	67008	28409	23540	23540
query5	4898	482	462	462
query6	413	194	180	180
query7	5477	308	300	300
query8	315	229	225	225
query9	8323	2724	2718	2718
query10	441	243	251	243
query11	16897	15281	16044	15281
query12	162	104	105	104
query13	1432	448	410	410
query14	10610	6971	6791	6791
query15	212	186	194	186
query16	6999	431	503	431
query17	1054	582	595	582
query18	1705	320	322	320
query19	213	170	165	165
query20	123	117	116	116
query21	201	105	107	105
query22	4667	4380	4578	4380
query23	35060	34450	35589	34450
query24	5358	2662	2617	2617
query25	493	387	400	387
query26	665	154	156	154
query27	2116	279	283	279
query28	5080	2528	2514	2514
query29	675	427	415	415
query30	210	149	148	148
query31	1008	825	853	825
query32	70	58	56	56
query33	475	315	299	299
query34	954	516	533	516
query35	909	763	790	763
query36	1112	970	957	957
query37	126	73	75	73
query38	4601	4383	4456	4383
query39	1528	1498	1467	1467
query40	205	119	96	96
query41	43	41	42	41
query42	114	99	101	99
query43	548	501	498	498
query44	1204	834	845	834
query45	197	171	180	171
query46	1192	718	714	714
query47	2050	1927	1901	1901
query48	427	332	336	332
query49	738	423	382	382
query50	857	391	398	391
query51	7415	7175	7176	7175
query52	99	88	88	88
query53	255	184	173	173
query54	509	394	391	391
query55	77	74	72	72
query56	261	233	233	233
query57	1213	1128	1118	1118
query58	206	217	215	215
query59	3234	3186	2940	2940
query60	268	245	241	241
query61	109	108	104	104
query62	774	680	674	674
query63	220	225	200	200
query64	1476	674	650	650
query65	3308	3393	3222	3222
query66	719	300	296	296
query67	16089	15724	15684	15684
query68	3818	556	550	550
query69	434	264	245	245
query70	1207	1151	1099	1099
query71	370	245	245	245
query72	6215	4074	4072	4072
query73	790	361	359	359
query74	10219	9156	9072	9072
query75	3415	2701	2686	2686
query76	1893	1149	1002	1002
query77	480	277	278	277
query78	10504	9548	9501	9501
query79	1756	624	631	624
query80	1196	422	427	422
query81	505	249	234	234
query82	391	120	113	113
query83	177	146	143	143
query84	278	70	68	68
query85	967	291	295	291
query86	410	289	305	289
query87	4733	4647	4519	4519
query88	3757	2217	2180	2180
query89	423	285	293	285
query90	1974	214	183	183
query91	139	101	105	101
query92	59	49	52	49
query93	2551	546	551	546
query94	812	291	292	291
query95	344	253	246	246
query96	633	284	286	284
query97	2876	2674	2684	2674
query98	218	193	204	193
query99	1584	1314	1326	1314
Total cold run time: 318985 ms
Total hot run time: 197319 ms

@doris-robot

ClickBench: Total hot run time: 33.21 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 201ca19bbe4fd95b191a46a9a8c17c463b722173, data reload: false

query	cold_s	hot1_s	hot2_s
query1	0.03	0.04	0.03
query2	0.07	0.03	0.03
query3	0.24	0.08	0.07
query4	1.61	0.11	0.10
query5	0.43	0.42	0.42
query6	1.15	0.66	0.66
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.59	0.51	0.51
query10	0.54	0.56	0.55
query11	0.14	0.11	0.11
query12	0.14	0.11	0.12
query13	0.60	0.60	0.59
query14	2.72	2.75	2.86
query15	0.91	0.82	0.84
query16	0.39	0.39	0.40
query17	1.05	1.04	1.03
query18	0.22	0.21	0.21
query19	1.84	1.75	1.99
query20	0.02	0.01	0.02
query21	15.35	0.60	0.57
query22	2.29	1.65	1.85
query23	17.29	0.84	0.76
query24	3.06	2.14	1.80
query25	0.12	0.10	0.15
query26	0.69	0.14	0.14
query27	0.05	0.05	0.03
query28	9.50	1.10	1.08
query29	12.54	3.32	3.26
query30	0.25	0.06	0.07
query31	2.87	0.39	0.39
query32	3.28	0.47	0.45
query33	3.09	3.08	3.02
query34	17.08	4.46	4.45
query35	4.52	4.51	4.51
query36	0.66	0.48	0.50
query37	0.08	0.06	0.05
query38	0.05	0.04	0.03
query39	0.02	0.02	0.02
query40	0.16	0.13	0.12
query41	0.07	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.02
Total cold run time: 105.84 s
Total hot run time: 33.21 s

wuwenchi marked this pull request as ready for review on December 5, 2024 08:51
wuwenchi marked this pull request as draft on December 11, 2024 06:18
@wuwenchi
Contributor Author

run buildall

wuwenchi marked this pull request as ready for review on December 13, 2024 08:55