[Opt](orc) Optimize merged IO when the ORC reader reads multiple tiny stripes. (#42004) #43467

Conversation

morningman
Contributor

cherry-pick #42004

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@morningman
Contributor Author

run buildall

Contributor

github-actions bot commented Nov 7, 2024

clang-tidy review says "All clean, LGTM! 👍"

hubgeter and others added 2 commits November 9, 2024 19:02
…ripes. (apache#42004)

When reading ORC files, we may encounter files in which each stripe is
very small in bytes but the number of stripes is very large.

This PR introduces three session variables,
`orc_tiny_stripe_threshold_bytes`, `orc_once_max_read_bytes`, and
`orc_max_merge_distance_bytes`, to optimize IO for this scenario.

A stripe whose byte size is less than `orc_tiny_stripe_threshold_bytes`
is considered a tiny stripe. Multiple tiny stripes are read with merged
IO, controlled by `orc_once_max_read_bytes` and
`orc_max_merge_distance_bytes`. `orc_once_max_read_bytes` is the maximum
size of one merged IO. You should not set `orc_once_max_read_bytes`
below `orc_tiny_stripe_threshold_bytes`, although doing so does not
raise an error. Because tiny stripes are not necessarily contiguous, two
tiny stripes are not merged into one IO when the distance between them
is greater than `orc_max_merge_distance_bytes`.
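
The merge rule described above can be sketched as follows. This is a
minimal Python sketch under stated assumptions, not the Doris C++
implementation: `StripeRange`, `is_tiny`, and `merge_tiny_stripes` are
hypothetical names, and the sketch assumes that a gap exactly equal to
`orc_max_merge_distance_bytes` still merges.

```python
from dataclasses import dataclass

MB = 1024 * 1024


@dataclass(frozen=True)
class StripeRange:
    offset: int  # byte offset of the stripe in the file
    length: int  # stripe size in bytes

    @property
    def end(self) -> int:
        return self.offset + self.length


def is_tiny(stripe, tiny_threshold=8 * MB):
    # With a threshold of 0 nothing is tiny, which disables the optimization.
    return stripe.length < tiny_threshold


def merge_tiny_stripes(stripes, once_max_read_bytes=8 * MB,
                       max_merge_distance_bytes=1 * MB):
    """Merge offset-sorted ranges into larger reads (hypothetical helper)."""
    merged = []
    for stripe in sorted(stripes, key=lambda s: s.offset):
        if merged:
            last = merged[-1]
            gap = stripe.offset - last.end
            span = max(last.end, stripe.end) - last.offset
            # Merge only if the hole is small enough and the merged read
            # stays within the maximum size of one IO.
            if gap <= max_merge_distance_bytes and span <= once_max_read_bytes:
                merged[-1] = StripeRange(last.offset, span)
                continue
        merged.append(stripe)
    return merged


stripes = [StripeRange(0, MB), StripeRange(MB, MB), StripeRange(10 * MB, MB)]
tiny = [s for s in stripes if is_tiny(s)]
print(merge_tiny_stripes(tiny))
```

With the default values, the first two stripes are merged into one 2 MiB
read, while the third stays separate because the 8 MiB gap before it
exceeds the merge distance.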

To disable this optimization, `set orc_tiny_stripe_threshold_bytes = 0`.

Default parameters:
```mysql
orc_tiny_stripe_threshold_bytes = 8388608 (8M)
orc_once_max_read_bytes = 8388608 (8M)
orc_max_merge_distance_bytes = 1048576 (1M)
```

We also add profile counters so that these parameters can be tuned.
`RangeCacheFileReader`:
1. `CacheRefreshCount`: the number of merged IOs issued
2. `ReadToCacheBytes`: the number of bytes actually read by merged IOs
3. `ReadToCacheTime`: the time spent reading data by merged IOs
4. `RequestBytes`: the number of bytes the apache-orc library actually
needs to read from the ORC file
5. `RequestIO`: the number of times the apache-orc library calls the
read interface
6. `RequestTime`: the time the apache-orc library spends in calls to the
read interface
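
One hypothetical way to read these counters together when tuning is to
compare requested and merged figures. The sketch below is an assumption
about how the counters could be combined, not something the profile
reports itself; `merge_io_summary` and the sample values are made up for
illustration.

```python
def merge_io_summary(counters):
    """Derive two ratios from RangeCacheFileReader-style counters."""
    refreshes = counters["CacheRefreshCount"]
    requested = counters["RequestBytes"]
    return {
        # Reads requested by the ORC library per merged IO actually issued.
        "io_reduction": counters["RequestIO"] / refreshes if refreshes else None,
        # Bytes read by merged IOs per byte the ORC library asked for.
        "read_amplification": (
            counters["ReadToCacheBytes"] / requested if requested else None
        ),
    }


# Illustrative values only, not measurements from this PR.
sample = {
    "CacheRefreshCount": 10,
    "ReadToCacheBytes": 80_000_000,
    "RequestBytes": 50_000_000,
    "RequestIO": 1000,
}
summary = merge_io_summary(sample)
print(summary)
```

A high `io_reduction` with a `read_amplification` close to 1 would
suggest that merging saves IOs without reading much unneeded data; a
large amplification may indicate that `orc_max_merge_distance_bytes` is
too generous for the file layout.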

Note that `RangeCacheFileReader` wraps the reader that actually reads
the data, such as the HDFS reader. Strictly speaking,
`CacheRefreshCount` therefore does not equal the number of IOs issued to
HDFS, because the HDFS reader may not be able to read all of the
requested data in one call.
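
To illustrate why the wrapper's counters and the inner reader's IO count
can differ, here is a minimal Python sketch. It is not the Doris
`RangeCacheFileReader`; `ChunkedReader`, `RangeCacheReader`, and their
`read_at` methods are hypothetical, and the time counters are omitted.

```python
class ChunkedReader:
    """Hypothetical inner reader returning at most max_chunk bytes per call."""

    def __init__(self, data, max_chunk):
        self.data = data
        self.max_chunk = max_chunk
        self.calls = 0  # number of IOs issued to the underlying storage

    def read_at(self, offset, length):
        self.calls += 1
        return self.data[offset:offset + min(length, self.max_chunk)]


class RangeCacheReader:
    """Serves small reads from a cached merged range (illustrative only)."""

    def __init__(self, inner, merged_ranges):
        self.inner = inner
        self.merged_ranges = merged_ranges  # sorted (offset, length) pairs
        self.cache_offset = 0
        self.cache = b""
        self.stats = {"RequestIO": 0, "RequestBytes": 0,
                      "CacheRefreshCount": 0, "ReadToCacheBytes": 0}

    def _read_fully(self, offset, length):
        # The inner reader may return short reads, so loop until done.
        buf = bytearray()
        while len(buf) < length:
            chunk = self.inner.read_at(offset + len(buf), length - len(buf))
            if not chunk:
                break  # end of data
            buf += chunk
        return bytes(buf)

    def read_at(self, offset, length):
        self.stats["RequestIO"] += 1
        self.stats["RequestBytes"] += length
        end = offset + length
        cached = (self.cache_offset <= offset
                  and end <= self.cache_offset + len(self.cache))
        if not cached:
            for r_off, r_len in self.merged_ranges:
                if r_off <= offset and end <= r_off + r_len:
                    # Refresh the cache with the whole merged range.
                    self.cache = self._read_fully(r_off, r_len)
                    self.cache_offset = r_off
                    self.stats["CacheRefreshCount"] += 1
                    self.stats["ReadToCacheBytes"] += len(self.cache)
                    break
            else:
                # Not covered by any merged range: bypass the cache.
                return self._read_fully(offset, length)
        start = offset - self.cache_offset
        return self.cache[start:start + length]


data = bytes(range(256)) * 4  # 1024 bytes of sample data
inner = ChunkedReader(data, max_chunk=200)
reader = RangeCacheReader(inner, [(0, 512), (512, 512)])
results = [reader.read_at(10, 20), reader.read_at(100, 50), reader.read_at(600, 10)]
print(reader.stats, inner.calls)
```

In this run the wrapper refreshes its cache twice, but the inner reader
is called six times because each 512-byte refresh needs three short
reads.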

This PR also involves changes to the apache-orc third-party library:
apache/doris-thirdparty#244.
Reference implementation:
https://github.com/trinodb/trino/blob/master/lib/trino-orc/src/main/java/io/trino/orc/OrcDataSourceUtils.java#L36

```mysql
set orc_tiny_stripe_threshold_bytes = xxx;
set orc_once_max_read_bytes = xxx;
set orc_max_merge_distance_bytes = xxx;
```

Introduces three session variables, `orc_tiny_stripe_threshold_bytes`,
`orc_once_max_read_bytes`, and `orc_max_merge_distance_bytes`, to
optimize IO for ORC files whose stripes are very small in bytes but
very numerous.

Co-authored-by: kaka11chen <[email protected]>
Co-authored-by: daidai <[email protected]>
@morningman morningman force-pushed the pick_42004_to_upstream-apache_branch-3.0 branch from f7cb311 to 8689aff Compare November 9, 2024 11:02
@morningman
Contributor Author

run buildall

Contributor

github-actions bot commented Nov 9, 2024

clang-tidy review says "All clean, LGTM! 👍"

@doris-robot

TPC-H: Total hot run time: 40544 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
TPC-H sf100 test result on commit 8689affd465f32332fea42b7a4ce8dc3bc732649, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17750	7535	7273	7273
q2	2073	195	170	170
q3	10667	1085	1106	1085
q4	10544	739	764	739
q5	7749	2848	2719	2719
q6	233	150	147	147
q7	970	618	616	616
q8	9591	1967	2038	1967
q9	6611	6406	6404	6404
q10	7003	2301	2374	2301
q11	454	261	267	261
q12	397	216	212	212
q13	17982	3010	2962	2962
q14	241	220	210	210
q15	554	510	511	510
q16	676	604	600	600
q17	963	593	573	573
q18	7375	6534	6639	6534
q19	1400	1045	1025	1025
q20	499	201	197	197
q21	3963	3062	3148	3062
q22	1085	980	977	977
Total cold run time: 108780 ms
Total hot run time: 40544 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	7318	7207	7219	7207
q2	412	314	309	309
q3	2894	2859	2921	2859
q4	1975	1779	1728	1728
q5	5684	5682	5775	5682
q6	222	144	143	143
q7	2205	1793	1756	1756
q8	3341	3558	3499	3499
q9	8816	8900	8866	8866
q10	3531	3496	3517	3496
q11	591	492	497	492
q12	787	586	593	586
q13	16442	3201	3178	3178
q14	313	286	269	269
q15	591	547	538	538
q16	707	689	661	661
q17	1922	1612	1604	1604
q18	8146	7839	7424	7424
q19	7363	1582	1582	1582
q20	2080	1870	1803	1803
q21	5335	5318	5263	5263
q22	1143	1023	1008	1008
Total cold run time: 81818 ms
Total hot run time: 59953 ms

@doris-robot

TPC-DS: Total hot run time: 193440 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 8689affd465f32332fea42b7a4ce8dc3bc732649, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	2596	2258	2242	2242
query2	6218	2057	1990	1990
query3	14451	9436	244	244
query4	32903	23406	23387	23387
query5	3770	447	441	441
query6	283	192	189	189
query7	3998	330	312	312
query8	296	245	252	245
query9	9469	2690	2692	2690
query10	482	272	270	270
query11	17703	15120	15156	15120
query12	161	101	100	100
query13	1544	433	399	399
query14	8307	7520	7310	7310
query15	227	186	187	186
query16	7440	532	471	471
query17	1584	603	586	586
query18	1608	635	637	635
query19	232	188	195	188
query20	124	114	120	114
query21	212	108	106	106
query22	4956	4618	4351	4351
query23	34563	33806	33786	33786
query24	12381	3379	3362	3362
query25	712	416	425	416
query26	1705	185	190	185
query27	2514	290	294	290
query28	7806	2491	2475	2475
query29	1045	443	450	443
query30	397	302	299	299
query31	1014	792	830	792
query32	97	83	54	54
query33	762	289	267	267
query34	1008	507	501	501
query35	903	733	724	724
query36	1106	907	920	907
query37	257	75	72	72
query38	3927	3933	3837	3837
query39	1480	1413	1411	1411
query40	254	96	97	96
query41	53	48	51	48
query42	117	96	98	96
query43	526	492	478	478
query44	1240	776	775	775
query45	178	165	162	162
query46	1124	718	741	718
query47	1855	1791	1765	1765
query48	451	368	362	362
query49	1057	361	375	361
query50	806	398	419	398
query51	7342	7128	7068	7068
query52	105	89	89	89
query53	265	182	181	181
query54	1162	446	443	443
query55	77	76	75	75
query56	253	236	246	236
query57	1202	1089	1095	1089
query58	226	200	203	200
query59	3114	2852	2763	2763
query60	287	247	265	247
query61	109	133	139	133
query62	859	668	654	654
query63	220	184	185	184
query64	5104	652	619	619
query65	3263	3208	3204	3204
query66	1253	299	304	299
query67	15723	15455	15307	15307
query68	4651	565	569	565
query69	432	252	255	252
query70	1167	1086	1124	1086
query71	334	269	260	260
query72	6157	3961	4030	3961
query73	749	346	346	346
query74	10346	8967	8906	8906
query75	3333	2631	2626	2626
query76	2740	928	1043	928
query77	363	269	278	269
query78	10600	9770	9409	9409
query79	8673	579	594	579
query80	2586	435	422	422
query81	583	245	255	245
query82	1394	111	114	111
query83	274	156	154	154
query84	286	80	77	77
query85	2178	297	285	285
query86	495	266	289	266
query87	4382	4217	4238	4217
query88	5619	2377	2380	2377
query89	577	292	287	287
query90	2184	186	187	186
query91	180	144	144	144
query92	63	50	49	49
query93	7366	545	542	542
query94	929	289	305	289
query95	354	254	252	252
query96	633	276	298	276
query97	3333	3182	3143	3143
query98	213	204	199	199
query99	1628	1328	1285	1285
Total cold run time: 323268 ms
Total hot run time: 193440 ms

@morningman morningman merged commit 2995e8f into apache:branch-3.0 Nov 9, 2024
20 of 23 checks passed