[Feature][Variant] support support schema for inner sub types in variant type #40573

Draft
wants to merge 2 commits into
base: master

Member

@eldenmoon eldenmoon commented Sep 9, 2024

Background

Currently, the variant type auto-detects schema information from semi-structured JSON. However, a value such as '2021-01-01' is stored and queried as a string, so using it as a date type may be slow. This PR therefore adds support for specifying sub-schemas for the variant type.

Usage

CREATE TABLE `test_predefine` (
    `id` bigint NOT NULL,
    `type` varchar(30) NULL,
    `v1` variant<a.b.c:int,ss:string,dcm:decimal,dt:datetime,ip:ipv4,a.b.d:double> NULL,
    INDEX idx_var_sub(`v1`) USING INVERTED PROPERTIES("parser" = "english")
)
ENGINE=OLAP
DUPLICATE KEY(`id`)
DISTRIBUTED BY HASH(`id`) BUCKETS 3
PROPERTIES ("replication_allocation" = "tag.location.default: 1");

Design

public class VariantType  {
    private final List<StructField> predefinedFields;
    private final Supplier<Map<String, StructField>> nameToFields;
}

As with struct, the predefined fields are stored with static types, so no schema merge is needed for them.
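
To make the design concrete, here is a minimal, self-contained sketch of how a variant type with predefined sub-fields might expose a name lookup. It is only an illustration under assumptions: `SketchField` is a simplified stand-in for the FE `StructField`, and the names `VariantTypeSketch`, `findPredefined` and `memoize` are hypothetical and not taken from the PR.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical, simplified stand-in for the FE StructField: a sub-path plus its declared type.
final class SketchField {
    private final String name;
    private final String typeName;

    SketchField(String name, String typeName) {
        this.name = name;
        this.typeName = typeName;
    }

    String name() { return name; }
    String typeName() { return typeName; }
}

// Sketch only: keeps the two members shown in the Design section and adds a lookup.
final class VariantTypeSketch {
    private final List<SketchField> predefinedFields;
    private final Supplier<Map<String, SketchField>> nameToFields;

    VariantTypeSketch(List<SketchField> fields) {
        final List<SketchField> copy = Collections.unmodifiableList(new ArrayList<>(fields));
        this.predefinedFields = copy;
        // Build the name index lazily, on first lookup, and reuse it afterwards.
        this.nameToFields = memoize(() -> {
            Map<String, SketchField> byName = new LinkedHashMap<>();
            for (SketchField f : copy) {
                byName.put(f.name(), f);
            }
            return byName;
        });
    }

    List<SketchField> predefinedFields() { return predefinedFields; }

    // Returns the statically typed field for a sub-path, or empty if the path was not predefined.
    Optional<SketchField> findPredefined(String path) {
        return Optional.ofNullable(nameToFields.get().get(path));
    }

    // Small hand-rolled memoizer so the sketch needs no third-party library.
    private static <T> Supplier<T> memoize(Supplier<T> delegate) {
        return new Supplier<T>() {
            private T value;

            @Override
            public synchronized T get() {
                if (value == null) {
                    value = delegate.get();
                }
                return value;
            }
        };
    }
}
```

With the sub-schema from the Usage section, `findPredefined("a.b.c")` would return the field declared as `int`, while a path that was not declared returns empty and would presumably fall back to the existing auto-detection.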

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

@eldenmoon eldenmoon marked this pull request as draft September 9, 2024 16:52
@eldenmoon
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 38251 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 57b2bcaf311b18dd254cb9dd9746c9fc001f876d, data reload: false

------ Round 1 ----------------------------------
q1	17991	4570	4314	4314
q2	2028	193	182	182
q3	11202	1008	1072	1008
q4	10289	688	701	688
q5	7753	2897	2841	2841
q6	227	135	135	135
q7	962	610	603	603
q8	9331	2116	2096	2096
q9	7227	6587	6567	6567
q10	6988	2239	2205	2205
q11	480	245	245	245
q12	397	224	225	224
q13	17773	3128	3106	3106
q14	291	237	255	237
q15	542	486	500	486
q16	512	448	440	440
q17	993	725	719	719
q18	7422	6965	6812	6812
q19	1409	1054	1084	1054
q20	690	321	333	321
q21	4137	3145	2939	2939
q22	1112	1048	1029	1029
Total cold run time: 109756 ms
Total hot run time: 38251 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4405	4242	4307	4242
q2	388	278	278	278
q3	2902	2605	2681	2605
q4	1938	1660	1610	1610
q5	5593	5717	5769	5717
q6	223	135	135	135
q7	2274	1840	1824	1824
q8	3316	3417	3495	3417
q9	8955	8917	8920	8917
q10	3582	3464	3386	3386
q11	612	522	501	501
q12	822	641	666	641
q13	13213	3287	3315	3287
q14	321	283	304	283
q15	552	492	501	492
q16	555	499	512	499
q17	1865	1549	1581	1549
q18	8285	7803	7892	7803
q19	1770	1744	1636	1636
q20	2180	1905	1949	1905
q21	5652	5473	5339	5339
q22	1132	1081	1056	1056
Total cold run time: 70535 ms
Total hot run time: 57122 ms

@doris-robot

TPC-DS: Total hot run time: 192755 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 57b2bcaf311b18dd254cb9dd9746c9fc001f876d, data reload: false

query1	1259	896	880	880
query2	6462	1862	1852	1852
query3	10804	4064	4226	4064
query4	59673	26653	23285	23285
query5	5293	512	508	508
query6	397	159	155	155
query7	5769	297	291	291
query8	316	225	219	219
query9	8915	2490	2481	2481
query10	505	276	271	271
query11	18488	15209	15511	15209
query12	154	99	103	99
query13	1579	431	385	385
query14	11148	7591	7317	7317
query15	228	183	168	168
query16	7580	458	463	458
query17	1108	569	544	544
query18	1953	289	287	287
query19	291	142	143	142
query20	117	110	106	106
query21	206	103	109	103
query22	4548	4619	4745	4619
query23	34446	33251	33398	33251
query24	5927	2868	2803	2803
query25	513	385	392	385
query26	684	157	150	150
query27	1784	273	275	273
query28	3656	2010	2010	2010
query29	669	400	406	400
query30	234	154	150	150
query31	985	739	791	739
query32	86	56	57	56
query33	435	291	294	291
query34	883	487	470	470
query35	841	738	745	738
query36	1059	942	948	942
query37	142	91	92	91
query38	3983	3860	3883	3860
query39	1468	1406	1418	1406
query40	207	118	116	116
query41	49	50	49	49
query42	125	98	97	97
query43	502	458	488	458
query44	1101	738	738	738
query45	200	168	164	164
query46	1081	745	736	736
query47	1871	1798	1824	1798
query48	380	293	305	293
query49	785	451	449	449
query50	816	413	431	413
query51	7045	6915	6957	6915
query52	94	85	85	85
query53	242	172	175	172
query54	575	457	447	447
query55	76	74	80	74
query56	294	248	280	248
query57	1210	1114	1053	1053
query58	225	233	230	230
query59	2967	2719	2704	2704
query60	301	273	269	269
query61	101	99	100	99
query62	742	660	659	659
query63	218	185	187	185
query64	2755	668	670	668
query65	3242	3154	3245	3154
query66	673	346	335	335
query67	15444	15344	15164	15164
query68	3020	600	604	600
query69	438	272	286	272
query70	1243	1092	1096	1092
query71	353	278	281	278
query72	6183	4105	4044	4044
query73	748	324	327	324
query74	9226	8803	8869	8803
query75	3375	2692	2761	2692
query76	1572	952	980	952
query77	572	320	314	314
query78	9650	9020	9049	9020
query79	1052	536	535	535
query80	697	499	493	493
query81	530	231	234	231
query82	234	140	137	137
query83	170	148	145	145
query84	261	78	78	78
query85	685	345	283	283
query86	309	311	297	297
query87	4272	4233	4166	4166
query88	3453	2318	2424	2318
query89	395	286	276	276
query90	1996	192	194	192
query91	128	98	95	95
query92	60	47	53	47
query93	1076	533	532	532
query94	710	296	284	284
query95	332	258	248	248
query96	589	267	265	265
query97	3222	3100	3070	3070
query98	220	208	196	196
query99	1498	1289	1246	1246
Total cold run time: 306178 ms
Total hot run time: 192755 ms

@doris-robot

ClickBench: Total hot run time: 31.83 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 57b2bcaf311b18dd254cb9dd9746c9fc001f876d, data reload: false

query1	0.04	0.05	0.04
query2	0.06	0.04	0.03
query3	0.22	0.05	0.05
query4	1.67	0.08	0.09
query5	0.52	0.49	0.49
query6	1.14	0.73	0.71
query7	0.01	0.01	0.02
query8	0.05	0.04	0.04
query9	0.54	0.49	0.49
query10	0.55	0.56	0.53
query11	0.16	0.11	0.12
query12	0.15	0.13	0.12
query13	0.59	0.59	0.59
query14	1.45	1.40	1.42
query15	0.87	0.80	0.82
query16	0.39	0.37	0.36
query17	0.97	0.98	0.99
query18	0.21	0.20	0.20
query19	1.86	1.71	1.76
query20	0.01	0.01	0.01
query21	15.41	0.67	0.68
query22	4.33	6.98	2.39
query23	18.27	1.39	1.29
query24	2.11	0.23	0.22
query25	0.16	0.09	0.07
query26	0.28	0.18	0.18
query27	0.08	0.08	0.07
query28	13.30	1.03	0.99
query29	12.60	3.34	3.30
query30	0.24	0.05	0.05
query31	2.88	0.40	0.39
query32	3.26	0.48	0.47
query33	2.94	3.01	3.00
query34	17.19	4.45	4.54
query35	4.51	4.46	4.51
query36	0.65	0.48	0.49
query37	0.18	0.15	0.15
query38	0.15	0.15	0.15
query39	0.05	0.04	0.04
query40	0.15	0.12	0.12
query41	0.09	0.05	0.06
query42	0.06	0.06	0.05
query43	0.05	0.04	0.04
Total cold run time: 110.4 s
Total hot run time: 31.83 s

@eldenmoon
Member Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.80% (9386/25503)
Line Coverage: 28.22% (77428/274410)
Region Coverage: 27.59% (39940/144750)
Branch Coverage: 24.24% (20327/83856)
Coverage Report: http://coverage.selectdb-in.cc/coverage/9e8806d6a1d564a644b645495873413731979a1c_9e8806d6a1d564a644b645495873413731979a1c/report/index.html

@doris-robot

TPC-H: Total hot run time: 38203 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9e8806d6a1d564a644b645495873413731979a1c, data reload: false

------ Round 1 ----------------------------------
q1	17622	4991	4390	4390
q2	2029	196	183	183
q3	11388	1022	1185	1022
q4	10483	723	847	723
q5	7768	2900	2847	2847
q6	229	134	132	132
q7	956	623	601	601
q8	9336	2118	2110	2110
q9	7196	6678	6607	6607
q10	7015	2302	2182	2182
q11	452	248	244	244
q12	402	229	227	227
q13	18810	3108	3090	3090
q14	277	236	238	236
q15	546	520	488	488
q16	516	432	423	423
q17	1016	732	647	647
q18	7688	6983	7020	6983
q19	1374	1151	1058	1058
q20	667	335	327	327
q21	4024	3105	2669	2669
q22	1144	1014	1028	1014
Total cold run time: 110938 ms
Total hot run time: 38203 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4424	4328	4327	4327
q2	380	274	269	269
q3	2961	2673	2693	2673
q4	1924	1657	1691	1657
q5	5701	5672	5742	5672
q6	225	131	130	130
q7	2258	1820	1798	1798
q8	3394	3484	3506	3484
q9	8845	8886	8830	8830
q10	3652	3426	3457	3426
q11	627	534	526	526
q12	793	682	666	666
q13	12072	3191	3282	3191
q14	338	301	285	285
q15	558	498	514	498
q16	568	500	494	494
q17	1928	1580	1539	1539
q18	8344	7928	7965	7928
q19	1736	1516	1541	1516
q20	2224	1898	1904	1898
q21	5890	5623	5456	5456
q22	1132	1051	1035	1035
Total cold run time: 69974 ms
Total hot run time: 57298 ms

@doris-robot

TPC-DS: Total hot run time: 191297 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9e8806d6a1d564a644b645495873413731979a1c, data reload: false

query1	1249	898	914	898
query2	6384	1872	1862	1862
query3	10631	3939	3853	3853
query4	58864	25316	23121	23121
query5	5394	492	485	485
query6	410	173	155	155
query7	5779	298	286	286
query8	317	215	213	213
query9	8811	2472	2472	2472
query10	483	265	258	258
query11	18287	15016	15434	15016
query12	160	106	103	103
query13	1596	404	389	389
query14	10717	6980	7399	6980
query15	222	187	171	171
query16	7459	484	502	484
query17	1093	589	548	548
query18	1821	310	292	292
query19	289	162	149	149
query20	120	113	113	113
query21	205	109	104	104
query22	4672	4463	4615	4463
query23	34629	33858	33163	33163
query24	5907	2915	2863	2863
query25	486	376	399	376
query26	677	151	152	151
query27	1778	269	274	269
query28	3633	2029	2002	2002
query29	626	404	399	399
query30	240	151	156	151
query31	945	773	760	760
query32	82	50	57	50
query33	458	287	281	281
query34	882	466	471	466
query35	818	729	731	729
query36	1049	946	977	946
query37	143	84	87	84
query38	3906	3786	3940	3786
query39	1428	1376	1363	1363
query40	206	111	111	111
query41	45	44	44	44
query42	115	91	95	91
query43	498	475	452	452
query44	1089	756	733	733
query45	193	163	165	163
query46	1086	692	737	692
query47	1903	1765	1790	1765
query48	370	289	290	289
query49	766	440	440	440
query50	827	408	417	408
query51	7016	6918	6878	6878
query52	98	84	85	84
query53	249	180	178	178
query54	610	447	455	447
query55	73	71	75	71
query56	278	252	262	252
query57	1184	1109	1089	1089
query58	224	219	231	219
query59	2972	2885	2708	2708
query60	289	261	260	260
query61	115	102	104	102
query62	730	668	660	660
query63	228	186	178	178
query64	2777	724	741	724
query65	3177	3180	3142	3142
query66	692	350	338	338
query67	15598	15297	15118	15118
query68	2990	579	574	574
query69	406	283	270	270
query70	1172	1088	1045	1045
query71	350	277	273	273
query72	4987	4001	3991	3991
query73	757	323	320	320
query74	9059	8883	8738	8738
query75	3390	2613	2703	2613
query76	1405	1015	1005	1005
query77	508	313	313	313
query78	10464	9103	9077	9077
query79	1052	532	540	532
query80	689	564	503	503
query81	461	235	231	231
query82	237	140	144	140
query83	170	149	151	149
query84	261	77	78	77
query85	729	294	283	283
query86	307	269	301	269
query87	4437	4364	4198	4198
query88	3007	2340	2325	2325
query89	388	291	289	289
query90	1772	190	186	186
query91	123	94	97	94
query92	61	51	52	51
query93	1068	533	534	533
query94	589	298	283	283
query95	345	254	304	254
query96	587	266	263	263
query97	3229	3102	3075	3075
query98	218	193	196	193
query99	1519	1260	1285	1260
Total cold run time: 302795 ms
Total hot run time: 191297 ms

@doris-robot

ClickBench: Total hot run time: 31.46 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9e8806d6a1d564a644b645495873413731979a1c, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.05
query3	0.22	0.05	0.05
query4	1.69	0.08	0.08
query5	0.48	0.50	0.49
query6	1.13	0.73	0.73
query7	0.02	0.01	0.02
query8	0.05	0.04	0.05
query9	0.55	0.50	0.48
query10	0.53	0.56	0.55
query11	0.16	0.12	0.11
query12	0.15	0.13	0.12
query13	0.58	0.60	0.59
query14	1.41	1.41	1.47
query15	0.84	0.84	0.81
query16	0.37	0.37	0.36
query17	1.00	1.07	1.03
query18	0.21	0.20	0.21
query19	1.78	1.76	1.75
query20	0.01	0.01	0.01
query21	15.38	0.68	0.67
query22	4.85	6.76	1.97
query23	18.32	1.42	1.26
query24	2.15	0.22	0.22
query25	0.15	0.08	0.08
query26	0.26	0.19	0.17
query27	0.08	0.07	0.08
query28	13.21	1.03	0.99
query29	12.60	3.47	3.41
query30	0.24	0.05	0.06
query31	2.90	0.41	0.40
query32	3.25	0.47	0.48
query33	2.93	2.99	2.98
query34	16.84	4.34	4.42
query35	4.42	4.49	4.43
query36	0.65	0.48	0.47
query37	0.18	0.15	0.16
query38	0.15	0.14	0.15
query39	0.04	0.04	0.04
query40	0.15	0.13	0.12
query41	0.09	0.05	0.05
query42	0.06	0.05	0.04
query43	0.04	0.05	0.04
Total cold run time: 110.24 s
Total hot run time: 31.46 s

@eldenmoon
Member Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.83% (9396/25515)
Line Coverage: 28.24% (77560/274624)
Region Coverage: 27.61% (39988/144848)
Branch Coverage: 24.24% (20341/83908)
Coverage Report: http://coverage.selectdb-in.cc/coverage/f04ceb010dbfd9ac4d189402b22edd27e63404f3_f04ceb010dbfd9ac4d189402b22edd27e63404f3/report/index.html

@eldenmoon eldenmoon changed the title from "Var pred be" to "[Feature][Variant] support support schema for nested types in variant type" Sep 10, 2024
@eldenmoon
Member Author

run buildall

@eldenmoon eldenmoon changed the title from "[Feature][Variant] support support schema for nested types in variant type" to "[Feature][Variant] support support schema for inner sub types in variant type" Sep 10, 2024
@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.80% (9397/25533)
Line Coverage: 28.23% (77557/274763)
Region Coverage: 27.59% (39982/144907)
Branch Coverage: 24.23% (20340/83938)
Coverage Report: http://coverage.selectdb-in.cc/coverage/0e0eb12c37e977cc33cd2f45cf274ff4819ffcab_0e0eb12c37e977cc33cd2f45cf274ff4819ffcab/report/index.html

@doris-robot

TPC-H: Total hot run time: 38429 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0e0eb12c37e977cc33cd2f45cf274ff4819ffcab, data reload: false

------ Round 1 ----------------------------------
q1	17728	5239	4352	4352
q2	2038	198	188	188
q3	11759	955	1121	955
q4	10525	746	786	746
q5	7759	2898	2795	2795
q6	231	137	139	137
q7	964	619	607	607
q8	9325	2152	2123	2123
q9	7029	6624	6585	6585
q10	7021	2304	2281	2281
q11	470	254	249	249
q12	399	226	225	225
q13	17767	3125	3107	3107
q14	293	234	261	234
q15	550	518	489	489
q16	530	449	417	417
q17	1005	704	777	704
q18	7542	6893	6820	6820
q19	1391	1088	995	995
q20	677	335	339	335
q21	4009	3075	3112	3075
q22	1117	1010	1034	1010
Total cold run time: 110129 ms
Total hot run time: 38429 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4396	4371	4379	4371
q2	387	266	274	266
q3	2911	2666	2689	2666
q4	1922	1652	1675	1652
q5	5722	5716	5894	5716
q6	236	133	134	133
q7	2224	1821	1861	1821
q8	3309	3505	3566	3505
q9	8909	8823	8821	8821
q10	3713	3373	3353	3353
q11	632	509	506	506
q12	843	658	666	658
q13	13353	3261	3337	3261
q14	316	292	302	292
q15	531	480	497	480
q16	535	509	500	500
q17	1839	1538	1554	1538
q18	8246	7891	7942	7891
q19	1774	1673	1556	1556
q20	2211	1936	1922	1922
q21	5869	5599	5593	5593
q22	1114	1038	1054	1038
Total cold run time: 70992 ms
Total hot run time: 57539 ms

@doris-robot

TPC-DS: Total hot run time: 197213 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 0e0eb12c37e977cc33cd2f45cf274ff4819ffcab, data reload: false

query1	1270	902	889	889
query2	6371	2005	1952	1952
query3	10594	4205	4046	4046
query4	59803	29320	23280	23280
query5	5035	499	516	499
query6	407	161	174	161
query7	5636	302	291	291
query8	327	236	219	219
query9	7811	2512	2486	2486
query10	413	288	264	264
query11	17396	15061	15536	15061
query12	154	104	106	104
query13	1445	403	384	384
query14	10426	7080	7171	7080
query15	209	188	174	174
query16	6863	523	471	471
query17	1153	624	585	585
query18	1524	308	314	308
query19	216	156	152	152
query20	124	111	112	111
query21	236	105	106	105
query22	4700	4395	4839	4395
query23	34532	33383	33605	33383
query24	6013	2852	2845	2845
query25	511	416	405	405
query26	608	156	152	152
query27	1596	275	279	275
query28	3964	2044	2031	2031
query29	664	440	420	420
query30	223	152	155	152
query31	933	766	760	760
query32	67	53	56	53
query33	433	316	299	299
query34	882	477	464	464
query35	841	739	731	731
query36	1063	921	945	921
query37	146	85	84	84
query38	3942	3919	3852	3852
query39	1471	1402	1425	1402
query40	202	120	118	118
query41	48	48	46	46
query42	116	95	91	91
query43	510	485	459	459
query44	1091	750	741	741
query45	197	167	164	164
query46	1097	767	735	735
query47	1837	1779	1811	1779
query48	384	302	294	294
query49	789	450	462	450
query50	820	412	423	412
query51	6971	6927	6800	6800
query52	101	86	89	86
query53	254	180	181	180
query54	581	457	464	457
query55	78	75	78	75
query56	296	272	267	267
query57	1189	1098	1080	1080
query58	232	246	333	246
query59	3036	2944	2800	2800
query60	296	266	272	266
query61	104	100	118	100
query62	758	639	654	639
query63	209	182	186	182
query64	1382	690	708	690
query65	3197	3141	3142	3141
query66	642	339	339	339
query67	15898	15569	15216	15216
query68	1976	843	861	843
query69	427	325	329	325
query70	1163	1106	1172	1106
query71	345	342	343	342
query72	5318	3503	3524	3503
query73	598	581	580	580
query74	8972	8818	8794	8794
query75	3032	2934	2989	2934
query76	1016	847	840	840
query77	524	398	408	398
query78	9422	9194	9336	9194
query79	883	877	847	847
query80	817	857	798	798
query81	469	261	261	261
query82	264	266	267	266
query83	190	192	191	191
query84	231	106	104	104
query85	621	439	381	381
query86	332	319	316	316
query87	4302	4342	4434	4342
query88	4518	4124	4110	4110
query89	367	358	366	358
query90	1510	308	307	307
query91	120	126	129	126
query92	78	76	77	76
query93	921	912	914	912
query94	555	389	358	358
query95	415	408	415	408
query96	473	470	472	470
query97	3079	3147	3118	3118
query98	236	228	221	221
query99	1422	1298	1285	1285
Total cold run time: 296913 ms
Total hot run time: 197213 ms

@doris-robot

ClickBench: Total hot run time: 31.33 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 0e0eb12c37e977cc33cd2f45cf274ff4819ffcab, data reload: false

query1	0.05	0.04	0.04
query2	0.09	0.04	0.03
query3	0.22	0.06	0.06
query4	1.66	0.09	0.08
query5	0.52	0.50	0.49
query6	1.13	0.73	0.73
query7	0.02	0.02	0.02
query8	0.05	0.04	0.04
query9	0.55	0.50	0.48
query10	0.56	0.58	0.55
query11	0.15	0.11	0.11
query12	0.14	0.12	0.12
query13	0.60	0.60	0.58
query14	1.39	1.41	1.45
query15	0.85	0.82	0.83
query16	0.37	0.37	0.37
query17	1.02	0.98	0.97
query18	0.21	0.19	0.21
query19	1.81	1.81	1.84
query20	0.01	0.01	0.01
query21	15.39	0.66	0.65
query22	4.56	7.55	1.86
query23	18.32	1.38	1.25
query24	2.13	0.21	0.22
query25	0.16	0.07	0.08
query26	0.27	0.18	0.18
query27	0.07	0.07	0.07
query28	13.22	1.03	1.00
query29	12.59	3.35	3.33
query30	0.25	0.06	0.07
query31	2.86	0.41	0.39
query32	3.23	0.48	0.48
query33	3.04	2.98	3.01
query34	16.96	4.42	4.48
query35	4.48	4.48	4.42
query36	0.66	0.48	0.47
query37	0.18	0.15	0.16
query38	0.15	0.15	0.15
query39	0.06	0.03	0.04
query40	0.15	0.13	0.12
query41	0.09	0.05	0.05
query42	0.05	0.04	0.05
query43	0.04	0.05	0.04
Total cold run time: 110.31 s
Total hot run time: 31.33 s

@eldenmoon
Member Author

run buildall

@eldenmoon eldenmoon marked this pull request as ready for review September 11, 2024 03:50
@doris-robot

TPC-H: Total hot run time: 38214 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 83e28509aa6bf3174c224a19d50f5a51460eb1fd, data reload: false

------ Round 1 ----------------------------------
q1	17764	4508	4352	4352
q2	2025	193	193	193
q3	10455	1169	1041	1041
q4	10145	701	708	701
q5	7734	2864	2814	2814
q6	227	135	137	135
q7	945	618	600	600
q8	9322	2076	2056	2056
q9	7323	6597	6599	6597
q10	7019	2219	2189	2189
q11	463	247	249	247
q12	402	230	229	229
q13	17761	3123	3116	3116
q14	287	234	236	234
q15	548	494	482	482
q16	520	424	414	414
q17	987	716	727	716
q18	7452	6913	6755	6755
q19	1398	1008	1021	1008
q20	686	343	332	332
q21	4228	2996	3118	2996
q22	1137	1033	1007	1007
Total cold run time: 108828 ms
Total hot run time: 38214 ms

----- Round 2, with runtime_filter_mode=off -----
q1	4431	4305	4314	4305
q2	383	284	265	265
q3	2881	2688	2694	2688
q4	1980	1688	1769	1688
q5	5432	5396	5434	5396
q6	229	132	131	131
q7	2071	1718	1790	1718
q8	3230	3357	3348	3348
q9	8472	8474	8419	8419
q10	3470	3180	3230	3180
q11	588	496	499	496
q12	800	632	607	607
q13	12928	3117	3056	3056
q14	302	272	275	272
q15	537	488	470	470
q16	509	494	489	489
q17	1828	1526	1507	1507
q18	7821	7480	7395	7395
q19	1702	1445	1470	1445
q20	2057	1797	1855	1797
q21	5534	5116	5343	5116
q22	1149	1078	1006	1006
Total cold run time: 68334 ms
Total hot run time: 54794 ms

@doris-robot

TPC-DS: Total hot run time: 192847 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 83e28509aa6bf3174c224a19d50f5a51460eb1fd, data reload: false

query1	908	371	367	367
query2	6464	1868	1905	1868
query3	6646	210	213	210
query4	33901	23231	23072	23072
query5	4180	515	498	498
query6	251	166	159	159
query7	4585	298	299	298
query8	277	222	220	220
query9	8540	2463	2456	2456
query10	440	279	283	279
query11	17703	15183	15213	15183
query12	151	100	98	98
query13	1644	395	363	363
query14	9753	6998	7339	6998
query15	281	172	178	172
query16	8108	421	448	421
query17	1568	568	542	542
query18	2134	306	285	285
query19	326	144	141	141
query20	118	109	109	109
query21	214	104	102	102
query22	4341	4294	4133	4133
query23	34193	33752	33792	33752
query24	11154	2912	2818	2818
query25	629	382	388	382
query26	1141	153	155	153
query27	2337	276	276	276
query28	7366	2020	2009	2009
query29	832	405	408	405
query30	312	158	155	155
query31	984	780	812	780
query32	101	55	59	55
query33	769	299	303	299
query34	959	501	487	487
query35	865	725	715	715
query36	1107	984	942	942
query37	168	84	84	84
query38	4023	3950	3911	3911
query39	1457	1414	1412	1412
query40	203	118	117	117
query41	46	45	44	44
query42	117	95	97	95
query43	504	467	455	455
query44	1217	767	737	737
query45	201	167	167	167
query46	1094	759	753	753
query47	1890	1772	1814	1772
query48	391	294	284	284
query49	1104	433	432	432
query50	805	402	405	402
query51	7010	6892	6974	6892
query52	101	87	87	87
query53	264	187	186	186
query54	906	486	460	460
query55	77	76	74	74
query56	275	266	266	266
query57	1227	1066	1075	1066
query58	234	246	231	231
query59	2852	2716	2711	2711
query60	294	265	268	265
query61	99	113	105	105
query62	840	668	668	668
query63	226	181	179	179
query64	4175	791	761	761
query65	3273	3215	3167	3167
query66	1226	335	336	335
query67	15899	15312	15343	15312
query68	3132	855	842	842
query69	435	314	327	314
query70	1188	1206	1172	1172
query71	352	336	332	332
query72	6176	3502	3468	3468
query73	594	589	582	582
query74	8973	8911	8951	8911
query75	3150	3032	2955	2955
query76	1858	865	867	865
query77	483	407	403	403
query78	9268	9248	9623	9248
query79	917	908	866	866
query80	845	811	811	811
query81	464	261	260	260
query82	271	263	269	263
query83	195	193	198	193
query84	225	108	110	108
query85	655	417	383	383
query86	323	334	292	292
query87	4353	4308	4581	4308
query88	4300	4102	4063	4063
query89	385	362	363	362
query90	1493	317	313	313
query91	122	135	125	125
query92	81	75	72	72
query93	912	927	907	907
query94	583	354	356	354
query95	427	415	410	410
query96	471	470	474	470
query97	3062	3104	3081	3081
query98	227	232	220	220
query99	1401	1262	1266	1262
Total cold run time: 284801 ms
Total hot run time: 192847 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.79% (9397/25545)
Line Coverage: 28.23% (77568/274784)
Region Coverage: 27.60% (39998/144925)
Branch Coverage: 24.24% (20348/83948)
Coverage Report: http://coverage.selectdb-in.cc/coverage/83e28509aa6bf3174c224a19d50f5a51460eb1fd_83e28509aa6bf3174c224a19d50f5a51460eb1fd/report/index.html

@doris-robot

ClickBench: Total hot run time: 31.69 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 83e28509aa6bf3174c224a19d50f5a51460eb1fd, data reload: false

query1	0.04	0.04	0.04
query2	0.08	0.04	0.04
query3	0.23	0.06	0.05
query4	1.65	0.07	0.07
query5	0.49	0.50	0.51
query6	1.14	0.74	0.73
query7	0.02	0.01	0.01
query8	0.06	0.04	0.04
query9	0.55	0.50	0.48
query10	0.55	0.55	0.54
query11	0.16	0.13	0.12
query12	0.16	0.13	0.12
query13	0.61	0.60	0.58
query14	1.38	1.42	1.44
query15	0.84	0.82	0.82
query16	0.36	0.37	0.37
query17	0.98	0.98	0.98
query18	0.20	0.21	0.20
query19	1.86	1.74	1.80
query20	0.01	0.01	0.02
query21	15.41	0.68	0.67
query22	4.67	6.59	2.30
query23	18.31	1.34	1.27
query24	2.14	0.22	0.22
query25	0.15	0.09	0.08
query26	0.28	0.19	0.17
query27	0.08	0.08	0.08
query28	13.22	1.03	1.00
query29	12.57	3.28	3.25
query30	0.24	0.06	0.06
query31	2.86	0.41	0.39
query32	3.27	0.48	0.48
query33	3.00	3.02	3.02
query34	17.10	4.36	4.48
query35	4.43	4.45	4.43
query36	0.67	0.47	0.46
query37	0.20	0.15	0.16
query38	0.16	0.14	0.15
query39	0.05	0.04	0.04
query40	0.16	0.13	0.12
query41	0.09	0.05	0.05
query42	0.05	0.05	0.05
query43	0.05	0.04	0.05
Total cold run time: 110.53 s
Total hot run time: 31.69 s

@eldenmoon
Member Author

run buildall

@@ -170,6 +172,8 @@ public abstract class Type {
typeMap.put("MAP", Type.MAP);
typeMap.put("OBJECT", Type.UNSUPPORTED);
typeMap.put("ARRAY", Type.ARRAY);
typeMap.put("IPV4", Type.IPV4);
Contributor

This change needs to be picked to branch-2.1.

virtual Field get_type_field(const IColumn& column, int row) const {
Field field;
column.get(row, field);
field.set_type_info(get_type_id());
Contributor

You can call get_precision() and get_frac() for all data types.

Member Author

Calling virtual functions here is slow.

@@ -81,6 +81,15 @@ class DataTypeJsonb final : public IDataType {
return String(value.value(), value.size());
Contributor

fix it?

Member Author

maybe next PR

@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

// Currently the jsonb type should be the top level type, so we should not wrap it in array,
// see create_array_of_type.
// TODO we need to support array<jsonb> correctly
if (UNLIKELY(field.get_type_id() == TypeIndex::JSONB && info->num_dimensions > 0)) {
Contributor

warning: boolean expression can be simplified by DeMorgan's theorem [readability-simplify-boolean-expr]

    if (UNLIKELY(field.get_type_id() == TypeIndex::JSONB && info->num_dimensions > 0)) {
        ^
Additional context

be/src/common/compiler_util.h:35: expanded from macro 'UNLIKELY'

#define UNLIKELY(expr) __builtin_expect(!!(expr), 0)
                                         ^

@@ -248,7 +248,8 @@ DataTypePtr DataTypeFactory::create_data_type(const TypeDescriptor& col_desc, bo
return nested;
}

DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable) {
DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable,
Contributor

warning: function 'create_data_type' exceeds recommended size/complexity thresholds [readability-function-size]

DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable,
                             ^
Additional context

be/src/vec/data_types/data_type_factory.cpp:250: 114 lines including whitespace and comments (threshold 80)

DataTypePtr DataTypeFactory::create_data_type(const TypeIndex& type_index, bool is_nullable,
                             ^

@@ -20,6 +20,7 @@
#include <glog/logging.h>
Contributor

warning: 'glog/logging.h' file not found [clang-diagnostic-error]

#include <glog/logging.h>
         ^

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.39% (9665/25852)
Line Coverage: 28.69% (80240/279686)
Region Coverage: 28.09% (41429/147507)
Branch Coverage: 24.70% (21104/85434)
Coverage Report: http://coverage.selectdb-in.cc/coverage/98224ecdbc56f5064bdeae2af127d9976f248f86_98224ecdbc56f5064bdeae2af127d9976f248f86/report/index.html

@eldenmoon
Member Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.38% (9664/25852)
Line Coverage: 28.69% (80243/279686)
Region Coverage: 28.09% (41431/147505)
Branch Coverage: 24.70% (21105/85432)
Coverage Report: http://coverage.selectdb-in.cc/coverage/11d404d36df24553cfc902b1b0e4484ea49511c1_11d404d36df24553cfc902b1b0e4484ea49511c1/report/index.html

validateNestedType(scalarType, fieldType);
if (!fieldNames.add(field.getName())) {
throw new AnalysisException("Duplicate field name " + field.getName()
+ " in struct " + scalarType.toSql());
Contributor

in variant
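
The comment above appears to ask that the error message name the variant type rather than struct. A minimal sketch of such a duplicate-name check follows. It is an illustration only: `VariantFieldValidator` and `checkNoDuplicates` are hypothetical names, and `IllegalArgumentException` stands in for the FE's `AnalysisException` so that the example compiles on its own.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; not part of the PR.
final class VariantFieldValidator {
    private VariantFieldValidator() {}

    // Rejects a sub-schema that declares the same sub-path twice.
    // variantSql is the SQL text of the enclosing type, used only in the message.
    static void checkNoDuplicates(List<String> fieldNames, String variantSql) {
        Set<String> seen = new HashSet<>();
        for (String name : fieldNames) {
            // Set.add returns false when the element is already present.
            if (!seen.add(name)) {
                throw new IllegalArgumentException(
                        "Duplicate field name " + name + " in variant " + variantSql);
            }
        }
    }
}
```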

@@ -276,6 +276,8 @@ TabletColumn TabletReader::materialize_column(const TabletColumn& orig) {
cast_type.type);
}
column_with_cast_type.set_type(filed_type);
column_with_cast_type.set_precision_frac(cast_type.precision, cast_type.scale);
column_with_cast_type.set_is_decimal(cast_type.precision > 0);
Contributor

Using cast_type.precision > 0 to decide is_decimal is not rigorous. You can use cast_type.is_decimal_v2_type() and cast_type.is_decimal_v3_type(), or add a new is_decimal_type for cast_type.
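
The concern does not depend on the language, so it can be illustrated with a small Java sketch instead of the BE's C++. Everything in it is hypothetical (the `CastTypeSketch` class, its `Kind` tags and both method names); it only contrasts inferring "decimal" from a precision value with asking the type tag directly.

```java
// Hypothetical model of a cast target type; the real BE type descriptor differs.
final class CastTypeSketch {
    // Hypothetical type tags, for illustration only.
    enum Kind { INT, DOUBLE, DATETIME, DECIMAL_V2, DECIMAL_V3 }

    private final Kind kind;
    private final int precision;
    private final int scale;

    CastTypeSketch(Kind kind, int precision, int scale) {
        this.kind = kind;
        this.precision = precision;
        this.scale = scale;
    }

    int scale() { return scale; }

    // Fragile: infers "decimal" from a side attribute instead of the type itself.
    boolean isDecimalByPrecision() {
        return precision > 0;
    }

    // More rigorous: decides from the type tag alone.
    boolean isDecimalType() {
        return kind == Kind.DECIMAL_V2 || kind == Kind.DECIMAL_V3;
    }
}
```

In this sketch the two checks agree for a decimal type, but any non-decimal type that happened to carry a non-zero precision would be misclassified by the precision-based check.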

@@ -992,6 +1001,12 @@ Status VerticalSegmentWriter::_append_block_with_variant_subcolumns(RowsInBlock&
auto full_path = full_path_builder.append(parent_column->name_lower_case(), false)
.append(entry->path.get_parts(), false)
.build();
if (typed_columns.contains(entry->path.get_path())) {
Contributor

What's the purpose of typed_columns?

Field get_type_field(const IColumn& column, int row) const override {
Field field;
column.get(row, field);
field.set_type_info(get_type_id(), 0, static_cast<int>(get_scale()));
Contributor

Is it correct to pass 0 as the precision here?

assert_cast<const ColumnUInt64&, TypeCheckOnRelease::DISABLE>(column);
Field field;
column_data.get(row, field);
field.set_type_info(get_type_id(), 0, static_cast<int>(get_scale()));
Contributor

Is it correct to pass 0 as the precision here?

@eldenmoon eldenmoon marked this pull request as draft November 11, 2024 16:07
@eldenmoon
Member Author

run buildall

@doris-robot

TPC-H: Total hot run time: 40205 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit a267cd5b87336cd392c2ab512b1ddcb88044d7d5, data reload: false

------ Round 1 ----------------------------------
q1	17729	7703	7329	7329
q2	2059	176	177	176
q3	10752	1109	1219	1109
q4	10363	755	775	755
q5	7627	2748	2786	2748
q6	239	149	146	146
q7	1009	619	597	597
q8	9247	1907	1994	1907
q9	6582	6462	6430	6430
q10	7020	2309	2288	2288
q11	470	262	263	262
q12	426	217	213	213
q13	17757	2999	3023	2999
q14	261	211	222	211
q15	586	524	510	510
q16	669	582	582	582
q17	1003	616	534	534
q18	7651	6704	6681	6681
q19	1375	1056	1069	1056
q20	476	184	197	184
q21	4009	3237	3172	3172
q22	375	322	316	316
Total cold run time: 107685 ms
Total hot run time: 40205 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7380	7282	7356	7282
q2	334	227	229	227
q3	2895	2874	2939	2874
q4	2064	1847	1852	1847
q5	5667	5683	5698	5683
q6	235	143	142	142
q7	2258	1796	1828	1796
q8	3457	3575	3575	3575
q9	8736	8924	8862	8862
q10	3613	3562	3558	3558
q11	596	515	512	512
q12	824	595	587	587
q13	11645	3292	3250	3250
q14	306	266	270	266
q15	582	525	512	512
q16	674	659	658	658
q17	1868	1668	1630	1630
q18	8408	7868	7684	7684
q19	1727	1583	1581	1581
q20	2131	1881	1878	1878
q21	5476	5446	5200	5200
q22	660	566	636	566
Total cold run time: 71536 ms
Total hot run time: 60170 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.29% (9976/26051)
Line Coverage: 29.42% (83548/283987)
Region Coverage: 28.53% (42954/150533)
Branch Coverage: 25.16% (21821/86712)
Coverage Report: http://coverage.selectdb-in.cc/coverage/a267cd5b87336cd392c2ab512b1ddcb88044d7d5_a267cd5b87336cd392c2ab512b1ddcb88044d7d5/report/index.html

@doris-robot

TPC-DS: Total hot run time: 197059 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit a267cd5b87336cd392c2ab512b1ddcb88044d7d5, data reload: false

query1	1255	985	936	936
query2	6265	2153	2117	2117
query3	10929	4073	4037	4037
query4	67214	29277	23784	23784
query5	4985	481	439	439
query6	407	184	178	178
query7	5594	302	290	290
query8	329	245	227	227
query9	8946	2714	2668	2668
query10	462	255	270	255
query11	17356	15303	15767	15303
query12	154	105	106	105
query13	1497	404	411	404
query14	10847	7486	7341	7341
query15	208	183	186	183
query16	7080	476	452	452
query17	1063	562	564	562
query18	1845	298	312	298
query19	216	167	173	167
query20	119	111	109	109
query21	209	102	114	102
query22	4811	4459	4442	4442
query23	34931	34435	34288	34288
query24	5402	2532	2492	2492
query25	492	393	393	393
query26	688	146	150	146
query27	1787	281	296	281
query28	4491	2469	2498	2469
query29	679	419	411	411
query30	217	149	157	149
query31	1027	831	838	831
query32	69	53	57	53
query33	415	296	283	283
query34	969	518	540	518
query35	900	807	773	773
query36	1054	975	984	975
query37	128	70	74	70
query38	4481	4398	4299	4299
query39	1492	1466	1460	1460
query40	205	104	101	101
query41	47	44	44	44
query42	109	97	95	95
query43	534	474	483	474
query44	1212	843	817	817
query45	193	168	168	168
query46	1170	699	748	699
query47	2067	1892	1914	1892
query48	426	309	313	309
query49	747	395	397	395
query50	857	405	400	400
query51	7423	7174	7204	7174
query52	106	85	87	85
query53	254	183	172	172
query54	507	398	381	381
query55	82	75	75	75
query56	251	237	249	237
query57	1284	1166	1150	1150
query58	219	219	221	219
query59	3266	3144	3054	3054
query60	270	241	263	241
query61	109	107	118	107
query62	783	660	690	660
query63	223	195	195	195
query64	1364	659	627	627
query65	3318	3222	3239	3222
query66	699	300	299	299
query67	15864	15594	15601	15594
query68	4114	578	551	551
query69	452	258	259	258
query70	1152	1130	1112	1112
query71	348	251	247	247
query72	6420	4069	4008	4008
query73	758	354	360	354
query74	10269	8991	9009	8991
query75	3403	2672	2653	2653
query76	1893	1082	1150	1082
query77	507	292	292	292
query78	10562	9469	9364	9364
query79	1465	607	607	607
query80	886	427	427	427
query81	511	247	222	222
query82	1319	122	115	115
query83	194	158	144	144
query84	274	70	70	70
query85	881	370	297	297
query86	336	286	292	286
query87	4637	4529	4579	4529
query88	3698	2193	2147	2147
query89	432	300	299	299
query90	2012	187	185	185
query91	143	103	105	103
query92	60	49	50	49
query93	1816	546	548	546
query94	772	249	285	249
query95	345	314	244	244
query96	610	279	276	276
query97	2843	2678	2691	2678
query98	210	191	194	191
query99	1919	1353	1307	1307
Total cold run time: 319843 ms
Total hot run time: 197059 ms

@doris-robot

ClickBench: Total hot run time: 32.62 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit a267cd5b87336cd392c2ab512b1ddcb88044d7d5, data reload: false

query1	0.04	0.03	0.03
query2	0.08	0.04	0.04
query3	0.23	0.06	0.06
query4	1.64	0.10	0.10
query5	0.40	0.41	0.41
query6	1.14	0.67	0.65
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.59	0.50	0.51
query10	0.56	0.56	0.55
query11	0.14	0.11	0.11
query12	0.14	0.11	0.10
query13	0.61	0.60	0.59
query14	2.72	2.84	2.77
query15	0.91	0.82	0.82
query16	0.38	0.38	0.38
query17	1.01	1.06	0.98
query18	0.22	0.21	0.21
query19	1.97	1.87	2.04
query20	0.02	0.00	0.02
query21	15.36	0.62	0.59
query22	2.71	2.66	1.41
query23	17.16	0.93	0.76
query24	3.24	1.98	1.40
query25	0.23	0.34	0.10
query26	0.50	0.14	0.14
query27	0.05	0.04	0.05
query28	9.41	1.10	1.07
query29	12.52	3.19	3.21
query30	0.26	0.06	0.06
query31	2.85	0.40	0.38
query32	3.27	0.47	0.47
query33	2.95	3.06	3.06
query34	16.93	4.47	4.49
query35	4.56	4.47	4.52
query36	0.67	0.49	0.49
query37	0.09	0.06	0.05
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.16	0.12	0.12
query41	0.08	0.02	0.02
query42	0.04	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.02 s
Total hot run time: 32.62 s

@eldenmoon
Member Author

run buildall

@eldenmoon
Member Author

run buildall

3 participants