[fix](function) fix Substring/SubReplace error result with input utf8 string #40929

Merged on Sep 19, 2024 (2 commits)

Mryange (Contributor) commented Sep 18, 2024

Proposed changes

Before this fix, multi-byte UTF-8 input produced wrong results:

mysql [(none)]>select sub_replace("你好世界","a",1);
+-------------------------------------+
| sub_replace('你好世界', 'a', 1)     |
+-------------------------------------+
| �a�好世界                             |
+-------------------------------------+
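
The garbled `sub_replace` output above is what one would expect if the start position were applied as a byte offset rather than a character position. The PR does not spell this out, so the following is only an illustrative guess at the failure mode. `sub_replace_bytes` is a hypothetical helper written for this sketch, not Doris code.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical byte-indexed replace (NOT the Doris implementation):
// `start` and `len` are counted in bytes, ignoring UTF-8 character boundaries.
std::string sub_replace_bytes(const std::string& s, const std::string& repl,
                              std::size_t start, std::size_t len) {
    // Precondition for this sketch: the byte offset lies within the input.
    assert(start <= s.size());
    std::string out = s;
    // std::string::replace clamps `len` to the end of the string.
    out.replace(start, len, repl);
    return out;
}
```

Calling `sub_replace_bytes("你好世界", "a", 1, 1)` overwrites the second byte of the three-byte character 你 (E4 BD A0), leaving E4, `a`, A0 followed by 好世界. The two orphaned bytes are not valid UTF-8 on their own, which a terminal typically renders as �.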



mysql [(none)]>select SUBSTRING('中文测试',5);
+------------------------------------------+
| substring('中文测试', 5, 2147483647)     |
+------------------------------------------+
| 中文测试                                 |
+------------------------------------------+
1 row in set (0.04 sec)



After this fix:
mysql [(none)]>select sub_replace("你好世界","a",1);
+-------------------------------------+
| sub_replace('你好世界', 'a', 1)     |
+-------------------------------------+
| 你a世界                             |
+-------------------------------------+
1 row in set (0.05 sec)

mysql [(none)]>select SUBSTRING('中文测试',5);
+------------------------------------------+
| substring('中文测试', 5, 2147483647)     |
+------------------------------------------+
|                                          |
+------------------------------------------+
1 row in set (0.13 sec)
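
The corrected outputs above are consistent with positions and lengths being counted in UTF-8 characters instead of bytes. A minimal sketch of that idea follows. The function names (`utf8_char_offsets`, `substring_utf8`, `sub_replace_utf8`) are hypothetical and this is not the code merged in the PR. It assumes well-formed UTF-8, handles only positive `SUBSTRING` positions, and assumes that an out-of-range `sub_replace` argument yields no value and that the default replace length is the character count of the replacement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>
#include <vector>

// Byte offset of every character start, plus a final entry equal to s.size().
// A byte starts a character unless it is a continuation byte (10xxxxxx).
std::vector<std::size_t> utf8_char_offsets(std::string_view s) {
    std::vector<std::size_t> offsets;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            offsets.push_back(i);
        }
    }
    offsets.push_back(s.size());
    return offsets;
}

// SUBSTRING-like helper: `pos` is a 1-based character position, `len` a
// character count. Non-positive `pos` returns "" here as a simplification.
std::string substring_utf8(std::string_view s, std::int64_t pos, std::int64_t len) {
    const std::vector<std::size_t> offsets = utf8_char_offsets(s);
    const auto char_count = static_cast<std::int64_t>(offsets.size()) - 1;
    if (pos <= 0 || pos > char_count || len <= 0) {
        return {};
    }
    const std::int64_t first = pos - 1;
    // Clamp without computing first + len, which could overflow for huge len.
    const std::int64_t last = (len > char_count - first) ? char_count : first + len;
    assert(first <= last);
    const std::size_t begin = offsets[static_cast<std::size_t>(first)];
    const std::size_t end = offsets[static_cast<std::size_t>(last)];
    return std::string(s.substr(begin, end - begin));
}

// sub_replace-like helper: `start` is a 0-based character index, `len` a
// character count. Returning nullopt for bad arguments is an assumption.
std::optional<std::string> sub_replace_utf8(std::string_view s, std::string_view repl,
                                            std::int64_t start, std::int64_t len) {
    const std::vector<std::size_t> offsets = utf8_char_offsets(s);
    const auto char_count = static_cast<std::int64_t>(offsets.size()) - 1;
    if (start < 0 || len < 0 || start > char_count) {
        return std::nullopt;
    }
    const std::int64_t end = (len > char_count - start) ? char_count : start + len;
    std::string out(s.substr(0, offsets[static_cast<std::size_t>(start)]));
    out.append(repl);
    out.append(s.substr(offsets[static_cast<std::size_t>(end)]));
    return out;
}

// Three-argument form: the length defaults to the character count of `repl`.
std::optional<std::string> sub_replace_utf8(std::string_view s, std::string_view repl,
                                            std::int64_t start) {
    const auto repl_chars = static_cast<std::int64_t>(utf8_char_offsets(repl).size()) - 1;
    return sub_replace_utf8(s, repl, start, repl_chars);
}
```

Under these assumptions, `sub_replace_utf8("你好世界", "a", 1)` yields 你a世界 and `substring_utf8("中文测试", 5, 2147483647)` yields an empty string, matching the results shown after the fix. Building the offset table once per string keeps each lookup a simple array index, at the cost of one extra pass over the bytes.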

@doris-robot

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR

Since 2024-03-18, the Document has been moved to doris-website.
See Doris Document.

Mryange (Contributor Author) commented Sep 18, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 41849 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit adeaa7b53d75c8f3e4627d71213500a4c4aa15a6, data reload: false

------ Round 1 ----------------------------------
q1	10336	7395	7304	7304
q2	950	178	160	160
q3	2934	1157	1128	1128
q4	5747	814	766	766
q5	4493	3181	3103	3103
q6	241	153	152	152
q7	1033	629	623	623
q8	5518	2040	2058	2040
q9	6521	6419	6430	6419
q10	2591	2305	2349	2305
q11	393	238	253	238
q12	402	221	221	221
q13	17546	2968	3001	2968
q14	242	215	221	215
q15	585	532	524	524
q16	668	629	636	629
q17	998	838	817	817
q18	7425	6679	6815	6679
q19	1238	944	996	944
q20	561	293	276	276
q21	4020	3366	3315	3315
q22	1102	1029	1023	1023
Total cold run time: 75544 ms
Total hot run time: 41849 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7252	7265	7245	7245
q2	330	230	242	230
q3	2934	2800	2846	2800
q4	1976	1697	1712	1697
q5	5384	5423	5415	5415
q6	229	144	145	144
q7	2122	1744	1729	1729
q8	3196	3324	3313	3313
q9	8470	8398	8439	8398
q10	3376	3337	3312	3312
q11	578	479	487	479
q12	753	576	571	571
q13	3695	2989	3032	2989
q14	294	273	265	265
q15	563	524	524	524
q16	728	670	679	670
q17	1765	1553	1541	1541
q18	7653	7417	7392	7392
q19	1678	1530	1517	1517
q20	2066	1811	1786	1786
q21	5386	5093	5145	5093
q22	1109	1052	1001	1001
Total cold run time: 61537 ms
Total hot run time: 58111 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.32% (9582/25673)
Line Coverage: 28.71% (79209/275932)
Region Coverage: 28.18% (41015/145525)
Branch Coverage: 24.81% (20903/84258)
Coverage Report: http://coverage.selectdb-in.cc/coverage/adeaa7b53d75c8f3e4627d71213500a4c4aa15a6_adeaa7b53d75c8f3e4627d71213500a4c4aa15a6/report/index.html

@doris-robot

TPC-DS: Total hot run time: 199197 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit adeaa7b53d75c8f3e4627d71213500a4c4aa15a6, data reload: false

query1	958	365	372	365
query2	6518	2121	2034	2034
query3	6720	213	218	213
query4	34307	23426	23530	23426
query5	4322	480	448	448
query6	267	185	166	166
query7	4624	297	295	295
query8	274	215	214	214
query9	9696	2642	2638	2638
query10	472	279	302	279
query11	18124	15056	15316	15056
query12	158	99	101	99
query13	1623	411	397	397
query14	10504	7312	7391	7312
query15	315	178	182	178
query16	8078	460	480	460
query17	1804	580	571	571
query18	2129	315	304	304
query19	357	151	148	148
query20	118	120	110	110
query21	215	104	104	104
query22	4551	4195	4438	4195
query23	34686	34026	37045	34026
query24	11619	3152	3033	3033
query25	582	516	493	493
query26	1204	230	235	230
query27	2491	407	415	407
query28	8226	3244	3220	3220
query29	616	596	595	595
query30	342	175	174	174
query31	974	833	840	833
query32	94	74	72	72
query33	782	362	384	362
query34	810	693	712	693
query35	803	832	775	775
query36	1020	990	977	977
query37	170	156	161	156
query38	3949	3971	3910	3910
query39	1437	1440	1478	1440
query40	298	134	132	132
query41	54	52	51	51
query42	147	150	149	149
query43	523	519	540	519
query44	1670	1527	1502	1502
query45	186	184	189	184
query46	1030	1018	1006	1006
query47	1822	1822	1828	1822
query48	485	501	481	481
query49	1271	492	479	479
query50	718	721	733	721
query51	7098	6933	7057	6933
query52	134	128	128	128
query53	289	286	270	270
query54	1218	651	661	651
query55	107	113	107	107
query56	334	333	324	324
query57	1144	1094	1091	1091
query58	279	279	282	279
query59	2900	2968	2986	2968
query60	341	349	334	334
query61	127	121	119	119
query62	799	687	798	687
query63	271	276	272	272
query64	5130	989	964	964
query65	3227	3183	3216	3183
query66	1060	356	357	356
query67	15424	15420	15387	15387
query68	3124	862	846	846
query69	439	344	339	339
query70	1125	1185	1173	1173
query71	333	336	335	335
query72	5295	3429	3300	3300
query73	586	584	582	582
query74	9095	8972	8989	8972
query75	3078	2811	2819	2811
query76	2014	860	874	860
query77	388	367	358	358
query78	9312	9417	9210	9210
query79	922	933	883	883
query80	587	570	562	562
query81	447	253	251	251
query82	233	229	232	229
query83	166	159	161	159
query84	242	103	111	103
query85	683	372	350	350
query86	324	333	315	315
query87	4349	4282	4434	4282
query88	4330	4085	4113	4085
query89	376	360	357	357
query90	1408	318	322	318
query91	166	166	165	165
query92	78	72	75	72
query93	912	912	899	899
query94	536	368	390	368
query95	425	422	417	417
query96	488	491	490	490
query97	3094	3108	3131	3108
query98	228	227	228	227
query99	1391	1301	1264	1264
Total cold run time: 290199 ms
Total hot run time: 199197 ms

@doris-robot

ClickBench: Total hot run time: 31.84 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit adeaa7b53d75c8f3e4627d71213500a4c4aa15a6, data reload: false

query1	0.05	0.04	0.04
query2	0.06	0.02	0.02
query3	0.23	0.06	0.07
query4	1.65	0.10	0.10
query5	0.53	0.51	0.49
query6	1.13	0.74	0.72
query7	0.02	0.01	0.01
query8	0.04	0.03	0.02
query9	0.55	0.51	0.49
query10	0.56	0.55	0.55
query11	0.14	0.10	0.11
query12	0.14	0.11	0.11
query13	0.60	0.59	0.58
query14	3.07	2.99	2.97
query15	0.88	0.82	0.82
query16	0.39	0.38	0.38
query17	1.05	1.07	1.09
query18	0.22	0.20	0.21
query19	1.98	1.91	1.82
query20	0.02	0.01	0.01
query21	15.35	0.60	0.60
query22	2.66	2.81	1.09
query23	17.34	0.91	0.78
query24	3.25	1.19	0.75
query25	0.26	0.13	0.14
query26	0.42	0.14	0.14
query27	0.04	0.03	0.04
query28	10.52	1.09	1.06
query29	12.58	3.24	3.23
query30	0.25	0.06	0.06
query31	2.86	0.38	0.38
query32	3.26	0.48	0.46
query33	2.96	3.02	3.09
query34	16.72	4.38	4.40
query35	4.38	4.43	4.41
query36	0.66	0.47	0.52
query37	0.09	0.05	0.06
query38	0.04	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.12	0.12
query41	0.08	0.02	0.03
query42	0.03	0.02	0.02
query43	0.03	0.03	0.03
Total cold run time: 107.28 s
Total hot run time: 31.84 s

ColumnString::Offsets& res_offsets = result_column->get_offsets();
PaddedPODArray<size_t> index;

for (size_t row = 0; row < input_rows_count; ++row) {

Contributor review comment:

recheck the logic again?

Mryange (Contributor Author) commented Sep 18, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 42087 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 2fd5e86dce5017e8e2444b29f82748bd7dfd57bd, data reload: false

------ Round 1 ----------------------------------
q1	17597	7309	7261	7261
q2	2051	159	159	159
q3	10703	1185	1209	1185
q4	10688	789	792	789
q5	7803	3173	3120	3120
q6	237	158	155	155
q7	1043	622	625	622
q8	9668	2090	2073	2073
q9	7031	6494	6516	6494
q10	7884	2313	2293	2293
q11	442	261	247	247
q12	412	232	216	216
q13	17824	3008	2943	2943
q14	248	211	213	211
q15	570	540	534	534
q16	677	613	623	613
q17	995	834	820	820
q18	7705	6909	6972	6909
q19	1399	1073	944	944
q20	608	305	297	297
q21	4060	3203	3244	3203
q22	1110	999	1006	999
Total cold run time: 110755 ms
Total hot run time: 42087 ms

----- Round 2, with runtime_filter_mode=off -----
q1	7338	7249	7276	7249
q2	326	231	228	228
q3	2923	2791	2792	2791
q4	1950	1726	1680	1680
q5	5336	5395	5461	5395
q6	228	146	146	146
q7	2138	1712	1737	1712
q8	3190	3300	3310	3300
q9	8381	8396	8429	8396
q10	3391	3344	3317	3317
q11	595	479	471	471
q12	774	570	565	565
q13	6376	2975	2987	2975
q14	286	267	271	267
q15	556	522	527	522
q16	703	661	671	661
q17	1762	1567	1512	1512
q18	7754	7447	7308	7308
q19	1659	1609	1405	1405
q20	2059	1773	1815	1773
q21	5373	5220	5272	5220
q22	1117	1045	1045	1045
Total cold run time: 64215 ms
Total hot run time: 57938 ms

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 37.33% (9583/25673)
Line Coverage: 28.72% (79241/275932)
Region Coverage: 28.19% (41023/145524)
Branch Coverage: 24.81% (20903/84256)
Coverage Report: http://coverage.selectdb-in.cc/coverage/2fd5e86dce5017e8e2444b29f82748bd7dfd57bd_2fd5e86dce5017e8e2444b29f82748bd7dfd57bd/report/index.html

@doris-robot

TPC-DS: Total hot run time: 195066 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 2fd5e86dce5017e8e2444b29f82748bd7dfd57bd, data reload: false

query1	955	367	368	367
query2	6536	2085	2061	2061
query3	6719	215	236	215
query4	34336	23417	23512	23417
query5	4296	484	476	476
query6	269	165	161	161
query7	4631	302	316	302
query8	294	224	217	217
query9	9690	2651	2655	2651
query10	477	303	282	282
query11	18080	15218	15406	15218
query12	157	96	97	96
query13	1643	417	398	398
query14	10475	7499	7506	7499
query15	336	177	183	177
query16	8125	408	446	408
query17	1758	594	562	562
query18	2170	316	305	305
query19	379	145	151	145
query20	120	104	113	104
query21	214	107	104	104
query22	4661	4181	4341	4181
query23	34856	34044	34227	34044
query24	11217	2937	2940	2937
query25	674	416	417	416
query26	1391	164	169	164
query27	2784	300	295	295
query28	8125	2479	2416	2416
query29	910	434	430	430
query30	317	160	155	155
query31	1000	793	806	793
query32	98	64	59	59
query33	778	292	293	292
query34	966	518	520	518
query35	913	728	744	728
query36	1080	936	913	913
query37	166	89	90	89
query38	3955	3926	3987	3926
query39	1481	1413	1396	1396
query40	294	101	102	101
query41	52	51	46	46
query42	115	94	100	94
query43	545	477	488	477
query44	1237	832	801	801
query45	199	162	175	162
query46	1135	760	754	754
query47	1887	1784	1808	1784
query48	462	373	372	372
query49	1135	422	400	400
query50	824	403	411	403
query51	7128	6930	6959	6930
query52	103	89	88	88
query53	257	184	183	183
query54	1299	470	475	470
query55	87	78	76	76
query56	275	274	291	274
query57	1215	1084	1087	1084
query58	261	243	244	243
query59	3230	3148	2894	2894
query60	314	304	269	269
query61	104	108	101	101
query62	850	649	649	649
query63	222	191	187	187
query64	5198	658	645	645
query65	3218	3179	3312	3179
query66	1440	302	301	301
query67	16017	15404	15461	15404
query68	3104	572	583	572
query69	442	290	286	286
query70	1205	1130	1128	1128
query71	330	288	275	275
query72	5912	4032	3972	3972
query73	765	333	339	333
query74	9554	9048	9163	9048
query75	3421	2644	2652	2644
query76	2085	942	899	899
query77	418	291	293	291
query78	9855	9305	9590	9305
query79	1304	913	891	891
query80	892	609	587	587
query81	524	255	255	255
query82	809	239	231	231
query83	208	159	167	159
query84	242	105	97	97
query85	768	369	369	369
query86	394	292	322	292
query87	4464	4340	4499	4340
query88	4511	4101	4103	4101
query89	382	371	363	363
query90	1798	327	332	327
query91	168	169	170	169
query92	80	72	73	72
query93	945	927	918	918
query94	824	353	354	353
query95	467	424	421	421
query96	491	490	486	486
query97	3176	3087	3125	3087
query98	224	225	232	225
query99	1442	1314	1282	1282
Total cold run time: 297339 ms
Total hot run time: 195066 ms

@doris-robot

ClickBench: Total hot run time: 32.56 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 2fd5e86dce5017e8e2444b29f82748bd7dfd57bd, data reload: false

query1	0.04	0.04	0.04
query2	0.06	0.03	0.03
query3	0.23	0.06	0.06
query4	1.63	0.10	0.10
query5	0.52	0.52	0.49
query6	1.13	0.72	0.72
query7	0.02	0.02	0.01
query8	0.04	0.03	0.02
query9	0.57	0.50	0.49
query10	0.54	0.56	0.54
query11	0.14	0.10	0.10
query12	0.14	0.11	0.10
query13	0.61	0.59	0.58
query14	3.08	2.99	3.04
query15	0.92	0.84	0.84
query16	0.37	0.40	0.39
query17	1.06	1.08	1.08
query18	0.22	0.21	0.21
query19	1.98	1.92	1.96
query20	0.01	0.01	0.01
query21	15.36	0.59	0.58
query22	2.65	2.03	2.48
query23	17.28	0.94	0.75
query24	2.96	0.58	1.04
query25	0.29	0.14	0.07
query26	0.38	0.15	0.14
query27	0.04	0.04	0.03
query28	11.16	1.09	1.07
query29	12.61	3.20	3.22
query30	0.24	0.06	0.05
query31	2.89	0.38	0.39
query32	3.26	0.47	0.46
query33	2.96	3.00	3.05
query34	16.97	4.35	4.38
query35	4.50	4.40	4.50
query36	0.65	0.51	0.47
query37	0.08	0.06	0.05
query38	0.05	0.04	0.03
query39	0.03	0.02	0.02
query40	0.16	0.13	0.11
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 107.98 s
Total hot run time: 32.56 s

HappenLee (Contributor) left a comment:

LGTM

PR approved by at least one committer and no changes requested.

github-actions bot added the approved label (indicates a PR has been approved by one committer) on Sep 19, 2024
PR approved by anyone and no changes requested.

@HappenLee HappenLee merged commit cee07d6 into apache:master Sep 19, 2024
23 of 28 checks passed
Mryange added a commit to Mryange/doris that referenced this pull request Sep 19, 2024
… string (apache#40929)

yiguolei pushed a commit that referenced this pull request Sep 23, 2024
#40954)

… string (#40929)
#40929

dataroaring pushed a commit that referenced this pull request Oct 9, 2024
… string (#40929)

Labels: approved, dev/2.1.7-merged, dev/3.0.3-merged, reviewed
6 participants