You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I had searched in the issues and found no similar issues.
Version
2.0.3
What's Wrong?
when I insert data into sink_table from source_table, START_CALL_TIME filed is of DATE type in sink_table and is of DATETYPE type in source_table, the inconsistent types lead to data loss when inserting data using nereids planner, but it is ok using old planner。
this is my sql:
INSERT INTO sink_table SELECT *
FROM (
SELECT DISTINCT acct_month, call_date, device_number, OPPOSE_NUMBER, org_trm_id
, START_CALL_TIME, call_duration
FROM source_table
WHERE acct_month = SUBSTR("20240131", 1, 6) AND call_date = SUBSTR("20240131", -2)
UNION
SELECT DISTINCT acct_month, call_date, OPPOSE_NUMBER AS device_number, device_number AS OPPOSE_NUMBER, org_trm_id
, START_CALL_TIME, call_duration
FROM source_table
WHERE acct_month = SUBSTR("20240131", 1, 6) AND call_date = SUBSTR("20240131", -2)
) alias_17125550933410;
correct result is 27837957 rows in old planner, but the wrong result is 27531867 rows in nereids planner.
but when change START_CALL_TIME to DATETIME type in sink_table, nereids planner is right.
What You Expected?
is there something wrong in nereids optimizer?
How to Reproduce?
Because this problem occurs in a production environment, we can mock table to get similiar plan:
sink_table:
source_table:
data in test.csv:
10000,2017-10-01,北京,20,0,2017-10-01 06:00:00,20,10,10
10000,2017-10-01,北京,20,0,2017-10-01 07:00:00,15,2,2
10001,2017-10-01,北京,30,1,2017-10-01 17:05:45,2,22,22
10002,2017-10-02,上海,20,1,2017-10-02 12:59:12,200,5,5
10003,2017-10-02,广州,32,0,2017-10-02 11:20:00,30,11,11
10004,2017-10-01,深圳,35,0,2017-10-01 10:00:15,100,3,3
10004,2017-10-03,深圳,35,0,2017-10-03 10:20:22,11,6,6
insert into sink_table from source_table:
insert into sink_table
select * from (
select DISTINCT user_id, date, city, age, sex,last_visit_date,cost,max_dwell_time,min_dwell_time from source_table where city='北京'
union
select DISTINCT user_id, date, city, age, sex,last_visit_date,cost,max_dwell_time,min_dwell_time from source_table where city='深圳'
) t;
The text was updated successfully, but these errors were encountered:
yx-keith
changed the title
[Bug] data is lost when inserting into using nereids optimizer
[Bug] data is lost when inserting into using nereids planner
May 10, 2024
yx-keith
changed the title
[Bug] data is lost when inserting into using nereids planner
[Bug] data is lost when inserting into using nereids planner when type is inconsistent
May 10, 2024
yx-keith
changed the title
[Bug] data is lost when inserting into using nereids planner when type is inconsistent
[Bug] data is lost when inserting into using nereids planner when type is inconsistent(DATETIME TO DATE)
May 10, 2024
Search before asking
Version
2.0.3
What's Wrong?
when I insert data into sink_table from source_table, START_CALL_TIME filed is of DATE type in sink_table and is of DATETYPE type in source_table, the inconsistent types lead to data loss when inserting data using nereids planner, but it is ok using old planner。
this is my sql:
INSERT INTO sink_table SELECT *
FROM (
SELECT DISTINCT acct_month, call_date, device_number, OPPOSE_NUMBER, org_trm_id
, START_CALL_TIME, call_duration
FROM source_table
WHERE acct_month = SUBSTR("20240131", 1, 6) AND call_date = SUBSTR("20240131", -2)
UNION
SELECT DISTINCT acct_month, call_date, OPPOSE_NUMBER AS device_number, device_number AS OPPOSE_NUMBER, org_trm_id
, START_CALL_TIME, call_duration
FROM source_table
WHERE acct_month = SUBSTR("20240131", 1, 6) AND call_date = SUBSTR("20240131", -2)
) alias_17125550933410;
correct result is 27837957 rows in old planner, but the wrong result is 27531867 rows in nereids planner.
but when change START_CALL_TIME to DATETIME type in sink_table, nereids planner is right.
What You Expected?
is there something wrong in nereids optimizer?
How to Reproduce?
Because this problem occurs in a production environment, we can mock table to get similiar plan:
sink_table:
source_table:
data in test.csv:
10000,2017-10-01,北京,20,0,2017-10-01 06:00:00,20,10,10
10000,2017-10-01,北京,20,0,2017-10-01 07:00:00,15,2,2
10001,2017-10-01,北京,30,1,2017-10-01 17:05:45,2,22,22
10002,2017-10-02,上海,20,1,2017-10-02 12:59:12,200,5,5
10003,2017-10-02,广州,32,0,2017-10-02 11:20:00,30,11,11
10004,2017-10-01,深圳,35,0,2017-10-01 10:00:15,100,3,3
10004,2017-10-03,深圳,35,0,2017-10-03 10:20:22,11,6,6
load test.csv to source_table:
curl --location-trusted -u root: -T test.csv -H "column_separator:," http://127.0.0.1:8030/api/demo/source_table/_stream_load
insert into sink_table from source_table:
insert into sink_table
select * from (
select DISTINCT user_id, date, city, age, sex,last_visit_date,cost,max_dwell_time,min_dwell_time from source_table where city='北京'
union
select DISTINCT user_id, date, city, age, sex,last_visit_date,cost,max_dwell_time,min_dwell_time from source_table where city='深圳'
) t;
nereids plan:
old planner:
Anything Else?
No response
Are you willing to submit PR?
Code of Conduct
The text was updated successfully, but these errors were encountered: