`lag()/at()/lead()` return show offset'th row, it is not related to window frame bound #1554

aceforeverd · 2022-03-31T09:22:59Z

Bug Description

  - id: 6
    desc: window merge optimization
    inputs:
      - columns: [ "row_id int","ts timestamp","group1 string","val1 int" ]
        indexs: [ "index1:group1:ts" ]
        name: t1
        data: |
          1, 1612130400000, g1, 1
          2, 1612130401000, g1, 2
          3, 1612130402000, g1, 3
          4, 1612130403000, g1, 4
          5, 1612130404000, g1, 5
    sql: |
      select
      `row_id` as row_id_1,
      `row_id` as t1_row_id_original_0,
      case when !isnull(at(`val1`, 0)) over t1_group1_ts_0s_5s_10 then count_where(`val1`, `val1` = at(`val1`, 0)) over t1_group1_ts_0s_5s_10 else null end as t1_val1_window_count_1,
      case when !isnull(at(`val1`, 0)) over t1_group1_ts_1s_5s_10 then count_where(`val1`, `val1` = at(`val1`, 0)) over t1_group1_ts_1s_5s_10 else null end as t1_val1_window_count_2
      from
        `t1` WINDOW
        t1_group1_ts_0s_5s_10 as (partition by `group1` order by `ts` rows_range between 5s preceding and 0s preceding MAXSIZE 10),
        t1_group1_ts_1s_5s_10 as (partition by `group1` order by `ts` rows_range between 5s preceding and 1s preceding MAXSIZE 10);
    expect:
      columns: ["row_id_1 int", "t1_row_id_original_0 int", "t1_val1_window_count_1 int64", "t1_val1_window_count_2 int64"]
      order: row_id_1
      data: |
        1, 1, 1, 0
        2, 2, 1, 0
        3, 3, 1, 0
        4, 4, 1, 0
        5, 5, 1, 0

Expected Behavior

the case should pass. Current result is not:

+----------+----------------------+------------------------+------------------------+
| row_id_1 | t1_row_id_original_0 | t1_val1_window_count_1 | t1_val1_window_count_2 |
+----------+----------------------+------------------------+------------------------+
| 1        | 1                    | 1                      | NULL                   |
| 2        | 2                    | 1                      | NULL                   |
| 3        | 3                    | 1                      | NULL                   |
| 4        | 4                    | 1                      | NULL                   |
| 5        | 5                    | 1                      | NULL                   |
+----------+----------------------+------------------------+------------------------+

Work List

add extra handling in logic plan for lag functions
- working in fix: 1554 lag correctness #1605
fix lag correctness in request mode & cluster environment
fix window merge result correctness ( may related to at/lag/first_value udf function over merged window report different result compared to those not merged #1587 )

The text was updated successfully, but these errors were encountered:

aceforeverd · 2022-03-31T09:26:24Z

High-priority Bugs

aceforeverd · 2022-03-31T09:29:10Z

bugfix-liguo

aceforeverd · 2022-04-01T15:31:17Z

the problem might be complex, I just have a look for at() function (which is alias for lag()) currently:

the output for SQL:

      select
      `row_id` as row_id_1,
      `row_id` as t1_row_id_original_0,
      `val1` as val1,
      lag(`val1`, 0) over t1_group1_ts_1s_5s_10 as t1_val1_window_count_2
      from `t1` WINDOW
        t1_group1_ts_1s_5s_10 as (partition by `group1` order by `ts` rows_range between 5s preceding and 1s preceding MAXSIZE 10);

is:

+-----------------+----------+----------------------+------+------------------------+
| MemTableHandler | row_id_1 | t1_row_id_original_0 | val1 | t1_val1_window_count_2 |
+-----------------+----------+----------------------+------+------------------------+
| 0               | 1        | 1                    | 1    | NULL                   |
| 1               | 2        | 2                    | 2    | 1                      |
| 2               | 3        | 3                    | 3    | 2                      |
| 3               | 4        | 4                    | 4    | 3                      |
| 4               | 5        | 5                    | 5    | 4                      |
+-----------------+----------+----------------------+------+------------------------+

here lag(val1, 0) returns the first row in window, that feels not corret. I tested MySQL, and the meaning there is just offsetth row relative to current row:

MariaDB [test]> select id, ts, `group`, lag(val, 0) over (partition by `group` order by ts rows between 5 preceding and 1 preceding) as l from t1;
+----+-----+-------+------+
| id | ts  | group | l    |
+----+-----+-------+------+
|  1 | 100 | g1    |    1 |
|  2 | 200 | g1    |    2 |
|  3 | 300 | g1    |    3 |
|  4 | 400 | g1    |    4 |
|  5 | 500 | g1    |    5 |
+----+-----+-------+------+
5 rows in set (0.000 sec)

MariaDB [test]> select id, ts, `group`, lag(val, 1) over (partition by `group` order by ts rows between 5 preceding and 1 preceding) as l from t1;
+----+-----+-------+------+
| id | ts  | group | l    |
+----+-----+-------+------+
|  1 | 100 | g1    | NULL |
|  2 | 200 | g1    |    1 |
|  3 | 300 | g1    |    2 |
|  4 | 400 | g1    |    3 |
|  5 | 500 | g1    |    4 |
+----+-----+-------+------+
5 rows in set (0.000 sec)

SparkSQL do not support lag over window frame, so there is no standard from spark. But to get first or nth row in window, there is first_value or nth_value in spark.
Do we have detailed definition for lag @jingchen2222 @tobegit3hub 🌝

other reference:

https://docs.microsoft.com/en-us/sql/t-sql/functions/lag-transact-sql?view=sql-server-ver15

aceforeverd · 2022-04-01T15:38:20Z

previous definition for lag and lead: #265

aceforeverd · 2022-04-01T15:40:37Z

the documents seems not correct or accurate

OpenMLDB/hybridse/src/udf/default_defs/window_functions_def.cc

Lines 97 to 115 in 06ccd67

    
                   .doc(R"( 
        
                       @brief Returns the value of expression from the offset-th row of the ordered partition. 
        
                       @param offset The number of rows forward from the current row from which to obtain the value. 
        
                       Example: 
        
                       |value| 
        
                       |--| 
        
                       |0| 
        
                       |1| 
        
                       |2| 
        
                       |3| 
        
                       |4| 
        
                       @code{.sql} 
        
                           SELECT at(value, 3) OVER w; 
        
                           -- output 3 
        
                       @endcode 
        
                   )")

aceforeverd · 2022-04-02T10:17:19Z

Bug Description

  - id: 6
    desc: window merge optimization
    inputs:
      - columns: [ "row_id int","ts timestamp","group1 string","val1 int" ]
        indexs: [ "index1:group1:ts" ]
        name: t1
        data: |
          1, 1612130400000, g1, 1
          2, 1612130401000, g1, 2
          3, 1612130402000, g1, 3
          4, 1612130403000, g1, 4
          5, 1612130404000, g1, 5
    sql: |
      select
      `row_id` as row_id_1,
      `row_id` as t1_row_id_original_0,
      case when !isnull(at(`val1`, 0)) over t1_group1_ts_0s_5s_10 then count_where(`val1`, `val1` = at(`val1`, 0)) over t1_group1_ts_0s_5s_10 else null end as t1_val1_window_count_1,
      case when !isnull(at(`val1`, 0)) over t1_group1_ts_1s_5s_10 then count_where(`val1`, `val1` = at(`val1`, 0)) over t1_group1_ts_1s_5s_10 else null end as t1_val1_window_count_2
      from
        `t1` WINDOW
        t1_group1_ts_0s_5s_10 as (partition by `group1` order by `ts` rows_range between 5s preceding and 0s preceding MAXSIZE 10),
        t1_group1_ts_1s_5s_10 as (partition by `group1` order by `ts` rows_range between 5s preceding and 1s preceding MAXSIZE 10);
    expect:
      columns: ["row_id_1 int", "t1_row_id_original_0 int", "t1_val1_window_count_1 int64", "t1_val1_window_count_2 int64"]
      order: row_id_1
      data: |
        1, 1, 1, NULL
        2, 2, 1, 1
        3, 3, 1, 1
        4, 4, 1, 1
        5, 5, 1, 1

Expected Behavior

the case should pass. Current result is not:

+----------+----------------------+------------------------+------------------------+
| row_id_1 | t1_row_id_original_0 | t1_val1_window_count_1 | t1_val1_window_count_2 |
+----------+----------------------+------------------------+------------------------+
| 1        | 1                    | 1                      | NULL                   |
| 2        | 2                    | 1                      | NULL                   |
| 3        | 3                    | 1                      | NULL                   |
| 4        | 4                    | 1                      | NULL                   |
| 5        | 5                    | 1                      | NULL                   |
+----------+----------------------+------------------------+------------------------+

Steps to Reproduce

neither the yaml case is correct. actual result should be:

at(val1, 0) refer to current row, so !isnull(at(val1, 0)) always true. And since the second window is above current row absolutely, count_where always have zero match

lumianph · 2022-04-05T09:53:47Z

@jingchen2222 can you help take a look, and probably give some suggestions where we should look into this issue?

jingchen2222 · 2022-04-05T18:39:59Z

the problem might be complex, I just have a look for at() function (which is alias for lag()) currently:

the output for SQL:

      select
      `row_id` as row_id_1,
      `row_id` as t1_row_id_original_0,
      `val1` as val1,
      lag(`val1`, 0) over t1_group1_ts_1s_5s_10 as t1_val1_window_count_2
      from `t1` WINDOW
        t1_group1_ts_1s_5s_10 as (partition by `group1` order by `ts` rows_range between 5s preceding and 1s preceding MAXSIZE 10);

is:

+-----------------+----------+----------------------+------+------------------------+
| MemTableHandler | row_id_1 | t1_row_id_original_0 | val1 | t1_val1_window_count_2 |
+-----------------+----------+----------------------+------+------------------------+
| 0               | 1        | 1                    | 1    | NULL                   |
| 1               | 2        | 2                    | 2    | 1                      |
| 2               | 3        | 3                    | 3    | 2                      |
| 3               | 4        | 4                    | 4    | 3                      |
| 4               | 5        | 5                    | 5    | 4                      |
+-----------------+----------+----------------------+------+------------------------+

here lag(val1, 0) returns the first row in window, that feels not corret. I tested MySQL, and the meaning there is just offsetth row relative to current row:

MariaDB [test]> select id, ts, `group`, lag(val, 0) over (partition by `group` order by ts rows between 5 preceding and 1 preceding) as l from t1;
+----+-----+-------+------+
| id | ts  | group | l    |
+----+-----+-------+------+
|  1 | 100 | g1    |    1 |
|  2 | 200 | g1    |    2 |
|  3 | 300 | g1    |    3 |
|  4 | 400 | g1    |    4 |
|  5 | 500 | g1    |    5 |
+----+-----+-------+------+
5 rows in set (0.000 sec)

MariaDB [test]> select id, ts, `group`, lag(val, 1) over (partition by `group` order by ts rows between 5 preceding and 1 preceding) as l from t1;
+----+-----+-------+------+
| id | ts  | group | l    |
+----+-----+-------+------+
|  1 | 100 | g1    | NULL |
|  2 | 200 | g1    |    1 |
|  3 | 300 | g1    |    2 |
|  4 | 400 | g1    |    3 |
|  5 | 500 | g1    |    4 |
+----+-----+-------+------+
5 rows in set (0.000 sec)

SparkSQL do not support lag over window frame, so there is no standard from spark. But to get first or nth row in window, there is first_value or nth_value in spark. Do we have detailed definition for lag @jingchen2222 @tobegit3hub 🌝

other reference:

https://docs.microsoft.com/en-us/sql/t-sql/functions/lag-transact-sql?view=sql-server-ver15

@aceforeverd You're right. The lag should have returned the expression of the row preceding n offset from the current row. The current implementation of the lag function, however, is based on the expression of the row preceding n offset from the start row of the window.

I think it would be a tough job.
@lumianph @aceforeverd @tobegit3hub
I suggest that firstly we can put a restriction on the usage of the lag function. That is the lag function can only be used when the associated a CurrentHisotryWindow. (a window rows/rows_range between x preceding and current row)

select id, ts, `group`, lag(val, 1) over (partition by `group` order by ts rows between 5 preceding and current row) as l from t1;

Secondly, fix the lag expression over a pure history window, a window excludes the current row.

select id, ts, `group`, lag(val, 1) over (partition by `group` order by ts rows between 5 preceding and 2 preceding) as l from t1;

We should extend the pure history window to a CurrentHisotryWindow. And then you should pass the extended window instead of the pure history window to the lag function
This might introduces some changes to the HistroyWindow class for BatchMode queries.
Consider how to construct a window for RequestMode query when the lag over a pure history window. We should keep in mind that simply scanning a range of rows and then union on the current row can not meet the requirement in this case.

Things will be worse if we are trying to deal with expression as follows:
select count_where(val, val != lag(val,1)) over (partition by `group` order by ts rows between 5 preceding and current row) as l from t1;
we should pass a current history window to lag(val, 1) while pass a pure history window to count_where(val,...)

lumianph · 2022-04-06T05:05:10Z

@jingchen2222 thank you very much. @aceforeverd can you first help evaluate the workload. i will call a meeting to discuss the solution.

lumianph · 2022-04-06T05:06:53Z

the window bug

aceforeverd · 2022-04-06T05:38:40Z

@jingchen2222 thank you very much. @aceforeverd can you first help evaluate the workload. i will call a meeting to discuss the solution.

sure

aceforeverd · 2022-04-07T02:40:35Z

@aceforeverd You're right. The lag should have returned the expression of the row preceding n offset from the current row. The current implementation of the lag function, however, is based on the expression of the row preceding n offset from the start row of the window.

I think it would be a tough job. @lumianph @aceforeverd @tobegit3hub I suggest that firstly we can put a restriction on the usage of the lag function. That is the lag function can only be used when the associated a CurrentHisotryWindow. (a window rows/rows_range between x preceding and current row)
select id, ts, `group`, lag(val, 1) over (partition by `group` order by ts rows between 5 preceding and current row) as l from t1;
Secondly, fix the lag expression over a pure history window, a window excludes the current row.
select id, ts, `group`, lag(val, 1) over (partition by `group` order by ts rows between 5 preceding and 2 preceding) as l from t1;
We should extend the pure history window to a CurrentHisotryWindow. And then you should pass the extended window instead of the pure history window to the lag function

This might introduces some changes to the HistroyWindow class for BatchMode queries.

Consider how to construct a window for RequestMode query when the lag over a pure history window. We should keep in mind that simply scanning a range of rows and then union on the current row can not meet the requirement in this case.

Things will be worse if we are trying to deal with expression as follows: select count_where(val, val != lag(val,1)) over (partition by `group` order by ts rows between 5 preceding and current row) as l from t1; we should pass a current history window to lag(val, 1) while pass a pure history window to count_where(val,...)

Thank you ~. Since the problem is the offset in lag appears out-of-range of window frame bound, so it is also possible for offset to be beyond window upper bound. Hence I'm thinking giving all lag/at function the same window frame as UNBOUND AND CURRENT ROW

It is some kind misleading in SQL view as lag is actually not related to window frame bound

aceforeverd · 2022-04-07T15:32:57Z

The first thing I'd like to do is in the Planner when transform from SQLNote, will create a separate window for all at/lag project by its order and partition, but make frame unbound preceding and current row.

Meanwhile SQL engine should support the syntax without window frame or even without partition by, e.g select lag(row) over (order by col1 partition by col2) from t1;, as it is the encouraged way to use lag function.

aceforeverd · 2022-04-08T04:34:01Z

@aceforeverd You're right. The lag should have returned the expression of the row preceding n offset from the current row. The current implementation of the lag function, however, is based on the expression of the row preceding n offset from the start row of the window.

I think it would be a tough job. @lumianph @aceforeverd @tobegit3hub I suggest that firstly we can put a restriction on the usage of the lag function. That is the lag function can only be used when the associated a CurrentHisotryWindow. (a window rows/rows_range between x preceding and current row)
select id, ts, `group`, lag(val, 1) over (partition by `group` order by ts rows between 5 preceding and current row) as l from t1;
Secondly, fix the lag expression over a pure history window, a window excludes the current row.
select id, ts, `group`, lag(val, 1) over (partition by `group` order by ts rows between 5 preceding and 2 preceding) as l from t1;
We should extend the pure history window to a CurrentHisotryWindow. And then you should pass the extended window instead of the pure history window to the lag function

This might introduces some changes to the HistroyWindow class for BatchMode queries.

Consider how to construct a window for RequestMode query when the lag over a pure history window. We should keep in mind that simply scanning a range of rows and then union on the current row can not meet the requirement in this case.

@jingchen2222 I can't understand the third point though

Things will be worse if we are trying to deal with expression as follows: select count_where(val, val != lag(val,1)) over (partition by `group` order by ts rows between 5 preceding and current row) as l from t1; we should pass a current history window to lag(val, 1) while pass a pure history window to count_where(val,...)

…rent row - logic plan: for lag/at project, it will create a new `ProjectListNode` with window frame bound to [unbound, current row] - the fix may not work in batch-request or cluster environment

jingchen2222 · 2022-04-11T02:00:10Z

thinking giving all lag/at function the same window frame as UNBOUND AND CURRENT ROW

Giving all lag/at function the same window frame as UNBOUND AND CURRENT ROW might introduce another problem.
That is the unpredictable performance of an unbounded window especially when the dataset contains hundreds and thousands of rows associated with one key?

aceforeverd · 2022-04-11T02:15:15Z

thinking giving all lag/at function the same window frame as UNBOUND AND CURRENT ROW

Giving all lag/at function the same window frame as UNBOUND AND CURRENT ROW might introduce another problem. That is the unpredictable performance of an unbounded window especially when the dataset contains hundreds and thousands of rows associated with one key?

Yeah. We do not seem to have restriction for the type of offset, so it is not determined on compile time how big the offset value will be. If offset is restricted to integer or simple expr that can transformed to integer in compile time, it may do some optimization for lag's window ?

zhanghaohit · 2022-04-11T04:23:27Z

thinking giving all lag/at function the same window frame as UNBOUND AND CURRENT ROW

Giving all lag/at function the same window frame as UNBOUND AND CURRENT ROW might introduce another problem. That is the unpredictable performance of an unbounded window especially when the dataset contains hundreds and thousands of rows associated with one key?

Yeah. We do not seem to have restriction for the type of offset, so it is not determined on compile time how big the offset value will be. If offset is restricted to integer or simple expr that can transformed to integer in compile time, it may do some optimization for lag's window ?

offset can be something that can only be known at runtime? Support type is not like this?

Supported Types:
[list<bool>, int64]
[list<date>, int64]
[list<number>, int64]
[list<string>, int64]
[list<timestamp>, int64]

aceforeverd · 2022-04-11T05:23:33Z

offset can be something that can only be known at runtime? Support type is not like this?
Supported Types:
[list<bool>, int64]
[list<date>, int64]
[list<number>, int64]
[list<string>, int64]
[list<timestamp>, int64]

yeah, it is runnable e.g lag(col1, lag(col2, 1)), I think we just need lag(col, 0) or lag(col, 1+1). Meanwhile, spark sql seems do the same restriction.

…rent row - logic plan: for lag/at project, it will create a new `ProjectListNode` with window frame bound to [unbound, current row] - the fix may not work in batch-request or cluster environment

it will create a new rows window for lag like functions

…rent row - logic plan: for lag/at project, it will create a new `ProjectListNode` with window frame bound to [unbound, current row] - the fix may not work in batch-request or cluster environment

it will create a new rows window for lag like functions

* add a few debug log for sql parser * fix(#1554): lag results always evaluated with respect to current row

aceforeverd added bug Something isn't working community issues from the community labels Mar 31, 2022

aceforeverd self-assigned this Mar 31, 2022

aceforeverd changed the title ~~unexpected output for homogeneous SQL window migration~~ unexpected query output when using homogeneous SQL window Mar 31, 2022

lumianph added the high-priority label Apr 2, 2022

aceforeverd changed the title ~~unexpected query output when using homogeneous SQL window~~ lag()/lead() return show offset'th row, it is not related to window frame bound Apr 2, 2022

aceforeverd mentioned this issue Apr 6, 2022

at/lag/first_value udf function over merged window report different result compared to those not merged #1587

Closed

aceforeverd changed the title ~~lag()/lead() return show offset'th row, it is not related to window frame bound~~ lag()/at()/lead() return show offset'th row, it is not related to window frame bound Apr 7, 2022

zhanghaohit closed this as completed Apr 11, 2022

zhanghaohit reopened this Apr 11, 2022

aceforeverd added a commit to aceforeverd/fedb that referenced this issue Apr 27, 2022

re-fix 4paradigm#1554

96d604d

it will create a new rows window for lag like functions

aceforeverd mentioned this issue Apr 29, 2022

fix: 1554 lag correctness #1605

Merged

aceforeverd added a commit to aceforeverd/fedb that referenced this issue May 5, 2022

re-fix 4paradigm#1554

54874fc

it will create a new rows window for lag like functions

aceforeverd closed this as completed in #1605 May 6, 2022

aceforeverd added a commit that referenced this issue May 6, 2022

fix: 1554 lag correctness (#1605)

346092e

* add a few debug log for sql parser * fix(#1554): lag results always evaluated with respect to current row

aceforeverd mentioned this issue Aug 17, 2022

window: the right border can't work? #2365

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

`lag()/at()/lead()` return show offset'th row, it is not related to window frame bound #1554

`lag()/at()/lead()` return show offset'th row, it is not related to window frame bound #1554

aceforeverd commented Mar 31, 2022 •

edited

Loading

aceforeverd commented Mar 31, 2022

aceforeverd commented Mar 31, 2022

aceforeverd commented Apr 1, 2022 •

edited

Loading

aceforeverd commented Apr 1, 2022

aceforeverd commented Apr 1, 2022 •

edited

Loading

aceforeverd commented Apr 2, 2022 •

edited

Loading

lumianph commented Apr 5, 2022

jingchen2222 commented Apr 5, 2022 •

edited

Loading

the output for SQL:

lumianph commented Apr 6, 2022

lumianph commented Apr 6, 2022

aceforeverd commented Apr 6, 2022

aceforeverd commented Apr 7, 2022 •

edited

Loading

aceforeverd commented Apr 7, 2022

aceforeverd commented Apr 8, 2022 •

edited

Loading

jingchen2222 commented Apr 11, 2022 •

edited

Loading

aceforeverd commented Apr 11, 2022 •

edited

Loading

zhanghaohit commented Apr 11, 2022

aceforeverd commented Apr 11, 2022

lag()/at()/lead() return show offset'th row, it is not related to window frame bound #1554

lag()/at()/lead() return show offset'th row, it is not related to window frame bound #1554

Comments

aceforeverd commented Mar 31, 2022 • edited Loading

aceforeverd commented Mar 31, 2022

aceforeverd commented Mar 31, 2022

aceforeverd commented Apr 1, 2022 • edited Loading

the output for SQL:

aceforeverd commented Apr 1, 2022

aceforeverd commented Apr 1, 2022 • edited Loading

aceforeverd commented Apr 2, 2022 • edited Loading

lumianph commented Apr 5, 2022

jingchen2222 commented Apr 5, 2022 • edited Loading

the output for SQL:

lumianph commented Apr 6, 2022

lumianph commented Apr 6, 2022

aceforeverd commented Apr 6, 2022

aceforeverd commented Apr 7, 2022 • edited Loading

aceforeverd commented Apr 7, 2022

aceforeverd commented Apr 8, 2022 • edited Loading

jingchen2222 commented Apr 11, 2022 • edited Loading

aceforeverd commented Apr 11, 2022 • edited Loading

zhanghaohit commented Apr 11, 2022

aceforeverd commented Apr 11, 2022

`lag()/at()/lead()` return show offset'th row, it is not related to window frame bound #1554

`lag()/at()/lead()` return show offset'th row, it is not related to window frame bound #1554

aceforeverd commented Mar 31, 2022 •

edited

Loading

aceforeverd commented Apr 1, 2022 •

edited

Loading

aceforeverd commented Apr 1, 2022 •

edited

Loading

aceforeverd commented Apr 2, 2022 •

edited

Loading

jingchen2222 commented Apr 5, 2022 •

edited

Loading

aceforeverd commented Apr 7, 2022 •

edited

Loading

aceforeverd commented Apr 8, 2022 •

edited

Loading

jingchen2222 commented Apr 11, 2022 •

edited

Loading

aceforeverd commented Apr 11, 2022 •

edited

Loading