-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support IGNORE NULLS for LEAD window function #9419
Conversation
3 | ||
24 | ||
14 | ||
78 50 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only test is failing....
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mustafasrepo setting batch size to extreme low value will affect the window range?
|
||
# result should be same with above, when lag algorithm work with pruned data. | ||
# decreasing batch size, causes data to be produced in smaller chunks at the source. | ||
# Hence sliding window algorithm is used during calculations. | ||
# LEAD function to be added, there is some bug with incoming data array when LEAD is called |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll address this in following PR, what I found is the incoming data is incomplete for LEAD.
so for set datafusion.execution.batch_size = 8000;
the incoming array is complete
[datafusion/physical-expr/src/window/lead_lag.rs:280:17] &array = StringArray
[
"x",
null,
"b",
null,
]
and for set datafusion.execution.batch_size = 1;
the data is partially missed
[datafusion/physical-expr/src/window/lead_lag.rs:280:17] &array = StringArray
[
"x",
null,
"b"
]
Probably the error is in built_in.rs
I'll check it in following PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @comphead for this PR. It is LGTM!. Regarding failing test, you mentioned. The reason for its failure is that BoundedWindowExecutor uses get_range
method to calculate required buffer for the result calculation. For most of the window function that uses window frame, get_range
is not used. Because required range can be determined from window frame. However, for some builtin window functions window frame is not used such as LEAD, LAG, ROW_NUMBER
. For these ones get_range
should be implemented to make sure that required buffer is always present for the result. I think bug is related to the get_range
implementation for the LEAD
. I will file a subsequent PR to solve this problem.
* IGNORE NULLS support for LEAD * fix * fix * fix
Which issue does this PR close?
Closes #.
Related to #9055
Rationale for this change
Expanding IGNORE NULLS support for LEAD window function
What changes are included in this PR?
Are these changes tested?
Yes
Are there any user-facing changes?