Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support IGNORE NULLS for LEAD window function #9419

Merged
merged 4 commits into from
Mar 4, 2024
Merged

Conversation

comphead
Copy link
Contributor

@comphead comphead commented Mar 1, 2024

Which issue does this PR close?

Closes #.
Related to #9055

Rationale for this change

Expanding IGNORE NULLS support for LEAD window function

What changes are included in this PR?

Are these changes tested?

Yes

Are there any user-facing changes?

@github-actions github-actions bot added physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt) labels Mar 1, 2024
@comphead comphead changed the title IGNORE NULLS support for LEAD Implement IGNORE NULLS support for LEAD Mar 1, 2024
@comphead comphead marked this pull request as draft March 1, 2024 22:41
@comphead comphead changed the title Implement IGNORE NULLS support for LEAD Support IGNORE NULLS for LEAD window function Mar 1, 2024
3
24
14
78 50
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only test is failing....

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mustafasrepo setting batch size to extreme low value will affect the window range?

@comphead comphead marked this pull request as ready for review March 2, 2024 18:35
@comphead comphead requested a review from mustafasrepo March 2, 2024 18:36

# result should be same with above, when lag algorithm work with pruned data.
# decreasing batch size, causes data to be produced in smaller chunks at the source.
# Hence sliding window algorithm is used during calculations.
# LEAD function to be added, there is some bug with incoming data array when LEAD is called
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll address this in following PR, what I found is the incoming data is incomplete for LEAD.
so for set datafusion.execution.batch_size = 8000;
the incoming array is complete

[datafusion/physical-expr/src/window/lead_lag.rs:280:17] &array = StringArray
[
  "x",
  null,
  "b",
  null,
]

and for set datafusion.execution.batch_size = 1;
the data is partially missed

[datafusion/physical-expr/src/window/lead_lag.rs:280:17] &array = StringArray
[
  "x",
  null,
  "b"
]

Probably the error is in built_in.rs I'll check it in following PR

Copy link
Contributor

@mustafasrepo mustafasrepo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @comphead for this PR. It is LGTM!. Regarding failing test, you mentioned. The reason for its failure is that BoundedWindowExecutor uses get_range method to calculate required buffer for the result calculation. For most of the window function that uses window frame, get_range is not used. Because required range can be determined from window frame. However, for some builtin window functions window frame is not used such as LEAD, LAG, ROW_NUMBER. For these ones get_range should be implemented to make sure that required buffer is always present for the result. I think bug is related to the get_range implementation for the LEAD. I will file a subsequent PR to solve this problem.

@mustafasrepo mustafasrepo merged commit dd1cf01 into apache:main Mar 4, 2024
23 checks passed
wiedld pushed a commit to wiedld/arrow-datafusion that referenced this pull request Mar 21, 2024
* IGNORE NULLS support for LEAD

* fix

* fix

* fix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants