-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Convert rank
/ dense_rank
and percent_rank
builtin functions to UDWF
#12718
Convert rank
/ dense_rank
and percent_rank
builtin functions to UDWF
#12718
Conversation
@@ -508,6 +508,9 @@ message ScalarUDFExprNode { | |||
enum BuiltInWindowFunction { | |||
UNSPECIFIED = 0; // https://protobuf.dev/programming-guides/dos-donts/#unspecified-enum | |||
// ROW_NUMBER = 0; | |||
// TODO | |||
// if i will remove this | |||
// should i renumber all variables? | |||
RANK = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jcsherin how to handle this ?
I want to remove rank, percent_rank and dense_rank from here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You just need to comment out the lines (not remove them):
enum BuiltInWindowFunction {
UNSPECIFIED = 0; // https://protobuf.dev/programming-guides/dos-donts/#unspecified-enum
// ROW_NUMBER = 0;
// RANK = 1;
// DENSE_RANK = 2;
// PERCENT_RANK = 3;
…elated to Datafusion window function
…d updated pbson fields
\n Aggregate: groupBy=[[ROLLUP (person.state, person.last_name)]], aggr=[[sum(person.age), grouping(person.state), grouping(person.last_name)]]\ | ||
\n TableScan: person"; | ||
// TODO | ||
// fix this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jcsherin when i am running cargo test,
it is not able to detect the rank, dense_rank and percent_rank udwf functions.
I even added the window_function at line 2637 as well.
Any idea where to register the newly created udwf's.
Thanks, In advance .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The UDWF is registered but the lookup in get_window_meta
still returns None
.
datafusion/datafusion/sql/tests/common/mod.rs
Lines 220 to 222 in 18193e6
fn get_window_meta(&self, _name: &str) -> Option<Arc<WindowUDF>> { | |
None | |
} |
This PR is coming along well 😄. Thanks @jatin510.
/// TODO | ||
/// how to handle import of percent_rank function from | ||
/// datafusion+functions_window | ||
/// as using this is causing the circular depency issue | ||
/// # // first_value is an aggregate function in another crate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the expr function have written docs for the percent_rank,
but when I try to import the percent_rank from datafusion_functions_window workspace,
its throwing error related to the circular depency
how to handle such cases ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a couple of ways to get around this. But here I noticed that the first_value
function is a mock and not the real first_value
window function.
So you can consider doing the same for percent_rank
. Then you don't have to import anything. This is alright because the test is not testing the functionality of percent_rank
in the doc test.
Another advantage of mocking in this specific case is that the change remains local to the doc test which I believe is good.
This is so exciting to see |
in
but percent_rank is supposed to return the float64 value |
datafusion/datafusion/physical-expr/src/window/rank.rs Lines 95 to 98 in a1ae158
datafusion/datafusion/functions-window-common/src/field.rs Lines 59 to 63 in e1b992a
@jatin510 You are right, the return type is The The datafusion/datafusion/expr/src/built_in_window_function.rs Lines 129 to 135 in 389f7f7
|
Thanks for your help, @jcsherin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can add test cases for the expression API here:
async fn roundtrip_expr_api() -> Result<()> { |
datafusion/sql/tests/common/mod.rs
Outdated
|
||
pub fn with_window_function(mut self, window_function: Arc<WindowUDF>) -> Self { | ||
self.window_functions.insert( | ||
window_function.name().to_string().to_lowercase(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the to_lowercase()
is needed here for WindowUDF
s.
@jatin510 Great job! This PR is almost there ✊. The It would be nicer if we continue to maintain the same form factor as the original implementation. You can reuse the pub enum RankType {
Basic,
Dense,
Percent,
} And in /// Evaluates the window function inside the given range.
fn evaluate(
&mut self,
values: &[ArrayRef],
range: &Range<usize>,
) -> Result<ScalarValue> {
/// no more code duplication here ...
match self.rank_type {
RankType::Basic => Ok(ScalarValue::UInt64(Some(
self.state.last_rank_boundary as u64 + 1,
))),
RankType::Dense => Ok(ScalarValue::UInt64(Some(self.state.n_rank as u64))),
RankType::Percent => {
exec_err!("Can not execute PERCENT_RANK in a streaming fashion")
}
}
} Would you like to unify this? If not, this can be completed in a follow-on PR. |
define_udwf_and_expr!( | ||
DenseRank, | ||
dense_rank, | ||
"Returns the rank of each row within a window partition without gaps." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Returns the rank of each row within a window partition without gaps." | |
"Returns rank of the current row without gaps. This function counts peer groups" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
define_udwf_and_expr!( | ||
PercentRank, | ||
percent_rank, | ||
"Returns the relative rank of the current row within a window partition." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Returns the relative rank of the current row within a window partition." | |
"Returns the relative rank of the current row: (rank - 1) / (total rows - 1)" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
I will create a follow up PR for this change |
rank(), | ||
dense_rank(), | ||
percent_rank(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Awesome -- than you @jatin510 -- I have startd the CI and plan to review this tomorrow morning |
rank
builtin function to UDWF
…nk-percentrank-denserank
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @jatin510 and @jcsherin -- this is a really nice improvement. We are getting very close to having no built in functions 🎉 .
FYI @mustafasrepo @ozankabak and @berkaysynnada
// specific language governing permissions and limitations | ||
// under the License. | ||
|
||
//! Defines physical expression for `dense_rank` that can evaluated at runtime during query execution |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically speaking this module contains the WindowUDFImpl
for dense_rank
, but I think this wording is consistent with the other code in this crate.
I have this PR checked out to resolve some logical conflicts anyways, so I will pushed a commit to change it in this PR too
rank
builtin function to UDWFrank
/ dense_rank
and percen_rank
builtin functions to UDWF
rank
/ dense_rank
and percen_rank
builtin functions to UDWFrank
/ dense_rank
and percent_rank
builtin functions to UDWF
🚀 |
Which issue does this PR close?
Closes #12648
Rationale for this change
Same as described in #8709.
What changes are included in this PR?
Are these changes tested?
Updates explain output in sqllogictests to match lowercase rank, dense_rank and percent_rank.
Are there any user-facing changes?