Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Combine the logic of rank, dense_rank and percent_rank udwf to reduce duplications #12893

Merged
merged 9 commits into from
Oct 16, 2024

Conversation

jatin510
Copy link
Contributor

Which issue does this PR close?

Closes #.

Rationale for this change

The functionality of the rank, percent_rank and dense_rank is same.
Combining them into one

What changes are included in this PR?

Combining rank, dense_rank and percent_rank logic

Are these changes tested?

Are there any user-facing changes?

@jatin510 jatin510 marked this pull request as draft October 12, 2024 12:00
);

// define_udwf_and_expr!(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcsherin while defining the other udwf like this, it is not working, giving an error during compile time.
Am I going in the right direction ?

Thanks !

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first argument needs to be a distinct identifier. I guess something like this will work,

define_udwf_and_expr!(BasicRank, rank, "...", Rank::basic);
define_udwf_and_expr!(PercentRank, rank, "...", Rank::percent_rank);
define_udwf_and_expr!(DenseRank, rank, "...", Rank::dense_rank);

@github-actions github-actions bot added core Core DataFusion crate proto Related to proto crate labels Oct 12, 2024
@jatin510 jatin510 changed the title wip: combining the logic of rank, dense_rank and percent_rank udwf combining the logic of rank, dense_rank and percent_rank udwf Oct 12, 2024
@jatin510 jatin510 marked this pull request as ready for review October 12, 2024 13:29
Comment on lines 135 to 140
let nullable = false;
Ok(Field::new(
field_args.name(),
self.data_type.clone(),
nullable,
))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Rank.data_type field can be removed.

Suggested change
let nullable = false;
Ok(Field::new(
field_args.name(),
self.data_type.clone(),
nullable,
))
let return_type = match self.rank_type {
RankType::Basic | RankType::Dense => DataType::UInt64,
RankType::Percent => DataType::Float64,
};
Ok(Field::new(
field_args.name(),
return_type.clone(),
false,
))

@@ -69,7 +114,7 @@ impl WindowUDFImpl for Rank {
}

fn name(&self) -> &str {
"rank"
self.name.as_str()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: &self.name

@jcsherin
Copy link
Contributor

jcsherin commented Oct 12, 2024

Please apply this patch to fix cargo doc failing in CI.

Download fix_macro_intra_doc_links.patch

$ git apply fix_macro_intra_doc_links.patch

It will resolve these errors:

error: unresolved link to `PercentRank`
   --> datafusion/functions-window/src/macros.rs:305:21
    |
305 |               #[doc = " Create a [`WindowFunction`](datafusion_expr::Expr::WindowFunction) expression for"]
    |  _____________________^
306 | |             #[doc = concat!(" [`", stringify!($UDWF), "`] user-defined window function.")]
307 | |             #[doc = ""]
308 | |             #[doc = concat!(" ", $DOC)]
    | |______________________________________^

This is because PercentRank here is just an identifier for creating a unique UDWF and not something we can link to. So in the macro generated doc comments [PercentRank] is a broken link. This patch temporarily fixes it by changing it to PercentRank by removing the square brackets used for intra-doc linking.


Patch contents

diff --git a/datafusion/functions-window/src/macros.rs b/datafusion/functions-window/src/macros.rs
index e934f883b101..2905ccf4c204 100644
--- a/datafusion/functions-window/src/macros.rs
+++ b/datafusion/functions-window/src/macros.rs
@@ -303,7 +303,7 @@ macro_rules! create_udwf_expr {
     ($UDWF:ident, $OUT_FN_NAME:ident, $DOC:expr) => {
         paste::paste! {
             #[doc = " Create a [`WindowFunction`](datafusion_expr::Expr::WindowFunction) expression for"]
-            #[doc = concat!(" [`", stringify!($UDWF), "`] user-defined window function.")]
+            #[doc = concat!(" `", stringify!($UDWF), "` user-defined window function.")]
             #[doc = ""]
             #[doc = concat!(" ", $DOC)]
             pub fn $OUT_FN_NAME() -> datafusion_expr::Expr {
@@ -316,7 +316,7 @@ macro_rules! create_udwf_expr {
     ($UDWF:ident, $OUT_FN_NAME:ident, [$($PARAM:ident),+], $DOC:expr) => {
         paste::paste! {
             #[doc = " Create a [`WindowFunction`](datafusion_expr::Expr::WindowFunction) expression for"]
-            #[doc = concat!(" [`", stringify!($UDWF), "`] user-defined window function.")]
+            #[doc = concat!(" `", stringify!($UDWF), "` user-defined window function.")]
             #[doc = ""]
             #[doc = concat!(" ", $DOC)]
             pub fn $OUT_FN_NAME(

Comment on lines 89 to 91
/// Get rank_type of the rank in window function with order by
pub fn get_type(&self) -> RankType {
self.rank_type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is ok to remove Rank::get_type() as it is not used anywhere.

@jatin510
Copy link
Contributor Author

Thanks @jcsherin

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @jatin510 and @jcsherin -- this looks like a very nice cleanup to me

cc @berkaysynnada @ozankabak

@alamb alamb changed the title combining the logic of rank, dense_rank and percent_rank udwf Combine the logic of rank, dense_rank and percent_rank udwf to reduce duplications Oct 15, 2024
Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Module doc's can be updated in rank.rs.

@github-actions github-actions bot added the documentation Improvements or additions to documentation label Oct 15, 2024
@jatin510
Copy link
Contributor Author

LGTM. Module doc's can be updated in rank.rs.

done @berkaysynnada

@berkaysynnada berkaysynnada merged commit 747001a into apache:main Oct 16, 2024
25 checks passed
@alamb
Copy link
Contributor

alamb commented Oct 16, 2024

🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate documentation Improvements or additions to documentation proto Related to proto crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants