Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[window function] add percent_rank window function #1077

Merged
merged 1 commit into from
Oct 9, 2021

Conversation

jimexist
Copy link
Member

@jimexist jimexist commented Oct 7, 2021

Which issue does this PR close?

[window function] add percent_rank function

Closes #667
Related #1076

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

@github-actions github-actions bot added the datafusion Changes in the datafusion crate label Oct 7, 2021
@alamb alamb changed the title [window function] add percent_rank function [window function] add percent_rank and dense_rank window functions Oct 8, 2021
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jimexist ! I updated the title of this PR so that it reflects this PR implementing dense_rank as well as percent_rank

I think the code looks (very) good -- though it might be beneficial to have a unit test for dense_rank.

I also spot checked the answers using postgres:

Postgres:

alamb=# select * from arr;
 x | y
---+----
 1 | 10
 1 | 11
 1 |  3
 1 |  6
 2 | 20
 1 |  3
(6 rows)

alamb=# select x, percent_rank() over (partition by x order by y) from arr;
 x | percent_rank
---+--------------
 1 |            0
 1 |            0
 1 |          0.5
 1 |         0.75
 1 |            1
 2 |            0
(6 rows)

alamb=# select x, dense_rank() over (partition by x order by y) from arr;
 x | dense_rank 
---+------------
 1 |          1
 1 |          1
 1 |          2
 1 |          3
 1 |          4
 2 |          1
(6 rows)

DataFusion

echo "1,10" > /tmp/arr.csv
echo "1,11" >> /tmp/arr.csv
echo "1,3" >> /tmp/arr.csv
echo "1,6" >> /tmp/arr.csv
echo "2,20" >> /tmp/arr.csv
echo "1,3" >> /tmp/arr.csv
cargo run -p datafusion-cli
> CREATE EXTERNAL TABLE arr(x int, y int)
STORED AS CSV
LOCATION '/tmp/arr.csv';
0 rows in set. Query took 0.000 seconds.
>
>
> select * from arr;
+---+----+
| x | y  |
+---+----+
| 1 | 10 |
| 1 | 11 |
| 1 | 3  |
| 1 | 6  |
| 2 | 20 |
| 1 | 3  |
+---+----+
6 rows in set. Query took 0.013 seconds.
> select x, percent_rank() over (partition by x order by y) from arr;
+---+----------------+
| x | PERCENT_RANK() |
+---+----------------+
| 1 | 0              |
| 1 | 0              |
| 1 | 0.5            |
| 1 | 0.75           |
| 1 | 1              |
| 2 | 0              |
+---+----------------+

> select x, dense_rank() over (partition by x order by y) from arr;
+---+--------------+
| x | DENSE_RANK() |
+---+--------------+
| 1 | 1            |
| 1 | 1            |
| 1 | 2            |
| 1 | 3            |
| 1 | 4            |
| 2 | 1            |
+---+--------------+

👍

@@ -170,4 +236,33 @@ mod tests {
test_with_rank(&r, vec![1, 1, 3, 4, 4, 4, 7, 8])?;
Ok(())
}

#[test]
fn test_percent_rank() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should also add a test for dense_rank() here as well

@houqp houqp added the enhancement New feature or request label Oct 8, 2021
@jimexist
Copy link
Member Author

jimexist commented Oct 9, 2021

@alamb in fact dense_rank function was already implemented, this PR only added percent rank and thus i needed to change the dense: bool to a three element enum. same goes to unit test, as the code originally covered both rank and dense_rank.

@Dandandan Dandandan merged commit eb3e135 into apache:master Oct 9, 2021
@Dandandan
Copy link
Contributor

Thanks @jimexist !

@jimexist jimexist deleted the add-percent-rank branch October 9, 2021 09:25
@alamb alamb changed the title [window function] add percent_rank and dense_rank window functions [window function] add percent_rank window functions Oct 9, 2021
@alamb alamb changed the title [window function] add percent_rank window functions [window function] add percent_rank window function Oct 9, 2021
@alamb
Copy link
Contributor

alamb commented Oct 9, 2021

@alamb in fact dense_rank function was already implemented, this PR only added percent rank and thus i needed to change the dense: bool to a three element enum. same goes to unit test, as the code originally covered both rank and dense_rank.

I have (unbroken) the title of this PR now -- thank you for the clarification. Sorry for the mistake

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datafusion Changes in the datafusion crate enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

good first PR: implement percent_rank and cume_dist built-in window functions
4 participants