-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[window function] add percent_rank
window function
#1077
Conversation
5df82b3
to
515ada6
Compare
515ada6
to
18aff6a
Compare
percent_rank
functionpercent_rank
and dense_rank
window functions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @jimexist ! I updated the title of this PR so that it reflects this PR implementing dense_rank
as well as percent_rank
I think the code looks (very) good -- though it might be beneficial to have a unit test for dense_rank
.
I also spot checked the answers using postgres:
Postgres:
alamb=# select * from arr;
x | y
---+----
1 | 10
1 | 11
1 | 3
1 | 6
2 | 20
1 | 3
(6 rows)
alamb=# select x, percent_rank() over (partition by x order by y) from arr;
x | percent_rank
---+--------------
1 | 0
1 | 0
1 | 0.5
1 | 0.75
1 | 1
2 | 0
(6 rows)
alamb=# select x, dense_rank() over (partition by x order by y) from arr;
x | dense_rank
---+------------
1 | 1
1 | 1
1 | 2
1 | 3
1 | 4
2 | 1
(6 rows)
DataFusion
echo "1,10" > /tmp/arr.csv
echo "1,11" >> /tmp/arr.csv
echo "1,3" >> /tmp/arr.csv
echo "1,6" >> /tmp/arr.csv
echo "2,20" >> /tmp/arr.csv
echo "1,3" >> /tmp/arr.csv
cargo run -p datafusion-cli
> CREATE EXTERNAL TABLE arr(x int, y int)
STORED AS CSV
LOCATION '/tmp/arr.csv';
0 rows in set. Query took 0.000 seconds.
>
>
> select * from arr;
+---+----+
| x | y |
+---+----+
| 1 | 10 |
| 1 | 11 |
| 1 | 3 |
| 1 | 6 |
| 2 | 20 |
| 1 | 3 |
+---+----+
6 rows in set. Query took 0.013 seconds.
> select x, percent_rank() over (partition by x order by y) from arr;
+---+----------------+
| x | PERCENT_RANK() |
+---+----------------+
| 1 | 0 |
| 1 | 0 |
| 1 | 0.5 |
| 1 | 0.75 |
| 1 | 1 |
| 2 | 0 |
+---+----------------+
> select x, dense_rank() over (partition by x order by y) from arr;
+---+--------------+
| x | DENSE_RANK() |
+---+--------------+
| 1 | 1 |
| 1 | 1 |
| 1 | 2 |
| 1 | 3 |
| 1 | 4 |
| 2 | 1 |
+---+--------------+
👍
@@ -170,4 +236,33 @@ mod tests { | |||
test_with_rank(&r, vec![1, 1, 3, 4, 4, 4, 7, 8])?; | |||
Ok(()) | |||
} | |||
|
|||
#[test] | |||
fn test_percent_rank() -> Result<()> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should also add a test for dense_rank()
here as well
@alamb in fact |
Thanks @jimexist ! |
percent_rank
and dense_rank
window functionspercent_rank
window functions
percent_rank
window functionspercent_rank
window function
I have (unbroken) the title of this PR now -- thank you for the clarification. Sorry for the mistake |
Which issue does this PR close?
[window function] add
percent_rank
functionCloses #667
Related #1076
Rationale for this change
What changes are included in this PR?
Are there any user-facing changes?