count not aggregating #3421
@andrewgazelka the way that we perform this kind of count in SQL is by doing a …
Sounds like we can fix this by adding it to the optimizer:
Daft/src/daft-logical-plan/src/builder.rs, line 592 in f6eb993
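For context, here is a minimal sketch of how a global count behaves in Daft's Python API; `from_pydict`, `col`, and the expression-level `count` are the public API, but this only illustrates the semantics under discussion, not the builder.rs code path itself:

```python
import daft
from daft import col

df = daft.from_pydict({"x": [1, 2, 3]})

# Daft's count is an aggregation expression over a column: it counts
# the values in that column as a global aggregation (no group-by keys).
result = df.agg(col("x").count()).to_pydict()
print(result)  # expected: one row containing the count, 3
```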
@colin-ho What are your thoughts?
Idk if this should be an optimization; instead it should either be special handling for Spark Connect, since our implementations diverge, or we should change our algorithm to match Spark's. If you look at other engines such as Polars, it resolves to … I think the quickest path for the spark-connect use case is to map …
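For comparison, a minimal sketch of the Spark-side semantics the connect translation has to reproduce (assuming a plain local PySpark session; the session setup is illustrative only):

```python
from pyspark.sql import SparkSession

# A plain local Spark session, just to show the reference semantics.
spark = SparkSession.builder.getOrCreate()
df = spark.range(10)

# Spark's DataFrame.count() plans a global aggregation that counts rows
# (logically count(1)), regardless of nulls in any particular column.
assert df.count() == 10
```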
The issue is we cannot really map … We could potentially check whether it is 1:1 with the protobuf mentioned at the top of the issue, but I'm unsure whether that is an acceptable solution. Also, for added context, this is how we handle count:
Daft/src/daft-connect/src/translation/expr/unresolved_function.rs, lines 31 to 44 in 3394a66
This is essentially the same as what we're doing in SQL, but instead of …
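To make the SQL comparison concrete, here is a small sketch of the COUNT(*) vs COUNT(col) distinction via Daft's SQL interface; this assumes `daft.sql` binds in-scope dataframes by name, and the exact output shape may differ from what the comments above show:

```python
import daft

df = daft.from_pydict({"x": [1, None, 3]})

# COUNT(*) counts rows; COUNT(x) counts only the non-null values of x.
# The whole-row form is the behavior Spark's .count() expects.
print(daft.sql("SELECT COUNT(*) FROM df").to_pydict())  # row count: 3
print(daft.sql("SELECT COUNT(x) FROM df").to_pydict())  # non-nulls: 2
```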
+1 on this, makes sense to just corner-case …
I added a special case …, which yields `{'literal': [1]}`. However, I believe the correct result should be `{'literal': [10]}`.
This is required to work, as the plan created for `.count()` by Spark Connect is a similar type of aggregation (protobuf plan omitted here). Want to try it yourself?
Daft/tests/connect/test_count.py, lines 4 to 13 in ab26cbc
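A hedged reproduction sketch along the lines of that test, assuming a Daft spark-connect server is already listening at an `sc://` endpoint (the URL and port below are placeholders, not the test's actual fixture):

```python
from pyspark.sql import SparkSession

# Connect to a running Daft spark-connect server; this endpoint is a
# placeholder, not the actual fixture used by the test above.
spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()

df = spark.range(10)

# With the bug described above, the translated plan counts a literal and
# returns 1; correct Spark semantics would return the row count, 10.
print(df.count())
```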