susmitpy opened this issue on Jan 2, 2023 · 2 comments
Labels
bug 🦗: Something isn't working
External: Pull requests and issues from people who do not regularly contribute to modin
P1: Important tasks that we should complete soon
Whenever groupby and transform are used within df.eval(), the aggregation appears to be performed on each partition separately, so the final result is wrong (this is my guess at the cause).
In the example there are only two groups, so the result should contain only 2 unique minimum values.
Performing the operation normally confirms this: the unique values are [1, 501].
However, when the same operation is passed to df.eval() as a string expression, the result is wrong.
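To make the suspected cause concrete, here is a minimal pure-Python sketch. It is not Modin code and it is not the original reproducer: the data (group A holding 1..500, group B holding 501..1000), the helper name group_min_transform, and the split into four row partitions of 250 rows are all assumptions chosen for illustration. It only shows why taking a per-group minimum partition by partition yields more unique values than taking it over all rows.

```python
from collections import defaultdict


def group_min_transform(rows):
    """For each (group, value) row, return the minimum value of that row's group."""
    mins = defaultdict(lambda: float("inf"))
    for group, value in rows:
        mins[group] = min(mins[group], value)
    return [mins[group] for group, _ in rows]


# Assumed data: group A holds 1..500 and group B holds 501..1000.
rows = [("A", v) for v in range(1, 501)] + [("B", v) for v in range(501, 1001)]

# Full row axis: every row of a group is visible when the minimum is taken.
full_axis = group_min_transform(rows)

# Hypothetical row partitioning: four chunks of 250 rows, each processed alone.
partitions = [rows[i:i + 250] for i in range(0, len(rows), 250)]
per_partition = [m for part in partitions for m in group_min_transform(part)]

print(sorted(set(full_axis)))      # [1, 501]
print(sorted(set(per_partition)))  # [1, 251, 501, 751]
```

If the guess is right, the wrong result from df.eval() would resemble the second line: one minimum per group per partition rather than one per group.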
Expected Behavior
The string expression should give the same result as the operation performed normally.
There should be exactly one aggregated (minimum) value per group, i.e. 1 for group A and 501 for group B.
Error Logs
No response
Installed Versions
Checked on two different versions
Check 1
Modin dependencies
modin : 0.18.0
ray : 2.2.0
pandas dependencies
pandas : 1.5.2
numpy : 1.22.2
Check 2
Modin dependencies
modin : 0.15.3
ray : 1.9.0
pandas dependencies
pandas : 1.4.4
numpy : 1.21.2
Hi @susmitpy! Thank you so much for opening this issue! I've verified that I can reproduce it locally, and confirmed that the offending lines seem to be these:
where the eval is applied full-axis across the column axis (to make sure we don't get KeyErrors on the column names in the eval expression), but not full-axis across the row axis. I'm not 100% sure what the best solution is here. We could try parsing the expression to see whether it requires the full column axis or can be satisfied along the row axis with columns broadcast as necessary, and always perform eval full-row-axis, but I'm not sure that that's the best solution. Would love to hear your thoughts as well!
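As a rough illustration of the "parse the expression" idea, the sketch below uses Python's standard ast module to list the column names an expression reads and the methods it calls. This is only an assumption about what such parsing might look like, not Modin's implementation: the function name analyze_eval_expr is hypothetical, and the sketch assumes the expression is plain Python syntax (pandas-specific forms such as @variable references would not parse this way).

```python
import ast


def analyze_eval_expr(expr):
    """Return (names read by the expression, method names it calls).

    Hypothetical helper for illustration only.
    """
    # eval expressions may contain an assignment such as "c = a + b",
    # so parse as a statement rather than as a bare expression.
    tree = ast.parse(expr, mode="exec")
    names, methods = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            # A name that is read, i.e. a candidate column to broadcast.
            names.add(node.id)
        elif isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            # A method call such as .groupby(...) or .transform(...).
            methods.add(node.func.attr)
    return names, methods


# Hypothetical expressions; the column names are made up.
elementwise = analyze_eval_expr("c = a + b")
grouped = analyze_eval_expr("m = val.groupby(grp).transform('min')")

print(elementwise)
print(grouped)
```

The set of names could tell the caller which columns an expression needs, and a non-empty set of methods could be a hint that the expression is not purely element-wise. Whether that is enough to decide how to partition is exactly the open question raised above.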
Modin version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest released version of Modin.
I have confirmed this bug exists on the main branch of Modin. (In order to do this you can follow this guide.)