Some functions implemented through reduction operations raise exceptions #1953

Closed
YarShev opened this issue Aug 25, 2020 · 3 comments · Fixed by #1960
@YarShev
Collaborator

YarShev commented Aug 25, 2020

Describe the problem

Some functions (such as median, skew, std, var, sum, prod) raise an exception when the level parameter is specified.

Source code / logs

import modin.pandas as pd
import pandas

# Series with a two-level MultiIndex
arrays = [['1', '1', '1', '2', '2', '2', '3', '3', '3'], ['1', '2', '3', '4', '5', '6', '7', '8', '9']]
ps = pandas.Series([3,3,3,3,3,3,3,3,3], index=arrays)
ms = pd.Series([3,3,3,3,3,3,3,3,3], index=arrays)
ps.std(level=0)
# 1    0.0
# 2    0.0
# 3    0.0
# dtype: float64
ms.std(level=0) # the same exception is raised for median, skew, var, sum, prod
# ValueError: Length mismatch: Expected axis has 3 elements, new values have 1 elements

# DataFrame with a two-level MultiIndex
arrays = [['1', '1', '2', '2'], ['1', '2', '3', '4']]
pdf = pandas.DataFrame([[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]], index=arrays)
mdf = pd.DataFrame([[1,2,3,4], [1,2,3,4], [1,2,3,4], [1,2,3,4]], index=arrays)
pdf.std(level=0)
#      0    1    2    3
# 1  0.0  0.0  0.0  0.0
# 2  0.0  0.0  0.0  0.0
mdf.std(level=0) # the same exception is raised for median, skew, var, sum, prod
# ValueError: Length mismatch: Expected axis has 2 elements, new values have 1 elements
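For reference, the expected result of each listed reduction on the Series has one entry per distinct level-0 label, which is the "3 elements" in the error message. The sketch below uses plain pandas only and goes through `groupby(level=0)` rather than the `level` keyword, on the assumption that the two give the same grouping. The names `reductions` and `expected` are illustrative and do not come from the report.

```python
import pandas

arrays = [['1', '1', '1', '2', '2', '2', '3', '3', '3'],
          ['1', '2', '3', '4', '5', '6', '7', '8', '9']]
ps = pandas.Series([3, 3, 3, 3, 3, 3, 3, 3, 3], index=arrays)

# Reductions named in the report (illustrative list)
reductions = ["median", "skew", "std", "var", "sum", "prod"]

# groupby(level=0).<name>() groups rows by the first index level
expected = {name: getattr(ps.groupby(level=0), name)() for name in reductions}

for name, result in expected.items():
    # Each result is indexed by the distinct level-0 labels
    print(name, list(result.index), len(result))
```

A Modin result for the same call would be expected to match these shapes and index labels.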
YarShev added the bug 🦗 Something isn't working label Aug 25, 2020
YarShev self-assigned this Aug 25, 2020
YarShev added a commit to YarShev/modin that referenced this issue Sep 2, 2020
for reduction operation

Signed-off-by: Igoshev, Yaroslav <[email protected]>
devin-petersohn pushed a commit that referenced this issue Sep 3, 2020
for reduction operation

Signed-off-by: Igoshev, Yaroslav <[email protected]>
anmyachev added a commit to anmyachev/modin that referenced this issue Sep 4, 2020
anmyachev added a commit to anmyachev/modin that referenced this issue Sep 7, 2020
dchigarev pushed a commit that referenced this issue Sep 8, 2020
* TEST-#2033: speed up test_series.py

Signed-off-by: Anatoly Myachev <[email protected]>

* TEST-#2033: fix kurtosis test

Signed-off-by: Anatoly Myachev <[email protected]>

* TEST-#2033: refactor test for #1953

Signed-off-by: Anatoly Myachev <[email protected]>
aregm pushed a commit to aregm/modin that referenced this issue Sep 16, 2020
aregm pushed a commit to aregm/modin that referenced this issue Sep 16, 2020
* TEST-modin-project#2033: speed up test_series.py

Signed-off-by: Anatoly Myachev <[email protected]>

* TEST-modin-project#2033: fix kurtosis test

Signed-off-by: Anatoly Myachev <[email protected]>

* TEST-modin-project#2033: refactor test for modin-project#1953

Signed-off-by: Anatoly Myachev <[email protected]>
@mvashishtha
Collaborator

@YarShev

Should these functions be working? I noticed today that e.g. in this test, Modin defaults to pandas for median, skew, std, var, and sem. Taking the "median" version of that test:

import modin.pandas as pd
arrays = [["1", "1", "2", "2"], ["1", "2", "3", "4"]]
data = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
modin_df = pd.DataFrame(data, index=arrays)
modin_df.median(level=0)
UserWarning: `DataFrame.median` defaulting to pandas implementation.
To request implementation, send an email to feature_requests@modin.org.
FutureWarning: Using the level keyword in DataFrame and Series aggregations is deprecated and will be removed in a future version. Use groupby instead. df.median(level=1) should use df.groupby(level=1).median().
UserWarning: Distributing <class 'pandas.core.frame.DataFrame'> object. This may take some time.
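
The FutureWarning above recommends `groupby` in place of the `level` keyword. Here is a minimal sketch of that replacement in plain pandas (not Modin), reusing the data from the snippet above. The variable names `pandas_df` and `result` are illustrative. Whether Modin runs the `groupby` form without defaulting to pandas is not established in this thread.

```python
import pandas

arrays = [["1", "1", "2", "2"], ["1", "2", "3", "4"]]
data = [[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]
pandas_df = pandas.DataFrame(data, index=arrays)

# groupby form suggested by the FutureWarning for median(level=0)
result = pandas_df.groupby(level=0).median()
print(result)
```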

@YarShev
Collaborator Author

YarShev commented Nov 17, 2021

@mvashishtha, yes, they are working, but when level is specified we started defaulting to pandas for them in #2655. We should probably add a note in the docs that we use the default pandas implementation for these operations when level is specified. Would you get this done?

@mvashishtha
Collaborator

@YarShev Yes, I created #3698 for that. Please take a look.
