using .iloc to assign new value #1556

Closed
ndy-cd opened this issue Jun 9, 2020 · 3 comments · Fixed by #1788

@ndy-cd
ndy-cd commented Jun 9, 2020

System information

  • Linux Ubuntu 19.10
  • Modin 0.7.3
  • Python 3.7.7

Problem:

Assigning a new value through .iloc on a selected column (test['first'].iloc[...]) leaves the DataFrame unchanged.

I already left a comment on issue #147, but it is closed and I wasn't sure you would see it.

Code to reproduce:

import modin.pandas as pd #dask
test = pd.DataFrame({'first': range(10)})
test['first'].iloc[0] = 500
#or test['first'].iloc[[0, 1, 2]] = 500
print(test) # there are no changes
@ndy-cd ndy-cd added the bug 🦗 Something isn't working label Jun 9, 2020
@devin-petersohn
Collaborator

Thanks @ndy-cd, we prefer the new issue since the old one was resolved. This issue is slightly different because of the assignment.

What is happening is that in pandas the test['first'].iloc assignment happens in place. Modin performs the assignment but does not propagate the modified test['first'] back to the original dataframe. It will be straightforward to fix; the case of two types of indexing in the same statement (['first'] and iloc) was not handled.

Assignment with loc/iloc will work if you don't access the column first. A simple temporary workaround for the double indexing is to turn it into a single loc call:

import modin.pandas as pd #dask
test = pd.DataFrame({'first': range(10)})
test.loc[0, 'first'] = 500  # 0 is the Index value of the 0th row
#or test.loc[[0, 1, 2], 'first'] = 500
print(test)

If you prefer to use iloc you can also do this:

import modin.pandas as pd #dask
test = pd.DataFrame({'first': range(10)})
test.iloc[0, 0] = 500  # first 0 is the row position, second 0 is the column position
#or test.iloc[[0, 1, 2], 0] = 500
print(test)

We have the machinery in place to support this, but it looks like something got lost with the double indexing. Thanks for the detailed report! We will get this fixed.
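
If the double-indexing style is more convenient, the missing propagation step described above can also be done by hand. The following is only a minimal sketch: it uses plain pandas as a stand-in so that it runs without Modin installed, and it assumes (not verified against Modin 0.7.3) that swapping the import for modin.pandas behaves the same way. The variable name col is purely illustrative.

```python
import pandas as pd  # assumption: plain pandas stands in for modin.pandas

test = pd.DataFrame({'first': range(10)})

# Take an independent copy of the column, modify it by position,
# then write it back so the parent dataframe sees the change.
col = test['first'].copy()
col.iloc[[0, 1, 2]] = 500
test['first'] = col  # explicit propagation back to the original dataframe

print(test)
```

Because the write-back is a single column assignment, it does not rely on the chained test['first'].iloc[...] statement updating the parent frame.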

@devin-petersohn devin-petersohn added this to the 0.7.4 milestone Jun 9, 2020
@ndy-cd
Author

ndy-cd commented Jun 10, 2020

Thank you very much for the detailed answer!

It seems to be the same problem when using .replace():
test['first'].replace({500: 40}, inplace=True)

@devin-petersohn
Collaborator

Thanks @ndy-cd, that is expected behavior.

As you probably know, pandas has many ways to do the same thing. We have one internal way to do things so that we can better optimize these fundamental operations.
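
Until the in-place form propagates, one possible workaround for .replace() is to drop inplace=True and assign the result back to the column. This is a sketch under the same assumption as before: plain pandas is used so the snippet runs on its own, and the behavior under modin.pandas is assumed rather than confirmed.

```python
import pandas as pd  # assumption: plain pandas stands in for modin.pandas

test = pd.DataFrame({'first': [500, 1, 2]})

# Series.replace returns a new Series; assigning it back updates the column
# without depending on an in-place change reaching the parent dataframe.
test['first'] = test['first'].replace({500: 40})

print(test)
```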

@devin-petersohn devin-petersohn self-assigned this Jul 23, 2020
devin-petersohn added a commit to devin-petersohn/modin that referenced this issue Jul 23, 2020
gshimansky pushed a commit that referenced this issue Jul 24, 2020
aregm pushed a commit to aregm/modin that referenced this issue Sep 16, 2020