[BUG] loc doesn't enlarge dataframe on fresh column names #7628

Closed
magnatelee opened this issue Mar 17, 2021 · 0 comments · Fixed by #8012
Labels
bug Something isn't working Python Affects Python cuDF API.


Describe the bug
In pandas, assigning through loc with a column name that does not exist yet adds a new column to the dataframe. In cuDF, the same assignment raises a KeyError.

Steps/Code to reproduce bug

```
>>> import pandas as pd
>>> x = pd.DataFrame()
>>> x.loc[:, "a"] = [1,2,3]
>>> x
   a
0  1
1  2
2  3
>>> import cudf
>>> x = cudf.DataFrame()
>>> x.loc[:, "a"] = [1,2,3]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/indexing.py", line 186, in __setitem__
    return self._setitem_tuple_arg(key, value)
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/contextlib.py", line 74, in inner
    return func(*args, **kwds)
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/indexing.py", line 370, in _setitem_tuple_arg
    columns = self._get_column_selection(key[1])
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/indexing.py", line 376, in _get_column_selection
    return self._df._get_columns_by_label(arg)
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/frame.py", line 475, in _get_columns_by_label
    new_data = self._data.select_by_label(labels)
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/column_accessor.py", line 219, in select_by_label
    return self._select_by_label_grouped(key)
  File "/home/nfs/wonchanl/anaconda3/envs/rapids-tpcx-bb/lib/python3.7/site-packages/cudf/core/column_accessor.py", line 267, in _select_by_label_grouped
    result = self._grouped_data[key]
KeyError: 'a'
```
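
Reading the traceback, the setter appears to resolve the column label through the same strict lookup used for reads, so an unknown label fails before any value is written. The snippet below is only a pure-Python sketch of that failure mode. `ToyFrame`, `select_by_label`, and `strict_setitem` are hypothetical names for illustration and are not cuDF code.

```python
class ToyFrame:
    """Hypothetical stand-in for a dataframe: a dict of column label -> list."""

    def __init__(self):
        self.columns = {}

    def select_by_label(self, label):
        # Strict lookup: raises KeyError for a label that does not exist yet.
        return self.columns[label]

    def strict_setitem(self, label, values):
        # Resolve the target column first, then write into it.
        # For a fresh label the lookup raises before anything is written.
        column = self.select_by_label(label)
        column[:] = values


frame = ToyFrame()
try:
    frame.strict_setitem("a", [1, 2, 3])
    outcome = "assigned"
except KeyError as err:
    outcome = f"KeyError: {err}"
print(outcome)
```

In this toy model the frame is left unchanged after the failed assignment, which matches the hard error reported above.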
@magnatelee magnatelee added bug Something isn't working Needs Triage Need team to review and classify labels Mar 17, 2021
@kkraus14 kkraus14 added Python Affects Python cuDF API. and removed Needs Triage Need team to review and classify labels Mar 26, 2021
@isVoid isVoid self-assigned this Apr 20, 2021
rapids-bot bot pushed a commit that referenced this issue Apr 21, 2021
closes #7628 

This PR adds support for setting a column in a dataframe through loc when the provided column name is new. The row selector can be a single row label, a collection of row labels, or a slice. The value to set can be a column-like object or a scalar. For example, you can now do this:

```
>>> x = cudf.DataFrame()
>>> x.loc[:, "a"] = [1, 2, 3] # set a new column with list
>>> x
   a
0  1
1  2
2  3
>>> x.loc[[1, 2], "b"] = ["abc", "cba"] # set part of the new column with list
>>> x
   a     b
0  1  <NA>
1  2   abc
2  3   cba
>>> x.loc[:, "c"] = 5 # set the new column to the scalar
>>> x
   a     b  c
0  1  <NA>  5
1  2   abc  5
2  3   cba  5
```
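
The enlargement behaviour shown above can be modelled in plain Python. This is a rough sketch under stated assumptions, not the PR's implementation: `ToyFrame` and `set_loc` are hypothetical names, rows are assumed to carry a default 0..n-1 index, slice steps are ignored, and `None` stands in for `<NA>`.

```python
class ToyFrame:
    """Hypothetical dict-of-lists frame with a default 0..n-1 row index."""

    def __init__(self):
        self.columns = {}
        self.nrows = 0

    def _resolve_rows(self, rows):
        # Turn a row selector into a list of row positions.
        if isinstance(rows, slice):
            start = 0 if rows.start is None else rows.start
            stop = self.nrows - 1 if rows.stop is None else rows.stop
            return list(range(start, stop + 1))  # label slices include the stop
        if isinstance(rows, int):
            return [rows]
        return list(rows)

    def set_loc(self, rows, label, value):
        # Assumption: on an empty frame, a full slice with a list value
        # lets the value define the number of rows.
        if not self.columns and rows == slice(None) and isinstance(value, list):
            self.nrows = len(value)
        positions = self._resolve_rows(rows)
        # A scalar is broadcast to every selected row.
        values = value if isinstance(value, list) else [value] * len(positions)
        if len(values) != len(positions):
            raise ValueError("value length does not match the selected rows")
        # Enlarge: a fresh label gets a new column filled with missing values.
        if label not in self.columns:
            self.columns[label] = [None] * self.nrows
        column = self.columns[label]
        for pos, val in zip(positions, values):
            column[pos] = val


frame = ToyFrame()
frame.set_loc(slice(None), "a", [1, 2, 3])      # new column from a list
frame.set_loc([1, 2], "b", ["abc", "cba"])      # part of a new column
frame.set_loc(slice(None), "c", 5)              # new column from a scalar
print(frame.columns)
```

The key design point in the sketch is that a missing label triggers column creation instead of a lookup failure, and rows outside the selection keep the missing-value placeholder.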

Authors:
  - Michael Wang (https://github.com/isVoid)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #8012