[FEA] update() method is missing #5543

Closed
vivanov10 opened this issue Jun 23, 2020 · 1 comment · Fixed by #6883
Labels: feature request, Python

Comments

@vivanov10

Is your feature request related to a problem? Please describe.

The cuDF DataFrame has no update() method.

Describe the solution you'd like

Implement DataFrame.update() with the same behavior as in pandas.

Describe alternatives you've considered

This is the cuDF workaround for update() that I am currently using:

def update(df, other, join="left", overwrite=True, errors="ignore"):
    # TODO: Support other joins
    if join != "left":  # pragma: no cover
        raise NotImplementedError("Only left join is supported")
    if errors not in ["ignore", "raise"]:
        raise ValueError("The parameter errors must be either 'ignore' or 'raise'")

    other = other.reindex(df.index, axis=0)
    other = other.reindex(df.columns, axis=1)

    for col in df.columns:
        this = df[col]
        that = other[col]

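        if errors == "raise":
            # naming follows pandas: mask_this marks non-null cells of `that`, and vice versa
            mask_this = that.notna()
            mask_that = this.notna()
            # reduce on the Series itself instead of iterating it with the builtin any()
            if (mask_this & mask_that).any():
                raise ValueError("Data overlaps.")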

        if overwrite:
            mask = that.isna()
        else:
            mask = this.notna()

        # don't overwrite columns unnecessarily
        if mask.all():
            continue

        # rows where mask is True keep their current value; only the rest are overwritten
        df.loc[~mask, col] = that[~mask]
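
The overwrite and errors flags in the workaround are meant to mirror pandas. As a rough reference for what they should do, here is a minimal sketch run against pandas itself (not cuDF). The frame names df, other, df2 and other2 are only illustrative and are not part of the issue.

```python
import pandas as pd

# overwrite=False: only fill the holes (NaN) in the original frame
df = pd.DataFrame({"a": [1.0, None, 3.0]})
other = pd.DataFrame({"a": [10.0, 20.0, None]})
df.update(other, overwrite=False)
filled = df["a"].tolist()  # 1.0 and 3.0 are kept, the NaN is replaced by 20.0

# errors="raise": refuse to update when both frames hold a non-NA value in the same cell
df2 = pd.DataFrame({"a": [1.0, None]})
other2 = pd.DataFrame({"a": [10.0, None]})
try:
    df2.update(other2, errors="raise")
    overlap_detected = False
except ValueError:
    # row 0 is non-NA in both frames, so pandas reports an overlap
    overlap_detected = True

print(filled, overlap_detected)
```

A cuDF implementation could be checked against these two cases to confirm it matches pandas.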

cuDF example:

import cudf as pd

data = pd.DataFrame(
   data=([[1, 10, 100, 1000, 10000], [2, 20, 200, 2000, 20000]]), columns=['a', 'b', 'c', 'd', 'e']
)

data1 = pd.DataFrame(
   data=([[3, 33], [2, 22]]), columns=['a', 'b']
)
update(data, data1)
data

Additional context

This is how it works in pandas:

import pandas as pd

data = pd.DataFrame(
   data=([[1, 10, 100, 1000, 10000], [2, 20, 200, 2000, 20000]]), columns=['a', 'b', 'c', 'd', 'e']
)

data1 = pd.DataFrame(
   data=([[3, 33], [2, 22]]), columns=['a', 'b']
)
data.update(data1)
data
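
For reference, a small sketch of what the pandas call above leaves behind: update() works in place and returns None, columns a and b take the values from data1, and c, d and e are untouched. The name result is only there to show the return value.

```python
import pandas as pd

data = pd.DataFrame(
    data=[[1, 10, 100, 1000, 10000], [2, 20, 200, 2000, 20000]],
    columns=["a", "b", "c", "d", "e"],
)
data1 = pd.DataFrame(data=[[3, 33], [2, 22]], columns=["a", "b"])

# update() modifies `data` in place and returns None
result = data.update(data1)

# columns present in data1 are overwritten, the others are left alone
print(data["a"].tolist(), data["b"].tolist(), data["c"].tolist())
```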

While it raises an exception in cuDF:

import cudf as pd

data = pd.DataFrame(
   data=([[1, 10, 100, 1000, 10000], [2, 20, 200, 2000, 20000]]), columns=['a', 'b', 'c', 'd', 'e']
)

data1 = pd.DataFrame(
   data=([[3, 33], [2, 22]]), columns=['a', 'b']
)
data.update(data1) # no attribute

Traceback:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-23-deb292f322d6> in <module>
      9    data=([[3, 33], [2, 22]]), columns=['a', 'b']
     10 )
---> 11 data.update(data1) # no attribute

~/miniconda3/envs/rapids14/lib/python3.7/site-packages/cudf/core/dataframe.py in __getattr__(self, key)
    430                 return self[key]
    431 
--> 432         raise AttributeError("'DataFrame' object has no attribute %r" % key)
    433 
    434     @annotate("DATAFRAME_GETITEM", color="blue", domain="cudf_python")

AttributeError: 'DataFrame' object has no attribute 'update'
vivanov10 added the Needs Triage and feature request labels on Jun 23, 2020
galipremsagar self-assigned this on Jun 23, 2020
galipremsagar added the Python label and removed the Needs Triage label on Jun 23, 2020
kkraus14 (Collaborator) commented Dec 2, 2020

Pushing to 0.18

rapids-bot pushed a commit that referenced this issue on Jan 21, 2021
Resolves: #5543

This PR adds support for updating a DataFrame in place with the non-NA values from another DataFrame; only values at matching index/column labels are updated. Only the left join is supported, which keeps the index and columns of the original DataFrame.

Authors:
  - @skirui-source

Approvers:
  - GALI PREM SAGAR (@galipremsagar)
  - Michael Wang (@isVoid)

URL: #6883
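
To make the semantics described in the commit message concrete, here is a hedged sketch using pandas as the reference (the cuDF method is intended to follow it). The frames df and other, the extra column z and the index label 5 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1.0, 2.0, 3.0]}, index=[0, 1, 2])

# `other` has a NaN, an index label (5) and a column ("z") that df does not have
other = pd.DataFrame(
    {"a": [20.0, float("nan"), 50.0], "z": [7.0, 8.0, 9.0]},
    index=[1, 2, 5],
)

df.update(other)  # left join: df keeps its own index and columns

# label 1 is updated to 20.0; label 2 is NaN in `other`, so 3.0 is kept;
# label 5 and column "z" have no match in df and are ignored
print(df)
```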