Fix consumption of CPU-backed interchange protocol dataframes #11392
This PR has been labeled |
@shwina This popped back up on my radar after you bumped the branch. If this is still active, I think it only needs a bit more work to be complete?
Some minor nitpicks, but otherwise looks good.
python/cudf/cudf/core/df_protocol.py
Outdated
class _MaskKind(enum.IntEnum):
    NON_NULLABLE = (0,)
    NAN = (1,)
    SENTINEL = (2,)
    BITMASK = (3,)
    BYTEMASK = 4
Is it deliberate that BYTEMASK is an int, but the other mask kinds are tuples? (They both turn into ints on construction, but the tuple instantiation looks kind of weird.)
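For reference, a minimal sketch of why both spellings end up as ints: `enum` unpacks a tuple value as the arguments to the mixed-in type's constructor, so `(0,)` becomes `int(0)`. The class names `TupleStyle` and `PlainStyle` are made up for illustration and are not the ones in `df_protocol.py`.

```python
import enum


class TupleStyle(enum.IntEnum):
    # A one-element tuple is unpacked into int(0), int(1), ...
    NON_NULLABLE = (0,)
    NAN = (1,)


class PlainStyle(enum.IntEnum):
    # Same members, written as plain ints.
    NON_NULLABLE = 0
    NAN = 1


# Both styles produce members whose value is a plain int.
print(TupleStyle.NON_NULLABLE.value, PlainStyle.NON_NULLABLE.value)
print(TupleStyle.NAN == PlainStyle.NAN == 1)
```

Since the resulting members are the same, writing every member as a plain int would probably read more consistently.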
python/cudf/cudf/core/df_protocol.py
Outdated
if not allow_copy:
    raise TypeError(
        "This operation must copy data from CPU to GPU. "
        "Set `allow_copy=True` to allow it."
    )
else:
    dbuf = rmm.DeviceBuffer(ptr=buf.ptr, size=buf.bufsize)
    return _CuDFBuffer(
        as_buffer(dbuf, exposed=True),
        protocol_dtype_to_cupy_dtype(data_type),
        allow_copy,
    )
Style, I (at least) find double-negated conditions hard to parse. Perhaps have the truthy case in the if, and the exceptional case in the else branch:
if allow_copy:
    dbuf = ...
    return ...
else:
    raise TypeError(...)
Done
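A runnable sketch of the suggested shape, for anyone following along without a GPU. `copy_host_buffer` is a hypothetical stand-in, and a `bytearray` copy takes the place of the `rmm.DeviceBuffer` allocation in the PR; only the error message is taken from the snippet above.

```python
def copy_host_buffer(data: bytes, allow_copy: bool) -> bytearray:
    """Hypothetical stand-in for the host-to-device copy in the PR."""
    if allow_copy:
        # In cuDF this is where the device buffer would be allocated;
        # a bytearray copy stands in for it here.
        return bytearray(data)
    else:
        raise TypeError(
            "This operation must copy data from CPU to GPU. "
            "Set `allow_copy=True` to allow it."
        )


copied = copy_host_buffer(b"\x01\x02\x03", allow_copy=True)
print(len(copied))
```

Calling it with `allow_copy=False` raises the `TypeError` instead of silently copying.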
Removing |
@shwina what's the status of this PR?
…rchange-protocol-error-cross-device
Implementation looks good. A minor request: add a little more testing in the pandas<->cudf interop tests.
)
def test_from_cpu_df(pandas_df):
    df = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})
    cudf.from_dataframe(df, allow_copy=True)
-    cudf.from_dataframe(df, allow_copy=True)
+    cdf = cudf.from_dataframe(df, allow_copy=True)
+    assert_eq(df, cdf)
Add a case where there are nulls in the pandas frame?
Also can we check the round-trip in the opposite direction? (Assuming that is supported too).
Thanks! I added some null tests.
> Also can we check the round-trip in the opposite direction? (Assuming that is supported too).

Today, that segfaults because pandas doesn't check the device type of the incoming buffers and tries to interpret the buffer pointers as host pointers :(
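To illustrate the kind of check that is missing, here is a rough sketch of a consumer-side guard built on the protocol's `__dlpack_device__()` method. This is not what pandas does today; `FakeBuffer` and `require_host_buffer` are hypothetical names, and the enum simply mirrors the DLPack device codes for CPU and CUDA.

```python
import enum


class DlpackDeviceType(enum.IntEnum):
    # Values follow the DLPack device enumeration.
    CPU = 1
    CUDA = 2


class FakeBuffer:
    """Hypothetical stand-in for an interchange-protocol buffer."""

    def __init__(self, device_type, ptr=0, bufsize=0):
        self.device_type = device_type
        self.ptr = ptr
        self.bufsize = bufsize

    def __dlpack_device__(self):
        # The protocol returns a (device_type, device_id) pair.
        return (self.device_type, None)


def require_host_buffer(buf):
    """Refuse to treat a non-CPU pointer as host memory."""
    device_type, _ = buf.__dlpack_device__()
    if device_type != DlpackDeviceType.CPU:
        raise NotImplementedError(
            f"Cannot read buffer on device type {int(device_type)} "
            "as host memory."
        )
    return buf


host = require_host_buffer(FakeBuffer(DlpackDeviceType.CPU))
print(host.device_type.name)
```

With a guard like this, a CUDA-backed buffer would produce a Python exception rather than a segfault.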
…rchange-protocol-error-cross-device
…com:shwina/cudf into fix-interchange-protocol-error-cross-device
One minor change for docstring syntax, otherwise LGTM.
Co-authored-by: Bradley Dice <[email protected]>
Thanks, apologies for the delay
/merge
Description
Closes #11245. This PR fixes a bug in our code that consumes a protocol DataFrame that is backed by CPU memory.
This enables using the from_dataframe() function to construct DataFrames from other libraries:
Checklist