Validate and materialize iterators earlier in as_column (#15739)
closes #8796

This addresses a `TODO` I left in `as_column`: validate earlier that `arbitrary` is an iterable or sequence when it is not a recognized array-like (e.g. a NumPy array or pandas object). It also materializes iterators up front, since some later checks would otherwise exhaust them.
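
To illustrate why the iterator needs to be materialized, here is a minimal standalone sketch of the validate-then-materialize pattern. It does not use cudf; `validate_and_materialize` and `all_ints` are hypothetical names introduced only for this example, and `all_ints` is an assumed stand-in for any inspection pass that consumes its input.

```python
from collections import abc


def validate_and_materialize(arbitrary):
    """Hypothetical helper mirroring the two new branches in the diff."""
    # Reject anything that cannot be iterated at all (e.g. a bare int).
    if not isinstance(arbitrary, (abc.Iterable, abc.Sequence)):
        raise TypeError(
            f"{type(arbitrary).__name__} must be an iterable or sequence."
        )
    # Iterators are single-pass: copy them into a list so that later
    # checks can loop over the data more than once.
    if isinstance(arbitrary, abc.Iterator):
        arbitrary = list(arbitrary)
    return arbitrary


def all_ints(values):
    # Assumed stand-in for an inspection pass that consumes its input.
    return all(isinstance(v, int) for v in values)


# Without materializing, the check drains the iterator.
raw = iter([1, 2, 3])
all_ints(raw)
leftover = list(raw)  # nothing is left for the actual conversion

# With materializing, the data survives the check.
safe = validate_and_materialize(iter([1, 2, 3]))
all_ints(safe)
kept = list(safe)  # all three elements are still available
```

Non-iterator iterables such as lists, tuples, and `range` objects pass through the helper unchanged, so only single-pass inputs pay the cost of the copy.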

Authors:
  - Matthew Roeschke (https://github.com/mroeschke)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #15739
mroeschke authored May 14, 2024
1 parent 0f6ce63 commit 4069c82
Showing 1 changed file with 9 additions and 7 deletions.
16 changes: 9 additions & 7 deletions python/cudf/cudf/core/column/column.py
@@ -2070,8 +2070,15 @@ def as_column(
except (ValueError, TypeError):
arbitrary = np.asarray(arbitrary)
return as_column(arbitrary, dtype=dtype, nan_as_null=nan_as_null)
elif not isinstance(arbitrary, (abc.Iterable, abc.Sequence)):
raise TypeError(
f"{type(arbitrary).__name__} must be an iterable or sequence."
)
elif isinstance(arbitrary, abc.Iterator):
arbitrary = list(arbitrary)

# Start of arbitrary that's not handled above but dtype provided
elif isinstance(dtype, pd.DatetimeTZDtype):
if isinstance(dtype, pd.DatetimeTZDtype):
raise NotImplementedError(
"Use `tz_localize()` to construct timezone aware data."
)
@@ -2127,11 +2134,7 @@ def as_column(
return cudf.core.column.ListColumn.from_sequences(arbitrary)
raise
return as_column(data, nan_as_null=nan_as_null)
elif not isinstance(arbitrary, (abc.Iterable, abc.Sequence)):
# TODO: This validation should probably be done earlier?
raise TypeError(
f"{type(arbitrary).__name__} must be an iterable or sequence."
)

from_pandas = nan_as_null is None or nan_as_null
if dtype is not None:
dtype = cudf.dtype(dtype)
@@ -2147,7 +2150,6 @@ def as_column(
arbitrary = pd.Series(arbitrary, dtype=dtype)
return as_column(arbitrary, nan_as_null=nan_as_null, dtype=dtype)
else:
arbitrary = list(arbitrary)
for element in arbitrary:
# Carve-outs that cannot be parsed by pyarrow/pandas
if is_column_like(element):