[BUG] Loc and iloc Indexing with a scalar on list columns fails #8032
Comments
Additional context:
import cudf
s = cudf.Series([[0,1], [0,2]])
print(s.loc[1:1]) # succeeds
s.loc[1] # raises the RuntimeError below

Output:
1 [0, 2]
dtype: list
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-67-7c294f9b6e79> in <module>
3 s = cudf.Series([[0,1], [0,2]])
4 print(s.loc[1:1])
----> 5 s.loc[1]
/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210421/lib/python3.8/site-packages/cudf/core/indexing.py in __getitem__(self, arg)
144 raise KeyError(arg)
145
--> 146 return self._sr.iloc[arg]
147
148 def __setitem__(self, key, value):
/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210421/lib/python3.8/site-packages/cudf/core/indexing.py in __getitem__(self, arg)
81 if isinstance(arg, tuple):
82 arg = list(arg)
---> 83 data = self._sr._column[arg]
84
85 if is_scalar(data) or _is_null_host_scalar(data):
/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210421/lib/python3.8/site-packages/cudf/core/column/column.py in __getitem__(self, arg)
669 def __getitem__(self, arg) -> Union[ScalarLike, ColumnBase]:
670 if is_scalar(arg):
--> 671 return self.element_indexing(int(arg))
672 elif isinstance(arg, slice):
673 start, stop, stride = arg.indices(len(self))
/raid/nicholasb/miniconda3/envs/rapids-gpubdb-20210421/lib/python3.8/site-packages/cudf/core/column/column.py in element_indexing(self, index)
647 raise IndexError("single positional indexer is out-of-bounds")
648
--> 649 return libcudf.copying.get_element(self, idx).value
650
651 def slice(self, start: int, stop: int, stride: int = None) -> ColumnBase:
cudf/_lib/copying.pyx in cudf._lib.copying.get_element()
RuntimeError: cuDF failure at: ../src/copying/get_element.cu:125: get_element_functor not supported for list_view
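The traceback can be read as a dispatch chain: `loc` resolves the label to a position, `iloc` forwards it to the column, and the column sends scalars to `element_indexing` and slices to `slice`. The block below is a pure-Python toy model of that routing, not cuDF's implementation. `ToyColumn` and its `dtype` strings are hypothetical names for illustration, and the error message only mirrors the one above.

```python
class ToyColumn:
    """Hypothetical stand-in for a column; models only the scalar/slice dispatch."""

    def __init__(self, values, dtype):
        self.values = list(values)
        self.dtype = dtype

    def __getitem__(self, arg):
        # Slices return a new column, so no scalar has to be built.
        if isinstance(arg, slice):
            return ToyColumn(self.values[arg], self.dtype)
        # Scalars go through element extraction, which needs a scalar type.
        return self.element_indexing(int(arg))

    def element_indexing(self, index):
        if not -len(self.values) <= index < len(self.values):
            raise IndexError("single positional indexer is out-of-bounds")
        if self.dtype == "list":
            # Mirrors the failure in the traceback: no list scalar exists yet.
            raise RuntimeError("get_element_functor not supported for list_view")
        return self.values[index]


col = ToyColumn([[0, 1], [0, 2]], dtype="list")
print(col[1:2].values)  # slice path works: [[0, 2]]

try:
    col[1]  # scalar path fails for list dtype
except RuntimeError as err:
    print("RuntimeError:", err)
```

This is why `s.loc[1:1]` succeeds while `s.loc[1]` fails: only the scalar path needs a list scalar to hold the result.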
cc @isVoid as this is likely related to list scalars
I think we'll need list scalar support built out, at least in Cython, for this to work. Generally, I get the same error just doing this:
As for what the value should be when it gets to the user, that is of course an open question, but I am leaning towards a PyArrow array, so that nulls can be handled, and away from a NumPy array of object dtype.
I'm not 100% sure about returning PyArrow objects directly to the user. There's no precedent for doing that, and users who haven't used PyArrow before will now have yet another library to learn about. A plain list of NumPy scalars, or a NumPy array (both with
Between the two of those, I'd rather we do a Python list containing NumPy scalars and cudf.NA than a NumPy array of object type. I just think that it becomes a little hard to understand exactly what
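As a rough illustration of the "Python list with a null marker" option discussed above, here is a minimal sketch. It assumes a hypothetical `NA` sentinel standing in for `cudf.NA`, uses plain Python values where the discussion suggests NumPy scalars, and invents the helper name `row_to_pylist`. None of this is cuDF's API.

```python
class _NAType:
    """Hypothetical null sentinel, standing in for cudf.NA in this sketch."""

    def __repr__(self):
        return "<NA>"


NA = _NAType()


def row_to_pylist(values, valid):
    """Return one list row as a Python list, with NA where the mask is False."""
    if len(values) != len(valid):
        raise ValueError("values and valid must have the same length")
    return [v if ok else NA for v, ok in zip(values, valid)]


row = row_to_pylist([0, 2, 7], [True, False, True])
print(row)  # [0, <NA>, 7]
```

A plain list keeps nulls explicit and lets users check them with `is NA`, without requiring them to learn another library.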
Part 1 of #8032

This PR adds retrieval of row data from a `LIST` type column, through adding support to `list_view` specialization of `get_element`. The row data is stored in a scalar object. Use example:

```
// non-nested LIST column
col = [{1, 2, 3}, {4}]
s = get_element(col, 1); // s is a type erased list_scalar, s._data == int_column{4}

// nested LIST column
col = [[{1, 2}, {3}], [{4}, {}]]
s = get_element(col, 1); // s is a type erased list_scalar, s._data == list_column{{4}, {}}
```

Implementation note: Depends on `lists::detail::copy_slice` under the hood. Also adds a new `list_scalar` constructor that supports moving external row data to construct a new scalar.

Other included in this PR:
- `is_element_valid_sync(column, i)`, helper function that returns true if `i`th row of `column` is valid.
- `list_scalar` factory functions
- Developer guide for `list_scalar`

Authors:
- Michael Wang (https://github.com/isVoid)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- AJ Schmidt (https://github.com/ajschmidt8)
- Mark Harris (https://github.com/harrism)
- https://github.com/nvdbaranec
- Nghia Truong (https://github.com/ttnghia)

URL: #8071
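To make the PR's two examples concrete, the sketch below models a LIST column in pure Python as an offsets array plus a child, the Arrow-style layout, and extracts one row by slicing the child. It is an illustration only: `ToyListColumn`, `slice_child`, and `to_pylist` are hypothetical names, and the real work happens in libcudf (the PR mentions `lists::detail::copy_slice`), not in Python.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class ToyListColumn:
    """Hypothetical offsets-plus-child model of a LIST column."""

    offsets: List[int]  # row i spans child[offsets[i]:offsets[i + 1]]
    child: Union[list, "ToyListColumn"]

    def __len__(self):
        return len(self.offsets) - 1


def slice_child(child, start, stop):
    """Copy rows [start, stop) of a child, rebasing nested offsets to zero."""
    if isinstance(child, ToyListColumn):
        offs = child.offsets[start:stop + 1]
        base = offs[0]
        inner = slice_child(child.child, offs[0], offs[-1])
        return ToyListColumn([o - base for o in offs], inner)
    return child[start:stop]


def get_element(col, i):
    """Return row i of a list column as its own (possibly nested) data."""
    if not 0 <= i < len(col):
        raise IndexError("single positional indexer is out-of-bounds")
    return slice_child(col.child, col.offsets[i], col.offsets[i + 1])


def to_pylist(data):
    """Expand the toy representation into nested Python lists."""
    if isinstance(data, ToyListColumn):
        return [to_pylist(get_element(data, i)) for i in range(len(data))]
    return data


# Non-nested: [{1, 2, 3}, {4}]
flat = ToyListColumn([0, 3, 4], [1, 2, 3, 4])
print(to_pylist(get_element(flat, 1)))  # [4]

# Nested: [[{1, 2}, {3}], [{4}, {}]]
nested = ToyListColumn([0, 2, 4], ToyListColumn([0, 2, 3, 4, 4], [1, 2, 3, 4]))
print(to_pylist(get_element(nested, 1)))  # [[4], []]
```

In both cases the extracted row is itself column-shaped data, which matches the PR's description of a `list_scalar` whose `_data` is a column.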
Fixed by #8265
Describe the bug
Getting a runtime error when viewing a list view after an iloc or loc after a string split. Ex.:
Steps/Code to reproduce bug
Expected behavior
array(['Bad Zipcode', 'Technical Glitch'], dtype=object)
Environment overview (please complete the following information)
conda install. cuDF version 0.18.1