[BUG] tail method sometimes fail #2495

jangorecki · 2019-08-07T20:15:21Z

After running a query, I get the frame ans. The head method works fine on it, but the tail method fails. This happens rarely and depends strongly on the data. Using cudf 0.8.0+0.g8fa7bd3.dirty.

>>> ans = x.groupby(['id1'],as_index=False).agg({'v1':'sum'}).reset_index(drop=True)
>>> print(ans.head(3), flush=True)
     id1        v1
0  id001  15006850
1  id002  14994166
>>> print(ans.tail(3), flush=True)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/dataframe.py", line 491, in __str__
    return self.to_string(nrows=nrows, ncols=ncols)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/dataframe.py", line 480, in to_string
    cols[h] = self[h].values_to_string(nrows=nrows)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/series.py", line 354, in values_to_string
    out = [str(v) for v in values]
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/series.py", line 354, in <listcomp>
    out = [str(v) for v in values]
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/series.py", line 302, in __getitem__
    return self._column.element_indexing(arg)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/column.py", line 412, in element_indexing
    val = self.data[index]  # this can raise IndexError
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/buffer.py", line 149, in __getitem__
    return item.view(self.dtype)
AttributeError: 'NoneType' object has no attribute 'view'
>>> ans.dtypes
id1    object
v1      int64
dtype: object

I can provide a reproducible example, but it will not be minimal. The one provided in #2494 (comment) might work after changing K to 2.


kkraus14 · 2019-08-15T23:03:16Z

@jangorecki this should be fixed in the latest nightlies. This was due to nulls being improperly handled as Python None objects as opposed to numpy scalars.
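
The AttributeError in the first traceback is what Python raises when `.view` is called on None rather than on a NumPy scalar. The snippet below is only a sketch of that difference in plain NumPy. `element_as_scalar` is a hypothetical helper written for illustration, not cuDF's code or the actual fix.

```python
import numpy as np


def element_as_scalar(item, dtype):
    """Hypothetical helper: reinterpret a stored element as `dtype`.

    Nulls (represented here as None) are passed through unchanged,
    because None has no `.view` method.
    """
    if item is None:
        return None
    return item.view(dtype)


# A NumPy scalar supports .view, so this works.
print(element_as_scalar(np.int64(7), np.int64))

# Calling .view directly on None fails the same way the traceback does.
try:
    None.view(np.int64)
except AttributeError as exc:
    print("AttributeError:", exc)

# With the guard, a null element is returned as None instead of raising.
print(element_as_scalar(None, np.int64))
```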

jangorecki · 2019-08-25T07:48:27Z

@kkraus14
I don't think the issue is fixed. In 0.8.0 it also raised a segfault.
After upgrading to 0.9.0 I have not seen a segfault so far, but printing the tail still raises an exception.
h2oai/db-benchmark#102

Traceback (most recent call last):
  File "./cudf/groupby-cudf.py", line 56, in <module>
    print(ans.tail(3), flush=True)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/dataframe.py", line 553, in __str__
    return self.to_string()
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/dataframe.py", line 550, in to_string
    return self.__repr__()
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/dataframe.py", line 591, in __repr__
    output = self.get_renderable_dataframe()
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/dataframe.py", line 582, in get_renderable_dataframe
    output._cols[col].astype("str").str.fillna("null")
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/series.py", line 1383, in astype
    return self._copy_construct(data=self._column.astype(dtype, **kwargs))
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/columnops.py", line 137, in astype
    return self.as_string_column(dtype, **kwargs)
  File "/home/jan/anaconda3/envs/cudf/lib/python3.6/site-packages/cudf/dataframe/numerical.py", line 129, in as_string_column
    np.dtype(dev_array.dtype)
KeyError: dtype('O')
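
The traceback ends in numerical.py with a KeyError whose key is the object dtype, which suggests an object-dtype column reached a lookup that only knows numeric dtypes. The following is a minimal sketch of that failure mode in plain NumPy. `NUMERIC_TO_STRING` and `to_strings` are hypothetical names invented for illustration and are not taken from cuDF.

```python
import numpy as np

# Hypothetical converter table keyed by numeric dtype; not cuDF's actual table.
NUMERIC_TO_STRING = {
    np.dtype("int64"): str,
    np.dtype("float64"): str,
}


def to_strings(values):
    """Convert a NumPy array to a list of strings via the dtype table."""
    convert = NUMERIC_TO_STRING[values.dtype]  # KeyError for unlisted dtypes
    return [convert(v) for v in values]


# A numeric array is found in the table and converts normally.
print(to_strings(np.array([1, 2], dtype="int64")))

# An object-dtype array is not in the table, so the lookup itself fails.
try:
    to_strings(np.array(["a", "b"], dtype=object))
except KeyError as exc:
    print("KeyError:", exc)
```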

rgsl888prabhu · 2019-09-19T20:41:41Z

@jangorecki Does it always fail on 0.9, or does it depend on the generated data?

jangorecki · 2019-09-20T11:22:42Z

@rgsl888prabhu It depends on the data. Among the 4 different cases of the cardinality factor ("K"), the issue manifests in only one. You can generate the exact data that causes the problem by following the initial instructions.

rgsl888prabhu · 2019-09-20T14:06:06Z

@jangorecki I tried to reproduce this on 0.9 but wasn't able to. If you have the .csv file that reproduces it, please share it. Meanwhile, I will keep trying to reproduce it on my end and figure out the issue.

jangorecki · 2019-09-20T17:01:00Z

@rgsl888prabhu I have the csv, but it is 45 GB.
The csv was generated by a script, so it makes sense to run the script to produce the same csv rather than share a 45 GB file.

rgsl888prabhu · 2019-09-20T17:42:05Z

Do you remember the random seed you set? I don't see it in the script.

jangorecki · 2019-09-20T18:17:16Z

There is a random seed set in the script:

wget https://raw.githubusercontent.com/h2oai/db-benchmark/master/groupby-datagen.R
Rscript groupby-datagen.R 1e9 2 0 0

rgsl888prabhu · 2019-09-20T21:18:12Z

Thank you @jangorecki, I am able to reproduce the scenario.

rgsl888prabhu · 2019-09-23T19:08:25Z

Simplified code to reproduce:

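import cudf
import numpy as np

# Two-row frame: a string (object dtype) column and an integer column.
id1 = cudf.Series(['a', 'b'], dtype=np.object)
v1 = cudf.Series([1, 2])
s = cudf.DataFrame()
s['id1'] = id1
s['v1'] = v1

# Printing the tail reproduces the error.
print(s.tail(3))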
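
Both this reproduction and the original report ask for three rows from a frame that has only two. The thread does not say whether that boundary is the root cause, so the snippet below is only a sketch of the behaviour one would expect from a tail operation at that boundary, using plain Python lists instead of cuDF. `tail_rows` is a hypothetical helper introduced for illustration.

```python
def tail_rows(rows, n):
    """Return the last n rows, or every row when n exceeds the length."""
    if n <= 0:
        return []
    # Clamp the start index so it never goes negative when n > len(rows).
    start = max(len(rows) - n, 0)
    return rows[start:]


rows = [("a", 1), ("b", 2)]

# Asking for more rows than exist should simply return all of them.
print(tail_rows(rows, 3))

# Asking for fewer rows returns only the trailing ones.
print(tail_rows(rows, 1))
```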

jangorecki added the Needs Triage and bug labels Aug 7, 2019

jangorecki mentioned this issue Aug 7, 2019

Add cudf (RAPIDS) h2oai/db-benchmark#44

Closed

kkraus14 closed this as completed Aug 15, 2019

jangorecki mentioned this issue Aug 25, 2019

cudf cryptic error when printing tail of answer from 1e7 k=2 h2oai/db-benchmark#102

Closed

kkraus14 reopened this Aug 26, 2019

kkraus14 added the Python label and removed the Needs Triage label Aug 26, 2019

rgsl888prabhu self-assigned this Sep 19, 2019

rgsl888prabhu mentioned this issue Sep 24, 2019

[REVIEW] Fix tail method issue for string columns #2859

Merged

kkraus14 closed this as completed in #2859 Sep 28, 2019
