simplifying skiprows test in test_orc.py (#10783)
@bdice helped me look into an issue with deprecation warnings in #10772 and, in the process, pointed out that the skiprows test was unnecessarily complex. We looked into it and found that it appeared to be a copy/paste of a more complex test. He asked me to open this PR to simplify the test, but all the credit for noticing and fixing it is his.

Authors:
  - Mike Wilson (https://github.com/hyperbolic2346)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #10783
hyperbolic2346 authored May 5, 2022
1 parent ee26fbe commit 4ce7b65
Showing 1 changed file with 23 additions and 14 deletions.
37 changes: 23 additions & 14 deletions python/cudf/cudf/tests/test_orc.py
@@ -301,27 +301,36 @@ def test_orc_read_rows(datadir, skiprows, num_rows):
    assert_eq(pdf, gdf)


def test_orc_read_skiprows(tmpdir):
def test_orc_read_skiprows():
    buff = BytesIO()
    df = pd.DataFrame(
        {"a": [1, 0, 1, 0, None, 1, 1, 1, 0, None, 0, 0, 1, 1, 1, 1]},
        dtype=pd.BooleanDtype(),
    )
    data = [
        True,
        False,
        True,
        False,
        None,
        True,
        True,
        True,
        False,
        None,
        False,
        False,
        True,
        True,
        True,
        True,
    ]
    writer = pyorc.Writer(buff, pyorc.Struct(a=pyorc.Boolean()))
    tuples = list(
        map(
            lambda x: (None,) if x[0] is pd.NA else (bool(x[0]),),
            list(df.itertuples(index=False, name=None)),
        )
    )
    writer.writerows(tuples)
    writer.writerows([(d,) for d in data])
    writer.close()

    # testing 10 skiprows due to a boolean specific bug fix that didn't
    # repro for other sizes of data
    skiprows = 10

    expected = cudf.read_orc(buff)[skiprows::].reset_index(drop=True)
    expected = cudf.read_orc(buff)[skiprows:].reset_index(drop=True)
    got = cudf.read_orc(buff, skiprows=skiprows)

    assert_eq(expected, got)
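
As a rough illustration of why the simplification looks safe, the pure-Python sketch below checks that the removed `map`/`lambda` conversion and the new list comprehension build the same row tuples, and that `[skiprows::]` and `[skiprows:]` slice identically. It does not touch cudf or pyorc. `NA` is a hypothetical stand-in for `pd.NA`, the list of 1-tuples stands in for `df.itertuples(index=False, name=None)`, and `old_values`, `old_rows`, and `new_rows` are names introduced here for illustration only.

```python
# Hypothetical stand-in for pd.NA so the sketch needs no third-party imports.
NA = object()

# Values the old test fed through a nullable boolean DataFrame column.
old_values = [1, 0, 1, 0, NA, 1, 1, 1, 0, NA, 0, 0, 1, 1, 1, 1]

# Old approach: map each 1-tuple to (None,) for nulls, else to (bool,).
old_rows = list(
    map(
        lambda x: (None,) if x[0] is NA else (bool(x[0]),),
        [(v,) for v in old_values],  # assumed shape of the itertuples output
    )
)

# New approach: write the booleans directly and wrap each in a 1-tuple.
data = [
    True, False, True, False, None, True, True, True,
    False, None, False, False, True, True, True, True,
]
new_rows = [(d,) for d in data]

# Both approaches produce the same rows for the ORC writer.
assert old_rows == new_rows

# The double-colon slice with an empty step is the same as the plain slice.
skiprows = 10
assert new_rows[skiprows::] == new_rows[skiprows:]
remaining = new_rows[skiprows:]
```

With 16 rows and `skiprows = 10`, `remaining` holds the last six rows, which mirrors what the test expects `cudf.read_orc(buff, skiprows=skiprows)` to return.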

