[REVIEW] Fix/read parquet for empty DataFrame #6294

Merged
Changes from 68 commits
73 commits
19f5174
Merge pull request #4714 from rapidsai/branch-0.13
raydouglass Mar 30, 2020
a2804c3
REL v0.13.0 release
GPUtester Mar 31, 2020
fef2a2b
REL v0.13.0 CHANGELOG Updates
mike-wendt Apr 1, 2020
ab00eb0
Merge pull request #5310 from rapidsai/branch-0.14
raydouglass Jun 3, 2020
b34b838
REL v0.14.0 release
GPUtester Jun 3, 2020
b1dd9a5
updated concat function to allow concating an empty series with a non…
marlenezw Sep 4, 2020
4f9d1ea
updated changelog.md
marlenezw Sep 4, 2020
45db07b
Update CHANGELOG.md
galipremsagar Sep 4, 2020
124d53a
Merge branch 'branch-0.16' into feature/concat_empty_and_non_empty_se…
galipremsagar Sep 4, 2020
53a29bb
made some changes to allow an empty and non-empty series to conconcat…
marlenezw Sep 8, 2020
2326419
changes to be merged from origin
marlenezw Sep 8, 2020
77d0d0e
changes to reshape
marlenezw Sep 8, 2020
81e74d9
Merge branch 'feature/concat_empty_and_non_empty_series' of https://g…
marlenezw Sep 8, 2020
b8e728c
updated concat function to allow concating an empty series with a non…
marlenezw Sep 4, 2020
e088327
updated changelog.md
marlenezw Sep 4, 2020
e72e198
made some changes to allow an empty and non-empty series to conconcat…
marlenezw Sep 8, 2020
01c0901
updtaes to docstring to resolve merge conflict
marlenezw Sep 9, 2020
9c62595
resolve merge conflicts in docs
marlenezw Sep 9, 2020
4d9b95f
changes to reshape
marlenezw Sep 8, 2020
b6ec5a0
Update CHANGELOG.md
galipremsagar Sep 4, 2020
a1532dc
first changes to acos method.
marlenezw Sep 9, 2020
13536cb
resolving merge conflicts
marlenezw Sep 9, 2020
4f501af
updating chanelog.md
marlenezw Sep 9, 2020
ecad0f2
chanelog.md conflict resolution
marlenezw Sep 9, 2020
cff6904
changes to chanelog.md
marlenezw Sep 10, 2020
979f8a8
Update CHANGELOG.md
galipremsagar Sep 4, 2020
f0d4f40
updating chanelog.md
marlenezw Sep 9, 2020
d54d598
chanelog.md conflict resolution
marlenezw Sep 9, 2020
d400013
Delete conf.py
marlenezw Sep 10, 2020
fc1ab4f
Merge branch 'fix/output_based_on_dtype_for_acos' of https://github.c…
marlenezw Sep 10, 2020
ba05c01
removing merge conflicts and commits that were from another PR
marlenezw Sep 10, 2020
04574c0
removing merge conflicts and commits that were from another PR
marlenezw Sep 10, 2020
900bc03
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
marlenezw Sep 10, 2020
4347ead
Merge branch 'fix/output_based_on_dtype_for_acos' of https://github.c…
marlenezw Sep 10, 2020
eb8e446
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
marlenezw Sep 11, 2020
15cd85b
adding null mask to numbers out of the range.
marlenezw Sep 11, 2020
8cb83a1
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
marlenezw Sep 11, 2020
fd8b805
fixed output type to be a float instead of integer.
marlenezw Sep 11, 2020
c094339
fixing dtype output to float instead of integer.
marlenezw Sep 11, 2020
4cdb3e5
removing two return values and fixing style issues.
marlenezw Sep 11, 2020
4e858ba
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
marlenezw Sep 15, 2020
0d74021
update to chanelog.md, removing conflicts.
marlenezw Sep 15, 2020
f171c5d
added a test for this and allowed exceptions for float32
marlenezw Sep 16, 2020
b3b6fb6
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
marlenezw Sep 16, 2020
b06670c
resolving style changes.
marlenezw Sep 16, 2020
14454ed
removing unnecessary space in changelog.md.
marlenezw Sep 16, 2020
eec448d
Update python/cudf/cudf/core/frame.py
marlenezw Sep 16, 2020
9ba2e0a
Update python/cudf/cudf/core/frame.py
marlenezw Sep 17, 2020
d8b15c8
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
marlenezw Sep 17, 2020
4a7f2af
Merge branch 'fix/output_based_on_dtype_for_acos' of https://github.c…
marlenezw Sep 17, 2020
dc57bb1
adding upper limit for mask
marlenezw Sep 17, 2020
db523cd
using min_column_type to cast dtype.
marlenezw Sep 17, 2020
d81a096
added new min_col function to dtypes and updates frame.py
marlenezw Sep 18, 2020
a4ccca5
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
marlenezw Sep 18, 2020
458ee0d
added new min_col function to dtypes and updates frame.py
marlenezw Sep 18, 2020
3c6ed31
changes to tests to ignore datatype.
marlenezw Sep 18, 2020
2bffad9
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
marlenezw Sep 18, 2020
e6db996
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
marlenezw Sep 18, 2020
893adb3
Update python/cudf/cudf/tests/test_ops.py
marlenezw Sep 21, 2020
e396fc6
Update python/cudf/cudf/core/frame.py
marlenezw Sep 21, 2020
3481b47
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
Sep 22, 2020
0158647
Merge branch 'fix/output_based_on_dtype_for_acos' of https://github.c…
Sep 22, 2020
2dce9b2
refactored code and changes to test_ops.py
Sep 22, 2020
745beda
changes to parquet.pyx and test_parquet.py
marlenezw Sep 22, 2020
d6df083
removing extra changes from old branch
marlenezw Sep 22, 2020
33ce62f
changes from old commits
marlenezw Sep 22, 2020
7693f70
changes to changelog.md
marlenezw Sep 22, 2020
df83b10
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
Sep 24, 2020
9b37a3d
updated tests.
Sep 24, 2020
0e983e4
fixing style issues
Sep 24, 2020
6e80d8c
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
Sep 24, 2020
89a2c98
Merge branch 'branch-0.16' of https://github.com/rapidsai/cudf into f…
marlenezw Sep 25, 2020
c92c28a
removing check categorical in this test
marlenezw Sep 25, 2020
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -111,6 +111,7 @@
 - PR #6259 Fix compilation error with GCC 8
 - PR #6258 Pin libcudf conda recipe to boost 1.72.0
 - PR #6264 Remove include statement for missing rmm/mr/device/default_memory_resource.hpp file
+- PR #6294 Fix read parquet key error when reading empty pandas DataFrame with cudf
 - PR #6285 Removed unsafe `reinterpret_cast` and implicit pointer-to-bool casts
 - PR #6281 Fix unreachable code warning in datetime.cuh
 - PR #6286 Fix `read_csv` `int32` overflow
6 changes: 5 additions & 1 deletion python/cudf/cudf/_lib/parquet.pyx
@@ -231,9 +231,13 @@ cpdef read_parquet(filepaths_or_buffers, columns=None, row_groups=None,
 column_names.remove(index_col)
 
 for col in column_names:
+    try:
+        data = cols_dtype_map[col]
+    except KeyError:
+        data = cols_dtype_map.get(col, None)
     df._data[col] = cudf.core.column.column_empty(
         row_count=0,
-        dtype=np.dtype(cols_dtype_map[col])
+        dtype=np.dtype(data)
     )
 
 # Set the index column
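The `parquet.pyx` change above can be illustrated outside cuDF. The sketch below is a minimal NumPy-only model, assuming a plain dict stands in for `cols_dtype_map`; the helper `empty_column_dtypes` and its `strict` flag are hypothetical and not part of cuDF. It shows that the old lookup raises `KeyError` for a column absent from the map, while the patched lookup falls back to `None`, which `np.dtype` turns into its default, `float64`. Because the `except` branch calls `.get(col, None)` on a key already known to be missing, it always yields `None`, so the patched block behaves like a single `cols_dtype_map.get(col)` call.

```python
import numpy as np


def empty_column_dtypes(column_names, cols_dtype_map, strict=False):
    """Resolve a NumPy dtype for each column of an empty frame.

    Hypothetical helper modelling the loop in ``read_parquet``:
    ``strict=True`` mimics the pre-fix lookup, ``strict=False`` the patch.
    """
    dtypes = {}
    for col in column_names:
        if strict:
            # Pre-fix behaviour: a column missing from the map raises KeyError.
            data = cols_dtype_map[col]
        else:
            # Patched behaviour: a missing column resolves to None.
            data = cols_dtype_map.get(col)
        # np.dtype(None) returns NumPy's default dtype, float64.
        dtypes[col] = np.dtype(data)
    return dtypes


# Illustrative (made-up) dtype map: "b" has no recorded dtype.
dtype_map = {"a": "int64"}

print(empty_column_dtypes(["a", "b"], dtype_map))
# {'a': dtype('int64'), 'b': dtype('float64')}

try:
    empty_column_dtypes(["a", "b"], dtype_map, strict=True)
except KeyError as exc:
    print("KeyError:", exc)
# KeyError: 'b'
```

In other words, after the patch a column with no dtype recorded in the metadata is materialized as an empty `float64` column instead of aborting the read.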
29 changes: 29 additions & 0 deletions python/cudf/cudf/tests/test_parquet.py
@@ -210,6 +210,35 @@ def test_parquet_reader_basic(parquet_file, columns, engine):
     assert_eq(expect, got, check_categorical=False)
 
 
+@pytest.mark.filterwarnings("ignore:Using CPU")
+@pytest.mark.parametrize("engine", ["cudf"])
+def test_parquet_reader_empty_pandas_dataframe(tmpdir, engine):
+    df = pd.DataFrame()
+
+    fname = tmpdir.join("test_pq_reader_empty_pandas_dataframe.parquet")
+
+    df.to_parquet(fname)
+    assert os.path.exists(fname)
+
+    expect = pd.read_parquet(fname)
+    got = cudf.read_parquet(fname, engine="cudf")
+
+    if len(expect) == 0:
+        expect = expect.reset_index(drop=True)
+        got = got.reset_index(drop=True)
+        if "col_category" in expect.columns:
+            expect["col_category"] = expect["col_category"].astype("category")
+
+    # PANDAS returns category objects whereas cuDF returns hashes
+    if engine == "cudf":
+        if "col_category" in expect.columns:
+            expect = expect.drop(columns=["col_category"])
+        if "col_category" in got.columns:
+            got = got.drop(columns=["col_category"])
+
+    assert_eq(expect, got, check_categorical=False)
+
+
 @pytest.mark.parametrize("has_null", [False, True])
 @pytest.mark.parametrize("strings_to_categorical", [False, True, None])
 def test_parquet_reader_strings(tmpdir, strings_to_categorical, has_null):