Support multiple input files/buffers for read_json #8403

Merged 13 commits into rapidsai:branch-21.08 on Jun 17, 2021

@jdye64 jdye64 commented May 30, 2021

Adds support for multiple input files/buffers in read_json(), so that users can pass several files or buffers and get a single dataframe ("partition") as the result.

This closes #8320
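
To make the intended behavior concrete, here is a minimal sketch of the semantics described above: several JSON Lines inputs (raw string buffers or file-like objects) are parsed and combined into one table of rows. It uses only the Python standard library and is not the cuDF implementation; the helper name `read_json_lines_multi` is hypothetical and exists only for illustration.

```python
import io
import json


def read_json_lines_multi(sources):
    """Parse several JSON Lines sources into one combined list of row dicts.

    Hypothetical helper: illustrates the "many inputs, one result" idea only.
    """
    rows = []
    for src in sources:
        # Accept either a file-like object or a raw string buffer.
        text = src.read() if hasattr(src, "read") else src
        for line in text.splitlines():
            if line.strip():  # skip blank lines between records
                rows.append(json.loads(line))
    return rows


buffers = [
    '{"a": 1, "b": "x"}\n{"a": 2, "b": "y"}\n',
    io.StringIO('{"a": 3, "b": "z"}\n'),
]
rows = read_json_lines_multi(buffers)
```

Rows keep the order in which the inputs were given, so the combined result reads as if the inputs had been concatenated before parsing.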

@jdye64 jdye64 requested a review from a team as a code owner May 30, 2021 23:35
@jdye64 jdye64 requested review from mythrocks and ttnghia May 30, 2021 23:35
@github-actions github-actions bot added the libcudf label May 30, 2021

jdye64 commented May 30, 2021

I do not have permissions to edit labels. I would greatly appreciate it if someone could add them for me. Thanks!

@jdye64 jdye64 marked this pull request as draft May 30, 2021 23:37
Resolved review threads (outdated): cpp/src/io/json/reader_impl.cu (2), cpp/src/io/json/reader_impl.hpp (1)
@ttnghia ttnghia added the feature request and non-breaking labels May 31, 2021

jdye64 commented Jun 1, 2021

rerun tests

@github-actions github-actions bot added the Python label Jun 7, 2021

codecov bot commented Jun 7, 2021

Codecov Report

Merging #8403 (a3890b7) into branch-21.08 (93ce6c7) will increase coverage by 0.01%.
The diff coverage is 94.11%.

❗ Current head a3890b7 differs from pull request most recent head b89f085. Consider uploading reports for the commit b89f085 to get more accurate results

@@               Coverage Diff                @@
##           branch-21.08    #8403      +/-   ##
================================================
+ Coverage         82.57%   82.59%   +0.01%     
================================================
  Files               109      109              
  Lines             17870    17858      -12     
================================================
- Hits              14757    14750       -7     
+ Misses             3113     3108       -5     

Impacted Files                                          Coverage Δ
python/cudf/cudf/core/frame.py                          93.25% <ø> (+0.29%) ⬆️
python/cudf/cudf/io/json.py                             95.65% <90.90%> (-0.90%) ⬇️
python/cudf/cudf/core/column/column.py                  87.45% <100.00%> (-0.07%) ⬇️
python/cudf/cudf/core/column_accessor.py                96.82% <100.00%> (ø)
python/cudf/cudf/core/dataframe.py                      90.86% <100.00%> (+0.21%) ⬆️
python/cudf/cudf/utils/ioutils.py                       79.02% <100.00%> (ø)
...ython/custreamz/custreamz/tests/test_dataframes.py   96.97% <0.00%> (-0.61%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 93ce6c7...b89f085.

@jdye64 jdye64 marked this pull request as ready for review June 9, 2021 23:20
@jdye64 jdye64 requested a review from a team as a code owner June 9, 2021 23:20
@jdye64 jdye64 requested review from marlenezw and skirui-source June 9, 2021 23:20
@galipremsagar galipremsagar self-requested a review June 14, 2021 14:11

@galipremsagar galipremsagar left a comment

Can we also add Python tests for reading multiple files/buffers?

Resolved review thread: python/cudf/cudf/io/json.py

jdye64 commented Jun 15, 2021

rerun tests


jdye64 commented Jun 16, 2021

rerun tests

@shwina shwina self-requested a review June 16, 2021 21:02
@shwina shwina added the "5 - DO NOT MERGE" label (Hold off on merging; see PR for details) Jun 16, 2021
@shwina shwina removed the "5 - DO NOT MERGE" label Jun 17, 2021

shwina commented Jun 17, 2021

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 40776b9 into rapidsai:branch-21.08 Jun 17, 2021
@vuule vuule mentioned this pull request Jul 23, 2021
rapids-bot bot pushed a commit that referenced this pull request Jul 27, 2021
#8403 disabled a large portion of JSON tests. 
This PR reverts the accidental change in that PR.

Authors:
  - Vukasin Milovanovic (https://github.com/vuule)

Approvers:
  - Mike Wilson (https://github.com/hyperbolic2346)
  - Jeremy Dyer (https://github.com/jdye64)
  - Christopher Harris (https://github.com/cwharris)
  - Mark Harris (https://github.com/harrism)

URL: #8843
Labels
feature request: New feature or request
libcudf: Affects libcudf (C++/CUDA) code.
non-breaking: Non-breaking change
Python: Affects Python cuDF API.
Development

Successfully merging this pull request may close these issues.

[FEA] Support multiple inputs in JSON reader
4 participants