move filepath and mmap logic out of json/csv up to functions.cpp #9040

Merged
merged 7 commits into from
Aug 24, 2021

Conversation

cwharris
Contributor

@cwharris cwharris commented Aug 14, 2021

Removes the filepath-related logic from the readers, moving whole-file compression type inference up to io/functions.cpp. Also moves the lazy mmap datasource creation logic out of the csv/json readers and up to io/functions.cpp.
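
As a rough illustration of what resolving the compression type once, above the readers, can look like, here is a minimal sketch. It is not the code from this PR: the `compression` enum and the `infer_compression` helper are hypothetical names, and the extension table is an assumption made for the example.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical stand-in for a compression enum; not libcudf's actual type.
enum class compression { AUTO, NONE, GZIP, BZIP2, ZIP, XZ };

// Resolve AUTO from the file extension. An explicit request always wins.
// The extension-to-type mapping below is assumed for illustration only.
inline compression infer_compression(compression requested, std::string_view filepath)
{
  if (requested != compression::AUTO) { return requested; }

  auto const dot = filepath.rfind('.');
  if (dot == std::string_view::npos) { return compression::NONE; }

  // Lower-case the extension so "GZ" and "gz" are treated the same.
  std::string ext(filepath.substr(dot + 1));
  for (auto& c : ext) {
    c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
  }

  if (ext == "gz") { return compression::GZIP; }
  if (ext == "bz2") { return compression::BZIP2; }
  if (ext == "zip") { return compression::ZIP; }
  if (ext == "xz") { return compression::XZ; }
  return compression::NONE;
}
```

With a helper like this called once at the top-level entry point, a reader only ever receives a concrete compression type and never needs to see the file path.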

@cwharris cwharris added the cuIO (cuIO issue), improvement (Improvement / enhancement to an existing function), and non-breaking (Non-breaking change) labels Aug 14, 2021
@cwharris cwharris requested a review from a team as a code owner August 14, 2021 10:12
@github-actions github-actions bot added the libcudf (Affects libcudf (C++/CUDA) code) label Aug 14, 2021
@cwharris
Contributor Author

rerun tests

@cwharris cwharris requested a review from vuule August 16, 2021 19:02
@cwharris
Contributor Author

rerun tests

@cwharris cwharris marked this pull request as draft August 16, 2021 22:05
@github-actions github-actions bot added the Python (Affects Python cuDF API) label Aug 17, 2021
@cwharris cwharris changed the title from "replace make_writer and make_reader with simpler make data_sources/sink" to "remove filepath-related logic from csv and json reader" Aug 17, 2021
@cwharris cwharris changed the title from "remove filepath-related logic from csv and json reader" to "remove filepath-related logic from readers" Aug 17, 2021
@codecov

codecov bot commented Aug 17, 2021

Codecov Report

❗ No coverage uploaded for pull request base (branch-21.10@abe57f8).
The diff coverage is n/a.


@@               Coverage Diff               @@
##             branch-21.10    #9040   +/-   ##
===============================================
  Coverage                ?   10.73%           
===============================================
  Files                   ?      114           
  Lines                   ?    19060           
  Branches                ?        0           
===============================================
  Hits                    ?     2046           
  Misses                  ?    17014           
  Partials                ?        0           

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update abe57f8...2ac281f.

@cwharris cwharris marked this pull request as ready for review August 17, 2021 16:35
@cwharris cwharris requested a review from a team as a code owner August 17, 2021 16:35
@cwharris cwharris changed the title from "remove filepath-related logic from readers" to "move filepath and byte-range logic from readers to io/functions.cpp" Aug 18, 2021
@cwharris cwharris changed the title from "move filepath and byte-range logic from readers to io/functions.cpp" to "move filepath and mmap logic out of json/csv up to functions.cpp" Aug 18, 2021
Contributor

@vuule vuule left a comment


Good stuff. We're finally done passing file names to the readers 👍

Contributor

@mythrocks mythrocks left a comment


LGTM!

Contributor

@hyperbolic2346 hyperbolic2346 left a comment


Very informative read, thanks for doing this.

Review thread on cpp/src/io/comp/uncomp.cpp (outdated, resolved)
Review thread on cpp/src/io/csv/reader_impl.cu (outdated, resolved)
Review thread on cpp/src/io/functions.cpp (resolved)
@cwharris cwharris requested a review from shwina August 24, 2021 14:15
@cwharris
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit c271ce2 into rapidsai:branch-21.10 Aug 24, 2021
rapids-bot bot pushed a commit that referenced this pull request Aug 31, 2021
rapids-bot bot pushed a commit that referenced this pull request Oct 27, 2021
Depends on #9040

Most of this is just rearranging code and renaming some variables. I did this because keeping each piece of reader functionality in its own function localizes the cognitive load. Each function declares exactly what it needs to do its job, and can enforce const-ness at a more granular level. This makes reasoning about the code easier.
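
To illustrate the point about free functions declaring exactly what they need, here is a small sketch. The `parse_options` struct and `split_row` function are hypothetical and are not taken from the csv reader; they only show how a function's signature can state its inputs and their const-ness, instead of reaching into mutable class members.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical options struct, used only for this illustration.
struct parse_options {
  char delimiter = ',';
};

// A free function: everything it reads is in the signature, and the
// options are taken by const reference, so it cannot modify them.
inline std::vector<std::string> split_row(std::string_view row, parse_options const& opts)
{
  std::vector<std::string> fields;
  std::size_t begin = 0;
  while (true) {
    auto const end = row.find(opts.delimiter, begin);
    if (end == std::string_view::npos) {
      // Last field: everything after the final delimiter.
      fields.emplace_back(row.substr(begin));
      break;
    }
    fields.emplace_back(row.substr(begin, end - begin));
    begin = end + 1;
  }
  return fields;
}
```

A member function with the same body could silently read or write any reader state; the free function makes those dependencies visible at the call site.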

~It turns out this refactor revealed that the filename-associated logic within the csv reader impl was never being used. The filename is always an empty string, and calling the infer_compression_type function was just a way to stringify the compression type passed in by the reader options.~ Correction: there was some filename-related logic being used, but I've factored that out in PR #9040.

The changes made in #9040 also allow us to pass only a single datasource to `read_csv`, as that is all that is supported. Therefore the error handling related to passing multiple datasources can be moved upstream.
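
A sketch of what moving the multiple-datasource check upstream could look like. This is an assumption-laden illustration, not the actual libcudf code: `datasource`, `read_csv_impl`, and this `read_csv` signature are hypothetical, and the "result" is just a byte count so the example stays self-contained.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical datasource: just owns a buffer of bytes.
struct datasource {
  std::string bytes;
};

// The reader implementation accepts exactly one source, by const reference,
// so it needs no error handling for the multi-source case.
inline std::size_t read_csv_impl(datasource const& source) { return source.bytes.size(); }

// The upstream entry point validates the source count once, then forwards.
inline std::size_t read_csv(std::vector<std::unique_ptr<datasource>> sources)
{
  if (sources.size() != 1) {
    throw std::invalid_argument("read_csv supports exactly one datasource");
  }
  return read_csv_impl(*sources.front());
}
```

The design choice is that the unsupported case is rejected at the boundary, so the implementation's signature documents the constraint rather than re-checking it.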

This PR enables many more simplifications to be made to csv, but I'd like to stop here with this PR, because so far it consists purely of organizational changes. Any more simplifications will involve refactoring individual functions, which would make this PR incredibly difficult to review.

Authors:
  - Christopher Harris (https://github.com/cwharris)

Approvers:
  - Vukasin Milovanovic (https://github.com/vuule)
  - Jordan Jacobelli (https://github.com/Ethyling)

URL: #9041
rapids-bot bot pushed a commit that referenced this pull request Nov 11, 2021
Depends on #9040

Removes the json reader and impl classes, replacing member variables with local variables. This reduces cognitive overhead and facilitates further refactoring.

Authors:
  - Christopher Harris (https://github.com/cwharris)

Approvers:
  - Ram (Ramakrishna Prabhu) (https://github.com/rgsl888prabhu)
  - MithunR (https://github.com/mythrocks)
  - Elias Stehle (https://github.com/elstehle)

URL: #9088
rapids-bot bot pushed a commit that referenced this pull request Nov 18, 2021
Depends on #9040 and (unfortunately) #9041

Authors:
  - Christopher Harris (https://github.com/cwharris)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #9089
Labels
cuIO (cuIO issue), improvement (Improvement / enhancement to an existing function), libcudf (Affects libcudf (C++/CUDA) code), non-breaking (Non-breaking change), Python (Affects Python cuDF API)

5 participants