Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify read_csv by removing unnecessary reader/impl classes #9041

Merged
merged 23 commits into from
Oct 27, 2021

Conversation

cwharris
Copy link
Contributor

@cwharris cwharris commented Aug 14, 2021

Depends on #9040

Most of this is just rearranging code and renaming some variables. The reason I did this is because having all of the reader functionality as individual functions means more localized cognitive loads. Each function declares exactly what it needs to do it's job, and can enforce const-ness on a more granular level. This makes reasoning about the code easier.

It turns out this refactor revealed that the filename-associated logic within the csv reader impl was never being used. The filename is always an empty string, and calling the infer_compression_type function was just a way to stringify the compression type passed in by the reader options. Correction, there was some filename related logic being used, but I've factored that out in PR #9040.

The changes made in #9040 also allow us to pass only a single datasource to read_csv, as that is all that is supported. Therefore the error handling related to passing multiple datasources can be moved upstream.

This PR enables many more simplifications to be made to csv, but I'd like to stop here with this PR, because so far it is purely organizational changes. Any more simplifications will involving refactoring individual functions, which would make this PR incredibly difficult to review.

@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Aug 14, 2021
@cwharris cwharris added cuIO cuIO issue improvement Improvement / enhancement to an existing function non-breaking Non-breaking change tech debt labels Aug 14, 2021
@cwharris
Copy link
Contributor Author

rerun tests

cpp/src/io/functions.cpp Outdated Show resolved Hide resolved
@github-actions github-actions bot added the Python Affects Python cuDF API. label Aug 18, 2021
@codecov
Copy link

codecov bot commented Aug 19, 2021

Codecov Report

Merging #9041 (e8a8887) into branch-21.12 (ab4bfaa) will decrease coverage by 0.12%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-21.12    #9041      +/-   ##
================================================
- Coverage         10.79%   10.66%   -0.13%     
================================================
  Files               116      117       +1     
  Lines             18869    19725     +856     
================================================
+ Hits               2036     2104      +68     
- Misses            16833    17621     +788     
Impacted Files Coverage Δ
python/dask_cudf/dask_cudf/sorting.py 92.90% <0.00%> (-1.21%) ⬇️
python/cudf/cudf/io/csv.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/hdf.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/orc.py 0.00% <0.00%> (ø)
python/cudf/cudf/__init__.py 0.00% <0.00%> (ø)
python/cudf/cudf/_version.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/abc.py 0.00% <0.00%> (ø)
python/cudf/cudf/api/types.py 0.00% <0.00%> (ø)
python/cudf/cudf/io/dlpack.py 0.00% <0.00%> (ø)
python/cudf/cudf/core/frame.py 0.00% <0.00%> (ø)
... and 65 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 72694d2...e8a8887. Read the comment docs.

@cwharris
Copy link
Contributor Author

rerun tests

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@cwharris cwharris changed the base branch from branch-21.10 to branch-21.12 October 26, 2021 15:15
@cwharris cwharris requested review from a team as code owners October 26, 2021 15:15
@cwharris cwharris requested a review from charlesbluca October 26, 2021 15:15
@github-actions github-actions bot added CMake CMake build issue conda labels Oct 26, 2021
@galipremsagar galipremsagar removed request for a team October 26, 2021 15:20
@robertmaynard robertmaynard removed the request for review from a team October 26, 2021 15:27
@robertmaynard
Copy link
Contributor

Removed CMake code review as the PR has no CMake changes.

@cwharris
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 3c6d1ee into rapidsai:branch-21.12 Oct 27, 2021
rapids-bot bot pushed a commit that referenced this pull request Nov 18, 2021
Depends on #9040 and (unfortunately) #9041

Authors:
  - Christopher Harris (https://github.com/cwharris)

Approvers:
  - Robert Maynard (https://github.com/robertmaynard)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #9089
@vyasr vyasr added 4 - Needs Review Waiting for reviewer to review or respond and removed 4 - Needs cuIO Reviewer labels Feb 23, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Needs Review Waiting for reviewer to review or respond CMake CMake build issue cuIO cuIO issue improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants