Simplify read_csv by removing unnecessary reader/impl classes #9041

cwharris · 2021-08-14T11:59:41Z

Depends on #9040

Most of this is just rearranging code and renaming some variables. I did this because having all of the reader functionality as individual functions localizes the cognitive load. Each function declares exactly what it needs to do its job and can enforce const-ness at a more granular level, which makes the code easier to reason about.
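To illustrate the style described above, here is a minimal sketch of a reader written as free functions instead of a class with member state. Every name in it (`parse_options_sketch`, `split_header`, `count_records`, `read_csv_sketch`, `table_summary`) is hypothetical and heavily simplified; it is not the libcudf code, only an assumption-laden picture of what "each function declares exactly what it needs" can look like.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for reader options; not a libcudf type.
struct parse_options_sketch {
  char delimiter = ',';
};

// Takes only the header line and the options it reads, both by const reference,
// and returns the column names instead of writing to a shared member.
std::vector<std::string> split_header(std::string const& header,
                                      parse_options_sketch const& opts)
{
  std::vector<std::string> names;
  std::string current;
  for (char c : header) {
    if (c == opts.delimiter) {
      names.push_back(current);
      current.clear();
    } else {
      current.push_back(c);
    }
  }
  names.push_back(current);
  return names;
}

// Counts newline-terminated records; needs nothing but the data itself.
std::size_t count_records(std::string const& body)
{
  std::size_t n = 0;
  for (char c : body) {
    if (c == '\n') { ++n; }
  }
  return n;
}

// Hypothetical result type for the sketch.
struct table_summary {
  std::vector<std::string> col_names;
  std::size_t num_records;
};

// Values that might otherwise live in impl members (column names, record count)
// are const locals here, so each one has a single visible point of definition.
table_summary read_csv_sketch(std::string const& data, parse_options_sketch const& opts)
{
  auto const header_end  = data.find('\n');
  auto const header      = data.substr(0, header_end);
  auto const col_names   = split_header(header, opts);
  auto const body        = header_end == std::string::npos ? std::string{}
                                                           : data.substr(header_end + 1);
  auto const num_records = count_records(body);
  return {col_names, num_records};
}
```

Calling `read_csv_sketch("a,b\n1,2\n3,4\n", parse_options_sketch{})` yields two column names and two records. Because no function touches hidden state, each can be read and tested in isolation.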

This refactor appeared to reveal that the filename-associated logic within the csv reader impl was never being used: the filename is always an empty string, and calling the infer_compression_type function was just a way to stringify the compression type passed in by the reader options. Correction: some filename-related logic was being used, but I've factored that out in PR #9040.

The changes made in #9040 also allow us to pass only a single datasource to read_csv, as that is all that is supported. Therefore the error handling related to passing multiple datasources can be moved upstream.
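As a rough sketch of what moving that check upstream can look like, the example below validates the number of sources once at the entry point, so the reader itself can take a single source by reference. The names `datasource_sketch`, `read_single_source`, and `read_csv_from_sources` are hypothetical and are not the signatures used in libcudf.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a data source; not the libcudf datasource class.
struct datasource_sketch {
  std::string bytes;
};

// The reader accepts exactly one source, so it needs no multi-source error handling.
// Returning the byte count stands in for the real parsing work.
std::size_t read_single_source(datasource_sketch const& source)
{
  return source.bytes.size();
}

// The upstream entry point owns the "only one source is supported" check.
std::size_t read_csv_from_sources(
  std::vector<std::unique_ptr<datasource_sketch>> const& sources)
{
  if (sources.size() != 1) {
    throw std::invalid_argument("read_csv supports exactly one data source");
  }
  return read_single_source(*sources.front());
}
```

With this split, an unsupported call fails before any reader code runs, and the reader's signature documents the single-source restriction.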

This PR enables many more simplifications to the csv reader, but I'd like to stop here because so far the changes are purely organizational. Any further simplification will involve refactoring individual functions, which would make this PR incredibly difficult to review.

cwharris · 2021-08-16T19:04:34Z

rerun tests

codecov · 2021-08-19T00:13:52Z

Codecov Report

Merging #9041 (e8a8887) into branch-21.12 (ab4bfaa) will decrease coverage by 0.12%.
The diff coverage is n/a.

@@               Coverage Diff                @@
##           branch-21.12    #9041      +/-   ##
================================================
- Coverage         10.79%   10.66%   -0.13%     
================================================
  Files               116      117       +1     
  Lines             18869    19725     +856     
================================================
+ Hits               2036     2104      +68     
- Misses            16833    17621     +788

Impacted Files	Coverage Δ
python/dask_cudf/dask_cudf/sorting.py	`92.90% <0.00%> (-1.21%)`	⬇️
python/cudf/cudf/io/csv.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/hdf.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/orc.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/__init__.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/_version.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/abc.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/api/types.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/io/dlpack.py	`0.00% <0.00%> (ø)`
python/cudf/cudf/core/frame.py	`0.00% <0.00%> (ø)`
... and 65 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 72694d2...e8a8887.

cwharris · 2021-08-26T17:28:19Z

rerun tests

review-notebook-app · 2021-10-26T15:15:10Z

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.

robertmaynard · 2021-10-26T15:28:10Z

Removed CMake code review as the PR has no CMake changes.

cwharris · 2021-10-27T15:36:41Z

@gpucibot merge

Depends on #9040 and (unfortunately) #9041

Authors:
- Christopher Harris (https://github.com/cwharris)

Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Vukasin Milovanovic (https://github.com/vuule)

URL: #9089

cwharris added 2 commits August 14, 2021 05:08

simplify io/functions.cpp data source/sink factories

9c72e56

begin replacing csv_reader with pure functions

9e92ca2

github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Aug 14, 2021

cwharris added 9 commits August 14, 2021 07:20

pass parse_options explicitly in csv_reader

6492349

replace csv reader impl::select_data_types with pure function

3e365b5

replace csv reader impl::column_flags_ member with local variable

a4497c0

make csv reader impl::find_first_row_start a standalone function

6d708b7

make csv reader impl:col_names_ a local variable

26e37e2

replace csv reader impl::num_records with local variable.

9d84753

convert csv reader impl ::num_actual_columns and ::num_active_columns to local variables

7ce862e

remove csv reader class and impl class in favor of fucntions

9010fe1

rearrange some functions to delete some unneccessary declarations.

7cda106

cwharris requested review from vuule and rgsl888prabhu August 14, 2021 21:25

cwharris added the cuIO (cuIO issue), improvement (Improvement / enhancement to an existing function), non-breaking (Non-breaking change), and tech debt labels Aug 14, 2021

Merge branch 'branch-21.10' of github.com:rapidsai/cudf into io-functions-simplify

884bde6

vuule reviewed Aug 16, 2021

cpp/src/io/functions.cpp (outdated)

cwharris added 5 commits August 17, 2021 00:53

remove filepath-related logic from csv and json readers

88e2399

remove filepath logic from avro, parquet, orc readers

62b9520

move range size padding calculation out of json/csv reader and in to json/csv options

fb01294

remove filepaths from json reader

d422aeb

Merge branch 'io-functions-simplify' into io-simplify-csv

a67150e

github-actions bot added the Python Affects Python cuDF API. label Aug 18, 2021

re-delete csv reader_impl header

640375b

vuule added the 4 - Needs cuIO Reviewer label Sep 23, 2021

Merge branch 'branch-21.12' into io-simplify-csv

e8a8887

cwharris changed the base branch from branch-21.10 to branch-21.12 October 26, 2021 15:15

cwharris requested review from a team as code owners October 26, 2021 15:15

cwharris requested a review from charlesbluca October 26, 2021 15:15

github-actions bot added the CMake (CMake build issue) and conda labels Oct 26, 2021

galipremsagar removed request for a team October 26, 2021 15:20

jjacobelli approved these changes Oct 26, 2021

robertmaynard removed the request for review from a team October 26, 2021 15:27

rapids-bot bot merged commit 3c6d1ee into rapidsai:branch-21.12 Oct 27, 2021

vyasr added the 4 - Needs Review (Waiting for reviewer to review or respond) label and removed the 4 - Needs cuIO Reviewer label Feb 23, 2024
