Simplify read_csv by removing unnecessary reader/impl classes #9041
… to local variables
rerun tests |
Codecov Report
@@               Coverage Diff                @@
##           branch-21.12    #9041     +/-   ##
===============================================
- Coverage         10.79%   10.66%   -0.13%
===============================================
  Files               116      117       +1
  Lines             18869    19725     +856
===============================================
+ Hits               2036     2104      +68
- Misses            16833    17621     +788
rerun tests |
Removed CMake code review as the PR has no CMake changes. |
@gpucibot merge |
Depends on #9040 and (unfortunately) #9041
Authors:
- Christopher Harris (https://github.com/cwharris)
Approvers:
- Robert Maynard (https://github.com/robertmaynard)
- Vukasin Milovanovic (https://github.com/vuule)
URL: #9089
Depends on #9040
Most of this is just rearranging code and renaming some variables. I did this because having all of the reader functionality as individual functions keeps the cognitive load localized. Each function declares exactly what it needs to do its job, and can enforce const-ness at a more granular level. This makes the code easier to reason about.
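To illustrate the motivation, here is a minimal sketch of the kind of change involved. It is not the actual cudf code: `reader_options`, `reader_impl`, `split_fields`, and `read_header` are hypothetical, simplified names chosen for this example. The class version keeps intermediate state in members that any method may touch, while the free-function version states its inputs in the signature and takes them by const reference.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for the reader options (not the cudf type).
struct reader_options {
  char delimiter = ',';
};

// Before (sketch): intermediate state lives in members, so every method can
// read or mutate it and the data flow between methods is implicit.
class reader_impl {
 public:
  explicit reader_impl(reader_options opts) : opts_(opts) {}

  std::vector<std::string> read_header(std::string const& data)
  {
    line_ = data.substr(0, data.find('\n'));
    split_line();  // reads line_ and opts_, writes fields_
    return fields_;
  }

 private:
  void split_line()
  {
    fields_.clear();
    std::size_t begin = 0;
    while (true) {
      auto const end = line_.find(opts_.delimiter, begin);
      fields_.push_back(line_.substr(begin, end - begin));
      if (end == std::string::npos) { break; }
      begin = end + 1;
    }
  }

  reader_options opts_;
  std::string line_;
  std::vector<std::string> fields_;
};

// After (sketch): each free function declares exactly what it needs, and the
// const-qualified parameters make it clear that nothing is mutated.
std::vector<std::string> split_fields(std::string const& line, char const delimiter)
{
  std::vector<std::string> fields;
  std::size_t begin = 0;
  while (true) {
    auto const end = line.find(delimiter, begin);
    fields.push_back(line.substr(begin, end - begin));
    if (end == std::string::npos) { break; }
    begin = end + 1;
  }
  return fields;
}

std::vector<std::string> read_header(std::string const& data, reader_options const& opts)
{
  auto const first_line = data.substr(0, data.find('\n'));
  return split_fields(first_line, opts.delimiter);
}
```

Both versions produce the same result; the difference is that the free functions can be read and tested in isolation, without knowing which members were set by an earlier call.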
It turns out this refactor revealed that the filename-associated logic within the csv reader impl was never being used. The filename is always an empty string, and calling the infer_compression_type function was just a way to stringify the compression type passed in by the reader options.
Correction: there was some filename-related logic being used, but I've factored that out in PR #9040.
The changes made in #9040 also allow us to pass only a single datasource to read_csv, as that is all that is supported. Therefore the error handling related to passing multiple datasources can be moved upstream.
This PR enables many more simplifications to be made to csv, but I'd like to stop here, because so far the changes are purely organizational. Any further simplifications will involve refactoring individual functions, which would make this PR incredibly difficult to review.
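As a rough sketch of what moving the multiple-datasource check upstream could look like: the types and names below (`sketch::datasource`, `sketch::detail::read_csv`, the row-counting body, and the use of `std::logic_error`) are assumptions made for this example and are not the cudf API. The caller-facing function validates the number of sources once, so the detail function can take exactly one source and needs no check of its own.

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-in for a datasource; here it just owns the raw bytes.
struct datasource {
  std::string bytes;
};

namespace detail {

// Downstream: accepts exactly one source, so no source-count handling is
// needed here. For illustration it only counts newline-terminated rows.
std::size_t read_csv(datasource const& source)
{
  return static_cast<std::size_t>(
    std::count(source.bytes.begin(), source.bytes.end(), '\n'));
}

}  // namespace detail

// Upstream: the only place that deals with the number of sources.
std::size_t read_csv(std::vector<datasource> const& sources)
{
  if (sources.size() != 1) {
    throw std::logic_error("Only a single source is currently supported.");
  }
  return detail::read_csv(sources.front());
}

}  // namespace sketch
```

With this shape, supporting multiple sources later would only require changing the upstream function, while the single-source reader stays untouched.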