Fix select/drop on CSV.Rows #866

quinnj · 2021-08-05T05:23:05Z

The new column-widening functionality regressed using select/drop
keyword args for CSV.Rows. May fix #858.

The new column-widening functionality regressed using select/drop keyword args for `CSV.Rows`. May fix #858.

codecov · 2021-08-05T05:55:57Z

Codecov Report

Merging #866 (4703ffa) into main (d462c35) will not change coverage.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##             main     #866   +/-   ##
=======================================
  Coverage   90.88%   90.88%           
=======================================
  Files           9        9           
  Lines        1898     1898           
=======================================
  Hits         1725     1725           
  Misses        173      173

Impacted Files	Coverage Δ
src/rows.jl	`86.82% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d462c35...4703ffa. Read the comment docs.

* Fix select/drop on CSV.Rows The new column-widening functionality regressed using select/drop keyword args for `CSV.Rows`. May fix #858. * fix tests

* lots of updates to manual/docs * fix * Fix select/drop on CSV.Rows (#866) * Fix select/drop on CSV.Rows The new column-widening functionality regressed using select/drop keyword args for `CSV.Rows`. May fix #858. * fix tests * Add automatic support for reading gzipped csv files (#864) * Add automatic support for reading gzipped csv files Also fixes #733 by using a performant method for processing gzipped files. The motivation here is to make this more "automatic" for users so they don't have to fumble around and question whether they should use GZip.jl or CodecZlib.jl or Zlib.jl or whatever. We know which is the fastest (CodecZlib.jl), and it's a mature, properly binary-built package at this point that works cross platform, so I submit that it's a non-risky dependency to take on for the benefit it provides. In terms of implementation, we take the same approach as data.table `fread` here by decompressing the gzip file to a temp file, then mmapping it. We track the temp files generated in this way and clean them up at the end of `CSV.File` (I need to check `CSV.Rows` as well), and we can also rely on having the default of `cleanup=true` for `tempname()`, which means the julia process itself will try and delete all its generated temp files when it exits. Alternative approaches involving reading chunks of a `GzipDecompressorStream` or having a new `BufferedIOVector <: AbstractVector{UInt8}` which would buffer reads from a `GzipDecompressorStream` input were both considered, but at the end of the day, CSV.jl is hard-coded in several places to work on the input as a `Vector{UInt8}` explicitly. Parsers.jl is more flexible, but it would take non-trivial work to allow CSV.jl the same flexibility. Another factor to consider here is that if we allowed many different `buf` types (other than just `Vector{UInt8}`), then we dig up the recompilation problems all over again (that we literally just finished burying with the removal of type parameters from `Parsers.Options`). That's probably the biggest factor for me personally to take the `fread` approach of using tempfiles, in addition to being very performant. So follow ups before merging are: * Check that `CSV.Rows` can clean up tempfiles * We can be more strict in a few places of requiring `Vector{UInt8}` in CSV.jl * Update doc examples to not need `transcode` explicitly * add buffer_in_memory keyword arg and ensure _any_ source that is gzip will be decompressed automatically * use mktemp instead of tempname * make sure we hit buffer_in_memory codepath * lots of updates to manual/docs * add more writing examples

quinnj added 2 commits August 4, 2021 23:22

Fix select/drop on CSV.Rows

c067304

The new column-widening functionality regressed using select/drop keyword args for `CSV.Rows`. May fix #858.

fix tests

4703ffa

quinnj merged commit 3b62f10 into main Aug 5, 2021

quinnj deleted the jq/selectrows branch August 5, 2021 05:58

quinnj added a commit that referenced this pull request Aug 19, 2021

Fix select/drop on CSV.Rows (#866)

d89d31e

* Fix select/drop on CSV.Rows The new column-widening functionality regressed using select/drop keyword args for `CSV.Rows`. May fix #858. * fix tests

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix select/drop on CSV.Rows #866

Fix select/drop on CSV.Rows #866

quinnj commented Aug 5, 2021

codecov bot commented Aug 5, 2021

Fix select/drop on CSV.Rows #866

Fix select/drop on CSV.Rows #866

Conversation

quinnj commented Aug 5, 2021

codecov bot commented Aug 5, 2021

Codecov Report