Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

footerskip Windows/Linux inconsistency #796

Closed
fgasdia opened this issue Jan 2, 2021 · 2 comments
Closed

footerskip Windows/Linux inconsistency #796

fgasdia opened this issue Jan 2, 2021 · 2 comments

Comments

@fgasdia
Copy link

fgasdia commented Jan 2, 2021

There's an inconsistency between the behavior of footerskip in Windows and Linux. I've labeled three instances of surprising behavior (maybe the first is acceptable) in bold numbers.
edit: I'm using Julia 1.5.3 and CSV v0.8.2

Let's look at a sample csv file. Let's say I want to read rows 1, 2, 3. First lets look at a file with "\n" line endings.

open("foo_n.csv", "w") do io
    @printf(io, "1, a\n")
    @printf(io, "2, b\n")
    @printf(io, "3, c\n")
    @printf(io, "4, d\n")
    @printf(io, "\n")
end
CSV.File("foo_n.csv"; skipto=1, footerskip=1)

reads rows 1, 2, 3, 4 on Windows and Linux.
1) I'm surprised by this because between the default argument ignoreemptylines=true and the description of footerskip, I would expect this to only read rows 1, 2, 3.

To read only the first three rows, set footerskip=2

CSV.File("foo_n.csv"; skipto=1, footerskip=2)

but,
2) this only works on Linux! In Windows, I need to set footerskip=3 which is definitely not the intended behavior.

The handling with lines ending "\r\n" is different.

open("foo_rn.csv", "w") do io
    @printf(io, "1, a\r\n")
    @printf(io, "2, b\r\n")
    @printf(io, "3, c\r\n")
    @printf(io, "4, d\r\n")
    @printf(io, "\r\n")
end

As before, CSV.File("foo_rn.csv"; skipto=1, footerskip=1) reads rows 1, 2, 3, 4 for both Windows and Linux.
3) Unlike the "\n" line ending, on both Windows and Linux

CSV.File("foo_rn.csv"; skipto=1, footerskip=2)

reads all four rows. Both operating systems need footerskip=3 to only return the first three rows.

@quinnj
Copy link
Member

quinnj commented Feb 27, 2021

Thanks for the detailed report, let me investigate further. Some immediate thoughts:

  • ignoreemptylines is (currently) only a forward-parsing feature, and footerskip is applied before parsing begins, so that makes sense in my mind; we could certainly look into supporting ignoreemptylines when skipping footer rows, it probably wouldn't be too bad and would help in certain cases like this
  • The parsing code is all platform/OS-agnostic, so I'm surprised there's a difference between windows/linux; that definitely seems like a bug.
  • Point 3 also seems like a bug; I remember fixing an issue like this at one point, but maybe there's another bug there somehow

quinnj added a commit that referenced this issue Feb 27, 2021
@quinnj
Copy link
Member

quinnj commented Feb 28, 2021

Ah, so it turns out that the bug in the \r\n case is due entirely because we don't account for ignoreemptylines, so it looks like it makes the most sense to support it.

quinnj added a commit that referenced this issue Feb 28, 2021
* Add tests for #796

* Support ReversedBuf in skipemptyrow

* More tests
@quinnj quinnj closed this as completed Feb 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants