Issue with Tables.rowtable
#827
Comments
Hmmm....I'm not quite sure how you're ending up with
f = CSV.File(IOBuffer("""x,y
a,b
a,b
a,b
a,b
a,b
a,b
a,b
a,b
a,b
"""), ignoreemptylines=false)
julia> Tables.schema(f)
Tables.Schema:
:x Union{Missing, String}
:y Union{Missing, String}
julia> Tables.rowtable(f)
10-element Vector{NamedTuple{(:x, :y), Tuple{Union{Missing, String}, Union{Missing, String}}}}:
NamedTuple{(:x, :y), Tuple{Union{Missing, String}, Union{Missing, String}}}(("a", "b"))
NamedTuple{(:x, :y), Tuple{Union{Missing, String}, Union{Missing, String}}}(("a", "b"))
NamedTuple{(:x, :y), Tuple{Union{Missing, String}, Union{Missing, String}}}(("a", "b"))
NamedTuple{(:x, :y), Tuple{Union{Missing, String}, Union{Missing, String}}}(("a", "b"))
NamedTuple{(:x, :y), Tuple{Union{Missing, String}, Union{Missing, String}}}(("a", "b"))
NamedTuple{(:x, :y), Tuple{Union{Missing, String}, Union{Missing, String}}}(("a", "b"))
NamedTuple{(:x, :y), Tuple{Union{Missing, String}, Union{Missing, String}}}(("a", "b"))
NamedTuple{(:x, :y), Tuple{Union{Missing, String}, Union{Missing, String}}}(("a", "b"))
NamedTuple{(:x, :y), Tuple{Union{Missing, String}, Union{Missing, String}}}(("a", "b"))
NamedTuple{(:x, :y), Tuple{Union{Missing, String}, Union{Missing, String}}}((missing, missing))
If there's any way to anonymize the data or share it w/ me privately, I'd be interested in tracking this down.
I spent a bit of time trying to minimize and anonymize and then eventually realized that this was enough to trigger it, aha:
julia> using Tables, CSV, Random
julia> Threads.nthreads()
2
julia> CSV.write("data.csv", (; col= [randstring(50) for _ = 1:50]))
"data.csv"
julia> Tables.schema(CSV.File("data.csv"; pool=true, threaded=true))
Tables.Schema:
:col PooledString
julia> versioninfo()
Julia Version 1.6.0
Commit f9720dc2eb (2021-03-24 12:55 UTC)
Platform Info:
OS: Linux (x86_64-pc-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-11.0.1 (ORCJIT, broadwell)
(CSV v0.8.4, Tables v1.4.2) It works fine without threading.
Fixes #827. The issue here is the column type of pooled columns was still `PooledString` when finished parsing, which is an internal-only type used while parsing to signal a column is being pooled. The fix is pretty straightforward: ensure the column type is `String` or `Union{String, Missing}` when we're done parsing.
Fix is up: #828
* Fix CSV.File schema for pooled columns when multithreaded parsing
* finish test
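To illustrate the idea behind the fix (this is not the actual CSV.jl code), here is a minimal sketch of swapping a parse-time marker type for the user-facing column type once parsing is finished. `PoolingMarker` and `finaltype` are hypothetical names made up for this example; the real internal type is `PooledString`, and how CSV.jl performs the replacement is not shown in this thread.

```julia
# Hypothetical stand-in for an internal-only type that flags a column
# as "being pooled" while parsing is still in progress.
struct PoolingMarker end

# Map a parse-time column type to the type that should be reported in the
# schema after parsing. Anything that is not the marker passes through unchanged.
function finaltype(T::Type)
    T === PoolingMarker && return String
    T === Union{PoolingMarker, Missing} && return Union{String, Missing}
    return T
end

# Column types as they might look at the end of parsing, before cleanup.
parse_time_types = [Int64, PoolingMarker, Union{PoolingMarker, Missing}]

# Column types as they should be reported to Tables.schema consumers.
final_types = map(finaltype, parse_time_types)
```

The point of the sketch is only that the marker type must never escape into the reported schema: consumers such as `Tables.rowtable` build their row types from the schema, so a leftover internal type there will not match the `String` values actually stored in the column.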
Thanks!
Is this fix released already? I seem to have the very same issue on the latest available CSV.jl 0.8.4.
Oh whoops, looks like we forgot to do a patch release w/ this fix; I've gone ahead and done that here: f405361.
I just ran into this error when trying to call `Tables.rowtable` on a `CSV.File`:
I don't think I can share the file, unfortunately.
With this and apache/arrow-julia#167 I've found a good workaround has been `map(NamedTuple, Tables.rows(...))`. I wonder if this is also a schema issue or something else.
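For anyone wanting to try the row-by-row workaround, here is a rough sketch of the same idea. It builds each NamedTuple from the values in the row instead of from the table's reported schema, so an incorrect schema type would not get in the way. `row_to_namedtuple` is a hypothetical helper written for this example, and the small in-memory table stands in for the `CSV.File` that could not be shared.

```julia
using Tables

# Hypothetical helper: build a NamedTuple from any Tables.jl row using only
# the row accessors, so the result types come from the actual values.
function row_to_namedtuple(row)
    names = Tuple(Tables.columnnames(row))
    values = Tuple(Tables.getcolumn(row, name) for name in names)
    return NamedTuple{names}(values)
end

# Stand-in table (a NamedTuple of vectors is a valid Tables.jl column table).
tbl = (x = ["a", "a"], y = ["b", missing])

# Materialize the rows one at a time instead of calling Tables.rowtable.
rows = map(row_to_namedtuple, Tables.rows(tbl))
```

This trades the schema-driven fast path of `Tables.rowtable` for per-row construction, so it is likely slower on large files, but it does not depend on `Tables.schema` being correct.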