-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition in 0.9 #888
Comments
Thanks for reporting; even if you can only share the error + stacktrace you're seeing, that can often be very helpful. |
Filename passed as a string; kwargs: comment="#",
delim=';',
missingstring=["NA",""],
downcast=true,
drop=[1,2],
decimal=',' Error + stacktrace: TaskFailedException
nested task error: AssertionError: task_col.column isa Vector{UInt32}
Stacktrace:
[1] syncrefs!(#unused#::Type{String}, col::CSV.Column, task_col::CSV.Column, task_rows::Int64)
@ CSV ~/.julia/packages/CSV/eB98W/src/file.jl:474
[2] multithreadpostparse(ctx::CSV.Context, ntasks::Int64, pertaskcolumns::Vector{Vector{CSV.Column}}, rows::Vector{Int64}, rowchunkguess::Int64, finalrows::Int64, j::Int64, col::CSV.Column)
@ CSV ~/.julia/packages/CSV/eB98W/src/file.jl:425
[3] (::CSV.var"#26#31"{CSV.Context, Vector{Int64}, Vector{Vector{CSV.Column}}, Int64, Int64, CSV.Column, Int64, Int64})()
@ CSV ./threadingconstructs.jl:169
...and 1 more exception.
sync_end(::Channel{Any})@task.jl:369
macro expansion@task.jl:388[inlined]
CSV.File(::CSV.Context, ::Bool)@file.jl:257
File@file.jl:223[inlined]
#File#[email protected]:219[inlined]
top-level scope@Local: 2[inlined] |
Thanks that's helpful. If there's anyway to share the file privately with me, that would be extremely helpful in terms of debugging this; happy to jump on a call/zoom to facilitate sharing if possible. Otherwise, I'll try to walk through the code and figure out how this is getting triggered. In the mean time, you can avoid the error by doing |
Fixes #888. One of the core issues is that we weren't "remembering", when a column's type got detected, whether its pool value was a generic pool value (like `pool=true`), or provided per-column. This is unique to pooling, because we don't/can't really apply the pooling until we know the column type, so it's usually post-context. The proposal here adds another field `columnspecificpool`, which will remember this property. The other big issue is that the pooling mechanics have shifted around a bit throughout the recent internals refactoring, so we were a little inconsistent in how we were handling things overall.
Fixes #888. One of the core issues is that we weren't "remembering", when a column's type got detected, whether its pool value was a generic pool value (like `pool=true`), or provided per-column. This is unique to pooling, because we don't/can't really apply the pooling until we know the column type, so it's usually post-context. The proposal here adds another field `columnspecificpool`, which will remember this property. The other big issue is that the pooling mechanics have shifted around a bit throughout the recent internals refactoring, so we were a little inconsistent in how we were handling things overall.
I could reproduce the issue on 0.9.1 and the proposed fix in #898 fixes it for me. |
In the REPL the fix works, but it in Pluto it doesn't: TaskFailedException
nested task error: TypeError: in typeassert, expected Dict{Union{Missing, String}, UInt32}, got a value of type Dict{Union{Missing, WeakRefStrings.String31}, UInt32}
Stacktrace:
[1] syncrefs!(#unused#::Type{String}, col::CSV.Column, task_col::CSV.Column, task_rows::Int64)
@ CSV ~/.julia/packages/CSV/h66c0/src/file.jl:476
[2] multithreadpostparse(ctx::CSV.Context, ntasks::Int64, pertaskcolumns::Vector{Vector{CSV.Column}}, rows::Vector{Int64}, rowchunkguess::Int64, finalrows::Int64, j::Int64, col::CSV.Column)
@ CSV ~/.julia/packages/CSV/h66c0/src/file.jl:422
[3] (::CSV.var"#26#31"{CSV.Context, Vector{Int64}, Vector{Vector{CSV.Column}}, Int64, Int64, CSV.Column, Int64, Int64})()
@ CSV ./threadingconstructs.jl:169
sync_end(::Channel{Any})@task.jl:369
macro expansion@task.jl:388[inlined]
CSV.File(::CSV.Context, ::Bool)@file.jl:257
File@file.jl:223[inlined]
#File#[email protected]:219[inlined]
top-level scope@Local: 2[inlined] Note that Pluto runs the notebooks in a Distributed worker process, so you get slightly different scheduling. I think the problem here could be sidestepped by specifying the string type, but the auto detection threshold is what has the race condition. |
Thanks for the follow up; I'll try to dig in and see if there's still a promotion problem with the refpool sometimes. |
I will try to provide a nicer MWE later, but I ran into a problem while working on some proprietary data. The file in question parses without problem in CSV 0.8.3, but fails under the same options in 0.9. Adding
ntasks=1
in CSV 0.9 solves the issue, which makes me suspect a race condition.cc: @dmbates
The text was updated successfully, but these errors were encountered: