Threaded parsing type mismatch depending on ntasks value #1010
Comments
Additional fact from the original observed file: this column was the right-most column, but there were several columns of various types, including Float64, string, and Int64, and only this column had issues.
I think I know what's happening, but not what to do about it. When we chunk up the file in the it's a bit like trying to parse a file that looks like
(i.e. when we try to parse that first chunk as
Hmmmm.....I don't think that should be possible, because we do extra work to ensure chunks only get split exactly on the newline character, so
Hmmmm, well, I'm very curious to find out what is going on 😂 The debug output is
I think a few funny things are going on here:
Multi-threading fails here. I'm slowly trying to walk through and figure out what's going on, but it's quite difficult and overwhelming to understand.
I'd be curious to know the values here and why that check failed, especially on the full file where it seems like we should have enough columns to get a good % probability of finding the right row endings. It sounds like @nickrobinson251 is probably right that we're not resetting things correctly when multithreaded parsing fails, so we're "stuck" with potentially bad types.
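For anyone following the thread, here is a minimal Python sketch of why per-chunk type detection can depend on the number of tasks. It is only an illustration under stated assumptions, not the package's actual implementation: the helpers `infer_type`, `split_on_newlines`, `chunk_types`, and `promote`, and the sample data, are all hypothetical. Chunks are cut only at newline characters, each chunk infers a column type on its own, and the per-chunk answers disagree unless they are promoted to a common type.

```python
def infer_type(values):
    """Return the narrowest type name that fits every value (hypothetical helper)."""
    def fits(cast):
        try:
            for v in values:
                cast(v)
            return True
        except ValueError:
            return False

    if fits(int):
        return "Int64"
    if fits(float):
        return "Float64"
    return "String"


def split_on_newlines(text, ntasks):
    """Split text into at most ntasks chunks, moving each cut forward to the next newline."""
    chunks, start = [], 0
    for i in range(1, ntasks):
        guess = max(start, len(text) * i // ntasks)
        cut = text.find("\n", guess)
        if cut == -1:
            break
        chunks.append(text[start:cut + 1])
        start = cut + 1
    if start < len(text):
        chunks.append(text[start:])
    return chunks


def chunk_types(text, ntasks):
    """Infer the type of a single headerless column independently for each chunk."""
    return [infer_type(chunk.split()) for chunk in split_on_newlines(text, ntasks)]


def promote(types):
    """Pick the widest type seen in any chunk (assumed order: Int64 < Float64 < String)."""
    order = ["Int64", "Float64", "String"]
    return max(types, key=order.index)


# Made-up single-column data: one float among integers.
data = "1\n2\n3\n4.5\n5\n6\n"

for n in (1, 2, 3):
    per_chunk = chunk_types(data, n)
    print(n, per_chunk, "->", promote(per_chunk))
```

In this toy model, a chunk that never sees `4.5` settles on `Int64`, so any code path that trusts a single chunk's answer, or keeps stale per-chunk types after a failed threaded attempt, would report a type that varies with the task count. Promoting across all chunks gives the same answer for every task count.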
Problem
We've run across a very odd issue where, depending on the value ntasks is set to, the type of the parsed column is different.
File to replicate the issue: foobar.csv