Problem with SentinelArrays.ChainedVector when limit/skipto is set #963
quinnj added a commit that referenced this issue on Jan 17, 2022:

Fixes #963. The issue here is that although we were adjusting the number of rows to a provided limit during multithreaded parsing, we failed to adjust the actual column arrays to the correct size. This dates from when we converted from the old `CSV.Column` custom array type to returning "normal" arrays in the 0.7 -> 0.8 transition. With `CSV.Column`, we just passed the final row total and it adjusted the size dynamically, without physically resizing the underlying array. With regular arrays, however, we need to ensure the array actually gets resized. This became more apparent with the recently released pooling change, where the use of `@inbounds` in the new `checkpooled!` routine turned the problem into a silenced BoundsError; I've taken those `@inbounds` uses out for now to be more conservative. The fix is fairly straightforward: if we adjust the final row down to a user-provided limit, we loop over the parsing tasks, "accumulate" rows until we hit the limit, and then resize or `empty!` columns as appropriate.
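To make the accumulate-and-resize step concrete, here is a minimal sketch of the idea described above; it is an illustration rather than CSV.jl's actual code, and the function name `trim_to_limit!` and the shape of `taskcolumns` are assumptions:

```julia
# Illustrative sketch: `taskcolumns` is assumed to be a vector of per-task column
# sets, each a vector of equally sized column arrays. Chunks are kept whole while
# they fit under the limit; the chunk that crosses the limit is resize!d, and
# every chunk after it is empty!d.
function trim_to_limit!(taskcolumns, limit::Integer)
    remaining = limit
    for columns in taskcolumns
        nrows = isempty(columns) ? 0 : length(first(columns))
        if remaining >= nrows
            remaining -= nrows                               # all rows of this task fit
        elseif remaining > 0
            foreach(col -> resize!(col, remaining), columns) # shrink the straddling chunk
            remaining = 0
        else
            foreach(empty!, columns)                         # chunks wholly past the limit
        end
    end
    return taskcolumns
end
```

Applied to, say, two task chunks of four rows each with `limit = 5`, the first chunk is kept whole and the second is resized down to a single row.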
Thanks for the report; fix is up: #964.
quinnj added a commit that referenced this issue on Jan 19, 2022: "Fix use of limit in multithreaded parsing" (its commit message repeats the fix description quoted above).
@quinnj:
Start Julia with multiple threads.
Run:
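The original snippet is not shown here; the following is a minimal reproduction sketch, assuming a CSV file large enough to be split across multiple parsing tasks (the file name is a placeholder, and `Column2` matches the column discussed below):

```julia
using CSV, DataFrames

# Hypothetical reproducer: multithreaded parse with a row limit. "large_file.csv"
# stands in for any file big enough that CSV.jl splits parsing across
# Threads.nthreads() tasks, which is what produces ChainedVector columns.
df2 = CSV.read("large_file.csv", DataFrame; limit = 10_000, ntasks = Threads.nthreads())
```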
The problem is that in both columns, which are `SentinelArrays.ChainedVector`s, the length of `arrays` does not match `inds`: the chunks hold fewer elements than the cumulative indices claim. Here is an example for `Column2`, where you can see that 30 elements are missing.
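The example output is not reproduced here; below is a sketch of the kind of check that exposes the mismatch, assuming `inds` holds the cumulative chunk end indices as its use in the report suggests (the 30-element figure is the one quoted above):

```julia
col = df2.Column2                    # a SentinelArrays.ChainedVector under multithreaded parsing

reported = last(col.inds)            # length implied by the cumulative chunk end indices
actual   = sum(length, col.arrays)   # number of elements the chunks actually hold
reported - actual                    # per the report, this difference came out to 30
```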
If you reimplement `getindex` to enable bounds checking (a sketch follows), you get a `BoundsError` when trying to work with `df2`.
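The reimplemented `getindex` is likewise not shown; below is a minimal sketch of a bounds-checked lookup over the `arrays`/`inds` fields. It is not necessarily the reporter's snippet or the library's own method; the point, per the report, is simply to index the underlying chunk without `@inbounds` so the missing elements surface as a `BoundsError`:

```julia
using SentinelArrays

# Bounds-checked chunk lookup: find the chunk whose cumulative end index covers i,
# then index into it with bounds checking enabled, so a too-short chunk throws a
# BoundsError instead of silently reading bad data.
function checked_getindex(A::SentinelArrays.ChainedVector, i::Integer)
    chunk = searchsortedfirst(A.inds, i)          # chunk whose cumulative end covers i
    offset = chunk == 1 ? 0 : A.inds[chunk - 1]   # elements stored before this chunk
    return A.arrays[chunk][i - offset]            # bounds-checked indexing into the chunk
end

# e.g. checked_getindex(df2.Column2, length(df2.Column2)) throws a BoundsError
# when the trailing elements are missing.
```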