add Iterators.partition for DataFrameRows #3299
Wow! So fast! Thank you, @bkamins! This fixes apache/arrow-julia#403. Missingness is now retained through Arrow serialization and the test case succeeds:
Out of curiosity, it's not entirely clear to me what makes the default fallback lose the type information, given that it works with views. Is it this line that fixed it? It would be awesome if this could go out as a patch release, because there are other partitioning use cases where people rely on the general fallback for Tables.jl sources.
The broader context is that most basic users aren't aware that they need to partition their datasets manually to leverage threading in Arrow (which makes Arrow pretty slow in comparison with PyArrow / Polars, see here). I hope to change that by chunking data by default (PR here), which is something that PyArrow does automatically (reference).
The CI error on nightly is unrelated (it is a problem with compilation time and I cannot reproduce it locally).
For the … I think … Perhaps @quinnj is better suited to provide his opinion here?
No, the reason was that previously we relied on Julia inferring the element type of the collection by inspecting its elements (so the narrowest eltype was picked). Now we pass an object that carries over the schema information.
As an aside, the pattern:
does not seem to be fully correct. The point is that … This is the situation before this PR:
What the PR changes is that the above statement now returns the schema of the underlying data frame (this is doable, but I believe the Tables.jl contract does not guarantee it).
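A minimal sketch of the element-type inference mentioned above, in plain Julia. It does not use DataFrames.jl or Arrow.jl, so it only illustrates the general mechanism and is not the code in this PR. The column `v` is hypothetical. When each chunk is rebuilt from a generator, `collect` infers the element type from the values it happens to see, so a chunk without `missing` narrows to `Int`. Slicing, on the other hand, keeps the declared `Union{Missing, Int}` eltype of the parent.

```julia
# Hypothetical column that is declared to allow missing values.
v = Union{Missing,Int}[1, 2, missing, 4]

# Chunking that rebuilds each chunk from a generator: a Generator has no declared
# eltype, so `collect` picks the narrowest type that fits the values it sees.
inferred = [collect(x for x in chunk) for chunk in Iterators.partition(v, 2)]

# Chunking that keeps each chunk as a slice of `v`: the declared eltype carries over.
declared = [chunk for chunk in Iterators.partition(v, 2)]

# The first inferred chunk holds no `missing`, so it narrows to Int;
# the second one widens to Union{Missing, Int} once it encounters a `missing`.
@show eltype.(inferred)

# Every slice reports the parent's declared eltype, Union{Missing, Int}.
@show eltype.(declared)
```

If each partition's column types are inferred independently in this way, the same column can get different types in different partitions. Carrying the schema along avoids relying on which values each chunk happens to contain.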
Co-authored-by: Milan Bouchet-Valat
Thank you!
Fixes #3298
@svilupp, could you please review this PR against your requirements?
@nalimilan, do you think this should go out as a patch release, or should it go into the 1.6 release?