Fix map type support #712
Note: I am almost done rewriting the schema stuff in #684. I can add the tests there to make sure it fixes this issue.
Nice, this looks really good! I also just found apache/arrow-rs#2037, so it looks like the second issue I found is also already being fixed.
It looks like the fix for apache/arrow-rs#2037 is already in the 19.0.0 release. I've given it a try, and this PR now fully fixes map types, but it should probably be closed in favour of #684 and a PR to upgrade arrow to 19.0.0 without disabling [...]. I will also point out that this code confuses me. I'm not sure why we need to support dictionary types; I think Delta table schemas should probably never map to pyarrow dictionary types. I suspect there may have been some confusion between map types and dictionary types in pyarrow. A dictionary type in pyarrow means the field is dictionary-encoded; a map type means the data contains key-value pairs.
Hi @Tom-Newton, thanks for the work on this! Within #703 I started the migration of arrow, but more importantly of datafusion. Unfortunately we have to keep those versions aligned. However, the simple task of updating datafusion got a bit bigger because they fundamentally changed the internal path handling, and I am currently migrating the delta internal path handling to the new patterns. This also constitutes an important step towards integrating with the new object store abstractions.
It sounds like I just need to be patient then 🙂. It seems like everything that is needed is in progress 🚀.
Description
Currently DeltaTable.to_pyarrow_dataset() fails for any table containing map types. There are 2 reasons for this:
ArrowException: C Data interface error: The datatype "+m" is still not supported in Rust implementation
from this line.
Related Issue(s)
#713