Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix map type support #712

Closed
wants to merge 7 commits into from
Closed

Fix map type support #712

wants to merge 7 commits into from

Conversation

Tom-Newton
Copy link
Contributor

@Tom-Newton Tom-Newton commented Jul 26, 2022

Description

Currently DeltaTable.to_pyarrow_dataset() fails for any table containing map types. There are 2 reasons for this:

  1. The python code to parse the schema json returned by the rust code does not support map types correctly.
  2. The version of rust arrow used (15) does not support map types. After fixing point 1 we get ArrowException: C Data interface error: The datatype ""+m"" is still not supported in Rust implementation from this line.

Related Issue(s)

#713

@wjones127
Copy link
Collaborator

Note: I am almost done rewriting the schema stuff here #684. I can add the tests here to make sure it fixes this issue.

@Tom-Newton
Copy link
Contributor Author

Tom-Newton commented Jul 26, 2022

Note: I am almost done rewriting the schema stuff here #684. I can add the tests here to make sure it fixes this issue.

Nice this looks really good! I also just found apache/arrow-rs#2037 so the second issue I found looks like its also in progress to be fixed.

@Tom-Newton
Copy link
Contributor Author

It looks like the fix for apache/arrow-rs#2037 is already in the 19.0.0 release. I've given it a try and this PR does now fully fix map types but this probably should be closed in favour of #684 and a PR to upgrade arrow to 19.0.0 without disabling deny(warnings) (I was getting deprecation and unstable API warnings).

I will also point out that this code confuses me. I'm not sure why we need to support dictionary types. I think delta table schemas should probably never map to pyarrow dictionary types. I suspect there may have been some confusion between map types and dictionary types in pyarrow? A dictionary type in pyarrow means the field is dictionary encoded a map type means the data contains key value pairs.

@roeap
Copy link
Collaborator

roeap commented Jul 26, 2022

Hi @Tom-Newton - Thanks for the work on this! Within #703 I started the migration of arrow, but more importantly datafusion. Unfortunately we have to keep those versions aligned. However the simple task of updating datafusion got a bit bigger as they fundamentally changed the internal path handling, and I am right now working on migrating the delta internal path stuff to the new patterns. THis also constitutes an important step towards integrating with the new object store abstractions.

@Tom-Newton
Copy link
Contributor Author

It sounds like I just need to be patient then 🙂 . It seems like everything that is needed is in progress 🚀.

@Tom-Newton
Copy link
Contributor Author

Closing since there are 2 in progress PRs that I believe will solve this. #703 #684

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants