Antoine Pitrou / @pitrou:
There is actually a discussion to relax the utf8 requirement in IPC metadata values (see the message recently posted by @jorisvandenbossche "Re: [DISCUSS] Binary Values in Key value pairs WAS: Re: [INFO_REQUEST][FLIGHT] - Dynamic schema changes in ArrowFlight streams").
In short: yes, Arrow C++ and PyArrow can put arbitrary binary data in metadata values.
Joris Van den Bossche / @jorisvandenbossche:
(Side note: this might be just for quick testing, but if you actually want to use the extension type on the rust side as well, you should probably define the extension type in Python as a subclass of pyarrow.ExtensionType, and not pyarrow.PyExtensionType, since the latter uses a pickle dump of the class as the serialized metadata, which you won't be able to use in Rust, I suppose)
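To illustrate the side note above, here is a minimal standard-library sketch of why a pickle payload is a poor fit for metadata that another implementation expects to read as UTF-8. It does not use pyarrow at all; the `params` dictionary and the `is_valid_utf8` helper are hypothetical names made up for this example, and the JSON encoding is only one possible language-neutral alternative, not something the thread prescribes.

```python
import json
import pickle


def is_valid_utf8(payload: bytes) -> bool:
    """Return True if the payload decodes cleanly as UTF-8."""
    try:
        payload.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical stand-in for extension type parameters (not taken from the issue).
params = {"extension": "uuid"}

# Binary pickle protocols (2 and later) start with the byte 0x80,
# which can never begin a valid UTF-8 sequence.
pickled = pickle.dumps(params)

# A text-based encoding such as JSON stays valid UTF-8 and is readable
# from non-Python implementations.
as_json = json.dumps(params).encode("utf-8")

print(is_valid_utf8(pickled))  # False
print(is_valid_utf8(as_json))  # True
```

A serialization that is plain text (or empty) is something a Rust reader could parse without needing to understand Python's pickle format.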
While trying to roundtrip an extension from schema.metadata (see ARROW-13855 for details), I got invalid utf8, which imo goes against [1].
Specifically, a field
field = pyarrow.field("aa", UuidType())
contains the following:
with the value's data for this key being:
This is not valid utf8 (see e.g. https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=02b67658b3cddf8dc095bc9750fa7032).
Maybe I am reading the values incorrectly? (null point?)
[1] https://arrow.apache.org/docs/format/CDataInterface.html#c.ArrowSchema.metadata
Reporter: Jorge Leitão / @jorgecarleitao
Note: This issue was originally created as ARROW-15613. Please see the migration documentation for further details.