Parquet does not have a way to represent timestamps with a base of seconds; its timestamp type only supports millisecond, microsecond, and nanosecond units. As a result, if you write timestamps with seconds as a base, we can only embed this information in the Arrow schema and not the Parquet schema.
See #1666 for more context, and discussion around an optional coerce_types feature that would automatically cast unsupported types for maximum compatibility.
If you wish to be compatible with non-arrow parquet readers, you will need to cast the arrow array to a supported time base prior to writing it, or build the array using a supported time base as your linked commit does.
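To make the rebasing step concrete, here is a minimal sketch of what casting from seconds to milliseconds amounts to at the value level. It deliberately uses only the Rust standard library rather than the arrow crate: the function name `seconds_to_millis` and the use of `Option<i64>` to model nullable slots are assumptions for illustration, not part of any library API. In real code you would cast the Arrow array itself to a millisecond (or finer) timestamp type before handing it to the Parquet writer.

```rust
/// Rebase second-resolution timestamps to millisecond resolution.
///
/// Hypothetical helper for illustration only: `None` models a null slot,
/// and an overflowing value produces an error instead of silently wrapping.
pub fn seconds_to_millis(values: &[Option<i64>]) -> Result<Vec<Option<i64>>, String> {
    values
        .iter()
        .map(|v| match v {
            // Nulls stay null after the rebase.
            None => Ok(None),
            // Multiply by 1000, rejecting values that do not fit in an i64.
            Some(s) => s
                .checked_mul(1_000)
                .map(Some)
                .ok_or_else(|| format!("timestamp {} s overflows i64 in milliseconds", s)),
        })
        .collect()
}
```

The checked multiplication is a design choice in this sketch: a timestamp that cannot be represented in the finer unit is reported as an error rather than written as a wrong value.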
Describe the bug
I'm unable to persist fields represented as e.g. Timestamp in Arrow to recognized timestamps in the written Parquet.
To Reproduce
I've written a simple utility for converting WARC files to Parquet. Using this, you'll produce Parquet which looks something like this:
Expected behavior
Looking at Parquet produced from a sample dataset (NYC taxi data), the timestamp columns are correctly annotated as TIMESTAMP: