Is your feature request related to a problem? Please describe.
As discussed in issue #6816, it seems necessary to support some sort of schema when writing Parquet files. There are cases where the caller has information about the data being written that cudf doesn't track and has no way to infer, specifically decimal precision and map types. To avoid clouding that discussion with specifics, I am creating this request for one specific schema: Arrow's schema.
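For illustration only, here is a minimal sketch using the parquet::schema classes from Arrow (the header linked under Additional context below) that captures the two pieces of information mentioned above, which cannot be recovered from the column data alone: a decimal column with an explicit precision and scale, and a column that should be written as a Parquet MAP rather than a plain list of structs. The column names are made up.

```cpp
// Sketch only: builds a Parquet schema with Arrow's parquet::schema classes.
// The column names ("price", "scores") are invented for illustration.
#include <parquet/schema.h>
#include <parquet/types.h>

parquet::schema::NodePtr make_example_schema()
{
  using parquet::ConvertedType;
  using parquet::Repetition;
  using parquet::Type;
  using parquet::schema::GroupNode;
  using parquet::schema::NodePtr;
  using parquet::schema::PrimitiveNode;

  parquet::schema::NodeVector fields;

  // DECIMAL(9, 2) stored as INT32. The precision and scale live only in the
  // schema; they cannot be inferred from the column values themselves.
  fields.push_back(PrimitiveNode::Make("price",
                                       Repetition::REQUIRED,
                                       Type::INT32,
                                       ConvertedType::DECIMAL,
                                       /*length=*/-1,
                                       /*precision=*/9,
                                       /*scale=*/2));

  // MAP<string, int64> using Parquet's three-level map layout. Without input
  // from the caller, a writer only sees a list of (key, value) structs.
  NodePtr key =
    PrimitiveNode::Make("key", Repetition::REQUIRED, Type::BYTE_ARRAY, ConvertedType::UTF8);
  NodePtr value     = PrimitiveNode::Make("value", Repetition::OPTIONAL, Type::INT64);
  NodePtr key_value = GroupNode::Make("key_value", Repetition::REPEATED, {key, value});
  fields.push_back(
    GroupNode::Make("scores", Repetition::OPTIONAL, {key_value}, ConvertedType::MAP));

  return GroupNode::Make("schema", Repetition::REQUIRED, fields);
}
```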
Describe the solution you'd like
We should import this schema into cudf. We could either build against the Arrow files or copy them in wholesale. I think building against them is best for maintenance, but it adds a dependency on Arrow, which seems undesirable.
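To make the ask concrete, here is a rough sketch of what the writer call could look like if cudf accepted such a schema. The surrounding calls are cudf's current Parquet writer API, but the `schema(...)` builder option is hypothetical; it is the hook this request is asking for, not something that exists today.

```cpp
// Sketch only: .schema(...) below is a hypothetical builder option (the feature
// requested here); the rest is cudf's existing Parquet writer API.
#include <cudf/io/parquet.hpp>
#include <cudf/table/table_view.hpp>

#include <parquet/schema.h>

void write_with_schema(cudf::table_view const& table, parquet::schema::NodePtr const& schema)
{
  auto sink = cudf::io::sink_info{"out.parquet"};
  auto opts = cudf::io::parquet_writer_options::builder(sink, table)
                .schema(schema)  // hypothetical: caller-supplied decimal precision, maps, etc.
                .build();
  cudf::io::write_parquet(opts);
}
```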
Describe alternatives you've considered
We could roll our own schema, but there are multiple reasons to avoid that:
People who would use this feature are likely moving to the GPU from some other system, so adopting a commonly used schema format is more useful to them than a custom one.
We would also spend a lot of time building and maintaining our own schema.
Additional context
Here is a link to the schema that is used in Arrow: https://github.com/apache/arrow/blob/master/cpp/src/parquet/schema.h
This issue has been marked stale due to no recent activity in the past 30d. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed. This issue will be marked rotten if there is no activity in the next 60d.
This issue has been labeled inactive-90d due to no recent activity in the past 90 days. Please close this issue if no further response or action is needed. Otherwise, please respond with a comment indicating any updates or changes to the original issue and/or confirm this issue still needs to be addressed.