[FEA] The output columns of read_json need to follow depth-first-search order as in the input schema #17090

Closed
ttnghia opened this issue Oct 15, 2024 · 0 comments · Fixed by #17029
Labels: cuIO (cuIO issue), feature request (New feature or request), Spark (Functionality that helps Spark RAPIDS)


ttnghia (Contributor) commented Oct 15, 2024

Currently, read_json takes the read schema as a std::map keyed by column name. Because std::map keeps its entries sorted by key, the order in which the caller specified the columns is lost, so the columns of the output table cannot follow it. Callers must track the column order themselves and rearrange the output columns by name.
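A minimal sketch of the problem and of the caller-side workaround, assuming a simplified map-based schema. `map_schema`, `output_order`, and `reorder_indices` are hypothetical names for illustration only, not cuDF's API: the map yields columns in sorted-key order, and the caller then has to compute where each wanted column ended up.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for a map-based read schema (column name -> type name).
// Not cuDF's actual option type.
using map_schema = std::map<std::string, std::string>;

// Column names in the order a std::map yields them: sorted by key,
// regardless of the order in which the caller declared them.
std::vector<std::string> output_order(map_schema const& schema)
{
  std::vector<std::string> names;
  for (auto const& entry : schema) { names.push_back(entry.first); }
  return names;
}

// Caller-side workaround: for each wanted column (in the caller's order),
// find its index in the output so the columns can be rearranged.
// Throws std::out_of_range if a wanted name is not in the output.
std::vector<std::size_t> reorder_indices(std::vector<std::string> const& output_names,
                                         std::vector<std::string> const& wanted)
{
  std::unordered_map<std::string, std::size_t> position;
  for (std::size_t i = 0; i < output_names.size(); ++i) { position[output_names[i]] = i; }

  std::vector<std::size_t> indices;
  for (auto const& name : wanted) { indices.push_back(position.at(name)); }
  return indices;
}
```

For a schema declared as `zeta`, `alpha`, `mid`, `output_order` returns `alpha`, `mid`, `zeta`, and `reorder_indices(output_order(s), {"zeta", "alpha", "mid"})` returns `{2, 0, 1}`: this is the extra bookkeeping every caller currently repeats.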

read_json should preserve the schema's column order (in depth-first-search order for nested columns) so that callers no longer pay the overhead of reordering the output themselves. To do so, the input schema needs to be specified as a std::vector of (possibly nested) columns instead of a std::map.
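A minimal sketch of why an ordered, nested schema fixes this, assuming a hypothetical `schema_node` type (name, type name, child columns in a `std::vector`). It is not the type introduced by the fix; it only shows that a pre-order depth-first traversal of a vector-based schema reproduces the caller's declared order, with nested columns reported as dotted paths.

```cpp
#include <string>
#include <vector>

// Hypothetical ordered schema node: children are kept in a std::vector,
// so their declared order survives (unlike the keys of a std::map).
struct schema_node {
  std::string name;
  std::string type;
  std::vector<schema_node> children;
};

// Pre-order depth-first traversal: emit a column, then its children in order.
// Nested columns are reported as dotted paths, e.g. "a.z".
void flatten_dfs(schema_node const& node, std::string const& prefix,
                 std::vector<std::string>& out)
{
  std::string const path = prefix.empty() ? node.name : prefix + "." + node.name;
  out.push_back(path);
  for (auto const& child : node.children) { flatten_dfs(child, path, out); }
}

// Column order for the whole schema, following the top-level vector order.
std::vector<std::string> dfs_column_order(std::vector<schema_node> const& schema)
{
  std::vector<std::string> out;
  for (auto const& column : schema) { flatten_dfs(column, "", out); }
  return out;
}
```

For a schema declared as `c`, then struct `a` with children `z` and `y`, then `b`, `dfs_column_order` returns `c, a, a.z, a.y, b`, matching the declaration rather than the sorted-key order a map would impose.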
