[FEA] Implement a better JNI function to assemble the output columns from cudf::read_json #17002

Closed
ttnghia opened this issue Oct 4, 2024 · 0 comments · Fixed by #17180
Labels
feature request New feature or request Spark Functionality that helps Spark RAPIDS

Comments

Contributor

ttnghia commented Oct 4, 2024

After reading data with cudf::read_json, if a read schema is given, we need to rearrange the columns of the output table so that the final column order matches the column order given in the input schema.

Currently, this process can copy a large number of columns (hundreds of them) from the output table of cudf::read_json into a structs column, which adds significant overhead. We can do much better by moving the columns instead, so that no data is copied at all.
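To illustrate the idea, here is a minimal standalone sketch of reordering by moving ownership rather than copying. It does not use the cudf API: `Column`, `ColumnPtr`, and `reorder_by_schema` are hypothetical stand-ins, and returning `nullptr` for a schema column that is missing from the reader output is an assumption made only to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a device column; this is not a cudf type.
struct Column {
  std::string name;
  std::vector<int> data;
};

using ColumnPtr = std::unique_ptr<Column>;

// Reorder `columns` to follow `schema_order` by moving ownership.
// Only pointers change hands; the column buffers are never copied.
// Assumption: a schema name with no matching input column yields nullptr.
std::vector<ColumnPtr> reorder_by_schema(std::vector<ColumnPtr> columns,
                                         std::vector<std::string> const& schema_order)
{
  // Map each column name to its position in the reader output.
  std::unordered_map<std::string, std::size_t> index_of;
  for (std::size_t i = 0; i < columns.size(); ++i) {
    assert(columns[i] != nullptr);
    index_of.emplace(columns[i]->name, i);
  }

  std::vector<ColumnPtr> result;
  result.reserve(schema_order.size());
  for (auto const& name : schema_order) {
    auto const it = index_of.find(name);
    if (it == index_of.end()) {
      result.push_back(nullptr);  // not present in the reader output
    } else {
      result.push_back(std::move(columns[it->second]));  // transfer, no copy
    }
  }
  return result;
}
```

Calling `reorder_by_schema(std::move(cols), {"a", "b"})` on columns read in the order `b, a` returns them in the order `a, b`, and each returned column still points at its original buffer. The same ownership-transfer pattern is what the proposed JNI function would presumably apply to the columns released from the table returned by cudf::read_json.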
