[FEA] Implement a better JNI function to assemble the output columns from `cudf::read_json` #17002

ttnghia · 2024-10-04T18:23:47Z

After reading data with `cudf::read_json`, if a read schema is given, we need to rearrange the columns of the output table so that their order matches the column order given in the input schema.

Currently, this process can copy a large number of columns (hundreds of them) from the output table of `cudf::read_json` into a structs column, which adds significant overhead. We can do much better by moving the columns instead, so that no data is copied at all.
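To illustrate the idea, here is a minimal sketch of reordering by move, written against plain standard C++ rather than libcudf. `Column`, `ColumnPtr`, and `reorder_by_schema` are hypothetical names used only for this example. In libcudf the owning handle would be `std::unique_ptr<cudf::column>`, and the columns would presumably come from something like `cudf::table::release()`; that mapping is an assumption, not the actual JNI implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for an owning column handle.
struct Column {
  std::string name;
  std::vector<int> data;  // stand-in for the column's device buffer
};

using ColumnPtr = std::unique_ptr<Column>;

// Reorder `columns` to follow `schema_order` by moving the owning pointers.
// Only pointers change hands; the underlying buffers are never copied.
std::vector<ColumnPtr> reorder_by_schema(std::vector<ColumnPtr> columns,
                                         std::vector<std::string> const& schema_order)
{
  // Map each column name to its position in the reader's output.
  std::unordered_map<std::string, std::size_t> index_of;
  for (std::size_t i = 0; i < columns.size(); ++i) {
    assert(columns[i] != nullptr);
    index_of.emplace(columns[i]->name, i);
  }

  std::vector<ColumnPtr> result;
  result.reserve(schema_order.size());
  for (auto const& name : schema_order) {
    auto const it = index_of.find(name);
    // A null slot means the column was already moved out by an earlier entry.
    if (it == index_of.end() || columns[it->second] == nullptr) {
      throw std::invalid_argument("column missing or requested twice: " + name);
    }
    result.push_back(std::move(columns[it->second]));
  }
  return result;
}
```

Because the input vector is taken by value and its elements are moved out one by one, the cost is proportional to the number of columns, not to the amount of data they hold. The resulting vector could then be handed to a structs-column factory that accepts children by rvalue, again without copying.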


ttnghia added the `feature request` (New feature or request) and `Spark` (Functionality that helps Spark RAPIDS) labels Oct 4, 2024

ttnghia self-assigned this Oct 4, 2024

ttnghia mentioned this issue Oct 4, 2024

[FEA] Improve GpuJsonToStructs performance NVIDIA/spark-rapids#11560

Closed

karthikeyann mentioned this issue Oct 21, 2024

JSON spark reader plan for 24.12 #17138

Open

karthikeyann mentioned this issue Nov 6, 2024

Rewrite Java API Table.readJSON to return the output from libcudf read_json directly #17180

Merged

rapids-bot bot closed this as completed in #17180 Nov 8, 2024

rapids-bot bot closed this as completed in e8935b9 Nov 8, 2024
