[FEA] The output columns of `read_json` need to follow depth-first-search order as in the input schema #17090

ttnghia · 2024-10-15T18:56:02Z

Currently, `read_json` takes the read schema as a map, so the column order of the schema cannot be preserved when the output table is generated. Callers have to keep track of the column order themselves and rearrange the output columns based on their names.

We should support preserving the column order (the depth-first-search order of the input schema) so that callers no longer carry this overhead. To do so, the input schema needs to be specified as a `std::vector` of nested columns instead of a `std::map`.
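
To illustrate the request, here is a minimal sketch of the difference between the two containers. It is not the cudf API and not the change that eventually landed: `schema_node`, `dfs_order`, `map_order`, and `example` are hypothetical names used only for this example. It shows that a `std::map` iterates in sorted key order, while a `std::vector` of nested nodes lets a depth-first walk reproduce the order the caller specified.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical schema node, for illustration only (not the cudf type).
// Children are stored in a vector so the caller's order is retained.
struct schema_node {
  std::string name;
  std::vector<schema_node> children;
};

// Append column names in depth-first (pre-order) order.
void dfs_names(schema_node const& node, std::vector<std::string>& out)
{
  out.push_back(node.name);
  for (auto const& child : node.children) { dfs_names(child, out); }
}

// Depth-first order over all top-level columns of a vector-based schema.
std::vector<std::string> dfs_order(std::vector<schema_node> const& top_level)
{
  std::vector<std::string> out;
  for (auto const& col : top_level) { dfs_names(col, out); }
  return out;
}

// For contrast: names stored as std::map keys come back sorted,
// regardless of the order in which they were inserted.
std::vector<std::string> map_order(std::vector<std::string> const& names)
{
  std::map<std::string, int> m;
  for (auto const& n : names) { m.emplace(n, 0); }
  std::vector<std::string> out;
  for (auto const& kv : m) { out.push_back(kv.first); }
  return out;
}

// Usage sketch: "z" is listed before "a" and stays first in the walk.
inline void example()
{
  std::vector<schema_node> schema{
    schema_node{"z", {}},
    schema_node{"a", {schema_node{"y", {}}, schema_node{"b", {}}}}};
  assert((dfs_order(schema) == std::vector<std::string>{"z", "a", "y", "b"}));
}
```

With a map-based schema the same top-level names would iterate as `a`, `z`, which is why callers currently have to reorder the output columns themselves.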

ttnghia added the cuIO (cuIO issue), feature request (New feature or request), and Spark (Functionality that helps Spark RAPIDS) labels Oct 15, 2024

ttnghia mentioned this issue Oct 15, 2024

[FEA] Improve GpuJsonToStructs performance NVIDIA/spark-rapids#11560

Closed

karthikeyann mentioned this issue Oct 21, 2024

JSON spark reader plan for 24.12 #17138

Open

karthikeyann mentioned this issue Nov 6, 2024

Add optional column_order in JSON reader #17029

Merged

3 tasks

rapids-bot bot closed this as completed in #17029 Nov 8, 2024

rapids-bot bot closed this as completed in b3b5ce9 Nov 8, 2024
