Improve performance in JSON reader when mixed_types_as_string option is enabled (#15236)

Addresses #15196 by applying a patch from @karthikeyann that skips `infer_column_type_kernel` for mixed-type columns by forcing them to be string columns.
With this optimization, the JSON reader is significantly faster when `mixed_types_as_string` is enabled. See the comment on #15236 for a visualization of the results before and after applying the optimization, obtained from the [JSON lines benchmarking exercise](#15124).
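
The control flow described above can be sketched in isolation. This is a minimal host-side illustration, not the libcudf implementation: `json_column_sketch`, `infer_type`, `select_target_type`, and `inference_calls` are hypothetical names, and the loop in `infer_type` merely stands in for the GPU inference kernel. It assumes the precedence visible in the diff: an explicit schema type wins, then the forced-string flag, and inference runs only as a last resort.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not the cudf types.
enum class type_id { INT64, FLOAT64, STRING };

struct json_column_sketch {
  std::vector<std::string> values;      // raw token text for each row
  bool forced_as_string_column{false};  // set when mixed types were detected
};

// Counts inference passes so a caller can observe when the pass is skipped.
static int inference_calls = 0;

// Stand-in for the expensive inference pass: it scans every value.
type_id infer_type(json_column_sketch const& col)
{
  ++inference_calls;
  for (auto const& v : col.values) {
    bool const is_int = !v.empty() && v.find_first_not_of("0123456789") == std::string::npos;
    if (!is_int) { return type_id::STRING; }
  }
  return type_id::INT64;
}

// Precedence: explicit schema type, then the forced-string flag, then inference.
type_id select_target_type(json_column_sketch const& col, std::optional<type_id> schema_type)
{
  if (schema_type.has_value()) { return *schema_type; }
  if (col.forced_as_string_column) { return type_id::STRING; }
  return infer_type(col);
}
```

Calling `select_target_type` on a column whose flag is set returns `type_id::STRING` without touching `infer_type`, which is the saving this change targets; columns without the flag and without a schema type still go through inference.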

Authors:
  - Shruti Shivakumar (https://github.com/shrshi)
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Karthikeyan (https://github.com/karthikeyann)
  - Vukasin Milovanovic (https://github.com/vuule)

URL: #15236
shrshi authored Mar 8, 2024
1 parent 6c18729 commit c9e54cf
Showing 2 changed files with 5 additions and 0 deletions.
3 changes: 3 additions & 0 deletions cpp/src/io/json/json_column.cu
@@ -674,6 +674,7 @@ void make_device_json_column(device_span<SymbolT const> input,
           reinitialize_as_string(old_col_id, col);
           // all its children (which are already inserted) are ignored later.
         }
+        col.forced_as_string_column = true;
         columns.try_emplace(this_col_id, columns.at(old_col_id));
         continue;
       }
@@ -915,6 +916,8 @@ std::pair<std::unique_ptr<column>, std::vector<column_name_info>> device_json_co
                         : "n/a");
 #endif
     target_type = schema.value().type;
+  } else if (json_col.forced_as_string_column) {
+    target_type = data_type{type_id::STRING};
   }
   // Infer column type, if we don't have an explicit type for it
   else {
2 changes: 2 additions & 0 deletions cpp/src/io/json/nested_json.hpp
@@ -160,6 +160,8 @@ struct device_json_column {
   std::vector<std::string> column_order;
   // Counting the current number of items in this column
   row_offset_t num_rows = 0;
+  // Force as string column
+  bool forced_as_string_column{false};

   /**
    * @brief Construct a new d json column object
