[PERF] Performance impact of mixed_type_as_string JSON reader option in reading JSON lines #15196
This report presents preliminary findings on the normalization, mixed-type handling, byte range reading, and error recovery options for JSON lines input. Given valid JSON input (that is, with no modifications to the data generation) read as a single chunk (that is, a byte range covering all records), we expect enabling these options to have no significant performance impact.
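For readers unfamiliar with the option, the following is a minimal pure-Python sketch of what treating mixed types as strings could mean for a JSON lines column. It is an illustration only and not the libcudf implementation: the function names infer_column_types and json_kind, the type categories, and the choice to raise an error when the option is disabled are all assumptions made for this example.

```python
import json


def json_kind(value) -> str:
    """Map a parsed JSON value to a coarse type category (hypothetical names)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before numbers: bool is an int subclass
        return "bool"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "list"
    return "struct"  # json.loads yields dict for JSON objects


def infer_column_types(json_lines: str, mixed_types_as_string: bool) -> dict:
    """Infer one type per top-level key across newline-delimited JSON records.

    Assumption for this sketch: a column whose rows disagree on type is
    returned as "string" when the option is on, and rejected otherwise.
    """
    seen = {}  # column name -> set of non-null categories observed
    for line in json_lines.splitlines():
        if not line.strip():
            continue  # skip blank lines
        for key, value in json.loads(line).items():
            kinds = seen.setdefault(key, set())
            kind = json_kind(value)
            if kind != "null":  # nulls are compatible with every type
                kinds.add(kind)

    result = {}
    for key, kinds in seen.items():
        if not kinds:
            result[key] = "null"
        elif len(kinds) == 1:
            result[key] = next(iter(kinds))
        elif mixed_types_as_string:
            result[key] = "string"
        else:
            raise ValueError(f"column {key!r} has mixed types: {sorted(kinds)}")
    return result


# Column "b" is a struct in the first record and a string in the second.
sample = '{"a": 1, "b": {"x": 1}}\n{"a": 2, "b": "text"}\n'
print(infer_column_types(sample, mixed_types_as_string=True))
```

In this sketch the extra work only happens for columns that actually disagree on type, which matches the expectation stated above that well-formed input should not pay a noticeable cost.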
Benchmarks were run on an A100 80GB GPU with every combination of the above options enabled and disabled. Enabling mixed_type_as_string produced a performance degradation of 98% (with normalize_single_quotes=NO, row_selection=ALL, and recovery_mode=RECOVER_WITH_NULL held constant between the two experiments). Refer to the figure for the performance comparison.
To investigate the impact of enabling mixed_type_as_string, the benchmark was profiled with
--axis normalize_single_quotes=NO --axis row_selection=ALL --axis mixed_types_as_string=YES --axis recovery_mode=RECOVER_WITH_NULL
The infer_column_type_kernel appears to be the bottleneck: stalled warps result in an achieved occupancy of 9.3% (nsys and ncu profiles below).
Next steps
Related information