Fix out of bound write handling for get_json_object_multiple_paths
#2285
Signed-off-by: Nghia Truong <[email protected]>
for (std::size_t idx = 0; idx < oob_indices.size(); ++idx) {
  auto const out_idx = oob_indices[idx];
  output[out_idx] =
    cudf::make_strings_column(input.size(),
                              std::move(out_offsets_and_sizes[idx].first),
                              out_char_buffers[idx].release(),
                              out_null_masks_and_null_counts[idx].second,
                              std::move(out_null_masks_and_null_counts[idx].first));
The indices `out_idx` and `idx` should be different. Previously they were the same, which is incorrect.
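To illustrate why the two indices must differ, here is a minimal host-side sketch, not the PR's actual code. It assumes the recomputed results are stored compactly (one entry per out-of-bound path) while the output holds one entry per requested path. The helper name `scatter_recomputed` and the use of `std::string` in place of column objects are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: scatter recomputed results back into the full output.
// recomputed[idx] holds the result for the path whose position in the full
// output is oob_indices[idx]; the two indices coincide only by accident.
void scatter_recomputed(std::vector<std::string>& output,
                        std::vector<std::size_t> const& oob_indices,
                        std::vector<std::string>&& recomputed)
{
  assert(oob_indices.size() == recomputed.size());
  for (std::size_t idx = 0; idx < oob_indices.size(); ++idx) {
    auto const out_idx = oob_indices[idx];         // position in the full output
    output[out_idx]    = std::move(recomputed[idx]);  // compact position is idx
  }
}
```

With four paths where only paths 1 and 3 overflowed, the compact buffers have two entries, and writing to `output[idx]` instead of `output[out_idx]` would overwrite paths 0 and 1.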
@@ -1112,7 +1093,7 @@ std::vector<std::unique_ptr<cudf::column>> get_json_object(
   h_path_data, stream, rmm::mr::get_current_device_resource());
 thrust::uninitialized_fill(
   rmm::exec_policy(stream), d_has_out_of_bound.begin(), d_has_out_of_bound.end(), 0);
-kernel.exec(*d_input_ptr, d_path_data, stream);
+kernel_launcher::exec(*d_input_ptr, d_path_data, stream);
When the kernel is launched a second time, the number of paths differs from the number in the first launch. The kernel launch parameters are therefore different as well.
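The dependence of launch parameters on the path count can be sketched as follows. This is an illustration under a stated assumption, namely one thread per (row, path) pair, which the source does not confirm. The names `launch_config` and `compute_launch_config` and the default block size of 256 are hypothetical.

```cpp
#include <cstddef>

// Hypothetical launch configuration: grid size and block size.
struct launch_config {
  std::size_t num_blocks;
  std::size_t block_size;
};

// Assumption: one thread per (row, path) pair, so the grid size must be
// recomputed whenever the number of paths changes between launches.
inline launch_config compute_launch_config(std::size_t num_rows,
                                           std::size_t num_paths,
                                           std::size_t block_size = 256)
{
  auto const num_threads = num_rows * num_paths;
  // Round up so that every (row, path) pair is covered by some thread.
  auto const num_blocks = (num_threads + block_size - 1) / block_size;
  return launch_config{num_blocks, block_size};
}
```

Under this assumption, reusing a configuration computed for the first launch would size the grid for the wrong number of paths on the second launch, which is why the parameters have to be derived from the current path data at each call.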
I ran through all of the JSON tests and redid the performance testing I did before. Everything passed as expected, because this is just here for a very odd corner case that is not going to ever be needed in practice.
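This fixes the code path for handling out-of-bound writes when there is more than one JSON path in `get_json_object_multiple_paths`. Currently there are two issues with it:
- The output index (`out_idx`) and the index into the recomputed buffers (`idx`) were treated as the same, although they should be different.
- When the kernel is launched a second time, the number of paths differs from the first launch, so the kernel launch parameters must differ as well.

A unit test is also added to check for this particular code path.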
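As a rough host-side sketch of how the paths needing a second pass might be selected, the snippet below turns per-path overflow flags into a list of indices. It assumes one flag per path, nonzero when that path's output buffer overflowed. The function name `collect_oob_indices` and the `std::int8_t` flag type are hypothetical and do not reflect the actual device-side implementation.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: return the positions of all paths whose first pass
// was flagged as having written out of bound (flag != 0).
std::vector<std::size_t> collect_oob_indices(
  std::vector<std::int8_t> const& has_out_of_bound)
{
  std::vector<std::size_t> oob_indices;
  for (std::size_t i = 0; i < has_out_of_bound.size(); ++i) {
    if (has_out_of_bound[i] != 0) { oob_indices.push_back(i); }
  }
  return oob_indices;
}
```

Only the paths returned here would be processed again, so the second pass works on a shorter, compact list whose positions no longer match the positions in the full output.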