Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use make_strings_children in parse_data nested json reader #12382

Merged

Conversation

karthikeyann
Copy link
Contributor

@karthikeyann karthikeyann commented Dec 14, 2022

Description

Use make_strings_children utility in parse_data nested json reader
Addresses part of #12167

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@karthikeyann karthikeyann added 3 - Ready for Review Ready for review by team libcudf Affects libcudf (C++/CUDA) code. tech debt improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Dec 14, 2022
@karthikeyann karthikeyann added this to the Nested JSON reader milestone Dec 14, 2022
@karthikeyann karthikeyann self-assigned this Dec 14, 2022
@karthikeyann karthikeyann requested a review from a team as a code owner December 14, 2022 13:37
@karthikeyann karthikeyann requested review from bdice, elstehle, davidwendt and vuule and removed request for bdice December 14, 2022 13:37
cpp/include/cudf/io/detail/data_casting.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/io/detail/data_casting.cuh Outdated Show resolved Hide resolved
@randerzander
Copy link
Contributor

run benchmarks

1 similar comment
@DaceT
Copy link

DaceT commented Dec 21, 2022

run benchmarks

@karthikeyann
Copy link
Contributor Author

Benchmark Results

nested_json_gpu_parser

[0] Quadro GV100

string_size Samples CPU Time Noise GPU Time Noise Elem/s bytes_per_second peak_memory_usage
2^20 = 1048576 3056x 4.093 ms 6.08% 4.087 ms 6.07% 256.555M 256554796 9.201 MiB
2^21 = 2097152 1376x 4.357 ms 2.02% 4.351 ms 2.01% 482.009M 482008813 18.164 MiB
2^22 = 4194304 1520x 5.033 ms 1.91% 5.027 ms 1.90% 834.390M 834390240 36.090 MiB
2^23 = 8388608 2273x 6.582 ms 2.61% 6.576 ms 2.60% 1.276G 1275592364 71.942 MiB
2^24 = 16777216 1481x 10.114 ms 8.88% 10.108 ms 8.87% 1.660G 1659820960 143.645 MiB
2^25 = 33554432 848x 17.180 ms 4.78% 17.175 ms 4.78% 1.954G 1953714331 287.051 MiB
2^26 = 67108864 474x 31.625 ms 2.57% 31.620 ms 2.57% 2.122G 2122376400 573.864 MiB
2^27 = 134217728 264x 56.838 ms 1.30% 56.833 ms 1.30% 2.362G 2361620677 1.121 GiB
2^28 = 268435456 137x 109.830 ms 1.63% 109.824 ms 1.63% 2.444G 2444226569 2.241 GiB
2^29 = 536870912 67x 224.780 ms 0.91% 224.776 ms 0.91% 2.388G 2388474330 4.482 GiB
2^30 = 1073741824 36x 427.308 ms 0.98% 427.302 ms 0.98% 2.513G 2512840746 8.963 GiB

@codecov
Copy link

codecov bot commented Dec 31, 2022

Codecov Report

Base: 86.58% // Head: 85.69% // Decreases project coverage by -0.88% ⚠️

Coverage data is based on head (a47415d) compared to base (b6dccb3).
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff                @@
##           branch-23.02   #12382      +/-   ##
================================================
- Coverage         86.58%   85.69%   -0.89%     
================================================
  Files               155      155              
  Lines             24368    24803     +435     
================================================
+ Hits              21098    21255     +157     
- Misses             3270     3548     +278     
Impacted Files Coverage Δ
python/cudf/cudf/_version.py 1.41% <0.00%> (-98.59%) ⬇️
python/cudf/cudf/core/buffer/spill_manager.py 72.50% <0.00%> (-7.50%) ⬇️
python/cudf/cudf/core/buffer/spillable_buffer.py 90.04% <0.00%> (-2.81%) ⬇️
python/cudf/cudf/utils/dtypes.py 77.77% <0.00%> (-1.69%) ⬇️
python/cudf/cudf/options.py 86.11% <0.00%> (-1.59%) ⬇️
python/cudf/cudf/core/single_column_frame.py 94.30% <0.00%> (-1.27%) ⬇️
...ython/custreamz/custreamz/tests/test_dataframes.py 98.38% <0.00%> (-1.01%) ⬇️
python/dask_cudf/dask_cudf/io/parquet.py 91.81% <0.00%> (-0.59%) ⬇️
python/cudf/cudf/core/multiindex.py 91.66% <0.00%> (-0.51%) ⬇️
python/cudf/cudf/core/algorithms.py 90.00% <0.00%> (-0.48%) ⬇️
... and 37 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

}
return out_it;
return bytes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When is bytes != num_chars_written?
This function seems overly complex. Maybe I'm missing something?

constexpr size_type write_utf8_char(char_utf8 character, char*& out_it) {
   auto const bytes = (out_it == nullptr) ? strings::detail::bytes_in_char_utf8(character) : 
                                            strings::detail::from_char_utf8(character, out_it);
   out_it += bytes;
   return bytes;
}

There does not seem to be a need for the template as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, this is written to allow the compiler to unroll the loop.
I will change this to above suggestion.

col_size,
std::move(offsets),
std::move(chars),
cudf::detail::null_count(static_cast<bitmask_type*>(null_mask.data()), 0, col_size, stream),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was curious if this could be computed in the functor and if that would be faster than doing the null_count() calculation here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

null_count method will be faster because it handles 32-bit per thread, while functor will handle 1 bit per thread, which will eventually lead to lesser atomicAdds to global memory too.

Besides, functor's shared memory is not in our control since we don't know threads per block. So, we can't use BlockReduce, which will increase no of atomicAdds to the global memory location.

cpp/src/io/json/json_column.cu Outdated Show resolved Hide resolved
Copy link
Contributor

@vuule vuule left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good, just some coding style suggestions.

return out_it;
auto const bytes = (out_it == nullptr) ? strings::detail::bytes_in_char_utf8(character)
: strings::detail::from_char_utf8(character, out_it);
if (out_it) out_it += bytes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (out_it) out_it += bytes;
if (out_it != nullpt) { out_it += bytes; }

__device__ void operator()(size_type idx)
{
if (not bit_is_set(null_mask, idx)) {
if (!d_chars) d_offsets[idx] = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!d_chars) d_offsets[idx] = 0;
if (d_chars == nullptr) { d_offsets[idx] = 0; }


// Check if the value corresponds to the null literal
auto const is_null_literal =
(!d_chars) &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
(!d_chars) &&
(d_chars == nullptr) &&

serialized_trie_contains(options.trie_na, {in_begin, static_cast<std::size_t>(num_in_chars)});
if (is_null_literal) {
clear_bit(null_mask, idx);
if (!d_chars) d_offsets[idx] = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!d_chars) d_offsets[idx] = 0;
if (d_chars == nullptr) { d_offsets[idx] = 0; }

auto str_process_info = process_string(in_begin, in_end, d_buffer, options);
if (str_process_info.result != data_casting_result::PARSING_SUCCESS) {
clear_bit(null_mask, idx);
if (!d_chars) d_offsets[idx] = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!d_chars) d_offsets[idx] = 0;
if (d_chars == nullptr) { d_offsets[idx] = 0; }

clear_bit(null_mask, idx);
if (!d_chars) d_offsets[idx] = 0;
} else {
if (!d_chars) d_offsets[idx] = str_process_info.bytes;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!d_chars) d_offsets[idx] = str_process_info.bytes;
if (d_chars == nullptr) { d_offsets[idx] = str_process_info.bytes; }

@karthikeyann
Copy link
Contributor Author

karthikeyann commented Jan 10, 2023

just some coding style suggestions.

String processing code is full of if(out_ptr) style code. My preference is to follow the same style and also because it's concise.

@karthikeyann karthikeyann requested a review from vuule January 10, 2023 00:11
@vuule
Copy link
Contributor

vuule commented Jan 10, 2023

just some coding style suggestions.

String processing code is full of if(out_ptr) style code. My preference is to follow the same style and also because it's concise.

Well, I didn't follow the same style in my cudf::string changes and got @davidwendt 's approval. I think the takeaway is that it does not really matter :D.

Not a big deal, this is probably a job for a tool to enforce/apply.

@karthikeyann
Copy link
Contributor Author

\merge

@karthikeyann
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit 0b8eb42 into rapidsai:branch-23.02 Jan 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants