Performance improvement in JSON Tree traversal #11919

Merged
Changes from 45 commits
48 commits
a75b0a5
fix the right condition for parent_node propagation initial condition
karthikeyann Sep 30, 2022
4abfb51
parent_node_id generation using only nodes instead of tokens
karthikeyann Oct 3, 2022
efb6621
reduce node_ids memory (not impacting peak memory)
karthikeyann Oct 6, 2022
5f250cb
reorder node_range, node_cat, scope limit token_levels
karthikeyann Oct 6, 2022
49cb0d7
use cub SortPairs to reduce memory
karthikeyann Oct 6, 2022
02a7b5b
reduce memory by cub::DoubleBuffer, scope limit token_id_for_nodes
karthikeyann Oct 6, 2022
9243d89
cleanup
karthikeyann Oct 7, 2022
7efc890
reorganize parent_node_ids algorithm (generic logical stack)
karthikeyann Oct 7, 2022
6d3a166
include CUDF_PUSH_RANGE, CUDF_POP_RANGE nvtx macros
karthikeyann Oct 7, 2022
bbcbffa
replace TreeDepthT with size_type due to cuda Invalid Device function…
karthikeyann Oct 7, 2022
483abf1
Merge branch 'branch-22.12' of github.com:rapidsai/cudf into enh-json…
karthikeyann Oct 7, 2022
f9f0926
update docs
karthikeyann Oct 7, 2022
f851232
remove nvtx range macros and debug prints
karthikeyann Oct 7, 2022
55369c9
remove nvtx macros
karthikeyann Oct 7, 2022
5eefd64
NVTX RANGES macros commit
karthikeyann Oct 7, 2022
3bb54f4
Revert "NVTX RANGES macros commit"
karthikeyann Oct 7, 2022
b70669d
Merge branch 'branch-22.12' of github.com:rapidsai/cudf into enh-json…
karthikeyann Oct 7, 2022
5a0a9a7
address review comments (upsj)
karthikeyann Oct 11, 2022
8578a22
Merge branch 'branch-22.12' of github.com:rapidsai/cudf into enh-json…
karthikeyann Oct 11, 2022
952a6ed
create device_parse_nested_json2(device_span,)
karthikeyann Oct 13, 2022
cc573dc
error out on errortoken location in tree construction
karthikeyann Oct 13, 2022
4b4e337
reduces oversubscription field hash_map
karthikeyann Oct 13, 2022
ce0bb99
new two-level hashing method for col_id generation
karthikeyann Oct 13, 2022
76d60f0
num_fields calculation fix bug
karthikeyann Oct 13, 2022
0d0f7b9
update algorithm with insert_and_find and binary_search
karthikeyann Oct 13, 2022
f864131
cleanup
karthikeyann Oct 14, 2022
5d8cf31
Merge branch 'branch-22.12' of github.com:rapidsai/cudf into enh-json…
karthikeyann Oct 14, 2022
f923150
include cleanup
karthikeyann Oct 14, 2022
11d41e9
remove old generate_column_id code
karthikeyann Oct 14, 2022
263bfa7
style fix
karthikeyann Oct 14, 2022
2fea3ed
move sequence to compute_row_offsets
karthikeyann Oct 14, 2022
f7c2919
remove unused old code
karthikeyann Oct 14, 2022
1d22d97
move code to generate_column_id function
karthikeyann Oct 15, 2022
1f1a6b9
limit scan to list childs alone in compute_row_offsets
karthikeyann Oct 17, 2022
9cd3ee9
cleanup
karthikeyann Oct 17, 2022
3ff0d28
clean up generate_column_id, scope limit node_type
karthikeyann Oct 19, 2022
93b0a44
remove nvtx ranges tags for profiling
karthikeyann Oct 19, 2022
db3ba48
cleanup, const d_tree
karthikeyann Oct 20, 2022
753d470
Merge branch 'branch-22.12' of github.com:rapidsai/cudf into enh-json…
karthikeyann Oct 20, 2022
66cbaa0
use count instead of find for token error, add unit test
karthikeyann Oct 26, 2022
f36496e
auto const almost everything
karthikeyann Oct 26, 2022
e17a9b2
algorithm details as comments to non-doxygen style
karthikeyann Oct 26, 2022
ed1a3d5
Merge branch 'branch-22.12' of github.com:rapidsai/cudf into enh-json…
karthikeyann Oct 26, 2022
8dd3e37
merge fix get_default_stream()
karthikeyann Oct 26, 2022
acfbad1
add const to variables, rename test
karthikeyann Oct 27, 2022
a543c32
replace device_ptr with cudaMemcpy on stream
karthikeyann Oct 27, 2022
a22b7cc
Merge branch 'branch-22.12' of github.com:rapidsai/cudf into enh-json…
karthikeyann Oct 27, 2022
f3f5c28
use memory order relaxed load in hash key read
karthikeyann Oct 28, 2022
22 changes: 17 additions & 5 deletions cpp/src/io/json/json_column.cu
@@ -34,6 +34,7 @@
 #include <rmm/exec_policy.hpp>

 #include <thrust/count.h>
+#include <thrust/device_ptr.h>
 #include <thrust/for_each.h>
 #include <thrust/functional.h>
 #include <thrust/iterator/counting_iterator.h>
@@ -722,24 +723,23 @@ std::pair<std::unique_ptr<column>, std::vector<column_name_info>> device_json_co
   }
 }

-table_with_metadata device_parse_nested_json(host_span<SymbolT const> input,
+table_with_metadata device_parse_nested_json(device_span<SymbolT const> d_input,
                                              cudf::io::json_reader_options const& options,
                                              rmm::cuda_stream_view stream,
                                              rmm::mr::device_memory_resource* mr)
 {
   CUDF_FUNC_RANGE();

-  // Allocate device memory for the JSON input & copy over to device
-  rmm::device_uvector<SymbolT> d_input = cudf::detail::make_device_uvector_async(input, stream);
-
   auto gpu_tree = [&]() {
     // Parse the JSON and get the token stream
     const auto [tokens_gpu, token_indices_gpu] = get_token_stream(d_input, options, stream);
     // gpu tree generation
     return get_tree_representation(tokens_gpu, token_indices_gpu, stream);
   }(); // IILE used to free memory of token data.
 #ifdef NJP_DEBUG_PRINT
-  print_tree(input, gpu_tree, stream);
+  thrust::host_vector<char> h_input(thrust::device_pointer_cast(d_input.begin()),
+                                    thrust::device_pointer_cast(d_input.end()));
+  print_tree(h_input, gpu_tree, stream);
 #endif

   auto [gpu_col_id, gpu_row_offsets] = records_orient_tree_traversal(d_input, gpu_tree, stream);
@@ -841,5 +841,17 @@ table_with_metadata device_parse_nested_json(host_span<SymbolT const> input,
     {{}, out_column_names}};
 }

+table_with_metadata device_parse_nested_json(host_span<SymbolT const> input,
+                                             cudf::io::json_reader_options const& options,
+                                             rmm::cuda_stream_view stream,
+                                             rmm::mr::device_memory_resource* mr)
+{
+  CUDF_FUNC_RANGE();
+
+  // Allocate device memory for the JSON input & copy over to device
+  rmm::device_uvector<SymbolT> d_input = cudf::detail::make_device_uvector_async(input, stream);
+
+  return device_parse_nested_json(device_span<SymbolT const>{d_input}, options, stream, mr);
+}
 } // namespace detail
 } // namespace cudf::io::json
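
The diff above combines two patterns: a host-input overload that copies once and delegates to a device-input overload, and an immediately invoked lambda expression (IILE) that releases the token buffers before later stages allocate. Below is a minimal host-only sketch of that shape in plain C++, not the cudf implementation. Every name in it (`device_buffer`, `device_view`, `tree_summary`, `parse_nested`) is hypothetical, `std::vector` merely stands in for device memory, and the "parse" step only counts `{` and `[` characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical stand-in for an owning device allocation.
struct device_buffer {
  std::vector<char> bytes;
};

// Hypothetical stand-in for a non-owning view of device memory.
struct device_view {
  char const* data;
  std::size_t size;
};

// Hypothetical stand-in for the tree: just the number of '{' and '[' seen.
struct tree_summary {
  std::size_t num_containers;
};

// "Device" overload: the input is assumed to be resident already, so no copy is made.
inline tree_summary parse_nested(device_view d_input)
{
  assert(d_input.data != nullptr || d_input.size == 0);

  auto const tree = [&]() {
    // Large intermediate that is only needed while building the tree.
    std::vector<char> tokens(d_input.data, d_input.data + d_input.size);
    std::size_t n = 0;
    for (char c : tokens) {
      if (c == '{' || c == '[') { ++n; }
    }
    return tree_summary{n};
  }();  // IILE: `tokens` is destroyed here, before any later stage runs.

  return tree;
}

// "Host" overload: copy the input once, then delegate to the device overload.
inline tree_summary parse_nested(std::string_view input)
{
  device_buffer d_input{std::vector<char>(input.begin(), input.end())};
  return parse_nested(device_view{d_input.bytes.data(), d_input.bytes.size()});
}
```

Under these assumptions, a caller that already holds the data in the "device" representation calls the `device_view` overload directly and skips the copy, while existing callers of the host overload keep working unchanged.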