Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance improvement in JSON Tree traversal #11919

Merged

Conversation

karthikeyann
Copy link
Contributor

@karthikeyann karthikeyann commented Oct 14, 2022

Description

This PR improves performance of JSON Tree traversal - mainly in creation of column id.

  • Replaced per-level processing with two-level hash algorithm
  • Reduced memory usage for hash map (reduced oversubscription)

Other changes are

  • Fail if tokens has error token in tree generation
  • Created device_span version of device_parse_nested_json

Hits 2 GB/s in GV100 from 128MB json.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

reduces memory usage by 35% (1GB json takes 10.951GB instead of 16.957GB)
reduce peak memory usage (not total memory used)
reorder node_range, node_cat, scope limit token_levels
10.957 GiB -> 9.91 GiB -> 9.774 GiB -> 9.403 GiB
9.403 GiB to 8.487 GiB (for 1GB json input)
1. use insert_and_find to insert key, and update to inserted unique keys
2. get this unique keys, sort, and use lower_bound to translate all
keys to non-gap sequence
@karthikeyann karthikeyann added the 2 - In Progress Currently a work in progress label Oct 14, 2022
@karthikeyann
Copy link
Contributor Author

rerun tests

@ttnghia
Copy link
Contributor

ttnghia commented Oct 26, 2022

Something seems to be wrong with the new test. Did you try running it locally?

@karthikeyann
Copy link
Contributor Author

karthikeyann commented Oct 27, 2022

Did you try running it locally?

Yes. I did. Local run works.

01:50:54 NVIDIA/thrust#1 in /workspace/cpp/tests/utilities/identify_stream_usage/build/libidentify_stream_usage.so : cudaMemcpyAsync()+0x3b
This error is completely new to me. Any clue on what this is?

@upsj
Copy link
Contributor

upsj commented Oct 27, 2022

@karthikeyann this is the new error from #11875, checking that you use the provided stream everywhere. Seems like there is still a default stream lurking somewhere. By LD_PRELOADing libidentify_stream_usage.so, you should be able to catch the issue with a catchpoint in gdb (catch throw) or maybe even identify the location from the backtrace.

Copy link
Contributor

@upsj upsj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the stream issues

// Get the JSON's tree representation
CUDF_EXPECT_THROW_MESSAGE(
cuio_json::detail::get_tree_representation(tokens_gpu, token_indices_gpu, stream),
"JSON Parser encountered an invalid format at location 6");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe we should be using terms we also use inside the algorithm?

Suggested change
"JSON Parser encountered an invalid format at location 6");
"JSON Parser encountered an invalid token at offset 6");

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant to give more useful error for user to look into the text input. I would prefer to give information on the user input text location instead of internal tokens and offsets, so that the user can check the input for errors.

uq_node_id = col_id.begin()] __device__(auto node_id) mutable {
// typename hash_map_type::value_type const insert_pair{};
auto it = view.insert_and_find(cuco::make_pair(node_id, node_id), d_hashed_cache, d_equal);
uq_node_id[node_id] = (it.first)->first; // first.load(cuda::std::memory_order_relaxed);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure: What is the purpose of this comment? I would assume cuco ensures that the writes to (it.first)->first cannot be reordered/observed after the completion of insert_and_find?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@PointKernel suggested to use cuda::std::memory_order_relaxed while using find at
https://github.com/karthikeyann/cudf/blob/ed1a3d5606991e23e3640df734dff1ba3ffabfd4/cpp/src/io/json/json_tree.cu#L446-L447

    auto const it = key_map.find(node_id, d_hasher, d_equal);
    return (it == key_map.end()) ? size_type{0} : it->second.load(cuda::std::memory_order_relaxed);

@PointKernel does it make sense to use it here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the memory order doesn't matter here so we can use the least expensive one (which is the relaxed memory order).

cpp/src/io/json/json_column.cu Outdated Show resolved Hide resolved
cpp/src/io/json/json_tree.cu Outdated Show resolved Hide resolved
@karthikeyann
Copy link
Contributor Author

rerun tests

1 similar comment
@karthikeyann
Copy link
Contributor Author

rerun tests

@karthikeyann karthikeyann added 5 - Ready to Merge Testing and reviews complete, ready to merge and removed 4 - Needs Review Waiting for reviewer to review or respond 4 - Needs cuIO Reviewer 3 - Ready for Review Ready for review by team labels Oct 27, 2022
@karthikeyann
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit aaf251d into rapidsai:branch-22.12 Oct 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
5 - Ready to Merge Testing and reviews complete, ready to merge improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change Performance Performance related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants