Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Nested Json benchmark #11466

Merged
merged 88 commits into from
Sep 1, 2022

Conversation

karthikeyann
Copy link
Contributor

@karthikeyann karthikeyann commented Aug 4, 2022

Description

Adds nvbench for nested json parser.
Depends on #11388

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

elstehle added 30 commits July 13, 2022 00:53
@karthikeyann karthikeyann added this to the Nested JSON reader milestone Aug 4, 2022
@karthikeyann karthikeyann requested review from a team as code owners August 4, 2022 16:31
@karthikeyann karthikeyann self-assigned this Aug 4, 2022
@github-actions github-actions bot added the CMake CMake build issue label Aug 4, 2022
@codecov
Copy link

codecov bot commented Aug 4, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@da6b3ed). Click here to learn what that means.
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.10   #11466   +/-   ##
===============================================
  Coverage                ?   86.40%           
===============================================
  Files                   ?      145           
  Lines                   ?    22959           
  Branches                ?        0           
===============================================
  Hits                    ?    19838           
  Misses                  ?     3121           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.


#include <benchmarks/common/generate_input.hpp>
#include <benchmarks/fixture/rmm_pool_raii.hpp>
#include <nvbench/nvbench.cuh>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this file to be cu file in order to include this header?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a hpp version of this file available?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You only need to be .cu file if you are doing device calls.
Including a .cuh file is fine as long as you are not using any of the device functions (or data) in it.

Copy link
Member

@PointKernel PointKernel Aug 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we place this include right before #include <cstdlib> since nvbench header is further than cudf ones?

auto& d_string_scalar = static_cast<cudf::string_scalar&>(*d_input_scalar);
auto d_scalar = cudf::strings::repeat_string(d_string_scalar, repeat_times);
auto& d_input = static_cast<cudf::scalar_type_t<std::string>&>(*d_scalar);
auto generated_json = std::string(d_input);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, are you constructing a host string from device string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. only due to limitation of parse_nested_json at the moment. This will change once the API is complete.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant you're reading device memory from the host side, which is typically illegal. Should this be invalid and cause crash?

Copy link
Contributor Author

@karthikeyann karthikeyann Aug 24, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is the operator std::string operator in string_scalar which takes care of allocation and copying data to host.

explicit operator std::string() const;

@karthikeyann karthikeyann requested review from ttnghia and isVoid August 24, 2022 03:52
cpp/benchmarks/io/json/nested_json.cpp Show resolved Hide resolved
auto const string_size{size_type(state.get_int64("string_size"))};

auto input = make_test_json_data(string_size, cudf::default_stream_value);
state.add_element_count(input.size());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this show/compute throughput?
Here is an example of nvbench throughput https://github.com/NVIDIA/nvbench/blob/main/docs/benchmarks.md#throughput-measurements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. This is very helpful. 👍
I wanted to see overall chars/sec throughput only. It's difficult to track the number of global memory reads, write manually for this entire algorithm. so, I skipped it.

@karthikeyann
Copy link
Contributor Author

@gpucibot merge

@rapids-bot rapids-bot bot merged commit e5c8776 into rapidsai:branch-22.10 Sep 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
2 - In Progress Currently a work in progress CMake CMake build issue feature request New feature or request libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change tests Unit testing for project
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants