Adds Nested Json benchmark #11466

karthikeyann · 2022-08-04T16:31:08Z

Description

Adds an nvbench benchmark for the nested JSON parser.
Depends on #11388

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.


codecov · 2022-08-04T18:31:35Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@da6b3ed).
The diff coverage is n/a.

@@               Coverage Diff               @@
##             branch-22.10   #11466   +/-   ##
===============================================
  Coverage                ?   86.40%           
===============================================
  Files                   ?      145           
  Lines                   ?    22959           
  Branches                ?        0           
===============================================
  Hits                    ?    19838           
  Misses                  ?     3121           
  Partials                ?        0


isVoid · 2022-08-04T22:48:35Z

cpp/benchmarks/io/json/nested_json.cpp

+
+#include <benchmarks/common/generate_input.hpp>
+#include <benchmarks/fixture/rmm_pool_raii.hpp>
+#include <nvbench/nvbench.cuh>


Do we need this file to be a .cu file in order to include this header?

Is there an .hpp version of this header available?

The file only needs to be a .cu file if it makes device calls.
Including a .cuh file is fine as long as you don't use any of the device functions (or data) in it.

Should we place this include right before #include <cstdlib>, since nvbench is a more external dependency than the cudf headers?
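To illustrate the .cu versus .cpp point above: a header only forces a .cu translation unit if the host compiler has to see device code. One common pattern is to hide device-only code behind __CUDACC__, which nvcc defines and a plain host compiler does not. The sketch below uses a hypothetical header with hypothetical names (EXAMPLE_HOST_DEVICE, square, square_kernel); it is not taken from nvbench or cudf.

```cpp
// Hypothetical header contents (e.g. "example.cuh") that can be included from
// both .cu files (compiled by nvcc) and .cpp files (compiled by a host compiler).
#include <cstddef>

#ifdef __CUDACC__
// nvcc is compiling: make the helper callable from host and device code.
#define EXAMPLE_HOST_DEVICE __host__ __device__
#else
// Host compiler: CUDA qualifiers are not recognized, so expand to nothing.
#define EXAMPLE_HOST_DEVICE
#endif

// Usable from host code in a .cpp file; also usable in device code under nvcc.
EXAMPLE_HOST_DEVICE constexpr int square(int x) { return x * x; }

#ifdef __CUDACC__
// Device-only code is visible to nvcc only, so a .cpp file that includes this
// header never sees the kernel and builds with a plain host compiler.
__global__ void square_kernel(int* data, std::size_t n)
{
  std::size_t const i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
  if (i < n) { data[i] = square(data[i]); }
}
#endif
```

A .cpp benchmark can include such a header and call square() on the host; only code that actually launches square_kernel needs to be compiled by nvcc.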


ttnghia · 2022-08-09T20:14:51Z

cpp/benchmarks/io/json/nested_json.cpp

+  auto& d_string_scalar = static_cast<cudf::string_scalar&>(*d_input_scalar);
+  auto d_scalar         = cudf::strings::repeat_string(d_string_scalar, repeat_times);
+  auto& d_input         = static_cast<cudf::scalar_type_t<std::string>&>(*d_scalar);
+  auto generated_json   = std::string(d_input);


Wait, are you constructing a host string from a device string?

Yes, but only because of a current limitation of parse_nested_json. This will change once the API is complete.

I mean you're reading device memory from the host side, which is typically illegal. Shouldn't this be invalid and cause a crash?

string_scalar has an explicit operator std::string() that takes care of allocating host memory and copying the data to the host.

cudf/cpp/include/cudf/scalar/scalar.hpp

Line 519 in f94146b

explicit operator std::string() const;
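The benchmark builds its input by repeating a JSON string on the device with cudf::strings::repeat_string and then copying the result to the host through string_scalar's explicit operator std::string(). The host-only sketch below mirrors the sizing step under an assumption not stated in the PR: that the repeat count is the target size divided by the length of one record. The helper make_repeated_json and the sample record are hypothetical and are not the benchmark's make_test_json_data.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical host-side analogue of generating benchmark input by repetition:
// repeat `record` as many whole times as fit into `target_size` characters.
// (The actual benchmark repeats on the device, then copies the result to the host.)
std::string make_repeated_json(std::string const& record, std::size_t target_size)
{
  assert(!record.empty() && "record must be non-empty to avoid division by zero");
  std::size_t const repeat_times = target_size / record.size();  // whole repeats only
  std::string out;
  out.reserve(repeat_times * record.size());
  for (std::size_t i = 0; i < repeat_times; ++i) {
    out += record;
  }
  return out;  // out.size() <= target_size
}
```

With a record that ends in '\n', the output is a sequence of newline-separated records; the result never exceeds the requested size because partial records are dropped.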


davidwendt · 2022-08-24T12:50:33Z

cpp/benchmarks/io/json/nested_json.cpp

+  auto const string_size{size_type(state.get_int64("string_size"))};
+
+  auto input = make_test_json_data(string_size, cudf::default_stream_value);
+  state.add_element_count(input.size());


Will this show/compute throughput?
Here is an example of nvbench throughput measurements: https://github.com/NVIDIA/nvbench/blob/main/docs/benchmarks.md#throughput-measurements

Thank you. This is very helpful. 👍
I only wanted to see the overall chars/sec throughput. It's difficult to manually track the number of global memory reads and writes for this entire algorithm, so I skipped that.
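With state.add_element_count(input.size()), nvbench is told how many elements (here, input characters) one run processes, and it reports an items-per-second figure from that count and the measured time; since no global memory reads or writes are registered, no bandwidth figure is derived. The arithmetic behind such a chars/sec number is sketched below in plain C++; the helper name elements_per_second is hypothetical and the numbers in the test are illustrative, not benchmark results.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: element throughput from an element count and a measured
// runtime in seconds. For this benchmark, one "element" is one input character.
double elements_per_second(std::size_t element_count, double seconds)
{
  assert(seconds > 0.0 && "runtime must be positive");
  return static_cast<double>(element_count) / seconds;
}
```

Counting characters rather than bytes moved keeps the metric independent of how many passes the parser makes over global memory, which is why it is easy to report even when reads and writes are not tracked.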

karthikeyann · 2022-09-01T11:38:45Z

@gpucibot merge

Fixes compile error introduced in PR #11466 due to mismatched changes occurring in PR #11534
https://gpuci.gpuopenanalytics.com/job/rapidsai/job/gpuci/job/cudf/job/prb/job/cudf-cpu-cuda-build/CUDA=11.5/11851/console

Authors:
- David Wendt (https://github.com/davidwendt)

Approvers:
- Elias Stehle (https://github.com/elstehle)
- Karthikeyan (https://github.com/karthikeyann)
- Nghia Truong (https://github.com/ttnghia)

URL: #11637

elstehle added 30 commits July 13, 2022 00:53

squashed with bracket/brace test

0557d41

clean up & addressing review comments

355d1e4

refactored lookup tables

39a6b65

put lookup tables into their own cudf file

239f138

Change interface for FST to not need temp storage

39cff80

removing unused var post-cleanup

e24a133

unified usage of pragma unrolls

caf6195

Adding hostdevice macros to in-reg array

ea79a81

making const vars const

17dcbfd

refactor lut sanity check

6fdd24a

fixes sg-count & uses rmm stream in fst tests

eccf970

minor doxygen fix

9fe8e4b

adopts suggested fst test changes

694a365

adopts device-side test data gen

f656f49

adopts c++17 namespaces declarations

485a1c6

removes state vector-wrapper in favor of vanilla array

5f1c4b5

some west-const remainders & unifies StateIndexT

e6f8def

adds check for state transition narrowing conversion

a798852

fixes logical stack test includes

eb24962

replaces enum with typed constexpr

f52e614

adds explicit error checking

3038058

addresses style review comments & fixes a todo

d351e5c

replaces gtest asserts with expects

3f47952

fixes style in dispatch dfa

cba1619

replaces vanilla loop with iota

bea2a02

rephrases documentation on in-reg array

8a184e9

Merge remote-tracking branch 'upstream/branch-22.08' into feature/finite-state-transducer-trimmed

78dd893

Merge remote-tracking branch 'upstream/branch-22.08' into feature/finite-state-transducer-trimmed

7a19f64

improves style in fst test

4783aae

adds comments in in_reg array

6203709

karthikeyann added this to the Nested JSON reader milestone Aug 4, 2022

karthikeyann requested review from a team as code owners August 4, 2022 16:31

karthikeyann self-assigned this Aug 4, 2022

karthikeyann requested review from harrism and davidwendt August 4, 2022 16:31

github-actions bot added the CMake CMake build issue label Aug 4, 2022

isVoid reviewed Aug 4, 2022


ttnghia reviewed Aug 9, 2022


cpp/benchmarks/io/json/nested_json.cpp (resolved)

ttnghia reviewed Aug 9, 2022


cpp/benchmarks/io/json/nested_json.cpp (outdated, resolved)

ttnghia reviewed Aug 9, 2022


cpp/benchmarks/io/json/nested_json.cpp (outdated, resolved)

karthikeyann added 3 commits August 24, 2022 00:48

Merge branch 'branch-22.10' of github.com:rapidsai/cudf into fea-json_benchmark

188b140

remove merge missed files

ba3483f

address review comments

cad1060

karthikeyann requested review from ttnghia and isVoid August 24, 2022 03:52

davidwendt requested changes Aug 24, 2022


karthikeyann requested a review from davidwendt August 29, 2022 05:29

davidwendt approved these changes Aug 29, 2022


ttnghia approved these changes Aug 30, 2022


rapids-bot bot merged commit e5c8776 into rapidsai:branch-22.10 Sep 1, 2022

davidwendt mentioned this pull request Sep 1, 2022

Fix compile error in benchmark nested_json.cpp #11637

Merged


