Adds Nested Json benchmark #11466

Merged
merged 88 commits into from
Sep 1, 2022
0557d41
squashed with bracket/brace test
elstehle Apr 11, 2022
355d1e4
clean up & addressing review comments
elstehle Apr 20, 2022
39a6b65
refactored lookup tables
elstehle Apr 25, 2022
239f138
put lookup tables into their own cudf file
elstehle Apr 25, 2022
39cff80
Change interface for FST to not need temp storage
elstehle Apr 27, 2022
e24a133
removing unused var post-cleanup
elstehle May 4, 2022
caf6195
unified usage of pragma unrolls
elstehle May 9, 2022
ea79a81
Adding hostdevice macros to in-reg array
elstehle May 9, 2022
17dcbfd
making const vars const
elstehle May 9, 2022
6fdd24a
refactor lut sanity check
elstehle May 9, 2022
eccf970
fixes sg-count & uses rmm stream in fst tests
elstehle Jun 2, 2022
9fe8e4b
minor doxygen fix
elstehle Jun 14, 2022
694a365
adopts suggested fst test changes
elstehle Jun 15, 2022
f656f49
adopts device-side test data gen
elstehle Jul 7, 2022
485a1c6
adopts c++17 namespaces declarations
elstehle Jul 9, 2022
5f1c4b5
removes state vector-wrapper in favor of vanilla array
elstehle Jul 11, 2022
e6f8def
some west-const remainders & unifies StateIndexT
elstehle Jul 11, 2022
a798852
adds check for state transition narrowing conversion
elstehle Jul 11, 2022
eb24962
fixes logical stack test includes
elstehle Jul 12, 2022
f52e614
replaces enum with typed constexpr
elstehle Jul 14, 2022
3038058
adds explicit error checking
elstehle Jul 14, 2022
d351e5c
addresses style review comments & fixes a todo
elstehle Jul 14, 2022
3f47952
replaces gtest asserts with expects
elstehle Jul 14, 2022
cba1619
fixes style in dispatch dfa
elstehle Jul 14, 2022
bea2a02
replaces vanilla loop with iota
elstehle Jul 15, 2022
8a184e9
rephrases documentation on in-reg array
elstehle Jul 16, 2022
78dd893
Merge remote-tracking branch 'upstream/branch-22.08' into feature/fin…
elstehle Jul 16, 2022
7a19f64
Merge remote-tracking branch 'upstream/branch-22.08' into feature/fin…
elstehle Jul 19, 2022
4783aae
improves style in fst test
elstehle Jul 20, 2022
6203709
adds comments in in_reg array
elstehle Jul 20, 2022
ad5817a
adds comments to lookup tables
elstehle Jul 20, 2022
dc55653
fixes formatting
elstehle Jul 20, 2022
378be9f
exchanges loops in favor of copy and fills
elstehle Jul 20, 2022
4ba5472
clarifies documentation in agent dfa
elstehle Jul 20, 2022
7980978
disambiguates transition and translation tables
elstehle Jul 20, 2022
2bce061
minor style fix
elstehle Jul 21, 2022
b37f716
if constexprs and doxy on DFA helper
elstehle Jul 21, 2022
d42869a
minor documentation fix
elstehle Jul 21, 2022
6c889f7
replaces loop for comparing vectors with generic macro
elstehle Jul 21, 2022
8a54c72
uses new vector comparison for logical stack test
elstehle Jul 21, 2022
cc1e135
Added utility to debug print & instrumented code to use it
elstehle Mar 31, 2022
7dba177
switched to using rmm also inside algorithm
elstehle Mar 31, 2022
ff7144a
renaming key-value store op to stack_op
elstehle Apr 4, 2022
61a76b7
device_span
elstehle Apr 4, 2022
c28e327
minor style changes addressing review comments
elstehle Apr 13, 2022
a2f27ae
squashed with bracket/brace test
elstehle Apr 11, 2022
fe4762d
refactored lookup tables
elstehle Apr 25, 2022
a064bdd
put lookup tables into their own cudf file
elstehle Apr 25, 2022
2c729c0
fixes sg-count & uses rmm stream in fst tests
elstehle Jun 2, 2022
dbefb6c
rebase on latest FST
elstehle May 3, 2022
d54f3e5
fixes breaking changes from dependent-FST-PR
elstehle Jun 2, 2022
5fc3399
fixes for breaking downstream interface changes
elstehle Jul 13, 2022
6f65947
wraps if with stream params into detail ns
elstehle Jul 13, 2022
6ffc7f3
renames enums & moving from device_span to ptr params
elstehle Jul 14, 2022
0a7821e
fixes rebase conflicts
elstehle Jul 21, 2022
7396335
fixes escape sequence inside strings and field names and adds test fo…
elstehle Jul 21, 2022
6252208
adds comments on pda transition table states
elstehle Jul 21, 2022
191d71d
adopts new verification macro in test
elstehle Jul 22, 2022
4e99962
Merge branch 'branch-22.08' of https://github.com/rapidsai/cudf into …
karthikeyann Jul 22, 2022
3b9a1ed
removes superfluous semicolons
elstehle Jul 22, 2022
632be35
rearranges token order in enum and adds documentation
elstehle Jul 23, 2022
3237772
uses namespace alias and switch to rmm stream in test
elstehle Jul 23, 2022
d2832b9
drops the gpu namespace
elstehle Jul 24, 2022
618ed3f
renames header file extension from h to hpp
elstehle Jul 24, 2022
19b37b7
squashed with minimal example
elstehle Jul 27, 2022
f015594
add parse_json_to_columns -> cudf::column
karthikeyann Jul 27, 2022
389e8e8
+ wraps dbg print in macro
elstehle Jul 27, 2022
42f7c4a
disables debug print by default
elstehle Jul 27, 2022
3e9db89
Merge remote-tracking branch 'origin/feature/json-to-columnar' into f…
elstehle Jul 27, 2022
ecf68fc
changing interface of get_json_columns to also take device_span
elstehle Jul 28, 2022
93cbe1a
parsing to table_with_metadata
elstehle Jul 28, 2022
a1d8901
removes debug print examples
elstehle Jul 28, 2022
b9296d6
renames lists child col name to elements
elstehle Jul 28, 2022
5eddcc7
adds validity
elstehle Jul 28, 2022
cfcd7a1
fixes style
elstehle Jul 28, 2022
4c2ea7b
minor cleanup
karthikeyann Jul 28, 2022
8fc3adc
use device_uvector at few places
karthikeyann Jul 28, 2022
c1b9213
fixes metadata to match parquets metadata
elstehle Jul 30, 2022
e5b1ba6
Merge branch 'branch-22.08' of https://github.com/rapidsai/cudf into …
karthikeyann Aug 1, 2022
fd02437
use make_device_uvector_async
karthikeyann Aug 1, 2022
f5810f8
Apply suggestions from code review (nvdbaranec)
karthikeyann Aug 1, 2022
342d3c3
added mr
karthikeyann Aug 1, 2022
f6a531a
add utf fail, pass test cases
karthikeyann Aug 2, 2022
2f22899
Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into …
karthikeyann Aug 4, 2022
a141110
add NESTED_JSON_NVBENCH
karthikeyann Aug 4, 2022
188b140
Merge branch 'branch-22.10' of github.com:rapidsai/cudf into fea-json…
karthikeyann Aug 23, 2022
ba3483f
remove merge missed files
karthikeyann Aug 23, 2022
cad1060
address review comments
karthikeyann Aug 23, 2022
improves style in fst test
elstehle committed Jul 20, 2022
commit 4783aae2aabbd03f9a439822ddd02d0328b5d52a
26 changes: 11 additions & 15 deletions cpp/tests/io/fst/fst_test.cu
@@ -89,22 +89,18 @@ static std::pair<OutputItT, IndexOutputItT> fst_baseline(InputItT begin,
  // The symbol currently being read
  auto const& symbol = *it;

- std::size_t symbol_group = 0;
-
  // Iterate over symbol groups and search for the first symbol group containing the current
- // symbol
- for (auto const& sg : symbol_group_lut) {
-   if (std::find(std::cbegin(sg), std::cend(sg), symbol) != std::cend(sg)) { break; }
-   symbol_group++;
- }
+ // symbol, if no match is found we use cend(symbol_group_lut) as the "catch-all" symbol group
+ auto symbol_group_it = std::find_if(std::cbegin(symbol_group_lut), std::cend(symbol_group_lut),
+   [symbol](auto& sg) {
+     return std::find(std::cbegin(sg), std::cend(sg), symbol) != std::cend(sg);
+   });
+ auto symbol_group = std::distance(std::cbegin(symbol_group_lut), symbol_group_it);

  // Output the translated symbols to the output tape
- for (auto out : translation_table[state][symbol_group]) {
-   *out_tape = out;
-   ++out_tape;
-   *out_index_tape = in_offset;
-   out_index_tape++;
- }
+ out_tape = std::copy(std::cbegin(translation_table[state][symbol_group]), std::cend(translation_table[state][symbol_group]), out_tape);
+ auto out_size = std::distance(std::cbegin(translation_table[state][symbol_group]), std::cend(translation_table[state][symbol_group]));
+ out_index_tape = std::fill_n(out_index_tape, out_size, in_offset);

  // Transition the state of the finite-state machine
  state = transition_table[state][symbol_group];
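The hunk above swaps two hand-written loops for standard algorithms. The following is a minimal standalone sketch of that pattern, not the code from `fst_test.cu`: the helper names `find_symbol_group` and `emit_translation` are hypothetical, and `std::back_inserter` on `std::vector` stands in for the test's raw output iterators. A symbol that is in no group maps to index `lut.size()`, which acts as the catch-all group.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

using SymbolGroupLut = std::vector<std::vector<char>>;

// Hypothetical helper: index of the first symbol group containing `symbol`.
// If no group matches, find_if returns cend(lut), so the result is lut.size(),
// i.e. the "catch-all" symbol group.
inline std::size_t find_symbol_group(SymbolGroupLut const& lut, char symbol)
{
  auto const it = std::find_if(std::cbegin(lut), std::cend(lut), [symbol](auto const& sg) {
    return std::find(std::cbegin(sg), std::cend(sg), symbol) != std::cend(sg);
  });
  return static_cast<std::size_t>(std::distance(std::cbegin(lut), it));
}

// Hypothetical helper: appends the translated symbols for one transition to
// `out_tape` and records `in_offset` once per emitted symbol in `out_index_tape`.
inline void emit_translation(std::vector<std::vector<char>> const& translation_row,
                             std::size_t symbol_group,
                             std::size_t in_offset,
                             std::vector<char>& out_tape,
                             std::vector<std::size_t>& out_index_tape)
{
  // The row needs one column more than the lookup table has groups,
  // otherwise the catch-all index would be out of range.
  assert(symbol_group < translation_row.size());
  auto const& translated = translation_row[symbol_group];
  std::copy(std::cbegin(translated), std::cend(translated), std::back_inserter(out_tape));
  std::fill_n(std::back_inserter(out_index_tape), translated.size(), in_offset);
}
```

With a six-group lookup table such as `{{'{'}, {'['}, {'}'}, {']'}, {'"'}, {'\\'}}`, any other character resolves to index 6, so each translation and transition row needs seven columns.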
@@ -128,7 +124,7 @@ enum DFA_STATES : char {
  TT_STR,
  // The state being active after encountering an escape symbol (e.g., '\') while being in the
  // TT_STR state.
- TT_ESC [[maybe_unused]],
+ TT_ESC,
  // Total number of states
  TT_NUM_STATES
  };
@@ -149,7 +145,7 @@ enum PDA_SG_ID {
  const std::vector<std::vector<char>> pda_state_tt = {
  /* IN_STATE     {       [       }       ]       "       \     OTHER */
  /* TT_OOS */ {TT_OOS, TT_OOS, TT_OOS, TT_OOS, TT_STR, TT_OOS, TT_OOS},
- /* TT_STR */ {TT_STR, TT_STR, TT_STR, TT_STR, TT_OOS, TT_STR, TT_STR},
+ /* TT_STR */ {TT_STR, TT_STR, TT_STR, TT_STR, TT_OOS, TT_ESC, TT_STR},
  /* TT_ESC */ {TT_STR, TT_STR, TT_STR, TT_STR, TT_STR, TT_STR, TT_STR}};
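To see why the `TT_STR` row change matters: with the previous row, a backslash inside a string left the machine in `TT_STR`, so an escaped quote (`\"`) moved it to `TT_OOS` and closed the string too early. The sketch below walks a small DFA over a `std::string` using the same three rows as the updated table. It assumes the symbol-group order from the header comment (`{ [ } ] " \ OTHER`); `symbol_group_of` and `run_dfa` are hypothetical names that are not part of the test file.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>

// States mirror the DFA_STATES enum in the diff.
enum dfa_state : char { TT_OOS = 0, TT_STR, TT_ESC, TT_NUM_STATES };

constexpr std::size_t num_symbol_groups = 7;

// Same rows as the updated pda_state_tt; columns: { [ } ] " \ OTHER
constexpr std::array<std::array<dfa_state, num_symbol_groups>, TT_NUM_STATES> state_tt = {{
  /* TT_OOS */ {TT_OOS, TT_OOS, TT_OOS, TT_OOS, TT_STR, TT_OOS, TT_OOS},
  /* TT_STR */ {TT_STR, TT_STR, TT_STR, TT_STR, TT_OOS, TT_ESC, TT_STR},
  /* TT_ESC */ {TT_STR, TT_STR, TT_STR, TT_STR, TT_STR, TT_STR, TT_STR}}};

// Hypothetical helper: maps a character to its column in state_tt.
inline std::size_t symbol_group_of(char c)
{
  switch (c) {
    case '{': return 0;
    case '[': return 1;
    case '}': return 2;
    case ']': return 3;
    case '"': return 4;
    case '\\': return 5;
    default: return 6;  // catch-all group
  }
}

// Hypothetical helper: returns the state the DFA is in after reading `input`,
// starting outside of a string (TT_OOS).
inline dfa_state run_dfa(std::string const& input)
{
  dfa_state state = TT_OOS;
  for (char const c : input) {
    auto const symbol_group = symbol_group_of(c);
    assert(symbol_group < num_symbol_groups);
    state = state_tt[state][symbol_group];
  }
  return state;
}
```

For the five-character input `"a\"{` the machine is still in `TT_STR` after the last character, so the brace is correctly treated as string content rather than as a structural bracket.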

// Translation table (i.e., for each transition, what are the symbols that we output)