Adds the end-to-end JSON parser implementation #11388

Merged: 97 commits, merged Aug 12, 2022
Commits (97):
0557d41 squashed with bracket/brace test (elstehle, Apr 11, 2022)
355d1e4 clean up & addressing review comments (elstehle, Apr 20, 2022)
39a6b65 refactored lookup tables (elstehle, Apr 25, 2022)
239f138 put lookup tables into their own cudf file (elstehle, Apr 25, 2022)
39cff80 Change interface for FST to not need temp storage (elstehle, Apr 27, 2022)
e24a133 removing unused var post-cleanup (elstehle, May 4, 2022)
caf6195 unified usage of pragma unrolls (elstehle, May 9, 2022)
ea79a81 Adding hostdevice macros to in-reg array (elstehle, May 9, 2022)
17dcbfd making const vars const (elstehle, May 9, 2022)
6fdd24a refactor lut sanity check (elstehle, May 9, 2022)
eccf970 fixes sg-count & uses rmm stream in fst tests (elstehle, Jun 2, 2022)
9fe8e4b minor doxygen fix (elstehle, Jun 14, 2022)
694a365 adopts suggested fst test changes (elstehle, Jun 15, 2022)
f656f49 adopts device-side test data gen (elstehle, Jul 7, 2022)
485a1c6 adopts c++17 namespaces declarations (elstehle, Jul 9, 2022)
5f1c4b5 removes state vector-wrapper in favor of vanilla array (elstehle, Jul 11, 2022)
e6f8def some west-const remainders & unifies StateIndexT (elstehle, Jul 11, 2022)
a798852 adds check for state transition narrowing conversion (elstehle, Jul 11, 2022)
eb24962 fixes logical stack test includes (elstehle, Jul 12, 2022)
f52e614 replaces enum with typed constexpr (elstehle, Jul 14, 2022)
3038058 adds excplitis error checking (elstehle, Jul 14, 2022)
d351e5c addresses style review comments & fixes a todo (elstehle, Jul 14, 2022)
3f47952 replaces gtest asserts with expects (elstehle, Jul 14, 2022)
cba1619 fixes style in dispatch dfa (elstehle, Jul 14, 2022)
bea2a02 replaces vanilla loop with iota (elstehle, Jul 15, 2022)
8a184e9 rephrases documentation on in-reg array (elstehle, Jul 16, 2022)
78dd893 Merge remote-tracking branch 'upstream/branch-22.08' into feature/fin… (elstehle, Jul 16, 2022)
7a19f64 Merge remote-tracking branch 'upstream/branch-22.08' into feature/fin… (elstehle, Jul 19, 2022)
4783aae improves style in fst test (elstehle, Jul 20, 2022)
6203709 adds comments in in_reg array (elstehle, Jul 20, 2022)
ad5817a adds comments to lookup tables (elstehle, Jul 20, 2022)
dc55653 fixes formatting (elstehle, Jul 20, 2022)
378be9f exchanges loops in favor of copy and fills (elstehle, Jul 20, 2022)
4ba5472 clarifies documentation in agent dfa (elstehle, Jul 20, 2022)
7980978 disambiguates transition and translation tables (elstehle, Jul 20, 2022)
2bce061 minor style fix (elstehle, Jul 21, 2022)
b37f716 if constexprs and doxy on DFA helper (elstehle, Jul 21, 2022)
d42869a minor documentation fix (elstehle, Jul 21, 2022)
6c889f7 replaces loop for comparing vectors with generic macro (elstehle, Jul 21, 2022)
8a54c72 uses new vector comparison for logical stack test (elstehle, Jul 21, 2022)
cc1e135 Added utility to debug print & instrumented code to use it (elstehle, Mar 31, 2022)
7dba177 switched to using rmm also inside algorithm (elstehle, Mar 31, 2022)
ff7144a renaming key-value store op to stack_op (elstehle, Apr 4, 2022)
61a76b7 device_span (elstehle, Apr 4, 2022)
c28e327 minor style changes addressing review comments (elstehle, Apr 13, 2022)
a2f27ae squashed with bracket/brace test (elstehle, Apr 11, 2022)
fe4762d refactored lookup tables (elstehle, Apr 25, 2022)
a064bdd put lookup tables into their own cudf file (elstehle, Apr 25, 2022)
2c729c0 fixes sg-count & uses rmm stream in fst tests (elstehle, Jun 2, 2022)
dbefb6c rebase on latest FST (elstehle, May 3, 2022)
d54f3e5 fixes breaking changes from dependent-FST-PR (elstehle, Jun 2, 2022)
5fc3399 fixes for breaking downstream interface changes (elstehle, Jul 13, 2022)
6f65947 wraps if with stream params into detail ns (elstehle, Jul 13, 2022)
6ffc7f3 renames enums & moving from device_span to ptr params (elstehle, Jul 14, 2022)
0a7821e fixes rebase conflicts (elstehle, Jul 21, 2022)
7396335 fixes escape sequence inside strings and field names and adds test fo… (elstehle, Jul 21, 2022)
6252208 adds comments on pda transition table states (elstehle, Jul 21, 2022)
191d71d adopts new verification macro in test (elstehle, Jul 22, 2022)
4e99962 Merge branch 'branch-22.08' of https://github.com/rapidsai/cudf into … (karthikeyann, Jul 22, 2022)
3b9a1ed removes superfluous semicolons (elstehle, Jul 22, 2022)
632be35 rearranges token order in enum and adds documentation (elstehle, Jul 23, 2022)
3237772 uses namespace alias and switch to rmm stream in test (elstehle, Jul 23, 2022)
d2832b9 drops the gpu namespace (elstehle, Jul 24, 2022)
618ed3f renames header file extension from h to hpp (elstehle, Jul 24, 2022)
19b37b7 squashed with minimal example (elstehle, Jul 27, 2022)
f015594 add parse_json_to_columns -> cudf::column (karthikeyann, Jul 27, 2022)
389e8e8 + wraps dbg print in macro (elstehle, Jul 27, 2022)
42f7c4a disables debug print by default (elstehle, Jul 27, 2022)
3e9db89 Merge remote-tracking branch 'origin/feature/json-to-columnar' into f… (elstehle, Jul 27, 2022)
ecf68fc changeing interface of get_json_columns to also take device_span (elstehle, Jul 28, 2022)
93cbe1a parsing to table_with_metadata (elstehle, Jul 28, 2022)
a1d8901 removes debug print examples (elstehle, Jul 28, 2022)
b9296d6 renames lists child col name to elements (elstehle, Jul 28, 2022)
5eddcc7 adds validity (elstehle, Jul 28, 2022)
cfcd7a1 fixes style (elstehle, Jul 28, 2022)
4c2ea7b minor cleanup (karthikeyann, Jul 28, 2022)
8fc3adc use device_uvector at few places (karthikeyann, Jul 28, 2022)
c1b9213 fixes metadata to match parquets metadata (elstehle, Jul 30, 2022)
e5b1ba6 Merge branch 'branch-22.08' of https://github.com/rapidsai/cudf into … (karthikeyann, Aug 1, 2022)
fd02437 use make_device_uvector_async (karthikeyann, Aug 1, 2022)
f5810f8 Apply suggestions from code review (nvdbaranec) (karthikeyann, Aug 1, 2022)
342d3c3 added mr (karthikeyann, Aug 1, 2022)
f6a531a add utf fail, pass test cases (karthikeyann, Aug 2, 2022)
2f22899 Merge branch 'branch-22.10' of https://github.com/rapidsai/cudf into … (karthikeyann, Aug 4, 2022)
1bfa0ff fixes cudf column generation for struct cols without child cols (elstehle, Aug 6, 2022)
c8ae264 Merge remote-tracking branch 'upstream/branch-22.10' into feature/jso… (elstehle, Aug 6, 2022)
17f44e5 integrates upstream changes (elstehle, Aug 6, 2022)
25cd354 adds function to compare schema metadata (elstehle, Aug 6, 2022)
f839bb0 adds more complex test case with lots of nesting and corner cases (elstehle, Aug 6, 2022)
25d783c fixes year in copyright notice (elstehle, Aug 6, 2022)
1860126 uses host-side bitmask_type buffer (elstehle, Aug 8, 2022)
7711736 addresses review comments (elstehle, Aug 11, 2022)
9d3c2e0 removes prints from tests (elstehle, Aug 11, 2022)
da6bc9f Merge remote-tracking branch 'upstream/branch-22.10' into feature/jso… (elstehle, Aug 11, 2022)
165f3b7 adds support for streaming (elstehle, Aug 12, 2022)
bf99646 makes append_row a json_column member (elstehle, Aug 12, 2022)
9ea6e1c fixes header includes (elstehle, Aug 12, 2022)
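Several of the commits listed above refine the finite-state transducer (FST) and the lookup tables that the parser builds on. The doc comment in `cpp/src/io/fst/device_dfa.cuh` describes this component as using a deterministic finite automaton to transduce a sequence of input symbols into a sequence of output symbols, while also recording the indexes of the input symbols that caused output to be written. As a rough illustration only, here is a host-side C++ sketch that runs such a DFA sequentially over a JSON string and emits the brackets and braces that occur outside of strings. Everything in it (the state and symbol-group names, `kTransition`, `to_group`, `transduce`) is hypothetical. The cudf code is a device-side CUDA implementation and is not reproduced here.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sequential sketch of a DFA-based transducer. This is NOT the cudf
// implementation in cpp/src/io/fst/device_dfa.cuh; all names here are illustrative.

// DFA states: outside a string, inside a string, right after a backslash inside a string.
enum State : int { OOS = 0, STR = 1, ESC = 2 };
// Symbol groups that input characters are mapped to.
enum Group : int { OPEN = 0, CLOSE = 1, QUOTE = 2, BACKSLASH = 3, OTHER = 4 };

constexpr std::size_t kNumStates = 3;
constexpr std::size_t kNumGroups = 5;

// Maps an input symbol to its symbol group.
Group to_group(char c) {
  switch (c) {
    case '{': case '[': return OPEN;
    case '}': case ']': return CLOSE;
    case '"': return QUOTE;
    case '\\': return BACKSLASH;
    default: return OTHER;
  }
}

// Transition table: kTransition[state][group] yields the next state.
constexpr std::array<std::array<State, kNumGroups>, kNumStates> kTransition{{
  {{OOS, OOS, STR, OOS, OOS}},  // from OOS: a quote opens a string
  {{STR, STR, OOS, ESC, STR}},  // from STR: a quote closes it, a backslash escapes
  {{STR, STR, STR, STR, STR}},  // from ESC: the next symbol is consumed as escaped
}};

// Transduces `in` into (output symbols, indexes of the input symbols that produced them).
// Translation rule: only brackets and braces read while in state OOS produce output.
std::pair<std::string, std::vector<std::size_t>> transduce(const std::string& in) {
  std::string out;
  std::vector<std::size_t> out_idx;
  State state = OOS;
  for (std::size_t i = 0; i < in.size(); ++i) {
    const Group g = to_group(in[i]);
    if (state == OOS && (g == OPEN || g == CLOSE)) {
      out.push_back(in[i]);  // transduced output symbol
      out_idx.push_back(i);  // index of the input symbol that caused it
    }
    state = kTransition[state][g];
  }
  return {out, out_idx};
}
```

For the input `{"a":[1,"]"],"b\"{":{}}`, this sketch emits `{[]{}}` from input indexes 0, 5, 11, 20, 21 and 22. The `]` and `{` inside string values are skipped, and the escaped quote does not end the string. Keeping the transition table separate from the translation rule mirrors the distinction between transition and translation tables mentioned in commit 7980978, though the sketch's tables are far simpler than cudf's.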
minor doxygen fix
elstehle committed Jul 13, 2022
commit 9fe8e4b6e2c527e471d9627369e72595ef3e452c
cpp/src/io/fst/device_dfa.cuh: 2 changes (1 addition, 1 deletion)
@@ -29,8 +29,8 @@ namespace fst {
* @brief Uses a deterministic finite automaton to transduce a sequence of symbols from an input
* iterator to a sequence of transduced output symbols.
*
- * @tparam SymbolItT Random-access input iterator type to symbols fed into the FST
* @tparam DfaT The DFA specification
+ * @tparam SymbolItT Random-access input iterator type to symbols fed into the FST
* @tparam TransducedOutItT Random-access output iterator to which the transduced output will be
* written
* @tparam TransducedIndexOutItT Random-access output iterator type to which the indexes of the