Adds the Logical Stack algorithm (#11078)
**Description**

This PR adds the _Logical Stack_, an algorithm required by the JSON parser. The _Logical Stack_ takes a sequence of stack operations (i.e., `push(X)`, `pop()`, `read()`) as if they were applied, in the given order, to a regular `stack` data structure. For each operation in that sequence, the algorithm resolves the stack state and writes out the item that is on top of the stack before the operation is applied. Because the stack may be empty for some operations, the algorithm uses a user-specified _sentinel_ symbol to represent the "empty stack" (i.e., there is no item on top of the stack).

**How the _Logical Stack_ is implemented is illustrated in this presentation:**
https://docs.google.com/presentation/d/16r-0SlQFd-7fH2R7I06tc_JqsAd_0GrTgh_q20sJ2ak/edit?usp=sharing

The only deviation from the algorithm presented in the slides is an optimisation for sparse sequences of stack operations. That is, in the case of the _JSON Parser_, we only pass symbols that actually push or pop (i.e., `{`, `[`, `}`, and `]`), along with the index at which each operation occurred. Positions that follow a push or pop operation are filled with the symbol that is on top of the stack after that operation.

Results from intermediate processing steps can be dumped to `stdout` by setting:
```
export CUDA_DBG_DUMP=1
```

**For instance:**
```
//          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
d_input = " [  {  }  ,  n  u  l  l  ,  [  1  ,  2  ,  3  ]  ,  [  1  ]  ] ";

// This is the sparse representation we feed into the logical stack
// The algorithm's contract is: positions not present in the sparse list read the top of the stack
d_symbols = " [  {  }  [  ]  [  ]  ] ";
d_indexes = " 0  1  2  9 15 17 19 20 ";

// Function object used for classifying the kind of stack operation a symbol represents
struct ToStackOp {
  __host__ __device__ fst::stack_op_type operator()(char const& symbol) const
  {
    // Both bracket kinds push or pop, matching the '{' that shows up in the output below
    return (symbol == '[' || symbol == '{')   ? fst::stack_op_type::PUSH
           : (symbol == ']' || symbol == '}') ? fst::stack_op_type::POP
                                              : fst::stack_op_type::READ;
  }
};

// The symbol that we'll put whenever there's nothing on the stack
auto empty_stack_symbol = '_';

// A symbol that neither pushes nor pops
auto read_symbol = 'x';

// Type sufficiently large to cover [-max_stack_level, max_stack_level]
using stack_level_t = int8_t;

fst::sparse_stack_op_to_top_of_stack<stack_level_t>(
  d_symbols,
  d_indexes,
  ToStackOp{},
  d_top_of_stack_out,
  empty_stack_symbol,
  read_symbol,
  d_symbols.size(),  // input size (num. items in sparse representation)
  d_input.size(),    // output size (num. items in dense representation)
  stream);

// The output represents the symbol that was on top of the stack prior to applying the stack operation
d_input            = " [  {  }  ,  n  u  l  l  ,  [  1  ,  2  ,  3  ]  ,  [  1  ]  ] ";  // <<-- original input
d_top_of_stack_out = " _  [  {  [  [  [  [  [  [  [  [  [  [  [  [  [  [  [  [  [  [ ";  // <<-- logical stack output
```

Authors:
  - Elias Stehle (https://github.com/elstehle)

Approvers:
  - Yunsong Wang (https://github.com/PointKernel)
  - Karthikeyan (https://github.com/karthikeyann)

URL: #11078
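
The contract described above (sparse push/pop list in, dense "top of stack before each operation" out) can be checked against a plain sequential reference. The following is a minimal host-side sketch, not the parallel implementation from this PR. The names `stack_op_type`, `to_stack_op`, and `sparse_top_of_stack_reference` are hypothetical and only mirror the `fst::` names used in the example. The sketch also assumes that a pop on an empty stack is silently ignored, which the description does not specify.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for fst::stack_op_type; not the library's definition.
enum class stack_op_type { READ, PUSH, POP };

// Classifies JSON bracket symbols: opening brackets push, closing brackets pop.
inline stack_op_type to_stack_op(char symbol)
{
  if (symbol == '[' || symbol == '{') return stack_op_type::PUSH;
  if (symbol == ']' || symbol == '}') return stack_op_type::POP;
  return stack_op_type::READ;
}

// Sequential reference: for each dense position, writes the symbol that is on
// top of the stack *before* the operation at that position is applied.
// `indexes` must be sorted ascending and parallel to `symbols`.
inline std::string sparse_top_of_stack_reference(std::string const& symbols,
                                                 std::vector<std::size_t> const& indexes,
                                                 char empty_stack_symbol,
                                                 std::size_t num_out)
{
  assert(symbols.size() == indexes.size());
  std::string out(num_out, empty_stack_symbol);
  std::vector<char> stack;
  std::size_t next = 0;  // next unconsumed entry of the sparse list

  for (std::size_t pos = 0; pos < num_out; ++pos) {
    // Record the top of the stack prior to applying the operation at `pos`
    out[pos] = stack.empty() ? empty_stack_symbol : stack.back();

    // Positions absent from the sparse list are plain reads: nothing to apply
    if (next < indexes.size() && indexes[next] == pos) {
      auto const op = to_stack_op(symbols[next]);
      if (op == stack_op_type::PUSH) {
        stack.push_back(symbols[next]);
      } else if (op == stack_op_type::POP && !stack.empty()) {
        // Assumption of this sketch: popping an empty stack is ignored
        stack.pop_back();
      }
      ++next;
    }
  }
  return out;
}
```

Calling `sparse_top_of_stack_reference("[{}[][]]", {0, 1, 2, 9, 15, 17, 19, 20}, '_', 21)` yields `_`, `[`, `{`, followed by eighteen `[`, which matches `d_top_of_stack_out` in the example. A reference like this is mainly useful for validating the output of a parallel implementation on small inputs.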