JSON tree traversal (#11610)
Adds a JSON tree traversal algorithm with host and device implementations.

It generates column indices for the _record_ orient JSON format: a list of structs at the root, where each struct is a row.
- [x] column indices generation 
- [x] row offset
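
As an illustration of the _record_ orient layout only (not code from this PR), the sketch below parses a small hypothetical input with Python's standard `json` module and arranges it as columns, padding missing fields with `None`. The sample text and variable names are assumptions made for the example.

```python
import json

# Hypothetical record-orient input: a list of structs, one struct per row.
text = '[{"a": 1, "b": "x"}, {"a": 2}]'
records = json.loads(text)

# Collect column names in first-seen order across all rows.
columns = []
for rec in records:
    for name in rec:
        if name not in columns:
            columns.append(name)

# One list per column; a row that lacks a field contributes None.
table = {name: [rec.get(name) for rec in records] for name in columns}
```

Here `table` holds two columns, `a` and `b`, each with one entry per row.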

Depends on PR #11518

### Tree Traversal

  This algorithm assigns each node in the tree the id of the column it belongs to.
  The row offset is the row index of the node within that column.
  Algorithm:
  1. Convert node_category + field name to node_type.
     a. Create a hashmap keyed on field name that assigns a unique node id to each distinct name.
     b. Convert the node categories to node types.
        The node type is the node category enum value if the node is not a field node;
        otherwise it is the unique node id assigned by the hashmap (value shifted by #NUM_CATEGORY).
  2. Preprocessing: translate parent node ids after sorting by level.
     a. Sort by level.
     b. Get the gather map of sorted indices.
     c. Translate parent_node_ids to the new sorted indices.
  3. Find level boundaries.
     Use copy_if to collect the index of the first occurrence of each value in the sorted levels.
  4. Per-level processing: propagate parent node ids for each level.
     For each level,
       a. Gather col_id from the previous level's results (input = col_id, gather map = parent_indices).
       b. Stable sort by {parent_col_id, node_type}.
       c. Scan (prefix sum) over the unique {parent_col_id, node_type} keys to obtain col_id.
       d. Scatter the col_id back to the stable node_level order (using scatter_indices).
     Finally, restore the original node_id order.
  5. Generate row_offset.
     a. Stable sort by parent_col_id.
     b. scan_by_key on {parent_col_id} (required only for nodes whose parent is a list).
     c. Propagate the row offset from each parent list node to its non-list descendants by recursion.
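
The steps above can be sketched on the host in plain Python. This is a minimal illustration under stated assumptions, not the implementation added by this commit: the function name `traverse`, the category constants and their values, and `NO_PARENT` are hypothetical; Python's `sorted` and a running counter per parent column stand in for the device sort, scan, and scan_by_key primitives; and nodes are assumed to be numbered in pre-order, so every parent id is smaller than its children's ids.

```python
from itertools import groupby

# Hypothetical category enum; the values used in the actual commit may differ.
STRUCT, LIST, FIELD, STR, VAL = range(5)
NUM_CATEGORY = 5
NO_PARENT = -1


def traverse(categories, parents, levels, names):
    n = len(categories)

    # Step 1: field nodes get a per-name id shifted past the category values;
    # all other nodes keep their category value as node type.
    name_ids = {}
    node_type = []
    for cat, name in zip(categories, names):
        if cat == FIELD:
            node_type.append(NUM_CATEGORY + name_ids.setdefault(name, len(name_ids)))
        else:
            node_type.append(cat)

    # Steps 2-3: stable sort of node ids by level, then split at level boundaries.
    by_level = sorted(range(n), key=lambda i: levels[i])
    level_groups = [list(g) for _, g in groupby(by_level, key=lambda i: levels[i])]

    # Step 4: within each level, nodes sharing {parent_col_id, node_type}
    # receive the same col_id; each new key starts a new column.
    col_id = [None] * n
    next_col = 0
    for group in level_groups:
        keyed = []
        for i in group:
            p = parents[i]
            parent_col = col_id[p] if p != NO_PARENT else NO_PARENT
            keyed.append(((parent_col, node_type[i]), i))
        keyed.sort(key=lambda t: t[0])  # stable sort by key
        prev = None
        for k, i in keyed:
            if k != prev:
                next_col += 1
                prev = k
            col_id[i] = next_col - 1  # written straight back in node_id order

    # Step 5: children of a list are numbered per parent column (a running
    # count, equivalent to an exclusive scan by key); every other node
    # inherits the row offset of its parent.
    row_offset = [0] * n
    counts = {}
    for i in range(n):  # relies on the pre-order assumption
        p = parents[i]
        if p == NO_PARENT:
            continue
        if categories[p] == LIST:
            pc = col_id[p]
            row_offset[i] = counts.get(pc, 0)
            counts[pc] = row_offset[i] + 1
        else:
            row_offset[i] = row_offset[p]
    return col_id, row_offset


# Hand-built tree for the hypothetical input [{"a": 1, "b": "x"}, {"a": 2}].
categories = [LIST, STRUCT, FIELD, VAL, FIELD, STR, STRUCT, FIELD, VAL]
parents = [-1, 0, 1, 2, 1, 4, 0, 6, 7]
levels = [0, 1, 2, 3, 2, 3, 1, 2, 3]
names = [None, None, "a", None, "b", None, None, "a", None]
col_id, row_offset = traverse(categories, parents, levels, names)
```

For this input, both struct nodes share one column id, both `a` field nodes share another, and every node under the second struct gets row offset 1.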

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Elias Stehle (https://github.com/elstehle)
  - Tobias Ribizel (https://github.com/upsj)
  - Yunsong Wang (https://github.com/PointKernel)
  - David Wendt (https://github.com/davidwendt)

URL: #11610
karthikeyann authored Sep 24, 2022
1 parent 9a5f39a commit 006b254
Showing 3 changed files with 756 additions and 36 deletions.