[gpuCI] Forward-merge branch-22.10 to branch-22.12 [skip gpuci] #11763

Merged 1 commit into branch-22.12 on Sep 24, 2022

GPUtester
Collaborator

Forward-merge triggered by push to branch-22.10 that creates a PR to keep branch-22.12 up-to-date. If this PR is unable to be immediately merged due to conflicts, it will remain open for the team to manually merge.

Adds a JSON tree traversal algorithm, implemented on both host and device.

It generates column indices for JSON in the _record_ orient format: a list of structs at the root, where each struct is a row.
- [x] column indices generation 
- [x] row offset

Depends on PR #11518

### Tree Traversal

  This algorithm assigns a unique column id to each node in the tree.
  The row offset of a node is its row index within the column identified by that column id.
  Algorithm:
  1. Convert node_category + field name to node_type.
     a. Create a hashmap that hashes each field name and assigns a unique node id as its value.
     b. Convert the node categories to node types.
        The node type is the node category enum value if the node is not a field node;
        otherwise it is the unique node id assigned by the hashmap (value shifted by #NUM_CATEGORY).
  2. Preprocessing: Translate parent node ids after sorting by level.
     a. Sort by level.
     b. Get the gather map of sorted indices.
     c. Translate parent_node_ids to the new sorted indices.
  3. Find level boundaries.
     copy_if the index of the first occurrence of each unique value in the sorted levels.
  4. Per-Level Processing: Propagate parent node ids for each level.
     For each level,
       a. Gather col_id from the previous level's results (input = col_id, gather_map = parent_indices).
       b. Stable sort by {parent_col_id, node_type}.
       c. Scan sum of unique {parent_col_id, node_type}.
       d. Scatter the col_id back to stable node_level order (using scatter_indices).
     Restore the original node_id order.
  5. Generate row_offset.
     a. stable_sort by parent_col_id.
     b. scan_by_key {parent_col_id} (required only on nodes whose parent is a list).
     c. Propagate to non-list leaves from the parent list node by recursion.
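
The steps above can be sketched on the host with plain standard C++. This is only an illustration under stated assumptions, not the libcudf implementation. The names `NodeCategory`, `Tree`, `Result`, and `traverse` are hypothetical, the category enum is a simplified stand-in, and node ids are assumed to be in pre-order so that every parent precedes its children. Because the sketch indexes results by original node id, it skips the parent-id translation of step 2c and the final order restore of step 4, and it replaces the device-side sort, scan, and scatter primitives with sequential loops.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical categories for this sketch; the libcudf enum may differ.
enum NodeCategory : int { NC_STRUCT, NC_LIST, NC_FN, NC_STR, NC_VAL, NUM_CATEGORY };

struct Tree {  // illustrative host-side layout, one entry per node
  std::vector<NodeCategory> category;
  std::vector<int> parent;         // parent node id, -1 for the root
  std::vector<int> level;          // depth of the node
  std::vector<std::string> field;  // field name for NC_FN nodes, "" otherwise
};

struct Result {
  std::vector<int> col_id;
  std::vector<int> row_offset;
};

Result traverse(Tree const& t)
{
  int const n = static_cast<int>(t.category.size());
  assert(t.parent.size() == t.category.size() && t.level.size() == t.category.size() &&
         t.field.size() == t.category.size());

  // Step 1: node_type is the category, or NUM_CATEGORY + a unique id per field name.
  std::unordered_map<std::string, int> field_ids;
  std::vector<int> node_type(n);
  for (int i = 0; i < n; ++i) {
    if (t.category[i] == NC_FN) {
      auto it = field_ids.emplace(t.field[i], static_cast<int>(field_ids.size())).first;
      node_type[i] = NUM_CATEGORY + it->second;
    } else {
      node_type[i] = t.category[i];
    }
  }

  // Step 2: stable sort of node ids by level (the gather map of sorted indices).
  std::vector<int> order(n);
  std::iota(order.begin(), order.end(), 0);
  std::stable_sort(order.begin(), order.end(),
                   [&](int a, int b) { return t.level[a] < t.level[b]; });

  // Step 3: level boundaries are the positions where the sorted level changes.
  std::vector<int> bounds;
  for (int k = 0; k < n; ++k)
    if (k == 0 || t.level[order[k]] != t.level[order[k - 1]]) bounds.push_back(k);
  bounds.push_back(n);

  // Step 4: per level, sort by {parent_col_id, node_type}; each unique key is a new column.
  Result r{std::vector<int>(n, 0), std::vector<int>(n, 0)};
  int next_col = 0;
  auto key = [&](int i) {
    return std::make_pair(t.parent[i] < 0 ? -1 : r.col_id[t.parent[i]], node_type[i]);
  };
  for (std::size_t l = 0; l + 1 < bounds.size(); ++l) {
    std::vector<int> ids(order.begin() + bounds[l], order.begin() + bounds[l + 1]);
    std::stable_sort(ids.begin(), ids.end(), [&](int a, int b) { return key(a) < key(b); });
    for (std::size_t k = 0; k < ids.size(); ++k) {
      if (k > 0 && key(ids[k]) != key(ids[k - 1])) ++next_col;
      r.col_id[ids[k]] = next_col;  // written by original node id, so no restore is needed
    }
    ++next_col;
  }

  // Step 5: children of a list get a running count keyed by parent_col_id;
  // every other node inherits its parent's row offset (needs parents before children).
  std::map<int, int> next_row;  // parent_col_id -> number of rows seen so far
  for (int i = 0; i < n; ++i) {
    int const p = t.parent[i];
    if (p < 0) {
      r.row_offset[i] = 0;
    } else if (t.category[p] == NC_LIST) {
      r.row_offset[i] = next_row[r.col_id[p]]++;
    } else {
      r.row_offset[i] = r.row_offset[p];
    }
  }
  return r;
}
```

For the input `[{"a":1,"b":"x"},{"a":2}]` encoded as nine pre-order nodes (list, struct, field `a`, value, field `b`, string, struct, field `a`, value), this sketch yields column ids `0,1,2,4,3,5,1,2,4` and row offsets `0,0,0,0,0,0,1,1,1`. The exact column numbering depends on the sort key order chosen here and need not match the numbering libcudf produces.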

Authors:
  - Karthikeyan (https://github.com/karthikeyann)

Approvers:
  - Elias Stehle (https://github.com/elstehle)
  - Tobias Ribizel (https://github.com/upsj)
  - Yunsong Wang (https://github.com/PointKernel)
  - David Wendt (https://github.com/davidwendt)

URL: #11610
@GPUtester GPUtester requested a review from a team as a code owner September 24, 2022 12:25
@GPUtester GPUtester merged commit 59847c1 into branch-22.12 Sep 24, 2022
@github-actions github-actions bot added the libcudf Affects libcudf (C++/CUDA) code. label Sep 24, 2022
@GPUtester
Collaborator Author

SUCCESS - forward-merge complete.
