
Implement hybrid thread/warp parallel kernel for get_json_object #2258

Merged: 6 commits into NVIDIA:branch-24.08 from hybrid_warp_parallel, Jul 25, 2024

Conversation

@ttnghia (Collaborator) commented Jul 25, 2024

Instead of selecting either a thread-parallel or a warp-parallel kernel depending on the input row size, this approach implements a hybrid kernel such that:

  • Each warp always processes at most one input row.
  • The number of active threads in a warp depends on the number of JSON paths: the number of warps processing one row is computed as ceil(num_path / warp_size). A sketch of this mapping follows the list.
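To make the mapping concrete, here is a minimal CUDA sketch of the hybrid thread/warp assignment described above. The kernel name, the process_path call, and the launch sizing are illustrative assumptions, not the PR's actual code:

```cuda
#include <cuda_runtime.h>

__global__ void hybrid_get_json_object_sketch(int num_rows, int num_paths)
{
  constexpr int warp_size = 32;
  int const tid     = blockIdx.x * blockDim.x + threadIdx.x;
  int const warp_id = tid / warp_size;
  int const lane    = tid % warp_size;

  // Each row is owned by ceil(num_paths / warp_size) consecutive warps.
  int const warps_per_row = (num_paths + warp_size - 1) / warp_size;
  int const row           = warp_id / warps_per_row;
  int const path_idx      = (warp_id % warps_per_row) * warp_size + lane;

  // A warp never touches more than one row; lanes beyond the path count idle.
  if (row >= num_rows || path_idx >= num_paths) { return; }

  // process_path(row, path_idx);  // hypothetical: evaluate one JSON path on one row
}

int main()
{
  int const num_rows = 4, num_paths = 40;  // example sizes only
  constexpr int warp_size = 32, block_size = 128;
  int const warps_per_row = (num_paths + warp_size - 1) / warp_size;
  int const total_threads = num_rows * warps_per_row * warp_size;
  int const num_blocks    = (total_threads + block_size - 1) / block_size;
  hybrid_get_json_object_sketch<<<num_blocks, block_size>>>(num_rows, num_paths);
  cudaDeviceSynchronize();
  return 0;
}
```

With num_paths = 40, for example, each row gets two warps: lanes 0-31 of the first warp cover paths 0-31, and lanes 0-7 of the second cover paths 32-39 while its remaining lanes idle.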

ttnghia added 6 commits July 25, 2024 10:16
Signed-off-by: Nghia Truong <[email protected]>
@ttnghia requested a review from revans2 July 25, 2024 18:10
@ttnghia self-assigned this Jul 25, 2024
@ttnghia (Collaborator, Author) commented Jul 25, 2024

Tested the thread-parallel vs. warp-parallel vs. hybrid kernels with a small (fingerprint) dataset:

max path size | thread par | warp par | hybrid
            2 |        40s |      19s |  17.2s
            4 |        17s |      13s |    11s
            6 |        15s |      13s |    10s
            8 |      11.8s |      12s |     8s
           10 |       9.3s |    11.5s |   7.6s
           16 |       7.6s |    10.6s |   6.4s
           32 |       5.9s |    10.2s |   5.6s

@ttnghia (Collaborator, Author) commented Jul 25, 2024

build

@revans2 (Collaborator) left a comment


The volatility is better, and the low-parallelism performance is much better, but it didn't really improve the average performance for the one test case I ran. I think that is okay, because there are fewer performance low spots that we could hit.

[Chart: parallelism scaling]

@revans2 (Collaborator) commented Jul 25, 2024

To be clear, the chart compares against #2256 (review). I still want to run some more tests, but I think this is good to go.

@ttnghia merged commit fd0542c into NVIDIA:branch-24.08 on Jul 25, 2024 (4 checks passed)
@ttnghia deleted the hybrid_warp_parallel branch July 25, 2024 19:48