Improve the performance of radix top-k #1175

yong-wang · 2023-01-25T13:19:17Z

The main changes are:

Add a one-block version. It uses a single thread block for each row of a batch and is used when len is relatively small (<= 16384).
Avoid writing candidates to buffers when the number of candidates is larger than the buffer length.
Add a parameter to control whether to use a fused filter in the last pass or a standalone filter kernel. The latter is preferable when the leading bits of the inputs are almost the same.
Early stopping: when the target bucket contains k values, the computation can stop early.
Many implementation details are polished, such as the initialization of counters, the calculation of kernel launch parameters, and the scan step.
Tests and benchmarks are updated to include the new implementations. New benchmarks are added to demonstrate the advantage of the adaptive version.
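
To make the early-stopping idea concrete, here is a minimal host-side sketch of MSB-first radix selection. It is not the RAFT kernel: the names `TopKResult` and `radix_top_k_smallest` are hypothetical, it handles only `uint32_t` keys with 8 bits per pass, and it runs on the CPU. It assumes that "the target bucket contains k values" means the bucket holding the k-th element has exactly as many candidates as are still needed, so every candidate in it belongs to the result and later passes can be skipped.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type for this sketch (not part of RAFT).
struct TopKResult {
  std::vector<std::uint32_t> values;  // the k smallest keys, in input order
  int passes;                         // number of histogram passes that ran
};

// Select the k smallest keys with MSB-first radix passes of 8 bits each.
TopKResult radix_top_k_smallest(const std::vector<std::uint32_t>& keys, std::size_t k)
{
  constexpr int kBitsPerPass = 8;
  constexpr int kNumBuckets  = 1 << kBitsPerPass;
  constexpr int kNumPasses   = 32 / kBitsPerPass;
  assert(k >= 1 && k <= keys.size());

  std::uint32_t prefix  = 0;  // bits of the k-th key fixed so far
  std::uint32_t mask    = 0;  // which bit positions of prefix are fixed
  std::size_t remaining = k;  // how many keys are still needed from the candidates
  int passes            = 0;

  for (int pass = 0; pass < kNumPasses; ++pass) {
    ++passes;
    const int shift = 32 - kBitsPerPass * (pass + 1);

    // Histogram the current digit over keys that still match the prefix.
    std::array<std::size_t, kNumBuckets> hist{};
    for (std::uint32_t key : keys) {
      if ((key & mask) == prefix) { ++hist[(key >> shift) & (kNumBuckets - 1)]; }
    }

    // Scan the histogram to find the bucket that contains the k-th key.
    std::size_t below = 0;
    int bucket        = 0;
    while (below + hist[bucket] < remaining) {
      below += hist[bucket];
      ++bucket;
    }
    remaining -= below;  // keys in lower buckets are already known to be selected
    prefix |= static_cast<std::uint32_t>(bucket) << shift;
    mask |= static_cast<std::uint32_t>(kNumBuckets - 1) << shift;

    // Early stop: every candidate in the target bucket is part of the result.
    if (hist[bucket] == remaining) { break; }
  }

  // Final filter: keys below the prefix, plus `remaining` keys that match it.
  TopKResult result{{}, passes};
  for (std::uint32_t key : keys) {
    const std::uint32_t masked = key & mask;
    if (masked < prefix) {
      result.values.push_back(key);
    } else if (masked == prefix && remaining > 0) {
      result.values.push_back(key);
      --remaining;
    }
  }
  return result;
}
```

For the keys {0x01000000, 0x02000000, 0x03000005, 0x03000001, 0xFF000000} and k = 4, the first pass already finds that the target bucket holds exactly the two keys still needed, so the sketch stops after one pass instead of four.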


rapids-bot · 2023-01-25T13:19:21Z

Pull requests from external contributors require approval from a rapidsai organization member with write or admin permissions before CI can begin.

yong-wang · 2023-01-25T13:24:35Z

More details about the benchmark changes and results.

The adaptive version is most useful when the most significant bits of the input data are almost the same, that is, when the value range of the input is narrow. Some new benchmarks are therefore added to demonstrate its advantage.

For float input, the value range is set to [1.0, 1.00003]; the binary representations of the endpoints are 0x3F800000 and 0x3F8000FF, respectively, so the leading 3 bytes are all the same.
For double input, the value range is set to [1.0, 1.000015]; the binary representations of the endpoints are 0x3FF0000000000000 and 0x3FF0000FFFFFFFFF, respectively.
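
The bit patterns above can be checked with a few lines of host code. The helpers below (`float_bits`, `bits_to_float`, `double_bits`, `bits_to_double`, `first_pass_bucket`) are hypothetical names introduced for this sketch. It works on raw IEEE-754 bits only; any bit transformation the actual kernels apply before bucketing is ignored here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reinterpret floating-point values as integers and back.
// memcpy is used to avoid undefined behavior from pointer aliasing.
std::uint32_t float_bits(float x)
{
  std::uint32_t b;
  std::memcpy(&b, &x, sizeof b);
  return b;
}

float bits_to_float(std::uint32_t b)
{
  float x;
  std::memcpy(&x, &b, sizeof x);
  return x;
}

std::uint64_t double_bits(double x)
{
  std::uint64_t b;
  std::memcpy(&b, &x, sizeof b);
  return b;
}

double bits_to_double(std::uint64_t b)
{
  double x;
  std::memcpy(&x, &b, sizeof x);
  return x;
}

// Bucket index of a float when the `bits` most significant raw bits form the digit.
std::uint32_t first_pass_bucket(float x, int bits = 11)
{
  return float_bits(x) >> (32 - bits);
}
```

Since 0x3F800000 and 0x3F8000FF share their top 24 bits, `first_pass_bucket` returns the same index for both endpoints, and for positive floats every value in between lands in that bucket as well. A first pass over such data therefore cannot discard any candidates, which is the situation the narrow-range benchmarks are meant to exercise.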

Benchmark results (using A100 GPU and CUDA 12.0):

For the existing benchmarks, kRadix11bitsUpdated is on average 1.82X, 2.19X and 1.34X faster than kRadix11bits for float-int (float value and int index), double-int and double-size_t inputs, respectively.
For the newly added benchmarks, which have a narrow range of inputs, kRadix11bitsAdaptive is 9.76X and 1.66X faster than kRadix11bitsUpdated for float-int and double-int inputs, respectively, and performs the same (1X) for double-size_t inputs.

cjnolet · 2023-01-26T11:03:47Z

/ok to test

codecov-commenter · 2023-01-26T12:41:55Z

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.04@88cb31d).
Patch has no changes to coverable lines.

Additional details and impacted files

@@               Coverage Diff               @@
##             branch-23.04    #1175   +/-   ##
===============================================
  Coverage                ?   87.99%           
===============================================
  Files                   ?       21           
  Lines                   ?      483           
  Branches                ?        0           
===============================================
  Hits                    ?      425           
  Misses                  ?       58           
  Partials                ?        0


cjnolet · 2023-02-03T20:44:02Z

/okay to test

cjnolet · 2023-02-03T20:44:10Z

/add to allowlist

cjnolet · 2023-02-03T21:02:09Z

/okay to test

cjnolet · 2023-02-25T12:38:15Z

/ok to test

cjnolet · 2023-03-02T18:09:40Z

/ok to test

cjnolet · 2023-03-09T03:04:00Z

@yong-wang just a heads up that we are reaching the halfway point for release 23.04 and are trying to avoid having too many last-minute changes merged during burndown. It looks like this has a conflict to resolve and it’s otherwise approved to go in. Do you think we can get this in for 23.04?


cjnolet · 2023-03-10T11:59:24Z

/ok to test

cjnolet · 2023-03-10T12:01:37Z

/merge

tfeher · 2023-03-10T16:47:29Z

@cjnolet there seems to be an issue with the input types passed to select_k when it is called from here:

raft/cpp/include/raft/spatial/knn/knn.cuh

Lines 156 to 167 in e4aec7b

    
      matrix::detail::select::radix::select_k<value_t, idx_t, 8, 512>(
        in_keys, in_values, n_inputs, input_len, k, out_keys, out_values, select_min, stream);
      break;
    case SelectKAlgo::RADIX_11_BITS:
      matrix::detail::select::radix::select_k<value_t, idx_t, 11, 512>(
        in_keys, in_values, n_inputs, input_len, k, out_keys, out_values, select_min, stream);
      break;
    case SelectKAlgo::WARP_SORT:
      matrix::detail::select::warpsort::select_k<value_t, idx_t>(
        in_keys, in_values, n_inputs, input_len, k, out_keys, out_values, select_min, stream);

I am investigating and will send a commit to fix this. The open question is why the other tests pass. I could reproduce the CI error locally.

yong-wang · 2023-03-11T04:40:39Z

Thanks Tamas for locating the bug. I had misread the error message and missed the bug.

cjnolet · 2023-03-11T04:43:44Z

/ok to test

cjnolet · 2023-03-11T13:00:02Z

/ok to test

tfeher · 2023-03-11T13:39:05Z

There is an error with some of the tests:

./MATRIX_TEST --gtest_filter=*ReferencedRandomDoubleSizeT*

[  FAILED  ] 4 tests, listed below:
[  FAILED  ] SelectK/ReferencedRandomDoubleSizeT.Run/77, where GetParam() = (params{batch_size: 20, len: 700, k: 10, asc, no-input-index}, kRadix8bits)
[  FAILED  ] SelectK/ReferencedRandomDoubleSizeT.Run/112, where GetParam() = (params{batch_size: 100, len: 1700, k: 31, asc, no-input-index}, kRadix8bits)
[  FAILED  ] SelectK/ReferencedRandomDoubleSizeT.Run/140, where GetParam() = (params{batch_size: 100, len: 1700, k: 64, dsc, no-input-index}, kRadix8bits)
[  FAILED  ] SelectK/ReferencedRandomDoubleSizeT.Run/182, where GetParam() = (params{batch_size: 100, len: 1700, k: 1023, dsc, no-input-index}, kRadix8bits)

yong-wang · 2023-03-12T03:28:10Z

Got it. I'll take a look.

cjnolet · 2023-03-23T22:39:28Z

/okay to test

cjnolet · 2023-03-24T13:07:14Z

/okay to test

yong-wang added 14 commits December 8, 2022 12:35

Improve performance of radix top-k

765f4d4

Merge remote-tracking branch 'origin/branch-23.02' into fea-update-radix-topk

bc5df8d

radix top-k: conform to RAFT code style

d085e58

radix top-k: add extra input parameter in_idx

5f63cbd

radix top-k: replace greater with select_min

6086454

radix top-k: make it compiled

dd63770

radix top-k: polish style

aea2bd0

radix top-k: polish code

99924bd

radix top-k: remove Store classes

2746173

radix top-k: polish code comments

381c075

radix top-k: change dynamic to adaptive

d20a480

modify radix top-k so that it conforms the latest select_k code

c408245

fix the case when k equals len

53ebcb8

radix top-k: update tests and benchmarks

fdd30e9

yong-wang requested a review from a team as a code owner January 25, 2023 13:19

github-actions bot added the cpp label Jan 25, 2023

cjnolet added the improvement and non-breaking labels Jan 26, 2023

cjnolet assigned yong-wang Jan 26, 2023

cjnolet requested review from achirkin and tfeher January 26, 2023 13:57

cjnolet changed the base branch from branch-23.02 to branch-23.04 February 3, 2023 20:43

Merge branch 'branch-23.04' into fea-update-radix-topk

9ff84d5

cjnolet added 2 commits March 1, 2023 16:45

Merge branch 'branch-23.04' into fea-update-radix-topk

5e7addf

Merge branch 'branch-23.04' into fea-update-radix-topk

f72b3e8

Merge remote-tracking branch 'origin/branch-23.04' into fea-update-radix-topk

f43a523

Add missing fused_last_filter arg while dispatching select_k

f7061eb

cjnolet added the 5 - Ready to Merge label Mar 11, 2023

Merge branch 'branch-23.04' into fea-update-radix-topk

7543bbf

Merge branch 'branch-23.04' into fea-update-radix-topk

01ac6dd

cjnolet added the 4 - Waiting on Author label and removed the 5 - Ready to Merge label Mar 15, 2023

cjnolet and others added 3 commits March 17, 2023 11:00

Merge branch 'branch-23.04' into fea-update-radix-topk

45b809f

Merge branch 'branch-23.04' into fea-update-radix-topk

9d7a687

Checking in fix for select_k based on offline conversation w/ Yong Wang.

d447fcb

yong-wang and others added 3 commits March 24, 2023 16:45

minor polish

a11fb8e

adjust the place of volatile

dd6ae51

Merge branch 'branch-23.04' into fea-update-radix-topk

f1e281b

rapids-bot bot merged commit 8f1fa07 into rapidsai:branch-23.04 Mar 24, 2023

tfeher mentioned this pull request Apr 24, 2023

Learn heuristic to pick fastest select_k algorithm #1455

Closed
