Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the performance of radix top-k #1175

Merged
merged 45 commits into from
Mar 24, 2023

Conversation

yong-wang
Copy link
Contributor

@yong-wang yong-wang commented Jan 25, 2023

The main changes are:

  • Add a one-block version. It uses single thread block for one row of a batch and is used when len is relatively small (<= 16384)
  • Avoid writing candidates to buffers when the number of candidates is larger than buffer length.
  • Add a parameter to control whether to use a fused filter in the last pass or use a standalone filter kernel. The later case is preferable when the leading bits of inputs are almost same.
  • Early stopping: when the target bucket contains k values, we can stop the computation earlier
  • Many implementation details are polished, like the initialization of counter, calculation of kernel launch parameters, and the scan step
  • Tests and benchmarks are updated to include the new implementations. New benchmarks are added to demonstrate the advantage of adaptive version.

@yong-wang yong-wang requested a review from a team as a code owner January 25, 2023 13:19
@rapids-bot
Copy link

rapids-bot bot commented Jan 25, 2023

Pull requests from external contributors require approval from a rapidsai organization member with write or admin permissions before CI can begin.

@github-actions github-actions bot added the cpp label Jan 25, 2023
@yong-wang
Copy link
Contributor Author

yong-wang commented Jan 25, 2023

More details about the benchmark changes and results.

The adaptive version is most useful when the most significant bits of input data are almost the same. That is, when the value range of input data is narrow. So, some new benchmarks are added to demonstrate its advantage.

  • For float input, the value range of input is set to [1.0, 1.00003], and the corresponding binaries are 0x3F800000 and 0x3F8000FF, respectively. So the leading 3 bytes are all the same.
  • For double input, the value range of input is set to [1.0, 1.000015], and the corresponding binaries are 0x3FF0000000000000 and 0x3FF0000FFFFFFFFF, respectively.

Benchmark results (using A100 GPU and CUDA 12.0):

  • For existing benchmarks, on average, kRadix11bitsUpdated is 1.82X, 2.19X and 1.34X faster than kRadix11bits for float-int (float value & int index), double-int and double-size_t inputs, respectively.
  • For the new added benchmarks, which have narrow range of inputs, kRadix11bitsAdaptive is 9.76X, 1.66X and 1X faster than kRadix11bitsUpdated for float-int, double-int and double-size_t inputs, respectively.

@cjnolet cjnolet added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Jan 26, 2023
@cjnolet
Copy link
Member

cjnolet commented Jan 26, 2023

/ok to test

@codecov-commenter
Copy link

codecov-commenter commented Jan 26, 2023

Codecov Report

❗ No coverage uploaded for pull request base (branch-23.04@88cb31d). Click here to learn what that means.
Patch has no changes to coverable lines.

Additional details and impacted files
@@               Coverage Diff               @@
##             branch-23.04    #1175   +/-   ##
===============================================
  Coverage                ?   87.99%           
===============================================
  Files                   ?       21           
  Lines                   ?      483           
  Branches                ?        0           
===============================================
  Hits                    ?      425           
  Misses                  ?       58           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@cjnolet cjnolet requested review from achirkin and tfeher January 26, 2023 13:57
@cjnolet cjnolet changed the base branch from branch-23.02 to branch-23.04 February 3, 2023 20:43
@cjnolet
Copy link
Member

cjnolet commented Feb 3, 2023

/okay to test

@cjnolet
Copy link
Member

cjnolet commented Feb 3, 2023

/add to allowlist

@cjnolet
Copy link
Member

cjnolet commented Feb 3, 2023

/okay to test

@cjnolet
Copy link
Member

cjnolet commented Feb 25, 2023

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Mar 2, 2023

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Mar 9, 2023

@yong-wang just a heads up that we are reaching the halfway point for release 23.04 and are trying to avoid having too many last-minute changes merged during burndown. It looks like this has a conflict to resolve and it’s otherwise approved to go in. Do you think we can get this in for 23.04?

@cjnolet
Copy link
Member

cjnolet commented Mar 10, 2023

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Mar 10, 2023

/merge

@tfeher
Copy link
Contributor

tfeher commented Mar 10, 2023

@cjnolet there seems to be an issue with the input types passed to select_k while calling from here

matrix::detail::select::radix::select_k<value_t, idx_t, 8, 512>(
in_keys, in_values, n_inputs, input_len, k, out_keys, out_values, select_min, stream);
break;
case SelectKAlgo::RADIX_11_BITS:
matrix::detail::select::radix::select_k<value_t, idx_t, 11, 512>(
in_keys, in_values, n_inputs, input_len, k, out_keys, out_values, select_min, stream);
break;
case SelectKAlgo::WARP_SORT:
matrix::detail::select::warpsort::select_k<value_t, idx_t>(
in_keys, in_values, n_inputs, input_len, k, out_keys, out_values, select_min, stream);
I am investigating and I will send a commit to fix this. The question is why does it pass other tests? I could reproduce the CI error locally.

@yong-wang
Copy link
Contributor Author

Thanks Tamas for locating the bug. I had misread the error message and missed the bug.

@cjnolet
Copy link
Member

cjnolet commented Mar 11, 2023

/ok to test

@cjnolet
Copy link
Member

cjnolet commented Mar 11, 2023

/ok to test

@tfeher
Copy link
Contributor

tfeher commented Mar 11, 2023

There is an error with some of the tests:

./MATRIX_TEST --gtest_filter=*ReferencedRandomDoubleSizeT*

[  FAILED  ] 4 tests, listed below:
[  FAILED  ] SelectK/ReferencedRandomDoubleSizeT.Run/77, where GetParam() = (params{batch_size: 20, len: 700, k: 10, asc, no-input-index}, kRadix8bits)
[  FAILED  ] SelectK/ReferencedRandomDoubleSizeT.Run/112, where GetParam() = (params{batch_size: 100, len: 1700, k: 31, asc, no-input-index}, kRadix8bits)
[  FAILED  ] SelectK/ReferencedRandomDoubleSizeT.Run/140, where GetParam() = (params{batch_size: 100, len: 1700, k: 64, dsc, no-input-index}, kRadix8bits)
[  FAILED  ] SelectK/ReferencedRandomDoubleSizeT.Run/182, where GetParam() = (params{batch_size: 100, len: 1700, k: 1023, dsc, no-input-index}, kRadix8bits)

@yong-wang
Copy link
Contributor Author

Got it. I'll take a look.

@cjnolet cjnolet added 4 - Waiting on Author Waiting for author to respond to review and removed 5 - Ready to Merge labels Mar 15, 2023
@cjnolet
Copy link
Member

cjnolet commented Mar 23, 2023

/okay to test

@cjnolet
Copy link
Member

cjnolet commented Mar 24, 2023

/okay to test

@rapids-bot rapids-bot bot merged commit 8f1fa07 into rapidsai:branch-23.04 Mar 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
4 - Waiting on Author Waiting for author to respond to review cpp improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
Development

Successfully merging this pull request may close these issues.

5 participants