[QST] Details of heuristic function inside the stream-K algorithm. #1125

Closed
Ther-nullptr opened this issue Oct 5, 2023 · 1 comment
Labels: question

Comments

@Ther-nullptr

I have read the Stream-K paper and the source code of the threadblock_swizzle_streamk method, and I am confused by part of the heuristic:

    for (int trial_sk_blocks = min_sk_blocks; trial_sk_blocks <= max_sk_blocks; ++trial_sk_blocks)
    {
      int sk_waves = (trial_sk_blocks + avail_sms - 1) / avail_sms;
      int max_sk_iters_per_block = (sk_iters + trial_sk_blocks - 1) / trial_sk_blocks;
      int sk_iter_equiv = max_sk_iters_per_block * sk_waves;

      int num_peers = ((trial_sk_blocks + sk_tiles - 1) / sk_tiles) + 1;        // add one for alignment skew // !question1

      float iter_cost = 0.02f * float(num_peers) * float(sk_iter_equiv);

      if (trial_sk_blocks % sk_tiles == 0)
      {
        // aligned
        num_peers = (trial_sk_blocks / sk_tiles);

        iter_cost = 0.0f; // !question2
      }
      // ...

  1. What is alignment skew? The +1 does not appear in the original equation in the paper.
  2. When trial_sk_blocks % sk_tiles == 0, I understand how num_peers is calculated, but iter_cost is set to 0, which, as far as I understand, does not match the original paper.

The original equation is below:
[image: equation from the paper]
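
To make the two questions concrete, here is a minimal sketch that reproduces only the lines quoted in the loop above and compares num_peers against a brute-force count. It is not the CUTLASS implementation. `TrialCost`, `evaluate_trial` and `max_blocks_per_tile` are hypothetical names introduced for illustration. The brute-force helper assumes the stream-K iterations are split into contiguous, nearly equal ranges per block, and it reads "alignment skew" as block boundaries that do not line up with tile boundaries. Both are assumptions, not something confirmed in this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Quantities computed by the quoted excerpt for one trial value.
struct TrialCost {
  int sk_waves;
  int max_sk_iters_per_block;
  int sk_iter_equiv;
  int num_peers;
  float iter_cost;
};

// Hypothetical wrapper: mirrors only the quoted lines. The elided remainder
// of the loop ("// ...") is intentionally not reproduced.
TrialCost evaluate_trial(int trial_sk_blocks, int sk_tiles, int sk_iters, int avail_sms)
{
  assert(trial_sk_blocks > 0 && sk_tiles > 0 && avail_sms > 0);
  TrialCost t;
  t.sk_waves = (trial_sk_blocks + avail_sms - 1) / avail_sms;
  t.max_sk_iters_per_block = (sk_iters + trial_sk_blocks - 1) / trial_sk_blocks;
  t.sk_iter_equiv = t.max_sk_iters_per_block * t.sk_waves;

  // Unaligned case: ceil(blocks / tiles) plus one for alignment skew.
  t.num_peers = ((trial_sk_blocks + sk_tiles - 1) / sk_tiles) + 1;
  t.iter_cost = 0.02f * float(t.num_peers) * float(t.sk_iter_equiv);

  if (trial_sk_blocks % sk_tiles == 0) {
    // Aligned case: every tile is covered by the same number of blocks.
    t.num_peers = trial_sk_blocks / sk_tiles;
    t.iter_cost = 0.0f;
  }
  return t;
}

// Hypothetical helper: brute-force the largest number of blocks that
// contribute to any single tile.
// ASSUMPTION: iterations are split into contiguous, nearly equal ranges,
// with the first (total % blocks) blocks taking one extra iteration.
int max_blocks_per_tile(int sk_blocks, int sk_tiles, int iters_per_tile)
{
  assert(sk_blocks > 0 && sk_tiles > 0 && iters_per_tile > 0);
  const int total = sk_tiles * iters_per_tile;
  const int base = total / sk_blocks;
  const int extra = total % sk_blocks;

  std::vector<int> blocks_in_tile(sk_tiles, 0);
  int begin = 0;
  for (int b = 0; b < sk_blocks; ++b) {
    const int len = base + (b < extra ? 1 : 0);
    if (len == 0) continue;                      // block received no work
    const int end = begin + len;                 // exclusive
    const int first_tile = begin / iters_per_tile;
    const int last_tile = (end - 1) / iters_per_tile;
    for (int tile = first_tile; tile <= last_tile; ++tile) {
      ++blocks_in_tile[tile];
    }
    begin = end;
  }
  return *std::max_element(blocks_in_tile.begin(), blocks_in_tile.end());
}
```

Under these assumptions, with sk_tiles = 3, 8 iterations per tile (sk_iters = 24) and avail_sms = 4, `evaluate_trial(5, 3, 24, 4)` gives num_peers = 3 with a nonzero iter_cost, and `max_blocks_per_tile(5, 3, 8)` also returns 3: the middle tile receives partial contributions from three blocks because the block ranges straddle tile boundaries. With 6 blocks, every block stays inside one tile, both functions give 2, and iter_cost is 0. The +1 therefore behaves like an upper bound on the per-tile peer count in the unaligned case.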

@hwu36
Collaborator

hwu36 commented Oct 5, 2023

@dumerrill

The code was fine-tuned after the paper was written.
