Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

For fixing the cuGraph test failures with PCG #690

Merged
merged 2 commits into from
May 26, 2022

Conversation

vinaydes
Copy link
Contributor

RAFT's RNG class provides methods that fill a buffer with random numbers belonging to various probability distribution functions. For exampleuniform, normal etc. In this methods, multiple instances of random number generator are used in parallel. Each cuda thread gets it own instance of generator. The instance is initialized with three values seed, subsequence and offset. The value for seed is common for all threads and thread id is used for subsequence. The offset is kept 0 for all instances. To fill the buffer, the threads work in grid strided fashion demonstrated by loop below:

  for (size_t i = tid; i < buffer_length; i += total_number_of_threads) {
    buffer[i] = rng.next();
  }

Due to grid-striding of loops, consecutive elements in the buffer are ith element in consecutive subsequences. For example, 10th and 11th elements in the buffer would be 0th element in 10th and 11th subsequences. In the unit tests for cugraph, this scheme seems to introduce a slight bias in certain cases. Easy fix to the issue is to break the lock step increment of individual subsequences.

This PR sets different offset value for each subsequence by setting offset = subsequence. Note that this change has no effect on the period of the random number generator.

This should fix the cuGraph issue rapidsai/cugraph#2266 for now.

@vinaydes vinaydes requested a review from a team as a code owner May 26, 2022 07:57
@github-actions github-actions bot added the cpp label May 26, 2022
@teju85
Copy link
Member

teju85 commented May 26, 2022

Very interesting find @vinaydes ! Can you also check that this fix doesn't break cuML tests, please?

Also, adding @akifcorduk and @MatthiasKohl to see if this doesn't break anything in cuOpt and cugraph-ops respectively.

@teju85 teju85 added bug Something isn't working non-breaking Non-breaking change labels May 26, 2022
Copy link
Member

@teju85 teju85 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes LGTM.

@vinaydes
Copy link
Contributor Author

Yes, I'll run cuML tests with this change.

@vinaydes
Copy link
Contributor Author

Created cuml PR here rapidsai/cuml#4760 after making sure it works locally on my test setup.

@vinaydes
Copy link
Contributor Author

Also created cuGraph PR here rapidsai/cugraph#2316

@cjnolet
Copy link
Member

cjnolet commented May 26, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit d8989b7 into rapidsai:branch-22.06 May 26, 2022
@MatthiasKohl
Copy link
Contributor

Interesting, this should not break anything in cugraph-ops. Thanks @vinaydes for the fix!

rapids-bot bot pushed a commit to rapidsai/cugraph that referenced this pull request Jun 21, 2022
With the RAFT changes here: rapidsai/raft#690 we should be able to use the PC generator again.  The PC generator is significantly faster.

Closes #2266

Authors:
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Corey J. Nolet (https://github.com/cjnolet)
  - Seunghwa Kang (https://github.com/seunghwak)

URL: #2356
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cpp non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants