Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define a selection primtive API #2586

Merged

Conversation

seunghwak
Copy link
Contributor

Partially address #2580.

This PR is dependent on #2584.

This PR defines API for two selection primitives, one for the biased sampling/random walk and another for uniform random sampling/random walk.

@ChuckHastings For Node2Vec style random walk,

We can compute intersections for each (previous vertex, current vertex pairs).

We need to first create a non-detail space primitive calling detail::nbr_intersection (https://github.com/rapidsai/cugraph/blob/branch-22.10/cpp/src/prims/detail/nbr_intersection.cuh#L492) for given vertex pairs (this can be used for Jaccard and Overlap coefficients as well).

In MG, each GPU should store neighbor intersection outputs for the relevant source/destination ranges (not sure whether should we create additional utility functions to handle this, or this may not be a recurring pattern, so just leave this task much more as a dark magic for advanced users who understand how the 2D partitioning actually works). May go for the latter till we see this pattern occurring in other places.

Once we have neighbor intersection outputs and previous vertex IDs for the relevant (previous vertex ID, current vertex ID) pairs, we can create frontier having tagged-current vertex ID values (tag is an index to access previous vertex ID and neighbor intersection outputs for the perv vertex, current vertex pair).

Then, e_bias_op can check whether the outgoing neighbor belongs to the intersection or coincides with the previous vertex to properly set bias values.

@seunghwak seunghwak requested a review from a team as a code owner August 15, 2022 20:36
@seunghwak seunghwak self-assigned this Aug 15, 2022
@seunghwak seunghwak added feature request New feature or request 3 - Ready for Review non-breaking Non-breaking change labels Aug 15, 2022
@seunghwak seunghwak added this to the 22.10 milestone Aug 15, 2022
@seunghwak
Copy link
Contributor Author

rerun tests

@codecov-commenter
Copy link

codecov-commenter commented Aug 16, 2022

Codecov Report

❗ No coverage uploaded for pull request base (branch-22.10@b0837f7). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 93d141b differs from pull request most recent head 5eb397e. Consider uploading reports for the commit 5eb397e to get more accurate results

@@               Coverage Diff               @@
##             branch-22.10    #2586   +/-   ##
===============================================
  Coverage                ?   61.11%           
===============================================
  Files                   ?      106           
  Lines                   ?     5634           
  Branches                ?        0           
===============================================
  Hits                    ?     3443           
  Misses                  ?     2191           
  Partials                ?        0           

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

@seunghwak seunghwak requested review from naimnv and jnke2016 August 18, 2022 18:48
typename T>
std::tuple<std::optional<rmm::device_uvector<size_t>>,
decltype(allocate_dataframe_buffer<T>(size_t{0}, rmm::cuda_stream_view{}))>
per_v_random_select_and_transform_outgoing_e(raft::handle_t const& handle,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChuckHastings What do you think about select_and_transfrom vs select_transform in the primitive name?

I initially picked former but now getting more inclined to the latter as it is less verbose and more in line with other std:: thrust:: functions like partitoin_copy, transform_reduce, and so on. I guess removing and here won't cause much confusion?

I have a similar issue in naming extract_and_transform_if... primitives (a modification of the existing extract_if_e primitive to allow not just extracting an edge (source, destination, weight) triplets but to allow including edge IDs and types and so on... I feel like extract_transform may work better than extract_and_transform...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think removing the _and from the name is reasonable. As you observe, the thrust model doesn't include and conjunctions, they are implied.

@ChuckHastings
Copy link
Collaborator

@gpucibot merge

@rapids-bot rapids-bot bot merged commit bbb54c7 into rapidsai:branch-22.10 Aug 25, 2022
@seunghwak seunghwak deleted the fea_selection_primitive_api branch August 25, 2022 16:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request New feature or request non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants