Define API for MG random walk (rapidsai#2407)
This PR defines the API for MG random walk in the C API and the C++ API.

C and C++ tests are defined, although some of the test code is disabled with #ifdef because this PR does not include a working implementation.

Authors:
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Seunghwa Kang (https://github.com/seunghwak)
  - Joseph Nke (https://github.com/jnke2016)
  - Alex Barghi (https://github.com/alexbarghi-nv)

URL: rapidsai#2407
ChuckHastings authored Jul 22, 2022
1 parent 3cea60d commit 5bf07fb
Showing 11 changed files with 2,259 additions and 22 deletions.
2 changes: 2 additions & 0 deletions cpp/CMakeLists.txt
@@ -195,6 +195,8 @@ set(CUGRAPH_SOURCES
src/community/legacy/extract_subgraph_by_vertex.cu
src/community/legacy/egonet.cu
src/sampling/random_walks.cu
src/sampling/random_walks_sg.cu
src/sampling/random_walks_mg.cu
src/sampling/detail/sampling_utils_mg.cu
src/sampling/detail/sampling_utils_sg.cu
src/sampling/uniform_neighbor_sampling_mg.cpp
130 changes: 130 additions & 0 deletions cpp/include/cugraph/algorithms.hpp
@@ -1350,6 +1350,136 @@ random_walks(raft::handle_t const& handle,
bool use_padding = false,
std::unique_ptr<sampling_params_t> sampling_strategy = nullptr);

/**
* @brief returns uniform random walks from starting sources, where each path is of given
* maximum length.
*
* @p start_vertices can contain duplicates, in which case different random walks will
* be generated for each instance.
*
* If the graph is weighted, the returned tuple contains the edge weights. If the graph is
* unweighted, the weights element of the returned tuple will be std::nullopt.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view graph view to operate on
* @param start_vertices Device span defining the starting vertices
* @param max_length maximum length of random walk
* @param seed (optional, defaults to system time), seed for random number generation
* @return tuple containing a device vector of vertices and an optional device vector of edge
* weights (present only if the graph is weighted)<br>
* For each starting vertex there will be (max_length+1) elements in the
* vertex vector with the starting vertex followed by the subsequent
* vertices in the random walk. If a path terminates before max_length,
* the remaining vertices will be populated with invalid_vertex_id
* (-1 for signed vertex_t, std::numeric_limits<vertex_t>::max() for an
* unsigned vertex_t type)<br>
* For each starting vertex there will be max_length elements in the weights
* vector with the edge weight for each edge in the path. If a path
* terminates before max_length the subsequent edge weights will be
* set to weight_t{0}.
*/
// FIXME: Do I care about transposed or not? I want to be able to operate in either
// direction.
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>>
uniform_random_walks(raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, weight_t, false, multi_gpu> const& graph_view,
raft::device_span<vertex_t const> start_vertices,
size_t max_length,
uint64_t seed = std::numeric_limits<uint64_t>::max());
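
The padded output layout described in the comment above can be illustrated with a host-only reference. This is a minimal sketch, not the cugraph implementation: `host_csr_t`, `invalid_vertex_id()` and `reference_uniform_random_walks` are hypothetical names introduced here, and the walk runs on the CPU with `std::mt19937_64` instead of on the GPU. It only shows how (max_length+1) vertices and max_length weights are stored per start vertex, and how paths that end early are padded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <random>
#include <tuple>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical host-side CSR graph used only for this illustration (not a cugraph type).
template <typename vertex_t, typename weight_t>
struct host_csr_t {
  std::vector<std::size_t> offsets;              // size = number of vertices + 1
  std::vector<vertex_t> indices;                 // destination vertex of each edge
  std::optional<std::vector<weight_t>> weights;  // std::nullopt for an unweighted graph
};

// Padding value described in the doc comment: -1 for signed types, max() for unsigned types.
template <typename vertex_t>
constexpr vertex_t invalid_vertex_id()
{
  if constexpr (std::is_signed_v<vertex_t>) {
    return vertex_t{-1};
  } else {
    return std::numeric_limits<vertex_t>::max();
  }
}

// CPU reference: one uniform random walk per entry of start_vertices.
template <typename vertex_t, typename weight_t>
std::tuple<std::vector<vertex_t>, std::optional<std::vector<weight_t>>>
reference_uniform_random_walks(host_csr_t<vertex_t, weight_t> const& graph,
                               std::vector<vertex_t> const& start_vertices,
                               std::size_t max_length,
                               std::uint64_t seed)
{
  std::size_t const num_paths = start_vertices.size();

  // Pre-fill with the padding values so early-terminated paths need no extra work.
  std::vector<vertex_t> vertices(num_paths * (max_length + 1), invalid_vertex_id<vertex_t>());
  std::optional<std::vector<weight_t>> weights{std::nullopt};
  if (graph.weights) { weights = std::vector<weight_t>(num_paths * max_length, weight_t{0}); }

  std::mt19937_64 rng(seed);
  for (std::size_t i = 0; i < num_paths; ++i) {
    vertex_t current               = start_vertices[i];
    vertices[i * (max_length + 1)] = current;
    for (std::size_t step = 0; step < max_length; ++step) {
      std::size_t const first = graph.offsets[static_cast<std::size_t>(current)];
      std::size_t const last  = graph.offsets[static_cast<std::size_t>(current) + 1];
      if (first == last) { break; }  // no outgoing edges: the path terminates here

      // Every outgoing edge is equally likely.
      std::uniform_int_distribution<std::size_t> pick(first, last - 1);
      std::size_t const edge = pick(rng);

      current                                   = graph.indices[edge];
      vertices[i * (max_length + 1) + step + 1] = current;
      if (weights) { (*weights)[i * max_length + step] = (*graph.weights)[edge]; }
    }
  }
  return std::make_tuple(std::move(vertices), std::move(weights));
}
```

For the path graph 0 -> 1 -> 2 with edge weights {0.5, 2.0}, start vertices {0, 2} and max_length 3, this sketch returns the vertices {0, 1, 2, -1, 2, -1, -1, -1} and the weights {0.5, 2.0, 0, 0, 0, 0}, because vertex 2 has no outgoing edges.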

/**
* @brief returns biased random walks from starting sources, where each path is of given
* maximum length.
*
* The next vertex is biased based on the edge weights. The probability of traversing a
* departing edge will be the edge weight divided by the sum of the departing edge weights.
*
* @p start_vertices can contain duplicates, in which case different random walks will
* be generated for each instance.
*
* @throws cugraph::logic_error if the graph is unweighted
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view graph view to operate on
* @param start_vertices Device span defining the starting vertices
* @param max_length maximum length of random walk
* @param seed (optional, defaults to system time), seed for random number generation
* @return tuple containing device vectors of vertices and the edge weights<br>
* For each starting vertex there will be (max_length+1) elements in the
* vertex vector with the starting vertex followed by the subsequent
* vertices in the random walk. If a path terminates before max_length,
* the remaining vertices will be populated with invalid_vertex_id
* (-1 for signed vertex_t, std::numeric_limits<vertex_t>::max() for an
* unsigned vertex_t type)<br>
* For each starting vertex there will be max_length elements in the weights
* vector with the edge weight for each edge in the path. If a path
* terminates before max_length the subsequent edge weights will be
* set to weight_t{0}.
*/
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>>
biased_random_walks(raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, weight_t, false, multi_gpu> const& graph_view,
raft::device_span<vertex_t const> start_vertices,
size_t max_length,
uint64_t seed = std::numeric_limits<uint64_t>::max());
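
The selection rule stated in the comment above (edge weight divided by the sum of the departing edge weights) can be sketched as inverse-CDF sampling over one vertex's outgoing edges. This is an illustration under assumptions, not the cugraph kernel: `select_biased_edge` is a hypothetical helper, and the uniform random number `u` in [0, 1) is passed in by the caller so the behavior is easy to check.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: pick the index of an outgoing edge so that edge i is chosen with
// probability out_weights[i] / sum(out_weights), given u drawn uniformly from [0, 1).
template <typename weight_t>
std::size_t select_biased_edge(std::vector<weight_t> const& out_weights, weight_t u)
{
  assert(!out_weights.empty());
  weight_t const total = std::accumulate(out_weights.begin(), out_weights.end(), weight_t{0});

  // Scale u into [0, total) and find the first edge whose cumulative weight exceeds it.
  weight_t const threshold = u * total;
  weight_t running{0};
  for (std::size_t i = 0; i < out_weights.size(); ++i) {
    running += out_weights[i];
    if (threshold < running) { return i; }
  }
  return out_weights.size() - 1;  // guard against floating-point round-off
}
```

With departing weights {1.0, 3.0}, any `u` below 0.25 selects edge 0 and any `u` from 0.25 upward selects edge 1, which matches probabilities of 1/4 and 3/4.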

/**
* @brief returns biased random walks with node2vec biases from starting sources,
* where each path is of given maximum length.
*
* @p start_vertices can contain duplicates, in which case different random walks will
* be generated for each instance.
*
* If the graph is weighted, the return contains edge weights and the node2vec computation
* will utilize the edge weights. If the graph is unweighted then the return will not contain
* edge weights and the node2vec computation will assume an edge weight of 1 for all edges.
*
* @tparam vertex_t Type of vertex identifiers. Needs to be an integral type.
* @tparam edge_t Type of edge identifiers. Needs to be an integral type.
* @tparam weight_t Type of edge weights. Needs to be a floating point type.
* @tparam multi_gpu Flag indicating whether template instantiation should target single-GPU (false)
* or multi-GPU (true).
* @param handle RAFT handle object to encapsulate resources (e.g. CUDA stream, communicator, and
* handles to various CUDA libraries) to run graph algorithms.
* @param graph_view graph view to operate on
* @param start_vertices Device span defining the starting vertices
* @param max_length maximum length of random walk
* @param p node2vec return parameter
* @param q node2vec in-out parameter
* @param seed (optional, defaults to system time), seed for random number generation
* @return tuple containing device vectors of vertices and the edge weights (if
* the graph is weighted)<br>
* For each starting vertex there will be (max_length+1) elements in the
* vertex vector with the starting vertex followed by the subsequent
* vertices in the random walk. If a path terminates before max_length,
* the remaining vertices will be populated with invalid_vertex_id
* (-1 for signed vertex_t, std::numeric_limits<vertex_t>::max() for an
* unsigned vertex_t type)<br>
* For each starting vertex there will be max_length elements in the weights
* vector with the edge weight for each edge in the path. If a path
* terminates before max_length the subsequent edge weights will be
* set to weight_t{0}.
*/
template <typename vertex_t, typename edge_t, typename weight_t, bool multi_gpu>
std::tuple<rmm::device_uvector<vertex_t>, std::optional<rmm::device_uvector<weight_t>>>
node2vec_random_walks(raft::handle_t const& handle,
graph_view_t<vertex_t, edge_t, weight_t, false, multi_gpu> const& graph_view,
raft::device_span<vertex_t const> start_vertices,
size_t max_length,
weight_t p,
weight_t q,
uint64_t seed = std::numeric_limits<uint64_t>::max());
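
The roles of `p` (return parameter) and `q` (in-out parameter) can be made concrete with the standard node2vec bias rule. The sketch below is an assumption about how those parameters are conventionally applied, not a description of the code in this PR: `node2vec_bias` is a hypothetical helper, and the neighbor list of the previous vertex is passed as a plain host vector. For an unweighted graph the caller would pass an edge weight of 1, as the comment above states.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: unnormalized probability of stepping to `next`, given that the walk
// arrived at the current vertex from `prev`.
//   next == prev                -> edge_weight / p  (return to the previous vertex)
//   next is a neighbor of prev  -> edge_weight      (stay close to the previous vertex)
//   otherwise                   -> edge_weight / q  (move away from the previous vertex)
template <typename vertex_t, typename weight_t>
weight_t node2vec_bias(vertex_t prev,
                       vertex_t next,
                       std::vector<vertex_t> const& prev_neighbors,
                       weight_t edge_weight,
                       weight_t p,
                       weight_t q)
{
  if (next == prev) { return edge_weight / p; }
  bool const adjacent_to_prev =
    std::find(prev_neighbors.begin(), prev_neighbors.end(), next) != prev_neighbors.end();
  return adjacent_to_prev ? edge_weight : edge_weight / q;
}
```

The biased values for all candidate edges of the current vertex would then be normalized by their sum before one edge is drawn. With p = 2 and q = 0.5, returning to the previous vertex is half as likely as moving to a shared neighbor, and moving away is twice as likely.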

#ifndef NO_CUGRAPH_OPS
/**
* @brief generate sub-sampled graph as an adjacency list (CSR format) given input graph,
71 changes: 71 additions & 0 deletions cpp/include/cugraph_c/sampling_algorithms.h
@@ -36,8 +36,78 @@ typedef struct {
int32_t align_;
} cugraph_random_walk_result_t;

/**
* @brief Compute uniform random walks
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] start_vertices Array of source vertices
* @param [in] max_length Maximum length of the generated path
* @param [out] result Output from the uniform random walks call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_uniform_random_walks(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* start_vertices,
size_t max_length,
cugraph_random_walk_result_t** result,
cugraph_error_t** error);

/**
* @brief Compute biased random walks
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] start_vertices Array of source vertices
* @param [in] max_length Maximum length of the generated path
* @param [out] result Output from the biased random walks call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_biased_random_walks(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* start_vertices,
size_t max_length,
cugraph_random_walk_result_t** result,
cugraph_error_t** error);

/**
* @brief Compute random walks using the node2vec framework.
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
* needs to be transposed
* @param [in] start_vertices Array of source vertices
* @param [in] max_length Maximum length of the generated path
* @param [in] p The return parameter
* @param [in] q The in/out parameter
* @param [out] result Output from the node2vec call
* @param [out] error Pointer to an error object storing details of any error. Will
* be populated if error code is not CUGRAPH_SUCCESS
* @return error code
*/
cugraph_error_code_t cugraph_node2vec_random_walks(
const cugraph_resource_handle_t* handle,
cugraph_graph_t* graph,
const cugraph_type_erased_device_array_view_t* start_vertices,
size_t max_length,
double p,
double q,
cugraph_random_walk_result_t** result,
cugraph_error_t** error);

/**
* @brief Compute random walks using the node2vec framework.
* @deprecated This call should be replaced with cugraph_node2vec_random_walks
*
* @param [in] handle Handle for accessing resources
* @param [in] graph Pointer to graph. NOTE: Graph might be modified if the storage
@@ -94,6 +164,7 @@ cugraph_type_erased_device_array_view_t* cugraph_random_walk_result_get_weights(

/**
* @brief If the random walk result is compressed, get the path sizes
* @deprecated This call will no longer be relevant once callers use the new node2vec functions
*
* @param [in] result The result from a random walk algorithm
* @return type erased array pointing to the path sizes in device memory