Create C++ documentation (rapidsai#1489)
Improve the C++ documentation for the following algorithms:  Pagerank, SSSP, BFS, and Louvain.

The C++ documentation for cuGraph needs improvement generally. This change documents the algorithms that have been adapted to use the new graph primitives, and should provide a template that we can use for updating other C++ documentation.

As we migrate more algorithms to the new graph primitives over the next several releases, we will create documentation for those algorithms as well.

Closes rapidsai#1490

Authors:
  - Chuck Hastings (https://github.com/ChuckHastings)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)
  - Brad Rees (https://github.com/BradReesWork)

URL: rapidsai#1489
ChuckHastings authored Mar 31, 2021
1 parent 7a2b02b commit daa9622
Showing 4 changed files with 221 additions and 5 deletions.
10 changes: 5 additions & 5 deletions README.md
@@ -53,7 +53,7 @@ As of Release 0.18 - including 0.18 nightly
| Community | | | |
| | EgoNet | Single-GPU | |
| | Leiden | Single-GPU | |
| | Louvain | Multi-GPU | |
| | Louvain | Multi-GPU | [C++ README](cpp/src/community/README.md#Louvain) |
| | Ensemble Clustering for Graphs | Single-GPU | |
| | Spectral-Clustering - Balanced Cut | Single-GPU | |
| | Spectral-Clustering - Modularity | Single-GPU | |
@@ -71,16 +71,16 @@ As of Release 0.18 - including 0.18 nightly
| Linear Assignment| | | |
| | Hungarian | Single-GPU | [README](cpp/src/linear_assignment/README-hungarian.md) |
| Link Analysis| | | |
| | Pagerank | Multi-GPU | |
| | Personal Pagerank | Multi-GPU | |
| | Pagerank | Multi-GPU | [C++ README](cpp/src/centrality/README.md#Pagerank) |
| | Personal Pagerank | Multi-GPU | [C++ README](cpp/src/centrality/README.md#Personalized-Pagerank) |
| | HITS | Single-GPU | leverages Gunrock |
| Link Prediction | | | |
| | Jaccard Similarity | Single-GPU | |
| | Weighted Jaccard Similarity | Single-GPU | |
| | Overlap Similarity | Single-GPU | |
| Traversal | | | |
| | Breadth First Search (BFS) | Multi-GPU | with cutoff support |
| | Single Source Shortest Path (SSSP) | Multi-GPU | |
| | Breadth First Search (BFS) | Multi-GPU | with cutoff support <br/> [C++ README](cpp/src/traversal/README.md#BFS) |
| | Single Source Shortest Path (SSSP) | Multi-GPU | [C++ README](cpp/src/traversal/README.md#SSSP) |
| | Traveling Salesperson Problem (TSP) | Single-GPU | |
| Structure | | | |
| | Renumbering | Single-GPU | multiple columns, any data type |
81 changes: 81 additions & 0 deletions cpp/src/centrality/README.md
@@ -0,0 +1,81 @@
# Centrality algorithms
cuGraph Pagerank is implemented using our graph primitive library.

## Pagerank

The unit test code is the best place to look for examples of calling pagerank.

* [SG Implementation](../../tests/experimental/pagerank_test.cpp)
* [MG Implementation](../../tests/pagerank/mg_pagerank_test.cpp)

## Simple pagerank

The example assumes that you have already created an SG or MG graph. The caller must allocate the pageranks vector in device memory and pass the raw pointer to that vector into the pagerank function.

```cpp
#include <algorithms.hpp>
...
using vertex_t = int32_t; // or int64_t, whichever is appropriate
using weight_t = float; // or double, whichever is appropriate
using result_t = weight_t; // could specify float or double also
raft::handle_t handle; // Must be configured if MG
auto graph_view = graph.view(); // assumes you have created a graph somehow

result_t constexpr alpha{0.85};
result_t constexpr epsilon{1e-6};

rmm::device_uvector<result_t> pageranks_v(graph_view.get_number_of_vertices(), handle.get_stream());

// pagerank optionally supports three additional parameters:
// max_iterations - maximum number of iterations; if pagerank doesn't converge by
// then, we abort
// has_initial_guess - if true, values in the pagerank array when the call is initiated
// will be used as the initial pagerank values. These values will
// be normalized before use. If false (the default), the values
// in the pagerank array will be set to 1/num_vertices before
// starting the computation.
// do_expensive_check - perform extensive validation of the input data before
// executing algorithm. Off by default. Note: turning this on
// is expensive
cugraph::experimental::pagerank(handle, graph_view, nullptr, nullptr, nullptr, vertex_t{0},
pageranks_v.data(), alpha, epsilon);
```
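
To make the roles of `alpha`, `epsilon`, `max_iterations` and `has_initial_guess` concrete, the following is a minimal CPU sketch of PageRank power iteration. It is not the cuGraph implementation: `Edge` and `pagerank_reference` are hypothetical names, edges are assumed unweighted, the default of 500 iterations is arbitrary, and the handling of vertices without out-edges (their mass is spread uniformly) is an assumption of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical edge type for this sketch (not a cuGraph type).
struct Edge {
  int32_t src;
  int32_t dst;
};

// CPU reference power iteration. `pageranks` plays the role of the caller-allocated
// output vector. Returns the number of iterations performed.
inline std::size_t pagerank_reference(std::vector<Edge> const& edges,
                                      std::vector<double>& pageranks,
                                      double alpha,
                                      double epsilon,
                                      std::size_t max_iterations = 500,
                                      bool has_initial_guess     = false)
{
  std::size_t const n = pageranks.size();
  assert(n > 0);

  std::vector<double> out_degree(n, 0.0);
  for (auto const& e : edges) { out_degree[e.src] += 1.0; }

  if (has_initial_guess) {
    // Normalize the caller's guess so that it sums to 1.
    double const sum = std::accumulate(pageranks.begin(), pageranks.end(), 0.0);
    assert(sum > 0.0);
    for (auto& p : pageranks) { p /= sum; }
  } else {
    std::fill(pageranks.begin(), pageranks.end(), 1.0 / n);
  }

  std::vector<double> next(n);
  for (std::size_t iter = 1; iter <= max_iterations; ++iter) {
    // Mass held by vertices without out-edges is spread uniformly (sketch assumption).
    double dangling = 0.0;
    for (std::size_t v = 0; v < n; ++v) {
      if (out_degree[v] == 0.0) { dangling += pageranks[v]; }
    }
    std::fill(next.begin(), next.end(), (1.0 - alpha + alpha * dangling) / n);
    for (auto const& e : edges) { next[e.dst] += alpha * pageranks[e.src] / out_degree[e.src]; }

    // Converged when the L1 change between iterations drops below epsilon.
    double diff = 0.0;
    for (std::size_t v = 0; v < n; ++v) { diff += std::abs(next[v] - pageranks[v]); }
    pageranks.swap(next);
    if (diff < epsilon) { return iter; }
  }
  return max_iterations;  // gave up without converging
}
```

A reference like this can be handy for checking GPU results on small graphs, provided the two agree on how dangling vertices are treated.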
## Personalized Pagerank

The example assumes that you have already created an SG or MG graph. The caller must allocate the pageranks vector in device memory and pass the raw pointer to that vector into the pagerank function. Additionally, the caller must allocate the personalization_vertices and personalization_values vectors in device memory, populate them, and pass in the raw pointers to those vectors.

```cpp
#include <algorithms.hpp>
...
using vertex_t = int32_t; // or int64_t, whichever is appropriate
using weight_t = float; // or double, whichever is appropriate
using result_t = weight_t; // could specify float or double also
raft::handle_t handle; // Must be configured if MG
auto graph_view = graph.view(); // assumes you have created a graph somehow
vertex_t number_of_personalization_vertices; // Provided by caller
result_t constexpr alpha{0.85};
result_t constexpr epsilon{1e-6};
rmm::device_uvector<result_t> pageranks_v(graph_view.get_number_of_vertices(), handle.get_stream());
rmm::device_uvector<vertex_t> personalization_vertices(number_of_personalization_vertices, handle.get_stream());
rmm::device_uvector<result_t> personalization_values(number_of_personalization_vertices, handle.get_stream());
// Populate personalization_vertices, personalization_values with user provided data
// pagerank optionally supports three additional parameters:
// max_iterations - maximum number of iterations; if pagerank doesn't converge by
// then, we abort
// has_initial_guess - if true, values in the pagerank array when the call is initiated
// will be used as the initial pagerank values. These values will
// be normalized before use. If false (the default), the values
// in the pagerank array will be set to 1/num_vertices before
// starting the computation.
// do_expensive_check - perform extensive validation of the input data before
// executing algorithm. Off by default. Note: turning this on
// is expensive
cugraph::experimental::pagerank(handle, graph_view, nullptr, personalization_vertices.data(),
personalization_values.data(), number_of_personalization_vertices,
pageranks_v.data(), alpha, epsilon);
```
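
The sparse (vertex, value) pairs passed above can be thought of as a teleport distribution that replaces the uniform one. The sketch below shows that idea on the CPU. It is an illustration under assumptions, not the cuGraph code: `make_teleport` and `personalized_pagerank_reference` are hypothetical names, edges are unweighted, and sending the mass of dangling vertices to the teleport distribution is a choice made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Expand sparse (vertex, value) pairs into a dense distribution that sums to 1.
// Vertices that are not listed get a teleport probability of 0.
inline std::vector<double> make_teleport(std::size_t num_vertices,
                                         std::vector<int32_t> const& personalization_vertices,
                                         std::vector<double> const& personalization_values)
{
  assert(personalization_vertices.size() == personalization_values.size());
  std::vector<double> teleport(num_vertices, 0.0);
  double sum = 0.0;
  for (std::size_t i = 0; i < personalization_vertices.size(); ++i) {
    teleport[personalization_vertices[i]] += personalization_values[i];
    sum += personalization_values[i];
  }
  assert(sum > 0.0);
  for (auto& t : teleport) { t /= sum; }
  return teleport;
}

// Power iteration where random jumps land according to `teleport`.
// Edges are (src, dst) pairs.
inline std::vector<double> personalized_pagerank_reference(
  std::vector<std::pair<int32_t, int32_t>> const& edges,
  std::vector<double> const& teleport,
  double alpha,
  double epsilon,
  std::size_t max_iterations = 500)
{
  std::size_t const n = teleport.size();
  assert(n > 0);
  std::vector<double> out_degree(n, 0.0);
  for (auto const& e : edges) { out_degree[e.first] += 1.0; }

  std::vector<double> rank(n, 1.0 / n);
  std::vector<double> next(n);
  for (std::size_t iter = 0; iter < max_iterations; ++iter) {
    double dangling = 0.0;
    for (std::size_t v = 0; v < n; ++v) {
      if (out_degree[v] == 0.0) { dangling += rank[v]; }
    }
    // Teleport mass and dangling mass both follow the personalization distribution.
    for (std::size_t v = 0; v < n; ++v) { next[v] = (1.0 - alpha + alpha * dangling) * teleport[v]; }
    for (auto const& e : edges) { next[e.second] += alpha * rank[e.first] / out_degree[e.first]; }

    double diff = 0.0;
    for (std::size_t v = 0; v < n; ++v) { diff += std::abs(next[v] - rank[v]); }
    rank.swap(next);
    if (diff < epsilon) { break; }
  }
  return rank;
}
```

With all teleport mass on a single vertex of a directed cycle, the ranks decay along the cycle away from that vertex, which is the bias that personalization is meant to introduce.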
79 changes: 79 additions & 0 deletions cpp/src/community/README.md
@@ -0,0 +1,79 @@
# Louvain and Related Clustering Algorithms
cuGraph contains a GPU implementation of the Louvain algorithm and several related clustering algorithms (Leiden and ECG).

## Louvain

The Louvain implementation assigns vertices to clusters in a way that attempts to optimize modularity. The algorithm is derived from the serial implementation described in the following paper:

* VD Blondel, J-L Guillaume, R Lambiotte and E Lefebvre: Fast unfolding of community hierarchies in large networks, J Stat Mech P10008 (2008), http://arxiv.org/abs/0803.0476

It leverages some parallelism ideas from the following paper:
* Hao Lu, Mahantesh Halappanavar, Ananth Kalyanaraman: Parallel heuristics for scalable community detection, Elsevier Parallel Computing (2015), https://www.sciencedirect.com/science/article/pii/S0167819115000472


The challenge in parallelizing Louvain lies in the primary loop which visits the vertices in serial. For each vertex v the change in modularity is computed for moving the vertex from its currently assigned cluster to each of the clusters to which v's neighbors are assigned. The largest positive delta modularity is used to select a new cluster (if there are no positive delta modularities then the vertex is not moved). If the vertex v is moved to a new cluster then the statistics of the vertex v's old cluster and new cluster change. This change in cluster statistics may affect the delta modularity computations of all vertices that follow vertex v in the serial iteration, creating a dependency between the different iterations of the loop.
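
As a concrete illustration of the delta modularity computation described above, here is a small CPU sketch. It assumes an undirected, weighted graph without self-loops, cluster ids in the range [0, number of vertices), and a resolution of 1. The names `WeightedEdge`, `modularity` and `delta_modularity` are hypothetical and are not part of the cuGraph API. The closed form follows from the definition Q = sum over clusters c of in_c / (2m) - (tot_c / (2m))^2.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical undirected weighted edge (self-loops are not handled in this sketch).
struct WeightedEdge {
  int u;
  int v;
  double w;
};

// Modularity computed directly from its definition.
inline double modularity(std::vector<WeightedEdge> const& edges, std::vector<int> const& cluster)
{
  std::size_t const n = cluster.size();
  std::vector<double> in(n, 0.0);   // twice the internal edge weight of each cluster
  std::vector<double> tot(n, 0.0);  // sum of weighted degrees in each cluster
  double two_m = 0.0;
  for (auto const& e : edges) {
    two_m += 2.0 * e.w;
    tot[cluster[e.u]] += e.w;
    tot[cluster[e.v]] += e.w;
    if (cluster[e.u] == cluster[e.v]) { in[cluster[e.u]] += 2.0 * e.w; }
  }
  assert(two_m > 0.0);
  double q = 0.0;
  for (std::size_t c = 0; c < n; ++c) {
    q += in[c] / two_m - (tot[c] / two_m) * (tot[c] / two_m);
  }
  return q;
}

// Change in modularity if vertex v moves from its current cluster to `target`:
//   (k_v_target - k_v_source) / m - k_v * (tot_target - tot_source + k_v) / (2 m^2)
inline double delta_modularity(std::vector<WeightedEdge> const& edges,
                               std::vector<int> const& cluster,
                               int v,
                               int target)
{
  int const source = cluster[v];
  if (source == target) { return 0.0; }
  double m = 0.0, k_v = 0.0, k_v_source = 0.0, k_v_target = 0.0;
  double tot_source = 0.0, tot_target = 0.0;
  for (auto const& e : edges) {
    m += e.w;
    if (cluster[e.u] == source) { tot_source += e.w; }
    if (cluster[e.v] == source) { tot_source += e.w; }
    if (cluster[e.u] == target) { tot_target += e.w; }
    if (cluster[e.v] == target) { tot_target += e.w; }
    if (e.u == v || e.v == v) {
      int const other = (e.u == v) ? e.v : e.u;
      k_v += e.w;
      if (cluster[other] == source) { k_v_source += e.w; }
      if (cluster[other] == target) { k_v_target += e.w; }
    }
  }
  assert(m > 0.0);
  return (k_v_target - k_v_source) / m - k_v * (tot_target - tot_source + k_v) / (2.0 * m * m);
}
```

The value returned by `delta_modularity` should equal the difference between `modularity` evaluated after and before the move, which makes the pair easy to cross-check on small graphs.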

In order to make efficient use of the GPU parallelism, the cuGraph implementation computes the delta modularity for *all* vertex/neighbor pairs using the *current* vertex assignment. Decisions on moving vertices will be made based upon these delta modularities. This will potentially make choices that the serial version would not make. In order to minimize some of the negative effects of this (as described in the Lu paper), the cuGraph implementation uses an Up/Down technique. In even numbered iterations a vertex can only move from cluster i to cluster j if i > j; in odd numbered iterations a vertex can only move from cluster i to cluster j if i < j. This prevents two vertices from swapping clusters in the same iteration of the loop. We have had great success in converging on high modularity clustering using this technique.
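
The Up/Down rule can be sketched in a few lines of host code. The sketch assumes that the best target cluster and its delta modularity have already been computed for every vertex from a snapshot of the current assignment. `ProposedMove`, `move_allowed` and `apply_moves` are hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Best move found for one vertex, computed from the current assignment snapshot.
struct ProposedMove {
  int target_cluster;
  double delta_modularity;
};

// Even iterations only allow moves to a lower cluster id (i > j);
// odd iterations only allow moves to a higher cluster id (i < j).
inline bool move_allowed(int from, int to, std::size_t iteration)
{
  return (iteration % 2 == 0) ? (from > to) : (from < to);
}

// Apply every proposal that improves modularity and passes the Up/Down filter.
// Each vertex reads only its own entry of `cluster`, so the loop could run in parallel.
// Returns the number of vertices moved.
inline std::size_t apply_moves(std::vector<int>& cluster,
                               std::vector<ProposedMove> const& proposals,
                               std::size_t iteration)
{
  assert(cluster.size() == proposals.size());
  std::size_t moved = 0;
  for (std::size_t v = 0; v < cluster.size(); ++v) {
    auto const& p = proposals[v];
    if (p.delta_modularity > 0.0 && move_allowed(cluster[v], p.target_cluster, iteration)) {
      cluster[v] = p.target_cluster;
      ++moved;
    }
  }
  return moved;
}
```

If two vertices each propose to join the other's cluster, only one of the two moves passes the filter in any given iteration, so the pair ends up together instead of swapping places.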

## Calling Louvain

The unit test code is the best place to look for examples of calling louvain.

* [SG Implementation](../../tests/community/louvain_test.cpp)
* [MG Implementation](../../tests/community/mg_louvain_test.cpp)

The API itself is very simple. There are two variations:
* Return a flat clustering
* Return a Dendrogram

### Return a flat clustering

The example assumes that you have already created an SG or MG graph. The caller must allocate the clustering vector in device memory and pass the raw pointer to that vector into the louvain function.

```cpp
#include <algorithms.hpp>
...
using vertex_t = int32_t; // or int64_t, whichever is appropriate
using weight_t = float; // or double, whichever is appropriate
raft::handle_t handle; // Must be configured if MG
auto graph_view = graph.view(); // assumes you have created a graph somehow

size_t level;
weight_t modularity;

rmm::device_uvector<vertex_t> clustering_v(graph_view.get_number_of_vertices(), handle.get_stream());

// louvain optionally supports two additional parameters:
// max_level - maximum level of the Dendrogram
// resolution - constant in the modularity computation
std::tie(level, modularity) = cugraph::louvain(handle, graph_view, clustering_v.data());
```
### Return a Dendrogram
The Dendrogram represents the levels of hierarchical clustering that the Louvain algorithm computes. There is a separate function that will flatten the clustering into the same result as above. Returning the Dendrogram, however, provides a finer level of detail on the intermediate results which can be helpful in more fully understanding the data.
```cpp
#include <algorithms.hpp>
...
using vertex_t = int32_t; // or int64_t, whichever is appropriate
using weight_t = float; // or double, whichever is appropriate
raft::handle_t handle; // Must be configured if MG
auto graph_view = graph.view(); // assumes you have created a graph somehow
cugraph::Dendrogram dendrogram;
weight_t modularity;
// louvain optionally supports two additional parameters:
// max_level - maximum level of the Dendrogram
// resolution - constant in the modularity computation
std::tie(dendrogram, modularity) = cugraph::louvain(handle, graph_view);
// This will get the equivalent result to the earlier example
rmm::device_uvector<vertex_t> clustering_v(graph_view.get_number_of_vertices(), handle.get_stream());
cugraph::flatten_dendrogram(handle, graph_view, dendrogram, clustering_v.data());
```
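
Conceptually, flattening composes the per-level assignments. The host-side sketch below assumes a representation in which level 0 maps every original vertex to a cluster id, and each later level maps the clusters of the previous level to coarser clusters. That layout and the name `flatten_levels` are assumptions made for illustration. The sketch does not describe the internals of `cugraph::Dendrogram`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Compose per-level cluster assignments into one flat clustering of the original vertices.
// levels[0][v]     : cluster of original vertex v after the first level
// levels[l][c], l>0: cluster that cluster c of level l-1 was merged into
inline std::vector<int32_t> flatten_levels(std::vector<std::vector<int32_t>> const& levels)
{
  assert(!levels.empty());
  std::vector<int32_t> clustering = levels.front();
  for (std::size_t l = 1; l < levels.size(); ++l) {
    for (auto& c : clustering) {
      assert(c >= 0 && static_cast<std::size_t>(c) < levels[l].size());
      c = levels[l][c];  // follow the vertex's cluster one level up
    }
  }
  return clustering;
}
```

Keeping the levels separate is what allows intermediate clusterings to be inspected; stopping the loop early would yield the clustering at any chosen level.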

## Leiden

## ECG
56 changes: 56 additions & 0 deletions cpp/src/traversal/README.md
@@ -0,0 +1,56 @@
# Traversal
cuGraph traversal algorithms are contained in this directory.

## SSSP

The unit test code is the best place to look for examples of calling SSSP.

* [SG Implementation](../../tests/experimental/sssp_test.cpp)
* MG Implementation - TBD

## Simple SSSP

The example assumes that you have already created an SG or MG graph. The caller must allocate the distances and predecessors vectors in device memory and pass the raw pointers to those vectors into the SSSP function.

```cpp
#include <algorithms.hpp>
...
using vertex_t = int32_t; // or int64_t, whichever is appropriate
using weight_t = float; // or double, whichever is appropriate
using result_t = weight_t; // could specify float or double also
raft::handle_t handle; // Must be configured if MG
auto graph_view = graph.view(); // assumes you have created a graph somehow
vertex_t source; // Initialized by user

rmm::device_uvector<weight_t> distances_v(graph_view.get_number_of_vertices(), handle.get_stream());
rmm::device_uvector<vertex_t> predecessors_v(graph_view.get_number_of_vertices(), handle.get_stream());

cugraph::experimental::sssp(handle, graph_view, distances_v.begin(), predecessors_v.begin(), source, std::numeric_limits<weight_t>::max(), false);
```
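
For readers who want to see what the distances and predecessors outputs contain, here is a small CPU reference using Dijkstra's algorithm. It is a sketch under assumptions and not the GPU implementation: `Arc` and `sssp_reference` are hypothetical names, edge weights are assumed non-negative, unreachable vertices keep a distance of `std::numeric_limits<float>::max()` and a predecessor of -1 by this sketch's convention, and the `cutoff` parameter reflects an assumed reading of the `std::numeric_limits<weight_t>::max()` argument in the call above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <limits>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical outgoing edge for an adjacency-list graph.
struct Arc {
  int32_t dst;
  float weight;  // assumed non-negative
};

// Fills `distances` and `predecessors` for every vertex reachable from `source`.
// Paths whose length reaches `cutoff` are not recorded.
inline void sssp_reference(std::vector<std::vector<Arc>> const& adjacency,
                           int32_t source,
                           std::vector<float>& distances,
                           std::vector<int32_t>& predecessors,
                           float cutoff = std::numeric_limits<float>::max())
{
  std::size_t const n = adjacency.size();
  assert(source >= 0 && static_cast<std::size_t>(source) < n);
  distances.assign(n, std::numeric_limits<float>::max());
  predecessors.assign(n, -1);

  using Entry = std::pair<float, int32_t>;  // (tentative distance, vertex)
  std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> frontier;
  distances[source] = 0.0f;
  frontier.push(Entry{0.0f, source});

  while (!frontier.empty()) {
    Entry const top = frontier.top();
    frontier.pop();
    float const dist = top.first;
    int32_t const u  = top.second;
    if (dist > distances[u]) { continue; }  // stale queue entry
    for (auto const& arc : adjacency[u]) {
      float const candidate = dist + arc.weight;
      if (candidate < distances[arc.dst] && candidate < cutoff) {
        distances[arc.dst]    = candidate;
        predecessors[arc.dst] = u;
        frontier.push(Entry{candidate, arc.dst});
      }
    }
  }
}
```

Following `predecessors` backwards from any reached vertex until -1 recovers one shortest path to the source.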
## BFS

The unit test code is the best place to look for examples of calling BFS.

* [SG Implementation](../../tests/experimental/bfs_test.cpp)
* MG Implementation - TBD

## Simple BFS

The example assumes that you have already created an SG or MG graph. The caller must allocate the distances and predecessors vectors in device memory and pass the raw pointers to those vectors into the BFS function.

```cpp
#include <algorithms.hpp>
...
using vertex_t = int32_t; // or int64_t, whichever is appropriate
using weight_t = float; // or double, whichever is appropriate
using result_t = weight_t; // could specify float or double also
raft::handle_t handle; // Must be configured if MG
auto graph_view = graph.view(); // assumes you have created a graph somehow
vertex_t source; // Initialized by user
rmm::device_uvector<vertex_t> distances_v(graph_view.get_number_of_vertices(), handle.get_stream()); // BFS distances are hop counts
rmm::device_uvector<vertex_t> predecessors_v(graph_view.get_number_of_vertices(), handle.get_stream());
cugraph::experimental::bfs(handle, graph_view, distances_v.begin(), predecessors_v.begin(), source, false, std::numeric_limits<vertex_t>::max(), false);
```
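
The outputs of BFS can be illustrated with a level-synchronous CPU sketch that expands one frontier per step. This is an assumption-laden illustration and not the cuGraph code: `bfs_reference` is a hypothetical name, unvisited vertices keep a distance of `std::numeric_limits<int32_t>::max()` and a predecessor of -1 by this sketch's convention, and `depth_limit` reflects an assumed reading of the `std::numeric_limits<vertex_t>::max()` argument in the call above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Level-synchronous BFS over an adjacency list.
// distances[v] is the hop count from `source`; predecessors[v] is v's parent in the BFS tree.
inline void bfs_reference(std::vector<std::vector<int32_t>> const& adjacency,
                          int32_t source,
                          std::vector<int32_t>& distances,
                          std::vector<int32_t>& predecessors,
                          int32_t depth_limit = std::numeric_limits<int32_t>::max())
{
  int32_t const unvisited = std::numeric_limits<int32_t>::max();
  std::size_t const n     = adjacency.size();
  assert(source >= 0 && static_cast<std::size_t>(source) < n);
  distances.assign(n, unvisited);
  predecessors.assign(n, -1);

  std::vector<int32_t> frontier{source};
  distances[source] = 0;
  int32_t depth     = 0;
  while (!frontier.empty() && depth < depth_limit) {
    std::vector<int32_t> next;
    for (int32_t u : frontier) {
      for (int32_t v : adjacency[u]) {
        if (distances[v] == unvisited) {  // first visit fixes distance and parent
          distances[v]    = depth + 1;
          predecessors[v] = u;
          next.push_back(v);
        }
      }
    }
    frontier.swap(next);
    ++depth;
  }
}
```

Setting `depth_limit` to a small value stops the expansion early, so vertices beyond that many hops remain marked as unvisited.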
