Changes to Cores, components and layout notebooks (rapidsai#2448)
acostadon authored Jul 28, 2022
1 parent e935efa commit 5ef8291
Showing 11 changed files with 203 additions and 426 deletions.
8 changes: 5 additions & 3 deletions notebooks/README.md
@@ -23,10 +23,12 @@ This repository contains a collection of Jupyter Notebooks that outline how to r
| | [Subgraph Extraction](algorithms/community/Subgraph-Extraction.ipynb) | Compute a subgraph of the existing graph including only the specified vertices |
| | [Triangle Counting](algorithms/community/Triangle-Counting.ipynb) | Count the number of Triangle in a graph |
| Components | | |
| | [Connected Components](components/ConnectedComponents.ipynb) | Find weakly and strongly connected components in a graph |
| | [Connected Components](algorithms/components/ConnectedComponents.ipynb) | Find weakly and strongly connected components in a graph |
| Core | | |
| | [K-Core](cores/kcore.ipynb) | Extracts the K-core cluster |
| | [Core Number](cores/core-number.ipynb) | Computer the Core number for each vertex in a graph |
| | [K-Core](algorithms/cores/kcore.ipynb) | Extracts the K-core cluster |
| | [Core Number](algorithms/cores/core-number.ipynb) | Computer the Core number for each vertex in a graph |
Layout | | |
| | [Force-Atlas2](algorithms/layout/Force-Atlas2.ipynb) |A large graph visualization achieved with cuGraph. |
| Link Analysis | | |
| | [Pagerank](link_analysis/Pagerank.ipynb) | Compute the PageRank of every vertex in a graph |
| | [HITS](link_analysis/HITS.ipynb) | Compute the HITS' Hub and Authority scores for every vertex in a graph |
14 changes: 8 additions & 6 deletions notebooks/algorithms/README.md
@@ -17,18 +17,20 @@ This repository contains a collection of Jupyter Notebooks that outline how to r
| | [Degree](centrality/Degree.ipynb) | Compute Degree Centraility for each vertex |
| | [Eigenvector](centrality/Eigenvector.ipynb) | Compute Eigenvector for every vertex |
| Community | | |
| | [Louvain](community/Louvain.ipynb) and Leiden | Identify clusters in a graph using both the Louvain and Leiden algorithms |
| | [Louvain](community/Louvain.ipynb) | Identify clusters in a graph using both the Louvain and Leiden algorithms |
| | [ECG](community/ECG.ipynb) | Identify clusters in a graph using the Ensemble Clustering for Graph |
| | [K-Truss](community/ktruss.ipynb) | Extracts the K-Truss cluster |
| | [Spectral-Clustering](community/Spectral-Clustering.ipynb) | Identify clusters in a graph using Spectral Clustering with both<br> - Balanced Cut<br> - Modularity Modularity |
| | [Subgraph Extraction](community/Subgraph-Extraction.ipynb) | Compute a subgraph of the existing graph including only the specified vertices |
| | [Triangle Counting](community/Triangle-Counting.ipynb) | Count the number of Triangle in a graph |
<!--| Components | | |
Components | | |
| | [Connected Components](components/ConnectedComponents.ipynb) | Find weakly and strongly connected components in a graph |
| Core | | |
| | [K-Core](cores/kcore.ipynb) | Extracts the K-core cluster |
| | [Core Number](cores/core-number.ipynb) | Computer the Core number for each vertex in a graph |
| Link Analysis | | |
| Cores | | |
| | [core-number](cores/Core-number.ipynb) | Computes the core number for every vertex of a graph G. The core number of a vertex is a maximal subgraph that contains only that vertex and others of degree k or more. |
| | [kcore](cores/kcore.ipynb) |Find the k-core of a graph which is a maximal subgraph that contains nodes of degree k or more.|
Layout | | |
| | [Force-Atlas2](layout/Force-Atlas2.ipynb) |A large graph visualization achieved with cuGraph. |
<!--| Link Analysis | | |
| | [Pagerank](link_analysis/Pagerank.ipynb) | Compute the PageRank of every vertex in a graph |
| | [HITS](link_analysis/HITS.ipynb) | Compute the HITS' Hub and Authority scores for every vertex in a graph |
| Link Prediction | | |
4 changes: 2 additions & 2 deletions notebooks/algorithms/centrality/Centrality.ipynb
@@ -106,7 +106,7 @@
"Anthropological Research 33, 452-473 (1977).*\n",
"\n",
"\n",
"<img src=\"../../img/zachary_black_lines.png\" width=\"35%\"/>\n",
"<img src=\"../../img/zachary_graph_comm.png\" width=\"35%\"/>\n",
"\n",
"Because the test data has vertex IDs starting at 1, the auto-renumber feature of cuGraph (mentioned above) will be used so the starting vertex ID is zero for maximum efficiency. The resulting data will then be auto-unrenumbered, making the entire renumbering process transparent to users."
]
@@ -457,7 +457,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.9.13"
},
"vscode": {
"interpreter": {
4 changes: 2 additions & 2 deletions notebooks/algorithms/centrality/README.md
@@ -15,8 +15,8 @@ But which vertices are most important? The answer depends on which measure/algor

|Algorithm |Notebooks Containing |Description |
| --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
|Degree Centrality| [Centrality](./Centrality.ipynb), [Degree](centrality/Degree.ipynb) |Measure based on counting direct connections for each vertex|
|Betweenness Centrality| [Centrality](./Centrality.ipynb), [Betweenness](centrality/Betweenness.ipynb) |Number of shortest paths through the vertex|
|Degree Centrality| [Centrality](./Centrality.ipynb), [Degree](./Degree.ipynb) |Measure based on counting direct connections for each vertex|
|Betweenness Centrality| [Centrality](./Centrality.ipynb), [Betweenness](./Betweenness.ipynb) |Number of shortest paths through the vertex|
|Eigenvector Centrality|[Centrality](./Centrality.ipynb), [Eigenvector](./Eigenvector.ipynb)|Measure of connectivity to other important vertices (which also have high connectivity) often referred to as the influence measure of a vertex|
|Katz Centrality|[Centrality](./Centrality.ipynb), [Katz](./Katz.ipynb) |Similar to Eigenvector but has tweaks to measure more weakly connected graph |
|Pagerank|[Centrality](./Centrality.ipynb), [Pagerank](../../link_analysis/Pagerank.ipynb) |Classified as both a link analysis and centrality measure by quantifying incoming links from central vertices. |
@@ -16,16 +16,23 @@
"\n",
"_Notebook Credits_\n",
"\n",
"| Author Credit | Date | Update | cuGraph Version | Test Hardware |\n",
"| --------------|------------|------------------|-----------------|--------------------|\n",
"| Kumar Aatish | 08/13/2019 | created | 0.15 | GV100, CUDA 10.2 |\n",
"| Brad Rees | 10/18/2021 | updated | 21.12 nightly | GV100, CUDA 11.4 |\n",
"| Author Credit | Date | Update | cuGraph Version | Test Hardware |\n",
"| --------------|------------|------------------|-----------------|-----------------------------|\n",
"| Kumar Aatish | 08/13/2019 | created | 0.15 | GV100, CUDA 10.2 |\n",
"| Brad Rees | 10/18/2021 | updated | 21.12 nightly | GV100, CUDA 11.4 |\n",
"| Don Acosta | 07/22/2021 | updated | 22.08 nightly | DGX Tesla V100, CUDA 11.5 |\n",
"\n",
"\n",
"\n",
"\n",
"## Introduction\n",
"\n",
"### Additional Information:\n",
"* Strong component - directed graph - needs to have a directed path from every vertex to every other vertex\n",
"* Weak component - undirected - vertices need to be reachable from every other vertex\n",
"* Strong components are a subset of weak components \n",
"\n",
"\n",
"### Weakly Connected Components\n",
"To compute WCC for a graph in cuGraph we use:\n",
"\n",
@@ -40,8 +47,7 @@
" as an edge list (edge weights are not used for this algorithm).\n",
" Currently, the graph should be undirected where an undirected edge is\n",
" represented by a directed edge in both directions. The adjacency list\n",
" will be computed if not already present. The number of vertices should\n",
" fit into a 32b int.\n",
" will be computed if not already present.\n",
"\n",
" Returns\n",
" -------\n",
@@ -87,7 +93,6 @@
"metadata": {},
"source": [
"### Some notes about vertex IDs...\n",
"* The current version of cuGraph requires that vertex IDs be representable as 32-bit integers, meaning graphs currently can contain at most 2^32 unique vertex IDs. However, this limitation is being actively addressed and a version of cuGraph that accommodates more than 2^32 vertices will be available in the near future.\n",
"* cuGraph will automatically renumber graphs to an internal format consisting of a contiguous series of integers starting from 0, and convert back to the original IDs when returning data to the caller. If the vertex IDs of the data are already a contiguous series of integers starting from 0, the auto-renumbering step can be skipped for faster graph creation times.\n",
" * To skip auto-renumbering, set the `renumber` boolean arg to `False` when calling the appropriate graph creation API (eg. `G.from_cudf_edgelist(gdf_r, source='src', destination='dst', renumber=False)`).\n",
" * For more advanced renumbering support, see the examples in `structure/renumber.ipynb` and `structure/renumber-2.ipynb`\n"
@@ -101,11 +106,11 @@
"We will be using the Netscience dataset : \n",
"*M. E. J. Newman, Finding community structure in networks using the eigenvectors of matrices, Preprint physics/0605087 (2006)*\n",
"\n",
"The graph netscience contains a coauthorship network of scientists working on network theory and experiment. The version given here contains all components of the network, for a total of 1589 scientists, with the the largest component of 379 scientists.\n",
"The graph netscience contains a co-authorship network of scientists working on network theory and experiments. The version given here contains all components of the network, for a total of 1589 scientists, with the the largest component of 379 scientists.\n",
"\n",
"Netscience Adjacency Matrix |NetScience Strongly Connected Components\n",
":---------------------------------------------|------------------------------------------------------------:\n",
"![](../img/netscience.png \"Credit : https://www.cise.ufl.edu/research/sparse/matrices/Newman/netscience\") | ![](../img/netscience_scc.png \"Credit : https://www.cise.ufl.edu/research/sparse/matrices/Newman/netscience\")\n",
"![](../../img/netscience.png \"Credit : https://www.cise.ufl.edu/research/sparse/matrices/Newman/netscience\") | ![](../../img/netscience_scc.png \"Credit : https://www.cise.ufl.edu/research/sparse/matrices/Newman/netscience\")\n",
" \n",
"Matrix plots above by Yifan Hu, AT&T Labs Visualization Group."
]
@@ -142,7 +147,7 @@
"outputs": [],
"source": [
"# Test file\n",
"datafile='../data/netscience.csv'\n",
"datafile='../../data/netscience.csv'\n",
"\n",
"# the datafile contains three columns, but we only want to use the first two. \n",
"# We will use the \"usecols' feature of read_csv to ignore that column\n",
@@ -346,7 +351,7 @@
"metadata": {},
"source": [
"___\n",
"Copyright (c) 2019-2020, NVIDIA CORPORATION.\n",
"Copyright (c) 2019-2022, NVIDIA CORPORATION.\n",
"\n",
"Licensed under the Apache License, Version 2.0 (the \"License\"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0\n",
"\n",
@@ -357,9 +362,9 @@
],
"metadata": {
"kernelspec": {
"display_name": "cugraph_dev",
"display_name": "Python 3.8.13 ('cugraph_dev')",
"language": "python",
"name": "cugraph_dev"
"name": "python3"
},
"language_info": {
"codemirror_mode": {
@@ -371,7 +376,12 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.12"
"version": "3.9.13"
},
"vscode": {
"interpreter": {
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
}
}
},
"nbformat": 4,
34 changes: 34 additions & 0 deletions notebooks/algorithms/components/README.md
@@ -0,0 +1,34 @@

# cuGraph Components Algorithms

<img src="../../img/zachary_black_lines.png" width="35%"/>

cuGraph Components notebooks contain Jupyter Notebooks that demonstrate algorithms to identify the connected subgraphs within a graph.

Manipulation of the data before or after the graph analytic is not covered here. Extended, more problem focused, notebooks are being created and available https://github.com/rapidsai/notebooks-extended

## Summary

|Algorithm |Notebooks Containing |Description |
| --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
|Weakly Connected Components | [ConnectedComponents](ConnectedComponents.ipynb) |Find the largest connected components in a graph. Considering directed paths or non-directed paths |
|Strongly Connected Components | [ConnectedComponents](ConnectedComponents.ipynb) |Find the connected components in a graph considering directed paths only|
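
The difference between the two rows above can be illustrated on a tiny directed graph. The following is a minimal pure-Python sketch for intuition only, not the cuGraph implementation: the function names and the edge-list input are assumptions made for this example, and the notebook itself uses the GPU-accelerated cuGraph calls. Weak components ignore edge direction (union-find here), while strong components require a directed path in both directions (Kosaraju's two-pass search here).

```python
from collections import defaultdict


def weakly_connected_components(edges):
    """Group vertices that are connected when edge direction is ignored."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving keeps trees shallow
            v = parent[v]
        return v

    for src, dst in edges:
        root_src, root_dst = find(src), find(dst)
        if root_src != root_dst:
            parent[root_src] = root_dst

    groups = defaultdict(set)
    for v in list(parent):
        groups[find(v)].add(v)
    return sorted(sorted(group) for group in groups.values())


def strongly_connected_components(edges):
    """Group vertices that can reach each other along directed paths."""
    forward, backward = defaultdict(list), defaultdict(list)
    vertices = set()
    for src, dst in edges:
        forward[src].append(dst)
        backward[dst].append(src)
        vertices.update((src, dst))

    # Pass 1: record vertices in order of DFS completion on the forward graph.
    visited, finish_order = set(), []
    for start in sorted(vertices):
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            vertex, neighbours = stack[-1]
            nxt = next((n for n in neighbours if n not in visited), None)
            if nxt is None:
                finish_order.append(vertex)
                stack.pop()
            else:
                visited.add(nxt)
                stack.append((nxt, iter(forward[nxt])))

    # Pass 2: walk the reversed graph in decreasing completion order.
    assigned, components = set(), []
    for start in reversed(finish_order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = [], [start]
        while stack:
            vertex = stack.pop()
            component.append(vertex)
            for n in backward[vertex]:
                if n not in assigned:
                    assigned.add(n)
                    stack.append(n)
        components.append(sorted(component))
    return sorted(components)


# Hypothetical toy graph: a 3-cycle, a 2-cycle hanging off it, and a lone edge.
edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 3), (5, 6)]
wcc = weakly_connected_components(edges)
scc = strongly_connected_components(edges)
print("weak:  ", wcc)
print("strong:", scc)
```

Every strong component in the output sits inside exactly one weak component, which is the relationship the ConnectedComponents notebook notes in its introduction.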

[System Requirements](../../README.md#requirements)

| Author Credit | Date | Update | cuGraph Version | Test Hardware |
| --------------|------------|------------------|-----------------|----------------|
| Brad Rees | 04/19/2021 | created | 0.19 | GV100, CUDA 11.0
| Don Acosta | 07/21/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5

## Copyright

Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

![RAPIDS](../../img/rapids_logo.png)
34 changes: 34 additions & 0 deletions notebooks/algorithms/cores/README.md
@@ -0,0 +1,34 @@

# cuGraph Core Algorithms

<img src="../../img/zachary_black_lines.png" width="35%"/>

cuGraph Cores notebooks contain Jupyter Notebooks that demonstrate algorithms to find maximally connected subgraphs within a graph. Either identifying the maximum k-core at the vertex (core-number) or graph level (K-Cores).

Manipulation of the data before or after the graph analytic is not covered here. Extended, more problem focused, notebooks are being created and available https://github.com/rapidsai/notebooks-extended

## Summary

|Algorithm |Notebooks Containing |Description |
| --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
|Core Number | [core-number](core-number.ipynb) | Computes the core number for every vertex of a graph G. The core number of a vertex is a maximal subgraph that contains only that vertex and others of degree k or more. |
|K-Cores | [kcore](kcore.ipynb) |Find the k-core of a graph which is a maximal subgraph that contains nodes of degree k or more.|
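
As a rough illustration of the two rows above, the sketch below computes core numbers by repeatedly peeling off the lowest-degree vertex, then filters edges to obtain a k-core. It is a pure-Python approximation for intuition only and not the cuGraph implementation; the function names, the undirected edge-list input, and the assumption of no self-loops are all choices made for this example. It uses the usual definitions: the k-core is the maximal subgraph in which every vertex has degree at least k within that subgraph, and a vertex's core number is the largest k for which it belongs to the k-core.

```python
from collections import defaultdict


def core_number(edges):
    """Return {vertex: core number} for an undirected edge list (no self-loops assumed)."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)

    degree = {v: len(neighbours) for v, neighbours in adjacency.items()}
    remaining = set(adjacency)
    core, k = {}, 0
    while remaining:
        # Peel the vertex with the smallest remaining degree (ties broken by id).
        v = min(remaining, key=lambda x: (degree[x], x))
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for n in adjacency[v]:
            if n in remaining:
                degree[n] -= 1
    return core


def k_core(edges, k):
    """Return the edges whose endpoints both have core number >= k."""
    core = core_number(edges)
    return sorted((u, v) for u, v in edges if core[u] >= k and core[v] >= k)


# Hypothetical toy graph: a triangle (0, 1, 2) with a two-vertex tail (3, 4).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
cores = core_number(edges)
two_core = k_core(edges, 2)
print(cores)
print(two_core)
```

The tail vertices are peeled first and get core number 1, leaving the triangle as the 2-core. This simple version rescans for the minimum on every step, so it is quadratic in the number of vertices; it is meant to show the idea rather than to scale.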

[System Requirements](../../README.md#requirements)

| Author Credit | Date | Update | cuGraph Version | Test Hardware |
| --------------|------------|------------------|-----------------|----------------|
| Brad Rees | 04/19/2021 | created | 0.19 | GV100, CUDA 11.0
| Don Acosta | 07/21/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5

## Copyright

Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

![RAPIDS](../../img/rapids_logo.png)