Commit

Refactor Sampling, Structure and Traversal Notebooks (#2628)
- Moves notebooks under algorithms
- Adds READMEs
- Tests functionality
- Updates format to match other notebooks

Fixed doc error in edge_betweenness_centrality call reported in issue #2519

closes #2610
closes #2611
closes #2612
closes #2519

Authors:
  - Don Acosta (https://github.com/acostadon)

Approvers:
  - Brad Rees (https://github.com/BradReesWork)
  - Alex Barghi (https://github.com/alexbarghi-nv)

URL: #2628
acostadon authored Sep 7, 2022
1 parent 808ac58 commit dfc640f
Showing 14 changed files with 268 additions and 138 deletions.
10 changes: 5 additions & 5 deletions notebooks/README.md
Original file line number Diff line number Diff line change
@@ -36,13 +36,13 @@ Layout |
| | [Jaccard Similarity](algorithms/link_prediction/Jaccard-Similarity.ipynb) | Compute vertex similarity score using both:<br />- Jaccard Similarity<br />- Weighted Jaccard |
| | [Overlap Similarity](algorithms/link_prediction/Overlap-Similarity.ipynb) | Compute vertex similarity score using the Overlap Coefficient |
| Sampling |
| | [Random Walk](sampling/RandomWalk.ipynb) | Compute Random Walk for a various number of seeds and path lengths |
| | [Random Walk](algorithms/sampling/RandomWalk.ipynb) | Compute Random Walk for a various number of seeds and path lengths |
| Traversal | | |
| | [BFS](traversal/BFS.ipynb) | Compute the Breadth First Search path from a starting vertex to every other vertex in a graph |
| | [SSSP](traversal/SSSP.ipynb) | Single Source Shortest Path - compute the shortest path from a starting vertex to every other vertex |
| | [BFS](algorithms/traversal/BFS.ipynb) | Compute the Breadth First Search path from a starting vertex to every other vertex in a graph |
| | [SSSP](algorithms/traversal/SSSP.ipynb) | Single Source Shortest Path - compute the shortest path from a starting vertex to every other vertex |
| Structure | | |
| | [Renumbering](structure/Renumber.ipynb) <br> [Renumbering 2](structure/Renumber-2.ipynb) | Renumber the vertex IDs in a graph (two sample notebooks) |
| | [Symmetrize](structure/Symmetrize.ipynb) | Symmetrize the edges in a graph |
| | [Renumbering](algorithms/structure/Renumber.ipynb) <br> [Renumbering 2](algorithms/structure/Renumber-2.ipynb) | Renumber the vertex IDs in a graph (two sample notebooks) |
| | [Symmetrize](algorithms/structure/Symmetrize.ipynb) | Symmetrize the edges in a graph |


## RAPIDS notebooks
13 changes: 4 additions & 9 deletions notebooks/algorithms/README.md
@@ -36,15 +36,14 @@ Layout |
| [Link Prediction](link_prediction/README.md) | | |
| | [Jaccard Similarity](algorithms/link_prediction/Jaccard-Similarity.ipynb) | Compute vertex similarity score using both:<br />- Jaccard Similarity<br />- Weighted Jaccard |
| | [Overlap Similarity](algorithms/link_prediction/Overlap-Similarity.ipynb) | Compute vertex similarity score using the Overlap Coefficient |
<!--| Sampling |
| [Sampling](sampling/README.md) |
| | [Random Walk](sampling/RandomWalk.ipynb) | Compute Random Walk for a various number of seeds and path lengths |
| Traversal | | |
| [Traversal](traversal/README.md) | | |
| | [BFS](traversal/BFS.ipynb) | Compute the Breadth First Search path from a starting vertex to every other vertex in a graph |
| | [SSSP](traversal/SSSP.ipynb) | Single Source Shortest Path - compute the shortest path from a starting vertex to every other vertex |
| Structure | | |
| [Structure](structure/README.md) | | |
| | [Renumbering](structure/Renumber.ipynb) <br> [Renumbering 2](structure/Renumber-2.ipynb) | Renumber the vertex IDs in a graph (two sample notebooks) |
| | [Symmetrize](structure/Symmetrize.ipynb) | Symmetrize the edges in a graph |
-->

[System Requirements](../README.md#requirements)

@@ -63,8 +62,4 @@ http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.





![RAPIDS](img/rapids_logo.png)
![RAPIDS](../img/rapids_logo.png)
36 changes: 36 additions & 0 deletions notebooks/algorithms/sampling/README.md
@@ -0,0 +1,36 @@

# cuGraph Sampling Algorithms

<img src="../../img/zachary_black_lines.png" width="35%"/>

The cuGraph Sampling notebooks address graph problems that are solved by random or other methods of sampling.
These algorithms will solve problems like:

* How to collect uniform samples from a large graph
* Scaling down a large known graph
* Exploring a huge unknown graph

## Summary

|Algorithm |Notebooks Containing |Description |
| --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
|Random Walk | [RandomWalk](RandomWalk.ipynb) | Generates a Random path that exists in the graph starting from a seed vertex |
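
As a rough illustration of what a random walk computes, the sketch below uses plain Python rather than cuGraph: it starts at a seed vertex, picks a neighbor uniformly at random, and repeats until the path reaches the requested length. The `random_walk` helper, its parameters, and the toy adjacency dictionary are assumptions made for this example and are not part of the cuGraph API.

```python
import random


def random_walk(adjacency, seed_vertex, max_depth, rng=None):
    """Return a path of at most max_depth vertices starting at seed_vertex.

    Hypothetical helper for illustration only; not the cuGraph implementation.
    """
    rng = rng or random.Random()
    path = [seed_vertex]
    current = seed_vertex
    while len(path) < max_depth:
        neighbors = adjacency.get(current, [])
        if not neighbors:
            # Dead end: the walk stops early when a vertex has no neighbors.
            break
        current = rng.choice(neighbors)
        path.append(current)
    return path


# Toy undirected graph stored as an adjacency dictionary (made-up data).
adjacency = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
walk = random_walk(adjacency, seed_vertex=1, max_depth=5, rng=random.Random(0))
print(walk)
```

Every consecutive pair of vertices in the returned path is an edge of the graph, which is what "a random path that exists in the graph" means in the table above.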

[System Requirements](../../README.md#requirements)

| Author Credit | Date | Update | cuGraph Version | Test Hardware |
| --------------|------------|------------------|-----------------|----------------|
| Brad Rees | 04/20/2021 | created | 0.19 | GV100, CUDA 11.0 |
| Don Acosta | 08/29/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5|

## Copyright

Copyright (c) 2021-2022, NVIDIA CORPORATION. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

![RAPIDS](../../img/rapids_logo.png)
@@ -13,8 +13,9 @@
"| --------------|------------|----------------|-----------------|----------------|\n",
"| Brad Rees | 04/20/2021 | created | 0.19 | GV100, CUDA 11.0\n",
"| Ralph Liu | 06/22/2022 | updated/tested | 22.08 | TV100, CUDA 11.5\n",
"| Don Acosta | 08/28/2022 | updated/tested | 22.10 | TV100, CUDA 11.5\n",
"\n",
"Currently NetworkX does not have a random walk function. There is code on StackOverflow that generats a random walk by getting a vertice and then randomly selection a neighbor and then repeating the process. "
"Currently NetworkX does not have a random walk function. There is code on StackOverflow that generates a random walk by getting a vertex and then randomly selecting a neighbor and then repeating the process. "
]
},
{
@@ -27,7 +28,7 @@
"Anthropological Research 33, 452-473 (1977).*\n",
"\n",
"\n",
"![Karate Club](../img/zachary_black_lines.png)\n",
"<img src=\"../../img/zachary_black_lines.png\" width=\"35%\"/>\n",
"\n",
"\n",
"Because the test data has vertex IDs starting at 1, the auto-renumber feature of cuGraph (mentioned above) will be used so the starting vertex ID is zero for maximum efficiency. The resulting data will then be auto-unrenumbered, making the entire renumbering process transparent to users."
@@ -62,7 +63,7 @@
"metadata": {},
"outputs": [],
"source": [
"gdf['wt'] = 1.0"
"gdf['weight'] = 1.0"
]
},
{
@@ -73,7 +74,7 @@
"source": [
"# Create a Graph - using the source (src) and destination (dst) vertex pairs from the Dataframe \n",
"G = cugraph.Graph()\n",
"G.from_cudf_edgelist(gdf, source='src', destination='dst', edge_attr='wt')"
"G.from_cudf_edgelist(gdf, source='src', destination='dst', edge_attr='weight')"
]
},
{
@@ -166,7 +167,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.7 ('base')",
"display_name": "Python 3.9.13 ('cugraph_dev')",
"language": "python",
"name": "python3"
},
@@ -180,11 +181,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.9.13"
},
"vscode": {
"interpreter": {
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
}
}
},
34 changes: 34 additions & 0 deletions notebooks/algorithms/structure/README.md
@@ -0,0 +1,34 @@

# cuGraph Structure Algorithms

<img src="../../img/zachary_black_lines.png" width="35%"/>

The cuGraph Structure notebooks demonstrate graph manipulations that support other cuGraph algorithms. Many cuGraph algorithms expect vertex IDs formatted as a contiguous range of integers, and some only support directed graphs. The cuGraph structure algorithms encapsulate that functionality, so the algorithms that rely on them can be more efficient and do not need to standardize the graph themselves.
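
To make the idea of renumbering concrete, here is a minimal plain-Python sketch, not cuGraph's implementation: each distinct external vertex ID receives the next unused integer, so the internal IDs form a contiguous range starting at zero. The `renumber` helper, the first-seen assignment order, and the sample IP addresses are assumptions made for illustration.

```python
def renumber(edges):
    """Map arbitrary hashable vertex IDs to contiguous integers 0..n-1.

    Hypothetical helper for illustration only; IDs are assigned in
    first-seen order, which is an arbitrary choice for this sketch.
    """
    id_map = {}

    def internal_id(vertex):
        # Assign the next unused integer the first time a vertex is seen.
        if vertex not in id_map:
            id_map[vertex] = len(id_map)
        return id_map[vertex]

    renumbered = [(internal_id(src), internal_id(dst)) for src, dst in edges]
    return renumbered, id_map


# Made-up edge list whose vertex IDs are strings rather than integers.
edges = [
    ("192.168.1.1", "172.217.5.238"),
    ("172.217.5.238", "10.0.0.7"),
    ("192.168.1.1", "10.0.0.7"),
]
renumbered, id_map = renumber(edges)
print(renumbered)  # [(0, 1), (1, 2), (0, 2)]

# Reversing the map recovers the original IDs ("unrenumbering").
external_ids = {internal: external for external, internal in id_map.items()}
restored = [(external_ids[s], external_ids[d]) for s, d in renumbered]
```

Keeping the mapping around is what allows results to be translated back to the caller's original vertex IDs.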

## Summary

|Algorithm |Notebooks Containing |Description |
| --------------- | ------------------------------------------------------------ | ------------------------------------------------------------ |
|Renumber | [Renumber](Renumber.ipynb) | Converts a graph with arbitrary vertex ids into a contiguous series of integers for efficient handling by many other cuGraph algorithms |
|Renumber | [Renumber2](Renumber-2.ipynb) | Demonstrates how the renumber function can optimize graph processing by converting the underlying sparse matrix into an edgelist with a much smaller memory footprint. |
|Symmetrize | [Symmetrize](Symmetrize.ipynb) |Demonstrates the functionality to transform an undirected graph into a directed graph with edges in each direction as needed for many other cuGraph algorithms.|
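
The symmetrize step described above can be sketched in plain Python as follows. This is an illustrative assumption about the behavior (every edge gets a reverse counterpart and duplicates are dropped), not the cuGraph implementation; the `symmetrize` helper and the sample edges are hypothetical.

```python
def symmetrize(edges):
    """Return a directed edge list containing both directions of every edge.

    Hypothetical helper for illustration only. Duplicate pairs are dropped,
    and a self-loop (v, v) is kept once.
    """
    seen = set()
    result = []
    for src, dst in edges:
        for pair in ((src, dst), (dst, src)):
            if pair not in seen:
                seen.add(pair)
                result.append(pair)
    return result


# Made-up undirected edge list that stores each edge in one direction only.
undirected = [(1, 2), (1, 3), (2, 3)]
directed = symmetrize(undirected)
print(directed)  # [(1, 2), (2, 1), (1, 3), (3, 1), (2, 3), (3, 2)]
```

After this step every edge `(u, v)` has a matching `(v, u)`, which is the form that algorithms expecting bi-directional edges can consume.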


[System Requirements](../../README.md#requirements)

| Author Credit | Date | Update | cuGraph Version | Test Hardware |
| --------------|------------|------------------|-----------------|----------------|
| Brad Rees | 04/19/2021 | created | 0.19 | GV100, CUDA 11.0 |
| Don Acosta | 08/29/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5|

## Copyright

Copyright (c) 2019-2022, NVIDIA CORPORATION. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

![RAPIDS](../../img/rapids_logo.png)
@@ -12,21 +12,14 @@
"\n",
"An alternative case is using renumbering to convert from one data type down to a contiguious sequence of integer IDs. This is useful when the dataset contain vertex IDs that are not integers. \n",
"\n",
"\n",
"Notebook Credits\n",
"\n",
"| Author | Date | Update |\n",
"| --------------|------------|---------------------|\n",
"| Brad Rees | 08/13/2019 | created |\n",
"| Brad Rees | 07/08/2020 | updated |\n",
"| Ralph Liu | 06/01/2022 | docs & code change |\n",
"| | 06/22/2022 | update |\n",
"\n",
"RAPIDS Versions: 22.08 \n",
"\n",
"Test Hardware\n",
"\n",
"* Tesla V100 32G, CUDA 11.5\n",
"| Author Credit | Date | Update | cuGraph Version | Test Hardware |\n",
"| --------------|------------|--------------------|-----------------|----------------|\n",
"| Brad Rees | 08/13/2019 | created | 0.10 | GV100, CUDA 11.0\n",
"| Brad Rees | 07/08/2020 | updated | 0.15 | GV100, CUDA 11.0\n",
"| Ralph Liu | 06/22/2022 | docs & code change | 22.08 | TV100, CUDA 11.5\n",
"| Don Acosta | 08/28/2022 | updated/tested | 22.10 | TV100, CUDA 11.5\n",
"\n",
"\n",
"## Introduction\n",
@@ -53,7 +46,7 @@
"metadata": {},
"source": [
"### Test Data\n",
"A cyber data set from the University of New South Wales is used, where just the IP edge pairs from been extracted"
"Using the IP edge pairs of a cyber data set from the University of New South Wales."
]
},
{
@@ -236,7 +229,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.7 ('base')",
"display_name": "Python 3.9.13 ('cugraph_dev')",
"language": "python",
"name": "python3"
},
@@ -250,11 +243,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.9.13"
},
"vscode": {
"interpreter": {
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
}
}
},
@@ -13,17 +13,11 @@
"An alternative case is using renumbering to convert from one data type down to a contiguous sequence of integer IDs. This is useful when the dataset contain vertex IDs that are not integers. \n",
"\n",
"\n",
"Notebook Credits\n",
"* Original Authors: Chuck Hastings and Bradley Rees\n",
"* Created: 08/13/2019\n",
"* Updated: 06/22/2022\n",
"\n",
"RAPIDS Versions: 22.08 \n",
"\n",
"Test Hardware\n",
"\n",
"* Tesla V100 32G, CUDA 11.5\n",
"\n",
"| Author Credit | Date | Update | cuGraph Version | Test Hardware |\n",
"| --------------------------------|-----------------|--------------------|-----------------|----------------|\n",
"| Brad Rees and Chuck Hastings | 08/13/2019 | created | 0.10 | GV100, CUDA 11.0\n",
"| Brad Rees | 06/22/2020 | updated | 0.15 | GV100, CUDA 11.0\n",
"| Don Acosta | 08/28/2022 | updated/tested | 22.10 | TV100, CUDA 11.5\n",
"## Introduction\n",
"\n",
"Demonstrate creating a graph with renumbering.\n",
@@ -38,9 +32,7 @@
"\n",
"Let us consider that a vertex is uniquely defined as a tuple of elements from the rows of a cuDF DataFrame. The primary restriction is that the number of elements in the tuple must be the same for both source vertices and destination vertices, and that the types of each element in the source tuple must be the same as the corresponding element in the destination tuple. This restriction is a natural restriction and should be obvious why this is required.\n",
"\n",
"Renumbering takes the collection of tuples that uniquely identify vertices in the graph, eliminates duplicates, and assigns integer identifiers to the unique tuples. These integer identifiers are used as *internal* vertex identifiers within the cuGraph software.\n",
"\n",
"One of the features of the renumbering function is that it maps vertex ids of any size and structure down into a range that fits into 32-bit integers. The current cuGraph algorithms are limited to 32-bit signed integers as vertex ids. and the renumbering feature will allow the caller to translate ids that are 64-bit (or strings, or complex data types) into a densely packed 32-bit array of ids that can be used in cuGraph algorithms. Note that if there are more than 2^31 - 1 unique vertex ids then the renumber method will fail with an error indicating that there are too many vertices to renumber into a 32-bit signed integer."
"Renumbering takes the collection of tuples that uniquely identify vertices in the graph, eliminates duplicates, and assigns integer identifiers to the unique tuples. These integer identifiers are used as *internal* vertex identifiers within the cuGraph software.\n"
]
},
{
@@ -342,7 +334,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.7 ('base')",
"display_name": "Python 3.9.13 ('cugraph_dev')",
"language": "python",
"name": "python3"
},
@@ -356,11 +348,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.9.13"
},
"vscode": {
"interpreter": {
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
}
}
},
@@ -8,16 +8,12 @@
"\n",
"In this notebook, we will use the _symmetrize_ function to create bi-directional edges in an undirected graph\n",
"\n",
"Notebook Credits\n",
"* Original Authors: Bradley Rees and James Wyles\n",
"* Created: 08/13/2019\n",
"* Updated: 06/22/2022\n",
"\n",
"RAPIDS Versions: 22.08 \n",
"\n",
"Test Hardware\n",
"\n",
"* Tesla V100 32G, CUDA 11.5\n",
"| Author Credit | Date | Update | cuGraph Version | Test Hardware |\n",
"| --------------------------------|-----------------|--------------------|-----------------|----------------|\n",
"| Brad Rees and James Wyles | 08/13/2019 | created | 0.10 | GV100, CUDA 11.0\n",
"| Brad Rees | 06/22/2020 | updated | 0.15 | GV100, CUDA 11.0\n",
"| Don Acosta | 08/28/2022 | updated/tested | 22.10 | TV100, CUDA 11.5\n",
"\n",
"\n",
"## Introduction\n",
@@ -49,7 +45,7 @@
"Anthropological Research 33, 452-473 (1977).*\n",
"\n",
"\n",
"![Karate Club](../img/zachary_black_lines.png)\n"
"<img src=\"../../img/zachary_black_lines.png\" width=\"35%\"/>\n"
]
},
{
@@ -70,7 +66,7 @@
"outputs": [],
"source": [
"# Read the unsymmetrized data \n",
"unsym_data ='../data/karate_undirected.csv'\n",
"unsym_data ='../../data/karate_undirected.csv'\n",
"gdf = cudf.read_csv(unsym_data, names=[\"src\", \"dst\"], delimiter='\\t', dtype=[\"int32\", \"int32\"] )"
]
},
@@ -175,7 +171,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.9.7 ('base')",
"display_name": "Python 3.9.13 ('cugraph_dev')",
"language": "python",
"name": "python3"
},
@@ -189,11 +185,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.7"
"version": "3.9.13"
},
"vscode": {
"interpreter": {
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
}
}
},