Use Datasets API to Update Notebook Examples (rapidsai#2440)
Addresses issue rapidsai#2364 

All of the single-GPU (SG) notebook examples have been updated to use the newly added Datasets API. Previously, Graph objects were created by specifying a path to the `.csv` file, calling `cuDF` to read in the file, and then converting the edge list to a graph. 

Now, a dataset object is imported, and graphs are created by calling its `get_graph()` method. Comments and headings have also been updated for consistency.
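The workflow this PR adopts can be sketched as follows. This is a hypothetical, stdlib-only stand-in (no GPU, cuGraph, or cuDF required): the `Dataset` class, the tiny inline edge list, and the adjacency-dict "graph" are illustrative inventions that only mimic the shape of the real API (`from cugraph.experimental.datasets import karate`, `karate.get_graph(fetch=True)`, `karate.get_edgelist(fetch=True)`) seen in the diffs below.

```python
# Hypothetical sketch of the Dataset pattern the notebooks switch to.
# In real cuGraph, `fetch=True` downloads the .csv if it is not cached;
# here a small inline edge list stands in for karate-data.csv.
import csv
import io

KARATE_TSV = "1\t2\n1\t3\n2\t3\n"  # tiny stand-in for karate-data.csv

class Dataset:
    """Bundles a data source with lazy loading and graph construction."""

    def __init__(self, raw_tsv):
        self._raw = raw_tsv
        self._edgelist = None

    def get_edgelist(self, fetch=False):
        # Parse the COO (source, destination) pairs on first use and cache them.
        if self._edgelist is None:
            reader = csv.reader(io.StringIO(self._raw), delimiter="\t")
            self._edgelist = [(int(src), int(dst)) for src, dst in reader]
        return self._edgelist

    def get_graph(self, fetch=False):
        # Build an undirected adjacency-dict graph from the edge list.
        graph = {}
        for src, dst in self.get_edgelist(fetch=fetch):
            graph.setdefault(src, set()).add(dst)
            graph.setdefault(dst, set()).add(src)
        return graph

karate = Dataset(KARATE_TSV)
G = karate.get_graph(fetch=True)  # one call replaces path + read_csv + from_cudf_edgelist
```

The point of the pattern is that the dataset object owns the file path, delimiter, column names, and dtypes, so each notebook no longer repeats that boilerplate before building a graph.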

Authors:
  - Ralph Liu (https://github.com/oorliu)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: rapidsai#2440
oorliu authored Aug 2, 2022
1 parent 5c7303c commit b74e22a
Showing 27 changed files with 727 additions and 1,188 deletions.
56 changes: 12 additions & 44 deletions notebooks/algorithms/centrality/Betweenness.ipynb
@@ -12,7 +12,8 @@
"| --------------|------------|------------------|-----------------|----------------|\n",
"| Brad Rees | 04/24/2019 | created | 0.15 | GV100, CUDA 11.0\n",
"| Brad Rees | 08/16/2020 | tested / updated | 21.10 nightly | RTX 3090 CUDA 11.4\n",
"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5\n",
"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
]
},
{
@@ -111,7 +112,10 @@
"source": [
"# Import needed libraries\n",
"import cugraph\n",
"import cudf"
"import cudf\n",
"\n",
"# Import a built-in dataset\n",
"from cugraph.experimental.datasets import karate"
]
},
{
@@ -124,42 +128,6 @@
"import networkx as nx"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Some Prep"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define the path to the test data \n",
"datafile='../../data/karate-data.csv'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read in the data - GPU\n",
"cuGraph depends on cuDF for data loading and the initial Dataframe creation\n",
"\n",
"The data file contains an edge list, which represents the connection of a vertex to another. The `source` to `destination` pairs is in what is known as Coordinate Format (COO). In this test case, the data is just two columns. However a third, `weight`, column is also possible"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -173,9 +141,8 @@
"metadata": {},
"outputs": [],
"source": [
"# create a Graph using the source (src) and destination (dst) vertex pairs from the Dataframe \n",
"G = cugraph.Graph()\n",
"G.from_cudf_edgelist(gdf, source='src', destination='dst')"
"# Create a graph using the imported Dataset object\n",
"G = karate.get_graph(fetch=True)"
]
},
{
@@ -256,6 +223,7 @@
"outputs": [],
"source": [
"# Read the data, this also created a NetworkX Graph \n",
"datafile=\"../../data/karate-data.csv\"\n",
"file = open(datafile, 'rb')\n",
"Gnx = nx.read_edgelist(file)"
]
@@ -321,7 +289,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.13 ('cugraph_dev')",
"display_name": "Python 3.9.7 ('base')",
"language": "python",
"name": "python3"
},
@@ -335,11 +303,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.9.7"
},
"vscode": {
"interpreter": {
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
}
}
},
53 changes: 14 additions & 39 deletions notebooks/algorithms/centrality/Katz.ipynb
@@ -12,7 +12,8 @@
"| --------------|------------|------------------|-----------------|----------------|\n",
"| Brad Rees | 10/15/2019 | created | 0.14 | GV100, CUDA 10.2\n",
"| Brad Rees | 08/16/2020 | tested / updated | 0.15.1 nightly | RTX 3090 CUDA 11.4\n",
"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5\n",
"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
]
},
{
@@ -40,9 +41,9 @@
" this value is 0.0f, cuGraph will use the default value which is 0.00001. \n",
" Setting too small a tolerance can lead to non-convergence due to numerical \n",
" roundoff. Usually values between 0.01 and 0.00001 are acceptable.\n",
" nstart:cuDataFrame, GPU Dataframe containing the initial guess for katz centrality. \n",
" nstart: cuDataFrame, GPU Dataframe containing the initial guess for katz centrality. \n",
" Default is None\n",
" normalized:bool, If True normalize the resulting katz centrality values. \n",
" normalized: bool, If True normalize the resulting katz centrality values. \n",
" Default is True\n",
"\n",
"Returns:\n",
@@ -106,7 +107,10 @@
"source": [
"# Import rapids libraries\n",
"import cugraph\n",
"import cudf"
"import cudf\n",
"\n",
"# Import a built-in dataset\n",
"from cugraph.experimental.datasets import karate"
]
},
{
@@ -140,35 +144,6 @@
"tol = 0.00001 # tolerance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define the path to the test data \n",
"datafile='../../data/karate-data.csv'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read in the data - GPU\n",
"cuGraph depends on cuDF for data loading and the initial Dataframe creation\n",
"\n",
"The data file contains an edge list, which represents the connection of a vertex to another. The `source` to `destination` pairs is in what is known as Coordinate Format (COO). In this test case, the data is just two columns. However a third, `weight`, column is also possible"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -182,9 +157,8 @@
"metadata": {},
"outputs": [],
"source": [
"# create a Graph using the source (src) and destination (dst) vertex pairs from the Dataframe \n",
"G = cugraph.Graph()\n",
"G.from_cudf_edgelist(gdf, source='src', destination='dst')"
"# Create a graph using the imported Dataset object\n",
"G = karate.get_graph(fetch=True)"
]
},
{
@@ -275,6 +249,7 @@
"outputs": [],
"source": [
"# Read the data, this also created a NetworkX Graph \n",
"datafile = \"../../data/karate-data.csv\"\n",
"file = open(datafile, 'rb')\n",
"Gnx = nx.read_edgelist(file)"
]
@@ -348,7 +323,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.13 ('cugraph_dev')",
"display_name": "Python 3.9.7 ('base')",
"language": "python",
"name": "python3"
},
@@ -362,11 +337,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.9.7"
},
"vscode": {
"interpreter": {
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
}
}
},
37 changes: 12 additions & 25 deletions notebooks/algorithms/community/ECG.ipynb
@@ -13,6 +13,7 @@
"| | 08/16/2020 | updated | 0.15 | GV100, CUDA 10.2 |\n",
"| | 08/05/2021 | tested/updated | 21.10 nightly | RTX 3090 CUDA 11.4 |\n",
"| Don Acosta | 07/20/2022 | tested/updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
"\n",
"## Introduction\n",
"\n",
@@ -101,34 +102,17 @@
"source": [
"# Import needed libraries\n",
"import cugraph\n",
"import cudf"
"import cudf\n",
"\n",
"# Import a built-in dataset\n",
"from cugraph.experimental.datasets import karate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Read data using cuDF"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test file \n",
"datafile='../../data/karate-data.csv'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# read the data using cuDF\n",
"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
"## Create an Edgelist"
]
},
{
@@ -137,6 +121,9 @@
"metadata": {},
"outputs": [],
"source": [
"# You can also just get the edgelist\n",
"gdf = karate.get_edgelist(fetch=True)\n",
"\n",
"# The algorithm also requires that there are vertex weights. Just use 1.0 \n",
"gdf[\"data\"] = 1.0"
]
@@ -232,7 +219,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.13 ('cugraph_dev')",
"display_name": "Python 3.9.7 ('base')",
"language": "python",
"name": "python3"
},
@@ -246,11 +233,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.9.7"
},
"vscode": {
"interpreter": {
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
}
}
},
37 changes: 12 additions & 25 deletions notebooks/algorithms/community/Louvain.ipynb
@@ -15,6 +15,7 @@
"| | 08/16/2020 | updated | 0.14 | GV100, CUDA 10.2 |\n",
"| | 08/05/2021 | tested / updated | 21.10 nightly | RTX 3090 CUDA 11.4 |\n",
"| Don Acosta | 07/11/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
"\n",
"\n",
"\n",
@@ -140,34 +141,17 @@
"source": [
"# Import needed libraries\n",
"import cugraph\n",
"import cudf"
"import cudf\n",
"\n",
"# Import a built-in dataset\n",
"from cugraph.experimental.datasets import karate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Read data using cuDF"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test file \n",
"datafile='../../data//karate-data.csv'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# read the data using cuDF\n",
"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
"## Create an Edgelist"
]
},
{
@@ -176,6 +160,9 @@
"metadata": {},
"outputs": [],
"source": [
"# You can also just get the edgelist\n",
"gdf = karate.get_edgelist(fetch=True)\n",
"\n",
"# The algorithm also requires that there are vertex weights. Just use 1.0 \n",
"gdf[\"data\"] = 1.0"
]
@@ -323,7 +310,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.13 ('cugraph_dev')",
"display_name": "Python 3.9.7 ('base')",
"language": "python",
"name": "python3"
},
@@ -337,11 +324,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.9.7"
},
"vscode": {
"interpreter": {
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
}
}
},