Use Datasets API to Update Notebook Examples (rapidsai#2440)
Addresses issue rapidsai#2364 

All of the single-GPU (SG) notebook examples have been updated to use the newly added Datasets API. Previously, Graph objects were created by specifying a path to the `.csv` file, calling `cuDF` to read in the file, and then converting the edge list to a graph. 

Now, a dataset object is imported, and graphs are created by calling its `get_graph()` method. Comments and headings have also been updated for consistency.
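The workflow this PR adopts can be sketched as follows. This is a hypothetical, stdlib-only stand-in (no GPU, cuGraph, or cuDF required): the `Dataset` class, the tiny inline edge list, and the adjacency-dict "graph" are illustrative inventions that only mimic the shape of the real API (`from cugraph.experimental.datasets import karate`, `karate.get_graph(fetch=True)`, `karate.get_edgelist(fetch=True)`) seen in the diffs below.

```python
# Hypothetical sketch of the Dataset pattern the notebooks switch to.
# In real cuGraph, `fetch=True` downloads the .csv if it is not cached;
# here a small inline edge list stands in for karate-data.csv.
import csv
import io

KARATE_TSV = "1\t2\n1\t3\n2\t3\n"  # tiny stand-in for karate-data.csv

class Dataset:
    """Bundles a data source with lazy loading and graph construction."""

    def __init__(self, raw_tsv):
        self._raw = raw_tsv
        self._edgelist = None

    def get_edgelist(self, fetch=False):
        # Parse the COO (source, destination) pairs on first use and cache them.
        if self._edgelist is None:
            reader = csv.reader(io.StringIO(self._raw), delimiter="\t")
            self._edgelist = [(int(src), int(dst)) for src, dst in reader]
        return self._edgelist

    def get_graph(self, fetch=False):
        # Build an undirected adjacency-dict graph from the edge list.
        graph = {}
        for src, dst in self.get_edgelist(fetch=fetch):
            graph.setdefault(src, set()).add(dst)
            graph.setdefault(dst, set()).add(src)
        return graph

karate = Dataset(KARATE_TSV)
G = karate.get_graph(fetch=True)  # one call replaces path + read_csv + from_cudf_edgelist
```

The point of the pattern is that the dataset object owns the file path, delimiter, column names, and dtypes, so each notebook no longer repeats that boilerplate before building a graph.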

Authors:
  - Ralph Liu (https://github.com/oorliu)

Approvers:
  - Rick Ratzel (https://github.com/rlratzel)

URL: rapidsai#2440
oorliu authored Aug 2, 2022
1 parent 5c7303c commit b74e22a
Showing 27 changed files with 727 additions and 1,188 deletions.
56 changes: 12 additions & 44 deletions notebooks/algorithms/centrality/Betweenness.ipynb
@@ -12,7 +12,8 @@
"| --------------|------------|------------------|-----------------|----------------|\n",
"| Brad Rees | 04/24/2019 | created | 0.15 | GV100, CUDA 11.0\n",
"| Brad Rees | 08/16/2020 | tested / updated | 21.10 nightly | RTX 3090 CUDA 11.4\n",
"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5\n",
"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
]
},
{
@@ -111,7 +112,10 @@
"source": [
"# Import needed libraries\n",
"import cugraph\n",
"import cudf"
"import cudf\n",
"\n",
"# Import a built-in dataset\n",
"from cugraph.experimental.datasets import karate"
]
},
{
@@ -124,42 +128,6 @@
"import networkx as nx"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Some Prep"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define the path to the test data \n",
"datafile='../../data/karate-data.csv'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read in the data - GPU\n",
"cuGraph depends on cuDF for data loading and the initial Dataframe creation\n",
"\n",
"The data file contains an edge list, which represents the connection of a vertex to another. The `source` to `destination` pairs is in what is known as Coordinate Format (COO). In this test case, the data is just two columns. However a third, `weight`, column is also possible"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -173,9 +141,8 @@
"metadata": {},
"outputs": [],
"source": [
"# create a Graph using the source (src) and destination (dst) vertex pairs from the Dataframe \n",
"G = cugraph.Graph()\n",
"G.from_cudf_edgelist(gdf, source='src', destination='dst')"
"# Create a graph using the imported Dataset object\n",
"G = karate.get_graph(fetch=True)"
]
},
{
@@ -256,6 +223,7 @@
"outputs": [],
"source": [
"# Read the data, this also created a NetworkX Graph \n",
"datafile=\"../../data/karate-data.csv\"\n",
"file = open(datafile, 'rb')\n",
"Gnx = nx.read_edgelist(file)"
]
@@ -321,7 +289,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.13 ('cugraph_dev')",
"display_name": "Python 3.9.7 ('base')",
"language": "python",
"name": "python3"
},
@@ -335,11 +303,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.9.7"
},
"vscode": {
"interpreter": {
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
}
}
},
53 changes: 14 additions & 39 deletions notebooks/algorithms/centrality/Katz.ipynb
@@ -12,7 +12,8 @@
"| --------------|------------|------------------|-----------------|----------------|\n",
"| Brad Rees | 10/15/2019 | created | 0.14 | GV100, CUDA 10.2\n",
"| Brad Rees | 08/16/2020 | tested / updated | 0.15.1 nightly | RTX 3090 CUDA 11.4\n",
"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
"| Don Acosta | 07/05/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5\n",
"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5"
]
},
{
@@ -40,9 +41,9 @@
" this value is 0.0f, cuGraph will use the default value which is 0.00001. \n",
" Setting too small a tolerance can lead to non-convergence due to numerical \n",
" roundoff. Usually values between 0.01 and 0.00001 are acceptable.\n",
" nstart:cuDataFrame, GPU Dataframe containing the initial guess for katz centrality. \n",
" nstart: cuDataFrame, GPU Dataframe containing the initial guess for katz centrality. \n",
" Default is None\n",
" normalized:bool, If True normalize the resulting katz centrality values. \n",
" normalized: bool, If True normalize the resulting katz centrality values. \n",
" Default is True\n",
"\n",
"Returns:\n",
@@ -106,7 +107,10 @@
"source": [
"# Import rapids libraries\n",
"import cugraph\n",
"import cudf"
"import cudf\n",
"\n",
"# Import a built-in dataset\n",
"from cugraph.experimental.datasets import karate"
]
},
{
@@ -140,35 +144,6 @@
"tol = 0.00001 # tolerance"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Define the path to the test data \n",
"datafile='../../data/karate-data.csv'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Read in the data - GPU\n",
"cuGraph depends on cuDF for data loading and the initial Dataframe creation\n",
"\n",
"The data file contains an edge list, which represents the connection of a vertex to another. The `source` to `destination` pairs is in what is known as Coordinate Format (COO). In this test case, the data is just two columns. However a third, `weight`, column is also possible"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -182,9 +157,8 @@
"metadata": {},
"outputs": [],
"source": [
"# create a Graph using the source (src) and destination (dst) vertex pairs from the Dataframe \n",
"G = cugraph.Graph()\n",
"G.from_cudf_edgelist(gdf, source='src', destination='dst')"
"# Create a graph using the imported Dataset object\n",
"G = karate.get_graph(fetch=True)"
]
},
{
@@ -275,6 +249,7 @@
"outputs": [],
"source": [
"# Read the data, this also created a NetworkX Graph \n",
"datafile = \"../../data/karate-data.csv\"\n",
"file = open(datafile, 'rb')\n",
"Gnx = nx.read_edgelist(file)"
]
@@ -348,7 +323,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.13 ('cugraph_dev')",
"display_name": "Python 3.9.7 ('base')",
"language": "python",
"name": "python3"
},
@@ -362,11 +337,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.9.7"
},
"vscode": {
"interpreter": {
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
}
}
},
37 changes: 12 additions & 25 deletions notebooks/algorithms/community/ECG.ipynb
@@ -13,6 +13,7 @@
"| | 08/16/2020 | updated | 0.15 | GV100, CUDA 10.2 |\n",
"| | 08/05/2021 | tested/updated | 21.10 nightly | RTX 3090 CUDA 11.4 |\n",
"| Don Acosta | 07/20/2022 | tested/updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
"\n",
"## Introduction\n",
"\n",
@@ -101,34 +102,17 @@
"source": [
"# Import needed libraries\n",
"import cugraph\n",
"import cudf"
"import cudf\n",
"\n",
"# Import a built-in dataset\n",
"from cugraph.experimental.datasets import karate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Read data using cuDF"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test file \n",
"datafile='../../data/karate-data.csv'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# read the data using cuDF\n",
"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
"## Create an Edgelist"
]
},
{
@@ -137,6 +121,9 @@
"metadata": {},
"outputs": [],
"source": [
"# You can also just get the edgelist\n",
"gdf = karate.get_edgelist(fetch=True)\n",
"\n",
"# The algorithm also requires that there are vertex weights. Just use 1.0 \n",
"gdf[\"data\"] = 1.0"
]
@@ -232,7 +219,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.13 ('cugraph_dev')",
"display_name": "Python 3.9.7 ('base')",
"language": "python",
"name": "python3"
},
@@ -246,11 +233,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.9.7"
},
"vscode": {
"interpreter": {
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
}
}
},
37 changes: 12 additions & 25 deletions notebooks/algorithms/community/Louvain.ipynb
@@ -15,6 +15,7 @@
"| | 08/16/2020 | updated | 0.14 | GV100, CUDA 10.2 |\n",
"| | 08/05/2021 | tested / updated | 21.10 nightly | RTX 3090 CUDA 11.4 |\n",
"| Don Acosta | 07/11/2022 | tested / updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
"| Ralph Liu | 07/26/2022 | updated | 22.08 nightly | DGX Tesla V100 CUDA 11.5 |\n",
"\n",
"\n",
"\n",
@@ -140,34 +141,17 @@
"source": [
"# Import needed libraries\n",
"import cugraph\n",
"import cudf"
"import cudf\n",
"\n",
"# Import a built-in dataset\n",
"from cugraph.experimental.datasets import karate"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Read data using cuDF"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Test file \n",
"datafile='../../data//karate-data.csv'"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# read the data using cuDF\n",
"gdf = cudf.read_csv(datafile, delimiter='\\t', names=['src', 'dst'], dtype=['int32', 'int32'] )"
"## Create an Edgelist"
]
},
{
@@ -176,6 +160,9 @@
"metadata": {},
"outputs": [],
"source": [
"# You can also just get the edgelist\n",
"gdf = karate.get_edgelist(fetch=True)\n",
"\n",
"# The algorithm also requires that there are vertex weights. Just use 1.0 \n",
"gdf[\"data\"] = 1.0"
]
@@ -323,7 +310,7 @@
],
"metadata": {
"kernelspec": {
"display_name": "Python 3.8.13 ('cugraph_dev')",
"display_name": "Python 3.9.7 ('base')",
"language": "python",
"name": "python3"
},
@@ -337,11 +324,11 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.13"
"version": "3.9.7"
},
"vscode": {
"interpreter": {
"hash": "cee8a395f2f0c5a5bcf513ae8b620111f4346eff6dc64e1ea99c951b2ec68604"
"hash": "f708a36acfaef0acf74ccd43dfb58100269bf08fb79032a1e0a6f35bd9856f51"
}
}
},