[REVIEW] Add notes to performance comparisons notebook #13044

Merged
merged 8 commits into from
Mar 31, 2023
16 changes: 10 additions & 6 deletions docs/cudf/source/user_guide/performance_comparisons.ipynb
@@ -8,13 +8,16 @@
]
},
{
+"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"This notebook compares the performance of `cuDF` and `pandas`. The comparisons performed are on identical data sizes. This notebook primarily showcases the factor\n",
"of speedups users can have when the similar `pandas` APIs are run on GPUs using `cudf`.\n",
"\n",
-"The hardware details used to run these performance comparisons are at the end of this page."
+"The hardware details used to run these performance comparisons are at the end of this page.\n",
Contributor Author

@galipremsagar galipremsagar Mar 30, 2023

I've verified this claim holds true on a T4 by running the entire notebook after slashing the data sizes in half.

+"\n",
+"**Note**: This notebook is written to measure performance on modern NVIDIA hardware, for older NVIDIA hardware with lower GPU memory please consider lowering the `num_rows` values below by a factor of 2. Results may vary by data sizes, CPU & GPU used."
]
},
{
@@ -576,9 +579,10 @@
},
"outputs": [],
"source": [
+"num_rows = 300_000_000\n",
"pd_series = pd.Series(\n",
" np.random.choice(\n",
-" [\"123\", \"56.234\", \"Walmart\", \"Costco\", \"rapids ai\"], size=300_000_000\n",
+" [\"123\", \"56.234\", \"Walmart\", \"Costco\", \"rapids ai\"], size=num_rows\n",
" )\n",
")"
]
@@ -1368,10 +1372,10 @@
},
"outputs": [],
"source": [
-"size = 100_000_000\n",
+"num_rows = 100_000_000\n",
"pdf = pd.DataFrame()\n",
-"pdf[\"key\"] = np.random.randint(0, 2, size)\n",
-"pdf[\"val\"] = np.random.randint(0, 7, size)\n",
+"pdf[\"key\"] = np.random.randint(0, 2, num_rows)\n",
+"pdf[\"val\"] = np.random.randint(0, 7, num_rows)\n",
"\n",
"\n",
"def custom_formula_udf(df):\n",
@@ -1634,7 +1638,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.10.9"
+"version": "3.10.10"
},
"vscode": {
"interpreter": {
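
The note added in this PR asks users on GPUs with less memory to lower the `num_rows` values by a factor of 2. A minimal sketch of that adjustment follows. The helper name `scaled_num_rows` and its `factor` parameter are assumptions made for illustration and are not part of cuDF or of the notebook; the two row counts are the ones visible in this diff.

```python
# Hypothetical helper (not part of cuDF or the notebook): shrink a benchmark's
# row count for GPUs with less memory, as the note in this PR suggests.
def scaled_num_rows(num_rows: int, factor: int = 2) -> int:
    """Return num_rows divided by an integer factor, never below 1."""
    if num_rows < 1:
        raise ValueError("num_rows must be positive")
    if factor < 1:
        raise ValueError("factor must be >= 1")
    return max(1, num_rows // factor)


# Row counts taken from the two cells touched in this diff.
string_rows = scaled_num_rows(300_000_000)  # 150_000_000
udf_rows = scaled_num_rows(100_000_000)     # 50_000_000
print(string_rows, udf_rows)
```

In the notebook, the returned value would stand in for the literal assigned to `num_rows` before the `np.random.choice(..., size=num_rows)` and `np.random.randint(0, 2, num_rows)` calls. Results at the reduced size may still vary by CPU and GPU.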