DOC: use constants in performance-comparisons.ipynb #15215

raybellwaves · 2024-03-04T00:12:31Z

Description

I've simplified the performance comparisons notebook by setting constants, e.g. num_rows, that can be adjusted at the top of each section. This makes it easier for anyone running the notebook to adjust the values and hopefully avoid memory errors. It can also help with testing these benchmarks on dataframes of various lengths. I've stripped the output because I was working on an A10G and couldn't run with the current num_rows value. I also didn't want to commit results that may differ from those of the H100 that is currently used; I would rather the results be committed by the RAPIDS team. I can confirm the notebook runs end-to-end. You can see my version, run with a smaller num_rows and a smaller timeit_number on an A10G (EC2 machine), here: https://github.com/raybellwaves/cudf-performance-comparisons/blob/main/performance-comparisons.ipynb
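A minimal sketch of the pattern described above, not the notebook's actual code: constants are defined once at the top of a section and reused for both data generation and timing. `NUM_ROWS` and `TIMEIT_NUMBER` mirror the `num_rows` and `timeit_number` values mentioned above; `make_data` and `time_call` are hypothetical helpers, and a plain Python list stands in for the pandas/cuDF dataframes so the sketch runs without a GPU.

```python
import random
import timeit

# Tunable constants: lower these on GPUs with less memory (e.g. an A10G).
NUM_ROWS = 100_000
TIMEIT_NUMBER = 5


def make_data(num_rows, seed=0):
    """Build a stand-in dataset; the notebook would build a dataframe here."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(num_rows)]


def time_call(func, number=TIMEIT_NUMBER):
    """Return the mean wall-clock seconds per call over `number` runs."""
    return timeit.timeit(func, number=number) / number


data = make_data(NUM_ROWS)
mean_seconds = time_call(lambda: sorted(data))
print(f"sorted {NUM_ROWS} rows: {mean_seconds:.6f} s per call")
```

Changing the two constants is then the only edit needed to rerun the section on a smaller or larger dataset.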

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2024-03-04T00:12:34Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

bdice · 2024-03-04T04:23:18Z

@raybellwaves Thanks for the PR! I skimmed it and the core ideas seem helpful.

@galipremsagar Would you be able to take a look at this and incorporate the proposed changes for next time we run benchmarks?

galipremsagar · 2024-03-05T16:15:47Z

I'll run these on H100s and update this PR.

galipremsagar · 2024-03-13T18:19:26Z

/okay to test

bdice · 2024-03-13T19:13:05Z

docs/cudf/source/user_guide/performance-comparisons/performance-comparisons.ipynb

Can we fix (or hide) the warning here?

DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
  lambda df: df.groupby(["key"], group_keys=False).apply(custom_formula_udf)

Also can we run on a system that isn't doing background work? I see some references to tritonserver consuming GPU memory (probably on other devices?) in the nvidia-smi output, which makes it hard to know if it affected performance negatively.
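One way to hide the warning, sketched with the standard library only: filter on the message text quoted above so that other warnings still surface. `noisy_apply` is a hypothetical stand-in for the groupby-apply cell, and the `DeprecationWarning` category it raises is an assumption; the filter itself matches on the message, not the category. The alternative, as the warning text says, is to pass `include_groups=False` where the installed pandas supports it.

```python
import re
import warnings

# Message prefix copied from the warning quoted above.
GROUPBY_APPLY_MESSAGE = "DataFrameGroupBy.apply operated on the grouping columns"


def noisy_apply():
    """Hypothetical stand-in for the cell that triggers the warning."""
    warnings.warn(
        GROUPBY_APPLY_MESSAGE + ". This behavior is deprecated.",
        DeprecationWarning,  # assumed category, for illustration only
        stacklevel=2,
    )
    return "result"


def run_quietly(func):
    """Call `func`, ignoring only warnings whose message starts with the prefix.

    Returns the function result and a list of any other warnings raised.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        # `message` is a regex matched against the start of the warning text.
        warnings.filterwarnings("ignore", message=re.escape(GROUPBY_APPLY_MESSAGE))
        result = func()
    return result, caught


result, other_warnings = run_quietly(noisy_apply)
print(result, len(other_warnings))
```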

mroeschke

Nice changes. When printing out the cudf version, could you include the pandas version as well?
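A small sketch of what that could look like, written so it also runs where the packages are missing. `package_version` is a hypothetical helper, and the distribution names passed to it are assumptions (a cuDF wheel may be published under a CUDA-suffixed name); inside the notebook, printing `cudf.__version__` and `pandas.__version__` directly would do the same job.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(name):
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return version(name)
    except PackageNotFoundError:
        return "not installed"


# Distribution names are assumptions; adjust to match the environment.
for name in ("cudf", "pandas"):
    print(f"{name}: {package_version(name)}")
```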

mroeschke

My comment is non-blocking, so approving.

galipremsagar · 2024-03-14T16:47:53Z

/okay to test

galipremsagar · 2024-03-14T16:48:01Z

/merge

use constants in performance-comparisons.ipynb

225920e

galipremsagar self-assigned this Mar 5, 2024

galipremsagar self-requested a review March 5, 2024 16:15

galipremsagar added 2 commits March 13, 2024 18:17

Run on H100

6a7726f

Merge branch 'branch-24.04' into use-constants-in-performance-compari…

4f05be4

…son-notebook

galipremsagar added the improvement (Improvement / enhancement to an existing function) and non-breaking (Non-breaking change) labels Mar 13, 2024

bdice reviewed Mar 13, 2024

mroeschke reviewed Mar 13, 2024

Merge branch 'branch-24.04' into use-constants-in-performance-compari…

6212b33

…son-notebook

mroeschke approved these changes Mar 14, 2024

Merge branch 'branch-24.04' into use-constants-in-performance-compari…

4322782

…son-notebook

rapids-bot bot merged commit 769c1bd into rapidsai:branch-24.04 Mar 14, 2024
75 checks passed
