DOC: use constants in performance-comparisons.ipynb #15215
@raybellwaves Thanks for the PR! I skimmed it and the core ideas seem helpful. @galipremsagar Would you be able to take a look at this and incorporate the proposed changes the next time we run benchmarks?
I'll run these on H100s and update this PR.
/okay to test
Can we fix (or hide) the warning here?
DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
lambda df: df.groupby(["key"], group_keys=False).apply(custom_formula_udf)
Also, can we run on a system that isn't doing background work? I see some references to tritonserver consuming GPU memory (probably on other devices?) in the nvidia-smi output, which makes it hard to know whether it affected performance negatively.
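One way to address the deprecation warning quoted above is the second option it suggests: select the columns to operate on after the `groupby`, so the grouping column is never passed to the UDF. The following is a minimal sketch in plain pandas, not the notebook's actual code. The `custom_formula_udf` body and the column names `a` and `b` are hypothetical stand-ins, and it assumes the real UDF does not need to read the `key` column.

```python
import pandas as pd


def custom_formula_udf(group):
    # Hypothetical stand-in for the notebook's UDF: it only reads the
    # value columns "a" and "b", never the grouping column "key".
    out = group.copy()
    out["out"] = out["a"] * out["b"] - out["a"].mean()
    return out


df = pd.DataFrame(
    {
        "key": ["x", "x", "y", "y"],
        "a": [1, 2, 3, 4],
        "b": [10, 20, 30, 40],
    }
)

# Explicitly selecting the value columns after groupby means apply()
# no longer operates on the grouping column, which is the behavior
# the warning says will become the default.
result = df.groupby(["key"], group_keys=False)[["a", "b"]].apply(custom_formula_udf)
print(result)
```

On pandas versions that emit this warning, passing `include_groups=False` to `apply` is the other option the message names; which of the two fits depends on whether the UDF needs the grouping column.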
Nice changes. When printing out the cudf version, could you also include the pandas version?
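A possible shape for that version printout is sketched below. The helper `library_version` is a hypothetical name, not something from the notebook; it reads each module's `__version__` attribute and falls back gracefully when a library is not installed, so the cell also runs on a machine without cudf.

```python
import importlib


def library_version(name):
    """Return the module's __version__ string, or a placeholder if unavailable."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "not installed"
    return getattr(module, "__version__", "unknown")


# Report both libraries side by side so benchmark results can be tied
# to the exact versions that produced them.
for name in ("cudf", "pandas"):
    print(f"{name}: {library_version(name)}")
```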
My comment is non-blocking, so approving.
/okay to test
/merge
Description
I've simplified the performance comparisons notebook by setting constants, e.g. `num_rows`, which can be adjusted at the top of each section. This makes it easier for anyone running the notebook to adjust the value and hopefully avoid memory errors. It can also help with testing these benchmarks on dataframes of various lengths. I've stripped the output because I was working on an A10G and couldn't run with the current `num_rows` value. I also didn't want to commit results that may differ from those on the H100 that is currently used, and I would rather the results be committed by the RAPIDS team. I can confirm the notebook runs end-to-end (you can see my version here: https://github.com/raybellwaves/cudf-performance-comparisons/blob/main/performance-comparisons.ipynb, run with a smaller `num_rows` and a smaller `timeit_number` on an A10G (EC2 machine)).
Checklist
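The constants pattern from the Description can be sketched roughly as follows. This is an illustration in plain pandas, not the notebook's actual cells: `num_rows` and `timeit_number` are the names used in the PR, while the column names, the values, and the `mean_seconds` helper are assumptions made for the example.

```python
import timeit

import numpy as np
import pandas as pd

# Constants defined once at the top of the section; lower them on
# smaller GPUs or machines to avoid running out of memory.
num_rows = 100_000
timeit_number = 5

# Hypothetical benchmark data sized by the constant above.
rng = np.random.default_rng(seed=0)
df = pd.DataFrame(
    {
        "numbers": rng.integers(-1000, 1000, num_rows),
        "group": rng.choice(["a", "b", "c", "d"], size=num_rows),
    }
)


def mean_seconds(func, number=timeit_number):
    """Average wall-clock seconds per call over `number` runs."""
    return timeit.timeit(func, number=number) / number


sort_time = mean_seconds(lambda: df.sort_values("numbers"))
print(f"sort_values on {num_rows} rows: {sort_time:.6f} s per run")
```

Because every later cell reads the two constants rather than hard-coded literals, changing the data size or the number of timing repetitions is a one-line edit per section.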