Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC: use constants in performance-comparisons.ipynb #15215

Merged
merged 5 commits into from
Mar 14, 2024
Merged

DOC: use constants in performance-comparisons.ipynb #15215

merged 5 commits into from
Mar 14, 2024

Conversation

raybellwaves
Copy link
Contributor

@raybellwaves raybellwaves commented Mar 4, 2024

Description

I've simplified the performance comparisons notebook by setting constants which can be adjusted at the top of each section e.g. num_rows. This makes it easier for anyone running this to adjust the value and hopefully not encounter memory values. It can also help with testing these benchmarks on dataframes of various lengths. I've stripped the output as I was working on a A10G and I couldn't run with the current num_rows value. I also didn't want to commit the results which may differ compared to the H100 which is used currently and I would rather the results be committed by the RAPIDS team. I can confirm the notebook runs end-to-end (you can see my version here: https://github.com/raybellwaves/cudf-performance-comparisons/blob/main/performance-comparisons.ipynb with smaller num_rows and smaller timeit_number on a A10G (EC2 machine))

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

Copy link

copy-pr-bot bot commented Mar 4, 2024

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@bdice
Copy link
Contributor

bdice commented Mar 4, 2024

@raybellwaves Thanks for the PR! I skimmed it and the core ideas seem helpful.

@galipremsagar Would you be able to take a look at this and incorporate the proposed changes for next time we run benchmarks?

@galipremsagar
Copy link
Contributor

I'll run these on H100's and update this PR

@galipremsagar galipremsagar self-assigned this Mar 5, 2024
@galipremsagar galipremsagar self-requested a review March 5, 2024 16:15
@galipremsagar galipremsagar added improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Mar 13, 2024
@galipremsagar
Copy link
Contributor

/okay to test

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we fix (or hide) the warning here?

DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass `include_groups=False` to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
  lambda df: df.groupby(["key"], group_keys=False).apply(custom_formula_udf)

Also can we run on a system that isn't doing background work? I see some references to tritonserver consuming GPU memory (probably on other devices?) in the nvidia-smi output, which makes it hard to know if it affected performance negatively.

Copy link
Contributor

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice changes. When printing out the cudf version, could you also include the pandas version as well?

Copy link
Contributor

@mroeschke mroeschke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment is non blocking so approving

@galipremsagar
Copy link
Contributor

/okay to test

@galipremsagar
Copy link
Contributor

/merge

@rapids-bot rapids-bot bot merged commit 769c1bd into rapidsai:branch-24.04 Mar 14, 2024
75 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improvement / enhancement to an existing function non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants