Feature/python benchmarking #11125
Conversation
This PR has a large changeset by necessity. In order to make this review process a bit smoother, I have explicitly requested reviews from people who have contributed benchmarks before and/or are familiar with my design for the benchmarks (@isVoid, @shwina, and @galipremsagar).
Partial review part 1. Incredible work!
rerun tests
Codecov Report

@@            Coverage Diff             @@
##           branch-22.08    #11125   +/-   ##
===============================================
  Coverage              ?    86.33%
===============================================
  Files                 ?       144
  Lines                 ?     22751
  Branches              ?         0
===============================================
  Hits                  ?     19641
  Misses                ?      3110
  Partials              ?         0

Continue to review the full report at Codecov.
For consistency's sake, can we switch all the relative imports to absolute imports?
Done
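For context, a minimal illustration of the requested change; the module layout and names below are hypothetical, not the actual benchmark structure:

```python
# Hypothetical layout (illustrative only):
#   benchmarks/
#     common/config.py        defines, e.g., NUM_ROWS
#     API/bench_dataframe.py
#
# Relative import (before): only resolves when the benchmarks are importable as a package.
# from ..common.config import NUM_ROWS
#
# Absolute import (after): resolves the same way from any entry point.
# from common.config import NUM_ROWS
```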
Looks pretty good! This is a good base of benchmarks that can be expanded on in the future.
Not sure if this was discussed, but how do you imagine these benchmarks being used? For example, in pandas we have used ASV to catch performance regressions across versions, but we're inconsistent about this because running the whole suite takes a while.
rerun tests
We have indeed discussed this before. ASV is something we've definitely considered, not for running the benchmarks but rather for using its dashboards with data from pytest-benchmark, via some plugins written by other RAPIDS team members. For the moment, the process of tracking historical benchmark results is pretty manual. We've discussed automating benchmark runs going forward, but we haven't settled on a specific solution yet; to some extent this is waiting on various changes from our ops team.
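For reference, a rough sketch of what that manual tracking can look like using pytest-benchmark's JSON output (saved with `pytest --benchmark-json=<file>`); the file names and the choice of the mean statistic here are illustrative assumptions, not the actual workflow:

```python
import json


def load_means(path):
    """Map benchmark names to their mean runtimes (seconds) from a
    pytest-benchmark JSON report."""
    with open(path) as f:
        report = json.load(f)
    return {b["name"]: b["stats"]["mean"] for b in report["benchmarks"]}


# Compare two saved runs by hand; the file names are hypothetical.
before = load_means("results_branch-22.06.json")
after = load_means("results_branch-22.08.json")
for name in sorted(before.keys() & after.keys()):
    pct = (after[name] - before[name]) / before[name] * 100
    print(f"{name}: {pct:+.1f}%")
```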
rerun tests
…tive imports." This reverts commit 5db4589.
@bdice I had to roll back the changes that made the benchmarks a Python package because they appear to change pytest's package discovery logic in a way that leads it to always find the local copy of the package rather than the installed one, which is problematic when the package is not built in place (as is the case in our CI). I'm happy to revisit this in the future, but in the interest of moving forward I would like to merge as is and try to improve the organization and address the CI issues in follow-up PRs.
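As a small aside for anyone revisiting the packaging question later, one quick way to see which copy pytest actually imported is to report the resolved module path from a `conftest.py`; this sketch only assumes the installed `cudf` package, not any particular benchmark layout:

```python
# conftest.py (sketch): print where cudf was imported from so it is easy to
# tell whether a test session picked up the installed package or a local
# source tree.
import cudf


def pytest_report_header(config):
    return f"cudf imported from: {cudf.__file__}"
```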
@gpucibot merge |
The version of `bench_isin` merged in #11125 used key and column names of the format `f"key{i}"` rather than the format `f"{string.ascii_lowercase[i]}"` used by the dataframe generator. As a result, the `isin` benchmark using a dictionary argument short-circuits with no matching keys, and the `isin` benchmark using a dataframe argument finds no matches. This PR also adjusts the `isin` arguments from `range(1000)` to `range(50)` to better match the input dataframe cardinality of 100: with `range(1000)` every element matches, but with `range(50)` only 50% of the elements match.

Authors:
- Gregory Kimball (https://github.com/GregoryKimball)

Approvers:
- Bradley Dice (https://github.com/bdice)
- GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #11549
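A simplified reconstruction of the mismatch described above (the column count, row count, and values are illustrative, not the exact benchmark code):

```python
import string

import cudf

# The dataframe generator names its columns "a", "b", "c", ...
ncols, nrows = 3, 100
df = cudf.DataFrame(
    {string.ascii_lowercase[i]: list(range(nrows)) for i in range(ncols)}
)

# Old argument: keys like "key0" never match a column name, so the
# dictionary form of isin has nothing to check and everything is False.
old_arg = {f"key{i}": list(range(50)) for i in range(ncols)}

# Fixed argument: keys match the generated column names, and range(50)
# overlaps half of the 100 distinct input values.
new_arg = {string.ascii_lowercase[i]: list(range(50)) for i in range(ncols)}

df.isin(old_arg)  # every entry is False
df.isin(new_arg)  # roughly half of the entries are True
```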
This PR ports the benchmarks from https://github.com/vyasr/cudf_benchmarks, adding official benchmarks to the repository. The new benchmarks are designed from the ground up to make the best use of pytest, pytest-benchmark, and pytest-cases to simplify writing and maintaining benchmarks. Extended discussions of various earlier design questions can be found in the original repo. Reviewers may also benefit from looking at the companion PR, #11122, which adds documentation on how to write benchmarks.
Tests will not pass here until rapidsai/integration#492 is merged.
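For readers unfamiliar with the stack, a minimal benchmark in this style might look like the sketch below. The fixture and its parameters are assumptions for illustration (the real suite builds its fixtures with pytest-cases); only the `benchmark` fixture comes from pytest-benchmark, and collecting `bench_*` functions assumes pytest is configured to treat them as tests:

```python
import cudf
import pytest


@pytest.fixture(params=[1_000, 100_000])
def dataframe(request):
    # Illustrative stand-in for the suite's generated cudf objects.
    nrows = request.param
    return cudf.DataFrame({"a": list(range(nrows)), "b": list(range(nrows))})


def bench_sum(benchmark, dataframe):
    # pytest-benchmark repeatedly calls the target and records timing stats.
    benchmark(dataframe.sum)
```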