Conform "bench_isin" to match generator column names (#11549)
The version of `bench_isin` merged in #11125 used key and column names of the form `f"key{i}"` rather than `f"{string.ascii_lowercase[i]}"`, which is the form the dataframe generator uses. As a result, the `isin` benchmark with a dictionary argument short-circuits because no keys match, and the `isin` benchmark with a dataframe argument finds no matches.
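A minimal sketch of why the names matter, written with pandas rather than cuDF so it runs without a GPU. It assumes cuDF's `DataFrame.isin` follows the pandas semantics here: a dict maps column labels to candidate values, and any column without a matching key comes back all False; a DataFrame argument is aligned on both index and column labels. The three-column frame `df` is a hypothetical stand-in for the benchmark's generated dataframe.

```python
import string

import pandas as pd

# Hypothetical stand-in for the generated benchmark dataframe: columns named
# "a", "b", "c" (string.ascii_lowercase[i]) holding integers 0..99, each twice.
ncols, nrows = 3, 200
df = pd.DataFrame(
    {string.ascii_lowercase[i]: [j % 100 for j in range(nrows)] for i in range(ncols)}
)

# Old-style arguments: keys/columns "key0", "key1", ... never match a column
# of df, so every element of the result is False.
old_dict = {f"key{i}": range(1000) for i in range(ncols)}
old_frame = pd.DataFrame(old_dict)

# New-style argument: keys match the generator's column names.
new_dict = {string.ascii_lowercase[i]: range(50) for i in range(ncols)}

print(df.isin(old_dict).to_numpy().any())   # False
print(df.isin(old_frame).to_numpy().any())  # False
print(df.isin(new_dict).to_numpy().mean())  # 0.5
```

With the old names the benchmark measures a call that matches nothing, which is not representative of real `isin` workloads.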

This PR also changes the `isin` arguments from `range(1000)` to `range(50)` to better match the input dataframe's cardinality of 100. With `range(1000)` every element matches, but with `range(50)` only 50% of the elements match.
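The match rates follow from simple counting. A stdlib-only sketch, assuming each generated column holds the 100 distinct values 0..99 in equal proportion (the commit states only the cardinality, not the exact values); `match_fraction` is a hypothetical helper, not part of cuDF.

```python
def match_fraction(column, values):
    """Return the fraction of `column` elements found in `values` (isin-style)."""
    lookup = set(values)
    return sum(x in lookup for x in column) / len(column)


# Hypothetical column with cardinality 100: each value in 0..99 appears
# equally often, as a stand-in for one generated benchmark column.
column = [i % 100 for i in range(10_000)]

print(match_fraction(column, range(1000)))  # 1.0: every value is in range(1000)
print(match_fraction(column, range(50)))    # 0.5: only values 0..49 match
```

A 50% hit rate exercises both the matching and non-matching paths, whereas `range(1000)` turns every lookup into a hit.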

Authors:
  - Gregory Kimball (https://github.com/GregoryKimball)

Approvers:
  - Bradley Dice (https://github.com/bdice)
  - GALI PREM SAGAR (https://github.com/galipremsagar)

URL: #11549
Gregory Kimball authored Oct 11, 2022
1 parent 26f3e76 commit 566b3d1
1 changed file with 7 additions and 5 deletions (12 changes): python/cudf/benchmarks/API/bench_dataframe.py
@@ -41,14 +41,16 @@ def bench_merge(benchmark, dataframe, num_key_cols):
 @pytest.mark.parametrize(
     "values",
     [
-        range(1000),
-        {f"key{i}": range(1000) for i in range(10)},
-        cudf.DataFrame({f"key{i}": range(1000) for i in range(10)}),
-        cudf.Series(range(1000)),
+        lambda: range(50),
+        lambda: {f"{string.ascii_lowercase[i]}": range(50) for i in range(10)},
+        lambda: cudf.DataFrame(
+            {f"{string.ascii_lowercase[i]}": range(50) for i in range(10)}
+        ),
+        lambda: cudf.Series(range(50)),
     ],
 )
 def bench_isin(benchmark, dataframe, values):
-    benchmark(dataframe.isin, values)
+    benchmark(dataframe.isin, values())


 @pytest.fixture(