
Reduce pytest runtime #10203

Merged

Conversation

brandon-b-miller
Contributor

This PR reduces the overall runtime of the cuDF pytest suite. Changes include:

  • asserting equal on the GPU where possible for large datasets (a rough sketch follows below)
  • in some cases reducing excessive test data size

part of #9999
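
A rough sketch of the first idea (illustrative only, not the actual change to cudf/testing/_utils.py; LARGE_THRESHOLD and gpu_assert_equal are invented names):

import cudf

LARGE_THRESHOLD = 100_000  # assumed cutoff for "large" data

def gpu_assert_equal(expect: cudf.Series, got: cudf.Series) -> None:
    if len(expect) > LARGE_THRESHOLD:
        # Compare element-wise on the GPU; only a single boolean crosses
        # back to the host instead of every value via to_pandas().
        # (Null/NaN handling is ignored here for brevity.)
        assert (expect == got).all()
    else:
        # Small inputs: the usual host-side comparison is cheap enough.
        assert expect.to_pandas().equals(got.to_pandas())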

@brandon-b-miller added labels on Feb 3, 2022: feature request (New feature or request), 2 - In Progress (Currently a work in progress), tests (Unit testing for project), Python (Affects Python cuDF API), non-breaking (Non-breaking change)
@brandon-b-miller self-assigned this on Feb 3, 2022
@codecov

codecov bot commented Feb 3, 2022

Codecov Report

Merging #10203 (1f0b4b4) into branch-22.04 (8b0737d) will decrease coverage by 0.00%.
The diff coverage is 0.00%.

❗ Current head 1f0b4b4 differs from pull request most recent head 62d56be. Consider uploading reports for the commit 62d56be to get more accurate results

Impacted file tree graph

@@               Coverage Diff                @@
##           branch-22.04   #10203      +/-   ##
================================================
- Coverage         10.67%   10.67%   -0.01%     
================================================
  Files               122      122              
  Lines             20874    20876       +2     
================================================
  Hits               2228     2228              
- Misses            18646    18648       +2     
Impacted Files Coverage Δ
python/cudf/cudf/testing/_utils.py 0.00% <0.00%> (ø)

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8b0737d...62d56be. Read the comment docs.

@brandon-b-miller marked this pull request as ready for review on February 7, 2022 at 20:38
@brandon-b-miller requested a review from a team as a code owner on February 7, 2022 at 20:38
@brandon-b-miller added the 3 - Ready for Review (Ready for review by team) label and removed the 2 - In Progress (Currently a work in progress) label on Feb 7, 2022
-2.1221,
-0.112121,
21.1212,
],
Contributor Author

Here I am just deleting tests that might be redundant. I am basing this on the assumption that all these different sets of numbers are not really increasing test coverage.

Contributor

For some values of rtol and atol, the deleted values seem to be testing the behavior of isclose at a granularity finer than float32 (requiring float64s for accuracy). That seems potentially important, and would deserve a code comment explaining the coverage that each parameter set adds. Instead of deleting these, we might reframe these parameters so that we don't test a whole matrix of all input values and all atol/rtol values (6x6x6x6 = 1296 tests), and only test certain pieces of the matrix that actually cover the important pieces of the behavior (comparing ints and floats, testing isclose at float64 precision, etc.).
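
For example, something along these lines (a sketch only; the values and ids are illustrative, not the real test data):

import pytest

@pytest.mark.parametrize(
    "data1, data2, rtol, atol",
    [
        # int vs. float comparison
        pytest.param([1, 2, 3], [1.0, 2.0, 3.0], 1e-05, 1e-08, id="int-vs-float"),
        # tolerance tighter than float32 precision; the result would change
        # if the inputs were erroneously cast to float32
        pytest.param([1.987654321], [1.9876543], 1e-08, 0.0, id="float64-precision"),
        # clearly not close
        pytest.param([1.0], [2.0], 1e-05, 1e-08, id="not-close"),
    ],
)
def test_isclose(data1, data2, rtol, atol):
    ...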

Contributor Author

AFAICT these tests were only covering float64 both before and after the change. The dataframe and array constructors implicitly upcast the incoming data, so it's always getting compared as float64. Is your concern that we're not covering float32 at all here?
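
For reference, the default float dtype in both pandas and numpy (and, per the point above, in the cuDF constructors) is float64:

import numpy as np
import pandas as pd

pd.Series([1.987654321]).dtype   # dtype('float64')
np.asarray([1.987654321]).dtype  # dtype('float64')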

Contributor
@bdice Feb 11, 2022

Sorry, I wasn't very clear. The issue isn't really about data types. It's about the rtol / atol precision requiring float64 precision, which these changes no longer test adequately. The real-world cases of isclose I have seen use very tight tolerances (sometimes tighter than float32 precision, like 1e-08 for data on the order of 1). Currently, this PR removes the input data that is designed to test those cases of tight tolerances.

If you look at the data1/data2 values like 1.987654321 and 1.9876543, those pairs of input data are meant to be compared with the rtol/atol values of 1e-08 in the other set of parameters. If we remove the more-precise values here, we aren't getting good coverage of the tighter tolerance 1e-08, which requires float64 precision to get the correct results. By removing these pairings of parametrized values, this test would no longer fail if the isclose functions in cuDF or cupy were to erroneously cast their inputs to float32.

I agree that this test is grossly over-parametrized, but the deletions here are removing an important case to check.
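
A quick numpy illustration of the precision point (not cuDF code; the values are the ones quoted above):

import numpy as np

a, b = 1.987654321, 1.9876543

np.isclose(a, b, rtol=1e-08, atol=0.0)
# False: the values differ by ~2.1e-08, which a 1e-08 relative tolerance rejects

np.isclose(np.float32(a), np.float32(b), rtol=1e-08, atol=0.0)
# True: both values round to the same float32, so an erroneous cast to
# float32 would silently hide the difference this case is meant to catch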

Contributor

Let's just undo these changes and add a TODO that we can be more clever about parametrizing this particular test. The other changes in this PR give a more-than-meaningful improvement in test time and I don't think it's worth investing much more time over this single test at the moment.

Contributor

I agree that would be good. I can file an issue describing what I've discussed with @brandon-b-miller via Slack.

Contributor

Sounds good to me. Thank you, Bradley!

Contributor

Issue #10284 filed. I can put this in my backlog of things to do, or I can help someone else construct the specific cases I have in mind for test coverage.

Contributor Author

This change has been reverted

def test_parquet_reader_list_skiprows(skip, tmpdir):
num_rows = 128
num_rows = 10
Contributor Author

Here I reason that if 0:10 works, then 11:128 should too.

Contributor
@bdice Feb 9, 2022

Maybe even replace range(0, 10) with [0, 1, 5, 10]. Maybe even search the tests for the regex parametrize.*range. 🙃
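
Sketching that suggestion (the parameter name skip comes from the test shown above; the exact replacement values are just an example):

import pytest

# Before: @pytest.mark.parametrize("skip", range(0, 10)) -- 10 near-identical cases
# After: boundary and representative values only
@pytest.mark.parametrize("skip", [0, 1, 5, 10])
def test_parquet_reader_list_skiprows(skip, tmpdir):
    num_rows = 10
    ...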

@@ -13,7 +13,7 @@
from cudf.testing import _utils as utils
from cudf.utils.dtypes import np_dtypes_to_pandas_dtypes

repr_categories = utils.NUMERIC_TYPES + ["str", "category", "datetime64[ns]"]
repr_categories = ["int64", "float64", "str", "category", "datetime64[ns]"]
Contributor Author

Here I reason that we only need one of each kind because every flavor of float or int should have the same __repr__.

Contributor

Here I reason that we only need one of each kind because every flavor of float or int should have the same __repr__.

Agree, can we also add uint16 to cover unsigned int types?
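
i.e. something like (illustrative; a uint dtype is noted as added further down in this thread):

repr_categories = [
    "uint16",          # unsigned int
    "int64",           # signed int
    "float64",         # float
    "str",
    "category",
    "datetime64[ns]",
]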

Contributor

Also, if the tests can handle bool, you might try adding it here as well. Otherwise it's okay to skip it.

Contributor Author

I have added uint. It seems bool was not originally included here, and adding it now causes failures in some tests that use this parametrization. We will probably need to update a bunch of them to cover bool. Want me to create an issue around this?

Contributor

I see, no need for a new issue if we are covering bool at least somewhere in our repr testing. Otherwise, we probably want to cover it as part of another PR.

pd.Series(["1", "2", "3", "4", "5"]),
pd.Index(["1", "2", "3", "4", "5"]),
],
"index", [["1", "2", "3", "4", "5"]],
Contributor Author

AFAICT the only place the raw index arg is used is to set the index of ps and gs, and then ps.index and gs.index are used after that. But I think all three of the previous parameters result in the same index despite being different objects at the outset. As such I think 2/3 of these test cases are redundant.
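
For illustration (plain pandas, just to show why the three variants collapse to one):

import pandas as pd

data = ["1", "2", "3", "4", "5"]
s = pd.Series(range(5))

# Each of the three original parametrizations yields the same index once set:
s.index = pd.Series(data)
s.index = pd.Index(data)
s.index = data
# In every case s.index is Index(['1', '2', '3', '4', '5'], dtype='object')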

@@ -321,3 +322,9 @@ def does_not_raise():

def xfail_param(param, **kwargs):
return pytest.param(param, marks=pytest.mark.xfail(**kwargs))


deduped_numeric_dtype_tests = pytest.mark.parametrize(
Contributor
@shwina Feb 7, 2022

Naming nit: maybe numeric_dtypes_pairwise or numeric_dtypes_combinations?


@@ -85,15 +85,12 @@ def test_full_series(nrows, dtype):


@pytest.mark.parametrize("dtype", repr_categories)
@pytest.mark.parametrize("nrows", [0, 1, 2, 9, 20 / 2, 11, 20 - 1, 20, 20 + 1])
@pytest.mark.parametrize("ncols", [0, 1, 2, 9, 20 / 2, 11, 20 - 1, 20, 20 + 1])
def test_full_dataframe_20(dtype, nrows, ncols):
Contributor

Can we just merge test_full_dataframe_20 & test_full_dataframe_21 by parametrizing size into one test and also with reduced parametrization of nrows & ncols?
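
A possible shape for the merged test (a sketch; the body is elided and the reduced nrows/ncols values are only an example):

import pytest

@pytest.mark.parametrize("dtype", repr_categories)
@pytest.mark.parametrize("nrows", [0, 1, 9, 10, 11, 19, 20, 21])
@pytest.mark.parametrize("ncols", [0, 1, 9, 10, 11, 19, 20, 21])
@pytest.mark.parametrize("size", [20, 21])
def test_full_dataframe(dtype, nrows, ncols, size):
    ...  # shared body of the former test_full_dataframe_20 / _21, driven by size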

Contributor
@galipremsagar left a comment

LGTM, some copyright year updates can be made.

@@ -1,5 +1,6 @@
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
Contributor

copyright

Contributor
@bdice left a comment

Nice job on cutting the runtime. I have some comments attached.

@@ -217,10 +217,11 @@ def test_series_compare(cmpop, obj_class, dtype):

def _series_compare_nulls_typegen():
tests = []
Contributor

Another way to write this would be:

# assumes: from itertools import combinations_with_replacement
# and the dtype lists (DATETIME_TYPES, TIMEDELTA_TYPES, NUMERIC_TYPES,
# STRING_TYPES) from cudf.testing._utils
return [
    *combinations_with_replacement(DATETIME_TYPES, 2),
    *combinations_with_replacement(TIMEDELTA_TYPES, 2),
    *combinations_with_replacement(NUMERIC_TYPES, 2),
    *combinations_with_replacement(STRING_TYPES, 2),
]

However you prefer is fine.


cudf.DataFrame({"a": range(1000000)}),
cudf.DataFrame({"a": range(1000000), "b": range(1000000)}),
cudf.DataFrame({"a": range(20), "b": range(20)}),
cudf.DataFrame({"a": range(100000)}),
Contributor

Can we remove the construction of GPU objects from the parametrize call? It occurs at collection time and is very expensive. This can be constructed lazily like:

@pytest.mark.parametrize(
    "gdf_kwargs",
    [
        dict(data={"a": range(100000)}),
        dict(data={"a": range(100000), "b": range(100000)}),
        # ...
        dict(index=[1, 2, 3]),
        # ...
    ],
)

then:

def test_dataframe_sliced(gdf_kwargs, slice):
    gdf = cudf.DataFrame(**gdf_kwargs)
    pdf = gdf.to_pandas()
    # ...


@brandon-b-miller
Contributor Author

rerun tests

@@ -1,4 +1,4 @@
# Copyright (c) 2021, NVIDIA CORPORATION.
# Copyright (c) 2022, NVIDIA CORPORATION.
Contributor

Suggested change
# Copyright (c) 2022, NVIDIA CORPORATION.
# Copyright (c) 2021-2022, NVIDIA CORPORATION.

?


slice(500000),
slice(25000, 50000),
slice(25000, 25001),
slice(50000),
slice(1, 10),
slice(10, 20),
slice(15, 24000),
slice(6),
Contributor
@bdice Feb 11, 2022

If we're testing multiple combinations, we should have coverage of unique code paths: three-argument slices like slice(start, stop, step), negative indices, reversed slices, and empty slices. In the spirit of reducing runtime, some of the other cases can probably be removed if we aim for covering only unique cases. Also, I see no reason why we can't cut this test down to 100 rows instead of 100,000.

Suggested change
slice(6),
slice(6, None), # start but no stop, [6:]
slice(None, None, 3), # only step, [::3]
slice(1, 10, 2), # start, stop, step
slice(3, -5, 2), # negative stop
slice(-2, -4), # slice is empty
slice(-10, -20, -1), # reversed slice
slice(None), # slices everything, same as [:]

Contributor Author

I tried some of these and we actually get multiple failures. Raising an issue now.

Contributor

Glad I could help catch a bug here. Please tag me in that issue, I'm interested in seeing what you found. Slice all the things! 🥷⚔️🥷

Contributor Author

raised #10292

@shwina
Contributor

shwina commented Feb 15, 2022

@gpucibot merge

@rapids-bot rapids-bot bot merged commit 851e235 into rapidsai:branch-22.04 Feb 15, 2022