Reduce pytest runtime #10203
Conversation
Codecov Report
@@            Coverage Diff             @@
##           branch-22.04   #10203      +/-  ##
================================================
- Coverage         10.67%   10.67%   -0.01%
================================================
  Files               122      122
  Lines             20874    20876       +2
================================================
  Hits               2228     2228
- Misses            18646    18648       +2
================================================
Continue to review full report at Codecov.
-2.1221,
-0.112121,
21.1212,
],
Here I am just deleting tests that might be redundant, based on the assumption that all these different sets of numbers are not really increasing test coverage.
For some values of `rtol` and `atol`, the deleted values seem to be testing the behavior of `isclose` at a granularity finer than float32 (requiring float64s for accuracy). That seems potentially important, and would deserve a code comment explaining the coverage that each parameter set adds. Instead of deleting these, we might reframe these parameters so that we don't test a whole matrix of all input values and all atol/rtol values (6x6x6x6 = 1296 tests), and only test certain pieces of the matrix that actually cover the important pieces of the behavior (comparing ints and floats, testing `isclose` at float64 precision, etc.).
AFAICT these tests were only covering `float64` both before and after the change. The dataframe and array constructors implicitly upcast the incoming data, so it's always getting compared as `float64`. Is your concern that we're not covering `float32` at all here?
Sorry, I wasn't very clear. The issue isn't really about data types. It's about the rtol/atol precision requiring `float64` precision, which these changes no longer test adequately. The real-world cases of `isclose` I have seen use very tight tolerances (sometimes tighter than float32 precision, like `1e-08` for data on the order of `1`). Currently, this PR removes the input data that is designed to test those cases of tight tolerances.

If you look at the `data1`/`data2` values like `1.987654321` and `1.9876543`, those pairs of input data are meant to be compared with the rtol/atol values of `1e-08` in the other set of parameters. If we remove the more-precise values here, we aren't getting good coverage of the tighter tolerance `1e-08`, which requires `float64` precision to get the correct results. By removing these pairings of parametrized values, this test would no longer fail if the `isclose` functions in cuDF or cupy were to erroneously cast their inputs to `float32`.

I agree that this test is grossly over-parametrized, but the deletions here are removing an important case to check.
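To make the tolerance point concrete, here is a small standalone sketch using NumPy's `isclose` (which applies the same `|a - b| <= atol + rtol * |b|` rule); the values and tolerances are taken from the parametrization discussed above:

```python
import numpy as np

# The two parametrized inputs differ by ~2.1e-08, which is below
# float32 resolution (~1.2e-07 near 2.0) but well within float64's.
a, b = 1.987654321, 1.9876543

# At float64 precision, the tight tolerance rtol=1e-08 tells them apart:
# |a - b| = 2.1e-08 > 1e-08 * |b| ~= 1.99e-08, so they are NOT close.
print(np.isclose(a, b, rtol=1e-08, atol=0))  # False

# An erroneous cast to float32 rounds both to the same representable
# value, silently making them "close" and masking the bug.
print(np.float32(a) == np.float32(b))                                # True
print(np.isclose(np.float32(a), np.float32(b), rtol=1e-08, atol=0))  # True
```

This is exactly the failure mode the deleted values guard against: the test only fails if an implementation drops to `float32` internally.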
Let's just undo these changes and add a `TODO` noting that we can be more clever about parametrizing this particular test. The other changes in this PR give a more-than-meaningful improvement in test time, and I don't think it's worth investing much more time in this single test at the moment.
I agree that would be good. I can file an issue describing what I've discussed with @brandon-b-miller via Slack.
Sounds good to me. Thank you, Bradley!
Issue #10284 filed. I can put this in my backlog of things to do, or I can help someone else construct the specific cases I have in mind for test coverage.
This change has been reverted.
def test_parquet_reader_list_skiprows(skip, tmpdir):
num_rows = 128
num_rows = 10
Here I reason that if `0:10` work, then `11:128` should too.
Maybe even replace `range(0, 10)` with `[0, 1, 5, 10]`. Maybe even search the tests for the regex `parametrize.*range`. 🙃
python/cudf/cudf/tests/test_repr.py
Outdated
@@ -13,7 +13,7 @@
from cudf.testing import _utils as utils
from cudf.utils.dtypes import np_dtypes_to_pandas_dtypes

repr_categories = utils.NUMERIC_TYPES + ["str", "category", "datetime64[ns]"]
repr_categories = ["int64", "float64", "str", "category", "datetime64[ns]"]
Here I reason that we only need one of each kind, because every flavor of float or int should have the same `__repr__`.
> Here I reason that we only need one of each kind, because every flavor of float or int should have the same `__repr__`.

Agree, can we also add `uint16` to cover unsigned int types?
Also, if the tests can handle `bool`, we might try adding it here as well. Otherwise it's okay to skip it.
I have added `uint`. It seems `bool` was not originally included here, and adding it now causes some failures in tests that use this parametrization. We will probably need to update a bunch of them to cover `bool`. Want me to create an issue around this?
I see; no need for a new issue if we are covering `bool` at least somewhere in our repr testing. Otherwise, we probably want to cover it as part of another PR.
pd.Series(["1", "2", "3", "4", "5"]),
pd.Index(["1", "2", "3", "4", "5"]),
],
"index", [["1", "2", "3", "4", "5"]],
AFAICT the only place the raw `index` arg is used is to set the index of `ps` and `gs`, and then `ps.index` and `gs.index` are used after that. But I think all three of the previous parameters result in the same index despite being different objects at the outset. As such, I think 2/3 of these test cases are redundant.
python/cudf/cudf/testing/_utils.py
Outdated
@@ -321,3 +322,9 @@ def does_not_raise():


def xfail_param(param, **kwargs):
    return pytest.param(param, marks=pytest.mark.xfail(**kwargs))


deduped_numeric_dtype_tests = pytest.mark.parametrize(
Naming nit: maybe `numeric_dtypes_pairwise` or `numeric_dtypes_combinations`?
Co-authored-by: Michael Wang <[email protected]>
@@ -85,15 +85,12 @@ def test_full_series(nrows, dtype):


@pytest.mark.parametrize("dtype", repr_categories)
@pytest.mark.parametrize("nrows", [0, 1, 2, 9, 20 / 2, 11, 20 - 1, 20, 20 + 1])
@pytest.mark.parametrize("ncols", [0, 1, 2, 9, 20 / 2, 11, 20 - 1, 20, 20 + 1])
def test_full_dataframe_20(dtype, nrows, ncols):
Can we just merge `test_full_dataframe_20` & `test_full_dataframe_21` into one test by parametrizing `size`, and also reduce the parametrization of `nrows` & `ncols`?
Co-authored-by: GALI PREM SAGAR <[email protected]>
LGTM; some copyright year updates can be made.
python/cudf/cudf/testing/_utils.py
Outdated
@@ -1,5 +1,6 @@
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
copyright
Nice job on cutting the runtime. I have some comments attached.
@@ -217,10 +217,11 @@ def test_series_compare(cmpop, obj_class, dtype):


def _series_compare_nulls_typegen():
    tests = []
Another way to write this would be:

return [
    *combinations_with_replacement(DATETIME_TYPES, 2),
    *combinations_with_replacement(TIMEDELTA_TYPES, 2),
    *combinations_with_replacement(NUMERIC_TYPES, 2),
    *combinations_with_replacement(STRING_TYPES, 2),
]

However you prefer is fine.
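For reference, a runnable sketch of that rewrite with illustrative stand-ins for the dtype lists (the real `NUMERIC_TYPES` etc. live in `cudf.testing._utils`; the short lists here are assumptions for demonstration only):

```python
from itertools import combinations_with_replacement

# Illustrative stand-ins; NOT cuDF's actual dtype lists.
NUMERIC_TYPES = ["int32", "int64", "float64"]
DATETIME_TYPES = ["datetime64[ns]"]

def _series_compare_nulls_typegen():
    # Unordered pairs (with repeats) within each dtype family:
    # n dtypes yield n * (n + 1) / 2 pairs instead of n * n.
    return [
        *combinations_with_replacement(DATETIME_TYPES, 2),
        *combinations_with_replacement(NUMERIC_TYPES, 2),
    ]

pairs = _series_compare_nulls_typegen()
print(len(pairs))  # 1 + 6 = 7 pairs
```

The unpacking-into-a-list form avoids the explicit accumulator loop while producing the same parameter pairs.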
cudf.DataFrame({"a": range(1000000)}),
cudf.DataFrame({"a": range(1000000), "b": range(1000000)}),
cudf.DataFrame({"a": range(20), "b": range(20)}),
cudf.DataFrame({"a": range(100000)}),
Can we remove the construction of GPU objects from the `parametrize` call? It occurs at collection time and is very expensive. This can be constructed lazily like:

@pytest.mark.parametrize(
    "gdf_kwargs",
    [
        dict(data={"a": range(100000)}),
        dict(data={"a": range(100000), "b": range(100000)}),
        # ...
        dict(index=[1, 2, 3]),
        # ...
    ],
)

then:

def test_dataframe_sliced(gdf_kwargs, slice):
    gdf = cudf.DataFrame(**gdf_kwargs)
    pdf = gdf.to_pandas()
    # ...
Co-authored-by: Bradley Dice <[email protected]>
rerun tests
@@ -1,4 +1,4 @@
# Copyright (c) 2021, NVIDIA CORPORATION.
# Copyright (c) 2022, NVIDIA CORPORATION.
# Copyright (c) 2022, NVIDIA CORPORATION.
# Copyright (c) 2021-2022, NVIDIA CORPORATION.
?
slice(500000),
slice(25000, 50000),
slice(25000, 25001),
slice(50000),
slice(1, 10),
slice(10, 20),
slice(15, 24000),
slice(6),
If we're testing multiple combinations, we should have coverage of unique code paths: three-argument slices like `slice(start, stop, step)`, negative indices, reversed slices, and empty slices. In the spirit of reducing runtime, some of the other cases can probably be removed if we aim to cover only unique cases. Also, I see no reason why we can't cut this test down to 100 rows instead of 100,000.
slice(6),
slice(6, None),  # start but no stop, [6:]
slice(None, None, 3),  # only step, [::3]
slice(1, 10, 2),  # start, stop, step
slice(3, -5, 2),  # negative stop
slice(-2, -4),  # slice is empty
slice(-10, -20, -1),  # reversed slice
slice(None),  # slices everything, same as [:]
I tried some of these and we actually get multiple failures. Raising an issue now.
Glad I could help catch a bug here. Please tag me in that issue, I'm interested in seeing what you found. Slice all the things! 🥷⚔️🥷
Raised #10292.
@gpucibot merge
This PR reduces the overall runtime of the cuDF pytest suite. Changes include:
Part of #9999.