Reduce pytest runtime #10203
Conversation
Codecov Report
@@            Coverage Diff             @@
##           branch-22.04   #10203      +/-  ##
================================================
- Coverage         10.67%   10.67%   -0.01%
================================================
  Files               122      122
  Lines             20874    20876       +2
================================================
  Hits               2228     2228
- Misses            18646    18648       +2
================================================
Continue to review full report at Codecov.
-2.1221,
-0.112121,
21.1212,
],
Here I am just deleting tests that might be redundant, based on the assumption that all these different sets of numbers are not really increasing test coverage.
For some values of `rtol` and `atol`, the deleted values seem to be testing the behavior of `isclose` at a granularity finer than float32 (requiring float64s for accuracy). That seems potentially important, and would deserve a code comment explaining the coverage that each parameter set adds. Instead of deleting these, we might reframe these parameters so that we don't test a whole matrix of all input values and all atol/rtol values (6x6x6x6 = 1296 tests), and only test certain pieces of the matrix that actually cover the important pieces of the behavior (comparing ints and floats, testing `isclose` at float64 precision, etc.).
AFAICT these tests were only covering `float64` both before and after the change. The dataframe and array constructors implicitly upcast the incoming data, so it's always getting compared as `float64`. Is your concern that we're not covering `float32` at all here?
Sorry, I wasn't very clear. The issue isn't really about data types. It's about the rtol/atol precision requiring `float64` precision, which these changes no longer test adequately. The real-world cases of `isclose` I have seen use very tight tolerances (sometimes tighter than float32 precision, like `1e-08` for data on the order of `1`). Currently, this PR removes the input data that is designed to test those cases of tight tolerances.

If you look at the `data1`/`data2` values like `1.987654321` and `1.9876543`, those pairs of input data are meant to be compared with the rtol/atol values of `1e-08` in the other set of parameters. If we remove the more-precise values here, we aren't getting good coverage of the tighter tolerance `1e-08`, which requires `float64` precision to get the correct results. By removing these pairings of parametrized values, this test would no longer fail if the `isclose` functions in cuDF or cupy were to erroneously cast their inputs to `float32`.

I agree that this test is grossly over-parametrized, but the deletions here are removing an important case to check.
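To make the tolerance point concrete, here is a small standalone sketch using NumPy's `isclose` (which applies the same `|a - b| <= atol + rtol * |b|` rule); the values and tolerances are taken from the parametrization discussed above:

```python
import numpy as np

# The two parametrized inputs differ by ~2.1e-08, which is below
# float32 resolution (~1.2e-07 near 2.0) but well within float64's.
a, b = 1.987654321, 1.9876543

# At float64 precision, the tight tolerance rtol=1e-08 tells them apart:
# |a - b| = 2.1e-08 > 1e-08 * |b| ~= 1.99e-08, so they are NOT close.
print(np.isclose(a, b, rtol=1e-08, atol=0))  # False

# An erroneous cast to float32 rounds both to the same representable
# value, silently making them "close" and masking the bug.
print(np.float32(a) == np.float32(b))                                # True
print(np.isclose(np.float32(a), np.float32(b), rtol=1e-08, atol=0))  # True
```

This is exactly the failure mode the deleted values guard against: the test only fails if an implementation drops to `float32` internally.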
Let's just undo these changes and add a `TODO` noting that we can be more clever about parametrizing this particular test. The other changes in this PR give a more-than-meaningful improvement in test time, and I don't think it's worth investing much more time in this single test at the moment.
I agree that would be good. I can file an issue describing what I've discussed with @brandon-b-miller via Slack.
Sounds good to me. Thank you, Bradley!
Issue #10284 filed. I can put this in my backlog of things to do, or I can help someone else construct the specific cases I have in mind for test coverage.
This change has been reverted.
def test_parquet_reader_list_skiprows(skip, tmpdir):
num_rows = 128
num_rows = 10
Here I reason that if `0:10` work, then `11:128` should too.
Maybe even replace `range(0, 10)` with `[0, 1, 5, 10]`. Maybe even search the tests for the regex `parametrize.*range`. 🙃
python/cudf/cudf/tests/test_repr.py
Outdated
@@ -13,7 +13,7 @@
from cudf.testing import _utils as utils
from cudf.utils.dtypes import np_dtypes_to_pandas_dtypes

repr_categories = utils.NUMERIC_TYPES + ["str", "category", "datetime64[ns]"]
repr_categories = ["int64", "float64", "str", "category", "datetime64[ns]"]
Here I reason that we only need one of each kind, because every flavor of float or int should have the same `__repr__`.
> Here I reason that we only need one of each kind, because every flavor of float or int should have the same `__repr__`.

Agree, can we also add `uint16` to cover unsigned int types?
Also, if the tests can handle `bool`, we might try adding it here as well. Otherwise it's okay to skip it.
I have added `uint`. It seems `bool` was not originally included here, and adding it now causes some failures in tests that use this parametrization. We will probably need to update a bunch of them to cover `bool`. Want me to create an issue around this?
I see; no need for a new issue if we are covering `bool` at least somewhere in our repr testing. Otherwise, we probably want to cover it as part of another PR.
pd.Series(["1", "2", "3", "4", "5"]),
pd.Index(["1", "2", "3", "4", "5"]),
],
"index", [["1", "2", "3", "4", "5"]],
AFAICT the only place the raw `index` arg is used is to set the index of `ps` and `gs`, and then `ps.index` and `gs.index` are used after that. But I think all three of the previous parameters result in the same index despite being different objects at the outset. As such, I think 2/3 of these test cases are redundant.
python/cudf/cudf/testing/_utils.py
Outdated
@@ -321,3 +322,9 @@ def does_not_raise():


def xfail_param(param, **kwargs):
    return pytest.param(param, marks=pytest.mark.xfail(**kwargs))


deduped_numeric_dtype_tests = pytest.mark.parametrize(
Naming nit: maybe `numeric_dtypes_pairwise` or `numeric_dtypes_combinations`?
Co-authored-by: Michael Wang <[email protected]>
@@ -85,15 +85,12 @@ def test_full_series(nrows, dtype):


@pytest.mark.parametrize("dtype", repr_categories)
@pytest.mark.parametrize("nrows", [0, 1, 2, 9, 20 / 2, 11, 20 - 1, 20, 20 + 1])
@pytest.mark.parametrize("ncols", [0, 1, 2, 9, 20 / 2, 11, 20 - 1, 20, 20 + 1])
def test_full_dataframe_20(dtype, nrows, ncols):
Can we just merge `test_full_dataframe_20` & `test_full_dataframe_21` into one test by parametrizing `size`, and also reduce the parametrization of `nrows` & `ncols`?
Co-authored-by: GALI PREM SAGAR <[email protected]>
LGTM; some copyright year updates can be made.
python/cudf/cudf/testing/_utils.py
Outdated
@@ -1,5 +1,6 @@
# Copyright (c) 2020-2021, NVIDIA CORPORATION.
copyright
Nice job on cutting the runtime. I have some comments attached.
@@ -217,10 +217,11 @@ def test_series_compare(cmpop, obj_class, dtype):


def _series_compare_nulls_typegen():
    tests = []
Another way to write this would be:

return [
    *combinations_with_replacement(DATETIME_TYPES, 2),
    *combinations_with_replacement(TIMEDELTA_TYPES, 2),
    *combinations_with_replacement(NUMERIC_TYPES, 2),
    *combinations_with_replacement(STRING_TYPES, 2),
]

However you prefer is fine.
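For reference, a runnable sketch of that rewrite with illustrative stand-ins for the dtype lists (the real `NUMERIC_TYPES` etc. live in `cudf.testing._utils`; the short lists here are assumptions for demonstration only):

```python
from itertools import combinations_with_replacement

# Illustrative stand-ins; NOT cuDF's actual dtype lists.
NUMERIC_TYPES = ["int32", "int64", "float64"]
DATETIME_TYPES = ["datetime64[ns]"]

def _series_compare_nulls_typegen():
    # Unordered pairs (with repeats) within each dtype family:
    # n dtypes yield n * (n + 1) / 2 pairs instead of n * n.
    return [
        *combinations_with_replacement(DATETIME_TYPES, 2),
        *combinations_with_replacement(NUMERIC_TYPES, 2),
    ]

pairs = _series_compare_nulls_typegen()
print(len(pairs))  # 1 + 6 = 7 pairs
```

The unpacking-into-a-list form avoids the explicit accumulator loop while producing the same parameter pairs.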
cudf.DataFrame({"a": range(1000000)}),
cudf.DataFrame({"a": range(1000000), "b": range(1000000)}),
cudf.DataFrame({"a": range(20), "b": range(20)}),
cudf.DataFrame({"a": range(100000)}),
Can we remove the construction of GPU objects from the `parametrize` call? It occurs at collection time and is very expensive. This can be constructed lazily like:

@pytest.mark.parametrize(
    "gdf_kwargs",
    [
        dict(data={"a": range(100000)}),
        dict(data={"a": range(100000), "b": range(100000)}),
        # ...
        dict(index=[1, 2, 3]),
        # ...
    ],
)

then:

def test_dataframe_sliced(gdf_kwargs, slice):
    gdf = cudf.DataFrame(**gdf_kwargs)
    pdf = gdf.to_pandas()
    # ...
Co-authored-by: Bradley Dice <[email protected]>
rerun tests
@@ -1,4 +1,4 @@
# Copyright (c) 2021, NVIDIA CORPORATION.
# Copyright (c) 2022, NVIDIA CORPORATION.
# Copyright (c) 2022, NVIDIA CORPORATION.
# Copyright (c) 2021-2022, NVIDIA CORPORATION.
?
slice(500000),
slice(25000, 50000),
slice(25000, 25001),
slice(50000),
slice(1, 10),
slice(10, 20),
slice(15, 24000),
slice(6),
If we're testing multiple combinations, we should have coverage of unique code paths: three-argument slices like `slice(start, stop, step)`, negative indices, reversed slices, and empty slices. In the spirit of reducing runtime, some of the other cases can probably be removed if we aim to cover only unique cases. Also, I see no reason why we can't cut this test down to 100 rows instead of 100,000.
slice(6),
slice(6, None),  # start but no stop, [6:]
slice(None, None, 3),  # only step, [::3]
slice(1, 10, 2),  # start, stop, step
slice(3, -5, 2),  # negative stop
slice(-2, -4),  # slice is empty
slice(-10, -20, -1),  # reversed slice
slice(None),  # slices everything, same as [:]
I tried some of these and we actually get multiple failures. Raising an issue now.
Glad I could help catch a bug here. Please tag me in that issue, I'm interested in seeing what you found. Slice all the things! 🥷⚔️🥷
Raised #10292.
@gpucibot merge
This PR reduces the overall runtime of the cuDF pytest suite. Changes include:
Part of #9999.