-
Notifications
You must be signed in to change notification settings - Fork 915
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce warnings in pytest output #10168
Conversation
Codecov Report
@@ Coverage Diff @@
## branch-22.04 #10168 +/- ##
================================================
+ Coverage 10.42% 10.47% +0.05%
================================================
Files 119 122 +3
Lines 20603 20487 -116
================================================
- Hits 2148 2147 -1
+ Misses 18455 18340 -115
Continue to review full report at Codecov.
|
@@ -29,8 +29,37 @@ | |||
|
|||
@pytest.fixture(params=dtypes, ids=dtypes) | |||
def pandas_input(request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fixture was throwing a lot of warnings about unsafe casting -- in particular, things like creating a Series with dtype int8
from data that could exceed 127 (the max random value was 1000). Instead, I implemented a function that creates random values based on the type.
pytest output on
Warnings on this branch (as of 99006ce, results from a local build):
Looks like this PR cuts the warning count from 19524 warnings to 3041. That's a good start! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left some suggestions for improvement, otherwise LGTM.
Co-authored-by: Vyas Ramasubramani <[email protected]>
def random_ints(dtype, size): | ||
dtype_min = np.iinfo(dtype).min | ||
dtype_max = np.iinfo(dtype).max | ||
return rng.integers(dtype_min, dtype_max, size=size, dtype=dtype) | ||
|
||
try: | ||
dtype = np.dtype(dtype) | ||
except TypeError: | ||
if dtype == "category": | ||
data = random_ints(np.int64, size) | ||
else: | ||
raise | ||
else: | ||
if dtype.kind == "b": | ||
data = rng.choice([False, True], size=size) | ||
elif dtype.kind in ("m", "M"): | ||
# datetime or timedelta | ||
data = random_ints(np.int64, size) | ||
elif dtype.kind == "U": | ||
# Unicode strings of integers like "12345" | ||
data = random_ints(np.int64, size).astype(dtype.str) | ||
elif dtype.kind == "f": | ||
# floats in [0.0, 1.0) | ||
data = rng.random(size=size, dtype=dtype) | ||
else: | ||
data = random_ints(dtype, size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like this could be more generally useful. Which also makes me wonder... does something like this already exist in cudf, and can we re-use that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's definitely a good question. I plan to investigate that further and strive for de-duplication as I work through some more warnings. Lots of the warnings are affected by dtypes!
@gpucibot merge |
This PR reduces some warnings in pytest output.
The pyarrow warning that I silenced was occurring thousands of times during our tests but should be fixed by #9686, so I marked it with a TODO.
I just tackled a few files for now, aiming for around 100 LOC changed for ideal review size.