Implement hypothesis-based tests for linear models #4952
Conversation
Force-pushed from 80ee4e8 to b4c3155.
Force-pushed from b4c3155 to fe6d204.
Issues I am currently observing:
I will run some benchmarks, especially to address point one.
Looking great! Love the general approach and the details of dataset generation.
Dumping some statistics for benchmarking:
This was run with the
Those benchmark times don't worry me too much. If the shrink phase takes a little while, that's fine. It means that Hypothesis has found something, and 30 seconds of CI time is a small price to pay to get us a really good reproducer to work with. We can continue to assess the test time as we roll this out more generally and see if we need to add e.g. additional pytest configuration options to control the impact (for both dev and CI).
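As an illustration of what such configuration could look like (a sketch, not part of this PR; the profile names and example counts are made up), Hypothesis settings profiles can be registered in a conftest.py and selected per run via the --hypothesis-profile pytest option:

# conftest.py (hypothetical)
from hypothesis import settings

# Fast profile for local development: fewer examples, no per-example deadline.
settings.register_profile("dev", max_examples=10, deadline=None)

# Thorough profile for CI: more examples, still no hard deadline.
settings.register_profile("ci", max_examples=200, deadline=None)

# Selected with e.g. `pytest --hypothesis-profile=ci`; Hypothesis defaults apply otherwise.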
python/cuml/testing/utils.py (outdated):
def array_difference(a, b, with_sign=True):
    """
    Utility function to compute the difference between 2 arrays.
    """
    a = to_nparray(a)
    b = to_nparray(b)

    if len(a) == 0 and len(b) == 0:
        return 0

    if not with_sign:
        a, b = np.abs(a), np.abs(b)
    return np.sum(np.abs(a - b))
A common issue we've had is that it is sometimes very hard to diagnose the error, or the magnitude of the errors, when tests fail. Adding some printing on failure here might be highly beneficial.
This function does not directly fail tests, but only computes the difference between arrays a and b, similar to array_equal(). I use it as a target function to steer Hypothesis towards larger errors.
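For context, here is a minimal sketch of how such a difference metric can drive Hypothesis's search via hypothesis.target() (illustrative only: the estimator, dataset generation, and tolerance below are made up and are not the strategy added in this PR):

import numpy as np
from hypothesis import given, target, strategies as st
from sklearn.linear_model import LinearRegression


@given(seed=st.integers(min_value=0, max_value=2**32 - 1))
def test_linear_regression_recovers_coefficients(seed):
    # Hypothetical noise-free regression dataset.
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(20, 3))
    coef = rng.normal(size=3)
    y = X @ coef

    model = LinearRegression().fit(X, y)
    error = float(np.sum(np.abs(model.coef_ - coef)))

    # Steer Hypothesis towards examples with larger errors ...
    target(error)
    # ... while still asserting the error stays small.
    assert error < 1e-6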
Implemented in #4973.
Here is an example of the output from an actual failure case.
@wphicks I guess we can't merge this without also addressing the failures? Should I create a longer-running feature branch?
Besides my one xfail comment, this looks great! Really like the current form of the dataset generation strategy.
At this point, I'd love to get this in ASAP so that we can go ahead and start applying this to the CPU/GPU algorithms that we're pulling in with this release.
Love the update. LGTM!
@dantegd Any final thoughts on this or do we feel good about merging?
@wphicks One of the tests appeared to be flaky with respect to the efficiency of example generation. I've suppressed the health check for that particular test, but we will have to monitor whether those tests generally become flaky and consider suppressing these health checks globally to avoid surprises later on.
Yep, the health checks can become an issue when the dataset generation relies on a lot of assumptions. I'd say we shouldn't be afraid to suppress them, but we should address them in the generation strategy itself if it begins to impact CI time.
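For reference, a sketch of what per-test suppression looks like (the test body below is a contrived stand-in, not the actual dataset-generation strategy from this PR):

from hypothesis import HealthCheck, assume, given, settings, strategies as st


# Suppress the health check only for a test whose example generation filters heavily.
@settings(suppress_health_check=[HealthCheck.filter_too_much])
@given(st.integers())
def test_with_heavy_filtering(x):
    assume(x % 100 == 0)  # rejects ~99% of generated examples
    assert (x // 100) * 100 == x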
@gpucibot merge
@dantegd Could you dismiss your review when you get a moment? (Or re-review if you're still thinking this one over)
rerun tests
Looks like we got a conda timeout error in one of the gpu test builds.
rerun tests
Codecov Report
Additional details and impacted files:

@@           Coverage Diff            @@
##          branch-22.12   #4952   +/- ##
=============================================
  Coverage             ?   79.38%
=============================================
  Files                ?      184
  Lines                ?    11698
  Branches             ?        0
=============================================
  Hits                 ?     9287
  Misses               ?     2411
  Partials             ?        0
Flags with carried forward coverage won't be shown. ☔ View full report at Codecov.
Closes rapidsai#4943.

Authors:
- Carl Simon Adorf (https://github.com/csadorf)

Approvers:
- William Hicks (https://github.com/wphicks)

URL: rapidsai#4952