-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix flaky ModelInsight tests #395
Conversation
Codecov Report
@@ Coverage Diff @@
## master #395 +/- ##
=======================================
Coverage 86.82% 86.82%
=======================================
Files 336 336
Lines 10962 10962
Branches 572 572
=======================================
Hits 9518 9518
Misses 1444 1444 Continue to review full report at Codecov.
|
@@ -755,8 +755,8 @@ class ModelInsightsTest extends FlatSpec with PassengerSparkFixtureTest with Dou | |||
val bigCoeffSum = orginalbigCoeff * math.sqrt(smallFeatureVariance) / labelStd + descaledbigCoeff | |||
val absError2 = math.abs(originalsmallCoeff * math.sqrt(bigFeatureVariance) / labelStd - descaledsmallCoeff) | |||
val smallCoeffSum = originalsmallCoeff * math.sqrt(bigFeatureVariance) / labelStd + descaledsmallCoeff | |||
absError / bigCoeffSum < tol shouldBe true | |||
absError2 / smallCoeffSum < tol shouldBe true | |||
absError / (2 * bigCoeffSum) < tol shouldBe true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you hardcode this change instead of adjusting the tolerance? Is there a reason for a factor of 2 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think back when i wrote this test you suggested abs(x_1 - x_2) / 2 * (x_1 + x_2)
as one way to compute how close x_1 is to x_2, or did I misremember?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggested difference / avg. value, which is abs(x_1 - x_2) * 2 / (x_1 + x_2)
. It probably makes more sense to change it to that, but you should probably increase the overall tolerance too to cut down on flakiness.
@@ -755,8 +755,8 @@ class ModelInsightsTest extends FlatSpec with PassengerSparkFixtureTest with Dou | |||
val bigCoeffSum = orginalbigCoeff * math.sqrt(smallFeatureVariance) / labelStd + descaledbigCoeff | |||
val absError2 = math.abs(originalsmallCoeff * math.sqrt(bigFeatureVariance) / labelStd - descaledsmallCoeff) | |||
val smallCoeffSum = originalsmallCoeff * math.sqrt(bigFeatureVariance) / labelStd + descaledsmallCoeff | |||
absError / bigCoeffSum < tol shouldBe true | |||
absError2 / smallCoeffSum < tol shouldBe true | |||
2 * absError / bigCoeffSum < tol shouldBe true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand your fix correctly that you double the left-hand side the test would also be fixed by:
absError should be < tol * smallCoeffSum
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at any rate please use "should be < tol" as it seems to read more clear.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that would also provide a better error message ;) cool!
Bug fixes: - Ensure correct metrics despite model failures on some CV folds [#404](#404) - Fix flaky `ModelInsight` tests [#395](#395) - Avoid creating `SparseVector`s for LOCO [#377](#377) New features / updates: - Model combiner [#385](#399) - Added new sample for HousingPrices [#365](#365) - Test to verify that custom metrics appear in model insight metrics [#387](#387) - Add `FeatureDistribution` to `SerializationFormat`s [#383](#383) - Add metadata to `OpStandadrdScaler` to allow for descaling [#378](#378) - Improve json serde error in `evalMetFromJson` [#380](#380) - Track mean & standard deviation as metrics for numeric features and for text length of text features [#354](#354) - Making model selectors robust to failing models [#372](#372) - Use compact and compressed model json by default [#375](#375) - Descale feature contribution for Linear Regression & Logistic Regression [#345](#345) Dependency updates: - Update tika version [#382](#382)
Related issues
https://github.com/salesforce/TransmogrifAI/blob/master/core/src/test/scala/com/salesforce/op/ModelInsightsTest.scala#L740
https://github.com/salesforce/TransmogrifAI/blob/master/core/src/test/scala/com/salesforce/op/ModelInsightsTest.scala#L762
are flaky due to a low tolerance threshold.
Describe the proposed solution
Change the test to compute a smaller ratio.