Adding binary classification bin score evaluator #119
Conversation
*/
case class BinaryClassificationBinMetrics
(
  BrierScore: Double,
Names should start with a lower case: brierScore
So for some reason all of our metrics do not follow this convention @tovbinm. We should figure out what we want and then make them all consistent.
val predictionLabel = "pred"

val dataset_test = Seq(
  (Map("probability_1" -> 0.99999, "probability_0" -> 0.0001, "prediction" -> 1.0), 1.0),
Use Prediction type instead
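A hedged sketch of what the test data could look like with the Prediction feature type instead of a raw Map, assuming Prediction's companion apply accepts prediction, rawPrediction and probability (the rawPrediction values below are illustrative assumptions, not taken from the PR):

import com.salesforce.op.features.types.Prediction

// Sketch only: one row of test data expressed as a Prediction value paired with its label.
val dataset_test = Seq(
  Prediction(
    prediction = 1.0,
    rawPrediction = Array(0.0001, 0.99999),
    probability = Array(0.0001, 0.99999)
  ) -> 1.0
)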
@RunWith(classOf[JUnitRunner])
class OpBinaryClassifyBinEvaluatorTest extends FlatSpec with TestSparkContext {

  val labelName = "label"
You can create a dataframe using TestFeatureBuilder easily. That way you don't need to define these strings here and create dataframes with Spark below in the tests.
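A minimal sketch of the TestFeatureBuilder suggestion, assuming the usual TransmogrifAI test helpers (TestFeatureBuilder, toRealNN) are available; the returned features carry the column names, so the hard-coded label/prediction strings become unnecessary:

import com.salesforce.op.features.types._
import com.salesforce.op.test.TestFeatureBuilder

// Build a DataFrame plus typed label and prediction features in one call (sketch).
val (df, rawLabel, pred) = TestFeatureBuilder[RealNN, Prediction](
  Seq(
    1.0.toRealNN -> Prediction(prediction = 1.0, rawPrediction = Array(0.0001, 0.99999),
      probability = Array(0.0001, 0.99999)),
    0.0.toRealNN -> Prediction(prediction = 0.0, rawPrediction = Array(0.99, 0.01),
      probability = Array(0.99, 0.01))
  )
)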
case class BinaryClassificationBinMetrics
(
  BrierScore: Double,
  BinCenters: Seq[Double],
JSON annotations have to be added on the sequence fields. See the other binary classification eval metrics class.
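A hedged sketch of that annotation pattern, assuming the same Jackson @JsonDeserialize hints used on the other binary classification metrics case classes; only the fields visible in this diff are shown:

import com.fasterxml.jackson.databind.annotation.JsonDeserialize

case class BinaryClassificationBinMetrics
(
  BrierScore: Double,
  @JsonDeserialize(contentAs = classOf[java.lang.Double])
  BinCenters: Seq[Double]
  // the remaining Seq[Double] fields would carry the same annotation
)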
 * @param name name of default metric
 * @param isLargerBetter is metric better if larger
 * @param uid uid for instance
 */
- Perhaps rename to OpBinScoreEvaluator?
- Also, this evaluator has to be added to the Evaluators factory object and the BinaryClassEvalMetrics enum.
This evaluator returns 5 different values. Should there be 5 factory methods, or only one for the Brier score, which is the default metric?
only the Brier Score. @leahmcguire wdyt?
Yes, only for the Brier score; the other metrics are supporting values for the Brier score. The Brier score is the only metric that could be used for optimization.
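A hedged sketch (not the library's actual code) of what the single Brier-score hook in the Evaluators factory could look like; the method name brierScore is an assumption from this thread, and the class name is kept as it currently appears in the PR:

// Inside the Evaluators factory object (sketch):
object BinaryClassification {
  // existing factory methods for the other binary classification metrics live here
  def brierScore(): OpBinaryClassifyBinEvaluator = new OpBinaryClassifyBinEvaluator()
}
// plus a new BrierScore entry in the BinaryClassEvalMetrics enum for the default metric.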
val metrics = new OpBinaryClassifyBinEvaluator(numBins = 0)
  .setLabelCol(labelName).setPredictionCol(predictionLabel).evaluateAll(df)

metrics.BrierScore shouldBe 0.0
Case classes have equals implemented, so you can simply do metrics shouldBe BinaryClassificationBinMetrics(0.0, Seq(), Seq(), Seq(), Seq()) here and everywhere in the tests.
  metrics.AverageConversionRate shouldBe Seq(0.0, 0.0, 0.0, 1.0)
}

it should "return the empty bin metrics for numBins == 0" in {
it should "error on invalid num of bins"
  metrics.AverageConversionRate shouldBe Seq.empty[Double]
}

it should "return the bin metrics for skewed data" in {
it should "evaluate bin metrics for skewed data"
  metrics.AverageConversionRate shouldBe Seq(0.0, 0.0, 0.0, 0.0, 1.0)
}

it should "return the default metric as BrierScore" in {
it should "evaluate the default metric as BrierScore"
val emptyDataSet = Seq.empty[(Map[String, Double], Double)]

Spec[OpBinaryClassifyBinEvaluator] should "return the bin metrics" in {
should "evaluate the bin metrics"
Codecov Report
@@ Coverage Diff @@
## master #119 +/- ##
==========================================
+ Coverage 86.19% 86.22% +0.02%
==========================================
Files 298 299 +1
Lines 9668 9696 +28
Branches 329 542 +213
==========================================
+ Hits 8333 8360 +27
- Misses 1335 1336 +1
Continue to review full report at Codecov.
val dataProcessed = makeDataToUse(data, labelColumnName)

import dataProcessed.sparkSession.implicits._
val rdd = dataProcessed.select(getPredictionValueCol, labelColumnName).as[(Double, Double)].rdd
You don't seem to use this RDD beyond checking whether the dataset is empty.
} else {
  // Find the significant digit to which the scores need to be rounded, based on numBins.
  val significantDigitToRoundOff = math.log10(numBins).toInt + 1
  val scoreAndLabelsRounded = for {i <- scoreAndLabels}
I think my explanation on the doc may have made this more complicated than it needs to be.
The probabilities will be between 0 and 1, so use that information to compute the bins and their centers. Make a function binFn that takes in the probability and returns the bin. Then you can simply do:
scoreAndLabels.map { case (score, label) => (binFn(score), (score, label, 1L)) }.reduceByKey(_ + _).map { case (bin, (scoreSum, labelSum, count)) => (bin, scoreSum / count, labelSum / count, count) }
(you will need to import com.twitter.algebird.Operators._ for the reduceByKey to work without specifying everything)
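Expanding that comment into a hedged, self-contained sketch, assuming scoreAndLabels is an RDD[(Double, Double)] of (score, label) pairs with scores in [0, 1]; binFn, binStats and numBins are stand-in names, not the PR's exact code:

import com.twitter.algebird.Operators._ // provides + on tuples for reduceByKey
import org.apache.spark.rdd.RDD

// Map a score in [0, 1] to an equal-width bin index; clamp score == 1.0 into the last bin.
def binFn(score: Double, numBins: Int): Int =
  math.min((score * numBins).toInt, numBins - 1)

def binStats(scoreAndLabels: RDD[(Double, Double)], numBins: Int): RDD[(Int, Double, Double, Long)] =
  scoreAndLabels
    .map { case (score, label) => (binFn(score, numBins), (score, label, 1L)) }
    .reduceByKey(_ + _) // element-wise tuple sum via algebird Operators
    .map { case (bin, (scoreSum, labelSum, count)) =>
      // per bin: average score, average conversion rate (label mean), and data point count
      (bin, scoreSum / count, labelSum / count, count)
    }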
LGTM! This is awesome, thank you for contributing!!
Please take a look at the failing test:
https://travis-ci.com/salesforce/TransmogrifAI/jobs/145162156
There is actually a deeper issue with probability values, so I'll let @leahmcguire comment here.
Yeah, so what is happening is that not every model type produces a probability column, and there is a special case in the makeData call that will fall back to the rawPrediction value instead of the probability. That value is not bounded, so you will actually need to do a reduce to get the min and max values for the bin range.
So do you mean the default min and max values used to compute the bins should be [0, 1]?
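A minimal sketch of the reduce mentioned above for the unbounded rawPrediction case, assuming the scores are available as an RDD[Double] (the names scores and scoreRange are stand-ins):

import org.apache.spark.rdd.RDD

// Compute min and max score in one pass so the bin range can be derived from the data
// instead of assuming [0, 1].
def scoreRange(scores: RDD[Double]): (Double, Double) =
  scores
    .map(s => (s, s))
    .reduce { case ((min1, max1), (min2, max2)) =>
      (math.min(min1, min2), math.max(max1, max2))
    }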
Thanks for the contribution! It looks like @leahmcguire is an internal user so signing the CLA is not required. However, we need to confirm this.
Thanks for the contribution! Unfortunately we can't verify the commit author(s): Leah McGuire <l***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.
Related issues
#101
Describe the proposed solution
Creating an evaluator for binary classification which provides statistics about the predicted scores. This evaluator creates the specified number of bins and computes the following details for each bin.
The overall BrierScore of the scores is also computed, and it serves as the default metric.
Added unit tests for the evaluator
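A hedged usage sketch of the evaluator, using the class and column names as they appear in the test snippets above (the rename to OpBinScoreEvaluator was still under discussion in this PR):

// Sketch only: labelName, predictionLabel and df mirror the test code in this PR.
val evaluator = new OpBinaryClassifyBinEvaluator(numBins = 4)
  .setLabelCol(labelName)
  .setPredictionCol(predictionLabel)

val metrics = evaluator.evaluateAll(df)
metrics.BrierScore            // default metric
metrics.AverageConversionRate // per-bin conversion rates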