Tweaks to OpBinScoreEvaluator #233
Conversation
Thanks for the contribution! It looks like @shaeselix is an internal user so signing the CLA is not required. However, we need to confirm this.
@shaeselix you should have an email invite to join the Salesforce org, or you can visit https://github.com/salesforce to accept. Once you've accepted, you can refresh the CLA check at https://cla.salesforce.com/status/salesforce/TransmogrifAI/pull/233
Codecov Report
@@ Coverage Diff @@
## master #233 +/- ##
===========================================
- Coverage 86.4% 68.44% -17.96%
===========================================
Files 312 313 +1
Lines 10187 10231 +44
Branches 336 553 +217
===========================================
- Hits 8802 7003 -1799
- Misses 1385 3228 +1843
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #233 +/- ##
==========================================
- Coverage 86.6% 79.43% -7.17%
==========================================
Files 315 315
Lines 10341 10345 +4
Branches 325 533 +208
==========================================
- Hits 8956 8218 -738
- Misses 1385 2127 +742
Continue to review full report at Codecov.
Hi @shaeselix, thanks for the contribution!! Is this different from the score binning done (and stored) in the BrierScore calculation, https://github.com/salesforce/TransmogrifAI/blob/master/core/src/main/scala/com/salesforce/op/evaluators/OpBinScoreEvaluator.scala? Could the lift metric be added to that evaluator as another output metric?
scoreAndLabels: RDD[(Double, Double)],
getScoreBands: RDD[Double] => Seq[(Double, Double, String)]
): Seq[LiftMetricBand] = {
val bands = getScoreBands(scoreAndLabels.map { case (score, _) => score })
val scores = scoreAndLabels.map { case (score, _) => score }
}.collect { case (Some(band), label) => (band, label) }
val perBandCounts = aggregateBandedLabels(bandedLabels)
val overallRate = overallLiftRate(perBandCounts)
bands.map({ case (lower, upper, band) =>
bands.map { case (lower, upper, band) => ... }
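Assembling the fragments above with both review suggestions applied, here is a rough, self-contained sketch of the banding logic under discussion. The LiftMetricBand fields and the in-lined aggregation (standing in for aggregateBandedLabels and overallLiftRate) are assumptions for illustration, not the PR's exact code:

import org.apache.spark.rdd.RDD

object LiftSketch {

  // Assumed shape of the per-band result; field names are illustrative only.
  case class LiftMetricBand(
    band: String,
    lowerBound: Double,
    upperBound: Double,
    totalCount: Long,
    positiveCount: Long,
    lift: Option[Double]
  )

  def liftMetricBands(
    scoreAndLabels: RDD[(Double, Double)],
    getScoreBands: RDD[Double] => Seq[(Double, Double, String)]
  ): Seq[LiftMetricBand] = {
    // Review suggestion: name the score projection instead of inlining it
    val scores = scoreAndLabels.map { case (score, _) => score }
    val bands = getScoreBands(scores)
    // Assign each (score, label) pair to a band name; drop out-of-band scores
    val bandedLabels = scoreAndLabels.map { case (score, label) =>
      val bandName = bands
        .find { case (lower, upper, _) => score >= lower && score <= upper }
        .map { case (_, _, name) => name }
      (bandName, label)
    }.collect { case (Some(band), label) => (band, label) }
    // Per band: (total count, positive-label count)
    val perBandCounts: Map[String, (Long, Long)] = bandedLabels
      .map { case (band, label) => (band, (1L, if (label > 0.0) 1L else 0L)) }
      .reduceByKey { case ((n1, p1), (n2, p2)) => (n1 + n2, p1 + p2) }
      .collect().toMap
    // Overall conversion rate across all bands
    val (total, positives) = perBandCounts.values
      .foldLeft((0L, 0L)) { case ((n, p), (bn, bp)) => (n + bn, p + bp) }
    val overallRate = if (total > 0) Some(positives.toDouble / total) else None
    // Review suggestion: bands.map { ... } rather than bands.map({ ... })
    bands.map { case (lower, upper, band) =>
      val (n, p) = perBandCounts.getOrElse(band, (0L, 0L))
      val bandRate = if (n > 0) Some(p.toDouble / n) else None
      LiftMetricBand(band, lower, upper, n, p,
        for (r <- bandRate; o <- overallRate if o > 0.0) yield r / o)
    }
  }
}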
* @param scores RDD of scores. unused in this function
* @return sequence of (lowerBound, upperBound, bandString) tuples
*/
private[op] def getDefaultScoreBands(scores: RDD[Double]):
Redundant line break; this fits on one line: private[op] def getDefaultScoreBands(scores: RDD[Double]): Seq[(Double, Double, String)] =
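For reference, the one-line signature with a plausible default implementation. Ten equal-width bands over [0.0, 1.0] is an assumption based on the 0.0/1.0 bounds discussed later in this thread; the package clause is only there so private[op] compiles:

package com.salesforce.op

import org.apache.spark.rdd.RDD

object DefaultBandsSketch {
  // The scores argument is unused for the default bands, but keeping it lets
  // callers swap in data-driven band functions with the same shape.
  private[op] def getDefaultScoreBands(scores: RDD[Double]): Seq[(Double, Double, String)] =
    (0 until 10).map { i =>
      val lower = i / 10.0
      val upper = (i + 1) / 10.0
      (lower, upper, f"$lower%.1f-$upper%.1f")
    }
}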
): Seq[LiftMetricBand] = {
liftMetricBands(
scoreAndLabels,
getDefaultScoreBands
so why not allow users to customize score bands?
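For instance, with the liftMetricBands sketch above, a caller could supply its own bands (hypothetical usage, not the PR's API):

// Hypothetical custom band function: four named bands instead of deciles.
val customBands: org.apache.spark.rdd.RDD[Double] => Seq[(Double, Double, String)] =
  _ => Seq(
    (0.0, 0.25, "low"),
    (0.25, 0.5, "mid-low"),
    (0.5, 0.75, "mid-high"),
    (0.75, 1.0, "high")
  )

// given scoreAndLabels: RDD[(Double, Double)] of (score, label) pairs
val metrics = LiftSketch.liftMetricBands(scoreAndLabels, customBands)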
* Algorithm for calculating a chart as seen here:
* https://www.kdnuggets.com/2016/03/lift-analysis-data-scientist-secret-weapon.html
*/
object LiftEvaluator {
The evaluator has to extend OpBinaryClassificationEvaluatorBase, OpMultiClassificationEvaluatorBase, or OpRegressionEvaluatorBase.
@leahmcguire @tovbinm apologies for the extended reply to this. I was going through the BrierScore code and, while they're calculating very similar things, I don't think they can be combined, because the BrierScore calculations are bound by MinScore and MaxScore, while Lift Plots should be bound by 0.0 and 1.0. I was able to refactor it to extend OpBinaryClassificationEvaluatorBase.
@shaeselix - the reason the Brier score goes from min to max and not 0 to 1 is that not every classification model provides a probability as an output. This will fail if you try to run it on an SVM. If that is really the difference in what you are trying to do, then you should parameterize the Brier score so you can optionally set the min and max, and throw an error if the data falls outside them.
@leahmcguire @tovbinm okay, we figured out that, yes, this could all be expressed in the OpBinScoreEvaluator. Just made a few tweaks and updated tests. Also happy to contribute to the docs for this class, since it looks like they haven't been added yet.
@shaeselix please do!! ;)
@leahmcguire @tovbinm any issues here or can we merge?
Just change the name and then LGTM
* @param binCenters center of each bin
* @param numberOfDataPoints total number of data points in each bin
* @param sumOfLabels sum of the label in each bin
Or do you mean count? Then I request a name change :-)
It's meant to be the count of the "yes" labels, calculated as the sum of the labels. Does that make sense?
That makes sense for the name, but then see my comment below. Why the sum of yes and not just the count of all? (That, combined with the average, gives you the same info but in a more intuitive format.)
Isn't numberOfDataPoints the count of all?
Ah, good point :-) Is what you want really the sum of the labels or the count of the positive labels? I don't think it is possible with the current models, but say someone ran binary classification and fed in -1 and 1 instead of 0 and 1; what would you want to see?
Yes, what I'm wanting is the count of the positive labels. This could be reconstructed by multiplying the averageConversionRate with the numberOfDataPoints, but I figured it would be cleaner and less subject to round-off error to have an explicit parameter for it. I assumed your -1/1 example wasn't possible, but I can refactor the code to make sure it's only counting positive labels rather than just summing (see the sketch below). Would numberOfPositiveLabels be an appropriate param name?
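To make the distinction concrete, a tiny plain-Scala sketch of summing versus counting positives (illustrative values only):

val labels = Seq(1.0, 0.0, 1.0, 1.0, 0.0)

// Summing the labels only counts positives when labels are encoded 0.0 / 1.0
val sumOfLabels = labels.sum                          // 3.0
val numberOfPositiveLabels = labels.count(_ > 0.0)    // 3

// With a -1/1 encoding the two diverge:
val pmLabels = Seq(1.0, -1.0, 1.0, 1.0, -1.0)
pmLabels.sum             // 1.0 -- not the count of positives
pmLabels.count(_ > 0.0)  // 3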
Okay, fix is in! Thanks for the review!
binSize = diff / numOfBins,
binCenters = binCenters,
numberOfDataPoints = numberOfDataPoints,
sumOfLabels = sumOfLabels,
why sum of labels? wouldn't count of labels be more useful?
lgtm! please add a few lines of docs on how to use brier score as lift metric. perhaps we should still include it somewhere here, e.g. Evaluators.BinaryClassification.lift()? https://github.com/salesforce/TransmogrifAI/blob/master/core/src/main/scala/com/salesforce/op/evaluators/Evaluators.scala#L40 @leahmcguire wdyt?
LGTM!
@shaeselix are you planning to address this in a separate PR then?
@tovbinm I'm happy to add it in another PR. I would say that BrierScore != Lift. My understanding of lift as a singular metric is that it requires a threshold for a decision boundary. That can be a parameter in the …
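For illustration, a singular lift metric with a decision threshold might look like this (plain-Scala sketch, not code from this PR):

// Lift at a threshold: conversion rate among examples scored at or above the
// threshold, divided by the overall conversion rate.
def liftAtThreshold(scoreAndLabels: Seq[(Double, Double)], threshold: Double): Option[Double] = {
  if (scoreAndLabels.isEmpty) None
  else {
    val overallRate = scoreAndLabels.count(_._2 > 0.0).toDouble / scoreAndLabels.size
    val selected = scoreAndLabels.filter { case (score, _) => score >= threshold }
    if (selected.isEmpty || overallRate == 0.0) None
    else Some(selected.count(_._2 > 0.0).toDouble / selected.size / overallRate)
  }
}

// liftAtThreshold(Seq((0.9, 1.0), (0.8, 1.0), (0.4, 0.0), (0.2, 0.0)), 0.5)
//   == Some(2.0): 100% conversion above the threshold vs a 50% base rate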
Related issues
Refer to issue(s) addressed in this pull request from Issues page.
N/A
Describe the proposed solution
Adding Lift, a new BinaryClassificationMetrics evaluator, to show how well scores predict positive labels in various score groupings / bands, as seen in the first plot of this article: https://www.kdnuggets.com/2016/03/lift-analysis-data-scientist-secret-weapon.html
A new parameter, LiftMetrics, has been added to BinaryClassificationMetrics; it is filled with a Seq[LiftMetricBand], calculated from an RDD of scoreAndLabels in LiftEvaluator.
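Concretely, one band of that Seq[LiftMetricBand] might look like this (illustrative values; field names follow the sketch earlier in this thread, not necessarily the merged code):

LiftSketch.LiftMetricBand(
  band = "0.9-1.0",     // score band label
  lowerBound = 0.9,
  upperBound = 1.0,
  totalCount = 120,     // data points whose score fell in this band
  positiveCount = 96,   // positive labels in this band
  lift = Some(4.0)      // 80% band rate / 20% hypothetical overall rate
)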
Describe alternatives you've considered
Rather than an evaluator, this could be an estimator. However, since it's a method of evaluation that summarizes a scored dataset rather than estimating new values from data, IMO it belongs in evaluators. MultiClassificationMetrics has ThresholdMetrics, designed for a confidence plot, and this PR emulates that design.
Additional context