Tweaks to OpBinScoreEvaluator #233
Conversation
Thanks for the contribution! It looks like @shaeselix is an internal user so signing the CLA is not required. However, we need to confirm this.
@shaeselix you should have an email invite to join the Salesforce org, or you can visit https://github.com/salesforce to accept. Once you've accepted, you can refresh the CLA check at https://cla.salesforce.com/status/salesforce/TransmogrifAI/pull/233
Codecov Report
@@ Coverage Diff @@
## master #233 +/- ##
===========================================
- Coverage 86.4% 68.44% -17.96%
===========================================
Files 312 313 +1
Lines 10187 10231 +44
Branches 336 553 +217
===========================================
- Hits 8802 7003 -1799
- Misses 1385 3228 +1843
Continue to review full report at Codecov.
Codecov Report
@@ Coverage Diff @@
## master #233 +/- ##
==========================================
- Coverage 86.6% 79.43% -7.17%
==========================================
Files 315 315
Lines 10341 10345 +4
Branches 325 533 +208
==========================================
- Hits 8956 8218 -738
- Misses 1385 2127 +742
Continue to review full report at Codecov.
Hi @shaeselix, thanks for the contribution!! Is this different from the score binning done (and stored) in the BrierScore calculation, https://github.com/salesforce/TransmogrifAI/blob/master/core/src/main/scala/com/salesforce/op/evaluators/OpBinScoreEvaluator.scala? Could the lift metric be added to that evaluator as another output metric?
scoreAndLabels: RDD[(Double, Double)],
getScoreBands: RDD[Double] => Seq[(Double, Double, String)]
): Seq[LiftMetricBand] = {
val bands = getScoreBands(scoreAndLabels.map { case (score, _) => score })
val scores = scoreAndLabels.map { case (score, _) => score }
}.collect { case (Some(band), label) => (band, label) }
val perBandCounts = aggregateBandedLabels(bandedLabels)
val overallRate = overallLiftRate(perBandCounts)
bands.map({ case (lower, upper, band) =>
bands.map { case (lower, upper, band) => ... }
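Assembling the fragments above with both review suggestions applied, here is a rough, self-contained sketch of the banding logic under discussion. The LiftMetricBand fields and the in-lined aggregation (standing in for aggregateBandedLabels and overallLiftRate) are assumptions for illustration, not the PR's exact code:

import org.apache.spark.rdd.RDD

object LiftSketch {

  // Assumed shape of the per-band result; field names are illustrative only.
  case class LiftMetricBand(
    band: String,
    lowerBound: Double,
    upperBound: Double,
    totalCount: Long,
    positiveCount: Long,
    lift: Option[Double]
  )

  def liftMetricBands(
    scoreAndLabels: RDD[(Double, Double)],
    getScoreBands: RDD[Double] => Seq[(Double, Double, String)]
  ): Seq[LiftMetricBand] = {
    // Review suggestion: name the score projection instead of inlining it
    val scores = scoreAndLabels.map { case (score, _) => score }
    val bands = getScoreBands(scores)
    // Assign each (score, label) pair to a band name; drop out-of-band scores
    val bandedLabels = scoreAndLabels.map { case (score, label) =>
      val bandName = bands
        .find { case (lower, upper, _) => score >= lower && score <= upper }
        .map { case (_, _, name) => name }
      (bandName, label)
    }.collect { case (Some(band), label) => (band, label) }
    // Per band: (total count, positive-label count)
    val perBandCounts: Map[String, (Long, Long)] = bandedLabels
      .map { case (band, label) => (band, (1L, if (label > 0.0) 1L else 0L)) }
      .reduceByKey { case ((n1, p1), (n2, p2)) => (n1 + n2, p1 + p2) }
      .collect().toMap
    // Overall conversion rate across all bands
    val (total, positives) = perBandCounts.values
      .foldLeft((0L, 0L)) { case ((n, p), (bn, bp)) => (n + bn, p + bp) }
    val overallRate = if (total > 0) Some(positives.toDouble / total) else None
    // Review suggestion: bands.map { ... } rather than bands.map({ ... })
    bands.map { case (lower, upper, band) =>
      val (n, p) = perBandCounts.getOrElse(band, (0L, 0L))
      val bandRate = if (n > 0) Some(p.toDouble / n) else None
      LiftMetricBand(band, lower, upper, n, p,
        for (r <- bandRate; o <- overallRate if o > 0.0) yield r / o)
    }
  }
}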
* @param scores RDD of scores. unused in this function
* @return sequence of (lowerBound, upperBound, bandString) tuples
*/
private[op] def getDefaultScoreBands(scores: RDD[Double]):
Redundant line break; this fits on one line: private[op] def getDefaultScoreBands(scores: RDD[Double]): Seq[(Double, Double, String)] =
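For reference, the one-line signature with a plausible default implementation. Ten equal-width bands over [0.0, 1.0] is an assumption based on the 0.0/1.0 bounds discussed later in this thread; the package clause is only there so private[op] compiles:

package com.salesforce.op

import org.apache.spark.rdd.RDD

object DefaultBandsSketch {
  // The scores argument is unused for the default bands, but keeping it lets
  // callers swap in data-driven band functions with the same shape.
  private[op] def getDefaultScoreBands(scores: RDD[Double]): Seq[(Double, Double, String)] =
    (0 until 10).map { i =>
      val lower = i / 10.0
      val upper = (i + 1) / 10.0
      (lower, upper, f"$lower%.1f-$upper%.1f")
    }
}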
): Seq[LiftMetricBand] = {
liftMetricBands(
scoreAndLabels,
getDefaultScoreBands
so why not allow users to customize score bands?
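For instance, with the liftMetricBands sketch above, a caller could supply its own bands (hypothetical usage, not the PR's API):

// Hypothetical custom band function: four named bands instead of deciles.
val customBands: org.apache.spark.rdd.RDD[Double] => Seq[(Double, Double, String)] =
  _ => Seq(
    (0.0, 0.25, "low"),
    (0.25, 0.5, "mid-low"),
    (0.5, 0.75, "mid-high"),
    (0.75, 1.0, "high")
  )

// given scoreAndLabels: RDD[(Double, Double)] of (score, label) pairs
val metrics = LiftSketch.liftMetricBands(scoreAndLabels, customBands)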
* Algorithm for calculating a chart as seen here:
* https://www.kdnuggets.com/2016/03/lift-analysis-data-scientist-secret-weapon.html
*/
object LiftEvaluator {
The evaluator has to extend OpBinaryClassificationEvaluatorBase, OpMultiClassificationEvaluatorBase, or OpRegressionEvaluatorBase.
@leahmcguire @tovbinm apologies for the extended reply to this. I was going through the BrierScore code and, while they're calculating very similar things, I don't think they can be combined, because the BrierScore calculations are bound by MinScore and MaxScore, while Lift Plots should be bound by 0.0 and 1.0. I was able to refactor it to extend OpBinaryClassificationEvaluatorBase.
@shaeselix - the reason the Brier score goes from min to max and not 0 to 1 is that not every classification model provides a probability as an output. This will fail if you try to run it on an SVM. If that is really the difference in what you are trying to do, then you should parameterize the Brier score so you can optionally set the min and max, and throw an error if the data falls outside them.
@leahmcguire @tovbinm okay, we figured out that, yes, this could all be expressed in the OpBinScoreEvaluator. Just made a few tweaks and updated tests. Also happy to contribute to the docs for this class, since it looks like they haven't been added yet.
@shaeselix please do!! ;)
@leahmcguire @tovbinm any issues here or can we merge?
Just change the name and then LGTM
* @param binCenters center of each bin
* @param numberOfDataPoints total number of data points in each bin
* @param sumOfLabels sum of the label in each bin
Or do you mean count? Then I request a name change :-)
It's meant to be the count of the "yes" labels, calculated as the sum of the labels. Does that make sense?
That makes sense for the name, but then see my comment below. Why the sum of yes and not just the count of all? (That, combined with the average, gives you the same info but in a more intuitive format.)
Isn't numberOfDataPoints the count of all?
Ah, good point :-) Is what you want really the sum of the labels or the count of the positive labels? I don't think it is possible with the current models, but say someone ran binary classification and fed in -1 and 1 instead of 0 and 1; what would you want to see?
Yes, what I'm wanting is the count of the positive labels. This could be reconstructed by multiplying the averageConversionRate with the numberOfDataPoints, but I figured it would be cleaner and less subject to round-off error to have an explicit parameter for it. I assumed your -1/1 example wasn't possible, but I can refactor the code to make sure it's only counting positive labels rather than just summing (see the sketch below). Would numberOfPositiveLabels be an appropriate param name?
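To make the distinction concrete, a tiny plain-Scala sketch of summing versus counting positives (illustrative values only):

val labels = Seq(1.0, 0.0, 1.0, 1.0, 0.0)

// Summing the labels only counts positives when labels are encoded 0.0 / 1.0
val sumOfLabels = labels.sum                          // 3.0
val numberOfPositiveLabels = labels.count(_ > 0.0)    // 3

// With a -1/1 encoding the two diverge:
val pmLabels = Seq(1.0, -1.0, 1.0, 1.0, -1.0)
pmLabels.sum             // 1.0 -- not the count of positives
pmLabels.count(_ > 0.0)  // 3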
Okay, fix is in! Thanks for the review!
binSize = diff / numOfBins,
binCenters = binCenters,
numberOfDataPoints = numberOfDataPoints,
sumOfLabels = sumOfLabels,
why sum of labels? wouldn't count of labels be more useful?
lgtm! please add a few lines of docs on how to use brier score as lift metric. perhaps we should still include it somewhere here, e.g. Evaluators.BinaryClassification.lift()? https://github.com/salesforce/TransmogrifAI/blob/master/core/src/main/scala/com/salesforce/op/evaluators/Evaluators.scala#L40 @leahmcguire wdyt?
LGTM!
@shaeselix are you planning to address this in a separate PR then?
@tovbinm I'm happy to add it in another PR. I would say that BrierScore != Lift. My understanding of lift as a singular metric is that it requires a threshold for a decision boundary. That can be a parameter in the …
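For illustration, a singular lift metric with a decision threshold might look like this (plain-Scala sketch, not code from this PR):

// Lift at a threshold: conversion rate among examples scored at or above the
// threshold, divided by the overall conversion rate.
def liftAtThreshold(scoreAndLabels: Seq[(Double, Double)], threshold: Double): Option[Double] = {
  if (scoreAndLabels.isEmpty) None
  else {
    val overallRate = scoreAndLabels.count(_._2 > 0.0).toDouble / scoreAndLabels.size
    val selected = scoreAndLabels.filter { case (score, _) => score >= threshold }
    if (selected.isEmpty || overallRate == 0.0) None
    else Some(selected.count(_._2 > 0.0).toDouble / selected.size / overallRate)
  }
}

// liftAtThreshold(Seq((0.9, 1.0), (0.8, 1.0), (0.4, 0.0), (0.2, 0.0)), 0.5)
//   == Some(2.0): 100% conversion above the threshold vs a 50% base rate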
Related issues
Refer to issue(s) addressed in this pull request from Issues page.
N/A
Describe the proposed solution
Adding Lift, a new BinaryClassificationMetrics evaluator, to show how well scores predict positive labels in various score groupings / bands, as seen in the first plot of this article: https://www.kdnuggets.com/2016/03/lift-analysis-data-scientist-secret-weapon.html
A new parameter, LiftMetrics, has been added to BinaryClassificationMetrics; it is filled with a Seq[LiftMetricBand], calculated from an RDD of scoreAndLabels in LiftEvaluator.
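Concretely, one band of that Seq[LiftMetricBand] might look like this (illustrative values; field names follow the sketch earlier in this thread, not necessarily the merged code):

LiftSketch.LiftMetricBand(
  band = "0.9-1.0",     // score band label
  lowerBound = 0.9,
  upperBound = 1.0,
  totalCount = 120,     // data points whose score fell in this band
  positiveCount = 96,   // positive labels in this band
  lift = Some(4.0)      // 80% band rate / 20% hypothetical overall rate
)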
Describe alternatives you've considered
Rather than an evaluator, this could be an estimator. However, since it's a method of evaluation that summarizes a scored dataset rather than estimating new values from data, IMO it belongs in evaluators. MultiClassificationMetrics has ThresholdMetrics, designed for a confidence plot, and this PR emulates that design.
Additional context