-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Outputting Raw Feature Filter information: Part 1 #237
Conversation
Codecov Report
@@ Coverage Diff @@
## master #237 +/- ##
==========================================
- Coverage 86.56% 86.55% -0.01%
==========================================
Files 314 314
Lines 10317 10297 -20
Branches 567 553 -14
==========================================
- Hits 8931 8913 -18
+ Misses 1386 1384 -2
Continue to review full report at Codecov.
|
…sults; made Reader backwards compatible
@leahmcguire @tovbinm @Jauntbox Ready for review
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly minor comments, otherwise looks good.
|
||
object RawFeatureFilterResults { | ||
|
||
implicit val jsonFormats: Formats = DefaultFormats + |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right now, we have a bunch of these living in OpPipelineStageReadWriteShared
as an implicit Formats
variable. Should we put this there to for consistency? Or move the ones in OpPipelineStageReadWriteShared
to the classes they belong to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
put it there i think
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OpPipelineStageReadWriteShared
is in the features
module, whereas RawFeatureFilter
files are all in core
. I think if we decide to move the Formats
variable to OpPipelineStageReadWriteShared
, then we may have to do some refactoring to avoid some circular reference? Like move RawFeatureFilter
and all associated files into the features
module
Please let me know if you know of a way to do this without having to refactor, or how else you'd like to proceed (keep as is, move the Formats
variable to their respective classes, etc.)
For reference, this is what is in OpPipelineStageReadWriteShared
:
implicit val formats: Formats =
DefaultFormats ++
JodaTimeSerializers.all +
EnumEntrySerializer.json4s[AnyValueTypes](AnyValueTypes) +
EnumEntrySerializer.json4s[HashAlgorithm](HashAlgorithm) +
EnumEntrySerializer.json4s[HashSpaceStrategy](HashSpaceStrategy) +
EnumEntrySerializer.json4s[ScalingType](ScalingType) +
EnumEntrySerializer.json4s[TimePeriod](TimePeriod) +
EnumEntrySerializer.json4s[FeatureDistributionType](FeatureDistributionType) +
new SpecialDoubleSerializer
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point, but it turns out that we actually have all the serialization formats for model insights in the companion object so lets move it there for consistency.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leahmcguire Thanks for pointing this example out!
Serialization formats for RawFeatureFilterResults
is currently placed in a companion object. So just wondering what you're envisioning if beyond what's implemented already
Couple possibilities below:
-
There are some
Formats
lines ofOpPipelineStageReadWriteShared
that correspond to an existing class (e.g.,ScalingType
). We can move these into companion objects -
SerializationFormats in
RawFeatureFilterResults
currently extends fromDefaultFormats
but the one inModelInsights
does not. We can makeRawFeatureFilterResults
better resembleModelInsights
by explicitly defining what it needs fromDefaultFormats
?
Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leahmcguire @clin-projects Is RawFeatureFilterResults
going to be serialized as a part of ModelInsights
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes it is. RawFeatureFilterResults
has four components:
- configuration <= will be incorporated in
getStageInfo
- raw feature distributions <= currently serialized in
FeatureInsights
- metrics <= will be serialized in
FeatureInsights
- exclusion reasons <= currently serialized in
FeatureInsights
* @param fillRatioDiffMismatch distribution mismatch: fill ratio difference exceeded max allowed | ||
* @param excluded feature excluded after failing one or more tests | ||
*/ | ||
case class ExclusionReasons |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a few suggestions for clearer names here:
trainingUnfilledState
-> trainingFillRate
trainingNullLabelLeaker
-> trainingNullLabelCorrelation
scoringUnfilledState
-> scoringFillRate
jsDivergenceMismatch
-> jsDivergence
fillRateDiffMismatch
-> fillRateDifference
fillRatioDiffMismatc
-> fillRatioDifference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this is a tough situation because need to make ExclusionReasons
names distinct from the those in RawFeatureFilterMetrics
(where I've already used many of the names you're suggesting!):
case class RawFeatureFilterMetrics
(
name: String,
trainingFillRate: Double,
trainingNullLabelAbsoluteCorr: Option[Double],
scoringFillRate: Option[Double],
jsDivergence: Option[Double],
fillRateDiff: Option[Double],
fillRatioDiff: Option[Double]
) extends RawFeatureFilterMetricsLike
Thought really long about this... part of rationale for name choices was that these variables refer to outcomes of a comparison test (boolean), so don't want to make it seem like they are referring to a continuous value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I'm not sure I follow why the field names in RawFeatureFilterMetrics
need to be different from the ones in ExclusionReasons
. It should be clear from the type which case class one is referring to, no?
Your point about them being booleans for test results makes sense though, so lets keep them as is.
core/src/main/scala/com/salesforce/op/filters/RawFeatureFilterResultsLike.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/com/salesforce/op/filters/FeatureDistributionTest.scala
Show resolved
Hide resolved
core/src/test/scala/com/salesforce/op/filters/RawFeatureFilterTest.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/OpWorkflowModelReader.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/filters/RawFeatureFilter.scala
Outdated
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/filters/RawFeatureFilterResults.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/com/salesforce/op/OpWorkflowModelReaderWriterTest.scala
Outdated
Show resolved
Hide resolved
core/src/test/scala/com/salesforce/op/OpWorkflowModelReaderWriterTest.scala
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good ( I think ), thought it's hard to tell now cause the PR became quite large ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please add a test that the exclusion reasons in ModelInsights serializes and deserializes correctly https://github.com/salesforce/TransmogrifAI/blob/master/core/src/test/scala/com/salesforce/op/ModelInsightsTest.scala#L347 and then LGTM
@Jauntbox for review! thanks!!!! |
Related issues
I would like to pass information that is obtained by RawFeatureFilter (RFF) into ModelInsights. Some information is already passed through (via FeatureDistribution) but some critical information is not, e.g., reason why a feature was excluded by RFF.
Describe the proposed solution
Creating RawFeatureFilterResults case class that contains and passes information through workflow, gets passed to ModelInsights, and is able to export contents as JSON.
For further details and history of this PR, please refer to a previous PR on a forked branch: clin-projects#1
Update (March 20, 2019)