-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add distributions calculated in RawFeatureFilter to ModelInsights #103
Conversation
Codecov Report
@@ Coverage Diff @@
## master #103 +/- ##
=========================================
+ Coverage 86.19% 86.2% +<.01%
=========================================
Files 294 294
Lines 9550 9552 +2
Branches 319 327 +8
=========================================
+ Hits 8232 8234 +2
Misses 1318 1318
Continue to review full report at Codecov.
|
@@ -474,4 +477,22 @@ class ModelInsightsTest extends FlatSpec with PassengerSparkFixtureTest { | |||
f0InDer3.variance shouldBe Some(3.3) | |||
} | |||
|
|||
it should "include raw feature distribution information calculated in RawFeatureFilter when it's included" in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can omit when it's included. Perhaps
include raw feature distribution info calculated in RawFeatureFilter`
val wfRawFeatureDistributions = modelWithRFF.getRawFeatureDistributions() | ||
val wfDistributionsGrouped = wfRawFeatureDistributions.groupBy(_.name) | ||
|
||
// Currently, raw features that aren't explicitly blacklisted, but are not used because they are inputs to |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this go to RawFeatureFilter docs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure I can move it there
) | ||
} | ||
|
||
it should "not include raw feature distribution information if RawFeatureFilter is not used" in { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps not include raw feature distribution info if RawFeatureFilter is not used
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm missing what's different here... you want "info" instead of "information"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah ;) the point is - try to make the test name a bit shorter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
*/ | ||
case class FeatureInsights(featureName: String, featureType: String, derivedFeatures: Seq[Insights]) | ||
case class FeatureInsights(featureName: String, featureType: String, derivedFeatures: Seq[Insights], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you perhaps reformat?
case class FeatureInsights
(
featureName: String,
featureType: String,
derivedFeatures: Seq[Insights],
distributions: Seq[FeatureDistributionLike] = Seq.empty
)
*/ | ||
case class FeatureInsights(featureName: String, featureType: String, derivedFeatures: Seq[Insights]) | ||
case class FeatureInsights(featureName: String, featureType: String, derivedFeatures: Seq[Insights], | ||
distributions: Seq[FeatureDistributionLike] = Seq.empty) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make type FeatureDistribution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@leahmcguire why not the trait?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is the class everywhere else
Thanks for the contribution! It looks like @leahmcguire is an internal user so signing the CLA is not required. However, we need to confirm this. |
Thanks for the contribution! Unfortunately we can't verify the commit author(s): Leah McGuire <l***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request. |
Related issues
n/a
Describe the proposed solution
Add distributions calculated in RawFeatureFilter to ModelInsights. Note that currently, raw features that aren't explicitly blacklisted by RFF, but are not used because they are inputs to explicitly blacklisted features are not present as raw features in the model, nor in ModelInsights. However, the distributions of such features are still present in the model and accessible via
getRawFeatureDistributions()
.Describe alternatives you've considered
n/a
Additional context
n/a