-
Notifications
You must be signed in to change notification settings - Fork 393
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Standalone minimum variance estimator #463
Conversation
This comment has been minimized.
This comment has been minimized.
val outputMeta = OpVectorMetadata(getOutputFeatureName, outputFeatures, vectorMeta.history) | ||
|
||
val summaryMetadata = { | ||
val featuresStatistics = new SummaryStatistics(colStats, sample = 1.0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we ant to have a case class for metadata with a toMetadata and fromMetadata method to make working with these easier
) extends UnaryEstimator[OPVector, OPVector](operationName = operationName, uid = uid) | ||
with MinVarianceFilterParams { | ||
|
||
private def makeColumnStatistics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than copy pasting these functions is it possible to move them to a shared companion object and just have default empty values for the parts you dont use?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
* @param removeBadFeatures | ||
* @return | ||
*/ | ||
def minVariance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps name this filterMinVariance
core/src/main/scala/com/salesforce/op/stages/impl/preparators/DerivedFeatureFilterUtils.scala
Show resolved
Hide resolved
core/src/main/scala/com/salesforce/op/stages/impl/preparators/DerivedFeatureFilterUtils.scala
Show resolved
Hide resolved
} | ||
val removeBad = $(removeBadFeatures) | ||
|
||
logInfo("Getting vector rows") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do these have to be log info statements? debug seem more appropriate.
val json = meta.wrapped.prettyJson | ||
val recovered = Metadata.fromJson(json) | ||
|
||
// recovered shouldBe meta |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to compare properly, instead of the hashcode (since it's difficult to troubleshoot).
import org.junit.runner.RunWith | ||
import org.scalatest.junit.JUnitRunner | ||
|
||
case class UnlabeledTextRawData |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this class to the end of the file
import org.slf4j.impl.Log4jLoggerAdapter | ||
|
||
|
||
trait MinVarianceFilterParams extends DerivedFeatureFilterParams { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this trait to the end of the file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you are just setting defaults for these values you dont need a separate trait just set them in the class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM please just add comments for functions
core/src/test/scala/com/salesforce/op/stages/impl/preparators/MinVarianceFilterTest.scala
Outdated
Show resolved
Hide resolved
@leahmcguire @tovbinm All comments addressed. Let me know if further changes are needed, or whether we can merge, thanks!!! |
Great work! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oof, I didn't get to this in time but why did you change all the log levels for SanityChecker? We use this stuff pretty often when running experiments.
Related issues
N/A
Describe the proposed solution
Standalone unary estimator to perform a minimum variance filter on derived features. Move shared functionality out of SanityChecker into DerivedFeatureFilterUtils object
Describe alternatives you've considered
Alternative 1: Gated Params
Alternative 2: Minimal Wrapper Function
Additional context
We have a need for a minimum variance filter in an unsupervised (i.e., label-less) setting. While SanityChecker already has a minimum variance filter, it is a BinaryEstimator and assumes a (response, features) pair as input