Standalone minimum variance estimator #463

Merged
merged 18 commits into from
Mar 6, 2020

Contributor

@clin-projects clin-projects commented Feb 26, 2020

Related issues
N/A

Describe the proposed solution
Add a standalone unary estimator that applies a minimum variance filter to derived features, and move shared functionality out of SanityChecker into a DerivedFeatureFilterUtils object.

Describe alternatives you've considered

Alternative 1: Gated Params

  • Put all SanityChecker filters behind params so that they can be disabled
  • Currently, SanityChecker member functions are private; we would make public those functions needed by UnsupervisedSanityChecker
  • UnsupervisedSanityChecker = unaryEstimator that calls functions from SanityChecker
  • Keep SanityCheckerParams the same for both (there will just be some extra params that do not apply to the unary case)

Alternative 2: Minimal Wrapper Function

  • Create a wrapper function that creates a new SanityChecker stage using a dummy response and sets trivial thresholds to deactivate filters that use the response

Additional context
We need a minimum variance filter in an unsupervised (i.e., label-less) setting. SanityChecker already has a minimum variance filter, but it is a BinaryEstimator and assumes a (response, features) pair as input.
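
For readers unfamiliar with the technique, here is a minimal, dependency-free sketch of a label-less minimum variance filter. It is not the code in this PR: the object name `MinVarianceSketch`, its method names, the use of population variance, and the strict `>` comparison against the threshold are all assumptions made for illustration.

```scala
// Hypothetical sketch only; names and threshold semantics are assumptions,
// not the TransmogrifAI implementation.
object MinVarianceSketch {

  /** Population variance of a single column (0.0 for an empty column). */
  def variance(xs: Seq[Double]): Double =
    if (xs.isEmpty) 0.0
    else {
      val mean = xs.sum / xs.size
      xs.map(x => (x - mean) * (x - mean)).sum / xs.size
    }

  /** Indices of the columns whose variance is strictly above minVariance. */
  def columnsToKeep(rows: Seq[Array[Double]], minVariance: Double): Seq[Int] = {
    val numCols = rows.headOption.map(_.length).getOrElse(0)
    (0 until numCols).filter(j => variance(rows.map(_(j))) > minVariance)
  }

  /** Drops low-variance columns from every row; no label is required. */
  def filterRows(rows: Seq[Array[Double]], minVariance: Double): Seq[Array[Double]] = {
    val keep = columnsToKeep(rows, minVariance)
    rows.map(r => keep.map(i => r(i)).toArray)
  }
}
```

Because the filter only looks at the feature columns, it takes a single input, which is why a unary estimator fits where the (response, features) pair of a binary estimator does not.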

@clin-projects clin-projects changed the title Standalone minimum variance estimator [WIP] Standalone minimum variance estimator Feb 27, 2020
val outputMeta = OpVectorMetadata(getOutputFeatureName, outputFeatures, vectorMeta.history)

val summaryMetadata = {
val featuresStatistics = new SummaryStatistics(colStats, sample = 1.0)
Collaborator

In general we want to have a case class for metadata, with a toMetadata and a fromMetadata method, to make working with these easier.
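
A rough sketch of the pattern the reviewer is describing, assuming a plain `Map[String, Any]` stands in for Spark's `Metadata` so that the example stays self-contained. The class name `MinVarianceSummarySketch` and its two fields are hypothetical and are not taken from this PR.

```scala
// Hypothetical sketch; Map[String, Any] stands in for Spark's Metadata,
// and the fields are invented for illustration.
final case class MinVarianceSummarySketch(dropped: Seq[String], variances: Seq[Double]) {

  /** Serializes the summary into a generic key/value structure. */
  def toMetadata: Map[String, Any] = Map(
    "dropped" -> dropped,
    "variances" -> variances
  )
}

object MinVarianceSummarySketch {

  /** Rebuilds the typed summary from the generic structure. */
  def fromMetadata(meta: Map[String, Any]): MinVarianceSummarySketch =
    MinVarianceSummarySketch(
      dropped = meta("dropped").asInstanceOf[Seq[String]],
      variances = meta("variances").asInstanceOf[Seq[Double]]
    )
}
```

With both directions defined in one place, a round trip such as `fromMetadata(summary.toMetadata)` can be compared to the original by value, since case classes get structural equality for free.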

@clin-projects clin-projects changed the title [WIP] Standalone minimum variance estimator Standalone minimum variance estimator Feb 28, 2020
) extends UnaryEstimator[OPVector, OPVector](operationName = operationName, uid = uid)
with MinVarianceFilterParams {

private def makeColumnStatistics
Collaborator

Rather than copy-pasting these functions, is it possible to move them to a shared companion object and just have default empty values for the parts you don't use?
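
One way to read this suggestion, sketched under assumptions: a shared object holds the filtering logic, and the label-dependent inputs default to "empty" so that a label-less caller can simply omit them. `FilterUtilsSketch`, `ColumnStats`, `reasonsToDrop`, and `maxCorrelation` are hypothetical names and do not reflect the actual DerivedFeatureFilterUtils API.

```scala
// Hypothetical sketch; none of these names come from the PR.
object FilterUtilsSketch {

  /** Per-column statistics; corrWithLabel stays None when there is no label. */
  final case class ColumnStats(
    name: String,
    variance: Double,
    corrWithLabel: Option[Double] = None
  )

  /** Reasons to drop a column; label-based checks are skipped when the input is empty. */
  def reasonsToDrop(
    stats: ColumnStats,
    minVariance: Double,
    maxCorrelation: Double = 1.0
  ): Seq[String] = {
    val lowVariance =
      if (stats.variance < minVariance) Seq(s"variance ${stats.variance} below $minVariance")
      else Seq.empty[String]
    val highCorrelation = stats.corrWithLabel
      .filter(c => math.abs(c) > maxCorrelation)
      .map(c => s"label correlation $c above $maxCorrelation")
      .toSeq
    lowVariance ++ highCorrelation
  }
}
```

A supervised caller would pass a correlation and a tighter `maxCorrelation`, while an unsupervised caller would rely on the defaults and get only the variance check.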

Collaborator

@tovbinm tovbinm left a comment

lgtm

* @param removeBadFeatures
* @return
*/
def minVariance
Collaborator

perhaps name this filterMinVariance

}
val removeBad = $(removeBadFeatures)

logInfo("Getting vector rows")
Collaborator

Do these have to be log info statements? Debug seems more appropriate.

val json = meta.wrapped.prettyJson
val recovered = Metadata.fromJson(json)

// recovered shouldBe meta
Collaborator

I would prefer to compare properly, instead of the hashcode (since it's difficult to troubleshoot).

import org.junit.runner.RunWith
import org.scalatest.junit.JUnitRunner

case class UnlabeledTextRawData
Collaborator

move this class to the end of the file

import org.slf4j.impl.Log4jLoggerAdapter


trait MinVarianceFilterParams extends DerivedFeatureFilterParams {
Collaborator

move this trait to the end of the file

Collaborator

If you are just setting defaults for these values, you don't need a separate trait; just set them in the class.

Collaborator

@leahmcguire leahmcguire left a comment

LGTM, please just add comments for functions.

@clin-projects
Contributor Author

@leahmcguire @tovbinm All comments addressed. Let me know if further changes are needed, or whether we can merge, thanks!!!

@tovbinm tovbinm merged commit eabb33e into master Mar 6, 2020
@tovbinm
Collaborator

tovbinm commented Mar 6, 2020

Great work!

@tovbinm tovbinm deleted the cl/minvarfilter branch March 6, 2020 14:58
Contributor

@Jauntbox Jauntbox left a comment

Oof, I didn't get to this in time but why did you change all the log levels for SanityChecker? We use this stuff pretty often when running experiments.

@nicodv nicodv mentioned this pull request Jun 11, 2020