
Scaler and descaler transformers #223

Merged: 7 commits merged on Feb 11, 2019.


ericwayman (Contributor)

Related Issues
In regression use cases the response variable is often not normally distributed. As a data scientist training regression models, I'd like generic transformers that scale the response variable at training time and descale the predictions at scoring time, so that predictions are returned in the original scale.

Proposed Solution
A general framework for scaling and descaling the response feature that lets developers add support for new feature scaling functions with minimal boilerplate code and without writing new code to read and write metadata.

Each family of scaling functions is represented by a case class that extends the Scaler base type.
All supported families of scaling functions are listed in the ScalingType enum.
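A minimal sketch of that structure, assuming a sealed trait as the common base type and two illustrative families (linear and logarithmic). The names ScalingType and Scaler come from the description above; LinearScaler, LogScaler, and the scale/descale method names are hypothetical and need not match the merged code:

```scala
// Hypothetical sketch: ScalingType and Scaler follow the PR description;
// LinearScaler, LogScaler and the scale/descale methods are assumptions.

// Enum of all supported families of scaling functions.
object ScalingType extends Enumeration {
  type ScalingType = Value
  val Linear, Logarithmic = Value
}
import ScalingType.ScalingType

// Common base type: each family reports its type and can invert itself.
sealed trait Scaler {
  def scalingType: ScalingType
  def scale(v: Double): Double
  def descale(v: Double): Double
}

// Linear family: scaled = slope * v + intercept (slope must be non-zero).
case class LinearScaler(slope: Double, intercept: Double) extends Scaler {
  require(slope != 0.0, "slope must be non-zero to be invertible")
  val scalingType: ScalingType = ScalingType.Linear
  def scale(v: Double): Double = slope * v + intercept
  def descale(v: Double): Double = (v - intercept) / slope
}

// Logarithmic family: scaled = ln(v), defined only for v > 0.
case class LogScaler() extends Scaler {
  val scalingType: ScalingType = ScalingType.Logarithmic
  def scale(v: Double): Double = math.log(v)
  def descale(v: Double): Double = math.exp(v)
}
```

Keeping the inverse next to the forward function means each new family supplies both directions in one place, which is what lets a generic descaler work for any registered ScalingType.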

The ScalerTransformer is constructed from a scalingType and scalingArgs and scales its input feature accordingly. The scalingType and scalingArgs are stored in the metadata of the resulting scaled feature.
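The metadata round trip can be sketched without Spark by letting a Map[String, String] stand in for the column metadata. The ScalerMetadata case class, the ScalerMetadataCodec object, and the key names are assumptions for illustration; parsing returns a Try so that malformed metadata surfaces as a Failure rather than a thrown exception:

```scala
import scala.util.Try

// Hypothetical sketch: Map[String, String] stands in for Spark column
// metadata; ScalerMetadata, ScalerMetadataCodec and key names are assumptions.
object ScalingType extends Enumeration {
  val Linear, Logarithmic = Value
}

// What must be persisted to rebuild a scaler later: its family and arguments.
case class ScalerMetadata(scalingType: ScalingType.Value, scalingArgs: Map[String, Double])

object ScalerMetadataCodec {
  val TypeKey = "scalingType"
  val ArgPrefix = "scalingArgs."

  // Flatten to string key/value pairs, e.g. "scalingArgs.slope" -> "2.0".
  def toMap(md: ScalerMetadata): Map[String, String] =
    Map(TypeKey -> md.scalingType.toString) ++
      md.scalingArgs.map { case (k, v) => (ArgPrefix + k) -> v.toString }

  // Rebuild; a missing or unknown scalingType or a non-numeric arg yields a Failure.
  def fromMap(m: Map[String, String]): Try[ScalerMetadata] = Try {
    val tpe = ScalingType.withName(m(TypeKey))
    val args = m.collect {
      case (k, v) if k.startsWith(ArgPrefix) => k.stripPrefix(ArgPrefix) -> v.toDouble
    }
    ScalerMetadata(tpe, args)
  }
}
```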

The DescalerTransformer takes two inputs: the feature to descale and a scaled feature whose metadata is used to construct the inverse of the scaling function. This allows predictions made in the scaled domain to be descaled at scoring time.
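A sketch of the descaling step under similar assumptions: the scaled feature's metadata is modelled as a Map[String, String], only linear and logarithmic scaling are handled, and DescalerSketch with its methods is hypothetical rather than the transformer's actual API:

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical sketch: the scaled feature's metadata is modelled as a
// Map[String, String]; key names and supported types are assumptions.
object DescalerSketch {
  // Build the inverse of the scaling function recorded in the metadata.
  def inverseFrom(meta: Map[String, String]): Try[Double => Double] =
    meta.get("scalingType") match {
      case Some("Linear") => Try {
        val slope = meta("scalingArgs.slope").toDouble
        val intercept = meta("scalingArgs.intercept").toDouble
        require(slope != 0.0, "slope must be non-zero")
        (v: Double) => (v - intercept) / slope
      }
      case Some("Logarithmic") => Success((v: Double) => math.exp(v))
      case other => Failure(new IllegalArgumentException(s"Unsupported scalingType: $other"))
    }

  // Map predictions made in the scaled domain back to the original units.
  def descale(predictions: Seq[Double], scaledFeatureMeta: Map[String, String]): Try[Seq[Double]] =
    inverseFrom(scaledFeatureMeta).map(inv => predictions.map(inv))
}
```

Because the inverse is rebuilt from the scaled feature's metadata, the scoring path needs no reference to the scaler object that was used during training.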

codecov bot commented Feb 8, 2019

Codecov Report

Merging #223 into master will increase coverage by 0.03%.
The diff coverage is 91.66%.


```diff
@@            Coverage Diff             @@
##           master     #223      +/-   ##
==========================================
+ Coverage   86.36%   86.39%   +0.03%     
==========================================
  Files         310      312       +2     
  Lines       10137    10182      +45     
  Branches      347      333      -14     
==========================================
+ Hits         8755     8797      +42     
- Misses       1382     1385       +3
```
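As a worked check of the coverage diff, coverage is hits divided by total lines. The displayed figures are consistent with Codecov rounding down to two decimal places (an assumption about its rounding setting, not stated in the report):

$$
\text{coverage} = \frac{\text{hits}}{\text{lines}}, \qquad
\frac{8755}{10137} \approx 86.367\,\%, \qquad
\frac{8797}{10182} \approx 86.398\,\%
$$

The difference is about 0.031 percentage points, matching the reported +0.03%.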
| Impacted Files | Coverage Δ |
|---|---|
| ...e/op/stages/impl/feature/DescalerTransformer.scala | 100% <100%> (ø) |
| ...rce/op/stages/OpPipelineStageReadWriteShared.scala | 100% <100%> (ø) ⬆️ |
| ...ala/com/salesforce/op/dsl/RichNumericFeature.scala | 100% <100%> (ø) ⬆️ |
| ...n/scala/com/salesforce/op/dsl/RichMapFeature.scala | 73.61% <25%> (+0.37%) ⬆️ |
| ...rce/op/stages/impl/feature/ScalerTransformer.scala | 96.29% <96.29%> (ø) |
| ...es/src/main/scala/com/salesforce/op/OpParams.scala | 85.71% <0%> (-4.09%) ⬇️ |


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 104088a...bcb1aaf.

tovbinm changed the title from "Commits with correct email" to "Scaler and descaler transformers" on Feb 9, 2019.
tovbinm requested a review from Jauntbox on February 9, 2019 at 06:04.

```scala
import scala.util.{Failure, Success}

@RunWith(classOf[JUnitRunner])
class ScalerMetadataTest extends FlatSpec with TestSparkContext{
```
Collaborator

Missing a space here as well: TestSparkContext {

tovbinm (Collaborator) left a comment:

🎉 lgtm and thank you!

ericwayman merged commit e28122c into master on Feb 11, 2019.
ericwayman deleted the ew/scalerDescalerTransformers branch on February 11, 2019 at 22:09.
tovbinm mentioned this pull request on Apr 10, 2019.
tovbinm mentioned this pull request on Jul 11, 2019.
salesforce-cla commented:

Thanks for the contribution! Before we can merge this, we need @ericwayman to sign the Salesforce.com Contributor License Agreement.
