Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workflow independent model loading #274

Merged
merged 62 commits into from
Jun 14, 2019
Merged

Conversation

tovbinm
Copy link
Collaborator

@tovbinm tovbinm commented Apr 9, 2019

Related issues
Following up on #269 - a sequence of PRs on allowing model loading without workflows...

Describe the proposed solution

1.

This PR finally allows loading models without recreating the workflow by simply calling:

val model = OpWorkflowModel.load(path)

Importantly the old syntax (i.e. workflow.loadModel(path)), still works as before.

With this change we are going to be serializing all transformers and feature generator instances, then reconstructing them when loading the model without requiring the original workflow.

  1. Introduce a new reader/writer interface with a default reflection based implementation
  2. Update stage reader & writer to use it
  3. Add custom serializer for feature generator stages
  4. Update model reader & writer to use new stage read / write technique

Of course updated all the tests with the changes.

Migration Notes
All lamdba transformers and extract functions cannot neither be inline nor have external dependencies on variables. Functions are now required to be specified in objects (Scala 2.11) and as concrete classes for Scala 2.12 and up. Example:

// the following syntax wont work anymore and error at runtime:
val description = FeatureBuilder.Text[Passenger].extract(_.getDescription.toText).asPredictor
val lowerCase = textFeature.map[Text](t => t.value.map(_.toLowerCase).toText)

// replace with concrete function classes as follows:
class DescriptionExtract extends Function1[Passenger, Text] with Serializable { def apply(p: Passenger): Text = p.getDescription.toText }
class LowerCaseText extends Function1[Text, Text] with Serializable { def apply(t: Text): Text = t.value.map(_.toLowerCase).toText }

val description = FeatureBuilder.Text[Passenger].extract(new DescriptionExtract).asPredictor
val lowerCase = textFeature.map[Text](new LowerCaseText)

2.
Additionally one can now provide a custom reader/writer for each transformer stage if necessary as follows:

// implement a custom reader/writer for your stage
class MyStageReaderWriter extends OpPipelineStageReaderWriter[MyStage] {
    def read(stageClass: Class[MyStage], json: JValue): Try[MyStage] = ???
    def write(stage: MyStage): Try[JValue] = ???
}

// add annotation with ReaderWriter to your transformer with the custom reader/writer implementation class MyStageReaderWriter
@ReaderWriter(classOf[MyStageReaderWriter])
class MyStage(...) extends UnaryTransformer[Text, Text] { ... } 

Here is a full example for our own TextTokenizer stage.

@tovbinm tovbinm changed the title Mt/function arguments merged Serialize function arguments for stages Apr 9, 2019
@tovbinm tovbinm requested a review from wsuchy April 9, 2019 22:45
@codecov
Copy link

codecov bot commented Apr 10, 2019

Codecov Report

Merging #274 into master will increase coverage by 0.05%.
The diff coverage is 83.72%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #274      +/-   ##
==========================================
+ Coverage   86.51%   86.56%   +0.05%     
==========================================
  Files         329      335       +6     
  Lines       10617    10748     +131     
  Branches      334      567     +233     
==========================================
+ Hits         9185     9304     +119     
- Misses       1432     1444      +12
Impacted Files Coverage Δ
...c/main/scala/com/salesforce/op/stages/HasOut.scala 100% <ø> (ø) ⬆️
.../scala/com/salesforce/op/test/FeatureAsserts.scala 100% <ø> (ø) ⬆️
...ages/base/sequence/BinarySequenceTransformer.scala 100% <ø> (ø) ⬆️
...la/com/salesforce/op/stages/OpPipelineStages.scala 63.88% <ø> (+1.38%) ⬆️
...rce/op/stages/impl/feature/FilterTransformer.scala 0% <0%> (ø)
...ce/op/stages/impl/feature/ReplaceTransformer.scala 0% <0%> (ø)
...rce/op/stages/impl/feature/ExistsTransformer.scala 0% <0%> (ø)
.../salesforce/op/utils/text/LuceneTextAnalyzer.scala 98.46% <100%> (+0.18%) ⬆️
...s/impl/feature/EmailToPickListMapTransformer.scala 100% <100%> (ø)
...cala/com/salesforce/op/OpWorkflowModelWriter.scala 100% <100%> (ø) ⬆️
... and 34 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 428eb4c...296fe8a. Read the comment docs.

@codecov
Copy link

codecov bot commented Apr 10, 2019

Codecov Report

Merging #274 into master will decrease coverage by 29.33%.
The diff coverage is 58.33%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master     #274       +/-   ##
===========================================
- Coverage   86.35%   57.02%   -29.34%     
===========================================
  Files         319      319               
  Lines       10431    10454       +23     
  Branches      345      550      +205     
===========================================
- Hits         9008     5961     -3047     
- Misses       1423     4493     +3070
Impacted Files Coverage Δ
...rce/op/stages/OpPipelineStageReadWriteShared.scala 100% <ø> (ø) ⬆️
...rce/op/stages/impl/feature/ScalerTransformer.scala 92.59% <ø> (-3.71%) ⬇️
...la/com/salesforce/op/stages/OpPipelineStages.scala 63.51% <ø> (ø) ⬆️
...ages/base/sequence/BinarySequenceTransformer.scala 100% <ø> (ø) ⬆️
...com/salesforce/op/features/types/FeatureType.scala 95.95% <100%> (+0.04%) ⬆️
...a/com/salesforce/op/test/OpPipelineStageSpec.scala 98.03% <100%> (+0.03%) ⬆️
...lesforce/op/utils/reflection/ReflectionUtils.scala 88.23% <11.11%> (-9.14%) ⬇️
...m/salesforce/op/stages/OpPipelineStageReader.scala 64.86% <57.14%> (-0.77%) ⬇️
...m/salesforce/op/stages/OpPipelineStageWriter.scala 70.96% <90%> (+12.63%) ⬆️
...cala/com/salesforce/op/OpWorkflowModelWriter.scala 0% <0%> (-100%) ⬇️
... and 110 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7bcd1e4...4bb8d15. Read the comment docs.

@tovbinm tovbinm changed the title Towards better model serialization Workflow independent model loading Jun 5, 2019
@tovbinm
Copy link
Collaborator Author

tovbinm commented Jun 10, 2019

Can anyone please review this PR?

Copy link
Contributor

@Jauntbox Jauntbox left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments / questions:

  1. Right now both model loading methods exist, correct? (eg. we can still load with a workflow with lambdas as we always have?)
  2. Do you think we should we explicitly deprecate the old way, or leave it around forever?
  3. Should we also update the helloworld examples since they use lambdas? Or do you think we should leave them as is since they're not saving/loading models?
  4. There are also many examples where we use UnaryLambdaTransformers in tests, but don't save models (eg. BadFeatureZooTest). Do you think we should change all of those too, or just the ones that result in saved models?

If you think we should do either of the last two items, I'm fine with them being in separate PRs since this one is already enormous.

@tovbinm tovbinm merged commit 13ddb4f into master Jun 14, 2019
@tovbinm
Copy link
Collaborator Author

tovbinm commented Jun 14, 2019

Kudos to @wsuchy @Jauntbox for all the help!

@tovbinm tovbinm deleted the mt/function-arguments-merged branch June 14, 2019 16:42
@tovbinm
Copy link
Collaborator Author

tovbinm commented Jun 14, 2019

@Jauntbox

  1. Correct.
  2. I think we should deprecate workflow.loadModel(path) call eventually just to keep things simple.
  3. Yes, we will update them before the next release.
  4. Some of the test changes were not necessary, though I did it anyways.

This was referenced Jun 14, 2019
This was referenced Jul 10, 2019
gerashegalov added a commit that referenced this pull request Sep 13, 2019
- replace lambdas with concrete classes according to #274 
- fix typos
@salesforce-cla
Copy link

Thanks for the contribution! Before we can merge this, we need @wsuchy to sign the Salesforce.com Contributor License Agreement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants