Workflow independent model loading #274

tovbinm · 2019-04-09T22:45:07Z

Related issues
Following up on #269 - a sequence of PRs on allowing model loading without workflows...

Describe the proposed solution

1.

This PR finally allows loading models without recreating the workflow by simply calling:

val model = OpWorkflowModel.load(path)

Importantly, the old syntax (i.e. workflow.loadModel(path)) still works as before.

With this change, we serialize all transformer and feature generator instances and reconstruct them when loading the model, without requiring the original workflow.

Introduce a new reader/writer interface with a default reflection based implementation
Update stage reader & writer to use it
Add custom serializer for feature generator stages
Update model reader & writer to use new stage read / write technique

All the tests have been updated to reflect these changes.

Migration Notes
Lambda transformers and extract functions can no longer be defined inline or depend on external variables. Functions must now be specified in objects (Scala 2.11) or as concrete classes (Scala 2.12 and up). Example:

// the following syntax won't work anymore and will fail at runtime:
val description = FeatureBuilder.Text[Passenger].extract(_.getDescription.toText).asPredictor
val lowerCase = textFeature.map[Text](t => t.value.map(_.toLowerCase).toText)

// replace with concrete function classes as follows:
class DescriptionExtract extends Function1[Passenger, Text] with Serializable { def apply(p: Passenger): Text = p.getDescription.toText }
class LowerCaseText extends Function1[Text, Text] with Serializable { def apply(t: Text): Text = t.value.map(_.toLowerCase).toText }

val description = FeatureBuilder.Text[Passenger].extract(new DescriptionExtract).asPredictor
val lowerCase = textFeature.map[Text](new LowerCaseText)
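
To illustrate why a concrete class is easier to persist than an inline lambda, here is a minimal, library-free sketch. TextValue, LowerCaseFn and ConcreteFunctionSketch are hypothetical names made up for this illustration and are not part of TransmogrifAI. The sketch only shows that both forms behave the same, while the concrete class has a name chosen by the author that can be written down and referred to later.

```scala
// Hypothetical stand-in for a feature value type; this is NOT TransmogrifAI's Text.
final case class TextValue(value: Option[String])

// Concrete, named function class that captures no outer variables.
class LowerCaseFn extends Function1[TextValue, TextValue] with Serializable {
  def apply(t: TextValue): TextValue = TextValue(t.value.map(_.toLowerCase))
}

object ConcreteFunctionSketch {
  // Inline lambda: backed by a generated class whose name the author does not control.
  val inline: TextValue => TextValue = t => TextValue(t.value.map(_.toLowerCase))

  // Concrete class: backed by a class with an author-chosen name.
  val concrete: TextValue => TextValue = new LowerCaseFn

  // Runtime class name of a function instance.
  def className(f: AnyRef): String = f.getClass.getName
}
```

Calling ConcreteFunctionSketch.className on both values shows the difference: the concrete function reports a name ending in LowerCaseFn, while the lambda reports a compiler- or runtime-generated name.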

2.
Additionally, one can now provide a custom reader/writer for each transformer stage if necessary, as follows:

// implement a custom reader/writer for your stage
class MyStageReaderWriter extends OpPipelineStageReaderWriter[MyStage] {
    def read(stageClass: Class[MyStage], json: JValue): Try[MyStage] = ???
    def write(stage: MyStage): Try[JValue] = ???
}

// add annotation with ReaderWriter to your transformer with the custom reader/writer implementation class MyStageReaderWriter
@ReaderWriter(classOf[MyStageReaderWriter])
class MyStage(...) extends UnaryTransformer[Text, Text] { ... }
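
As a rough idea of what filling in the ??? bodies could look like, here is a self-contained sketch of the same read/write shape. It makes two loud assumptions: the trait StageReaderWriter is a simplified stand-in for OpPipelineStageReaderWriter, and a plain Map[String, String] replaces the JValue used above so the example needs no dependencies. TokenizerStage and its fields are hypothetical as well.

```scala
import scala.util.Try

// Simplified stand-in for the reader/writer interface (assumption: Map instead of JValue).
trait StageReaderWriter[S] {
  def read(stageClass: Class[S], json: Map[String, String]): Try[S]
  def write(stage: S): Try[Map[String, String]]
}

// Hypothetical stage with two settings worth persisting.
final case class TokenizerStage(minTokenLength: Int, toLowercase: Boolean)

class TokenizerStageReaderWriter extends StageReaderWriter[TokenizerStage] {
  // Rebuild the stage from its saved settings; a missing or malformed value becomes a Failure.
  def read(stageClass: Class[TokenizerStage], json: Map[String, String]): Try[TokenizerStage] = Try {
    TokenizerStage(
      minTokenLength = json("minTokenLength").toInt,
      toLowercase = json("toLowercase").toBoolean
    )
  }

  // Save only the settings needed to rebuild the stage.
  def write(stage: TokenizerStage): Try[Map[String, String]] = Try {
    Map(
      "minTokenLength" -> stage.minTokenLength.toString,
      "toLowercase" -> stage.toLowercase.toString
    )
  }
}
```

Returning Try from both methods keeps a bad or incomplete saved stage from throwing in the middle of model loading; the caller decides how to report the failure.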

Here is a full example for our own TextTokenizer stage.


codecov · 2019-04-10T00:47:46Z

Codecov Report

Merging #274 into master will increase coverage by 0.05%.
The diff coverage is 83.72%.

@@            Coverage Diff             @@
##           master     #274      +/-   ##
==========================================
+ Coverage   86.51%   86.56%   +0.05%     
==========================================
  Files         329      335       +6     
  Lines       10617    10748     +131     
  Branches      334      567     +233     
==========================================
+ Hits         9185     9304     +119     
- Misses       1432     1444      +12

Impacted Files	Coverage Δ
...c/main/scala/com/salesforce/op/stages/HasOut.scala	`100% <ø> (ø)`	⬆️
.../scala/com/salesforce/op/test/FeatureAsserts.scala	`100% <ø> (ø)`	⬆️
...ages/base/sequence/BinarySequenceTransformer.scala	`100% <ø> (ø)`	⬆️
...la/com/salesforce/op/stages/OpPipelineStages.scala	`63.88% <ø> (+1.38%)`	⬆️
...rce/op/stages/impl/feature/FilterTransformer.scala	`0% <0%> (ø)`
...ce/op/stages/impl/feature/ReplaceTransformer.scala	`0% <0%> (ø)`
...rce/op/stages/impl/feature/ExistsTransformer.scala	`0% <0%> (ø)`
.../salesforce/op/utils/text/LuceneTextAnalyzer.scala	`98.46% <100%> (+0.18%)`	⬆️
...s/impl/feature/EmailToPickListMapTransformer.scala	`100% <100%> (ø)`
...cala/com/salesforce/op/OpWorkflowModelWriter.scala	`100% <100%> (ø)`	⬆️
... and 34 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 428eb4c...296fe8a. Read the comment docs.

codecov · 2019-04-10T00:47:46Z

Codecov Report

Merging #274 into master will decrease coverage by 29.33%.
The diff coverage is 58.33%.

@@             Coverage Diff             @@
##           master     #274       +/-   ##
===========================================
- Coverage   86.35%   57.02%   -29.34%     
===========================================
  Files         319      319               
  Lines       10431    10454       +23     
  Branches      345      550      +205     
===========================================
- Hits         9008     5961     -3047     
- Misses       1423     4493     +3070

Impacted Files	Coverage Δ
...rce/op/stages/OpPipelineStageReadWriteShared.scala	`100% <ø> (ø)`	⬆️
...rce/op/stages/impl/feature/ScalerTransformer.scala	`92.59% <ø> (-3.71%)`	⬇️
...la/com/salesforce/op/stages/OpPipelineStages.scala	`63.51% <ø> (ø)`	⬆️
...ages/base/sequence/BinarySequenceTransformer.scala	`100% <ø> (ø)`	⬆️
...com/salesforce/op/features/types/FeatureType.scala	`95.95% <100%> (+0.04%)`	⬆️
...a/com/salesforce/op/test/OpPipelineStageSpec.scala	`98.03% <100%> (+0.03%)`	⬆️
...lesforce/op/utils/reflection/ReflectionUtils.scala	`88.23% <11.11%> (-9.14%)`	⬇️
...m/salesforce/op/stages/OpPipelineStageReader.scala	`64.86% <57.14%> (-0.77%)`	⬇️
...m/salesforce/op/stages/OpPipelineStageWriter.scala	`70.96% <90%> (+12.63%)`	⬆️
...cala/com/salesforce/op/OpWorkflowModelWriter.scala	`0% <0%> (-100%)`	⬇️
... and 110 more

Powered by Codecov. Last update 7bcd1e4...4bb8d15. Read the comment docs.

tovbinm · 2019-06-10T17:30:16Z

Can anyone please review this PR?

features/src/test/scala/com/salesforce/op/stages/base/binary/BinaryTransformerTest.scala

Jauntbox

A few comments / questions:

1. Right now both model loading methods exist, correct? (E.g. we can still load with a workflow that uses lambdas, as we always have?)
2. Do you think we should explicitly deprecate the old way, or leave it around forever?
3. Should we also update the helloworld examples, since they use lambdas? Or should we leave them as is, since they're not saving/loading models?
4. There are also many tests where we use UnaryLambdaTransformers but don't save models (e.g. BadFeatureZooTest). Should we change all of those too, or just the ones that result in saved models?

If you think we should do either of the last two items, I'm fine with them being in separate PRs since this one is already enormous.

tovbinm · 2019-06-14T16:42:47Z

Kudos to @wsuchy @Jauntbox for all the help!

tovbinm · 2019-06-14T16:49:55Z

@Jauntbox

1. Correct.
2. I think we should eventually deprecate the workflow.loadModel(path) call, just to keep things simple.
3. Yes, we will update them before the next release.
4. Some of the test changes were not necessary, though I made them anyway.

salesforce-cla · 2020-12-15T09:48:23Z

Thanks for the contribution! Before we can merge this, we need @wsuchy to sign the Salesforce.com Contributor License Agreement.

tovbinm added 2 commits April 9, 2019 15:20

Serialize function arguments for stages

b3c29c6

Merged with Chris branch

7f698c9

tovbinm requested a review from leahmcguire as a code owner April 9, 2019 22:45

tovbinm changed the title ~~Mt/function arguments merged~~ Serialize function arguments for stages Apr 9, 2019

tovbinm requested a review from wsuchy April 9, 2019 22:45

tovbinm added the work in progress label Apr 9, 2019

fixes

4bb8d15

wsuchy and others added 21 commits April 10, 2019 09:02

fixed DropIndicesByTransformerTest

3c99062

isModel has become lagacy so there is no point of testing it

674bd49

fixed scala stlye

0f285c3

fixed loading of old models

053e0c1

simplify anyval

8847d0e

Merge branch 'master' into mt/function-arguments-merged

8e3e912

Fix scaler transformer

7fec077

Merge branch 'mt/function-arguments-merged' of github.com:salesforce/TransmogrifAI into mt/function-arguments-merged

b1804ab

RichFeature fixes

fb10684

RichMapFeature changes

fb723e6

some serialiation fixes for language detector

227787b

cleanup

c7b92d6

RichTextFeature and others

4f448f1

some more fixes

2d9e53b

scaler test fix

5d3d18a

RichMap and RichTExt

0bb452c

Merge branch 'mt/function-arguments-merged' of github.com:salesforce/TransmogrifAI into mt/function-arguments-merged

e400cbc

Fixed scaler test

8c82fc6

Merge branch 'master' of github.com:salesforce/TransmogrifAI into mt/function-arguments-merged

cc88fd1

Merge branch 'mt/function-arguments-merged' of github.com:salesforce/TransmogrifAI into mt/function-arguments-merged

fcd01c7

cleanup

5afb52f

tovbinm requested review from Jauntbox, kinfaikan and gerashegalov May 30, 2019 00:40

sss

2cd4c77

tovbinm mentioned this pull request May 30, 2019

Model Load from a brand new workflow #75

Closed

Merge branch 'master' into mt/function-arguments-merged

12ec105

tovbinm requested a review from alexandrnikitin May 31, 2019 17:24

Merge branch 'master' into mt/function-arguments-merged

ed348c3

tovbinm changed the title ~~Towards better model serialization~~ Workflow independent model loading Jun 5, 2019

Merge branch 'master' into mt/function-arguments-merged

296fe8a

alexandrnikitin approved these changes Jun 11, 2019

Jauntbox reviewed Jun 13, 2019

features/src/test/scala/com/salesforce/op/stages/base/binary/BinaryTransformerTest.scala

Jauntbox reviewed Jun 13, 2019

Jauntbox approved these changes Jun 14, 2019

tovbinm merged commit 13ddb4f into master Jun 14, 2019

tovbinm deleted the mt/function-arguments-merged branch June 14, 2019 16:42

This was referenced Jun 14, 2019

Support Scala 2.12 #332

Open

Remove local runner + update docs #335

Merged

tovbinm mentioned this pull request Jul 10, 2019

Convert lambda functions into concrete classes to allow compatibility with Scala 2.11/2.12 #357

Merged

This was referenced Jul 10, 2019

0.6.0 Release #360

Closed

0.6.0 release #364

Merged

gerashegalov mentioned this pull request Sep 12, 2019

Upgrade helloworld to tmog 0.6.1 #406

Merged

gerashegalov added a commit that referenced this pull request Sep 13, 2019

Upgrade helloworld to tmog 0.6.1 (#406)

07fd316

- replace lambdas with concrete classes according to #274 - fix typos

salesforce-cla bot added the cla:signed label Sep 12, 2020

salesforce-cla bot added cla:missing and removed cla:signed labels Dec 15, 2020
