WIP: TrainingStep #77
base: master
Conversation
type V1
type V2

def prepare(trees: Map[Int, Tree[K,V,T]], instance: Instance[K,V,T]): Seq[((Int,Int,K1), V1)]
We should make a case class LeafId(forestIndex: Int, leafIndex: Int) or something - would probably be generally useful.
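A rough sketch of what that might look like against the prepare signature above; keying the result by (LeafId, K1) rather than the bare 3-tuple is an assumption:

```scala
// Hypothetical: name the (forest, leaf) pair instead of threading bare Ints around.
case class LeafId(forestIndex: Int, leafIndex: Int)

// prepare could then be keyed by (LeafId, K1) rather than (Int, Int, K1):
def prepare(trees: Map[Int, Tree[K, V, T]], instance: Instance[K, V, T]): Seq[((LeafId, K1), V1)]
```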
I really like the idea overall. Hard to tell if it is too overfit, but I agree that we need to start somewhere! I'm sort of giving some nit-picky comments to start, but will hopefully give some more useful ones as I understand the abstraction better.
val treeMap = trees.zipWithIndex.map{case (t,i) => i->t}.toMap
var sums1 = Map[(Int,Int,step.K1),step.V1]()

trainingData.foreach{instance =>
A bit of golf, but I think this should do roughly the same thing (including laziness):
val sums1 = MapAlgebra.rollupSum(trainingData.iterator.flatMap(step.prepare(treeMap, _)))
Wait - I think I misunderstood rollupSum. Something similar in spirit should exist though :\
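The combinator meant here might be Algebird's MapAlgebra.sumByKey, which sums values per key; a minimal sketch, assuming an implicit Semigroup[step.V1] is in scope:

```scala
import com.twitter.algebird.MapAlgebra

// Streaming per-key sum over the prepared pairs; using an iterator keeps it lazy.
// Assumes an implicit Semigroup[step.V1] is available for the summing.
val sums1: Map[(Int, Int, step.K1), step.V1] =
  MapAlgebra.sumByKey(trainingData.iterator.flatMap(step.prepare(treeMap, _)))
```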
BTW one thing I'm kinda grumpy about is the distinction between …
This is very WIP still, but I've gone forward with the …
attn @tixxit @non
The intent here is to capture the basic mechanics of various training steps - updateTargets, expand, prune, etc - in a way that can be reused in multiple execution environments (local, scalding, spark, ...).
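For concreteness, a hedged sketch of the shape such a step might take, pieced together from the fragments in this PR; only prepare's signature appears verbatim, so the trait name, the role of V2, and any later phases are assumptions:

```scala
// Hypothetical sketch; only `prepare` appears verbatim in this PR. The trait name,
// the role of V2, and any later aggregation phases are assumptions.
trait TrainingStep[K, V, T] {
  type K1 // secondary key within a (tree, leaf)
  type V1 // value summed per (treeIndex, leafIndex, K1)
  type V2 // value for a later phase (assumed)

  // Map a single instance to keyed contributions. The execution environment
  // (local, scalding, spark, ...) is responsible for summing them per key.
  def prepare(trees: Map[Int, Tree[K, V, T]], instance: Instance[K, V, T]): Seq[((Int, Int, K1), V1)]
}
```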
For now, this is all added directly to, and used only by, the Local trainer. The near-term impact is that the local trainer will stream over its input data the same way the distributed trainers do, rather than requiring it all to be loaded into memory. For this PR to be complete, we should move the training steps into their own module (maybe also doing #51) and refactor the scalding trainer to use them.
It's very possible that this is too much or too little abstraction - right now it seems a bit overfit to the needs of the specific training steps and platforms we support, and I suspect it will be brittle going forward. (In fact, featureImportance already doesn't work with this, though I'd argue we should move it to a TreeTraversal-based strategy, which would.) At the same time, I think some approach like this will be valuable going forward, and it's better to start down this path, even imperfectly.