Bounded-memory local training #89

avibryant · 2016-03-27T01:02:25Z

Some work from the plane, ?r @Striation @tixxit

This has the same goal as #77 but is deliberately worse code (for now) in the interests of being less invasive and getting merged quickly. It completely avoids making any useful abstractions and instead just provides the most direct implementation of a single-node, single-pass-per-expansion local trainer.

The only interesting thing it does is this: streaming over the training data avoids using O(training set) memory, but expanding an entire level at once would still be an O(2^depth * features) problem (statistics for every feature at each of up to 2^depth leaves). So expand takes a parameter for the maximum number of leaves to expand per tree in any one pass, and randomly picks which ones to expand (using something like reservoir sampling to get a uniform sample of the leaves that don't yet meet the stopping criteria). This lets you trade speed for memory: a lower cap forces more passes over the data but bounds the memory used in each one.
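A minimal Scala sketch of the per-tree cap, assuming a hypothetical `Leaf` type carrying an example count and a hypothetical stopping rule of `count < minCount` (neither is taken from the PR). It uses classic reservoir sampling (Algorithm R), which draws a uniform sample of at most k expandable leaves in a single pass with O(k) memory.

```scala
import scala.collection.mutable.ArrayBuffer
import scala.util.Random

// Hypothetical sketch (not the PR's code): cap how many leaves per tree are
// expanded in one pass by drawing a uniform sample of the expandable leaves.
object LeafSampling {
  // Illustrative leaf: which tree it belongs to, its position, and how many
  // training examples reached it. All names here are assumptions.
  final case class Leaf(treeId: Int, index: Int, count: Long)

  // Classic reservoir sampling (Algorithm R): one pass over `items`,
  // O(k) memory, and every item ends up kept with equal probability.
  def reservoir[A](items: Iterator[A], k: Int, rng: Random): Vector[A] = {
    require(k >= 0, "k must be non-negative")
    val kept = ArrayBuffer.empty[A]
    var seen = 0
    items.foreach { item =>
      seen += 1
      if (kept.size < k) kept += item
      else if (k > 0) {
        // Keep the new item with probability k / seen by overwriting a random slot.
        val j = rng.nextInt(seen)
        if (j < k) kept(j) = item
      }
    }
    kept.toVector
  }

  // For each tree, pick at most `maxLeavesPerTree` leaves that have not hit the
  // assumed stopping criterion (fewer than `minCount` examples).
  def leavesToExpand(
      leaves: Seq[Leaf],
      maxLeavesPerTree: Int,
      minCount: Long,
      rng: Random
  ): Map[Int, Vector[Leaf]] =
    leaves.groupBy(_.treeId).map { case (treeId, treeLeaves) =>
      val expandable = treeLeaves.iterator.filter(_.count >= minCount)
      treeId -> reservoir(expandable, maxLeavesPerTree, rng)
    }
}
```

Lowering `maxLeavesPerTree` shrinks the state accumulated in each pass, at the cost of needing more passes to finish expanding a level.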

There's still the need to keep all of the trees in memory at once, so it's not truly constant memory, but that's harder to avoid. If it becomes a problem we can look at using bonsai representations even during training...

erik-stripe · 2016-03-28T14:47:30Z

brushfire-core/src/main/scala/com/stripe/brushfire/local/Trainer.scala

+  def containsKey(key: A): Boolean = randValue(key) <= threshold
+  def update(key: A, value: B) {
+    if(containsKey(key))
+      mapValues += key -> value


Since we're just mutating these structures in place, why don't we use mutable.Map for better performance?
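A rough sketch of what the suggestion could look like: a sampled map backed by `scala.collection.mutable.HashMap`, updated in place. The class name, the per-key random value (a `Random` seeded from the key's hash), and the bottom-k eviction through a `mutable.PriorityQueue` are all assumptions for illustration; the diff above only shows the `randValue(key) <= threshold` admission check.

```scala
import scala.collection.mutable
import scala.util.Random

// Hypothetical bottom-k sampled map: keeps at most `capacity` keys, namely the
// ones with the smallest per-key pseudo-random value. Not the code under review.
final class SampledMap[A, B](capacity: Int, seed: Long) {
  require(capacity > 0, "capacity must be positive")

  // Values are stored and overwritten in place.
  private val values = mutable.HashMap.empty[A, B]
  // Max-heap on the random value, so the next key to evict is on top.
  private val heap =
    mutable.PriorityQueue.empty[(Long, A)](Ordering.by[(Long, A), Long](_._1))

  // Deterministic pseudo-random value per key (assumed scheme).
  private def randValue(key: A): Long = new Random(seed ^ key.hashCode.toLong).nextLong()

  // True if `key` is already sampled, or would be admitted right now.
  def containsKey(key: A): Boolean =
    values.contains(key) || heap.size < capacity || randValue(key) < heap.head._1

  // Overwrites an existing entry; otherwise admits the key if it qualifies,
  // evicting the key with the largest random value when over capacity.
  def update(key: A, value: B): Unit =
    if (values.contains(key)) values(key) = value
    else if (containsKey(key)) {
      values(key) = value
      heap.enqueue((randValue(key), key))
      if (heap.size > capacity) {
        val (_, evicted) = heap.dequeue()
        values -= evicted
      }
    }

  def get(key: A): Option[B] = values.get(key)
  def size: Int = values.size
  def toMap: Map[A, B] = values.toMap
}
```

Because the per-key value is deterministic, a key that has been evicted is not readmitted later, so the map always holds the same sample the threshold check describes.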

avibryant · 2016-04-06T15:03:34Z

Per IRL conversation, I'm going to merge this and @Striation might do a follow-up PR with his mutable optimizations.

avi-stripe added 7 commits March 23, 2016 10:30

switch to using Lines for local example

dec1655

single-pass validate

48deb83

one pass updateTargets

732c107

one-pass expand builds but is buggy

fdd9500

make sure to use the new trees

605ca39

trying to limit the amount of RAM used per expansion

9e5e712

reservoir sampling of leaves works

ab147a8

erik-stripe reviewed Mar 28, 2016

avibryant merged commit 9903bc7 into master Apr 6, 2016

tixxit deleted the avi-local branch June 4, 2016 18:24
