error-minimizing pruner #24

avibryant · 2014-12-21T01:40:04Z

This is only really relevant to people building single-tree models, but you should be able to prune a single tree to minimize validation error.

avibryant · 2015-02-09T17:56:25Z

To elaborate a bit more on the steps that would be needed here:

You'd need to add a method to Tree that takes a Map[Int,T] holding the validation distribution for each leaf, keyed by leaf ID, as well as an Error and a Voter.
It should work its way recursively up the tree from the leaves, checking the following at each split:
- Let's call the leaf training distributions TL and TR (for left and right) and the leaf validation distributions VL and VR (though actually our code should generalize to any number of children)
- Let's use E(TL,VL) to denote the error object produced by comparing the training and validation distributions (this actually looks like error.create(tl, voter.combine(Some(vl)))).
- We have semigroups for both distributions and errors; let's use + to denote combining them.
- We want to prune these leaves iff E(TL + TR, VL + VR) <= E(TL,VL) + E(TR,VR), i.e. when the merged leaf does no worse on the validation data than the two leaves kept separate.
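
As a rough illustration of the check above, here is a minimal self-contained sketch. Everything in it is hypothetical and is not the project's actual Tree, Error, or Voter API: `Node`, `Leaf`, `Split`, `prune`, `Counts`, `plusCounts`, and `misclassified` are names made up for the example. The semigroups are passed in as plain functions, and the sketch assumes errors can be compared through an `Ordering`. It generalizes to any number of children by summing over all of them.

```scala
object PruneSketch {
  // Hypothetical stand-ins for the tree structure (not the project's Tree type).
  sealed trait Node[T]
  final case class Leaf[T](id: Int, target: T) extends Node[T]
  final case class Split[T](children: List[Node[T]]) extends Node[T]

  // error(t, v) plays the role of E(T, V); plusT and plusE play the role of the
  // distribution and error semigroups. emptyT is used for leaves that received
  // no validation data (an assumption of this sketch).
  def prune[T, E](
      root: Node[T],
      validation: Map[Int, T],
      error: (T, T) => E,
      plusT: (T, T) => T,
      plusE: (E, E) => E,
      emptyT: T
  )(implicit ord: Ordering[E]): Node[T] = {
    // Returns the pruned node plus, if that node is a leaf, its validation distribution.
    def go(node: Node[T]): (Node[T], Option[T]) = node match {
      case leaf @ Leaf(id, _) =>
        (leaf, Some(validation.getOrElse(id, emptyT)))
      case Split(children) =>
        val pruned = children.map(go)
        val leaves = pruned.collect { case (Leaf(id, t), Some(v)) => (id, t, v) }
        val unchanged: (Node[T], Option[T]) = (Split(pruned.map(_._1)), None)
        // Only a split whose children are all leaves is a pruning candidate.
        if (leaves.isEmpty || leaves.size != pruned.size) unchanged
        else {
          val mergedT = leaves.map(_._2).reduce(plusT)
          val mergedV = leaves.map(_._3).reduce(plusT)
          val mergedErr = error(mergedT, mergedV)
          val separateErr = leaves.map { case (_, t, v) => error(t, v) }.reduce(plusE)
          // Prune iff E(sum of T, sum of V) <= sum of E(T_i, V_i).
          if (ord.lteq(mergedErr, separateErr))
            (Leaf(leaves.head._1, mergedT), Some(mergedV)) // reuses the first child's ID
          else unchanged
        }
    }
    go(root)._1
  }

  // Example instantiation: class counts as distributions, and the number of
  // validation instances that disagree with the training majority as the error.
  type Counts = Map[String, Long]

  def plusCounts(a: Counts, b: Counts): Counts =
    (a.keySet ++ b.keySet).map(k => k -> (a.getOrElse(k, 0L) + b.getOrElse(k, 0L))).toMap

  def misclassified(train: Counts, valid: Counts): Long =
    if (train.isEmpty) valid.values.sum
    else valid.values.sum - valid.getOrElse(train.maxBy(_._2)._1, 0L)

  def pruneCounts(root: Node[Counts], validation: Map[Int, Counts]): Node[Counts] =
    prune[Counts, Long](root, validation, misclassified, plusCounts, _ + _, Map.empty)
}
```

Because a merged leaf carries its combined validation distribution back up, a parent whose children have all collapsed into leaves becomes a candidate in the same pass. Reusing the first child's ID for the merged leaf is an arbitrary choice made for the sketch.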
Once we have this method on Tree, we want a method on Trainer that will construct the Map[Int,T] from the trainingData for each tree, and then transform the trees using the prune method.
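
For the Trainer side, a minimal sketch of building the Map[Int,T] might look like the following. It assumes the validation instances are available as an in-memory collection of (features, target) pairs and that some function can route features to a leaf ID; `validationByLeaf` and `leafIdFor` are hypothetical names, not existing methods on Trainer or Tree.

```scala
object ValidationMapSketch {
  // Hypothetical helper: sum the validation targets that land in each leaf.
  // `plus` stands in for the semigroup on distributions.
  def validationByLeaf[K, T](
      instances: Iterable[(K, T)],
      leafIdFor: K => Option[Int],
      plus: (T, T) => T
  ): Map[Int, T] =
    instances.foldLeft(Map.empty[Int, T]) { case (acc, (features, target)) =>
      leafIdFor(features) match {
        // Combine with whatever has already been accumulated for this leaf.
        case Some(id) => acc.updated(id, acc.get(id).fold(target)(plus(_, target)))
        // Instances that reach no leaf are skipped.
        case None => acc
      }
    }
}
```

The resulting map can then be handed to the prune method for each tree.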

roban · 2015-02-26T19:35:16Z

In progress at #36

roban · 2015-02-26T22:46:47Z

Closed by #36

avibryant mentioned this issue Dec 29, 2014

per-node error output #25

Open

roban self-assigned this Feb 19, 2015

roban closed this as completed Feb 26, 2015
