v2.0.1
- Multiple improvements to reduce total buffer sizes, allowing larger chunks to be run (3x for ImageNet):
    - Taking buffer sizes into account when coloring graph
    - Maxout, ReLU, and MaxSubsampling layers consume much less memory in CUDA backend
    - Action graph is optimized to exclude unnecessary concurrency
- Migrated to cuDNN v3
- Reusing CUDA streams
- Allocating a chunk of memory for fixed working buffers, which improves performance
- A few bug fixes
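
To illustrate the item about taking buffer sizes into account when coloring the graph, here is a minimal Python sketch. It is not the library's actual implementation; the function name `pack_buffers`, the buffer names, the sizes, and the conflict pairs are all hypothetical. It assumes a simple greedy first-fit coloring in which buffers that are live at the same time conflict, and buffers with the same color share one allocation sized to the largest member.

```python
def pack_buffers(sizes, conflicts, by_size=True):
    """Greedily group buffers that may share memory (a graph coloring).

    sizes:     dict mapping buffer name -> size in bytes
    conflicts: set of frozenset({a, b}) pairs that are live at the same
               time and therefore must not share memory
    by_size:   if True, place the largest buffers first (size-aware);
               if False, use the dict's insertion order (size-unaware)
    Returns (groups, total_bytes); each group needs max(member sizes) bytes.
    """
    order = list(sizes)
    if by_size:
        order.sort(key=lambda name: (-sizes[name], name))

    groups = []  # each group is a list of buffer names sharing one allocation
    for name in order:
        for members in groups:
            # First fit: join the first group with no conflicting member.
            if not any(frozenset((name, m)) in conflicts for m in members):
                members.append(name)
                break
        else:
            groups.append([name])  # no compatible group, open a new one

    total = sum(max(sizes[m] for m in members) for members in groups)
    return groups, total


# Hypothetical example: two large and two small buffers.
sizes = {"s1": 10, "b1": 100, "s2": 10, "b2": 100}
conflicts = {frozenset(("s1", "b2")), frozenset(("s1", "s2"))}

_, naive_total = pack_buffers(sizes, conflicts, by_size=False)
groups, aware_total = pack_buffers(sizes, conflicts, by_size=True)
print(naive_total, aware_total)  # prints "200 110"
```

In this toy case the size-unaware order pairs each small buffer with a large one and ends up needing two 100-byte allocations, while placing large buffers first lets both large buffers share one allocation. Real savings depend on the network and on how conflicts are derived from the action graph.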