v2.0.1

milakov released this 23 Nov 21:39


0586d45

Multiple improvements to reduce total buffer sizes, which allows running larger chunks (3x larger for ImageNet):
- Taking buffer sizes into account when coloring graph
- Maxout, ReLU, and MaxSubsampling layers consume much less memory in CUDA backend
- Action graph is optimized to exclude unnecessary concurrency
Migrated to cuDNN v3
Reusing CUDA streams
Allocating a chunk of memory for fixed working buffers, which improves performance
A few bug fixes
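
To illustrate the idea behind taking buffer sizes into account when coloring the graph, here is a minimal sketch and not the library's actual code. It assumes each logical buffer has a known size and a list of buffers it is live at the same time as; buffers that get the same color share one allocation, so the total is the sum of the largest buffer in each color. The names `BufferPlan` and `plan_buffers` are hypothetical, and the largest-first greedy order is an assumed heuristic chosen for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical types for illustration; these are not the library's API.
struct BufferPlan
{
    std::vector<int> color_of_buffer;      // shared allocation index per buffer
    std::vector<std::size_t> color_sizes;  // size of each shared allocation
    std::size_t total_size = 0;            // sum of color_sizes
};

// sizes[i]     : bytes needed by logical buffer i
// conflicts[i] : buffers that are live at the same time as buffer i
BufferPlan plan_buffers(
    const std::vector<std::size_t>& sizes,
    const std::vector<std::vector<int>>& conflicts)
{
    assert(sizes.size() == conflicts.size());
    const std::size_t n = sizes.size();

    // Visit the largest buffers first, so every later buffer already
    // fits into any allocation it is allowed to share.
    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
        [&](int a, int b) { return sizes[a] > sizes[b]; });

    BufferPlan plan;
    plan.color_of_buffer.assign(n, -1);
    for (int b : order)
    {
        // Mark colors already taken by conflicting buffers.
        std::vector<bool> used(plan.color_sizes.size(), false);
        for (int other : conflicts[b])
            if (plan.color_of_buffer[other] >= 0)
                used[plan.color_of_buffer[other]] = true;

        // Reuse the first free color; sharing it costs no extra memory.
        int chosen = -1;
        for (std::size_t c = 0; c < used.size(); ++c)
            if (!used[c]) { chosen = static_cast<int>(c); break; }

        // Otherwise open a new shared allocation sized for this buffer.
        if (chosen < 0)
        {
            chosen = static_cast<int>(plan.color_sizes.size());
            plan.color_sizes.push_back(sizes[b]);
            plan.total_size += sizes[b];
        }
        plan.color_of_buffer[b] = chosen;
    }
    return plan;
}
```

For a chain of four buffers of 100, 10, 100 and 10 bytes in which only neighbours conflict, this sketch places the two large buffers in one allocation and the two small ones in another, for 110 bytes instead of 220.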
