Lazily initialize CUDA devices (take 2) #613

colesbury · 2016-11-25T23:53:08Z

Previously, cutorch would initialize every CUDA device and enable P2P
access between all pairs. This slows down start-up, especially with 8
devices. Now, THCudaInit does not initialize any devices and P2P access
is enabled lazily. Setting the random number generator seed also does
not initialize the device until random numbers are actually used.
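
For readers unfamiliar with the pattern, here is a rough sketch of the "initialize on first use" idea in plain C. It is not the actual cutorch code: `LazyCudaState`, `lazy_state_init`, `ensure_device`, and `ensure_p2p` are hypothetical names, and the CUDA work is replaced by counters so the sketch runs without a GPU. In real code, the marked spots are roughly where calls such as `cudaSetDevice`, `cudaDeviceCanAccessPeer`, and `cudaDeviceEnablePeerAccess` would go.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_DEVICES 8

/* Hypothetical bookkeeping, not the real THCState layout. */
typedef struct {
    int  num_devices;
    bool device_ready[MAX_DEVICES];             /* per-device setup done? */
    bool p2p_enabled[MAX_DEVICES][MAX_DEVICES]; /* [src][dst], one direction */
    int  device_inits;                          /* counts simulated device setups */
    int  p2p_enables;                           /* counts simulated P2P enables */
} LazyCudaState;

/* Start-up only records the device count; no device is touched here. */
void lazy_state_init(LazyCudaState *s, int num_devices) {
    assert(num_devices >= 0 && num_devices <= MAX_DEVICES);
    memset(s, 0, sizeof(*s));
    s->num_devices = num_devices;
}

/* Set up a device the first time something actually needs it. */
void ensure_device(LazyCudaState *s, int dev) {
    assert(dev >= 0 && dev < s->num_devices);
    if (!s->device_ready[dev]) {
        /* Real code would select the device and create its resources here. */
        s->device_ready[dev] = true;
        s->device_inits++;
    }
}

/* Enable access from src to dst only when a copy between them is requested. */
void ensure_p2p(LazyCudaState *s, int src, int dst) {
    ensure_device(s, src);
    ensure_device(s, dst);
    if (src == dst || s->p2p_enabled[src][dst]) {
        return; /* nothing to do, or already enabled earlier */
    }
    /* Real code would check that peer access is possible, then enable it. */
    s->p2p_enabled[src][dst] = true;
    s->p2p_enables++;
}
```

With 8 devices, calling `lazy_state_init` does no per-device work in this sketch. A later `ensure_p2p(&s, 0, 3)` sets up only devices 0 and 3 and enables a single direction, instead of paying for all pairs up front.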

I've updated the Storage copy code to delegate to the Tensor copy code. This
fixes the issues with P2P not being enabled and adds proper inter-GPU
synchronization (see #612).

soumith · 2016-11-26T05:32:30Z

thanks!

soumith merged commit e2051b6 into torch:master Nov 26, 2016
