
Lazily initialize CUDA devices (take 2) #613

Merged
merged 1 commit into torch:master on Nov 26, 2016

Conversation

colesbury (Contributor)

Previously, cutorch would initialize every CUDA device and enable P2P
access between all pairs. This slows down start-up, especially with 8
devices. Now, THCudaInit does not initialize any devices and P2P access
is enabled lazily. Setting the random number generator seed also does
not initialize the device until random numbers are actually used.
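A minimal sketch of the lazy P2P idea, assuming a small cached table indexed by device pair (the names here are illustrative, not the actual THC code): peer access is only enabled the first time a cross-device operation needs it, and the result is cached so `cudaDeviceEnablePeerAccess` runs at most once per pair.

```c
#include <cuda_runtime.h>

#define MAX_DEVICES 64
/* 0 = unknown, 1 = enabled, -1 = unsupported (zero-initialized at load) */
static int p2pEnabled[MAX_DEVICES][MAX_DEVICES];

int maybeEnablePeerAccess(int dev, int peer)
{
  if (dev == peer) return 1;
  if (p2pEnabled[dev][peer] != 0) return p2pEnabled[dev][peer] > 0;

  int canAccess = 0;
  cudaDeviceCanAccessPeer(&canAccess, dev, peer);
  if (canAccess) {
    int prev;
    cudaGetDevice(&prev);
    cudaSetDevice(dev);
    cudaError_t err = cudaDeviceEnablePeerAccess(peer, 0);
    if (err == cudaErrorPeerAccessAlreadyEnabled) {
      cudaGetLastError();  /* clear the sticky error; already-enabled is fine */
      err = cudaSuccess;
    }
    cudaSetDevice(prev);
    p2pEnabled[dev][peer] = (err == cudaSuccess) ? 1 : -1;
  } else {
    p2pEnabled[dev][peer] = -1;
  }
  return p2pEnabled[dev][peer] > 0;
}
```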

I've updated the Storage copy code to delegate to the Tensor copy code. This
fixes the issues with P2P not being enabled and adds proper inter-GPU
synchronization (see #612).
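The inter-GPU synchronization can be pictured roughly like this: an event recorded on the source device's stream gates the destination device's stream before the peer copy is issued. This is a hedged sketch only; the function and parameter names below are illustrative and not the THC API.

```c
#include <cuda_runtime.h>

void copyAcrossDevices(void* dst, int dstDev, cudaStream_t dstStream,
                       const void* src, int srcDev, cudaStream_t srcStream,
                       size_t bytes)
{
  cudaEvent_t ready;

  /* Mark the point on the source stream after which the data is valid. */
  cudaSetDevice(srcDev);
  cudaEventCreateWithFlags(&ready, cudaEventDisableTiming);
  cudaEventRecord(ready, srcStream);

  /* Make the destination stream wait for the source before copying. */
  cudaSetDevice(dstDev);
  cudaStreamWaitEvent(dstStream, ready, 0);
  cudaMemcpyPeerAsync(dst, dstDev, src, srcDev, bytes, dstStream);

  /* Destruction is deferred by the runtime until the event completes. */
  cudaEventDestroy(ready);
}
```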

soumith merged commit e2051b6 into torch:master on Nov 26, 2016
soumith (Member) commented Nov 26, 2016

thanks!
