SyntaxNet fails with CUDA out of memory #173
Comments
I met the same error (`cuda_driver.cc:965 CUDA_ERROR_OUT_OF_MEMORY`) when running the distributed MNIST code.
Removed "GTX 1080" from the title, since this might be experienced with other cards.
@orionr were you able to make any progress on this? I don't have much experience running SyntaxNet on different GPUs, but if you figured out a solution it might be useful to others.
This issue can be fixed by configuring the `gpu_options.allow_growth` session option.
This seems to fix the problem for me!
Does the program continue in spite of the errors? I think the errors shown here are harmless. TensorFlow has its own BFC allocator: it asks the CUDA driver for a large chunk of memory and suballocates from it. If it needs more, it doubles the size of each request to the driver. When a request finally fails, it backpedals and starts asking for smaller amounts, eventually settling on the largest region it can successfully get. This would only be fatal if it fails to allocate a chunk as big as what was actually requested; normally the program would terminate itself at that point. If you are really running out of memory, you can try to reduce the batch_size. Note that many of these models were developed on GPUs with 12 GB of memory; if you run out of memory on a GPU with less, reducing the batch size could be the way to go.
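The grow-then-backpedal behavior described above can be sketched as a toy in plain Python. This is not TensorFlow's actual BFC allocator, just an illustration of the strategy: double the request until the driver refuses, then halve it and settle on the largest total that fits.

```python
def try_extend(request_mb, budget_mb, allocated_mb):
    """Toy stand-in for a driver allocation: succeeds only if it fits."""
    return allocated_mb + request_mb <= budget_mb

def grow_region(budget_mb, start_mb=256, min_mb=1):
    """Mimic the strategy described above (toy model only)."""
    allocated = 0
    request = start_mb
    # Growth phase: double the request each time the "driver" accepts it.
    while try_extend(request, budget_mb, allocated):
        allocated += request
        request *= 2
    # Backpedal phase: halve the failed request and keep taking whatever
    # still fits, down to a minimum request size.
    while request >= min_mb:
        if try_extend(request, budget_mb, allocated):
            allocated += request
        request //= 2
    return allocated

# With an 8 GB "card", the toy allocator ends up owning the whole budget.
print(grow_region(budget_mb=8192))
```

The errors in the log correspond to the refused requests during the backpedal phase, which is why they can look alarming while being harmless.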
In my case the program did not continue. It crashed when it tried to allocate more than the 12 GB of my Titan X. I think there is a bug somewhere: it thinks it ran out of memory and keeps trying to allocate more and more. And somehow the `allow_growth` option fixed it for me (CUDA 7.5, cuDNN 5 on OS X). I'm pretty sure 12 GB is more than enough for simply running the demo.sh script of PMCPF.
Thanks Boris. I don't have access to the machine until next week but I'll try it then.
@orionr Hi, I have built SyntaxNet successfully, but it seems to run on the CPU rather than the GPU. Could you tell me how to make it work on the GPU?
As a note, after updating
@todtom - You'll want to run
I am having the same error as described by @orionr in the thread post. I have Ubuntu 15.10, CUDA 7.5, cuDNN 4.0.7, and I was trying to build SyntaxNet from up-to-date models git repos with bazel 0.2.2b as described in #248 by @David-Ba. I also tried various other versions of bazel and cuDNN 5, but got the same error. It appears that I did not manage to implement the solution proposed here by @borisstock successfully. I added `config.gpu_options.allow_growth = True` to all the files containing other modifications of `config.gpu_options`: tensorflow/tensorflow/python/framework/test_util.py, tensorflow/tensorflow/python/kernel_tests/sparse_xent_op_test.py, and tensorflow/tensorflow/python/kernel_tests/sparse_tensor_dense_matmul_op_test.py. It seems though that I missed something essential. Could @orionr, @borisstock, or anyone else who managed to solve this problem please specify where exactly `config.gpu_options.allow_growth = True` should be added?
I actually didn't need to use
@orionr, thank you for your quick response. Yes, I always perform bazel clean before rebuilding. I also tried removing and downloading fresh
@borisstock, @calberti, @orionr I am not sure if you are the right people to ask (if you are not, I am sorry for disturbing you), but should I reopen this issue or maybe open a new one?
In models/syntaxnet/syntaxnet/parser_eval.py, I made this change and it worked
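The exact change isn't quoted in this comment, but based on the rest of the thread it was presumably enabling `allow_growth` on the session config where parser_eval.py creates its `tf.Session`. A minimal sketch using the TensorFlow 1.x graph-mode API of that era (the surrounding parser code is assumed, not reproduced):

```python
import tensorflow as tf

# Assumption: the fix is to pass a ConfigProto with allow_growth=True
# wherever parser_eval.py constructs its session. With allow_growth on,
# TensorFlow starts with a small GPU memory region and grows it on demand
# instead of grabbing (nearly) all device memory up front.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

with tf.Session(config=config) as sess:
    ...  # run the parser graph as before
```

This is a session configuration fragment, not a drop-in patch; the important part is that the `ConfigProto` reaches every `tf.Session(...)` call in the script.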
I'm having the same issue and don't know where to put the value.
Hi,
I'm running SyntaxNet on Ubuntu 16.04 with TensorFlow and models both built from the git master branches. Most of the models are working for me, but SyntaxNet fails with a CUDA out of memory error even though the card has 8 GB total and nothing else is using those resources. Note that I'm on CUDA 8.0 RC here, but I doubt it makes a difference. Output is as follows.
It also seems weird that SyntaxNet requires the tensorflow submodule, since I've actually checked out all of that (including dependencies) and built it in a different location. It would be nice if that weren't needed, but it's not a big deal.
Any thoughts out there? Much appreciated.