
SyntaxNet fails with CUDA out of memory #173

Closed
orionr opened this issue Jun 2, 2016 · 16 comments
Assignees
Labels
stat:awaiting response Waiting on input from the contributor

Comments

@orionr
Contributor

orionr commented Jun 2, 2016

SyntaxNet

I'm running on Ubuntu 16.04 with TensorFlow and models both built from the git master branches. Most of the models are working for me, but SyntaxNet fails with a CUDA out-of-memory error even though the card has 8 GB total and nothing else is using those resources. Note that I'm on CUDA 8.0 RC here, but I doubt it makes a difference.

Output is as follows:

~/git/models/syntaxnet$ echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.so locally
...
I syntaxnet/term_frequency_map.cc:101] Loaded 49 terms from syntaxnet/models/parsey_mcparseface/tag-map.
I syntaxnet/term_frequency_map.cc:101] Loaded 64036 terms from syntaxnet/models/parsey_mcparseface/word-map.
I tensorflow/core/common_runtime/gpu/gpu_device.cc:783] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:01:00.0)
INFO:tensorflow:Building training network with parameters: feature_sizes: [12 20 20] domain_sizes: [   49    51 64038]
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 6.80G (7304685312 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 6.12G (6574216704 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 5.51G (5916794880 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 4.96G (5325115392 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 4.46G (4792603648 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 4.02G (4313342976 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 3.62G (3882008576 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
E tensorflow/stream_executor/cuda/cuda_driver.cc:965] failed to allocate 3.25G (3493807616 bytes) from device: CUDA_ERROR_OUT_OF_MEMORY
...
I syntaxnet/embedding_feature_extractor.cc:35] Features: input.digit input.hyphen; input.prefix(length="2") input(1).prefix(length="2") input(2).prefix(length="2") input(3).prefix(length="2") input(-1).prefix(length="2") input(-2).prefix(length="2") input(-3).prefix(length="2") input(-4).prefix(length="2"); input.prefix(length="3") input(1).prefix(length="3") input(2).prefix(length="3") input(3).prefix(length="3") input(-1).prefix(length="3") input(-2).prefix(length="3") input(-3).prefix(length="3") input(-4).prefix(length="3"); input.suffix(length="2") input(1).suffix(length="2") input(2).suffix(length="2") input(3).suffix(length="2") input(-1).suffix(length="2") input(-2).suffix(length="2") input(-3).suffix(length="2") input(-4).suffix(length="2"); input.suffix(length="3") input(1).suffix(length="3") input(2).suffix(length="3") input(3).suffix(length="3") input(-1).suffix(length="3") input(-2).suffix(length="3") input(-3).suffix(length="3") input(-4).suffix(length="3"); input.token.word input(1).token.word input(2).token.word input(3).token.word input(-1).token.word input(-2).token.word input(-3).token.word input(-4).token.word 
I syntaxnet/embedding_feature_extractor.cc:36] Embedding names: other;prefix2;prefix3;suffix2;suffix3;words
I syntaxnet/embedding_feature_extractor.cc:37] Embedding dims: 8;16;16;16;16;64
I syntaxnet/term_frequency_map.cc:101] Loaded 64036 terms from syntaxnet/models/parsey_mcparseface/word-map.
I syntaxnet/term_frequency_map.cc:101] Loaded 64036 terms from syntaxnet/models/parsey_mcparseface/word-map.
INFO:tensorflow:Total processed documents: 0
INFO:tensorflow:Total processed documents: 0
INFO:tensorflow:Read 0 documents

It also seems odd that SyntaxNet requires the tensorflow submodule, since I've already checked out all of that (including dependencies) and built it in a different location. It would be nice if that weren't needed, but it's not a big deal.

Any thoughts out there? Much appreciated.

@s0okiym

s0okiym commented Jun 8, 2016

I hit the same error, cuda_driver.cc:965 CUDA_ERROR_OUT_OF_MEMORY, when running the distributed MNIST code.

@orionr orionr changed the title SyntaxNet fails with CUDA out of memory on GTX 1080 SyntaxNet fails with CUDA out of memory Jun 9, 2016
@orionr
Contributor Author

orionr commented Jun 9, 2016

Removed "GTX 1080" from the title, since this might be experienced with other cards.

@calberti
Contributor

@orionr were you able to make any progress on this? I don't have much experience running SyntaxNet on different GPUs, but if you figured out a solution that might be useful to others.

@borisstock

This issue can be fixed by configuring the tf.Session with the following:

config.gpu_options.allow_growth = True

This seems to fix the problem for me!
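For context, here is a minimal sketch of where that option lives in the TF1-style graph-mode API (assuming TensorFlow 1.x; the config object has to be passed in when the session is created, setting it afterwards has no effect):

```python
import tensorflow as tf

# Build a session config whose GPU allocator grows on demand instead of
# reserving (nearly) all device memory up front at session creation.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True

# The option only takes effect for sessions created with this config:
sess = tf.Session(config=config)
```

Note that allow_growth trades a one-time up-front reservation for incremental allocations, which can fragment device memory over a long run; it mainly helps when the initial near-full-device grab is what fails.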

@zheng-xq

Does the program continue in spite of the errors? I think the errors shown here are harmless.

TensorFlow has its own BFC allocator. It asks the CUDA driver for a large chunk of memory and suballocates from it, doubling the size of each successive request as it grows. When a request fails, it backpedals, asking for progressively smaller amounts, and eventually settles on the largest allocation it can successfully get.

This would only be fatal if the model actually needs more memory than the largest chunk the allocator managed to get; normally the program would terminate itself at that point.

If you are really running out of memory, you can try to reduce the batch_size. Note that many of these models were developed on GPUs with 12 GB of memory. If a model runs out of memory on a GPU with less, reducing the batch size could be the way to go.
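The back-off described above is visible in the log at the top of the thread: each failed request is retried at roughly 90% of the previous size, rounded down to a 256-byte boundary. A small sketch that reproduces the logged byte counts (my reconstruction from the log lines, not TensorFlow's actual allocator code):

```python
def backoff_sizes(first_request, attempts):
    """Reproduce the shrinking allocation requests seen in the log.

    Each retry asks for 90% of the previous request, rounded down to a
    256-byte boundary (both factors inferred from the logged byte counts).
    """
    sizes = [first_request]
    for _ in range(attempts - 1):
        smaller = sizes[-1] * 9 // 10          # back off to ~90% of the last try
        sizes.append(smaller // 256 * 256)     # align down to 256 bytes
    return sizes

# The first failed request in the log was 7304685312 bytes (6.80G):
print(backoff_sizes(7304685312, 4))
# [7304685312, 6574216704, 5916794880, 5325115392]
```

Those four values match the first four CUDA_ERROR_OUT_OF_MEMORY lines in the log (6.80G, 6.12G, 5.51G, 4.96G), which is why the errors are noisy but usually harmless: the allocator is just probing for the largest chunk it can get.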

@borisstock

borisstock commented Jun 28, 2016

In my case the program did not continue. It crashed when it tried to allocate more than the 12 GB of my Titan X. I think there is an error somewhere: it decides it has run out of memory and then tries to allocate more and more. Somehow the "allow_growth" option fixed it for me (CUDA 7.5, cuDNN 5 on OS X). And I'm pretty sure 12 GB is more than enough for simply running the "demo.sh" script of Parsey McParseface.

@orionr
Contributor Author

orionr commented Jun 28, 2016

Thanks Boris. I don't have access to the machine until next week but I'll try it then.


@aselle aselle removed the triaged label Jul 28, 2016
@gunan gunan added the stat:awaiting response Waiting on input from the contributor label Aug 15, 2016
@todtom

todtom commented Aug 21, 2016

@orionr Hi, I have built SyntaxNet successfully, but it seems to run on the CPU rather than the GPU. Could you tell me how to make it work on the GPU?

@orionr
Contributor Author

orionr commented Aug 25, 2016

As a note, after updating the tensorflow and models git repos and downgrading bazel to 0.2.2b, everything works perfectly!

~/git/models/syntaxnet$ echo 'Bob brought the pizza to Alice.' | syntaxnet/demo.sh
Input: Bob brought the pizza to Alice .
Parse:
brought VBD ROOT
 +-- Bob NNP nsubj
 +-- pizza NN dobj
 |   +-- the DT det
 +-- to IN prep
 |   +-- Alice NNP pobj
 +-- . . punct

@todtom - You'll want to run ./configure inside the models/syntaxnet/tensorflow/ directory. Also make sure you have an NVIDIA card with modern CUDA capabilities. Good luck.

@Shnurre

Shnurre commented Sep 8, 2016

I am having the same error as is described by @orionr in the thread post.

I have Ubuntu 15.10, CUDA 7.5, and cuDNN 4.0.7, and I was trying to build SyntaxNet from the up-to-date models git repo with bazel 0.2.2b, as described in #248 by @David-Ba. I also tried various other versions of bazel, as well as cuDNN 5, but got the same error.
It should also be noted that SyntaxNet without GPU support builds correctly on my machine and works as intended.

It appears I did not manage to implement the solution proposed here by @borisstock successfully. I added config.gpu_options.allow_growth = True to all the files containing other modifications of config.gpu_options: tensorflow/tensorflow/python/framework/test_util.py, tensorflow/tensorflow/python/kernel_tests/sparse_xent_op_test.py, and tensorflow/tensorflow/python/kernel_tests/sparse_tensor_dense_matmul_op_test.py. It seems, though, that I missed something essential.

Could @orionr, @borisstock, or anyone else who managed to solve this problem please specify where exactly config.gpu_options.allow_growth = True should be added?

@orionr
Contributor Author

orionr commented Sep 8, 2016

I actually didn't need to use allow_growth = True after updating all of the git repos and downgrading bazel. @Shnurre - what GPU are you using? Also make sure you do a bazel clean before the rebuild. I even removed my _python_build directory inside tensorflow and recreated it each time just to be safe.

@Shnurre

Shnurre commented Sep 8, 2016

@orionr , thank you for your quick response.
I have a GTX 970, though I don't think this error is card-specific.

Yes, I always perform bazel clean before rebuilding. I also tried removing and downloading a fresh models repo, manually removing .cache/bazel, and completely reinstalling several versions of bazel, but nothing has worked for me so far.

@Shnurre

Shnurre commented Sep 15, 2016

@borisstock, @calberti, @orionr, I am not sure if you are the right people to ask (if not, I'm sorry for disturbing you), but should I reopen this issue or perhaps open a new one?
I am having exactly the same problem @orionr described here, but changing the bazel version and updating the repos didn't help me.
I am still hoping that @borisstock, or anyone else who successfully implemented his solution, can clarify where it goes.

@utkrist

utkrist commented Mar 24, 2017

In models/syntaxnet/syntaxnet/parser_eval.py, I made this change and it worked:

gpu_opt = tf.GPUOptions(allow_growth=True)
with tf.Session(config=tf.ConfigProto(gpu_options=gpu_opt)) as sess:
    Eval(sess)

@irfan-zoefit

I'm having the same issue and don't know where to put the value:

config.gpu_options.allow_growth = True

Could you specify the file?

@zerodarkzone

Hi,
I keep getting the CUDA_OUT_OF_MEMORY error. I already tried the fix proposed here, but it doesn't work. I compiled with bazel 0.5.4.
