Resource exhausted: OOM when allocating tensor #10
Comments
Hi! This is kind of weird because the default batch size is not that large. Reducing the batch size might help.
Thank you for your reply.
It loads all of train_data into the feed_dict. In addition, when I use nvidia-smi to check how the GPU memory is being used while the code runs, my GPUs use almost all of their memory, as shown below:
+-----------------------------------------------------------------------------+
How can I solve this problem, please? train_data size: 14747
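A likely cause is that the whole dataset is fed through the graph in a single feed_dict, so the largest intermediate tensor scales with the dataset size. A minimal sketch of batched evaluation in TensorFlow 1.x (the placeholder and op names `x`, `y`, `loss` are hypothetical, not from this repo):

```python
import numpy as np

def evaluate_in_batches(sess, loss, x, y, data_x, data_y, batch_size=128):
    # Run the eval op on fixed-size slices of the data so the largest
    # intermediate tensor scales with batch_size, not the dataset size.
    losses = []
    for start in range(0, len(data_x), batch_size):
        end = start + batch_size
        losses.append(sess.run(loss, feed_dict={x: data_x[start:end],
                                                y: data_y[start:end]}))
    return float(np.mean(losses))
```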
I tried to use:
Although the GPU memory use is lower, it still crashes with OOM when running eval. So I tried to train on one GPU and eval on the other, using the code below:
with tf.device('/gpu:0'):
But it did not work; GPU 0 is still used for eval, showing "W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1318] OP_REQUIRES failed at conv_ops.cc:673 : Resource exhausted: OOM when allocating tensor with shape[442410,128,9,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc"
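Note that tf.device() only controls op placement; by default TensorFlow 1.x still initializes and reserves memory on every visible GPU. A minimal sketch (assumed setup, not the repo's own code) of keeping the eval process off GPU 0 entirely and only allocating memory on demand:

```python
import os

# Hide GPU 0 from this (eval) process before TensorFlow is imported,
# so TF cannot initialize or reserve memory on it.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True   # grow allocations on demand instead of reserving all memory
config.allow_soft_placement = True       # fall back if an op cannot be placed as requested

with tf.Session(config=config) as sess:
    # ... build and run the eval graph here ...
    pass
```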
How did you solve this problem in the end? Thanks
Dr. Wang, thank you so much for your wonderful work.
When I run the last step, python main.py, an error occurred:
2019-07-15 17:38:14.500279: W T:\src\github\tensorflow\tensorflow\core\framework\op_kernel.cc:1318] OP_REQUIRES failed at conv_ops.cc:673 : Resource exhausted: OOM when allocating tensor with shape[442410,128,9,1] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
My gpu information is :
2019-07-15 17:37:54.354919: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:05:00.0
totalMemory: 11.00GiB freeMemory: 9.11GiB
2019-07-15 17:37:54.538331: I T:\src\github\tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 1 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.683
pciBusID: 0000:09:00.0
totalMemory: 11.00GiB freeMemory: 9.11GiB
Other information printed before the error occurred:
2019-07-15 17:38:14.475725: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:674] 1 Chunks of size 2265139200 totalling 2.11GiB
2019-07-15 17:38:14.480071: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:678] Sum Total of in-use chunks: 5.72GiB
2019-07-15 17:38:14.484854: I T:\src\github\tensorflow\tensorflow\core\common_runtime\bfc_allocator.cc:680] Stats:
Limit: 9244818801
InUse: 6142976256
MaxInUse: 6369439488
NumAllocs: 16017
MaxAllocSize: 2265139200
I wonder how I can solve this problem. Thank you.