OutOfRangeError in /data/io/read_tfrecord.py at line number 80 #5
Comments
I met the same problem. I think it's because the training process is not reusing the data: once the training step exceeds the number of training examples, this error occurs. Rewriting the data input code so that it repeats the dataset should fix it.
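If the pipeline really is exhausting the data after one pass, a minimal sketch of a filename queue that cycles indefinitely (assuming the TF 1.x queue-based input used here; the tfrecord path below is only a placeholder) would look like this:

```python
import tensorflow as tf

# Placeholder path: adjust to wherever your tfrecord actually lives.
tfrecord_path = '../data/tfrecord/train.tfrecord'

# num_epochs=None makes the queue cycle over the file forever, so the
# downstream batch queue is never closed for lack of elements.
filename_queue = tf.train.string_input_producer(
    [tfrecord_path], num_epochs=None, shuffle=True)

reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
```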
In fact, the code itself is fine; the error may be caused by an environment configuration problem. I have modified the code, please update.
It still produces the same error. Can you tell me what change you made?
@1991viet The author yangxue0827 has already modified the code, so the problem should be solved. You can look at the details of the code changes made on Jan 30 or around that time.
I have the same problem. Did anyone solve it? Thanks.
I am getting the same error. I have checked the tfrecord path; it seems to be correct, and the tfrecord creation didn't give any problem either. Is there a solution for this?
Has anyone solved this problem? I am also getting this error.
I am getting the same error too...
The same problem...
2018-12-13 07:07:10.108873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:05:00.0, compute capability: 6.1)
Caused by op u'get_batch/batch', defined at:
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
It seems to be the same problem.
I found out that the reason might be some of the xml files. Some images have no gtbox, and we have to skip those when converting the data to tfrecord (see the sketch below)!
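For reference, a small sketch of such a check (the helper name and loop variables are hypothetical, not the repo's actual code):

```python
import xml.etree.ElementTree as ET

def has_gtbox(xml_path):
    # True if the annotation file contains at least one <object> entry.
    return len(ET.parse(xml_path).getroot().findall('object')) > 0

# Inside the conversion loop in data/io/convert_data_to_tfrecord.py (sketch):
# for img_name, xml_path in samples:
#     if not has_gtbox(xml_path):
#         continue  # skip images without any ground-truth box
#     ...  # build and write the tf.train.Example as usual
```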
I ran into the same issue when I brought in my own dataset.
I solved the problem by ensuring the correctness of the original dataset. I guess any data error (including data path, data format, data shape, etc.) can cause this issue, but that is just a guess.
In my case, the data format was wrong: my .xml files record the bndbox as (Xmin, Xmax, Ymin, Ymax).
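A minimal sketch of reading the coordinates by tag name, so they come out as (xmin, ymin, xmax, ymax) regardless of the order the tags appear in the XML (the function name is hypothetical):

```python
import xml.etree.ElementTree as ET

def read_boxes(xml_path):
    # Return boxes as (xmin, ymin, xmax, ymax), looked up by tag name.
    boxes = []
    for obj in ET.parse(xml_path).getroot().findall('object'):
        bnd = obj.find('bndbox')
        boxes.append(tuple(float(bnd.find(tag).text)
                           for tag in ('xmin', 'ymin', 'xmax', 'ymax')))
    return boxes
```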
I think I found what causes the problem. In my case, I encountered the same error after I removed some training examples by applying filters in data/io/convert_data_to_tfrecord.py. It looks like you have to close the tfrecord writer handle after you finish the conversion. Just add writer.close() at the end of data/io/convert_data_to_tfrecord.py and the problem will be gone.
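Roughly, the end of the conversion script would then look like this (the output path is a placeholder, not necessarily what the repo uses):

```python
import tensorflow as tf

writer = tf.python_io.TFRecordWriter('../data/tfrecord/train.tfrecord')  # placeholder path
# ... for each sample: writer.write(example.SerializeToString()) ...
writer.close()  # flush buffered records; without this the file can end up truncated
```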
I was able to overcome this error in Google Colab by reducing the amount of data I fed into each tfrecord. My original tfrecord for all my data was around 16 GB. I broke the data up into smaller ~3 GB tfrecords (about 1000 annotated 1024x1024 images each). I then trained a detector on the first tfrecord and, once that training ended, resumed training with the next tfrecord.
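If it helps anyone, a sketch of writing several smaller shards instead of one big file (NUM_SHARDS, the filename pattern, and the `examples` iterable are assumptions, not part of the repo):

```python
import tensorflow as tf

NUM_SHARDS = 5  # pick this so each shard stays at a few GB

writers = [tf.python_io.TFRecordWriter('train-%05d-of-%05d.tfrecord' % (i, NUM_SHARDS))
           for i in range(NUM_SHARDS)]

for idx, example in enumerate(examples):  # `examples`: iterable of tf.train.Example
    writers[idx % NUM_SHARDS].write(example.SerializeToString())

for w in writers:
    w.close()
```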
yangxue0827/FPN_Tensorflow#35 (comment): the data/tfrecord folder mentioned there has to be created manually; it does not exist under data in the source code.
Hi,
I get the following error when trying to train the model using train1.py on my custom dataset. I am using ResNet-101 as the backbone. Can you please help me out here?
Traceback (most recent call last):
File "train1.py", line 262, in
train()
File "train1.py", line 224, in train
fast_rcnn_total_loss, total_loss, train_op])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]
Caused by op u'get_batch/batch', defined at:
File "train1.py", line 262, in
train()
File "train1.py", line 36, in train
is_training=True)
File "../data/io/read_tfrecord.py", line 86, in next_batch
dynamic_pad=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 922, in batch
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 716, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 457, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1342, in _queue_dequeue_many_v2
timeout_ms=timeout_ms, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]