OutOfRangeError in /data/io/read_tfrecord.py at line number 80 #5
Comments
I met the same problem. I think it's because the training process is not reusing the data: once the training step exceeds the number of training examples, this error occurs. Rewriting the data input code so that it repeats the dataset should fix it.
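If the pipeline really is exhausting the data after one pass, a minimal sketch of a filename queue that cycles indefinitely (assuming the TF 1.x queue-based input used here; the tfrecord path below is only a placeholder) would look like this:

```python
import tensorflow as tf

# Placeholder path: adjust to wherever your tfrecord actually lives.
tfrecord_path = '../data/tfrecord/train.tfrecord'

# num_epochs=None makes the queue cycle over the file forever, so the
# downstream batch queue is never closed for lack of elements.
filename_queue = tf.train.string_input_producer(
    [tfrecord_path], num_epochs=None, shuffle=True)

reader = tf.TFRecordReader()
_, serialized_example = reader.read(filename_queue)
```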
In fact, the code itself is fine; the error may be caused by an environment configuration problem. I have modified the code, please update.
It still produces the same error. Can you tell me what change you made?
@1991viet The author yangxue0827 has already modified the code, so the problem should be solved. You can look at the details of the code changes made on Jan 30 or around that time.
I have the same problem. Did anyone solve it? Thanks.
I am getting the same error. I have checked the tfrecord path; it seems to be correct, and the tfrecord creation didn't give any problem either. Is there a solution for this?
Has anyone solved this problem? I am also getting this error.
I am getting the same error too...
The same problem...
2018-12-13 07:07:10.108873: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Graphics Device, pci bus id: 0000:05:00.0, compute capability: 6.1)
Caused by op u'get_batch/batch', defined at:
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
It seems to be the same problem.
I found out that the reason might be some of the xml files. Some images have no gtbox, and we have to skip those when converting the data to tfrecord (see the sketch below)!
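For reference, a small sketch of such a check (the helper name and loop variables are hypothetical, not the repo's actual code):

```python
import xml.etree.ElementTree as ET

def has_gtbox(xml_path):
    # True if the annotation file contains at least one <object> entry.
    return len(ET.parse(xml_path).getroot().findall('object')) > 0

# Inside the conversion loop in data/io/convert_data_to_tfrecord.py (sketch):
# for img_name, xml_path in samples:
#     if not has_gtbox(xml_path):
#         continue  # skip images without any ground-truth box
#     ...  # build and write the tf.train.Example as usual
```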
I ran into the same issue when I brought in my own dataset.
I solved the problem by ensuring the correctness of the original dataset. I guess any data error (including data path, data format, data shape, etc.) can cause this issue, but that is just a guess.
In my case, the data format was wrong: my .xml files record the bndbox as (Xmin, Xmax, Ymin, Ymax).
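A minimal sketch of reading the coordinates by tag name, so they come out as (xmin, ymin, xmax, ymax) regardless of the order the tags appear in the XML (the function name is hypothetical):

```python
import xml.etree.ElementTree as ET

def read_boxes(xml_path):
    # Return boxes as (xmin, ymin, xmax, ymax), looked up by tag name.
    boxes = []
    for obj in ET.parse(xml_path).getroot().findall('object'):
        bnd = obj.find('bndbox')
        boxes.append(tuple(float(bnd.find(tag).text)
                           for tag in ('xmin', 'ymin', 'xmax', 'ymax')))
    return boxes
```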
I think I found what causes the problem. In my case, I encountered the same error after I removed some training examples by applying filters in data/io/convert_data_to_tfrecord.py. It looks like you have to close the tfrecord writer handle after you finish the conversion. Just add writer.close() at the end of data/io/convert_data_to_tfrecord.py and the problem will be gone.
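Roughly, the end of the conversion script would then look like this (the output path is a placeholder, not necessarily what the repo uses):

```python
import tensorflow as tf

writer = tf.python_io.TFRecordWriter('../data/tfrecord/train.tfrecord')  # placeholder path
# ... for each sample: writer.write(example.SerializeToString()) ...
writer.close()  # flush buffered records; without this the file can end up truncated
```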
I was able to overcome this error in Google Colab by reducing the amount of data I fed into each tfrecord. My original tfrecord for all my data was around 16 GB. I broke the data up into smaller ~3 GB tfrecords (about 1000 annotated 1024x1024 images each). I then trained a detector on the first tfrecord and, once that training ended, resumed training with the next tfrecord.
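If it helps anyone, a sketch of writing several smaller shards instead of one big file (NUM_SHARDS, the filename pattern, and the `examples` iterable are assumptions, not part of the repo):

```python
import tensorflow as tf

NUM_SHARDS = 5  # pick this so each shard stays at a few GB

writers = [tf.python_io.TFRecordWriter('train-%05d-of-%05d.tfrecord' % (i, NUM_SHARDS))
           for i in range(NUM_SHARDS)]

for idx, example in enumerate(examples):  # `examples`: iterable of tf.train.Example
    writers[idx % NUM_SHARDS].write(example.SerializeToString())

for w in writers:
    w.close()
```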
yangxue0827/FPN_Tensorflow#35 (comment): the data/tfrecord folder mentioned there has to be created manually; it does not exist under data in the source code.
Hi,
I get the following error when trying to train the model using train1.py on my custom dataset. I am using ResNet-101 as the backbone. Can you please help me out here?
Traceback (most recent call last):
File "train1.py", line 262, in
train()
File "train1.py", line 224, in train
fast_rcnn_total_loss, total_loss, train_op])
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 895, in run
run_metadata_ptr)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1124, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1321, in _do_run
options, run_metadata)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/client/session.py", line 1340, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.OutOfRangeError: PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]
Caused by op u'get_batch/batch', defined at:
File "train1.py", line 262, in
train()
File "train1.py", line 36, in train
is_training=True)
File "../data/io/read_tfrecord.py", line 86, in next_batch
dynamic_pad=True)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 922, in batch
name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/training/input.py", line 716, in _batch
dequeued = queue.dequeue_many(batch_size, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/data_flow_ops.py", line 457, in dequeue_many
self._queue_ref, n=n, component_types=self._dtypes, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/gen_data_flow_ops.py", line 1342, in _queue_dequeue_many_v2
timeout_ms=timeout_ms, name=name)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/op_def_library.py", line 767, in apply_op
op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
original_op=self._default_original_op, op_def=op_def)
File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/framework/ops.py", line 1204, in init
self._traceback = self._graph._extract_stack() # pylint: disable=protected-access
OutOfRangeError (see above for traceback): PaddingFIFOQueue '_1_get_batch/batch/padding_fifo_queue' is closed and has insufficient elements (requested 1, current size 0)
[[Node: get_batch/batch = QueueDequeueManyV2[component_types=[DT_STRING, DT_FLOAT, DT_INT32, DT_INT32], timeout_ms=-1, _device="/job:localhost/replica:0/task:0/cpu:0"](get_batch/batch/padding_fifo_queue, get_batch/batch/n)]]