faster_rcnn_r50_1x training finished and the model was exported, but prediction fails with an error. #544

Closed
wyc880622 opened this issue Apr 23, 2020 · 12 comments

@wyc880622

The model export itself should be fine.
The error output is shown below. How can I resolve it?

(paddle) G:\Halcon\Paddle\PaddleDetection>python -u tools/infer.py -c configs/faster_rcnn_r50_1x.yml -o weights=inference_model\faster_rcnn_r50_1x --infer_img=demo/1.jpg --output_dir=infer_output
W0423 19:38:42.788797 21428 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 61, Driver API Version: 10.1, Runtime API Version: 10.0
W0423 19:38:42.800806 21428 device_context.cc:245] device: 0, cuDNN Version: 7.6.
2020-04-23 19:38:44,095-INFO: Loading parameters from inference_model\faster_rcnn_r50_1x...
2020-04-23 19:38:44,095-WARNING: inference_model\faster_rcnn_r50_1x.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-04-23 19:38:44,095-WARNING: inference_model\faster_rcnn_r50_1x.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-04-23 19:38:44,098-WARNING: variable file [ inference_model/faster_rcnn_r50_1x/params inference_model/faster_rcnn_r50_1x/model ] not used
2020-04-23 19:38:44,098-WARNING: variable file [ inference_model/faster_rcnn_r50_1x/params inference_model/faster_rcnn_r50_1x/model ] not used
C:\Users\Administrator\Anaconda3\envs\paddle\lib\site-packages\paddle\fluid\executor.py:804: UserWarning: There are no operators in the program to be executed. If you pass Program manually, please use fluid.program_guard to ensure the current Program is being used.
warnings.warn(error_info)
2020-04-23 19:38:44,100-INFO: Load categories from G:/Halcon/Paddle/PaddleDetection/dataset/coco/annotations/instance_test.json
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
C:\Users\Administrator\Anaconda3\envs\paddle\lib\site-packages\paddle\fluid\executor.py:782: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "tools/infer.py", line 271, in <module>
main()
File "tools/infer.py", line 185, in main
return_numpy=False)
File "C:\Users\Administrator\Anaconda3\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 783, in run
six.reraise(*sys.exc_info())
File "C:\Users\Administrator\Anaconda3\envs\paddle\lib\site-packages\six.py", line 703, in reraise
raise value
File "C:\Users\Administrator\Anaconda3\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 778, in run
use_program_cache=use_program_cache)
File "C:\Users\Administrator\Anaconda3\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 831, in _run_impl
use_program_cache=use_program_cache)
File "C:\Users\Administrator\Anaconda3\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 902, in _run_program
self._feed_data(program, feed, feed_var_name, scope)
File "C:\Users\Administrator\Anaconda3\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 580, in _feed_data
check_feed_shape_type(var, cur_feed)
File "C:\Users\Administrator\Anaconda3\envs\paddle\lib\site-packages\paddle\fluid\executor.py", line 230, in check_feed_shape_type
(var.name, len(var.shape), var.shape, feed_shape))
ValueError: The fed Variable 'image' should have dimensions = 4, shape = (-1, 3, 800, 1333), but received fed shape [1, 3, 800, 1067] on each device

@jerrywgz
Collaborator

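This happens because newer versions of Paddle check the shape declared in the network definition against the shape of the actual input. Please update the detection library; the release/0.2 branch has already been adapted for this.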
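
To make the shape check concrete, here is a minimal sketch of the kind of comparison that produces the ValueError in the traceback. It is a simplified illustration, not Paddle's actual `check_feed_shape_type` code; the helper name `shape_matches` is hypothetical, and the only assumption is that a negative declared dimension (such as -1 for the batch size) accepts any size.

```python
def shape_matches(declared, fed):
    """Return True if the fed shape is compatible with the declared shape.

    A negative declared dimension (e.g. -1) is treated as a wildcard.
    """
    if len(declared) != len(fed):
        return False
    return all(d < 0 or d == f for d, f in zip(declared, fed))


# Shapes taken from the error message in this issue.
declared_shape = (-1, 3, 800, 1333)
fed_shape = [1, 3, 800, 1067]

# The batch dimension is a wildcard, but the width 1067 != 1333, so this fails.
mismatch = not shape_matches(declared_shape, fed_shape)
print("shape mismatch:", mismatch)
```

Under this reading, the check fails only on the last dimension: the reader resized the image to 800 x 1067, while the network input was declared with a fixed width of 1333.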

@yghstill
Collaborator

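Judging from how you are using it, there is another problem: an exported inference_model must be run with the INFERENCE method for prediction. Otherwise the model cannot be loaded and nothing will be detected.
If you want to predict with tools/infer.py, there is no need to export the model; just use the trained model directly.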
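
As a rough way to tell the two cases apart before running tools/infer.py, the sketch below classifies a `weights` path. It relies only on what the log in this issue shows: infer.py first looks for `<weights>.pdparams`, and the exported model is a directory. The function name `classify_weights` and the returned labels are hypothetical, and the check is a heuristic, not part of PaddleDetection.

```python
import os
import tempfile


def classify_weights(path):
    """Heuristically classify a weights path.

    "checkpoint": a training checkpoint file <path>.pdparams exists.
    "directory":  <path> is a directory (possibly an exported inference model).
    "missing":    neither was found.
    """
    if os.path.isfile(path + ".pdparams"):
        return "checkpoint"
    if os.path.isdir(path):
        return "directory"
    return "missing"


# Small self-contained demonstration using temporary files.
with tempfile.TemporaryDirectory() as root:
    ckpt_prefix = os.path.join(root, "model_final")  # hypothetical checkpoint name
    open(ckpt_prefix + ".pdparams", "w").close()

    exported_dir = os.path.join(root, "faster_rcnn_r50_1x")
    os.mkdir(exported_dir)

    kind_ckpt = classify_weights(ckpt_prefix)
    kind_dir = classify_weights(exported_dir)
    kind_none = classify_weights(os.path.join(root, "nothing_here"))

print(kind_ckpt, kind_dir, kind_none)
```

If the path resolves to a directory rather than a checkpoint, that is a hint that the exported model was passed where tools/infer.py expects training weights.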

@wyc880622
Author

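the release/0.2 branch has already been adapted for this

Thanks for your answer. I am already using version 0.2 of PaddleDetection. Is that the same thing you are referring to?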

@jerrywgz
Collaborator

Could you share your config file? Also, please double-check whether the model loaded at inference time is the one saved during training.

@wyc880622
Author

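The config files are attached:
configs.zip
The model loaded by infer is the exported model. However, predicting directly with the exported model also gives the same error. I don't know how to proceed now. Thanks.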

@wyc880622
Author

image
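This shows prediction with the exported model and with the non-exported model; both end with the EOF error. How should I update the detection library?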

@jerrywgz
Collaborator

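Regarding the infer.py problem: I looked at your config files, and your settings in faster_reader.yml differ somewhat from those in https://github.com/PaddlePaddle/PaddleDetection/blob/release/0.2/configs/faster_reader.yml . This may be because release/0.2 was updated in the meantime, so you can update your local copy of that branch.
image The main point is that the part of the config file marked by the red box must be removed for prediction to work.

Regarding the cpp_infer problem: the cpp_demo config file is meant for RCNN-series models, whereas you are using a YOLO model, and the two differ somewhat in their inputs. I recommend using the master branch, where export_model.py automatically generates the config file needed by cpp_infer based on your test-time configuration.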

@wyc880622
Author

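I have modified the yml, and inference now runs without errors. However, the output image has no boxes drawn on it, and I don't know why. Could the trained model be the problem? That seems unlikely to me; it should be able to detect at least one object.
image
What else should I do to fix this?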

@yghstill
Collaborator

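@wyc880622 Please use the --draw_threshold parameter to lower the visualization score threshold. If nothing is detected even with a very small threshold, then the trained model is the problem. Also, which model is it, how far has the loss converged, and is nothing detected on training-set images either?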
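
To illustrate why lowering the threshold can make boxes appear, here is a minimal sketch of score-based filtering before drawing. The detection records, their field names, and the `filter_by_score` helper are hypothetical and only stand in for whatever the visualization code does internally; whether the real comparison is strict or inclusive is not stated in this thread.

```python
def filter_by_score(detections, draw_threshold):
    """Keep only detections whose score reaches the drawing threshold."""
    return [d for d in detections if d["score"] >= draw_threshold]


# Hypothetical detections for one image: a confident box and a weak one.
detections = [
    {"label": "part", "score": 0.62, "bbox": [10, 20, 50, 60]},
    {"label": "part", "score": 0.08, "bbox": [100, 40, 30, 30]},
]

drawn_at_high = filter_by_score(detections, 0.5)
drawn_at_low = filter_by_score(detections, 0.05)
print(len(drawn_at_high), len(drawn_at_low))
```

If even a very small threshold leaves the list empty, the model is producing no usable detections at all, which points back to training rather than to visualization.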

@wyc880622
Author

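Thanks for your reply. My mAP is 0, so the model was probably not trained properly. I then retrained from scratch, and an error occurs when the checkpoint model is saved; the error output is below. This happened in both of my training runs.
What is causing this error?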
image
2020-04-24 18:24:58,539-INFO: iter: 9980, lr: 0.010000, 'loss_cls': '0.003621', 'loss_bbox': '0.000801', 'loss_rpn_cls': '0.000050', 'loss_rpn_bbox': '0.000137', 'loss': '0.004480', time: 1.147, eta: 1 day, 4:40:48
2020-04-24 18:25:21,467-INFO: iter: 10000, lr: 0.010000, 'loss_cls': '0.002752', 'loss_bbox': '0.000784', 'loss_rpn_cls': '0.000061', 'loss_rpn_bbox': '0.000090', 'loss': '0.004001', time: 1.147, eta: 1 day, 4:40:07
2020-04-24 18:25:21,468-INFO: Save model to output\faster_rcnn_r50_1x\10000.
I0424 18:25:24.260897 5308 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
I0424 18:25:24.267897 5308 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
2020-04-24 18:25:24,945-INFO: Test iter 0
2020-04-24 18:25:50,377-INFO: Test finish iter 42
2020-04-24 18:25:50,378-INFO: Total number of images: 42, inference time: 1.607778591672592 fps.
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
2020-04-24 18:25:50,383-INFO: Start evaluate...
Loading and preparing results...
DONE (t=0.00s)
creating index...
index created!
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\envs\paddle\lib\site-packages\numpy\core\function_base.py", line 117, in linspace
num = operator.index(num)
TypeError: 'numpy.float64' object cannot be interpreted as an integer

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "tools/train.py", line 331, in <module>
main()
File "tools/train.py", line 274, in main
cfg['EvalReader']['dataset'])
File "E:\paddle\PaddleDetection-release-0.2\ppdet\utils\eval_utils.py", line 209, in eval_results
is_bbox_normalized=is_bbox_normalized)
File "E:\paddle\PaddleDetection-release-0.2\ppdet\utils\coco_eval.py", line 94, in bbox_eval

@yghstill
Collaborator

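@wyc880622 The latest numpy release, 1.18, changed the parameter type accepted by an interface that pycocotools uses during evaluation, which is why this error is raised. Please downgrade numpy or upgrade pycocotools; see the solutions in issues #245 and #445.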
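
The failure can be reproduced in isolation. The sketch below passes a `numpy.float64` as the `num` argument of `np.linspace`, which matches the TypeError in the traceback above, and then shows that casting to `int` avoids it. The specific expression for the IoU thresholds is an assumption about the kind of call older pycocotools versions make, not a quote of its source.

```python
import numpy as np

# A sample count computed with np.round is a numpy.float64, not an int.
num = np.round((0.95 - 0.5) / 0.05) + 1

try:
    np.linspace(0.5, 0.95, num, endpoint=True)
    float_num_accepted = True   # older numpy: accepted, possibly with a warning
except TypeError:
    float_num_accepted = False  # numpy >= 1.18: float counts are rejected

# Casting the count to int works on old and new numpy versions alike.
iou_thresholds = np.linspace(0.5, 0.95, int(num), endpoint=True)
print(float_num_accepted, len(iou_thresholds))
```

Downgrading numpy restores the old lenient behavior, while upgrading pycocotools picks up a version that passes an integer count.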

@yghstill yghstill self-assigned this Apr 24, 2020
@yghstill
Collaborator

@wyc880622 If there are no further problems, I will close this for now; if you still run into problems, you can reopen this issue.


3 participants