Error when testing vgg_16_cifar.py #9

quietsmile · 2016-08-31T09:30:12Z

Ubuntu 14.04, CUDA 7.5, and cuDNN 5.1.5 installed successfully.
However, running demo/image_classification/train.sh fails with the following error:

[INFO 2016-08-31 17:20:21,497 layers.py:1430] channels=512 size=8192
[INFO 2016-08-31 17:20:21,497 layers.py:1430] output size for conv_8 is 4
[INFO 2016-08-31 17:20:21,498 layers.py:1430] channels=512 size=8192
[INFO 2016-08-31 17:20:21,499 layers.py:1430] output size for conv_9 is 4
[INFO 2016-08-31 17:20:21,501 layers.py:1490] output size for pool_3 is 2_2
[INFO 2016-08-31 17:20:21,502 layers.py:1490] output size for pool_4 is 1_1
[INFO 2016-08-31 17:20:21,507 networks.py:960] The input order is [image, label]
[INFO 2016-08-31 17:20:21,507 networks.py:963] The output order is [cost_0]
I0831 17:20:21.523936 13974 Trainer.cpp:169] trainer mode: Normal
I0831 17:20:21.546594 13974 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-08-31 17:20:21,682 image_provider.py:52] Image size: 32
[INFO 2016-08-31 17:20:21,682 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-08-31 17:20:21,682 image_provider.py:58] DataProvider Initialization finished
I0831 17:20:21.682675 13974 PyDataProvider2.cpp:219] loading dataprovider image_provider::processData
[INFO 2016-08-31 17:20:21,682 image_provider.py:52] Image size: 32
[INFO 2016-08-31 17:20:21,682 image_provider.py:53] Meta path: data/cifar-out/batches/batches.meta
[INFO 2016-08-31 17:20:21,682 image_provider.py:58] DataProvider Initialization finished
I0831 17:20:21.683006 13974 GradientMachine.cpp:134] Initing parameters..
I0831 17:20:22.312453 13974 GradientMachine.cpp:141] Init parameters done.
.........
I0831 17:20:52.894659 13974 TrainerInternal.cpp:162] Batch=100 samples=12800 AvgCost=2.35864 CurrentCost=2.35864 Eval: classification_error_evaluator=0.833906 CurrentEval: classification_error_evaluator=0.833906
.........
I0831 17:21:00.884374 13974 TrainerInternal.cpp:162] Batch=200 samples=25600 AvgCost=2.15774 CurrentCost=1.95684 Eval: classification_error_evaluator=0.792148 CurrentEval: classification_error_evaluator=0.750391
.........
I0831 17:21:08.731333 13974 TrainerInternal.cpp:162] Batch=300 samples=38400 AvgCost=2.01417 CurrentCost=1.72705 Eval: classification_error_evaluator=0.753672 CurrentEval: classification_error_evaluator=0.676719
.........I0831 17:21:15.873359 13974 TrainerInternal.cpp:179] Pass=0 Batch=391 samples=50048 AvgCost=1.90795 Eval: classification_error_evaluator=0.71814
F0831 17:21:18.497601 13974 hl_cuda_cudnn.cc:779] Check failed: CUDNN_STATUS_SUCCESS == cudnnStat (0 vs. 5) Cudnn Error: CUDNN_STATUS_INVALID_VALUE
*** Check failure stack trace: ***
@ 0x7f609f255daa (unknown)
@ 0x7f609f255ce4 (unknown)
@ 0x7f609f2556e6 (unknown)
@ 0x7f609f258687 (unknown)
@ 0x8a98d4 hl_convolution_forward()
@ 0x5c66fc paddle::CudnnConvLayer::forward()
@ 0x62305c paddle::NeuralNetwork::forward()
@ 0x6b54af paddle::Tester::testOneBatch()
@ 0x6b5dc2 paddle::Tester::testOnePeriod()
@ 0x69a28c paddle::Trainer::trainOnePass()
@ 0x69d687 paddle::Trainer::train()
@ 0x53b0b3 main
@ 0x7f609e461ec5 (unknown)
@ 0x546695 (unknown)
@ (nil) (unknown)
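
For readers decoding the fatal line above: the pair (0 vs. 5) is the expected and the actual cudnnStatus_t value. The sketch below is only an illustration, assuming the status numbering from the cuDNN 4/5-era cudnn.h; the helper name decode_cudnn_check is hypothetical and is not part of PaddlePaddle or cuDNN.

```python
import re

# cudnnStatus_t values as listed in cudnn.h for the cuDNN 4/5 era.
# Assumption: later cuDNN releases may define additional codes.
CUDNN_STATUS = {
    0: "CUDNN_STATUS_SUCCESS",
    1: "CUDNN_STATUS_NOT_INITIALIZED",
    2: "CUDNN_STATUS_ALLOC_FAILED",
    3: "CUDNN_STATUS_BAD_PARAM",
    4: "CUDNN_STATUS_INTERNAL_ERROR",
    5: "CUDNN_STATUS_INVALID_VALUE",
    6: "CUDNN_STATUS_ARCH_MISMATCH",
    7: "CUDNN_STATUS_MAPPING_ERROR",
    8: "CUDNN_STATUS_EXECUTION_FAILED",
    9: "CUDNN_STATUS_NOT_SUPPORTED",
    10: "CUDNN_STATUS_LICENSE_ERROR",
}

# glog CHECK_EQ failures print the two compared values as "(a vs. b)".
CHECK_RE = re.compile(r"\((\d+) vs\. (\d+)\)")


def decode_cudnn_check(line):
    """Return (expected_name, actual_name) for a failed check line, or None."""
    match = CHECK_RE.search(line)
    if match is None:
        return None
    expected, actual = int(match.group(1)), int(match.group(2))
    return (
        CUDNN_STATUS.get(expected, "UNKNOWN"),
        CUDNN_STATUS.get(actual, "UNKNOWN"),
    )


# The fatal line copied from the log in this post.
fatal = (
    "F0831 17:21:18.497601 13974 hl_cuda_cudnn.cc:779] Check failed: "
    "CUDNN_STATUS_SUCCESS == cudnnStat (0 vs. 5) "
    "Cudnn Error: CUDNN_STATUS_INVALID_VALUE"
)
print(decode_cudnn_check(fatal))
```

The decoded name agrees with the text glog already printed. The stack trace also places the failing hl_convolution_forward() call under paddle::Tester::testOneBatch(), which suggests the error was raised in the test step that follows the end of Pass=0 rather than during the training batches.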

Switching the cuDNN version to 5.0.5 or 4.0.4 gives the same error.
Please help!

reyoung · 2016-08-31T09:40:53Z

Please run the command paddle version to print the compile flags, and paste them here. Thanks.

gangliao · 2016-08-31T09:59:53Z

Hi, can you post your GPU model name? For instance, K40?

quietsmile · 2016-08-31T10:03:08Z

PaddlePaddle 0.8.0b, compiled with
with_avx: ON
with_gpu: ON
with_double: OFF
with_python: ON
with_rdma: OFF
with_glog: ON
with_gflags: ON
with_metric_learning:
with_timer: OFF
with_predict_sdk:

GTX Titan X, driver 352.39
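
When comparing build configurations between machines, the flag list above is easier to diff as a dictionary. The following is a rough sketch, assuming the key: value layout shown in this post; the function name parse_compile_flags is hypothetical and is not a PaddlePaddle API.

```python
def parse_compile_flags(text):
    """Parse 'with_xxx: ON' style lines into a dict.

    ON/OFF become True/False, an empty value becomes None, and any other
    value is kept as the raw string. Lines that are not flags are skipped.
    """
    flags = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or not key.startswith("with_"):
            continue
        value = value.strip()
        if value == "ON":
            flags[key] = True
        elif value == "OFF":
            flags[key] = False
        else:
            flags[key] = value or None
    return flags


# Output pasted in this post.
version_output = """PaddlePaddle 0.8.0b, compiled with
with_avx: ON
with_gpu: ON
with_double: OFF
with_python: ON
with_rdma: OFF
with_glog: ON
with_gflags: ON
with_metric_learning:
with_timer: OFF
with_predict_sdk:
"""

flags = parse_compile_flags(version_output)
print(flags["with_gpu"], flags["with_double"], flags["with_metric_learning"])
```

Two such dictionaries, for example one from a Tesla K40 machine and one from the GTX Titan X machine, can then be compared key by key to rule out build differences.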

qingqing01 · 2016-08-31T11:42:08Z

@quietsmile Hi, we saw no problem when testing on Tesla K20/K40 with CUDA 7.5 and cuDNN 5.1 or cuDNN 4.0. However, we don't have a GTX Titan X environment, so we weren't able to reproduce this problem. We will solve it later.

wangjiangb · 2016-09-02T01:31:03Z

I have added a change list to fix it.

qingqing01 · 2016-09-23T15:49:30Z

@quietsmile We have fixed this problem on the GTX 980; see 341486d.

hedaoyuan · 2016-09-29T14:03:12Z

Fixed by #107; closing this issue.

reyoung added the Bug label Aug 31, 2016

reyoung assigned qingqing01 Aug 31, 2016

hedaoyuan closed this as completed Sep 29, 2016

sarawon mentioned this issue Feb 21, 2017

Error during GPU training #1406

Closed

sdujq mentioned this issue May 5, 2017

paddle exp computation core dumps (vsExp); floating-point overflow? #2024

Closed

April1010 mentioned this issue Jul 28, 2017

CRF layer core dumps when training on GPU in the SRL task #3091

Closed

xiehongweiscut mentioned this issue Aug 3, 2017

Core dump when deploying an online service with the paddle capi #3207

Closed

qingqing01 added a commit to qingqing01/Paddle that referenced this issue Aug 10, 2017

Merge pull request PaddlePaddle#9 from reyoung/feature/refactorize_fr…

0515d40

…amework_proto Fix merge error

fty8788 mentioned this issue Aug 21, 2017

Core dump when calling capi in sequence mode #3590

Closed

fty8788 mentioned this issue Sep 6, 2017

In a C inference program, how can I skip some of the lower layers and use an intermediate layer as input? #3915

Closed

fsfszongming256 mentioned this issue Sep 8, 2017

capi inference core dumps when samples are of type dense_vector_sequence; what is the correct way to call it? #3969

Closed

fty8788 mentioned this issue Jan 23, 2018

capi forward function core dumps: Check failed: size != 0 allocate 0 bytes #7774

Closed

likeqinqin mentioned this issue Mar 28, 2018

Occasional core dump when using the libpaddle_capi_shared.so shared library #9436

Closed

heroes999 mentioned this issue Mar 30, 2018

Paddle V2 capi core dumps at startup #9534

Closed

likeqinqin mentioned this issue Apr 11, 2018

C++ program using the shared library with multiple threads and multiple loaded models intermittently hangs #9845

Closed

lyp2github mentioned this issue Apr 18, 2018

capi core dump #10005

Closed

wangshuohuan mentioned this issue Apr 25, 2018

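Seq2Seq network (with some layers of the example network modified) reports Check failed: size != 0 allocate 0 bytes; the input data and batch count are normal and non-empty. Please help find the cause, thanks #10187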

Closed

yttbgf mentioned this issue May 22, 2018

paddle_gradient_machine_destroy core #10845

Closed

xuezhong mentioned this issue Jun 12, 2018

distribution trainning for transformer core dump #11387

Closed

lyp2github mentioned this issue Jun 13, 2018

libpaddle_capi_shared.so core dumps at startup #11426

Closed

velconia pushed a commit that referenced this issue Mar 22, 2019

Merge pull request #9 from PaddlePaddle/develop

98069d9

merge to local

bingyanghuang pushed a commit to bingyanghuang/Paddle that referenced this issue Mar 25, 2019

Merge pull request PaddlePaddle#9 from wanghaoshuang/fsp_op

38272f2

Add fsp op for distillation in slim.

jjlucus mentioned this issue Apr 29, 2019

Core dump when calling the inference library #17168

Closed

paddle-bot-old bot referenced this issue Nov 15, 2021

UT for c_concat_npu

014a6f9

paddle-bot-old bot referenced this issue Nov 16, 2021

fix c_concat , adding rank

31adfbc

paddle-bot-old bot referenced this issue Nov 16, 2021

add assert dims % nranks == 0

1972450

paddle-bot-old bot pushed a commit that referenced this issue Nov 17, 2021

[NPU] c_concat (#9)

7325986

* add c_concat for npu * UT for c_concat_npu * fix c_concat , adding rank * add assert nranks * add assert dims % nranks == 0

gglin001 pushed a commit to graphcore/Paddle-fork that referenced this issue Dec 8, 2021

Add ipu test code (PaddlePaddle#9)

38b7778

AshburnLee mentioned this issue Dec 8, 2021

optimize elementwise_max_grad using new interfaces #37906

Merged

paddle-bot-old bot referenced this issue Mar 7, 2022

update

e7c95e3

wuwuwuxxx mentioned this issue Mar 10, 2022

Linux GPU build error #40394

Closed

zhoutianzi666 pushed a commit to zhoutianzi666/Paddle that referenced this issue May 23, 2022

Merge pull request PaddlePaddle#9 from jiweibo/add_yolo_demo

108fab2

[yolov3] Add yolov3 demo

danleifeng pushed a commit to danleifeng/Paddle that referenced this issue May 31, 2022

Merge pull request PaddlePaddle#9 from Thunderbrook/gpugraph_deepwalk

2a16ca7

[Gpugraph] change graph_sample interface

zmxdream referenced this issue in zmxdream/Paddle Jun 8, 2022

pipeline build (#9)

869c43f

zmxdream referenced this issue in zmxdream/Paddle Jul 4, 2022

Revert "pipeline build (#9)"

3275822

This reverts commit 869c43f.

zmxdream referenced this issue in zmxdream/Paddle Jul 6, 2022

revert pipeline pull (PaddlePaddle#35)

61be085

* revert pipeline pull * fix conflict * fix conflict * fix conflict * add jvm.so * Revert "pipeline build (#9)" This reverts commit 869c43f. * revert async build pull

jack603047588 referenced this issue in jack603047588/Paddle Nov 9, 2022

Merge pull request #9 from qingshui/paddlebox_v2.0

b4b354a

add paddlebox_v2.0

jack603047588 referenced this issue in jack603047588/Paddle Nov 9, 2022

Merge pull request #9 from qingshui/paddlebox

e49656a

add cgpu and file parser block bug fix

marsbzp mentioned this issue Jan 11, 2023

Crash in the RNN operator when calling the C++ inference library from multiple threads! #49737

Open

qizhaoaoe pushed a commit to qizhaoaoe/Paddle that referenced this issue Mar 3, 2023

Update docs and add pretrained model (PaddlePaddle#9)

33f48e6

* update docs * add pretrained models

chlyzzo mentioned this issue Mar 29, 2023

paddle/fluid/core_avx.so paddle::memory::allocation::MemoryMapFdSet::Clear() #52269

Closed

0x45f pushed a commit to 0x45f/Paddle that referenced this issue Jun 19, 2023

Merge pull request PaddlePaddle#9 from 0x45f/fix-ut1

c2fe354

test warning ast only

tianyan01 pushed a commit to tianyan01/Paddle that referenced this issue Jan 23, 2024

Merge pull request PaddlePaddle#9 from laipaang/dingxiang-opt

dc60819

Dingxiang opt

lizexu123 pushed a commit to lizexu123/Paddle that referenced this issue Feb 23, 2024

fix nas print best_tokens (PaddlePaddle#9)

bdac950

hanhaowen-mt pushed a commit to hanhaowen-mt/Paddle that referenced this issue Feb 29, 2024

Merge pull request PaddlePaddle#9 from mthreads/kernels

8d14d0b

Kernels

NKNaN pushed a commit to NKNaN/Paddle that referenced this issue Mar 3, 2024

Merge pull request PaddlePaddle#9 from est31/fix_header_link

cda4fcb

Readme: fix link to header file

Fridge003 pushed a commit to Fridge003/Paddle that referenced this issue Mar 21, 2024

Merge pull request PaddlePaddle#9 from Fridge003/cinn_tmp

11bebda

fix static issues

ming1753 added a commit to ckl117/Paddle that referenced this issue Jul 23, 2024

Merge pull request PaddlePaddle#9 from ckl117/ADFM-PNC

0079266

Support the scatter operator when overwrite = True, reducing the number of subgraphs

lizexu123 added a commit to lizexu123/Paddle that referenced this issue Jul 29, 2024

Merge pull request PaddlePaddle#9 from lizexu123/add_trt

faeb360

fix

WAYKEN-TSE pushed a commit to WAYKEN-TSE/Paddle that referenced this issue Dec 6, 2024

Merge pull request PaddlePaddle#9 from jerrywgz/update_model_doc

c524cdb

update model doc
