Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered #346

Closed
jamestang0219 opened this issue Nov 4, 2016 · 7 comments

@jamestang0219

jamestang0219 commented Nov 4, 2016

Hello, while training my models this error sometimes occurs. It is very strange: training on the same samples, the error sometimes occurs and sometimes doesn't. I want to know why it happens and how to avoid it so I can train my models successfully. Here is the log:

```
I1104 11:18:44.281013 87409 TrainerInternal.cpp:165]  Batch=3000 samples=192000 AvgCost=0.136428 CurrentCost=0.100355 Eval: classification_error_evaluator=0.0541406  CurrentEval: classification_error_evaluator=0.0395312
I1104 11:18:45.290464 87409 Tester.cpp:127]  Test samples=1000 cost=0.347865 Eval: classification_error_evaluator=0.148
...................................................................................................
I1104 11:19:09.124543 87409 TrainerInternal.cpp:165]  Batch=3100 samples=198400 AvgCost=0.135145 CurrentCost=0.0966748 Eval: classification_error_evaluator=0.0535938  CurrentEval: classification_error_evaluator=0.0371875
..........F1104 11:19:11.596421 87418 hl_cuda_cublas.cc:220] Check failed: stat == CUBLAS_STATUS_SUCCESS (13 vs. 0) [cublas status]: execution failed
*** Check failure stack trace: ***
F1104 11:19:11.596427 87426 hl_cuda_device.cc:661] Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
F1104 11:19:11.596457 87427 hl_cuda_device.cc:661] Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
F1104 11:19:11.596423 87430 hl_cuda_device.cc:661] Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
F1104 11:19:11.596422 87422 hl_cuda_device.cc:661] Check failed: cudaSuccess == cudaStat (0 vs. 77) Cuda Error: an illegal memory access was encountered
*** Check failure stack trace: ***
/usr/local/paddle/bin//paddle: line 81: 87409 Aborted                 (core dumped) ${DEBUGGER} $MYDIR/../opt/paddle/bin/paddle_trainer ${@:2}
```

Looking forward to your reply, thank you.
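For context: CUDA error 77 is cudaErrorIllegalAddress, and because kernel launches are asynchronous the failure is often reported at a later API call (here inside the cuBLAS wrapper) rather than at the kernel that actually faulted. A minimal sketch for localizing it, assuming the usual command-line launch (the trainer invocation and config name below are illustrative, not from this thread):

```
# Make kernel launches synchronous so the error surfaces at the real call site:
export CUDA_LAUNCH_BLOCKING=1

# Optionally run under cuda-memcheck to pinpoint the illegal access
# (much slower; reproduce with the smallest dataset that still crashes):
cuda-memcheck paddle train --config=trainer_config.py
```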

@gangliao
Contributor

gangliao commented Nov 4, 2016

@jamestang0219
Could you format the error information more carefully? As posted it is quite hard to read. Thanks.

@luotao1
Contributor

luotao1 commented Nov 4, 2016

@jamestang0219 Take a look at your error information; I have already formatted it for you. Please wrap logs in three backticks (```) next time.

@backyes
Contributor

backyes commented Nov 4, 2016

@jamestang0219

  1. Use git log to check whether there are any bug fixes for GPU training, and upgrade Paddle if necessary (see the sketch after this list).
  2. Show us more details:
  3. your model config, if possible;
  4. the command-line options you used.
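A minimal sketch of step 1, assuming you built Paddle from a git checkout (the grep patterns are only illustrative):

```
# From the Paddle source tree: look for GPU-related fixes merged since your build.
git fetch origin
git log --oneline -i --grep='fix' --grep='gpu' --all-match origin/master

# If relevant fixes appear, upgrade to a release newer than your build and retest.
```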

@jamestang0219
Author

@gangliao @luotao1 @backyes
Sorry for the bad log format. My Paddle version is 0.8.0b3.
What kind of information do you need?

@hedaoyuan
Contributor

@jamestang0219
Is this issue the same as #182?

@jamestang0219
Author

@hedaoyuan
Yes, and it appears only sometimes.
I wondered whether there is a problem with the input values, but if so, the error should appear every time I train on the same data, and it doesn't. It only occurs with a large dataset, for example more than 500,000 sentences. If I split the large dataset into two or more pieces and train on only one piece, the error never occurs.
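For reference, a minimal sketch of that splitting workaround, assuming one sentence per line in a plain-text training file (file names are illustrative):

```
# Split a large training file into ~250,000-line shards with coreutils split:
split -l 250000 -d train.txt train_part_
# Produces train_part_00, train_part_01, ...
# Then point the data provider in the trainer config at one shard at a time.
```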

@hedaoyuan
Contributor

OK, closing this issue. We will discuss it in #182.
