GPU error when running the machine translation program on a single GPU #87

Closed
peterzhang2029 opened this issue Mar 19, 2018 · 3 comments

@peterzhang2029
Contributor

Error message:

pass_id=0, batch_id=425, train_loss: 6.189235
pass_id=0, batch_id=426, train_loss: 6.343625
pass_id=0, batch_id=427, train_loss: 6.316195
pass_id=0, batch_id=428, train_loss: 6.354477
pass_id=0, batch_id=429, train_loss: 6.414869
pass_id=0, batch_id=430, train_loss: 6.304601
pass_id=0, batch_id=431, train_loss: 6.103783
pass_id=0, batch_id=432, train_loss: 6.382751
pass_id=0, batch_id=433, train_loss: 6.372557
Traceback (most recent call last):
  File "origin.py", line 336, in <module>
    train()
  File "origin.py", line 313, in train
    fetch_list=[avg_cost])
  File "/usr/local/lib/python2.7/dist-packages/paddle/fluid/executor.py", line 349, in run
    self.executor.run(program_cache.desc, scope, 0, True, True)
paddle.fluid.core.EnforceNotMet: an illegal memory access was encountered at [/benchmark/Paddle/paddle/fluid/platform/device_context.cc:162]
PaddlePaddle Call Stacks:
0       0x7f1a3cbf6d57p paddle::platform::EnforceNotMet::EnforceNotMet(std::__exception_ptr::exception_ptr, char const*, int) + 727
1       0x7f1a3d9b6a72p paddle::platform::CUDADeviceContext::Wait() const + 466
2       0x7f1a3d2273b5p paddle::operators::WhileGradOp::RunImpl(paddle::framework::Scope const&, boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_> const&) const + 4869
3       0x7f1a3ccac14cp paddle::framework::Executor::RunPreparedContext(paddle::framework::ExecutorPrepareContext*, paddle::framework::Scope*, bool, bool) + 1836
4       0x7f1a3ccad688p paddle::framework::Executor::Run(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool) + 104
5       0x7f1a3cc145c3p void pybind11::cpp_function::initialize<pybind11::cpp_function::initialize<void, paddle::framework::Executor, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(void (paddle::framework::Executor::*)(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(paddle::framework::Executor*, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool)#1}, void, paddle::framework::Executor*, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(pybind11::cpp_function::initialize<void, paddle::framework::Executor, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool, pybind11::name, pybind11::is_method, pybind11::sibling>(void (paddle::framework::Executor::*)(paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(paddle::framework::Executor*, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool)#1}&&, void (*)(paddle::framework::Executor*, paddle::framework::ProgramDesc const&, paddle::framework::Scope*, int, bool, bool), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call) + 579
6       0x7f1a3cc12254p pybind11::cpp_function::dispatcher(_object*, _object*, _object*) + 1236
7             0x4c37edp PyEval_EvalFrameEx + 31165
8             0x4b9ab6p PyEval_EvalCodeEx + 774
9             0x4c16e7p PyEval_EvalFrameEx + 22711
10            0x4b9ab6p PyEval_EvalCodeEx + 774
11            0x4c1e6fp PyEval_EvalFrameEx + 24639
12            0x4b9ab6p PyEval_EvalCodeEx + 774
13            0x4eb30fp
14            0x4e5422p PyRun_FileExFlags + 130
15            0x4e3cd6p PyRun_SimpleFileExFlags + 390
16            0x493ae2p Py_Main + 1554
17      0x7f1ad4008830p __libc_start_main + 240
18            0x4933e9p _start + 41

Source code: https://github.com/dzhwinter/benchmark/blob/master/fluid/machine_translation.py
Commit id: c0421379b78135d028d9b0e1c816c0c831205512
Command used to run it:

export CUDA_VISIBLE_DEVICES="3"
python machine_translation.py
@peterzhang2029
Contributor Author

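Current findings:
With PaddlePaddle built from a newer source revision (commit id: 30b70323b4dc04ff1270c520711fa5428f509ae5), the error above occurs in both single-GPU and multi-GPU runs. A build from commit id bd8440921c6dcf4df26e236d2b0698d87499c05c runs through without the error.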

@dzhwinter
Owner

It has been fixed. Please use the latest code.

@pkuyym pkuyym reopened this Mar 26, 2018
@pkuyym
Collaborator

pkuyym commented Mar 26, 2018

Fixed by PaddlePaddle/Paddle#9337

@pkuyym pkuyym closed this as completed Mar 26, 2018