Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

distribute training crashed on develop branch #10625

Closed
Yancey1989 opened this issue May 14, 2018 · 2 comments
Closed

distribute training crashed on develop branch #10625

Yancey1989 opened this issue May 14, 2018 · 2 comments
Assignees
Labels

Comments

@Yancey1989
Copy link
Contributor

Yancey1989 commented May 14, 2018

Run the demo code test_word2vec.py with 2pservers + 2trainers, other demos can also reproduce this bug:

I0514 07:31:17.266929 11136 operator.cc:546] expected_kernel_key:data_type[float32]:data_layout[ANY_LAYOUT]:place[CPUPlace]:library_type[PLAIN]
PC: @                0x0 (unknown)
*** SIGSEGV (@0x0) received by PID 10770 (TID 0x7f8132309700) from PID 0; stack trace: ***
    @     0x7f83817d6390 (unknown)
    @     0x7f8334340dc7 std::_Hashtable<>::find()
    @     0x7f833433eb69 paddle::framework::Scope::FindVarLocally()
    @     0x7f833433eba4 paddle::framework::Scope::FindVar()
    @     0x7f833432aa1f paddle::framework::ExecutionContext::Input<>()
    @     0x7f8334230240 paddle::operators::SGDOpKernel<>::Compute()
    @     0x7f833432dac6 paddle::framework::OperatorWithKernel::RunImpl()
    @     0x7f833432a078 paddle::framework::OperatorBase::Run()
    @     0x7f8333b83ace paddle::framework::Executor::RunPreparedContext()
@seiriosPlus
Copy link
Collaborator

Run with 1pservers + 1trainers, pserver report this bug:

*** Error in `python': double free or corruption (out): 0x00007f771c000930 ***
======= Backtrace: =========
/lib/x86_64-linux-gnu/libc.so.6(+0x777e5)[0x7f778c9987e5]
/lib/x86_64-linux-gnu/libc.so.6(+0x8037a)[0x7f778c9a137a]
/lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f778c9a553c]
/usr/local/lib/python2.7/dist-packages/paddle/fluid/core.so(_ZN6paddle9framework5Scope8DropKidsEv+0x39)[0x7f776ecc8c39]
/usr/local/lib/python2.7/dist-packages/paddle/fluid/core.so(_ZN6paddle9framework8Executor18RunPreparedContextEPNS0_22ExecutorPrepareContextEPNS0_5ScopeEbb+0x98a)[0x7f776e50baea]
/usr/local/lib/python2.7/dist-packages/paddle/fluid/core.so(+0xcaddf9)[0x7f776ebf6df9]
/usr/local/lib/python2.7/dist-packages/paddle/fluid/core.so(_ZNSt13__future_base13_State_baseV29_M_do_setEPSt8functionIFSt10unique_ptrINS_12_Result_baseENS3_8_DeleterEEvEEPb+0x2e)[0x7f776eb1204e]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xea99)[0x7f778ccf9a99]
/usr/local/lib/python2.7/dist-packages/paddle/fluid/core.so(+0xcacf37)[0x7f776ebf5f37]
/usr/local/lib/python2.7/dist-packages/paddle/fluid/core.so(_ZN6paddle9framework10ThreadPool8TaskLoopEv+0x3f4)[0x7f776eccdf84]
/usr/lib/x86_64-linux-gnu/libstdc++.so.6(+0xb8c80)[0x7f778278cc80]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f778ccf26ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f778ca2841d]

@Yancey1989
Copy link
Contributor Author

Already fixed by #10663 , close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants