Can't alloc enough memory when run test. #9347

gongweibao · 2018-03-24T08:29:32Z

[16:20:47]	188/215 Test #185: test_multihead_attention ........................***Failed    4.31 sec
[16:20:47]	test_multihead_attention (test_multihead_attention.TestMultiheadAttention) ... ERROR
[16:20:47]	
[16:20:47]	======================================================================
[16:20:47]	ERROR: test_multihead_attention (test_multihead_attention.TestMultiheadAttention)
[16:20:47]	----------------------------------------------------------------------
[16:20:47]	Traceback (most recent call last):
[16:20:47]	  File "test_multihead_attention.py", line 92, in test_multihead_attention
[16:20:47]	    self.run_program()
[16:20:47]	  File "test_multihead_attention.py", line 65, in run_program
[16:20:47]	    self.set_inputs(place)
[16:20:47]	  File "test_multihead_attention.py", line 80, in set_inputs
[16:20:47]	    queries.set(self.queries, place)
[16:20:47]	EnforceNotMet: enforce allocating <= available failed, 1822361026 > 1771896576
[16:20:47]	 at [/paddle/paddle/fluid/platform/gpu_info.cc:118]
[16:20:47]	PaddlePaddle Call Stacks: 
[16:20:47]	0       0x7fef6ec1d6e8p paddle::platform::GpuMaxChunkSize() + 5080
[16:20:47]	1       0x7fef6dec7fc9p paddle::memory::GetGPUBuddyAllocator(int) + 249
[16:20:47]	2       0x7fef6dec819bp void* paddle::memory::Alloc<paddle::platform::CUDAPlace>(paddle::platform::CUDAPlace, unsigned long) + 43
[16:20:47]	3       0x7fef6de224e2p paddle::framework::Tensor::mutable_data(boost::variant<paddle::platform::CUDAPlace, paddle::platform::CPUPlace, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_, boost::detail::variant::void_>, std::type_index) + 674
[16:20:47]	4       0x7fef6de5881ap void paddle::pybind::PyCUDATensorSetFromArray<float>(paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&) + 330
[16:20:47]	5       0x7fef6de3a345p void pybind11::detail::argument_loader<paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&>::call_impl<void, void (*&)(paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&), 0ul, 1ul, 2ul>(void (*&)(paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&), pybind11::detail::index_sequence<0ul, 1ul, 2ul>) + 85
[16:20:47]	6       0x7fef6de42f35p void pybind11::cpp_function::initialize<void (*&)(paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&), void, paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (*&)(paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&), void (*)(paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::operator()(pybind11::detail::function_call&) const + 517
[16:20:47]	7       0x7fef6de430bep void pybind11::cpp_function::initialize<void (*&)(paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&), void, paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&, pybind11::name, pybind11::is_method, pybind11::sibling>(void (*&)(paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&), void (*)(paddle::framework::Tensor&, pybind11::array_t<float, 17>, paddle::platform::CUDAPlace&), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) + 14
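
For scale, the two byte counts in the `EnforceNotMet` message can be converted to MiB. This is only a sketch of the arithmetic. The variable names come from the wording of the message (`allocating <= available`), and reading them as "bytes the allocator tried to reserve" versus "bytes the GPU reported as free" is an assumption based on frames 0 and 1 of the stack (`GpuMaxChunkSize`, `GetGPUBuddyAllocator`), not something the log states outright.

```python
# Byte counts copied verbatim from the EnforceNotMet line in the log.
allocating = 1822361026  # assumed: bytes the allocator tried to reserve
available = 1771896576   # assumed: bytes reported as available on the GPU

MIB = 1024 ** 2


def to_mib(n_bytes):
    """Convert a byte count to mebibytes."""
    return n_bytes / MIB


# How far the request exceeds what was available.
shortfall = allocating - available

print(f"requested: {to_mib(allocating):.1f} MiB")
print(f"available: {to_mib(available):.1f} MiB")
print(f"shortfall: {shortfall} bytes ({to_mib(shortfall):.1f} MiB)")
```

The request exceeds the available amount by 50464450 bytes, roughly 48 MiB, so the check fails by a small margin relative to the total requested.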


kbinias · 2018-03-24T18:04:55Z

I had the same problem (#9081), but on agent ip_172.19.56.198 the test was able to run.

putcn · 2018-03-26T17:34:33Z

@helinwang thinks this might be related to the minibatch size. I am going to test a smaller minibatch on the CI machines to see if that fixes it.

gongweibao assigned helinwang and putcn Mar 24, 2018

gongweibao closed this as completed Apr 12, 2018

blacksheep-Aristotle pushed a commit to blacksheep-Aristotle/Paddle that referenced this issue Nov 22, 2024

Add ordered save to avoid OOM (PaddlePaddle#9347)

ba5c2ca
