-
Notifications
You must be signed in to change notification settings - Fork 148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setting maximum_execution_time to millisecond precision for under a second causes the program to hang forever on exit #260
Comments
Additional notes:
|
Could it be the timer handler started and expired before going out of the scope? |
Hi, thanks for reporting the issue and provide detailed information. we will take some time investigating on it. Please stay tuned. |
With the given information I tried to run with this code for a reproduce on SLES12sp3 but failed. The version I used: azure-storage-cpp 6.1.0 + cpprestsdk 2.10.14. The script I used:
Can you take a look and see if this code can trigger the issue in your environment?
Currently we have test cases to cover the scenarios of millisecond timeout. If it is not enough and you have concern of the coverage, please feel free to raise it and we will carefully evaluate it. |
I am using sdk 6.0.0 and cpprestsdk 2.10.11. I will try your code and script to reproduce in both of my environment and the build environment you've suggested to see if would happen. Thanks, |
Hi, can you take a look at PR #277 ? It resolves this issue at my end. |
@katmsft @JinmingHu-MSFT I've just discovered a memory leak detected by valgrind runnning a test program which runs in a loop listing the blobs within a container. See attached valgrind output. Thanks as always, Yang |
Just ran a test which reads a single object for many iterations and it revealed the same memory leak by the timer patch PR#277. See attached. |
@yxiang92128 Thanks for your feedback, I'll take time to see it. |
@yxiang92128 Hi, I cannot reproduce the memory leak in both scenarios you mentioned with 7.0.0. Can you share source code of your test program? |
@JinmingHu-MSFT There are also two valgrind output files also attached: What I found is that without the following lines in the test_azure.cpp code, valgrind does NOT show any leaks. Perhaps if the maximum_execution_time is set to be a big value, then the leak would occur? Moreover, we are currently still on 6.1.0 release with the PR277 patch. The new 7.0.0 release had added the requirement for GCC 5.1+ and in our SLES12SP3 environment, the newest version of GCC we currently support is GCC 4.8.5 which would add difficulty for us to adopt to your new features and fixes in the future. Hope it helps you in reproducing the leaks indicated. Thanks again for your help. Yang |
@yxiang92128 Hi, now I can reproduce the issue. It's because the timer isn't stopped if the operation is finished before timeout. Since you set the maximum execution time to 10 seconds, if you let the program sleep like 11 seconds before exit, the leak is gone. |
@JinmingHu-MSFT if the operation is finished, shouldn't the timer be stopped? Without the PR277, it didn't function like this and leaked memory and if I have multiple requests going on, they seem to pile up the leaks. Can you see if you can find a fix for it? Thanks. |
@yxiang92128 PR #277 introduced this issue. I'm now trying to fix it. |
@yxiang92128 Your account name and key leaked in azure_memleak.txt and azure_noleak_without_setmaxtime.txt. I deleted these two files, please revoke the key ASAP. |
@JinmingHu-MSFT changed the secret key. thanks for letting us know. |
@yxiang92128 Hi, can you give this PR #297 a try, verifying it fixes the memory leak and doesn't bring back the hang issue? |
@JinmingHu-MSFT |
I just tried to apply the patches to v6.0.0, there were some conflicts, and maybe you didn't handle them correctly. Give this branch a try https://github.com/JinmingHu-MSFT/azure-storage-cpp/tree/fix260-6.0.0. It should fix the memory leak issue. Please note that there is still some timer_handler related hang/crash issue. I'm still working on it. |
@JinmingHu-MSFT ==32326== 1,192 (32 direct, 1,160 indirect) bytes in 1 blocks are definitely lost in loss record 278 of 287 |
@yxiang92128 Hi, I observed the same leak issue as you did. This is because after the timer is stopped, cpprestsdk starts another task to do some post-process. Inside cpprestsdk there's a thread pool to run these tasks. It takes some time to get the task scheduled and finished. If you have your program sleep like 20ms before exit, the leak is gone. Since this memory leak is not that severe and is recycled very soon and doesn't pile up, we decide not to fix it for now. |
@JinmingHu-MSFT
thanks, Yang |
Yes, when cpprestsdk initializes, it preallocates all threads in its thread pool. Cpprestsdk does provide an API to change the pool size, but only before the thread pool is initialized. Unfortunately, currently azure-storage-cpp sdk will initialize the thread pool at loading time, prior to any attempts users can do to change the pool size. So practically, you can't. But I still think this is a valid feature, why not open another issue requesting this feature and we'll evaluate it. |
We're going to close this issue because of inactivity, feel free to reopen it if you have any further questions. |
During our negative testing on SLES12SP3, we set the maximum_execution_time for the blob_client to 1 milliseconds and repeatedly execute a test program to trigger and test a timeout condition.
azure::storage::blob_request_options options;
azure::storage::td_retry_policy policy(std::chrono::seconds(1), 4);
options.set_retry_policy(policy);
options.set_server_timeout(std::chrono::seconds(1));
options.set_maximum_execution_time(std::chrono::milliseconds(1));
options.set_parallelism_factor(2);
blob_client = storage_account.create_cloud_blob_client(options);
And the main program when calling return(0) or exit(0) in the end when calling do_global_dtors_aux, it would be blocked by a thread in pplx::details::event_impl::set() and the main program will then hangs for ever.
This is only occurring in 2.10.11 when millisecond precision was introduced and the workaround for us is to set the maximum_execution_time to 1000 milliseconds and the program then will exit cleanly every time. Using Azure Storage SDK5.0 and C++ REST SDK 2.9.1 won't produce such hang.
Here is the stack trace when the main program hangs:
GNU gdb (GDB; SUSE Linux Enterprise 12) 8.0
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later http://gnu.org/licenses/gpl.html
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-suse-linux".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
http://bugs.opensuse.org/.
Find the GDB manual and other documentation resources online at:
http://www.gnu.org/software/gdb/documentation/.
For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 13824
[New LWP 13825]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f721ea0899d in pthread_join () from /lib64/libpthread.so.0
Missing separate debuginfos, use: zypper install glibc-debuginfo-2.22-61.3.x86_64 libgcc_s1-debuginfo-6.2.1+r239768-2.4.x86_64 libglib-2_0-0-debuginfo-2.48.2-10.2.x86_64 libglibmm-2_4-1-debuginfo-2.48.1-5.5.x86_64 libgmodule-2_0-0-debuginfo-2.48.2-10.2.x86_64 libgobject-2_0-0-debuginfo-2.48.2-10.2.x86_64 liblzma5-debuginfo-5.0.5-4.852.x86_64 libopenssl1_0_0-debuginfo-1.0.2j-60.11.2.x86_64 libpcre1-debuginfo-8.39-7.1.x86_64 libsigc-2_0-0-debuginfo-2.8.0-6.2.x86_64 libstdc++6-debuginfo-6.2.1+r239768-2.4.x86_64 libuuid1-debuginfo-2.29.2-2.3.x86_64 libxml2-2-debuginfo-2.9.4-46.3.2.x86_64 libz1-debuginfo-1.2.8-11.1.x86_64
(gdb) info threads
Id Target Id Frame
from /lib64/libpthread.so.0
2 Thread 0x7f7218e79700 (LWP 13825) "test_azure" 0x00007f721ea0ea7c in __lll_lock_wait ()
from /lib64/libpthread.so.0
(gdb) thread 1
[Switching to thread 1 (Thread 0x7f721ee147c0 (LWP 13824))]
#0 0x00007f721ea0899d in pthread_join () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f721ea0899d in pthread_join () at /lib64/libpthread.so.0
Windows Azure Storage Client Library for C++ v0.2.0 #1 0x00007f721e728919 in (anonymous namespace)::threadpool_impl::~threadpool_impl() ()
at /usr/local/lib64/libcpprest.so.2.10
Commit dependency libs and dlls #2 0x00007f7219d7549f in __cxa_finalize () at /lib64/libc.so.6
Add unit tests #3 0x00007f721e6369c3 in __do_global_dtors_aux () at /usr/local/lib64/libcpprest.so.2.10
Windows Azure Storage Client Library for C++ v0.2.0 #4 0x00007ffe999be270 in ()
http proxy issue #5 0x00007f721ec2b2ca in _dl_fini () at /lib64/ld-linux-x86-64.so.2
(gdb) thread 2
[Switching to thread 2 (Thread 0x7f7218e79700 (LWP 13825))]
#0 0x00007f721ea0ea7c in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) bt
#0 0x00007f721ea0ea7c in __lll_lock_wait () at /lib64/libpthread.so.0
Windows Azure Storage Client Library for C++ v0.2.0 #1 0x00007f721ea0c8d0 in pthread_cond_broadcast@@GLIBC_2.3.2 () at /lib64/libpthread.so.0
Commit dependency libs and dlls #2 0x00007f721a6b08b9 in std::condition_variable::notify_all() () at /usr/lib64/libstdc++.so.6
Add unit tests #3 0x000000000050942f in pplx::details::event_impl::set() (this=0x1097138)
at /usr/local/include/pplx/pplxlinux.h:84
Windows Azure Storage Client Library for C++ v0.2.0 #4 0x000000000050a13a in pplx::details::_CancellationTokenState::_Cancel() (this=0x1097120)
at /usr/local/include/pplx/pplxcancellation_token.h:383
http proxy issue #5 0x000000000051483e in std::function<void ()>::operator()() const (this=0x7f71f8000900)
at /usr/include/c++/4.8/functional:2471
Gert "Unhandled exception at 0x77562EEC in xxxxx.exe: Microsoft C++ exception: std::runtime_error at memory location 0x0107B60C." error #6 0x000000000050c761 in pplx::details::_MakeUnitToUnitFunc(std::function<void ()> const&)::{lambda(unsigned char)Windows Azure Storage Client Library for C++ v0.2.0 #1}::operator()(unsigned char) const (__closure=0x7f71f8000900) at /usr/local/include/pplx/pplxtasks.h:2653
Azure Storage Client Library for C++ v0.3.0 #7 0x000000000051a29d in std::_Function_handler<unsigned char (unsigned char), pplx::details::_MakeUnitToUnitFunc(std::function<void ()> const&)::{lambda(unsigned char)Windows Azure Storage Client Library for C++ v0.2.0 #1}>::_M_invoke(std::_Any_data const&, unsigned char) (__functor=..., __args#0=0 '\000') at /usr/include/c++/4.8/functional:2057
Update NuSpec for 0.3 release #8 0x00007f721adfb271 in pplx::details::_PPLTaskHandle<unsigned char, pplx::task::_ContinuationTaskHandle<void, void, azure::storage::core::timer_handler::start_timer(std::chrono::duration<long, std::ratio<1l, 1000l> > const&)::{lambda()Windows Azure Storage Client Library for C++ v0.2.0 #1}, std::integral_constant<bool, false>, pplx::details::_TypeSelectorNoAsync>, pplx::details::_ContinuationTaskHandleBase>::invoke() const () at /usr/local/lib/libazurestorage.so.6
Debug Assertion Expression: _pFirstBlock == pHead #9 0x000000000050a81b in pplx::details::_TaskProcHandle::_RunChoreBridge(void*) (_Parameter=0x1096310)
at /usr/local/include/pplx/pplx.h:113
Missing hash_linux.h header #10 0x00007f721e728221 in boost::asio::detail::completion_handler<boost::_bi::bind_t<void, void ()(void), boost::_bi::list1<boost::_bi::value<void*> > > >::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) () at /usr/local/lib64/libcpprest.so.2.10
Unknown variable 'format' on line 344 of util.cpp in Microsoft.WindowsAzure.Storage/src directory #11 0x00007f721e6b24fb in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
at /usr/local/lib64/libcpprest.so.2.10
When will this library be releasing with linux support? #12 0x00007f721e728c57 in boost::asio::detail::posix_thread::func<(anonymous namespace)::threadpool_impl::add_thread()::{lambda()Windows Azure Storage Client Library for C++ v0.2.0 #1}>::run() () at /usr/local/lib64/libcpprest.so.2.10
Memory leak with shared DLL MFC projects #13 0x00007f721e6a3eff in boost_asio_detail_posix_thread_function () at /usr/local/lib64/libcpprest.so.2.10
Azure Storage Client Library for C++ v0.4.0 #14 0x00007f721ea07744 in start_thread () at /lib64/libpthread.so.0
Readme update for 0.4.0, breaking changes update, small fix for Linux co... #15 0x00007f7219e27aad in clone () at /lib64/libc.so.6
(gdb)
Can somebody please take a look at the above and see if the 1 millisecond timeout would cause issue in this scenario and worse blocking other normal operation(s)?
Thanks,
Yang
The text was updated successfully, but these errors were encountered: