API using "Connection: close" header results in infinite threads #6514

Closed
cubic3d opened this issue Aug 2, 2018 · 28 comments · Fixed by #6658
Labels: area/api REST API, bug (Something isn't working)

Comments


cubic3d commented Aug 2, 2018

Expected Behavior

Kill or limit threads after API requests have been handled.

Current Behavior

Multiple fast API requests create threads that are never terminated; thousands of them consume server resources and cause stalls from time to time, with one CPU core at 100% load. Here this happens easily overnight, leaving the server stalled and unable to handle any additional TCP connections.

Possible Solution

Steps to Reproduce (for bugs)

Simple to reproduce using Docker:

1. docker run --rm -p 80:80 -p 5665:5665 -h icinga2 --name icinga -t jordan/icinga2:latest
   (latest currently refers to r2.9.1-1 - unfortunately 2.9.1 is not explicitly tagged by jordan on Docker Hub.)
2. Get the generated root account for the API from the container and create some traffic:
   curl --insecure --user root:8b46e7f992c24208 -H 'Connection: close' https://localhost:5665/v1#[1-100000]
3. Run 3-4 of the curl commands in parallel to make things faster.

Context

The way we are using Icinga2 is somewhat odd; to keep it as short as possible: We have several master instances running on a host in pure LXC containers, each instance responsible for a subset of "client icinga instances". We have about 1,500 client servers connected via VPN to the host, pushing their own container monitoring data (hosts and services - problems only) to a RabbitMQ instance. A custom "collector" consumes the RabbitMQ queue, aggregates the services into predefined categories, and pushes the updates via the Icinga2 API to the master instance.
The structure for each of the masters looks like:
Client-Icinga<->Publisher<->VPN<->RabbitMQ<->Collector<->Master-Icinga

Using this constellation with 2.7 was no problem at all. Starting with 2.8, the missing "Connection: close" led to problems with the API not responding properly on the first request.

Yes, I know we need to restructure our monitoring, but as always, too many dependencies, can't do it overnight ;)

Your Environment

  • Version used (icinga2 --version):
  • Operating System and version:
  • Enabled features (icinga2 feature list):
  • Icinga Web 2 version and modules (System - About):
  • Config validation (icinga2 daemon -C):
  • If you run multiple Icinga 2 instances, the zones.conf file (or icinga2 object list --type Endpoint and icinga2 object list --type Zone) from all affected nodes.
icinga2 - The Icinga 2 network monitoring daemon (version: r2.9.1-1)

Copyright (c) 2012-2018 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Application information:
  Installation root: /usr
  Sysconf directory: /etc
  Run directory: /run
  Local state directory: /var
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

System information:
  Platform: Debian GNU/Linux
  Platform version: 9 (stretch)
  Kernel: Linux
  Kernel version: 4.17.6-1-ARCH
  Architecture: x86_64

Build information:
  Compiler: GNU 6.3.0
  Build host: 83e7f59f33f0
Disabled features: debuglog elasticsearch gelf graphite influxdb opentsdb perfdata statusdata syslog
Enabled features: api checker command compatlog ido-mysql livestatus mainlog notification
[2018-08-02 12:31:03 +0000] information/cli: Icinga application loader (version: r2.9.1-1)
[2018-08-02 12:31:03 +0000] information/cli: Loading configuration file(s).
[2018-08-02 12:31:03 +0000] information/ConfigItem: Committing config item(s).
[2018-08-02 12:31:03 +0000] information/ApiListener: My API identity: icinga2
[2018-08-02 12:31:03 +0000] warning/ApplyRule: Apply rule 'ssh' (in /etc/icinga2/conf.d/services.conf: 47:1-47:19) for type 'Service' does not match anywhere!
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 11 Services.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 3 ServiceGroups.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 LivestatusListener.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 ScheduledDowntime.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 2 HostGroups.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 FileLogger.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 NotificationComponent.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 2 NotificationCommands.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 12 Notifications.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 IcingaApplication.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 Host.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 ApiListener.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 Downtime.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 CheckerComponent.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 3 Zones.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 Endpoint.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 2 ApiUsers.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 UserGroup.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 CompatLogger.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 IdoMysqlConnection.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 213 CheckCommands.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 3 TimePeriods.
[2018-08-02 12:31:03 +0000] information/ConfigItem: Instantiated 1 User.
[2018-08-02 12:31:03 +0000] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2018-08-02 12:31:03 +0000] information/cli: Finished validating the configuration file(s).

neubi4 commented Sep 19, 2018

I can confirm the behavior, we are also seeing this.

We have a server running 2 Icinga2 instances in docker containers, but Icinga2 has around 16,000 threads open at the moment, sometimes hitting the nproc ulimit.

I can also confirm that we sometimes see threads with 100% CPU time.

We are also running r2.9.1-1, but with CentOS 7 as both the host and the image OS.

I did not test whether the header is the problem, but since we are doing readiness and liveness checks against the Icinga2 API port in kubernetes with an authorized user, and kubernetes seems to send Connection: close in these requests, this is likely the problem.

@dnsmichi, could you have a look at this please?

@dnsmichi

One problem I do see is that we don't handle the Connection: close header with HTTP/1.1 requests. As such, the client needs to close the connection first, or the liveness checks on the server hit a timeout and close the connection.

@dnsmichi dnsmichi added bug Something isn't working area/api REST API labels Sep 21, 2018

rkno82 commented Oct 5, 2018

Is that a security issue, because it can be abused for DoS attacks?


dnsmichi commented Oct 8, 2018

You could also just leave the connection open on the client and wait for the server to close it; this puts more (thread) load on the server too. Either way, you shouldn't expose the REST API to the public internet, and you should filter HTTP requests away (e.g. with a proxy).

For the problem itself, I am looking for a fix (vacation came in between).

@dnsmichi dnsmichi self-assigned this Oct 8, 2018
@dnsmichi dnsmichi added this to the 2.10.0 milestone Oct 8, 2018

dnsmichi commented Oct 8, 2018

The problem is not the header itself; that's respected and triggered. The problem lies within Disconnect(), which later destroys the HttpServerConnection object, and that object holds a reference to a WorkQueue. The destructor waits for the WorkQueue while it still has pending tasks inside, and the threads stall in WorkQueue::Join().

thread #110
  frame #0: 0x00007fff6886fa16 libsystem_kernel.dylib`__psynch_cvwait + 10
  frame #1: 0x00007fff68a38589 libsystem_pthread.dylib`_pthread_cond_wait + 732
  frame #2: 0x00000001002e4fa3 icinga2`boost::condition_variable::wait(this=0x0000000104023638, m=0x0000700012fa0408) at condition_variable.hpp:85
  frame #3: 0x00000001003acc84 icinga2`icinga::WorkQueue::Join(this=0x00000001040234f0, stop=true) at workqueue.cpp:130
  frame #4: 0x00000001003aca05 icinga2`icinga::WorkQueue::~WorkQueue(this=0x00000001040234f0) at workqueue.cpp:51
  frame #5: 0x00000001003ace25 icinga2`icinga::WorkQueue::~WorkQueue(this=0x00000001040234f0) at workqueue.cpp:48
  frame #6: 0x0000000100635065 icinga2`icinga::HttpServerConnection::~HttpServerConnection(this=0x0000000104023400) at httpserverconnection.hpp:38
  frame #7: 0x000000010062dc05 icinga2`icinga::HttpServerConnection::~HttpServerConnection(this=0x0000000104023400) at httpserverconnection.hpp:38
  frame #8: 0x000000010062dc29 icinga2`icinga::HttpServerConnection::~HttpServerConnection(this=0x0000000104023400) at httpserverconnection.hpp:38
  frame #9: 0x000000010025b710 icinga2`icinga::intrusive_ptr_release(object=0x0000000104023400) at object.cpp:284
  frame #10: 0x000000010054368a icinga2`boost::intrusive_ptr<icinga::HttpServerConnection>::~intrusive_ptr(this=0x0000000106222558) at intrusive_ptr.hpp:98
  frame #11: 0x00000001004fb555 icinga2`boost::intrusive_ptr<icinga::HttpServerConnection>::~intrusive_ptr(this=0x0000000106222558) at intrusive_ptr.hpp:97
  frame #12: 0x000000010062dcf5 icinga2`std::__1::__tuple_leaf<0ul, boost::intrusive_ptr<icinga::HttpServerConnection>, false>::~__tuple_leaf(this=0x0000000106222558) at tuple:170
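
To make the hang easier to reason about, here's a minimal standalone sketch of the same pattern (plain C++ with hypothetical names, not Icinga code): a queued task holds the last reference to the object owning the work queue, so when the worker thread clears the finished task, the destructor ends up calling Join() on the very thread that would have to drain the queue.

#include <chrono>
#include <condition_variable>
#include <functional>
#include <iostream>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>

class WorkQueue {
public:
    WorkQueue() : m_Worker([this] { WorkerProc(); }) { }

    ~WorkQueue() {
        Join();                       // waits until the queue is drained...
        if (m_Worker.joinable())
            m_Worker.join();
    }

    void Enqueue(std::function<void()> task) {
        std::lock_guard<std::mutex> lock(m_Mutex);
        m_Tasks.push(std::move(task));
        m_CV.notify_all();
    }

    void Join() {
        std::unique_lock<std::mutex> lock(m_Mutex);
        // ...but when Join() runs on the worker thread itself, no other
        // thread is left to finish the task and signal m_CVStarved.
        m_CVStarved.wait(lock, [this] { return !m_Processing && m_Tasks.empty(); });
        m_Stopped = true;
        m_CV.notify_all();
    }

private:
    void WorkerProc() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_Mutex);
                m_CV.wait(lock, [this] { return m_Stopped || !m_Tasks.empty(); });
                if (m_Stopped && m_Tasks.empty())
                    return;
                task = std::move(m_Tasks.front());
                m_Tasks.pop();
                m_Processing = true;
            }
            task();
            // Mirrors "task = Task();" in workqueue.cpp:299: dropping the task
            // may destroy the last reference to the connection, running
            // ~Connection -> ~WorkQueue -> Join() on this worker thread.
            task = nullptr;
            {
                std::lock_guard<std::mutex> lock(m_Mutex);
                m_Processing = false;
                m_CVStarved.notify_all();
            }
        }
    }

    std::mutex m_Mutex;
    std::condition_variable m_CV, m_CVStarved;
    std::queue<std::function<void()>> m_Tasks;
    bool m_Processing = false, m_Stopped = false;
    std::thread m_Worker;               // declared last: starts after the rest
};

struct Connection {
    WorkQueue Queue;                    // the connection owns its request queue
    ~Connection() { std::cout << "~Connection -> ~WorkQueue -> Join()\n"; }
};

int main() {
    auto conn = std::make_shared<Connection>();
    // The queued task holds the last reference to the connection, just like
    // m_RequestQueue keeping HttpServerConnection alive.
    conn->Queue.Enqueue([c = conn] { std::cout << "handling request\n"; });
    conn.reset();                       // main drops its reference
    std::this_thread::sleep_for(std::chrono::seconds(1));
    std::cout << "worker is now stuck in Join() forever\n";
    return 0;                           // exit() tears the hung thread down
}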


dnsmichi commented Oct 9, 2018

I think I've found something. Note: I am testing with pre 2.10 and the dynamic thread pool, not 2.9.x.

Analysis

HttpResponse->Finish() calls m_Stream->Shutdown() early, if either

  • HTTP/1.0
  • Header Connection: Close

is sent. This breaks further stream handling, since m_Stream->Shutdown() may be called again later in the HttpServerConnection handlers. Shutdown() essentially doesn't close the socket, but changes the polling events and signals the shutdown to a different thread, freeing the lock. Highly likely this then locks up, even if the stack trace says otherwise.

The stack trace itself comes from the WQ where the current task is cleared and there's an attempt to free all remaining resources, i.e. m_RequestQueue, which holds a reference to HttpServerConnection. The destructor just hangs in there with a circular reference.

(lldb) up
frame #1: 0x00007fff68a38589 libsystem_pthread.dylib`_pthread_cond_wait + 732
libsystem_pthread.dylib`_pthread_cond_wait:
    0x7fff68a38589 <+732>: addq   $0x10, %rsp
    0x7fff68a3858d <+736>: movl   %eax, %ebx
    0x7fff68a3858f <+738>: movq   %gs:0x0, %rdi
    0x7fff68a38598 <+747>: movl   %r14d, %esi
(lldb)
frame #2: 0x00000001002e4f13 icinga2`boost::condition_variable::wait(this=0x0000000103866038, m=0x0000700011aa6408) at condition_variable.hpp:85
   82  	            detail::interruption_checker check_for_interruption(&internal_mutex,&cond);
   83  	            pthread_mutex_t* the_mutex = &internal_mutex;
   84  	            guard.activate(m);
-> 85  	            res = pthread_cond_wait(&cond,the_mutex);
   86  	            check_for_interruption.unlock_if_locked();
   87  	            guard.deactivate();
   88  	#else
(lldb)
frame #3: 0x00000001003acbf4 icinga2`icinga::WorkQueue::Join(this=0x0000000103865ef0, stop=true) at workqueue.cpp:130
   127 		boost::mutex::scoped_lock lock(m_Mutex);
   128
   129 		while (m_Processing || !m_Tasks.empty())
-> 130 			m_CVStarved.wait(lock);
   131
   132 		if (stop) {
   133 			m_Stopped = true;
(lldb)
frame #4: 0x00000001003ac975 icinga2`icinga::WorkQueue::~WorkQueue(this=0x0000000103865ef0) at workqueue.cpp:51
   48  	{
   49  		m_StatusTimer->Stop(true);
   50
-> 51  		Join(true);
   52  	}
   53
   54  	void WorkQueue::SetName(const String& name)
(lldb)
frame #5: 0x00000001003acd95 icinga2`icinga::WorkQueue::~WorkQueue(this=0x0000000103865ef0) at workqueue.cpp:48
   45  	}
   46
   47  	WorkQueue::~WorkQueue()
-> 48  	{
   49  		m_StatusTimer->Stop(true);
   50
   51  		Join(true);
(lldb)
frame #6: 0x0000000100635025 icinga2`icinga::HttpServerConnection::~HttpServerConnection(this=0x0000000103865e00) at httpserverconnection.hpp:38
   35  	 *
   36  	 * @ingroup remote
   37  	 */
-> 38  	class HttpServerConnection final : public Object
   39  	{
   40  	public:
   41  		DECLARE_PTR_TYPEDEFS(HttpServerConnection);
(lldb)
frame #7: 0x000000010062dbc5 icinga2`icinga::HttpServerConnection::~HttpServerConnection(this=0x0000000103865e00) at httpserverconnection.hpp:38
   35  	 *
   36  	 * @ingroup remote
   37  	 */
-> 38  	class HttpServerConnection final : public Object
   39  	{
   40  	public:
   41  		DECLARE_PTR_TYPEDEFS(HttpServerConnection);
(lldb)
frame #8: 0x000000010062dbe9 icinga2`icinga::HttpServerConnection::~HttpServerConnection(this=0x0000000103865e00) at httpserverconnection.hpp:38
   35  	 *
   36  	 * @ingroup remote
   37  	 */
-> 38  	class HttpServerConnection final : public Object
   39  	{
   40  	public:
   41  		DECLARE_PTR_TYPEDEFS(HttpServerConnection);
(lldb)
frame #9: 0x000000010025b680 icinga2`icinga::intrusive_ptr_release(object=0x0000000103865e00) at object.cpp:284
   281 			TypeRemoveObject(object);
   282 	#endif /* I2_LEAK_DEBUG */
   283
-> 284 			delete object;
   285 		}
   286 	}
   287
(lldb)
frame #10: 0x00000001005435fa icinga2`boost::intrusive_ptr<icinga::HttpServerConnection>::~intrusive_ptr(this=0x00000001044c2ee8) at intrusive_ptr.hpp:98
   95
   96  	    ~intrusive_ptr()
   97  	    {
-> 98  	        if( px != 0 ) intrusive_ptr_release( px );
   99  	    }
   100
   101 	#if !defined(BOOST_NO_MEMBER_TEMPLATES) || defined(BOOST_MSVC6_MEMBER_TEMPLATES)
(lldb)
frame #11: 0x00000001004fb4c5 icinga2`boost::intrusive_ptr<icinga::HttpServerConnection>::~intrusive_ptr(this=0x00000001044c2ee8) at intrusive_ptr.hpp:97
   94  	    }
   95
   96  	    ~intrusive_ptr()
-> 97  	    {
   98  	        if( px != 0 ) intrusive_ptr_release( px );
   99  	    }
   100
(lldb)
frame #12: 0x000000010062dcb5 icinga2`std::__1::__tuple_leaf<0ul, boost::intrusive_ptr<icinga::HttpServerConnection>, false>::~__tuple_leaf(this=0x00000001044c2ee8) at tuple:170
   167 	}
   168
   169 	template <size_t _Ip, class _Hp, bool>
-> 170 	class __tuple_leaf
   171 	{
   172 	    _Hp __value_;
   173
(lldb)
frame #13: 0x000000010062de01 icinga2`std::__1::__tuple_impl<std::__1::__tuple_indices<0ul, 1ul, 2ul, 3ul>, boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest, icinga::HttpResponse, boost::intrusive_ptr<icinga::ApiUser> >::~__tuple_impl(this=0x00000001044c2ee8) at tuple:369
   366 	template<class _Indx, class ..._Tp> struct __tuple_impl;
   367
   368 	template<size_t ..._Indx, class ..._Tp>
-> 369 	struct _LIBCPP_DECLSPEC_EMPTY_BASES __tuple_impl<__tuple_indices<_Indx...>, _Tp...>
   370 	    : public __tuple_leaf<_Indx, _Tp>...
   371 	{
   372 	    _LIBCPP_INLINE_VISIBILITY
(lldb)
frame #14: 0x000000010062dda5 icinga2`std::__1::__tuple_impl<std::__1::__tuple_indices<0ul, 1ul, 2ul, 3ul>, boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest, icinga::HttpResponse, boost::intrusive_ptr<icinga::ApiUser> >::~__tuple_impl(this=0x00000001044c2ee8) at tuple:369
   366 	template<class _Indx, class ..._Tp> struct __tuple_impl;
   367
   368 	template<size_t ..._Indx, class ..._Tp>
-> 369 	struct _LIBCPP_DECLSPEC_EMPTY_BASES __tuple_impl<__tuple_indices<_Indx...>, _Tp...>
   370 	    : public __tuple_leaf<_Indx, _Tp>...
   371 	{
   372 	    _LIBCPP_INLINE_VISIBILITY
(lldb)
frame #15: 0x000000010062dd85 icinga2`std::__1::tuple<boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest, icinga::HttpResponse, boost::intrusive_ptr<icinga::ApiUser> >::~tuple(this=0x00000001044c2ee8 size=4) at tuple:474
   471
   472
   473 	template <class ..._Tp>
-> 474 	class _LIBCPP_TEMPLATE_VIS tuple
   475 	{
   476 	    typedef __tuple_impl<typename __make_tuple_indices<sizeof...(_Tp)>::type, _Tp...> _BaseT;
   477
(lldb)
frame #16: 0x000000010062dd65 icinga2`std::__1::tuple<boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest, icinga::HttpResponse, boost::intrusive_ptr<icinga::ApiUser> >::~tuple(this=0x00000001044c2ee8 size=4) at tuple:474
   471
   472
   473 	template <class ..._Tp>
-> 474 	class _LIBCPP_TEMPLATE_VIS tuple
   475 	{
   476 	    typedef __tuple_impl<typename __make_tuple_indices<sizeof...(_Tp)>::type, _Tp...> _BaseT;
   477
(lldb)
frame #17: 0x000000010062dd49 icinga2`std::__1::__bind<void (icinga::HttpServerConnection::*)(icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser> const&), boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser>&>::~__bind(this=0x00000001044c2ed8) at functional:2220
   2217	}
   2218
   2219	template<class _Fp, class ..._BoundArgs>
-> 2220	class __bind
   2221	    : public __weak_result_type<typename decay<_Fp>::type>
   2222	{
   2223	protected:
(lldb)
frame #18: 0x000000010062d955 icinga2`std::__1::__bind<void (icinga::HttpServerConnection::*)(icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser> const&), boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser>&>::~__bind(this=0x00000001044c2ed8) at functional:2220
   2217	}
   2218
   2219	template<class _Fp, class ..._BoundArgs>
-> 2220	class __bind
   2221	    : public __weak_result_type<typename decay<_Fp>::type>
   2222	{
   2223	protected:
(lldb)
frame #19: 0x0000000100631cd5 icinga2`std::__1::__compressed_pair_elem<std::__1::__bind<void (icinga::HttpServerConnection::*)(icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser> const&), boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser>&>, 0, false>::~__compressed_pair_elem(this=0x00000001044c2ed8) at memory:2083
   2080	template <class _Tp, int _Idx,
   2081	          bool _CanBeEmptyBase =
   2082	              is_empty<_Tp>::value && !__libcpp_is_final<_Tp>::value>
-> 2083	struct __compressed_pair_elem {
   2084	  typedef _Tp _ParamT;
   2085	  typedef _Tp& reference;
   2086	  typedef const _Tp& const_reference;
(lldb)
frame #20: 0x00000001006324a5 icinga2`std::__1::__compressed_pair<std::__1::__bind<void (icinga::HttpServerConnection::*)(icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser> const&), boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser>&>, std::__1::allocator<std::__1::__bind<void (icinga::HttpServerConnection::*)(icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser> const&), boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser>&> > >::~__compressed_pair(this=0x00000001044c2ed8) at memory:2157
   2154	struct __second_tag {};
   2155
   2156	template <class _T1, class _T2>
-> 2157	class __compressed_pair : private __compressed_pair_elem<_T1, 0>,
   2158	                          private __compressed_pair_elem<_T2, 1> {
   2159	  typedef __compressed_pair_elem<_T1, 0> _Base1;
   2160	  typedef __compressed_pair_elem<_T2, 1> _Base2;
(lldb)
frame #21: 0x0000000100632485 icinga2`std::__1::__compressed_pair<std::__1::__bind<void (icinga::HttpServerConnection::*)(icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser> const&), boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser>&>, std::__1::allocator<std::__1::__bind<void (icinga::HttpServerConnection::*)(icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser> const&), boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser>&> > >::~__compressed_pair(this=0x00000001044c2ed8) at memory:2157
   2154	struct __second_tag {};
   2155
   2156	template <class _T1, class _T2>
-> 2157	class __compressed_pair : private __compressed_pair_elem<_T1, 0>,
   2158	                          private __compressed_pair_elem<_T2, 1> {
   2159	  typedef __compressed_pair_elem<_T1, 0> _Base1;
   2160	  typedef __compressed_pair_elem<_T2, 1> _Base2;
(lldb)
frame #22: 0x0000000100631bc6 icinga2`std::__1::__function::__func<std::__1::__bind<void (icinga::HttpServerConnection::*)(icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser> const&), boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser>&>, std::__1::allocator<std::__1::__bind<void (icinga::HttpServerConnection::*)(icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser> const&), boost::intrusive_ptr<icinga::HttpServerConnection>, icinga::HttpRequest&, icinga::HttpResponse&, boost::intrusive_ptr<icinga::ApiUser>&> >, void ()>::destroy_deallocate(this=0x00000001044c2ed0) at functional:1553
   1550	    typedef allocator_traits<_Alloc> __alloc_traits;
   1551	    typedef typename __rebind_alloc_helper<__alloc_traits, __func>::type _Ap;
   1552	    _Ap __a(__f_.second());
-> 1553	    __f_.~__compressed_pair<_Fp, _Alloc>();
   1554	    __a.deallocate(this, 1);
   1555	}
   1556
(lldb)
frame #23: 0x0000000100213c92 icinga2`std::__1::function<void ()>::operator=(this=0x0000000000000000, (null)=0x0000000000000000) at functional:1846
   1843	    if ((void *)__t == &__buf_)
   1844	        __t->destroy();
   1845	    else if (__t)
-> 1846	        __t->destroy_deallocate();
   1847	    return *this;
   1848	}
   1849
(lldb)
frame #24: 0x0000000100213b67 icinga2`std::__1::function<void ()>::operator=(this=0x0000000000000000, __f=0x0000000000000000)>&&) at functional:1821
   1818	function<_Rp(_ArgTypes...)>&
   1819	function<_Rp(_ArgTypes...)>::operator=(function&& __f) _NOEXCEPT
   1820	{
-> 1821	    *this = nullptr;
   1822	    if (__f.__f_ == 0)
   1823	        __f_ = 0;
   1824	    else if ((void *)__f.__f_ == &__f.__buf_)
(lldb)
frame #25: 0x00000001003af1ff icinga2`icinga::Task::operator=(this=0x0000700011aa6d70, (null)=0x0000700011aa6d30) at workqueue.hpp:46
   43
   44  	using TaskFunction = std::function<void ()>;
   45
-> 46  	struct Task
   47  	{
   48  		Task() = default;
   49
(lldb)
frame #26: 0x00000001003ae26b icinga2`icinga::WorkQueue::WorkerThreadProc(this=0x0000000103865ef0) at workqueue.cpp:299
   296 			RunTaskFunction(task.Function);
   297
   298 			/* clear the task so whatever other resources it holds are released _before_ we re-acquire the mutex */
-> 299 			task = Task();
   300
   301 			IncreaseTaskCount();
   302
(lldb)

dnsmichi pushed a commit that referenced this issue Oct 9, 2018
Previously this was inside the debug log, with the
new socket printers we can enhance checking for proper
connects and disconnects.

refs #6514

dnsmichi commented Oct 9, 2018

Solution?

When a request is finished, we need to explicitly call Disconnect() to clean up everything - if requested by protocol or header. This cannot happen inside HttpResponse itself, but requires a call inside the HttpServerConnection data handling. Prior to calling Disconnect(), we should ensure the stream is shut down, to remove the pending polling, exactly once.
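
Roughly, the decision the data handler has to make looks like this (a hedged sketch with hypothetical, simplified types - the real change is in the PR that closed this issue, #6658):

#include <string>

// Hypothetical stand-ins, simplified for the sketch; not the real classes.
struct HttpRequest {
    int ProtocolMajor = 1, ProtocolMinor = 1;
    std::string ConnectionHeader;    // lowercased value of the "Connection" header
};

// HTTP/1.0 closes by default; HTTP/1.1 keeps the connection alive unless the
// client asked for "Connection: close". Only when this returns true should
// the handler shut the stream down (once) and then call Disconnect().
bool ShouldCloseAfterResponse(const HttpRequest& req) {
    if (req.ProtocolMajor == 1 && req.ProtocolMinor == 0)
        return true;
    return req.ConnectionHeader == "close";
}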

Tests

Thread-watch on macOS:

watch -n 1 -c 'for p in $(pidof icinga2); do NUM=`ps M $p | wc -l` && echo $((NUM-1)); done'

Requests with HTTP/1.0:

for i in {1..10000}; do curl --http1.0 -k -s -u root:icinga https://localhost:5665/v1 && echo -e "\n\n\n\n\n"; done

Requests with Connection: Close:

for i in {1..10000}; do curl -H 'Connection: close' -k -s -u root:icinga https://localhost:5665/v1 && echo -e "\n\n\n\n\n"; done

Requests with HTTP/1.1:

for i in {1..10000}; do curl -k -s -u root:icinga https://localhost:5665/v1 && echo -e "\n\n\n\n\n"; done

Results

Typically my Macbook spawns 31 threads for Icinga 2.

HTTP/1.1

When starting the curl requests, this goes up from 50 to 75, but then drops to 71.

HTTP/1.0

Requests cause 70-80 threads; you can watch the count go down and up a little, at good times dropping to just 60 threads.

Connection: Close

Goes up to 63 threads, down to 57, up to 80, drops to 49.

Clients

Local dashing-icinga2 instance which connects to the local icinga2 daemon. Since the scheduler doesn't do anything in a 10 second interval, threads stay at 31.

Firing another bunch of close requests doesn't harm the behaviour; the count peaks at 60, but the dashing GET requests are still processed.

[Screenshot: screen shot 2018-10-09 at 13 21 26]


dnsmichi commented Oct 9, 2018

Note: This depends on #6633, and as such we cannot backport it; the PR targets 2.10 only.


dnsmichi commented Oct 9, 2018

@neubi4 please test this once merged/inside snapshot packages. This "costs" you an entry on https://www.icinga.com/about/customers/#shareyourstory :-)


cubic3d commented Oct 9, 2018

Thank you for fixing this @dnsmichi. I'll test this as well. In the meantime we had to downgrade, since we are using the API on the client side as well and multiple customer servers went down.



neubi4 commented Oct 9, 2018

@dnsmichi deal, and thanks for all your hard work :)

I tried to test the snapshot packages using the repo (https://packages.icinga.com/epel/7/snapshot/x86_64/), but could not get them to work in our dev environment. Our two masters fail to connect to each other, with these messages:

[2018-10-09 16:12:59 +0200] information/ApiListener: Reconnecting to endpoint 'master2-local-domon.testing.mms-at-work.de' via host 'master2-local-domon.testing.mms-at-work.de' and port '5665'
[2018-10-09 16:13:02 +0200] critical/ApiListener: Client TLS handshake failed (from [172.22.0.6]:50682): Error: Timeout was reached (10) during TLS handshake.

        (0) Handling new API client connection

Context:
        (0) Handling new API client connection

[2018-10-09 16:13:09 +0200] critical/ApiListener: Client TLS handshake failed (to [172.22.0.6]:5665): Error: Timeout was reached (10) during TLS handshake.

        (0) Handling new API client connection

Context:
        (0) Handling new API client connection

It seems it fails to handle the client connections; I see the same messages when the Director tries to talk to the API.

Same config works without problems with 2.9.1.

icinga2 --version

icinga2 - The Icinga 2 network monitoring daemon (version: v2.9.2-227-g58cfc39)

Copyright (c) 2012-2018 Icinga Development Team (https://www.icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: CentOS Linux
  Platform version: 7 (Core)
  Kernel: Linux
  Kernel version: 3.10.0-862.14.4.el7.x86_64
  Architecture: x86_64

Build information:
  Compiler: GNU 4.8.5
  Build host: unknown

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

icinga2 feature list

Disabled features: command compatlog debuglog elasticsearch gelf graphite livestatus opentsdb perfdata statusdata syslog
Enabled features: api checker ido-mysql influxdb mainlog notification

/etc/icinga2/zones.conf

object Endpoint "master1-local-domon.testing.mms-at-work.de"  {
  host = "master1-local-domon.testing.mms-at-work.de"
}

object Endpoint "master2-local-domon.testing.mms-at-work.de"  {
  host = "master2-local-domon.testing.mms-at-work.de"
}

object Zone "master"  {
  endpoints = [ "master1-local-domon.testing.mms-at-work.de", "master2-local-domon.testing.mms-at-work.de", ]
}

object Zone "director-global"  {
  global = true
}

object Zone "global-templates"  {
  global = true
}

Log

[2018-10-09 15:58:07 +0200] information/FileLogger: 'main-log' started.
[2018-10-09 15:58:07 +0200] information/ApiListener: 'api' started.
[2018-10-09 15:58:07 +0200] information/ApiListener: Started new listener on '[0.0.0.0]:5665'
[2018-10-09 15:58:07 +0200] information/InfluxdbWriter: 'influxdb' started.
[2018-10-09 15:58:07 +0200] information/CheckerComponent: 'checker' started.
[2018-10-09 15:58:07 +0200] information/NotificationComponent: 'notification' started.
[2018-10-09 15:58:07 +0200] information/DbConnection: 'ido-mysql' started.
[2018-10-09 15:58:07 +0200] information/ConfigItem: Activated all objects.
[2018-10-09 15:58:07 +0200] information/ApiListener: Reconnecting to endpoint 'master2-local-domon.testing.mms-at-work.de' via host 'master2-local-domon.testing.mms-at-work.de' and port '5665'
[2018-10-09 15:58:07 +0200] information/cli: Closing console log.
[2018-10-09 15:58:07 +0200] critical/TcpSocket: getaddrinfo() failed with error code -2, "Name or service not known"
[2018-10-09 15:58:07 +0200] critical/ApiListener: Cannot connect to host 'master2-local-domon.testing.mms-at-work.de' on port '5665'
[2018-10-09 15:58:07 +0200] information/ApiListener: Finished reconnecting to endpoint 'master2-local-domon.testing.mms-at-work.de' via host 'master2-local-domon.testing.mms-at-work.de' and port '5665'
[2018-10-09 15:58:17 +0200] information/WorkQueue: #5 (InfluxdbWriter, influxdb) items: 0, rate: 0.0166667/s (1/min 1/5min 1/15min);
[2018-10-09 15:58:17 +0200] information/WorkQueue: #7 (ApiListener, SyncQueue) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-10-09 15:58:17 +0200] information/WorkQueue: #6 (ApiListener, RelayQueue) items: 0, rate: 0.133333/s (8/min 8/5min 8/15min);
[2018-10-09 15:58:17 +0200] information/WorkQueue: #8 (IdoMysqlConnection, ido-mysql) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-10-09 15:58:17 +0200] information/ApiListener: Reconnecting to endpoint 'master2-local-domon.testing.mms-at-work.de' via host 'master2-local-domon.testing.mms-at-work.de' and port '5665'
[2018-10-09 15:58:20 +0200] critical/ApiListener: Client TLS handshake failed (from [172.19.0.6]:43988): Error: Timeout was reached (10) during TLS handshake.

        (0) Handling new API client connection

Context:
        (0) Handling new API client connection

[2018-10-09 15:58:27 +0200] critical/ApiListener: Client TLS handshake failed (to [172.19.0.6]:5665): Error: Timeout was reached (10) during TLS handshake.

        (0) Handling new API client connection

Context:
        (0) Handling new API client connection

[2018-10-09 15:58:32 +0200] critical/ApiListener: Client TLS handshake failed (from [172.19.0.7]:52014): Error: Timeout was reached (10) during TLS handshake.

        (0) Handling new API client connection

Context:
        (0) Handling new API client connection

[2018-10-09 15:59:37 +0200] information/WorkQueue: #8 (IdoMysqlConnection, ido-mysql) items: 6, rate: 0.5/s (30/min 48/5min 48/15min); empty in 10 seconds
[2018-10-09 15:59:37 +0200] information/DbConnection: Resuming IDO connection: ido-mysql
[2018-10-09 15:59:37 +0200] information/IdoMysqlConnection: 'ido-mysql' resumed.
[2018-10-09 15:59:37 +0200] information/IdoMysqlConnection: MySQL IDO instance id: 1 (schema version: '1.14.3')
[2018-10-09 15:59:39 +0200] information/IdoMysqlConnection: Finished reconnecting to MySQL IDO database in 1.40411 second(s).


dnsmichi commented Oct 9, 2018

Hmmm, I've found and fixed something similar a few minutes ago in git master; wait for the new builds with e6eb703.

@dnsmichi dnsmichi added the needs feedback We'll only proceed once we hear from you again label Oct 9, 2018

dnsmichi commented Oct 9, 2018

I've started my local HA master setup; the TLS handshake works fine. Probably this happens either because of the mentioned bug, or because the connection is busy.

@dnsmichi dnsmichi reopened this Oct 9, 2018

neubi4 commented Oct 9, 2018

I've tried the new builds, it still happens.

A short test with curl shows that the connection to the API port is accepted, but it hangs on transferring any data:

curl -kv https://master1-local-domon.testing.mms-at-work.de:5665/v1
* About to connect() to master1-local-domon.testing.mms-at-work.de port 5665 (#0)
*   Trying 172.19.0.5...
* Connected to master1-local-domon.testing.mms-at-work.de (172.19.0.5) port 5665 (#0)
* Initializing NSS with certpath: sql:/etc/pki/nssdb

This is all I get; waiting a few minutes does not change anything. I don't have any other configuration like agents, satellites, or other nodes in the dev setup, so the connection should not be busy with anything like that.

The interesting thing in my logs is that after startup, both masters log the TLS timeout message from above, each one showing the two messages for the connection attempt from and to the other master. After that, neither master attempts to connect to the other node over the following minutes (I've waited 15m now).

I'm on vacation (or rather, taking off accrued overtime) for the rest of the week, so I think the earliest next test (running with debug log, for example) I can promise is on Monday, but I will try to do it earlier.


dnsmichi commented Oct 9, 2018

Ah, this is the setup with daemonize/-d, isn't it? Can you attach to the running processes and generate a thread list, please? I'm doing my best not to harm 2.10 this week; still, I think we've found something here :)

https://www.icinga.com/docs/icinga2/snapshot/doc/21-development/#gdb-thread-list-from-running-process


dnsmichi commented Oct 9, 2018

Note to self for tomorrow: Check #6630 and the apilistener thread pool whether the threads are gone after daemonize().

dnsmichi pushed a commit that referenced this issue Oct 9, 2018
This reverts commit c8dcd12.

We cannot do this during application startup, since Daemonize()
kills the threads again.

refs #6514
refs #6630

dnsmichi commented Oct 9, 2018

Ok, I'm sitting with my MacBook on the balcony, and I can see it. With daemonize(), the SocketEventEnginePoll threads aren't initialized. Hooray. I'm triggering snapshot builds for you; it would be awesome if you could give them a whirl :)
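
For the archive, the fork() semantics behind this are easy to demonstrate (a standalone sketch, nothing Icinga-specific): only the thread calling fork() exists in the child, so worker threads started before daemonizing silently vanish.

#include <sys/wait.h>
#include <unistd.h>

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

std::atomic<bool> g_WorkerAlive{false};

int main() {
    // A worker thread that keeps raising a heartbeat flag.
    std::thread([] {
        for (;;) {
            g_WorkerAlive = true;
            std::this_thread::sleep_for(std::chrono::milliseconds(50));
        }
    }).detach();
    std::this_thread::sleep_for(std::chrono::milliseconds(200));

    pid_t pid = fork();                 // the core of any daemonize()
    if (pid == 0) {
        // Child: only this thread was duplicated. Clear the flag; with the
        // worker gone, nothing ever sets it again.
        g_WorkerAlive = false;
        std::this_thread::sleep_for(std::chrono::milliseconds(300));
        std::printf("child: worker alive? %s\n", g_WorkerAlive ? "yes" : "no");   // "no"
        _exit(0);
    }
    waitpid(pid, nullptr, 0);
    std::printf("parent: worker alive? %s\n", g_WorkerAlive ? "yes" : "no");      // "yes"
    return 0;
}

Hence the revert above: the listener threads have to be spawned after Daemonize(), not before.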


neubi4 commented Oct 10, 2018

I just tested the snapshot repo; the cluster works fine, thanks :). I will remember to note that we are running with "-d" when opening the next issue.

I also tested the thread issue, with and without Connection: close and with HTTP/1.0; everything is fine, the thread count is not going higher than it should.

The only issue I see now is that the deployment after the Director kickstart is not working:

[2018-10-10 15:59:47 +0200] critical/config: Error: Invalid field access (for value of type 'Namespace'): 'host var overrides (Director)'
Location: in [stage]/zones.d/director-global/001-director-basics.conf: 13:7-13:39
[stage]/zones.d/director-global/001-director-basics.conf(11): 
[stage]/zones.d/director-global/001-director-basics.conf(12): const DirectorOverrideTemplate = "host var overrides (Director)"
[stage]/zones.d/director-global/001-director-basics.conf(13): if (! globals[DirectorOverrideTemplate]) {
                                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[stage]/zones.d/director-global/001-director-basics.conf(14):   const DirectorOverrideVars = "_override_servicevars"

but I guess it's only Director v1.5.1 not being compatible with the new namespace feature, so I think we can close the issue here.

@dnsmichi

Aaah, saves my day, thank you :-)

The Director needs this small patch for namespaces: Icinga/icingaweb2-module-director#1654 - I'll mark this in the release blog post plus an extra changelog entry.

@dnsmichi

Some notes I found on my desk, for the archive.

[Photo of handwritten notes: img_2121]

@dnsmichi

In terms of the namespace error with globals, here's a PR for 2.10, #6669, since this also affects global zone deployments.


cubic3d commented Nov 19, 2018

@dnsmichi @neubi4 Sorry to bring this nasty piece up again, but it seems the issue is only half fixed. Using the sequential curl requests with the "Connection: close" header works, but firing up our API client, which uses threaded job handling, results in errors again. If I don't send the "Connection: close" header, everything is fine (at least on 2.10.2; on some 2.8.x it was the exact opposite, which is why we changed the behaviour).

Here is the debug log:

[2018-11-19 10:10:32 +0100] information/WorkQueue: #36 (HttpServerConnection) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-11-19 10:10:33 +0100] notice/ThreadPool: Thread pool; current: 6; adjustment: 1
[2018-11-19 10:10:33 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 20; Pool utilization: 72.0427%
[2018-11-19 10:10:34 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:10:34 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:10:34 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:10:34 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:10:39 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:10:44 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:10:44 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:10:44 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:10:44 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:10:45 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00714326%
[2018-11-19 10:10:48 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 20; Pool utilization: 75.2167%
[2018-11-19 10:10:49 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:10:52 +0100] information/ApiListener: New client connection from [127.0.0.1]:53540 (no client certificate)
[2018-11-19 10:10:52 +0100] information/ApiListener: New client connection from [127.0.0.1]:53538 (no client certificate)
[2018-11-19 10:10:52 +0100] information/ApiListener: New client connection from [127.0.0.1]:53532 (no client certificate)
[2018-11-19 10:10:52 +0100] information/ApiListener: New client connection from [127.0.0.1]:53548 (no client certificate)
[2018-11-19 10:10:52 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:53540), user: [ REMOVED ])
[2018-11-19 10:10:52 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:10:52 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:10:52 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:53538), user: [ REMOVED ])
[2018-11-19 10:10:52 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:10:52 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:53532), user: [ REMOVED ])
[2018-11-19 10:10:52 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:10:52 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:10:52 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:10:52 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53540)
[2018-11-19 10:10:52 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:10:52 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:53548), user: [ REMOVED ])
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53548)
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53548)
[2018-11-19 10:10:52 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53538)
[2018-11-19 10:10:52 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:10:52 +0100] notice/ThreadPool: Thread pool; current: 7; adjustment: -1
[2018-11-19 10:10:52 +0100] notice/ThreadPool: Thread pool; current: 7; adjustment: -2
[2018-11-19 10:10:52 +0100] notice/ThreadPool: Thread pool; current: 11; adjustment: -2
[2018-11-19 10:10:53 +0100] notice/ThreadPool: Thread pool; current: 9; adjustment: -2
[2018-11-19 10:10:53 +0100] notice/ThreadPool: Thread pool; current: 9; adjustment: -2
[2018-11-19 10:10:54 +0100] notice/ThreadPool: Thread pool; current: 9; adjustment: -2
[2018-11-19 10:10:54 +0100] notice/ThreadPool: Thread pool; current: 6; adjustment: 1
[2018-11-19 10:10:54 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:10:54 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:10:54 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:10:54 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:10:55 +0100] notice/ThreadPool: Thread pool; current: 5; adjustment: 1
[2018-11-19 10:10:58 +0100] notice/ThreadPool: Thread pool; current: 9; adjustment: 1
[2018-11-19 10:10:59 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:00 +0100] notice/ThreadPool: Thread pool; current: 7; adjustment: 1
[2018-11-19 10:11:00 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 4; Pool utilization: 0.00333945%
[2018-11-19 10:11:01 +0100] notice/ThreadPool: Thread pool; current: 10; adjustment: 1
[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53528): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53526): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53530): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53534): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53536): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53542): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53544): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53546): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:53550): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:02 +0100] information/WorkQueue: #41 (HttpServerConnection) items: 0, rate: 0.0166667/s (1/min 1/5min 1/15min);
[2018-11-19 10:11:02 +0100] information/WorkQueue: #42 (HttpServerConnection) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-11-19 10:11:03 +0100] information/HttpServerConnection: No messages for Http connection have been received in the last 10 seconds.
[2018-11-19 10:11:03 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53532)
[2018-11-19 10:11:03 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:53532)
[2018-11-19 10:11:03 +0100] notice/ThreadPool: Thread pool; current: 6; adjustment: 1
[2018-11-19 10:11:03 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 33; Pool utilization: 73.1336%
[2018-11-19 10:11:04 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:04 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:04 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:04 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:08 +0100] notice/ThreadPool: Thread pool; current: 11; adjustment: 1
[2018-11-19 10:11:11 +0100] notice/ThreadPool: Thread pool; current: 1; adjustment: 2
[2018-11-19 10:11:11 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:11 +0100] notice/ThreadPool: Thread pool; current: 3; adjustment: -1
[2018-11-19 10:11:14 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:14 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:14 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:15 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 11.4771%
[2018-11-19 10:11:16 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:18 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 35; Pool utilization: 70.8839%
[2018-11-19 10:11:21 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:55174 (no client certificate)
[2018-11-19 10:11:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:55176 (no client certificate)
[2018-11-19 10:11:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:55190 (no client certificate)
[2018-11-19 10:11:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result?host=[ REMOVED ] (from [127.0.0.1]:55174), user: [ REMOVED ])
[2018-11-19 10:11:22 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:22 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:11:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:55192 (no client certificate)
[2018-11-19 10:11:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:55176), user: [ REMOVED ])
[2018-11-19 10:11:22 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:22 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:11:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:55190), user: [ REMOVED ])
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:55190)
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:55190)
[2018-11-19 10:11:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:55192), user: [ REMOVED ])
[2018-11-19 10:11:22 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:22 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:11:22 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:55176)
[2018-11-19 10:11:22 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:55192)
[2018-11-19 10:11:22 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:22 +0100] notice/ThreadPool: Thread pool; current: 13; adjustment: -2
[2018-11-19 10:11:22 +0100] notice/ThreadPool: Thread pool; current: 12; adjustment: -2
[2018-11-19 10:11:22 +0100] notice/ThreadPool: Thread pool; current: 10; adjustment: -2
[2018-11-19 10:11:23 +0100] notice/ThreadPool: Thread pool; current: 10; adjustment: -2
[2018-11-19 10:11:24 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:24 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:24 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:26 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:26 +0100] notice/ThreadPool: Thread pool; current: 9; adjustment: 1
[2018-11-19 10:11:28 +0100] notice/ThreadPool: Thread pool; current: 11; adjustment: 1
[2018-11-19 10:11:28 +0100] notice/ThreadPool: Thread pool; current: 8; adjustment: 1
[2018-11-19 10:11:29 +0100] notice/ThreadPool: Thread pool; current: 10; adjustment: 1
[2018-11-19 10:11:30 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.4596%
[2018-11-19 10:11:31 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55172): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55178): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55180): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55182): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55184): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55186): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55188): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55194): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:55196): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[2018-11-19 10:11:32 +0100] information/WorkQueue: #43 (HttpServerConnection) items: 0, rate: 0.0166667/s (1/min 1/5min 1/15min);
[2018-11-19 10:11:32 +0100] information/WorkQueue: #45 (HttpServerConnection) items: 0, rate:  0/s (0/min 0/5min 0/15min);
[2018-11-19 10:11:33 +0100] notice/ThreadPool: Thread pool; current: 12; adjustment: 1
[2018-11-19 10:11:33 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 46; Pool utilization: 74.4609%
[2018-11-19 10:11:34 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:34 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:34 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:36 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:36 +0100] notice/ThreadPool: Thread pool; current: 11; adjustment: 1
[2018-11-19 10:11:41 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:44 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:44 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:44 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:45 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.5044%
[2018-11-19 10:11:46 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:48 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 47; Pool utilization: 74.652%
[2018-11-19 10:11:51 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:52 +0100] information/ApiListener: New client connection from [127.0.0.1]:57286 (no client certificate)
[2018-11-19 10:11:52 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:11:52 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:57286), user: [ REMOVED ])
[2018-11-19 10:11:52 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:52 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:11:52 +0100] notice/ApiListener: Relaying 'event::SetNextCheck' message
[2018-11-19 10:11:52 +0100] notice/ApiListener: Relaying 'event::CheckResult' message
[2018-11-19 10:11:52 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:57286)
[2018-11-19 10:11:52 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:11:52 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: -2
[2018-11-19 10:11:52 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: -2
[2018-11-19 10:11:54 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:11:54 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:11:54 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:11:56 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:11:56 +0100] notice/ThreadPool: Thread pool; current: 14; adjustment: 1
[2018-11-19 10:11:56 +0100] notice/ThreadPool: Thread pool; current: 14; adjustment: 1
[2018-11-19 10:11:59 +0100] notice/ThreadPool: Thread pool; current: 15; adjustment: 1
[2018-11-19 10:11:59 +0100] notice/ThreadPool: Thread pool; current: 15; adjustment: 1
[2018-11-19 10:12:00 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.5037%
[2018-11-19 10:12:01 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:12:02 +0100] critical/ApiListener: Client TLS handshake failed (from [127.0.0.1]:57278): Error: Timeout was reached (10) during TLS handshake.

	(0) Handling new API client connection

Context:
	(0) Handling new API client connection

[... the same handshake-timeout block repeats for the remaining connections (ports 57276 through 57300) ...]

[2018-11-19 10:12:03 +0100] notice/ThreadPool: Thread pool; current: 11; adjustment: 1
[2018-11-19 10:12:03 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 60; Pool utilization: 76.3517%
[2018-11-19 10:12:04 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:12:04 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:12:04 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:12:06 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:12:06 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:06 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:07 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[... the "Thread pool; current: 16; adjustment: 1" line repeats several times per second from here on; the pool never grows past 16 threads ...]
[2018-11-19 10:12:11 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:12:14 +0100] notice/DbConnection: Updating programstatus table.
[2018-11-19 10:12:14 +0100] notice/ApiListener: Current zone master: [ REMOVED ]
[2018-11-19 10:12:14 +0100] notice/ApiListener: Connected endpoints: 
[2018-11-19 10:12:15 +0100] notice/ThreadPool: Pool #1: Pending tasks: 0; Average latency: 0ms; Threads: 5; Pool utilization: 12.5047%
[2018-11-19 10:12:16 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:12:18 +0100] notice/ThreadPool: Pool #2: Pending tasks: 0; Average latency: 0ms; Threads: 60; Pool utilization: 78.0499%
^C[2018-11-19 10:12:21 +0100] information/Application: Received request to shut down.
[2018-11-19 10:12:21 +0100] notice/CheckerComponent: Pending checkables: 0; Idle checkables: 52; Checks/s: 0
[2018-11-19 10:12:21 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:21 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:21 +0100] information/Application: Shutting down...
[2018-11-19 10:12:21 +0100] information/ApiListener: 'api' stopped.
[2018-11-19 10:12:21 +0100] information/CheckerComponent: 'checker' stopped.
[2018-11-19 10:12:21 +0100] information/ExternalCommandListener: 'command' stopped.
[2018-11-19 10:12:21 +0100] information/IdoPgsqlConnection: 'ido-pgsql' paused.
[2018-11-19 10:12:21 +0100] information/DbConnection: Pausing IDO connection: ido-pgsql
[2018-11-19 10:12:21 +0100] information/DbConnection: 'ido-pgsql' stopped.
[2018-11-19 10:12:21 +0100] information/ConfigObject: Dumping program state to file '/var/lib/icinga2/icinga2.state'
[2018-11-19 10:12:21 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:21 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:22 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[2018-11-19 10:12:22 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 2
[2018-11-19 10:12:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:59916 (no client certificate)
[2018-11-19 10:12:22 +0100] information/ApiListener: New client connection from [127.0.0.1]:59932 (no client certificate)
[2018-11-19 10:12:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:12:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:59916), user: [ REMOVED ])
[2018-11-19 10:12:22 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:12:22 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:12:22 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:12:22 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:59932), user: [ REMOVED ])
[2018-11-19 10:12:22 +0100] notice/WorkQueue: Spawning WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:12:22 +0100] notice/ApiActionHandler: Running action process-check-result
[2018-11-19 10:12:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:59916)
[2018-11-19 10:12:22 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:12:22 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:59932)
[2018-11-19 10:12:22 +0100] notice/WorkQueue: Stopped WorkQueue threads for 'HttpServerConnection'
[2018-11-19 10:12:22 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 1
[... the "Thread pool; current: 16" line keeps repeating after the shutdown request, the adjustment value slowly climbing from 1 to 4 while the pool stays stuck at 16 threads ...]
[2018-11-19 10:12:30 +0100] notice/ThreadPool: Thread pool; current: 16; adjustment: 4
^C

Sometimes there are two entries for a disconnect with the same source port:

[2018-11-19 10:29:00 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:43340), user: [ REMOVED ])
[2018-11-19 10:29:00 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:43340)
[2018-11-19 10:29:00 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:43340)
[2018-11-19 10:29:00 +0100] notice/ApiListener: New HTTP client
[2018-11-19 10:29:00 +0100] information/HttpServerConnection: Request: POST /v1/actions/process-check-result (from [127.0.0.1]:43344), user: [ REMOVED ])
[2018-11-19 10:29:00 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:43344)
[2018-11-19 10:29:00 +0100] information/HttpServerConnection: HTTP client disconnected (from [127.0.0.1]:43344)

Log from the client side:

2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
2018/11/19 10:12:02 Post https://localhost:5665/v1/actions/process-check-result: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

I see some strange behaviour here:

  • The thread pool grows to 16 threads and then stalls completely.
  • After the first TLS handshake failure, Icinga enters an unstable state: every subsequent HTTP request times out, no matter whether it is authorised or not, and with or without the "Connection: close" header (tested with curl).
  • While in this unstable state, Icinga is unable to shut down on SIGHUP (see logs).
  • This looks like a race condition somewhere.

Unfortunately the client is not open source, but its only interaction with Icinga is pushing check results and fetching a host's services via the API from worker threads (written in Go using http.Client).
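
For reference, here is a rough sketch of that client-side pattern; the URL, credentials and payload are placeholders, since the real client is closed source. Setting DisableKeepAlives on Go's transport is what makes every request carry the "Connection: close" header:

package main

import (
	"crypto/tls"
	"log"
	"net/http"
	"strings"
	"sync"
	"time"
)

func main() {
	client := &http.Client{
		Timeout: 10 * time.Second, // the source of the "Client.Timeout exceeded" errors above
		Transport: &http.Transport{
			TLSClientConfig:   &tls.Config{InsecureSkipVerify: true}, // self-signed test certificate only
			DisableKeepAlives: true,                                  // makes every request carry "Connection: close"
		},
	}

	// Placeholder payload; the real client builds these from RabbitMQ messages.
	payload := `{"type": "Service", "filter": "host.name==\"example\"", "exit_status": 0, "plugin_output": "OK"}`

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ { // a few parallel workers are enough to trigger the stall
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 1000; j++ {
				req, err := http.NewRequest(http.MethodPost,
					"https://localhost:5665/v1/actions/process-check-result",
					strings.NewReader(payload))
				if err != nil {
					log.Fatal(err)
				}
				req.SetBasicAuth("root", "secret") // placeholder credentials
				req.Header.Set("Accept", "application/json")
				resp, err := client.Do(req)
				if err != nil {
					log.Println(err) // matches the client-side log lines above
					continue
				}
				resp.Body.Close()
			}
		}()
	}
	wg.Wait()
}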

Environment:

icinga2 - The Icinga 2 network monitoring daemon (version: r2.10.2-1)

Copyright (c) 2012-2018 Icinga Development Team (https://icinga.com/)
License GPLv2+: GNU GPL version 2 or later <http://gnu.org/licenses/gpl2.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

System information:
  Platform: Ubuntu
  Platform version: 16.04.5 LTS (Xenial Xerus)
  Kernel: Linux
  Kernel version: 4.15.0-36-generic
  Architecture: x86_64

Build information:
  Compiler: GNU 5.3.1
  Build host: 409e1113863b

Application information:

General paths:
  Config directory: /etc/icinga2
  Data directory: /var/lib/icinga2
  Log directory: /var/log/icinga2
  Cache directory: /var/cache/icinga2
  Spool directory: /var/spool/icinga2
  Run directory: /run/icinga2

Old paths (deprecated):
  Installation root: /usr
  Sysconf directory: /etc
  Run directory (base): /run
  Local state directory: /var

Internal paths:
  Package data directory: /usr/share/icinga2
  State path: /var/lib/icinga2/icinga2.state
  Modified attributes path: /var/lib/icinga2/modified-attributes.conf
  Objects path: /var/cache/icinga2/icinga2.debug
  Vars path: /var/cache/icinga2/icinga2.vars
  PID path: /run/icinga2/icinga2.pid

Enabled features: api checker command ido-pgsql mainlog

[2018-11-19 11:13:29 +0100] information/cli: Icinga application loader (version: r2.10.2-1)
[2018-11-19 11:13:29 +0100] information/cli: Loading configuration file(s).
[2018-11-19 11:13:29 +0100] information/ConfigItem: Committing config item(s).
[2018-11-19 11:13:29 +0100] information/ApiListener: My API identity: [ REMOVED ]
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 48 Services.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 IcingaApplication.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 4 Hosts.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 FileLogger.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 2 NotificationCommands.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 ApiListener.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 CheckerComponent.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 3 Zones.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 ExternalCommandListener.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 Endpoint.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 2 ApiUsers.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 215 CheckCommands.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 IdoPgsqlConnection.
[2018-11-19 11:13:29 +0100] information/ConfigItem: Instantiated 1 TimePeriod.
[2018-11-19 11:13:29 +0100] information/ScriptGlobal: Dumping variables to file '/var/cache/icinga2/icinga2.vars'
[2018-11-19 11:13:29 +0100] information/cli: Finished validating the configuration file(s).

PS: Cannot reopen it, since you closed it ;)

@stevie-sy
Contributor

In our configuration with version 2.10.2 we see the same problems as @cubic3d. If you'd like us to open a new issue, please tell us, so that we know where to write up what we find out.

@cubic3d
Author

cubic3d commented Nov 23, 2018

@dnsmichi ping

@swegener
Contributor

swegener commented Nov 23, 2018

I just debugged this issue with our icinga2 setup, and my finding is that with "Connection: close" it makes a difference whether the client sends the headers and the body in a single socket operation or in two.
If the client sends them in one operation, everything is fine, because they end up in the same TCP packet.
If it sends the headers and the body in two separate operations, they will likely end up in two TCP packets. Icinga seems to read and parse the header packet, see the "Connection: close" entry, and try to close the connection while there is still data pending, namely the body packet.
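
A rough sketch of how to force that two-write behaviour by hand, assuming a local Icinga 2 instance with placeholder credentials and payload: the complete header block (including "Connection: close") goes out in one TLS write, and the body in a second one after a short delay.

package main

import (
	"crypto/tls"
	"encoding/base64"
	"fmt"
	"io"
	"log"
	"os"
	"time"
)

func main() {
	conn, err := tls.Dial("tcp", "localhost:5665", &tls.Config{
		InsecureSkipVerify: true, // self-signed test certificate only
	})
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	body := `{"exit_status": 0, "plugin_output": "OK"}`                      // placeholder payload
	auth := base64.StdEncoding.EncodeToString([]byte("root:secret"))         // placeholder credentials

	// First socket write: the complete header block, including "Connection: close".
	fmt.Fprintf(conn,
		"POST /v1/actions/process-check-result HTTP/1.1\r\n"+
			"Host: localhost:5665\r\n"+
			"Authorization: Basic %s\r\n"+
			"Accept: application/json\r\n"+
			"Content-Length: %d\r\n"+
			"Connection: close\r\n\r\n", auth, len(body))

	// Short pause so the body almost certainly ends up in a second TCP packet.
	time.Sleep(50 * time.Millisecond)

	// Second socket write: the body on its own.
	io.WriteString(conn, body)

	// With the bug present, the server may tear the connection down after
	// parsing the headers, before the body has been read.
	io.Copy(os.Stdout, conn)
}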

@dnsmichi
Contributor

This was an issue targeting 2.10.0, so it cannot be re-opened. Please move your findings to a new issue. Also note that I'm working on other projects now, so the best route would be an escalation via a support contract with an Icinga partner.

@cubic3d
Author

cubic3d commented Nov 26, 2018

@stevie-sy @swegener I opened #6799. Would you transfer your findings over there?

@dnsmichi removed the "needs feedback" label on Jul 31, 2019