
Camel HealthCheck #42

Closed
rhuss opened this issue Jun 23, 2017 · 1 comment

@rhuss

rhuss commented Jun 23, 2017

There are two kinds of health checks required for Syndesis:

  • Technical health checks which ensure that the integration itself is running, but not that its components are free of faults. This health check is used for the liveness and readiness probes in OpenShift.
  • The status of an integration with respect to its backends. This status should be visualised in the Syndesis UI to give direct feedback to the user. A faulty backend must not cause the integration to be restarted.

Here are some discussion points collected from various emails:

@ro14nd

wrt/ OpenShift readiness and liveness checks, I wonder whether we really should check that the connections used in an integration are healthy, or whether we should confine ourselves to checking that the runtime 'engine' (== Camel) has started properly. The situation is a little bit similar to microservices which call other dependent services (like databases), especially when circuit breakers are in use.

I think when an integration is in an error state because one of its connectors fails, this should be reported by other means than Kubernetes status changes. It should be visible in the Syndesis UI instead.

Also I could imagine that a global status page (like https://status.github.com/) would be useful. We could check there the state of all supported backend systems (Twitter, Salesforce, ...) on a global level, with some periodic checks running independently of any integration. That way users can correlate their errors with global failures of the backend system (in contrast to individual errors for their accounts).

@davsclaus

Yeah, if this is a feature / requirement for the iPaaS and we have a UX design for such use cases. If all such runtime errors should not be part of the k8s liveness or readiness checks, then we need to:

  • a) use the simpler Camel OOTB readiness/liveness check for k8s
  • b) report any Camel runtime errors to the iPaaS in a different way

Ad a) and b):
Today the OOTB readiness/liveness check just relies on Camel being able to start up and its state being running (started = true). However, as part of the startup procedure of Apache Camel, it may fail due to connection issues to e.g. Salesforce, or a network error to a database via the SQL connector, etc.

In other words, if Camel can start up successfully then the k8s readiness/liveness check will always be a success.
For the iPaaS we would still like the k8s check to stay this simple, as we want any errors in Camel to be reported differently to the iPaaS so we can present them to the end users in our own way.
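
To make that concrete, here is a minimal sketch (assuming camel-spring-boot plus Spring Boot Actuator on the classpath, Camel 2.x style API) of a health indicator that only reports whether the CamelContext itself has started, deliberately without probing any connector or backend:

```java
import org.apache.camel.CamelContext;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Sketch: a probe that only checks that the Camel engine itself is up,
// ignoring whether any connector/backend is currently reachable.
@Component
public class CamelStartedHealthIndicator implements HealthIndicator {

    private final CamelContext camelContext;

    public CamelStartedHealthIndicator(CamelContext camelContext) {
        this.camelContext = camelContext;
    }

    @Override
    public Health health() {
        if (camelContext.getStatus().isStarted()) {
            return Health.up().withDetail("camelContext", camelContext.getName()).build();
        }
        return Health.down().withDetail("status", camelContext.getStatus().name()).build();
    }
}
```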

So in that situation we may even have to simplify the startup procedure of Apache Camel to start the routes later (we have in fact had such a JIRA in the community for a rather long time, intended for Camel 3). Then we can still use the existing OOTB health check for Camel with Spring Boot, which at least checks that you have the right Camel dependencies on the classpath and catches other "problems". By starting the routes later we ensure the k8s readiness/liveness probe is a success, and we are then able to capture any future errors from Camel ourselves and report them in a different way in the iPaaS.
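
A rough sketch of what "start the routes later" could look like with today's route DSL; the route id, timer endpoint and log placeholder are made up for illustration, and the deferred start relies on the Camel 2.x startRoute(String) API:

```java
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;

// Sketch: the route opts out of automatic startup so the CamelContext (and
// hence the k8s probes) comes up even if the backend is unreachable at boot.
public class SalesforcePollRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("timer:poll?period=60000")
            .routeId("salesforce-poll")   // hypothetical route id
            .autoStartup(false)           // do not start as part of context startup
            .to("log:placeholder");       // the real connector endpoint would go here
    }
}

// Some deferred-start logic (outside the route) can then try to start the
// route and keep the error instead of letting it fail the pod.
class DeferredRouteStarter {
    void startLater(CamelContext context) {
        try {
            context.startRoute("salesforce-poll");   // Camel 2.x API
        } catch (Exception e) {
            // record the error so it is reported to the iPaaS instead of k8s
        }
    }
}
```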

Then we need some new logic in camel-core that starts those routes afterwards, with a controller that handles this so that routes which failed to start are retried later, etc. It should also capture any startup errors, which can be made accessible from Java, JMX and via REST APIs through Spring Boot Actuator (e.g. some REST path /camel/status). Then we can use that from the iPaaS backend to query the integrations, get their status and retrieve those startup errors, etc.
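
As an illustration of such a status endpoint, here is a hedged sketch using plain Spring MVC; the /camel/status path is taken from the comment above, while the bean wiring and payload shape are assumptions rather than an existing Actuator endpoint:

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.camel.CamelContext;
import org.apache.camel.Route;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Sketch: expose the status of the context and of every route so that the
// iPaaS backend can poll it instead of relying on the k8s probes.
@RestController
public class CamelStatusController {

    private final CamelContext camelContext;

    public CamelStatusController(CamelContext camelContext) {
        this.camelContext = camelContext;
    }

    @GetMapping("/camel/status")
    public Map<String, String> status() {
        Map<String, String> result = new LinkedHashMap<>();
        result.put("context", camelContext.getStatus().name());
        for (Route route : camelContext.getRoutes()) {
            result.put(route.getId(), camelContext.getRouteStatus(route.getId()).name());
        }
        return result;
    }
}
```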

Besides startup problems, the controller can also report any runtime errors. For example, the routes may start up fine and the integration may run without issues for, say, 27 minutes; then, due to some network outage, there are Camel exceptions during routing, which the controller also captures and can report in a similar way to the startup errors.
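
For the runtime side, one possible way to capture routing failures independently of the probes is a Camel EventNotifier; here is a rough sketch against the Camel 2.x event classes, with an in-memory error list purely for illustration:

```java
import java.util.EventObject;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

import org.apache.camel.management.event.ExchangeFailedEvent;
import org.apache.camel.support.EventNotifierSupport;

// Sketch: collect exchange failures at runtime so a controller can report
// them (e.g. via a /camel/status payload) instead of failing the probes.
public class IntegrationErrorCollector extends EventNotifierSupport {

    private final List<String> errors = new CopyOnWriteArrayList<>();

    @Override
    public boolean isEnabled(EventObject event) {
        // only interested in failed exchanges
        return event instanceof ExchangeFailedEvent;
    }

    @Override
    public void notify(EventObject event) {
        ExchangeFailedEvent failed = (ExchangeFailedEvent) event;
        errors.add(failed.getExchange().getFromRouteId() + ": "
                + failed.getExchange().getException());
    }

    public List<String> getErrors() {
        return errors;
    }
}
```

Such a notifier would be registered on the context's management strategy, and its collected errors could then be merged into the status payload sketched above.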

@lburgazzoli

For Camel 3.0 we definitely need a pluggable "route controller", but for the time being can't we rely on route policies?

I've recently done some work on clustering support for Camel, and the policy factory I have developed may be used as the basis for an interim solution: it waits for the CamelContext to be started and then starts/stops the routes when a condition is satisfied, so the application starts faster and won't fail unless there is a serious issue (i.e. errors in the route definitions). The policy can then manage the route, e.g. restarting it and reporting failures, and it could eventually be configured via a ConfigMap so that, for instance, a user can specify that route x should be restarted with a delay of n seconds and route y with a different delay (to cope with different API limits), and that a route should be marked as failed after some number of retries.
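
As a very rough illustration of that interim idea (not the actual policy factory mentioned above), here is a RoutePolicy sketch that starts a route after a delay and retries a configurable number of times; the helper names and scheduling are assumptions:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.camel.Route;
import org.apache.camel.support.RoutePolicySupport;

// Sketch: start the route a while after it has been initialised, retrying with
// a configurable delay and giving up after maxAttempts (the route itself would
// be defined with autoStartup=false).
public class SupervisingRoutePolicy extends RoutePolicySupport {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long delaySeconds;
    private final int maxAttempts;
    private int attempts;

    public SupervisingRoutePolicy(long delaySeconds, int maxAttempts) {
        this.delaySeconds = delaySeconds;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public void onInit(Route route) {
        scheduler.schedule(() -> tryStart(route), delaySeconds, TimeUnit.SECONDS);
    }

    private void tryStart(Route route) {
        try {
            startRoute(route);   // helper inherited from RoutePolicySupport
        } catch (Exception e) {
            attempts++;
            if (attempts < maxAttempts) {
                scheduler.schedule(() -> tryStart(route), delaySeconds, TimeUnit.SECONDS);
            } else {
                // mark the route as failed and report it, e.g. to the status endpoint
            }
        }
    }
}
```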

@chirino

chirino commented Mar 26, 2018 via email
