
Camel HealthCheck #42

Closed
rhuss opened this issue Jun 23, 2017 · 1 comment

@rhuss

rhuss commented Jun 23, 2017

There are two kinds of health checks required for Syndesis:

  • Technical health checks which ensure that the integration itself is running, but not that its components are free of faults. This health check is used for the liveness and readiness probes in OpenShift.
  • The status of an integration with respect to its backends. This status should be visualised in the Syndesis UI to give direct feedback to the user. A faulty backend must not cause the integration to be restarted.

Here are some discussion points collected from various emails:

@ro14nd

wrt/ OpenShift readiness and liveness checks, I wonder whether we really should check that the connections used in an integration are healthy, or whether we should confine ourselves to checking that the runtime 'engine' (== Camel) has started properly. The situation is a little bit similar to microservices which call other dependent services (like databases), especially when circuit breakers are in use.

I think when an integration is in an error state because one of its connectors fails, this should be reported by other means than Kubernetes status changes. It should be visible in the Syndesis UI instead.

Also I could imagine that a global status page (like https://status.github.com/) would be useful. We could check there the state of all supported backend systems (Twitter, Salesforce, ...) on a global level, with some periodic checks running independently of any integration. That way users can correlate their errors with global failures of the backend system (in contrast to individual errors for their accounts).

@davsclaus

Yeah, if this is a feature / requirement for the iPaaS and we have a UX design for such use cases. If all such runtime errors should not be part of the k8s liveness or readiness checks, then we need to:

  • a) use the simpler Camel OOTB readiness/liveness check for k8s
  • b) report any Camel runtime errors to the iPaaS in a different way

Ad a) and b):
Today the OOTB readiness/liveness check just relies on Camel being able to start up and its state being running (started = true). However, as part of the startup procedure of Apache Camel, it may fail due to connection issues to e.g. Salesforce, or a network error to a database via the SQL connector, etc.

In other words, if Camel can start up successfully then the k8s readiness/liveness check will always be a success.
For the iPaaS we would still like the k8s check to stay this simple, as we want any errors in Camel to be reported differently to the iPaaS so we can present them to the end users in our own way.
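
To make that concrete, here is a minimal sketch (assuming camel-spring-boot plus Spring Boot Actuator on the classpath, Camel 2.x style API) of a health indicator that only reports whether the CamelContext itself has started, deliberately without probing any connector or backend:

```java
import org.apache.camel.CamelContext;
import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Sketch: a probe that only checks that the Camel engine itself is up,
// ignoring whether any connector/backend is currently reachable.
@Component
public class CamelStartedHealthIndicator implements HealthIndicator {

    private final CamelContext camelContext;

    public CamelStartedHealthIndicator(CamelContext camelContext) {
        this.camelContext = camelContext;
    }

    @Override
    public Health health() {
        if (camelContext.getStatus().isStarted()) {
            return Health.up().withDetail("camelContext", camelContext.getName()).build();
        }
        return Health.down().withDetail("status", camelContext.getStatus().name()).build();
    }
}
```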

So in that situation we may even have to simplify the startup procedure of Apache Camel to start the routes later (we have in fact had such a JIRA in the community for a rather long time, intended for Camel 3). Then we can still use the existing OOTB health check for Camel with Spring Boot, which at least checks that you have the right Camel dependencies on the classpath and catches other "problems". By starting the routes later we ensure the k8s readiness/liveness probe is a success, and we are then able to capture any future errors from Camel ourselves and report them in a different way in the iPaaS.
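
A rough sketch of what "start the routes later" could look like with today's route DSL; the route id, timer endpoint and log placeholder are made up for illustration, and the deferred start relies on the Camel 2.x startRoute(String) API:

```java
import org.apache.camel.CamelContext;
import org.apache.camel.builder.RouteBuilder;

// Sketch: the route opts out of automatic startup so the CamelContext (and
// hence the k8s probes) comes up even if the backend is unreachable at boot.
public class SalesforcePollRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("timer:poll?period=60000")
            .routeId("salesforce-poll")   // hypothetical route id
            .autoStartup(false)           // do not start as part of context startup
            .to("log:placeholder");       // the real connector endpoint would go here
    }
}

// Some deferred-start logic (outside the route) can then try to start the
// route and keep the error instead of letting it fail the pod.
class DeferredRouteStarter {
    void startLater(CamelContext context) {
        try {
            context.startRoute("salesforce-poll");   // Camel 2.x API
        } catch (Exception e) {
            // record the error so it is reported to the iPaaS instead of k8s
        }
    }
}
```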

Then we need some new logic in camel-core that starts those routes afterwards, with a controller that handles this so that routes which failed to start are retried later, etc. It should also capture any startup errors, which can be made accessible from Java, JMX and via REST APIs through Spring Boot Actuator (e.g. some REST path /camel/status). Then we can use that from the iPaaS backend to query the integrations, get their status and retrieve those startup errors, etc.
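
As an illustration of such a status endpoint, here is a hedged sketch using plain Spring MVC; the /camel/status path is taken from the comment above, while the bean wiring and payload shape are assumptions rather than an existing Actuator endpoint:

```java
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.camel.CamelContext;
import org.apache.camel.Route;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

// Sketch: expose the status of the context and of every route so that the
// iPaaS backend can poll it instead of relying on the k8s probes.
@RestController
public class CamelStatusController {

    private final CamelContext camelContext;

    public CamelStatusController(CamelContext camelContext) {
        this.camelContext = camelContext;
    }

    @GetMapping("/camel/status")
    public Map<String, String> status() {
        Map<String, String> result = new LinkedHashMap<>();
        result.put("context", camelContext.getStatus().name());
        for (Route route : camelContext.getRoutes()) {
            result.put(route.getId(), camelContext.getRouteStatus(route.getId()).name());
        }
        return result;
    }
}
```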

Besides startup problems, the controller can also report any runtime errors. For example, the routes may start up fine and the integration may run without issues for, say, 27 minutes; then, due to some network outage, there are Camel exceptions during routing, which the controller also captures and can report in a similar way to the startup errors.
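
For the runtime side, one possible way to capture routing failures independently of the probes is a Camel EventNotifier; here is a rough sketch against the Camel 2.x event classes, with an in-memory error list purely for illustration:

```java
import java.util.EventObject;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

import org.apache.camel.management.event.ExchangeFailedEvent;
import org.apache.camel.support.EventNotifierSupport;

// Sketch: collect exchange failures at runtime so a controller can report
// them (e.g. via a /camel/status payload) instead of failing the probes.
public class IntegrationErrorCollector extends EventNotifierSupport {

    private final List<String> errors = new CopyOnWriteArrayList<>();

    @Override
    public boolean isEnabled(EventObject event) {
        // only interested in failed exchanges
        return event instanceof ExchangeFailedEvent;
    }

    @Override
    public void notify(EventObject event) {
        ExchangeFailedEvent failed = (ExchangeFailedEvent) event;
        errors.add(failed.getExchange().getFromRouteId() + ": "
                + failed.getExchange().getException());
    }

    public List<String> getErrors() {
        return errors;
    }
}
```

Such a notifier would be registered on the context's management strategy, and its collected errors could then be merged into the status payload sketched above.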

@lburgazzoli

For Camel 3.0 we definitely need a pluggable "route controller", but for the time being can't we rely on route policies?

I've recently done some work on clustering support for Camel, and the policy factory I have developed may be used as the basis for an interim solution: it waits for the CamelContext to be started and then starts/stops the routes when a condition is satisfied, so the application starts faster and won't fail unless there is a serious issue (i.e. errors in the route definitions). The policy can then manage the route, e.g. restarting it and reporting failures, and it could eventually be configured via a ConfigMap so that, for instance, a user can specify that route x should be restarted with a delay of n seconds and route y with a different delay (to cope with different API limits), and that a route should be marked as failed after some number of retries.
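
As a very rough illustration of that interim idea (not the actual policy factory mentioned above), here is a RoutePolicy sketch that starts a route after a delay and retries a configurable number of times; the helper names and scheduling are assumptions:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.camel.Route;
import org.apache.camel.support.RoutePolicySupport;

// Sketch: start the route a while after it has been initialised, retrying with
// a configurable delay and giving up after maxAttempts (the route itself would
// be defined with autoStartup=false).
public class SupervisingRoutePolicy extends RoutePolicySupport {

    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long delaySeconds;
    private final int maxAttempts;
    private int attempts;

    public SupervisingRoutePolicy(long delaySeconds, int maxAttempts) {
        this.delaySeconds = delaySeconds;
        this.maxAttempts = maxAttempts;
    }

    @Override
    public void onInit(Route route) {
        scheduler.schedule(() -> tryStart(route), delaySeconds, TimeUnit.SECONDS);
    }

    private void tryStart(Route route) {
        try {
            startRoute(route);   // helper inherited from RoutePolicySupport
        } catch (Exception e) {
            attempts++;
            if (attempts < maxAttempts) {
                scheduler.schedule(() -> tryStart(route), delaySeconds, TimeUnit.SECONDS);
            } else {
                // mark the route as failed and report it, e.g. to the status endpoint
            }
        }
    }
}
```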

@chirino

chirino commented Mar 26, 2018 via email
