From 6f68f8e16d2f870e6a68d534d84102bfeeea5216 Mon Sep 17 00:00:00 2001 From: Santiago Pericasgeertsen Date: Wed, 5 Apr 2023 14:16:43 -0400 Subject: [PATCH 1/3] New documenation for FT in Nima. Based on the old SE doc but converted to describe the new blocking API. Section about configuration of handlers removed as this is not currently supported in Nima. Signed-off-by: Santiago Pericasgeertsen --- docs/includes/attributes.adoc | 1 + docs/nima/fault-tolerance.adoc | 278 ++++++++++++++++++++++++++++++++- 2 files changed, 278 insertions(+), 1 deletion(-) diff --git a/docs/includes/attributes.adoc b/docs/includes/attributes.adoc index f69a03ef1c1..1684198e8dd 100644 --- a/docs/includes/attributes.adoc +++ b/docs/includes/attributes.adoc @@ -225,6 +225,7 @@ endif::[] :security-provider-abac-base-url: {javadoc-base-url}/io.helidon.security.providers.abac :security-provider-google-login-base-url: {javadoc-base-url}/io.helidon.security.providers.google.login :security-provider-jwt-base-url: {javadoc-base-url}/io.helidon.security.providers.jwt +:nima-faulttolerance-javadoc-base-url: {javadoc-base-url}/io.helidon.nima.faulttolerance // 3rd party versioned URLs :jaeger-doc-base-url: https://www.jaegertracing.io/docs/{version-lib-jaeger} diff --git a/docs/nima/fault-tolerance.adoc b/docs/nima/fault-tolerance.adoc index 876ac8ee7ee..4667857d579 100644 --- a/docs/nima/fault-tolerance.adoc +++ b/docs/nima/fault-tolerance.adoc @@ -26,5 +26,281 @@ include::{rootdir}/includes/nima.adoc[] +== Contents + +- <> +- <> +- <> +- <> +- <> +- <> + +== Overview + +Helidon Nima Fault Tolerance support is inspired by +link:{microprofile-fault-tolerance-spec-url}[MicroProfile Fault Tolerance]. +The API defines the notion of a _fault handler_ that can be combined with other handlers to +improve application robustness. Handlers are created to manage error conditions (faults) +that may occur in real-world application environments. Examples include service restarts, +network delays, temporal infrastructure instabilities, etc. + +The interaction of multiple microservices bring some new challenges from distributed systems +that require careful planning. Faults in distributed systems should be compartmentalized +to avoid unnecessary service interruptions. For example, if comparable information can +be obtained from multiples sources, a user request _should not_ be denied when a subset +of these sources is unreachable or offline. Similarly, if a non-essential source has been +flagged as unreachable, an application should avoid continuous access to that source +as that would result in much higher response times. + +In order to tackle the most common types of application faults, the Helidon Nima Fault +Tolerance API provides support for circuit breakers, retries, timeouts, bulkheads and fallbacks. +In addition, the API makes it very easy to create and monitor asynchronous tasks that +do not require explicit creation and management of threads or executors. + +For more information the reader is referred to the +link:{nima-faulttolerance-javadoc-base-url}/module-summary.html[Fault Tolerance Nima API Javadocs]. + +include::{rootdir}/includes/dependencies.adoc[] + +[source,xml] +---- + + io.helidon.nima.fault-tolerance + helidon-nima-fault-tolerance + +---- + +== API + +The Nima Fault Tolerance API is _blocking_ and based on the JDK's virtual thread model. +As a result, methods return _direct_ values instead of promises in the form of +`Single` or `Multi`. + +In the sections that follow, we shall briefly explore each of the constructs provided +by this API. + +=== Retries + +Temporal networking problems can sometimes be mitigated by simply retrying +a certain task. A `Retry` handler is created using a `RetryPolicy` that +indicates the number of retries, delay between retries, etc. + +[source,java] +---- +Retry retry = Retry.builder() + .retryPolicy(Retry.JitterRetryPolicy.builder() + .calls(3) + .delay(Duration.ofMillis(100)) + .build()) + .build(); +T result = retry.invoke(this::retryOnFailure); +---- + +The sample code above will retry calls to the supplier `this::retryOnFailure` +for up to 3 times with a 100 millisecond delay between them. + +NOTE: The return type of method `retryOnFailure` in the example above must +be some `T` and the parameter to the retry handler's `invoke` +method `Supplier`. + +If the call to the supplier provided completes exceptionally, it will be treated as +a failure and retried until the maximum number of attempts is reached; finer control +is possible by creating a retry policy and using methods such as +`applyOn(Class... classes)` and +`skipOn(Class... classes)` to control the exceptions +that must be retried and those that must be ignored. + +=== Timeouts + +A request to a service that is inaccessible or simply unavailable should be bounded +to ensure a certain quality of service and response time. Timeouts can be configured +to avoid excessive waiting times. In addition, a fallback action can be defined +if a timeout expires as we shall cover in the next section. + +The following is an example of using `Timeout`: +[source,java] +---- +T result = Timeout.create(Duration.ofMillis(10)) + .invoke(this::mayTakeVeryLong); +---- + +NOTE: Using a handler's `create` method is an alternative to using a builder that is +more convenient when default settings are acceptable. + +The example above monitors the call to method `mayTakeVeryLong` and reports a +`TimeoutException` if the execution takes more than 10 milliseconds to complete. + +=== Fallbacks + +A fallback to a _known_ result can sometimes be an alternative to +reporting an error. For example, if we are unable to access a service +we may fall back to the last result obtained from that service at an +earlier time. + +A `Fallback` instance is created by providing a function that takes a `Throwable` +and produces some `T` to be used when the intended method failed to return a value: + +[source,java] +---- +T result = Fallback.create(throwable -> lastKnownValue) + .invoke(this::mayFail); +---- + +This example calls the method `mayFail` and if it produces a `Throwable`, it maps +it to the last known value using the fallback handler. + +=== Circuit Breakers + +Failing to execute a certain task or to call another service repeatedly can have a direct +impact on application performance. It is often preferred to avoid calls to non-essential +services by simply preventing that logic to execute altogether. A circuit breaker can be +configured to monitor such calls and block attempts that are likely to fail, thus improving +overall performance. + +Circuit breakers start in a _closed_ state, letting calls to proceed normally; after +detecting a certain number of errors during a pre-defined processing window, they can _open_ to +prevent additional failures. After a circuit has been opened, it can transition +first to a _half-open_ state before finally transitioning back to a closed state. +The use of an intermediate state (half-open) +makes transitions from open to close more progressive, and prevents a circuit breaker +from eagerly transitioning to states without considering sufficient observations. + +NOTE: Any failure while a circuit breaker is in half-open state will immediately +cause it to transition back to an open state. + +Consider the following example in which `this::mayFail` is monitored by a +circuit breaker: +[source,java] +---- +CircuitBreaker breaker = CircuitBreaker.builder() + .volume(10) + .errorRatio(30) + .delay(Duration.ofMillis(200)) + .successThreshold(2) + .build(); +T result = breaker.invoke(this::mayFail); +---- + +The circuit breaker in this example defines a processing window of size 10, an error +ratio of 30%, a duration to transition to half-open state of 200 milliseconds, and +a success threshold to transition from half-open to closed state of 2 observations. +It follows that, + +* After completing the processing window, if at least 3 errors are detected, the +circuit breaker will transition to the open state, thus blocking the execution +of any subsequent calls. + +* After 200 millis, the circuit breaker will transition back to half-open and +allow calls to proceed again. + +* If the next two calls after transitioning to half-open are successful, the +circuit breaker will transition to closed state; otherwise, it will +transition back to open state, waiting for another 200 milliseconds +before attempting to transition to half-open again. + +A circuit breaker will throw a +`io.helidon.nima.faulttolerance.CircuitBreakerOpenException` +if an attempt to make an invocation takes place while it is in open state. + +=== Bulkheads + +Concurrent access to certain components may need to be limited to avoid +excessive use of resources. For example, if an invocation that opens +a network connection is allowed to execute concurrently without +any restriction, and if the service on the other end is slow responding, +it is possible for the rate at which network connections are opened +to exceed the maximum number of connections allowed. Faults of this +type can be prevented by guarding these invocations using a bulkhead. + +NOTE: The origin of the name _bulkhead_ comes from the partitions that +comprise a ship's hull. If some partition is somehow compromised +(e.g., filled with water) it can be isolated in a manner not to +affect the rest of the hull. + +A waiting queue can be associated with a bulkhead to handle tasks +that are submitted when the bulkhead is already at full capacity. + +[source,java] +---- +Bulkhead bulkhead = Bulkhead.builder() + .limit(3) + .queueLength(5) + .build(); +T result = bulkhead.invoke(this::usesResources); +---- + +This example creates a bulkhead that limits concurrent execution +to `this:usesResources` to at most 3, and with a queue of size 5. The +bulkhead will report a `io.helidon.nima.faulttolerance.BulkheadException` if unable +to proceed with the call: either due to the limit being reached or the queue +being at maximum capacity. + +=== Asynchronous + +Asynchronous tasks can be created or forked by using an `Async` instance. A supplier of type +`T` is provided as the argument when invoking this handler. For example: + +[source,java] +---- +CompletableFuture cf = Async.create().invoke(Thread::currentThread)); +cf.thenAccept(t -> System.out.println("Async task executed in thread " + t)); +---- + +The supplier `() -> Thread.currentThread()` is executed in a new thread and +the value it produces printed by the consumer and passed to `thenAccept`. + +By default, asynchronous tasks are executed using a _new virtual thread per +task_ based on the `ExecutorService` defined in +`io.helidon.nima.faulttolerance.FaultTolerance` and +configurable by an application. Alternatively, an `ExecutorService` can be specified +when building a non-standard `Async` instance. + +=== Handler Composition + +Method invocations can be guarded by any combination of the handlers +presented above. For example, an invocation that +times out can be retried a few times before resorting to a fallback value +—assuming it never succeeds. + +The easiest way to achieve handler composition is by using a builder in the +`FaultTolerance` class as shown in the following example: + +[source,java] +---- +FaultTolerance.TypedBuilder builder = FaultTolerance.typedBuilder(); + +Timeout timeout = Timeout.create(Duration.ofMillis(10)); +builder.addTimeout(timeout); + +Retry retry = Retry.builder() + .retryPolicy(Retry.JitterRetryPolicy.builder() + .calls(3) + .delay(Duration.ofMillis(100)) + .build()) + .build(); +builder.addRetry(retry); + +Fallback fallback = Fallback.create(throwable ->lastKnownValue); +builder.addFallback(fallback); + +T result = builder.build().invoke(this::mayTakeVeryLong); +---- + +The exact order in which handlers are added to a builder depends on the use case, +but generally the order starting from innermost to outermost should be: bulkhead, +timeout, circuit breaker, retry and fallback. That is, fallback is the first +handler in the chain (the last to executed once a value is returned) +and bulkhead is the last one (the first to be executed once a value is returned). + +NOTE: This is the ordering used by the MicroProfile Fault Tolerance implementation +in Helidon when a method is decorated with multiple annotations. + +== Examples + +See <> section for examples. + +== Additional Information + +For additional information, see the +link:{nima-faulttolerance-javadoc-base-url}/module-summary.html[Fault Tolerance Nima API Javadocs]. -== TO DO FOR NIMA \ No newline at end of file From 9c49121dce4dc62dac135c3566e2b2cf70bed2e0 Mon Sep 17 00:00:00 2001 From: Santiago Pericasgeertsen Date: Wed, 5 Apr 2023 14:24:30 -0400 Subject: [PATCH 2/3] Removed section in Contents. Signed-off-by: Santiago Pericasgeertsen --- docs/nima/fault-tolerance.adoc | 1 - 1 file changed, 1 deletion(-) diff --git a/docs/nima/fault-tolerance.adoc b/docs/nima/fault-tolerance.adoc index 4667857d579..9fec534883f 100644 --- a/docs/nima/fault-tolerance.adoc +++ b/docs/nima/fault-tolerance.adoc @@ -31,7 +31,6 @@ include::{rootdir}/includes/nima.adoc[] - <> - <> - <> -- <> - <> - <> From dd8856ad4f29730f3c4e419a4cb31ab591cc679b Mon Sep 17 00:00:00 2001 From: Santiago Pericasgeertsen Date: Wed, 5 Apr 2023 15:54:49 -0400 Subject: [PATCH 3/3] Incorporated feedback. Signed-off-by: Santiago Pericasgeertsen --- docs/nima/fault-tolerance.adoc | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/nima/fault-tolerance.adoc b/docs/nima/fault-tolerance.adoc index 9fec534883f..28da1dbc423 100644 --- a/docs/nima/fault-tolerance.adoc +++ b/docs/nima/fault-tolerance.adoc @@ -56,7 +56,7 @@ Tolerance API provides support for circuit breakers, retries, timeouts, bulkhead In addition, the API makes it very easy to create and monitor asynchronous tasks that do not require explicit creation and management of threads or executors. -For more information the reader is referred to the +For more information, see link:{nima-faulttolerance-javadoc-base-url}/module-summary.html[Fault Tolerance Nima API Javadocs]. include::{rootdir}/includes/dependencies.adoc[]