Merge pull request #623 from benjamin-confino/spec-for-mptel-integration
Add a new spec section to cover integration with opentelemetry metrics
Azquelt authored Apr 25, 2024
2 parents 298dd53 + 533df9b commit 2db3341
Showing 48 changed files with 2,908 additions and 92 deletions.
76 changes: 46 additions & 30 deletions spec/src/main/asciidoc/metrics.asciidoc
@@ -1,5 +1,5 @@
//
// Copyright (c) 2018-2020 Contributors to the Eclipse Foundation
// Copyright (c) 2018-2024 Contributors to the Eclipse Foundation
//
// See the NOTICE file(s) distributed with this work for additional
// information regarding copyright ownership.
@@ -18,12 +18,16 @@
// Contributors:
// Andrew Rouse
// Jan Bernitt
// Benjamin Confino

== Integration with MicroProfile Metrics
== Integration with MicroProfile Metrics and MicroProfile Telemetry

When Microprofile Fault Tolerance and Microprofile Metrics are used together, metrics are automatically added for each of
When MicroProfile Fault Tolerance is used together with MicroProfile Metrics or MicroProfile Telemetry, metrics are automatically added for each of
the methods annotated with a `@Retry`, `@Timeout`, `@CircuitBreaker`, `@Bulkhead` or `@Fallback` annotation.

If all three of MicroProfile Fault Tolerance, MicroProfile Metrics, and MicroProfile Telemetry are used together then MicroProfile Fault Tolerance
exports metrics to both MicroProfile Metrics and MicroProfile Telemetry.

=== Names

The automatically added metrics follow a consistent pattern which includes the fully qualified name of the annotated method.
@@ -33,7 +37,7 @@ is non-portable and may vary between implementations. For portable behavior, mon

=== Scope

Metrics added by this specification will appear in the `base` MicroProfile Metrics scope.
In MicroProfile Metrics, metrics added by this specification will appear in the `base` MicroProfile Metrics scope.

=== Registration

@@ -44,11 +48,12 @@ Policies that have been disabled through configuration do not cause registration

Implementations must ensure that if any of these annotations are present on a method, then the following metrics are added only once for that method.

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.invocations.total`

| Type | `Counter`
| Type in MP Metrics | `Counter`
| Type in MP Telemetry | A counter that emits long
| Unit | None
| Description | The number of times the method was called
| Tags
@@ -59,11 +64,12 @@ a| * `method` - the fully qualified method name

=== Metrics added for `@Retry`

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.retry.calls.total`

| Type | `Counter`
| Type in MP Metrics | `Counter`
| Type in MP Telemetry | A counter that emits long
| Unit | None
| Description | The number of times the retry logic was run. This will always be once per method call.
| Tags
@@ -72,11 +78,12 @@ a| * `method` - the fully qualified method name
* `retryResult` = `[valueReturned\|exceptionNotRetryable\|maxRetriesReached\|maxDurationReached]` - the reason that last attempt to call the method was not retried
|===
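
The table above describes a counter whose value is split by the `method` and `retryResult` tags. The following is a minimal sketch of that idea in plain Java, not the MicroProfile Metrics or OpenTelemetry API: `TaggedCounterSketch` and the method name `com.example.OrderService.placeOrder` are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Illustrative only: an in-memory stand-in for a counter registry keyed by name and tags. */
class TaggedCounterSketch {

    // One LongAdder per distinct (metric name, tag set) combination.
    private final Map<String, LongAdder> counters = new ConcurrentHashMap<>();

    // Sort the tags so that the same tag set always produces the same key.
    private static String key(String name, Map<String, String> tags) {
        return name + new TreeMap<>(tags);
    }

    /** Adds one to the counter identified by the metric name and tags, creating it on first use. */
    void increment(String name, Map<String, String> tags) {
        counters.computeIfAbsent(key(name, tags), k -> new LongAdder()).increment();
    }

    /** Returns the current count, or 0 if nothing has been recorded for this name and tag set. */
    long value(String name, Map<String, String> tags) {
        LongAdder adder = counters.get(key(name, tags));
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) {
        TaggedCounterSketch registry = new TaggedCounterSketch();
        String method = "com.example.OrderService.placeOrder"; // hypothetical method name

        Map<String, String> returned = Map.of("method", method, "retryResult", "valueReturned");
        Map<String, String> exhausted = Map.of("method", method, "retryResult", "maxRetriesReached");

        // Two calls eventually returned a value, one ran out of retries.
        registry.increment("ft.retry.calls.total", returned);
        registry.increment("ft.retry.calls.total", returned);
        registry.increment("ft.retry.calls.total", exhausted);

        System.out.println(registry.value("ft.retry.calls.total", returned));
        System.out.println(registry.value("ft.retry.calls.total", exhausted));
    }
}
```

Each distinct tag combination behaves as its own series, so summing across `retryResult` values for one `method` gives the total number of times the retry logic ran for that method.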

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.retry.retries.total`

| Type | `Counter`
| Type in MP Metrics | `Counter`
| Type in MP Telemetry | A counter that emits long
| Unit | None
| Description | The number of times the method was retried
| Tags
@@ -85,23 +92,25 @@ a| * `method` - the fully qualified method name

=== Metrics added for `@Timeout`

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.timeout.calls.total`

| Type | `Counter`
| Type in MP Metrics | `Counter`
| Type in MP Telemetry | A counter that emits long
| Unit | None
| Description | The number of times the timeout logic was run. This will usually be once per method call, but may be zero times if the circuit breaker prevents execution or more than once if the method is retried.
| Tags
a| * `method` - the fully qualified method name
* `timedOut` = `[true\|false]` - whether the method call timed out
|===

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.timeout.executionDuration`

| Type | `Histogram`
| Type in MP Metrics | `Histogram`
| Type in MP Telemetry | A histogram that emits long
| Unit | Nanoseconds
| Description | Histogram of execution times for the method
| Tags
@@ -110,11 +119,12 @@ a| * `method` - the fully qualified method name

=== Metrics added for `@CircuitBreaker`

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.circuitbreaker.calls.total`

| Type | `Counter`
| Type in MP Metrics | `Counter`
| Type in MP Telemetry | A counter that emits long
| Unit | None
| Description | The number of times the circuit breaker logic was run. This will usually be once per method call, but may be more than once if the method call is retried.
| Tags
@@ -125,11 +135,12 @@ a| * `method` - the fully qualified method name
** `circuitBreakerOpen` - the method did not run because the circuit breaker was in open or half-open state
|===

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.circuitbreaker.state.total`

| Type | `Gauge<Long>`
| Type in MP Metrics | `Gauge<Long>`
| Type in MP Telemetry | A counter that emits long
| Unit | Nanoseconds
| Description | Amount of time the circuit breaker has spent in each state
| Tags
@@ -138,7 +149,7 @@ a| * `method` - the fully qualified method name
| Notes | Although this metric is a `Gauge`, its value increases monotonically.
|===
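
The note that this gauge increases monotonically can be illustrated with a small sketch of per-state time accounting in nanoseconds. This is an assumption about how an implementation might track the value, not code from any implementation; the class name and the `State` enum constants are hypothetical, and timestamps are passed in explicitly so the arithmetic is easy to follow.

```java
import java.util.EnumMap;
import java.util.Map;

/** Illustrative only: accumulates the time a circuit breaker spends in each state. */
class CircuitBreakerStateTimeSketch {

    // Hypothetical state names chosen for this sketch.
    enum State { CLOSED, OPEN, HALF_OPEN }

    // Nanoseconds spent in each state during completed visits to that state.
    private final Map<State, Long> nanosInState = new EnumMap<>(State.class);
    private State current = State.CLOSED;
    private long enteredAtNanos;

    CircuitBreakerStateTimeSketch(long nowNanos) {
        this.enteredAtNanos = nowNanos;
        for (State s : State.values()) {
            nanosInState.put(s, 0L);
        }
    }

    /** Closes out the time spent in the current state and switches to the next one. */
    synchronized void transitionTo(State next, long nowNanos) {
        nanosInState.merge(current, nowNanos - enteredAtNanos, Long::sum);
        current = next;
        enteredAtNanos = nowNanos;
    }

    /** Value a gauge for the given state would report: completed time plus any time still in progress. */
    synchronized long nanosIn(State state, long nowNanos) {
        long total = nanosInState.get(state);
        if (state == current) {
            total += nowNanos - enteredAtNanos;
        }
        return total;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        CircuitBreakerStateTimeSketch breaker = new CircuitBreakerStateTimeSketch(start);
        breaker.transitionTo(State.OPEN, start + 1_000);
        System.out.println(breaker.nanosIn(State.CLOSED, start + 5_000));
        System.out.println(breaker.nanosIn(State.OPEN, start + 5_000));
    }
}
```

As long as the supplied timestamps never go backwards, the value reported for each state only grows, which is consistent with the table giving a counter as the MicroProfile Telemetry type.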

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.circuitbreaker.opened.total`

@@ -151,57 +162,62 @@ a| * `method` - the fully qualified method name

=== Metrics added for `@Bulkhead`

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.bulkhead.calls.total`

| Type | `Counter`
| Type in MP Metrics | `Counter`
| Type in MP Telemetry | A counter that emits long
| Unit | None
| Description | The number of times the bulkhead logic was run. This will usually be once per method call, but may be zero times if the circuit breaker prevented execution or more than once if the method call is retried.
| Tags
a| * `method` - the fully qualified method name
* `bulkheadResult` = `[accepted\|rejected]` - whether the bulkhead allowed the method call to run
|===

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.bulkhead.executionsRunning`

| Type | `Gauge<Long>`
| Type in MP Metrics | `Gauge<Long>`
| Type in MP Telemetry | A gauge that emits long
| Unit | None
| Description | Number of currently running executions
| Tags
a| * `method` - the fully qualified method name
|===

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.bulkhead.executionsWaiting`

| Type | `Gauge<Long>`
| Type in MP Metrics | `Gauge<Long>`
| Type in MP Telemetry | A gauge that emits long
| Unit | None
| Description | Number of executions currently waiting in the queue
| Tags
a| * `method` - the fully qualified method name
| Notes | Only added if the method is also annotated with `@Asynchronous`
|===
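
Unlike the circuit breaker state gauge, the two bulkhead gauges above report a current level that rises and falls. A minimal sketch of the bookkeeping behind such gauges follows, assuming an asynchronous bulkhead with a waiting queue; `BulkheadGaugeSketch` and its method names are hypothetical and do not come from the specification or any implementation.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative only: the two levels that the bulkhead gauges would expose. */
class BulkheadGaugeSketch {

    private final AtomicLong running = new AtomicLong();
    private final AtomicLong waiting = new AtomicLong();

    /** An execution was accepted into the waiting queue. */
    void enqueued() {
        waiting.incrementAndGet();
    }

    /** A queued execution left the queue and started running. */
    void started() {
        waiting.decrementAndGet();
        running.incrementAndGet();
    }

    /** A running execution completed, whether normally or with an exception. */
    void finished() {
        running.decrementAndGet();
    }

    /** Level a gauge named ft.bulkhead.executionsRunning would read. */
    long executionsRunning() {
        return running.get();
    }

    /** Level a gauge named ft.bulkhead.executionsWaiting would read. */
    long executionsWaiting() {
        return waiting.get();
    }

    public static void main(String[] args) {
        BulkheadGaugeSketch bulkhead = new BulkheadGaugeSketch();
        bulkhead.enqueued();
        bulkhead.enqueued();
        bulkhead.started();
        System.out.println(bulkhead.executionsRunning() + " running, "
                + bulkhead.executionsWaiting() + " waiting");
    }
}
```

A gauge samples these values when read, so the readings are snapshots rather than cumulative totals.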

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.bulkhead.runningDuration`

| Type | `Histogram`
| Type in MP Metrics | `Histogram`
| Type in MP Telemetry | A histogram that emits long
| Unit | Nanoseconds
| Description | Histogram of the time that method executions spent running
| Tags
a| * `method` - the fully qualified method name
|===

[cols="1,5"]
[cols="2,4"]
|===
| Name | `ft.bulkhead.waitingDuration`

| Type | `Histogram`
| Type in MP Metrics | `Histogram`
| Type in MP Telemetry | A histogram that emits long
| Unit | Nanoseconds
| Description | Histogram of the time that method executions spent waiting in the queue
| Tags
@@ -213,7 +229,7 @@ a| * `method` - the fully qualified method name
=== Notes

Future versions of this specification may change the definitions of the metrics which are added to take advantage of
enhancements in the MicroProfile Metrics specification.
enhancements in the MicroProfile Metrics or MicroProfile Telemetry specification.

If more than one annotation is applied to a method, the metrics associated with each annotation will be added for that method.

8 changes: 8 additions & 0 deletions spec/src/main/asciidoc/relationship.asciidoc
@@ -68,3 +68,11 @@ The MicroProfile Metrics specification provides a way to monitor microservice in
* When `Timeout` is used, you would like to know how many times the method timed out.

Because of this requirement, when MicroProfile Fault Tolerance and MicroProfile Metrics are used together, metrics are automatically added for each of the methods annotated with a `@Retry`, `@Timeout`, `@CircuitBreaker`, `@Bulkhead` or `@Fallback` annotation.

=== Relationship to MicroProfile Telemetry
The MicroProfile Telemetry specification provides a way to monitor microservice invocations. It is also important to find out how Fault Tolerance policies are operating, e.g.

* When `Retry` is used, it is useful to know how many times a method was called and succeeded after retrying at least once.
* When `Timeout` is used, you would like to know how many times the method timed out.

Because of this requirement, when MicroProfile Fault Tolerance and MicroProfile Telemetry are used together, metrics are automatically added for each of the methods annotated with a `@Retry`, `@Timeout`, `@CircuitBreaker`, `@Bulkhead` or `@Fallback` annotation.
29 changes: 29 additions & 0 deletions tck/formatter.xml
@@ -0,0 +1,29 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!--
Copyright (c) 2009-2024 Contributors to the Eclipse Foundation
See the NOTICE file(s) distributed with this work for additional
information regarding copyright ownership.
Licensed under the Apache License, Version 2.0 (the "License");
You may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<profiles version="1">
    <profile kind="CodeFormatterProfile" name="MicroProfile" version="1">
        <setting id="org.eclipse.jdt.core.formatter.lineSplit" value="120"/>
        <setting id="org.eclipse.jdt.core.formatter.comment.line_length" value="120"/>
        <setting id="org.eclipse.jdt.core.formatter.indentation.size" value="4"/>
        <setting id="org.eclipse.jdt.core.formatter.tabulation.char" value="space"/>
        <setting id="org.eclipse.jdt.core.formatter.join_wrapped_lines" value="false"/>
        <setting id="org.eclipse.jdt.core.formatter.alignment_for_assignment" value="16"/>
        <setting id="org.eclipse.jdt.core.formatter.alignment_for_field_declaration" value="16"/>
        <setting id="org.eclipse.jdt.core.formatter.use_on_off_tags" value="true"/>
        <setting id="org.eclipse.jdt.core.formatter.alignment_for_arguments_in_annotation" value="18"/>
        <setting id="org.eclipse.jdt.core.formatter.alignment_for_enum_constants" value="16"/>
    </profile>
</profiles>
36 changes: 32 additions & 4 deletions tck/pom.xml
@@ -3,9 +3,7 @@
Licensed under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -17,7 +15,8 @@
<modelVersion>4.0.0</modelVersion>

<parent>
<!-- This is just for now and will not work if the API has a separate release cycle than the rest. -->
<!-- This is just for now and will not work if the API has a separate release
cycle than the rest. -->
<groupId>org.eclipse.microprofile.fault-tolerance</groupId>
<artifactId>microprofile-fault-tolerance-parent</artifactId>
<version>4.1-SNAPSHOT</version>
@@ -46,6 +45,17 @@
</dependencies>
</dependencyManagement>

<build>
    <plugins>
        <plugin>
            <groupId>net.revelc.code.formatter</groupId>
            <artifactId>formatter-maven-plugin</artifactId>
            <configuration>
                <configFile>${project.basedir}/formatter.xml</configFile>
            </configuration>
        </plugin>
    </plugins>
</build>

<dependencies>
<dependency>
@@ -55,6 +65,12 @@
<scope>provided</scope>
</dependency>

<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-api</artifactId>
    <version>1.36.0</version>
</dependency>

<dependency>
<groupId>jakarta.enterprise</groupId>
<artifactId>jakarta.enterprise.cdi-api</artifactId>
@@ -75,6 +91,18 @@
<scope>provided</scope>
</dependency>

<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-sdk-metrics</artifactId>
    <version>1.36.0</version>
</dependency>

<dependency>
    <groupId>io.opentelemetry</groupId>
    <artifactId>opentelemetry-sdk-extension-autoconfigure</artifactId>
    <version>1.36.0</version>
</dependency>

<dependency>
<groupId>org.testng</groupId>
<artifactId>testng</artifactId>
@@ -109,5 +137,5 @@
</dependency>
</dependencies>


</project>
@@ -39,7 +39,8 @@ public class Bulkhead1Retry0MethodSyncBean {

@Bulkhead(value = 1)
@Retry(retryOn = {
BulkheadException.class}, delay = 1, delayUnit = ChronoUnit.SECONDS, maxRetries = 0, maxDuration = 999999)
BulkheadException.class},
delay = 1, delayUnit = ChronoUnit.SECONDS, maxRetries = 0, maxDuration = 999999)
public void test(Barrier barrier) {
barrier.await();
}

@@ -42,7 +42,8 @@ public class Bulkhead55RapidRetry10MethodAsynchBean {

@Bulkhead(waitingTaskQueue = 5, value = 5)
@Asynchronous
@Retry(retryOn = BulkheadException.class, delay = 1, delayUnit = ChronoUnit.MICROS, jitter = 0, maxRetries = 10, maxDuration = 999999)
@Retry(retryOn = BulkheadException.class, delay = 1, delayUnit = ChronoUnit.MICROS, jitter = 0, maxRetries = 10,
maxDuration = 999999)
public Future<?> test(Barrier barrier) {
barrier.await();
return CompletableFuture.completedFuture(null);

@@ -59,7 +59,8 @@ public String serviceWithTimeout() {
*
* @return should always throw TimeoutException
*/
@CircuitBreaker(successThreshold = 2, requestVolumeThreshold = 2, failureRatio = 0.75, delay = 50000, failOn = BulkheadException.class)
@CircuitBreaker(successThreshold = 2, requestVolumeThreshold = 2, failureRatio = 0.75, delay = 50000,
failOn = BulkheadException.class)
@Timeout(500) // Adjusted by config
public String serviceWithTimeoutWithoutFailOn() {
try {

@@ -58,7 +58,8 @@ public void serviceRetryOn(RuntimeException e, AtomicInteger counter) {
}

@Retry(retryOn = {TestConfigExceptionA.class,
TestConfigExceptionB.class}, abortOn = RuntimeException.class, maxRetries = 1, delay = 0, jitter = 0)
TestConfigExceptionB.class},
abortOn = RuntimeException.class, maxRetries = 1, delay = 0, jitter = 0)
public void serviceAbortOn(RuntimeException e, AtomicInteger counter) {
counter.getAndIncrement();
throw e;
@@ -75,7 +76,8 @@ public void serviceAbortOn(RuntimeException e, AtomicInteger counter) {
* <p>
* Limited to 10 seconds or 1000 retries, but will stop as soon as a delay of &gt; 100ms is observed.
*/
@Retry(abortOn = TestConfigExceptionA.class, delay = 0, jitter = 0, maxRetries = 1000, maxDuration = 10, durationUnit = ChronoUnit.SECONDS)
@Retry(abortOn = TestConfigExceptionA.class, delay = 0, jitter = 0, maxRetries = 1000, maxDuration = 10,
durationUnit = ChronoUnit.SECONDS)
public void serviceJitter() {
long startTime = System.nanoTime();
if (lastStartTime != 0) {