PAYARA-3468 MP FT 2.0 update #3911

jbee · 2019-04-23T14:18:41Z

Implements MicroProfile Fault-Tolerance 2.0.

While 2.0 did not add many new features, larger changes were made for two reasons:

To make the interaction between different FT behaviours compliant with 2.0 semantics, some restructuring was required, partly because the semantics are slightly extended and more strictly defined in 2.0, and partly because some cases were not compliant before. For example, @Fallback needed to be allowed alone and in any combination with other annotations. The best solution to most of the interaction and processing problems seemed to be to merge the interceptors into one.
The implementation should become more modular, less repetitive, (unit) testable and easier to follow by introducing abstractions. While none of this was strictly necessary, I think it is time well spent. Otherwise this time would have gone into lengthy debugging sessions looking for errors. It was also a prerequisite for the unit tests I wanted to add for TCK tests that cannot be run due to test setup problems.

Change summary:

Interceptors have been merged into FaultToleranceInterceptor, which activates on the new FaultTolerance marker annotation (FT handling itself was further extracted into FaultTolerancePolicy).
The new FaultTolerance annotation is added (at runtime) to all methods with FT annotations (that is how the "one interceptor to rule them all" approach is achieved)
FaultToleranceExtension now handles interceptor priority changes done via Config (missing feature)
Validation now happens at invocation time (cached) in addition to annotation processing time, so that both the annotations and the actual values used after Config overrides are applied get validated. This is captured in the overall FaultTolerancePolicy that applies to a method. It combines all possible FT policy values for a method. Each of the policies has the same fields as the annotation it represents, except that it holds the values after overrides were applied. Validation is now effectively enforced by construction: using invalid policies is no longer possible.
Most of the policy analysis moved from each invocation to once per method on first invocation to reduce the overhead of now validating the policies actually used. To still allow changing Config overrides at runtime, the cached policy has a TTL of one minute, after which it is recreated.
Use of Config and override logic was extracted and abstracted into FaultToleranceConfig interface
Use of MetricsRegistry and key names was extracted and abstracted into FaultToleranceMetrics interface
Use of any "services" was abstracted into FaultToleranceService interface, the implementation was renamed (and somewhat re-purposed) to FaultToleranceServiceImpl.
FaultToleranceObject was renamed FaultToleranceApplicationState and adapted to the other changes
BulkheadSemaphore class was created to contain bulkhead specific requirements on the basic Semaphore.
Fallback method lookup is now much more sophisticated, taking method parameter types and inheritance into account (see MethodLookupUtils; actually checked by TCK)
A couple of other smaller changes were needed to decouple things to the point where they could be tested in isolation.
Moved all the service implementation specific "private" classes into new package service
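
The combination of validation by construction and a one-minute TTL described in the summary could look roughly like the following. This is a minimal sketch and not the code of this PR: `TimeoutPolicySketch`, `PolicyCacheSketch` and the way the effective value is supplied are hypothetical names chosen for illustration, and the clock is passed in explicitly to keep the example deterministic.

```java
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ToLongFunction;

/** Hypothetical policy value object: an instance can only exist if its values are valid. */
final class TimeoutPolicySketch {
    final long timeoutMillis;   // value after Config overrides were applied
    final long expiresAtMillis; // point in time after which the policy is rebuilt

    TimeoutPolicySketch(long timeoutMillis, long expiresAtMillis) {
        if (timeoutMillis < 0) {
            throw new IllegalArgumentException("timeout must not be negative: " + timeoutMillis);
        }
        this.timeoutMillis = timeoutMillis;
        this.expiresAtMillis = expiresAtMillis;
    }

    boolean isExpired(long nowMillis) {
        return nowMillis >= expiresAtMillis;
    }
}

/** Hypothetical per-method cache that recreates a policy once its TTL has passed. */
final class PolicyCacheSketch {
    static final long TTL_MILLIS = 60_000; // one minute, as stated in the description

    private final ConcurrentHashMap<Method, TimeoutPolicySketch> cache = new ConcurrentHashMap<>();

    TimeoutPolicySketch get(Method method, long nowMillis, ToLongFunction<Method> effectiveTimeout) {
        // Reuse the cached policy while it is fresh, otherwise re-read the effective value.
        return cache.compute(method, (m, cached) ->
            cached != null && !cached.isExpired(nowMillis)
                ? cached
                : new TimeoutPolicySketch(effectiveTimeout.applyAsLong(m), nowMillis + TTL_MILLIS));
    }
}
```

With this shape a changed override only becomes visible once the cached entry has expired, and an invalid override surfaces as an exception when the policy is rebuilt rather than as a silently wrong value.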

Overall the FT logic moved from the interceptors into FaultTolerancePolicy where each annotation is handled by a method. These methods are called in a fixed chain, each representing a stage of the overall FT handling as required for the policy in place.
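
To illustrate the idea of a fixed chain of stages, here is a heavily simplified sketch with only two stages, a fallback stage wrapping a retry stage. The class and method names are hypothetical and the real FaultTolerancePolicy handles more annotations and details; the sketch only shows how each stage is a method that delegates to the next one in a fixed order.

```java
import java.util.concurrent.Callable;

/** Hypothetical two-stage chain: fallback stage -> retry stage -> target method. */
final class PolicyChainSketch {
    private final int maxRetries;            // stands in for a retry policy
    private final Callable<Object> fallback; // stands in for a fallback policy, null if absent

    PolicyChainSketch(int maxRetries, Callable<Object> fallback) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative in this sketch");
        }
        this.maxRetries = maxRetries;
        this.fallback = fallback;
    }

    /** Entry point: always starts with the outermost stage. */
    Object proceed(Callable<Object> target) throws Exception {
        return fallbackStage(target);
    }

    private Object fallbackStage(Callable<Object> target) throws Exception {
        if (fallback == null) {
            return retryStage(target); // stage not active, continue with the next one
        }
        try {
            return retryStage(target);
        } catch (Exception e) {
            return fallback.call();
        }
    }

    private Object retryStage(Callable<Object> target) throws Exception {
        Exception lastFailure = null;
        // One initial attempt plus up to maxRetries further attempts.
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return target.call();
            } catch (Exception e) {
                lastFailure = e;
            }
        }
        throw lastFailure;
    }
}
```

Because the order of the stages is fixed in code, a fallback on its own and a fallback combined with retries go through the same path; an inactive stage simply passes through.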

Logging

Level FINER is used for execution status information.
Level FINE is used for "event-like" information.
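
As a rough illustration of this split, a sketch using `java.util.logging` might look as follows. The logger name, the messages and the two helper methods are hypothetical and are not taken from the implementation.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class FtLoggingSketch {
    // Hypothetical logger name used only for this example.
    private static final Logger LOGGER = Logger.getLogger("ft.sketch");

    /** "Event-like" information, for example a state transition, goes to FINE. */
    static void onCircuitOpened(String method) {
        LOGGER.log(Level.FINE, "Circuit opened for {0}", method);
    }

    /** Execution status information, for example the progress of a call, goes to FINER. */
    static void onAttempt(String method, int attempt) {
        LOGGER.log(Level.FINER, "Attempt {0} for {1}", new Object[] {attempt, method});
    }
}
```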

Tests & Testing:
All TCK tests that need to be excluded because they expect an unwrapped exception were replicated as unit tests so that we do test correct behaviour. In addition I added tests for config overrides because the logic is somewhat confusing and not very well illustrated by the TCK tests. Last but not least I added tests for the asynchronous error handling since I discovered that the TCK has very little coverage of this important aspect (and indeed I did find another error in the implementation when adding the tests).

For most unit tests there is a corresponding method with FT annotations. The test method and the corresponding method under test are linked via a naming convention: the method under test has the name <test-method-name>_Method (a convention I took from a TCK test).
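
A lookup following this naming convention could be sketched with plain reflection as shown below. The helper and the example class are hypothetical; in the real tests the `_Method` method would carry the FT annotations and the other one would be the JUnit test.

```java
import java.lang.reflect.Method;

final class MethodUnderTestLookupSketch {

    /** Finds the method named <test-method-name>_Method declared in the test class. */
    static Method methodUnderTest(Class<?> testClass, String testMethodName) {
        String expectedName = testMethodName + "_Method";
        for (Method candidate : testClass.getDeclaredMethods()) {
            if (candidate.getName().equals(expectedName)) {
                return candidate;
            }
        }
        throw new IllegalStateException("No method " + expectedName + " in " + testClass.getName());
    }
}

/** Hypothetical test class showing the pairing of test method and method under test. */
class ExampleTestSketch {
    public void retryRecovers() {
        // would be the test method
    }

    public String retryRecovers_Method() {
        // would be the annotated method under test
        return "ok";
    }
}
```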

FYI: With the added tests the coverage of the module is now ~65%, with the main logic being around 80% covered. The 73 tests run in less than a second.

jbee · 2019-04-23T15:21:49Z

jenkins test please

jbee · 2019-04-25T12:55:33Z

jenkins test please

pdudits

At least the typo should be corrected ;)

...rance/src/main/java/fish/payara/microprofile/faulttolerance/cdi/FaultToleranceExtension.java

pdudits · 2019-04-25T12:40:55Z

...rance/src/main/java/fish/payara/microprofile/faulttolerance/policy/FaultTolerancePolicy.java

+     * A simple cache with a fix {@link #TTL} with a policy for each target method.
+     */
+    private static final ConcurrentHashMap<Class<?>, ConcurrentHashMap<Method, FaultTolerancePolicy>> POLICY_BY_METHOD 
+        = new ConcurrentHashMap<>();


Adding an actual tuple type for key (Class, Method) might simplify working with this map

Originally this was intentionally done this way to avoid object creation on lookup. I was hoping not to have to create garbage for each invocation. Later this turned out to be very difficult, so there will be 2-3 garbage objects per invocation anyway. We could change this and accept a bit more garbage, and it would certainly increase readability. On the other hand there are just 2 methods using this, so the simplification isn't that big and not creating an additional object might still be worth it. WDYT?

My gut feeling says don't optimise at this level until it's really proven to be a bottleneck (e.g. this being in a tight loop with thousands of lookups). In CDI these temp objects might be dwarfed by all the other things going on.

Also, in select cases the JVM will allocate objects on the stack (if escape analysis proves they can't escape their scope), making object creation for temporary objects very cheap.

It was not really deliberate optimization. I saw two options: use a key class or use a nested map. I knew there were only two usages of the structure and nested maps had the benefit of avoiding garbage objects, so I went for that option. Making this decision a big thing is maybe also the wrong focus. The only reason I did not change it later was that it does not make much sense to put more work into something that makes so little difference, as its scope is and will be tiny.

I was looking at this more from a code readability point of view, as operations on nested maps were always a bit hard to read (although it's now better with computeIfAbsent).

A theoretical performance argument for the key class would be that it reduces lookup time by using a single map rather than two.

A compromise solution would be to encapsulate the map in a separate class that only exposes the operations that are needed.

I was thinking about readability as well. So ultimately the question now is: do we think improved readability is worth the effort of changing this? If so, I think we should do it. I hesitated since there are literally 2 lines of code affected and using a key class would actually mean more lines of code, so it felt less clear and didn't really call for action, so I left it alone.
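
For reference, the compromise mentioned in this thread, hiding the nested maps behind a small class, could be sketched as follows. `PolicyByMethodSketch` and its methods are hypothetical names and not part of the PR; the type parameter `P` stands in for the policy type.

```java
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiFunction;

/** Hypothetical wrapper that keeps the nested maps but exposes only the needed operations. */
final class PolicyByMethodSketch<P> {

    private final ConcurrentHashMap<Class<?>, ConcurrentHashMap<Method, P>> byTarget =
            new ConcurrentHashMap<>();

    /** Returns the policy for the (target class, method) pair, creating it on first use. */
    P computeIfAbsent(Class<?> targetClass, Method method, BiFunction<Class<?>, Method, P> factory) {
        return byTarget
                .computeIfAbsent(targetClass, c -> new ConcurrentHashMap<>())
                .computeIfAbsent(method, m -> factory.apply(targetClass, m));
    }

    /** Drops all cached policies. */
    void clear() {
        byTarget.clear();
    }
}
```

Callers no longer see the nesting, and no key object is created per lookup. Note that the capturing lambda in the second `computeIfAbsent` may itself cause an allocation, so the garbage argument is weaker than it first appears.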

...rance/src/main/java/fish/payara/microprofile/faulttolerance/policy/FaultTolerancePolicy.java

pdudits · 2019-04-26T15:21:28Z

...olerance/src/test/java/fish/payara/microprofile/faulttolerance/policy/BulkheadBasicTest.java

+ * 
+ * @author Jan Bernitt
+ */
+public class BulkheadBasicTest {


Oh actually, those are nicely testable.

For concurrency, can you think about a stress test? This is what has helped me from time to time to validate that my view of the concurrent behavior of my code is also the CPU's view of it :)

For example, limit to at most 4 concurrent invocations over an 8-thread thread pool, put a few thousand invocations into the pool, and verify some reasonable invariants?

I added a test that uses all annotations except @Timeout (since I did not want to get into real time waiting) and spawns a number of concurrent callers, each doing a number of calls. The tested method will fail hard every 3rd time and "soft" every 5th time (unless this is also every 3rd time). The attributes on the annotations are set in such a way that the failing has different effects: it will definitely cause retries and circuit breaker transitions, and there is a good chance that even after retrying some calls (from the caller's point of view) fail entirely. This is nothing we can be sure of though. The test asserts that the numbers make sense and that the end state is clean.
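
The kind of stress test suggested in this thread (at most 4 concurrent invocations over an 8-thread pool with a few thousand invocations) can be sketched with plain `java.util.concurrent` as follows. This is not the test added in the PR: it uses a bare `Semaphore` in place of the bulkhead, and all names and numbers are assumptions for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BulkheadStressSketch {
    static final int LIMIT = 4;
    static final int THREADS = 8;
    static final int CALLS = 4000;

    /** Runs the stress scenario and returns {accepted, rejected, maxConcurrent, permitsLeft}. */
    static int[] run() {
        Semaphore bulkhead = new Semaphore(LIMIT);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger maxRunning = new AtomicInteger();
        AtomicInteger accepted = new AtomicInteger();
        AtomicInteger rejected = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(THREADS);
        for (int i = 0; i < CALLS; i++) {
            pool.execute(() -> {
                if (!bulkhead.tryAcquire()) {
                    rejected.incrementAndGet(); // bulkhead full, call is rejected
                    return;
                }
                int now = running.incrementAndGet();
                try {
                    maxRunning.accumulateAndGet(now, Math::max);
                    Thread.yield(); // give other threads a chance to overlap
                    accepted.incrementAndGet();
                } finally {
                    running.decrementAndGet();
                    bulkhead.release();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("stress run did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new int[] {accepted.get(), rejected.get(), maxRunning.get(), bulkhead.availablePermits()};
    }
}
```

The exact split between accepted and rejected calls depends on scheduling, so only invariants are worth asserting: every call is accounted for, the observed concurrency never exceeds the limit, and all permits are returned at the end.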

jbee · 2019-04-29T15:28:56Z

jenkins test please

jbee · 2019-04-29T17:09:29Z

@pdudits addressed all your comments. PR ready for re-review.

Pandrex247

Aside from this one comment and the couple of tiny things I've fixed for you, all seems bon :) Nicely done.

appserver/payara-appserver-modules/microprofile/fault-tolerance/pom.xml

arjantijms · 2019-05-01T08:05:19Z

The single policy looks good, and may make it easier indeed to combine the several aspects of FT in a somewhat more coherent way.

I do have some reservations about hardcoding the FT asynchronous annotation here, and would have loved to see this working with any (CDI based) asynchronous annotation, without the policy having explicit knowledge about which one was used, but this reservation for now is not big enough to ask for changes.

arjantijms · 2019-05-01T08:07:30Z

...rance/src/main/java/fish/payara/microprofile/faulttolerance/cdi/FaultToleranceExtension.java

+        }
+    }
+
+    public static final class PriorityLiteral extends AnnotationLiteral<Priority> implements Priority {


Note to self and others: should propose literals to be in the FT spec for each annotation

arjantijms · 2019-05-01T08:10:52Z

Another general comment, also not strong enough to warrant a changes requested, but what about renaming Policy to something a bit more descriptive, like FaultTolerancePolicy?

jbee · 2019-05-01T08:15:41Z

@arjantijms I had not forgotten your input from our show and tell. I was planning to do the "multi-annotation" support as an extra PR. I'll need some input from you on what the goal is. Maybe this is reasonable to handle by a separate jira where you can dump what you know about annotations that should work or just comment on PAYARA-3468?

On the Policy name: The name FaultTolerancePolicy was already taken :D. That is the overall policy combining the 6 possible policies.

jbee · 2019-05-01T08:53:35Z

Thanks @Pandrex247 for the fixes. I addressed your comment on the version and moved it to dependency management as discussed.

Pandrex247 · 2019-05-01T09:05:47Z

Jenkins test please

jbee · 2019-05-01T09:11:26Z

jenkins test please

jbee added 23 commits March 29, 2019 17:13

PAYARA-3468 fixed: consider global override

9872370

PAYARA-3468 update to API and TCK 2.0

5c48af8

Merge branch 'master' into PAYARA-3468-MP-FT-2.0

4f520b7

PAYARA-3468 decoupling, merge to single interceptor, added policies for FT flow control

fdda096

PAYARA-3468 TCK passes

be16c14

PAYARA-3468 TCK bulkhead, circuit breaker, config, disableEnv and fallbackmethod PASS; added unit tests for fallback method lookup and validation taken from TCK scenarios

ef8dbdb

PAYARA-3468 more TCK tests as unit tests that cannot run, more uniform error messages

e41d816

PAYARA-3468 FIXED policy per target method (not only method)

f13a746

PAYARA-3468 added metrics (mostly correct), fixed Future fault behaviour

f5a20c3

PAYARA-3468 metrics TCK tests pass - rename

09bb6db

PAYARA-3468 config and metrics as service factories, extracted service package

b7a7105

PAYARA-3468 added tracing and logging, fixed bulkhead metric rejected calls

e4c6244

PAYARA-3468 removed obsolete interceptors and validators, added javadoc, some renames

77dc3d4

PAYARA-3468 interceptor priority via config; cleanup and rename

cf64a57

PAYARA-3468 added async tracing, javadoc and copyright headers

d3bf29b

PAYARA-3468 more javadoc

0df07f6

PAYARA-3468 added tests for config overrides

6271fde

PAYARA-3468 added tests for config override - all different value types; added copyright header to tests

d3cc6b1

PAYARA-3468 added test for configuration override priorities

49b70ea

PAYARA-3468 added tests for config enabled scope

90b0ae5

PAYARA-3468 fixed: async Future exception handling; added async exception handling test

24ca2e9

PAYARA-3468 added more tests for async error handling

1f461c1

PAYARA-3468 added javadoc

fdec8f7

jbee self-assigned this Apr 23, 2019

jbee requested a review from Pandrex247 April 23, 2019 14:21

jbee requested a review from pdudits April 24, 2019 06:49

jbee added this to the 5.192 milestone Apr 24, 2019

This was referenced Apr 24, 2019

[microprofile] @Timeout annotation cannot set the java.time.temporal.ChronoUnit via the configuration file. #3821

Closed

[microprofile] @CircuitBreaker with configuration file does not override the requestVolumeThreshold. #3762

Closed

jbee added 3 commits April 24, 2019 17:16

PAYARA-3468 added tests for fallback basic behaviour

013c0e6

PAYARA-3468 added basic correctness test for Bulkhead

039b471

PAYARA-3468 added basic correctness test for circuit breaker

6216b29

pdudits suggested changes Apr 26, 2019

jbee added 2 commits April 29, 2019 09:49

PAYARA-3468 corrected typo in method name

2c6a16b

PAYARA-3468 added stress test with multiple concurrent callers

0243816

pdudits approved these changes Apr 30, 2019

Pandrex247 added 2 commits April 30, 2019 16:13

Indenting

34b11d3

Typo in method names

7b579aa

Pandrex247 reviewed Apr 30, 2019

appserver/payara-appserver-modules/microprofile/fault-tolerance/pom.xml

arjantijms reviewed May 1, 2019

jbee added 2 commits May 1, 2019 10:50

PAYARA-3468 added hamcrest to dependency management

2ce3836

Merge branch 'PAYARA-3468-MP-FT-2.0' of github.com:jbee/Payara into PAYARA-3468-MP-FT-2.0

ab18255

Pandrex247 approved these changes May 1, 2019

Pandrex247 merged commit 0aa6aa0 into payara:master May 1, 2019
