PAYARA-3468 MP FT 2.0 update #3911

Merged 32 commits on May 1, 2019

@jbee jbee commented Apr 23, 2019

Implements MicroProfile Fault-Tolerance 2.0.

While 2.0 did not add many new features, larger changes were made for two reasons:

  1. To make the interaction between different FT behaviours compliant with 2.0 semantics some form of restructuring was required, partly because the semantics were slightly extended and more strictly defined in 2.0, partly because some cases were not compliant before. E.g. @Fallback needed to be allowed alone and in any combination with other annotations. The best solution to most of the interaction and processing problems seemed to be to merge the interceptors into one.
  2. The implementation should become more modular, less repetitive, (unit) testable and easier to follow by introducing abstractions. While none of this was strictly necessary I think it's time well spent. Otherwise this time would have gone into lengthy debugging sessions looking for errors. This was also a prerequisite for the unit tests I wanted to add for TCK tests that cannot be run due to test setup problems.

Change summary:

  • Interceptors have been merged into FaultToleranceInterceptor that activates on new FaultTolerance marker annotation (FT handling itself was further extracted into FaultTolerancePolicy).
  • new FaultTolerance annotation is added (at runtime) to all methods with FT annotations (that is the solution to do "one interceptor to rule them all")
  • FaultToleranceExtension now handles interceptor priority changes done via Config (missing feature)
  • Validation now happens at invocation time (cached) in addition to annotation processing time, so that both the annotations and the actual values in effect after Config overrides are validated. The result is captured in the overall FaultTolerancePolicy that applies to a method. It combines all possible FT policy values for that method. Each of the policies has the same fields as the annotation it represents, except that it holds the values after overrides were applied. Validation is thereby effectively enforced by construction: using invalid policies is no longer possible.
  • Most of the policy analysis moved from every invocation to once per method, on first invocation, to reduce the overhead of now validating the policies actually used. To still allow changing Config overrides at runtime, the policy in use has a TTL of one minute after which it is recreated.
  • Use of Config and override logic was extracted and abstracted into FaultToleranceConfig interface
  • Use of MetricsRegistry and key names was extracted and abstracted into FaultToleranceMetrics interface
  • Use of any "services" was abstracted into FaultToleranceService interface, the implementation was renamed (and somewhat re-purposed) to FaultToleranceServiceImpl.
  • FaultToleranceObject was renamed FaultToleranceApplicationState and adapted to other changes
  • BulkheadSemaphore class was created to contain bulkhead specific requirements on the basic Semaphore.
  • Fallback method lookup is much more sophisticated, taking method parameter types and inheritance into account (see MethodLookupUtils; actually checked by TCK)
  • A couple of other smaller changes were needed to decouple things to the point where they could be tested in isolation.
  • Moved all the service implementation specific "private" classes into new package service
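
The once-per-method analysis with a one-minute TTL described in the list above could look roughly like the following sketch. PolicyCache, its Entry class, the loader function and the injectable clock are hypothetical names made up for illustration; they are not the classes from this PR, and the real cache is keyed differently.

```java
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical per-method cache whose entries are rebuilt once they are older than a fixed TTL. */
final class PolicyCache<P> {

    static final long TTL_NANOS = TimeUnit.MINUTES.toNanos(1);

    /** A policy together with the (clock) time at which it was created. */
    private static final class Entry<P> {
        final P policy;
        final long createdAt;

        Entry(P policy, long createdAt) {
            this.policy = policy;
            this.createdAt = createdAt;
        }
    }

    private final ConcurrentHashMap<Method, Entry<P>> entries = new ConcurrentHashMap<>();
    private final Function<Method, P> loader; // assumed to analyse and validate annotations plus overrides
    private final LongSupplier clock;         // injectable so tests do not have to wait a minute

    PolicyCache(Function<Method, P> loader, LongSupplier clock) {
        this.loader = loader;
        this.clock = clock;
    }

    /** Returns the cached policy, recreating it when it is missing or has outlived the TTL. */
    P get(Method method) {
        long now = clock.getAsLong();
        return entries.compute(method, (m, old) -> {
            if (old != null && now - old.createdAt < TTL_NANOS) {
                return old; // still fresh, no re-analysis needed
            }
            return new Entry<P>(loader.apply(m), now);
        }).policy;
    }
}
```

In production code the clock would typically be System::nanoTime; a test can pass a counter instead and advance it past TTL_NANOS to force a reload.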

Overall the FT logic moved from the interceptors into FaultTolerancePolicy where each annotation is handled by a method. These methods are called in a fixed chain, each representing a stage of the overall FT handling as required for the policy in place.
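
To illustrate the idea of one method per annotation called in a fixed chain, here is a heavily reduced sketch that only knows retry and fallback. ChainedPolicy, its fields and the stage method names are assumptions for illustration and not the PR's FaultTolerancePolicy. The nesting (fallback around retry) follows the rule that a fallback is only invoked after all retries have failed.

```java
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical, reduced policy: each supported behaviour is one stage method in a fixed chain. */
final class ChainedPolicy<T> {

    private final int maxRetries;                         // 0 stands for "no retry configured"
    private final Function<RuntimeException, T> fallback; // null stands for "no fallback configured"

    ChainedPolicy(int maxRetries, Function<RuntimeException, T> fallback) {
        if (maxRetries < 0) {
            // invalid values are rejected at construction, so no invalid policy can exist
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        this.maxRetries = maxRetries;
        this.fallback = fallback;
    }

    /** Entry point: starts the chain at the outermost stage. */
    T proceed(Supplier<T> target) {
        return fallbackStage(target);
    }

    /** Outermost stage: applies the fallback only if the inner stages fail for good. */
    private T fallbackStage(Supplier<T> target) {
        if (fallback == null) {
            return retryStage(target);
        }
        try {
            return retryStage(target);
        } catch (RuntimeException ex) {
            return fallback.apply(ex);
        }
    }

    /** Inner stage: calls the target up to 1 + maxRetries times and rethrows the last failure. */
    private T retryStage(Supplier<T> target) {
        RuntimeException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return target.get();
            } catch (RuntimeException ex) {
                last = ex;
            }
        }
        throw last; // never null: the loop body runs at least once
    }
}
```

A policy created with new ChainedPolicy<>(2, ex -> "fallback") would call a failing target three times before returning "fallback".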

Logging

  • Level FINER is used for execution status information.
  • Level FINE is used for "event-like" information.

Tests & Testing:
All TCK tests that need to be excluded, because they expect an unwrapped exception, were replicated in added unit tests so that we do test correct behaviour. In addition I added tests for config overrides because the logic is somewhat confusing and not very well illustrated by the TCK tests. Last but not least I added tests for the asynchronous error handling since I discovered that the TCK has very little coverage of this important aspect (and indeed I did find another error in the implementation when adding the tests).

For most unit tests there is a corresponding method with FT annotations. The test method and the corresponding method under test are linked via a naming convention: the method under test has the name <test-method-name>_Method (which I took from some TCK test).
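
A lookup based on this naming convention could be sketched as follows. TestMethodFinder and ExampleTest are hypothetical names; how the actual tests resolve the annotated method is not shown in this PR description, so the reflection-based approach here is an assumption.

```java
import java.lang.reflect.Method;
import java.util.Arrays;

/** Hypothetical helper resolving the annotated method that belongs to a test method. */
final class TestMethodFinder {

    /** Finds the method named "<testMethodName>_Method" declared in the given test class. */
    static Method methodUnderTest(Class<?> testClass, String testMethodName) {
        String expected = testMethodName + "_Method";
        return Arrays.stream(testClass.getDeclaredMethods())
                .filter(m -> m.getName().equals(expected))
                .findFirst()
                .orElseThrow(() -> new IllegalStateException(
                        "No method " + expected + " in " + testClass.getName()));
    }
}

/** Hypothetical test class following the convention. */
final class ExampleTest {

    /** The test itself; it would run retryTwice_Method through the FT handling. */
    public void retryTwice() {
        // assertions omitted in this sketch
    }

    /** The method under test; in the real tests it would carry FT annotations such as @Retry. */
    public String retryTwice_Method() {
        return "result";
    }
}
```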

FYI: With the added tests the coverage of the module is now ~65%, with the main logic being around 80% covered. The 73 tests run in less than a second.

jbee added 23 commits March 29, 2019 17:13
…lbackmethod PASS; added unit tests for fallback method lookup and validation taken from TCK scenarios
@jbee jbee self-assigned this Apr 23, 2019
@jbee jbee requested a review from Pandrex247 April 23, 2019 14:21

jbee commented Apr 23, 2019

jenkins test please


jbee commented Apr 25, 2019

jenkins test please


@pdudits pdudits left a comment

At least the typo should be corrected ;)

* A simple cache with a fix {@link #TTL} with a policy for each target method.
*/
private static final ConcurrentHashMap<Class<?>, ConcurrentHashMap<Method, FaultTolerancePolicy>> POLICY_BY_METHOD
= new ConcurrentHashMap<>();
Contributor

Adding an actual tuple type for key (Class, Method) might simplify working with this map

Contributor Author

Originally this was intentionally done this way to avoid object creation on lookup. I was hoping not to create garbage for each invocation. Later this turned out to be very difficult, so there will be 2-3 garbage objects per invocation. We could change this and accept a bit more garbage; it certainly would increase readability. On the other hand there are just 2 methods using this, so the simplification isn't that big, and not creating an additional object might still be worth it. WDYT?

Contributor

My gut feeling says don't optimise at this level until it's really proven to be a bottleneck (e.g. this being in a tight loop with thousands of lookups). In CDI these temp objects might be dwarfed by all the other things going on.

Also, in select cases the JVM will allocate objects on the stack (if escape analysis proves they can't escape their scope), making object creation for temp objects very cheap.

Contributor Author

@jbee jbee Apr 30, 2019

It was not really a deliberate optimization. I saw two options: use a key class or use a nested map. I knew there were only two usages of the structure, and nested maps had the benefit of avoiding garbage objects, so I went for that option. Making a big thing of this decision is maybe also the wrong focus. The only reason I did not change it later was that it does not make much sense to put more work into something that makes so little difference, as its scope is and will be tiny.

Contributor

I was looking at this more from a code readability point of view, as operations on nested maps were always a bit hard to read (although it's now better with computeIfAbsent).

A theoretical performance argument for the key class would be that it reduces lookup time by using a single map rather than two of them.

A compromise solution would be to encapsulate the map in a separate class that only exposes the operations that are needed.
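
A minimal sketch of that compromise, assuming the callers only need a "get or create" and a "remove everything for a class" operation (both operations and the name PolicyRegistry are guesses for illustration, not taken from the PR):

```java
import java.lang.reflect.Method;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical wrapper that keeps the nested maps private and exposes only two operations. */
final class PolicyRegistry<P> {

    private final ConcurrentHashMap<Class<?>, ConcurrentHashMap<Method, P>> byClassAndMethod
            = new ConcurrentHashMap<>();

    /** Returns the policy for the target class and method, creating it on first use. */
    P getOrCreate(Class<?> target, Method method, Function<Method, P> factory) {
        return byClassAndMethod
                .computeIfAbsent(target, c -> new ConcurrentHashMap<>())
                .computeIfAbsent(method, factory);
    }

    /** Drops all policies cached for the given target class. */
    void remove(Class<?> target) {
        byClassAndMethod.remove(target);
    }
}
```

The nested-map lookup still avoids a key object per invocation, while callers no longer see the two-level structure.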

Contributor Author

I was thinking about readability as well. So ultimately the question now is: do we think improved readability is worth the effort of changing this? If so, I think we should do it. I hesitated since there are literally 2 lines of code affected, and using a key class would actually mean more lines of code, so it felt less clear-cut and didn't really call for action, and I left it alone.

*
* @author Jan Bernitt
*/
public class BulkheadBasicTest {
Contributor

Oh actually, those are nicely testable.

For concurrency, can you think about a stress test? This is what helped me from time to time to validate that my view of the concurrent behavior of my code is also the CPU's view of it :)

For example, limit to at most 4 concurrent invocations over an 8-thread thread pool, put a few thousand invocations in the pool, and verify some reasonable invariants?
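
Such a stress test could be sketched like this. It uses a plain java.util.concurrent.Semaphore as a stand-in for the bulkhead and rejects calls when no permit is free; BulkheadStress and the shape of its result array are assumptions for illustration, not the test that was added to the PR.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stress test: many invocations against a semaphore-guarded section. */
final class BulkheadStress {

    /** Returns {maxObservedConcurrency, acceptedCalls, rejectedCalls, permitsLeftAtEnd}. */
    static int[] run(int limit, int threads, int invocations) {
        Semaphore bulkhead = new Semaphore(limit);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxInFlight = new AtomicInteger();
        AtomicInteger accepted = new AtomicInteger();
        AtomicInteger rejected = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int i = 0; i < invocations; i++) {
            pool.execute(() -> {
                if (!bulkhead.tryAcquire()) {
                    rejected.incrementAndGet(); // bulkhead full: the call is not executed
                    return;
                }
                try {
                    int now = inFlight.incrementAndGet();
                    maxInFlight.accumulateAndGet(now, Math::max);
                    Thread.yield(); // give other threads a chance to overlap
                    accepted.incrementAndGet();
                } finally {
                    inFlight.decrementAndGet();
                    bulkhead.release();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("stress test did not finish in time");
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", ex);
        }
        return new int[] {
            maxInFlight.get(), accepted.get(), rejected.get(), bulkhead.availablePermits() };
    }
}
```

For run(4, 8, 4000) the invariants to assert would be: observed concurrency never exceeds 4, accepted plus rejected equals 4000, and all 4 permits are available again at the end.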

Contributor Author

I added a test that uses all annotations except @Timeout (since I did not want to get into real time waiting) and spawns a number of concurrent callers, each doing a number of calls. The tested method will fail hard every 3rd time and "soft" every 5th time (unless this is also every 3rd time). The attributes on the annotations are set in such a way that the failures have different effects: they definitely will cause retries and circuit breaker transitions, and there is a good chance that, even after retrying, some calls fail entirely from the caller's point of view. This is nothing we can be sure of though. The test asserts that the numbers make sense and that the end state is clean.


jbee commented Apr 29, 2019

jenkins test please


jbee commented Apr 29, 2019

@pdudits addressed all your comments. PR ready for re-review.


@Pandrex247 Pandrex247 left a comment

Aside from this one comment and the couple of tiny things I've fixed for you, all seems bon :) Nicely done.

@arjantijms
Contributor

The single policy looks good, and may make it easier indeed to combine the several aspects of FT in a somewhat more coherent way.

I do have some reservations about hardcoding the FT asynchronous annotation here, and would have loved to see this working with any (CDI based) asynchronous annotation, without the policy having explicit knowledge about which one was used, but this reservation for now is not big enough to ask for changes.

}
}

public static final class PriorityLiteral extends AnnotationLiteral<Priority> implements Priority {
Contributor

Note to self and others: should propose literals to be in the FT spec for each annotation

@arjantijms
Contributor

Another general comment, also not strong enough to warrant a changes requested, but what about renaming Policy to something a bit more descriptive, like FaultTolerancePolicy?


jbee commented May 1, 2019

@arjantijms I had not forgotten your input from our show and tell. I was planning to do the "multi-annotation" support as an extra PR. I'll need some input from you on what the goal is. Maybe this is reasonable to handle by a separate jira where you can dump what you know about annotations that should work or just comment on PAYARA-3468?

On the Policy name: The name FaultTolerancePolicy was already taken :D. That is the overall policy combining the 6 possible policies.


jbee commented May 1, 2019

Thanks @Pandrex247 for the fixes. I addressed your comment on the version and moved it to dependency management as discussed.

@Pandrex247
Member

Jenkins test please


jbee commented May 1, 2019

jenkins test please

@Pandrex247 Pandrex247 merged commit 0aa6aa0 into payara:master May 1, 2019
4 participants