
APPSERV-12 Common and dynamic trace store size via replicated maps #4471

Merged 4 commits into payara:master on Feb 20, 2020

Conversation

@jbee (Contributor) commented Feb 5, 2020

Background

Both the historic and the non-historic request tracing store can be local or shared across the cluster, depending on the instance and cluster configuration. When instances share a cluster store, it becomes unclear which size limit is effective for that common store: each instance applies its own limit when it adds traces, so the instance with the lowest setting trims the store down to that size every time it adds an entry. This semantic is unexpected and confusing, as a single instance with a very low setting can "ruin" the contents of the store for all other instances.
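To make the old semantics concrete, below is a minimal sketch of the pre-fix behaviour (hypothetical names, with a local deque standing in for the shared store; not Payara's actual code), where every instance trims the common store to its own local limit:

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;

/** Hypothetical illustration of the pre-fix trimming semantics. */
class PreFixTraceStore {
    private final Deque<String> sharedStore = new ConcurrentLinkedDeque<>();
    private final int localMaxSize; // this instance's own configured limit

    PreFixTraceStore(int localMaxSize) {
        this.localMaxSize = localMaxSize;
    }

    void addTrace(String trace) {
        sharedStore.addLast(trace);
        // Each instance enforces only its OWN limit when it adds a trace, so
        // the instance with the smallest setting repeatedly shrinks the store
        // that all instances share.
        while (sharedStore.size() > localMaxSize) {
            sharedStore.removeFirst();
        }
    }
}
```

For example, with instance A configured to 20 and instance B configured to 5, every trace added by B trims the shared store down to 5 entries, discarding traces A intended to keep.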

Summary

This task improves the situation described in the Background section. Instead of the smallest local setting dictating the effective size (and only when that instance actually adds traces), the maximum size across all instances with request tracing enabled should apply in all circumstances, independent of whether the enabled instances are currently adding traces.

Since each instance (except the DAS) only knows its own configuration, instances cannot rely on domain.xml-based information to calculate such a common maximum. In clusters of Payara Micro instances, each instance also considers itself to be in the role of the DAS, which makes a single central value problematic. In short, a truly cluster-wide identical configuration cannot be derived from domain.xml alone.

To introduce configuration values that are truly shared among all instances in the cluster, a new service ClusteredConfig was added. It uses Hazelcast's ReplicatedMap to hold the local value of each instance that shares it. Instances are responsible for actively sharing and un-sharing (clearing) their local value for a shared property, depending on their runtime state and the semantics attached to that property. Local values of stopped instances are cleared automatically.
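A minimal sketch of the idea, assuming Hazelcast 3.x package names and illustrative class and method names (the actual ClusteredConfig API may differ):

```java
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ReplicatedMap;

/** Illustrative ClusteredConfig-style service backed by a ReplicatedMap. */
public class ClusteredConfigSketch {

    private final HazelcastInstance hazelcast;
    private final String instanceName;

    public ClusteredConfigSketch(HazelcastInstance hazelcast, String instanceName) {
        this.hazelcast = hazelcast;
        this.instanceName = instanceName;
    }

    /** Publish this instance's local value for the named shared property. */
    public void share(String property, int localValue) {
        map(property).put(instanceName, localValue);
    }

    /** Withdraw this instance's value so it no longer affects the cluster. */
    public void unshare(String property) {
        map(property).remove(instanceName);
    }

    /** Cluster-wide maximum over all currently shared values. */
    public int sharedMax(String property, int localDefault) {
        return map(property).values().stream()
                .mapToInt(Integer::intValue)
                .max()
                .orElse(localDefault);
    }

    /** One replicated map per property; each instance writes under its own name. */
    private ReplicatedMap<String, Integer> map(String property) {
        return hazelcast.getReplicatedMap("config:" + property);
    }
}
```

Keying the map by instance name is what lets the values of stopped or un-sharing instances drop out of the reduction cleanly.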

In the case of the request trace store size, the logic is to share the size while a clustered store is used and request tracing is active, and to un-share it when tracing is disabled.
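Building on the sketch above, that wiring could look roughly like this (illustrative names again):

```java
/** Illustrative wiring: share the local size only while it should count. */
class TraceStoreSizeSharing {
    private static final String PROPERTY = "requestTracing.storeSize";
    private final ClusteredConfigSketch clusteredConfig;

    TraceStoreSizeSharing(ClusteredConfigSketch clusteredConfig) {
        this.clusteredConfig = clusteredConfig;
    }

    void onConfigChange(boolean clusteredStore, boolean tracingEnabled, int localStoreSize) {
        if (clusteredStore && tracingEnabled) {
            clusteredConfig.share(PROPERTY, localStoreSize);  // this instance's size counts
        } else {
            clusteredConfig.unshare(PROPERTY);                // stop influencing the shared maximum
        }
    }
}
```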

The second semantic change is to make the store size dynamic. Instead of setting a fixed int value, an IntSupplier is set that provides the correct size at the moment a trace is added, since the effective size can change between additions without the instance's general configuration changing.
For a local store, the supplier always reflects the size set in the RequestTracingExecutionOptions; for a clustered store, it reflects the current maximum of the local sizes of all instances with request tracing enabled.
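A sketch of the supplier-based size, again with illustrative names and reusing ClusteredConfigSketch from above (the real code reads the size from RequestTracingExecutionOptions):

```java
import java.util.function.IntSupplier;

/** Illustrative store that re-evaluates its size limit on every insertion. */
class DynamicallySizedTraceStore {
    private final IntSupplier maxSize;

    DynamicallySizedTraceStore(IntSupplier maxSize) {
        this.maxSize = maxSize;
    }

    void addTrace(String trace) {
        int limit = maxSize.getAsInt(); // may return a different value on each call
        // ... insert 'trace' and trim the store down to 'limit' ...
    }

    static DynamicallySizedTraceStore create(boolean clusteredStore, int configuredSize,
                                             ClusteredConfigSketch config) {
        IntSupplier local = () -> configuredSize;
        IntSupplier clustered =
                () -> config.sharedMax("requestTracing.storeSize", configuredSize);
        return new DynamicallySizedTraceStore(clusteredStore ? clustered : local);
    }
}
```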

Testing

Unit tests were adapted to the dynamic size.
The clustered size was tested manually following the steps below.

Testing that a larger size on another instance takes precedence

  1. Build and install Payara locally.
  2. Start it in debug mode from your IDE.
  3. Start the monitoring console and open it in a browser tab so that it does its polling.
  4. Configure request tracing for the DAS: enable it, set Target Count 2, Time Value 20, Time Unit SECONDS, Threshold Value 6, Threshold Unit MILLISECONDS.
  5. Set a breakpoint in LongestTraceStorageStrategy#getTraceForRemoval (the method called to enforce the effective size passed as maxSize), but keep all breakpoints disabled for now.
  6. Create another instance and start it.
  7. Configure a higher Trace Store Size for that instance, e.g. 25 (but don't enable its request tracing yet).
  8. Enable the breakpoints; you should soon get a paused execution. Expect maxSize to still be the value of the DAS.
  9. Enable request tracing on the other instance.
  10. When paused at the breakpoint now, maxSize should be the higher value of the other instance.

Testing that inactive configurations are no longer relevant

  1. Perform all 10 steps of the test above (or continue from where it finished).
  2. Disable request tracing on the other instance.
  3. When paused at the breakpoint now, maxSize should be back to the DAS value.
  4. Enable request tracing for the other instance once again.
  5. Stop the other instance.
  6. When paused at the breakpoint now, maxSize should again be back to the DAS value.

Testing "No Cluster" config does not cause problems

  1. Build the server.
  2. Run java -jar ./appserver/extras/payara-micro/payara-micro-distribution/target/payara-micro.jar --nocluster hello-world.war
  3. Check that no exception is shown in the logs.

@jbee self-assigned this Feb 5, 2020
@jbee (Contributor, Author) commented Feb 5, 2020

jenkins test please

@jbee added this to the 5.201 milestone Feb 5, 2020
@jbee commented Feb 7, 2020

jenkins test please

@jbee requested a review from Pandrex247 on February 19, 2020
@jbee commented Feb 19, 2020

jenkins test please

@jbee commented Feb 19, 2020

@Pandrex247 Fixed the NPE for --nocluster

@Pandrex247 (Member) left a comment:
Seems to work now 👍

@jbee commented Feb 20, 2020

jenkins test please

@jbee merged commit b2e0bd4 into payara:master on Feb 20, 2020