
APPSERV-12 Common and dynamic trace store size via replicated maps #4471

Merged 4 commits into payara:master on Feb 20, 2020

Conversation

@jbee (Contributor) commented Feb 5, 2020

Background

Both the historic and the non-historic request tracing store can be local or shared across the cluster, depending on the instance and cluster configuration. When instances share a cluster store, it becomes unclear which size limit is effective for that common store: each instance applies its own limit when it adds traces, so the instance with the lowest setting trims the store down to that size every time it adds an entry. This semantic is unexpected and confusing, as a single instance with a very low setting can "ruin" the contents of the store for all other instances.
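To make the old semantics concrete, below is a minimal sketch of the pre-fix behaviour (hypothetical names, with a local deque standing in for the shared store; not Payara's actual code), where every instance trims the common store to its own local limit:

```java
import java.util.Deque;
import java.util.concurrent.ConcurrentLinkedDeque;

/** Hypothetical illustration of the pre-fix trimming semantics. */
class PreFixTraceStore {
    private final Deque<String> sharedStore = new ConcurrentLinkedDeque<>();
    private final int localMaxSize; // this instance's own configured limit

    PreFixTraceStore(int localMaxSize) {
        this.localMaxSize = localMaxSize;
    }

    void addTrace(String trace) {
        sharedStore.addLast(trace);
        // Each instance enforces only its OWN limit when it adds a trace, so
        // the instance with the smallest setting repeatedly shrinks the store
        // that all instances share.
        while (sharedStore.size() > localMaxSize) {
            sharedStore.removeFirst();
        }
    }
}
```

For example, with instance A configured to 20 and instance B configured to 5, every trace added by B trims the shared store down to 5 entries, discarding traces A intended to keep.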

Summary

This task improves the situation described in the Background section. Instead of the smallest local setting dictating the effective size (and only when that instance actually adds traces), the maximum size across all instances with request tracing enabled should apply in all circumstances, independent of whether the enabled instances are currently adding traces.

Since each instance (except the DAS) only knows its own configuration, instances cannot rely on domain.xml-based information to calculate such a common maximum. In clusters of Payara Micro instances, each instance also considers itself to be in the role of the DAS, which makes a single central value problematic. In short, a truly cluster-wide identical configuration cannot be derived from domain.xml alone.

To introduce configuration values that are truly shared among all instances in the cluster, a new service ClusteredConfig was added. It uses Hazelcast's ReplicatedMap to hold the local value of each instance that shares it. Instances are responsible for actively sharing and un-sharing (clearing) their local value for a shared property, depending on their runtime state and the semantics attached to that property. Local values of stopped instances are cleared automatically.
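A minimal sketch of the idea, assuming Hazelcast 3.x package names and illustrative class and method names (the actual ClusteredConfig API may differ):

```java
import com.hazelcast.core.HazelcastInstance;
import com.hazelcast.core.ReplicatedMap;

/** Illustrative ClusteredConfig-style service backed by a ReplicatedMap. */
public class ClusteredConfigSketch {

    private final HazelcastInstance hazelcast;
    private final String instanceName;

    public ClusteredConfigSketch(HazelcastInstance hazelcast, String instanceName) {
        this.hazelcast = hazelcast;
        this.instanceName = instanceName;
    }

    /** Publish this instance's local value for the named shared property. */
    public void share(String property, int localValue) {
        map(property).put(instanceName, localValue);
    }

    /** Withdraw this instance's value so it no longer affects the cluster. */
    public void unshare(String property) {
        map(property).remove(instanceName);
    }

    /** Cluster-wide maximum over all currently shared values. */
    public int sharedMax(String property, int localDefault) {
        return map(property).values().stream()
                .mapToInt(Integer::intValue)
                .max()
                .orElse(localDefault);
    }

    /** One replicated map per property; each instance writes under its own name. */
    private ReplicatedMap<String, Integer> map(String property) {
        return hazelcast.getReplicatedMap("config:" + property);
    }
}
```

Keying the map by instance name is what lets the values of stopped or un-sharing instances drop out of the reduction cleanly.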

In the case of the request trace store size, the logic is to share the size while a clustered store is used and request tracing is active, and to un-share it when tracing is disabled.
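Building on the sketch above, that wiring could look roughly like this (illustrative names again):

```java
/** Illustrative wiring: share the local size only while it should count. */
class TraceStoreSizeSharing {
    private static final String PROPERTY = "requestTracing.storeSize";
    private final ClusteredConfigSketch clusteredConfig;

    TraceStoreSizeSharing(ClusteredConfigSketch clusteredConfig) {
        this.clusteredConfig = clusteredConfig;
    }

    void onConfigChange(boolean clusteredStore, boolean tracingEnabled, int localStoreSize) {
        if (clusteredStore && tracingEnabled) {
            clusteredConfig.share(PROPERTY, localStoreSize);  // this instance's size counts
        } else {
            clusteredConfig.unshare(PROPERTY);                // stop influencing the shared maximum
        }
    }
}
```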

The second semantic change is to make the store size dynamic. Instead of setting a fixed int value, an IntSupplier is set that provides the correct size at the moment a trace is added, since the effective size can change between additions without the instance's general configuration changing.
For a local store, the supplier always reflects the size set in the RequestTracingExecutionOptions; for a clustered store, it reflects the current maximum of the local sizes of all instances with request tracing enabled.
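A sketch of the supplier-based size, again with illustrative names and reusing ClusteredConfigSketch from above (the real code reads the size from RequestTracingExecutionOptions):

```java
import java.util.function.IntSupplier;

/** Illustrative store that re-evaluates its size limit on every insertion. */
class DynamicallySizedTraceStore {
    private final IntSupplier maxSize;

    DynamicallySizedTraceStore(IntSupplier maxSize) {
        this.maxSize = maxSize;
    }

    void addTrace(String trace) {
        int limit = maxSize.getAsInt(); // may return a different value on each call
        // ... insert 'trace' and trim the store down to 'limit' ...
    }

    static DynamicallySizedTraceStore create(boolean clusteredStore, int configuredSize,
                                             ClusteredConfigSketch config) {
        IntSupplier local = () -> configuredSize;
        IntSupplier clustered =
                () -> config.sharedMax("requestTracing.storeSize", configuredSize);
        return new DynamicallySizedTraceStore(clusteredStore ? clustered : local);
    }
}
```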

Testing

Unit tests were adapted to the dynamic size.
The clustered size was tested manually following the steps below.

Testing that a larger size on another instance takes precedence

  1. Build and install Payara locally.
  2. Start it in debug mode from your IDE.
  3. Start the monitoring console and open it in a browser tab so that it does its polling.
  4. Configure request tracing for the DAS: enable it, set Target Count 2, Time Value 20, Time Unit SECONDS, Threshold Value 6, Threshold Unit MILLISECONDS.
  5. Set a breakpoint in LongestTraceStorageStrategy#getTraceForRemoval (the method called to enforce the effective size passed as maxSize), but keep all breakpoints disabled for now.
  6. Create another instance and start it.
  7. Configure a higher Trace Store Size for that instance, e.g. 25 (but don't enable its request tracing yet).
  8. Enable the breakpoints; you should soon get a paused execution. Expect maxSize to still be the value of the DAS.
  9. Enable request tracing on the other instance.
  10. When paused at the breakpoint now, maxSize should be the higher value of the other instance.

Testing that inactive configurations are no longer relevant

  1. Perform all 10 steps of the test above (or continue from where it finished).
  2. Disable request tracing on the other instance.
  3. When paused at the breakpoint now, maxSize should be back to the DAS value.
  4. Enable request tracing for the other instance once again.
  5. Stop the other instance.
  6. When paused at the breakpoint now, maxSize should again be back to the DAS value.

Testing "No Cluster" config does not cause problems

  1. Build the server.
  2. Run java -jar ./appserver/extras/payara-micro/payara-micro-distribution/target/payara-micro.jar --nocluster hello-world.war
  3. Check that no exception is shown in the logs.

@jbee self-assigned this Feb 5, 2020
@jbee (Contributor, Author) commented Feb 5, 2020

jenkins test please

@jbee added this to the 5.201 milestone Feb 5, 2020
@jbee commented Feb 7, 2020

jenkins test please

@jbee requested a review from Pandrex247 on February 19, 2020
@jbee commented Feb 19, 2020

jenkins test please

@jbee commented Feb 19, 2020

@Pandrex247 Fixed the NPE for --nocluster

@Pandrex247 (Member) left a comment:
Seems to work now 👍

@jbee commented Feb 20, 2020

jenkins test please

@jbee merged commit b2e0bd4 into payara:master on Feb 20, 2020