-
Notifications
You must be signed in to change notification settings - Fork 306
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PAYARA-4121 Fixed Updating of Flashlight Class Instrumentation #4278
PAYARA-4121 Fixed Updating of Flashlight Class Instrumentation #4278
Conversation
private final Map<String, List<StatsProviderRegistryElement>> configToRegistryElementMap = new HashMap(); | ||
private final Map<Object, StatsProviderRegistryElement> statsProviderToRegistryElementMap = new HashMap(); | ||
private final Map<String, List<StatsProviderRegistryElement>> configToRegistryElementMap = new ConcurrentHashMap<>(); | ||
private final Map<Object, StatsProviderRegistryElement> statsProviderToRegistryElementMap = new ConcurrentHashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a concurrency fix that most likely is needed.
for (FlashlightProbe probe : probes.values()) { | ||
if (probe.isEnabled() && probe.getInvokerCount() > 0) { | ||
return true; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of trying to maintain a "dirty" flag telling us if instrumentation needs to be updated the decision is now made based on the probe state. Anything else would cause us to remove instrumentation when there are active invokers waiting to be called.
// we still need to write out the class file in debug mode if it is | ||
// disabled. | ||
cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES + ClassWriter.COMPUTE_MAXS); | ||
ClassWriter cw = new ClassWriter(ClassWriter.COMPUTE_FRAMES + ClassWriter.COMPUTE_MAXS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cw
could be inlined.
} | ||
Log.fine("untransformed", providerClass.getName()); | ||
return null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: null
means transform back to the class bytes as given in the original file.
} | ||
|
||
private synchronized void addTransformer() { | ||
private void addTransformer() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since all updates are run by the same thread no synchronisation is needed any more.
} | ||
catch (Throwable th) { | ||
try ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just using try-with-resources instead
@@ -323,7 +261,7 @@ public MethodVisitor visitMethod(int access, String name, String desc, String si | |||
MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions); | |||
|
|||
FlashlightProbe probe = probes.get(makeKey(name, desc)); | |||
if (probe != null) { | |||
if (probe != null && probe.isEnabled() && probe.getInvokerCount() > 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Important change to the way this class works: We decide if a method gets instrumented based on the state of the probe. As all state changes should trigger flag for update this is now safe to do.
jenkins test please |
...rk/src/main/java/org/glassfish/flashlight/impl/client/ProbeProviderClassFileTransformer.java
Outdated
Show resolved
Hide resolved
...rk/src/main/java/org/glassfish/flashlight/impl/client/ProbeProviderClassFileTransformer.java
Outdated
Show resolved
Hide resolved
jenkins test please |
...rk/src/main/java/org/glassfish/flashlight/impl/client/ProbeProviderClassFileTransformer.java
Show resolved
Hide resolved
jenkins test please |
jenkins test please |
@@ -118,7 +118,8 @@ StatsProviderRegistryElement getStatsProviderRegistryElement(Object statsProvide | |||
} | |||
|
|||
List<StatsProviderRegistryElement> getStatsProviderRegistryElement(String configElement) { | |||
return (this.configToRegistryElementMap.get(configElement)); | |||
List<StatsProviderRegistryElement> plain = configToRegistryElementMap.get(configElement); | |||
return plain == null ? Collections.emptyList() : new ArrayList<>(plain); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an unrelated fix for a problem I observed where the returned list is iterated and the iteration (or another process) causes changes to the list causing a concurrent modification exception. Since the method is only used in this module by methods doing this kind of loop I concluded it is best to always return a copy that can be iterated without problem.
@Pandrex247 might want to review this was well. Waiting for his feedback on it. |
Symthom
When un-deploying and re-deploying an application the web and HTTP statistics would no longer update.
Finding
The reason for updates no longer reaching the statistics counters was that the connection done via class instrumentation was cut when un-deploying an app. This occurred because un-deploying an application does disable statistics providers specifically responsible for the application which wrongly triggered a removal of the class instrumentation for the probe provider class connecting the events to the statistics counters. So instead of just disconnecting the application specific statistics all statistics based on a particular class were disabled. Since the overall statistics for HTTP and web use the same class as the application specific ones this also disabled the overall statistics.
Solution
Because the trigger for changes of class instrumentation was a static call in the innermost part of the logic there was no clear way to only trigger updates to the class instrumentation after the logic is done updating the changed probes and invokers. Another mistake was that disabling a invoker caused disabling of the class even though there might be other invokers and other probes for the same provider class. This is why both
transform
anduntransform
became justupdate
. If updating would require to update the instrumentation or remove it is now evaluated based on the probes and their active invokers. If there are active invokers and the probe is enabled the instrumentation is kept and updated. Methods with no invokers are no longer instrumented.To avoid running the update for each trigger call, which is each method times the statistics provider instances, the trigger only flags for update and a asynchronous daemon thread is used to run the actual updates with a small delay. By granting a delay the logic causing multiple triggers is likely to have completed and flagged all needed updates before the thread actually performs them.
During shutdown the thread ends from interruption and does not perform (most of) the triggered transformation that a shutdown normally causes.
Note to the Reviewer
Main change is in
ProbeProviderClassFileTransformer
. Besides that I only added some missing generics and made some hash maps into concurrent ones while looking for the problem.Note that
synchronized
was largely removed where present. Synchronisation is done by now using a concurrent map and atomics. To allow reclaim of key class key was changed toString
which alled use of concurrent map. Other methods no longer need synchronisation because only the updater thread will call them.This also fixes PAYARA-1285 which should be checked with the reproducer given in the ticket.
Testing
Easiest is to test this after #4274 is merged since it will be easy to verify in the monitoring console that un-deploying and re-deploying an application does no longer cause the request/sec to zero out.
With MC started enable web and HTTP monitoring until you see some request/sec in the Core view of MC. Redeploy MC and check that the requests/sec are still above 0, most likely 0.5 due to data polling every 2secs.
Testing should also check that enabling and disabling the web or HTTP monitoring has the expected effect. Again this is verified easiest after #4274 already being in master.
To check that instrumentation no longer is added or removed multiple times from provider classes I put a log output on INFO level in both cases of this if-statement https://github.com/payara/Payara/pull/4278/files#diff-16bec2e5206edbe4e95b050af4617992R193. Note that occasionally it is still possible that same class is handled twice due to unlucky async timing but not more often then that.