-
Notifications
You must be signed in to change notification settings - Fork 222
Refactor Metrics Collection to be deterministic and complete #671
Refactor Metrics Collection to be deterministic and complete #671
Conversation
Tests fail because of WildFly Creaper is not in Maven Central wildfly-extras/creaper#28 Locally tests should pass with JBoss Users Maven Settings: https://developer.jboss.org/wiki/MavenGettingStarted-Users |
Waiting for engineering operations to set artifacts synchronization up, see linked JIRA. |
b90f147
to
b3442b2
Compare
Small hint: |
Nice catch, @Ladicek , should be fixed now. |
@sebastienblanc @edewit could you please 👀 this? |
Ok, after testing this PR , I still have the hanging Pending status issue but seems to happen only after I start the server for the first psuh notifications, after that it works. If I stop the server and restart, again, the first 2 push notifications are stuck on pending. I can see that in the log for the one pending :
|
Ok , I can really confirm the pattern : 1.start AS 2. the 2 first notifications got stuck on pending 3. the third and following notifications works. 4. stop and restart the AS 5. Go back to step 2 |
@sebastienblanc as I mentioned on hangout, it is necessary to run the CLI script above to set address-settings for TriggerMetricCollection. However I believe I could overlook a setup for BatchLoaded and AllBatchesLoaded queues - it is likely that these two queues will require an extra setting as well to avoid sendings them to DLQ. |
I have tested this setup locally:
Let me update the PR. |
Applied patch that sets max-delivery-attempts to infinite for queues that we pull from in MetricCollector. |
connect | ||
batch | ||
|
||
/subsystem=messaging/hornetq-server=default/address-setting=jms.queue.TriggerMetricCollectionQueue:add(redelivery-delay=1000, max-delivery-attempts=300) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just wonder if a hard limit 5 mins for delivery (300x 1000ms) is enough here. We could either consider much highter values (hours) or even set it to infinite (-1).
For newer WildFly/EAP versions (WF 10, EAP7), we can set up exponential backoff so that if delivery is really long, it does not try redeliver each second, but rather each 15s or so,
such as: redelivery-delay=1000, redelivery-multiplier=1.5, max-redelivery-delay=15000, max-delivery-attempts=300)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lfryc can you create a new JIRA for including this in the installation guide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lfryc the file needs to be included in the DIST bundle (via our maven assembly)
Something else, is that in database we already some .cli scripts - is this something we can combine?
and perhaps just rename them?)
@lfryc @sebastienblanc does this require some extra configuration steps to be run? (wondering because of the CLI changes). If yes, where are they documented? |
yes you need to run the jms-setup-wildfly.cli script. On Tue, Mar 1, 2016 at 11:25 AM, Matthias Wessendorf <
|
This causes a new concurrency issue..
Also, I see something like this:
in the logs PS: all push notifications have been delivered, but the code that deals with the analytic UI is broken and has (concurrency) issues |
@lfryc regarding the analytic UI updates, I think the current code is broken when there are concurrent push requests. For instance: have a loop of two push requests (via java sender), sending the same payload. Internally the endpoint issues to different 'submit' to the router, but than, the Message Queue is bound to the variantID. Of course, this is the same for different push jobs. RESULT: |
Thanks for testing @matzew , going to reproduce this locally... |
@lfryc the concurrency is issue is already fixed, but I think the other problem here remains. |
…0.5 refresh delay" This reverts commit 824d560.
…that handled MetricQueue before
…itely since their collection is triggered by TriggerMetricCollection event
e628c12
to
4a8844f
Compare
rebased on latest master @matzew this message rather reflects a feature:
if you look at the diagram [1] CollectTriggerEvent can be queued from several places - in order to make sure there is just one of them processed, HornetQ deduplicates it. Regrettably it also prints Warning for that case. We could recommend users to supress that log if it should be annoying. [1] https://docs.google.com/drawings/d/12cUrYf7ACLYEoYsIWul2szN3lmQcYFMjjaZDq2fr9qQ/edit |
👀 |
<dependency> | ||
<groupId>org.wildfly.extras.creaper</groupId> | ||
<artifactId>creaper-core</artifactId> | ||
<version>0.9.2</version> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1.0.0 is out, and on maven central http://search.maven.org/#search%7Cga%7C1%7Ccreaper-core
Upgraded WildFly Creaper to 1.0.0. Btw I can squash all the previous commits, I just wanted to keep the specific commits around, presumably it may make it simpler for review. |
@squash: sounds good On Wed, Mar 16, 2016 at 3:23 PM, Lukáš Fryč [email protected]
Matthias Wessendorf blog: http://matthiaswessendorf.wordpress.com/ |
|
Sending one push, to a variant (default config for batchSize etc), containing one iOS device, gives me:
Not sure why we see the |
@matzew did you pull me into this discussion on purpose? |
@matzew I've sanitized those warnings by our own de-duplication process. The process is also pretty readable from debug log (level=fine). |
…ic collection process already started
I start server with these sys props:
but I am seeing this:
there should not be two tokens in the batch, @lfryc |
/subsystem=messaging/hornetq-server=default/address-setting=jms.queue.BatchLoadedQueue:add(max-delivery-attempts=-1) | ||
/subsystem=messaging/hornetq-server=default/address-setting=jms.queue.AllBatchesLoadedQueue:add(max-delivery-attempts=-1) | ||
|
||
run-batch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lfryc does the latest queue belong in here too ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@matzew they don't need any special handling, I believe,
the MetricsProcessingStarted topic is just "optimization" for cluster, so that it does not have to go to the DB so often...
and TriggerVariantMetricCollection will be triggered each time the notification is dispatched by PushSender, so it will eventually succeed and push TriggerMetricCollection event as a result
@matzew I can see the two batches are loaded when aerogear.ios.batchSize=1, to be sure that the property is loaded correctly, I've created a new test |
@lfryc awesome! this also fixes the concurrency issue I noticed earlier this week 🌹 💋 |
https://issues.jboss.org/browse/AGPUSH-1512
refactor metric collection logic so that it is triggered externally in cycles until everything is processed
TriggerMetricCollectionQueue
that is used to store a "request for metrics processing"BatchLoaded
events but also severalVariantMetricInformation
eventsSee diagram for more clarity: https://docs.google.com/drawings/d/12cUrYf7ACLYEoYsIWul2szN3lmQcYFMjjaZDq2fr9qQ/edit?usp=sharing
🚢 deployed to https://upsstaging-aerogearz.rhcloud.com/ag-push/ (feel free to test it there, ping me for a password)
🎯 targets
master