Stats enhancements #714
Hey Alexandra, thanks for working on this. Have some suggestions and questions for you.
Also, a general question: in the description, you mentioned:
calculate average response times without producing incorrect negative values (by ignoring data that looks inconsistent like response time < request time);
Do we know why this inconsistent data happens?
private double avgBlocksPerSec;
/** @implNote Access to this resource is managed by the {@link #requestsLock}. */
private final Map<String, RequestCounter> requestsToPeers = new HashMap<>();
Is there a benefit to manually doing locking as opposed to just using Collections.synchronizedMap to wrap the requestsToPeers map (and same question for all the other maps that have their own lock)?
In the initial code, we had both synchronized collections and synchronized methods, which was overkill. I opted to remove both in favor of separate locks. The disadvantage of concurrent maps is that in some methods the map is filtered multiple times. With concurrent maps only, we might get inconsistent data, such as the overall average for the peer times not matching the rest of the shown data.
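To illustrate the trade-off discussed here, the following is a minimal sketch, not the PR's actual code. It assumes a simplified stats class (the name PeerRequestStats and the Long counters are hypothetical; the PR stores RequestCounter values) in which one explicit lock is held across two passes over the map, so the total and the per-peer shares come from the same state. A Collections.synchronizedMap wrapper only makes each individual call atomic, and its documentation requires manual synchronization for iteration anyway.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical, simplified stand-in for the stats class touched by this PR. */
class PeerRequestStats {
    private final Lock requestsLock = new ReentrantLock();
    // Guarded by requestsLock; a plain HashMap suffices because every access takes the lock.
    private final Map<String, Long> requestsToPeers = new HashMap<>();

    /** Counts one request sent to the given peer. */
    void recordRequest(String peerId) {
        requestsLock.lock();
        try {
            requestsToPeers.merge(peerId, 1L, Long::sum);
        } finally {
            requestsLock.unlock();
        }
    }

    /** Computes each peer's share of all requests from one consistent view of the map. */
    Map<String, Double> percentageOfRequestsByPeer() {
        requestsLock.lock();
        try {
            // First pass: total number of requests over all peers.
            long total = 0;
            for (long count : requestsToPeers.values()) {
                total += count;
            }
            Map<String, Double> result = new HashMap<>();
            if (total == 0) {
                return result;
            }
            // Second pass: per-peer share. The lock is still held, so no writer
            // can change the counts between the two passes.
            for (Map.Entry<String, Long> e : requestsToPeers.entrySet()) {
                result.put(e.getKey(), 100.0 * e.getValue() / total);
            }
            return result;
        } finally {
            requestsLock.unlock();
        }
    }
}
```

With this shape, a reader that filters the map several times sees the same counts in every pass, which is the consistency argument made above.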
.sum();
requestsToPeers
.entrySet()
.parallelStream()
Hmm I'm not sure if parallelStream() is what we want here... since the stream is modifying percentageReq, can't that cause a ConcurrentModificationException on it?
A for-loop would be better. If the calculation is not really heavy, using a stream will slow down processing because of the thread context switching.
https://blog.oio.de/2016/01/22/parallel-stream-processing-in-java-8-performance-of-sequential-vs-parallel-stream-processing/
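As a rough sketch of the two alternatives raised in this thread, the code below computes per-peer request percentages first with a plain loop and then with a sequential stream whose collector builds the result map. The class name RequestPercentages and the use of Long counters are assumptions for illustration; the PR's RequestCounter type is not shown in the conversation. The point is that neither variant mutates a shared HashMap from multiple threads, which is not thread-safe when done from a parallel stream.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical helper; names and value types are illustrative only. */
class RequestPercentages {

    /** Plain loop: a single thread fills the result map. */
    static Map<String, Double> withLoop(Map<String, Long> requestsToPeers) {
        long total = 0;
        for (long count : requestsToPeers.values()) {
            total += count;
        }
        Map<String, Double> percentageReq = new LinkedHashMap<>();
        if (total == 0) {
            return percentageReq;
        }
        for (Map.Entry<String, Long> e : requestsToPeers.entrySet()) {
            percentageReq.put(e.getKey(), 100.0 * e.getValue() / total);
        }
        return percentageReq;
    }

    /** Stream variant: the collector creates the map instead of forEach mutating an outside one. */
    static Map<String, Double> withCollector(Map<String, Long> requestsToPeers) {
        long total = requestsToPeers.values().stream().mapToLong(Long::longValue).sum();
        if (total == 0) {
            return new LinkedHashMap<>();
        }
        return requestsToPeers.entrySet().stream()
                .collect(Collectors.toMap(Map.Entry::getKey, e -> 100.0 * e.getValue() / total));
    }
}
```

For maps with a handful of peers, as in the output shown later in this thread, the loop is likely the simplest choice.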
@aion-kelvin The stats make some assumptions on the correspondence of a response to a request. They are matched positionally because there are no identifiers to match the request to its response, especially for status requests, which is what is tracked in this case. Since different threads are at work here and the request is first sent out and then added to statistics, there can be scenarios where the response is received and added to statistics before the request. I'd be tempted to log the stats information before sending the request, but I'm worried that it might interfere with application logic, and perhaps delegating all the stats to low priority threads is the next necessary enhancement step.
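The positional matching described above, together with the rule from the PR description of ignoring pairs where the response time is earlier than the request time, can be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the class ResponseTimeTracker and its methods are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical per-peer tracker; pairs the i-th request with the i-th response. */
class ResponseTimeTracker {
    private final List<Long> requestTimes = new ArrayList<>();
    private final List<Long> responseTimes = new ArrayList<>();

    synchronized void addRequest(long nanoTime) {
        requestTimes.add(nanoTime);
    }

    synchronized void addResponse(long nanoTime) {
        responseTimes.add(nanoTime);
    }

    /**
     * Average elapsed time in nanoseconds over positionally matched pairs.
     * Pairs where the response was recorded before the request are skipped,
     * so the result is never negative. Returns 0 when no pair is usable.
     */
    synchronized double averageResponseTimeNanos() {
        int pairs = Math.min(requestTimes.size(), responseTimes.size());
        long sum = 0;
        int counted = 0;
        for (int i = 0; i < pairs; i++) {
            long elapsed = responseTimes.get(i) - requestTimes.get(i);
            if (elapsed >= 0) {
                sum += elapsed;
                counted++;
            }
        }
        return counted == 0 ? 0d : (double) sum / counted;
    }
}
```

In real use the callers would pass System.nanoTime() at the moment a request is sent or a response arrives. Skipping inconsistent pairs hides the negative values but does not correct the underlying race, so the averages remain approximate.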
LGTM except for the parallelStream.
Cool, thanks for the explanation.
f1fd792 to 967c298
I can confirm that a kernel running overnight with this update worked correctly. As you can see below, it also fixed the negative values bug.
Old version output:
====== sync-responses-by-peer ======
peer         avg. response
------------------------------------
«overall»        765617 ms
id:3f066f       -738521 ms
id:8629a3         44610 ms
id:4be9f8         56727 ms
id:a30d20         67141 ms
id:a30d30         67230 ms
id:a30d10         89453 ms
id:acda45         97917 ms
id:526445        109275 ms
id:0f9d39        124429 ms
id:1fe402       7737905 ms
Output for kernel with this PR:
====== sync-responses-by-peer ======
peer         avg. response
------------------------------------
«overall»        784311 ms
id:3f066f             9 ms
id:4be9f8         17748 ms
id:acda45         35856 ms
id:a30d30         45848 ms
id:8629a3         50529 ms
id:a30d20         66535 ms
id:a30d10         78114 ms
id:526445         95776 ms
id:0f9d39        140577 ms
id:1fe402       7312124 ms
Both versions were run in parallel for the same duration of time.
Is 9 ms correct? Why is it so fast?
I suspect it's another node running on the office network.
How is the latency calculated?
LGTM, but sync-responses-by-peer probably needs a closer look.
@AionJayT: yes, the response time definitely needs more refinement. As can be deduced from the answer to Kelvin's question above, there are many sources of inaccuracy in the measurement. It should only be used for comparing different peers rather than for exact values.
…m with parallelStream; specified element types for lists
…o stream for sorting the maps
967c298 to 03007e4
Description
Fixes some issues related to the gathered sync statistics, as follows:
replaces stream with parallelStream;
uses System.nanoTime instead of System.currentTimeMillis for computing the average response times.
Continues Issue #661.