Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stats enhancements #714

Merged
merged 9 commits into from
Nov 21, 2018
Merged

Stats enhancements #714

merged 9 commits into from
Nov 21, 2018

Conversation

AlexandraRoatis
Copy link
Contributor

@AlexandraRoatis AlexandraRoatis commented Nov 20, 2018

Description

Fixes some issues related to the gathered sync statistics, as follows:

  • improved access to resources by using different locks;
  • replaces stream with parallelStream;
  • specified element types for lists;
  • calculate average response times without producing incorrect negative values (by ignoring data that looks inconsistent like response time < request time);
  • correct formatting and license for stats test;
  • correctly tracking requests to peers; updated unit test;
  • correctly tracking received blocks instead of imported blocks;
  • using System.nanoTime instead of System.currentTimeMillis for computing the average response times.

Continues Issue #661 .

Type of change

Insert x into the following checkboxes to confirm (eg. [x]):

  • Bug fix.
  • New feature.
  • Enhancement.
  • Unit test.
  • Breaking change (a fix or feature that causes existing functionality to not work as expected).
  • Requires documentation update.

Testing

Please describe the tests you used to validate this pull request. Provide any relevant details for test configurations as well as any instructions to reproduce these results.

  • existing test suite + 1 new unit test

Verification

Insert x into the following checkboxes to confirm (eg. [x]):

  • I have self-reviewed my own code and conformed to the style guidelines of this project.
  • New and existing tests pass locally with my changes.
  • I have added tests for my fix or feature.
  • I have made appropriate changes to the corresponding documentation.
  • My code generates no new warnings.
  • Any dependent changes have been made.

@AlexandraRoatis AlexandraRoatis added bug Something isn't working enhancement New feature or request labels Nov 20, 2018
@AlexandraRoatis AlexandraRoatis added this to the 0.3.2 milestone Nov 20, 2018
Copy link
Contributor

@aion-kelvin aion-kelvin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Alexandra, thanks for working on this. Have some suggestions and questions for you.

Also; general question -- in the description, you mentioned:

calculate average response times without producing incorrect negative values (by ignoring data that looks inconsistent like response time < request time);

Do we know why this inconsistent data happens?


private double avgBlocksPerSec;
/** @implNote Access to this resource is managed by the {@link #requestsLock}. */
private final Map<String, RequestCounter> requestsToPeers = new HashMap<>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a benefit to manually doing locking as opposed to just using Collections.synchronizedMap to wrap the requestsToPeers map (and same question for all the other maps that have their own lock)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the initial code, we had both synchronized collections and synchronized methods which was overkill. I opted to remove both in favor of separate locks. The disadvantage of the concurrent maps would be that in some methods the map is filtered multiple times. With concurrent maps only, we might get inconsistent data, like the overall average for the peer times not matching the rest of the shown data.

modAionImpl/src/org/aion/zero/impl/sync/SyncStats.java Outdated Show resolved Hide resolved
modAionImpl/src/org/aion/zero/impl/sync/SyncStats.java Outdated Show resolved Hide resolved
.sum();
requestsToPeers
.entrySet()
.parallelStream()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm not sure if parallelStream() is what we want here... since the stream is modifying percentageReq, can't that cause a ConcurrentModificationException on it?

Copy link
Collaborator

@AionJayT AionJayT Nov 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use for-loop will be better. if the calculation is not really heavy. use stream will slow down the processing time cause the thread context switch stuff.
https://blog.oio.de/2016/01/22/parallel-stream-processing-in-java-8-performance-of-sequential-vs-parallel-stream-processing/

modAionImpl/src/org/aion/zero/impl/sync/SyncStats.java Outdated Show resolved Hide resolved
modAionImpl/src/org/aion/zero/impl/sync/SyncStats.java Outdated Show resolved Hide resolved
@AlexandraRoatis
Copy link
Contributor Author

@aion-kelvin The stats make some assumptions on the correspondence of a response to a request. They are matched positionally because there are no identifiers to match the request to its response, especially for status requests, which is what is tracked in this case.

Since different threads are at work here and the request is first sent out and then added to statistics, there can be scenarios where the response is received and added to statistics before the request.

I'd be tempted to log the stats information before sending the request, but I'm worried that it might interfere with application logic, and perhaps delegating all the stats to low priority threads is the next necessary enhancement step.

Copy link
Collaborator

@AionJayT AionJayT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for the parallelstream.

@aion-kelvin
Copy link
Contributor

Cool, thanks for the explanation

@AlexandraRoatis AlexandraRoatis force-pushed the stats-update branch 2 times, most recently from f1fd792 to 967c298 Compare November 21, 2018 14:44
@AlexandraRoatis
Copy link
Contributor Author

I can confirm that a kernel running overnight with this update worked correctly. As you can see below it also fixed the negative values bug.

Old version output:

====== sync-responses-by-peer ======
        peer        avg. response
------------------------------------
   «overall»            765617 ms
   id:3f066f           -738521 ms
   id:8629a3             44610 ms
   id:4be9f8             56727 ms
   id:a30d20             67141 ms
   id:a30d30             67230 ms
   id:a30d10             89453 ms
   id:acda45             97917 ms
   id:526445            109275 ms
   id:0f9d39            124429 ms
   id:1fe402           7737905 ms

Output for kernel with this PR:

====== sync-responses-by-peer ======
        peer        avg. response
------------------------------------
   «overall»            784311 ms
   id:3f066f                 9 ms
   id:4be9f8             17748 ms
   id:acda45             35856 ms
   id:a30d30             45848 ms
   id:8629a3             50529 ms
   id:a30d20             66535 ms
   id:a30d10             78114 ms
   id:526445             95776 ms
   id:0f9d39            140577 ms
   id:1fe402           7312124 ms

Both versions were ran in parrallel for the same duration of time.

@aion-kelvin
Copy link
Contributor

aion-kelvin commented Nov 21, 2018 via email

@AlexandraRoatis
Copy link
Contributor Author

I suspect it's another node running on the office network.

@AionJayT
Copy link
Collaborator

How does the latency been calculated?

@AionJayT
Copy link
Collaborator

LGTM, but sync-responses-by-peer probably need to have more look.

@AlexandraRoatis
Copy link
Contributor Author

@AionJayT: yes, the response time definitely needs more refinement. As can be deduced from the answer to Kelvin's question above there are many sources of inaccuracies in the measurement. It should only be used for comparing different peers rather than for exact values.

@AlexandraRoatis AlexandraRoatis merged commit 0ae32fb into master-pre-merge Nov 21, 2018
@AlexandraRoatis AlexandraRoatis deleted the stats-update branch November 21, 2018 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants