Vote timestamps Test Cases #3009

Closed
4 of 6 tasks
zhyatt opened this issue Oct 19, 2020 · 5 comments

Comments

@zhyatt
Collaborator

zhyatt commented Oct 19, 2020

See #2879 for the implementation details. Using timestamps in votes in place of sequence numbers is enabled automatically on upgrade to V22.0DB7; no configuration is needed.

Test Cases

VT1: General, sustained vote traffic

As a baseline, establish a sustained level of transactions at a low BPS and ensure:

  • Verify that excessive type: counters, subtype: vote_invalid entries are not seen in the stats RPC
  • Verify that excessive type: counters, subtype: vote_replay entries are not seen in the stats RPC
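
The two checks above can be scripted once a counters response from the stats RPC is in hand. The sketch below is only illustrative: it assumes the response has already been fetched and parsed into a dict, and that each entry carries the counter name under a "detail" key (falling back to "subtype", as the checklist words it) with a string "value". The helper names, the sample payload and the thresholds are hypothetical, and what counts as "excessive" is left to the tester.

```python
from typing import Any, Dict, Iterable, List


def vote_counter_totals(
    stats: Dict[str, Any],
    names: Iterable[str] = ("vote_invalid", "vote_replay"),
) -> Dict[str, int]:
    """Sum the values of the named counters in a parsed stats response."""
    totals = {name: 0 for name in names}
    for entry in stats.get("entries", []):
        # Assumed field names: "detail" (or "subtype") holds the counter name.
        name = entry.get("detail", entry.get("subtype"))
        if name in totals:
            totals[name] += int(entry.get("value", 0))
    return totals


def excessive(totals: Dict[str, int], threshold: int) -> List[str]:
    """Return the counter names whose total exceeds the chosen threshold."""
    return sorted(name for name, count in totals.items() if count > threshold)


# Hypothetical sample payload, not captured from a real node.
sample = {
    "type": "counters",
    "entries": [
        {"type": "vote", "detail": "vote_valid", "dir": "in", "value": "1200"},
        {"type": "vote", "detail": "vote_invalid", "dir": "in", "value": "0"},
        {"type": "vote", "detail": "vote_replay", "dir": "in", "value": "37"},
    ],
}

totals = vote_counter_totals(sample)
flagged = excessive(totals, threshold=10)
```

Running the same check before and after a test window and comparing the totals would show how many entries the test itself produced.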

VT2: Heavy/saturation vote traffic

Test a heavier level of transactions at a high BPS/saturation and ensure:

  • Verify that excessive type: counters, subtype: vote_invalid entries are not seen in the stats RPC
  • Verify that excessive type: counters, subtype: vote_replay entries are not seen in the stats RPC

VT3: Vote tightness tracking via WebSockets

Capture vote data from WebSockets during periods of heavy network traffic to determine the tightness of votes seen at varying levels of activity. Use the new "timestamp" field included there, which is pulled directly from the vote itself.

  • a. Track vote tightness during sustained, heavy network traffic below saturation level
  • b. Track vote tightness during heavy network traffic peaking above saturation level

For easier trend identification, graphs of vote tightness over time during these tests would be valuable.
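
As a rough illustration of the VT3 capture, the sketch below turns raw WebSocket vote messages into per-block vote offsets relative to publish time. It rests on several assumptions: each message is JSON with "topic" set to "vote" and a "message" object holding the "timestamp" field and a "blocks" list of hashes, the timestamp is in milliseconds, and the publish time of each test block was recorded separately by the test harness. The function names and sample data are hypothetical.

```python
import json
from typing import Dict, Iterable, List


def vote_offsets_ms(
    raw_messages: Iterable[str],
    publish_ms: Dict[str, int],
) -> Dict[str, List[int]]:
    """Map block hash -> sorted vote offsets (ms) from that block's publish time."""
    offsets: Dict[str, List[int]] = {}
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("topic") != "vote":
            continue  # ignore confirmations and other topics
        vote = msg["message"]
        ts = int(vote["timestamp"])  # assumed to be milliseconds, taken from the vote
        for block_hash in vote.get("blocks", []):
            if block_hash in publish_ms:
                offsets.setdefault(block_hash, []).append(ts - publish_ms[block_hash])
    return {h: sorted(v) for h, v in offsets.items()}


def tightness_ms(block_offsets: List[int]) -> int:
    """Spread between the earliest and latest vote seen for one block."""
    return max(block_offsets) - min(block_offsets)


# Hypothetical capture: three votes on block "A", published at t = 1_000_000 ms.
captured = [
    json.dumps({"topic": "vote", "message": {"timestamp": "1000210", "blocks": ["A"]}}),
    json.dumps({"topic": "vote", "message": {"timestamp": "1000480", "blocks": ["A"]}}),
    json.dumps({"topic": "confirmation", "message": {"hash": "A"}}),
    json.dumps({"topic": "vote", "message": {"timestamp": "1000260", "blocks": ["A", "B"]}}),
]

offsets = vote_offsets_ms(captured, publish_ms={"A": 1_000_000})
spread = tightness_ms(offsets["A"])
```

Plotting the offsets per block over time would give the kind of tightness graph described above.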

@zhyatt
Collaborator Author

zhyatt commented Oct 22, 2020

A couple of below-saturation events were run today, with some vote tightness captured from vote timestamps compared to publish times. Thanks to @Srayman and for the blocks and graphs.

Based on these tests, VT1 seems good on the vote_invalid side (none seen), and vote_replay turns out not to be useful in these scenarios, so those tests are being crossed off for both VT1 and VT2.

These tests also satisfy VT3a; additional saturation-level tests for VT3b remain outstanding.

500 BPS Test, sustained ~2.5mins, non-saturation
Individual blocks were published during the event and votes were captured on those blocks to show the arrival timing. As the network wasn't saturated, the votes were largely grouped in a tight 200-300ms window, except for a couple of outlier votes (as late as 6s post-confirmation) and 3 specific reps, including the local node, which had consistently lower vote times together. The 0 time represents block publish time and the Y-axis is in ms.

500 BPS Graph

1000 BPS Test, sustained ~1.25mins, near saturation
As in the graph above, votes were largely tightly grouped, with a handful of outliers. These outliers were further out, though: as late as 68s post-publishing. CPS did trail slightly toward the end of the block publishing period, so the network may have been near saturation.

1000 BPS Graph - Zoomed out

1000 BPS Graph - Zoom medium

1000 BPS Graph - Zoom tight

@zhyatt
Collaborator Author

zhyatt commented Oct 22, 2020

Thanks for the blocks on this @Joohansson

A 1500 BPS saturation test (Discord link) was performed. It resulted in a lower peak of just under 1000 BPS median, seen across all nodes as well as within PRs alone (so rebroadcasting issues likely weren't causing the poor propagation, since blocks were initially published to all PRs). Still no vote_invalid entries are showing up, so VT2 will be marked off. Vote tightness tracking will be left open, as the poor performance and the lack of some tools to measure it properly demand more data.

The cause of the lower BPS propagation is not known, but could be due to a variety of network or other factors including some RocksDB caching issues (suspected in prior poor performance instances but mostly with CPS, not BPS), the addition of 10 pruned non-PR nodes on 2vCPU/2GB DigitalOcean droplets (although initial publishing of blocks should go directly to all PRs so non-PRs shouldn't impact BPS much), or other network issues. Further saturation tests will be performed.

Although the vote tightness data isn't the best, it is captured here for completeness. The 0 marker here is confirmation time (not publish time), and each column of data points represents one published block and the votes coming in for its election.

1500 BPS Graph - Zoomed out - large gaps between blocks being confirmed here with ~275s for confirmation time in the pink case.

1500 BPS Graph - first block after test highlighted - highlighted block took 17s to confirm and had post-confirmation votes seen as late as 34s

@zhyatt
Collaborator Author

zhyatt commented Oct 24, 2020

A broader sample set was captured from 98,663 confirmations at 1000 BPS condensed into 21 intervals of 5 seconds each.

Graph 1 - zoomed out
Graph 1 - zoomed in
This first graph uses the max data point per rep for each interval; max, min, median and mean per rep are available for each interval.
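
The per-rep, per-interval condensing described here can be sketched as follows. This is not the tooling used for the graphs; it assumes each sample is a (seconds since test start, rep, vote offset in ms) tuple, and the rep names and values are made up for illustration.

```python
from collections import defaultdict
from statistics import mean, median
from typing import Dict, Iterable, List, Tuple

Sample = Tuple[float, str, int]  # (seconds since start, rep, vote offset in ms)


def summarize(
    samples: Iterable[Sample],
    interval_s: int = 5,
) -> Dict[Tuple[int, str], Dict[str, float]]:
    """Condense vote offsets into max/min/median/mean per (interval index, rep)."""
    buckets: Dict[Tuple[int, str], List[int]] = defaultdict(list)
    for t, rep, offset in samples:
        buckets[(int(t // interval_s), rep)].append(offset)
    return {
        key: {
            "max": max(values),
            "min": min(values),
            "median": median(values),
            "mean": mean(values),
        }
        for key, values in buckets.items()
    }


# Hypothetical samples: rep_a has one late vote in the first 5-second interval.
samples = [
    (0.5, "rep_a", 200),
    (1.2, "rep_a", 250),
    (4.9, "rep_a", 6000),
    (2.0, "rep_b", 210),
    (6.0, "rep_a", 220),
]

summary = summarize(samples, interval_s=5)
```

In this made-up data a single late vote sets the max for rep_a in the first interval to 6000 ms while its median stays at 250 ms, which is why plotting more than one statistic per rep can be useful. Passing interval_s=15 would give the coarser bucketing.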

Graph 2
This graph uses the average per rep for each interval. The average graph looks pretty good, so the max graph must be skewed by just one or a handful of votes that take longer.

In both graphs some bell curve behavior can be seen; in each case, the same rep generating the delayed votes causes that particular bell curve. These correlate with reps known to fall behind during spam.

@zhyatt
Collaborator Author

zhyatt commented Nov 6, 2020

Additional tests above saturation: using the "median" value per rep across all confirmations from each vote interval (15 seconds)

Graph 1
The confirmations at the end with large negative numbers are expired elections; these will be handled better in future captures through the websocket with the modifications from #3016.

Graph 2
This covers roughly the same time window as the vote interval graphs above, for reference. As CPS drops somewhat (green line here), the spacing between votes also increases. Once block publishing stops around 12:02, CPS picks up and votes are spaced closer together again.

Although the vote timestamps feature is considered to be complete from a testing perspective, this issue will be left open for future insights/vote captures heading into later DBs.

@zhyatt
Collaborator Author

zhyatt commented Mar 15, 2021

No issues have arisen with vote timestamps in builds so far. Additional testing of vote activity will continue separately, with final votes testing. Closing.

@zhyatt zhyatt closed this as completed Mar 15, 2021