Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Consensus] Fix testing framework concurrency and improve logging #404

Merged
merged 27 commits into from
Jan 11, 2023

Conversation

Olshansk
Copy link
Member

@Olshansk Olshansk commented Dec 16, 2022

Description

The original motivation for this PR was to fix the block is nil on LocalNet.

While testing (using unit tests), debugging and investigating (using LocalNet), several issues were uncovered in the consensus testing framework introduced by #198. The goal of this PR changed to fix those issues and improve the quality of the consensus codebase.

Issue

Tends to "quality of life" improvements, which are somewhat related to #361.

Type of change

Please mark the relevant option(s):

  • New feature, functionality or library
  • Bug fix
  • Code health or cleanup
  • Major breaking change
  • Documentation
  • Other

List of changes

Consensus - Core

  • Force consensus to use a "star-like" broadcast instead of "RainTree" broadcast
  • Improve logging throughout through the use of emojis and rewording certain statements
  • Slightly improve the block verification flow (renaming, minor fixes, etc…) to stabilize LocalNet

Consensus - Tests

  • Rename the consensus_tests package to e2e_tests
  • Internalize configuration related to fail_on_extra_msgs from the Makefile to the consensus module
  • Forced all tests to fail if we receive extra unexpected messages and modify tests appropriately
  • After [Tooling] Flaky tests - Issue #192 #198, we made tests deterministic but there was a hidden bug that modified how the test utility functions because the clock would not move while we were waiting for messages. This prevented logs from streaming, tests from failing, and other issues. Tend to all related changes.

Consensus - Pacemaker

  • Rename ValidateMessage to ShouldHandleMessage and return a boolean
  • Pass a reason to InterruptRoudn
  • Improve readability of some parts of the code

P2P

  • Add a lock to the mempool to avoid parallel messages which has caused the node to crash in the past

Configs

  • Reorder private keys so addresses (retrieved by transforming private keys) to reflect the numbering in LocalNet appropriately. The address for val1, based on config1, will have the lexicographically first address. This makes debugging easier.

Testing

  • make test_consensus
  • make develop_test
  • LocalNet w/ all of the steps outlined in the README

Required Checklist

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have tested my changes using the available tooling
  • I have updated the corresponding CHANGELOG

If Applicable Checklist

  • I have updated the corresponding README(s); local and/or global
  • I have added tests that prove my fix is effective or that my feature works
  • I have added, or updated, mermaid.js diagrams in the corresponding README(s)
  • I have added, or updated, documentation and mermaid.js diagrams in shared/docs/* if I updated shared/*README(s)

@Olshansk Olshansk self-assigned this Dec 27, 2022
@Olshansk Olshansk added bug Something isn't working - expected behaviour is incorrect core Core infrastructure - protocol related consensus Consensus specific changes labels Dec 27, 2022
@Olshansk
Copy link
Member Author

@gokutheengineer Just a heads up that we might have some merge conflicts. No action items or need for a review yet.

This is where I'm working through the "block is nil" issue and have been overhauling some of the logs to make things clearer to understand and debug as well. Currently thinking of ways to add better unit tests for it.

@Olshansk Olshansk changed the title [WIP] Stabilizing consensus [Consensus] Fix testing framework concurrency and improve logging Jan 2, 2023
@Olshansk Olshansk marked this pull request as ready for review January 2, 2023 19:46
@Olshansk
Copy link
Member Author

Olshansk commented Jan 3, 2023

@deblasis I only added @gokutheengineer as a reviewer because I don't think this requires a review from multiple people but just thought you'd appreciate seeing the change here https://github.com/pokt-network/pocket/pull/404/files#diff-351d333685b4f6a9bb38d617a69a200daa5261f20ea31a62a7f1b4123dba9520.

In particular, look at the changes in waitForEventsInternal. It was a bug both of us overlooked when adding deterministic testing, but works (and IMO is much clearer) now.

➕ 1️⃣ to the iteration on each other's work

consensus/block.go Outdated Show resolved Hide resolved
consensus/block.go Outdated Show resolved Hide resolved
consensus/debugging.go Show resolved Hide resolved
consensus/doc/CHANGELOG.md Outdated Show resolved Hide resolved
consensus/e2e_tests/hotstuff_test.go Show resolved Hide resolved
consensus/e2e_tests/pacemaker_test.go Show resolved Hide resolved
consensus/e2e_tests/utils_test.go Outdated Show resolved Hide resolved
consensus/e2e_tests/utils_test.go Outdated Show resolved Hide resolved
consensus/e2e_tests/utils_test.go Outdated Show resolved Hide resolved
consensus/pacemaker.go Outdated Show resolved Hide resolved
Olshansk and others added 3 commits January 9, 2023 14:09
This is to ensure in the beginning all nodes set leader id to 0, to prevent some possible bugs.

Co-authored-by: goku <[email protected]>
@Olshansk
Copy link
Member Author

Olshansk commented Jan 9, 2023

@gokutheengineer PTAL when you have a chance

@gokutheengineer
Copy link
Contributor

@gokutheengineer PTAL when you have a chance

LGTM!

@Olshansk
Copy link
Member Author

@gokutheengineer I'm going to hold of for us to merge in #425 (already approved).

Plan is to rebase this on top of master, retest, and then submit this.

@Olshansk Olshansk merged commit bf6728e into main Jan 11, 2023
@Olshansk Olshansk deleted the consensus/techdebt branch January 11, 2023 00:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working - expected behaviour is incorrect consensus Consensus specific changes core Core infrastructure - protocol related
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants