Make bootstrapping handle its own timeouts #3410

Merged: 4 commits, Oct 21, 2024

yacovm (Contributor) commented Sep 24, 2024

Why this should be merged

Currently, an engine registers timeouts with the handler, which schedules them on behalf of the engine.

The handler then notifies the engine when a timeout expires.

However, the bootstrapping engine is the only engine that uses this mechanism; other engine types, such as the snowman and state sync engines, do not.

It therefore makes sense to consolidate timeout handling in the bootstrapper rather than delegating it to the handler.

Moving timeout handling closer to the bootstrapper also slims down the common.Engine API, because the Timeout() method can be removed from it.

How this works

I refactored the timeout logic and decoupled it from the handler.
I also re-implemented the timeout mechanism to make it more robust and testable.

How this was tested

  • CI tests
  • Added unit tests with 100% code coverage.
  • I built a node with the data race detector enabled and ran it until it finished bootstrapping, to ensure that this PR introduced no data races.
diff --git a/scripts/build_avalanche.sh b/scripts/build_avalanche.sh
index 74ec675e2..26320a562 100755
--- a/scripts/build_avalanche.sh
+++ b/scripts/build_avalanche.sh
@@ -29,4 +29,4 @@ source "$AVALANCHE_PATH"/scripts/constants.sh
 
 build_args="$race"
 echo "Building AvalancheGo..."
-go build $build_args -ldflags "-X github.com/ava-labs/avalanchego/version.GitCommit=$git_commit $static_ld_flags" -o "$avalanchego_path" "$AVALANCHE_PATH/main/"*.go
+go build -race $build_args -ldflags "-X github.com/ava-labs/avalanchego/version.GitCommit=$git_commit $static_ld_flags" -o "$avalanchego_path" "$AVALANCHE_PATH/main/"*.go

The node finished bootstrapping successfully, and its log showed no data races.

@yacovm yacovm force-pushed the simplifyTimeouts branch 2 times, most recently from e2641d7 to 09ad764 Compare September 24, 2024 00:56
@yacovm yacovm marked this pull request as draft September 24, 2024 01:07
@yacovm yacovm force-pushed the simplifyTimeouts branch 3 times, most recently from 25122c6 to b54e285 Compare September 24, 2024 01:59
@yacovm yacovm self-assigned this Sep 24, 2024
@yacovm yacovm force-pushed the simplifyTimeouts branch 7 times, most recently from 31d051e to c25a68d Compare September 25, 2024 01:02
@yacovm yacovm marked this pull request as ready for review September 25, 2024 01:15
}

// RegisterTimeout fires the function the timeout handler is initialized with no later than the given timeout.
func (th *timeoutHandler) RegisterTimeout(d time.Duration) {
Contributor Author

I just moved this code section from handler.go

Comment on lines 49 to 52
newTimer: func(d time.Duration) (<-chan time.Time, func() bool) {
	timer := time.NewTimer(d)
	return timer.C, timer.Stop
},
Collaborator

What's the reason for specifying this function here rather than just using time.NewTimer directly?

It looks like we replace this with an identical implementation in one of our test cases

Contributor Author

Because I needed it for the tests :-)

},
},
} {
t.Run(testCase.desc, func(t *testing.T) {
Collaborator

Is it possible to remove clock from these tests?

What do you think of adding test cases for multiple timeouts w/ and w/o preemption?

Contributor Author

I added two tests as requested, but I need the clock for these tests.

@yacovm yacovm force-pushed the simplifyTimeouts branch 6 times, most recently from 4f949b8 to bdf0063 Compare October 3, 2024 20:14
@yacovm yacovm added the cleanup Code quality improvement label Oct 3, 2024
timer := th.newTimer(d)
defer timer.Stop()

defer th.onTimeout()
Contributor

nit: I feel like this makes a bit more sense to just be called inline at the end of the function.

Contributor Author

inlined

Contributor

I think this inlining may need to be pushed from your local branch, it still seems to use the defer.

Contributor Author

oh I thought you meant to inline relinquishing the pending token.

yacovm (Contributor Author) commented Oct 11, 2024

@StephenButtolph thanks for the review, addressed your comments.

StephenButtolph (Contributor) left a comment

final nits then lgtm

Signed-off-by: Yacov Manevich <[email protected]>
@StephenButtolph StephenButtolph added this to the v1.11.13 milestone Oct 21, 2024
@StephenButtolph StephenButtolph added this pull request to the merge queue Oct 21, 2024
Merged via the queue into ava-labs:master with commit 071f780 Oct 21, 2024
23 checks passed
yacovm added a commit to yacovm/avalanchego that referenced this pull request Oct 25, 2024
Labels
cleanup Code quality improvement
4 participants