Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add wen_restart module #33344

Merged
merged 25 commits into from
Oct 6, 2023
Merged

Conversation

wen-coding
Copy link
Contributor

Problem

Add wen_restart module

Summary of Changes

  • Implement reading LastVotedForkSlots from blockstore.
  • Add proto file to record the intermediate results.
  • Also link wen_restart into validator.
  • Move recreation of tower outside replay_stage so we can get last_vote.

- Implement reading LastVotedForkSlots from blockstore.
- Add proto file to record the intermediate results.
- Also link wen_restart into validator.
- Move recreation of tower outside replay_stage so we can get last_vote.
@codecov
Copy link

codecov bot commented Sep 21, 2023

Codecov Report

Merging #33344 (2a61944) into master (64b3613) will increase coverage by 0.0%.
The diff coverage is 78.6%.

@@           Coverage Diff           @@
##           master   #33344   +/-   ##
=======================================
  Coverage    81.7%    81.7%           
=======================================
  Files         805      807    +2     
  Lines      218162   218256   +94     
=======================================
+ Hits       178411   178493   +82     
- Misses      39751    39763   +12     

wen-restart/src/wen_restart.rs Outdated Show resolved Hide resolved
core/src/replay_stage.rs Outdated Show resolved Hide resolved
validator/src/cli.rs Show resolved Hide resolved
let last_vote_slot = last_vote.last_voted_slot().unwrap();
let mut last_vote_fork = vec![last_vote_slot];
let mut slot = last_vote_slot;
for _ in 0..MAX_SLOTS_ON_VOTED_FORKS {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to go this deep, can we stop at block_commitment_cache.highest_supermajority_root() ?

if i'm behind the smr my last voted fork info doesn't contribute anything to wen restart anyway right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is also an option, I think the current logic is trying to be more straightforward so more fault tolerant. What if there is a bug in code so highest_supermajority_root is wrong?

So when we started the initial design we tried to see what the minimum required info is, we had to send last_vote, and need to send the slots on last voted fork so validators with different progress can sync up again. Other than that, we know an outage is probably less than 9 hours before we intervene, and we avoid relying on all other information. Because software can have bugs, you can't predict what is going wrong in an outage.

Does that make sense?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know what the latest oom time is, can we actually survive 9 hours without making a root?

either way it seems alright to play it safe and send all these slots.
was selfishly hoping we could reduce the size so that we could include hashes for the inevitable duplicate block restart 😛

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in past outages validators lasted maybe 3 to 4 hours without making root, we decided to restart before 9 hours.

Yeah whether we can include hash had been debated, and the first version it looks like hash isn't going to make it. We could always add another set of consensus on bankhash in the future though.

@t-nelson
Copy link
Contributor

@CriesofCarrots mind giving the protobufs stuff a once over for consistency and build reliability?

Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some stylistic suggestions and nits from me. The proto build script itself looks good.

validator/src/cli.rs Outdated Show resolved Hide resolved
wen-restart/protos/wen_restart.proto Outdated Show resolved Hide resolved
DONE = 6;
}

message InitRecord {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not totally clear to me what an InitRecord is. Might be more clear if you spell out Init? I'm not sure if Init is a noun or an adjective (or a verb, I suppose).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed to MyLastVotedForkSlots.

wen-restart/src/wen_restart.rs Outdated Show resolved Hide resolved
wen-restart/build.rs Outdated Show resolved Hide resolved
CriesofCarrots
CriesofCarrots previously approved these changes Sep 22, 2023
Copy link
Contributor

@CriesofCarrots CriesofCarrots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r+ final nits. (I only reviewed the proto build, usage, and related)

wen-restart/Cargo.toml Outdated Show resolved Hide resolved
wen-restart/Cargo.toml Outdated Show resolved Hide resolved
wen-restart/src/wen_restart.rs Outdated Show resolved Hide resolved
@wen-coding wen-coding self-assigned this Sep 22, 2023
validator/src/cli.rs Outdated Show resolved Hide resolved
validator/src/cli.rs Outdated Show resolved Hide resolved
wen-restart/src/wen_restart.rs Outdated Show resolved Hide resolved
wen-restart/src/wen_restart.rs Outdated Show resolved Hide resolved
core/src/validator.rs Outdated Show resolved Hide resolved
@wen-coding wen-coding merged commit 630feed into solana-labs:master Oct 6, 2023
27 checks passed
@wen-coding wen-coding deleted the wen_restart_add_module branch October 6, 2023 22:13
deanmlittle pushed a commit to deanmlittle/solana that referenced this pull request Oct 8, 2023
* Add wen_restart module:
- Implement reading LastVotedForkSlots from blockstore.
- Add proto file to record the intermediate results.
- Also link wen_restart into validator.
- Move recreation of tower outside replay_stage so we can get last_vote.

* Update lock file.

* Fix linter errors.

* Fix depencies order.

* Update wen_restart explanation and small fixes.

* Generate tower outside tvu.

* Update validator/src/cli.rs

Co-authored-by: Tyera <[email protected]>

* Update wen-restart/protos/wen_restart.proto

Co-authored-by: Tyera <[email protected]>

* Update wen-restart/build.rs

Co-authored-by: Tyera <[email protected]>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Tyera <[email protected]>

* Rename proto directory.

* Rename InitRecord to MyLastVotedForkSlots, add imports.

* Update wen-restart/Cargo.toml

Co-authored-by: Tyera <[email protected]>

* Update wen-restart/src/wen_restart.rs

Co-authored-by: Tyera <[email protected]>

* Move prost-build dependency to project toml.

* No need to continue if the distance between slot and last_vote is
already larger than MAX_SLOTS_ON_VOTED_FORKS.

* Use 16k slots instead of 81k slots, a few more wording changes.

* Use AncestorIterator which does the same thing.

* Update Cargo.lock

* Update Cargo.lock

---------

Co-authored-by: Tyera <[email protected]>
std::io::Result,
};

fn main() -> Result<()> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe within this main function you'll want to add

println!("cargo:rerun-if-changed=proto/wen_restart.proto");

This causes the generated files to get regenerated if the protos change.

More info:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for checking, sending another PR to fix it.

Comment on lines +15 to +36
// Copied and adapted from
// https://github.com/Kimundi/rustc-version-rs/blob/1d692a965f4e48a8cb72e82cda953107c0d22f47/README.md#example
// Licensed under Apache-2.0 + MIT
match version_meta().unwrap().channel {
Channel::Stable => {
println!("cargo:rustc-cfg=RUSTC_WITHOUT_SPECIALIZATION");
}
Channel::Beta => {
println!("cargo:rustc-cfg=RUSTC_WITHOUT_SPECIALIZATION");
}
Channel::Nightly => {
println!("cargo:rustc-cfg=RUSTC_WITH_SPECIALIZATION");
}
Channel::Dev => {
println!("cargo:rustc-cfg=RUSTC_WITH_SPECIALIZATION");
// See https://github.com/solana-labs/solana/issues/11055
// We may be running the custom `rust-bpf-builder` toolchain,
// which currently needs `#![feature(proc_macro_hygiene)]` to
// be applied.
println!("cargo:rustc-cfg=RUSTC_NEEDS_PROC_MACRO_HYGIENE");
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the wen-restart module, I'm not seeing anything that requires specialization. Is this block needed? (Usually I see this code to enable the AbiExample testing macros.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not, I'll send you another PR to remove it.

@wen-coding wen-coding restored the wen_restart_add_module branch October 9, 2023 14:23
@@ -1381,6 +1381,35 @@ pub fn app<'a>(version: &'a str, default_args: &'a DefaultArgs) -> App<'a, 'a> {
.possible_values(BlockProductionMethod::cli_names())
.help(BlockProductionMethod::cli_message())
)
.arg(
Arg::with_name("wen_restart")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a quick look at this because of #33633. I believe (but could be mistaken) that this feature is not yet production ready. If that is correct, i think we should add this

.hidden(hidden_unless_forced())

with the idea being to keep the in-development features hidden from solana-validator --help until they're ready

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants