
Miner changes for time-based tenure extends #5493

Open · wants to merge 43 commits into base: feat/time-based-tenure-extend
Conversation

@obycode obycode commented Nov 21, 2024

See #5468

This currently also includes #5452, since it has changes that would conflict. Once that is merged into develop, I will merge develop into feat/time-based-tenure-extend and then clean this branch up.

With this change, the signer will accept a tenure extend from miner N-1
when miner N wins a sortition but commits to the wrong parent tenure.
This is useful when checking the behavior during forking. The previous
design, which used a global singleton, caused trouble in testing when
multiple miners run in different threads of the same process.

The `SignerDBListener` struct runs in a new thread that continuously
processes StackerDB messages from the signers during a mining tenure.
`SignerCoordinator` is the interface that the miner uses with the
`SignerDBListener` to propose a block and wait for signatures.

hstove commented Nov 22, 2024

Should the `signerdb_listener` module be called `stackerdb_listener` instead?


obycode commented Nov 22, 2024

> Should the `signerdb_listener` module be called `stackerdb_listener` instead?

Oh, right. It's signer messages in the stacker db. I guess the signerdb is another thing. Oops. Thanks!

@obycode obycode marked this pull request as ready for review November 26, 2024 02:43
@obycode obycode requested review from a team as code owners November 26, 2024 02:43

obycode commented Nov 26, 2024

This is marked "ready for review" since it should now 🤞 pass all of the existing integration tests, so it's ready for a first look. The last piece is to enable the actual usage of the tenure extend now that all the pieces are in place.


obycode commented Nov 26, 2024

Forgot to add a comment, but with the last commits, the tenure extend is functional! Still needs more testing and a review of failures in existing tests.

/// Tracks signatures for blocks
/// - key: Sha512Trunc256Sum (signer signature hash)
/// - value: BlockStatus
blocks: Arc<(Mutex<HashMap<Sha512Trunc256Sum, BlockStatus>>, Condvar)>,
@jcnelson (Member) commented Nov 26, 2024


It seems like you have a low-lift opportunity for a future-proofed refactoring here.

In other places, if struct Foo needs to share its state with other threads, it has a corresponding FooComms struct which contains all of these Arc<Mutex<..>>-wrapped data. The FooComms struct then exposes getters and setters for the wrapped data, so the caller doesn't need to know how to deal with whatever concurrency primitives are used to access or mutate the Foo instance's data (e.g. locking the mutex or dereferencing the Arc the right way). The Foo implementation has a method for instantiating a FooComms.

It seems like you could have a StackerDBSessionComms struct which contained blocks and signer_idle_timestamps, and gets instantiated from the StackerDBSession.
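A minimal sketch of the "Comms" pattern described above, assuming hypothetical names (`ListenerComms`, `BlockStatus`, `set_status`, `get_status`) rather than the PR's actual types: the `Arc<(Mutex<..>, Condvar)>` stays private, and callers only see business-level getters and setters.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Condvar, Mutex};

// Illustrative stand-in for the PR's block-status enum.
#[derive(Clone, Debug, PartialEq)]
pub enum BlockStatus {
    Pending,
    Accepted,
    Rejected,
}

// The "Comms" struct: owns the shared state, hides the primitives.
#[derive(Clone)]
pub struct ListenerComms {
    blocks: Arc<(Mutex<HashMap<String, BlockStatus>>, Condvar)>,
}

impl ListenerComms {
    pub fn new() -> Self {
        Self {
            blocks: Arc::new((Mutex::new(HashMap::new()), Condvar::new())),
        }
    }

    // Setter: record a status change and wake any waiting threads.
    pub fn set_status(&self, block_hash: &str, status: BlockStatus) {
        let (lock, cvar) = &*self.blocks;
        let mut blocks = lock.lock().expect("FATAL: poisoned lock");
        blocks.insert(block_hash.to_string(), status);
        cvar.notify_all();
    }

    // Getter: the caller never touches the mutex or condvar directly.
    pub fn get_status(&self, block_hash: &str) -> Option<BlockStatus> {
        let (lock, _cvar) = &*self.blocks;
        let blocks = lock.lock().expect("FATAL: poisoned lock");
        blocks.get(block_hash).cloned()
    }
}

fn main() {
    let comms = ListenerComms::new();
    comms.set_status("abc123", BlockStatus::Pending);
    comms.set_status("abc123", BlockStatus::Accepted);
    assert_eq!(comms.get_status("abc123"), Some(BlockStatus::Accepted));
    println!("ok");
}
```

Because the struct derives `Clone` and the state is `Arc`-wrapped, the owning thread can hand out cheap clones to other threads while all of them observe the same map.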

ChainstateError::MinerAborted
})?;

sc.listener_thread = Some(listener_thread);

One of these days, we need to centralize all the singleton thread-creation that happens in the node.

) -> Result<Vec<MessageSignature>, NakamotoNodeError> {
// Add this block to the block status map.
// Create a scope to drop the lock on the block status map.
{

This scoped locking trick is something we should help callers avoid.
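For illustration, here is the scoped-locking trick in isolation and the wrapped alternative the comment argues for. `Registry` and its methods are hypothetical names, not code from this PR.

```rust
use std::sync::Mutex;

// Hypothetical shared collection guarded by a mutex.
struct Registry {
    items: Mutex<Vec<u64>>,
}

impl Registry {
    // The caller-side trick: an explicit block scope so the MutexGuard
    // is dropped (and the lock released) before the slow work below.
    fn insert_then_work_inline(&self, v: u64) {
        {
            let mut items = self.items.lock().expect("FATAL: poisoned");
            items.push(v);
        } // guard dropped here; lock released
        // ... long-running work that must not hold the lock ...
    }

    // The wrapped alternative: the guard's lifetime is confined to this
    // method, so callers never need to reason about lock scope at all.
    fn insert(&self, v: u64) {
        self.items.lock().expect("FATAL: poisoned").push(v);
    }

    fn len(&self) -> usize {
        self.items.lock().expect("FATAL: poisoned").len()
    }
}

fn main() {
    let r = Registry { items: Mutex::new(Vec::new()) };
    r.insert_then_work_inline(1);
    r.insert(2);
    assert_eq!(r.len(), 2);
    println!("ok");
}
```

Both forms are correct; the second moves the lock-scope decision out of the business logic and into one place.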

let mut blocks = lock.lock().expect("FATAL: failed to lock block status");

loop {
let (guard, timeout_result) = cvar

This is another example of something we should help callers avoid. Low-level synchronization like waiting on a condition variable can be hidden within a Comms struct.
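A sketch of what hiding the condition-variable wait inside a Comms struct could look like, under assumed names (`Comms`, `wait_for_signal`, `signal`): the caller states intent, while the lock/`wait_timeout` loop and spurious-wakeup handling stay internal.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Hypothetical Comms struct wrapping a boolean "done" flag.
#[derive(Clone)]
struct Comms {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl Comms {
    fn new() -> Self {
        Self {
            inner: Arc::new((Mutex::new(false), Condvar::new())),
        }
    }

    // Business-level API: block until signaled or the timeout elapses.
    // Returns whether the signal arrived. The wait loop guards against
    // spurious wakeups by re-checking the predicate.
    fn wait_for_signal(&self, timeout: Duration) -> bool {
        let (lock, cvar) = &*self.inner;
        let mut done = lock.lock().expect("FATAL: poisoned");
        while !*done {
            let (guard, result) = cvar
                .wait_timeout(done, timeout)
                .expect("FATAL: poisoned");
            done = guard;
            if result.timed_out() {
                return *done;
            }
        }
        true
    }

    // Called from another thread to wake any waiters.
    fn signal(&self) {
        let (lock, cvar) = &*self.inner;
        *lock.lock().expect("FATAL: poisoned") = true;
        cvar.notify_all();
    }
}

fn main() {
    let comms = Comms::new();
    let c2 = comms.clone();
    let handle = thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        c2.signal();
    });
    assert!(comms.wait_for_signal(Duration::from_secs(5)));
    handle.join().unwrap();
    println!("ok");
}
```

With this shape, the signature-gathering code would call something like `wait_for_signal` and never see the condvar at all.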

} = accepted;
let tenure_extend_timestamp = response_data.tenure_extend_timestamp;

let (lock, cvar) = &*self.blocks;

This is something a Comms struct could hide from the caller.

);
}
SignerMessageV0::BlockResponse(BlockResponse::Rejected(rejected_data)) => {
let (lock, cvar) = &*self.blocks;

Same here -- this needlessly marries the synchronization implementation (something that can be hidden) to the business logic.

@jcnelson jcnelson left a comment


My biggest feedback here is that the code mixes a lot of low-level thread synchronization code into the business logic, which I think is something we should strive to avoid. The synchronization logic may change down the road depending on what other threads need to interact with signature-gathering, so we should do what we do elsewhere and wrap all of the thread synchronization / state-sharing behind a "Comms" struct, and provide methods there that better reflect the business logic's needs.


obycode commented Nov 27, 2024

Thanks for that suggestion @jcnelson. I did that refactoring in cef0dd4. Let me know if that seems like the right level of abstraction now.


obycode commented Nov 27, 2024

Note to reviewers - there is an integration test for this behavior in #5471.

4 participants