
Proof of time changes #1767

Merged: 9 commits merged into main on Aug 8, 2023
Conversation

@rahulksnv (Contributor):

Next batch of changes:

  1. The proof of time feature is flag-controlled by the command-line option --pot_role (default off).
  2. Implement the bootstrap/genesis slot discovery mechanism. The --pot_bootstrap command-line flag is introduced for immediate testing purposes. It designates a node to be responsible for building block 1. This will be replaced by specifying AllowAuthoringBy::RootFarmer in the chain spec later. The clock master/node client changes include the corresponding initialization.
  3. Block pre-digest changed to include the PoT-related state (a sketch follows this list).
  4. Consensus changes (when the feature is enabled): block proposals by the slot worker include the proofs; block import verification just validates the proofs and logs the result. These don't affect the current consensus processing for now.
  5. When the feature is enabled, subspace-node/subspace-service are changed to start the PoT components (clock master, node client).
  6. PoT state management changes: API changes to serve the consensus path, refactoring to perform more validation, avoid repeated lookups, and enforce a max limit on the local chain size (until purging is implemented).
  7. Temporary hack to help testing: this produces 1 proof/sec without using a lot of CPU. It will be removed once we perform benchmarking to figure out the right config params.
  8. Gossip improvements: changes from PoT improvements #1739 are included in this PR.

Next step: consensus derives randomness from the proof of time.
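
For illustration, a minimal sketch of the pre-digest shape described in item 3. The type and field names here (PotProof, PotPreDigest, pot_pre_digest) are hypothetical stand-ins, not the exact types in this PR:

```rust
/// Hypothetical stand-in for a single proof-of-time output.
#[derive(Debug, Clone)]
pub struct PotProof {
    pub slot_number: u64,
    pub output: [u8; 16],
}

/// Hypothetical PoT portion of the pre-digest: the proofs covering the
/// slots between the parent block and the block being proposed.
#[derive(Debug, Clone)]
pub struct PotPreDigest {
    pub proofs: Vec<PotProof>,
}

/// The PoT state is optional so that nodes running with the feature
/// disabled (--pot_role off, the default) keep the old pre-digest shape.
#[derive(Debug, Clone)]
pub struct PreDigest {
    pub slot_number: u64,
    pub pot_pre_digest: Option<PotPreDigest>,
}
```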

Testing: brought up two nodes, a node client and a clock master, with one of them responsible for bootstrapping. Verified that the bootstrapping works as expected and that the proofs are gossiped, included in blocks, and verified as expected.


@nazar-pc (Member) left a comment:

Mostly nits, I feel like we have some technical debt already and need to do some simplifications based on recent discussions with Dariia before moving forward.

crates/subspace-node/src/lib.rs (outdated; resolved)
```rust
);
task_manager.spawn_essential_handle().spawn_blocking(
    "subspace-proof-of-time-clock-master",
    Some("subspace-proof-of-time-clock-master"),
```
@nazar-pc (Member):

Group in this case would be pot

@nazar-pc (Member):

I didn't mean to change subspace-proof-of-time to pot, even though it makes sense. spawn_blocking takes two arguments, second of which is called group and is meant to group multiple tasks corresponding to the same thing. So all tasks related to PoT should have Some("pot") rather than copying the name of the task.
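
For example, the corrected spawn from the snippet above would look something like this (a sketch, not the final code; clock_master_fut stands in for the actual task future):

```rust
// The first argument names the task; the second is the group shared by
// all PoT-related tasks, so they are accounted for together.
task_manager.spawn_essential_handle().spawn_blocking(
    "subspace-proof-of-time-clock-master",
    Some("pot"),
    clock_master_fut,
);
```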

@rahulksnv (Contributor Author):

I'm fine either way (as this group really has one task); changed as you suggested.

crates/subspace-service/src/lib.rs (outdated; resolved)
Comment on lines +1132 to +1131
```rust
PotVerifyError::ParentMissingPotDigest {
    block_number,
    parent_slot_number: parent_pre_digest.slot.into(),
    slot_number: pre_digest.slot.into(),
}
```
@nazar-pc (Member):

You have a special case for block 1 in the slot worker, but this doesn't take that into account and will error right away.

@rahulksnv (Contributor Author):

As we discussed, once we have the global randomness from PoT in the next change, the slot worker will block until the local chain catches up. Then we won't be producing blocks without PoT in them.

@nazar-pc (Member):

I mean, my assumption was that you can test something with this PR, but according to this check you wouldn't be able to produce a second block (wouldn't be able to import it, to be more specific). Am I right?

@rahulksnv (Contributor Author):

Sorry, I take this back. claim_slot() currently errors out if proof creation fails, so we won't be creating blocks with no PoT proof. But this will be changed to block if the chain is behind.
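
A hypothetical sketch of the behavior being described (this is not the actual claim_slot signature; ProofOfTime, Claim, and the error type are illustrative):

```rust
/// Illustrative placeholder types.
pub struct PotProofs;
pub struct Claim {
    pub slot: u64,
    pub proofs: PotProofs,
}
pub struct ProofOfTime;

impl ProofOfTime {
    pub fn get_block_proofs(&self, _slot: u64) -> Result<PotProofs, ()> {
        // In the real code this fails e.g. when the local PoT chain is behind.
        Err(())
    }
}

pub fn claim_slot(slot: u64, pot: &ProofOfTime) -> Option<Claim> {
    // Proof creation failure abandons the claim, so a block is never
    // proposed without a PoT proof attached.
    let proofs = pot.get_block_proofs(slot).ok()?;
    Some(Claim { slot, proofs })
}
```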

crates/sc-consensus-subspace/src/lib.rs (resolved)
```rust
    self.chain_info_fn.clone(),
)
.await
.expect("Consensus tip info should be available");
```
@nazar-pc (Member):

This is not a proof that it will be, though. I expect proofs in .expect(). The same non-proof is also used in other places.

@rahulksnv (Contributor Author):

The expectation is that it should be available, but it is not. Not sure what you mean here, please explain.

@nazar-pc (Member):

Expectations are not proofs. .expect() must contain a proof of why it will never, ever panic. I have explained this many times at this point.

@rahulksnv (Contributor Author):

If it crashes here because of any of the three reasons that should not happen (unable to read the header from the backend, etc.), what I would want to see is the reason why it crashed; other things are cosmetic. I don't understand your notion of "proof" at this point (I also checked the other instances; this looks consistent). Dragging this further is not productive; I can do this as a follow-up if needed.

@nazar-pc (Member):

It shouldn't happen, but that doesn't mean it will not happen, and there is nothing in this function that can prove it will not happen. If you can't prove it, .expect() is not appropriate for the situation. I'm just concerned with the increasing number of .expect() calls in our codebase that are not proofs. I think I'll have to go through all of them in the proof of time crates and either fix them or add TODOs to fix.

If there could be an error, fn initialize should return Result<T, E>. There are a few invalid proofs in sc-consensus-subspace that I'm aware of and have added myself; some might even have TODOs, but we should be very careful with these in the code.

Please add TODOs to handle these properly in the future.
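
To illustrate the distinction being drawn here, a sketch on a hypothetical initialize (Client, TipInfo, and ClientError are made-up types, not the project's actual API):

```rust
pub struct TipInfo;
pub struct ClientError;
pub struct Client;

impl Client {
    pub fn tip_info(&self) -> Result<TipInfo, ClientError> {
        Ok(TipInfo)
    }
}

// Not a proof: the message restates an expectation, but nothing in this
// function guarantees tip_info() cannot fail (backend I/O, pruning, ...).
pub fn initialize_panicking(client: &Client) -> TipInfo {
    client
        .tip_info()
        .expect("Consensus tip info should be available")
}

// What the reviewer asks for: if the call can fail, propagate the error
// and let the caller decide how to react.
pub fn initialize(client: &Client) -> Result<TipInfo, ClientError> {
    client.tip_info()
}
```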

@rahulksnv (Contributor Author):

Yeah, that makes sense. I tend to add expect() only when a precondition fails and we cannot proceed further (e.g. failed to read from the client backend, failed to create a thread) or when it is something we would address in an immediate follow-up PR.

I reviewed the expect() calls in the crate; I think there is only one instance (the config could be bad and the PoT instance could not be created) that can be handled, and for the rest it is better to halt, IMO. But I added TODOs so we don't lose track.

crates/sc-proof-of-time/src/node_client.rs (outdated; resolved)
Comment on lines +41 to +43
```rust
    while sync_oracle.is_major_syncing() {
        tokio::time::sleep(delay).await;
    }
}
```
@nazar-pc (Member):

This will work as a hack, but I think you should subscribe to block import notifications instead of doing sleeps here, checking after each block whether it is still syncing. Though this whole function looks suspicious to me right now.
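
A sketch of the suggestion, assuming Substrate's BlockchainEvents and SyncOracle traits (the function itself and its surroundings are hypothetical):

```rust
use futures::StreamExt;
use sc_client_api::BlockchainEvents;
use sp_consensus::SyncOracle;
use sp_runtime::traits::Block as BlockT;

/// Instead of polling with fixed sleeps, wake up once per imported block
/// and re-check the sync oracle.
async fn wait_for_sync<Block, Client, SO>(client: &Client, sync_oracle: &SO)
where
    Block: BlockT,
    Client: BlockchainEvents<Block>,
    SO: SyncOracle,
{
    let mut imports = client.import_notification_stream();
    while sync_oracle.is_major_syncing() {
        // Stream end means the client is shutting down.
        if imports.next().await.is_none() {
            break;
        }
    }
}
```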

@rahulksnv (Contributor Author):

Yes, this will change further, but it seems to work well in testing so far. The block import subscription is coming (soonish) anyway, for the randomness injection from consensus.

crates/sp-consensus-subspace/src/digests.rs (outdated; resolved)
crates/sc-proof-of-time/src/gossip.rs (resolved)
@dariolina (Member) left a comment:

I'm having a hard time following everything that is happening, but it looks in line with the specs apart from the bootstrap discussion.

crates/sc-consensus-subspace/src/slot_worker.rs (outdated; resolved)
```rust
})?;

proof_of_time
    .get_block_proofs(block_number.into(), slot_number, parent_pot_digest)
```
Member:

Logically, you can't be building the block unless you had the proof for that slot (because it serves as a challenge) and all previous slots (because you can't verify without all previous proofs). Not sure that's exactly how it's implemented.
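
A toy sketch of the invariant described here, assuming each proof's seed is derived from the previous proof's output (field names are illustrative):

```rust
pub struct PotProof {
    pub slot_number: u64,
    pub seed: [u8; 16],   // derived from the previous proof's output
    pub output: [u8; 16],
}

/// Verifying the proof for a slot only works if every proof since the
/// last known-good output is present: a gap breaks the chain.
pub fn verify_chain(proofs: &[PotProof], mut prev_output: [u8; 16]) -> bool {
    for proof in proofs {
        if proof.seed != prev_output {
            return false; // missing or out-of-order proof
        }
        // (Real verification would also re-check the proof computation itself.)
        prev_output = proof.output;
    }
    true
}
```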

@nazar-pc (Member) commented Aug 7, 2023:

Talking about ease of review. Please don't introduce changes and rebase on main at the same time.

Otherwise I get this: https://github.com/subspace/subspace/compare/d1b91fc906260a94aa09e32e5f6aa89731aefc05..7c71020cddf90e592b9571c2a026b1fa7dc61bee

630 additions and 1,533 deletions

And I have no idea which commits are old and which are new, and what exactly has changed compared to the previous revision. If you need to rebase on main and add a few more things, you can do the rebase first, push it and comment that nothing has changed except the rebase on main, and then push a few commits on top so that it is clear what was added.

@nazar-pc (Member) commented Aug 7, 2023:

For now I'll assume you have added two commits on top and didn't change anything else.

@rahulksnv (Contributor Author):
> Talking about ease of review. Please don't introduce changes and rebase on main at the same time. […]

Sure, sorry about that; I did everything in one shot.

@rahulksnv rahulksnv enabled auto-merge (squash) August 8, 2023 01:46
@rahulksnv rahulksnv merged commit 0c77567 into main Aug 8, 2023
9 checks passed
@rahulksnv rahulksnv deleted the rsub/pot-follow-up-2 branch August 8, 2023 03:31