Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reshare in a spawned task and fix propagation pallet rotate keyshares endpoint lookup key #1185

Merged
merged 6 commits into from
Nov 27, 2024

Conversation

ameba23
Copy link
Contributor

@ameba23 ameba23 commented Nov 25, 2024

When fixing the reshare test in #1162 i found two problems which mean the resharing will not work in production.

  1. The lookup key for the rotate keyshares endpoint is incorrect
  2. The http client used by the propagation pallet has a timeout set to two seconds. When this timeout is met, the http client closes the connection, causing axum to kill the task handling the request. This causes the reshare to stop mid-protocol. This is fixed here by running the protocol in a spawned task, as we do with DKG.

I have fixed these in #1162 - but since that PR makes quite some changes to our test setup, and still needs some cleaning up, i wanted to put the fixes in a separate PR, so its clear what they are, and so it could be used for a fix for our current testnet (resharing will not work without these fixes).

If we want to fix things with a runtime upgrade and not need to update the TSS code, we could bump up the timeout instead of spawning a task for the reshare. But i think long term it is better to use a separate task, because amount of time the reshare takes depends on many factors.

@ameba23 ameba23 self-assigned this Nov 26, 2024
Copy link
Collaborator

@HCastano HCastano left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice finds! I know how much of a pain this has been to fix.

I agree with the approach of fixing this in the TSS, seems more reasonable long term.

Also thanks for splitting this out into its own PR.

@@ -369,7 +369,7 @@ pub fn new_full_base(
);
offchain_db.local_storage_set(
sp_core::offchain::StorageKind::PERSISTENT,
b"rotate_keyshares",
b"rotate_network_key",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, is this the key you're saying is incorrect?

So even with an extended timeout this wasn't going to work then?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was incorrect. But im not sure if it actually makes a difference in production, since if the lookup fails it defaults to 127.0.0.1:3001/rotate_network_key, which is usually what we want. I think its only in our test setup that we need to set a custom endpoint.

@HCastano HCastano changed the title Reshare in a spawned task, and fix propagation pallet rotate keyshares endpoint lookup key Reshare in a spawned task and fix propagation pallet rotate keyshares endpoint lookup key Nov 26, 2024
@HCastano
Copy link
Collaborator

Maybe worth adding a CHANGELOG entry under "Fixed"?

@ameba23 ameba23 merged commit b6884ec into master Nov 27, 2024
7 checks passed
@ameba23 ameba23 deleted the peg/reshare-fix branch November 27, 2024 07:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

2 participants