Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add distributed flag to VC to enable support for DVT #4867

Merged
merged 10 commits into from
Feb 15, 2024

Conversation

AgeManning
Copy link
Member

Am interested in feedback for this PR.

There are efforts towards building out Distributed Validator Technology (DVT). We want to support this effort and make Lighthouse maximally compatible with what is being built in the space.

For an immediate (perhaps temporary) path forward, I'm suggesting we group some of the required functionality under a --distributed CLI flag, which is counter to some opinions expressed in #4248.

I've added additional reasoning to the above issue, but in the current context, there is desire to add a minimal change to the VC which I think is fine to add behind either a CLI flag or feature flag. I expect more changes to occur and my thinking is we can use this banner flag to incorporate any future changes we need to support DVT under this CLI flag.

For this PR the change is to compute sync aggregates for duties per slot, rather than to pre-compute them for each sync committee period.

@xenowits
Copy link

Thanks @AgeManning , looks good!

Seems calculating sync committee aggregations per slot is still in progress, happy to review it when implemented 🙌

@michaelsproul
Copy link
Member

I've added a commit 8d19615 to compute sync committee selection proofs with 1-slot lookahead when --distributed is provided.

The only change for non-distributed validator clients is that selection proofs will no longer be computed in epoch-sized batches. Rather each batch is now just a single slot. I don't think this will have a substantial performance impact (positive or negative), but will be deploying to one of our testnet nodes with 10k+ keys to make sure.

@michaelsproul
Copy link
Member

Looks like this is working as intended. Proofs are computed at the start of the previous slot (7104128=06:25:36):

Dec 05 06:25:36.353 DEBG Calculating sync selection proofs pre_compute_slot: 7104129, current_slot: 7104128, period: 867, service: duties

A validator is found to be a sync aggregator (the slot here is the proof slot, which is the slot prior, i.e. the current slot):

Dec 05 06:25:36.354 DEBG Validator is sync aggregator subnet_id: 3, slot: 7104128, validator_index: 115668, service: duties

Then, 4s into the slot 7104129, the sync aggregate is successfully published:

Dec 05 06:25:56.674 INFO Successfully published sync contributions, slot: 7104129, num_signers: 93, beacon_block_root: 0xeb8b…481e, subnet: 3, service: sync_committee

@michaelsproul
Copy link
Member

@xenowits It would be great if you could take this PR for a spin with Obol and provide feedback 🙏

@xenowits
Copy link

xenowits commented Dec 5, 2023

Hey @michaelsproul , that's some great work!

Unfortunately, i'm out for the next two weeks. I'll review this PR on my side & pass this to the rest of my team to test.

@AgeManning AgeManning marked this pull request as ready for review December 6, 2023 02:25
@AgeManning AgeManning added the ready-for-review The code is ready for review label Dec 6, 2023
@xenowits
Copy link

xenowits commented Dec 13, 2023

Hey @michaelsproul , I have reviewed it & the changes look solid. We'll be running this branch internally on one of our clusters & test it.

@michaelsproul
Copy link
Member

@xenowits Great to hear! Let us know how it goes, and whether you need any other tweaks

@dB2510
Copy link

dB2510 commented Dec 14, 2023

Hey @michaelsproul @AgeManning 👋

We tested these changes with our clusters and aggregations don't work as it seems like lighthouse VC is not calling selection endpoints to get aggregated selection proof as described in #4248. Will the changes to call those endpoints be included in a separate pull request, since this PR does not query those selection endpoints?

I am referring to these endpoints:

Thanks

@AgeManning
Copy link
Member Author

Hey @dB2510

Yep, these need to still be added. I recently saw a talk about running obol which indicated the lodestar vc was required to run an obol node. Do you know which other clients support these endpoints, is it just lodestar?

They are in the specification, I'll have a chat with the others about the best path forward from here. Supporting these endpoints will be a future PR.

@michaelsproul
Copy link
Member

michaelsproul commented Jan 9, 2024

What about the builder registration overrides (#4306), do you still need that too?

It would be good to have a complete list of features required by Obol so we can know the fill scope of changes.

@michaelsproul
Copy link
Member

michaelsproul commented Jan 22, 2024

@xenowits Confirmed via DM that this PR is a good step forward for Obol support, and that we should merge it to allow further testing.

The other features desired for Obol DVT are:

I'm still double-checking whether we need (update: we do):

@michaelsproul michaelsproul added v5.0.0 Q1 2024 and removed blocked labels Jan 22, 2024
Copy link
Collaborator

@dapplion dapplion left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the --distributed flag is good step, this PR can be merged as is. All logic is correct.

Just merged unstable to fix conflicts

@paulhauner paulhauner removed the ready-for-review The code is ready for review label Feb 14, 2024
@michaelsproul
Copy link
Member

I've merged unstable again and updated the CLI tests. Let's merge once CI passes

@michaelsproul
Copy link
Member

@Mergifyio queue

Copy link

mergify bot commented Feb 15, 2024

queue

🛑 The pull request has been removed from the queue default

The queue conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

@michaelsproul
Copy link
Member

@Mergifyio requeue

Copy link

mergify bot commented Feb 15, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

Copy link

mergify bot commented Feb 15, 2024

queue

🛑 The pull request has been removed from the queue default

The queue conditions cannot be satisfied due to failing checks.

You can take a look at Queue: Embarked in merge queue check runs for more details.

In case of a failure due to a flaky test, you should first retrigger the CI.
Then, re-embark the pull request into the merge queue by posting the comment
@mergifyio refresh on the pull request.

mergify bot added a commit that referenced this pull request Feb 15, 2024
mergify bot added a commit that referenced this pull request Feb 15, 2024
@michaelsproul
Copy link
Member

@Mergifyio requeue

Copy link

mergify bot commented Feb 15, 2024

requeue

✅ This pull request will be re-embarked automatically

The followup queue command will be automatically executed to re-embark the pull request

Copy link

mergify bot commented Feb 15, 2024

queue

✅ The pull request has been merged automatically

The pull request has been merged automatically at 49536ff

@OisinKyne
Copy link

Hey @AgeManning, appreciate the chat last week. Here’s a synopsis of where I believe we are that (though you wrote the PR so likely have the specifics).



“it seems like lighthouse VC is not calling selection endpoints to get aggregated selection proof as described in #4248. Will the changes to call those endpoints be included in a separate pull request, since this PR does not query those selection endpoints?”



This comment above is the last key update for us on your PR. This was the other thing that’s not that important and is now closed. (So Michaels comment further down of: “I'm still double-checking whether we need (update: we do):” is no longer the case).

Calling those endpoints (attest aggregate, sync aggregate) during the flow and using the returned selection proof rather than the one generated locally to hash and determine if you’re an aggregator should be what’s left IIUC. Here is our original implementation note, and here is a kurtosis repo that might help with testing, though we’ll also happily test once you have a PR or an unstable build :)

Let me know if I can answer any other follow ups. Cheers.

@AgeManning AgeManning mentioned this pull request Nov 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ready-for-merge This PR is ready to merge. v5.0.0 Q1 2024
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants