Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The client does not support Gnosis/Chiado spec #153

Open
pinebit opened this issue Aug 12, 2024 · 18 comments
Open

The client does not support Gnosis/Chiado spec #153

pinebit opened this issue Aug 12, 2024 · 18 comments

Comments

@pinebit
Copy link
Contributor

pinebit commented Aug 12, 2024

We are investigating full block hash mismatching issue calculated by VC (confirmed with Lodestar, Teku) and Charon that uses go-eth2-client.
Initially we observed that the signed block submitted by a VC in Gnosis/Chiado fails signature verification. We quickly realized the block hash calculated inside of VC before signing differs from how we calculate hash in Charon using go-eth2-client.
A little deeper investigation led us to chain specification differences that likely impact how ssz serialization behaves and therefore produces different hash.
After checking go-eth2-client code, we saw no conditional code or variables that would change serialization flow. An exception is the fork, of course. However, given that most CL/VC clients support network-specific ssz serializaiton, we wanted to clarify if go-eth2-client would do the same and implement variations for Gnosis/Chiado.

@mcdee
Copy link
Contributor

mcdee commented Aug 12, 2024

A little deeper investigation led us to chain specification differences

Please could you provide details on the specific differences.

@pinebit
Copy link
Contributor Author

pinebit commented Aug 12, 2024

A little deeper investigation led us to chain specification differences

Please could you provide details on the specific differences.

We are still trying to narrow this down. For now we know the difference appears in ExecutionPayload object serialization, but we could not prove it yet.
We hope you would know something special about Gnosis and how this could impact block data serialization.

@mcdee
Copy link
Contributor

mcdee commented Aug 12, 2024

I have no knowledge of gnosis, but if there are changes to any of the sizes of lists (e.g. maximum number of validators) that could cause problems.

@pinebit
Copy link
Contributor Author

pinebit commented Aug 13, 2024

@mcdee we have just found that the single spec parameter that actually impacts block serialization (and hash):
MAX_WITHDRAWALS_PER_PAYLOAD which is 16 in eth mainnet and 8 in gnosis.
I still need to figure out in go-eth2-client where it is..
Sharing this finding with you in case you have better understanding of this parameter and its impact.

@mcdee
Copy link
Contributor

mcdee commented Aug 13, 2024

in that case you'll need to set up the connection using the WithCustomSpecSupport() option as per https://github.com/attestantio/go-eth2-client/blob/490d07a8e0c258f4528d3039109696679d79787d/http/parameters.go#L131C6-L131C27

@pinebit
Copy link
Contributor Author

pinebit commented Aug 13, 2024

// ExecutionPayloadWithdrawals provides information about withdrawals.
type ExecutionPayloadWithdrawals struct {
	Withdrawals []*capella.Withdrawal `ssz-max:"16"`
}

probably this is where it is hardcoded as 16..

UPD: Therefore I can't get how WithCustomSpecSupport would help.. given that the value is "hardcoded".

@pinebit
Copy link
Contributor Author

pinebit commented Aug 13, 2024

UPD: we are testing a simple patch (replaced 16 with 8) in gnosis to prove this would work.

@pinebit
Copy link
Contributor Author

pinebit commented Aug 15, 2024

Hello @mcdee, we tested the patch and this worked great.
Here is the diff: https://github.com/attestantio/go-eth2-client/compare/master...ObolNetwork:go-eth2-client:master?expand=1
I do understand this is not a proper fix, but something to shed light on what possible changes are needed.
Let us know please if you are going to patch go-eth2-client to introduce proper support for custom specs such as gnosis.
Thank you

UPD: Testing WithCustomSpecSupport() now.

@pinebit
Copy link
Contributor Author

pinebit commented Aug 15, 2024

UPD: WithCustomSpecSupport() did not work for gnosis/chiado. Unfortunately the changes similar to what I quoted above are required.

@pinebit pinebit changed the title Possible ssz serialization issue in Gnosis/Chiado The client does not support Gnosis/Chiado spec Aug 15, 2024
@mcdee
Copy link
Contributor

mcdee commented Aug 15, 2024

The static SSZ functions cannot work with different values, hence the custom spec support. However, the custom spec support needs to create a dynssz-max tag and support its use when creating roots. Dynamic SSZ is supplied by https://github.com/pk910/dynamic-ssz written by @pk910 Perhaps you can take a look at that library and add a PR to support the tag and roots, at which point WithCustomSpecSupport() should work for Gnosis.

@pinebit
Copy link
Contributor Author

pinebit commented Aug 16, 2024

@mcdee I checked the current implementation for WithCustomSpecSupport() and found it is only used for decoding HTTP responses. Our case is very different, we need to enable dynamic-ssz for hash calculation in the first place.
I found that dynamic-ssz does not support anything like HashWalker interface that is required to calculate hash respecting dynssz-size tags. The generated code for HashTreeRoot() method apparently uses only fastssz.

If we start adding custom HashTreeRootWithSpec(spec) implementations for BeaconBlock, ExecutionPayload, etc. this would result in a viral effect.

Do you have an idea for a better design given that we need not just SSZ marshaling, but also hash calculation to be driven by dynamic-ssz?

Another "minor" findings is that making .Spec() calls can be expensive if we do this every time we use dynamic-ssz. I understand this is done because of future forks, but we need to find a mechanism to optimize this.

@mcdee
Copy link
Contributor

mcdee commented Aug 16, 2024

Yes, as mentioned the dynamic ssz system would need to add support for creating roots.

Any dynamic system will be slower than one that is hard-coded, which is noted in WithCustomSpecSupprt(). Calls to Spec() itself won't be that bad, because go-eth2-client caches the data, but carrying out the hashing operation with dynamic dependencies will still be slower as a result.

There is nothing in the HashRoot interface in fastssz that allows passing of parameters, so there's no simple fix there. The options seem to be to either update dynamic-ssz to support generating hashes and use dynssz-max, or for fastssz to take some sort of parameterization for values that are currently hard-coded (perhaps via a context?). The former is likely the better path, as dynamic-ssz acts at runtime today whereas fastssz is built with hard-coded values.

@pk910
Copy link
Contributor

pk910 commented Aug 17, 2024

Heya,

Driven by this issue I've started another try to implement the hash tree root calculation in the dynamic-ssz library.
It looks like I finally got a running prototype which is able to compute block & state roots from minimal preset blocks/states 🎉

pk910/dynamic-ssz#2

In theory this should work the same way when using the gnosis specs, but absolutely no guarantee at this point :D

To make the dynamic hash calculation work, there are quite a few additional dynssz-max annotations required.
I've gone through the specs and tried to add them all, but might have missed a few:
https://github.com/attestantio/go-eth2-client/compare/master...pk910:go-eth2-client:pk910/dynssz-hashroot?expand=1
(+ same for electra branch)

The hash tree root calculation code is currently in a quite early prototype state. It probably needs a lot more time to get it as cleanly implemented as the marshaling/unmarshaling code. So yea, please use with care :)

@mcdee:
I've added a quite low level HashRootProvider that provides a generic HashRoot method.
In behind that function switches over to the dynamic ssz library if WithCustomSpecSupprt() is enabled.

I'm not really happy with that solution, but it's the best I could do without really huge changes to the library :D
All the exported Root() & HashTreeRoot() methods of beacon entities cannot be used and will return an error or wrong result when used with non-mainnet preset.
There's also no way to inject the dynamic root calculation as these methods are either auto generated or have no link to the chain specs that are needed for the dynamic root calculation.

I'm not sure how to go forward with that. I'm feeling a bit bad just leaving these methods broken as that would lead to huge headaches of devs :D Any thoughts on this?

@mcdee
Copy link
Contributor

mcdee commented Aug 25, 2024

I've been considering this, and there's no simple/clean solution here that I can come up with. Is our best solution to add a note to all of the relevant methods that these will only work with mainnet configurations, and point them to the dynamic versions if whatever they are building needs to work with other configurations?

@mcdee
Copy link
Contributor

mcdee commented Dec 28, 2024

Coming back to this, is the issue purely around SSZ encoding and decoding? And if so, would using the WithEnforceJSON() address the issue (although again result in an overall slower experience due to lack of SSZ)?

@pinebit
Copy link
Contributor Author

pinebit commented Dec 28, 2024

Coming back to this, is the issue purely around SSZ encoding and decoding? And if so, would using the WithEnforceJSON() address the issue (although again result in an overall slower experience due to lack of SSZ)?

Yes, this issue is about SSZ encoding and decoding. WithEnforceJSON() does not help because this affects only few HTTP calls, not hash calculation. Calculating proper hash is important to validate block signatures and this hash diverges depending on the spec.

UPD: Check my comment above #153 (comment), the struct tag is hardcoded, while it has to be dynamic depending on the spec. Various VC vendors implemented dynamic SSZ serialization customizable with the spec, depending on the network.

@mcdee
Copy link
Contributor

mcdee commented Dec 28, 2024

In which case have you investigated the solution at #153 (comment) as a potential solution (various issues with it notwithstanding; if it works we will need to consider how to support it more fully).

@pinebit
Copy link
Contributor Author

pinebit commented Dec 28, 2024

In which case have you investigated the solution at #153 (comment) as a potential solution (various issues with it notwithstanding; if it works we will need to consider how to support it more fully).

What we have done in Charon: we maintain two sets of structures having different structure attributes values (and generated different set of serializers). Then, depending on the current network with switch them on the fly.

PR for reference: https://github.com/ObolNetwork/charon/pull/3238/files

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants