
feat(cannon): Binary serialization for snapshots #7559

Closed
wants to merge 11 commits

Conversation

clabby
Member

@clabby clabby commented Oct 5, 2023

Overview

Moves the cannon state snapshots to a streamable binary serialization format, usable by types implementing the Serializable interface:

// Serializable defines functionality for a type that may be serialized to raw bytes.
type Serializable interface {
	// Serialize encodes the type as raw bytes.
	Serialize(out io.Writer) error

	// Deserialize decodes raw bytes into the type.
	Deserialize(in io.Reader) error
}

The encoding scheme is plain flat binary: when a fixed-size type is written, it's written raw, and when a dynamically sized piece of data is written, it's preceded by a length prefix.

Rationale

As it stands, the serialization format of state snapshots is somewhat convoluted: we first base64-encode the pages within Cannon's Memory, JSON-encode the full State in memory, then gzip the JSON. This change moves to a streamable binary format with a single compression pass over the State, which is typically very large toward the tail end of Cannon execution.

@clabby clabby force-pushed the cl/cannon-ser branch 2 times, most recently from 883c57e to 37ee20f Compare October 5, 2023 14:55
@clabby clabby marked this pull request as ready for review October 5, 2023 15:38
@clabby clabby requested a review from a team as a code owner October 5, 2023 15:38
@clabby clabby self-assigned this Oct 5, 2023
@clabby clabby added A-cannon Area: cannon A-op-challenger Area: op-challenger labels Oct 5, 2023
@Inphi
Contributor

Inphi commented Oct 5, 2023

Alternatively, if we don't care about compatibility with other VMs, it'll be much easier to use Go's built-in codec. gob alleviates the need to write and maintain the serde routines when new fields are added. It uses reflection to figure out the encoding.

@clabby
Member Author

clabby commented Oct 5, 2023

Alternatively, if we don't care about compatibility with other VMs, it'll be much easier to use Go's built-in codec. gob alleviates the need to write and maintain the serde routines when new fields are added. It uses reflection to figure out the encoding.

One of the primary reasons for this change is to define an easy-to-implement codec for the state snapshots so that other VM implementations, namely cannon-rs, can be a drop-in option for the op-challenger. Currently the op-challenger directly relies on code from Cannon for serde ops, so any Go-specific codec would be a pain to implement in other languages or require an added dependency (which I've been trying to minimize at all costs).

@Inphi
Contributor

Inphi commented Oct 5, 2023

Alternatively, if we don't care about compatibility with other VMs, it'll be much easier to use Go's built-in codec. gob alleviates the need to write and maintain the serde routines when new fields are added. It uses reflection to figure out the encoding.

One of the primary reasons for this change is to define an easy-to-implement codec for the state snapshots so that other VM implementations, namely cannon-rs, can be a drop-in option for the op-challenger. Currently the op-challenger directly relies on code from Cannon for serde ops, so any Go-specific codec would be a pain to implement in other languages or require an added dependency (which I've been trying to minimize at all costs).

VMs that represent program memory differently, including cannon-rs, will have to massage the data quite a bit to integrate the snapshot. It follows that the snapshots should be in a standard format so they're easy to parse. We sort of had this with the JSON serializer, but an unstructured binary representation will be cumbersome and error-prone to parse. I suggest using something like protobuf, flatbuffers, or even RLP (ok, maybe not), if the goal is to allow other VMs to use the same snapshot format.

@Inphi
Contributor

Inphi commented Oct 5, 2023

Also, I don't know if it's a realistic use case to support loading and dumping snapshots across different VM implementations. It's probably good to have something like this as a test vector, but not worth it as a user-facing feature imo.

@ajsutton
Contributor

ajsutton commented Oct 5, 2023

I'd potentially be open to changing op-challenger to not read cannon snapshots as well. We could run a cannon subcommand to convert a snapshot to a claim hash instead. We'd have to review the use cases to know what that would take but it would be good to keep them more decoupled if possible.

@@ -70,6 +72,214 @@ func (s *State) EncodeWitness() StateWitness {
return out
}

func (s *State) Serialize(out io.Writer) error {
Contributor

The state is so small (< 500 bytes) that it may just be worth serializing/deserializing the state witness, and writing/reading it as one blob into the writer, to avoid many different write/read calls.

@protolambda
Contributor

protolambda commented Oct 5, 2023

See above suggestion regarding state serialization. Another benefit would also be that if we include the memory-hash as part of the state witness in the snapshot, then you can determine the claim-hash by just parsing and hashing the witness part of the full snapshot, which can be very fast, as you don't have to load the full state.

@protolambda
Contributor

If we prefix the state snapshot with the claim hash of the snapshot, maybe we don't have to invoke any binary at all, and are still able to generalize it, since the challenger would just always read the first 32 bytes as claim hash, and ignore the rest of the snapshot?

@clabby
Member Author

clabby commented Oct 6, 2023

If we prefix the state snapshot with the claim hash of the snapshot, maybe we don't have to invoke any binary at all, and are still able to generalize it, since the challenger would just always read the first 32 bytes as claim hash, and ignore the rest of the snapshot?

This would definitely be nice if we stick with this format - good thinking. Stepping back a bit, this change can be slow-rolled, as there's no immediate need for it. What if we thought about making a common FFI schema for all of the VM implementations? The abstraction problem seems to be the binary plus the op-challenger using Go Cannon-specific types for serialization; if not for that, the VM impls wouldn't have to care about the serialization format at all.

With a nice FFI schema for the VMs to implement, we could interact with VM implementations as a library in the op-challenger and galadriel (in the future) + a binary that supports it, and the user of the interface could decide if they’d like to even make snapshots, etc., taking that responsibility off of a specific VM implementation's types. Each one could decide how it wants to do this, and the interface could just have a standardized binary format similar to this one, but with a bit more rigidity. This would make it easier to use several different VM implementations in a drop-in way for services that utilize the interfaces (which can be implemented in multiple languages) and then leave everything else to the consumer, which would be nice.

@ajsutton
Contributor

ajsutton commented Oct 6, 2023

I like the idea of prefixing the snapshot with the key data as a simple but very effective step forward. I'm unclear on how much effort an FFI type schema would take to setup. The challenger has a few requirements that may not be obvious here though:

  1. It needs to get data at a specific trace index OR the hash of the final trace index (if the requested one is from a later step). Currently the challenger is reading the output snapshot to get the last step's data and is jumping through some hoops to keep track of what the last step index actually is. Would be nice if the VM could manage that more itself
  2. The data it needs for that step is the claim hash, preimage key and value (if any), state witness (pre-image of the claim hash) and the proof data
  3. It needs to compute the claim hash from the pre-state for verification
  4. It needs to manage disk allocations - it provides a directory to write data to and is responsible for deleting it when the game is no longer relevant.

So we'd need to be able to easily get both the claim hash and the preimage (the state witness). We don't really want to just get the state witness and hash it ourselves as the hashing process requires setting the status byte which would be good to keep inside cannon if possible. Could be hash :: witness :: whatever other data though.

It would be fine for the VM to be just given a directory for its own use and it can manage snapshots etc itself, though the challenger assumes fetching a previously generated proof is fast currently. It is useful for the challenger to have a flag to control how often snapshots are stored to let the user trade off execution time vs disk space.

Beyond that I'm a fan of simplifying the CannonTraceProvider and having cannon manage more of the details itself. That would also give more flexibility to alternate VMs.

@github-actions
Contributor

This PR is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@codecov

codecov bot commented Oct 28, 2023

Codecov Report

Merging #7559 (b7ab721) into develop (62d4457) will decrease coverage by 1.08%.
Report is 672 commits behind head on develop.
The diff coverage is 27.71%.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #7559      +/-   ##
===========================================
- Coverage    53.48%   52.41%   -1.08%     
===========================================
  Files          162      163       +1     
  Lines         6048     6250     +202     
  Branches       970      970              
===========================================
+ Hits          3235     3276      +41     
- Misses        2691     2798     +107     
- Partials       122      176      +54     
Flag Coverage Δ
cannon-go-tests 58.13% <27.71%> (-5.36%) ⬇️
chain-mon-tests 26.95% <ø> (ø)
common-ts-tests 26.74% <ø> (ø)
contracts-bedrock-tests 61.26% <ø> (ø)
contracts-ts-tests 100.00% <ø> (ø)
core-utils-tests 44.03% <ø> (ø)
sdk-next-tests 42.18% <ø> (ø)
sdk-tests 42.18% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
cannon/mipsevm/page.go 88.88% <ø> (+22.67%) ⬆️
cannon/cmd/load_elf.go 0.00% <0.00%> (ø)
cannon/cmd/witness.go 0.00% <0.00%> (ø)
cannon/cmd/run.go 0.00% <0.00%> (ø)
cannon/cmd/binary.go 40.47% <40.47%> (ø)
cannon/mipsevm/memory.go 71.86% <34.54%> (-9.01%) ⬇️
cannon/mipsevm/state.go 37.79% <23.31%> (-51.34%) ⬇️

@@ -70,6 +72,214 @@ func (s *State) EncodeWitness() StateWitness {
return out
}

func (s *State) Serialize(out io.Writer) error {
Contributor

I'd like a quick spec on the format before we merge

@ajsutton
Contributor

ajsutton commented Nov 1, 2023

Consensus seems to be that this is a worthwhile change and should be fine to merge, but it's worth documenting the binary format first. The aim is just to make the format easier to understand, not to set it as an official standard; we may well make further changes to this file format and break compatibility before fault proofs ship to mainnet.

Contributor

This PR is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 5 days.

@github-actions github-actions bot added the Stale label Nov 16, 2023
@github-actions github-actions bot closed this Nov 21, 2023
@sebastianst sebastianst removed the Stale label Nov 21, 2023
@sebastianst sebastianst reopened this Nov 21, 2023
Comment on lines +82 to +88
serMemBuf := new(bytes.Buffer)
err := s.Memory.Serialize(serMemBuf)
if err != nil {
return err
}
serMemBytes := serMemBuf.Bytes()
serMemLen := uint32(len(serMemBytes))
Contributor

I'd missed this before, but this is really very unfortunate since we now need to hold a copy of the entire memory in RAM while serializing. That was causing OOM failures originally until we started gzipping the pages individually to reduce the size.

I think we need to find a way to avoid this and actually stream the memory content. The simplest thing would be for the memory serialisation to prefix its content with the number of pages to read and then read exactly those pages back in rather than reading until EOF. Net result is the prefix becomes the number of pages (ie list length) rather than number of bytes in the memory, but now you can fully stream the content when reading and writing.

As this currently stands I think we'll run into memory issues again.

@ajsutton ajsutton requested a review from a team as a code owner November 24, 2023 00:13
@ajsutton
Contributor

Pushed a couple of commits to update this with the latest changes from develop, fix the devnet failure and update how atomic file writes are done.

Contributor

github-actions bot commented Dec 8, 2023

This PR is stale because it has been open 14 days with no activity. Remove stale label or comment or this will be closed in 5 days.
