feat(vm): Allow switching between VMs for latest protocol version #2508
Conversation
A drawback of adding the switch only to BWIP is that there's no way to run integration tests with a shadowed VM or to run a load test with the new VM. Is this OK, or should I add these options? (This could be dealt with later, of course, but then we could miss divergences triggered by integration tests, or performance degradation on the load test.)
AFAIU, BWIP performance is crucial for proof generation speed, so it's probably important not to accidentally worsen it. Given that the VM is single-threaded and isn't too I/O-heavy, it's probably fine. But still cc @EmilLuta just in case.
@popzxc @EmilLuta This is a good point regarding BWIP availability. Is it possible to run multiple BWIP instances, or is it a singleton by design? Ideally, we'd want to run an additional instance starting from the first L2 block with the supported protocol version, so that it's not a bottleneck and we don't care much about potential panics caused by VM divergences.

If this is impossible / requires significant changes to the BWIP architecture, it could make sense to go with the initial plan of integrating the new VM into the EN state keeper: there it obviously wouldn't be a bottleneck, and VM panics wouldn't need to be handled ASAP. (It would also have the benefit of exercising the new VM in integration tests; see my comment above.) OTOH, it's relatively easy to convert VM panics into logged errors, but then we could get meaningless cascading errors within a single VM run.
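For reference, converting VM panics into logged errors could look roughly like the sketch below. It uses `std::panic::catch_unwind` and the `tracing` crate; `run_shadow_vm_step` is a hypothetical helper invented for this example, not code from the PR.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Hypothetical helper: runs one step of the shadowed VM and turns a panic
/// (e.g. one raised on a divergence) into a logged error instead of
/// crashing the component.
fn run_shadow_vm_step<T>(step: impl FnOnce() -> T) -> Option<T> {
    match panic::catch_unwind(AssertUnwindSafe(step)) {
        Ok(output) => Some(output),
        Err(payload) => {
            // Panic payloads are usually `&str` or `String`.
            let msg = payload
                .downcast_ref::<&str>()
                .copied()
                .or_else(|| payload.downcast_ref::<String>().map(String::as_str))
                .unwrap_or("<non-string panic payload>");
            tracing::error!("shadow VM panicked: {msg}");
            // Caveat from the comment above: after the first divergence,
            // later steps in the same VM run may produce cascading errors.
            None
        }
    }
}
```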
core/node/node_framework/src/implementations/layers/vm_runner/bwip.rs
Sorry for being late to the party. Let's unpack the questions one by one.

VM Runner is meant to be a singleton, but internally it runs multiple VMs and can run up to n of them at a time (I believe the default setup is 3). @itegulov would be the best person to ask about the exact details of VM Runner. As for what happens if we run multiple BWIPs, @Artemka374 is the right person, but AFAIR it should be fine.

My assumption is that this is a one-off integration (turn from off to on, i.e. from the old VM to the new VM). If that's the case, then there's no problem with experimenting on BWIP. Yes, it's mission-critical for proofs, but remember that proofs have a long time to get proven (8h soft deadline for other teams, 24h for the SLA). I think the EN might be more suitable for recurring work, but for a one-off (turn it on, wait for x [where x < 4] weeks, then migrate to the state keeper), BWIP seems the natural place.
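To make the "up to n VMs" point concrete, here is a minimal sketch of a concurrency window over batches. It uses the `futures` crate; `run_window` and `process_batch` are illustrative names, not the actual VmRunner API.

```rust
use futures::stream::{self, StreamExt};

/// Sketch of VM Runner's concurrency window: at most `window_size` batches
/// (reportedly 3 by default) are replayed by independent VM instances at a time.
async fn run_window(first_batch: u32, last_batch: u32, window_size: usize) {
    stream::iter(first_batch..=last_batch)
        .for_each_concurrent(window_size, |batch| async move {
            // Batches are independent, so each one can be replayed in its own VM.
            process_batch(batch).await;
        })
        .await;
}

/// Placeholder for loading a batch and replaying it in a VM instance.
async fn process_batch(batch: u32) {
    println!("replayed batch #{batch}");
}
```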
```rust
#[serde(default)]
pub fast_vm_mode: FastVmMode,
/// Path to the RocksDB cache directory.
#[serde(default = "ExperimentalVmPlaygroundConfig::default_db_path")]
```
Note: `serde(default)` doesn't work with the file-based config, which we already use on stage.
I've accounted for defaults in the `protobuf_config` crate, which AFAIU is sufficient to make the file-based config work. Will test this locally and address in a follow-up PR if necessary.
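A minimal sketch of applying defaults in the proto-to-config conversion layer, reusing the `FastVmMode` and `ExperimentalVmPlaygroundConfig` sketches above; the `RawVmPlayground` message shape, field names, and the use of `anyhow` are assumptions, not the actual `protobuf_config` code.

```rust
/// Stand-in for the decoded protobuf message, where every field is optional.
pub struct RawVmPlayground {
    pub fast_vm_mode: Option<String>,
    pub db_path: Option<String>,
}

impl ExperimentalVmPlaygroundConfig {
    /// Defaults are applied during conversion from the proto representation,
    /// so they take effect for file-based configs regardless of serde attributes.
    pub fn from_raw(raw: RawVmPlayground) -> anyhow::Result<Self> {
        let fast_vm_mode = match raw.fast_vm_mode.as_deref() {
            None => FastVmMode::default(),
            Some("old") => FastVmMode::Old,
            Some("new") => FastVmMode::New,
            Some("shadow") => FastVmMode::Shadow,
            Some(other) => anyhow::bail!("unknown fast_vm_mode: {other}"),
        };
        Ok(Self {
            fast_vm_mode,
            db_path: raw.db_path.unwrap_or_else(Self::default_db_path),
        })
    }
}
```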
Haven't looked through the whole PR, but the VM runner impl looks good to me.
What ❔

- Allows switching between VMs for the latest protocol version in `MainBatchExecutor`.
- Adds a `VmRunner`-powered component, VM playground.

Why ❔

A separate component seems a relatively safe place to start integration from.
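To illustrate the switching described above: a hedged sketch of how an executor might pick a VM per batch from the configured mode and protocol version, reusing the assumed `FastVmMode` from the earlier sketch. `select_vm` and `VmKind` are made up for this example.

```rust
/// Illustrative VM choices; `Shadowed` means running both VMs and comparing.
pub enum VmKind {
    Legacy,
    Fast,
    Shadowed,
}

/// Only batches on the latest protocol version are eligible for the new VM;
/// everything older always runs on the legacy VM.
pub fn select_vm(mode: FastVmMode, protocol_version: u16, latest_version: u16) -> VmKind {
    if protocol_version != latest_version {
        return VmKind::Legacy;
    }
    match mode {
        FastVmMode::Old => VmKind::Legacy,
        FastVmMode::New => VmKind::Fast,
        FastVmMode::Shadow => VmKind::Shadowed,
    }
}
```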
Checklist

- Code has been formatted via `zk fmt` and `zk lint`.