Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Zalgo: synchronize syscalls when replaying in concurrent workers #6724

Draft
wants to merge 5 commits into
base: release-pismo
Choose a base branch
from

Conversation

mhofman
Copy link
Member

@mhofman mhofman commented Dec 27, 2022

refs: #6588

Layered on top of #6723

As usual, better reviewed commit-by-commit.

Description

These are the changes required for the replay tool to synchronize the execution of multiple concurrent workers at the granularity of syscalls. Moddable expressed interest in being able to do this since a full delivery can be complex.

This effectively releases Zalgo onto the transcript syscall replay logic but contains it within the various worker managers, in a way that should be safe.

To be more specific, the transcript replay logic will detect a promise returned by compareSyscalls, in which case the worker manager will return a promise from syscallFromWorker, but only if the worker supports async syscalls. This is the case for all workers but local, which if it detects a promise will throw.

It would be a mistake for the application to configure a local manager's options to try and use such a promise returning compareSyscalls.

I am putting this PR up as draft to discuss the possibility of merge, but would be fine if we decide to never merge this.

Security Considerations

None I can think of.

Documentation Considerations

Types were updated to attempt to document the effects of releasing Zalgo. Unfortunately the type of managerOptions and worker kind are not currently threaded through sufficiently to surface an incompatibility in options (promise returning compareSyscalls used with a local worker type).

Testing Considerations

Manually through the replay tool's new option.

@mhofman mhofman requested review from warner and FUDCo December 27, 2022 01:56
* @returns {T extends PromiseLike<any> ? Promise<Awaited<R>> : R}
*/
export function maybeAsync(v, handle) {
if (typeof v === 'object' && v !== null && 'then' in v) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider matching the rules for testing whether it is thenable. I'm not sure the following is exactly right, but it is closer.

Suggested change
if (typeof v === 'object' && v !== null && 'then' in v) {
if (!isPrimitive(v) && typeof v.then === 'function') {

IOW the presence of then is only disqualifying if its value is a function.

v is a candidate whether it is an object or a function.

@FUDCo
Copy link
Contributor

FUDCo commented Jan 12, 2023

I feel like I'm missing something fundamental about what's going on here. It's entirely possible I'm just not successfully following what this code is doing.

only if the worker supports async syscalls

I'm not sure what this means, since IIUC there's no such thing as an async syscall.


Enabling workers to run independently is a good long-term goal that we've mused about for ages, but I'm missing how this PR relates to that, since the big issue that I see is interleaving crank starts and completions, releasing crank side effects (mostly message sends) from a vat in a batch at the end of the crank rather than incrementally during it. There's a bunch of voodoo involved that I've only pondered superficially but I can't find any intersection between that and this.

On the other hand, supporting concurrent replays seems much more approachable in the nearer term, since replay is all about reconstructing the vat's internal memory state (i.e., it's the vat's business and nobody else's) and is side-effect free from the kernel's point of view (which is why we already get away with replaying vat cranks in a different order than they originally executed). Async replay seems like a small matter (heh) of telling the vat "here's the transcript; go to it" (which may become easier when/if we give each vat its own vatstore DB) and then coordinating the completion of all the vats that are replaying at the same time. But I don't think I see any of that here either.

So what is this?

@mhofman
Copy link
Member Author

mhofman commented Jan 12, 2023

only if the worker supports async syscalls

I'm not sure what this means, since IIUC there's no such thing as an async syscall.

We have 3 kind of worker types:

  • Worker that interacts with the kernel asynchronously, but can block when sending syscalls. This is the case for xs-worker
  • Worker that interacts with the kernel asynchronously, but cannot block when sending syscalls (and thus drops results). This is the case for nodeWorker and node-subprocess.
  • Worker that interacts with the kernel synchronously (and thus can obviously block). This is the case of local

So I really meant worker manager that can handle syscalls asynchronously.

So what is this?

This is not parallel vat execution, and it's not really parallel replay. This is purely about allowing an externally provided compareSyscall to have an asynchronous implementation in the case where the worker manager can handle it.

The goal is to allow the replay tool to more precisely synchronize replay in parallel workers that should be identical, so that any behavioral difference can be noticed at the granularity of syscalls instead of deliveries.

This only makes sense for in the context of the parallel replay of the same transcript into multiple workers, which is introduced by #6723.

@mhofman mhofman force-pushed the mhofman/6588-expand-replay-tool-for-anachrophobia-diagnosis branch 3 times, most recently from c068619 to 9a9ea0b Compare January 12, 2023 17:03
@mhofman mhofman force-pushed the mhofman/6588-sync-syscalls-for-replay-tool branch from c3896c1 to 89ef9af Compare January 12, 2023 17:10
Base automatically changed from mhofman/6588-expand-replay-tool-for-anachrophobia-diagnosis to release-pismo January 12, 2023 17:49
@mhofman mhofman force-pushed the mhofman/6588-sync-syscalls-for-replay-tool branch from 3de2ea5 to 34eb7cb Compare February 6, 2023 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants