Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

"anachrophobia error" should print the last few transcript entries #5950

Closed
warner opened this issue Aug 12, 2022 · 4 comments
Closed

"anachrophobia error" should print the last few transcript entries #5950

warner opened this issue Aug 12, 2022 · 4 comments
Assignees
Labels
enhancement New feature or request SwingSet package: SwingSet vaults_triage DO NOT USE

Comments

@warner
Copy link
Member

warner commented Aug 12, 2022

What is the Problem Being Solved?

@turadg encountered a seemingly-intermittent error today in the CI integration test:

2022-08-12T20:37:21.8572085Z stage-2: chain: 2022-08-12T20:37:21.840Z SwingSet: kernel: anachrophobia strikes vat v7
2022-08-12T20:37:21.8577230Z stage-2: chain: 2022-08-12T20:37:21.841Z SwingSet: kernel: expected: {"0":"vatstoreSet","1":"vom.o+11/5","2":"{\"currentBalance\":{\"body\":\"{\\\"brand\\\":{\\\"@qclass\\\":\\\"slot\\\",\\\"iface\\\":\\\"Alleged: IST brand\\\",\\\"index\\\":0},\\\"value\\\":{\\\"@qclass\\\":\\\"bigint\\\",\\\"digits\\\":\\\"0\\\"}}\",\"slots\":[\"o+14/1\"]},\"recoverySet\":{\"body\":\"{\\\"@qclass\\\":\\\"slot\\\",\\\"iface\\\":\\\"Alleged: setStore\\\",\\\"index\\\":0}\",\"slots\":[\"o+8/29\"]}}","length":3}
2022-08-12T20:37:21.8579093Z stage-2: chain: 2022-08-12T20:37:21.841Z SwingSet: kernel: got     : {"0":"vatstoreGet","1":"vc.12.|schemata","length":2}
2022-08-12T20:37:21.8684929Z stage-2: chain: portHandler threw (Error#1)
2022-08-12T20:37:21.8689099Z stage-2: chain: Error#1: historical inaccuracy in replay of v7
2022-08-12T20:37:21.8692132Z stage-2: chain: 
2022-08-12T20:37:21.8695606Z stage-2: chain:   at requireIdentical (.../swingset-vat/src/kernel/vat-loader/transcript.js:8:8)
2022-08-12T20:37:21.8720355Z stage-2: chain:   at Object.simulateSyscall (.../swingset-vat/src/kernel/vat-loader/transcript.js:63:22)
2022-08-12T20:37:21.8734717Z stage-2: chain:   at Object.syscallFromWorker (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:249:30)
2022-08-12T20:37:21.8739089Z stage-2: chain:   at handleUpstream (.../swingset-vat/src/kernel/vat-loader/manager-subprocess-xsnap.js:86:11)
2022-08-12T20:37:21.8743272Z stage-2: chain:   at handleCommand (.../swingset-vat/src/kernel/vat-loader/manager-subprocess-xsnap.js:109:14)
2022-08-12T20:37:21.8746933Z stage-2: chain:   at handleCommand (packages/xsnap/src/replay.js:111:26)
2022-08-12T20:37:21.8750340Z stage-2: chain:   at runToIdle (packages/xsnap/src/xsnap.js:203:42)
2022-08-12T20:37:21.8753859Z stage-2: chain:   at runMicrotasks (<anonymous>)

It went away after a trivial commit (that shouldn't have changed anything), which is terrifying, but it also made me think that it'd be nice to capture more context in that situation.

I'm thinking that instead of only printing the one transcript entry that we were expecting, we should find a way to print the last 5 or 10 entries.

Also, sometimes the divergence is that the original did syscalls A/B/C, and the replay did B/A/C or A/C/B (swapping the order of two, like if somehow GC finalizers fired in a different order and we didn't sort properly to mask it). It'd be nice to be able to distinguish this from a replay that only did A/B or only A/C. To capture that, I think we'd have to allow the delivery to run to completion, and then only compare syscalls at the very end. Or, allow N more syscalls to be made before we stop accepting any more and just report the error. This sounds trickier, I'm not sure what it would take to implement it safely.

Description of the Design

Not sure yet

Security Considerations

We'd need to maintain the invariant that the kernel halts before committing any state changes. Beyond that, it ought to be safe to let the doomed vat finish out its last delivery.

Test Plan

This sort of logging is pretty tough to test, I'd be satisfied with a manual inspection of a deliberately provoked error. However, we would like confidence that the error can be emitted without e.g. a logging exception causing the final panic() to be bypassed, which would then fail the "kernel should halt" invariant.

So probably a unit test which triggers anachrophobia, and then asserts that the kernel has paniced. Then we can manually look at the log messages and decide that they're correct.

@warner warner added enhancement New feature or request SwingSet package: SwingSet labels Aug 12, 2022
@Tartuffo Tartuffo added this to the Mainnet 1 RC0 milestone Aug 16, 2022
@Tartuffo Tartuffo removed this from the Mainnet 1 RC0 milestone Sep 21, 2022
@rowgraus rowgraus added the vaults_triage DO NOT USE label Jan 3, 2023
@warner warner assigned mhofman and unassigned FUDCo Mar 14, 2023
@warner
Copy link
Member Author

warner commented Mar 14, 2023

@mhofman say he's got a branch with something that might do this, he might pick it up, but we figure it's low priority

@warner
Copy link
Member Author

warner commented Apr 18, 2023

BTW my 6770 branch (PR #7428) should do a better job with this, see the explain() function in vat-warehouse.js

@warner
Copy link
Member Author

warner commented Apr 19, 2023

My 6770 branch will only print a single transcript entry, but it will print all expected syscalls, and all executed syscalls, and highlight the point of divergence.

@mhofman
Copy link
Member

mhofman commented May 1, 2023

I believe this is done with #7428 having landed.

@mhofman mhofman closed this as completed May 1, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request SwingSet package: SwingSet vaults_triage DO NOT USE
Projects
None yet
Development

No branches or pull requests

5 participants