"anachrophobia error" should print the last few transcript entries #5950

warner · 2022-08-12T23:34:57Z

What is the Problem Being Solved?

@turadg encountered a seemingly-intermittent error today in the CI integration test:

2022-08-12T20:37:21.8572085Z stage-2: chain: 2022-08-12T20:37:21.840Z SwingSet: kernel: anachrophobia strikes vat v7
2022-08-12T20:37:21.8577230Z stage-2: chain: 2022-08-12T20:37:21.841Z SwingSet: kernel: expected: {"0":"vatstoreSet","1":"vom.o+11/5","2":"{\"currentBalance\":{\"body\":\"{\\\"brand\\\":{\\\"@qclass\\\":\\\"slot\\\",\\\"iface\\\":\\\"Alleged: IST brand\\\",\\\"index\\\":0},\\\"value\\\":{\\\"@qclass\\\":\\\"bigint\\\",\\\"digits\\\":\\\"0\\\"}}\",\"slots\":[\"o+14/1\"]},\"recoverySet\":{\"body\":\"{\\\"@qclass\\\":\\\"slot\\\",\\\"iface\\\":\\\"Alleged: setStore\\\",\\\"index\\\":0}\",\"slots\":[\"o+8/29\"]}}","length":3}
2022-08-12T20:37:21.8579093Z stage-2: chain: 2022-08-12T20:37:21.841Z SwingSet: kernel: got     : {"0":"vatstoreGet","1":"vc.12.|schemata","length":2}
2022-08-12T20:37:21.8684929Z stage-2: chain: portHandler threw (Error#1)
2022-08-12T20:37:21.8689099Z stage-2: chain: Error#1: historical inaccuracy in replay of v7
2022-08-12T20:37:21.8692132Z stage-2: chain: 
2022-08-12T20:37:21.8695606Z stage-2: chain:   at requireIdentical (.../swingset-vat/src/kernel/vat-loader/transcript.js:8:8)
2022-08-12T20:37:21.8720355Z stage-2: chain:   at Object.simulateSyscall (.../swingset-vat/src/kernel/vat-loader/transcript.js:63:22)
2022-08-12T20:37:21.8734717Z stage-2: chain:   at Object.syscallFromWorker (.../swingset-vat/src/kernel/vat-loader/manager-helper.js:249:30)
2022-08-12T20:37:21.8739089Z stage-2: chain:   at handleUpstream (.../swingset-vat/src/kernel/vat-loader/manager-subprocess-xsnap.js:86:11)
2022-08-12T20:37:21.8743272Z stage-2: chain:   at handleCommand (.../swingset-vat/src/kernel/vat-loader/manager-subprocess-xsnap.js:109:14)
2022-08-12T20:37:21.8746933Z stage-2: chain:   at handleCommand (packages/xsnap/src/replay.js:111:26)
2022-08-12T20:37:21.8750340Z stage-2: chain:   at runToIdle (packages/xsnap/src/xsnap.js:203:42)
2022-08-12T20:37:21.8753859Z stage-2: chain:   at runMicrotasks (<anonymous>)

It went away after a trivial commit (that shouldn't have changed anything), which is terrifying, but it also made me think that it'd be nice to capture more context in that situation.

I'm thinking that instead of only printing the one transcript entry that we were expecting, we should find a way to print the last 5 or 10 entries.

Also, sometimes the divergence is that the original did syscalls A/B/C, and the replay did B/A/C or A/C/B (swapping the order of two, like if somehow GC finalizers fired in a different order and we didn't sort properly to mask it). It'd be nice to be able to distinguish this from a replay that only did A/B or only A/C. To capture that, I think we'd have to allow the delivery to run to completion, and then only compare syscalls at the very end. Or, allow N more syscalls to be made before we stop accepting any more and just report the error. This sounds trickier, I'm not sure what it would take to implement it safely.

Description of the Design

Not sure yet

Security Considerations

We'd need to maintain the invariant that the kernel halts before committing any state changes. Beyond that, it ought to be safe to let the doomed vat finish out its last delivery.

Test Plan

This sort of logging is pretty tough to test, I'd be satisfied with a manual inspection of a deliberately provoked error. However, we would like confidence that the error can be emitted without e.g. a logging exception causing the final panic() to be bypassed, which would then fail the "kernel should halt" invariant.

So probably a unit test which triggers anachrophobia, and then asserts that the kernel has paniced. Then we can manually look at the log messages and decide that they're correct.

The text was updated successfully, but these errors were encountered:

warner · 2023-03-14T18:11:02Z

@mhofman say he's got a branch with something that might do this, he might pick it up, but we figure it's low priority

warner · 2023-04-18T17:29:55Z

BTW my 6770 branch (PR #7428) should do a better job with this, see the explain() function in vat-warehouse.js

warner · 2023-04-19T06:04:51Z

My 6770 branch will only print a single transcript entry, but it will print all expected syscalls, and all executed syscalls, and highlight the point of divergence.

mhofman · 2023-05-01T17:18:25Z

I believe this is done with #7428 having landed.

warner added enhancement New feature or request SwingSet package: SwingSet labels Aug 12, 2022

Tartuffo assigned FUDCo Aug 16, 2022

Tartuffo added this to the Mainnet 1 RC0 milestone Aug 16, 2022

Tartuffo removed this from the Mainnet 1 RC0 milestone Sep 21, 2022

Tartuffo added migrate-up-next and removed migrate-up-next labels Nov 17, 2022

rowgraus added the vaults_triage DO NOT USE label Jan 3, 2023

warner assigned mhofman and unassigned FUDCo Mar 14, 2023

mhofman closed this as completed May 1, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

"anachrophobia error" should print the last few transcript entries #5950

"anachrophobia error" should print the last few transcript entries #5950

warner commented Aug 12, 2022

warner commented Mar 14, 2023

warner commented Apr 18, 2023

warner commented Apr 19, 2023

mhofman commented May 1, 2023

"anachrophobia error" should print the last few transcript entries #5950

"anachrophobia error" should print the last few transcript entries #5950

Comments

warner commented Aug 12, 2022

What is the Problem Being Solved?

Description of the Design

Security Considerations

Test Plan

warner commented Mar 14, 2023

warner commented Apr 18, 2023

warner commented Apr 19, 2023

mhofman commented May 1, 2023