Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(swingset): make slog deliveryNum match transcript position #7550

Merged
merged 1 commit into from
Apr 30, 2023

Conversation

warner
Copy link
Member

@warner warner commented Apr 29, 2023

When I added pseudo-deliveries to the transcript in #7484, I forgot to update the slog entry deliveryNum counter to match. As result, the startVat was deliveryNum 0 but transcript position 1 (since 0 is the initialize-worker). The gap grew by 2 after every snapshot save/load event pair.

We don't slog the pseudo-deliveries as .type: 'deliver' (they're recorded other ways). While it will be weird to have gaps in the slogfile's deliveryNum sequence, I think it's better to maintain easy correlation between the slog and the transcript.

So this changes vatKeeper.nextDeliveryNum(). Previously, that function maintained a separate counter (in kvStore) for each vat, which was only used by slog entries. Now, it queries the transcriptStore, with getCurrentSpanBounds().endPos. This is only incremented when we add an entry to the transcript, so the tests of nextDeliveryNum()'s auto-increment feature were updated too.

closes #7549

@warner warner added the SwingSet package: SwingSet label Apr 29, 2023
@warner warner self-assigned this Apr 29, 2023
@warner warner force-pushed the 7549-deliverynum-match-transcript branch 2 times, most recently from fd5a0cd to dce7bca Compare April 29, 2023 07:53
@warner warner requested a review from FUDCo April 29, 2023 08:03
@warner
Copy link
Member Author

warner commented Apr 29, 2023

cc @mhofman you'll probably want to be aware of this w.r.t. OTEL and telemetry stuff

Copy link
Contributor

@FUDCo FUDCo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@warner warner force-pushed the 7549-deliverynum-match-transcript branch from dce7bca to f3dddef Compare April 29, 2023 08:36
@warner warner added the automerge:rebase Automatically rebase updates, then merge label Apr 29, 2023
Copy link
Member

@mhofman mhofman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice and clean. Otel logic doesn't care about the content.

@warner
Copy link
Member Author

warner commented Apr 29, 2023

Nice and clean. Otel logic doesn't care about the content.

https://github.com/Agoric/agoric-sdk/actions/runs/4838026300/jobs/8622248431?pr=7550 might beg to differ, but it might be a transient failure of the Chain Deployment Test. I'll kick it and see if it fails a second time.

@mhofman
Copy link
Member

mhofman commented Apr 29, 2023

Oh that might be a bug in the loadgen!

Edit: I don't see any loadgen error, but a suspicious cosmos error on the client side:

stage-1: client: 2023-04-29T09:42:45.735Z chain-cosmos-sdk: WebSocket error ErrorEvent {
stage-1: client:   target: WebSocket {
stage-1: client:     _events: [Object: null prototype] {
stage-1: client:       error: [Object],
stage-1: client:       open: [Object],
stage-1: client:       message: [Object],
stage-1: client:       close: [Object]
stage-1: client:     },
stage-1: client:     _eventsCount: 4,
stage-1: client:     _maxListeners: undefined,
stage-1: client:     _binaryType: 'nodebuffer',
stage-1: client:     _closeCode: 1006,
stage-1: client:     _closeFrameReceived: false,
stage-1: client:     _closeFrameSent: false,
stage-1: client:     _closeMessage: '',
stage-1: client:     _closeTimer: null,
stage-1: client:     _extensions: {},
stage-1: client:     _protocol: '',
stage-1: client:     _readyState: 2,
stage-1: client:     _receiver: null,
stage-1: client:     _sender: null,
stage-1: client:     _socket: null,
stage-1: client:     _bufferedAmount: 0,
stage-1: client:     _isServer: false,
stage-1: client:     _redirects: 0,
stage-1: client:     _url: 'ws://172.17.0.2:26657/websocket',
stage-1: client:     _req: null,
stage-1: client:     [Symbol(kCapture)]: false
stage-1: client:   },
stage-1: client:   type: 'error',
stage-1: client:   message: 'connect ECONNREFUSED 172.17.0.2:26657',
stage-1: client:   error: {
stage-1: client:     errno: -111,
stage-1: client:     code: 'ECONNREFUSED',
stage-1: client:     syscall: 'connect',
stage-1: client:     address: '172.17.0.2',
stage-1: client:     port: 26657
stage-1: client:   }
stage-1: client: }

@mhofman
Copy link
Member

mhofman commented Apr 29, 2023

Actually the chain seem to have stopped producing blocks, will check the logs.

Oh no, log collection is busted! Will reproduce locally

When I added pseudo-deliveries to the transcript in #7484, I forgot to
update the slog entry `deliveryNum` counter to match. As result, the
`startVat` was deliveryNum 0 but transcript position 1 (since 0 is the
`initialize-worker`). The gap grew by 2 after every snapshot save/load
event pair.

We don't slog the pseudo-deliveries as `.type: 'deliver'` (they're
recorded other ways). While it will be weird to have gaps in the
slogfile's `deliveryNum` sequence, I think it's better to maintain
easy correlation between the slog and the transcript.

So this changes `vatKeeper.nextDeliveryNum()`. Previously, that
function maintained a separate counter (in `kvStore`) for each vat,
which was only used by slog entries. Now, it queries the
transcriptStore, with `getCurrentSpanBounds().endPos`. This is only
incremented when we add an entry to the transcript, so the tests of
`nextDeliveryNum()`'s auto-increment feature were updated too.

closes #7549
@warner warner force-pushed the 7549-deliverynum-match-transcript branch from f3dddef to a29274b Compare April 29, 2023 21:20
@mhofman
Copy link
Member

mhofman commented Apr 30, 2023

@Mergifyio refresh

@mergify
Copy link
Contributor

mergify bot commented Apr 30, 2023

refresh

✅ Pull request refreshed

@mergify mergify bot merged commit ec3faed into master Apr 30, 2023
@mergify mergify bot deleted the 7549-deliverynum-match-transcript branch April 30, 2023 00:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge SwingSet package: SwingSet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

slogfile deliveryNums don't match transcript positions
3 participants