Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(xsnap)!: refactor xsnap wrapper and snapStore to use streams for snapshots #7531

Merged
merged 2 commits into from
Apr 27, 2023

Conversation

mhofman
Copy link
Member

@mhofman mhofman commented Apr 27, 2023

closes: #7490

Description

Refactors the xsnap.js wrapper around the xsnap-worker, and the snapStore to use streams when loading and saving snapshots. This moves the temporary file logic from the snapstore to xsnap in preparation of #6363.

Security Considerations

None

Scaling Considerations

This will ultimately allow moving away from writing raw snapshots to disk.

Documentation Considerations

We lose the ability to record information about the loaded snapshot in xsnap traces which was previously carried in the temporary filename.

Updated some markdown docs to reflect the new API

Testing Considerations

Updated all related tests.

@mhofman mhofman requested a review from warner April 27, 2023 05:40
Copy link
Member

@warner warner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the refactoring.. feels cleaner, and clearly paves the way for the next step.

Apart from the snapshot name quibbles, it looks good to me. Don't over-rotate on finding better names, but since the simplest possible API a newcomer could imagine would be for complete snapshot Buffers to be passed around, and this API is the second-simplest possible API (streams being passed around), we should choose API names that help that newcomer realize we're doing the streams thing instead of the blobs/buffer thing.

* @returns {Promise<SnapshotResult>}
*/
async function saveSnapshot(vatID, snapPos, saveRaw) {
async function saveSnapshot(vatID, snapPos, snapshot) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we call this snapshotStream or readStream or snapshotReader? The name snapshot makes me think it's an actual complete Uint8Array or a Buffer, not a stream.


const { result: hash, duration: compressSeconds } =
const { duration: compressSeconds, result: compressedSnapshot } =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know the variable was there in the original, but "compressSeconds".. with the tee that's happening, this isn't really measuring the time it takes to compress the snapshot, right? It's more like the time it takes for the writer to give us the snapshot, plus the time it takes us to compress it?

Eventually we should look at the timers we're using and decide if they're providing us with meaningful data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct but since I still expect the compression to be the limiting factor, I didn't feel too bad leave it at compressSeconds. A createAndCompressSeconds would be more accurate, but that's a mouthful. Open to suggestions.

});
const hash = hashStream.digest('hex');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's call this snapshotID, that's the name I started using recently in the new transcript entries.

Although.. the DB schema might use hash. If so, I'd be in favor of changing the schema.

async function* loadSnapshot(vatID) {
const loadInfo = sqlLoadSnapshot.get(vatID);
loadInfo || Fail`no snapshot available for vat ${q(vatID)}`;
const { hash, compressedSnapshot } = loadInfo;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe snapshotID here

gzReader.destroy();
// eslint-disable-next-line @jessie.js/no-nested-await
await finished(gzReader);
const h = hashStream.digest('hex');
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

.. which would open up hash to use here

* @property {(request:Uint8Array) => Promise<Uint8Array>} [handleCommand]
* @property {string} [name]
* @property {boolean} [debug]
* @property {number} [netstringMaxChunkSize] in bytes (must be an integer)
* @property {number} [parserBufferSize] in kB (must be an integer)
* @property {string} [snapshot]
* @property {AsyncIterable<Uint8Array>} [snapshot]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe change this argument name to snapshotStream? The old one should really have been snapshotFilename.

}
await (snapshot &&
(async () => {
const { hint = 'unknown' } = /** @type {any} */ (snapshot);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

huh, the argument is a stream, but it might have a .hint property? is that common?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah poor attempt at plumbing through a description for the tracing feature. Will redo.

vatExit.promise.catch(noop).then(() => startCleanup && startCleanup());
}

const [, , , writer, reader] =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[, , , writer, reader] feels less readable than .. stdio[3]/etc. It'd be nice if someon reading this code could immediately tell that we're using fd3/fd4 here. Hopefully there's some way of making the types work too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This commit shouldn't have made it in, part of the streaming with the worker process

const result = baton.then(async () => {
await messagesToXsnap.next(encoder.encode(`w${file}`));
await runToIdle();
async function* makeSnapshot(hint = 'unknown') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe makeSnapshotReader or makeSnapshotReadStream? although it doesn't feel as urgent for this case, somehow

@@ -116,7 +119,7 @@ export function recordXSnap(options, folderPath, { writeFileSync }) {
name = '_replay_',
debug = false,
parserBufferSize = undefined,
snapshot = undefined,
snapshot,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, snapshot sounds like a complete buffer

@mhofman mhofman force-pushed the mhofman/7490-xsnap-stream-snapshot-refactor branch from b411f5f to 66c5d8e Compare April 27, 2023 11:41
@mhofman
Copy link
Member Author

mhofman commented Apr 27, 2023

Apart from the snapshot name quibbles, it looks good to me. Don't over-rotate on finding better names

I've done the rename to snapshotStream, but the other renames didn't quite fit or were and unnecessarily verbose IMO. We can always revisit.

I had to force-push the changes to avoid conflicts, but I kept the changes in separate commit and have not changed the original ones, so you can focus on the fixup! commits.

@mhofman mhofman added the automerge:rebase Automatically rebase updates, then merge label Apr 27, 2023
@mhofman mhofman force-pushed the mhofman/7490-xsnap-stream-snapshot-refactor branch from 9caba01 to ed87de1 Compare April 27, 2023 14:58
@mergify mergify bot merged commit 374b6f5 into master Apr 27, 2023
@mergify mergify bot deleted the mhofman/7490-xsnap-stream-snapshot-refactor branch April 27, 2023 15:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge:rebase Automatically rebase updates, then merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add streaming readSnapshot API to snapStore
2 participants