
feat(store-indexer, store-sync): improve query performance and enable compression, add new api #2026

Merged
merged 24 commits into main from alvrs/query-perf on Dec 7, 2023

Conversation

@alvrs (Member) commented Dec 5, 2023

Query performance is improved by ~10x thanks to @holic's queryLogs helper replacing drizzle, and data transferred over the wire is reduced by ~40x thanks to @holic's gzip implementation.

[Screenshot: CleanShot 2023-12-05 at 22 53 04@2x]

changeset-bot commented Dec 5, 2023

🦋 Changeset detected

Latest commit: 244c42e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 30 packages
| Name | Type |
| --- | --- |
| @latticexyz/common | Major |
| @latticexyz/store-indexer | Major |
| @latticexyz/store-sync | Major |
| @latticexyz/block-logs-stream | Major |
| @latticexyz/cli | Major |
| @latticexyz/config | Major |
| @latticexyz/dev-tools | Major |
| @latticexyz/protocol-parser | Major |
| @latticexyz/store | Major |
| @latticexyz/world-modules | Major |
| @latticexyz/world | Major |
| @latticexyz/react | Major |
| @latticexyz/abi-ts | Major |
| create-mud | Major |
| @latticexyz/ecs-browser | Major |
| @latticexyz/faucet | Major |
| @latticexyz/gas-report | Major |
| @latticexyz/network | Major |
| @latticexyz/noise | Major |
| @latticexyz/phaserx | Major |
| @latticexyz/recs | Major |
| @latticexyz/schema-type | Major |
| @latticexyz/services | Major |
| @latticexyz/solecs | Major |
| solhint-config-mud | Major |
| solhint-plugin-mud | Major |
| @latticexyz/std-client | Major |
| @latticexyz/std-contracts | Major |
| @latticexyz/store-cache | Major |
| @latticexyz/utils | Major |


Comment on lines +19 to +27
```ts
let bytesSinceFlush = 0;
stream.on("data", (data) => {
  bytesSinceFlush += data.length;
  if (bytesSinceFlush > bytesThreshold) {
    bytesSinceFlush = 0;
    stream.flush();
  }
});
return stream;
```
@alvrs (Member, Author):

Would it be possible that the "last" bytes are stuck in the stream if the data isn't a multiple of bytesThreshold? Would we need some kind of timeout after which we flush even if there is no more data event?

@alvrs (Member, Author):

or is that handled by compressed.end(ctx.body)?

@holic (Member) commented Dec 6, 2023:

> Would it be possible that the "last" bytes are stuck in the stream if the data isn't a multiple of bytesThreshold? Would we need some kind of timeout after which we flush even if there is no more data event?

This might be possible for an open and long running stream, yes. But shouldn't matter as much while we're closing the stream when we get to the end of the query cursor.

The stream is flushed when the stream closes (which is separate from compressed.end(ctx.body), which pushes and closes the entirety of the body to the compressed stream, for non-streaming bodies).

I think it would be best to expose a .flush method in this middleware that we can access downstream in the eventStream middleware, but it was tricky to get it right with types, so I punted on that for now.

```ts
  // TODO: move this to `.findFirst` after upgrading drizzle or `rows[0]` after enabling `noUncheckedIndexedAccess: true`
  .then((rows) => rows.find(() => true));
const indexerBlockNumber = chainState?.lastUpdatedBlockNumber ?? 0n;

router.get("/get/logs", compress(), async (ctx) => {
```
@holic (Member):

since this is now basically a REST API, can we call this /api/logs and name this function/module api or apiRoutes?

@alvrs (Member, Author):

yep! was waiting for naming feedback, wasn't very happy with /get
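For context on the `rows.find(() => true)` trick in the snippet above: without `noUncheckedIndexedAccess`, TypeScript types `rows[0]` as `T` even for an empty array, while `find` is always typed `T | undefined`. A minimal illustration (`firstRow` and `ChainState` are hypothetical names for this sketch):

```ts
// Without `noUncheckedIndexedAccess: true`, `rows[0]` is typed as `T`
// even when `rows` is empty, hiding a possible runtime `undefined`.
// `Array.prototype.find` is always typed `T | undefined`, so callers
// are forced to handle the empty-result case.
interface ChainState {
  lastUpdatedBlockNumber: bigint;
}

function firstRow<T>(rows: T[]): T | undefined {
  // `find(() => true)` returns the first element, or undefined if none.
  return rows.find(() => true);
}

const chainState = firstRow<ChainState>([]); // typed ChainState | undefined
const indexerBlockNumber = chainState?.lastUpdatedBlockNumber ?? 0n;
```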

@@ -0,0 +1,10 @@
```ts
// Inspired by https://doc.rust-lang.org/std/result/
export type Result<Ok, Err = unknown> = { ok: Ok } | { error: Err };
```
@holic (Member):

👀
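A sketch of how this Result type is consumed; `parseBlockNumber` is a hypothetical example function, not from the PR:

```ts
// Same shape as the PR's Result type (inspired by Rust's std::result).
type Result<Ok, Err = unknown> = { ok: Ok } | { error: Err };

// Hypothetical example: the `"ok" in result` check narrows the union.
function parseBlockNumber(input: string): Result<bigint, string> {
  try {
    return { ok: BigInt(input) };
  } catch {
    return { error: `not a valid block number: ${input}` };
  }
}

const result = parseBlockNumber("42");
if ("ok" in result) {
  console.log(result.ok); // narrowed to { ok: bigint }
} else {
  console.log(result.error); // narrowed to { error: string }
}
```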

```diff
@@ -4,7 +4,8 @@
   "private": true,
   "license": "MIT",
   "scripts": {
     "test": "vitest --run",
+    "create:db": "psql postgres://127.0.0.1/postgres -c \"CREATE DATABASE postgres_e2e;\"",
```
@holic (Member):

is it feasible to do this in the test setup script?

@alvrs (Member, Author):

I had that initially, but it felt weird to attempt to create a database that might already exist, permissions to create a new database might differ per user, and the command only has to be executed once per machine (since the db is not dropped, just cleaned, at the end). So it felt cleaner to colocate the db creation with the place where we define the db url to use.

@holic (Member):

was just worried about the availability of psql

```diff
@@ -39,7 +39,7 @@ const database = drizzle(postgres(env.DATABASE_URL));

 if (await shouldCleanDatabase(database, chainId)) {
   console.log("outdated database detected, clearing data to start fresh");
-  cleanDatabase(database);
+  await cleanDatabase(database);
```
@holic (Member):

Oooof, wonder if we should enable a linter for this.

Feels like we have few instances of calling an async function without awaiting it right away, and we could add a lint-ignore comment for the lines where we use that approach intentionally.

@holic (Member) commented Dec 7, 2023:

noted here: #2040
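The lint rule alluded to here exists as `@typescript-eslint/no-floating-promises`. A sketch of the failure mode it catches, using a stand-in `cleanDatabase` (hypothetical; the real one talks to postgres):

```ts
// Stand-in for the real cleanDatabase, to show why the dropped `await`
// matters: without it, the code after the call runs before cleanup
// finishes, and rejections escape the surrounding try/catch.
async function cleanDatabase(log: string[]): Promise<void> {
  await new Promise((resolve) => setTimeout(resolve, 10));
  log.push("cleaned");
}

async function startIndexerWithoutAwait(log: string[]): Promise<void> {
  cleanDatabase(log); // floating promise: what the lint rule flags
  log.push("indexing started");
}

async function startIndexerWithAwait(log: string[]): Promise<void> {
  await cleanDatabase(log);
  log.push("indexing started");
}
```

Running the first variant records "indexing started" before "cleaned"; with the `await`, the cleanup reliably completes first.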

```ts
const benchmark = createBenchmark("postgres:logs");

try {
  const opts = input.parse(typeof ctx.query.input === "string" ? JSON.parse(ctx.query.input) : {});
```
@holic (Member):

If this parse fails, it should probably be a 4xx error (client error) rather than a 5xx error (server error).
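A sketch of the suggested error mapping; `statusForInput` is a hypothetical, framework-free helper so it runs standalone (the real middleware would set `ctx.status` / `ctx.body`):

```ts
// Malformed client input (a JSON.parse failure) becomes a 400;
// anything unexpected stays a 500.
function statusForInput(rawInput: unknown): { status: number; body: string } {
  try {
    const parsed = typeof rawInput === "string" ? JSON.parse(rawInput) : {};
    return { status: 200, body: JSON.stringify(parsed) };
  } catch (error) {
    if (error instanceof SyntaxError) {
      // The client sent unparseable input: their fault, not ours.
      return { status: 400, body: "invalid query input" };
    }
    return { status: 500, body: "internal server error" };
  }
}
```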

@alvrs alvrs changed the title feat(store-indexer): improve query performance and enable compression feat(store-indexer, store-sync): improve query performance and enable compression, add new api Dec 7, 2023
@alvrs alvrs merged commit 4c1dcd8 into main Dec 7, 2023
11 checks passed
@alvrs alvrs deleted the alvrs/query-perf branch December 7, 2023 20:27