feat(store-indexer, store-sync): improve query performance and enable compression, add new api #2026
Conversation
🦋 Changeset detected. Latest commit: 244c42e. The changes in this PR will be included in the next version bump. This PR includes changesets to release 30 packages.
let bytesSinceFlush = 0;
stream.on("data", (data) => {
  bytesSinceFlush += data.length;
  if (bytesSinceFlush > bytesThreshold) {
    bytesSinceFlush = 0;
    stream.flush();
  }
});
return stream;
Would it be possible that the "last" bytes are stuck in the stream if the data isn't a multiple of `bytesThreshold`? Would we need some kind of timeout after which we flush even if there is no more data event?
or is that handled by `compressed.end(ctx.body)`?
> Would it be possible that the "last" bytes are stuck in the stream if the data isn't a multiple of `bytesThreshold`? Would we need some kind of timeout after which we flush even if there is no more data event?

This might be possible for an open and long-running stream, yes. But it shouldn't matter as much while we're closing the stream when we get to the end of the query cursor.

The stream is flushed when the stream closes (which is separate from `compressed.end(ctx.body)`, which pushes and closes the entirety of the body to the compressed stream, for non-streaming bodies).

I think it would be best to expose a `.flush` method in this middleware that we can access downstream in the `eventStream` middleware, but it was tricky to get it right with types, so I punted on that for now.
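For illustration, a minimal sketch of what exposing such a flush escape hatch could look like, built on Node's built-in zlib gzip stream. `createFlushableGzip` and the returned `flush` property are hypothetical names, not the PR's actual implementation:

```ts
import { createGzip, type Gzip } from "node:zlib";

// Hypothetical factory: wraps a gzip stream with the byte-threshold flushing
// shown above and also hands back an explicit flush for downstream middleware.
export function createFlushableGzip(bytesThreshold = 16 * 1024): { stream: Gzip; flush: () => void } {
  const stream = createGzip();
  let bytesSinceFlush = 0;

  stream.on("data", (data: Buffer) => {
    bytesSinceFlush += data.length;
    if (bytesSinceFlush > bytesThreshold) {
      bytesSinceFlush = 0;
      stream.flush();
    }
  });

  return {
    stream,
    // Downstream middleware (e.g. the event stream) could call this to push
    // buffered bytes out even when the byte threshold hasn't been reached.
    flush: () => {
      bytesSinceFlush = 0;
      stream.flush();
    },
  };
}
```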
// TODO: move this to `.findFirst` after upgrading drizzle or `rows[0]` after enabling `noUncheckedIndexedAccess: true`
  .then((rows) => rows.find(() => true));
const indexerBlockNumber = chainState?.lastUpdatedBlockNumber ?? 0n;
router.get("/get/logs", compress(), async (ctx) => {
since this is now basically a REST API, can we call this `/api/logs` and name this function/module `api` or `apiRoutes`?
yep! was waiting for naming feedback, wasn't very happy with `/get`
@@ -0,0 +1,10 @@
// Inspired by https://doc.rust-lang.org/std/result/
export type Result<Ok, Err = unknown> = { ok: Ok } | { error: Err };
👀
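For context, a small usage sketch of the `Result` type above; `parseBlockNumber` is a hypothetical helper, not part of the PR:

```ts
type Result<Ok, Err = unknown> = { ok: Ok } | { error: Err };

// Hypothetical example: wrap a fallible conversion in a Result instead of throwing.
function parseBlockNumber(input: string): Result<bigint, string> {
  try {
    return { ok: BigInt(input) };
  } catch {
    return { error: `not a valid block number: ${input}` };
  }
}

const result = parseBlockNumber("123");
if ("ok" in result) {
  console.log(result.ok + 1n);
} else {
  console.error(result.error);
}
```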
e2e/packages/sync-test/package.json (Outdated)
@@ -4,7 +4,8 @@
  "private": true,
  "license": "MIT",
  "scripts": {
    "test": "vitest --run",
    "create:db": "psql postgres://127.0.0.1/postgres -c \"CREATE DATABASE postgres_e2e;\"",
is it feasible to do this in the test setup script?
I had that initially, but it felt weird to attempt to create a database that might already exist, and permissions to create a new database might differ for each user. It also only has to be executed once per machine (since the db is not dropped, just cleaned, at the end), so it felt cleaner to colocate the db creation with the place where we define the db url to use.
was just worried about the availability of `psql`
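For illustration, a sketch of how the database creation could live in the test setup instead and stay idempotent, without relying on a local `psql` binary. `ensureDatabase` is a hypothetical helper that assumes the `postgres` driver the indexer already uses; 42P04 is Postgres's duplicate_database error code:

```ts
import postgres from "postgres";

// Hypothetical setup helper: create the test database if it doesn't exist yet,
// and ignore the error if it already does.
export async function ensureDatabase(adminUrl: string, name: string): Promise<void> {
  const sql = postgres(adminUrl);
  try {
    await sql.unsafe(`CREATE DATABASE "${name}"`);
  } catch (error) {
    // "database already exists" keeps the setup idempotent across runs.
    if ((error as { code?: string }).code !== "42P04") throw error;
  } finally {
    await sql.end();
  }
}
```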
@@ -39,7 +39,7 @@ const database = drizzle(postgres(env.DATABASE_URL));

if (await shouldCleanDatabase(database, chainId)) {
  console.log("outdated database detected, clearing data to start fresh");
-  cleanDatabase(database);
+  await cleanDatabase(database);
oooof, wonder if we should enable a linter for this
feels like we have few instances of calling an async function without awaiting it right away, so we could enable the rule and add a lint ignore on the lines where we use that approach intentionally
noted here: #2040
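For reference, a sketch of what such a lint setup could look like with typescript-eslint's `no-floating-promises` rule, which catches un-awaited promise-returning calls but requires type-aware linting. This config is illustrative, not the repo's actual ESLint setup:

```ts
// .eslintrc.cjs (sketch): enable type-aware linting so no-floating-promises can run.
module.exports = {
  parser: "@typescript-eslint/parser",
  parserOptions: { project: "./tsconfig.json" },
  plugins: ["@typescript-eslint"],
  rules: {
    // Flags calls like `cleanDatabase(database);` that drop the returned promise.
    "@typescript-eslint/no-floating-promises": "error",
  },
};
```

Intentional fire-and-forget calls could then opt out per line with `// eslint-disable-next-line @typescript-eslint/no-floating-promises`.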
const benchmark = createBenchmark("postgres:logs");

try {
  const opts = input.parse(typeof ctx.query.input === "string" ? JSON.parse(ctx.query.input) : {});
if this parse fails, it should prob be a 4xx error (client error) rather than a 5xx error (server error)
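For illustration, a sketch of how the parse failure could be surfaced as a 400 instead of a 500. It assumes the `input` schema is a Zod schema and a Koa context, as in the route above; `parseInput` is a hypothetical helper, not part of the PR:

```ts
import { z, ZodError } from "zod";
import type { Context } from "koa";

const input = z.object({}); // stand-in for the real query schema

// Hypothetical helper: validate the query input and report failures as a client error.
function parseInput(ctx: Context): z.infer<typeof input> | undefined {
  try {
    return input.parse(typeof ctx.query.input === "string" ? JSON.parse(ctx.query.input) : {});
  } catch (error) {
    // Bad JSON or failed schema validation is the client's fault, so respond with 400.
    ctx.status = 400;
    ctx.body = JSON.stringify({
      error: error instanceof ZodError ? error.format() : "invalid query input",
    });
    return undefined;
  }
}
```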
Query performance is increased by ~10x thanks to @holic's `queryLogs` helper to replace drizzle, and data transferred over the wire is down by ~40x thanks to @holic's `gzip` implementation.
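For illustration, a sketch of how a client might call the new logs endpoint; the route path and the fields inside `input` are assumptions based on the discussion above, and fetch negotiates gzip transparently so the body arrives already decompressed:

```ts
// Hypothetical client call: the query options travel as a JSON-encoded `input` query param.
const input = encodeURIComponent(JSON.stringify({ chainId: 31337 }));
const response = await fetch(`https://indexer.example.com/api/logs?input=${input}`);
const result = await response.json();
console.log(result);
```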