Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding resultsFormat to d1-api.ts to fix .raw() bug #1586

Merged
merged 4 commits into from
Feb 2, 2024

Conversation

geelen
Copy link
Contributor

@geelen geelen commented Jan 29, 2024

This PR does a couple of things:

  • Massively improves the realism of the D1 test suite. I've also got d1-api.ts and d1-api-test.js being imported into D1's internal test suite to verify these are correct, but the new d1-mock now much more closely mirrors the real system.
  • Adds a resultsFormat flag to better control what D1 returns behind the scenes, with a default to ROWS_AND_COLUMNS, which should be more efficient (see below).
  • Fixes .run() to no longer return .results property, as per the docs.
  • Makes .raw() effectively the default and derives .all()/.first() from that, rather than the other way around.

In particular, this change fixes a long-standing bug with D1 where joining tables with the same column names was losing data, e.g.

await DB.exec(`
  CREATE TABLE abc (a INT, b INT, c INT);
  CREATE TABLE cde (c INT, d INT, e INT);
  INSERT INTO abc VALUES (1,2,3),(4,5,6);
  INSERT INTO cde VALUES (7,8,9),(1,2,3);
`),

const { results } = await DB.prepare(`SELECT * FROM abc, cde;`).all()
//>  results: [
//>     { a: 1, b: 2, c: 7, d: 8, e: 9 },
//>     { a: 1, b: 2, c: 1, d: 2, e: 3 },
//>     { a: 4, b: 5, c: 7, d: 8, e: 9 },
//>     { a: 4, b: 5, c: 1, d: 2, e: 3 },
//>   ]

Note that cde.c is overriding abc.c in each of the rows as it's a simple object map and last-one-wins.

The intended solution was to be able to use .raw() to skip the step of building JS objects from each row:

await DB.prepare(`SELECT * FROM abc, cde;`).raw()
//>  [
//>    [1, 2, 3, 7, 8, 9],
//>    [1, 2, 3, 1, 2, 3],
//>    [4, 5, 6, 7, 8, 9],
//>    [4, 5, 6, 1, 2, 3],
//>  ]

But previously that wasn't working as the D1 service was building the objects before sending them back, so by the time your worker got a response the data was already missing. In fact, .raw() previously just did a Object.values() on each row of the response, which is some real alpha code (sorry everyone).

That's now fixed, if you pass ?resultsFormat=ROWS_AND_COLUMNS you get { results: { columns: [...], rows: [[...],[...], ...]} instead of { results: [{ ... }, { ... }, ...]} and you can reconstruct the object form for .all() just fine. But since this code is going to roll out independently of the backend changes, this PR was slightly complicated by having to support both old and new response types.

One outstanding question: should we find a way to return .columns to the user at all? As it stands, .raw() returns a simple array: [ row1_values, row2_values, ... ], which doesn't really leave space for a .columns property (even if that would be technically possible to jam in there). Perhaps we should make .raw(true) return a { rows, columns } object?

Leaving this as draft PR for now as I'm not 100% that this API doesn't need further changes, but would appreciate any 👀 from workerd folks.

@geelen
Copy link
Contributor Author

geelen commented Jan 31, 2024

Update on this, we're going to add .raw({ columnNames: true }) to return column names:

const [columns, ...rows] = await DB.prepare(`SELECT * FROM abc, cde;`).raw({ columnNames: true })
//>  columns: ["a","b","c","c","d","e"]
//>  rows: [
//>    [1, 2, 3, 7, 8, 9],
//>    [1, 2, 3, 1, 2, 3],
//>    [4, 5, 6, 7, 8, 9],
//>    [4, 5, 6, 1, 2, 3],
//>  ]

I definitely think we can do better in future but for now this at least makes it possible to return the column names, even if it's slightly unergonomic.

@geelen geelen force-pushed the glen/d1-api-adding-results-format branch from 478c4fe to 4f31e28 Compare January 31, 2024 04:08
@geelen geelen marked this pull request as ready for review January 31, 2024 04:14
@geelen geelen requested review from a team as code owners January 31, 2024 04:14
@geelen geelen requested review from fhanau and jp4a50 January 31, 2024 04:14
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but this file is kinda similar to Miniflare's d1/database.worker.ts. I wonder if there's some way we could avoid duplicating these implementations? Should Miniflare's workers maybe live inside workerd?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah this is absolutely doing the same thing, which means miniflare needs to be updated as well. 🤔

Would be great to combine somehow, I'll start a new convo.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@geelen geelen force-pushed the glen/d1-api-adding-results-format branch from 4f31e28 to 1bd2170 Compare February 1, 2024 02:40
This will let us test the correctness of the D1 API much more deeply
…y interchange format

Also massively upgraded the test harness to verify this behaviour (also run against D1's internal test suite)
Copy link
Contributor

@mrbbot mrbbot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, with one comment: do any of these changes need to be gated behind a compatibility flag? Specifically thinking about raw() now potentially returning an array of arrays with more values/different indexes for existing values.

@@ -42,6 +63,7 @@ class D1Database {
return new D1PreparedStatement(this, query)
}

// DEPRECATED, TO BE REMOVED WITH NEXT BREAKING CHANGE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO dump() is safe to remove today, since it's a no-op for SRS-backed dbs

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I know but it's still a breaking change unless we've already shut down all alpha DBs. So I was going to leave it for now and fix it when we're forced to introduce a compat date.

@geelen
Copy link
Contributor Author

geelen commented Feb 2, 2024

Looks good, with one comment: do any of these changes need to be gated behind a compatibility flag? Specifically thinking about raw() now potentially returning an array of arrays with more values/different indexes for existing values.

Yeah I can see arguments either way, but the existing behaviour is definitely a bug (that very few people have actually hit) so I wanted to just roll it out silently. My main issue with a compat_date change is that we have to leave the old API in there alongside the fixed version, or make d1-api aware of the change and able to... reproduce the buggy result for older compat dates?

@geelen geelen merged commit 3c85053 into main Feb 2, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants