Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

repro(zql): not exists bug #3534

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from
Draft

repro(zql): not exists bug #3534

wants to merge 3 commits into from

Conversation

tantaman
Copy link
Contributor

@tantaman tantaman commented Jan 14, 2025

This reproduces a user report with queries that use NOT EXISTS.

Thread: https://discord.com/channels/830183651022471199/1326515508534579240/1326515834763612241
Summary of what they saw --

  1. Given the following query:
const query = newQuery(queryDelegate, schema.tables.TYL_trackable)
   .where(({not, exists}) =>
     not(exists('trackableGroup', q => q.where('group', '=', 'archived'))),
   )
   .related('trackableGroup');
  1. A single trackable row {id: '001', name: 'trackable 1'}.
  2. 0 trackableGroup rows.

After adding a trackableGroup row with group set to archived, the trackable row returned
by the query is removed. As expected.

When deleting the trackableGroup row, the trackable row is not returned by the query. This is unexpected. The row should be returned since it is no longer archived.

This repro shows that zql (not-exists.test.ts) does the correct thing but the pipeline-driver (pipeline-driver.repro.test.ts) is sending the wrong number of add and remove changes.

After the trackableGroup row is deleted from the server, the client is still left with 1 trackableGrouprow on their device.

Copy link

vercel bot commented Jan 14, 2025

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name Status Preview Comments Updated (UTC)
replicache-docs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 14, 2025 9:32pm
zbugs ✅ Ready (Inspect) Visit Preview 💬 Add feedback Jan 14, 2025 9:32pm

Copy link

github-actions bot commented Jan 14, 2025

🐰 Bencher Report

Branchmlaw/repro-not-exists
Testbedlocalhost
Click to view all benchmark results
BenchmarkFile SizeBenchmark Result
kilobytes (KB)
(Result Δ%)
Upper Boundary
kilobytes (KB)
(Limit %)
zero-package.tgz📈 view plot
🚷 view threshold
961.11
(-0.58%)
1,015.09
(94.68%)
zero.js📈 view plot
🚷 view threshold
181.15
(-2.36%)
194.81
(92.99%)
zero.js.br📈 view plot
🚷 view threshold
50.83
(-1.81%)
54.35
(93.52%)
🐰 View full continuous benchmarking report in Bencher

@grgbkr
Copy link
Contributor

grgbkr commented Jan 14, 2025

@tantaman Wow. Great job reproducing!

Any theories on where the bug is?

@tantaman
Copy link
Contributor Author

tantaman commented Jan 14, 2025

Any theories on where the bug is?

A related issue has to do with trying to get the client to keep rows so it can compute exists and not exists locally.

This is broken for NOT EXISTS when a row is added.

SELECT * FROM issue WHERE NOT EXISTS comment

A comment is added to issue 1, becoming the first comment for that issue. Issue 1 is then removed from the query results due to NOT EXISTS. Because issue 1 is a parent of the comment, the comment row will be removed too since the pipeline driver will remove all rows in relationships hanging off of the parent.

@tantaman
Copy link
Contributor Author

A potentially related issue has to do with trying to get the client to keep rows so it can compute exists and not exists locally.

This does present a problem although maybe not the exact problem the user described. If we are not sending down the rows to compute NOT EXISTS then the client will compute incorrect query results.

@tantaman
Copy link
Contributor Author

We'll be disabling not exists on the client for the time being.

Some ideas for fixes:

  1. Custom queries where the user specifies what to sync in their server side query via related and we do not ever sync exists clauses.
  2. Creating a synthetic query where the CVR is the source and the whereNotExists clause is appended.

Example of (1):

// client query:
z.issue.whereExists('assignee')

// server query
z.issue.whereExists('assignee').related('assignee', q => q.limit(2));
// ^- use related to sync what rows you'd like to use for the client to satisfy exists clauses

// todo: work out how this looks for `not exists`

Example of (2):

not exists means we need to fetch a row for everything that does not match. We can do this by looking at the data on device and getting related rows to define the "negative space" of not exists.

If we have issue.whereNotExists('assignee') then we need to fetch an assignee row for every issue row we have on device. This way the client can figure out which issues on device actually do not have an assignee.

To compute this for every row on device using the CVR synthetic source:

cvr.issue.related('assignee')

TODO: how does this work with many nested notExists?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants