Using index for NULL values is slower than full table scan (not using index) #62963
Comments
FWIW: Deleting and recreating the index appears to have made the "WHERE grouping_id IS NULL" queries fast again. Not a great solution, but a solution.
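(For concreteness, a sketch of that workaround. The index name comes from the queries later in this issue; its exact column list is an assumption:)

```sql
-- Hypothetical reconstruction of the drop-and-recreate workaround.
-- The index name appears in the queries below; the (grouping_id, digest)
-- column list is assumed from the index's name, not from the issue.
DROP INDEX TiledTraceDigests@grouping_digest_idx;
CREATE INDEX grouping_digest_idx ON TiledTraceDigests (grouping_id, digest);
```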
These queries should result in a contradiction when we generate index constraints (since the column is defined as …)
Oh, sorry, I'd better clarify something. The …
@kjlubick Are you running multiple …
Yes, there would have been multiple …
FWIW, those …
Contention may be the culprit - scanning for rows …
Performance appears normal after recreating the index: …
There are currently about 22 million rows in TiledTraceDigests as I write this.
Another possibility is that you ran into #54029. If so, that first … Even if it's unrelated to #54029, I'd suggest reducing the number of rows updated in each batch to around 10k. You should get more consistent performance by doing so.
@kjlubick I'm going to close this issue for now because it's unlikely we'll get to the bottom of this unless you encounter it again. Our best guess is that it's related to #54029, and reducing the batch size of the updates to ~10k should mitigate that. Please leave a comment if you see it again, and we can try to diagnose it.
Describe the problem
I have a table (TiledTraceDigests) with ~20 million rows. I realized the table needed another column (grouping_id) and an index on that column, so I added them with ALTER TABLE and then CREATE INDEX.
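(Roughly the following. The BYTES type is an assumption based on the byte-string literal in the queries below; only the table, column, and index names come from the issue:)

```sql
-- Sketch of the schema change described above. BYTES is assumed from
-- the x'...' literal used later; the index may have included more
-- columns than shown here.
ALTER TABLE TiledTraceDigests ADD COLUMN grouping_id BYTES;
CREATE INDEX grouping_digest_idx ON TiledTraceDigests (grouping_id);
```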
I then needed to fill in this column with data by looking it up in another table (Traces), so I ran several updates like:
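(The exact statement isn't preserved in this copy; it was something along these lines, with the join column between the two tables assumed:)

```sql
-- Hypothetical shape of the backfill statement. The trace_id join
-- column is an assumption for illustration; table and column names
-- otherwise come from the issue text.
UPDATE TiledTraceDigests
SET grouping_id = Traces.grouping_id
FROM Traces
WHERE TiledTraceDigests.trace_id = Traces.trace_id
  AND TiledTraceDigests.grouping_id IS NULL
LIMIT 1000000;
```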
The LIMIT of 1 million was to prevent the updates from taking too long or having to be retried if new data came in.
This went fine for the first 16 updates or so, each taking around 60 seconds. Then an update stalled, running for over 15 minutes before I killed it from the UI.
I tried running the same update with LIMIT 5; instead of completing in tens or hundreds of milliseconds, it took over 10 seconds (to update 5 rows).
I used EXPLAIN ANALYZE to see where the time was going. It was blocked on getting rows from TiledTraceDigests.
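(For reference, a minimal sketch of that check, using one of the slow queries below rather than the exact statement run:)

```sql
-- Prefixing a slow statement with EXPLAIN ANALYZE shows where the
-- time is being spent during execution.
EXPLAIN ANALYZE
  SELECT * FROM TiledTraceDigests@grouping_digest_idx
  WHERE grouping_id IS NULL LIMIT 10;
```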
Here are some interesting queries:

```sql
select * From TiledTraceDigests@grouping_digest_idx where grouping_id is null limit 10;
-- took 6.7 seconds

select * From TiledTraceDigests@primary where grouping_id is null limit 10;
-- took 5.3 seconds

select * From TiledTraceDigests@grouping_digest_idx where grouping_id = x'a181394e13962c65455837cbdd3a8da8' limit 10;
-- took 4 milliseconds (as I would expect)
```

It appears that querying the NULL portion of this index is very, very slow.
I first noticed this on v20.2.3, but the problem appears to persist after updating to v20.2.7.
Expected behavior
I expect querying a few rows from an index to be fast (milliseconds), not slower than avoiding use of the index.
Additional data / screenshots
SQL Schemas:
I've attached the zip file taken from Statement Diagnostics in the UI.
stmt-bundle-646243008118882309.zip
Environment:
- CockroachDB version: v20.2.7 (first noticed on v20.2.3)
- Client: cockroach sql
Additional context
What was the impact?
My new column is only partially filled out, and I'm not sure how long it will take me to finish.