Poor performance when filtering on more than one JSON property #136315
Labels
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-community
Originated from the community
T-sql-queries
SQL Queries Team
X-blathers-triaged
blathers was able to find an owner
Describe the problem
When working with JSON data types covered by a GIN or inverted index:
SELECT queries that include more than one property of a JSON column in the WHERE clause perform badly. The performance gets worse as more filters are added. Equivalent queries in Postgres perform better.
To Reproduce
Create the following test table, populate with 100k rows, and update the stats:
Run the following queries; note that the count in all cases is 4347:
WHERE with 2 filters
Now run these same queries again, prefixing each with EXPLAIN ANALYSE. Note that for the initial scan against the index test@doc_contents, the first two queries return actual row count: 4,347, while the last query returns actual row count: 8,694.
This same behaviour also occurs using the alternative syntax:
WHERE with 3 filters
The initial scan against test@doc_contents in the above query shows actual row count: 16,386, even though the number of rows returned by the query is only 334.
Expected behavior
As the filters are being ANDed together, the number of rows retrieved from the GIN index should decrease as additional filters are added to the WHERE clause, not increase.
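To make the expectation concrete, here is a toy Python model of an inverted index. It is only a sketch: the function names and the sample data are hypothetical, and it does not describe how CockroachDB or Postgres actually execute these queries. It compares the number of index entries read when every filter's posting list is scanned in full against the number of rows that survive the AND. The 8,694 figure reported above equals 2 × 4,347, which would be consistent with each filter's entries being read in full, but that reading is an assumption.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each (property, value) pair to the set of row ids that contain it."""
    index = defaultdict(set)
    for row_id, doc in rows.items():
        for key, value in doc.items():
            index[(key, value)].add(row_id)
    return index


def posting_list_stats(index, filters):
    """Return (entries_read, rows_matching) for a list of ANDed filters.

    entries_read counts every entry in every filter's posting list, i.e. the
    cost of scanning each list in full before combining them.
    rows_matching is the size of the intersection, i.e. the rows that
    actually satisfy all filters.
    """
    postings = [index.get(f, set()) for f in filters]
    entries_read = sum(len(p) for p in postings)
    rows_matching = len(set.intersection(*postings)) if postings else 0
    return entries_read, rows_matching


# Hypothetical data: 1,000 JSON-like documents with three properties.
rows = {i: {"a": i % 10, "b": i % 20, "c": i % 40} for i in range(1000)}
index = build_inverted_index(rows)

one_filter = posting_list_stats(index, [("a", 3)])
two_filters = posting_list_stats(index, [("a", 3), ("b", 3)])
three_filters = posting_list_stats(index, [("a", 3), ("b", 3), ("c", 3)])

for label, stats in [("1 filter", one_filter),
                     ("2 filters", two_filters),
                     ("3 filters", three_filters)]:
    print(f"{label}: entries read = {stats[0]}, rows matching = {stats[1]}")
```

In this model the matching row count shrinks with each added filter while the entries read grows. The behaviour this issue asks for is an initial index scan whose row count tracks the first number, not the second.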
Running the equivalent queries against a Postgres 16 database, you can see that the correct number of rows is returned by the initial index scan:
Additional data / screenshots
EXPLAIN ANALYSE output for CockroachDB:
cockroachdb-explain-analyse.txt
SQL to setup on PostgreSQL 16:
postgresql-setup-test.txt
EXPLAIN ANALYSE output for PostgreSQL:
postgresql-explain-analyse.txt
Environment:
cockroach sql
Additional context
This is a scenario I have created for testing, but it accurately describes a real problem I am seeing in production. While CockroachDB has JSON support, I am concerned about the performance of tables that use JSON datatypes in practice.
Jira issue: CRDB-44969