Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Poor performance when filtering on more than one JSON property #136315

Open
cward-titanhq opened this issue Nov 27, 2024 · 2 comments
Open

Poor performance when filtering on more than one JSON property #136315

cward-titanhq opened this issue Nov 27, 2024 · 2 comments
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner

Comments

@cward-titanhq
Copy link

cward-titanhq commented Nov 27, 2024

Describe the problem

When working with JSON data types covered by a GIN or inverted index:
SELECT queries that include more than one property of a JSON column in the WHERE clause perform badly. The performance gets worse as more filters are added. Equivalent queries in Postgres perform better.

To Reproduce

Create the following test table, populate with 100k rows, and update the stats:

CREATE TABLE test (
  id UUID NOT NULL AS ((doc->>'id':::STRING)::UUID) STORED,
  doc JSONB NOT NULL,
  PRIMARY KEY (id),
  INVERTED INDEX doc_contents (doc)
);

INSERT INTO test (doc)
SELECT ('{"id" : "' || gen_random_uuid() || 
      '", "timestamp" : "' || (now() + (n || ' microseconds')::INTERVAL) || 
      '", "flag1" :  ' || (case when (n % 2 = 0) then 'true' else 'false' end) || 
      ', "flag2" :  ' || (case when (n % 23 = 0) then 'true' else 'false' end) || 
      ', "string1" : "testa' || (n % 13) || 
      '", "string2" : "testb' || (n % 23) || 
      '"}')::json
FROM
(select generate_series(1, 100000) as n) AS x;

ANALYSE test;

Run the following queries, note the count in all cases is 4347:

SELECT COUNT(*) FROM test WHERE doc->'flag2' = 'true'::JSONB;
SELECT COUNT(*) FROM test WHERE doc->'string2' = '"testb0"'::JSONB;
SELECT COUNT(*) FROM test WHERE doc->'flag2' = 'true'::JSONB AND doc->'string2' = '"testb0"'::JSONB;

WHERE with 2 filters

Now run these same queries again, prefixing with EXPLAIN ANALYSE . Note that for the initial scan against the index test@doc_contents, the first two queries return actual row count: 4,347, while the last query returns actual row count: 8,694.

This same behaviour also occurs using the alternative syntax:

EXPLAIN ANALYSE SELECT COUNT(*) FROM test WHERE doc @> '{"flag2": true, "string2": "testb0"}';

WHERE with 3 filters

EXPLAIN ANALYSE SELECT COUNT(*) FROM test WHERE doc->'flag2' = 'true'::JSONB AND doc->'string2' = '"testb0"'::JSONB AND doc->'string1'  = '"testa0"'::JSONB;

The initial scan against test@doc_contents in the above query shows actual row count: 16,386, even though the number of rows returned by the query is only 334.

Expected behavior

As the filters are being ANDed together, the number of rows retrieved from the GIN index should decrease as as additional filters are added to the WHERE clause, not increase.

Running the equivalent queries against a Postgres 16 database, you can see that the correct number of rows is returned by the initial index scan :

test=> EXPLAIN ANALYSE SELECT COUNT(*) FROM test WHERE doc @> '{"flag2": true, "string2": "testb0"}';
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2577.67..2577.68 rows=1 width=8) (actual time=7.121..7.122 rows=1 loops=1)
   ->  Bitmap Heap Scan on test  (cost=66.60..2572.62 rows=2020 width=0) (actual time=2.694..6.814 rows=4347 loops=1)
         Recheck Cond: (doc @> '{"flag2": true, "string2": "testb0"}'::jsonb)
         Heap Blocks: exact=2500
         ->  Bitmap Index Scan on test_doc_idx  (cost=0.00..66.10 rows=2020 width=0) (actual time=2.384..2.384 rows=4347 loops=1)
               Index Cond: (doc @> '{"flag2": true, "string2": "testb0"}'::jsonb)
 Planning Time: 0.115 ms
 Execution Time: 7.164 ms
(8 rows)

Additional data / screenshots

EXPLAIN ANALYSE output for CockroachDB:
cockroachdb-explain-analyse.txt

SQL to setup on PostgreSQL 16:
postgresql-setup-test.txt

EXPLAIN ANALYSE output for PostgreSQL:
postgresql-explain-analyse.txt

Environment:

  • CockroachDB version: v24.2.4
  • Server OS: Linux
  • Client app cockroach sql
[email protected]:26257/touchy-falcon-3164/defaultdb> select version();
                                                  version
-----------------------------------------------------------------------------------------------------------
  CockroachDB CCL v24.2.4 (x86_64-pc-linux-gnu, built 2024/10/14 17:20:56, go1.22.5 X:nocoverageredesign)
(1 row)     sql nodes: n1
            kv nodes: n2
Time: 9ms total (execution 0ms / network 8ms)

Additional context
This is a scenario I have created for testing, but it accurately describes a real problem I am seeing in production. While CockroachDB has JSON support, I am concerned about the performance of tables that use JSON datatypes in practice.

Jira issue: CRDB-44969

@cward-titanhq cward-titanhq added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Nov 27, 2024
Copy link

blathers-crl bot commented Nov 27, 2024

Hi @cward-titanhq, please add branch-* labels to identify which branch(es) this C-bug affects.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

Copy link

blathers-crl bot commented Nov 27, 2024

Hello, I am Blathers. I am here to help you get the issue triaged.

Hoot - a bug! Though bugs are the bane of my existence, rest assured the wretched thing will get the best of care here.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/kv (found keywords: kv)
  • @cockroachdb/sql-queries (found keywords: PLAN)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Nov 27, 2024
@rytaft rytaft added the T-sql-queries SQL Queries Team label Nov 27, 2024
@github-project-automation github-project-automation bot moved this to Triage in SQL Queries Nov 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. O-community Originated from the community T-sql-queries SQL Queries Team X-blathers-triaged blathers was able to find an owner
Projects
Status: Triage
Development

No branches or pull requests

2 participants