This repository was archived by the owner on Apr 26, 2024. It is now read-only.

Use a query that postgresql optimises better for get_events_around #906

Merged: 5 commits into develop from markjh/faster_events_around on Jul 5, 2016

Conversation

@NegativeMjark (Contributor)

No description provided.

@erikjohnston (Member)

LGTM

@NegativeMjark (Contributor, Author)

@erikjohnston Having looked at how postgres handles the UNION ALL version, I'm less sure that it's actually doing what we want here. It turns out that postgres really wants you to format comparisons as (a,b) < (x,y) to use multi-column indexes.

I've switched everything to use lower_bound/upper_bound from streams.py and made those functions switch on the database engine to decide which form of SQL to use :(
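As a rough illustration of that switch (a sketch only: the names, signature, and clause text here are assumptions for demonstration, not the actual streams.py implementation):

```python
from collections import namedtuple

# Hypothetical token type standing in for the real stream token.
Token = namedtuple("Token", ["topological", "stream"])

def upper_bound(token, is_postgres):
    """Return a WHERE fragment selecting rows strictly before `token`."""
    if is_postgres:
        # Postgres matches a row-value comparison like (a,b) < (x,y)
        # directly against a multi-column btree index.
        return "(topological_ordering, stream_ordering) < (%d, %d)" % (
            token.topological, token.stream,
        )
    # Other engines get the expanded OR form instead.
    return (
        "(topological_ordering < %d"
        " OR (topological_ordering = %d AND stream_ordering < %d))"
        % (token.topological, token.topological, token.stream)
    )
```

Both branches express the same predicate; only the spelling differs per engine.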

@erikjohnston (Member)

Having looked at how postgres handles the UNION ALL version I'm less sure that it's actually doing what we want here.

Can you elaborate?

  token.topological, "topological_ordering",
- token.stream, "stream_ordering",
+ token.stream, inclusive, "stream_ordering",
@erikjohnston (Member)

Is this not vulnerable to SQL injection?

@NegativeMjark (Contributor, Author)

It would be if inclusive was ever specified by the user.

@NegativeMjark (Contributor, Author)

I've made inclusive a boolean to make it a bit clearer. d44d11d
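The shape of that fix can be sketched as follows (a hypothetical helper, not the actual Synapse code): with `inclusive` as a bool, only one of two fixed operator strings can ever reach the SQL text, and the value itself travels as a bind parameter.

```python
def stream_bound(column, value, inclusive):
    """Build a bound clause; `inclusive` must be a bool, never user text.

    Only "<=" or "<" can appear in the generated SQL, and `value` is
    passed as a bind parameter, so no user-controlled string is ever
    interpolated into the query.
    """
    if not isinstance(inclusive, bool):
        raise TypeError("inclusive must be a bool")
    op = "<=" if inclusive else "<"
    return "%s %s ?" % (column, op), (value,)
```

For example, `stream_bound("stream_ordering", 42, False)` yields `("stream_ordering < ?", (42,))`.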

@NegativeMjark (Contributor, Author) commented Jul 5, 2016

Can you elaborate?

When I ran that query in a test harness it resulted in a scan over half the index.

DROP TABLE IF EXISTS t;
CREATE TABLE t (c INT, a INT, b INT);

INSERT INTO t (c,a,b) SELECT 0, data.*, data.* FROM
    (SELECT * FROM generate_series(1,1000000)) AS data;

CREATE INDEX t_c_a_b ON t USING btree (c,a,b);

ANALYZE t;

EXPLAIN ANALYSE
        SELECT a,b FROM t WHERE c = 0 AND a < 500000
    UNION ALL
        SELECT a,b FROM t WHERE c = 0 AND a = 500000 AND b < 100
    ORDER BY a DESC, b DESC LIMIT 1;

Resulted in:

                                                             QUERY PLAN                                                             
------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=24416.59..24416.59 rows=1 width=8) (actual time=123.093..123.094 rows=1 loops=1)
   ->  Sort  (cost=24416.59..25663.78 rows=498876 width=8) (actual time=123.093..123.093 rows=1 loops=1)
         Sort Key: t.a DESC, t.b DESC
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Append  (cost=0.00..21922.21 rows=498876 width=8) (actual time=0.009..91.581 rows=499999 loops=1)
               ->  Seq Scan on t  (cost=0.00..16925.00 rows=498875 width=8) (actual time=0.009..71.667 rows=499999 loops=1)
                     Filter: (a < 500000)
                     Rows Removed by Filter: 500001
               ->  Index Only Scan using t_a_b on t t_1  (cost=0.42..8.45 rows=1 width=8) (actual time=0.009..0.009 rows=0 loops=1)
                     Index Cond: ((a = 500000) AND (b < 100))
                     Heap Fetches: 0
 Planning time: 0.050 ms
 Execution time: 123.109 ms
(13 rows)

@erikjohnston (Member)

This presumably still has terrible performance on SQLite?

@NegativeMjark (Contributor, Author)

This presumably still has terrible performance on SQLite?

Yes. Interestingly SQLite does a reasonable job of optimising the UNION ALL version of the query.

sqlite> EXPLAIN query plan  
   ...> SELECT a,b FROM t WHERE c = 0 AND (a < 50000 OR (a = 50000 AND b < 100))
   ...> ORDER BY a DESC, b DESC LIMIT 1;
0|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=?)
sqlite> EXPLAIN query plan
   ...>         SELECT a,b FROM t WHERE c = 0 AND a < 50000
   ...>     UNION ALL
   ...>         SELECT a,b FROM t WHERE c = 0 AND a = 50000 AND b < 100
   ...>     ORDER BY a DESC, b DESC LIMIT 1;
1|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a<?)
2|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a=? AND b<?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)

@NegativeMjark (Contributor, Author)

So for a sqlite3 database only the UNION ALL form has the desired performance :(

sqlite> EXPLAIN query plan SELECT a,b FROM t WHERE c = 0 AND (a < 500000 OR (a = 500000 AND b < 100)) ORDER BY a DESC, b DESC LIMIT 1;
0|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=?)

sqlite> EXPLAIN query plan SELECT a,b FROM t WHERE (c = 0 AND a < 500000) OR (c = 0 AND a = 500000 AND b < 100) ORDER BY a DESC, b DESC LIMIT 1; 
0|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a<?)
0|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a=? AND b<?)
0|0|0|USE TEMP B-TREE FOR ORDER BY

sqlite> EXPLAIN query plan SELECT a,b FROM t WHERE (c = 0 AND a < 500000) UNION ALL SELECT a,b FROM t WHERE (c = 0 AND a = 500000 AND b < 100) ORDER BY a DESC, b DESC LIMIT 1;
1|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a<?)
2|0|0|SEARCH TABLE t USING COVERING INDEX t_c_a_b (c=? AND a=? AND b<?)
0|0|0|COMPOUND SUBQUERIES 1 AND 2 (UNION ALL)
In [47]: %time f("SELECT a,b FROM t WHERE c = 0 AND (a < 500000 OR (a = 500000 AND b < 100)) ORDER BY a DESC, b DESC LIMIT 1;", ())
CPU times: user 61.6 ms, sys: 7 ms, total: 68.6 ms
Wall time: 68 ms
Out[47]: [(499999, 499999)]

In [48]: %time f("SELECT a,b FROM t WHERE (c = 0 AND a < 500000) OR (c = 0 AND a = 500000 AND b < 100) ORDER BY a DESC, b DESC LIMIT 1;", ())
CPU times: user 279 ms, sys: 8.64 ms, total: 287 ms
Wall time: 287 ms
Out[48]: [(499999, 499999)]

In [49]: %time f("SELECT a,b FROM t WHERE (c = 0 AND a < 500000) UNION ALL SELECT a,b FROM t WHERE (c = 0 AND a = 500000 AND b < 100) ORDER BY a DESC, b DESC LIMIT 1;", ())
CPU times: user 368 us, sys: 75 us, total: 443 us
Wall time: 268 us
Out[49]: [(499999, 499999)]
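A quick way to sanity-check that the three formulations agree on their answer, scaled down to run anywhere (this uses SQLite's row-value comparison for the third form, which assumes SQLite >= 3.15; the row-value spelling is the one postgres prefers per the discussion above):

```python
import sqlite3

# Miniature version of the test harness: c is constant, a and b mirror
# each other, and we look for the last row strictly before (500, 100).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c INT, a INT, b INT)")
conn.executemany("INSERT INTO t VALUES (0, ?, ?)",
                 [(i, i) for i in range(1, 1001)])
conn.execute("CREATE INDEX t_c_a_b ON t (c, a, b)")

queries = {
    "or":    "SELECT a,b FROM t WHERE c = 0 AND (a < 500 OR (a = 500 AND b < 100))"
             " ORDER BY a DESC, b DESC LIMIT 1",
    "union": "SELECT a,b FROM t WHERE c = 0 AND a < 500"
             " UNION ALL SELECT a,b FROM t WHERE c = 0 AND a = 500 AND b < 100"
             " ORDER BY a DESC, b DESC LIMIT 1",
    "row":   "SELECT a,b FROM t WHERE c = 0 AND (a, b) < (500, 100)"
             " ORDER BY a DESC, b DESC LIMIT 1",
}
results = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}
assert all(r == [(499, 499)] for r in results.values())
```

All three return the same row; only the query plans differ between engines.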

@NegativeMjark (Contributor, Author)

Complete test harness for postgresql, FWIW:

btree_desc.sql.txt

@NegativeMjark (Contributor, Author)

@erikjohnston PTAL

@erikjohnston (Member)

LGTM

@NegativeMjark NegativeMjark merged commit 04dee11 into develop Jul 5, 2016
@richvdh richvdh deleted the markjh/faster_events_around branch December 1, 2016 14:09