Fix handling of multiple settlements in same block #447

Merged
merged 2 commits into from
Aug 12, 2022

Conversation

vkgnosis
Contributor

Fixes #89 .

  • This differs somewhat from the sketch in the comment, because that approach wasn't valid SQL and wasn't entirely correct.
  • The previous test wrongly inserted the settlement event before the trade event.
  • We had forgotten to add an index on the settlements table by tx hash, which I have now added too. This doesn't affect the correctness of the code change.

Test Plan

CI, adapted test

@vkgnosis vkgnosis requested a review from a team as a code owner August 11, 2022 12:36
Contributor

@MartinquaXD MartinquaXD left a comment

The query looks correct to me and well commented. Also the tests are very thorough. 👍

The only question I have is whether that query is performance sensitive.
The new query looks considerably more complex than the old one, and if performance matters here, benchmarking the new query would make sense. I don't know how much wiggle room there is for improving the query (if that is even necessary), but maybe someone from the analytics team has some ideas.

@vkgnosis
Contributor Author

From a theoretical big-O point of view, the new query is no more complex than the old one. All data is retrievable through indexes. Considering that we previously did not have an index on the tx hash, this change should make the query much faster than the old one.
This all assumes the Postgres query planner doesn't do something unexpected. That can be checked with EXPLAIN ANALYZE, but I can't properly do that until the index exists in the real database (or I'd need to download the real database, which I don't know how to do).
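As a rough illustration of the "check what the planner does" step, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Postgres. SQLite's EXPLAIN QUERY PLAN is not the same as Postgres's EXPLAIN ANALYZE, and the table layout is an assumption inferred from the column and index names in the plans posted in this thread; only the index name settlements_tx_hash is taken from them.

```python
import sqlite3


def plan_for_tx_hash_lookup(conn):
    """Return SQLite's plan for looking up a settlement by tx hash."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT block_number, log_index FROM settlements WHERE tx_hash = ?",
        (b"\x00",),
    ).fetchall()
    # The last column of each row is the human-readable plan step.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
# Assumed schema: only the columns that appear in the posted plans.
conn.execute(
    """
    CREATE TABLE settlements (
        block_number INTEGER NOT NULL,
        log_index    INTEGER NOT NULL,
        tx_hash      BLOB    NOT NULL,
        PRIMARY KEY (block_number, log_index)
    )
    """
)

plan_before = plan_for_tx_hash_lookup(conn)  # no index on tx_hash yet
conn.execute("CREATE INDEX settlements_tx_hash ON settlements (tx_hash)")
plan_after = plan_for_tx_hash_lookup(conn)   # planner can now use the index

print("before:", plan_before)
print("after: ", plan_after)
```

Against the real database the equivalent check would be running EXPLAIN ANALYZE on the actual query and looking for an index scan on settlements_tx_hash instead of a sequential scan.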

Contributor

@MartinquaXD MartinquaXD left a comment

Out of curiosity, I ran EXPLAIN on the old and the new query. EXPLAIN ANALYZE didn't make sense in my local setup because the DB was empty anyway.
According to the Postgres query planner, the estimated costs of the two queries are very similar:

old
"QUERY PLAN"
"Nested Loop  (cost=19.57..37.30 rows=7 width=32)"
"  ->  Hash Join  (cost=19.43..35.14 rows=8 width=32)"
"        Hash Cond: (t.block_number = s.block_number)"
"        ->  Seq Scan on trades t  (cost=0.00..14.10 rows=410 width=40)"
"        ->  Hash  (cost=19.38..19.38 rows=4 width=8)"
"              ->  Seq Scan on settlements s  (cost=0.00..19.38 rows=4 width=8)"
"                    Filter: (tx_hash = '\xbdc6fc9d766ae2b936a43d76a81d9aac1f9c4adb4f04142db0336a953c34f393'::bytea)"
"  ->  Index Only Scan using orders_pkey on orders o  (cost=0.14..0.27 rows=1 width=32)"
"        Index Cond: (uid = t.order_uid)"
new
"QUERY PLAN"
"Nested Loop  (cost=21.29..37.43 rows=1 width=32)"
"  CTE settlement"
"    ->  Bitmap Heap Scan on settlements  (cost=4.03..12.49 rows=4 width=16)"
"          Recheck Cond: (tx_hash = '\xbdc6fc9d766ae2b936a43d76a81d9aac1f9c4adb4f04142db0336a953c34f393'::bytea)"
"          ->  Bitmap Index Scan on settlements_tx_hash  (cost=0.00..4.03 rows=4 width=0)"
"                Index Cond: (tx_hash = '\xbdc6fc9d766ae2b936a43d76a81d9aac1f9c4adb4f04142db0336a953c34f393'::bytea)"
"  InitPlan 2 (returns $1)"
"    ->  CTE Scan on settlement  (cost=0.00..0.08 rows=4 width=8)"
"  InitPlan 5 (returns $5)"
"    ->  Aggregate  (cost=8.33..8.34 rows=1 width=8)"
"          InitPlan 3 (returns $2)"
"            ->  CTE Scan on settlement settlement_1  (cost=0.00..0.08 rows=4 width=8)"
"          InitPlan 4 (returns $3)"
"            ->  CTE Scan on settlement settlement_2  (cost=0.00..0.08 rows=4 width=8)"
"          ->  Index Only Scan using settlements_pkey on settlements settlements_1  (cost=0.15..8.17 rows=1 width=8)"
"                Index Cond: ((block_number = $2) AND (log_index < $3))"
"  InitPlan 6 (returns $6)"
"    ->  CTE Scan on settlement settlement_3  (cost=0.00..0.08 rows=4 width=8)"
"  ->  Index Scan using trades_pkey on trades t  (cost=0.15..8.17 rows=1 width=32)"
"        Index Cond: ((block_number = $1) AND (log_index >= $5) AND (log_index <= $6))"
"  ->  Index Only Scan using orders_pkey on orders o  (cost=0.14..8.16 rows=1 width=32)"
"        Index Cond: (uid = t.order_uid)"

I'm not sure how much the results would change with a populated DB, but the added complexity of the new query indeed appears to be well mitigated by the new index.
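The index conditions in the two plans above hint at what changed: the old query matches trades to a settlement by block_number alone, while the new one also restricts log_index to the range between the previous settlement in the same block and the settlement itself. The PR's actual SQL is not shown in this thread, so the following is only a sketch reconstructed from those conditions. It runs on SQLite rather than Postgres, omits the join against orders, and assumes both the schema and the COALESCE(MAX(...), 0) form of the aggregate (the plan only says "Aggregate").

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema, inferred from the columns named in the posted plans.
conn.executescript(
    """
    CREATE TABLE settlements (
        block_number INTEGER NOT NULL,
        log_index    INTEGER NOT NULL,
        tx_hash      BLOB    NOT NULL,
        PRIMARY KEY (block_number, log_index)
    );
    CREATE TABLE trades (
        block_number INTEGER NOT NULL,
        log_index    INTEGER NOT NULL,
        order_uid    TEXT    NOT NULL,
        PRIMARY KEY (block_number, log_index)
    );
    CREATE INDEX settlements_tx_hash ON settlements (tx_hash);
    """
)

# Block 10 holds two settlements. Trade events come before the
# settlement event that concludes them.
conn.executemany(
    "INSERT INTO trades VALUES (?, ?, ?)",
    [(10, 0, "order_a"), (10, 1, "order_b"), (10, 3, "order_c")],
)
conn.executemany(
    "INSERT INTO settlements VALUES (?, ?, ?)",
    [(10, 2, b"\x01"), (10, 4, b"\x02")],
)

# Old style: join on block number only.
OLD_STYLE = """
SELECT t.order_uid
FROM trades t
JOIN settlements s ON s.block_number = t.block_number
WHERE s.tx_hash = ?
ORDER BY t.log_index
"""

# New style (hypothetical reconstruction): bound the trade's log index by
# the previous settlement in the block and by this settlement.
NEW_STYLE = """
WITH settlement AS (
    SELECT block_number, log_index FROM settlements WHERE tx_hash = ?
)
SELECT t.order_uid
FROM trades t
WHERE t.block_number = (SELECT block_number FROM settlement)
  AND t.log_index >= (
        SELECT COALESCE(MAX(s.log_index), 0)
        FROM settlements s
        WHERE s.block_number = (SELECT block_number FROM settlement)
          AND s.log_index < (SELECT log_index FROM settlement)
      )
  AND t.log_index <= (SELECT log_index FROM settlement)
ORDER BY t.log_index
"""


def order_uids(query, tx_hash):
    """Run one of the queries above and return the matching order UIDs."""
    return [row[0] for row in conn.execute(query, (tx_hash,))]


print(order_uids(OLD_STYLE, b"\x01"))  # every trade in the block
print(order_uids(NEW_STYLE, b"\x01"))  # only trades of the first settlement
print(order_uids(NEW_STYLE, b"\x02"))  # only trades of the second settlement
```

In this toy data the block-only join attributes all three trades to the first transaction, which is the kind of misattribution the PR title describes.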

@vkgnosis vkgnosis enabled auto-merge (rebase) August 12, 2022 09:48
@vkgnosis vkgnosis disabled auto-merge August 12, 2022 09:48
@vkgnosis vkgnosis enabled auto-merge (squash) August 12, 2022 09:48
@vkgnosis vkgnosis merged commit 0bf98bf into main Aug 12, 2022
@vkgnosis vkgnosis deleted the tx-sql branch August 12, 2022 09:52
@github-actions github-actions bot locked and limited conversation to collaborators Aug 12, 2022
@vkgnosis
Contributor Author

before update old query:

 Nested Loop  (cost=1000.84..5843.47 rows=1 width=57) (actual time=159.882..159.979 rows=1 loops=1)
   ->  Nested Loop  (cost=1000.42..5843.01 rows=1 width=57) (actual time=159.865..159.948 rows=3 loops=1)
         ->  Gather  (cost=1000.00..5832.77 rows=1 width=8) (actual time=159.836..159.916 rows=1 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on settlements s  (cost=0.00..4832.67 rows=1 width=8) (actual time=66.547..67.928 rows=0 loops=3)
                     Filter: (tx_hash = '\x9e60d145aacf8d1f2ad46231b636a8a10a19f87498a31e1bb10ab4a1ceb41d2c'::bytea)
                     Rows Removed by Filter: 83661
         ->  Index Scan using trades_pkey on trades t  (cost=0.42..10.22 rows=2 width=65) (actual time=0.023..0.024 rows=3 loops=1)
               Index Cond: (block_number = s.block_number)
   ->  Index Only Scan using orders_pkey on orders o  (cost=0.41..0.46 rows=1 width=57) (actual time=0.008..0.008 rows=0 loops=3)
         Index Cond: (uid = t.order_uid)
         Heap Fetches: 1
 Planning Time: 0.506 ms
 Execution Time: 160.032 ms

after update old query:

 Nested Loop  (cost=0.83..18.71 rows=1 width=57) (actual time=0.023..0.035 rows=1 loops=1)
   ->  Nested Loop  (cost=0.42..18.25 rows=1 width=57) (actual time=0.015..0.018 rows=3 loops=1)
         ->  Index Scan using settlements_tx_hash on settlements s  (cost=0.00..8.02 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=1)
               Index Cond: (tx_hash = '\x9e60d145aacf8d1f2ad46231b636a8a10a19f87498a31e1bb10ab4a1ceb41d2c'::bytea)
         ->  Index Scan using trades_pkey on trades t  (cost=0.42..10.22 rows=2 width=65) (actual time=0.005..0.007 rows=3 loops=1)
               Index Cond: (block_number = s.block_number)
   ->  Index Only Scan using orders_pkey on orders o  (cost=0.41..0.46 rows=1 width=57) (actual time=0.005..0.005 rows=0 loops=3)
         Index Cond: (uid = t.order_uid)
         Heap Fetches: 1
 Planning Time: 0.348 ms
 Execution Time: 0.054 ms

after update new query:

 Nested Loop  (cost=17.38..33.43 rows=1 width=57) (actual time=0.046..0.048 rows=0 loops=1)
   CTE settlement
     ->  Index Scan using settlements_tx_hash on settlements  (cost=0.00..8.02 rows=1 width=16) (actual time=0.008..0.008 rows=1 loops=1)
           Index Cond: (tx_hash = '\x9e60d145aacf8d1f2ad46231b636a8a10a19f87498a31e1bb10ab4a1ceb41d2c'::bytea)
   InitPlan 2 (returns $1)
     ->  CTE Scan on settlement  (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
   InitPlan 5 (returns $5)
     ->  Aggregate  (cost=8.48..8.49 rows=1 width=8) (actual time=0.013..0.014 rows=1 loops=1)
           InitPlan 3 (returns $2)
             ->  CTE Scan on settlement settlement_1  (cost=0.00..0.02 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
           InitPlan 4 (returns $3)
             ->  CTE Scan on settlement settlement_2  (cost=0.00..0.02 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
           ->  Index Only Scan using settlements_pkey on settlements settlements_1  (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.011 rows=1 loops=1)
                 Index Cond: ((block_number = $2) AND (log_index < $3))
                 Heap Fetches: 1
   InitPlan 6 (returns $6)
     ->  CTE Scan on settlement settlement_3  (cost=0.00..0.02 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
   ->  Index Scan using trades_pkey on trades t  (cost=0.42..8.44 rows=1 width=57) (actual time=0.034..0.035 rows=2 loops=1)
         Index Cond: ((block_number = $1) AND (log_index >= $5) AND (log_index <= $6))
   ->  Index Only Scan using orders_pkey on orders o  (cost=0.41..8.43 rows=1 width=57) (actual time=0.005..0.005 rows=0 loops=2)
         Index Cond: (uid = t.order_uid)
         Heap Fetches: 0
 Planning Time: 0.344 ms
 Execution Time: 0.084 ms

@nlordell
Contributor

Nice! Looks like the new query is way faster than the old one pre-index.
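To put rough numbers on "way faster", the three "Execution Time" values from the EXPLAIN ANALYZE output above can be compared directly. This is only a back-of-the-envelope sketch: each figure comes from a single run, so the ratios are indicative at best, and the labels assume that "update" refers to the deployment that added the index (the plans switch from a sequential scan to settlements_tx_hash).

```python
# Execution times in milliseconds, copied from the plans above.
execution_ms = {
    "old query, before index": 160.032,
    "old query, after index": 0.054,
    "new query, after index": 0.084,
}

baseline = execution_ms["old query, before index"]

# Speedup of each measurement relative to the old query without the index.
speedups = {label: baseline / ms for label, ms in execution_ms.items()}

for label, factor in speedups.items():
    print(f"{label}: {execution_ms[label]:.3f} ms, {factor:,.0f}x vs. baseline")
```

Both post-index measurements are roughly three orders of magnitude below the pre-index one, and the new query costs about 0.03 ms more than the old query once the index exists.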

Development

Successfully merging this pull request may close these issues.

Postgres: Appropriate Handling of Multiple Settlements in Single Block
3 participants