Fix handling of multiple settlements in same block #447

vkgnosis · 2022-08-11T12:36:05Z

Fixes #89.

This differs somewhat from the sketch in the comment because that approach wasn't valid SQL and wasn't entirely correct.
The previous test wrongly inserted the settlement event before the trade event.
We also forgot to add an index on the settlements table by tx hash, which I have added too. The index doesn't affect the correctness of the code change.
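
To illustrate the problem being fixed, here is a minimal sketch using SQLite through Python. The schema, the sample rows, and the exact query text are assumptions made for illustration; the real query lives in the PR diff and targets Postgres. The sketch only mirrors the shape visible in the query plans in this thread: the old query joins trades to a settlement by block number alone, while the new one limits trades to the log index range between the previous settlement in the same block and the settlement itself.

```python
import sqlite3

# Hypothetical, simplified schema; column names follow the query plans in this thread.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE settlements (
        block_number INTEGER NOT NULL,
        log_index    INTEGER NOT NULL,
        tx_hash      BLOB    NOT NULL,
        PRIMARY KEY (block_number, log_index)
    );
    CREATE TABLE trades (
        block_number INTEGER NOT NULL,
        log_index    INTEGER NOT NULL,
        order_uid    BLOB    NOT NULL,
        PRIMARY KEY (block_number, log_index)
    );
    CREATE INDEX settlements_tx_hash ON settlements (tx_hash);
""")

# Two settlements in block 10. Trade events precede their settlement event.
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)",
                 [(10, 0, b"order-a"), (10, 1, b"order-b"), (10, 3, b"order-c")])
conn.executemany("INSERT INTO settlements VALUES (?, ?, ?)",
                 [(10, 2, b"tx-1"), (10, 4, b"tx-2")])

# Old shape: match on block number only, so every trade in the block is returned.
BY_BLOCK_ONLY = """
SELECT t.order_uid
FROM trades t JOIN settlements s ON t.block_number = s.block_number
WHERE s.tx_hash = ?
ORDER BY t.log_index
"""

# New shape (assumed): only trades after the previous settlement in the block
# and before this settlement's own event.
BY_LOG_INDEX_RANGE = """
WITH settlement AS (
    SELECT block_number, log_index FROM settlements WHERE tx_hash = ?
),
previous AS (
    SELECT COALESCE(MAX(s.log_index), -1) AS log_index
    FROM settlements s, settlement
    WHERE s.block_number = settlement.block_number
      AND s.log_index < settlement.log_index
)
SELECT t.order_uid
FROM trades t, settlement, previous
WHERE t.block_number = settlement.block_number
  AND t.log_index > previous.log_index
  AND t.log_index < settlement.log_index
ORDER BY t.log_index
"""

def orders(query, tx_hash):
    """Return the order uids that the given query attributes to tx_hash."""
    return [row[0] for row in conn.execute(query, (tx_hash,))]

old_tx1 = orders(BY_BLOCK_ONLY, b"tx-1")
new_tx1 = orders(BY_LOG_INDEX_RANGE, b"tx-1")
new_tx2 = orders(BY_LOG_INDEX_RANGE, b"tx-2")
```

With this sample data the block-only join attributes all three trades to `tx-1`, whereas the range-based query splits them between the two settlements.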

Test Plan

CI, adapted test

database/sql/V028__index_settlements_tx_hash.sql

MartinquaXD

The query looks correct to me and well commented. Also the tests are very thorough. 👍

The only question I have is whether that query is performance sensitive.
The new query looks considerably more complex than the old one, so if performance matters here, benchmarking the new query would make sense. I don't know how much room there is for improving the query (if that is even necessary), but maybe someone from the analytics team has some ideas.

vkgnosis · 2022-08-12T09:11:59Z

From a theoretical big-O point of view, the new query is not more complex than the old one. All data is retrievable through indexes. Considering that we previously did not have an index on the tx hash, this change should make the query much faster than the old one.
This all assumes the Postgres query planner doesn't do something unexpected. That can be checked with EXPLAIN ANALYZE, but I can't properly do that until the index is in the real database (or I'd need to download the real database, which I don't know how to do).
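
As a rough illustration of checking whether the planner picks up a tx hash index, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` through Python as a stand-in. This is an assumption made so the example runs anywhere: the real check is Postgres `EXPLAIN ANALYZE` against the production data, and SQLite's planner and output format differ. The table layout and the `plan_details` helper are hypothetical.

```python
import sqlite3

def plan_details(conn, sql, params=()):
    """Return the human-readable detail column of SQLite's query plan."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE settlements (
        block_number INTEGER NOT NULL,
        log_index    INTEGER NOT NULL,
        tx_hash      BLOB    NOT NULL,
        PRIMARY KEY (block_number, log_index)
    )
""")

LOOKUP = "SELECT block_number, log_index FROM settlements WHERE tx_hash = ?"

# Without an index on tx_hash the lookup has to scan the table.
plan_before = plan_details(conn, LOOKUP, (b"tx-1",))

# Same idea as the V028 migration: index the settlements table by tx hash.
conn.execute("CREATE INDEX settlements_tx_hash ON settlements (tx_hash)")
plan_after = plan_details(conn, LOOKUP, (b"tx-1",))

uses_index_before = any("settlements_tx_hash" in d for d in plan_before)
uses_index_after = any("settlements_tx_hash" in d for d in plan_after)

print(plan_before)
print(plan_after)
```

The exact wording of the plan lines depends on the SQLite version, so the sketch only checks whether the index name appears in the plan.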

MartinquaXD

Out of curiosity, I ran EXPLAIN on the old and the new query. EXPLAIN ANALYZE didn't make sense in my local setup because the DB was empty anyway.
According to the Postgres query planner, the estimated costs of the two queries are very similar:

old

"QUERY PLAN"
"Nested Loop  (cost=19.57..37.30 rows=7 width=32)"
"  ->  Hash Join  (cost=19.43..35.14 rows=8 width=32)"
"        Hash Cond: (t.block_number = s.block_number)"
"        ->  Seq Scan on trades t  (cost=0.00..14.10 rows=410 width=40)"
"        ->  Hash  (cost=19.38..19.38 rows=4 width=8)"
"              ->  Seq Scan on settlements s  (cost=0.00..19.38 rows=4 width=8)"
"                    Filter: (tx_hash = '\xbdc6fc9d766ae2b936a43d76a81d9aac1f9c4adb4f04142db0336a953c34f393'::bytea)"
"  ->  Index Only Scan using orders_pkey on orders o  (cost=0.14..0.27 rows=1 width=32)"
"        Index Cond: (uid = t.order_uid)"

new

"QUERY PLAN"
"Nested Loop  (cost=21.29..37.43 rows=1 width=32)"
"  CTE settlement"
"    ->  Bitmap Heap Scan on settlements  (cost=4.03..12.49 rows=4 width=16)"
"          Recheck Cond: (tx_hash = '\xbdc6fc9d766ae2b936a43d76a81d9aac1f9c4adb4f04142db0336a953c34f393'::bytea)"
"          ->  Bitmap Index Scan on settlements_tx_hash  (cost=0.00..4.03 rows=4 width=0)"
"                Index Cond: (tx_hash = '\xbdc6fc9d766ae2b936a43d76a81d9aac1f9c4adb4f04142db0336a953c34f393'::bytea)"
"  InitPlan 2 (returns $1)"
"    ->  CTE Scan on settlement  (cost=0.00..0.08 rows=4 width=8)"
"  InitPlan 5 (returns $5)"
"    ->  Aggregate  (cost=8.33..8.34 rows=1 width=8)"
"          InitPlan 3 (returns $2)"
"            ->  CTE Scan on settlement settlement_1  (cost=0.00..0.08 rows=4 width=8)"
"          InitPlan 4 (returns $3)"
"            ->  CTE Scan on settlement settlement_2  (cost=0.00..0.08 rows=4 width=8)"
"          ->  Index Only Scan using settlements_pkey on settlements settlements_1  (cost=0.15..8.17 rows=1 width=8)"
"                Index Cond: ((block_number = $2) AND (log_index < $3))"
"  InitPlan 6 (returns $6)"
"    ->  CTE Scan on settlement settlement_3  (cost=0.00..0.08 rows=4 width=8)"
"  ->  Index Scan using trades_pkey on trades t  (cost=0.15..8.17 rows=1 width=32)"
"        Index Cond: ((block_number = $1) AND (log_index >= $5) AND (log_index <= $6))"
"  ->  Index Only Scan using orders_pkey on orders o  (cost=0.14..8.16 rows=1 width=32)"
"        Index Cond: (uid = t.order_uid)"

I'm not sure how much the results might change with a populated DB, but the complexity of the new query does appear to be very well mitigated by the new index.

vkgnosis · 2022-08-12T10:12:01Z

Old query, before the update:

 Nested Loop  (cost=1000.84..5843.47 rows=1 width=57) (actual time=159.882..159.979 rows=1 loops=1)
   ->  Nested Loop  (cost=1000.42..5843.01 rows=1 width=57) (actual time=159.865..159.948 rows=3 loops=1)
         ->  Gather  (cost=1000.00..5832.77 rows=1 width=8) (actual time=159.836..159.916 rows=1 loops=1)
               Workers Planned: 2
               Workers Launched: 2
               ->  Parallel Seq Scan on settlements s  (cost=0.00..4832.67 rows=1 width=8) (actual time=66.547..67.928 rows=0 loops=3)
                     Filter: (tx_hash = '\x9e60d145aacf8d1f2ad46231b636a8a10a19f87498a31e1bb10ab4a1ceb41d2c'::bytea)
                     Rows Removed by Filter: 83661
         ->  Index Scan using trades_pkey on trades t  (cost=0.42..10.22 rows=2 width=65) (actual time=0.023..0.024 rows=3 loops=1)
               Index Cond: (block_number = s.block_number)
   ->  Index Only Scan using orders_pkey on orders o  (cost=0.41..0.46 rows=1 width=57) (actual time=0.008..0.008 rows=0 loops=3)
         Index Cond: (uid = t.order_uid)
         Heap Fetches: 1
 Planning Time: 0.506 ms
 Execution Time: 160.032 ms

Old query, after the update:

 Nested Loop  (cost=0.83..18.71 rows=1 width=57) (actual time=0.023..0.035 rows=1 loops=1)
   ->  Nested Loop  (cost=0.42..18.25 rows=1 width=57) (actual time=0.015..0.018 rows=3 loops=1)
         ->  Index Scan using settlements_tx_hash on settlements s  (cost=0.00..8.02 rows=1 width=8) (actual time=0.009..0.009 rows=1 loops=1)
               Index Cond: (tx_hash = '\x9e60d145aacf8d1f2ad46231b636a8a10a19f87498a31e1bb10ab4a1ceb41d2c'::bytea)
         ->  Index Scan using trades_pkey on trades t  (cost=0.42..10.22 rows=2 width=65) (actual time=0.005..0.007 rows=3 loops=1)
               Index Cond: (block_number = s.block_number)
   ->  Index Only Scan using orders_pkey on orders o  (cost=0.41..0.46 rows=1 width=57) (actual time=0.005..0.005 rows=0 loops=3)
         Index Cond: (uid = t.order_uid)
         Heap Fetches: 1
 Planning Time: 0.348 ms
 Execution Time: 0.054 ms

New query, after the update:

 Nested Loop  (cost=17.38..33.43 rows=1 width=57) (actual time=0.046..0.048 rows=0 loops=1)
   CTE settlement
     ->  Index Scan using settlements_tx_hash on settlements  (cost=0.00..8.02 rows=1 width=16) (actual time=0.008..0.008 rows=1 loops=1)
           Index Cond: (tx_hash = '\x9e60d145aacf8d1f2ad46231b636a8a10a19f87498a31e1bb10ab4a1ceb41d2c'::bytea)
   InitPlan 2 (returns $1)
     ->  CTE Scan on settlement  (cost=0.00..0.02 rows=1 width=8) (actual time=0.010..0.010 rows=1 loops=1)
   InitPlan 5 (returns $5)
     ->  Aggregate  (cost=8.48..8.49 rows=1 width=8) (actual time=0.013..0.014 rows=1 loops=1)
           InitPlan 3 (returns $2)
             ->  CTE Scan on settlement settlement_1  (cost=0.00..0.02 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
           InitPlan 4 (returns $3)
             ->  CTE Scan on settlement settlement_2  (cost=0.00..0.02 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
           ->  Index Only Scan using settlements_pkey on settlements settlements_1  (cost=0.42..8.44 rows=1 width=8) (actual time=0.010..0.011 rows=1 loops=1)
                 Index Cond: ((block_number = $2) AND (log_index < $3))
                 Heap Fetches: 1
   InitPlan 6 (returns $6)
     ->  CTE Scan on settlement settlement_3  (cost=0.00..0.02 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=1)
   ->  Index Scan using trades_pkey on trades t  (cost=0.42..8.44 rows=1 width=57) (actual time=0.034..0.035 rows=2 loops=1)
         Index Cond: ((block_number = $1) AND (log_index >= $5) AND (log_index <= $6))
   ->  Index Only Scan using orders_pkey on orders o  (cost=0.41..8.43 rows=1 width=57) (actual time=0.005..0.005 rows=0 loops=2)
         Index Cond: (uid = t.order_uid)
         Heap Fetches: 0
 Planning Time: 0.344 ms
 Execution Time: 0.084 ms

nlordell · 2022-08-12T13:56:31Z

Nice! Looks like the new query is way faster than the old one pre-index.
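
For a rough sense of scale, the execution times reported by EXPLAIN ANALYZE above can be compared directly. The numbers are copied from the three plans; the dictionary labels are only illustrative, and each figure comes from a single run, so the ratios are indicative rather than a benchmark.

```python
# Execution times in milliseconds, taken from the EXPLAIN ANALYZE output above.
execution_ms = {
    "old query, before update": 160.032,
    "old query, after update": 0.054,
    "new query, after update": 0.084,
}

baseline = execution_ms["old query, before update"]

# How many times faster each variant ran compared with the old query before the update.
speedup = {label: baseline / ms for label, ms in execution_ms.items()}

# Cost of the extra correctness logic once the index exists.
new_vs_old_indexed = (execution_ms["new query, after update"]
                      / execution_ms["old query, after update"])

for label, factor in speedup.items():
    print(f"{label}: {factor:.0f}x")
print(f"new vs. old query with index: {new_vs_old_indexed:.2f}x the time")
```

On these single measurements the new query ran roughly 1900 times faster than the old query before the update, and about 1.6 times the duration of the old query once the index exists.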

Fix handling of multiple settlements in same block

d269aa2

vkgnosis requested a review from a team as a code owner August 11, 2022 12:36

nlordell reviewed Aug 11, 2022

database/sql/V028__index_settlements_tx_hash.sql

MartinquaXD reviewed Aug 12, 2022

nlordell approved these changes Aug 12, 2022

MartinquaXD approved these changes Aug 12, 2022

vkgnosis enabled auto-merge (rebase) August 12, 2022 09:48

Merge branch 'main' into tx-sql

68ce1d4

vkgnosis disabled auto-merge August 12, 2022 09:48

vkgnosis enabled auto-merge (squash) August 12, 2022 09:48

vkgnosis merged commit 0bf98bf into main Aug 12, 2022

vkgnosis deleted the tx-sql branch August 12, 2022 09:52

github-actions bot locked and limited conversation to collaborators Aug 12, 2022
