-
Notifications
You must be signed in to change notification settings - Fork 83
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix handling of multiple settlements in same block #447
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The query looks correct to me and well commented. Also the tests are very thorough. 👍
The only question I have is if that query is performance sensitive?
The new query looks considerably more complex than the old one and if performance matters here benchmarking the new query would make sense. I don't know how much wiggle room there is for improving the query (if it even is necessary) but maybe someone from the analytics team has some ideas.
From a theoretical big O point of view the new query is not more complex than the old one. All data is retrievable through indexes. Considering we previously did not have an index on the tx hash this change should make the query much faster than the old one. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Out of curiosity I used EXPLAIN
for the old and the new query. I think EXPLAIN ANALYZE
didn't make sense in my local setup because the DB was empty anyway.
According to the postgres execution planner the cost of executing these queries is expected to be very similar:
old
"QUERY PLAN"
"Nested Loop (cost=19.57..37.30 rows=7 width=32)"
" -> Hash Join (cost=19.43..35.14 rows=8 width=32)"
" Hash Cond: (t.block_number = s.block_number)"
" -> Seq Scan on trades t (cost=0.00..14.10 rows=410 width=40)"
" -> Hash (cost=19.38..19.38 rows=4 width=8)"
" -> Seq Scan on settlements s (cost=0.00..19.38 rows=4 width=8)"
" Filter: (tx_hash = '\xbdc6fc9d766ae2b936a43d76a81d9aac1f9c4adb4f04142db0336a953c34f393'::bytea)"
" -> Index Only Scan using orders_pkey on orders o (cost=0.14..0.27 rows=1 width=32)"
" Index Cond: (uid = t.order_uid)"
new
"QUERY PLAN"
"Nested Loop (cost=21.29..37.43 rows=1 width=32)"
" CTE settlement"
" -> Bitmap Heap Scan on settlements (cost=4.03..12.49 rows=4 width=16)"
" Recheck Cond: (tx_hash = '\xbdc6fc9d766ae2b936a43d76a81d9aac1f9c4adb4f04142db0336a953c34f393'::bytea)"
" -> Bitmap Index Scan on settlements_tx_hash (cost=0.00..4.03 rows=4 width=0)"
" Index Cond: (tx_hash = '\xbdc6fc9d766ae2b936a43d76a81d9aac1f9c4adb4f04142db0336a953c34f393'::bytea)"
" InitPlan 2 (returns $1)"
" -> CTE Scan on settlement (cost=0.00..0.08 rows=4 width=8)"
" InitPlan 5 (returns $5)"
" -> Aggregate (cost=8.33..8.34 rows=1 width=8)"
" InitPlan 3 (returns $2)"
" -> CTE Scan on settlement settlement_1 (cost=0.00..0.08 rows=4 width=8)"
" InitPlan 4 (returns $3)"
" -> CTE Scan on settlement settlement_2 (cost=0.00..0.08 rows=4 width=8)"
" -> Index Only Scan using settlements_pkey on settlements settlements_1 (cost=0.15..8.17 rows=1 width=8)"
" Index Cond: ((block_number = $2) AND (log_index < $3))"
" InitPlan 6 (returns $6)"
" -> CTE Scan on settlement settlement_3 (cost=0.00..0.08 rows=4 width=8)"
" -> Index Scan using trades_pkey on trades t (cost=0.15..8.17 rows=1 width=32)"
" Index Cond: ((block_number = $1) AND (log_index >= $5) AND (log_index <= $6))"
" -> Index Only Scan using orders_pkey on orders o (cost=0.14..8.16 rows=1 width=32)"
" Index Cond: (uid = t.order_uid)"
I'm not sure how much the results might change with a populated DB but the complexity of the new query indeed appears to be very well mitigated by the new index.
before update old query:
after update old query:
after update new query:
|
Nice! Looks like the new query is way faster than the old one pre-index. |
Fixes #89 .
Test Plan
CI, adapted test