Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Suspiciously costly join query #5439

Closed
Ralith opened this issue Jun 12, 2019 · 4 comments
Closed

Suspiciously costly join query #5439

Ralith opened this issue Jun 12, 2019 · 4 comments
Labels
A-Performance Performance, both client-facing and admin-facing z-p2 (Deprecated Label)

Comments

@Ralith
Copy link
Contributor

Ralith commented Jun 12, 2019

Description

Examining postgres state during periods of high I/O use on a homeserver where debilitating I/O contention is a regular cause of downtime reveals a possible culprit. The query

SELECT COUNT(*) FROM state_events INNER JOIN events USING (room_id, event_id) WHERE room_id='...'

was observed to be consuming the bulk of my I/O bandwidth for periods of at least several seconds, possibly much longer. The query plan

synapse=# EXPLAIN SELECT COUNT(*) FROM state_events INNER JOIN events USING (room_id, event_id) WHERE room_id = 'foo';
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Aggregate  (cost=99963.21..99963.22 rows=1 width=0)
   ->  Nested Loop  (cost=0.56..99963.20 rows=5 width=0)
         ->  Seq Scan on state_events  (cost=0.00..70783.26 rows=3445 width=64)
               Filter: (room_id = 'foo'::text)
         ->  Index Scan using events_event_id_key on events  (cost=0.56..8.46 rows=1 width=64)
               Index Cond: (event_id = state_events.event_id)
               Filter: (room_id = 'foo'::text)

suggests to my inexpert analysis that the entire state_events table is being read into memory, which certainly seems like an unreasonable cost. Should state_events have an index on room_id?

#4877 came to mind, but between myself and @richvdh we did not identify any obvious schema errors.

Steps to reproduce

  • Watch iotop for high I/O throughput from postgres (perhaps after startup?)
  • Run SELECT * FROM pg_stat_activity WHERE state = 'active'; to observe the query in question
  • Run EXPLAIN SELECT COUNT(*) FROM state_events INNER JOIN events USING (room_id, event_id) WHERE room_id = 'foo'; to observe costly query plan

Version information

  • Homeserver: ralith.com
  • Version: 1.0.0
  • Install method: NixOS module
  • Platform: Low-end dedicated server
@Ralith
Copy link
Contributor Author

Ralith commented Jun 12, 2019

An artificial test on the Synapse Admins room took 42 seconds to complete, saturating I/O for most of it:

synapse=# EXPLAIN ANALYZE SELECT COUNT(*) FROM state_events INNER JOIN events USING (room_id, event_id) WHERE room_id = '!HsxjoYRFsDtWBgDQPh:matrix.org';
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=99963.21..99963.22 rows=1 width=0) (actual time=42107.331..42107.331 rows=1 loops=1)
   ->  Nested Loop  (cost=0.56..99963.20 rows=5 width=0) (actual time=909.018..42096.491 rows=7265 loops=1)
         ->  Seq Scan on state_events  (cost=0.00..70783.26 rows=3445 width=64) (actual time=896.295..2446.713 rows=7265 loops=1)
               Filter: (room_id = '!HsxjoYRFsDtWBgDQPh:matrix.org'::text)
               Rows Removed by Filter: 2291244
         ->  Index Scan using events_event_id_key on events  (cost=0.56..8.46 rows=1 width=64) (actual time=5.426..5.451 rows=1 loops=7265)
               Index Cond: (event_id = state_events.event_id)
               Filter: (room_id = '!HsxjoYRFsDtWBgDQPh:matrix.org'::text)
 Planning time: 0.998 ms
 Execution time: 42107.418 ms

@ara4n
Copy link
Member

ara4n commented Jun 12, 2019

possibly related: #5064

@neilisfragile neilisfragile added z-p2 (Deprecated Label) A-Performance Performance, both client-facing and admin-facing labels Jun 20, 2019
@richvdh
Copy link
Member

richvdh commented Jul 1, 2020

is this still an issue?

SELECT COUNT(*) FROM state_events INNER JOIN events USING (room_id, event_id) WHERE room_id='...'

this is counting all of the state events that have ever happened in that room - which is a very odd thing to do.

@richvdh
Copy link
Member

richvdh commented Apr 9, 2021

I'm going to assume it's no longer an issue.

@richvdh richvdh closed this as completed Apr 9, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Performance Performance, both client-facing and admin-facing z-p2 (Deprecated Label)
Projects
None yet
Development

No branches or pull requests

4 participants