Suspiciously costly join query #5439

Ralith · 2019-06-12T17:11:55Z

Description

Debilitating I/O contention is a regular cause of downtime on my homeserver. Examining postgres state during periods of high I/O use reveals a possible culprit. The query

SELECT COUNT(*) FROM state_events INNER JOIN events USING (room_id, event_id) WHERE room_id='...'

was observed to be consuming the bulk of my I/O bandwidth for periods of at least several seconds, possibly much longer. The query plan

synapse=# EXPLAIN SELECT COUNT(*) FROM state_events INNER JOIN events USING (room_id, event_id) WHERE room_id = 'foo';
                                          QUERY PLAN                                           
-----------------------------------------------------------------------------------------------
 Aggregate  (cost=99963.21..99963.22 rows=1 width=0)
   ->  Nested Loop  (cost=0.56..99963.20 rows=5 width=0)
         ->  Seq Scan on state_events  (cost=0.00..70783.26 rows=3445 width=64)
               Filter: (room_id = 'foo'::text)
         ->  Index Scan using events_event_id_key on events  (cost=0.56..8.46 rows=1 width=64)
               Index Cond: (event_id = state_events.event_id)
               Filter: (room_id = 'foo'::text)

suggests to my inexpert analysis that the entire state_events table is being read into memory, which certainly seems like an unreasonable cost. Should state_events have an index on room_id?
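A minimal sketch of how the index question could be tested, assuming a PostgreSQL session on the synapse database. The index name state_events_room_id_idx is hypothetical and is not claimed to be part of the Synapse schema; whether the index actually helps would need to be measured.

```sql
-- Hypothetical index name, introduced only for this experiment.
-- CONCURRENTLY avoids blocking writes to state_events while the index builds,
-- but it cannot be run inside a transaction block.
CREATE INDEX CONCURRENTLY state_events_room_id_idx ON state_events (room_id);

-- Refresh the planner statistics for the table.
ANALYZE state_events;

-- Re-check the plan: if the planner picks the new index, the
-- "Seq Scan on state_events" node should be replaced by an index or bitmap scan.
EXPLAIN SELECT COUNT(*) FROM state_events INNER JOIN events USING (room_id, event_id) WHERE room_id = 'foo';

-- To undo the experiment:
-- DROP INDEX CONCURRENTLY state_events_room_id_idx;
```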

#4877 came to mind, but between myself and @richvdh we did not identify any obvious schema errors.

Steps to reproduce

1. Watch iotop for high I/O throughput from postgres (perhaps after startup?)
2. Run SELECT * FROM pg_stat_activity WHERE state = 'active'; to observe the query in question
3. Run EXPLAIN SELECT COUNT(*) FROM state_events INNER JOIN events USING (room_id, event_id) WHERE room_id = 'foo'; to observe the costly query plan
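When many sessions are active, a narrower variant of the pg_stat_activity query may make the long-running statement easier to spot. This is a sketch using only standard pg_stat_activity columns; the runtime alias is introduced here for illustration.

```sql
-- List active statements, longest-running first, excluding this session.
SELECT pid,
       now() - query_start AS runtime,  -- how long the statement has been running
       query
FROM pg_stat_activity
WHERE state = 'active'
  AND pid <> pg_backend_pid()
ORDER BY runtime DESC;
```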

Version information

Homeserver: ralith.com
Version: 1.0.0
Install method: NixOS module
Platform: Low-end dedicated server


Ralith · 2019-06-12T17:19:00Z

An artificial test on the Synapse Admins room took 42 seconds to complete, saturating I/O for most of it:

synapse=# EXPLAIN ANALYZE SELECT COUNT(*) FROM state_events INNER JOIN events USING (room_id, event_id) WHERE room_id = '!HsxjoYRFsDtWBgDQPh:matrix.org';
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=99963.21..99963.22 rows=1 width=0) (actual time=42107.331..42107.331 rows=1 loops=1)
   ->  Nested Loop  (cost=0.56..99963.20 rows=5 width=0) (actual time=909.018..42096.491 rows=7265 loops=1)
         ->  Seq Scan on state_events  (cost=0.00..70783.26 rows=3445 width=64) (actual time=896.295..2446.713 rows=7265 loops=1)
               Filter: (room_id = '!HsxjoYRFsDtWBgDQPh:matrix.org'::text)
               Rows Removed by Filter: 2291244
         ->  Index Scan using events_event_id_key on events  (cost=0.56..8.46 rows=1 width=64) (actual time=5.426..5.451 rows=1 loops=7265)
               Index Cond: (event_id = state_events.event_id)
               Filter: (room_id = '!HsxjoYRFsDtWBgDQPh:matrix.org'::text)
 Planning time: 0.998 ms
 Execution time: 42107.418 ms
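Reading the timings above: the inner index scan on events averaged about 5.45 ms per loop over 7265 loops, roughly 39.6 s in total, while the sequential scan on state_events finished in about 2.4 s. That suggests most of the 42 s went to the per-row lookups in events rather than to the sequential scan. One way to check whether those lookups are hitting disk is to repeat the test with the BUFFERS option, sketched below; the output is not shown because it depends on the server's cache state.

```sql
-- BUFFERS adds per-node "shared hit" (served from PostgreSQL's buffer cache)
-- and "read" (requested from the OS, possibly from disk) block counts to the plan output.
EXPLAIN (ANALYZE, BUFFERS)
SELECT COUNT(*)
FROM state_events
INNER JOIN events USING (room_id, event_id)
WHERE room_id = '!HsxjoYRFsDtWBgDQPh:matrix.org';
```

A high "read" count on the Index Scan node would be consistent with random I/O against events being the main cost.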

ara4n · 2019-06-12T17:25:01Z

possibly related: #5064

richvdh · 2020-07-01T15:24:35Z

is this still an issue?

SELECT COUNT(*) FROM state_events INNER JOIN events USING (room_id, event_id) WHERE room_id='...'

this is counting all of the state events that have ever happened in that room - which is a very odd thing to do.

richvdh · 2021-04-09T14:30:57Z

I'm going to assume it's no longer an issue.

neilisfragile added the z-p2 (Deprecated Label) and A-Performance (Performance, both client-facing and admin-facing) labels Jun 20, 2019

richvdh added the info-needed label Jul 1, 2020

richvdh closed this as completed Apr 9, 2021
