Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

have_seen_event cache is not invalidated when we persist an event #13856

Closed
Tracked by #14284
MadLittleMods opened this issue Sep 21, 2022 · 6 comments · Fixed by #13863
Closed
Tracked by #14284

have_seen_event cache is not invalidated when we persist an event #13856

MadLittleMods opened this issue Sep 21, 2022 · 6 comments · Fixed by #13863
Assignees
Labels
A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) A-Performance Performance, both client-facing and admin-facing O-Occasional Affects or can be seen by some users regularly or most users rarely S-Minor Blocks non-critical functionality, workarounds exist. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements.

Comments

@MadLittleMods
Copy link
Contributor

MadLittleMods commented Sep 21, 2022

The have_seen_event cache is not invalidated when we persist an event.

This is evident when a /messages request triggers a /backfill and it claims that we have the same missing state and auth events each time we _process_pulled_event from the backfill response even though we just fetched and persisted those same state and auth events from the previous event we processed.

I see that we have one spot in Synapse where we invalidate the have_seen_event cache (besides when we purge a room) but if I add a log line to _invalidate_caches_for_event, it never runs in my Complement tests. Does this have a bug where the CacheInvalidationWorkerStore stuff never runs in monolith mode?

def _invalidate_caches_for_event(
self,
stream_ordering: int,
event_id: str,
room_id: str,
etype: str,
state_key: Optional[str],
redacts: Optional[str],
relates_to: Optional[str],
backfilled: bool,
) -> None:
# This invalidates any local in-memory cached event objects, the original
# process triggering the invalidation is responsible for clearing any external
# cached objects.
self._invalidate_local_get_event_cache(event_id)
self.have_seen_event.invalidate((room_id, event_id))


Discovered while trying to make Synapse fast enough for this MSC2716 test for importing many batches. As an example, disabling the have_seen_event cache saves 10 seconds for each /messages request in that MSC2716 Complement test because we're not making as many federation requests for /state (speeding up have_seen_event itself is related to #13625)

But this will also make /messages faster in general so we can include it in the faster /messages milestone.

Dev notes

have_seen_event cache originally added in #9953

@MadLittleMods MadLittleMods added A-Performance Performance, both client-facing and admin-facing A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) labels Sep 21, 2022
@MadLittleMods MadLittleMods self-assigned this Sep 21, 2022
@DMRobertson DMRobertson added S-Minor Blocks non-critical functionality, workarounds exist. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. O-Occasional Affects or can be seen by some users regularly or most users rarely labels Sep 21, 2022
@DMRobertson
Copy link
Contributor

I see that we have one spot in Synapse where we invalidate the have_seen_event cache (besides when we purge a room) but if I add a log line to _invalidate_caches_for_event, it never runs in my Complement tests. Does this have a bug where the CacheInvalidationWorkerStore stuff never runs in monolith mode?

Seems plausible: they only seem to be called by process_replication_rows and _process_event_stream_row, both of which are for responding to replication messages only.

@DMRobertson
Copy link
Contributor

Not familiar with the event persistence code, but it looks like persist_event and persist_events end up calling _persist_event_batch and then _persist_events_and_state_updates. The latter might be a good place to invalidate caches? But thinking a little more, maybe it should be the place that sends out the replication messages. Not sure where those are.

@clokep
Copy link
Member

clokep commented Sep 21, 2022

I wonder if we should be prefilling the cache in this case instead of invalidating it? Isn't it likely that we're going to immediately use that event if we just persisted it?

@DMRobertson
Copy link
Contributor

DMRobertson commented Sep 21, 2022

I had the same thought, but I wasn't sure how easy it is to do that with the cachedList magic.

@richvdh richvdh changed the title have_seen_event cache is not invalidated when we persist an event have_seen_event cache is not invalidated on monolith instances when we persist an event Sep 21, 2022
MadLittleMods added a commit that referenced this issue Sep 21, 2022
…vent

Fix for
#13856

Fixed by calling `_invalidate_caches_for_event`
when we persist an event.

And an additional fix in `_invalidate_caches_for_event`
to make sure it uses the correct cache key. This seems
like it would be an easy foot-gun for any `tree=True`
cache.

Wrong:
```py
self.have_seen_event.invalidate((room_id, event_id))
```

Correct:
```py
self.have_seen_event.invalidate(((room_id, event_id),))
```
MadLittleMods added a commit that referenced this issue Sep 21, 2022
Fix #13856

`_invalidate_caches_for_event` doesn't run in monolith mode which
means we never even tried to clear the `have_seen_event` and other
caches. And even in worker mode, it only runs on the workers, not
the master (AFAICT).

Additionally there is bug with the key being wrong so
`_invalidate_caches_for_event` never invalidates the
`have_seen_event` cache even when it does run.

Wrong:
```py
self.have_seen_event.invalidate((room_id, event_id))
```

Correct:
```py
self.have_seen_event.invalidate(((room_id, event_id),))
```
@MadLittleMods MadLittleMods changed the title have_seen_event cache is not invalidated on monolith instances when we persist an event have_seen_event cache is not invalidated when we persist an event Sep 21, 2022
@MadLittleMods
Copy link
Contributor Author

Changing the title back to "have_seen_event cache is not invalidated when we persist an event" because even though _invalidate_caches_for_event was called via events over replication (in worker mode), there is another bug in it where we try to invalidate the have_seen_event cache with the wrong key so it still doesn't work, see #13863 (comment)

@MadLittleMods
Copy link
Contributor Author

Thanks for the input @DMRobertson and @clokep! Really helped ease the pain making the fix in #13863

MadLittleMods added a commit that referenced this issue Sep 22, 2022
Fix #13856

`_invalidate_caches_for_event` doesn't run in monolith mode which
means we never even tried to clear the `have_seen_event` and other
caches. And even in worker mode, it only runs on the workers, not
the master (AFAICT).

Additionally there is bug with the key being wrong so
`_invalidate_caches_for_event` never invalidates the
`have_seen_event` cache even when it does run.

Wrong:
```py
self.have_seen_event.invalidate((room_id, event_id))
```

Correct:
```py
self.have_seen_event.invalidate(((room_id, event_id),))
```
MadLittleMods added a commit that referenced this issue Sep 27, 2022
Fix #13856
Fix #13865

> Discovered while trying to make Synapse fast enough for [this MSC2716 test for importing many batches](matrix-org/complement#214 (comment)). As an example, disabling the `have_seen_event` cache saves 10 seconds for each `/messages` request in that MSC2716 Complement test because we're not making as many federation requests for `/state` (speeding up `have_seen_event` itself is related to #13625) 
> 
> But this will also make `/messages` faster in general so we can include it in the [faster `/messages` milestone](https://github.com/matrix-org/synapse/milestone/11).
> 
> *-- #13856


### The problem

`_invalidate_caches_for_event` doesn't run in monolith mode which means we never even tried to clear the `have_seen_event` and other caches. And even in worker mode, it only runs on the workers, not the master (AFAICT).

Additionally there was bug with the key being wrong so `_invalidate_caches_for_event` never invalidates the `have_seen_event` cache even when it does run.

Because we were using the `@cachedList` wrong, it was putting items in the cache under keys like `((room_id, event_id),)` with a `set` in a `set` (ex. `(('!TnCIJPKzdQdUlIyXdQ:test', '$Iu0eqEBN7qcyF1S9B3oNB3I91v2o5YOgRNPwi_78s-k'),)`) and we we're trying to invalidate with just `(room_id, event_id)` which did nothing.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Messages-Endpoint /messages client API endpoint (`RoomMessageListRestServlet`) (which also triggers /backfill) A-Performance Performance, both client-facing and admin-facing O-Occasional Affects or can be seen by some users regularly or most users rarely S-Minor Blocks non-critical functionality, workarounds exist. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants