
Fix AssertionErrors after purging events #11642

Merged 2 commits on Jan 4, 2022
1 change: 1 addition & 0 deletions changelog.d/11536.bugfix
@@ -0,0 +1 @@
Fix a long-standing bug which could cause `AssertionError`s to be written to the log when Synapse was restarted after purging events from the database.
1 change: 0 additions & 1 deletion changelog.d/11536.misc

This file was deleted.

1 change: 1 addition & 0 deletions changelog.d/11642.bugfix
@@ -0,0 +1 @@
Fix a long-standing bug which could cause `AssertionError`s to be written to the log when Synapse was restarted after purging events from the database.
4 changes: 2 additions & 2 deletions scripts-dev/check-newsfragment
@@ -42,8 +42,8 @@ echo "--------------------------"
echo

matched=0
for f in $(git diff --name-only FETCH_HEAD... -- changelog.d); do
# check that any modified newsfiles on this branch end with a full stop.
for f in $(git diff --diff-filter=d --name-only FETCH_HEAD... -- changelog.d); do
Member Author:
--diff-filter=d tells git to ignore deleted files here. This should probably be a separate PR but at this stage I've got about 15 of the things open :-S

# check that any added newsfiles on this branch end with a full stop.
Member:
So this is added or modified, but not deleted?

Member Author:
yes.

lastchar=$(tr -d '\n' < "$f" | tail -c 1)
if [ "$lastchar" != '.' ] && [ "$lastchar" != '!' ]; then
echo -e "\e[31mERROR: newsfragment $f does not end with a '.' or '!'\e[39m" >&2
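The --diff-filter=d change above can also be reproduced outside the script; the following Python sketch is purely illustrative (the function name and the use of subprocess are not part of the repository) and lists the newsfragments the check would now consider:

import subprocess

def changed_newsfragments(base: str = "FETCH_HEAD") -> list[str]:
    """List newsfragments added or modified on this branch, ignoring deletions.

    Mirrors `git diff --diff-filter=d --name-only FETCH_HEAD... -- changelog.d`:
    the lowercase `d` excludes deleted files, so a removed fragment such as
    changelog.d/11536.misc no longer trips the full-stop check.
    """
    result = subprocess.run(
        ["git", "diff", "--diff-filter=d", "--name-only", f"{base}...", "--", "changelog.d"],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout.splitlines()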
11 changes: 11 additions & 0 deletions synapse/handlers/stats.py
@@ -80,6 +80,17 @@ async def _unsafe_process(self) -> None:
# If self.pos is None then means we haven't fetched it from DB
if self.pos is None:
self.pos = await self.store.get_stats_positions()
room_max_stream_ordering = self.store.get_room_max_stream_ordering()
if self.pos > room_max_stream_ordering:
# apparently, we've processed more events than exist in the database!
# this can happen if events are removed with history purge or similar.
logger.warning(
"Event stream ordering appears to have gone backwards (%i -> %i): "
"rewinding stats processor",
self.pos,
room_max_stream_ordering,
)
self.pos = room_max_stream_ordering
Member:
I'm having trouble convincing myself this is a "safe" operation to do. I thought the max stream ordering was monotonically increasing, even if you purged events, but I might be misunderstanding how it is used?

Member Author:
well, that's a fair question.

The fact is that currently, on sqlite, we do reuse stream orderings, because we have no way of knowing which stream orderings were used before synapse was restarted.

For things like the stats and user directory handlers, that's not really a problem, because any events that get a stream ordering later than the max at startup must be new events, so I think this must be safe.

One place it is a problem is in client /sync: since stream_orderings are encoded in sync tokens, any events that reuse old stream_orderings probably won't get synced to the client. Ironically, we don't log any warnings in that case.

So, sure, a better solution here is not to reuse stream orderings at all, but that's not easy. Meanwhile I think we may as well update these bits to stop them complaining.
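To make the /sync point above concrete, here is a toy model (not Synapse's actual /sync code; the names and values are invented for illustration) of why an event that reuses an old stream ordering never reaches a client that has already synced past it:

def events_since(events: list[tuple[int, str]], since_token: int) -> list[str]:
    # A sync token is, in essence, the highest stream ordering the client has
    # already seen; the server only returns events ordered after it.
    return [event_id for ordering, event_id in events if ordering > since_token]

# The client synced up to stream ordering 100 before the purge and restart.
since_token = 100
# On sqlite, an event written after the restart may reuse an old ordering
# (say 51), so it sorts below the token and is silently skipped.
new_events = [(51, "$event_with_reused_ordering"), (101, "$fresh_event")]
assert events_since(new_events, since_token) == ["$fresh_event"]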

Member:
Thanks for the thorough explanation! The issue with /sync sounds like it could be a difficult thing to track down, but it's definitely out of scope for this fix.


# Loop round handling deltas until we're up to date

18 changes: 15 additions & 3 deletions synapse/handlers/user_directory.py
@@ -148,9 +148,21 @@ async def _unsafe_process(self) -> None:
if self.pos is None:
self.pos = await self.store.get_user_directory_stream_pos()

# If still None then the initial background update hasn't happened yet.
if self.pos is None:
return None
# If still None then the initial background update hasn't happened yet.
if self.pos is None:
return None

room_max_stream_ordering = self.store.get_room_max_stream_ordering()
if self.pos > room_max_stream_ordering:
# apparently, we've processed more events than exist in the database!
# this can happen if events are removed with history purge or similar.
logger.warning(
"Event stream ordering appears to have gone backwards (%i -> %i): "
"rewinding user directory processor",
self.pos,
room_max_stream_ordering,
)
self.pos = room_max_stream_ordering

# Loop round handling deltas until we're up to date
while True:
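The guard added to stats.py and user_directory.py is identical apart from the processor name. If it ever needs a third copy, a hypothetical shared helper (not part of this PR; the name and placement are invented for illustration) could look like this:

import logging

logger = logging.getLogger(__name__)

def rewind_pos_if_needed(pos: int, room_max_stream_ordering: int, processor_name: str) -> int:
    """Clamp a processor's stream position to the current maximum stream ordering.

    After events are purged and Synapse is restarted, the stored position can
    point past the end of the events table; rewinding it lets the processor
    resume without asserting.
    """
    if pos > room_max_stream_ordering:
        logger.warning(
            "Event stream ordering appears to have gone backwards (%i -> %i): "
            "rewinding %s",
            pos,
            room_max_stream_ordering,
            processor_name,
        )
        return room_max_stream_ordering
    return pos

Each handler could then replace its eleven added lines with something like self.pos = rewind_pos_if_needed(self.pos, self.store.get_room_max_stream_ordering(), "stats processor").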