deal with an event that is in event_json but not events #10718
Comments
(As far as I'm aware,) It's not meant to be the case that you can have an event in I'm therefore led to believe that something weird has gone on with your database (do you know if you've done anything unusual that could have led to this? Restored from an old backup, ... anything?). However, as a point of robustness, it might be sensible to implement something like your PR. I will raise it for discussion. |
Yes, there was a DB upgrade that ran into problems; I need to check how that was solved. Though that is not an obvious explanation. How likely is it that any of the tools involved pick up half of a transaction? Maybe, in certain situations, the transaction is committed by some mechanism other than the code doing the insert? |
When an incoming event ID is present in event_json but not in events, Synapse fails when trying to insert it with "psycopg2.errors.UniqueViolation: duplicate key value violates unique constraints", because incoming events are only filtered against those that are in events. I don't know why the two tables go out of sync, but others have reported this before. Fix this by using an upsert (which inserts or updates existing records) instead of a normal insert. Please verify that this is the safe and correct thing to do before merging, e.g. that it doesn't allow breaking history integrity or something like it, as I don't know enough to understand everything this change entails. Fixes: matrix-org#10718 Signed-off-by: Jan Zerebecki <[email protected]>
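For readers unfamiliar with the difference, here is a minimal sketch of a plain insert versus an upsert. It is not the code from the PR: it uses an in-memory SQLite database and a hypothetical two-column `event_json` table purely for illustration. The `ON CONFLICT ... DO UPDATE` clause shown is also accepted by PostgreSQL 9.5 and later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified stand-in for the real event_json table.
conn.execute(
    "CREATE TABLE event_json (event_id TEXT PRIMARY KEY, json TEXT NOT NULL)"
)
# A leftover row, i.e. one that has no matching row in events.
conn.execute("INSERT INTO event_json VALUES (?, ?)", ("$abc", '{"v": 1}'))


def plain_insert(conn, event_id, json_text):
    # Raises sqlite3.IntegrityError if event_id already exists.
    conn.execute(
        "INSERT INTO event_json (event_id, json) VALUES (?, ?)",
        (event_id, json_text),
    )


def upsert(conn, event_id, json_text):
    # Inserts the row, or overwrites the json column if event_id exists.
    conn.execute(
        "INSERT INTO event_json (event_id, json) VALUES (?, ?) "
        "ON CONFLICT (event_id) DO UPDATE SET json = excluded.json",
        (event_id, json_text),
    )


try:
    plain_insert(conn, "$abc", '{"v": 2}')
    insert_failed = False
except sqlite3.IntegrityError:
    insert_failed = True

upsert(conn, "$abc", '{"v": 2}')
stored = conn.execute(
    "SELECT json FROM event_json WHERE event_id = ?", ("$abc",)
).fetchone()[0]
```

Note that the upsert silently overwrites the existing row, which is exactly why the question of whether that is safe for event history matters.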
After the failed postgresql upgrade https://progress.opensuse.org/issues/93686 it seems there was some manual editing of the database: https://progress.opensuse.org/issues/94189 . |
Thank you for proposing a pull request (#10719); we'll discuss that there. However, given that manual database edits were involved in reaching this state, it's not immediately clear that the solution in your PR would leave you in a correct -- rather than just differently broken -- state. Adding more constraints to the database would be helpful, but difficult to do after the fact with such a large table. We've also thought about trying to find a way to merge both the It might help to purge the offending rooms from the server, if the side effects thereof are acceptable. |
Losing more data from the relevant rooms is probably worse. Is there working tooling to back up and restore room history from outside synapse? Is there existing code to check or repair consistency beyond what the DB constraints provide? |
Note: It was necessary to edit the DB because postgresql violated its own UNIQUE constraints because it doesn't notice if glibc collation changes require a reindex. |
There is no additional tooling that I am aware of.
this is why synapse emits a warning if you use incorrect collation or ctype settings (#6734) |
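A rough sketch of what such a check involves, assuming the expected value for both settings is `C` as in the linked Synapse documentation. The function name `locale_problems` is hypothetical and this is not Synapse's actual implementation; the query only reads the standard `pg_database` catalog columns and is shown for context rather than executed here.

```python
# Query an admin could run (e.g. in psql) to read the current settings.
COLLATION_QUERY = (
    "SELECT datcollate, datctype FROM pg_database "
    "WHERE datname = current_database()"
)


def locale_problems(datcollate, datctype, expected="C"):
    """Return a list of human-readable problems; empty if settings match."""
    problems = []
    if datcollate != expected:
        problems.append(f"COLLATE is {datcollate!r}, expected {expected!r}")
    if datctype != expected:
        problems.append(f"CTYPE is {datctype!r}, expected {expected!r}")
    return problems


ok = locale_problems("C", "C")
bad = locale_problems("en_US.UTF-8", "en_US.UTF-8")
```

A caller could log the returned problems as warnings or refuse to start when the list is non-empty.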
Thx, the warning is indeed in the logs: Perhaps instead of a warning that should be made to abort startup unless a new setting Perhaps a better way instead of my above PR is to create a tool shipped with synapse that can be used by an admin to run through the db to check for and optionally fix such inconsistencies, either by synthesizing the missing rows from other tables, by fetching from federation, or by deleting those rows that have their corresponding row missing. Though I probably won't get to that. After running synapse with the above PR as of d891766 I get another constraint violation, which I'll try to fix in a similar way.
|
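The check-and-optionally-delete variant of the tool suggested above might look roughly like the following. This is a sketch only: it runs against an in-memory SQLite database with hypothetical, heavily simplified `events` and `event_json` tables, and the helper names `find_orphans` and `delete_orphans` are invented for illustration. No such tool ships with Synapse as far as this thread establishes.

```python
import sqlite3


def find_orphans(conn):
    """Return event_ids present in event_json but missing from events."""
    rows = conn.execute(
        "SELECT ej.event_id FROM event_json AS ej "
        "LEFT JOIN events AS e ON e.event_id = ej.event_id "
        "WHERE e.event_id IS NULL "
        "ORDER BY ej.event_id"
    ).fetchall()
    return [row[0] for row in rows]


def delete_orphans(conn, event_ids):
    """Delete the given rows from event_json (the destructive repair option)."""
    conn.executemany(
        "DELETE FROM event_json WHERE event_id = ?",
        [(event_id,) for event_id in event_ids],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE events (event_id TEXT PRIMARY KEY);
    CREATE TABLE event_json (event_id TEXT PRIMARY KEY, json TEXT);
    INSERT INTO events VALUES ('$a');
    INSERT INTO event_json VALUES ('$a', '{}'), ('$b', '{}');
    """
)
orphans = find_orphans(conn)
```

Running the check first and reviewing the list before calling `delete_orphans` keeps the destructive step explicit; a real tool would have to cover every table that references an event, not just these two.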
For what it's worth:
https://matrix-org.github.io/synapse/latest/postgres.html#fixing-incorrect-collate-or-ctype I don't know how long that has been the case*; wonder if you created your database before then or perhaps managed to flip it over during a backup/restore? *edit: #6734 — early 2020. |
We're fine going with this approach to at least help people avoid silent corruption... details tbd. |
We should also probably link to information on how Glibc collation changes can break PostgreSQL. E.g., https://wiki.postgresql.org/wiki/Locale_data_changes |
I ran into some database corruption due to a hardware failure, which caused some data in the event tables to go out of sync with each other. I ended up taking the linked PR, rebasing it onto 1.46 and expanding it so that all event storage was done through upserts, which brought my server back to a live state. |
I distinctly remember setting up the database with correct collate and ctype values, so I assume it must have happened during the large db outage we had in the summer |
We don't seem to have written down a clear conclusion about what to do with this, but I'm not happy switching all our inserts to upserts in the hope that it somehow works around a corrupt index, so I'm going to go ahead and close it. |
Please reopen. We did reach a conclusion, but the change is not implemented yet:
We should probably also add the commands needed to drop the broken data from a DB to https://matrix-org.github.io/synapse/latest/postgres.html#fixing-incorrect-collate-or-ctype and also add:
|
I suggest opening a new issue for those changes... they are quite a long way from the original issue. |
Description
When an incoming event is not in events, a possible duplicate in event_json should be handled when trying to persist it there. Currently
synapse/synapse/storage/databases/main/events.py
Lines 1337 to 1339 in f03cafb
The same issue was also seen in #8889 but not discussed further (besides manually deleting the event from event_json).
Steps to reproduce
Version information
If not matrix.org:
Version: 1.41.0
Install method: