This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

All tables in Postgres should have a REPLICA IDENTITY available so that Postgres logical replication can be used #16224

Open
reivilibre opened this issue Sep 2, 2023 · 10 comments · Fixed by #16456, #16647 or #16658
Labels
A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Occasional Affects or can be seen by some users regularly or most users rarely S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements.

Comments

@reivilibre
Contributor

Description:

When migrating a Postgres database, there are a few options:

  • SQL dump (pg_dump, which is then fed into psql on the restoring end). Simple, but in my case the restore takes 150 minutes, and 150 minutes of downtime is quite hard to schedule, so I have really outgrown this solution.
  • Use physical replication and perform a failover to the secondary once it's caught up. This is possible today (and matrix.org uses physical replication for a hot standby), but physical replication is not very granular — you have to replicate the whole Postgres cluster and can't select just one database — and it needs the Postgres versions to match, and seems to otherwise require the setups on both sides to match. (As the manual says, you're really better off having the primary and secondary be identical in as many ways as possible!)
  • Use logical replication and perform a failover to the secondary once it's caught up. This approach is quite elegant to use from Postgres — you first have to pg_dump --schema-only and restore that, then use CREATE PUBLICATION (primary) and CREATE SUBSCRIPTION (secondary) and Postgres takes care of the rest... even across different Postgres versions and different machine architectures... but with Synapse this currently causes a problem on the primary.
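For reference, the logical replication steps above look roughly like the following. This is only a sketch: the publication/subscription name and the connection string are placeholders, and the schema must already have been copied with pg_dump --schema-only.

```sql
-- On the old (primary) server: publish every table in the database.
CREATE PUBLICATION synapse_migration FOR ALL TABLES;

-- On the new (secondary) server, after restoring the schema-only dump:
-- subscribe to that publication. Connection details here are placeholders.
CREATE SUBSCRIPTION synapse_migration
    CONNECTION 'host=old-server dbname=synapse user=replicator'
    PUBLICATION synapse_migration;
```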

The problem is that Postgres needs to identify individual rows in the tables using a so-called REPLICA IDENTITY. This defaults to the primary key of the table if one is set — but many Synapse tables just don't have a primary key.

The net effect is that once you start logical replication, all UPDATEs and DELETEs to the tables without primary keys now fail. In turn, this causes basic features like /sync to stop working (as it deletes from device_inbox at least).

psycopg2.errors.ObjectNotInPrerequisiteState: cannot delete from table "device_inbox" because it does not have a replica identity and publishes deletes
HINT:  To enable deleting from the table, set REPLICA IDENTITY using ALTER TABLE.

You can set the REPLICA IDENTITY per table manually, either to an existing unique index or to the full record as a fallback if you really have no better option. (see https://www.postgresql.org/docs/15/sql-altertable.html#SQL-ALTERTABLE-REPLICA-IDENTITY)
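Concretely, that looks like one of the following (table and index names are illustrative; per the Postgres docs, USING INDEX requires a unique, non-partial, non-deferrable index covering only NOT NULL columns):

```sql
-- Preferred: use an existing unique index over NOT NULL columns.
ALTER TABLE my_table REPLICA IDENTITY USING INDEX my_unique_index;

-- Fallback: log the whole old row for each UPDATE/DELETE.
-- Works without any index, but generates more WAL and is slower to apply.
ALTER TABLE my_table REPLICA IDENTITY FULL;
```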

It would be nice to do this in Synapse so that logical replication works out of the box. We might even consider keeping a lint around to check that all tables have a replica identity?

I think this SQL can be used to find tables that are set to use the default replica identity, but which don't have a primary key (i.e. they don't have a valid replica identity after all):

WITH tables_no_pkey AS (
    SELECT tbl.table_schema, tbl.table_name
    FROM information_schema.tables tbl
    WHERE table_type = 'BASE TABLE'
        AND table_schema not in ('pg_catalog', 'information_schema')
        AND NOT EXISTS (
            SELECT 1 
            FROM information_schema.key_column_usage kcu
            WHERE kcu.table_name = tbl.table_name 
                AND kcu.table_schema = tbl.table_schema
        )
)
SELECT oid::regclass FROM tables_no_pkey INNER JOIN pg_class ON oid::regclass = table_name::regclass
WHERE relreplident = 'd';
-- d = default

This currently gives me

event_relations
federation_stream_position
federation_inbound_events_staging
local_media_repository_thumbnails
local_media_repository_url_cache
device_federation_outbox
event_search
ignored_users
insertion_events
monthly_active_users
push_rules_stream
room_stats_state
blocked_rooms
current_state_delta_stream
cache_invalidation_stream_by_instance
deleted_pushers
appservice_room_list
device_inbox
device_lists_outbound_last_success
device_lists_remote_cache
device_lists_stream
device_lists_remote_extremeties
e2e_cross_signing_signatures
device_federation_inbox
e2e_room_keys_versions
event_auth
event_auth_chain_links
insertion_event_extremities
presence_stream
ratelimit_override
remote_media_cache_thumbnails
room_alias_servers
room_stats_earliest_token
user_directory
threepid_guest_access_tokens
state_groups_state
user_filters
users_in_public_rooms
user_threepid_id_server
users_pending_deactivation
insertion_event_edges
users_who_share_private_rooms
worker_locks
device_auth_providers
device_lists_changes_in_room
device_lists_remote_resync
e2e_room_keys
state_group_edges
stream_ordering_to_exterm
event_push_summary
device_lists_outbound_pokes
e2e_cross_signing_keys
erased_users
stream_positions
event_push_actions_staging
user_ips
user_signature_stream
batch_events
user_daily_visits
user_directory_search
@reivilibre reivilibre self-assigned this Sep 2, 2023
@DMRobertson
Contributor

This defaults to the primary key of the table if one is set — but many Synapse tables just don't have a primary key.

Maybe we should give every table a primary key? IIRC that would have been useful elsewhere, e.g. #15583

@reivilibre
Contributor Author

reivilibre commented Sep 4, 2023

This defaults to the primary key of the table if one is set — but many Synapse tables just don't have a primary key.

Maybe we should give every table a primary key? IIRC that would have been useful elsewhere, e.g. #15583

Agreed, but it may not be trivial. I would hope that all new tables we create have primary keys.

One thing to note is that many of these tables probably already have worthy unique indices. In Postgres, those can be converted for free with ALTER TABLE my_table ADD CONSTRAINT pk_my_table PRIMARY KEY USING INDEX my_index; (TIL!). I'm not sure what we can do about SQLite, though: rebuilding tables is one option, of course, but I wonder whether there's a free method that rewrites the tables' schema (then again, I'm not sure I would actually want to use that solution).
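For illustration, the index-to-primary-key conversion looks like this (hypothetical table and index names; note that Postgres will attempt to mark the indexed columns NOT NULL if they aren't already, which involves a full table scan):

```sql
-- Promote an existing unique index to be the table's primary key.
-- my_table / my_index / pk_my_table are hypothetical names.
CREATE UNIQUE INDEX my_index ON my_table (some_id);
ALTER TABLE my_table
    ADD CONSTRAINT pk_my_table PRIMARY KEY USING INDEX my_index;
```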

@DMRobertson DMRobertson added S-Tolerable Minor significance, cosmetic issues, low or no impact to users. T-Enhancement New features, changes in functionality, improvements in performance, or user-facing enhancements. A-Database DB stuff like queries, migrations, new/remove columns, indexes, unexpected entries in the db O-Occasional Affects or can be seen by some users regularly or most users rarely labels Sep 4, 2023
@thinkwelltwd

I'm researching the possibility of an HA setup based on pgEdge, which, like logical replication, requires REPLICA IDENTITY set on replicated tables.

One thing to note is that many of these tables probably have worthy unique indices already.

These are the only tables that don't have unique indexes.

deleted_pushers
current_state_delta_stream
device_auth_providers
device_federation_inbox
device_federation_outbox
device_inbox
device_lists_outbound_pokes
device_lists_stream
e2e_cross_signing_signatures
event_auth
event_auth_chain_links
event_push_actions_staging
insertion_event_edges
local_media_repository_url_cache
presence_stream
push_rules_stream
room_alias_servers
state_groups_state
stream_ordering_to_exterm
user_daily_visits
users_pending_deactivation

but I'm not sure what we can do about SQLite (rebuilding tables is one option of course, but I'm wondering if there's a free method

A unique index can be used as a REPLICA IDENTITY

ALTER TABLE table_name REPLICA IDENTITY USING INDEX index_name;

Adding such statements in full.sql.postgres would resolve two thirds of the tables without affecting SQLite, right? If so, it seems to me that the following tables could have an existing index converted to unique, or a unique index added.

deleted_pushers
room_alias_servers
device_auth_providers
device_federation_inbox
push_rules_stream
state_groups_state
device_lists_outbound_pokes

And the following tables would seem to be likely candidates for REPLICA IDENTITY FULL

event_auth
event_auth_chain_links
insertion_event_edges
state_groups_state
stream_ordering_to_exterm
user_daily_visits
users_pending_deactivation
device_lists_stream

That leaves the tables below...

current_state_delta_stream
device_federation_outbox
device_inbox
e2e_cross_signing_signatures
event_push_actions_staging
local_media_repository_url_cache
presence_stream

I don't have deep knowledge of synapse, so take the above with a grain of salt. All table research aside, the main thing I'm hoping for is that unique indexes as REPLICA IDENTITY are the easy path forward here.

@reivilibre
Contributor Author

Going to try and put up a quick solution for this shortly.

Here's a list of the tables above, in the form table: unique_index_name, or table! if there is no unique index. I will feed this into a big search-and-replace to set the replica identities.

event_relations: event_relations_id
federation_stream_position: federation_stream_position_instance
federation_inbound_events_staging: federation_inbound_events_staging_instance_event
local_media_repository_thumbnails: local_media_repository_thumbn_media_id_width_height_method_key
local_media_repository_url_cache!
device_federation_outbox!
event_search: event_search_event_id_idx
ignored_users: ignored_users_uniqueness
insertion_events: insertion_events_event_id
monthly_active_users: monthly_active_users_users
push_rules_stream!
room_stats_state: room_stats_state_room
blocked_rooms: blocked_rooms_idx
current_state_delta_stream!
cache_invalidation_stream_by_instance: cache_invalidation_stream_by_instance_id
deleted_pushers!
appservice_room_list: appservice_room_list_idx
device_inbox!
device_lists_outbound_last_success: device_lists_outbound_last_success_unique_idx
device_lists_remote_cache: device_lists_remote_cache_unique_id
device_lists_stream!
device_lists_remote_extremeties: device_lists_remote_extremeties_unique_idx
e2e_cross_signing_signatures: e2e_cross_signing_signatures2_idx
device_federation_inbox!
e2e_room_keys_versions: e2e_room_keys_versions_idx
event_auth!
event_auth_chain_links!
insertion_event_extremities: insertion_event_extremities_event_id
presence_stream!
ratelimit_override: ratelimit_override_idx
remote_media_cache_thumbnails: remote_media_repository_thumbn_media_origin_id_width_height_met
room_alias_servers!
room_stats_earliest_token: room_stats_earliest_token_idx
user_directory: user_directory_user_idx
threepid_guest_access_tokens: threepid_guest_access_tokens_index
state_groups_state!
user_filters: full_users_unique_idx
users_in_public_rooms: users_in_public_rooms_u_idx
user_threepid_id_server: user_threepid_id_server_idx
users_pending_deactivation!
insertion_event_edges!
users_who_share_private_rooms: users_who_share_private_rooms_u_idx
worker_locks: worker_locks_key
device_auth_providers!
device_lists_changes_in_room: device_lists_changes_in_stream_id
device_lists_remote_resync: device_lists_remote_resync_idx
e2e_room_keys: e2e_room_keys_with_version_idx
state_group_edges: state_group_edges_unique_idx
stream_ordering_to_exterm!
event_push_summary: event_push_summary_unique_index2
device_lists_outbound_pokes!
e2e_cross_signing_keys: e2e_cross_signing_keys_stream_idx
erased_users: erased_users_user
stream_positions: stream_positions_idx
event_push_actions_staging!
user_ips: user_ips_user_token_ip_unique_index
user_signature_stream: user_signature_stream_idx
batch_events: chunk_events_event_id
user_daily_visits!
user_directory_search: user_directory_search_user_idx

@abeluck

abeluck commented Nov 4, 2023

Really looking forward to being able to use logical replication with our pg instances for synapse.

Just to clarify: if/when the PR is merged, will that make logical replication feasible with synapse? Or is there other work still to do?

@reivilibre
Contributor Author

Really looking forward to being able to use logical replication with our pg instances for synapse.

Just to clarify: if/when the PR is merged, will that make logical replication feasible with synapse? Or is there other work still to do?

That's the hope! I haven't gotten around to trying it again since setting the replica identities, but it should be another hurdle removed. If I run into problems again I will be trying to fix those too: long story short, I need this to move my homeserver without it taking hours to restore a backup, so I kind of need to get it done soon :)

@reivilibre
Contributor Author

(I should probably add some kind of caveat that you should only try this for now if you know a bit about what you're doing and can sort yourself out if you run into trouble.)

reivilibre added a commit that referenced this issue Nov 13, 2023
…cit one. This should allow use of Postgres logical replication. (#16456)

* Add Postgres replica identities to tables that don't have an implicit one

Fixes #16224

* Newsfile

Signed-off-by: Olivier Wilkinson (reivilibre) <[email protected]>

* Move the delta to version 83 as we missed the boat for 82

* Add a test that all tables have a REPLICA IDENTITY

* Extend the test to include when indices are deleted

* isort

* black

* Fully qualify `oid` as it is a 'hidden attribute' in Postgres 11

* Update tests/storage/test_database.py

Co-authored-by: Patrick Cloke <[email protected]>

* Add missed tables

---------

Signed-off-by: Olivier Wilkinson (reivilibre) <[email protected]>
Co-authored-by: Patrick Cloke <[email protected]>
@reivilibre reivilibre reopened this Nov 16, 2023
@reivilibre
Contributor Author

This is not quite correct yet. When I created the replication slot, sending messages started failing, and I had to apply these statements (worked out from reading the logs):

ALTER TABLE stats_incremental_position REPLICA IDENTITY USING INDEX stats_incremental_position_lock_key;
ALTER TABLE room_forgetter_stream_pos REPLICA IDENTITY USING INDEX room_forgetter_stream_pos_lock_key;
ALTER TABLE devices REPLICA IDENTITY USING INDEX device_uniqueness;
ALTER TABLE redactions REPLICA IDENTITY USING INDEX redactions_event_id_key;
ALTER TABLE user_directory_stream_pos REPLICA IDENTITY USING INDEX user_directory_stream_pos_lock_key;
ALTER TABLE event_push_summary_last_receipt_stream_id REPLICA IDENTITY USING INDEX event_push_summary_last_receipt_stream_id_lock_key;
ALTER TABLE event_push_summary_stream_ordering REPLICA IDENTITY USING INDEX event_push_summary_stream_ordering_lock_key;
ALTER TABLE event_forward_extremities REPLICA IDENTITY USING INDEX event_forward_extremities_event_id_room_id_key;
ALTER TABLE event_backward_extremities REPLICA IDENTITY USING INDEX event_backward_extremities_event_id_room_id_key;
ALTER TABLE event_push_actions REPLICA IDENTITY FULL; -- only unique index has nullable column
ALTER TABLE appservice_stream_position REPLICA IDENTITY USING INDEX appservice_stream_position_lock_key;

Obviously, my query to find tables that are missing a primary key is not quite correct. So this needs investigating, and:

  • a correct query being put in CI
  • the above tables (and any other missed ones) having replica identities added for them
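One likely culprit (an assumption, not verified against every missed table): information_schema.key_column_usage lists columns of UNIQUE and FOREIGN KEY constraints as well as primary keys, so a table with a unique constraint but no primary key (e.g. devices with device_uniqueness) would have been wrongly excluded from tables_no_pkey. A sketch that checks pg_index directly instead:

```sql
-- Tables still on the default replica identity that have no primary key.
-- A sketch only; not yet run against Synapse's full schema.
SELECT c.oid::regclass AS table_name
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relkind = 'r'
  AND n.nspname NOT IN ('pg_catalog', 'information_schema')
  AND c.relreplident = 'd'  -- 'd' = default (primary key, if any)
  AND NOT EXISTS (
      SELECT 1 FROM pg_index i
      WHERE i.indrelid = c.oid AND i.indisprimary
  );
```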

@reivilibre
Contributor Author

this all got backed out because of deadlocking :(

@DMRobertson
Contributor

I had to revert this when doing the 1.98.0rc. We saw migration pain on matrix.org:

Traceback (most recent call last):
  File "/home/synapse/env-python311/bin/update_synapse_database", line 33, in <module>
    sys.exit(load_entry_point('matrix-synapse', 'console_scripts', 'update_synapse_database')())
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/synapse/src/synapse/_scripts/update_synapse_database.py", line 109, in main
    hs.setup()
  File "/home/synapse/src/synapse/server.py", line 340, in setup
    self.datastores = Databases(self.DATASTORE_CLASS, self)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/synapse/src/synapse/storage/databases/__init__.py", line 74, in __init__
    prepare_database(
  File "/home/synapse/src/synapse/storage/prepare_database.py", line 149, in prepare_database
    _upgrade_existing_database(
  File "/home/synapse/src/synapse/storage/prepare_database.py", line 543, in _upgrade_existing_database
    database_engine.execute_script_file(cur, absolute_path)
  File "/home/synapse/src/synapse/storage/engines/_base.py", line 155, in execute_script_file
    cls.executescript(cursor, f.read())
  File "/home/synapse/src/synapse/storage/engines/postgres.py", line 246, in executescript
    cursor.execute(f"COMMIT; BEGIN TRANSACTION; {script}")
  File "/home/synapse/src/synapse/storage/database.py", line 419, in execute
    self._do_execute(self.txn.execute, sql, parameters)
  File "/home/synapse/src/synapse/storage/database.py", line 481, in _do_execute
    return func(sql, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
psycopg2.errors.WrongObjectType: index "profiles_user_id_key" cannot be used as replica identity because column "user_id" is nullable

The column is not nullable in schema dump 72:

CREATE TABLE profiles (
    user_id text NOT NULL,
    displayname text,
    avatar_url text
);

We have a background update which rewrites the localpart column user_id to an mxid full_user_id. I think this is here:

async def populate_full_user_id_user_filters(
    self, progress: JsonDict, batch_size: int
) -> int:

Presumably part of #15396

I don't understand the full machinations of all this (e.g. is everyone affected? only old deployments? are background updates relevant?)
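To help narrow down who is affected, one could enumerate unique indexes that include a nullable column, and so cannot serve as a replica identity (which would catch cases like profiles_user_id_key on deployments where user_id has been made nullable). This is a sketch against the system catalogs, not something verified on every schema version:

```sql
-- Unique indexes containing at least one nullable column; these are
-- rejected by REPLICA IDENTITY USING INDEX. Sketch only.
SELECT i.indexrelid::regclass AS index_name,
       i.indrelid::regclass  AS table_name,
       a.attname             AS nullable_column
FROM pg_index i
JOIN pg_class c ON c.oid = i.indrelid
JOIN pg_namespace n ON n.oid = c.relnamespace
JOIN pg_attribute a ON a.attrelid = i.indrelid
                   AND a.attnum = ANY (i.indkey)
WHERE i.indisunique
  AND NOT a.attnotnull
  AND n.nspname NOT IN ('pg_catalog', 'information_schema');
```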
