Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

When restarting a partial join resync, prioritise the server which actioned a partial join #14126

Merged
merged 17 commits into from
Oct 18, 2022

Conversation

DMRobertson
Copy link
Contributor

@DMRobertson DMRobertson commented Oct 10, 2022

Fixes #12999. See #12999 (comment) and the comments below it for context.

Should be commitwise reviewable.

David Robertson added 9 commits October 10, 2022 16:09
@DMRobertson DMRobertson force-pushed the dmr/faster-joins/pick-resync-server branch from 6e292a8 to b9a77e8 Compare October 11, 2022 11:18
@DMRobertson DMRobertson changed the title [WIP] Keep track of the server which actioned a partial join When restarting a partial join resync, prioritise the server which actioned a partial join Oct 11, 2022
synapse/handlers/federation.py Outdated Show resolved Hide resolved
synapse/storage/database.py Show resolved Hide resolved
Comment on lines 1206 to 1207
# TODO(faster joins): To make this robust we should run both SELECTs in the
# same transaction.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left this as a TODO for pragmatism's sake (read: I don't want to read database.py right now). I'm assuming it's very unlikely that a new partial join will have started and completed between the two SELECTs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that a transaction only helps you at a REPEATABLE READ (or higher) isolation level (which, in fairness, we do use by default, but every so often we talk about reducing it).

I'd just do room_servers.get(room_id) and add a check for None, to save having to worry about it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks---the comment is naive as written.

(The reference here is https://www.postgresql.org/docs/current/transaction-iso.html .)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just do room_servers.get(room_id) and add a check for None, to save having to worry about it.

Are you suggesting we ignore this row if we get a None here, or loudly log a warning?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should ignore rows that are returned from partial_state_rooms_servers and not partial_state_rooms. (Which will give the same effect as we would see if the transaction was properly isolated).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks. This is 9e9be2b

@DMRobertson DMRobertson marked this pull request as ready for review October 11, 2022 11:22
@DMRobertson DMRobertson requested a review from a team as a code owner October 11, 2022 11:22
@DMRobertson DMRobertson requested a review from richvdh October 11, 2022 11:22
synapse/handlers/federation.py Outdated Show resolved Hide resolved
Comment on lines 1206 to 1207
# TODO(faster joins): To make this robust we should run both SELECTs in the
# same transaction.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that a transaction only helps you at a REPEATABLE READ (or higher) isolation level (which, in fairness, we do use by default, but every so often we talk about reducing it).

I'd just do room_servers.get(room_id) and add a check for None, to save having to worry about it.

If an `initial_destination` is given, it takes top priority. Otherwise
all servers are treated equally.
"""
if initial_destination is not None:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggest inverting this condition for simplicity:

Suggested change
if initial_destination is not None:
if initial_destination is None:
return other_destinations

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -1617,7 +1617,7 @@ async def _resume_sync_partial_state_room(self) -> None:
async def _sync_partial_state_room(
self,
initial_destination: Optional[str],
other_destinations: Collection[str],
other_destinations: Sequence[str],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain why this is needed? _prioritise_destinations_for_partial_state_resync takes a Collection, not a Sequence.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might have been a left-over from a WIP version before I pulled out the helper. (Or maybe I was trying to express "we preserve the iteration order of other_destinations"? 🤷 )

Copy link
Contributor Author

@DMRobertson DMRobertson Oct 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 1ccf9b8, in any case

synapse/handlers/federation.py Show resolved Hide resolved
@DMRobertson DMRobertson requested a review from richvdh October 17, 2022 17:14
Copy link
Member

@richvdh richvdh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@DMRobertson DMRobertson merged commit c3a4780 into develop Oct 18, 2022
@DMRobertson DMRobertson deleted the dmr/faster-joins/pick-resync-server branch October 18, 2022 11:33
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Faster joins: smarter algorithm for picking a server to resync from
2 participants