Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Remove get rooms for user with stream ordering #13991

Conversation

Fizzadar
Copy link
Contributor

@Fizzadar Fizzadar commented Sep 30, 2022

Builds upon #13787. I have copied over some comments from Erik on the reference PR (I have re-applied this on develop for this PR).

Aim here is complete removal of the get_rooms_for_user_with_stream_ordering function and it's join on the events table. The final use is in the sync handler which currently fetches a users room after the current token, leaving a window during which a user may be joined to a room. The ordering is used to check this.

This approach flips the idea on it's head by getting the users rooms first, then the token, and then any membership changes between. Crucially the membership events are already fetched as part of sync and so this doesn't increase the queries required. Unfortunately due to race conditions with bans any rooms located during this still need a check against their state at membership-event-time, but this is identical to the current implementation.

I have also stored the membership events on the SyncResultBuilder object to avoid looking this up multiple times during a sync, which closes a TODO note in the sync handler.

Signed off by Nick @ Beeper (@Fizzadar).

Pull Request Checklist

  • Pull request is based on the develop branch
  • Pull request includes a changelog file. The entry should:
    • Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from EventStore to EventWorkerStore.".
    • Use markdown where necessary, mostly for code blocks.
    • End with either a period (.) or an exclamation mark (!).
    • Start with a capital letter.
    • Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry.
  • Pull request includes a sign off
  • Code style is correct
    (run the linters)

By getting the joined rooms before the current token we avoid any reading
history to confirm a user *was* in a room. We can then use any membership
change events, which we already fetch during sync, to determine the final
list of joined room IDs.
membership_change_events = []
if since_token:
membership_change_events = await self.store.get_membership_changes_for_user(
user_id, since_token.room_key, now_token.room_key, self.rooms_to_exclude
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from old draft PR

erikjohnston 3 days ago

Isn't since_token significantly older than we need? Can we take the current token before we fetch get_rooms_for_user and then only get the changes between the two?
Member Author

Fizzadar Fizzadar 2 days ago

Yes much older - if we switch to using deltas as above this would be far more efficient.

In the current state of this PR though it makes sense to call get_membership_changes_for_user between the since_token & now because we later fetch this anyway (twice), so passing it around as part of the SyncResultBuilder removes those duplicate lookups.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. But doesn't that mean we'll be handling a bunch of spurious changes below? Specifically won't we hit the log line for every membership change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh nice spot yeah that'll potentially check tonnes of events 🤦 Have pushed up a82b8d7 which limits it to the specific window between get rooms + current token.

)
# User joined a room - we have to then check the room state to ensure we
# respect any bans if there's a race between the join and ban events.
if event.membership == Membership.JOIN:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copied from old draft PR

erikjohnston 3 days ago

I think we need to be careful that we handle a join -> join transition? e.g. a display name update?

I think we could look at the current_state_deltas table to get a more accurate sense of how the state has changed?

erikjohnston erikjohnston 3 days ago

Actually, no we don't need to worry about join -> join transitions. I was thinking of a different un-applicable scenario.

current_state_deltas probably still the right table?
Member Author

Fizzadar Fizzadar 2 days ago

Was just trying to figure this out but I don't think current_state_deltas has all the information we need - specifically the membership field in event content means we can't determine whether an event was a join or invite.

Pulling in the events would solve this, but at that point it's essentially the same as the current get_membership_changes_for_user call I think, is there a reason this isn't suitable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_state_deltas will give you the actual changes in current state between two tokens, whereas get_membership_changes_for_user is a bit dodge and only gets an membership events that have been persisted in the range (these are technically not the same thing).

Pulling in the events would solve this, but at that point it's essentially the same as the current get_membership_changes_for_user call I think

You'd want to join against room_memberships, so current_state_deltas may be cheaper as there will likely be less state changes than events in the room.

Though it occurs to me that we only keep current_state_deltas for a month, so we'd have to handle the case that the since token given was too old. Ideally tell the client to do a full resync, but at that point we're creeping the scope of this PR a bunch

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a way to ask clients to re-init sync currently? Assuming this is possible I think all the membership change lookups in sync could be optimised out to use the current_stream_delta x room_memberships and drop the get_membership_changes_for_user calls entirely. I wonder/hope if this could dramatically simplify the current membership change handling in sync.

I think this fits into a subsequent PR though, definitely out of scope here! As it stands I believe this change will behave as expected; the sync handler already uses the membership change events to calculate room changes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the best that we can do currently is soft log the user out, which isn't the most ideal. Will double check if the sliding sync proposal has a mechanism

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sliding sync supports it

@Fizzadar Fizzadar marked this pull request as ready for review September 30, 2022 14:43
@Fizzadar Fizzadar requested a review from a team as a code owner September 30, 2022 14:43
membership_change_events = []
if since_token:
membership_change_events = await self.store.get_membership_changes_for_user(
user_id, since_token.room_key, now_token.room_key, self.rooms_to_exclude
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right. But doesn't that mean we'll be handling a bunch of spurious changes below? Specifically won't we hit the log line for every membership change?

)
# User joined a room - we have to then check the room state to ensure we
# respect any bans if there's a race between the join and ban events.
if event.membership == Membership.JOIN:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

current_state_deltas will give you the actual changes in current state between two tokens, whereas get_membership_changes_for_user is a bit dodge and only gets an membership events that have been persisted in the range (these are technically not the same thing).

Pulling in the events would solve this, but at that point it's essentially the same as the current get_membership_changes_for_user call I think

You'd want to join against room_memberships, so current_state_deltas may be cheaper as there will likely be less state changes than events in the room.

Though it occurs to me that we only keep current_state_deltas for a month, so we'd have to handle the case that the since token given was too old. Ideally tell the client to do a full resync, but at that point we're creeping the scope of this PR a bunch

Comment on lines 1365 to 1371
extrems = await self.store.get_forward_extremities_for_room_at_stream_ordering(
room_id, event.internal_metadata.stream_ordering
)
user_ids_in_room = await self.state.get_current_user_ids_in_room(
room_id, extrems
)
if user_id in user_ids_in_room:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an expensive check. Can we instead check if the given user is now in the room? I think that might be good enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think that works - ef2950a.

@Fizzadar Fizzadar requested a review from erikjohnston October 4, 2022 12:02
@erikjohnston erikjohnston merged commit 0506bb1 into matrix-org:develop Oct 4, 2022
@Fizzadar Fizzadar deleted the remove-get-rooms-for-user-with-stream-ordering branch October 4, 2022 15:46
Fizzadar added a commit to beeper/synapse-legacy-fork that referenced this pull request Oct 7, 2022
By getting the joined rooms before the current token we avoid any reading
history to confirm a user *was* in a room. We can then use any membership
change events, which we already fetch during sync, to determine the final
list of joined room IDs.
# Conflicts:
#	synapse/handlers/sync.py
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants