From a7f2105dda71ada585346b97f1667fcc604ccb38 Mon Sep 17 00:00:00 2001 From: David Robertson Date: Wed, 14 Dec 2022 18:25:09 +0000 Subject: [PATCH 01/11] Initial batch of notes --- docs/SUMMARY.md | 1 + .../synapse_architecture/faster_joins.md | 276 ++++++++++++++++++ 2 files changed, 277 insertions(+) create mode 100644 docs/development/synapse_architecture/faster_joins.md diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 8d68719958d6..ade77d49261c 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -97,6 +97,7 @@ - [Log Contexts](log_contexts.md) - [Replication](replication.md) - [TCP Replication](tcp_replication.md) + - [Faster remote joins](development/synapse_architecture/faster_joins.md) - [Internal Documentation](development/internal_documentation/README.md) - [Single Sign-On]() - [SAML](development/saml.md) diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md new file mode 100644 index 000000000000..f970312c6b97 --- /dev/null +++ b/docs/development/synapse_architecture/faster_joins.md @@ -0,0 +1,276 @@ +# How do faster joins work? + +This is a work-in progress set of notes with two goals: +- act as a reference, explainig how Synapse implements faster joins; and +- record the rationale behind our choices. + +See also [MSC3902](https:ithub.com/matrix-org/matrix-spec-proposals/pull/3902). + +## Overview: processing events in a partially-joined room + +The response to a partial join consists of +- the requested join event `J`, +- a list of the servers in the room (according to the state before `J`), +- a subset of the state of the room before `J`, +- the full auth chain of that state subset. + +Synapse marks the room as partially joined by adding a row to +`partial_state_rooms`. It also marks the join event `J` as "partially stated", +meaning that we have neither received nor computed the full state before/after +`J`. This is done by adding a row to `partial_state_events`. 
+ +[**TODO**: include a DDL definition of the partial joins tables.] + +While partially joined to a room, Synapse receives events `E` from remote +homeservers as normal, and can create events at the request of its local users. +However, we run into trouble when we enforce the [checks on an event]. + +> 1. Is a valid event, otherwise it is dropped. For an event to be valid, it + must contain a room_id, and it must comply with the event format of that +> room version. +> 2. Passes signature checks, otherwise it is dropped. +> 3. Passes hash checks, otherwise it is redacted before being processed further. +> 4. Passes authorization rules based on the event’s auth events, otherwise it +> is rejected. +> 5. **Passes authorization rules based on the state before the event, otherwise +> it is rejected.** +> 6. **Passes authorization rules based on the current state of the room, +> otherwise it is “soft failed”.** + +[checks on an event]: https://spec.matrix.org/v1.5/server-server-api/#checks-performed-on-receipt-of-a-pdu + +We can enforce checks 1--4 without any problems. +But we cannot enforce checks 5 or 6 with complete certainty, since Synapse does +not know the full state before `E`, nor that of the room. + +### Partial state + +Instead, we make a best-effort approximation. +While the room is considered partially joined, Synapse tracks the "partial +state" before events. +This works in a similar way as regular state: + +- The partial state before `J` is that given to us by the partial join response. +- The partial state before an event `E` is the resolution of the partial states + after each of `E`'s `prev_event`s. +- If `E` is rejected or a message event, the partial state after `E` is the + partial state before `E`. +- Otherwise, the partial state after `E` is the partial state before `E`, plus + `E` itself. + +More concisely, partial state propagates just like full state; the only +difference is that we "seed" it with an incomplete initial state. 
+Synapse records that we have only calculated partial state for this event with +a row in `partial_state_events`. + +While the room remains partially stated, check 5 on incoming events to that +room becomes: + +> 5. Passes authorization rules based on **the resolution between the partial +> state before `E` and `E`'s auth events.** If the event fails to pass +> authorization rules, it is rejected. + +Additionally, check 6 is deleted: no soft-failures are enforced. + +While partially joined, the current partial state of the room is defined as the +resolution across the partial states after all forward extremities in the room. + +### Approximation error + +Using partial state means the auth checks can fail in a few different ways[^2]. + +[^2]: Is this exhaustive? + +- We may erroneously accept an incoming event in check 5 based on partial state + when it would have been rejected based on full state, or vice versa. +- This means that an event could erroneously be added to the current partial + state of the room when it would not be present in the full state of the room, + or vice versa. +- Additionally, we may have skipped soft-failing an event that would have been + soft-failed based on full state. + +(Note that the discrepancies described in the last two bullets are user-visible.) + +This means that we have to be very careful when we want to lookup pieces of room +state in a partially-joined room. Our approximation of the state may be +incorrect or missing. But we can make some educated guesses. If + +- our partial state is likely to be correct, or +- the consequences of our partial state being incorrect are minor, + +then we proceed as normal, and let the resync process fix up any mistakes (see +below). + +When is our partial state likely to be correct? + +- It's more accurate the closer we are to the partial join event. (So we should + ideally complete the resync as soon as possible.) 
+- Non-member events: we will have received them as part of the partial join + response, if they were part of the room state at that point. We may + incorrectly accept or reject updates to that state (at first because we lack + remote membership information; later because of compounding errors), so these + can become incorrect over time. +- Local members' memberships: we are the only ones who can create join and + knock events for our users. We can't be completely confident in the + correctness of bans, invites and kicks from other homeservers, but the resync + process should correct any mistakes. +- Remote member's memberships: we did not receive these in the /send_join + response, so we have essentially no idea if these are correct or not. + +In short, we deem it acceptable to trust the partial state for non-membership +and local membership events. For remote membership events, we wait for the +resync to complete, at which point we have the full state of the room and can +proceed as normal. + +### Fixing the approximation with a resync + +**TODO:** needs prose fleshing out. Needs a discussion of what happens if the +full state and partial state disagree---point out that we already have this +problem when resolution changes state but not because of a new event, requiring +clients to clear caches. Assert that sliding sync will fix this. + +- /state_ids before J. Result persisted to DB? +- Continuous loop to fetch events and auth chains of any missing state events in the state before J +- Once they're all available, persist a state group(?) for the state before/after J. +- Recompute the full state of all events seen since J until there are none left. + - (Does this use new state groups or replace old ones?) + - (Are state groups marked as partially stated?) +- Remove events from `partial_state_events` as you go. + +- Once all events have been un-partial-stated, remove the room from `partial_state_rooms`. 
+ +- Then what happens from the client-side; how are changes between partial and full state sent to clients?? Suspect not at all. + +## Specific cases + +### Creating events during a partial join + +**TODO:** needs prose fleshing out. + +Exactly the same. Pick <= 10 fwd extremities as prev events. + +Can you select auth events in the current (partial) state? +- got power levels/create/join rules from the partial join. +- Will have the sender's membership: comes from a make_join handshake. +- Target's membership? E.g. kick or ban someone. + - May not have received the target's membership in the partial join response. + - Probably only going to kickban someone you've seen a msg from + - If you've seen their message, they'll have cited (some previous version of) their membership... + - Could create a reasonable looking ban event... but not a huge prioity; okay to block + +#### Sending out the event over federation + +**TODO:** needs prose fleshing out. + +Normally: send out in a fed txn to all HSes in the room. +We only know that some HSes were in the room at some point. Wat do. +Send it out to the list of servers from the first join. +**TODO** what do we do here if we have full state? +If the prev event was created by us, we can risk sending it to the wrong HS. (Motivation: privacy concern of the content. Not such a big deal for a public room or an encyrpted room. But non-encrypted invite-only...) +But don't want to send out sensitive data in other HS's events in this way. + +Suppose we discover after resync that we shouldn't have sent out one our events (not a prev_event) to a target HS. Not much we can do. +What about if we didn't send them an event but shouldn't've? +E.g. what if someone joined from a new HS shortly after you did? We wouldn't talk to them. +Could imagine sending out the "Missed" events after the resync but... painful to work out what they shuld have seen if they joined/left. 
+Instead, just send them the latest event (if they're still in the room after resync) and let them backfill.(?) + - Don't do this currently. + - If anyone who has received our messages sends a message to a HS we missed, they can backfill our messages + - Gap: rooms which are infrequently used and take a long time to resync. + +### Joining after a partial join + +**TODO:** needs prose fleshing out. Liase with Matthieu. Explain why /send_join +(Rich was surprised we didn't just create it locally. Answer: to try and avoid +a join which then gets rejected after resync.) + +We don't know for sure that any join we create would be accepted. +E.g. the joined user might have been banned; the join rules might have changed in a way that we didn't realise... some way in which the partial state was mistaken. +Instead, do another partial make-join/send-join handshake to confirm that the join works. +- Probably going to get a bunch of duplicate state events and auth events.... but the point of partial joins is that these should be small. Many are already persisted = good. +- What if the second send_join response includes a different list of reisdent HSes? Could ignore it. + - Could even have a special flag that says "just make me a join", i.e. don't bother giving me state or servers in room. Deffo want the auth chain tho. +- SQ: wrt device lists it's a lot safer to ignore it!!!!! +- What if the state at the second join is inconsistent with what we have? Ignore it? + +### Leaving (and kicks and bans) after a partial join + +When you're fully joined to a room, to have `U` leave a room their homeserver +needs to + +- create a new leave event for `U` which will be accepted by other homeservers, + and +- send that event `U` out to the homeservers in the federation. + +When is a leave event accepted? See +[v10 auth rules](https://spec.matrix.org/v1.5/rooms/v10/#authorization-rules): + +> 4. If type is m.room.member: [...] +> +> 5. If membership is leave: +> +> 1. 
If the sender matches state_key, allow if and only if that user’s current membership state is invite, join, or knock. +> 2. [...] + +I think this means that (well-formed!) self-leaves are governed entirely by +4.5.1. This means that if we correctly calculate state which says that `U` is +invited, joined or knocked and include it in the leave's auth events, our event +is accepted by checks 4 and 5 on incoming events. + +> 4. Passes authorization rules based on the event’s auth events, otherwise +> it is rejected. +> 5. Passes authorization rules based on the state before the event, otherwise +> it is rejected. + +The only way to fail check 6 is if the receiving server's current state of the +room says that `U` is banned, has left, or has no membership event. But this is +fine: the receiving server already thinks that `U` isn't in the room. + +> 6. Passes authorization rules based on the current state of the room, +> otherwise it is “soft failed”. + +For the second point (publishing the leave event), the best thing we can do is +to is publish to all HSes we know to be currently in the room. If they miss that +event, they might send us traffic in the room that we don't care about. This is +a problem with leaving after a "full" join; we don't seek to fix this with +partial joins. + +(With that said: there's nothing machine-readable in the /send response. I don't +think we can deduce "destination has left the room" from a failure to /send an +event into that room?) + +#### Can we still do this during a partial join? + +We can create leave events and can choose what gets included in our auth events, +so we can be sure that we pass check 4 on incoming events. For check 5, we might +have an incorrect view of the state before an event. 
+The only way we might erroneously think a leave is valid is if + +- the partial state before the leave has `U` joined, invited or knocked, but +- the full state before the leave has `U` banned, left or not present, + +in which case the leave doesn't make anything worse: other HSes already consider +us as not in the room, and will continue to do so after seeing the leave. + +The remaining obstacle is then: can we safely broadcast the leave event? We may +miss servers or incorrectly think that a server is in the room. Or the +destination server may be offline and miss the transaction containing our leave +event.This should self-heal when they see an event whose `prev_events` descends +from our leave. + +Another option we considered was to use federation `/send_leave` to ask a +fully-joined server to send out the event on our behalf. But that introduces +complexity without much benefit. Besides, as Rich put it, + +> sending out leaves is pretty best-effort currently + +so this is probably good enough as-is. + +#### Cleanup after the last leave + +**TODO**: what cleanup is necessary? Is it all just nice-to-have to save unused +work? + +## Faster joins on workers + +**TODO**: What is Olivier's plan? :) From b2839238a8d9614e54c2845aca4a56f18e9a0899 Mon Sep 17 00:00:00 2001 From: "Olivier Wilkinson (reivilibre)" Date: Mon, 19 Dec 2022 14:53:58 +0000 Subject: [PATCH 02/11] Minor corrections --- docs/development/synapse_architecture/faster_joins.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md index f970312c6b97..e3899fbaf597 100644 --- a/docs/development/synapse_architecture/faster_joins.md +++ b/docs/development/synapse_architecture/faster_joins.md @@ -1,7 +1,7 @@ # How do faster joins work? 
-This is a work-in progress set of notes with two goals: -- act as a reference, explainig how Synapse implements faster joins; and +This is a work-in-progress set of notes with two goals: +- act as a reference, explaining how Synapse implements faster joins; and - record the rationale behind our choices. See also [MSC3902](https:ithub.com/matrix-org/matrix-spec-proposals/pull/3902). @@ -114,7 +114,7 @@ When is our partial state likely to be correct? knock events for our users. We can't be completely confident in the correctness of bans, invites and kicks from other homeservers, but the resync process should correct any mistakes. -- Remote member's memberships: we did not receive these in the /send_join +- Remote members' memberships: we did not receive these in the /send_join response, so we have essentially no idea if these are correct or not. In short, we deem it acceptable to trust the partial state for non-membership @@ -166,7 +166,7 @@ Normally: send out in a fed txn to all HSes in the room. We only know that some HSes were in the room at some point. Wat do. Send it out to the list of servers from the first join. **TODO** what do we do here if we have full state? -If the prev event was created by us, we can risk sending it to the wrong HS. (Motivation: privacy concern of the content. Not such a big deal for a public room or an encyrpted room. But non-encrypted invite-only...) +If the prev event was created by us, we can risk sending it to the wrong HS. (Motivation: privacy concern of the content. Not such a big deal for a public room or an encrypted room. But non-encrypted invite-only...) But don't want to send out sensitive data in other HS's events in this way. Suppose we discover after resync that we shouldn't have sent out one our events (not a prev_event) to a target HS. Not much we can do. 
From 96f0fa2b9881465a1b235e2654324df84bf34abe Mon Sep 17 00:00:00 2001 From: David Robertson Date: Mon, 9 Jan 2023 13:55:13 +0000 Subject: [PATCH 03/11] Fix typo, thanks Shay Co-authored-by: Shay --- docs/development/synapse_architecture/faster_joins.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md index e3899fbaf597..0afd901172d8 100644 --- a/docs/development/synapse_architecture/faster_joins.md +++ b/docs/development/synapse_architecture/faster_joins.md @@ -4,7 +4,7 @@ This is a work-in-progress set of notes with two goals: - act as a reference, explaining how Synapse implements faster joins; and - record the rationale behind our choices. -See also [MSC3902](https:ithub.com/matrix-org/matrix-spec-proposals/pull/3902). +See also [MSC3902](https://github.com/matrix-org/matrix-spec-proposals/pull/3902). ## Overview: processing events in a partially-joined room From f754589c97b9ab1da8a69984107beff79eb679ec Mon Sep 17 00:00:00 2001 From: David Robertson Date: Thu, 26 Jan 2023 12:05:57 +0000 Subject: [PATCH 04/11] Flesh out some prose --- .../synapse_architecture/faster_joins.md | 92 +++++++++++++------ 1 file changed, 64 insertions(+), 28 deletions(-) diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md index 0afd901172d8..d890d70d0e99 100644 --- a/docs/development/synapse_architecture/faster_joins.md +++ b/docs/development/synapse_architecture/faster_joins.md @@ -75,6 +75,9 @@ Additionally, check 6 is deleted: no soft-failures are enforced. While partially joined, the current partial state of the room is defined as the resolution across the partial states after all forward extremities in the room. +_Remark._ Events with partial state are _not_ considered +[outliers](../room-dag-concepts.md#outliers). 
+ ### Approximation error Using partial state means the auth checks can fail in a few different ways[^2]. @@ -124,25 +127,51 @@ proceed as normal. ### Fixing the approximation with a resync -**TODO:** needs prose fleshing out. Needs a discussion of what happens if the -full state and partial state disagree---point out that we already have this -problem when resolution changes state but not because of a new event, requiring -clients to clear caches. Assert that sliding sync will fix this. +The partial-state approximation is only a temporary affair. In the background, +Synapse begins a "resync" process. This is a continuous loop, starting at the +partial join event and proceeding downwards through the event graph. For each +`E` seen in the room since partial join, Synapse will fetch + +- the event ids in the state of the room before `E`, via + [`/state_ids`](https://spec.matrix.org/v1.5/server-server-api/#get_matrixfederationv1state_idsroomid); +- the event ids in the full auth chain of `E`, included in the `/state_ids` + response; and +- any events from the previous two bullets that Synapse hasn't persisted, via + [`/state`](https://spec.matrix.org/v1.5/server-server-api/#get_matrixfederationv1stateroomid). + +This means Synapse has (or can compute) the full state before `E`, which allows +Synapse to properly authorise or reject `E`. At this point, the event +is considered to have "full state" rather than "partial state". We record this +by removing `E` from the `partial_state_events` table. + +\[**TODO:** Does Synapse persist a new state group for the full state +before `E`, or do we alter the (partial-)state group in-place? Are state groups +ever marked as partially-stated? \] -- /state_ids before J. Result persisted to DB? -- Continuous loop to fetch events and auth chains of any missing state events in the state before J -- Once they're all available, persist a state group(?) for the state before/after J. 
-- Recompute the full state of all events seen since J until there are none left. - - (Does this use new state groups or replace old ones?) - - (Are state groups marked as partially stated?) -- Remove events from `partial_state_events` as you go. +This scheme means it is possible for us to have accepted and sent an event to +clients, only to reject it during the resync. From a client's perspective, the +effect is similar to a retroactive +state change due to state resolution---i.e. a "state reset".[^3] -- Once all events have been un-partial-stated, remove the room from `partial_state_rooms`. +[^3]: Clients should refresh caches to detect such a change. Rumour has it that +sliding sync will fix this. -- Then what happens from the client-side; how are changes between partial and full state sent to clients?? Suspect not at all. +When all events since the join `J` have been fully-stated, the room resync +process is complete. We record this by removing the room from +`partial_state_rooms`. + +## Faster joins on workers + +For the time being, the resync process happens on the master worker. +A new replication stream `un_partial_stated_room` is added. Whenever a resync +completes and a partial-state room becomes fully stated, a new message is sent +into that stream containing the room ID. ## Specific cases +> **NB.** The notes below are rough. Some of them are hidden under `
` +disclosures because they have yet to be implemented in mainline Synapse. + ### Creating events during a partial join **TODO:** needs prose fleshing out. @@ -174,12 +203,16 @@ What about if we didn't send them an event but shouldn't've? E.g. what if someone joined from a new HS shortly after you did? We wouldn't talk to them. Could imagine sending out the "Missed" events after the resync but... painful to work out what they shuld have seen if they joined/left. Instead, just send them the latest event (if they're still in the room after resync) and let them backfill.(?) - - Don't do this currently. - - If anyone who has received our messages sends a message to a HS we missed, they can backfill our messages - - Gap: rooms which are infrequently used and take a long time to resync. +- Don't do this currently. +- If anyone who has received our messages sends a message to a HS we missed, they can backfill our messages +- Gap: rooms which are infrequently used and take a long time to resync. ### Joining after a partial join +**NB.** Not yet implemented. + +
+ **TODO:** needs prose fleshing out. Liase with Matthieu. Explain why /send_join (Rich was surprised we didn't just create it locally. Answer: to try and avoid a join which then gets rejected after resync.) @@ -189,12 +222,18 @@ E.g. the joined user might have been banned; the join rules might have changed i Instead, do another partial make-join/send-join handshake to confirm that the join works. - Probably going to get a bunch of duplicate state events and auth events.... but the point of partial joins is that these should be small. Many are already persisted = good. - What if the second send_join response includes a different list of reisdent HSes? Could ignore it. - - Could even have a special flag that says "just make me a join", i.e. don't bother giving me state or servers in room. Deffo want the auth chain tho. + - Could even have a special flag that says "just make me a join", i.e. don't bother giving me state or servers in room. Deffo want the auth chain tho. - SQ: wrt device lists it's a lot safer to ignore it!!!!! - What if the state at the second join is inconsistent with what we have? Ignore it? +
+ ### Leaving (and kicks and bans) after a partial join +**NB.** Not yet implemented. + +
+ When you're fully joined to a room, to have `U` leave a room their homeserver needs to @@ -206,10 +245,10 @@ When is a leave event accepted? See [v10 auth rules](https://spec.matrix.org/v1.5/rooms/v10/#authorization-rules): > 4. If type is m.room.member: [...] -> -> 5. If membership is leave: -> -> 1. If the sender matches state_key, allow if and only if that user’s current membership state is invite, join, or knock. + > + > 5. If membership is leave: + > + > 1. If the sender matches state_key, allow if and only if that user’s current membership state is invite, join, or knock. > 2. [...] I think this means that (well-formed!) self-leaves are governed entirely by @@ -218,16 +257,16 @@ invited, joined or knocked and include it in the leave's auth events, our event is accepted by checks 4 and 5 on incoming events. > 4. Passes authorization rules based on the event’s auth events, otherwise -> it is rejected. + > it is rejected. > 5. Passes authorization rules based on the state before the event, otherwise -> it is rejected. + > it is rejected. The only way to fail check 6 is if the receiving server's current state of the room says that `U` is banned, has left, or has no membership event. But this is fine: the receiving server already thinks that `U` isn't in the room. > 6. Passes authorization rules based on the current state of the room, -> otherwise it is “soft failed”. + > otherwise it is “soft failed”. For the second point (publishing the leave event), the best thing we can do is to is publish to all HSes we know to be currently in the room. If they miss that @@ -270,7 +309,4 @@ so this is probably good enough as-is. **TODO**: what cleanup is necessary? Is it all just nice-to-have to save unused work? - -## Faster joins on workers - -**TODO**: What is Olivier's plan? :) +
From ac5e4c260433ddfc5e775752dc3037288b446d4e Mon Sep 17 00:00:00 2001 From: David Robertson Date: Fri, 27 Jan 2023 16:42:36 +0000 Subject: [PATCH 05/11] faster vs partial --- docs/development/synapse_architecture/faster_joins.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md index d890d70d0e99..bff7419d1dcb 100644 --- a/docs/development/synapse_architecture/faster_joins.md +++ b/docs/development/synapse_architecture/faster_joins.md @@ -6,6 +6,11 @@ This is a work-in-progress set of notes with two goals: See also [MSC3902](https://github.com/matrix-org/matrix-spec-proposals/pull/3902). +The key idea is described by [MSC3706](https://github.com/matrix-org/matrix-spec-proposals/pull/3706). This allows servers to +request a lightweight response to the federation `/send_join` endpoint. +This is called a **faster join**, also known as a **partial join**. In these +notes we'll usually use the word "partial" as it matches the database schema. + ## Overview: processing events in a partially-joined room The response to a partial join consists of From f231f4571caa746a56569502003f4e9fd1a14506 Mon Sep 17 00:00:00 2001 From: David Robertson Date: Fri, 27 Jan 2023 16:42:47 +0000 Subject: [PATCH 06/11] DB schema sorta --- .../synapse_architecture/faster_joins.md | 30 ++++++++++++++++++- 1 file changed, 29 insertions(+), 1 deletion(-) diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md index bff7419d1dcb..afccfc9a609e 100644 --- a/docs/development/synapse_architecture/faster_joins.md +++ b/docs/development/synapse_architecture/faster_joins.md @@ -24,7 +24,35 @@ Synapse marks the room as partially joined by adding a row to meaning that we have neither received nor computed the full state before/after `J`. This is done by adding a row to `partial_state_events`. 
-[**TODO**: include a DDL definition of the partial joins tables.] +
DB schema + +``` +matrix=> \d partial_state_events +Table "matrix.partial_state_events" + Column │ Type │ Collation │ Nullable │ Default +══════════╪══════╪═══════════╪══════════╪═════════ + room_id │ text │ │ not null │ + event_id │ text │ │ not null │ + +matrix=> \d partial_state_rooms + Table "matrix.partial_state_rooms" + Column │ Type │ Collation │ Nullable │ Default +════════════════════════╪════════╪═══════════╪══════════╪═════════ + room_id │ text │ │ not null │ + device_lists_stream_id │ bigint │ │ not null │ 0 + join_event_id │ text │ │ │ + joined_via │ text │ │ │ + +matrix=> \d partial_state_rooms_servers + Table "matrix.partial_state_rooms_servers" + Column │ Type │ Collation │ Nullable │ Default +═════════════╪══════╪═══════════╪══════════╪═════════ + room_id │ text │ │ not null │ + server_name │ text │ │ not null │ +``` + +Indices, foreign-keys and check constraints are omitted for brevity. +
While partially joined to a room, Synapse receives events `E` from remote homeservers as normal, and can create events at the request of its local users. From 70068148c56edc64ed0fc05cc0866f9417037d5f Mon Sep 17 00:00:00 2001 From: David Robertson Date: Fri, 27 Jan 2023 17:25:14 +0000 Subject: [PATCH 07/11] a little more prose --- .../synapse_architecture/faster_joins.md | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md index afccfc9a609e..875d0d3308e1 100644 --- a/docs/development/synapse_architecture/faster_joins.md +++ b/docs/development/synapse_architecture/faster_joins.md @@ -200,16 +200,26 @@ A new replication stream `un_partial_stated_room` is added. Whenever a resync completes and a partial-state room becomes fully stated, a new message is sent into that stream containing the room ID. -## Specific cases +## Notes on specific cases > **NB.** The notes below are rough. Some of them are hidden under `
` disclosures because they have yet to be implemented in mainline Synapse. ### Creating events during a partial join -**TODO:** needs prose fleshing out. +When sending out messages during a partial join, we assume our partial state is +accurate and proceed as normal. There are two problems that this might cause: + +- We might create an event that is valid under our partial state, only to later + find out that is actually invalid according to the full state. +- Or: we might refuse to create an event that is invalid under our partial + state, even though it would be perfectly valid under the full state. + +To avoid these we want to our partial state to the necessary auth events, +ideally as close as possible to those of the full state. -Exactly the same. Pick <= 10 fwd extremities as prev events. +TODO: flesh out the bullets below. Link to the bit of the spec which says which +auth events you need to provide. Can you select auth events in the current (partial) state? - got power levels/create/join rules from the partial join. From f2eb29ca6792dfcc2b3c7f57207311abfb0e93b9 Mon Sep 17 00:00:00 2001 From: David Robertson Date: Fri, 27 Jan 2023 17:27:28 +0000 Subject: [PATCH 08/11] Changelog --- changelog.d/14677.doc | 1 + 1 file changed, 1 insertion(+) create mode 100644 changelog.d/14677.doc diff --git a/changelog.d/14677.doc b/changelog.d/14677.doc new file mode 100644 index 000000000000..7c275c77a896 --- /dev/null +++ b/changelog.d/14677.doc @@ -0,0 +1 @@ +Describe the ideas and the internal machinery behind faster joins. 
From b4088d510646c7e91e31987678b3e0bf562d983f Mon Sep 17 00:00:00 2001 From: David Robertson Date: Mon, 30 Jan 2023 12:51:29 +0000 Subject: [PATCH 09/11] Another pass --- .../synapse_architecture/faster_joins.md | 48 +++++++++++++------ 1 file changed, 34 insertions(+), 14 deletions(-) diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md index 875d0d3308e1..0780f2b82668 100644 --- a/docs/development/synapse_architecture/faster_joins.md +++ b/docs/development/synapse_architecture/faster_joins.md @@ -208,27 +208,47 @@ disclosures because they have yet to be implemented in mainline Synapse. ### Creating events during a partial join When sending out messages during a partial join, we assume our partial state is -accurate and proceed as normal. There are two problems that this might cause: +accurate and proceed as normal. For this to have any hope of succeeding at all, +our partial state must contain an entry for each of the (type, state key) pairs +[specified by the auth rules](https://spec.matrix.org/v1.3/rooms/v10/#authorization-rules): + +- `m.room.create` +- `m.room.join_rules` +- `m.room.power_levels` +- `m.room.third_party_invite` +- `m.room.member` + +The first four of these should be present in the state before `J` that is given +to us in the partial join response; only membership events are omitted. In order +for us to consider the user joined, we must have their membership event. That +means the only possible omission is the target's membership in an invite, kick +or ban. + +The worst possibility is that we locally invite someone who is banned according to +the full state, because we lack their ban in our current partial state. The rest +of the federation---at least, those who are fully joined---should correctly +enforce the [membership transition constraints]( + https://spec.matrix.org/v1.3/client-server-api/#room-membership +). 
So any erroneous invite should be ignored by fully-joined +homeservers and resolved by the resync for partially-joined homeservers. + + + +In more generality, there are two problems we're worrying about here: - We might create an event that is valid under our partial state, only to later find out that is actually invalid according to the full state. - Or: we might refuse to create an event that is invalid under our partial state, even though it would be perfectly valid under the full state. -To avoid these we want to our partial state to the necessary auth events, -ideally as close as possible to those of the full state. - -TODO: flesh out the bullets below. Link to the bit of the spec which says which -auth events you need to provide. +However we expect such problems to be unlikely in practice, because -Can you select auth events in the current (partial) state? -- got power levels/create/join rules from the partial join. -- Will have the sender's membership: comes from a make_join handshake. -- Target's membership? E.g. kick or ban someone. - - May not have received the target's membership in the partial join response. - - Probably only going to kickban someone you've seen a msg from - - If you've seen their message, they'll have cited (some previous version of) their membership... - - Could create a reasonable looking ban event... but not a huge prioity; okay to block +- We trust that the room has sensible power levels, e.g. that bad actors with + high power levels are demoted before their ban. +- We trust that the resident server provides us up-to-date power levels, join + rules, etc. +- State changes in rooms are relatively infrequent, and the resync period is + relatively quick. 
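To make the "possible omission" concrete, here is a minimal sketch of the check described above: given our partial state, which of the `(type, state key)` pairs needed to authorise a new event are we missing? It is an illustration under stated assumptions, not Synapse's implementation. The names `needed_auth_keys` and `missing_from_partial_state`, the example user IDs and the event IDs are all hypothetical, and the selection logic is deliberately simplified (it ignores `m.room.third_party_invite` and restricted joins).

```python
from typing import Dict, Optional, Set, Tuple

# A state key pair: (event type, state key).
StateKey = Tuple[str, str]


def needed_auth_keys(
    event_type: str,
    sender: str,
    state_key: Optional[str] = None,
    membership: Optional[str] = None,
) -> Set[StateKey]:
    """Simplified, hypothetical selection of the state entries needed to
    authorise an event. Third-party invites and restricted joins are omitted."""
    keys: Set[StateKey] = {
        ("m.room.create", ""),
        ("m.room.power_levels", ""),
        ("m.room.member", sender),
    }
    if event_type == "m.room.member" and state_key is not None:
        # Membership changes also depend on the target's current membership.
        keys.add(("m.room.member", state_key))
        if membership in ("join", "invite"):
            keys.add(("m.room.join_rules", ""))
    return keys


def missing_from_partial_state(
    needed: Set[StateKey], partial_state: Dict[StateKey, str]
) -> Set[StateKey]:
    """Return the needed entries that our partial state does not contain."""
    return {key for key in needed if key not in partial_state}


# Hypothetical partial state: everything except other users' memberships.
partial_state: Dict[StateKey, str] = {
    ("m.room.create", ""): "$create",
    ("m.room.power_levels", ""): "$power_levels",
    ("m.room.join_rules", ""): "$join_rules",
    ("m.room.member", "@alice:local.example"): "$alice_join",
}

# A plain message from our own user needs nothing we lack.
message_gap = missing_from_partial_state(
    needed_auth_keys("m.room.message", "@alice:local.example"), partial_state
)

# Inviting a remote user needs the target's membership, which we may lack.
invite_gap = missing_from_partial_state(
    needed_auth_keys(
        "m.room.member",
        "@alice:local.example",
        state_key="@bob:remote.example",
        membership="invite",
    ),
    partial_state,
)
```

In this sketch `message_gap` is empty, while `invite_gap` contains only the target's `m.room.member` entry. That missing entry is exactly the case where an existing ban could go unnoticed until the resync completes.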
#### Sending out the event over federation From b0a2bbd3bc37f469826bd6bd050a67e7d556b92b Mon Sep 17 00:00:00 2001 From: David Robertson Date: Mon, 30 Jan 2023 12:54:26 +0000 Subject: [PATCH 10/11] Update docs/development/synapse_architecture/faster_joins.md Co-authored-by: Shay --- docs/development/synapse_architecture/faster_joins.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md index 0780f2b82668..ed6c9a6d1933 100644 --- a/docs/development/synapse_architecture/faster_joins.md +++ b/docs/development/synapse_architecture/faster_joins.md @@ -19,7 +19,7 @@ The response to a partial join consists of - a subset of the state of the room before `J`, - the full auth chain of that state subset. -Synapse marks the room as partially joined by adding a row to +Synapse marks the room as partially joined by adding a row to the database table `partial_state_rooms`. It also marks the join event `J` as "partially stated", meaning that we have neither received nor computed the full state before/after `J`. This is done by adding a row to `partial_state_events`. From aca9796489452dfcc13f5e90d5df3dd5accf1621 Mon Sep 17 00:00:00 2001 From: David Robertson Date: Mon, 30 Jan 2023 12:55:10 +0000 Subject: [PATCH 11/11] Fix footnote --- docs/development/synapse_architecture/faster_joins.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md index ed6c9a6d1933..c32d713b8af7 100644 --- a/docs/development/synapse_architecture/faster_joins.md +++ b/docs/development/synapse_architecture/faster_joins.md @@ -186,7 +186,7 @@ clients, only to reject it during the resync. From a client's perspective, the effect is similar to a retroactive state change due to state resolution---i.e. 
a "state reset".[^3] -[^3] Clients should refresh caches to detect such a change. Rumour has it that +[^3]: Clients should refresh caches to detect such a change. Rumour has it that sliding sync will fix this. When all events since the join `J` have been fully-stated, the room resync