From 3131afe142ebefca199360d47909870b85d937a3 Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Tue, 9 Jul 2019 20:14:48 +0200 Subject: [PATCH 01/17] Add MSC2162 Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 65 +++++++++++++++++++ 1 file changed, 65 insertions(+) create mode 100644 proposals/2162-signaling-errors-at-bridges.md diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md new file mode 100644 index 00000000000..96333823b7c --- /dev/null +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -0,0 +1,65 @@ +# Signaling Errors at Bridges + +Sometimes bridges just silently swallow messages and other events. This proposal enables bridges to communicate that something went wrong and gives clients the option to give feedback to their users. + +## Proposal + +Bridges might come into a situation where there is nothing more they can do to successfully deliver an event to the foreign network they are connected to. Then they should be able to inform the originating group of the event about this delivery error. + +This document proposes the addition of a new PDU event with type `m.bridge_error`. It is sent by the bridge and references an event previously sent in a room, by that marking it as “failed to deliver” for all users of a bridge. The new event type utilizes reference aggregations ([MSC 1849](https://github.com/matrix-org/matrix-doc/blob/matthew/msc1849/proposals/1849-aggregations.md)) to establish the relation to the event its delivery it is marking as failed. There is no need for a new endpoint as the existing `/send` endpoint will be utilized. + +Additional information contained in the event are the name of the bridged network (e.g. “Discord” or “Telegram”) and a regex¹ describing the affected users (e.g. `@discord_*:example.org`). This regex should be similar to the one any Application Service uses for marking its reserved user namespace. 
By providing this information clients can inform their users who in the room was affected by the error and for which network the error occurred. + +There are some common reasons why an error occurred. These are encoded in the `reason` attribute and can contain the following types: + +* `m.event_not_handled` Generic error type for when an event can not be handled by the bridge. It is used as a fallback when there is no other more specific reason. + +* `m.event_too_old` When the foreign network does not support timestamp massaging like Matrix does, a message will – with enough time passed – fall out of its original context. In this case the bridge might decide that the event is too old and emit this error. + +* `m.foreign_network_error` The bridge was doing its job fine, but the foreign network permanently refused to handle the event. + +* `m.unknown_event` The bridge is not able to handle events of this type. + +Nothing prevents multiple bridge error events to relate to the same event. This should be pretty common as a room can be bridged to more than one network at a time. + +The need for this proposal arises from a gap between the Matrix network and other foreign networks it bridges to. Matrix with its eventual consistency is unique in having a message delivery guarantee. Because of this property there is no need in the Matrix network itself to model the failure of message delivery. This need only arises for interactions with foreign networks where message delivery might fail. This proposal extends Matrix to be aware of these error cases. + +Additionally there might be some operational restrictions of bridges which might make it necessary for them to refrain from handling an event, e.g. when hitting memory limits. In this case the new event type can be used as well. 
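As an illustration of how a client might apply the `affected_users` pattern to the members of a room, here is a minimal Python sketch. It assumes the pattern is limited to simple globbing in which `*` is the only wildcard, which is one of the restrictions the proposal mentions as a possibility. The helper names `glob_to_regex` and `affected_members` and the limit `MAX_PATTERN_LENGTH` are hypothetical and not part of the proposal.

```python
import re
from typing import List

# Hypothetical upper bound on the pattern length; the proposal does not fix a value.
MAX_PATTERN_LENGTH = 256


def glob_to_regex(pattern: str) -> "re.Pattern":
    """Translate a glob in which `*` is the only wildcard into a compiled regex."""
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("affected_users pattern is too long")
    # Every character is matched literally except `*`, which matches any run of characters.
    return re.compile("".join(".*" if ch == "*" else re.escape(ch) for ch in pattern))


def affected_members(pattern: str, members: List[str]) -> List[str]:
    """Return the room members whose full user ID matches the pattern."""
    regex = glob_to_regex(pattern)
    return [user_id for user_id in members if regex.fullmatch(user_id)]


members = ["@alice:example.org", "@discord_123:example.org", "@discord_bob:other.org"]
print(affected_members("@discord_*:example.org", members))
```

Matching against the whole user ID (rather than searching for a substring) keeps users on other servers from being reported as affected.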
+
+### Example
+
+```
+{
+    "type": "m.room.bridge_error",
+    "content": {
+        "network": "Discord",
+        "affected_users": "@discord_*:example.org",
+        "reason": "m.event_too_old",
+        "m.relates_to": {
+            "rel_type": "m.reference",
+            "event_id": "$some:event.id"
+        }
+    }
+}
+```
+
+\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\
+¹ Or similar – see *Security Considerations*
+
+## Tradeoffs
+
+Without this proposal, bridges could still inform users in a room that a delivery failed by simply sending a plain message event from a bot account. This possibility carries the disadvantage of conveying no special semantic meaning with the consequence of clients not being able to adapt their presentation.
+
+A fixed set of error types might be too restrictive to express every possible condition. An alternative would be a free-form text for an error message. This brings the problems of less semantic meaning and a requirement for internationalization with it. In this proposal a generic error type is provided for error cases not considered in this MSC.
+
+## Potential issues
+
+When the foreign network is not the cause of the error signaled but the bridge itself (maybe under load), there might be an argument that responding to failed messages increases the pressure.
+
+## Security considerations
+
+Sending a custom regex with an event might open the doors for attacking a homeserver and/or a client by exposing a direct pathway to the complex code of a regex parser. Additionally sending arbitrary complex regexes might make Matrix more vulnerable to DoS attacks. To mitigate these risks it might be sensible to only allow a more restricted subset of regular expressions by e.g. requiring a maximal length or falling back to simple globbing.
+
+## Conclusion
+
+In this document a new permanent event is proposed which a bridge can emit to signal an error on its side.
The event informs the affected room about which message errored for which reason; it gives information about the affected users and the bridged network. By implementing the proposal Matrix users will get more insight into the state of their (un)delivered messages and thus they will become less frustrated. From 3b468c745480439fc093ea6d9e3aae32bbfda6cb Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Wed, 10 Jul 2019 09:55:54 +0200 Subject: [PATCH 02/17] Block style paragraphs for MSC2162 Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 97 ++++++++++++++----- 1 file changed, 75 insertions(+), 22 deletions(-) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index 96333823b7c..a0b4602c958 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -1,30 +1,64 @@ # Signaling Errors at Bridges -Sometimes bridges just silently swallow messages and other events. This proposal enables bridges to communicate that something went wrong and gives clients the option to give feedback to their users. +Sometimes bridges just silently swallow messages and other events. This proposal +enables bridges to communicate that something went wrong and gives clients the +option to give feedback to their users. ## Proposal -Bridges might come into a situation where there is nothing more they can do to successfully deliver an event to the foreign network they are connected to. Then they should be able to inform the originating group of the event about this delivery error. - -This document proposes the addition of a new PDU event with type `m.bridge_error`. It is sent by the bridge and references an event previously sent in a room, by that marking it as “failed to deliver” for all users of a bridge. 
The new event type utilizes reference aggregations ([MSC 1849](https://github.com/matrix-org/matrix-doc/blob/matthew/msc1849/proposals/1849-aggregations.md)) to establish the relation to the event its delivery it is marking as failed. There is no need for a new endpoint as the existing `/send` endpoint will be utilized. - -Additional information contained in the event are the name of the bridged network (e.g. “Discord” or “Telegram”) and a regex¹ describing the affected users (e.g. `@discord_*:example.org`). This regex should be similar to the one any Application Service uses for marking its reserved user namespace. By providing this information clients can inform their users who in the room was affected by the error and for which network the error occurred. - -There are some common reasons why an error occurred. These are encoded in the `reason` attribute and can contain the following types: - -* `m.event_not_handled` Generic error type for when an event can not be handled by the bridge. It is used as a fallback when there is no other more specific reason. - -* `m.event_too_old` When the foreign network does not support timestamp massaging like Matrix does, a message will – with enough time passed – fall out of its original context. In this case the bridge might decide that the event is too old and emit this error. - -* `m.foreign_network_error` The bridge was doing its job fine, but the foreign network permanently refused to handle the event. +Bridges might come into a situation where there is nothing more they can do to +successfully deliver an event to the foreign network they are connected to. Then +they should be able to inform the originating group of the event about this +delivery error. + +This document proposes the addition of a new PDU event with type +`m.bridge_error`. It is sent by the bridge and references an event previously +sent in a room, by that marking it as “failed to deliver” for all users of a +bridge. 
The new event type utilizes reference aggregations ([MSC +1849](https://github.com/matrix-org/matrix-doc/blob/matthew/msc1849/proposals/1849-aggregations.md)) +to establish the relation to the event its delivery it is marking as failed. +There is no need for a new endpoint as the existing `/send` endpoint will be +utilized. + +Additional information contained in the event are the name of the bridged +network (e.g. “Discord” or “Telegram”) and a regex¹ describing the affected +users (e.g. `@discord_*:example.org`). This regex should be similar to the one +any Application Service uses for marking its reserved user namespace. By +providing this information clients can inform their users who in the room was +affected by the error and for which network the error occurred. + +There are some common reasons why an error occurred. These are encoded in the +`reason` attribute and can contain the following types: + +* `m.event_not_handled` Generic error type for when an event can not be handled + by the bridge. It is used as a fallback when there is no other more specific + reason. + +* `m.event_too_old` When the foreign network does not support timestamp + massaging like Matrix does, a message will – with enough time passed – fall + out of its original context. In this case the bridge might decide that the + event is too old and emit this error. + +* `m.foreign_network_error` The bridge was doing its job fine, but the foreign + network permanently refused to handle the event. * `m.unknown_event` The bridge is not able to handle events of this type. -Nothing prevents multiple bridge error events to relate to the same event. This should be pretty common as a room can be bridged to more than one network at a time. +Nothing prevents multiple bridge error events to relate to the same event. This +should be pretty common as a room can be bridged to more than one network at a +time. 
-The need for this proposal arises from a gap between the Matrix network and other foreign networks it bridges to. Matrix with its eventual consistency is unique in having a message delivery guarantee. Because of this property there is no need in the Matrix network itself to model the failure of message delivery. This need only arises for interactions with foreign networks where message delivery might fail. This proposal extends Matrix to be aware of these error cases. +The need for this proposal arises from a gap between the Matrix network and +other foreign networks it bridges to. Matrix with its eventual consistency is +unique in having a message delivery guarantee. Because of this property there is +no need in the Matrix network itself to model the failure of message delivery. +This need only arises for interactions with foreign networks where message +delivery might fail. This proposal extends Matrix to be aware of these error +cases. -Additionally there might be some operational restrictions of bridges which might make it necessary for them to refrain from handling an event, e.g. when hitting memory limits. In this case the new event type can be used as well. +Additionally there might be some operational restrictions of bridges which might +make it necessary for them to refrain from handling an event, e.g. when hitting +memory limits. In this case the new event type can be used as well. ### Example @@ -48,18 +82,37 @@ Additionally there might be some operational restrictions of bridges which might ## Tradeoffs -Without this proposal, bridges could still inform users in a room that a delivery failed by simply sending a plain message event from a bot account. This possibility carries the disadvantage of conveying no special semantic meaning with the consequence of clients not being able to adapt their presentation. 
+Without this proposal, bridges could still inform users in a room that a +delivery failed by simply sending a plain message event from a bot account. This +possibility carries the disadvantage of conveying no special semantic meaning +with the consequence of clients not being able to adapt their presentation. -A fixed set of error types might be too restrictive to express every possible condition. An alternative would be a free-form text for an error message. This brings the problems of less semantic meaning and a requirement for internationalization with it. In this proposal a generic error type is provided for error cases not considered in this MSC. +A fixed set of error types might be too restrictive to express every possible +condition. An alternative would be a free-form text for an error message. This +brings the problems of less semantic meaning and a requirement for +internationalization with it. In this proposal a generic error type is provided +for error cases not considered in this MSC. ## Potential issues -When the foreign network is not the cause of the error signaled but the bridge itself (maybe under load), there might be an argument that responding to failed messages increases the pressure. +When the foreign network is not the cause of the error signaled but the bridge +itself (maybe under load), there might be an argument that responding to failed +messages increases the pressure. ## Security considerations -Sending a custom regex with an event might open the doors for attacking a homeserver and/or a client by exposing a direct pathway to the complex code of a regex parser. Additionally sending arbitrary complex regexes might make Matrix more vulnerable to DoS attacks. To mitigate these risks it might be sensible to only allow a more restricted subset of regular expressions by e.g. requiring a maximal length or falling back to simple globbing. 
+Sending a custom regex with an event might open the doors for attacking a +homeserver and/or a client by exposing a direct pathway to the complex code of a +regex parser. Additionally sending arbitrary complex regexes might make Matrix +more vulnerable to DoS attacks. To mitigate these risks it might be sensible to +only allow a more restricted subset of regular expressions by e.g. requiring a +maximal length or falling back to simple globbing. ## Conclusion -In this document a new permanent event is proposed which a bridge can emit to signal an error on its side. The event informs the affected room about which message errored for which reason; it gives information about the affected users and the bridged network. By implementing the proposal Matrix users will get more insight into the state of their (un)delivered messages and thus they will become less frustrated. +In this document a new permanent event is proposed which a bridge can emit to +signal an error on its side. The event informs the affected room about which +message errored for which reason; it gives information about the affected users +and the bridged network. By implementing the proposal Matrix users will get more +insight into the state of their (un)delivered messages and thus they will become +less frustrated. From 6606c84c63a4ab362653f8e3085fa8ef7123cbde Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Sat, 3 Aug 2019 12:52:41 +0200 Subject: [PATCH 03/17] Small fixes Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 22 ++++++++++--------- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index a0b4602c958..0d60985528a 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -8,13 +8,16 @@ option to give feedback to their users. 
Bridges might come into a situation where there is nothing more they can do to successfully deliver an event to the foreign network they are connected to. Then -they should be able to inform the originating group of the event about this +they should be able to inform the originating room of the event about this delivery error. -This document proposes the addition of a new PDU event with type +### Bridge error event + +This document proposes the addition of a new room event with type `m.bridge_error`. It is sent by the bridge and references an event previously -sent in a room, by that marking it as “failed to deliver” for all users of a -bridge. The new event type utilizes reference aggregations ([MSC +sent in the same room, by that marking the original event as “failed to deliver” +for all users of a bridge. The new event type utilizes reference aggregations +([MSC 1849](https://github.com/matrix-org/matrix-doc/blob/matthew/msc1849/proposals/1849-aggregations.md)) to establish the relation to the event its delivery it is marking as failed. There is no need for a new endpoint as the existing `/send` endpoint will be @@ -34,10 +37,9 @@ There are some common reasons why an error occurred. These are encoded in the by the bridge. It is used as a fallback when there is no other more specific reason. -* `m.event_too_old` When the foreign network does not support timestamp - massaging like Matrix does, a message will – with enough time passed – fall - out of its original context. In this case the bridge might decide that the - event is too old and emit this error. +* `m.event_too_old` A message will – with enough time passed – fall out of its + original context. In this case the bridge might decide that the event is too + old and emit this error. * `m.foreign_network_error` The bridge was doing its job fine, but the foreign network permanently refused to handle the event. 
@@ -60,11 +62,11 @@ Additionally there might be some operational restrictions of bridges which might
 make it necessary for them to refrain from handling an event, e.g. when hitting
 memory limits. In this case the new event type can be used as well.
 
-### Example
+This is an example of how the new bridge error might look:
 
 ```
 {
-    "type": "m.room.bridge_error",
+    "type": "m.bridge_error",
     "content": {
         "network": "Discord",
         "affected_users": "@discord_*:example.org",
From c6b8c085ea66a0e71077e09d0226d2bba1e8a0e6 Mon Sep 17 00:00:00 2001
From: "Kai A. Hiller"
Date: Sat, 3 Aug 2019 13:00:18 +0200
Subject: [PATCH 04/17] Add m.bridge_unavailable and m.no_permission as error types

Signed-off-by: Kai A. Hiller
---
 proposals/2162-signaling-errors-at-bridges.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md
index 0d60985528a..e5b118ff801 100644
--- a/proposals/2162-signaling-errors-at-bridges.md
+++ b/proposals/2162-signaling-errors-at-bridges.md
@@ -46,6 +46,11 @@ There are some common reasons why an error occurred. These are encoded in the
 
 * `m.unknown_event` The bridge is not able to handle events of this type.
 
+* `m.bridge_unavailable` The homeserver couldn't reach the bridge.
+
+* `m.no_permission` The bridge wanted to react to an event, but didn't have
+  the permission to do so.
+
 Nothing prevents multiple bridge error events to relate to the same event. This
 should be pretty common as a room can be bridged to more than one network at a
 time.
From cf6723c7dceb45fbf630c8f81d7c596cafdaabe3 Mon Sep 17 00:00:00 2001
From: "Kai A. Hiller"
Date: Sat, 3 Aug 2019 13:07:14 +0200
Subject: [PATCH 05/17] Add time_to_permanent property

Signed-off-by: Kai A.
Hiller
---
 proposals/2162-signaling-errors-at-bridges.md | 12 +++++++++++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md
index e5b118ff801..77e3fdbc632 100644
--- a/proposals/2162-signaling-errors-at-bridges.md
+++ b/proposals/2162-signaling-errors-at-bridges.md
@@ -51,6 +51,15 @@ There are some common reasons why an error occurred. These are encoded in the
 * `m.no_permission` The bridge wanted to react to an event, but didn't have
   the permission to do so.
 
+The bridge error can provide a `time_to_permanent` field. If this field is
+present it gives the time in seconds one has to wait before declaring the bridge
+error as permanent. As long as an error is younger than this time, the client
+can expect the possibility of the error being revoked. If a bridge error is
+permanent, it should not be revoked anymore. In addition, the field may also
+accept the string "never", which means that the error will never be considered
+permanent. In case this field is missing, its value is assumed to be 0 and the
+error becomes permanent instantly.
+
 Nothing prevents multiple bridge error events to relate to the same event. This
 should be pretty common as a room can be bridged to more than one network at a
 time.
@@ -75,7 +84,8 @@ This is an example of how the new bridge error might look:
     "content": {
         "network": "Discord",
         "affected_users": "@discord_*:example.org",
-        "reason": "m.event_too_old",
+        "reason": "m.bridge_unavailable",
+        "time_to_permanent": 900,
         "m.relates_to": {
             "rel_type": "m.reference",
             "event_id": "$some:event.id"
From 909f0c0c61fd33281193e178f3c3ef8658408947 Mon Sep 17 00:00:00 2001
From: "Kai A. Hiller"
Date: Sat, 3 Aug 2019 13:14:55 +0200
Subject: [PATCH 06/17] Add note about recursive bridge errors

Signed-off-by: Kai A.
Hiller --- proposals/2162-signaling-errors-at-bridges.md | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index 77e3fdbc632..60fa8e95023 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -60,9 +60,15 @@ accept the string "never", which means that the error will never be considered permanent. In case this field is missing, its value is assumed to be 0 and the error becomes permanent instantly. -Nothing prevents multiple bridge error events to relate to the same event. This -should be pretty common as a room can be bridged to more than one network at a -time. +Notes: + +- Nothing prevents multiple bridge error events to relate to the same event. + This should be pretty common as a room can be bridged to more than one network + at a time. + +- A bridge might choose to handle bridge error events, but this should never + result in emitting a new bridge error as this could lead to an endless + recursion. The need for this proposal arises from a gap between the Matrix network and other foreign networks it bridges to. Matrix with its eventual consistency is From 3ef997c6eb9fc6bee0db26b2e1ccf32b27c3e7ce Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Sat, 3 Aug 2019 13:28:05 +0200 Subject: [PATCH 07/17] Add clarifications to m.unknown_event Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index 60fa8e95023..38b52990de2 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -44,7 +44,11 @@ There are some common reasons why an error occurred. 
These are encoded in the * `m.foreign_network_error` The bridge was doing its job fine, but the foreign network permanently refused to handle the event. -* `m.unknown_event` The bridge is not able to handle events of this type. +* `m.unknown_event` The bridge is not able to handle events of this type. It is + totally legitimate to “handle” an event by doing nothing and not throwing this + error. It is at the discretion of the bridge author to find a good balance + between informing the user and preventing unnecessary spam. Throwing this + error only for some subtypes of an event if fine. * `m.bridge_unavailable` The homeserver couldn't reach the bridge. From 766e9dc3c42f54f8c503ad216d25156000912a59 Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Sat, 3 Aug 2019 13:43:28 +0200 Subject: [PATCH 08/17] Add section for retries and error revocation Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 97 ++++++++++++++++++- 1 file changed, 93 insertions(+), 4 deletions(-) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index 38b52990de2..832cca9253a 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -2,14 +2,17 @@ Sometimes bridges just silently swallow messages and other events. This proposal enables bridges to communicate that something went wrong and gives clients the -option to give feedback to their users. +option to give feedback to their users. Clients are given the possibility to +retry a failed event and bridges can signal the success of the retry. ## Proposal Bridges might come into a situation where there is nothing more they can do to successfully deliver an event to the foreign network they are connected to. Then they should be able to inform the originating room of the event about this -delivery error. +delivery error. 
The user in turn should be able to instruct the bridge to retry +sending the message that was presented him as failed; the bridge should have the +ability to mark an error as being revoked. ### Bridge error event @@ -107,6 +110,78 @@ This is an example of how the new bridge error might look: \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\ ¹ Or similar – see *Security Considerations* +### Retries and error revocation + +Providing a way to retry a failed message delivery gives the sender control over +the importance of her message. An extra procedure for a retry is necessary as +the message might have been delivered to some users (those not on the bridge) +and this would produce duplicate messages for them. + +A retry request is posted by the client to the room for all bridges to see it, +referencing the original event. By inspecting the sender of all related +`m.bridge_error` events, under all bridges the correct one can find out that it +is responsible. The responsible bridge re-fetches the original event and retries +to deliver it. + +A successful retry should be communicated by revoking (not redacting) the +original error that made the retry necessary. Revocation is done by an event +with the type `m.bridge_error_revoke` which references the original event. The +error(s) having a sender of the same bridge as the revocation event are +considered revoked. Clients can show a revoke error e.g. as “Delivered to +Discord at 14:52.” besides the original event. + +On an unsuccessful retry the bridge may edit the errors content to reflect the +new state, e.g. because the type of error changed or to communicate the new +time. 
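The revocation rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not a normative algorithm: events are modeled as plain dictionaries with `event_id`, `type`, `sender` and `content` keys, "same bridge" is simplified to "same sender user ID", and event ordering is ignored. The function names `related_to` and `revoked_error_ids` and all user and event IDs are hypothetical.

```python
from typing import List


def related_to(event: dict, original_event_id: str) -> bool:
    """Check whether an event references the original event via m.reference."""
    relation = event.get("content", {}).get("m.relates_to", {})
    return (
        relation.get("rel_type") == "m.reference"
        and relation.get("event_id") == original_event_id
    )


def revoked_error_ids(events: List[dict], original_event_id: str) -> List[str]:
    """Return IDs of bridge errors whose sender also sent a revocation
    for the same original event (simplification: sender == bridge)."""
    related = [e for e in events if related_to(e, original_event_id)]
    revoking_senders = {
        e["sender"] for e in related if e["type"] == "m.bridge_error_revoke"
    }
    return [
        e["event_id"]
        for e in related
        if e["type"] == "m.bridge_error" and e["sender"] in revoking_senders
    ]


def reference(event_id: str) -> dict:
    """Build the content of an event that references `event_id`."""
    return {"m.relates_to": {"rel_type": "m.reference", "event_id": event_id}}


# Two bridges reported an error for the same message; only one revoked it later.
timeline = [
    {"event_id": "$err1", "type": "m.bridge_error",
     "sender": "@discordbot:example.org", "content": reference("$original")},
    {"event_id": "$err2", "type": "m.bridge_error",
     "sender": "@telegrambot:example.org", "content": reference("$original")},
    {"event_id": "$rev1", "type": "m.bridge_error_revoke",
     "sender": "@discordbot:example.org", "content": reference("$original")},
]
print(revoked_error_ids(timeline, "$original"))
```

A real client would probably also compare timestamps, so that an error reported after a revocation is not treated as revoked; that is left out here to keep the sketch short.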
+
+Example of the new retry events:
+
+```
+{
+    "type": "m.bridge_retry",
+    "content": {
+        "m.relates_to": {
+            "rel_type": "m.reference",
+            "event_id": "$original:event.id"
+        }
+    }
+}
+```
+
+```
+{
+    "type": "m.bridge_error_revoke",
+    "content": {
+        "m.relates_to": {
+            "rel_type": "m.reference",
+            "event_id": "$original:event.id"
+        }
+    }
+}
+```
+
+Overview of the relations between the different event types:
+
+```
+ ________________     _____________________
+|                |   |                     |
+| Original Event |-+-|    Bridge Error     |
+|________________| | |_____________________|
+                   |  _____________________
+                   | |                     |
+                   +-|    Retry Request    |
+     m.references  | |_____________________|
+                   |  _____________________
+                   | |                     |
+                   +-| Bridge Error Revoke |
+                     |_____________________|
+```
+
+A retry might not make much sense for every kind of error e.g. retrying
+`m.unknown_event` will probably result in the same error again. Clients may
+choose to disable retry options for those cases, but it is not restricted
+otherwise.
+
 ## Tradeoffs
 
 Without this proposal, bridges could still inform users in a room that a
@@ -120,6 +195,20 @@ brings the problems of less semantic meaning and a requirement for
 internationalization with it. In this proposal a generic error type is provided
 for error cases not considered in this MSC.
 
+The nature of a retry request from a client to the bridge lends it more to an
+ephemeral type of transport than something permanent like a PDU, but it was
+advised against it for The Spec doesn't make implementations of new EDU types
+easy. Applications Services in general don't allow listening to EDUs, so further
+changes to The Spec would be necessary before following the probably more
+appropriate route here.
+
+A new event type `m.bridge_error_revoke` is introduced for revoking a bridge
+error. Alternatively it could be considered to redact the bridge error event,
+which would eliminate the need for the revocation event and would make this
+proposal a little simpler.
The disadvantage of this approach is the missing +transparency and context of who had which information at which point in time. +This additional information should make for a better user experience. + ## Potential issues When the foreign network is not the cause of the error signaled but the bridge @@ -137,8 +226,8 @@ maximal length or falling back to simple globbing. ## Conclusion -In this document a new permanent event is proposed which a bridge can emit to -signal an error on its side. The event informs the affected room about which +In this document an event is proposed for bridges to signal errors and a way to +retry and revoke those errors. The event informs the affected room about which message errored for which reason; it gives information about the affected users and the bridged network. By implementing the proposal Matrix users will get more insight into the state of their (un)delivered messages and thus they will become From 4e852c714ca4caf37c9fb7a256d853726098e812 Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Sat, 3 Aug 2019 13:49:35 +0200 Subject: [PATCH 09/17] Add special case of unavailable bridge Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 22 +++++++++++++++++++ 1 file changed, 22 insertions(+) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index 832cca9253a..3e789605bbd 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -182,6 +182,28 @@ A retry might not make much sense for every kind of error e.g. retrying choose to disable retry options for those cases, but it is not restricted otherwise. +### Special case: Unavailable bridge + +In the case the bridge is down or otherwise disconnected from the homeserver, it +naturally has no way to inform its users about the unavailability. 
In this case +the homeserver can stand in as an agent for the bridge and answer requests in +its responsibility. + +For this to happen, the homeserver will send out a bridge error event in the +moment a transaction delivery to the bridge failed. The clients at this point +will start showing an error. When the bridge comes back online it will encounter +a higher-than-normal load as all events accumulated over the downtime are +flooding in. To handle this scenario well, the bridge will want to simply +discard all messages older than a given threshold and not bother with sending +any answer back. + +By including a timeout in the `time_to_permanent` field of the event, the client +will know without further feedback from the homeserver or bridge when the +message won't be delivered anymore. + +For those events still accepted by the bridge, the error must be revoked by a +`m.bridge_error_revoke` as described in the previous chapter. + ## Tradeoffs Without this proposal, bridges could still inform users in a room that a From 655602430720c8a86f40b6dba124cc001762d71a Mon Sep 17 00:00:00 2001 From: V02460 Date: Sun, 4 Aug 2019 19:10:19 +0200 Subject: [PATCH 10/17] Fix grammar and wording Co-Authored-By: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com> --- proposals/2162-signaling-errors-at-bridges.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index 3e789605bbd..25269c630e0 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -51,12 +51,12 @@ There are some common reasons why an error occurred. These are encoded in the totally legitimate to “handle” an event by doing nothing and not throwing this error. It is at the discretion of the bridge author to find a good balance between informing the user and preventing unnecessary spam. 
Throwing this - error only for some subtypes of an event if fine. + error only for some subtypes of an event is fine. * `m.bridge_unavailable` The homeserver couldn't reach the bridge. -* `m.no_permission` The bridge wanted to react to an event, but didn't have - the permission to do so. +* `m.no_permission` The bridge wanted to handle an event, but didn't have the + permission to do so. The bridge error can provide a `time_to_permanent` field. If this field is present it gives the time in seconds one has to wait before declaring the bridge @@ -127,10 +127,10 @@ A successful retry should be communicated by revoking (not redacting) the original error that made the retry necessary. Revocation is done by an event with the type `m.bridge_error_revoke` which references the original event. The error(s) having a sender of the same bridge as the revocation event are -considered revoked. Clients can show a revoke error e.g. as “Delivered to +considered revoked. Clients can show a revocation message e.g. as “Delivered to Discord at 14:52.” besides the original event. -On an unsuccessful retry the bridge may edit the errors content to reflect the +On an unsuccessful retry the bridge may edit the error's content to reflect the new state, e.g. because the type of error changed or to communicate the new time. @@ -163,6 +163,7 @@ Example of the new retry events: Overview of the relations between the different event types: ``` + m.references ________________ _____________________ | | | | | Original Event |-+-| Bridge Error | @@ -170,7 +171,7 @@ Overview of the relations between the different event types: | _____________________ | | | +-| Retry Request | - m.references | |_____________________| + | |_____________________| | _____________________ | | | +-| Bridge Error Revoke | @@ -187,7 +188,7 @@ otherwise. In the case the bridge is down or otherwise disconnected from the homeserver, it naturally has no way to inform its users about the unavailability. 
In this case the homeserver can stand in as an agent for the bridge and answer requests in -its responsibility. +its absence. For this to happen, the homeserver will send out a bridge error event in the moment a transaction delivery to the bridge failed. The clients at this point From 4fe8ffeced4f821efbeae8e30ee44ceca4e2a7f6 Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Mon, 5 Aug 2019 16:32:14 +0200 Subject: [PATCH 11/17] Add chapter about rights management Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 24 +++++++++++++++++++ 1 file changed, 24 insertions(+) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index 25269c630e0..885b69110e4 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -205,6 +205,30 @@ message won't be delivered anymore. For those events still accepted by the bridge, the error must be revoked by a `m.bridge_error_revoke` as described in the previous chapter. +### Rights management + +Only bridges should be allowed to send bridge errors and revocations. + +Utilizing the rights system of the room provides a good approximation to this +behavior. It is fine to use it under the assumptions that + +- `m.bridge_error` and `m.bridge_error_revoke` require admin power levels. +- there is always the bridge bot user or a virtual user in the bridge's + namespace present in the room. +- at least one of those users possesses admin power level. +- all users with admin power levels are trusted. + +In short, this requires giving bridges admin power levels in a room and trusting +them to restrict their actions to their own business. It is enough to have one +privileged bridge user in the room. In public rooms this is most commonly the +bridge bot user with admin power level available and in 1:1 conversations it is +the puppeted conversation partner which generally has admin power levels +as well.
+ +As long as the above assumptions are met, it is fine to not explicitly denote +bridges and bridge users as such and simply rely on the power levels for access +control to the new events. + ## Tradeoffs Without this proposal, bridges could still inform users in a room that a From 0212729b9dc3016d67d59fdf0615aac08dccf03b Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Mon, 5 Aug 2019 16:33:11 +0200 Subject: [PATCH 12/17] Add alternatives regarding MSC 1410 Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 23 +++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index 885b69110e4..ae9943de778 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -14,6 +14,15 @@ delivery error. The user in turn should be able to instruct the bridge to retry sending the message that was presented him as failed; the bridge should have the ability to mark an error as being revoked. +If [MSC 1410: Rich +Bridging](https://github.com/matrix-org/matrix-doc/issues/1410) is utilized for +this proposal it would additionally give the benefits of + +- trimming the number of properties required in each bridge error event by + separately providing this general information about the bridge in the room state instead. +- not requiring users representing the bridge to have admin power levels + (see [Rights management](#rights-management)). + ### Bridge error event This document proposes the addition of a new room event with type @@ -33,6 +42,11 @@ any Application Service uses for marking its reserved user namespace. By providing this information clients can inform their users who in the room was affected by the error and for which network the error occurred.
+*Those two fields will not be required if the variant with [MSC 1410: Rich +Bridging](https://github.com/matrix-org/matrix-doc/issues/1410) is adopted. In +this case the same information is stored alongside other bridge metadata in the +room state.* + There are some common reasons why an error occurred. These are encoded in the `reason` attribute and can contain the following types: @@ -229,6 +243,15 @@ As long as the above assumptions are met, it is fine to not explicitly denote bridges and bridge users as such and simply rely on the power levels for access control to the new events. +An alternative to the above solution is the adoption of [MSC 1410: Rich +Bridging](https://github.com/matrix-org/matrix-doc/issues/1410). It stores +information about users' affiliation to a bridge in the room state. Instead of +checking power levels of users, rich bridging can be utilized by checking the +room state and only allowing valid representatives of the bridge to send bridge +errors and their revocations. This alternative has the advantage of not +requiring agents of the bridge to be powerful. They would be verifiable and +could be trusted without any restrictions regarding their power levels. + ## Tradeoffs Without this proposal, bridges could still inform users in a room that a From 6eeb102e05b5de3a1c96c279dab97ed7a6f6ea38 Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Mon, 5 Aug 2019 16:33:57 +0200 Subject: [PATCH 13/17] Add security consideration regarding power levels Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index ae9943de778..919c99f7088 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -294,6 +294,12 @@ more vulnerable to DoS attacks.
To mitigate these risks it might be sensible to only allow a more restricted subset of regular expressions by e.g. requiring a maximal length or falling back to simple globbing. +When utilizing power levels instead of building on [MSC 1410: Rich +Bridging](https://github.com/matrix-org/matrix-doc/issues/1410), a malicious user +who has enough power to send `m.bridge_error` or `m.bridge_error_revoke` is able +to impersonate a bridge. She will be able to wrongly mark messages as failed to +deliver or revoke errors when they were not successfully retried. + ## Conclusion In this document an event is proposed for bridges to signal errors and a way to From ab27cca6540d210f2b387ad651ed4d76d8b6c1f4 Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Mon, 5 Aug 2019 16:34:33 +0200 Subject: [PATCH 14/17] Add note about homeserver impersonation Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index 919c99f7088..95a5ebe4f22 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -219,6 +219,10 @@ message won't be delivered anymore. For those events still accepted by the bridge, the error must be revoked by a `m.bridge_error_revoke` as described in the previous chapter. +**Note:** For this to work, the homeserver is required to impersonate a user of +the bridge as it has no agent of its own. The impersonated user would be the +bridge bot user or one of the virtual users in the bridge's namespace. + ### Rights management Only bridges should be allowed to send bridge errors and revocations. From d0cd9d4e0267de1b50d6c63d9772443dc30ee80a Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Mon, 5 Aug 2019 21:54:27 +0200 Subject: [PATCH 15/17] Make affected_users use a regex array Signed-off-by: Kai A.
Hiller --- proposals/2162-signaling-errors-at-bridges.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index 95a5ebe4f22..86d42fd0161 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -36,11 +36,11 @@ There is no need for a new endpoint as the existing `/send` endpoint will be utilized. Additional information contained in the event are the name of the bridged -network (e.g. “Discord” or “Telegram”) and a regex¹ describing the affected -users (e.g. `@discord_*:example.org`). This regex should be similar to the one -any Application Service uses for marking its reserved user namespace. By -providing this information clients can inform their users who in the room was -affected by the error and for which network the error occurred. +network (e.g. “Discord” or “Telegram”) and a regex array¹ describing the +affected users (e.g. `@discord_.*:example.org`). This regex array should be +similar to the one any Application Service uses for marking its reserved user +namespace. By providing this information clients can inform their users who in +the room was affected by the error and for which network the error occurred. *Those two fields will not be required if the variant with [MSC 1410: Rich Bridging](https://github.com/matrix-org/matrix-doc/issues/1410) is adopted. 
In @@ -110,7 +110,7 @@ This is an example of how the new bridge error might look: "type": "m.bridge_error", "content": { "network: "Discord", - "affected_users": "@discord_*:example.org", + "affected_users": ["@discord_.*:example.org"], "reason": "m.bridge_unavailable", "time_to_permanent": 900, "m.relates_to": { @@ -122,7 +122,7 @@ This is an example of how the new bridge error might look: ``` \_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\_\ -¹ Or similar – see *Security Considerations* +¹ Or similar – see [Security Considerations](#security-considerations) ### Retries and error revocation From 9e1f20a511af1e998cbfa520b9f8ddcaf73da904 Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Tue, 6 Aug 2019 10:22:21 +0200 Subject: [PATCH 16/17] Update time_to_permanent description Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 12 +++++------- 1 file changed, 5 insertions(+), 7 deletions(-) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index 86d42fd0161..e5f1d916639 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -73,13 +73,11 @@ There are some common reasons why an error occurred. These are encoded in the permission to do so. The bridge error can provide a `time_to_permanent` field. If this field is -present it gives the time in seconds one has to wait before declaring the bridge -error as permanent. As long as an error is younger than this time, the client -can expect the possibility of the error being revoked. If a bridge error is -permanent, it should not be revoked anymore. In addition, the field may also -accept the string "never", which means that the error will never be considered -permanent. In case this field is missing, its value is assumed to be 0 and the -error becomes permanent instantly. +present it gives the time in milliseconds one has to wait before declaring the +bridge error as permanent. 
As long as an error is younger than this time, the +client can expect the possibility of the error being revoked. If a bridge error +is permanent, it should not be revoked anymore. In case this field is missing, +the error will never be considered permanent. Notes: From 54e0546047156a2ea338bc44845e3be6a961cfd0 Mon Sep 17 00:00:00 2001 From: "Kai A. Hiller" Date: Tue, 6 Aug 2019 11:08:11 +0200 Subject: [PATCH 17/17] Change m.relates_to to m.relationship in examples Signed-off-by: Kai A. Hiller --- proposals/2162-signaling-errors-at-bridges.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/proposals/2162-signaling-errors-at-bridges.md b/proposals/2162-signaling-errors-at-bridges.md index e5f1d916639..9414f9b2eb6 100644 --- a/proposals/2162-signaling-errors-at-bridges.md +++ b/proposals/2162-signaling-errors-at-bridges.md @@ -30,7 +30,7 @@ This document proposes the addition of a new room event with type sent in the same room, by that marking the original event as “failed to deliver” for all users of a bridge. The new event type utilizes reference aggregations ([MSC -1849](https://github.com/matrix-org/matrix-doc/blob/matthew/msc1849/proposals/1849-aggregations.md)) +1849](https://github.com/matrix-org/matrix-doc/blob/matthew/msc1849/proposals/1849-aggregations.md#relation-types)) to establish the relation to the event its delivery it is marking as failed. There is no need for a new endpoint as the existing `/send` endpoint will be utilized. 
@@ -111,7 +111,7 @@ This is an example of how the new bridge error might look: "affected_users": ["@discord_.*:example.org"], "reason": "m.bridge_unavailable", "time_to_permanent": 900, - "m.relates_to": { + "m.relationship": { "rel_type": "m.reference", "event_id": "$some:event.id" } @@ -152,7 +152,7 @@ Example of the new retry events: { "type": "m.bridge_retry", "content": { - "m.relates_to": { + "m.relationship": { "rel_type": "m.reference", "event_id": "$original:event.id" } @@ -164,7 +164,7 @@ Example of the new retry events: { "type": "m.bridge_error_revoke", "content": { - "m.relates_to": { + "m.relationship": { "rel_type": "m.reference", "event_id": "$original:event.id" }
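
A minimal client-side sketch of how the fields touched by the last three patches might be interpreted, assuming the state of the proposal after PATCH 17/17: `time_to_permanent` is in milliseconds, a missing `time_to_permanent` means the error never becomes permanent, and `affected_users` is an array of regexes. The helper names `is_permanent` and `affected_members`, the use of a millisecond timestamp for the error event, and the choice of Python's `re` dialect with full-string matching are assumptions made for illustration; the proposal does not prescribe any of them.

```python
import re
from typing import List, Optional


def is_permanent(content: dict, error_ts_ms: int, now_ms: int) -> bool:
    """Return True once the bridge error should be treated as permanent.

    error_ts_ms is assumed to be the timestamp of the m.bridge_error event.
    """
    ttp: Optional[int] = content.get("time_to_permanent")
    if ttp is None:
        # Missing field: the error is never considered permanent.
        return False
    # An error younger than time_to_permanent may still be revoked.
    return now_ms - error_ts_ms >= ttp


def affected_members(content: dict, members: List[str]) -> List[str]:
    """Return the room members matched by any regex in affected_users."""
    patterns = [re.compile(p) for p in content.get("affected_users", [])]
    # Full-string matching is an assumption, not part of the proposal.
    return [m for m in members if any(p.fullmatch(m) for p in patterns)]


# Content of the example bridge error, using the m.relationship key.
error_content = {
    "network": "Discord",
    "affected_users": ["@discord_.*:example.org"],
    "reason": "m.bridge_unavailable",
    "time_to_permanent": 900,
    "m.relationship": {
        "rel_type": "m.reference",
        "event_id": "$some:event.id",
    },
}
```

Note that the example value of 900 predates the switch from seconds to milliseconds, so under the updated description it would mean 0.9 seconds rather than 15 minutes.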