Initial stab at documenting soft fail #1641

erikjohnston · 2018-08-31T15:01:54Z

No description provided.

turt2live · 2018-08-31T15:05:08Z

specification/server_server_api.rst

@@ -570,6 +570,32 @@ transaction request to be responded to with an error response.
    result in the user being considered joined.


+Soft Failure


title casing ("Soft failure")

specification/server_server_api.rst

…on API

turt2live

lgtm although review from @uhoreg would be great :)

uhoreg · 2018-10-10T16:13:37Z

specification/server_server_api.rst

+
+When an event is "soft failed" it should not be relayed to the client nor be
+referenced by new events created by the homeserver. If an event is received that
+references the soft failed event then the new event should be handled as usual.


To follow up with the comments made in Matrix, I think if you said something along the lines of: if the soft-failed event is a state event, then it still participates in state resolution as usual, and if the state resolution algorithm resolves to using that event, then the event will be sent to clients. Aside from this, I don't see any issues.

jevolk · 2018-10-10T21:58:33Z

@uhroreg Can you quote what was said in Matrix please? Otherwise this record is incomplete and confusing in the long-term.

…

On Oct 10, 2018 9:14 AM, "Hubert Chathi" ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In specification/server_server_api.rst <#1641 (comment)> : > @@ -570,6 +570,37 @@ transaction request to be responded to with an error response. result in the user being considered joined. +Soft failure +++++++++++++ + +When the homeserver receives a new event over federation it should also check +whether the event passes auth checks based on the current state of the room +(as well as based on the state at the event). If the event does not pass the +auth checks it should be "soft failed". + +When an event is "soft failed" it should not be relayed to the client nor be +referenced by new events created by the homeserver. If an event is received that +references the soft failed event then the new event should be handled as usual. To follow up with the comments made in Matrix, I think if you said something along the lines of: if the soft-failed event is a state event, then it still participates in state resolution as usual, and if the state resolution algorithm resolves to using that event, then the event will be sent to clients. Aside from this, I don't see any issues. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#1641 (review)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AEdz0sTHZlTxnDbMVDnOppHAHPJ2L8XNks5ujhzbgaJpZM4WVX8T> .

uhoreg · 2018-10-11T15:21:16Z

@jevolk There wasn't really much of a discussion in Matrix. It was mainly just me saying "I'm confused," and then "Oh wait, I think I get it now," and then Erik confirming my understanding. But for reference, here's the conversation:

erikjohnston · 2018-10-17T10:23:42Z

@uhoreg does that make more sense?

uhoreg · 2018-10-17T15:52:09Z

Yup, that's clearer. Thanks.

ara4n · 2018-10-17T16:22:37Z

specification/server_server_api.rst

+resolution, and so can appear in the state of events that reference the soft
+failed state event. (When this happens the soft failed event should be sent to
+clients).
+


I've deliberately not read the other reviews on this, so apologies if i'm re-treading the same ground, but reading this pretty naively makes me think:

"Why should the arrival of a new event which references the soft-failed event cause the soft-failed event to suddenly be unfailed? If an attacker can just send another event immediately after the previous one which references it in order to unfail it, what's the point of the soft-fail in the first place?"

I assume the answer here is "it doesn't unfail the first event", but this really isn't clear to a casual observer - especially as it says the soft-failed event will be sent to clients & participate in state res, which makes it sound pretty unfailed.

I've tried to clarify further, but its worth noting a few things:

These events aren't being rejected, just ignored. If a legitimate event comes in that references them then they will start becoming part of the state of the room and we should probably not ignore e.g. if the soft failed events become part of state. State resolution v2 will make it a lot harder for this to allow abuse, fwiw

If an attacker sends a second event then that'll almost certainly be soft failed and ignored too

What is the intended legitimate use-case for first ignoring an event, excluding it from state, and then accepting it for inclusion?

I don't think it's really "excluded" from state per se. State is defined with respect to an event, so if no event references the soft-failed event, then there isn't really any place where the soft-failed event would show up in state anyways (other than in the state with respect to the soft-failed event, but if we aren't doing anything with it yet, then we can just ignore it for now).

One example of a situation where this occurs is: Alice is on server A, Bob is on server B. Alice and Bob are in some room, along with some other servers. Alice is admin, and Bob is a mod. Bob gets taken over by mind-controlling aliens, starts spewing random garbage into the room, and sets the room topic to nonsense. Alice mutes Bob (which also deops him in the room). The mute happens at roughly the same time as Bob setting the room topic, such that some servers might see the mute before the topic change, and some servers might see the topic change before the mute. If every server sees the same order of events, then everything is fine.

However, say that Carol's server receives Bob's first topic change before it receives the mute event, and they send a "WTF?" message, which references that topic change as a prev_event. What is the state at that message event? All servers should see the same state for that event, so the topic change has to be taken into consideration when resolving the state at that event. (This would also have the side effect of letting Alice know that some users in the room are seeing a new topic, as she would otherwise not know unless she was a server admin and trawled through the DAG.)

Now if Alice sends a message in reply to Carol's message, which references (possibly via other events) both Carol's event and the mute event, then ideally state resolution would resolve the topic back to the pre-alien state. But dealing with this situation at this point is up to the state resolution algorithm.

But that isn't really an example of a soft-fail. The state of the room at the point of every event in your example was always valid and nothing was ever ignored until being overridden by having the v2 algorithm determine a winning branch at the event which joined Alice's to Bob+Carol. That's fine, but there's more being described here.

First, some notion of soft-fail already exists in synapse and is owed some documentation (considering the title of this issue). The behavior apparent in synapse persists events which are invalid against some "soft" logic test. This is in contrast with a "hard" logic test: e.g. an invalid signature causes an unambiguous rejection of an event. The "soft" failure instead can be triggered by violating room logic: e.g. sending an event requiring a power_level not granted to the sender.

Soft-failure causes the event to be persisted and maintained, but inert without affecting the room. Soft-failed events are not immediately included in federation state and backfill responses, but are presented to servers by the make_join endpoint as a so-called "forward extremity" to be referenced in the next join event. Thus a soft-failed event has the potential to be integrated in the DAG by other servers who don't know or care to agree to its failure -- those cases founding various incoherencies and the so-called "state reset" phenomenon.

What is being described here by @erikjohnston is logic which immediately considers an event in violation of some rule which renders it inert (ignored), but at the same time it must also be persisted to maintain a traceable graph. Even if a choice is made to not persist it, other servers may persist it and include it as a sole reference in the graph. I'll quote the text to be specific:

If the event does not pass the auth checks it should be "soft failed".

@uhoreg This is describing a failure condition at the point of the event. Your example, in contrast, had no such failure condition at the point the event (the topic change by Bob was valid when Bob issued it, and when Carol witnessed it).

When an event is "soft failed" it should not be relayed to the client nor be referenced by new events created by the homeserver.

This is describing behavior which renders the event inert at the point of the event. @uhoreg your example in contrast has already relayed the event to the client and has considered it a valid referable for further events issued by Bob and Carol.

If an event is received that references the soft failed event then the new event should be handled as usual. Soft failed state events participate in state resolution, and so can appear in the state of events that reference the soft failed state event.

Here's where it gets hairy. This is where the server shifts its philosophy from being an independent thinker, having ignored the effects of the event and refusing to gossip it, to being something more of a bovine crowd-follower. While @uhoreg describes very desirable eventually consistent logic blessed by the v2 algorithm I don't think it quite completes the documentation of this soft-failure behavior as written nor even the purpose of this design. There are further complications here: foremost an ambiguity of what gets sent to the client, when and why (an action which cannot really be rolled back in the current protocol, though I wish it were possible) thus leading to a lot of bad things.

some reason to be unfailed due to some later revelation and further state resolution ... violates the fundamental light-cone property of the DAG.

I'll actually concede that in the current system (even v2) there is no limitation to further state resolutions unfailing a soft-failed event perhaps even recursing all the way back to the room's create event. I still maintain that without any realistic limitation this is trouble on many levels :(

So if no servers accept it, then no valid events should reference it. So I think that any events that reference it should be rejected as well.

That may very well be acceptable, but it's a degradation of the robustness gained from soft-failure; in a system structured as a [nearly] linked-list it's important that bugs, regressions, spec pitfalls and implementation ambiguities don't split the room. This can be accidental or it can be exploited for denial of service. So before the rules become more rigid this should be given some thought.

Realistically though, especially in the current system, it's very plausible that a majority faction of servers will accept a history containing a bad event with a single reference and "move on" for any number of reasons and your server is either left to deal with that or sit in a deep freeze at the point right before that bad event. Since this isn't bitcoin, and specifically we're dealing with communication, perhaps robustness-oriented solutions aren't such a bad philosophy.

This section is specifically dealing with events that pass auth based on the state at the event itself.

Let's consider just this case then for a moment; so we're just considering events which started as valid but transitioned to invalid due to a revelation brought by a later event:

events that are soft-failed are held off to the side ... and don't actually get attached to the DAG until some other (valid) event pulls them in.

Such an event transitioned from valid to invalid because the light-cone was enlarged to invalidate whatever had been validating it (i.e. its closest power_levels event). Now we're holding it in limbo in case a further revelation enlarges the light-cone invalidating the event which invalidated our failed event, transitioning it back to valid.

The crux of the problem here is with the last part: "transitioning it back to valid" (or as you said "pulls them in"). What is the formality that determines the event ought to become valid again after having transitioned to invalid from initially being valid. This cherry-picking seems arbitrary at this point. Most eventually consistent systems invalidate entire branches unless very specific granular recombination behavior is specified.

events that are soft-failed are held off to the side ... and don't actually get attached to the DAG

This section is specifically dealing with events that pass auth based on the state at the event itself.

This proposal is to deal with users who used to have some privileges in a room, but had their privileges revoked.

These statements appear in conflict to me. The soft-failed event can be part of the DAG and reachable from some later event which revoked whatever initially auth'ed it...

That may very well be acceptable, but it's a degradation of the robustness gained from soft-failure; in a system structured as a [nearly] linked-list it's important that bugs, regressions, spec pitfalls and implementation ambiguities don't split the room. This can be accidental or it can be exploited for denial of service. So before the rules become more rigid this should be given some thought.

FTR, I looked up what the spec says about what to do with events that don't pass auth based on the state at the event itself, which looks like it persists the events but basically turns them into no-ops (if they're state events, they never affect state). So, it seems to be closer to what @jevolk suggests than what I had said.

Such an event transitioned from valid to invalid because the light-cone was enlarged to invalidate whatever had been validating it (i.e. its closest power_levels event). Now we're holding it in limbo in case a further revelation enlarges the light-cone invalidating the event which invalidated our failed event, transitioning it back to valid.

The proposal is written from the point of view of a specific server. In the situation described in this proposal, an event (e.g. Bob's topic change) only gets soft-failed (or whatever terminology we're using now) if the event that revoked privileges (e.g. Alice muting Bob) arrives at that server before the event that gets soft-failed. So from the point of view of that server, the event does not really transition from valid to invalid. Bob's topic change arrives at a server that thinks that Bob should currently be muted in the room, and the server needs to decide what to do with that event. So at this point, the server does not know if the event is valid or not. The proposal here is that in that situation, the server should hold onto it but avoid referencing it or passing it on for now. If it is a state event, and the server receives new information indicating that the event was not a moderation evasion event (where the "new information" is in the form of some other server sending a valid event that references it, suggesting that the server had received the topic change event before the muting event), then it will consider the soft-failed event in state resolution.

jevolk · 2018-10-17T20:51:29Z

specification/server_server_api.rst

+If an event is received that references the soft failed event then the new event
+should be handled as usual. Soft failed state events participate in state
+resolution, and so can appear in the state of events that reference the soft
+failed state event. This can result in soft-failed events appearing in the state


This can result in soft-failed events appearing in the state of allowed events.

The problem seems to be right here. The mere existence of an allowed event is not a legitimate reason to unfail an already soft-failed event.

uhoreg · 2018-10-17T21:07:58Z

specification/server_server_api.rst

+When the homeserver receives a new event over federation it should also check
+whether the event passes auth checks based on the current state of the room
+(as well as based on the state at the event). If the event does not pass the
+auth checks it should be "soft failed".


I think a possible point of confusion is that this "does not pass the auth checks" only refers to the auth checks based on the current state of the room. If the event does not pass auth checks based on the state at the event itself, then the event should be outright rejected, rather than soft failed. This should probably be clarified.

richvdh · 2018-10-18T11:11:25Z

specification/server_server_api.rst

+.. NOTE::
+
+    This is different than rejections in that soft failed events are simply
+    ignored unless a new event references it.


that's not quite true: soft failed events are not entirely ignored: they are added to the DAG.

(I suppose it would be a valid choice not to persist the event, but that sounds like you could easily get into a mess of repeatedly seeing the same event and doing the same authentication over and over.)

richvdh · 2018-10-18T11:14:59Z

specification/server_server_api.rst

+
+.. NOTE::
+
+    Soft failures are designed to stop malicious servers from avoiding actions


I wonder if it would be clearer to spell this out at the top of the section, rather than here. Currently, the problem is that you have to read and understand the implementation before you get told the motivation, which leads to a bunch of "wtf" type questions as have been seen here.

I also think that it would be helpful to introduce the section with a handwavey overview (possibly including examples) before diving into the specifics of the implementation.

richvdh · 2018-10-18T11:21:05Z

specification/server_server_api.rst

@@ -570,6 +570,65 @@ transaction request to be responded to with an error response.
    result in the user being considered joined.


+Soft failure


I do think that giving this a name other than "soft failure" might help. Maybe "event quarantine"?

Yeah, quite possibly. Though event quarantine in my mind feels more severe than "failure" tbh

I personally like "quarantine" better than "soft failure"

richvdh · 2018-10-18T11:42:35Z

specification/server_server_api.rst

+     \ /
+      D
+
+Where ``A`` is the oldest event, ``B`` bans the sender of ``C`` and ``C`` is a


might be clearer to have two or even three sets of ascii-art here, building up the graph incrementally. The problem is that the graph as it is looks like we're about to discuss D, which isn't even the main point of soft-failure (which is C).

might be clearer to have two or even three sets of ascii-art here, building up the graph incrementally. The problem is that the graph as it is looks like we're about to discuss D, which isn't even the main point of soft-failure (which is C).

I'm going to echo the importance of this. The cases here have the potential to become very complex very quickly. Just illustrating the most basic examples is wholly insufficient.

…orked examples

jevolk · 2018-10-18T13:39:42Z

@erikjohnston @uhoreg I feel this section needs to include exactly how a soft-failed event can become un-failed. The PR and comments thus far seem to describe the inclusion of soft-failed events in future state resolutions once they are referenced but the unfolding of an actual case and diagram of this specific action seems to be essential here. Also, any step-by-step / diagram really needs to illustrate at what point events are revealed to clients.

uhoreg · 2018-10-19T14:03:28Z

One other issue, which I realized last night, is that the spec only defines the state at an event, and doesn't define what the "current" state of the room is. For example, if the room DAG looks like:

    A
   / \
  B   C

then what is the "current" state? Is it the state at B? The state at C? Or some combination? I think that I would expect that the "current" state is the state resolution of B and C, but I don't think this is defined anywhere.

As a follow-up to that is the question of what should be the state that clients see. I would expect that clients should see the "current" state of the room. So, if we start with:

    A
    |
    B

where A is any random event, and B is Alice muting Bob. The server then receives event C, which is Bob trying set the topic:

    A
   / \
  B   C

In this case, C gets soft-failed, so it does not get sent to the client. Suppose that Carol then sends an event referencing Bob's event, but not Alice's, since her server hasn't seen Alice's event yet:

Obviously, since Carol's event does not reference Alice's event, the state at Carol's event will include Bob's topic change. However, if the "current" state of the room is the state resolution of B and D, and that resolution says that the topic change does not take effect, then what should clients see? Should they be sent Bob's event, since it is included in the state at D? Or should it be omitted, since it is never part of the "current" state of the room? (I would hope that it would be omitted.)

erikjohnston · 2018-10-19T14:19:05Z

So the key thing here is that (due to state resolution) the clients have to be told about changes to state that happen independently of sending new events down the stream.

So in your case the client would be told:

A happened
B happened, and the current state now includes B
D happened, and the current state now includes B, C and D (or whatever the resolution of B and D are).

Exactly how this happens is up to the CS API. (Admittedly the current /sync format makes this tricky to do currently, but this is a situation that happens today as it is)

jevolk · 2018-10-19T20:29:43Z

A happened

B happened, and the current state now includes B

D happened, and the current state now includes B, C and D (or whatever the resolution of B and D are).

@erikjohnston This doesn't answer the question because @uhoreg asked:

Suppose that Carol then sends an event referencing Bob's event, but not Alice's, since her server hasn't seen Alice's event yet.

@uhoreg is supposing that '2' carrying event B did not happen yet. The client is informed of A -> C -> D which is all valid and the current state of the room includes A C and D (or the state at event D). Receiving event B then happens last in meatspace time.

erikjohnston · 2018-10-22T08:54:36Z

@uhoreg is supposing that '2' carrying event B did not happen yet. The client is informed of A -> C -> D which is all valid and the current state of the room includes A C and D (or the state at event D). Receiving event B then happens last in meatspace time.

I was reading @uhoreg example as the server receiving the events A then B then C then D. If the server receives B after C or D then soft fail is never triggered and everything is handled as it is today.

uhoreg · 2018-10-22T16:01:03Z

Yes, in my example, the server that does the soft-failing (say, Alice's server) receives the events A then B then C then D. However, Carol's server receives the events A then C then D then B. So there are two servers involved that receive events in different orders.

jevolk · 2018-10-22T22:24:20Z

Okay, so my summary of this proposal is the following:

If an event updates the state of the room in contravenance to the rules set
in the current state of the room: do not apply its effects to the state and do
not reference it.

If such an event is referenced by another event, this section does not apply.

And with that being the case, I think this PR is somewhat wordy now and should have simply been explained concisely in the first place to remain as succinct as it ought to be.

erikjohnston · 2018-10-23T08:41:44Z

I assume you meant "... don't apply its effects to the current state of the room"? But yes, most of the waffle here is due to folks asking for clarifying language and me being incompetent at managing to explain it.

I think it is important to note in the spec that such events shouldn't be (proactively) sent to clients, as that's the actual use of the feature. (Though it could be argued that that should be specified in the CS API spec).

Assuming that current state is defined as the state resolved across all forward extremities in the room, it can be written like:

If an event is received that passes the auth checks, but does not pass auth based on the current state of the room, the server should neither update the set of forward extremities nor send it to clients. Otherwise, the event should be handled as usual.

If an event is later received that references that event then just handling it in the same way will result in the server doing the right thing.

ara4n · 2018-10-23T09:59:02Z

specification/server_server_api.rst

+that includes ``C``, in which case clients should also to be told that the state
+has changed to include ``C``.
+
+Note that this is essentially equivalent to the situation where S1 doesn't


we're introducing S1 and S2 as new terms without defining them

ara4n · 2018-10-23T10:47:40Z

specification/server_server_api.rst

+      D
+
+Then servers will handle ``D`` as normal. ``D`` is sent to the servers' clients
+(assuming ``D`` passes auth checks). The state at ``D`` may resolve to a state


It's very confusing to say that D may resolve to a state that includes C, given for the specific values of B & C in this worked example (ban and topic), state res v2 will correctly prevent the ban evasion in practice and not include C in the resolved state.

Perhaps instead:

In this example, the state at D will not resolve to a state that includes C, as state resolution will prioritise the ban over the topic change to avoid evasion. However, in the general case, the state at D may resolve to a state that includes C

...and then so on as before.

ara4n · 2018-10-23T10:58:45Z

specification/server_server_api.rst

+whether the event passes auth checks based on the current state of the room (as
+well as based on the state at the event). If the event does not pass the auth
+checks based on the current state of the room (but does pass the auth checks
+based on the state at that event) it should be "soft failed".


I think the root of much of the confusion on this PR is the fact that soft-failed events participate in state res as normal is buried relatively deep. How about appending:

"Otherwise it participates in state resolution as normal (and we rely on the state resolution algorithm to avoid malicious events influencing the state of the room)"

ara4n · 2018-10-23T11:00:17Z

specification/server_server_api.rst

+.. NOTE::
+
+    If an event is received that references the soft failed event then the new event
+    should be handled as usual. Soft failed state events participate in state


If we hoist up 'soft failed state events participate in state res as normal' to the main section above as proposed in https://github.com/matrix-org/matrix-doc/pull/1641/files#r227336850, then this would become:

"As soft-failed events participate in state resolution as normal, they can appear in the state of events which reference the soft-failed state event."

richvdh · 2018-10-23T09:48:56Z

specification/server_server_api.rst

+Soft failure
++++++++++++
+
+Motivation


s/Motivation/Background/ I think

richvdh · 2018-10-23T10:40:39Z

specification/server_server_api.rst

+     /
+    B
+
+Where ``B`` is a ban of a user ``X``. If the user ``X`` tries to set the topic


s/Where/where/

richvdh · 2018-10-23T10:40:52Z

specification/server_server_api.rst

+     / \
+    B   C
+
+Servers that receive ``C`` after ``B`` will soft fail event ``C``, and so will


s/Servers/servers/

s/will soft fail/should soft fail/, maybe

richvdh · 2018-10-23T10:41:13Z

specification/server_server_api.rst

+     \ /
+      D
+
+Then servers will handle ``D`` as normal. ``D`` is sent to the servers' clients


s/Then/then/

richvdh · 2018-10-23T10:47:11Z

specification/server_server_api.rst

+Motivation
+^^^^^^^^^^
+
+If a user is banned from a room a server can still send events from them by only


Suggested change

If a user is banned from a room a server can still send events from them by only

It is important that we prevent users from evading bans (or other power restrictions) by

creating events which reference old parts of the DAG. For example, a banned user

could continue to send messages to a room by having their server send events which

reference the event before they were banned.

Note that such events are entirely valid, and we cannot simply reject them, as it is

impossible to distinguish such an event from a legitimate one which has been delayed.

We must therefore accept such events and let them participate in state resolution and

the federation protocol as normal. However, servers may choose not to send such events

on to their clients, so that end users won't actually see the events.

richvdh · 2018-10-23T10:52:59Z

specification/server_server_api.rst

+based on the state at that event) it should be "soft failed".
+
+When an event is "soft failed" it should not be relayed to the client nor be
+referenced by new events created by the homeserver.


I wonder if we should define what forward extremities are somewhere in the spec (not here), and then reference them here: "s/nor be referenced by new events created by the homeserver/nor be added to the list of forward extremities for the room".

richvdh · 2018-10-23T10:53:43Z

specification/server_server_api.rst

+based on the state at that event) it should be "soft failed".
+
+When an event is "soft failed" it should not be relayed to the client nor be
+referenced by new events created by the homeserver.


"(It should, however, participate in state resolution as normal, if further events are received which reference it. The state resolution algorithm will then ensure that malicious events cannot be injected into the room state via this mechanism.)"

richvdh · 2018-10-23T11:06:54Z

specification/server_server_api.rst

+
+.. NOTE::
+
+    If an event is received that references the soft failed event then the new event


Suggested change

If an event is received that references the soft failed event then the new event

Because soft failed state events participate in state

resolution as normal, it is possible for such events to appear in the

state of non-soft-failed events, in which case the client should be told ...

ara4n · 2018-10-23T14:23:12Z

And with that being the case, I think this PR is somewhat wordy now and should have simply been explained concisely in the first place to remain as succinct as it ought to be.

This new text oversimplifies it: for instance, it implies to me that soft-failed events don't participate in state resolution, and it makes it seem that the behaviour when a soft-failed event is referenced or not has to be special-cased for soft-failures, which doesn't seem to be the case.

Given the main text (ignoring the explanatory note & worked example) is already succinct and accurate (modulo my suggestion at https://github.com/matrix-org/matrix-doc/pull/1641/files#r227336850 to make it crystal clear that soft-fail events participate in state res as normal), i don't think it benefits from attempts to simplify it even further.

The PR lgtm modulo my last few reqs for clarification.

jevolk · 2018-10-23T15:45:24Z

And with that being the case, I think this PR is somewhat wordy now and should have simply been explained concisely in the first place to remain as succinct as it ought to be.

This new text oversimplifies it: for instance, it implies to me that soft-failed events don't participate in state resolution, and it makes it seem that the behaviour when a soft-failed event is referenced or not has to be special-cased for soft-failures, which doesn't seem to be the case.

I think the initial confusion here stems from the scope of what this section in the spec is trying to accomplish. While it does make the distinction between events which are unreferenced (so-called forward extremities) and those which are referenced: the confusion is in considering this section applies in some way to events which are referenced. It does not. It may seem obvious to spec authors here when they use phrases such as "handled as usual" but that is somewhat ambiguous and easily confusing for an unfamiliar implementer.

This PR and ultimately this section on soft-failing in the spec should make it clear that it simply applies to unreferenced forward extremities, and if an event has been referenced or is referenced later this part of the spec has no application. It should mention that other servers can trivially violate the soft-failing by referencing the event: after which this section simply no longer applies.

ara4n · 2018-10-23T16:16:27Z

specification/server_server_api.rst

+  a state that includes ``C``, in which case clients should also to be told that
+  the state has changed to include ``C``. (*Note*: This depends on the exact
+  state resolution algorithm used. In the original version of the algorithm
+  ``C`` would be in the resolved state, whereas in latter versions this may not


whilst this is more accurate than the proposed change, it comes across as very cryptic. can we ground it in concrete examples by saying "in later versions the algorithm will try to prioritise the ban over the topic change" or something?

I was really trying to avoid talking about all the different state res algorithms and what they do, and just saying "hey, multiple versions exist"

"in later versions the algorithm will try to prioritise the ban over the topic change"

I sort of see what you mean, but I don't think that is any less cryptic for those who don't know about states resolution algorithms tbh

Initial stab at documenting soft fail

1b366b1

erikjohnston requested a review from a team August 31, 2018 15:01

turt2live reviewed Aug 31, 2018

View reviewed changes

Title casing

6a035cc

uhoreg reviewed Aug 31, 2018

View reviewed changes

specification/server_server_api.rst Outdated Show resolved Hide resolved

turt2live assigned erikjohnston Sep 10, 2018

Clarify whether to return soft failed events are returned in federati…

cafe706

…on API

erikjohnston requested a review from a team October 2, 2018 13:06

turt2live approved these changes Oct 2, 2018

View reviewed changes

neilisfragile added the p1 label Oct 9, 2018

uhoreg reviewed Oct 10, 2018

View reviewed changes

Clarify soft failed state event handling

ed9f05d

ara4n reviewed Oct 17, 2018

View reviewed changes

Further clarifications

7b28b19

jevolk reviewed Oct 17, 2018

View reviewed changes

uhoreg reviewed Oct 17, 2018

View reviewed changes

erikjohnston added 2 commits October 18, 2018 11:25

Clarify soft fail being due to not passing auth based on current state

6445899

Add an attempt at an example

43d2d82

richvdh reviewed Oct 18, 2018

View reviewed changes

erikjohnston added 2 commits October 18, 2018 13:35

Rework entire thing to add motivation, then implementation and then w…

f3f1151

…orked examples

Try different title character

0d9b882

erikjohnston requested a review from a team October 22, 2018 08:53

erikjohnston removed their assignment Oct 22, 2018

ara4n reviewed Oct 23, 2018

View reviewed changes

richvdh reviewed Oct 23, 2018

View reviewed changes

erikjohnston added 2 commits October 23, 2018 14:39

Use Rationale and Example admonitions, rather than headings

ceb7494

Fix up example section

377e02a

Incorporate feedback

56b2887

ara4n reviewed Oct 23, 2018

View reviewed changes

richvdh assigned erikjohnston Oct 25, 2018

Add requested from Matthew to talk about future state res algorithms

40bc911

erikjohnston merged commit 4271391 into master Oct 26, 2018

neilisfragile added the s2s r0 label Nov 1, 2018

erikjohnston mentioned this pull request Mar 6, 2019

Implement soft fail matrix-org/synapse#4814

Merged

turt2live mentioned this pull request Sep 14, 2018

Better specify how to validate events matrix-org/matrix-spec#365

Open

		@@ -570,6 +570,32 @@ transaction request to be responded to with an error response.
		result in the user being considered joined.


		Soft Failure


		.. NOTE::

		Soft failures are designed to stop malicious servers from avoiding actions

		@@ -570,6 +570,65 @@ transaction request to be responded to with an error response.
		result in the user being considered joined.


		Soft failure

-If a user is banned from a room a server can still send events from them by only
+It is important that we prevent users from evading bans (or other power restrictions) by
+creating events which reference old parts of the DAG. For example, a banned user
+could continue to send messages to a room by having their server send events which
+reference the event before they were banned.
+Note that such events are entirely valid, and we cannot simply reject them, as it is
+impossible to distinguish such an event from a legitimate one which has been delayed.
+We must therefore accept such events and let them participate in state resolution and
+the federation protocol as normal. However, servers may choose not to send such events
+on to their clients, so that end users won't actually see the events.


		.. NOTE::

		If an event is received that references the soft failed event then the new event

-    If an event is received that references the soft failed event then the new event
+    Because soft failed state events participate in state
+    resolution as normal, it is possible for such events to appear in the
+    state of non-soft-failed events, in which case the client should be told ...

Initial stab at documenting soft fail #1641

Initial stab at documenting soft fail #1641

Conversation

erikjohnston commented Aug 31, 2018

Choose a reason for hiding this comment

turt2live left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jevolk commented Oct 10, 2018 via email

uhoreg commented Oct 11, 2018

erikjohnston commented Oct 17, 2018

uhoreg commented Oct 17, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jevolk Oct 17, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jevolk commented Oct 18, 2018

uhoreg commented Oct 19, 2018

erikjohnston commented Oct 19, 2018

jevolk commented Oct 19, 2018 • edited Loading

erikjohnston commented Oct 22, 2018

uhoreg commented Oct 22, 2018

jevolk commented Oct 22, 2018 • edited Loading

erikjohnston commented Oct 23, 2018

Choose a reason for hiding this comment

ara4n Oct 23, 2018 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ara4n commented Oct 23, 2018

jevolk commented Oct 23, 2018

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jevolk Oct 17, 2018 •

edited

Loading

jevolk commented Oct 19, 2018 •

edited

Loading

jevolk commented Oct 22, 2018 •

edited

Loading

ara4n Oct 23, 2018 •

edited

Loading