MSC1763: Proposal for specifying configurable message retention periods #1763

Open
wants to merge 37 commits into
base: old_master
Changes from 1 commit
687b650
first cut of MSC1763 for configurable event retention
ara4n Dec 30, 2018
f770440
ephemeral msging ended up in scope
ara4n Dec 30, 2018
b25367e
fix english
ara4n Dec 30, 2018
2aafa02
clarify this only applies to non-state events; fix retention JSON str…
ara4n Dec 30, 2018
64695ed
make conflict alg explicit for user retention settings
ara4n Dec 30, 2018
c493dbd
change max >= min invariant
ara4n Dec 30, 2018
0afc3af
spell out that self-destructing msgs need explicit RRs
ara4n Dec 30, 2018
7597e03
more validation on fields
ara4n Dec 30, 2018
7a8d204
spell out how the example server admin overrides would work
ara4n Dec 30, 2018
4646fcd
improve wording; spell out purge/redact dichotomy; add explicit alg
ara4n Dec 30, 2018
c55158d
clarify redaction semantic and default PL
ara4n Dec 30, 2018
6e33c2f
track max's idea of advertising retention per-server
ara4n Dec 30, 2018
28ea4e1
fix normatives
ara4n Dec 30, 2018
cca99dd
clarify client behaviour
ara4n Jan 4, 2019
a4974b6
make self_destruct set a timer in seconds rather than be binary.
ara4n Jan 4, 2019
c27394c
clarify warning about conflicts
ara4n Jan 5, 2019
f0553c0
Merge branch 'master' into matthew/msc1763
ara4n Aug 10, 2019
bdce6f1
remove per-message retention and self-destruct messages entirely to t…
ara4n Aug 10, 2019
a30a853
spell out that events will disappear from event streams when purged
ara4n Aug 10, 2019
c281420
add the 'why not nego?' tradeoff
ara4n Aug 10, 2019
ef215dd
clarify the intention to not default to finite message retention
ara4n Aug 10, 2019
0b6a209
spell out not to default to a max_lifetime
ara4n Aug 10, 2019
5c29779
incorporate review
ara4n Aug 11, 2019
032e63b
Apply suggestions from code review
ara4n Aug 11, 2019
1a4101e
link #2228
ara4n Aug 11, 2019
90b17d6
units
ara4n Aug 11, 2019
32f21ac
lifetimes in milliseconds
ara4n Aug 16, 2019
a1b8726
fix json number ranges
ara4n Aug 17, 2019
ee0a7ee
Update 1763-configurable-retention-periods.md
richvdh Aug 19, 2019
cabef48
Apply suggestions from code review
ara4n Aug 26, 2019
f5c3729
incorporate review
ara4n Aug 26, 2019
f8ceb97
spell out an example UI for warning about retention
ara4n Aug 26, 2019
8b1a0c3
clarify care & feeding of DAG
ara4n Aug 28, 2019
9357ec6
incorporate more @richvdh review
ara4n Aug 28, 2019
ac2f87e
Apply suggestions from code review
ara4n Sep 3, 2019
116c5b9
split out media attachment clean-up to #2278
ara4n Sep 3, 2019
f809087
Massively rewrite the proposal
babolivier Oct 11, 2022
243 changes: 243 additions & 0 deletions proposals/1763-configurable-retention-periods.md
@@ -0,0 +1,243 @@
# Proposal for specifying configurable retention periods for messages.

A major shortcoming of Matrix has been the inability to specify how long
events should be stored by the servers and clients which participate in a given
room.

This proposal aims to specify a simple yet flexible set of rules which allow
users, room admins and server admins to determine how long data should be
stored for a room, from the perspective of respecting the privacy requirements
of that room (which may range from "burn after reading" ephemeral messages,
through to FOIA-style public record keeping requirements).

As well as enforcing privacy requirements, these rules provide a way for server
administrators to better manage disk space (e.g. to enforce rules such as "don't
store remote events for public rooms for more than a month").

## Problem:

Matrix is inherently a protocol for storing and synchronising conversation
history, and various parties may wish to control how long that history is stored
for.

* Users may wish to specify a maximum age for their messages for privacy
  purposes, for instance:
  * to avoid their messages (or message metadata) being profiled by
    unscrupulous or compromised homeservers
  * to avoid their messages in public rooms staying indefinitely on the public
    record
  * because of legal/corporate requirements to store message history for a
    limited period of time
  * because of legal/corporate requirements to store messages forever
    (e.g. FOIA)
  * to provide "ephemeral messaging" semantics where messages are best-effort
    deleted after being read.
Contributor

I question the feasibility of this - on what I essentially see as a matrix-specced version of Synapse's History Purge functionality. What would qualify exactly as "after read"? Shouldn't this be removed and left alone for MSC2228 to specify or address?

* Room admins may wish to specify a retention policy for all messages in a
  room.
  * A room admin may wish to enforce a lower or upper bound on message
    retention on behalf of its users, overriding their preferences.
  * A bridged room should be able to enforce the data retention policies of the
    remote rooms.
* Server admins may wish to specify a retention policy for their copy of given
  rooms, in order to manage disk space.

Additionally, we would like to provide this behaviour whilst also ensuring that
users generally see a consistent view of message history, without lots of gaps
and one-sided conversations where messages have been automatically removed.

At the least, it should be possible for people participating in a conversation
to know the expected lifetime of the other messages in the conversation **at the
time they are sent** in order to know how best to interact with them (i.e.
whether they are knowingly participating in a future one-sided conversation or
not).

We would also like to discourage users from setting low message retention as a
matter of course, as it can result in very antisocial conversation patterns to
the detriment of Matrix as a useful communication mechanism.

This proposal does not try to solve the problems of:
* GDPR erasure (as this involves retrospectively changing the lifetime of
messages)
* Bulk redaction (e.g. to remove all messages from an abusive user in a room,
as again this is retrospectively changing message lifetime)
* Limiting the number (rather than age) of messages stored per room (as this is
more a question of quotas than of empowering privacy)
* Ephemeral messaging?

## Proposal

### User-specified per-message retention

Users can specify per-message retention by adding the following fields to the
event alongside its content:

`max_lifetime`:
the maximum duration in seconds for which a well-behaved server should store
this event. If absent, or null, it should be interpreted as 'forever'.


`min_lifetime`:
the minimum duration in seconds for which a well-behaved server should store
this event. If absent, or null, it should be interpreted as 'forever'.

`self_destruct`:
a boolean for whether well-behaved servers should remove this event after
seeing an explicit read receipt delivered for it.

`expire_on_clients`:
a boolean for whether well-behaved clients should expire messages clientside
to match the `min_lifetime`/`max_lifetime` and/or `self_destruct` semantics.

For instance:

```json
{
    "type": "m.room.message",
    "max_lifetime": 86400,
    "content": ...
}
```

The above example means that servers receiving this message should store the
event for only 86400 seconds (1 day), as measured from that event's
`origin_server_ts`, after which they MUST prune the event from their
DBs. We consciously do not redact the event, as we are trying to eliminate
metadata here at the cost of deliberately fracturing the DAG (which will
fragment into disparate chunks).
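
As a rough illustration of the rule above, a server could derive the prune
deadline as sketched below. This is only a sketch: the helper names
`expiry_ts_ms` and `must_prune` are hypothetical, and it assumes
`origin_server_ts` is in milliseconds (as elsewhere in Matrix) while
`max_lifetime` is in seconds as described in this draft.

```python
from typing import Optional


def expiry_ts_ms(event: dict) -> Optional[int]:
    """Return the time (ms since epoch) at which the event must be pruned,
    or None if the event should be kept forever."""
    # Absent or null max_lifetime is interpreted as 'forever'.
    max_lifetime = event.get("max_lifetime")
    if max_lifetime is None:
        return None
    # Assumption: origin_server_ts is in ms, max_lifetime in seconds.
    return event["origin_server_ts"] + max_lifetime * 1000


def must_prune(event: dict, now_ms: int) -> bool:
    """True once the event's max_lifetime has elapsed."""
    expiry = expiry_ts_ms(event)
    return expiry is not None and now_ms >= expiry
```

Measuring from `origin_server_ts` rather than from local receipt time means
every well-behaved server arrives at the same deadline for a given event.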

```json
{
    "type": "m.room.message",
    "min_lifetime": 2419200,
    "content": ...
}
```

The above example means that servers receiving this message SHOULD store the
event forever, but MAY choose to prune their copy after 28 days (or longer) in
order to reclaim disk space.

```json
{
    "type": "m.room.message",
    "self_destruct": true,
    "expire_on_clients": true,
    "content": ...
}
```

The above example describes 'self-destructing message' semantics where both server
and clients MUST prune/delete the event and associated data as soon as a read
receipt is received from the recipient.

### User-advertised per-message retention

If we had extensible profiles, users could advertise their intended per-message
retention in their profile (either their global or per-room profile) as a useful
social cue. However, this would be purely informational.

### Room Admin-specified per-room retention

We introduce an `m.room.retention` state event, which room admins can set to
override the retention behaviour for a given room. This takes the same fields
described above.

If set, these fields directly override any per-message retention behaviour
specified by the user - even if it means forcing laxer privacy requirements on
that user. This is a conscious privacy tradeoff to allow admins to specify
explicit privacy requirements for a room. For instance, a room may explicitly
disable self-destructing messages by setting `self_destruct: false`, or may
require all messages in the room be stored forever with `min_lifetime: null`.

In the instance of `min_lifetime` or `max_lifetime` being overridden, the
invariant that `max_lifetime > min_lifetime` must be maintained by clamping
max_lifetime to be greater than `min_lifetime`.

If the user's retention settings conflict with those in the room, then the user's
clients should warn the user.
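
One possible reading of the override, clamping and warning rules in this
section is sketched below. Several details are assumptions rather than part of
the proposal: the function name `effective_retention` is hypothetical; a field
counts as overridden whenever it is present in the `m.room.retention` content
(even if null); clamping is applied only when both lifetimes are finite, since
the text does not say how 'forever' interacts with the invariant; and
`max_lifetime` is raised to `min_lifetime + 1` because the text asks for
strictly greater without saying by how much.

```python
RETENTION_FIELDS = ("min_lifetime", "max_lifetime",
                    "self_destruct", "expire_on_clients")


def effective_retention(event: dict, room_retention: dict):
    """Return (effective policy, fields the client should warn about)."""
    effective = {}
    conflicts = []
    for field in RETENTION_FIELDS:
        if field in room_retention:
            # Room state directly overrides the per-message value.
            effective[field] = room_retention[field]
            if field in event and event[field] != room_retention[field]:
                conflicts.append(field)
        elif field in event:
            effective[field] = event[field]

    lo = effective.get("min_lifetime")
    hi = effective.get("max_lifetime")
    # Assumption: only clamp when both values are finite numbers.
    if isinstance(lo, int) and isinstance(hi, int) and hi <= lo:
        effective["max_lifetime"] = lo + 1  # arbitrary one-second margin
        if "max_lifetime" in event and "max_lifetime" not in conflicts:
            conflicts.append("max_lifetime")
    return effective, conflicts
```

A client could run the same calculation before sending and surface the
returned list of conflicting fields as the warning to the user.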

### Server Admin-specified per-room retention

Server admins have two ways of influencing message retention on their server:

1) Specifying a default `m.room.retention` for rooms created on the server,
defined as a per-server, implementation-specific configuration option which
inserts the state event after creating the room (effectively augmenting the
presets used when creating a room). If a server admin is trying to conserve
disk space, they may do so by specifying and enforcing a relatively low
`min_lifetime` (e.g. 1 month), but not specifying a `max_lifetime`, in the hope
that other servers will retain the data for longer.

XXX: is this the correct approach to take? It's how we force E2E encryption on,
but it feels very fragmentary and magical for presets to do different things
depending on which server you're on.

2) By adjusting how aggressively their server enforces the `min_lifetime`
value for message retention. For instance, a server admin could configure their
server to attempt to automatically purge remote messages in public rooms which
are older than three months (unless `min_lifetime` for those messages was set
higher).

The expected configuration here could be something like:
* target_lifetime_public_remote_events: 3 months
* target_lifetime_public_local_events: null # forever
* target_lifetime_private_remote_events: null # forever
* target_lifetime_private_local_events: null # forever

...which would try to automatically purge remote events from public rooms after
3 months (assuming their individual min_lifetime is not higher), but leave
others alone.
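
To make the interaction between the target lifetimes and `min_lifetime`
concrete, the check could look something like the sketch below. It is
illustrative only: the `TARGET_LIFETIME_S` table and `may_purge` function are
hypothetical names, a month is taken to be 30 days, `origin_server_ts` is
assumed to be in milliseconds, and an absent or null `min_lifetime` is treated
as 'forever' as defined earlier, so such events are never purged by this check.

```python
MONTH_S = 30 * 24 * 3600  # assumption: one month is 30 days

# Mirrors the example configuration; None means 'forever'.
TARGET_LIFETIME_S = {
    ("public", "remote"): 3 * MONTH_S,
    ("public", "local"): None,
    ("private", "remote"): None,
    ("private", "local"): None,
}


def may_purge(event: dict, visibility: str, origin: str, now_ms: int) -> bool:
    """True if the server may purge its copy of the event at now_ms."""
    target = TARGET_LIFETIME_S[(visibility, origin)]
    if target is None:
        return False  # no target configured: keep forever
    min_lifetime = event.get("min_lifetime")
    if min_lifetime is None:
        return False  # sender asked for the event to be kept forever
    age_s = (now_ms - event["origin_server_ts"]) / 1000
    # Never purge before the event's own min_lifetime has elapsed.
    return age_s >= max(target, min_lifetime)
```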

XXX: should this configuration be specced or left as an implementation-specific
config option?

Server admins could also override the requested retention limits (e.g. if resource
constrained), but this isn't recommended given it may result in history being
irrevocably lost against the senders' wishes.

## Client-side behaviour

Clients should independently calculate the retention of a message based on the
event fields and the room state, and show the message lifespan in the UI. If a
message has a finite lifespan, that fact MUST be indicated clearly in the timeline
to allow users to correctly interact with the message. (The details of the
lifespan can be shown on demand, however.)

If `expire_on_clients` is true, then clients should also calculate expiration for
said events and delete them from their local stores as required.
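
A client-side expiry pass might be sketched as follows. This is a simplified
illustration under several assumptions: the local store is modelled as a plain
dict keyed by event ID, the function name `sweep_local_store` is hypothetical,
only the event's own `max_lifetime` is considered (room state overrides and
`self_destruct` read receipts are left out), and `origin_server_ts` is assumed
to be in milliseconds.

```python
def sweep_local_store(store: dict, now_ms: int) -> list:
    """Delete expired events from the local store and return their IDs."""
    expired = [
        event_id
        for event_id, event in store.items()
        if event.get("expire_on_clients") is True
        and event.get("max_lifetime") is not None
        and now_ms >= event["origin_server_ts"] + event["max_lifetime"] * 1000
    ]
    # Delete after collecting, so the dict is not mutated while iterating.
    for event_id in expired:
        del store[event_id]
    return expired
```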

## Tradeoffs

This proposal deliberately doesn't address GDPR erasure or mega-redaction scenarios,
as it attempts to build a coherent UX around the use case of users knowing their
privacy requirements *at the point they send messages*. Meanwhile GDPR erasure is
handled elsewhere (and involves hiding rather than purging messages, in order to
avoid annihilating conversation history), and mega-redaction is yet to be defined.

## Potential issues

How do we handle scenarios where users try to backfill history which has
already been purged? Presumably this should be a server admin option controlling
whether to allow it and, if allowed, how long the backfilled history should
persist before being purged again.

## Security considerations

There's scope for abuse where users can send abusive messages into a room with a
short `max_lifetime` and/or `self_destruct` set to true, so that the messages
promptly self-destruct.

One solution for this could be for server implementations to implement a quarantine
mode which initially marks purged events as quarantined for N days before deleting
them entirely, allowing server admins to address abuse concerns.
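
A quarantine mode along these lines could be sketched as a two-step purge.
Everything here is hypothetical and implementation-specific: the `StoredEvent`
record, the `purge` and `sweep` functions, and the idea of tracking a
`quarantined_at_ms` timestamp per event are illustrative assumptions, not part
of the proposal.

```python
from dataclasses import dataclass
from typing import List, Optional

DAY_MS = 24 * 3600 * 1000


@dataclass
class StoredEvent:
    event_id: str
    quarantined_at_ms: Optional[int] = None  # None: not purged yet


def purge(event: StoredEvent, now_ms: int) -> None:
    """Step 1: mark the event as quarantined instead of deleting it."""
    if event.quarantined_at_ms is None:
        event.quarantined_at_ms = now_ms


def sweep(events: List[StoredEvent], now_ms: int,
          quarantine_days: int) -> List[StoredEvent]:
    """Step 2: drop events whose quarantine has lasted N days or more."""
    limit_ms = quarantine_days * DAY_MS
    return [
        e for e in events
        if e.quarantined_at_ms is None or now_ms - e.quarantined_at_ms < limit_ms
    ]
```

While an event is quarantined it would be hidden from clients but still
available to server admins investigating an abuse report.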

## Conclusion

Previous attempts to solve this have got stuck by trying to combine too many
disparate problems (e.g. reclaiming disk space; aiding user data privacy; self-destructing
messages; mega-redaction; clearing history on specific devices; etc) - see
https://github.com/matrix-org/matrix-doc/issues/440 and https://github.com/matrix-org/matrix-doc/issues/447
for the history.

This proposal attempts to simplify things to strictly considering the question of
how long servers should persist events for (with the extension of self-destructing
messages added more to validate that the design is able to support such a feature).