
[WIP] MSC2706: IPFS as a media repository for Matrix #2706

Draft: wants to merge 1 commit into `old_master`
Conversation

turt2live (Member):

Rendered

This is inspired by matrix-media-repo's work towards IPFS, but still needs work.

This is done with a community hat on:

Signed-off-by: Travis Ralston <[email protected]>

@turt2live turt2live added kind:feature MSC for not-core and not-maintenance stuff proposal A matrix spec change proposal labels Jul 28, 2020
IPFS uses "content IDs" (or "CIDs") to reference media, which are compatible with Matrix's media IDs (**TODO: CONFIRM**),
making migration even easier. To support backwards compatibility with older clients
and servers, the media ID is proposed to be formatted as `ipfs:<cid>` for IPFS-hosted media. This
will allow legacy servers and clients to contact their homeserver and resolve it to an IPFS gateway
Reviewer:

Who pins the IPFS content? IMO, server-side pinning creates an opportunity for managing retention and redaction using IPFS.

turt2live (Member Author):

What do you mean by pinning here?

Reviewer:

From https://docs.ipfs.io/concepts/persistence/,

> IPFS nodes treat the data they store like a cache, meaning that there is no guarantee that the data will continue to be stored. "Pinning" a CID tells an IPFS server that the data is important and mustn't be thrown away.

AFAIK, the current method of retrieving Matrix media effectively "pins" media on all participant servers. Ideally, a server could do a reference count on IPFS resources and pin them accordingly. The difficult part is that there is no standard way of determining which media an event references without knowing its schema: if I create a new event type and upload media with it, the server has no clear way of pulling the media out of that new event except by searching for all mxc URLs.

A future P2P world could use a more conservative pinning algorithm.
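The "no schema" fallback described above can be sketched: without knowing an event type's schema, the only generic way to find referenced media is to scan the serialized event content for mxc URLs. This is an illustrative sketch, not an implementation from the MSC; the character classes are an assumption about valid server names and media IDs.

```python
import json
import re

# Sketch of the schema-agnostic fallback: scan the serialized event
# content for mxc:// URLs. The allowed character sets are assumptions.
MXC_URL_RE = re.compile(r"mxc://[A-Za-z0-9.\-:]+/[A-Za-z0-9\-._~:]+")

def referenced_media(event_content: dict) -> set:
    """Return every mxc:// URL found anywhere in the event content."""
    return set(MXC_URL_RE.findall(json.dumps(event_content)))
```

For example, an `m.image` event with a thumbnail yields both the main URL and the thumbnail URL, but so would any unknown custom event type that embeds mxc URLs in its content.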

turt2live (Member Author):

I'm not sure this is the MSC to solve that problem tbh. The server doesn't need to pin it, and in popular enough rooms the media will get shared across other nodes naturally.

We could try to pin the media to a server; however, in a p2p environment we'd probably want to do the opposite in support of freedom?

turt2live (Member Author):

fwiw it is not proposed (and won't be proposed when this MSC is de-drafted) to have the old media system disappear. It would still exist, just with lesser prominence than IPFS.

Reviewer:

> I'm not sure this is the MSC to solve that problem tbh.

Yeah, probably best to leave this to a media pruning MSC.

> The server doesn't need to pin it, and in popular enough rooms the media will get shared across other nodes naturally.

If not pinned, all participant nodes will prune it if it's not accessed for a while, so at least one node has to pin it. This could be just the originating server, but if that server goes offline, it can't be accessed. Retaining access after a server goes offline may also be beyond the scope of this MSC. Just adding a file to IPFS pins it, though (unless otherwise specified), so if the originating server adds it, then others will be able to access it.

> We could try and pin the media to a server, however in a p2p environment we'd probably want to do the opposite in support of freedom?

True... AFAIK, each client would serve as a pinning node in this case, so technically no "servers."

Reviewer:

Pinning it on clients can reveal your IP address to all other participants.
In large rooms it's harder to tell which IP belongs to which user, but in smaller rooms it's easier.
In DMs it's trivial:

  • send a file
  • see the one IP that pinned it

to be served while indicating to supporting implementations that they do not need to contact the
origin server and can instead use IPFS directly to retrieve the media.

For completeness, an example IPFS-styled MXC URI would be `mxc://example.org/ipfs:cidgoeshere`.
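The `ipfs:<cid>` media-ID convention above can be sketched as a small parser. This is an illustrative sketch of the proposed format, not code from the MSC; the helper name and return shape are invented here.

```python
import re

# Sketch: split an MXC URI into server name and media ID, and detect
# the proposed "ipfs:" media-ID prefix for IPFS-hosted media.
MXC_RE = re.compile(r"^mxc://(?P<server>[^/]+)/(?P<media_id>[^/]+)$")

def parse_mxc(uri: str):
    """Return (server_name, media_id, ipfs_cid_or_None)."""
    m = MXC_RE.match(uri)
    if not m:
        raise ValueError(f"not a valid mxc URI: {uri}")
    server, media_id = m.group("server"), m.group("media_id")
    # Per the proposal, IPFS-hosted media uses an "ipfs:<cid>" media ID.
    cid = media_id[len("ipfs:"):] if media_id.startswith("ipfs:") else None
    return server, media_id, cid
```

A legacy client would treat the full media ID opaquely and hit the homeserver as usual; a supporting client could use the extracted CID to fetch directly over IPFS.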
Reviewer:

This may conflict with #2703 (`unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"`). Maybe #2703 could be expanded to allow certain exceptions for protocols like IPFS?

Reviewer (Member):

I believe #2703 could use `pchar = unreserved / pct-encoded / sub-delims / ":" / "@"` instead, sans `pct-encoded`?

lidel (Feb 25, 2021):

If #2703 used a self-describing CID then you would not need the `ipfs:` prefix.
You could leverage the codec field (table) for different types of Media ID.

Reviewer:

It might also be helpful to use a CID that points to an IPLD object containing metadata, such as filename, MIME type, or Content-Disposition data, as well as a link to the actual data. This could help implement #2702.

notramo (Jan 9, 2021):

Problems with this approach. Two huge problems:

  • When a user uploads a file, it can reveal their IP address to everyone in the room, as they are the only node that has the file at the beginning.
  • IPFS nodes crash all consumer routers with an Intel Puma 6 or 7 chip after a few minutes. https://badmodems.com/

Multiple smaller problems:

  • In a room with many members, an upload would cost multiples of the file size in bandwidth, because the first few nodes will download from the uploader
  • JS-IPFS isn't capable of seeding content (browser clients can't seed)
  • IPFS uses some background traffic to keep DHT connections alive, which wastes mobile data. Unacceptable on phones.
  • This would generate much more traffic than needed for downloading
    • If viewers don't seed, then everybody will download from the uploader. In a 200-member room, an 8 MiB image would cost 1.6 GiB for the uploader. Unacceptable on mobile data.
    • If viewers seed, then viewing an image would cost multiples of the file size in traffic. (Download it once, and seed it for multiple peers.) Unacceptable on mobile data.
  • If the uploader has a slow connection, it will be slow for everybody
  • It has to be pinned on the client, which means it will use a lot of storage. Unacceptable on phones. If it were done that way, I would have to delete Element from my phone because it would use space that I don't have.
  • If a client sends a file in a DM and then goes offline, the chat partner can't download it

Possible solution:
Upload to the server; the server hashes and pins it, then every server can access, cache, replicate, or download it over IPFS.
Benefits:

  • Network traffic equals the file size both when downloading and uploading.
    • No unnecessary seeding or DHT gossip
  • Servers have fast connections
  • Servers have a lot of storage to pin it
  • Servers are always online, so the uploader can go offline after uploading
  • No change to the client-server spec.
  • No client-side implementation needed.
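The server-side flow proposed above (client uploads, server hashes and pins) can be sketched in miniature. This is a hedged illustration, not the MSC's design: the in-memory `PINS` dict stands in for a real IPFS node, and `handle_upload` is an invented helper. The CID construction (version `0x01`, `raw` codec `0x55`, sha2-256 multihash, lowercase base32 multibase) follows the CIDv1 spec.

```python
import base64
import hashlib

# In-memory stand-in for a real IPFS node's pin set (illustrative only).
PINS: dict = {}

def cid_for_raw(data: bytes) -> str:
    # CIDv1 = <version 0x01><codec raw 0x55><multihash 0x12 0x20 + digest>,
    # multibase-encoded as lowercase base32 with the "b" prefix.
    digest = hashlib.sha256(data).digest()
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    return "b" + base64.b32encode(cid_bytes).decode().lower().rstrip("=")

def handle_upload(data: bytes) -> str:
    """Hash the upload, pin it, and return the proposed ipfs:<cid> media ID."""
    cid = cid_for_raw(data)
    PINS[cid] = data          # a real server would pin on its IPFS node
    return f"ipfs:{cid}"      # media ID format per the proposal
```

Because the CID is derived from the content, any server (or client) that later fetches the bytes can verify them against the media ID without trusting the origin.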

turt2live (Member Author):
@notramo please use threads to receive replies.

notramo (Jan 11, 2021):

@turt2live What do you mean by threads?

momack2 (Feb 25, 2021):

This is exciting! Anything needed from us on the IPFS side to land this? (cc @aschmahmann @Stebalien @lidel)

lidel (Feb 25, 2021):

@notramo thank you for this comprehensive list.

Small clarifications/updates:

  • Crashing consumer routers happens only when you run non-desktop settings on a consumer network.
    Consumer IPFS nodes in Brave and IPFS Desktop run with a lower connection count (targeting roughly 50–300)
  • JS-IPFS is capable of delegating content seeding to preload nodes (config, FAQ)
  • DHT traffic can be decreased by setting the routing type to dhtclient. go-ipfs >0.5.0 detects when a node is behind a NAT and runs as a dhtclient automatically.
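For reference, the dhtclient setting mentioned above is applied through the go-ipfs config CLI; a minimal sketch (setting name as documented for go-ipfs):

```shell
# Reduce background DHT traffic by running as a DHT client only.
ipfs config Routing.Type dhtclient
# Restart the daemon afterwards for the change to take effect.
```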

That being said, the approach you proposed (using IPFS to simplify backend media management) sounds sensible.

Matrix would no longer need to worry about facilitating data transfers: only the CID would have to be passed around, and the actual data would be found and fetched over IPFS.

Moreover, instance operators could decide that their IPFS node acts only as a hot cache, and/or pin data using a vendor-agnostic API to remote service(s) like Pinata (expect more in the future). This simplifies archival of old media without the need to manage the archive on your own.

Leveraging IPFS on the client can be added later, but I believe that as long as you use CIDs and content paths in URLs, user agents like Brave or IPFS Companion will be able to leverage them and load data from IPFS thanks to the protocol upgrade path.

Let us know if you have any questions / concerns / ideas.

@turt2live turt2live added the needs-implementation This MSC does not have a qualifying implementation for the SCT to review. The MSC cannot enter FCP. label Jun 8, 2021
@ple1n: (comment marked as off-topic)

@notramo: (comment marked as off-topic)

@ple1n: (comment marked as off-topic)

turt2live (Member Author):

(please use comments on the actual diff if you're expecting engagement from MSC authors or the SCT - otherwise feedback is outright ignored)


## Security considerations

**TODO: This.**
Reviewer:

IPFS is known to have anonymity and privacy issues (fingerprintable, no audits, no Tor support), so it might be problematic for anonymous users. Running it on homeservers could mitigate this issue.

Reviewer:

IPFS was never designed to be private. It was designed to be censorship and attack resistant, but not private.
So it's important to run it on servers instead of clients.

notramo (review comment):

Please don't connect to IPFS from clients as it's a terrible idea.


it would be extremely useful to allow the client to bypass the `/upload` endpoint and publish its
own MXC URI after having used a local IPFS node. Considering `ipfs://` support is not proposed here,
clients will need to get a homeserver name/origin to put into the `mxc://` URI. They'll also need to
know if the server even supports IPFS to be able to bypass `/upload` entirely.
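One plausible shape for the discovery step described above is a flag in the standard capabilities response. To be clear, this is entirely hypothetical: the MSC does not define this mechanism, and the `m.media.ipfs` capability name is invented here purely for illustration.

```python
# Hypothetical sketch only: the "m.media.ipfs" capability name is NOT
# defined by this MSC or the Matrix spec; it is invented for illustration.
# A client could check the /_matrix/client/v3/capabilities response
# before deciding to bypass /upload.
def server_supports_ipfs(capabilities_response: dict) -> bool:
    caps = capabilities_response.get("capabilities", {})
    return bool(caps.get("m.media.ipfs", {}).get("enabled", False))
```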
Reviewer:

Multiple problems with using IPFS on clients:

  • In a room with many members, an upload would cost multiples of the file size in bandwidth, because the first few nodes will download from the uploader
  • JS-IPFS isn't capable of seeding content (browser clients can't seed)
  • IPFS uses some background traffic to keep DHT connections alive, which wastes mobile data. Unacceptable on phones.
  • This would generate much more traffic than needed for downloading
    • If viewers don't seed, then everybody will download from the uploader. In a 200-member room, an 8 MiB image would cost 1.6 GiB for the uploader. Unacceptable on mobile data.
    • If viewers seed, then viewing an image would cost multiples of the file size in traffic. (Download it once, and seed it for multiple peers.) Unacceptable on mobile data.
  • If the uploader has a slow connection, it will be slow for everybody
  • It has to be pinned on the client, which means it will use a lot of storage. Unacceptable on phones. If it were done that way, I would have to delete Element from my phone because it would use space that I don't have.
  • If a client sends a file in a DM and then goes offline, the chat partner can't download it

lidel (Jul 6, 2022):

All the above, including privacy concerns when providing data to the IPFS swarm, assume a full IPFS node runs on the client and is also the only provider for the CID.

However, that is not the only possible architecture. The P2P part of IPFS is optional. I'd argue even running a full node is optional:

  • The CID could be created on the client, and sent to the Matrix server as a CAR file for pinning and providing.
  • Or the original file could be sent to the Matrix server, which would do the work and produce a CID for it.

In both cases, the client does not leak its IP by providing data to the IPFS network.
In both cases, the Matrix server performs a similar storage/retrieval function to what it does right now.
The only real difference is standardizing on identifying data with CIDs.

In my mind, the value of using IPFS in Matrix is CIDs. Content-addressed identifiers allow the community to keep the data alive in addition to Matrix server operators, and to use the same data outside of Matrix (benefiting from pinning and caching at various layers).

Matrix servers could cap costs and set up a policy to "pin CIDs for X amount of time/space"; if people want them to be available for longer, they can cache them on their clients, use external pinning services, or run their own IPFS node and start reproviding to the IPFS swarm on their own.

When opening very old messages which are no longer kept around by the Matrix server, one could still retrieve the content, as long as it was pinned somewhere.

  • Providing CIDs to the IPFS swarm can be handled by the Matrix server, OR by multiple external pinning services.
  • Even when the Matrix server no longer has the data, HTTP gateways allow delegating retrieval of IPFS content without running a full node or providing anything to the network.
    • Clients can request deserialized data (and trust the gateway did it correctly, which is fine if the gateway is provided by the Matrix server), or make a trustless request for a verifiable `application/vnd.ipld.raw` or `vnd.ipld.car` response and do inexpensive validation and deserialization on the client. This way, any gateway can be used; it no longer needs to be run by the same operator as the Matrix server.
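The "inexpensive validation" of a trustless gateway response can be sketched for the simplest case: a raw-codec CIDv1, where the client recomputes the CID from the returned bytes and compares it to the one it requested. This is an illustration under that assumption (lowercase base32 multibase, `raw` codec, sha2-256 multihash), not a full CID implementation.

```python
import base64
import hashlib

# Sketch of trustless verification for an application/vnd.ipld.raw
# response: for a raw-codec CIDv1 (sha2-256), recompute the CID from
# the fetched bytes and compare it to the CID that was requested.
def verify_raw_block(cid: str, data: bytes) -> bool:
    digest = hashlib.sha256(data).digest()
    # <version 0x01><codec raw 0x55><multihash 0x12 0x20 + digest>
    cid_bytes = bytes([0x01, 0x55, 0x12, 0x20]) + digest
    recomputed = "b" + base64.b32encode(cid_bytes).decode().lower().rstrip("=")
    return recomputed == cid
```

Because the check is purely local, any gateway can serve the bytes; a tampered response simply fails to match the requested CID.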

DarkKirb (review comment):

I expanded a bit on this MSC here: DarkKirb@d8516e5

The biggest difference between the current MSC and my changes is that I added a pinning endpoint. I also changed the model a bit from the original vision. In many cases, nothing will change for the client, except that they can download the media from an IPFS gateway of their choice. Clients will continue uploading to the media server by default, due to the concerns others have listed here, which are now also listed in the MSC. I linked some related RFCs.
