[WIP] MSC2706: IPFS as a media repository for Matrix #2706
IPFS uses "content IDs" (or "cid") to reference media which are compatible with Matrix's media IDs (**TODO: CONFIRM**),
making the process even easier to migrate. To support backwards compatibility with older clients
and servers, the media ID is proposed to be formatted as `ipfs:<cid>` for IPFS-hosted media. This
will allow legacy servers and clients to contact their homeserver and resolve it to an IPFS gateway
Who pins the IPFS content? IMO, server-side pinning creates an opportunity for managing retention and redaction using IPFS.
What do you mean by pinning here?
From https://docs.ipfs.io/concepts/persistence/:
> IPFS nodes treat the data they store like a cache, meaning that there is no guarantee that the data will continue to be stored. "Pinning" a CID tells an IPFS server that the data is important and mustn't be thrown away.
AFAIK, the current method of retrieving Matrix media effectively "pins" media on all participant servers. Ideally, a server could do a reference count on IPFS resources and pin them accordingly. The difficult part is that there is no standard way of determining which media an event references without knowing its schema. I.e., if I create a new event type and upload media with it, the server has no clear way of pulling the media out of that new event except by searching for all mxc URLs.
A future P2P world could use a more conservative pinning algorithm.
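As a rough illustration of the "search for all mxc URLs" fallback, a server could walk each event's content without knowing its schema and count references per URI. This is only a sketch: `iter_mxc_uris` and `count_media_references` are hypothetical helpers, not part of any homeserver, and the `org.example.custom` event type is made up for the example.

```python
from collections import Counter
from typing import Any, Dict, Iterator, List


def iter_mxc_uris(value: Any) -> Iterator[str]:
    """Yield every string in a JSON-like value that starts with mxc://."""
    if isinstance(value, str):
        if value.startswith("mxc://"):
            yield value
    elif isinstance(value, dict):
        for child in value.values():
            yield from iter_mxc_uris(child)
    elif isinstance(value, list):
        for child in value:
            yield from iter_mxc_uris(child)


def count_media_references(events: List[Dict[str, Any]]) -> Counter:
    """Count how many events reference each MXC URI (once per event)."""
    counts: Counter = Counter()
    for event in events:
        # set() so an event mentioning the same URI twice counts once.
        counts.update(set(iter_mxc_uris(event.get("content", {}))))
    return counts


events = [
    {
        "type": "m.room.message",
        "content": {"msgtype": "m.image", "url": "mxc://example.org/ipfs:cidgoeshere"},
    },
    {
        # Hypothetical custom event type with an unknown schema.
        "type": "org.example.custom",
        "content": {"attachments": [{"file": "mxc://example.org/ipfs:cidgoeshere"}]},
    },
]
refs = count_media_references(events)
```

A server following this approach would pin a CID while its count is above zero and unpin it once the count drops to zero. Encrypted events would defeat the scan, since the URI is not visible to the server.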
I'm not sure this is the MSC to solve that problem tbh. The server doesn't need to pin it, and in popular enough rooms the media will get shared across other nodes naturally.
We could try and pin the media to a server, however in a p2p environment we'd probably want to do the opposite in support of freedom?
fwiw it is not proposed (and won't be proposed when this MSC is de-drafted) to have the old media system disappear. It would still exist, just at a lesser prominence than IPFS.
> I'm not sure this is the MSC to solve that problem tbh.
Yeah, probably best to leave that up to a media pruning MSC.
> The server doesn't need to pin it, and in popular enough rooms the media will get shared across other nodes naturally.
If not pinned, all participant nodes will prune it if it's not accessed for a while, so at least one node has to pin it. This could be just the originating server, but if that server goes offline, the media can't be accessed. Retaining access after a server goes offline may also be beyond the scope of this MSC. Just adding a file to IPFS pins it, though (unless otherwise specified), so if the originating server adds it, others will be able to access it.
> We could try and pin the media to a server, however in a p2p environment we'd probably want to do the opposite in support of freedom?
True... AFAIK, each client would serve as a pinning node in this case, so technically no "servers."
Pinning it on clients can reveal your IP address to all other participants.
In large rooms this is harder because it's harder to tell what IP belongs to what user, but in smaller rooms, it's easier.
In DMs it's trivial:
- send a file
- see the one IP that pinned it
to be served while indicating to supporting implementations that they do not need to contact the
origin server and can instead use IPFS directly to retrieve the media.

For completeness, an example IPFS-styled MXC URI would be `mxc://example.org/ipfs:cidgoeshere`.
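To make the proposed format concrete, the following sketch shows how a supporting client might split an MXC URI and detect the `ipfs:` media ID prefix. The names `MxcUri`, `parse_mxc`, and `ipfs_cid` are hypothetical, and the code does not validate that the CID itself is well formed.

```python
from typing import NamedTuple, Optional

IPFS_PREFIX = "ipfs:"


class MxcUri(NamedTuple):
    server_name: str
    media_id: str

    @property
    def ipfs_cid(self) -> Optional[str]:
        """Return the CID if the media ID uses the proposed ipfs: prefix."""
        if self.media_id.startswith(IPFS_PREFIX):
            return self.media_id[len(IPFS_PREFIX):] or None
        return None


def parse_mxc(uri: str) -> MxcUri:
    """Split mxc://<server_name>/<media_id> into its two parts."""
    scheme = "mxc://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an MXC URI: {uri!r}")
    server_name, sep, media_id = uri[len(scheme):].partition("/")
    if not server_name or not sep or not media_id:
        raise ValueError(f"malformed MXC URI: {uri!r}")
    return MxcUri(server_name, media_id)


ipfs_media = parse_mxc("mxc://example.org/ipfs:cidgoeshere")
legacy_media = parse_mxc("mxc://example.org/abc123")
```

A legacy client would ignore `ipfs_cid` and fetch the media from its homeserver as usual, while a supporting client could hand the CID to IPFS directly when `ipfs_cid` is not `None`.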
I believe #2703 could use `pchar = unreserved / pct-encoded / sub-delims / ":" / "@"` instead, sans `pct-encoded`?
It might also be helpful to use a CID that points to an IPLD object containing metadata, such as filename, MIME type, or Content-Disposition data, as well as a link to the actual data. This could help implement #2702.
Problems with this:
Multiple smaller problems:
Possible solution:
@notramo please use threads to receive replies.
@turt2live What do you mean by threads?
This is exciting! Anything needed from us on the IPFS side to land this? (cc @aschmahmann @Stebalien @lidel)
@notramo thank you for this comprehensive list. Small clarifications/updates:
That being said, the approach you proposed (using IPFS to simplify backend media management) sounds sensible. Matrix would no longer need to worry about facilitating data transfers: only the CID would have to be passed around, and the actual data would be found and fetched over IPFS. Moreover, instance operators could decide that their IPFS node acts only as a hot cache, and/or pin data using a vendor-agnostic API to remote services like Pinata (expect more in the future). This simplifies archival of old media without the need to manage an archive on your own. Leveraging IPFS on the client can be added later, but I believe that as long as you use CIDs and content paths in URLs, user agents like Brave or IPFS Companion will be able to leverage them and load data from IPFS thanks to the protocol upgrade path. Let us know if you have any questions / concerns / ideas.
(please use comments on the actual diff if you're expecting engagement from MSC authors or the SCT - otherwise feedback is outright ignored)

## Security considerations

**TODO: This.**
IPFS is known to have anonymity and privacy issues (fingerprintable, no audits, no Tor support), so it might be problematic for anonymous users. Running it on homeservers could mitigate this issue.
IPFS was never designed to be private. It was designed to be censorship and attack resistant, but not private.
So it's important to run it on servers instead of clients.
Please don't connect to IPFS from clients as it's a terrible idea.
it would be extremely useful to allow the client to bypass the `/upload` endpoint and publish its
own MXC URI after having used a local IPFS node. Considering `ipfs://` support is not proposed here,
clients will need to get a homeserver name/origin to put into the `mxc://` URI. They'll also need to
know if the server even supports IPFS to be able to bypass `/upload` entirely.
Multiple problems with using IPFS on clients:
- In a room with many members, an upload would cost multiples of the file size in bandwidth because the first few nodes will download from the uploader
- JS-IPFS isn't capable of seeding content (browser clients can't seed)
- IPFS uses some background traffic to keep the DHT connections alive, which wastes mobile data. Unacceptable on phones.
- This would generate much more traffic than needed for downloading
- If viewers don't seed, then everybody will download from the uploader. In a 200 member room, an 8 MiB image would cost about 1.6 GiB for the uploader. Unacceptable on mobile data.
- If viewers seed, then viewing an image would cost multiples of the file size in traffic. (Download it once, and seed it for multiple peers.) Unacceptable on mobile data.
- If the uploader has a slow connection, it will be slow for everybody
- It has to be pinned on the client, which means it will use a lot of storage. Unacceptable on phones. If it were done that way, I would have to delete Element from my phone because it would use more space than I have.
- If the client sends a file in a DM and then goes offline, the chat partner can't download it
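The bandwidth figure in the list above can be checked with a few lines of arithmetic. This sketch assumes the worst case described there, where nobody re-seeds and every download is served by the uploader. It uses 200 downloads to match the comment, although a 200 member room strictly has 199 other members. `uploader_cost_bytes` is a hypothetical helper.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def uploader_cost_bytes(file_size: int, downloads: int) -> int:
    """Bytes the uploader sends if every download is served by the uploader."""
    return file_size * downloads


cost = uploader_cost_bytes(8 * MIB, 200)
print(f"{cost // MIB} MiB = {cost / GIB:.2f} GiB")  # 1600 MiB = 1.56 GiB
```

So "1.6 GiB" is a fair rounding of 1600 MiB, which is about 1.56 GiB.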
All the above, including privacy concerns when providing data to the IPFS swarm, assume a full IPFS node runs on the client and is also the only provider for the CID.
However, that is not the only possible architecture. The P2P part of IPFS is optional. I'd argue even running a full node is optional:
- CID could be created on the client, and sent to Matrix server as CAR file for pinning and providing.
- Or the original file could be sent to Matrix server, and it would do the work and produce CID for it.
In both cases, the client does not leak IP by providing data to the IPFS network.
In both cases, Matrix server performs a similar storage/retrieval function to what it does right now.
The only real difference is standardizing on identifying data with CIDs.
In my mind, the value of using IPFS in Matrix is the CIDs. Content-addressed identifiers allow the community to keep the data alive in addition to Matrix server operators, and use the same data outside of Matrix (and benefit from the pinning and caching on various layers).
Matrix servers could cap costs and set up a policy to "pin CIDs for X amount of time/space", and if people want them to be available for longer, they can cache them on their clients, use external pinning services, or run their own IPFS node and start reproviding the data to the IPFS swarm on their own.
When opening very old messages which are no longer kept around by the Matrix server, one could still retrieve the content, as long as it was pinned somewhere.
- Providing CIDs to the IPFS swarm can be handled by the Matrix server, OR by multiple external pinning services.
- Even when the Matrix server does not have the data anymore, HTTP Gateways allow delegating retrieval of IPFS content without running a full node or providing anything to the network.
- The client can request deserialized data (and trust the gateway did it correctly, which is fine if it is provided by the Matrix server), or make a trustless request for a verifiable application/vnd.ipld.raw or vnd.ipld.car response and do inexpensive validation and deserialization on the client. This way, any gateway can be used; it no longer needs to be run by the same operator as the Matrix server.
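The "inexpensive validation" mentioned in the last point can be sketched for the simplest case: a single raw block whose CID is a base32 CIDv1 with the `raw` codec and a sha2-256 multihash. This is an assumption-laden illustration, not a full CID implementation. It handles only that one CID shape, the helper names are hypothetical, and larger files that are chunked into a DAG would need CAR verification instead.

```python
import base64
import hashlib

# Single-byte varint values for the only CID shape this sketch accepts.
CIDV1 = 0x01
RAW_CODEC = 0x55
SHA2_256 = 0x12
DIGEST_LEN = 0x20
HEADER = bytes([CIDV1, RAW_CODEC, SHA2_256, DIGEST_LEN])


def decode_cidv1_base32(cid: str) -> bytes:
    """Decode a multibase 'b' (lowercase base32, unpadded) CID to bytes."""
    if not cid.startswith("b"):
        raise ValueError("only base32 CIDs ('b' prefix) are handled here")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode expects
    return base64.b32decode(body)


def verify_raw_block(cid: str, block: bytes) -> bool:
    """Check that sha2-256(block) matches the digest embedded in the CID."""
    raw = decode_cidv1_base32(cid)
    if len(raw) != len(HEADER) + DIGEST_LEN or raw[:4] != HEADER:
        raise ValueError("unsupported CID: expected CIDv1, raw codec, sha2-256")
    return hashlib.sha256(block).digest() == raw[4:]


def raw_cid_for(block: bytes) -> str:
    """Build the CID of the same shape, used here only to exercise the check."""
    raw = HEADER + hashlib.sha256(block).digest()
    return "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")


cid = raw_cid_for(b"hello matrix")
ok = verify_raw_block(cid, b"hello matrix")
tampered = verify_raw_block(cid, b"hello matrix!")
```

Because the check depends only on the bytes and the CID, a client can run it on a response from any gateway without trusting that gateway's operator.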
I expanded a bit on this MSC here: DarkKirb@d8516e5
The biggest difference between the current MSC and my changes is that I added a pinning endpoint. I also changed the model a bit from the original vision. In many cases, nothing will change for the client, except that they can download the media from an IPFS gateway of their choice. Clients will continue uploading to the media server by default, due to the concerns others have listed here, which are now also listed in the MSC. I linked some related RFCs.
Rendered
This is inspired by matrix-media-repo's work towards IPFS, but still needs work.
This is done with a community hat on: