[WIP] MSC2706: IPFS as a media repository for Matrix #2706
# MSC2706: IPFS as a media repository

The current media (content) repository in Matrix is somewhat reliant on the origin server staying
online indefinitely to serve the media, which is not always the case. Some servers may be bandwidth
constrained (and not want to deal with thousands of people requesting media from them), or may simply
go down for maintenance or close indefinitely. When this happens, it would be useful to have the media
stored on other nodes and a way to contact them.

We could invent our own system for finding out which other servers have a copy of a given piece of
media and gossip that information, or we could rely on an existing solution that has already solved
this problem.

[IPFS](https://ipfs.io/) describes itself as a peer-to-peer hypermedia protocol, and it fits perfectly
within Matrix's vision of an open, secure, and decentralised world. It handles media distribution for
us (Matrix does not have to design its own distribution mechanism) and is easily integrated into Matrix.

## Proposal

If it is not obvious by now, the proposal is to use IPFS within Matrix for media handling. Unfortunately,
for backwards compatibility reasons, this proposal does not recommend using `ipfs://` URIs in place of
`mxc://`; however, if sufficient adoption is achieved, Matrix could easily switch over to them.
For now, clients and servers *should* handle `ipfs://` URIs if they see them, but this proposal
mostly focuses on introducing IPFS in a backwards-compatible manner.

**TODO: Decide if not using `ipfs://` is a mistake.**

IPFS uses "content IDs" (CIDs) to reference media, which are compatible with Matrix's media IDs (**TODO: CONFIRM**),
making migration even easier. To support backwards compatibility with older clients and servers, the
media ID is proposed to be formatted as `ipfs:<cid>` for IPFS-hosted media. This allows legacy servers
and clients to contact their homeserver, which resolves the media through an IPFS gateway and serves it,
while indicating to supporting implementations that they do not need to contact the origin server and
can instead retrieve the media directly over IPFS.

For completeness, an example IPFS-styled MXC URI would be `mxc://example.org/ipfs:cidgoeshere`.

> I believe #2703 could use

> It might also be helpful to use a CID that points to an IPLD object containing metadata, such as filename, MIME type, or

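Supporting implementations need to tell IPFS-hosted media apart from ordinary media by inspecting the media ID. The following is a minimal Python sketch, assuming only the `ipfs:<cid>` media ID format described above; `MxcUri`, `parse_mxc`, and `ipfs_cid` are hypothetical helper names, and the CID itself is not validated.

```python
from dataclasses import dataclass
from typing import Optional

# Prefix that marks an IPFS-hosted media ID, following the `ipfs:<cid>` format.
IPFS_MEDIA_PREFIX = "ipfs:"


@dataclass(frozen=True)
class MxcUri:
    origin: str    # server name, e.g. "example.org"
    media_id: str  # opaque media ID, e.g. "ipfs:<cid>" or a legacy ID

    @property
    def ipfs_cid(self) -> Optional[str]:
        """Return the CID for IPFS-styled media IDs, or None for legacy media."""
        if self.media_id.startswith(IPFS_MEDIA_PREFIX):
            cid = self.media_id[len(IPFS_MEDIA_PREFIX):]
            return cid or None
        return None


def parse_mxc(uri: str) -> MxcUri:
    """Split an `mxc://<origin>/<media_id>` URI into its two parts."""
    scheme = "mxc://"
    if not uri.startswith(scheme):
        raise ValueError(f"not an MXC URI: {uri!r}")
    origin, sep, media_id = uri[len(scheme):].partition("/")
    if not sep or not origin or not media_id or "/" in media_id:
        raise ValueError(f"malformed MXC URI: {uri!r}")
    return MxcUri(origin, media_id)


uri = parse_mxc("mxc://example.org/ipfs:cidgoeshere")
print(uri.origin, uri.ipfs_cid)  # example.org cidgoeshere
```

A legacy URI such as `mxc://example.org/abc123` parses the same way but has `ipfs_cid` set to `None`, so a client would fall back to the existing download path.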
Because clients can embed an IPFS node or [access IPFS from the browser](https://github.com/ipfs/in-web-browsers/blob/master/ADDRESSING.md),
it would be extremely useful to allow the client to bypass the `/upload` endpoint and publish its
own MXC URI after adding the media to a local IPFS node. Because `ipfs://` support is not proposed here,
clients will need a homeserver name (origin) to put into the `mxc://` URI. They will also need to
know whether the server supports IPFS at all before bypassing `/upload` entirely.

> Multiple problems with using IPFS on clients:

> All the above, including privacy concerns when providing data to the IPFS swarm, assume a full IPFS node runs on the client that is also the only provider for the CID. However, that is not the only possible architecture. The P2P part of IPFS is optional. I'd argue even running a full node is optional:
>
> In both cases, the client does not leak its IP address by providing data to the IPFS network. In my mind, the value of using IPFS in Matrix is CIDs. Content-addressed identifiers allow the community to keep the data alive in addition to Matrix server operators, and to use the same data outside of Matrix (and benefit from the pinning and caching on various layers). Matrix servers could cap costs and set up a policy to "pin CIDs for X amount of time/space", and if people want them to be available for longer, they can cache them on their clients, use external pinning services, or run their own IPFS node and start reproviding the data to the IPFS swarm on their own. When opening very old messages which are no longer kept around by the Matrix server, one could still retrieve the content, as long as it was pinned somewhere.

To permit bypassing `/upload`, a new capability is proposed: `m.ipfs`. When present, it indicates
to the client that the server's media repo is IPFS-capable and can therefore be bypassed. However,
clients will still need to know an origin to put in the MXC URI. Clients should use the following
steps, in order of precedence, to determine an appropriate origin:

1. The origin they were explicitly provided (for example, when a user wants to use a particular gateway).
2. The origin specified by the optional `preferred_origin` field in the `m.ipfs` capability.
3. The domain name of the user's ID, as a default.

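The capability check and the origin-selection order above could be sketched in Python as follows. This assumes the `m.ipfs` capability (or its unstable `io.t2bot.ipfs` equivalent) appears as an object inside the `capabilities` object returned by the client-server `/capabilities` endpoint, with `preferred_origin` as an optional string field; the exact shape is not fixed by this MSC, and the function names are hypothetical.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any, Optional

# Stable capability name, followed by the unstable prefix used before release.
IPFS_CAPABILITY_KEYS = ("m.ipfs", "io.t2bot.ipfs")


def ipfs_capability(capabilities: Mapping[str, Any]) -> Optional[Mapping[str, Any]]:
    """Return the IPFS capability object, or None if the server does not advertise one."""
    for key in IPFS_CAPABILITY_KEYS:
        value = capabilities.get(key)
        if isinstance(value, Mapping):
            return value
    return None


def choose_origin(
    user_id: str,
    capabilities: Mapping[str, Any],
    explicit_origin: Optional[str] = None,
) -> str:
    """Pick the origin for a client-built MXC URI, following steps 1-3."""
    # 1. An origin the client was explicitly given (e.g. a user-chosen gateway).
    if explicit_origin:
        return explicit_origin
    # 2. The optional `preferred_origin` from the IPFS capability.
    preferred = (ipfs_capability(capabilities) or {}).get("preferred_origin")
    if isinstance(preferred, str) and preferred:
        return preferred
    # 3. Default: the server part of the user ID, e.g. "@alice:example.org".
    _, sep, server_name = user_id.partition(":")
    if not sep or not server_name:
        raise ValueError(f"malformed user ID: {user_id!r}")
    return server_name


def build_ipfs_mxc(cid: str, origin: str) -> str:
    """Build an IPFS-styled MXC URI for media the client published itself."""
    return f"mxc://{origin}/ipfs:{cid}"
```

In this sketch a client would only skip `/upload` when `ipfs_capability(...)` returns an object; otherwise it uploads through the homeserver as before.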
----

This proposal encourages client implementations to embed IPFS support so they can avoid contacting
the homeserver for content. Clients might still wish to use functionality like thumbnails from the
server; however, if other MSCs specify it well enough, a client could feasibly use the `thumbnail_uri`
provided by the sending client to display appropriate content without ever having to contact the
homeserver.

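As a rough illustration of this behaviour and of the legacy fallback, a client could decide where to fetch a given MXC URI as below. This is a sketch under stated assumptions: an IPFS HTTP gateway reached through the common `/ipfs/<cid>` path stands in for an embedded IPFS node, and the fallback uses the unauthenticated `/_matrix/media/r0/download` endpoint that existed when this MSC was written (later spec versions moved the media endpoints). `split_mxc` and `media_fetch_url` are hypothetical helpers.

```python
from __future__ import annotations

from typing import Optional, Tuple
from urllib.parse import quote

IPFS_MEDIA_PREFIX = "ipfs:"


def split_mxc(uri: str) -> Tuple[str, str]:
    """Return (origin, media_id) for an `mxc://<origin>/<media_id>` URI."""
    if not uri.startswith("mxc://"):
        raise ValueError(f"not an MXC URI: {uri!r}")
    origin, sep, media_id = uri[len("mxc://"):].partition("/")
    if not sep or not origin or not media_id:
        raise ValueError(f"malformed MXC URI: {uri!r}")
    return origin, media_id


def media_fetch_url(uri: str, homeserver_base: str, gateway_base: Optional[str] = None) -> str:
    """Prefer IPFS for `ipfs:<cid>` media when a gateway is available, else ask the homeserver."""
    origin, media_id = split_mxc(uri)
    cid = media_id[len(IPFS_MEDIA_PREFIX):] if media_id.startswith(IPFS_MEDIA_PREFIX) else ""
    if gateway_base and cid:
        # Path-style gateway URL: neither the homeserver nor the origin is contacted.
        return f"{gateway_base.rstrip('/')}/ipfs/{cid}"
    # Legacy path: the homeserver fetches (or resolves via a gateway) the media for us.
    return (
        f"{homeserver_base.rstrip('/')}/_matrix/media/r0/download/"
        f"{origin}/{quote(media_id, safe='')}"
    )
```

Legacy clients never take the gateway branch; they always go through the homeserver, which is what keeps the `ipfs:<cid>` media ID format backwards compatible.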
## Potential issues

**TODO: Investigate ways to mitigate.**

* Retention, redaction, and erasure.
* Spam, abuse, etc.
* Quarantining content (not currently specified, but should be considered).

## Alternatives

**TODO: Find other solutions than IPFS and explain why they're bad.**

## Security considerations

**TODO: This.**

> IPFS is known to have anonymity and privacy issues (fingerprintable, no audits, no Tor support), so it might be problematic for anonymous users. Running it on homeservers could mitigate this issue.

> IPFS was never designed to be private. It was designed to be censorship- and attack-resistant, but not private.

## Unstable prefix

While this MSC is not in a released version of the spec, `io.t2bot.ipfs` should be used in place of
`m.ipfs`. No special endpoints, version flags, or other prefixes are required for this MSC.

> Who pins the IPFS content? IMO, server-side pinning creates an opportunity for managing retention and redaction using IPFS.

> What do you mean by pinning here?

> From https://docs.ipfs.io/concepts/persistence/,
>
> AFAIK, the current method of retrieving Matrix media effectively "pins" media on all participant servers. Ideally, a server could do a reference count on IPFS resources and pin them accordingly. The difficult part is that there is no standard way of determining which media an event references without knowing its schema. That is, if I create a new event type and upload media with it, the server has no clear way of pulling the media out of that new event except by searching for all `mxc` URLs.
>
> A future P2P world could use a more conservative pinning algorithm.
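
The schema-agnostic reference count described in this comment could look roughly like the following Python sketch: since the server cannot know every event type, it scans every string value in event content for anything shaped like an MXC URI, and a server could then pin media whose count is above zero. The regular expression is a simplified assumption rather than the spec grammar, and `iter_mxc_uris` and `count_mxc_references` are hypothetical helpers.

```python
import re
from collections import Counter
from typing import Any, Iterable, Iterator

# Simplified MXC pattern: server name (host, optional port or IPv6 literal),
# then a media ID such as "abc123" or "ipfs:<cid>". Not the spec grammar.
MXC_RE = re.compile(r"mxc://[A-Za-z0-9.\-:\[\]]+/[A-Za-z0-9_\-:]+")


def iter_mxc_uris(value: Any) -> Iterator[str]:
    """Yield every MXC URI found anywhere in a JSON-like value, without knowing its schema."""
    if isinstance(value, str):
        # findall also catches URIs embedded in larger strings, e.g. HTML bodies.
        yield from MXC_RE.findall(value)
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_mxc_uris(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_mxc_uris(item)


def count_mxc_references(events: Iterable[dict]) -> Counter:
    """Count how many times each MXC URI is referenced across event contents."""
    counts: Counter = Counter()
    for event in events:
        counts.update(iter_mxc_uris(event.get("content", {})))
    return counts
```

The trade-off named in the comment still applies: this finds URIs in unknown event types, but it can also count strings that merely look like MXC URIs without being media references.
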

> I'm not sure this is the MSC to solve that problem tbh. The server doesn't need to pin it, and in popular enough rooms the media will get shared across other nodes naturally.
>
> We could try and pin the media to a server, however in a p2p environment we'd probably want to do the opposite in support of freedom?

> fwiw it is not proposed (and won't be proposed when this MSC is de-drafted) to have the old media system disappear. It would still exist, just at a lesser prominence than IPFS.

> Yeah, probably best to leave that to a media pruning MSC.
>
> If not pinned, all participant nodes will prune it if it's not accessed for a while, so at least one node has to pin it. This could be just the originating server, but if that server goes offline, the media can't be accessed. Retaining access after a server goes offline may also be beyond the scope of this MSC. Just `add`ing a file to IPFS pins it, though (unless otherwise specified), so if the originating server adds it, others will be able to access it.
>
> True... AFAIK, each client would serve as a pinning node in this case, so technically there are no "servers."

> Pinning it on clients can reveal your IP address to all other participants.
>
> In large rooms this is harder because it's harder to tell which IP belongs to which user, but in smaller rooms it's easier.
>
> In DMs it's trivial: