Currently, Matrix media/content repositories work with an MXC-to-blob mapping, fetching the media from the domain embedded in the MXC to present it to the user.
However, this becomes a problem once media retention, redaction, and resiliency come into play: the single MXC URI becomes a point of failure once the backing server retracts it, either deliberately (the aforementioned redaction) or accidentally (via a server reset, or loss of the backing media).
This is at odds with how MXCs are used in Matrix today: much like Discord media URLs, they are treated as immutable and always online, and links are copied and reused across rooms.
I propose reworking MXCs into pointers to content hashes.
This has the extra benefit of decoupling aliasing pointers (such as MXCs) from the underlying media.
Alongside this change, I also propose an additional client-server endpoint that can quickly "clone" an MXC: the server looks up the MXC's hash, and then creates a new MXC that references the same hash.
The client-server content API would expose a method for the client to retrieve the hash of a particular MXC, alongside the aforementioned method to clone it.
The server-server content API would add dedicated endpoints for fetching the hash of an MXC, and for fetching the media for a hash.
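To make the hash-pointer model and the clone operation concrete, here is a minimal in-memory sketch. It assumes SHA-256 as the hash function (the proposal leaves this open) and covers only local media; the `MediaRepo` class and its methods are hypothetical, not part of any server implementation. Cloning only adds a new alias, so the blob is stored once no matter how many MXCs point at it.

```python
import hashlib
import secrets


class MediaRepo:
    """Hypothetical in-memory model: MXC media IDs are aliases to content hashes."""

    def __init__(self, server_name: str) -> None:
        self.server_name = server_name
        self.aliases: dict[str, str] = {}  # media_id -> hex-encoded hash
        self.blobs: dict[str, bytes] = {}  # hex-encoded hash -> content

    def upload(self, data: bytes) -> str:
        # Assumption: SHA-256; identical uploads share a single stored blob.
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)
        return self._new_alias(digest)

    def clone(self, media_id: str) -> str:
        # Only the alias is copied; the blob itself is not duplicated.
        return self._new_alias(self.aliases[media_id])

    def _new_alias(self, digest: str) -> str:
        media_id = secrets.token_urlsafe(12)  # URL-safe characters only
        self.aliases[media_id] = digest
        return f"mxc://{self.server_name}/{media_id}"
```

For example, `repo.clone(...)` on an uploaded media ID returns a fresh `mxc://local.server/...` URI that resolves to the same hash as the original.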
This proposal would like to add the following two endpoints to the client-server (CS) API:
POST _matrix/media/v1/clone/{serverName}/{mediaId}
Rate-limited: Yes
Authentication: Yes
Responses:
200: JSON (see below)
429: Ratelimited
503: Could not fetch remote MXC-to-hash mapping
200 response:
{
"m.clone.mxc": "mxc://local.server/media_id"
}
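A client might call the clone endpoint roughly as follows. This is a sketch that only builds the request (it does not send it) and parses a 200 body; the helper names are hypothetical, and the Bearer access-token header reflects how Matrix clients usually authenticate.

```python
import json
import urllib.parse
import urllib.request


def build_clone_request(homeserver: str, access_token: str, mxc: str) -> urllib.request.Request:
    """Build (but do not send) a POST to the proposed clone endpoint for an MXC URI."""
    parsed = urllib.parse.urlparse(mxc)
    if parsed.scheme != "mxc" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an MXC URI: {mxc!r}")
    # Percent-encode both path segments so unusual characters cannot alter the path.
    server_name = urllib.parse.quote(parsed.netloc, safe="")
    media_id = urllib.parse.quote(parsed.path.lstrip("/"), safe="")
    url = f"{homeserver.rstrip('/')}/_matrix/media/v1/clone/{server_name}/{media_id}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def parse_clone_response(body: bytes) -> str:
    """Extract the new local MXC from a 200 response body."""
    return json.loads(body)["m.clone.mxc"]
```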
GET _matrix/media/v1/hash/{serverName}/{mediaId}
Rate-limited: Yes
Authentication: Yes
Responses:
200: JSON (see below)
429: Ratelimited
503: Could not fetch remote MXC-to-hash mapping
200 response:
{
"m.mxc.hash": "1234567890abcdef" // hex-encoded hash
}
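On the client side, the responses listed for the hash endpoint could be mapped to outcomes like this. The exception names are hypothetical; note that the example body above carries a `//` comment for illustration only, while a real response would be plain JSON.

```python
import json


class RateLimited(Exception):
    """429: the caller should back off and retry later."""


class RemoteMappingUnavailable(Exception):
    """503: the server could not fetch the remote MXC-to-hash mapping."""


def handle_hash_response(status: int, body: bytes) -> bytes:
    """Map a response from the proposed hash endpoint to raw hash bytes or an exception."""
    if status == 429:
        raise RateLimited()
    if status == 503:
        raise RemoteMappingUnavailable()
    if status != 200:
        raise ValueError(f"unexpected status {status}")
    value = json.loads(body)["m.mxc.hash"]
    if not isinstance(value, str):
        raise ValueError("m.mxc.hash must be a string")
    # bytes.fromhex rejects odd-length or non-hex input with ValueError.
    return bytes.fromhex(value)
```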
This proposal would like to add the following two endpoints to the server-server (S2S) API:
GET _matrix/federation/v1/media/hash
Rate-limited: No
Authentication: Yes
Query parameters:
media_id: string, the local part of an MXC for which the hash is queried
Responses:
200: Raw binary encoding of the corresponding hash
404: Media ID does not exist
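A serving homeserver's handler for the federation hash endpoint could look like the sketch below. The handler name and the `aliases` mapping are hypothetical, and returning 400 for a missing or repeated `media_id` is an assumption, since the proposal does not cover that case.

```python
from urllib.parse import parse_qs


def handle_media_hash(aliases: dict[str, bytes], query_string: str) -> tuple[int, bytes]:
    """Hypothetical handler for GET _matrix/federation/v1/media/hash; returns (status, body)."""
    media_ids = parse_qs(query_string).get("media_id", [])
    if len(media_ids) != 1:
        # Assumption: the proposal does not define this case; 400 is a guess.
        return 400, b""
    digest = aliases.get(media_ids[0])
    if digest is None:
        return 404, b""  # Media ID does not exist
    return 200, digest  # raw binary hash as the response body, no JSON wrapping
```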
GET _matrix/media/v1/media/fetch/{hash}
Rate-limited: Yes
Authentication: Yes
Responses:
200: Blob of data corresponding to hash
404: No media found for this hash
429: Ratelimited
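Because media is addressed by its hash, the fetching server can check what it receives before storing or serving it. A minimal sketch, assuming SHA-256 as the hash function; `verify_fetched_media` is a hypothetical helper name.

```python
import hashlib


def verify_fetched_media(expected_sha256: bytes, blob: bytes) -> bytes:
    """Accept a blob from /media/fetch/{hash} only if it hashes to the requested value."""
    if hashlib.sha256(blob).digest() != expected_sha256:
        raise ValueError("fetched media does not match the requested hash")
    return blob
```

This check does not depend on who served the blob, which is what lets the hash act as a self-verifying key.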
Note: this section is open for feedback, and will be removed in the final draft.
So far, the definition of "hash" has been vague. I think converging on a single specific hash function could lock us in and hinder future expansion.
So, I'd like to propose using multihash for these purposes; this would provide a common, self-describing format for the hashes used.
For now, only a fixed set of hash functions would be included (see the multihash table for a full list), which can be expanded or deprecated in subsequent Matrix spec releases without changing the format of the hash, documenting checks to distinguish the hash types used, or reinventing multihash.
However, this is up for debate.
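For reference, a multihash is a varint hash-function code, followed by a varint digest length, followed by the digest itself. The sketch below encodes and decodes that layout for two entries of the multicodec table (0x12 for sha2-256, 0x13 for sha2-512); which functions Matrix would actually allow is still open, and the helper names are hypothetical. If this format were adopted, the hex encoding of such bytes is one plausible value for "m.mxc.hash", though the proposal does not say so.

```python
import hashlib

# Small subset of the multicodec table: code -> (name, digest length in bytes).
SUPPORTED = {0x12: ("sha2-256", 32), 0x13: ("sha2-512", 64)}


def encode_varint(n: int) -> bytes:
    """Unsigned varint (7 bits per byte, high bit = continuation), as used by multiformats."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Return (value, position after the varint)."""
    value = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def encode_multihash(code: int, digest: bytes) -> bytes:
    return encode_varint(code) + encode_varint(len(digest)) + digest


def decode_multihash(mh: bytes) -> tuple[str, bytes]:
    code, pos = decode_varint(mh, 0)
    length, pos = decode_varint(mh, pos)
    if code not in SUPPORTED:
        raise ValueError(f"unsupported hash function code {code:#x}")
    name, expected_len = SUPPORTED[code]
    digest = mh[pos:]
    if len(digest) != length or length != expected_len:
        raise ValueError("digest length mismatch")
    return name, digest


def sha256_multihash(data: bytes) -> bytes:
    return encode_multihash(0x12, hashlib.sha256(data).digest())
```

A SHA-256 multihash therefore always starts with the bytes `0x12 0x20`, so a reader can tell the function and length apart without any out-of-band information.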
This MSC aims to unblock efforts for media retention and redaction:
with the addition of the /clone endpoint, any client wishing to preserve media can do so simply, by having its server fetch and store the media locally, reducing the link rot that remote servers redacting media could otherwise cause.
This MSC also aims to make Matrix more flexible for diverse media delivery systems.
Mapping MXCs to hashes could allow the hashes themselves to become self-verifying keys in any (centralized or distributed) KV store.
This, in turn, could prepare matrix better for P2P efforts.
This MSC also aims to make Matrix content delivery more resilient: apart from the mapping of an MXC alias to a hash, a hash could be retrieved from anywhere and still be self-verifying, considerably reducing the bus factor and allowing better load distribution (see the first "future extension" in the section below).
This could incur a slight performance hit, as an extra RTT between servers is needed to fetch the actual media after fetching the hash corresponding to it.
I consider this the more acceptable tradeoff; the alternative would be to side-channel the hash in a header on an endpoint that fetches media directly by MXC.
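The two-step lookup can be sketched as follows, with the network calls passed in as functions so the flow is visible on its own. `get_hash`, `fetch_by_hash`, and the cache are hypothetical stand-ins for the S2S endpoints and local storage; the sketch also shows that a local cache hit on the hash skips the second round trip entirely. Verification of the fetched blob is omitted here for brevity.

```python
from typing import Callable


def fetch_mxc(
    server_name: str,
    media_id: str,
    get_hash: Callable[[str, str], bytes],
    fetch_by_hash: Callable[[str, bytes], bytes],
    cache: dict[bytes, bytes],
) -> bytes:
    digest = get_hash(server_name, media_id)   # RTT 1: MXC -> hash
    if digest in cache:
        return cache[digest]                   # media already held locally: no second RTT
    blob = fetch_by_hash(server_name, digest)  # RTT 2: hash -> media (unverified in this sketch)
    cache[digest] = blob
    return blob
```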
Note: this is free-form speculation, and serves to illustrate how future MSCs can extend the behavior this MSC is enabling.
A possible extension would be a server-server endpoint that returns the recommended content endpoints from which to fetch hashes.
(I.e. a server would query /media/endpoints, and the remote server could respond with ["https://common.caching.server", "https://matrix.org"], in decreasing order of priority.)
This can be helpful when servers share a common "media server", as is the case today with matrix-media-repo, which "tricks" federation by redirecting any request for media to itself. This future extension would formalize this process.
This would also help mitigate "thundering herds", as servers could be redirected to multiple servers from which to fetch the media for a hash.
(However, as-is, this could have security problems with DoS-ing, issues with cache invalidation after redacting media, and possibly more. This is only to illustrate flexibility.)
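Combined with hash verification, a client of such a speculative endpoint list could try each endpoint in priority order and accept only content that matches the requested hash, so an untrusted cache cannot substitute different media. A sketch assuming SHA-256 and a caller-supplied `fetch` function; it does not address the DoS or cache-invalidation concerns above.

```python
import hashlib
from typing import Callable, Iterable


def fetch_from_endpoints(
    endpoints: Iterable[str],
    digest: bytes,
    fetch: Callable[[str, bytes], bytes],
) -> bytes:
    """Try endpoints in priority order; return the first blob whose SHA-256 matches."""
    for endpoint in endpoints:
        try:
            blob = fetch(endpoint, digest)
        except OSError:
            continue  # unreachable endpoint: fall through to the next one
        if hashlib.sha256(blob).digest() == digest:
            return blob
        # Mismatched content is discarded and the next endpoint is tried.
    raise LookupError("no endpoint served matching content")
```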
Another possible extension could be to allow tapping natively into decentralized media stores, which often key their data by hash. This could make P2P media easier to implement and work with.
One last possible extension is to add a 410 response to every endpoint that fetches media; this could help communicate to servers and clients that media has been deleted.
A big part of this MSC's motivation is to unblock media redaction/retention efforts. However, that does not mean this MSC should be blind to the struggle of containing unsavory media across federation.
This MSC adds a /clone endpoint, by which a client on any server could easily "copy" media, seemingly making containment efforts useless.
However, at the room level, and possibly the server level, hashes themselves could be banned. This can be implementation-specific, or be built into bots like mjolnir.
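A hash-keyed ban list could be as simple as the sketch below. The `HashBanList` class and the `resolve_hash` callback are hypothetical; the point is that every clone of an MXC resolves to the same hash, so banning the hash blocks all of them at once. It only matches byte-identical media.

```python
from typing import Callable


class HashBanList:
    """Hypothetical per-room or per-server ban list keyed by content hash, not by MXC."""

    def __init__(self) -> None:
        self.banned: set[bytes] = set()

    def ban(self, digest: bytes) -> None:
        self.banned.add(digest)

    def is_allowed(self, mxc: str, resolve_hash: Callable[[str], bytes]) -> bool:
        # Clones share the original's hash, so one ban covers every alias of the media.
        return resolve_hash(mxc) not in self.banned
```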
This MSC uses the unstable prefix nl.automatia.msc3468:
_matrix/media/nl.automatia.msc3468/clone/{serverName}/{mediaId}
_matrix/media/nl.automatia.msc3468/hash/{serverName}/{mediaId}
_matrix/federation/nl.automatia.msc3468/media/hash
_matrix/media/nl.automatia.msc3468/media/fetch/{hash}
nl.automatia.msc3468.clone.mxc
nl.automatia.msc3468.mxc.hash