MSC 1659 Proposal: Change Event IDs to Hashes #1659

Merged on Jan 30, 2019 (16 commits; this view shows changes from 13 commits)

123 changes: 123 additions & 0 deletions proposals/1659-event-id-as-hashes.md
@@ -0,0 +1,123 @@
# Changing Event IDs to be Hashes

## Motivation

Having event IDs separate from the hashes leads to issues when a server receives
multiple events with the same event ID but different reference hashes. While
APIs could be changed to better support dealing with this situation, it is
easier and nicer to simply drop the idea of a separate event ID entirely, and
instead use the reference hash of an event as its ID.

## Identifier Format

Currently hashes in our event format include the hash name, allowing servers to
choose which hash functions to use. The idea here was to allow a gradual change
between hash functions without the need to globally coordinate shifting from one
hash function to another.

However, now that room versions exist, changing hash functions can be achieved
by bumping the room version. Using this method would allow the event ID to be a
simple string rather than a full structure, making event IDs much easier to use.

One side effect of this would be that there would be no indication about which
hash function was actually used, and it would need to be inferred from the room
version. To aid debuggability it may be worth encoding the hash function into
the ID format.

**Conclusion:** Don't encode the hash function, since the hash will depend on
the version-specific redaction algorithm anyway.

The proposal is therefore that the event IDs are a sha256 hash, encoded using
[unpadded
Base64](https://matrix.org/docs/spec/appendices.html#unpadded-base64), and
prefixed with `$` (to aid distinguishing different types of identifiers). For
example, an event ID might be: `$CD66HAED5npg6074c6pDtLKalHjVfYb2q4Q3LZgrW6o`.

The hash is calculated in the same way as previous event reference hashes were,
which is:

1. Redact the event
2. Remove the `signatures` field from the event
3. Serialize the event to canonical JSON
4. Compute the hash of the JSON bytes

Event IDs will no longer be included as part of the event, and so must be
calculated by servers receiving the event.
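
The four steps above can be sketched roughly as follows. This is a minimal
illustration, not a reference implementation: the redaction algorithm is
room-version specific and is therefore passed in as a parameter, `toy_redact`
is a hypothetical placeholder that is *not* the real algorithm, and
`canonical_json` only approximates the canonical JSON rules (sorted keys, no
insignificant whitespace, UTF-8 output).

```python
import base64
import hashlib
import json
from typing import Any, Callable, Dict

Event = Dict[str, Any]


def canonical_json(value: Any) -> bytes:
    # Approximation of canonical JSON: sorted keys, compact separators,
    # non-ASCII characters emitted as UTF-8 rather than escaped.
    return json.dumps(
        value, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def unpadded_base64(data: bytes) -> str:
    # Standard Base64 with the trailing "=" padding stripped.
    return base64.b64encode(data).decode("ascii").rstrip("=")


def compute_event_id(event: Event, redact: Callable[[Event], Event]) -> str:
    redacted = redact(event)                                  # step 1
    stripped = {k: v for k, v in redacted.items() if k != "signatures"}  # step 2
    digest = hashlib.sha256(canonical_json(stripped)).digest()  # steps 3 and 4
    return "$" + unpadded_base64(digest)


def toy_redact(event: Event) -> Event:
    # HYPOTHETICAL placeholder: drops `unsigned` and empties `content`.
    # A real server must use the redaction algorithm of the room version.
    redacted = {k: v for k, v in event.items() if k != "unsigned"}
    redacted["content"] = {}
    return redacted
```

Because the signatures are removed before hashing, adding or changing a
signature does not change the resulting ID, while changing any field that
survives redaction does.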


## Changes to Event Formats

As well as changing the format of event IDs, we also change the format of the
`auth_events` and `prev_events` keys in events to simply be lists of event IDs
(rather than being lists of tuples).

A full event would therefore look something like (note that this is just an
illustrative example, and that the hashes are not correct):

```json
{
  "auth_events": [
    "$5hdALbO+xIhzcLTxCkspx5uqry9wO8322h/OI9ApnHE",
    "$Ga0DBIICBsWIZbN292ATv8fTHIGGimwjb++w+zcHLRo",
    "$zc4ip/DpPI9FZVLM1wN9RLqN19vuVBURmIqAohZ1HXg"
  ],
  "content": {
    "body": "Here is the message content",
    "msgtype": "m.message"
  },
  "depth": 6,
  "hashes": {
    "sha256": "M6/LmcMMJKc1AZnNHsuzmf0PfwladVGK2Xbz+sUTN9k"
  },
  "origin": "localhost:8800",
  "origin_server_ts": 1548094046693,
  "prev_events": [
    "$MoOzCuB/sacqHAvgBNOLICiGLZqGT4zB16MSFOuiO0s"
  ],
  "room_id": "!eBrhCHJWOgqrOizwwW:localhost:8800",
  "sender": "@anon-20190121_180719-33:localhost:8800",
  "signatures": {
    "localhost:8800": {
      "ed25519:a_iIHH": "N7hwZjvHyH6r811ebZ4wwLzofKhJuIAtrQzaD3NZbf4WQNijXl5Z2BNB047aWIQCS1JyFOQKPVom4et0q9UOAA"
    }
  },
  "type": "m.room.message"
}
```

## Changes to existing APIs

All APIs that accept event IDs must accept event IDs in the new format.

For the S2S API, whenever a server needs to parse an event from a request or
response, it must either already know the room version *or* be told the room
version in the request/response. There are separate MSCs to update APIs where
necessary.
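
As a rough sketch of why the room version matters when parsing: the way a
server obtains an event's ID depends on which format the room uses. The
function name, the `room_uses_hash_ids` flag, and the `compute_hash_id`
callback below are hypothetical; how a server maps a room version to that flag
is left out.

```python
from typing import Any, Callable, Dict

Event = Dict[str, Any]


def resolve_event_id(
    event: Event,
    room_uses_hash_ids: bool,
    compute_hash_id: Callable[[Event], str],
) -> str:
    if room_uses_hash_ids:
        # New format: the ID is not part of the event, so the receiving
        # server derives it from the event itself.
        return compute_hash_id(event)
    # Old format: the ID is carried inside the event.
    return event["event_id"]
```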

For the C2S API, the only change clients will see is that the event IDs have
changed format. Clients should already be treating event IDs as opaque strings,
so no changes should be required. Servers must, however, add the `event_id`
when sending the event to clients.
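
A minimal sketch of that server-side step, assuming the server has already
calculated the event ID. The function name and the `FEDERATION_ONLY_FIELDS`
set are illustrative assumptions; a real server may strip or add other fields
when formatting events for clients.

```python
from typing import Any, Dict

Event = Dict[str, Any]

# Assumption for illustration: only the two fields this proposal names as
# not being sent to clients are dropped here.
FEDERATION_ONLY_FIELDS = frozenset({"auth_events", "prev_events"})


def format_for_client(event: Event, event_id: str) -> Event:
    # Copy the event without the federation-only fields, then attach the
    # server-calculated ID so that clients still see an `event_id`.
    client_event = {
        k: v for k, v in event.items() if k not in FEDERATION_ONLY_FIELDS
    }
    client_event["event_id"] = event_id
    return client_event
```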

Note that the `auth_events` and `prev_events` fields aren't sent to clients, and
so the changes proposed above won't affect clients.

Review comment (Member): unless they use the federation event format, but then they're on their own anyways.



## Protocol Changes

The `auth_events` and `prev_events` fields on an event need to be changed from a
list of tuples to a list of strings, i.e. remove the old event ID and simply
have the list of hashes.
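
To illustrate the change in shape only: in the old format each entry is a pair
of an event ID and a hash dictionary, and in the new format each entry is a
single string. The helper below is a hypothetical sketch that builds the
new-style string from the `sha256` entry of an old-style pair; it assumes every
pair carries such an entry, and it is not meant to suggest that existing rooms
get rewritten, since the new format only applies under a new room version.

```python
from typing import Dict, List, Sequence, Tuple

# Old-style reference: (event ID, {hash name: unpadded Base64 hash}).
OldReference = Tuple[str, Dict[str, str]]


def references_to_new_format(old_refs: Sequence[OldReference]) -> List[str]:
    # Drop the old event ID and keep only the hash, prefixed with "$".
    return ["$" + hashes["sha256"] for _old_event_id, hashes in old_refs]
```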

The auth rules also need to change:

- The event no longer needs to be signed by the domain of the event ID (but
still needs to be signed by the sender’s domain)

- We currently allow redactions if the domain of the redaction event ID
matches the domain of the ID of the event it is redacting, which allows
self-redaction. This check is removed and redaction events are always
accepted. Instead, the redaction event only takes effect, and is sent down
to clients, if/when the original event is received and the domains of the
two events' senders match. (While this is clearly suboptimal, it is the
only practical suggestion.)
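
The deferred redaction check described in the last rule might look roughly
like this. It is a sketch covering only the sender-domain comparison described
here; the function names are hypothetical, and any other conditions a server
applies to redactions are out of scope.

```python
from typing import Any, Dict, Optional

Event = Dict[str, Any]


def sender_domain(user_id: str) -> str:
    # User IDs look like "@localpart:domain". The domain may itself contain
    # a colon (e.g. a port), so split on the first colon only.
    if not user_id.startswith("@") or ":" not in user_id:
        raise ValueError(f"not a valid user ID: {user_id!r}")
    return user_id.split(":", 1)[1]


def redaction_takes_effect(redaction: Event, original: Optional[Event]) -> bool:
    # The redaction event itself is always accepted; it only takes effect
    # once the original event has been received and the senders' domains match.
    if original is None:
        return False
    return sender_domain(redaction["sender"]) == sender_domain(original["sender"])
```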