MSC2732: Olm fallback keys #2732
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
# MSC2732: Olm fallback keys | ||
|
||
Olm uses a set of one-time keys when initializing a session between two | ||
devices: Alice uploads one-time keys to her homeserver, and Bob claims one of | ||
them to perform a Diffie-Hellman key exchange and derive a shared key. As implied by the | |
name, a one-time key is only to be used once. However, if all of Alice's | ||
one-time keys are claimed, Bob will not be able to create a session with Alice. | ||
|
||
This can be addressed by Alice uploading a fallback key that is used in place | ||
of a one-time key when no one-time keys are available. | ||
|
||
## Proposal | ||
|
||
A new request parameter, `fallback_keys`, is added to the body of the | |
[`/keys/upload` client-server API](https://matrix.org/docs/spec/client_server/r0.6.1#post-matrix-client-r0-keys-upload), which is in the same format as the | |
`one_time_keys` parameter with the exception that there must be at most one key | |
per key algorithm. If the user had previously uploaded a fallback key for a | |
given algorithm, it is replaced -- the server will only keep one fallback key | |
per algorithm for each user. | |
Does this also add any new response to having successfully uploaded the fallback keys, similar to OTKs?
We don't currently have any addition to the response.
|
||
When uploading fallback keys for algorithms whose key format is a signed JSON | ||
object, clients should include a property named `fallback` with a value of | |
`true`. | ||
|
||
Example: | ||
|
||
`POST /keys/upload` | ||
|
||
```json | ||
{ | ||
"fallback_keys": { | ||
"signed_curve25519:AAAAAA": { | ||
"key": "base64+public+key", | ||
"fallback": true, | ||
"signatures": { | ||
"@alice:example.org": { | ||
"ed25519:DEVICEID": "base64+signature" | ||
} | ||
} | ||
} | ||
} | ||
} | ||
``` | ||
|
||
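The replacement rule described above (at most one fallback key per algorithm, with a new upload replacing the old key) could be sketched on the server side roughly as follows. This is only an illustration: `algorithm_of`, `validate_fallback_keys`, `store_fallback_keys` and the shape of the `stored` dictionary are hypothetical names chosen for this sketch, not part of the proposal or of any homeserver.

```python
def algorithm_of(key_id: str) -> str:
    """Return the algorithm part of an '<algorithm>:<key id>' identifier."""
    algorithm, sep, _ = key_id.partition(":")
    if not sep or not algorithm:
        raise ValueError(f"malformed key identifier: {key_id!r}")
    return algorithm


def validate_fallback_keys(fallback_keys: dict) -> dict:
    """Check that the upload has at most one key per algorithm.

    Returns a mapping of algorithm -> (full key identifier, key object).
    """
    by_algorithm = {}
    for key_id, key in fallback_keys.items():
        algorithm = algorithm_of(key_id)
        if algorithm in by_algorithm:
            raise ValueError(f"more than one fallback key for {algorithm}")
        by_algorithm[algorithm] = (key_id, key)
    return by_algorithm


def store_fallback_keys(stored: dict, fallback_keys: dict) -> None:
    """Replace any previously stored fallback key for each uploaded algorithm.

    `stored` is a hypothetical per-device mapping of algorithm -> record.
    Validation runs first, so an invalid upload changes nothing.
    """
    for algorithm, (key_id, key) in validate_fallback_keys(fallback_keys).items():
        # A freshly uploaded fallback key starts out unused.
        stored[algorithm] = {"key_id": key_id, "key": key, "used": False}
```

How a real server rejects an invalid upload (error code, partial acceptance) is not specified by the text above; raising `ValueError` here is just a placeholder.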
When Bob calls `/keys/claim` to claim one of Alice's one-time keys, but Alice | ||
has no one-time keys left, the homeserver will return the fallback key instead, | ||
if Alice had previously uploaded one. Unlike with one-time keys, fallback keys | ||
are not deleted when they are returned by `/keys/claim`. However, the server | ||
marks that they have been used. | ||
|
||
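The `/keys/claim` behaviour described above can be illustrated with a small in-memory model. It is a sketch under assumptions: the `DeviceKeys` class and its attribute layout are invented for illustration and say nothing about how a real homeserver stores keys.

```python
from collections import deque


class DeviceKeys:
    """Hypothetical in-memory key store for a single device."""

    def __init__(self):
        # algorithm -> deque of (key identifier, key object)
        self.one_time_keys = {}
        # algorithm -> {"key_id": ..., "key": ..., "used": bool}
        self.fallback_keys = {}

    def claim(self, algorithm):
        """Return a (key identifier, key) pair, or None if nothing is available."""
        queue = self.one_time_keys.get(algorithm)
        if queue:
            # One-time keys are deleted as soon as they are claimed.
            return queue.popleft()
        fallback = self.fallback_keys.get(algorithm)
        if fallback is None:
            return None
        # The fallback key is kept, but the server records that it was used.
        fallback["used"] = True
        return fallback["key_id"], fallback["key"]

    def unused_fallback_key_types(self):
        """Algorithms that still have an unused fallback key."""
        return sorted(a for a, f in self.fallback_keys.items() if not f["used"])
```

Repeated claims after the one-time keys run out keep returning the same fallback key, which is exactly the property the security considerations discuss.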
|
||
A new response parameter, `device_unused_fallback_key_types`, is added to | ||
`/sync`. This is an array listing the key algorithms for which the server has | ||
an unused fallback key for the device. If the client wants the server to have a | ||
fallback key for a given key algorithm, but that algorithm is not listed in | ||
`device_unused_fallback_key_types`, the client will upload a new key as above. | ||
|
||
The `device_unused_fallback_key_types` parameter must be present if the server | ||
supports fallback keys. Clients can thus treat this field as an indication | ||
that the server supports fallback keys, and so only upload fallback keys to | ||
servers that support them. | ||
|
||
Example: | ||
|
||
`GET /sync` | ||
|
||
Response: | ||
|
||
```jsonc | ||
{ | ||
// other fields... | ||
"device_unused_fallback_key_types": ["signed_curve25519"] | ||
} | ||
``` | ||
|
||
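On the client side, the `/sync` handling described above might look like the following sketch. The function name `fallback_algorithms_to_upload` and the `wanted` parameter are assumptions made for illustration; only the `device_unused_fallback_key_types` field comes from the proposal.

```python
def fallback_algorithms_to_upload(sync_response: dict, wanted: set) -> set:
    """Return the algorithms for which a new fallback key should be uploaded.

    `wanted` is the set of algorithms the client wants the server to hold
    a fallback key for.
    """
    unused = sync_response.get("device_unused_fallback_key_types")
    if unused is None:
        # The field is absent, so the server does not advertise support
        # for fallback keys and nothing should be uploaded.
        return set()
    # Upload for every wanted algorithm that has no unused fallback key.
    return set(wanted) - set(unused)
```

An empty array is different from a missing field: an empty array means the server supports fallback keys but holds no unused one, so the client uploads.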
## Security considerations | ||
|
||
Using a fallback key rather than a one-time key has security implications. An | ||
attacker can replay a message that was originally sent with a fallback key, and | ||
the receiving client will accept it as a new message if the fallback key is | ||
still active. Also, an attacker that compromises a client may be able to | ||
retrieve the private part of the fallback key to decrypt past messages if the | ||
client has still retained the private part of the fallback key. | ||
|
||
For this reason, clients should not store the private part of the fallback key | ||
indefinitely. For example, a client should store at most two fallback keys: | |
the current fallback key (that it has not yet received any messages for) and | ||
the previous fallback key, and should remove the previous fallback key once it | ||
is reasonably certain that it has received all the messages that use it (for | ||
example, one hour after receiving the first message that used it). | ||
Comment on lines +85 to +89
Is the client supposed to rotate the fallback key as soon as the current one is used? If so, that implies it needs to store more than two keys at once. I think it would be good to give some clearer guidance on how often the client should rotate the keys.
Yes. We could say that it should be rotated after it has done processing the However, this changes if the client opts not to use one-time keys and to only use fallback keys. And an attacker could possibly run through all the one-time keys quickly. So, I think we could either
I think that I prefer the first option. I can't really think of other options, aside from variations of the theme, that allow an upper bound on the number of keys a client has to keep.
Comment on lines +84 to +89
I've no objections to this from the POV of the protocol, but still think it would be nice to clarify how key rotation is expected to work.
||
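One possible reading of the retention guidance (at most two private fallback keys, with the previous one dropped about an hour after its first use) is sketched below. The `FallbackKeyRing` class, its method names, and the choice to rotate on first use are assumptions for illustration; the thread above leaves the exact rotation policy open. Keys are treated as opaque values and key generation is out of scope.

```python
class FallbackKeyRing:
    """Hypothetical holder for the private parts of a device's fallback keys."""

    def __init__(self, current, retention_seconds=3600.0):
        self.current = current              # no messages received for it yet
        self.previous = None                # already used, kept temporarily
        self.previous_first_used_at = None
        self.retention_seconds = retention_seconds

    def rotate_after_first_use(self, new_key, now):
        """Call when the first message using the current key is received.

        Any older previous key is overwritten, so at most two keys are held.
        """
        self.previous = self.current
        self.previous_first_used_at = now
        self.current = new_key

    def expire(self, now):
        """Forget the previous key once the retention period has passed."""
        if (
            self.previous is not None
            and now - self.previous_first_used_at >= self.retention_seconds
        ):
            self.previous = None
            self.previous_first_used_at = None

    def private_keys(self):
        """Keys that can currently be used to set up inbound sessions."""
        return [k for k in (self.current, self.previous) if k is not None]
```

The upper bound of two keys comes at a cost: if the current key is used again before the previous one expires, the older key is discarded early and late messages for it can no longer be decrypted.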
|
||
For addressing replay attacks, clients can also keep track of inbound sessions | ||
to detect replays. | ||
|
||
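Replay detection by tracking inbound sessions could be sketched as below. This assumes the client's crypto library exposes a session identifier that is the same every time a given initial message is processed; the `InboundSessionTracker` class and its method are hypothetical.

```python
class InboundSessionTracker:
    """Hypothetical record of inbound sessions already established."""

    def __init__(self):
        self._seen = set()

    def accept_new_session(self, session_id: str) -> bool:
        """Return True the first time a session identifier is seen.

        A second attempt to establish a session with the same identifier
        is treated as a replay and rejected.
        """
        if session_id in self._seen:
            return False
        self._seen.add(session_id)
        return True
```

For this to help, the record has to be kept at least as long as the private part of the fallback key that the sessions were created with.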
## Unstable prefix | ||
|
||
The `fallback_keys` request parameter and the `device_unused_fallback_key_types` | ||
response parameter will be prefixed by `org.matrix.msc2732.`. |
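
A client that wants to work with both the unstable and the eventual stable names could wrap the lookup in small helpers, as in this sketch. The helper names are hypothetical, and preferring the stable name when both are present is an assumption rather than something the proposal states.

```python
UNSTABLE_PREFIX = "org.matrix.msc2732."


def field_name(stable_name: str, use_unstable: bool) -> str:
    """Return the parameter name to use, with the unstable prefix if requested."""
    return UNSTABLE_PREFIX + stable_name if use_unstable else stable_name


def read_unused_fallback_key_types(sync_response: dict):
    """Read the field from a /sync response under either name.

    Returns None when neither name is present.
    """
    for use_unstable in (False, True):
        name = field_name("device_unused_fallback_key_types", use_unstable)
        if name in sync_response:
            return sync_response[name]
    return None
```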
It's not quite clear to me what exactly the purpose and tradeoffs of this entire design are (hence attaching this thread to the first line, as good a place as any).
What does this improve over just having a number of one-time keys? As I understand it, the benefit is that fallback keys can be used indefinitely and so there is no limit on established sessions like with pre-generated keys.
But then why not drop the pre-generated key mechanism in favour of this mechanism entirely? As I understand it, because the security guarantees of this model are weaker, and so it is preferable to use pre-generated keys where possible.
But then doesn't this weaken the overall security model, by making it possible for an attacker to intentionally exhaust all of someone's pre-generated keys, essentially carrying out a downgrade attack and forcing them (or rather, the people communicating with them) into fallback keys being used instead, which would weaken security properties?
I have difficulty squaring this circle and understanding how this can be both a) useful, b) secure, and c) additive to the current model. Can you shed some light on this?
+1. I have vague memories from staring at the signal protocol (which does the same trick) that this wasn't as much of a disaster as you might think, but I can't remember why. @uhoreg I think this is worth spelling out.
Right, I keep forgetting to respond to this.
My understanding is that it is slightly weaker security, but it's not a huge difference. So it is "better" to use one-time keys, but when one is unavailable, a fallback key is "good enough" in many cases. But this proposal allows clients to make up their own mind about the tradeoffs. If they don't think that a fallback key is secure enough, then they don't need to use it. If they don't care at all about the extra security from one-time keys vs. fallback keys, they can drop one-time keys completely and just rely on fallback keys. If they want the extra security from one-time keys when available, but don't want to inconvenience the user when one-time keys aren't available, they can choose to do that too.
As an explanation of the security difference: under an assumption that an attacker cannot break Curve25519, and so must attack the client directly to get the private keys, there is not much difference between using a one-time key and a fallback key. If an attacker is able to extract the keys from the client, then they will have the client's private identity key, all the one-time keys that have already been used, and the current fallback key. And the only difference between having a private key for a one-time key and for a fallback key is that the fallback key may have been used already in a session that was already processed. But if a client promptly replaces a fallback key after it has been used and forgets the private key quickly (after it's reasonably sure that it has received all the sessions that use it), then the difference is small.
So I think this comes down to a case of cryptographic ideal vs. pragmatism. From a cryptographic standpoint, one-time keys are the way to go, because we're paranoid and worry about Curve25519 being broken. But practically speaking, the tradeoff between the possibility of someone cracking Curve25519 or attacking the client at just the right moment that they can decrypt some extra sessions, versus the inconvenience of having undecryptable messages because we ran out of one-time keys, for most people, I think, leans towards convenience in this case. But again, the client can make their choice about what to do, and the user can choose a client that matches their paranoia level.
How do OTKs help if an attacker can break Curve25519? If they have the encrypted payloads it seems reasonable that they'll also see the OTK go past as it's claimed? Or am I missing something here?
With OTKs you have to break every single Curve25519 keypair. With fallback keys, if you break the Curve25519 fallback key you immediately have all sessions with devices that used that fallback key to initiate communication.
Is this correction of the misconception correct (🥁)? I have not been around back then but as far as I can tell libolm was forked from Axolotl (v2?) which did have a concept of a fallback key, Axolotl (v3?) which introduced x3dh appeared a couple of months after the fork. The removal of the fallback key can be found in this commit.
I don't know if the truncation has been changed or not, but I think that one case of such changes (the removal of the fallback key) and one potential such change (the removal of one-time keys) is enough to get my point across.
I don't think this reversal is true given the above so I won't address everything, but my complaint isn't about following Signal step by step, it's about introducing changes to complex protocols without proper justification.
Fallback keys are used to defend against denial of service attacks. We should know why we don't want this property in the protocol.
One-time keys are used to offer better forward secrecy. Again, we should know why this isn't desirable.
To be almost unhelpfully pedantic here, OTKs don't strictly allow this as they're not deleted immediately on key use, they're deleted once the client has become aware that they've been used. Sure, this is likely a short window of time, but then if the fallback keys are only kept for a short amount of time the situations become effectively equivalent. I don't believe any of our forward secrecy actually immediately protects all old messages, rather it protects sufficiently old messages, i.e. there's always a recent period of time for which an attacker can still get messages.
This is the crux of what confuses me: if OTKs only slightly reduce this window of time, and we're saying we're happy with the window that having fallback keys gives us (since we're using them), then are OTKs worth the additional complexity?
However:
I think is what I've been missing here, I've been assuming that we can delete the fallback private keys quickly, and so the windows of attack for OTKs vs fallback keys are effectively the same. If we have to keep the fallback keys around for hours, then that's probably(?) a sufficiently large window that using OTKs to reduce that window makes sense.
The other piece here is that while an attacker can drain a device's OTKs, and so force new sessions to use the fallback keys, that is an active attack that can be observed by the servers and device. That will at least give some breadcrumbs that something fishy is going on, rather than allowing completely passive attacks. Though since draining of OTKs happens sufficiently often that we're making this MSC, I don't know if anybody would actually notice the draining of OTKs.
Basically: adding fallback keys weakens security, and while removing OTKs would weaken security some more, do they provide enough meaningful protection to warrant their complexity? It sounds like the answer is "yes, just", so I'm OK keeping them long term if people agree with the above analysis.
I'd rather these thoughts were recorded on the MSC for future reference, so that they can be linked back.
oh. I'll get back in my box. Sorry.
Worse than that, since there is no way to determine the number of times the fallback key was retrieved (as far as I'm aware), we have no deterministic way of concluding that all sessions involving that fallback key have been created. So the choices are between keeping a given fallback key for an indefinite amount of time or else risk having undecryptable messages.
This doesn't happen with OTKs due to their property that they will be used at most once, so when a session is created for a given OTK, the client can conclude with certainty that it can drop that OTK.
Generally, it should be fairly safe to assume that when a user claims a one-time key (whether it's actually an OTK or ends up being a fallback key), they're going to use it right away. So clients should only need to keep the fallback key for a short time after it's used (maybe a couple of minutes after the sync in which they notice that it's been used, to allow for network delays). The risk of having undecryptable messages is somewhat mitigated by having olm unwedging and key resharing, though that requires the sender to come back online.
We actually do have a similar problem with OTKs. libolm tries to use a constant amount of memory, so it only has limited space for OTKs. That means that libolm will sometimes evict OTKs when it generates new ones, so if someone claims an OTK and waits too long to use it, it may have been evicted by the time they finally get around to using it.
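To illustrate the eviction effect described in the last comment, here is a minimal sketch of a fixed-capacity key store that drops its oldest entry when a new key is generated. It is not libolm's implementation; the `BoundedKeyStore` class, its capacity parameter and its method names are invented for this example.

```python
from collections import OrderedDict


class BoundedKeyStore:
    """Hypothetical fixed-capacity store for one-time private keys."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._keys = OrderedDict()  # key identifier -> private key, oldest first

    def add(self, key_id, private_key):
        """Store a new key; return the identifier of an evicted key, if any."""
        evicted = None
        if key_id not in self._keys and len(self._keys) >= self.capacity:
            # Drop the oldest key to keep memory use constant.
            evicted, _ = self._keys.popitem(last=False)
        self._keys[key_id] = private_key
        return evicted

    def take(self, key_id):
        """Remove and return the private key, or None if it was evicted or used."""
        return self._keys.pop(key_id, None)
```

A sender who claimed an evicted key and uses it late would hit the `None` case in `take`, which corresponds to the undecryptable initial message described above.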