matrix-org · dbkr · Sep 26, 2018 · Sep 26, 2018 · Oct 23, 2018 · Oct 31, 2018
diff --git a/proposals/1687-encrypted-recovery-keys.md b/proposals/1687-encrypted-recovery-keys.md
@@ -0,0 +1,156 @@
+# Proposal for storing an encrypted recovery key on the server to aid recovery of megolm key backups
+
+## Problem
+
+[MSC1219](https://github.com/matrix-org/matrix-doc/issues/1219) proposes an API
+for optionally storing encrypted megolm keys on your homeserver, so if a user
+loses all their devices, they can still recover their history.  The megolm keys
+are encrypted to a Curve25519 public key whose corresponding private key only
+the end-user has.
+
+However, there are usability concerns about users having to store their
+Curve25519 recovery private key in a secure manner.  Casual users are likely to
+be scared away by having to file away a relatively long (e.g. 10 word)
+generated recovery key.
+
+We would like to give the user the option to access their key backup using a
+passphrase in addition to their recovery key. We can take inspiration from
+Apple’s [FileVault 2](https://hal.inria.fr/hal-01460615/document), where Apple
+stores copies of your FileVault AES key on your hard disk, encrypted by your
+UNIX account password, or from the practice of keeping a passphrase-protected
+SSH private key on a server for convenience.
+
+## Proposed solution
+
+Three solutions are given here (two of which are viable, one included for
+completeness), varying in the implications of the user changing their
+passphrase.
+
+Option 1 has been chosen, on the basis that we do not require the user to
+be able to change their passphrase without also changing their recovery key.
+
+### Recovery Key
+
+In all options below, the process for generating a recovery key from a byte
+string, b, is as follows:
+ * Prepend the two bytes 0x8B, 0x01 to the byte string b
+ * Compute a parity byte by XORing all bytes of the resulting string (i.e. the
+   prefix + b)
+ * Append the parity byte to the prefix + b
+ * base58 encode the resulting byte string with alphabet
+   '123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz'.
+ * Format the resulting ASCII string into groups of 4 characters separated by
+   spaces.
+
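The encoding steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: all function names are hypothetical, and the treatment of leading zero bytes follows the common Bitcoin-style base58 convention, which the proposal does not specify (it never arises for recovery keys, because the first prefix byte 0x8B is non-zero). A decoder is included only to show how the prefix and parity byte can be checked.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
PREFIX = bytes([0x8B, 0x01])


def base58_encode(data: bytes) -> str:
    """Encode bytes as base58 using the alphabet given in the proposal."""
    n = int.from_bytes(data, "big")
    digits = []
    while n > 0:
        n, rem = divmod(n, 58)
        digits.append(ALPHABET[rem])
    # Assumption: leading zero bytes map to '1', as in Bitcoin-style base58.
    pad = len(data) - len(data.lstrip(b"\x00"))
    return ALPHABET[0] * pad + "".join(reversed(digits))


def base58_decode(text: str) -> bytes:
    """Inverse of base58_encode; raises ValueError on an invalid character."""
    n = 0
    for ch in text:
        n = n * 58 + ALPHABET.index(ch)
    pad = len(text) - len(text.lstrip(ALPHABET[0]))
    return b"\x00" * pad + n.to_bytes((n.bit_length() + 7) // 8, "big")


def xor_all(data: bytes) -> int:
    """XOR every byte of data together, giving a single parity byte."""
    parity = 0
    for byte in data:
        parity ^= byte
    return parity


def encode_recovery_key(b: bytes) -> str:
    """Prefix, append parity, base58 encode, then group into fours."""
    payload = PREFIX + b
    encoded = base58_encode(payload + bytes([xor_all(payload)]))
    return " ".join(encoded[i:i + 4] for i in range(0, len(encoded), 4))


def decode_recovery_key(text: str) -> bytes:
    """Strip spaces, decode, and verify the prefix and parity byte."""
    raw = base58_decode("".join(text.split()))
    if raw[:2] != PREFIX:
        raise ValueError("unexpected prefix")
    if xor_all(raw) != 0:  # payload XOR its own parity byte must be zero
        raise ValueError("parity check failed")
    return raw[2:-1]
```

Because the parity byte is the XOR of everything before it, XORing the whole decoded string (parity included) yields zero, which is what the decoder checks.
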
+### Option 1
+
+The user provides a passphrase, P. The client generates the backup encryption
+private key, K<sup>-1</sup>, by running PBKDF on this passphrase. The PBKDF
+salt and iteration count are stored in the `auth_data` of the key backup under
+the `private_key_salt` and `private_key_iterations` keys, respectively:
+
+```json
+{
+    [...]
+    "private_key_salt": "MmMsAlty",
+    "private_key_iterations": 100000
+}
+```
+
+The backup public encryption key, K, is determined by running the curve25519
+function on K<sup>-1</sup> with basepoint {9}. The recovery key is then
+generated by encoding K<sup>-1</sup> as above.
+
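As a rough sketch of the derivation described above, the following Python derives K<sup>-1</sup> from a passphrase and then computes K. Several details are assumptions not fixed by the proposal: PBKDF2 with HMAC-SHA-512 as the "PBKDF", a 32-byte output, and UTF-8 encoding of both passphrase and salt. The `x25519` function follows the Montgomery ladder in RFC 7748; it is written for readability, is not constant-time, and should not be used in place of a vetted library. The function names are hypothetical.

```python
import hashlib

P = 2**255 - 19
A24 = 121665
BASEPOINT = (9).to_bytes(32, "little")


def x25519(k: bytes, u: bytes) -> bytes:
    """Curve25519 scalar multiplication (RFC 7748). Not constant-time."""
    scalar = bytearray(k)
    scalar[0] &= 248    # clamp the scalar as required by RFC 7748
    scalar[31] &= 127
    scalar[31] |= 64
    n = int.from_bytes(scalar, "little")
    x1 = int.from_bytes(u, "little") & ((1 << 255) - 1)
    x2, z2, x3, z3, swap = 1, 0, x1, 1, 0
    for t in reversed(range(255)):
        bit = (n >> t) & 1
        swap ^= bit
        if swap:
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        c, d = (x3 + z3) % P, (x3 - z3) % P
        da, cb = d * a % P, c * b % P
        x3 = pow(da + cb, 2, P)
        z3 = x1 * pow(da - cb, 2, P) % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return (x2 * pow(z2, P - 2, P) % P).to_bytes(32, "little")


def derive_backup_keypair(passphrase: str, salt: str, iterations: int):
    """Return (private_key, public_key) for the backup, both 32 bytes."""
    private_key = hashlib.pbkdf2_hmac(
        "sha512",                    # assumed hash; the proposal says "PBKDF"
        passphrase.encode("utf-8"),
        salt.encode("utf-8"),        # the private_key_salt value
        iterations,                  # the private_key_iterations value
        dklen=32,
    )
    return private_key, x25519(private_key, BASEPOINT)
```

A client restoring from a passphrase would read `private_key_salt` and `private_key_iterations` from the backup's `auth_data`, run the same derivation, and could compare the resulting public key against the one recorded for the backup before attempting decryption.
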
+To change the passphrase, a client creates a completely new backup version,
+performing the steps above with the new passphrase. The client then re-encrypts
+all session keys and uploads them to the new backup. The user gets
+a new recovery key whenever they change their passphrase.
+
+In this option, the encryption key and recovery key are generated directly
+from the passphrase using PBKDF. This means the ciphertext of the backed-up
+keys is more vulnerable to dictionary attacks. Option 2b attempts to offer a
+mitigation against this.
+
+### Option 2a
+
+The backup encryption private key, K<sup>-1</sup>, is generated by a secure
+random number generator. A second private key, K<sup>-1</sup><sub>p</sub>, is
+generated by running PBKDF on the passphrase. K<sup>-1</sup><sub>p</sub>' is
+generated by XORing K<sup>-1</sup> with K<sup>-1</sup><sub>p</sub>.
+K<sup>-1</sup><sub>p</sub>' is stored on the server along with the key backup,
+in the `private_key` object. The recovery key is generated by encoding
+K<sup>-1</sup> as above.
+
+To change the passphrase, the client generates the new
+K<sup>-1</sup><sub>p</sub> from the new passphrase then computes a new
+K<sup>-1</sup><sub>p</sub>'. It then updates the backup information with this
+new K<sup>-1</sup><sub>p</sub>'.
+
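The XOR construction and the passphrase change can be illustrated with a short Python sketch. It assumes PBKDF2-HMAC-SHA-512 with a 32-byte output as the PBKDF, and that a fresh salt may be chosen on each change; neither is specified by the proposal, and the function names are hypothetical.

```python
import hashlib
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def passphrase_key(passphrase: str, salt: bytes, iterations: int) -> bytes:
    """Derive the passphrase key K^-1_p (assumed PBKDF2-HMAC-SHA-512)."""
    return hashlib.pbkdf2_hmac(
        "sha512", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def make_counterpart(backup_key: bytes, passphrase: str,
                     salt: bytes, iterations: int) -> bytes:
    """Compute K^-1_p' = K^-1 XOR K^-1_p, the value stored with the backup."""
    return xor_bytes(backup_key, passphrase_key(passphrase, salt, iterations))


def recover_backup_key(counterpart: bytes, passphrase: str,
                       salt: bytes, iterations: int) -> bytes:
    """Recover K^-1 = K^-1_p' XOR K^-1_p from the stored counterpart."""
    return xor_bytes(counterpart, passphrase_key(passphrase, salt, iterations))


def generate_backup_key() -> bytes:
    """Generate a random 32-byte K^-1."""
    return secrets.token_bytes(32)
```

Changing the passphrase amounts to calling `make_counterpart` again with the same K<sup>-1</sup> and the new passphrase, then replacing the stored counterpart; K<sup>-1</sup> and the recovery key are unchanged.
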
+This would require the API to support updating the metadata stored with a
+backup (or the key parameters to be stored elsewhere, eg. in account data).
+
+This option, however, allows the server to obtain K<sup>-1</sup> by obtaining
+any one of the user's previous passphrases, assuming it keeps copies of the
+previous versions of the key parameters. This option is therefore not viable,
+but included for completeness.
+
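The weakness of option 2a can be made concrete with a small Python sketch: a server that retained an old K<sup>-1</sup><sub>p</sub>' and later learns the corresponding old passphrase recovers the current K<sup>-1</sup>, because K<sup>-1</sup> did not change. The PBKDF choice (PBKDF2-HMAC-SHA-512, 32-byte output) and the function names are assumptions made for illustration.

```python
import hashlib


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def passphrase_key(passphrase: str, salt: bytes, iterations: int) -> bytes:
    """Derive K^-1_p from a passphrase (assumed PBKDF2-HMAC-SHA-512)."""
    return hashlib.pbkdf2_hmac(
        "sha512", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def server_recovers_key(retained_counterpart: bytes, leaked_passphrase: str,
                        retained_salt: bytes, iterations: int) -> bytes:
    """What a server can compute from a retained old counterpart plus the
    matching old passphrase: K^-1 = K^-1_p' XOR K^-1_p."""
    return xor_bytes(
        retained_counterpart,
        passphrase_key(leaked_passphrase, retained_salt, iterations),
    )
```

Note that the user's current passphrase never enters the computation, so choosing a stronger passphrase later does not help once an older one is exposed.
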
+### Option 2b
+
+A variant on option 2a is to regenerate K<sup>-1</sup> when the passphrase is
+changed, meaning the recovery key does change when the passphrase is changed,
+making it identical feature-wise to option 1 and without the problem of any
+previous passphrase being sufficient to obtain K<sup>-1</sup>. It differs,
+however, in that K<sup>-1</sup> is generated randomly and therefore not
+vulnerable to dictionary attacks. However, K<sup>-1</sup><sub>p</sub> is still
+vulnerable to dictionary attacks and is stored in the same place with the same
+protection, and, if compromised, gives access to K<sup>-1</sup>. This option
+therefore offers no significant security benefit over option 1.
+
+### Option 3
+
+The backup encryption private key, K<sup>-1</sup>, and a private,
+passphrase-derived key, K<sup>-1</sup><sub>p</sub>, are generated as above. The
+passphrase key counterpart, K<sup>-1</sup><sub>p</sub>', is also generated as
+above, as K<sup>-1</sup> XOR K<sup>-1</sup><sub>p</sub>. Another private
+key, K<sup>-1</sup><sub>r</sub>, is also generated by a secure random number
+generator and encoded to give the recovery key as above.
+K<sup>-1</sup><sub>r</sub>' is generated by XORing K<sup>-1</sup><sub>r</sub>
+with K<sup>-1</sup>. Both K<sup>-1</sup><sub>p</sub>' and
+K<sup>-1</sup><sub>r</sub>' are stored in the `private_key` in the backup under
+keys `passphrase_counterpart` and `recovery_key_counterpart` respectively.
+
+To change the passphrase, the client starts a new backup version as in option 1
+(generating a new K<sup>-1</sup>), but additionally computes a new
+K<sup>-1</sup><sub>r</sub>' by XORing K<sup>-1</sup><sub>r</sub> with the new
+K<sup>-1</sup>. This refreshes all keys, but allows the user to keep the same
+recovery key for their backup, on the assumption that the recovery key itself
+has not been compromised. If it has, the client generates a new backup with a
+completely fresh recovery key instead.
+
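A minimal Python sketch of option 3 follows, showing how a new backup version gets a fresh K<sup>-1</sup> while the recovery key K<sup>-1</sup><sub>r</sub> stays the same. It assumes PBKDF2-HMAC-SHA-512 with a 32-byte output as the PBKDF and keeps the counterparts as raw bytes; how they would be serialised inside `private_key` is not specified by the proposal. Function names are hypothetical.

```python
import hashlib
import secrets


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def passphrase_key(passphrase: str, salt: bytes, iterations: int) -> bytes:
    """Derive K^-1_p from a passphrase (assumed PBKDF2-HMAC-SHA-512)."""
    return hashlib.pbkdf2_hmac(
        "sha512", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def new_backup_version(passphrase: str, recovery_private_key: bytes,
                       salt: bytes, iterations: int):
    """Create a fresh K^-1 and both counterparts for a new backup version.

    recovery_private_key is K^-1_r; pass the existing one to keep the
    user's recovery key, or a fresh random one to replace it.
    """
    backup_key = secrets.token_bytes(32)
    k_p = passphrase_key(passphrase, salt, iterations)
    private_key_obj = {
        "passphrase_counterpart": xor_bytes(backup_key, k_p),
        "recovery_key_counterpart": xor_bytes(backup_key, recovery_private_key),
    }
    return backup_key, private_key_obj


def unlock_with_passphrase(private_key_obj: dict, passphrase: str,
                           salt: bytes, iterations: int) -> bytes:
    """Recover K^-1 from the passphrase counterpart."""
    k_p = passphrase_key(passphrase, salt, iterations)
    return xor_bytes(private_key_obj["passphrase_counterpart"], k_p)


def unlock_with_recovery_key(private_key_obj: dict,
                             recovery_private_key: bytes) -> bytes:
    """Recover K^-1 from the recovery key counterpart."""
    return xor_bytes(private_key_obj["recovery_key_counterpart"],
                     recovery_private_key)
```

Calling `new_backup_version` twice with the same `recovery_private_key` but different passphrases models a passphrase change: each version has its own K<sup>-1</sup>, yet the same recovery key unlocks the newest one.
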
+## Security considerations
+
+The proposal above is vulnerable to a malicious server admin performing a
+dictionary attack against the encrypted passphrases stored on their server to
+access history.  (It's worth bearing in mind that the server admin can also
+always hijack their users' accounts; the thing stopping them from
+impersonating their users is E2E device verification.)
+
+## Possible extensions
+
+In future, we could consider supporting authenticating users for login based on
+their encrypted passphrase, meaning that users only have to remember one
+password for their Matrix account rather than a login password and a
+history-access passphrase.  However, this of course exposes the user's whole
+E2E history to the risk of dictionary attacks by public attackers (i.e. not
+just server admins), keysniffer-at-login attacks or clients which are lazy
+about storing account passwords securely.  There's also a risk that, because
+login passwords are entered much more often than history passwords, this
+might encourage users to choose a weaker password.  It's unclear whether this
+reduction in security-in-depth is worth the UX benefits of a single master
+password, so we suggest checking how this proposal goes first (given in general
+we expect key recovery to happen by cross-verifying devices at login rather
+than by entering a recovery key or passphrase).
+
+## See also:
+
+Notes from discussing this IRL are at
+https://docs.google.com/document/d/11fF1rbX5eTkrfxXRS8UhpW5sBENOCydYlLWzB8X1IuU/edit