Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rclone crypt improvements (envelope encryption, truncation protection, hashing support) #7192

Open
tomekit opened this issue Jul 31, 2023 · 24 comments

Comments

@tomekit
Copy link

tomekit commented Jul 31, 2023

We're running S3Drive (GUI for S3 on desktop, mobile, web) and recently aligned with Rclone's encryption scheme for better interoperability and features like drive mount and Webdav that we could get out of the box.

Prior to that we've researched research couple different encryption schemes and despite current Rclone encryption shortcomings (from our perspective) we've went for it, knowing that there is a room for improvements.

Disclaimer, I am not a cryptographer, but rather an engineer who by the sheer luck of project choices happen to be more involved with cryptography for the past couple years.

1. No envelope encryption

The biggest shortcoming so far that we see, is that every single Rclone file content is encrypted by a single master key (derived from user provided password). On its own this doesn't pose direct security threat since every file has random 24 bytes nonce, but it limits how you can securely exchange the encrypted file.

One of our features is file sharing, this feature is native to S3 protocol. When encrypted shared file is passed to the receiver they need to obtain the file decryption key. With a current Rclone's setup the decryption key happen to be user's master key. By sharing a single file, the encryption key to ALL FILES is revealed which is very undesirable from a security point of view.

Before we've implemented Rclone's encryption, we've been using AWS client-side encryption: https://docs.aws.amazon.com/amazon-s3-encryption-client/latest/developerguide/encryption-algorithms.html#crypto_features it worked well, but it wasn't in the end feasible for us, mostly due to lack of consistent hardware accelerated streamed AES-GCM support across different platforms (web included).

What was great about AWS's approach is how they've implemented key management and key wrapping.
In other words, each file had a different encryption key (wrapped DEK), which would then be encrypted by the master key.
Sharing a resource would reveal only that single key (DEK). By choosing a key wrapping algorithm it would be possible to create multiple different encryption keys (by simply reencrypting the DEK).

I believe it's something that Rclone could use.

This change would have to be reflected in the ciphertext heading and would increase the length of a ciphertext in the ballpark of 16-32 bytes (depending on the KW algorithm).

2. No truncation protection

Currently, you can remove as many as you want blocks (starting from the end) from the ciphertext and no one would notice. There are multiple ways to address that, but given the Rclone's ciphertext format, the one that would make sense in my opinion is: https://github.com/miscreant/meta/wiki/STREAM
In other words it would be good to reserve 1 byte of nonce for the: last_block flag.
We were recently given a great response in this topic which also might be helpful for you to assess alternative options.
https://crypto.stackexchange.com/a/106992/70896

(Update) 2.5 Hashing support

Currently in order to verify plaintext integrity the whole encrypted object had to be scanned and decrypted and hash has to be calculated. This is costly, especially the bigger the file.

To solve this, encrypted object will have an MD5 hash of a plaintext appended to the footer. The hash itself would be encrypted and authenticated using same AEAD construct as file content.

This in fact is already implemented: a5d6af7

There is a 1-byte room that stores the hash type just in case in the future we need to migrate to something stronger/better than MD5.

3. Variable encryption settings

Current Rclone's settings are sane defaults which work well on a typical hardware.
They don't work well on Web though, mostly due to poor speed of XSalsa20 and overhead related to fixed 64KB block processing.

3.1 Dynamic chunk size

This could be achieved by adding the 4 byte UINT field to the header ... or could be as small as 1 byte which would be a multiplier of e.g. 64KB (that would support block size up to 16MB).

3.2 Different cipher support

It's XSalsa20 now. I am not a huge fan of AES-GCM, but there isn't really a choice for performant cipher on Web: https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/decrypt
Could we indicate the cipher type in the header and possibly support AES-GCM (libsodium has consistent API on all platforms)?

4. No path protection

If you copy ciphertext from location: /a to location /b, no one would notice. Ciphertext is not aware where it belongs. Perhaps not a big issue for many and this comes with certain side effects and complexities and might not be feasible for Rclone at all. E.g. simple raw byte copy would no longer be possible (as the heading with new location would have to be updated).
I've thought I will just mention it anyway.

Final word

I am writing this to understand what's Rclone maintainers (nice to meet you Nick @ncw for the first time) view on these improvements, whether they make sense for Rclone and if they're interested in taking time to include these changes in the specification.
I can certainly participate in the planning phase and happy to do more research in terms of optimal choices, although I am sure my competence in Rclone's world falls short.

These improvements is something we would be implementing eventually, but given that benefit of staying compliant with Rclone's approach is so great... we've thought it would be a shame not to ask.
We're thinking of executing this later this year and this would approximately be the time where we (well, likely myself) could get our hands dirty with Go and build some pull requests.

@matthewmummert5
Copy link

matthewmummert5 commented Aug 11, 2023

I personally haven't contributed to rclone development (yet, I want to though since I love rclone so much), but I wanted to comment on this as long as the discussion is going. I have a few thoughts on this. I think if the crypt remote were to be improved, that the current design should be kept as legacycrypt or something to that effect, so that people can still decrypt their old stuff. As to your suggestions:

  1. Envelope encryption
    I personally would find envelope encryption very useful, so that I don't have to share passphrases with all my devices that need access to the crypt remote. However, I'd recommend against AES-GCM for this, since AES-GCM isn't key-committing and envelope encryption provides opportunities for an adversary to trick you into using the wrong key to decrypt. I can't immediately think of a way to exploit that to do a partitioning oracle attack on somebody's rclone mount, but it's better to play it safe with a fully committing AEAD scheme like AES-CTR-HMAC-SHA256, and just take the HMAC performance hit.

  2. No Truncation Protection.
    I think libsodium's secretstream API (which is available in go) would do exactly what you want it to do here. To prevent data from being truncated from the end, secretstream allows you to prepend a tag that indicates whether this chunk is the final chunk to the first byte of the plaintext. To prevent data from being truncated from the beginning, the counter portion of the nonce starts at zero. My only concern about libsodium secretstream is the fact that XChaCha20Poly1305 is not key-committing, which might be a problem if combined with envelope encryption. Doing something similar with AES-CTR-HMAC, ChaCha20-HMAC, would be ideal.

  3. Variable Encryption Settings
    No Comment

3.1 Dynamic Chunk Size
I don't really see anything wrong with the 64KB fixed chunk size. But I don't have any comment on this.

3.2 Different Cipher Support.
If envelope encryption is adopted, use a fully-committing AEAD instead of AES-GCM, just to be safe.

  1. No Path Protection.
    Hmm, how could this be implemented? Are there any other file-based encryption schemes out there that do this right?

Great discussion. I love rclone and want to see it improve. I have a full-time dayjob, and I've never contributed to rclone before but maybe I can help on weekends if people really do want to improve the crypt remote.

@tomekit
Copy link
Author

tomekit commented Sep 8, 2023

I think if the crypt remote were to be improved, that the current design should be kept as legacycrypt or something to that effect

Technically the crypt cmd could stay as it is. It would use new encryption header (e.g. RCLONE\x00\x01) and still support existing objects by detecting the current header which is RCLONE\x00\x00. Potential issue is that this would make newly encrypted objects incompatible with older version of Rclone, but if someone sets up the "crypt" then they're either setting up a new workflow (in which case using new crypt shall be fine) or they try to read existing objects (which would be possible since crypt would be backward compatible). Perhaps this is a good moment to implement some version check mechanism which would read cipher version from encryption header and render message: "Cipher version not supported, please upgrade Rclone version".

However, I'd recommend against AES-GCM for this, since AES-GCM isn't key-committing and envelope encryption provides opportunities for an adversary to trick you into using the wrong key to decrypt.

I am not a cryptographer, but I don't think this attack is practical/possible to execute in this setup. As per this answer: https://crypto.stackexchange.com/a/96033/70896, the adversary could generate ciphertext c which would be decryptable by two DEKs: k1 and k2 also in their possession. Then, the adversary (given they have access to remote location) could use ciphertext c to replace contents of existing e.g. fileA, then they could try to embed k1 as a new DEK. If they've succeeded this would break integrity property of AEAD: https://www.ietf.org/id/draft-irtf-cfrg-aead-properties-01.html#name-data-integrity (in other words user would decrypt the fileA not knowing that its contents have been forged)

The challenge is that the adversary wouldn't be able to successfully embed new DEK, since it's "wrapped" using key wrapping algorithm, which is itself authenticated. If user have tried to decrypt such object, the authenticity issue would've been captured already on the process of key unwrapping where integrity is being checked. I believe that key wrapping is the protection layer in a non committing scheme.

I think libsodium's secretstream API (which is available in go) would do exactly what you want it to do here.

I was researching this particular topic a while ago and received pretty good overview of multiple different solutions: https://crypto.stackexchange.com/a/106992/70896
The issue with libsodium's secretstream is that the chunk size is dynamic / unpredictable. This has pretty serious consequences. For instance it would no longer possible to implement random access / seek (e.g. to decrypt video stream from a certain point in time). It's also not really possible to parallelize the decryption/encryption, since you wouldn't be able to "divide" work onto chunks, you'll need to do the work sequentially. It's also heavier than a typical encryption schemes which could be utilized here.
I would not recommend secretstream, because I don't think that downsides justify benefits from using it.

Given that current Rclone's encryption scheme is extremely simple, works reasonably well and already uses constant-sized blocks, I would probably keep using similar setup except it shall be proofed against truncation protection.
Currently Rclone uses 24 bytes nonce: https://rclone.org/crypt/#header little-endian incremented.
Similar and well researched approach STREAM: https://github.com/miscreant/meta/wiki/STREAM doesn't use full 24 bytes for ctr, but instead uses:

Nonce encoding is nonce_prefix || ctr || last_block where:

nonce_prefix: 8-byte (64-bit) fixed prefix
ctr: 32-bit big endian counter value
last_block: 1-byte flag indicating if this is the last block (0x00 if false, 0x01 if true)

I am not sure if we need: nonce_prefix, but at the very least we could reserve one byte for the last_block flag. If we haven't changed the Rclone ctr endianness it would have to be stored at the beginning of the nonce instead at the end as with STREAM protocol.
We can keep Rclone endianess as it is, but only increment the first 23 bytes of nonce. 24th byte (last block indicator) would be appended to the buffer and passed to relevant encrypt/decrypt function.

Great discussion. I love rclone and want to see it improve.

Thank you for taking time to comment on this. I am also happy to participate in this discussion and I hope that eventually we will push these improvements upstream.

I don't think that we need to address all of these issues at once, however I believe that envelope encryption and truncation protection are the ones which shall be a priority.

This is also invitation for anyone to comment on this issue. 🍺

@matthewmummert5
Copy link

matthewmummert5 commented Sep 9, 2023

Technically the crypt cmd could stay as it is. It would use new encryption header (e.g. RCLONE\x00\x01) and still support existing objects by detecting the current header which is RCLONE\x00\x00.

Excellent thinking.

Perhaps this is a good moment to implement some version check mechanism which would read cipher version from encryption header and render message: "Cipher version not supported, please upgrade Rclone version".

Another good idea.

The issue with libsodium's secretstream is that the chunk size is dynamic / unpredictable. This has pretty serious consequences. For instance it would no longer possible to implement random access / seek (e.g. to decrypt video stream from a certain point in time).

This is actually a very good criticism of libsodium's secretstream API and why it might not work well for rclone. I don't actually know because I haven't checked, but does rclone actually do parallel encryption and seek to random blocks in encrypted files? I assume the answer would be yes in the case of an rclone mount.

I am not a cryptographer, but I don't think this attack is practical/possible to execute in this setup. As per this answer: https://crypto.stackexchange.com/a/96033/70896, the adversary could generate ciphertext c which would be decryptable by two DEKs: k1 and k2 also in their possession.

Don't worry, I'm not a cryptographer either lol. So I might say some stuff that is inaccurate and if somebody corrects me I'll admit I'm wrong. So, I presume that being able to share files with other people or other devices you own without sharing the password to all your files is the point of using envelope encryption? The problem with using a non-fully-committing AEAD scheme for this is very subtle:

I could share a file with both you and your friend, giving the two of you different keys to decrypt it with. You would then both be duped into thinking you got the same file even though the plaintext would be very different. Maybe decrypting the file with your key would result in a normal plaintext, but decrypting the file with your friend's key might result in a malicious plaintext. There is nothing in GHASH or Poly1305 preventing me from doing that - you need a collision resistant authenticator like HMAC to make the guarantee that I didn't give you both different keys. The authors of this presentation at USENIX Security conference demonstrate an attack very similar to this:

https://www.youtube.com/watch?v=VazqgsBwzOY

If we stick with password based encryption (which I'm totally fine with), then this committing vs non-committing AEAD stuff wouldn't be an issue. To trick you into using the wrong key, they'd have to poison your rclone config file somehow, which is definitely out-of-scope.

No matter what we do, I think we should improve the file truncation protection.

To avoid everybody interested in this topic talking past each other, I say we come up with a proposal of how this new encryption scheme should actually work. Then we can discuss the pros and cons of the scheme, and get community input on it. I think this new encryption scheme should achieve the following:

  1. Strong confidentiality guarantees.
  2. Strong tramper-resistance guarantees. It should be impossible to flip bits, reorder chunks, truncate chunks, or remove chunks without detection.
  3. The ability to randomly seek into an encrypted file should be preserved.
  4. (Optional if we use envelope encryption for file sharing) Strong guarantees that a recipient of a shared file cannot be duped into decrypting the file with the incorrect key.

What do you think? Great discussion.

@matthewmummert5
Copy link

I just wanted to point out that my current hack for the truncation protection issue is to do something like this, and I kinda hate it, lol.

find -type f | sort | xargs sha256sum | gpg --clearsign --armor | tee signed_filetree.asc

@kmlebedev
Copy link

Before we've implemented Rclone's encryption, we've been using AWS client-side encryption: https://docs.aws.amazon.com/amazon-s3-encryption-client/latest/developerguide/encryption-algorithms.html#crypto_features it worked well, but it wasn't in the end feasible for us, mostly due to lack of consistent hardware accelerated streamed AES-GCM support across different platforms (web included).

#7084 (AES/GCM/NoPadding)

@tomekit
Copy link
Author

tomekit commented Sep 18, 2023

#7084 (AES/GCM/NoPadding)

Thanks for sharing, great work.

I have no experience with KMS. Is it possible to use it simply just with the static key (in a similar way to current rclone crypt) given that there is a KMS API involved?

Do you know what key wrapping method is used by default with your setup?
https://docs.aws.amazon.com/amazon-s3-encryption-client/latest/developerguide/encryption-algorithms.html#v3-algorithms
https://docs.aws.amazon.com/amazon-s3-encryption-client/latest/developerguide/encryption-algorithms.html#supported-algorithms-older
Would that be: "AWS KMS (with an encryption context)"?

I think having support for AWS native encryption is great step forward, however I don't think that it's a good candidate to ever replace: rclone crypt.
With AES/GCM/NoPadding data is a continuous ciphertext with single authentication tag at the end. Authentication requires reading and processing stream until the end.
a) That means that random access / seeking isn't feasible (requires reading all data anyway) or possible only with authentication disabled (reading ciphertext as AES-CTR instead).
b) Since there is no cross-platform support for accelerated streamed AES-GCM that means that some clients (web, mobile) won't be able to process (encrypt/decrypt) files bigger than available memory size.
c) AES-GCM has a max file size limit of ~64GB.

@kmlebedev
Copy link

kmlebedev commented Sep 19, 2023

I have no experience with KMS. Is it possible to use it simply just with the static key (in a similar way to current rclone crypt) given that there is a KMS API involved?

This makes no sense, since security, portability and compatibility with s3 native will be lost. In my case, rclone sync plays the role of an archiving tool with client-side encryption with the ability to access via web front, supporting s3 client-side encryption and using a common local KMS API with IAM Auth nsmithuk/local-kms#56

@tomekit
Copy link
Author

tomekit commented Sep 19, 2023

This makes no sense, since security, portability and compatibility with s3 native will be lost.

The current rclone crypt is compatible with at least ~43 back-ends (https://rclone.org/overview/#features) and S3 is just one of the supported back-ends. The KMS API makes sense in the context of AWS S3 (which is a subset of 27+ S3 compatible back-ends - https://rclone.org/s3/) but for non-AWS S3 back-ends KMS API is just an additional dependency which doesn't bring much value whilst adding significant overhead/requirement (that's only my view though)

with the ability to access via web front, supporting s3 client-side encryption

What's your preferred way to access client-side encrypted objects via web front given that AWS SDK doesn't seem to support client-side encryption for their browser SDK?

@cornfeedhobo
Copy link

I have no idea how KMS has entered this discussion, but it's wildly out of scope. Offloaded encryption (which is not even possible with AWS) should be yet a different backend, similar to #1647.

The focus here should be about creating an improved crypt backend for client-side encryption.

@matthewmummert5
Copy link

matthewmummert5 commented Feb 12, 2024

A few more thoughts.

Since we've had this discussion, AEGIS is in the process of being standardized:

https://www.ietf.org/archive/id/draft-irtf-cfrg-aegis-aead-07.html

AEGIS is a key-committing AEAD scheme, so we could use if we want to add key commitment.

We don't really need a key-committing scheme though, if we just stick to the password authentication and don't bother with envelope encryption. To exploit the lack of key-commitment with the current password based system, you'd have to poison somebody's rclone config file somehow, and that's probably not in scope. I'm not sure whether or not partitioning oracle attacks could apply against somebody's live rclone mount though. Actually, it's probably safer to just use something key-committing. If we wanna be really safe and sacrifice a bit of performance, Use a stream cipher (like AES-CTR or ChaCha) and authenticate each chunk with a collision-resistant hash based authentication, like HMAC-SHA256 or blake2b.

I think the lack of truncation protection definitely needs to be fixed though. Libsodium's secretstream API does it by prepending a single byte to the plaintext before encrypting and authenticating each chunk to indicate whether or not it's the final chunk.

I think our new encryption scheme should work something like this:

  1. Prepend a random salt to each file and use it to derive a unique key for each file.
  2. Start the nonce at 0 for each file, preventing an adversary from removing data from the beginning of a file, and increment the nonce for each chunk.
  3. Prepend a 1 or 0 byte to each plaintext chunk before encryption to indicate if it's the final chunk. This will prevent an adversary from removing chunks from the end of a file.

I'm strongly in favor of using something that's key-committing, just to be on the safe side.

@tomekit
Copy link
Author

tomekit commented Jul 3, 2024

We're slowly getting back to this topic.

Prepend a random salt to each file and use it to derive a unique key for each file.

What's the purpose of deriving a unique key from a salt instead of generating truly random key?

We could generate CEK (Content Encryption Key) random key and then wrap it using AES Key Wrap RFC 3394 (it's commonly used, e.g. for Cryptomator or AWS client-side encryption)

Existing header looks like this: https://rclone.org/crypt/#header

8 bytes magic string RCLONE\x00\x00
24 bytes Nonce (IV)

New header could look like this:

8 bytes magic string RCLONE\x00\x01
23 bytes Nonce (IV) - we could store 23 bytes incremented nonce, but encryption/detection method would receive 24 bytes which means appending final byte (last block indicator) to the 23 bytes counter - required for truncation protection,
40 bytes - 32 byte encryption key wrapped with AES KW RFC3394 (8 bytes overhead),

CEK could be added after or before the Nonce, I am not sure if there is any preference here.

Start the nonce at 0 for each file, preventing an adversary from removing data from the beginning of a file, and increment the nonce for each chunk.

As far I am concerned, that's what the current encryption scheme does already. New encryption scheme should likely follow similar approach.

Prepend a 1 or 0 byte to each plaintext chunk before encryption to indicate if it's the final chunk. This will prevent an adversary from removing chunks from the end of a file.

What's the benefit of prepending 1 or 0 to every chunk which comes at 1 byte storage overhead vs using up 1 byte of nonce space as for instance in this encryption scheme? https://github.com/miscreant/meta/wiki/STREAM

Use a stream cipher (like AES-CTR or ChaCha) and authenticate each chunk with a collision-resistant hash based authentication, like HMAC-SHA256 or blake2b. [...]
I'm strongly in favor of using something that's key-committing, just to be on the safe side.

So am I, but I also try to weigh some risks that come from non standard encryption use and potential additional development / maintenance costs and performance hit (hash based authentication will likely be slower then polynomial, but that's only a guess).

We don't really need a key-committing scheme though, if we just stick to the password authentication and don't bother with envelope encryption.

What I am trying to understand is whether key commitment is required even if envelope encryption is used.

Potential attack scenario assumes that file encryption key (CEK/DEK) is swapped with something else that decrypts content and successfully verifies the authentication tag.

In the setup that I've mentioned above we could use "RFC 3394 Key wrapping" which itself provides a layer of key authentication. Potential attacker have access to wrapped CEK, since it's part of the file contents, however if adversary replaces wrapped CEK with something else the key unwrapping would have failed: https://crypto.stackexchange.com/questions/67869/does-the-aes-kw-key-wrap-algorithm-perform-authenticated-encryption

The attack on non key-committing scheme would've been possible if attacker had access to unwrapped CEK directly, however they don't since it's wrapped / protected. Don't quote me on that though.

@tomekit
Copy link
Author

tomekit commented Jul 4, 2024

I've had a go (no pun intended) and implemented prototype of an envelope encryption: #7192 (comment)

32 bytes random key cek is generated which is used to encrypted file contents instead of current master data key.
cek is then wrapped with existing master data key using AES Key Wrap algorithm and then is appended as 40 bytes ciphertext just after the nonce. This makes additional 40 bytes storage overhead.
Other encryption scheme properties remain unchanged.

It satisfies: 1. No envelope encryption of this issue, however it doesn't satisfy other points that were raised.

I am going to work on: 2. No truncation protection next and hopefully come up with a solution over the next few days.

I won't address : 3. Variable encryption settings and 4. No path protection (requires risky and significant changes)

I am not yet certain how to integrate it with Rclone. I think acceptable way and not requiring much development effort would be to add config parameter to crypt command which would configure cipher version used.
Old version (let's call it 1) would use: RCLONE\x00\x00 and new version (let's call it 2) would use: RCLONE\x00\x01 encryption header. Default one would be 1, new proposed version (2) would remain opt-in.

Question remains whether new version (2), once enabled, shall allow reading version (1) objects and allow users "lazy" migration or should rather be strict and allow reading only items from selected version.
Perhaps behavior "strictness" could also be a configurable parameter, but I am 100% sure about the default.

Comments are welcome, so is the help with modifying crypt command to support both versions and tests (cipher_test.go and crypt_internal_test.go at least)

@cornfeedhobo
Copy link

I wish I had time to test for you. I probably won't have capacity to revisit this deeply until next year :(

@tomekit
Copy link
Author

tomekit commented Jul 4, 2024

Here: master...tomekit:rclone:envelope_encryption_7192_2#diff-d60621b19be9d97af46259d24a081ffd84864417b2cf80a8178f8e713db9264aR839 I've implemented truncation protection, that's approach from STREAM cipher: https://github.com/miscreant/meta/wiki/STREAM (I was also influenced by a helpful response on Crypto Stack Exchange: https://crypto.stackexchange.com/a/106992/70896)
24 bytes nonce consists now 23 bytes (stored in the file header) counter and 1 last byte: last_block (derived on the fly during decryption/encrypted).

Current condition which sets the: lastBlockFlag or doesn't set it which implicitly remains: 0

if n < blockDataSize { // last block
    fh.nonce[len(fh.nonce)-1] = lastBlockFlag // Set last block flag at the last byte
}

Perhaps we could set the: 0 more explicitly.

if n < blockDataSize { // last block
    fh.nonce[len(fh.nonce)-1] = lastBlockFlag // Set last block flag at the last byte
} else {
    fh.nonce[len(fh.nonce)-1] = 0
}

I wasn't sure if we can rely on the fact that if flag is not set then this last byte will be zero anyway (implicit).
I am not entirely sure whether buffer is empty/zeroed from start or perhaps may contain some garbage.

This requires adding tests, e.g. encrypting file smaller, equal or bigger than blockSize = blockHeaderSize + blockDataSize.

I am going to leave this for a short while in a hope that some comments or suggestions come in.

In a few weeks I shall find some time to integrate it with Rclone crypt command and maybe write some tests.

@matthewmummert5
Copy link

matthewmummert5 commented Jul 4, 2024

What's the purpose of deriving a unique key from a salt instead of generating truly random key?

We could generate CEK (Content Encryption Key) random key and then wrap it using AES Key Wrap RFC 3394 (it's >commonly used, e.g. for Cryptomator or AWS client-side encryption)

I actually like this better than what I suggested with using a salt. I think using a standard like this is an excellent idea.

What's the benefit of prepending 1 or 0 to every chunk which comes at 1 byte storage overhead vs using up 1 byte of nonce space as for instance in this encryption scheme? https://github.com/miscreant/meta/wiki/STREAM

When I wrote that, I was fresh off a project that used libsodium's secretstream API, so that was fresh in my mind. Prepending padding to the plaintext that indicated whether it's the final block before encryption is roughly how that works. Truncation protection via the nonce is also fine.

What I am trying to understand is whether key commitment is required even if envelope encryption is used.

Potential attack scenario assumes that file encryption key (CEK/DEK) is swapped with something else that decrypts content and successfully verifies the authentication tag.

Honestly, I'd have to think about how to pull off an attack like that. It probably doesn't hurt to be key committing just in case, since that would completely shut that sort of thing down.

Oh excellent, I see you've been writing code for this. I'll have a look. I'm very curious what you did.

@tomekit
Copy link
Author

tomekit commented Aug 5, 2024

I've took time to integrate proposed changes into crypt command using the additional: cipher_version config param.
By default existing: "1" version is used. User may enable new version: "2" explicitly during configuration.
Complete commit: tomekit@19cb8ca

Version: "2" has 39 bytes overhead comparing to version: "1", since each file is encrypted using separate 32 bytes encryption key (40 bytes when wrapped using AES Key Wrap - RFC3394) .
Nonce stored in file is now 23 bytes and is 1 bytes shorter, since last byte is reserved for truncation protection and is appended on the fly dynamically during encryption/decryption

Cipher version behavior is strict, that is if V1 version is selected it will write files in a current format and won't be able to open/read V2. If V2 version is selected it will write files using new format and won't be able to open/read V1.

==

Additionally I've implemented cipher version detection which maybe useful for smooth transition into new cipher.
In this mode cipher version setting would determine version used for write, but reads would succeed for both V1 and V2 regardless of config selected version.
The challenge is that: DecryptedSize method is being called before object "magic header" containing cipher version is read, so it may return inaccurate size until: setCipherVersion() is being called from: newDecrypter().

This is related to the fact that when objects/files are listed (e.g. using S3 endpoint), then there is no reliable way to derive unencrypted file size from encrypted length without knowing the cipher version used. Above approach assumes the cipher version from the setting which may be right or not and corrects itself after actually reading the file.

I haven't found this to be an issue when read/write testing, but probably someone with more Rclone specific experience could tell more about feasibility of this approach and potential risk of inaccurate size during call to: DecryptedSize() before: newDecrypter() is called.
Additional commit: tomekit@8ef6000

==

Hi @ncw , is it something you could assist me with please?

It would be useful to understand if chosen approach, that is: cipher_version param is the way you would be happy to go with? After all both cipher versions are similar to each other.

It would also be helpful to know if auto-detection mentioned above is feasible. I mean it's always nice to keep things backward compatible, this would allow users to open their existing datasets even if they've enabled new cipher versions.

Once I know that, I could then add couple more unit tests which reflect these changes and would be happy to spend more resources on that in order to make this fit into Rclone way of doing things.

I am available to explain anything related to this.

@DianaNites
Copy link

@tomekit those commit links are 404, and in fact you have no visible rclone fork at all. Is it private or have you deleted it in the last hour?

@tomekit
Copy link
Author

tomekit commented Aug 6, 2024

Thanks for the heads up. I wasn't aware it's not publicly visible since I was viewing it from my own account.
I've pushed these changes now to public fork.

V2 cipher integrated into Rclone: tomekit@19cb8ca
Cipher version auto-detection (optional and experimental): tomekit@8ef6000

Complete diff: master...tomekit:rclone:envelope_encryption_7192_2

@ncw
Copy link
Member

ncw commented Aug 7, 2024

@tomekit - I had a quick look at those changes - they look interesting. There probably needs to be more discussion about exactly what is going into a v2 header though - we should try to come up with a fixed size and a layout of things within that header, maybe with room for expansion.

I don't want to have too many knobs on the encryption to keep it simple for the user.

To add to your list above the biggest missing thing as far as rclone users are concerned is hash support. It would be good if we can squeeze a hash of the plaintext into the header (or maybe footer). This hash should be encrypted of course otherwise we leak data about the plaintext. The purpose of the hash is to enable rclone's integrity checks. If we support a single hash it should be MD5 as that is the most popular (see the overview table for hash support). Yes it is broken for crypto purposes but still makes a fine and relatively inexpensive integrity check. More than one hash is a possibility.

If you are planning to submit this as a PR can you make sure you divide it into easy to review small commits. Large commits are difficult to merge and I really don't want to break crypt!

I agree with your idea that v1 objects may appear with the wrong size until read - I think that is probably and acceptable compromise but needs to be documented.

Another thought - why waste 8 bits of the nonce when 1 would do? I guess we could potentially use that byte to signal other things, but I don't know what.

@tomekit tomekit changed the title Rclone encryption improvements discussion Rclone crypt improvements (envelope encryption, truncation protection) Sep 5, 2024
@tomekit
Copy link
Author

tomekit commented Sep 16, 2024

Thank you for your input.

This is to let you know that I've restarted work on this. I plan to add hash (MD5) support to crypt. I think the viable place is the footer, mostly because hash can't be efficiently known earlier during execution.

I haven't yet decided how to properly encrypt such hash (what key to use? what nonce to use?). We could technically use the V2 approach, so the CEK key is used and nonce could be simply lastNonce + 1. The issue with this approach is that crypt.go:Hash() would require two HTTP requests. One to get the header (read the nonce, unwrap the key etc.), then second request to get the footer with the hash.
I am not sure if there is any viable alternative to this though. Even if we've used master key instead of CEK (V1 approach) we would still probably have to issue request to read the header and get the nonce. If we've used static nonce (therefore saving on 1 HTTP request) instead, the encryption result would be deterministic for same plaintext, which would leak information about same files (which is certainly not acceptable).

Speaking of header layout. I am not exactly sure what else we'd need to fit. We could decide at this stage to reserve e.g. 4 bytes (we could keep it NULL for now) for future use.

3.1 Dynamic chunk size (as mentioned here: #7192 (comment)) isn't possibly feasible parameter to store, as dynamic chunk size would affect the encrypted/decrypt object size. This is also moment where we could actually revise whether 64KB chunk size is big enough and/or if we would benefit from bigger one (e.g. 512KB or 1MB).

3.2 Different cipher support (as mentioned here: #7192 (comment)) would probably fit within a single byte, however I am not sure if that's something we've jointly agreed to include. From our perspective leaving a room for using AES-256 in the future instead of XSalsa20 would allow crypt to work in a web browser in a performant way, so certainly there is a benefit.

If you agree, the header layout could potentially be:

  • magic bytes,
  • nonce,
  • wrapped CEK,
  • 4 reserved bytes,
    • cipher type (XSalsa20, AES-GCM etc.),
    • not used,
    • not used,
    • not used

Speaking of footer layout. Currently I've added encrypted hash at the end, that's 16 bytes of ciphertext + authentication (16 bytes? Need to check). Technically we could add a byte flag, just before the hash to indicate the hash type as defined in the https://github.com/rclone/rclone/blob/master/fs/hash/hash.go#L77, so for instance it could be 0x01 for MD5.
That gives an option to use different than MD5 hash to "describe" the plaintext, I am not sure if we would ever need to do that. I certainly know that in some environments the md5 keyword is being flagged and considered insecure (regardless of the context).

Once we agree on this layout, that means no more changes that break the encrypted/decrypted size calculations. It's probably good idea to be able to always calculate these values without needing to issue the HTTP request (that would be painful for listings at least).

Over the next few days I should have some output. I work on this as a one single changeset on crypt.go and cipher.go with V2/V1 feature flagged (whether this will be exposed to the user or not, that's decision for later).
Once it's done and I feel somewhat confident about it I all I will then present change and start thinking how to reasonably split it for review.

Thanks again!

tomekit added a commit to tomekit/rclone that referenced this issue Sep 23, 2024
In V2 version each file encrypted using separate CEK (wrapped with AES Keywrap RFC3394). Truncation protection uses last byte of nonce to indicate last encryption block.
Added MD5 hash support for crypt.go (only if V2 cipher is used), MD5 of plaintext is encrypted and authenticated at the end of file.
Added 4 reserved bytes at the beginning for future use.
Added 1 hash type byte in the footer to allow upgrade to different than MD5 hash.
cipher_version setting determines hash version used for writes.
For reads we determine cipher version by reading header magic bytes, that's regardless of cipher_version setting. A
Added TestNewEncrypterV2 test to demonstrate the outputs and make sure that outputs match the expected hardcoded values.
Modified other tests to satisfy newly added params (e.g. cek).
Related to: rclone#7192
@tomekit
Copy link
Author

tomekit commented Sep 23, 2024

Hi @ncw,

I've created pull request (#8095) with improvements to crypt and hash support as described briefly in the commit:

[...]
In V2 version each file encrypted using separate CEK (wrapped with AES Keywrap RFC3394). Truncation protection uses last byte of nonce to indicate last encryption block.
Added MD5 hash support for crypt.go (only if V2 cipher is used), MD5 of plaintext is encrypted and authenticated at the end of file.
Added 4 reserved bytes at the beginning for future use.
Added 1 hash type byte in the footer to allow upgrade to different than MD5 hash.
cipher_version setting determines hash version used for writes.
For reads we determine cipher version by reading header magic bytes, that's regardless of cipher_version setting. A
Added TestNewEncrypterV2 test to demonstrate the outputs and make sure that outputs match the expected hardcoded values.
Modified other tests to satisfy newly added params (e.g. cek).
[...]

If user was to encrypt the single byte file, the previous layout was:

=8(magic)+24(nonce)+1(ciphertext)+16(ciphertext_auth)= 49 bytes.

New layout would be:

=8(magic)+23(nonce)+40(wrapped_CEK)+4(reserved bytes)+1(ciphertext)+16(ciphertext_auth)+1(footer_hash_type_header)16(footer_hash)+16(footer_hash_auth) = 125 bytes.

This is still something we can discuss going forwards, but adjustments to this are quite easy now given that code supports now the header and the footer.

Out of 4 reserved bytes:

  • cipher type (XSalsa20, AES-GCM etc.),
const (
	BlockCipherXSalsa20 = 0x01 // Used currently
	BlockCipherAESGCM   = 0x02 // Not used, possible future use (increase web performance): https://github.com/rclone/rclone/issues/7192#issuecomment-2353617267
)
  • not used,
  • not used,
  • not used

Footer hash type (single byte):

var (
HashHeaderNoHash = []byte{0x00} // Future use, in case user doesn't want to store encrypted MD5 at the end of file. We would then store 16 NULL bytes (since we want to keep the decrypted/encrypted size calculations intact).
HashHeaderMD5    = []byte{0x01} // Used by default
)

I've still left: cipher_version config inside. This is used internally as a feature flag to either use V1 or V2 (for writes) and for decryptedSize calculation (based on listing results). Once object is read then version is read from the magic bytes regardless of the: cipher_version setting. I've thought that it might be good idea to keep V1 by default and leave V2 (opt-in) in a next release. We can as well remove this config flag and just have it hardcoded in crypto.go.

Some code structures had to be slightly modified in order to support multiple version of crypt.
For instance previously used consts e.g.: fileHeaderSize had to be replaced with dynamic: func (c *Cipher) getFileHeaderSize().
Additionally: cek *cek had to be introduced as a param to couple functions and tests, as this is now crucial part of both encryption/decryption in a similar league as nonce.

Hash implementation wasn't particularly complex. The: wrapReaderCalculatePlaintextHash method wraps the plaintext io.Reader and then the: wrapReaderAppendPlaintextHash gets the final MD5 hash of a plaintext, it then gets encrypted and appended to the end of file (I've came up with NewLazyReader approach, but perhaps there are more natural ways to calculate hash (exactly at the end of reader) and then append it to the end - https://stackoverflow.com/a/78994809/2263395).

Plaintext hash is encrypted using same cek and +1 incremented last block nonce. Encrypted hash has it's own 16 bytes auth. This is so we can read hash separately and verify it's authenticity without having to process all encrypted blocks. Just before the encrypted hash in the footer we add:

crypt.go signals support for hashing depending on the version used:

func (f *Fs) Hashes() hash.Set {
	if f.cipher.version == CipherVersionV2 {
		return hash.Set(hash.MD5)
	} else {
		return hash.Set(hash.None)
	}
}

I've implemented single test: TestNewEncrypterV2 which isn't sufficient, but somewhat presents the data in the ciphertext.

I haven't modified any documentation at all. I think it would be good idea to agree on the layout/approach and modify documentation once we fully agree on these changes.

I have included these changes in a single commit/changeset, since each change affects the encrypted/decrypted size calculations and it made sense to push them in one go.

I would appreciate if you could have a quick look @ncw to validate it, I believe code won't change much from now on.

I would be happy to schedule a quick call this week if you think if it make sense to brief you with these changes. I will be mostly away since 29th Sep for one week, but will be available for comments/chat.

tomekit added a commit to tomekit/rclone that referenced this issue Sep 26, 2024
In V2 version each file encrypted using separate CEK (wrapped with AES Keywrap RFC3394). Truncation protection uses last byte of nonce to indicate last encryption block.
Added MD5 hash support for crypt.go (only if V2 cipher is used), MD5 of plaintext is encrypted and authenticated at the end of file.
Added 4 reserved bytes at the beginning for future use.
Added 1 hash type byte in the footer to allow upgrade to different than MD5 hash.
cipher_version setting determines hash version used for writes.
For reads we determine cipher version by reading header magic bytes, that's regardless of cipher_version setting. A
Added TestNewEncrypterV2 test to demonstrate the outputs and make sure that outputs match the expected hardcoded values.
Modified other tests to satisfy newly added params (e.g. cek).
Related to: rclone#7192
@hongkongkiwi
Copy link

Being able to use AES for encryption would be HUGE.

AES might not be the best, but it's so well supported, especially in the context of AWS/S3 services and it's accelerated across most chipsets, making it a good choice for lower power devices (e.g. on Android devices).

All of other improvements e.g. being able to share individual files and such would be really great and a win, but even just supporting AES would be awesome.

@tomekit
Copy link
Author

tomekit commented Oct 23, 2024

Being able to use AES for encryption would be HUGE.

This early PR won't implement AES encryption/decryption per se, but will add it to specification (reserved bytes in the encryption header), so it can be implemented in the future without breaking anything.

For anyone viewing this issue, please note that we've also had some discussion in pull requests below.
Initial pull request: #8095
Current pull request: #8105

@tomekit tomekit changed the title Rclone crypt improvements (envelope encryption, truncation protection) Rclone crypt improvements (envelope encryption, truncation protection, hashing support) Dec 23, 2024
@tbodt
Copy link

tbodt commented Jan 3, 2025

Would like to add on a suggestion to obfuscate the directory structure like cryptomator does, as described at https://docs.cryptomator.org/en/1.4/security/architecture/#filename-encryption

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

9 participants