-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rclone crypt improvements (envelope encryption, truncation protection, hashing support) #7192
Comments
I personally haven't contributed to rclone development (yet, I want to though since I love rclone so much), but I wanted to comment on this as long as the discussion is going. I have a few thoughts on this. I think if the crypt remote were to be improved, that the current design should be kept as
3.1 Dynamic Chunk Size 3.2 Different Cipher Support.
Great discussion. I love rclone and want to see it improve. I have a full-time dayjob, and I've never contributed to rclone before but maybe I can help on weekends if people really do want to improve the crypt remote. |
Technically the
I am not a cryptographer, but I don't think this attack is practical/possible to execute in this setup. As per this answer: https://crypto.stackexchange.com/a/96033/70896, the adversary could generate ciphertext The challenge is that the adversary wouldn't be able to successfully embed new DEK, since it's "wrapped" using key wrapping algorithm, which is itself authenticated. If user have tried to decrypt such object, the authenticity issue would've been captured already on the process of key unwrapping where integrity is being checked. I believe that key wrapping is the protection layer in a non committing scheme.
I was researching this particular topic a while ago and received pretty good overview of multiple different solutions: https://crypto.stackexchange.com/a/106992/70896 Given that current Rclone's encryption scheme is extremely simple, works reasonably well and already uses constant-sized blocks, I would probably keep using similar setup except it shall be proofed against truncation protection.
Thank you for taking time to comment on this. I am also happy to participate in this discussion and I hope that eventually we will push these improvements upstream. I don't think that we need to address all of these issues at once, however I believe that envelope encryption and truncation protection are the ones which shall be a priority. This is also invitation for anyone to comment on this issue. 🍺 |
Excellent thinking.
Another good idea.
This is actually a very good criticism of libsodium's secretstream API and why it might not work well for rclone. I don't actually know because I haven't checked, but does rclone actually do parallel encryption and seek to random blocks in encrypted files? I assume the answer would be yes in the case of an rclone mount.
Don't worry, I'm not a cryptographer either lol. So I might say some stuff that is inaccurate and if somebody corrects me I'll admit I'm wrong. So, I presume that being able to share files with other people or other devices you own without sharing the password to all your files is the point of using envelope encryption? The problem with using a non-fully-committing AEAD scheme for this is very subtle: I could share a file with both you and your friend, giving the two of you different keys to decrypt it with. You would then both be duped into thinking you got the same file even though the plaintext would be very different. Maybe decrypting the file with your key would result in a normal plaintext, but decrypting the file with your friend's key might result in a malicious plaintext. There is nothing in GHASH or Poly1305 preventing me from doing that - you need a collision resistant authenticator like HMAC to make the guarantee that I didn't give you both different keys. The authors of this presentation at USENIX Security conference demonstrate an attack very similar to this: https://www.youtube.com/watch?v=VazqgsBwzOY If we stick with password based encryption (which I'm totally fine with), then this committing vs non-committing AEAD stuff wouldn't be an issue. To trick you into using the wrong key, they'd have to poison your rclone config file somehow, which is definitely out-of-scope. No matter what we do, I think we should improve the file truncation protection. To avoid everybody interested in this topic talking past each other, I say we come up with a proposal of how this new encryption scheme should actually work. Then we can discuss the pros and cons of the scheme, and get community input on it. I think this new encryption scheme should achieve the following:
What do you think? Great discussion. |
I just wanted to point out that my current hack for the truncation protection issue is to do something like this, and I kinda hate it, lol.
|
#7084 (AES/GCM/NoPadding) |
Thanks for sharing, great work. I have no experience with KMS. Is it possible to use it simply just with the static key (in a similar way to current Do you know what key wrapping method is used by default with your setup? I think having support for AWS native encryption is great step forward, however I don't think that it's a good candidate to ever replace: |
This makes no sense, since security, portability and compatibility with s3 native will be lost. In my case, |
The current
What's your preferred way to access client-side encrypted objects via web front given that AWS SDK doesn't seem to support client-side encryption for their browser SDK? |
I have no idea how KMS has entered this discussion, but it's wildly out of scope. Offloaded encryption (which is not even possible with AWS) should be yet a different backend, similar to #1647. The focus here should be about creating an improved crypt backend for client-side encryption. |
A few more thoughts. Since we've had this discussion, AEGIS is in the process of being standardized: https://www.ietf.org/archive/id/draft-irtf-cfrg-aegis-aead-07.html AEGIS is a key-committing AEAD scheme, so we could use if we want to add key commitment. We don't really need a key-committing scheme though, if we just stick to the password authentication and don't bother with envelope encryption. To exploit the lack of key-commitment with the current password based system, you'd have to poison somebody's rclone config file somehow, and that's probably not in scope. I'm not sure whether or not partitioning oracle attacks could apply against somebody's live rclone mount though. Actually, it's probably safer to just use something key-committing. If we wanna be really safe and sacrifice a bit of performance, Use a stream cipher (like AES-CTR or ChaCha) and authenticate each chunk with a collision-resistant hash based authentication, like HMAC-SHA256 or blake2b. I think the lack of truncation protection definitely needs to be fixed though. Libsodium's secretstream API does it by prepending a single byte to the plaintext before encrypting and authenticating each chunk to indicate whether or not it's the final chunk. I think our new encryption scheme should work something like this:
I'm strongly in favor of using something that's key-committing, just to be on the safe side. |
We're slowly getting back to this topic.
What's the purpose of deriving a unique key from a salt instead of generating truly random key? We could generate CEK (Content Encryption Key) random key and then wrap it using AES Key Wrap RFC 3394 (it's commonly used, e.g. for Cryptomator or AWS client-side encryption) Existing header looks like this: https://rclone.org/crypt/#header
New header could look like this:
CEK could be added after or before the Nonce, I am not sure if there is any preference here.
As far I am concerned, that's what the current encryption scheme does already. New encryption scheme should likely follow similar approach.
What's the benefit of prepending 1 or 0 to every chunk which comes at 1 byte storage overhead vs using up 1 byte of nonce space as for instance in this encryption scheme? https://github.com/miscreant/meta/wiki/STREAM
So am I, but I also try to weigh some risks that come from non standard encryption use and potential additional development / maintenance costs and performance hit (hash based authentication will likely be slower then polynomial, but that's only a guess).
What I am trying to understand is whether key commitment is required even if envelope encryption is used. Potential attack scenario assumes that file encryption key (CEK/DEK) is swapped with something else that decrypts content and successfully verifies the authentication tag. In the setup that I've mentioned above we could use "RFC 3394 Key wrapping" which itself provides a layer of key authentication. Potential attacker have access to wrapped CEK, since it's part of the file contents, however if adversary replaces wrapped CEK with something else the key unwrapping would have failed: https://crypto.stackexchange.com/questions/67869/does-the-aes-kw-key-wrap-algorithm-perform-authenticated-encryption The attack on non key-committing scheme would've been possible if attacker had access to unwrapped CEK directly, however they don't since it's wrapped / protected. Don't quote me on that though. |
I've had a go (no pun intended) and implemented prototype of an envelope encryption: #7192 (comment) 32 bytes random key It satisfies: I am going to work on: I won't address : I am not yet certain how to integrate it with Rclone. I think acceptable way and not requiring much development effort would be to add config parameter to Question remains whether new version (2), once enabled, shall allow reading version (1) objects and allow users "lazy" migration or should rather be strict and allow reading only items from selected version. Comments are welcome, so is the help with modifying |
I wish I had time to test for you. I probably won't have capacity to revisit this deeply until next year :( |
Here: master...tomekit:rclone:envelope_encryption_7192_2#diff-d60621b19be9d97af46259d24a081ffd84864417b2cf80a8178f8e713db9264aR839 I've implemented truncation protection, that's approach from STREAM cipher: https://github.com/miscreant/meta/wiki/STREAM (I was also influenced by a helpful response on Crypto Stack Exchange: https://crypto.stackexchange.com/a/106992/70896) Current condition which sets the: if n < blockDataSize { // last block
fh.nonce[len(fh.nonce)-1] = lastBlockFlag // Set last block flag at the last byte
} Perhaps we could set the: if n < blockDataSize { // last block
fh.nonce[len(fh.nonce)-1] = lastBlockFlag // Set last block flag at the last byte
} else {
fh.nonce[len(fh.nonce)-1] = 0
} I wasn't sure if we can rely on the fact that if flag is not set then this last byte will be zero anyway (implicit). This requires adding tests, e.g. encrypting file smaller, equal or bigger than I am going to leave this for a short while in a hope that some comments or suggestions come in. In a few weeks I shall find some time to integrate it with Rclone |
I actually like this better than what I suggested with using a salt. I think using a standard like this is an excellent idea.
When I wrote that, I was fresh off a project that used libsodium's secretstream API, so that was fresh in my mind. Prepending padding to the plaintext that indicated whether it's the final block before encryption is roughly how that works. Truncation protection via the nonce is also fine.
Honestly, I'd have to think about how to pull off an attack like that. It probably doesn't hurt to be key committing just in case, since that would completely shut that sort of thing down. Oh excellent, I see you've been writing code for this. I'll have a look. I'm very curious what you did. |
I've took time to integrate proposed changes into Version: "2" has 39 bytes overhead comparing to version: "1", since each file is encrypted using separate 32 bytes encryption key (40 bytes when wrapped using AES Key Wrap - RFC3394) . Cipher version behavior is strict, that is if V1 version is selected it will write files in a current format and won't be able to open/read V2. If V2 version is selected it will write files using new format and won't be able to open/read V1. == Additionally I've implemented cipher version detection which maybe useful for smooth transition into new cipher. This is related to the fact that when objects/files are listed (e.g. using S3 endpoint), then there is no reliable way to derive unencrypted file size from encrypted length without knowing the cipher version used. Above approach assumes the cipher version from the setting which may be right or not and corrects itself after actually reading the file. I haven't found this to be an issue when read/write testing, but probably someone with more Rclone specific experience could tell more about feasibility of this approach and potential risk of inaccurate size during call to: == Hi @ncw , is it something you could assist me with please? It would be useful to understand if chosen approach, that is: It would also be helpful to know if auto-detection mentioned above is feasible. I mean it's always nice to keep things backward compatible, this would allow users to open their existing datasets even if they've enabled new cipher versions. Once I know that, I could then add couple more unit tests which reflect these changes and would be happy to spend more resources on that in order to make this fit into Rclone way of doing things. I am available to explain anything related to this. |
@tomekit those commit links are 404, and in fact you have no visible rclone fork at all. Is it private or have you deleted it in the last hour? |
Thanks for the heads up. I wasn't aware it's not publicly visible since I was viewing it from my own account. V2 cipher integrated into Rclone: tomekit@19cb8ca Complete diff: master...tomekit:rclone:envelope_encryption_7192_2 |
@tomekit - I had a quick look at those changes - they look interesting. There probably needs to be more discussion about exactly what is going into a v2 header though - we should try to come up with a fixed size and a layout of things within that header, maybe with room for expansion. I don't want to have too many knobs on the encryption to keep it simple for the user. To add to your list above the biggest missing thing as far as rclone users are concerned is hash support. It would be good if we can squeeze a hash of the plaintext into the header (or maybe footer). This hash should be encrypted of course otherwise we leak data about the plaintext. The purpose of the hash is to enable rclone's integrity checks. If we support a single hash it should be MD5 as that is the most popular (see the overview table for hash support). Yes it is broken for crypto purposes but still makes a fine and relatively inexpensive integrity check. More than one hash is a possibility. If you are planning to submit this as a PR can you make sure you divide it into easy to review small commits. Large commits are difficult to merge and I really don't want to break crypt! I agree with your idea that v1 objects may appear with the wrong size until read - I think that is probably and acceptable compromise but needs to be documented. Another thought - why waste 8 bits of the nonce when 1 would do? I guess we could potentially use that byte to signal other things, but I don't know what. |
Thank you for your input. This is to let you know that I've restarted work on this. I plan to add hash (MD5) support to crypt. I think the viable place is the footer, mostly because hash can't be efficiently known earlier during execution. I haven't yet decided how to properly encrypt such hash (what key to use? what nonce to use?). We could technically use the V2 approach, so the CEK key is used and nonce could be simply Speaking of header layout. I am not exactly sure what else we'd need to fit. We could decide at this stage to reserve e.g. 4 bytes (we could keep it NULL for now) for future use.
If you agree, the header layout could potentially be:
Speaking of footer layout. Currently I've added encrypted hash at the end, that's 16 bytes of ciphertext + authentication (16 bytes? Need to check). Technically we could add a byte flag, just before the hash to indicate the hash type as defined in the https://github.com/rclone/rclone/blob/master/fs/hash/hash.go#L77, so for instance it could be 0x01 for MD5. Once we agree on this layout, that means no more changes that break the encrypted/decrypted size calculations. It's probably good idea to be able to always calculate these values without needing to issue the HTTP request (that would be painful for listings at least). Over the next few days I should have some output. I work on this as a one single changeset on Thanks again! |
In V2 version each file encrypted using separate CEK (wrapped with AES Keywrap RFC3394). Truncation protection uses last byte of nonce to indicate last encryption block. Added MD5 hash support for crypt.go (only if V2 cipher is used), MD5 of plaintext is encrypted and authenticated at the end of file. Added 4 reserved bytes at the beginning for future use. Added 1 hash type byte in the footer to allow upgrade to different than MD5 hash. cipher_version setting determines hash version used for writes. For reads we determine cipher version by reading header magic bytes, that's regardless of cipher_version setting. A Added TestNewEncrypterV2 test to demonstrate the outputs and make sure that outputs match the expected hardcoded values. Modified other tests to satisfy newly added params (e.g. cek). Related to: rclone#7192
Hi @ncw, I've created pull request (#8095) with improvements to
If user was to encrypt the single byte file, the previous layout was:
New layout would be:
This is still something we can discuss going forwards, but adjustments to this are quite easy now given that code supports now the header and the footer. Out of 4 reserved bytes:
Footer hash type (single byte):
I've still left: Some code structures had to be slightly modified in order to support multiple version of crypt. Hash implementation wasn't particularly complex. The: Plaintext hash is encrypted using same
I've implemented single test: I haven't modified any documentation at all. I think it would be good idea to agree on the layout/approach and modify documentation once we fully agree on these changes. I have included these changes in a single commit/changeset, since each change affects the encrypted/decrypted size calculations and it made sense to push them in one go. I would appreciate if you could have a quick look @ncw to validate it, I believe code won't change much from now on. I would be happy to schedule a quick call this week if you think if it make sense to brief you with these changes. I will be mostly away since 29th Sep for one week, but will be available for comments/chat. |
In V2 version each file encrypted using separate CEK (wrapped with AES Keywrap RFC3394). Truncation protection uses last byte of nonce to indicate last encryption block. Added MD5 hash support for crypt.go (only if V2 cipher is used), MD5 of plaintext is encrypted and authenticated at the end of file. Added 4 reserved bytes at the beginning for future use. Added 1 hash type byte in the footer to allow upgrade to different than MD5 hash. cipher_version setting determines hash version used for writes. For reads we determine cipher version by reading header magic bytes, that's regardless of cipher_version setting. A Added TestNewEncrypterV2 test to demonstrate the outputs and make sure that outputs match the expected hardcoded values. Modified other tests to satisfy newly added params (e.g. cek). Related to: rclone#7192
Being able to use AES for encryption would be HUGE. AES might not be the best, but it's so well supported, especially in the context of AWS/S3 services and it's accelerated across most chipsets, making it a good choice for lower power devices (e.g. on Android devices). All of other improvements e.g. being able to share individual files and such would be really great and a win, but even just supporting AES would be awesome. |
This early PR won't implement AES encryption/decryption per se, but will add it to specification (reserved bytes in the encryption header), so it can be implemented in the future without breaking anything. For anyone viewing this issue, please note that we've also had some discussion in pull requests below. |
Would like to add on a suggestion to obfuscate the directory structure like cryptomator does, as described at https://docs.cryptomator.org/en/1.4/security/architecture/#filename-encryption |
We're running S3Drive (GUI for S3 on desktop, mobile, web) and recently aligned with Rclone's encryption scheme for better interoperability and features like drive mount and Webdav that we could get out of the box.
Prior to that we've researched research couple different encryption schemes and despite current Rclone encryption shortcomings (from our perspective) we've went for it, knowing that there is a room for improvements.
Disclaimer, I am not a cryptographer, but rather an engineer who by the sheer luck of project choices happen to be more involved with cryptography for the past couple years.
1. No envelope encryption
The biggest shortcoming so far that we see, is that every single Rclone file content is encrypted by a single master key (derived from user provided password). On its own this doesn't pose direct security threat since every file has random 24 bytes nonce, but it limits how you can securely exchange the encrypted file.
One of our features is file sharing, this feature is native to S3 protocol. When encrypted shared file is passed to the receiver they need to obtain the file decryption key. With a current Rclone's setup the decryption key happen to be user's master key. By sharing a single file, the encryption key to ALL FILES is revealed which is very undesirable from a security point of view.
Before we've implemented Rclone's encryption, we've been using AWS client-side encryption: https://docs.aws.amazon.com/amazon-s3-encryption-client/latest/developerguide/encryption-algorithms.html#crypto_features it worked well, but it wasn't in the end feasible for us, mostly due to lack of consistent hardware accelerated streamed AES-GCM support across different platforms (web included).
What was great about AWS's approach is how they've implemented key management and key wrapping.
In other words, each file had a different encryption key (wrapped DEK), which would then be encrypted by the master key.
Sharing a resource would reveal only that single key (DEK). By choosing a key wrapping algorithm it would be possible to create multiple different encryption keys (by simply reencrypting the DEK).
I believe it's something that Rclone could use.
This change would have to be reflected in the ciphertext heading and would increase the length of a ciphertext in the ballpark of 16-32 bytes (depending on the KW algorithm).
2. No truncation protection
Currently, you can remove as many as you want blocks (starting from the end) from the ciphertext and no one would notice. There are multiple ways to address that, but given the Rclone's ciphertext format, the one that would make sense in my opinion is: https://github.com/miscreant/meta/wiki/STREAM
In other words it would be good to reserve 1 byte of nonce for the:
last_block
flag.We were recently given a great response in this topic which also might be helpful for you to assess alternative options.
https://crypto.stackexchange.com/a/106992/70896
(Update) 2.5 Hashing support
Currently in order to verify plaintext integrity the whole encrypted object had to be scanned and decrypted and hash has to be calculated. This is costly, especially the bigger the file.
To solve this, encrypted object will have an MD5 hash of a plaintext appended to the footer. The hash itself would be encrypted and authenticated using same AEAD construct as file content.
This in fact is already implemented: a5d6af7
There is a 1-byte room that stores the hash type just in case in the future we need to migrate to something stronger/better than MD5.
3. Variable encryption settings
Current Rclone's settings are sane defaults which work well on a typical hardware.
They don't work well on Web though, mostly due to poor speed of XSalsa20 and overhead related to fixed 64KB block processing.
3.1 Dynamic chunk size
This could be achieved by adding the 4 byte UINT field to the header ... or could be as small as 1 byte which would be a multiplier of e.g. 64KB (that would support block size up to 16MB).
3.2 Different cipher support
It's XSalsa20 now. I am not a huge fan of AES-GCM, but there isn't really a choice for performant cipher on Web: https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto/decrypt
Could we indicate the cipher type in the header and possibly support AES-GCM (libsodium has consistent API on all platforms)?
4. No path protection
If you copy ciphertext from location: /a to location /b, no one would notice. Ciphertext is not aware where it belongs. Perhaps not a big issue for many and this comes with certain side effects and complexities and might not be feasible for Rclone at all. E.g. simple raw byte copy would no longer be possible (as the heading with new location would have to be updated).
I've thought I will just mention it anyway.
Final word
I am writing this to understand what's Rclone maintainers (nice to meet you Nick @ncw for the first time) view on these improvements, whether they make sense for Rclone and if they're interested in taking time to include these changes in the specification.
I can certainly participate in the planning phase and happy to do more research in terms of optimal choices, although I am sure my competence in Rclone's world falls short.
These improvements is something we would be implementing eventually, but given that benefit of staying compliant with Rclone's approach is so great... we've thought it would be a shame not to ask.
We're thinking of executing this later this year and this would approximately be the time where we (well, likely myself) could get our hands dirty with Go and build some pull requests.
The text was updated successfully, but these errors were encountered: