Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

modpow implementation is not constant-time #19

Open
phayes opened this issue Apr 30, 2019 · 59 comments
Open

modpow implementation is not constant-time #19

phayes opened this issue Apr 30, 2019 · 59 comments
Labels

Comments

@phayes
Copy link
Contributor

phayes commented Apr 30, 2019

Hi there,

I'm the author of sidefuzz (https://github.com/phayes/sidefuzz) and I have found what appears to be variable-time behavior in the rsa::internals::encrypt() function. Specifically, rsa::internals::encrypt() appears to be variable-time in relation to the message. Note that I haven't worked this up into an actual exploit, but merely demonstrated that this function isn't constant-time in relation to the message inputed.

Specifically, the message 20d90c8af42aac9b1ee53dc9a0187201 takes 549894 instructions to encrypt, while the message 5a28cec68d47f6fe3b1df54c9f320f6d takes 552427 instruction to encrypt. This is a difference of 2533 instructions, or about 0.5%. So it's a slight difference, but probably exploitable with sufficient sampling.

I have crated a fuzzing targets here: https://github.com/phayes/sidefuzz-targets

You can confirm this difference with the sidefuzz tool like so:

sidefuzz check ./target/wasm32-unknown-unknown/release/rsa_encrypt_message.wasm 5a28cec68d47f6fe3b1df54c9f320f6d 20d90c8af42aac9b1ee53dc9a0187201
samples: 20000, t-value: 219771.0790572351, confidence: 100%
Found timing difference of 2533 instructions between these two inputs with 100% confidence:
input 1: 5a28cec68d47f6fe3b1df54c9f320f6d (552427 instructions)
input 2: 20d90c8af42aac9b1ee53dc9a0187201 (549894 instructions)

My first suspicion was that this was due to num_bigint_dig::BigUint::from_bytes_be() being variable-time, but fuzzing that function specifically results in what appears to be constant-time behavior. So I'm not actually sure where the problem is.

@newpavlov
Copy link
Member

IIRC RSA blinding (which I believe is used in this implementation) uses random number, so while execution time is still variable, it does not correlate with a protected secret.

@phayes
Copy link
Contributor Author

phayes commented Apr 30, 2019

Hi @newpavlov ,

I don't think that's the case here. The fuzzing target is directly testing this function:

/// Raw RSA encryption of m with the public key. No padding is performed.
#[inline]
pub fn encrypt<K: PublicKey>(key: &K, m: &BigUint) -> BigUint {
    m.modpow(key.e(), key.n())

So the problem is going to be in the implementation of modpow.

I've also got another fuzzing target that does full pkcs1v15 padding with a statically seeded CPRNG. It displays the same variable-time behavior (with a statically seeded deterministic PRNG mind-you), but the first fuzzing target was the more minimal case, so that's what I reported. (sidefuzz also accounts for RNGs by repeated sampling and taking a t-value, but this doesn't apply here)

Admittedly, this could also be an artifact of the fuzzer (which would be a bug in the fuzzer), but I don't think that's the case here either.

@dignifiedquire
Copy link
Member

dignifiedquire commented Apr 30, 2019 via email

@tarcieri tarcieri changed the title Potential timing vulnerability modpow implementation is not constant-time Apr 25, 2023
@tarcieri
Copy link
Member

tarcieri commented Apr 25, 2023

crypto-bigint now has a constant-time modpow implementation in the form of DynResidue::pow, however it uses Montgomery form internally so it may not be the fastest option for straight modpow since converting in/out of Montgomery form is costly, and it still monomorphizes around a fixed number of limbs.

For the purposes of rsa (and dsa) we probably need to add a new Uint type like UintVec which uses a number of limbs chosen at runtime as a proper num_bigint_dig::BigUint replacement.

Edit: work on crypto_bigint::BoxedUint has started.

@tarcieri
Copy link
Member

tarcieri commented Sep 5, 2023

Until this is addressed it seems like something we should more prominently highlight in security-related documentation.

Edit: opened #373 (and merged!)

@tomato42
Copy link

tomato42 commented Nov 22, 2023

Hi! 👋

I've recently been working on RSA side-channel attacks, and found that many implementations are vulnerable. That's described in the Marvin Attack.

Thanks to help by @ueno, who contributed the test harness, I was able to run the test against rust-crypto (exact versions of all packages are in the PR).

Unfortunately I have to report that the side-channel leakage from the numerical library is very substantial and easily detectable over the network. As such, I consider RustCrypto RSA to be vulnerable and exploitable.

Test results from a run with 100k repeats per probe on an i9-12900KS @ 5.225GHz:

Sign test mean p-value: 0.2189, median p-value: 0.06354, min p-value: 3.946e-60
Friedman test (chisquare approximation) for all samples
p-value: 8.827139162990364e-108
Worst pair: 2(no_padding_48), 4(signature_padding_8)
Mean of differences: -8.07298e-07s, 95% CI: -9.06918e-07s, -7.019407e-07s (±1.025e-07s)
Median of differences: -8.44500e-07s, 95% CI: -9.78000e-07s, -7.200000e-07s (±1.290e-07s)
Trimmed mean (5%) of differences: -7.27307e-07s, 95% CI: -8.13648e-07s, -6.316737e-07s (±9.099e-08s)
Trimmed mean (25%) of differences: -8.48652e-07s, 95% CI: -9.17708e-07s, -7.769631e-07s (±7.037e-08s)
Trimmed mean (45%) of differences: -9.56301e-07s, 95% CI: -1.05768e-06s, -8.603802e-07s (±9.865e-08s)
Trimean of differences: -4.72000e-07s, 95% CI: -5.76750e-07s, -3.635625e-07s (±1.066e-07s)

pairwise results are in report.txt

confidence intervals for the measurements are as follows:
conf_interval_plot_trim_mean_25

legend:

ID,Name
0,header_only
1,no_header_with_payload_48
2,no_padding_48
3,no_structure
4,signature_padding_8
5,valid_0
6,valid_48
7,valid_192
8,valid_246
9,valid_repeated_byte_payload_246_1
10,valid_repeated_byte_payload_246_255
11,zero_byte_in_padding_48_4

explanations of that are in the step2.py script in marvin-toolkit repo.

In other words, the protection from adding blinding is not sufficient, and Rust Crypto has at least the same issue as the CVE-2022-4304 in OpenSSL.

@tarcieri
Copy link
Member

@tomato42 indeed that's using PKCS#1v1.5 without random blinding, so with modpow being non-constant-time, it's to be expected. We should definitely get rid of any APIs which permit private key use without random blinding.

@tarcieri
Copy link
Member

Also just a note for the future, per our SECURITY.md we would've appreciated an advisory for this opened under a private security disclosure.

@tomato42
Copy link

@tarcieri If the de-blinding isn't performed using constant-time code, then use of blinding won't remove the side-channel signal, that's what the bug in OpenSSL was about: removal of the blinder and conversion to the constant-length bytes needs to be side-channel free too.

Also just a note for the future, per our SECURITY.md we would've appreciated an advisory for this opened under a private security disclosure.

ah, sorry about that; since this was linked as a security issue, I've assumed that the effect of it on security of RSA decryption was assumed public too.

@tarcieri
Copy link
Member

Our answer until now has been that the application of random blinding prevented such sidechannels, so the fact that isn't the case is news to us

@tomato42
Copy link

sorry, I'm confused, on one hand you say that the the privkey.decrypt(Pkcs1v15Encrypt, &buffer); won't use blinding, yet that's the public API that you say is protected by use of blinding in README.md...?

@tarcieri
Copy link
Member

tarcieri commented Nov 22, 2023

The README.md is wrong in that case.

(Also I'm just discovering this and haven't had time to look over any code yet to confirm specific details)

@tomato42
Copy link

tomato42 commented Nov 22, 2023

ok, I'm still running test with a larger sample, to see if there aren't additional sources of leakage. If you have any additional questions, feel free to ask

@tarcieri
Copy link
Member

I'd be curious if you saw similar sidechannels in non-PKCS#1v1.5 constructions like OAEP decryption or PSS signatures

@tomato42
Copy link

if you could propose a change on top of that PR that adds an OAEP decryption, I can run tests for that too (there are scripts in marvin-toolkit for testing OAEP too)

but since the leak is clearly from the numerical library, that means OAEP is also vulnerable, as that leak happens before any OAEP checks

even if PSS leaks like that, it doesn't make the implementation as easily exploitable, as both PKCS#1v1.5 an OAEP attacks are chosen ciphertext attacks, with PSS (or with PKCS#1 v1.5 signatures) the attacker can only reasonably affect the hash being signed, not the whole plaintext
(in other words, it's a fundamentally different attack, so the test for it needs to be different too)

@tarcieri
Copy link
Member

tarcieri commented Nov 22, 2023

Can you confirm you see the sidechannel against the raw blinded modpow operation (when used with e.g. rand_core::OsRng)? That's really what I'm curious about.

@tomato42
Copy link

Sorry, I'm not a rust developer, you will need to hand-hold me (provide the complete code to execute) if you want me to run any tests

@tarcieri
Copy link
Member

The branch containing that code has disappeared: https://github.com/ueno/marvin-toolkit/tree/wip/rust-crypto

You can test the low-level "hazmat" API: https://docs.rs/rsa/latest/rsa/hazmat/fn.rsa_decrypt.html

It would look something like:

https://github.com/tomato42/marvin-toolkit/pull/4/files#diff-c0f6b76f1d52cf643fb2f30ef85b81b20ace1aa7577d1f4603db0e091e052d66R50-R63

...replaced with:

    loop {
        let mut buffer = vec![0; args.size];
        let n = infile.read(&mut buffer)?;
        if n == 0 {
            break;
        }

        assert!(n == buffer.len());

        let now = Instant::now();
        let c = rsa::BigUint::from_bytes_be(&buffer);
        let _ =  rsa::hazmat::rsa_decrypt(Some(&mut rand_core::OsRng), &privkey, &c);
        writeln!(outfile, "{}", now.elapsed().as_nanos())?;
    }

Note you'll need to add rand_core as a dependency and enable the hazmat feature of rsa:

https://github.com/tomato42/marvin-toolkit/pull/4/files#diff-76856434fe4ad8f1f778ba0452c8d8e49e764d25adf1374d385634cd251e1ca2R6-R9

[dependencies]
anyhow = "1"
clap = { version = "4", features = ["derive"] }
rand_core = { version = "0.6", features = ["getrandom"] }
rsa = { version = "0.9", features = ["hazmat"] }

@tomato42
Copy link

(will test rsa_decrypt from hazmat later)

With 1 million observations per sample (same setup) I'm getting:

Sign test mean p-value: 0.09834, median p-value: 2.633e-08, min p-value: 0.0
Friedman test (chisquare approximation) for all samples
p-value: 0.0
Worst pair: 2(no_padding_48), 4(signature_padding_8)
Mean of differences: -2.06196e-06s, 95% CI: -2.09851e-06s, -2.022175e-06s (±3.817e-08s)
Median of differences: -3.22400e-06s, 95% CI: -3.24600e-06s, -3.205000e-06s (±2.050e-08s)
Trimmed mean (5%) of differences: -1.99461e-06s, 95% CI: -2.02364e-06s, -1.965729e-06s (±2.895e-08s)
Trimmed mean (25%) of differences: -2.08608e-06s, 95% CI: -2.10654e-06s, -2.067761e-06s (±1.939e-08s)
Trimmed mean (45%) of differences: -3.09282e-06s, 95% CI: -3.11547e-06s, -3.070944e-06s (±2.227e-08s)
Trimean of differences: -2.22769e-06s, 95% CI: -2.25800e-06s, -2.198494e-06s (±2.975e-08s

conf_interval_plot_trim_mean_25

and pairwise results:
report.txt

Which show that the errors in padding checks also leak (compare valid_48 to valid_246: sign test p-value of 5.13e-7)

@tomato42
Copy link

ran the tests against this:

        let c = rsa::BigUint::from_bytes_be(&buffer);
        let _ =  rsa::hazmat::rsa_decrypt(Some(&mut rand_core::OsRng), &privkey, &c);

and have confirmed that it's leaking too.

Same overall configuration but with 2.5 million observations per probe:

Sign test mean p-value: 0.1915, median p-value: 0.008017, min p-value: 1.428e-15
Friedman test (chisquare approximation) for all samples
p-value: 6.50448690941627e-27
Worst pair: 0(no_padding_48), 3(too_short_payload_0_1)
Mean of differences: -1.11105e-07s, 95% CI: -1.42260e-07s, -7.775684e-08s (±3.225e-08s)
Median of differences: -9.50000e-08s, 95% CI: -1.19000e-07s, -7.300000e-08s (±2.300e-08s)
Trimmed mean (5%) of differences: -9.89875e-08s, 95% CI: -1.25208e-07s, -7.384255e-08s (±2.568e-08s)
Trimmed mean (25%) of differences: -9.23689e-08s, 95% CI: -1.18946e-07s, -6.803432e-08s (±2.546e-08s)
Trimmed mean (45%) of differences: -9.30858e-08s, 95% CI: -1.17708e-07s, -7.039496e-08s (±2.366e-08s)
Trimean of differences: -9.47500e-08s, 95% CI: -1.20250e-07s, -7.056250e-08s (±2.484e-08s)

conf_interval_plot_trim_mean_45

Since for raw RSA padding doesn't matter I've executed a slightly different set of tests:

ID,Name
0,no_padding_48
1,no_structure
2,signature_padding_8
3,too_short_payload_0_1
4,too_short_payload_0_3
5,too_short_payload_0_7
6,too_short_payload_0_15
7,valid_0
8,valid_repeated_byte_payload_245_0

and pairwise statistical test results:
report.txt

@tarcieri
Copy link
Member

I'm not sure what solution we can provide here short of a fully constant time implementation (which was the planned long-term solution for this problem, using e.g. a Barrett reduction as provided by crypto-bigint, but that will require a substantial amount of work)

I guess I need to figure out how to run your reproducer.

Can you explain a bit more what those charts are showing? Are you actually able to recover keys?

@tomato42
Copy link

tomato42 commented Nov 23, 2023

I'm not sure what solution we can provide here short of a fully constant time implementation

what's necessary, is ensuring that the unblinding happens in constant time with respect to the output; that means you need to convert the whatever arbitrary precision format is in use for the modulo exponentiation into constant length integers, then multiply them using real constant-time code, and finally do constant time reduction modulo. Something like what's in https://github.com/tomato42/ctmpi but in rust, not in C. Leakage in conversion from one format to the other is not a problem as that value is uncorrelated with the RSA plaintext, so leakage of it doesn't provide useful information to the attacker (as long as the blinding/unblinding factors remain secret).

That being said, if your mod_exp is leaky, you really have to employ both base blinding and exponent blinding. Details of that, as well as links to papers showing attacks against implementations that used just one kind of blinding are in the https://datatracker.ietf.org/doc/html/draft-kario-rsa-guidance-02
But to fix raw RSA implementation against Bleichenbacher/Marvin you just need to fix the last multiplication and reduction modulo.

Can you explain a bit more what those charts are showing?

they show the estimation of the difference in timing for different inputs. Like in the last one, no_padding_48 (probe 0) is processed about 83 ns slower than no_structure (probe 1): that's why it's marked 1-0 (or "probe 1 time subtract probe 0 time")

Are you actually able to recover keys?

those graphs show that it's possible, using time of the decryption, to execute one instance of the Bleichenbacher oracle. To decrypt a RSA ciphertext (or forge a signature) the attacker would need to run the test good few thousand times. See the details on the Marvin vulnerability page and in the associated papers. Since my goal is proving that there is no side-channel, not actually attacking implementations, I don't have good tooling to actually perform the attack (but then there's 25 years of literature on the topic, so once you have a working oracle, there are ready solutions for the rest).

@tomato42
Copy link

Another run for rsa_decrypt, this time with 10.5 million observations per sample:

Sign test mean p-value: 0.1359, median p-value: 2.134e-07, min p-value: 1.433e-60
Friedman test (chisquare approximation) for all samples
p-value: 9.61782206869971e-128
Worst pair: 0(no_padding_48), 8(valid_repeated_byte_payload_245_0)
Mean of differences: -8.28395e-08s, 95% CI: -9.87044e-08s, -6.762672e-08s (±1.554e-08s)
Median of differences: -9.10000e-08s, 95% CI: -1.02000e-07s, -8.000000e-08s (±1.100e-08s)
Trimmed mean (5%) of differences: -8.62429e-08s, 95% CI: -9.87246e-08s, -7.492710e-08s (±1.190e-08s)
Trimmed mean (25%) of differences: -8.64124e-08s, 95% CI: -9.79358e-08s, -7.520178e-08s (±1.137e-08s)
Trimmed mean (45%) of differences: -9.07003e-08s, 95% CI: -1.00972e-07s, -7.931308e-08s (±1.083e-08s)
Trimean of differences: -9.07500e-08s, 95% CI: -1.02000e-07s, -8.025000e-08s (±1.087e-08s)

conf_interval_plot_trim_mean_45

and pairwise statistical results: report.txt

Which suggests that for the leakage to happen, the whole most significant word needs to be equal 0. This will make attacks against OAEP with 2048 bit keys rather impractical against 64 bit architectures. But it does mean that both 32 bit architectures, and less common key sizes, like 2049 or 2056 bit keys, are rather realistic to attack.

Note: I haven't excluded the possibility of leakage happening with probe 3 or probe 4 (16 and 32 most significant bits being zero respectively), but it does look exactly the same as the leakage pattern for CVE-2022-4304, where I did do that.

@tarcieri
Copy link
Member

what's necessary, is ensuring that the unblinding happens in constant time with respect to the output; that means you need to convert the whatever arbitrary precision format is in use for the modulo exponentiation into constant length integers, then multiply them using real constant-time code, and finally do constant time reduction modulo. Something like what's in https://github.com/tomato42/ctmpi but in rust, not in C.

The core modpow operation in num-bigint-dig should already implemented in terms of Montgomery multiplication: https://github.com/dignifiedquire/num-bigint/blob/6f73f0a4025164325e85f8645dad6712b3110c51/src/monty.rs#L129

That alone is insufficient, however, per the OP. I'd generally worry that num-bigint is a general-purpose arbitrary precision bignum library that wasn't designed for constant-time operation.

I could potentially attempt to completely sidestep num-bigint for this, extracting the raw limbs out of the BigUints and performing modpow in code which has been carefully written to avoid any secret-dependent branching, similar to what we have already in crypto-bigint: https://github.com/RustCrypto/crypto-bigint/blob/a02c505/src/uint/modular/reduction.rs

@gogo2464
Copy link

@tarcieri thanks ! I am going to dig!

@gogo2464
Copy link

@tarcieri I am actively digging!!!!

I used the marvin toolkit. I have the data. But I need a tool to recover message from the identified vulnerability from the previously extracted data. Do you know a such tool please?

@tarcieri
Copy link
Member

I don't offhand, although I believe it's part of the toolkit.

@gogo2464 it would probably help if you asked these questions on Zulip so we can keep this already quite busy issue on topic for resolving the core issue: https://rustcrypto.zulipchat.com/#narrow/stream/260047-RSA

@tomato42
Copy link

No, the toolkit doesn't have the exploit as part of it, there are multiple reasons for it, but the main one is that it's hard to create one that's universal. In practice, Marvin toolkit is able to run a single instance of the Bleichenbacher oracle, you need to create a separate piece of code that will call this oracle.

the Marvin paper does give a high level overview, for details you'll need to look at the past papers

How much noise time is acceptable in miliseconds to have a reliable exploit please?

I'm not exactly sure what you're asking here... a reliable exploit requires you to be able to run statistical tests that give highly significant results (p-value below 1e-6 at least) for values we're interested in (i.e. ciphertexts that decrypt to plaintexts starting with specific bytes) or ones that consistently rejects values we're not interested in (i.e. ciphertexts that decrypt to plaintexts that don't start with specific bytes). That's a product of everything: size of the sample, size of the side-channel, size of the noise, nature of the noise (is it changing in time?), amount of irrelevant code measured together with the leaky code, resolution, precision and accuracy of the time source used... just as an example of the most significant parts of that.

@mepi262
Copy link

mepi262 commented Nov 5, 2024

@tarcieri
Any update on this issue?

@tarcieri
Copy link
Member

tarcieri commented Nov 5, 2024

The PR to address it is still open: #394

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests