
config/server: add token_key to avoid randomization of handshake token #201

Merged: 1 commit, Aug 21, 2024

Conversation

finnbear
Contributor

@finnbear finnbear commented Aug 20, 2024

Motivation

To refresh TLS certificates, the Endpoint config must be reloaded. If the new config has a different (random) handshake token key then, in the best case, in-flight retries will fail. In the worst case (as in my application), the next retry will fail regardless of when it happens.

Testing

Changelog entry

- Add `ServerConfigBuilder::token_key` to avoid randomizing handshake token key by @finnbear in #201

Context

Follow-up to #200

Example code for generating a handshake token key:

fn generate_token_key() -> std::sync::Arc<dyn quinn_proto::crypto::HandshakeTokenKey> {
    use rand::RngCore;
    // Derive an HKDF pseudorandom key from 64 random bytes; with the
    // `ring` feature, quinn-proto implements HandshakeTokenKey for
    // ring's hkdf::Prk.
    let rng = &mut rand::thread_rng();
    let mut master_key = [0u8; 64];
    rng.fill_bytes(&mut master_key);
    let master_key = ring::hkdf::Salt::new(ring::hkdf::HKDF_SHA256, &[]).extract(&master_key);
    std::sync::Arc::new(master_key)
}
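For reference, the new `ServerConfigBuilder::token_key` method from this PR could then be wired up roughly like this. This is only a sketch: the surrounding builder chain and the exact builder state in which `token_key` is available are assumptions about wtransport's `ServerConfig` builder API, and `identity` is an assumed pre-existing variable.

```rust
// Sketch: reuse one key across config reloads so that outstanding
// retry tokens remain valid after a TLS certificate refresh.
let token_key = generate_token_key();

let config = ServerConfig::builder()
    .with_bind_default(port)
    .with_identity(&identity)
    .token_key(token_key.clone())
    .build();

// When rebuilding the config for a certificate reload, pass the
// same `token_key` again instead of letting a new one be randomized.
```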

Unresolved questions

  1. Should the example code be included in the docs?
  2. Should the example code be exposed from the API? (This would require ring and rand deps, but they could be optional.)
  3. Or should I PR quinn-proto to include it there?

@BiagioFesta
Owner

should the example code be included in the docs? [...]

I think this PR is good enough. This is more quinn-related (and for advanced use); thus, I don't feel a crypto example is required from this crate.


Actually, I am starting to think that we could just allow building a wtransport::*Config directly from a quinn::*Config, so we get all the flexibility without re-exporting every advanced use case.


If the new config has a different (random) handshake token, then best case in-flight retries will fail. Worst case (as in my application), the next retry will fail regardless of when it happens.

Can you please provide more details about the worst case? I don't fully understand why we have it in the first place (naively, I would have said only in-flight retries could be invalidated).

@finnbear
Contributor Author

finnbear commented Aug 21, 2024

Can you please provide more details about the worst case?

Absolutely! I wrote code like this, reusing the same loop to avoid spawning an additional Future.

let config = new_config(port, &rustls_config);
let endpoint = Endpoint::server(config)?;
let mut reload_rate_limit = RateLimiter::new(Duration::from_secs(60));

loop {
    if !reload_rate_limit.should_limit_rate() {
        // May enter this branch at most once per minute.
        // Assumption: it will be entered at least once every 0.5 months,
        // which is sufficient for Let's Encrypt.
        endpoint.reload_config(new_config(port, &rustls_config), false).unwrap();
    }

    let incoming_session = endpoint.accept().await;

    if !incoming_session.remote_address_validated() {
        incoming_session.retry();
        continue;
    }

    tokio::spawn(async move {
        let _ = incoming_session.await;
    });
}

The sequence of events is:

  1. enter the top of the loop (random key = A)
  2. await accept
  3. >60s elapse
  4. browser connects
  5. no signature, address isn't validated
  6. retry sent (random key = A)
  7. continue the loop
  8. rate limit expired, reload config (random key = B)
  9. browser reconnects (signed by random key = A)
  10. signature mismatch, address isn't validated
  11. retry sent (random key = B)
  12. continue the loop
  13. repeat step 9 (because apparently that's how Chrome behaves)

In other words, it's not that there is a reload and the next retry fails (as I originally thought and said). It's that the act of retrying causes the reload to happen at the same time, guaranteeing that the retry will be in-flight.

I could have spawned an additional Interval future that reloads the config every few hours (or moved the config reloading below the validation check). Then only a rare, coincidentally in-flight retry would fail. However, in that unlikely event, the browser's retry system wouldn't fix the problem (as previously discussed). So that's not a solution.

@BiagioFesta
Owner

I see; so, as I thought, the problem only affects in-flight "retries" that cross a config reload (which updates the token key). However, if so, the probability of this happening is proportional to the frequency of reloading the configuration.

Moreover, as you correctly mentioned, the main issue in your code is that:

if !reload_rate_limit.should_limit_rate() ...
let incoming_session = endpoint.accept().await;  // <--- this "blocks" the reload if no incoming

Essentially, the problem is mixing "sync" and "async" code.

More idiomatic code could be something like:

// For example:
// https://docs.rs/tokio/latest/tokio/time/struct.Interval.html
// (the interval must live outside the loop so it keeps its schedule
// across iterations)
let mut interval_reload_config: tokio::time::Interval = /* ... */;

loop {
  // ...

  tokio::select! {
    () = interval_reload_config.tick() => {
      // reload config
    }
    incoming = endpoint.accept() => {
      // handle incoming
    }
  }
}

More interesting instead is:

  13. repeat step 9 (because apparently that's how Chrome behaves)

(Just want to be sure I am correctly understanding)
Does this mean Chrome sends INITIAL(tokenA) even after it just received RETRY(tokenB)?

Anyway, that's probably the correct behavior, as the standard says:

[...] A client MUST accept and process at most one Retry packet for each connection attempt. After the client has received and processed an Initial or Retry packet from the server, it MUST discard any subsequent Retry packets that it receives.

From: RFC 9000, §17.2.5.2

@finnbear
Contributor Author

finnbear commented Aug 21, 2024

More idiomatic code could be something like:

Yes, that makes the failure rare instead of certain, which is an improvement 👍

But impossible is still better than rare.

(I thought of a way to avoid the failure without this PR: don't reload the config if a retry was sent in the last 10 seconds, or whatever the retry token validity is set to, and hope that a 10-second gap will exist within a one-month period, which it certainly will for my app. If you want, I can change this PR to simply warn about this in the documentation of reload_config.)

(Just want to be sure I am correctly understanding)
Is this mean Chrome sends INITIAL(tokenA) even if it just received RETRY(tokenB) ?

Yes, that's what I meant. This is based on instrumenting quinn-proto with console logs on both the token generation and token validation sides, and seeing a new token generated for each of ~6 retries but the same token being (in)validated every time.

Anyway that's probably the correct behavior as the standard says:

Yeah, my guess is Chrome follows the spec and the 2nd, 3rd, 4th, etc. retries are ignored. Chrome is simply resending the first one, to account for the unreliable network.

@BiagioFesta
Owner

BiagioFesta commented Aug 21, 2024

[...] If you want, I can change this PR to simply warn about this in the documentation of reload_config)

Actually, I was more curious about the issue than relating it to the PR. In particular, I wanted to make sure we fully understand the root cause so we can prioritize it properly. This PR is fine; thank you for it.

My only feeling is that this new API is quite "advanced-use"; it also requires some understanding of the underlying protocol and crypto functions, in contrast with the philosophy of wtransport, which aims for a very friendly, high-level user interface (just like the W3C browser API). Anyway, that's OK considering this is quinn-specific and, thus, behind the quinn feature.

As mentioned, for this kind of "advanced usage" the best option is probably having a wtransport::Config built from a quinn::Config, so with a single exposed API (behind the quinn feature) we cover all advanced uses (as has essentially been done for rustls::Config).
Anyway, this is more of a follow-up, though.

Thank you again for those contributions of yours; they are really appreciated! :)

@BiagioFesta merged commit 3fdbfc3 into BiagioFesta:master on Aug 21, 2024
8 checks passed
@BiagioFesta
Owner

As mentioned, for this kind of "advanced usage" the best option is probably having a wtransport::Config built from a quinn::Config, so with a single exposed API (behind the quinn feature) we cover all advanced uses (as has essentially been done for rustls::Config).
Anyway, this is more of a follow-up, though.

As a follow-up to my comment, I've created #202.

This removes the exposed token_key, but it still allows the same capability (via the quinn configuration).

I'd be glad if you took a look at the PR.

Thank you
