
config/server: add token_key to avoid randomization of handshake token #201

Merged: 1 commit, Aug 21, 2024

Conversation

finnbear
Contributor

@finnbear finnbear commented Aug 20, 2024

Motivation

To refresh TLS certificates, the Endpoint config must be reloaded. If the new config has a different (random) handshake token key then, in the best case, in-flight retries will fail. In the worst case (as in my application), the next retry will fail regardless of when it happens.

Testing

Changelog entry

- Add `ServerConfigBuilder::token_key` to avoid randomizing handshake token key by @finnbear in #201

Context

Follow-up to #200

Example code for generating a handshake token key:

fn generate_token_key() -> std::sync::Arc<dyn quinn_proto::crypto::HandshakeTokenKey> {
    use rand::RngCore;
    // Derive an HKDF pseudorandom key from 64 random bytes; with the
    // `ring` feature, quinn-proto implements HandshakeTokenKey for
    // ring's hkdf::Prk.
    let rng = &mut rand::thread_rng();
    let mut master_key = [0u8; 64];
    rng.fill_bytes(&mut master_key);
    let master_key = ring::hkdf::Salt::new(ring::hkdf::HKDF_SHA256, &[]).extract(&master_key);
    std::sync::Arc::new(master_key)
}
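For reference, the new `ServerConfigBuilder::token_key` method from this PR could then be wired up roughly like this. This is only a sketch: the surrounding builder chain and the exact builder state in which `token_key` is available are assumptions about wtransport's `ServerConfig` builder API, and `identity` is an assumed pre-existing variable.

```rust
// Sketch: reuse one key across config reloads so that outstanding
// retry tokens remain valid after a TLS certificate refresh.
let token_key = generate_token_key();

let config = ServerConfig::builder()
    .with_bind_default(port)
    .with_identity(&identity)
    .token_key(token_key.clone())
    .build();

// When rebuilding the config for a certificate reload, pass the
// same `token_key` again instead of letting a new one be randomized.
```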

Unresolved questions

  1. Should the example code be included in the docs?
  2. Should the example code be exposed from the API? (This would require ring and rand deps, but they could be optional.)
  3. Or should I PR quinn-proto to include it there?

@BiagioFesta
Owner

should the example code be included in the docs? [...]

I think this PR is good enough. This is more quinn-related (and for advanced use); thus, I don't feel a crypto example is required from this crate.


Actually, I am starting to think that we could just allow building a wtransport::*Config directly from a quinn::*Config, so we get all the flexibility without re-exporting every advanced use case.


If the new config has a different (random) handshake token, then best case in-flight retries will fail. Worst case (as in my application), the next retry will fail regardless of when it happens.

Can you please provide more details about the worst case? I don't fully understand why we have it in the first place (naively, I would have said only in-flight retries could be invalidated).

@finnbear
Contributor Author

finnbear commented Aug 21, 2024

Can you please provide more details about the worst case?

Absolutely! I wrote code like this, reusing the same loop to avoid spawning an additional Future.

let config = new_config(port, &rustls_config);
let endpoint = Endpoint::server(config)?;
let mut reload_rate_limit = RateLimiter::new(Duration::from_secs(60));

loop {
    if !reload_rate_limit.should_limit_rate() {
        // May enter this branch at most once per minute.
        // Assumption: it will be entered at least once every 0.5 months,
        // which is sufficient for Let's Encrypt.
        endpoint.reload_config(new_config(port, &rustls_config), false).unwrap();
    }

    let incoming_session = endpoint.accept().await;

    if !incoming_session.remote_address_validated() {
        incoming_session.retry();
        continue;
    }

    tokio::spawn(async move {
        let _ = incoming_session.await;
    });
}

The sequence of events is:

  1. enter the top of the loop (random key = A)
  2. await accept
  3. >60s elapse
  4. browser connects
  5. no signature, address isn't validated
  6. retry sent (random key = A)
  7. continue the loop
  8. rate limit expired, reload config (random key = B)
  9. browser reconnects (signed by random key = A)
  10. signature mismatch, address isn't validated
  11. retry sent (random key = B)
  12. continue the loop
  13. repeat step 9 (because apparently that's how Chrome behaves)

In other words, it's not that there is a reload and the next retry fails (as I originally thought and said). It's that the act of retrying causes the reload to happen at the same time, guaranteeing that the retry will be in-flight.

I could have spawned an additional Interval future that reloads the config every few hours (or moved the config reloading below the validation check). Then only a rare, coincidentally in-flight retry would fail. However, in that unlikely event, the browser's retry system wouldn't fix the problem (as previously discussed). So that's not a solution.

@BiagioFesta
Owner

I see; so, as I thought, the problem only affects in-flight "retries" that cross a config reload (which updates the token key). However, if so, the probability of this happening is proportional to the frequency of reloading the configuration.

Moreover, as you correctly mentioned, the main issue in your code is that:

if !reload_rate_limit.should_limit_rate() ...
let incoming_session = endpoint.accept().await;  // <--- this "blocks" the reload if no incoming

Essentially, the problem is mixing "sync" and "async" code.

More idiomatic code could be something like:

// For example:
// https://docs.rs/tokio/latest/tokio/time/struct.Interval.html
// (the interval must live outside the loop so it keeps its schedule
// across iterations)
let mut interval_reload_config: tokio::time::Interval = /* ... */;

loop {
  // ...

  tokio::select! {
    () = interval_reload_config.tick() => {
      // reload config
    }
    incoming = endpoint.accept() => {
      // handle incoming
    }
  }
}

More interesting instead is:

  13. repeat step 9 (because apparently that's how Chrome behaves)

(Just want to be sure I am correctly understanding)
Does this mean Chrome sends INITIAL(tokenA) even after it just received RETRY(tokenB)?

Anyway, that's probably the correct behavior, as the standard says:

[...] A client MUST accept and process at most one Retry packet for each connection attempt. After the client has received and processed an Initial or Retry packet from the server, it MUST discard any subsequent Retry packets that it receives.

From: RFC 9000, §17.2.5.2

@finnbear
Contributor Author

finnbear commented Aug 21, 2024

More idiomatic code could be something like:

Yes, that makes the failure rare instead of certain, which is an improvement 👍

But impossible is still better than rare.

(I thought of a way to avoid the failure without this PR: don't reload the config if a retry was sent in the last 10 seconds, or whatever the retry token validity is set to, and hope that a 10-second gap will exist within a one-month period, which it certainly will for my app. If you want, I can change this PR to simply warn about this in the documentation of reload_config.)

(Just want to be sure I am correctly understanding)
Is this mean Chrome sends INITIAL(tokenA) even if it just received RETRY(tokenB) ?

Yes, that's what I meant. This is based on instrumenting quinn-proto with console logs on both the token generation and token validation sides, and seeing a new token generated for each of ~6 retries but the same token being (in)validated every time.

Anyway that's probably the correct behavior as the standard says:

Yeah, my guess is Chrome follows the spec and the 2nd, 3rd, 4th, etc. retries are ignored. Chrome is simply resending the first one, to account for the unreliable network.

@BiagioFesta
Owner

BiagioFesta commented Aug 21, 2024

[...] If you want, I can change this PR to simply warn about this in the documentation of reload_config)

Actually, I was more curious about the issue than relating it to the PR. In particular, I wanted to make sure we fully understand the root cause so we can prioritize it properly. This PR is fine; thank you for it.

My only feeling is that this new API is quite "advanced-use"; it also requires some understanding of the underlying protocol and crypto functions, in contrast with the philosophy of wtransport, which aims for a very friendly, high-level user interface (just like the W3C browser API). Anyway, that's OK considering this is quinn-specific and, thus, behind the quinn feature.

As mentioned, for this kind of "advanced usage" the best option is probably having a wtransport::Config built from a quinn::Config, so with a single exposed API (behind the quinn feature) we cover all advanced uses (as has essentially been done for rustls::Config).
Anyway, this is more of a follow-up, though.

Thank you again for those contributions of yours; they are really appreciated! :)

@BiagioFesta merged commit 3fdbfc3 into BiagioFesta:master on Aug 21, 2024
8 checks passed
@BiagioFesta
Owner

As mentioned, for this kind of "advanced usage" the best option is probably having a wtransport::Config built from a quinn::Config, so with a single exposed API (behind the quinn feature) we cover all advanced uses (as has essentially been done for rustls::Config).
Anyway, this is more of a follow-up, though.

As a follow-up to my comment, I've created #202.

This removes the exposed token_key, but it still allows the same capability (via the quinn configuration).

I'd be glad if you took a look at the PR.

Thank you
