Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SNI match criteria to routes #863

Merged
merged 14 commits into from
Dec 9, 2020
Merged

Add SNI match criteria to routes #863

merged 14 commits into from
Dec 9, 2020

Conversation

rainest
Copy link
Contributor

@rainest rainest commented Sep 18, 2020

What this PR does / why we need it:

Add SNI match criteria to Kong routes when the source Ingress has a TLS rule with hostnames.

Which issue this PR fixes
fixes internal case FTI-1881. Looks like we didn't create a GH issue for that yet. Whoops.

Special notes for your reviewer:

This change always adds this configuration if possible from the Ingress object. This introduces two possible issues:

  • If any Kong functionality differs for routes with no SNI information but with hostname information, this change will break that functionality for controller-managed routes. This likely applies to any controller-managed routes that expect clients with older TLS stacks, i.e. clients whose TLS stacks lack SNI support but will still send well-formed HTTP/1.1 requests including a Host header.
  • This does not allow any Kong functionality that relies on the ability to specify an SNI match criterion that does not correspond to some hostname match criterion, possibly introducing murky scenarios where wildcard route hostnames are not the same as wildcard route SNIs. Expected impact here is limited, but not out of the question.

As-is, practical testing indicates this works as expected. This remains a WIP because it lacks unit tests and does not attempt to account for the edge cases above.

@rainest
Copy link
Contributor Author

rainest commented Sep 23, 2020

Revised, now more or less complete, with tests. Questions and/or points of interest arose!

  • @hbagdi's request to use the rule hostname makes sense. Routes are, after all, derived from the spec rules, not the TLS info, and the TLS info may very well include wildcard hostnames even when the rules do not (https://kubernetes.io/docs/concepts/services-networking/ingress/#tls isn't actually clear on this--wildcards just aren't mentioned there). This introduces a bit of a conundrum where the desired SNI match criteria could be broader than the hostname criteria, but in practice that shouldn't matter, since the more specific HTTP hostname match would ultimately disqualify the route anyway. HOWEVER...
  • The Ingress spec allows for wildcard hostnames in rules, and the Kong proxy allows wildcard HTTP hostname matches. The Kong proxy does not allow wildcard SNI match criteria at present. Because the controller sources both HTTP hostname and SNI criteria from Ingress rules, it's basically just screwed for any wildcard hostname rule, as it cannot derive the absolute hostname it'd need to provide to the admin API when creating a Kong route. This is perhaps more something for @hishamhm to mull over, but for our immediate future, we just won't add any SNI criteria for wildcard hostnames.
  • My new tests follow most of the tests in parser_test.go, and only test against v1beta1 Ingress objects. This seems wrong after introducing support for v1 Ingresses, but that's presumably a known tech debt entry on our balance sheet (if it wasn't, it is now!). Separate tests for v1beta1 and v1 where no difference exists between the two (as far as I know, that is the case for TLS info) seem overly verbose, but maybe we don't intend to add any sort of unified test system given that v1beta1 is deprecated anyway.
  • New tests went into parser_test.go, whereas new code went into translate.go. Review of the existing tests suggests that translate_test.go mostly checks overall state stuff (e.g. which SNI hostnames (for certificates, not routes) map to which Secrets across all parsed Ingresses) whereas parser_test.go includes tests for individual pieces of route configuration (e.g. whether a methods annotation configures route methods). That's probably something we do intend to rearrange later as part of that larger refactoring project, but didn't seem like it'd be in scope here, even though it feels wrong.

@rainest rainest marked this pull request as ready for review September 25, 2020 18:07
@rainest
Copy link
Contributor Author

rainest commented Sep 25, 2020

Forgot to mark this ready the other day. IMO main question is whether we'd want to defer this until we can add an override annotation or some other means of explicitly providing SNI values in the event that the automatic default doesn't work. That allow users to handle the "support SNI-incapable clients" (provide the empty string to disable automatic population from the rule hostname) and wildcard case (non-ideal, but allows you to provide a value where you'd otherwise have none).

As-is, this works well when clients support SNI (most do) and rules have absolute hostnames. Unsure how common wildcard host rules are, but the fallback behavior isn't terrible (you simply don't get SNI info for them, and would be in the same situation as you are without this change--mTLS requests a client cert when it may not need it).

Especially if there are other review concerns, given the planned 1.0 release timeframe, adding

@rainest
Copy link
Contributor Author

rainest commented Sep 30, 2020

Force-pushed to fold in-progress commits into the main commit, with a separate commit for the override annotation.
d3b97f2 is basically all of the old diff prior to the force push if you're trying to get your bearings. The annotation commit is mostly complete, but lacks:

  • KongIngress support. I need to add this, but figure I'll hold off until WIP review of the rest of the logic (it should basically just mirror the annotation logic, so if that needs to change, I'd rather avoid writing the KongIngress code now and have to rewrite it later along with the annotation code).
  • Tests for the override strip behavior, i.e. that adding konghq.com/snis: "" to an Ingress does in fact create a route with no SNI criteria, even when it has hostname rules and TLS info. Not sure if there's a good place to put this at present--it looks like most of the existing test code works with annotations alone or extracted route objects. It's a good candidate for something to go into the new E2E testing Michal's working on, since it's most easily tested by providing an actual Ingress.

Minor point to note during review: we kinda allow you to specify TLS info on a single Ingress only while omitting it from other Ingresses with a hostname covered by that certificate, which in turn configures Kong to present that certificate for the omitted routes (Kong does not bind certificates to routes at all; they have their own SNI configuration, and we de-duplicate TLS info to create it). You cannot do the same and get SNI match criteria, as that's conditioned on having TLS info present.

That maybe doesn't matter so much for what we do, but does raise some questions: I'm unsure how the router behaves if you provide SNI match criteria on a route that's dual-stack HTTP+HTTPS, and if doing so actually makes the router not match for HTTP requests (they won't have SNI ever, because they don't have TLS), that's arguably a core bug, but a significant enough one that it'd probably block this feature. Hopefully this isn't the case, but it's major enough that I want to check.

@rainest
Copy link
Contributor Author

rainest commented Oct 2, 2020

Hope is a fickle thing, and my fears were correct. See Kong/kong#6425 for why this is now donotmerge.

Assuming larger core changes are farther out, we'll probably want to do an initial controller release that only has the annotation/manual SNI configuration. Manual configuration does not actually work around the core issue, but it does allow you to choose which routes are affected.

@rainest rainest removed the do not merge let the author merge this, don't merge for them. label Oct 22, 2020
@rainest
Copy link
Contributor Author

rainest commented Oct 22, 2020

Force-pushed to pull in latest next and clean up the WIP commits. Latest adds KongIngress support, so this should be good to go. It does neuter automatic SNI criteria handling for the time being: while I think this is a useful quality of life feature, Kong/kong#6425 precludes it. We'd still want to add a flag to gate it once that issue is fixed, however, as older Kong versions will still include the router bug.

Copy link
Contributor

@mflendrich mflendrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several nits attached.

I recommend that we hold this PR back until #869 gets in, and then add an E2E test for this case. This functionality clearly deserves an E2E test case I think.

internal/ingress/annotations/annotations.go Outdated Show resolved Hide resolved
internal/ingress/annotations/annotations.go Outdated Show resolved Hide resolved
internal/ingress/annotations/annotations.go Outdated Show resolved Hide resolved
internal/ingress/controller/parser/kongstate/route.go Outdated Show resolved Hide resolved
internal/ingress/controller/parser/translate.go Outdated Show resolved Hide resolved
internal/ingress/controller/parser/translate.go Outdated Show resolved Hide resolved
@rainest
Copy link
Contributor Author

rainest commented Oct 29, 2020

Commenting on all the items raised outside context because I'm trying to address two competing goals with the inline comments: when I'm working through a review, it's currently useful for me to resolve inline comments as I fix them locally, because that marks them "done" and I don't try and pay attention to them still when I've already written a commit locally, and can focus on those I haven't yet addressed.

At the same time, this makes it annoying to try and then respond to them inline, because the contents are hidden. Alas, I must choose one or the other, and working on a set of responses in an non-contextual comment works well enough for me while I'm stuck on a small screen because pandemic workspaces 🤷‍♀️

  • Pluralization: The consistency recommendation is good, and I've updated the code to use the plural throughout in ad6632b. The inconsistency prior to review stems from of me thinking about different things at different times when writing this code. "Server Name Indication", the TLS feature, is never plural, because it's a concept/feature, and making that plural makes no sense. When I started this, I was thinking about that. Kong routes and certificates, however, do refer to plural "snis", because in that context "snis" doesn't refer the feature, it refers to the hostnames allowed for the route or matched by a certificate, and those are plural. "sni_hostnames" would have perhaps been a better choice, but the time to make that choice is long past. I was thinking about those later, but didn't catch everything in the course of writing later updates.
  • Capitalization: That inconsistency happened because my brain was fighting to sometimes address the convention in the controller (which usually uses capitals for variables that include acronyms within their name) and Kong's code/API (which lowercases pretty much everything). Historically, support work had me jumping into a number of different codebases depending on wherever I needed to add a hotfix, and conventions vary wildly throughout our codebases. I'd usually try to match whichever one I was working on at the time, but that strategy breaks down now that I'm split between our Go code and our Lua code. 4bd17e9 hits that in the Go code here, along with the suggestion to flip the safe/unsafe prefix, since that was in the same part of the code.
  • nil slice: 7b303d3 address that for both this and the methods annotation, which is where I copy-pastaed that from. Since that addresses something outside the primary scope of this PR, it shouldn't be squashed (the others should be)
  • commented code in PR: I agree with the reasoning here. My rationale for leaving it in was that I thought it'd be hard to review out of context in a diff, and I did want to review for it: essentially, assuming we get a core fix for Dual stack HTTP+HTTPS routes cannot contain SNI match criteria while still honoring their protocols criteria kong#6425, does that automatic handling/code for it make sense? That's probably hard to evaluate meaningfully now, since the exact form of the fix isn't clear: we should get one, but it's not a simple change, and we may have to account for new behavior in any future automation on the controller end. As such, a6330bf to strip out the comments, as git show to save off the original commit that added them and attaching that diff to an issue is probably sufficient now that the team is at least familiar with the problem.
  • E2E tests: have not actually added these. My rationale is that those are most useful to test the proposed automated SNI configuration, and that's being destroyed because the proxy issue would make them incredibly dangerous. I furthermore think we care more about testing E2E behavior after Dual stack HTTP+HTTPS routes cannot contain SNI match criteria while still honoring their protocols criteria kong#6425 is fixed: we don't want to confirm that behavior, it's broken. As such, I think we should defer that to hopeful future automation, and have noted as much in Infer appropriate Kong route snis value from Ingress contents automatically #929. Even if we ultimately don't end up the adding that, any test for the annotation/KongIngress version will need to account for the same behavior, though I'm less sure we'd need an E2E test for those (IMO they're much simpler, and the unit tests alone are fine for them).

Copy link
Contributor

@mflendrich mflendrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good. I do think that this scenario requires proper integration testing, though:

This PR introduces a significant change to KIC's behavior that will be a critical part of the setup for users relying on SNI/mTLS. From the discussion around this PR it's pretty clear that we plan further changes to this functionality. I think that such changes pose a significant regression risk for users.

One of the purposes of the E2E test is that this functionality won't regress, either because of a change on our side, or on Kong core's. There are multiple moving parts involved in SNI-based (and possibly mTLS-enabled) proxying. Therefore, I'm convinced that we need to guarantee stability of this functionality, and that an E2E test case is the easiest way to achieve that, given the complexity of this feature.

Example of a test that I consider necessary here:

  • Define multiple SNI-enabled routes, some of these routes requiring a client certificate and some not
  • Connect to Kong trying to access certain routes with various certificates, and ensure that mTLS works as intended in those cases.

@rainest
Copy link
Contributor Author

rainest commented Oct 29, 2020

This PR introduces a significant change to KIC's behavior

Howso? I'd argue that without the originally proposed, and now removed modification, that it doesn't. Annotations/KongIngress fields that set Kong route fields aren't novel; we already have six of them, and this new annotation/field uses the same implementation pattern as the rest.

The only unusual thing about it is that it does perform a validation and sanitization step to strip spaces and reject invalid SNI hostnames that we know Kong will reject. That's not entirely novel either: we do the same thing for when translating methods.

While validation is arguably the most complex part, it's not very complex: it checks the hostname string against a regular expression using regexp and, if it observes an invalid hostname, logs it along with the attached route without setting the Kong route snis field. We can't write a useful E2E test for that, since the end result of the validation doing anything more than a no-op (regex happy, validation successful) is that the controller doesn't create any Kong configuration for the annotation.

this functionality won't regress, either because of a change on our side, or on Kong core's

How do you want to handle this in light of the last point in my previous comment? I understand the desire to do this in general, but in this case there's an extra complication: we know that the current core behavior is already broken. It's not completely broken (HTTPS requests that include SNI in the client handshake do indeed route correctly), but the core team has already acknowledged that the issue is real and that they are writing a fix for it.

While that's in progress, we won't know the full scope of the change until it's done and merged, and we know that any E2E test we write now will have to change to accommodate the new behavior and possible changes to the existing HTTPS behavior. I'm not trying to say that we shouldn't do it ever, I'm trying to say that it's hard to do now, and that we can acknowledge those external complexities and commit to returning to it later, which is why I said as much in #929.

@rainest
Copy link
Contributor Author

rainest commented Nov 2, 2020

https://gist.github.com/rainest/493ccad267ad7e641d37b51dff3d1089 provides an example test that checks this with the current proposed changes in #869. However, it relies on magic values for the verification commands, and we should not include it as such. E.g.

"$(curl -k -sw '%{http_code}' -o /dev/null https://example.net:27443/bar --resolve example.net:27443:127.0.0.1)" == 200

To take advantage of --resolve (which, AFAIK, is necessary to send a chosen SNI value, here example.net while still sending traffic to the test server), I need a separate value for the test server IP and HTTPS port, whereas $SUT_HTTPS_HOST combines the two. Splitting those out via string parsing is possible, but seems less than ideal. run-all-tests.sh currently does set HTTPS_PORT, but does not expose it to tests. It does not set a hostname variable; it hardcodes the test server IP (127.0.0.1) when creating the SUT_HTTP(S)_HOST variables.

@rainest rainest requested a review from mflendrich November 2, 2020 20:06
@mflendrich mflendrich mentioned this pull request Nov 16, 2020
@mflendrich
Copy link
Contributor

@rainest

https://gist.github.com/rainest/493ccad267ad7e641d37b51dff3d1089 provides an example test that checks this with the current proposed changes in #869.

The test logic looks good to me.

Splitting those out via string parsing is possible, but seems less than ideal.

I fully agree. I am fine with either doing something like this in verify.sh:

IFS=: read sut_hostname sut_port <<< "$SUT_HTTPS_HOST"
# now $sut_hostname and $sut_port yield the host and port

or patching the test machinery to break up SUT_HTTP(S)_HOST into hostname/IP and port parts. Up to you.

@rainest
Copy link
Contributor Author

rainest commented Dec 2, 2020

Went with the suggested string mangling route. Not ideal, but difficulty with mostly string-based operation in the existing bash-based system is the rationale for #939, so we may as well stick with that for now since the transformation from the existing value exposed in the test infrastructure isn't that complicated.

Noticed a minor block in that this is targeted at next, but we haven't merged main into next in a while and only main has the E2E tests. We should be fine to go ahead and do that, as only this and #859 (which is old enough that it'd probably need some merge work anyway) are currently open against next, but I didn't want to go ahead and do that unilaterally.

a200315 adds the modified test into a branch that cherry-picks the E2E stuff only; it passes, so we should be good to go after the main->next merge.

There are still some minor cleanup tasks after because of manifest changes. My changes to the base manifests don't conflict with anything, but the generated versions in this branch do conflict with main. We don't care about generated stuff really, so to bring this in line with post-merge next, it will need:

git rebase next -Xtheirs
./hack/build-single-manifests.sh
git commit deploy/ -m "chore(deploy) update single manifests"

That looks good to me locally.

@rainest rainest mentioned this pull request Dec 7, 2020
Travis Raines added 11 commits December 8, 2020 09:09
Add SNI match criteria to Kong routes when the source Ingress has a TLS
rule with hostnames.
Add an Ingress annotation and KongIngress functionality for setting
route SNI match criteria. Although the controller adds SNI criteria
automatically for Ingresses that contain rules with hostnames and have
TLS information, there are some scenarios that require setting SNI
manually:

* You wish to strip SNI match criteria from the route, in order to
  support SNI-incapable clients. SNI annotation processing is unusual to
  accommodate this: providing a "konghq.com/snis: ''" annotation is
  distinct from not providing the annotation at all, and instructs the
  controller to strip SNI information that it'd normally add
  automatically.
* You use wildcard hostnames in your rules. Kong does not allow wildcard
  match criteria as of 2.1, and as such you must enumerate the specific
  SNI matches you want when using a wildcard hostname.
* You wish to use a completely different SNI match criteria that does
  not match your rule hostnames at all. This is highly atypical, and
  should only be done if you know exactly why you need to do it.
Disable automatic SNI criteria addition by commenting out the code that
adds it. This is a useful convenience feature, but would cause breakage
because of Kong/kong#6425

Once that issue is fixed, we should re-enable this functionality, but
gate it with a flag, to avoid issues with older Kong versions.
Use SNI (rather than sni) throughout, and prefix which version is unsafe
rather than which version is safe.
@rainest rainest added this to the 1.1.0 milestone Dec 8, 2020
deploy/single/all-in-one-dbless-k4k8s-enterprise.yaml Outdated Show resolved Hide resolved
test/integration/cases/07-sni-route/verify.sh Outdated Show resolved Hide resolved
@@ -0,0 +1,7 @@
#!/bin/bash
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add test cases verifying that:

  • https requests to example.com and example.net present appropriate certificates - stackoverflow - hopefully our test environment has openssl installed
  • https requests to example.com/bar and example.net/foo yield 404
  • http requests to example.com/foo and example.net/bar yield 404

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 404 check. The router doesn't actually handle choosing what certificate to present, and will happily SNI route even if the certificate doesn't actually match. You wouldn't do that in practice (the test YAML seeks to present a realistic configuration even though there are parts of it we don't care about), but as far as the proxy implementation is concerned, they're not actually coupled together.

Given that presenting the correct certificate isn't actually in scope for what the route SNI property does and that there's no good way to extract the certificate hostname (both curl and s_client will only present that in human-readable verbose output--neither has a dedicated "return CN only" tool), I think the 404 test should be sufficient here.

Copy link
Contributor

@mflendrich mflendrich Dec 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I have just realized that the current test (in this PR) does not verify the functionality changed by this PR.

We need a test that ensures that, if there is an Ingress resource with a konghq.com/snis annotation, then it won't match if a request comes with a different SNI. I wrote baf06f6 to fix that. Lines 14-15 fail for KIC without this PR, but pass for KIC with this PR. (This commit also adds further 404 cases that I asked for in a previous comment in this thread).

@rainest rainest requested a review from mflendrich December 8, 2020 22:19
@mflendrich
Copy link
Contributor

Could we cherry-pick one test commit as explained in #863 (comment) ? Then the PR is ready to go 👍

@mflendrich mflendrich merged commit d234886 into next Dec 9, 2020
@mflendrich mflendrich deleted the feat/route-sni branch December 9, 2020 18:36
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants