Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(gateway) allow user-defined Listeners #2555

Merged
merged 6 commits into from
Jun 23, 2022
Merged

Conversation

rainest
Copy link
Contributor

@rainest rainest commented Jun 9, 2022

What this PR does / why we need it:
Rework managed Listener handling. The Gateway controller no longer
replaces user-provided Listeners with a set of Listeners derived from
the Kong proxy Service and listen configuration. It only checks to
ensure a Kong listen with the correct protocol is available for the
requested Listener. If there is none, it marks the Listener detached
because its protocol is unsupported, or because no protocol is
configured for the specific port.

Remove allowedRoutes merger. This is unnecessary with user-defined
Listeners preserved.

Which issue this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close that issue when PR gets merged): prereq for #2474
fixes #2557

Special notes for your reviewer:

  • I ran across something that looked odd with Gateway addresses in the course of this, and think we should consider only using the LoadBalancer address for them. Should we? Some earlier discussion at Implement Gateway certificateRef #2474 (comment)
  • The new getListenerStatus() function charged with determining both spec-based Listener compatibility and Listener attachability based on Kong configuration is LARGE. I believe it's functionally correct, but it's hard to read. I need to put it down for a bit and come back to figure out how to break it down into more digestible bits. This is now revised a bit to break it down more into functions that hopefully flow more cleanly together. Still large but a lot of that is composing Condition objects over several lines.
  • I realized while writing this that my attempts to avoid taking existing Listeners offline because some new Listener introduced a conflict had a bit of a snag: the fields of a Listener that can conflict are not immutable, and changes to them can result in conflict. More on that below.
  • E2E test updated because we can't test absent listens in the integration tests. We create all listen types in integration.

PR Readiness Checklist:

Complete these before marking the PR as ready to review:

  • the CHANGELOG.md release notes have been updated to reflect any significant (and particularly user-facing) changes introduced by this PR

@rainest rainest temporarily deployed to Configure ci June 9, 2022 19:13 Inactive
@rainest rainest force-pushed the feat/listener-revamp branch from 7828718 to f7f30e9 Compare June 9, 2022 23:07
@rainest rainest temporarily deployed to Configure ci June 9, 2022 23:07 Inactive
@rainest rainest temporarily deployed to Configure ci June 9, 2022 23:07 Inactive
@rainest rainest temporarily deployed to Configure ci June 9, 2022 23:32 Inactive
@rainest rainest force-pushed the feat/listener-revamp branch from f7f30e9 to 8497d62 Compare June 9, 2022 23:54
@rainest rainest temporarily deployed to Configure ci June 9, 2022 23:55 Inactive
@rainest rainest temporarily deployed to Configure ci June 9, 2022 23:55 Inactive
@rainest rainest temporarily deployed to Configure ci June 10, 2022 00:17 Inactive
@rainest rainest force-pushed the feat/listener-revamp branch from 8497d62 to f055475 Compare June 14, 2022 00:10
@rainest rainest temporarily deployed to Configure ci June 14, 2022 00:11 Inactive
@rainest rainest force-pushed the feat/listener-revamp branch from f055475 to f172a75 Compare June 14, 2022 00:14
@rainest rainest temporarily deployed to Configure ci June 14, 2022 00:15 Inactive
@rainest rainest temporarily deployed to Configure ci June 14, 2022 00:15 Inactive
@rainest
Copy link
Contributor Author

rainest commented Jun 14, 2022

Changing a Listener's Protocol, Hostname, or Port (all permitted) invalidates that Listener's previous status calculation. The modified Listener may conflict with some other existing Listener, and essentially needs to be treated as new. This in itself is fine, but it introduces a nasty problem: since Listener status has no information about those fields, we have no idea whether a change occurred unless we keep a copy of the old Gateway around.

Tracking state like this is generally bad practice in Reconcilers, and I agree that it looks kinda ugly, even if it works (I still need to write tests for it to be sure). I briefly discussed this with the SIG and there's no standard around how to handle it, though others have encountered the same problem. There are roughly two approaches in other implementations:

  • Let the user footgun and suffer the consequences. Conflicted Listeners are removed from active configuration until fixed, even if they were previously active.
  • Keep state and do something like what I've attempted here. In that case, the other implementation does not track state in the controller itself, though I haven't explored what options we have to do the same. It's possible that we could figure out some way to detect the "both broken" state and make it block parser applies, which is annoying but less dangerous than abruptly dropping config.

Consensus was that it was too late in the API design phase to make these fields immutable and enforced by the API webhooks or similar.

@rainest rainest temporarily deployed to Configure ci June 14, 2022 00:36 Inactive
@shaneutt
Copy link
Contributor

I went back and forth on this since the meeting last night. The lack of upstream support from Gateway API to solve the problem is a serious issue that we're going to just have to pay for it would seem. I'm starting to lean towards going for the footgun approach for now, the reason being we can easily iterate on top of as we build some experience up with it. If we try to build a lot of custom logic to cleanly handle the problem early, we may end up having a harder time iterating on it.

@rainest rainest temporarily deployed to Configure ci June 14, 2022 23:00 Inactive
@rainest rainest temporarily deployed to Configure ci June 14, 2022 23:01 Inactive
@rainest rainest temporarily deployed to Configure ci June 14, 2022 23:08 Inactive
@rainest rainest temporarily deployed to Configure ci June 14, 2022 23:09 Inactive
@rainest rainest temporarily deployed to Configure ci June 14, 2022 23:09 Inactive
@rainest rainest force-pushed the feat/listener-revamp branch from a8065b7 to 3bfab37 Compare June 14, 2022 23:14
@rainest rainest temporarily deployed to Configure ci June 14, 2022 23:14 Inactive
@rainest rainest temporarily deployed to Configure ci June 14, 2022 23:14 Inactive
@rainest
Copy link
Contributor Author

rainest commented Jun 14, 2022

https://github.com/Kong/kubernetes-ingress-controller/actions/runs/2498691049 is the test E2E run that includes this.

Be careful what you wish for 🙂 making everything conflicted is arguably more complicated because you have to walk through the list again to re-mark anything you'd previously marked okay (which was fine when one, but only one Listener could be ready for a given hostname/port).

This goes ahead and strips Cluster IPs from address status--there wasn't really a clear decision on that, but the community seemed to be leaning towards that one, and the commit is essentially atomic if we want to strip it.

I am utterly confused as to what is going wrong with lint. How is running verify-scripts.sh detecting a diff (in something that it shouldn't inspect at all) on CI and not remotely? https://gist.github.com/rainest/4da9e62249460575f5bc0b39e1c25d94

@rainest rainest marked this pull request as ready for review June 14, 2022 23:35
@rainest rainest requested a review from a team as a code owner June 14, 2022 23:35
@rainest rainest temporarily deployed to Configure ci June 14, 2022 23:37 Inactive
@pmalek
Copy link
Member

pmalek commented Jun 15, 2022

I am utterly confused as to what is going wrong with lint. How is running verify-scripts.sh detecting a diff (in something that it shouldn't inspect at all) on CI and not remotely? https://gist.github.com/rainest/4da9e62249460575f5bc0b39e1c25d94

I believe in internal/controllers/gateway/utils.go it strips the //nolint:unparam because all the parameters are being used. Actually if you rebase on main you should get my change that introduced that removal of said comment.

rainest added 5 commits June 15, 2022 11:07
Rework managed Listener handling. The Gateway controller no longer
replaces user-provided Listeners with a set of Listeners derived from
the Kong proxy Service and listen configuration. It only checks to
ensure a Kong listen with the correct protocol is available for the
requested Listener. If there is none, it marks the Listener detached
because its protocol is unsupported, or because no protocol is
configured for the specific port.

Remove allowedRoutes merger. This is unnecessary with user-defined
Listeners preserved.

Remove utility functions used to compare Kong-derived and
Gateway-derived Listener sets. Comparisons against Kong configuration
now contribute to Listener compatibility checks on a per-Listen basis.
Remove awareness of the previous listener state from the Gateway
controller. Although existing status handling is still in place, it is
no longer reliable.
Remove any awareness of Listener history when calculating status other
than the attachedRoutes figure from Status. If any two Listeners result
in a conflict condition, mark all Listeners with that port or hostname
conflicted. Adding a new Listener in conflict with an existing Listener
will break the existing Listener and detach it.
Exclude ClusterIP addresses from Gateway address status. Only
LoadBalancer addresses, if present, will be advertised.
@rainest rainest force-pushed the feat/listener-revamp branch from 3bfab37 to 97b7f2b Compare June 15, 2022 18:07
@rainest rainest temporarily deployed to Configure ci June 15, 2022 18:07 Inactive
@rainest rainest temporarily deployed to Configure ci June 15, 2022 18:08 Inactive
@rainest rainest temporarily deployed to Configure ci June 15, 2022 18:31 Inactive
@rainest rainest temporarily deployed to Configure ci June 15, 2022 18:40 Inactive
@rainest rainest temporarily deployed to Configure ci June 15, 2022 18:40 Inactive
@rainest
Copy link
Contributor Author

rainest commented Jun 15, 2022

Weird. Different golangci-lint version issues? That appears to be the one tool we don't download to and run from local bin. On mine, I do get:

$ golangci-lint --version
golangci-lint has version v1.45.0 built from (unknown, mod sum: "h1:T2oCVkYoeckBxcNS6DTYiSXN2QcTNuAWaHyLGfqzMlU=") on (unknown)

$ golangci-lint run                                                                                                                       
internal/dataplane/kongstate/upstream.go:87:2: `(*Upstream).override` - `log` is unused (unparam)
	log logrus.FieldLogger,
	^
internal/controllers/gateway/utils.go:24:59: `info` - `keysAndValues` always receives `nil` (unparam)
func info(log logr.Logger, obj client.Object, msg string, keysAndValues ...interface{}) {
                                                          ^

Though I'm also slightly confused since I didn't think that verify tool was capable of touching anything in non-generated code, which is where that lives.

@rainest rainest temporarily deployed to Configure ci June 15, 2022 19:06 Inactive
@pmalek
Copy link
Member

pmalek commented Jun 21, 2022

Weird. Different golangci-lint version issues? That appears to be the one tool we don't download to and run from local bin. On mine, I do get:

$ golangci-lint --version
golangci-lint has version v1.45.0 built from (unknown, mod sum: "h1:T2oCVkYoeckBxcNS6DTYiSXN2QcTNuAWaHyLGfqzMlU=") on (unknown)

$ golangci-lint run                                                                                                                       
internal/dataplane/kongstate/upstream.go:87:2: `(*Upstream).override` - `log` is unused (unparam)
	log logrus.FieldLogger,
	^
internal/controllers/gateway/utils.go:24:59: `info` - `keysAndValues` always receives `nil` (unparam)
func info(log logr.Logger, obj client.Object, msg string, keysAndValues ...interface{}) {
                                                          ^

Though I'm also slightly confused since I didn't think that verify tool was capable of touching anything in non-generated code, which is where that lives.

Yeah, so there's this other thing which is the fact that go-get-tool (described in: #2560) doesn't download new version/check for the current version of the tool accepted as parameter. This is why you're still running v1.45.0 even though that your branch has v1.46.2.

And indeed that seems to be the culprit (version mismatch). After downloading v1.45.0 I can confirm that I'm getting the same errors.

This is probably due to the fact that the newer versions of golangci-lint are "1.18 aware" but not all linters support it yet.

WARN [linters context] nilerr is disabled because of go1.18. You can track the evolution of the go1.18 support by following the https://github.com/golangci/golangci-lint/issues/2649.
WARN [linters context] structcheck is disabled because of go1.18. You can track the evolution of the go1.18 support by following the https://github.com/golangci/golangci-lint/issues/2649.
WARN [linters context] unparam is disabled because of go1.18. You can track the evolution of the go1.18 support by following the https://github.com/golangci/golangci-lint/issues/2649.
WARN [linters context] wastedassign is disabled because of go1.18. You can track the evolution of the go1.18 support by following the https://github.com/golangci/golangci-lint/issues/2649.

@mflendrich mflendrich requested a review from shaneutt June 21, 2022 16:19
Copy link
Contributor

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that we're looking to accept this as an incomplete implementation, more or less: I noticed we have a lot of TODOs without any links to their follow up issues. We should be sure that we link follow-ups, and in the follow-up issues it would be helpful to make sure there's a reminder to clean up comments which reference the issue as a part of resolution.

@rainest
Copy link
Contributor Author

rainest commented Jun 22, 2022

It seems that we're looking to accept this as an incomplete implementation, more or less: I noticed we have a lot of TODOs without any links to their follow up issues. We should be sure that we link follow-ups, and in the follow-up issues it would be helpful to make sure there's a reminder to clean up comments which reference the issue as a part of resolution.

@shaneutt which are you seeing unlinked still? They should all have links, though they're mostly at the bottom of the comment blocks instead of the start.

Cleanup links in the issue body are a good idea, though I'd want to link them to a commit in main instead of the temp commits here that'll be squashed away.

Copy link
Contributor

@shaneutt shaneutt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I was mistaken about missing links, let's merge this and then we can iterate on it from here in the follow-up issues.

@rainest rainest merged commit da63a94 into main Jun 23, 2022
@rainest rainest deleted the feat/listener-revamp branch June 23, 2022 16:54
pmalek pushed a commit that referenced this pull request Jun 24, 2022
Rework managed Listener handling. The Gateway controller no longer
replaces user-provided Listeners with a set of Listeners derived from
the Kong proxy Service and listen configuration. It only checks to
ensure a Kong listen with the correct protocol is available for the
requested Listener. If there is none, it marks the Listener detached
because its protocol is unsupported, or because no protocol is
configured for the specific port.

Remove allowedRoutes merger. This is unnecessary with user-defined
Listeners preserved.

Remove utility functions used to compare Kong-derived and
Gateway-derived Listener sets. Comparisons against Kong configuration
now contribute to Listener compatibility checks on a per-Listen basis.

Remove any awareness of Listener history when calculating status other
than the attachedRoutes figure from Status. If any two Listeners result
in a conflict condition, mark all Listeners with that port or hostname
conflicted. Adding a new Listener in conflict with an existing Listener
will break the existing Listener and detach it.

Exclude ClusterIP addresses from Gateway address status. Only
LoadBalancer addresses, if present, will be advertised.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Preserve user-defined Gateway Listeners
3 participants