
fix(consumer): duplicate ID2 breaking DB-mode #1007

Closed
mflendrich wants to merge 6 commits into next from fix/duplicate-creds

Conversation

mflendrich (Contributor) commented Jan 19, 2021

fixes #729

Before this PR, KIC failed to apply config to DB-backed Kong when the config
contained a Consumer with two credentials having equivalent ID2 fields
(Key for key-auth, ClientID for OAuth2, etc.).

This commit de-duplicates credentials of each type by replacing the
Consumer's credential store with a map (ID2 -> credential object),
and updates the relevant tests.

Also, this commit implements consistent credential ordering.
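
For illustration, here is a minimal Go sketch of the de-duplication idea; the `Credential` type, its fields, and the `dedupeByID2` helper are made up for this example and are not the actual kongstate types:

```go
package main

import "fmt"

// Credential is a stand-in for a single credential of one type (key-auth, oauth2, ...).
type Credential struct {
	ID2  string // the uniqueness-carrying field, e.g. Key or ClientID
	Data string // remaining credential fields, collapsed for brevity
}

// dedupeByID2 keeps at most one credential per ID2 value; a later duplicate
// overwrites the earlier entry instead of producing two conflicting objects.
func dedupeByID2(creds []Credential) map[string]Credential {
	out := make(map[string]Credential, len(creds))
	for _, c := range creds {
		if _, seen := out[c.ID2]; seen {
			fmt.Printf("warning: duplicate credential with ID2=%q, keeping the last one\n", c.ID2)
		}
		out[c.ID2] = c
	}
	return out
}

func main() {
	creds := []Credential{
		{ID2: "secret", Data: "RSApubkeyA"},
		{ID2: "secret", Data: "RSApubkeyB"}, // duplicate ID2: only one survives
	}
	fmt.Println(dedupeByID2(creds))
}
```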


refactor(controller): implement sortByString

This REVERSES the sorting order from reverse to normal.

The history behind reverse order is purely accidental. There are no
known reasons in favor of one or the other. What matters is the order
being deterministic.
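
For reference, a minimal sketch of the ascending comparison; the actual sortByString helper in internal/ingress/controller/kong.go may be shaped differently:

```go
package main

import (
	"fmt"
	"sort"
)

func main() {
	keys := []string{"oauth2", "jwt", "key-auth"}

	// Ascending ("normal") order; the previous, accidental behavior was the
	// reverse comparison keys[i] > keys[j].
	sort.Slice(keys, func(i, j int) bool { return keys[i] < keys[j] })

	fmt.Println(keys) // [jwt key-auth oauth2]
}
```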


test(controller): toDeckContent sorts creds


test(kongstate): deduplicates credentials

mflendrich changed the base branch from main to next on January 19, 2021 14:21
mflendrich force-pushed the fix/duplicate-creds branch 2 times, most recently from c2b8086 to 3169a69 on January 20, 2021 13:16
mflendrich marked this pull request as ready for review on January 20, 2021 13:31
mflendrich requested a review from shaneutt on January 20, 2021 13:31
rainest (Contributor) commented Jan 20, 2021

How does this handle sort order for credentials that have the same ID2? It's effectively random, correct?

I think this introduces a footgun that allows you to inadvertently replace in-use configuration, e.g.

  1. UserA creates a JWT credential with key secret and RSApubkeyA. Their application starts using this credential.
  2. UserB, having not consulted with UserA, creates another JWT credential with key secret and RSApubkeyB.
  3. The controller arbitrarily chooses UserB's credential after sorting, rejects UserA's credential as a duplicate, and updates state to include UserB's credential only.
  4. UserA's application starts receiving 401s.

This is somewhat mitigated because this only handles credentials for a single consumer, not credentials on separate consumers (we'll still attempt to apply those and hit the same admin API error). We'd need to further tiebreak by credential creation time to avoid that.

shaneutt (Contributor) left a comment

Generally LGTM, just a couple of small comments 👍

internal/ingress/controller/kong.go (resolved)
internal/ingress/controller/kong.go (resolved)
internal/ingress/controller/kong.go (resolved)
internal/ingress/controller/kong_test.go (resolved)
internal/ingress/controller/kong_test.go (outdated, resolved)
mflendrich requested a review from shaneutt on January 20, 2021 21:19
shaneutt previously approved these changes Jan 20, 2021

shaneutt (Contributor) left a comment

Good explanations in the code review, thank you.

LGTM ✔️

mflendrich (Contributor, Author)

> How does this handle sort order for credentials that have the same ID2? It's effectively random, correct?

The order is directly as returned from informer stores. Therefore, we need to treat it as effectively not guaranteed to be anything meaningful.

> I think this introduces a footgun that allows you to inadvertently replace in-use configuration, e.g.
>
> 1. UserA creates a JWT credential with key `secret` and RSApubkeyA. Their application starts using this credential.
> 2. UserB, having not consulted with UserA, creates another JWT credential with key `secret` and RSApubkeyB.
> 3. The controller arbitrarily chooses UserB's credential after sorting, rejects UserA's credential as a duplicate, and updates state to include UserB's credential only.
> 4. UserA's application starts receiving 401s.

That is correct, and this is part of a bigger problem: given two conflicting entities that made it through our admission validations, KIC needs to distill some valid config from them and apply it to Kong. Otherwise the only option is for KIC's sync loop to lock up, which is terrible user experience.

The problem is that a fair "tie breaker" for conflicting configs is an unsolved problem. The best we can do today is to drop an arbitrary subset so that the remaining config is no longer nonsensical.

> This is somewhat mitigated because this only handles credentials for a single consumer, not credentials on separate consumers (we'll still attempt to apply those and hit the same admin API error). We'd need to further tiebreak by credential creation time to avoid that.

I would like to avoid using creation time because it introduces an external factor to KIC's semantics. That external factor is likely to change unexpectedly if the user, for example, restores their k8s objects from a backup or from a gitops repository. Therefore I don't see a clear benefit of using creation time over picking an arbitrary resource. I'm all in for deterministic and well-defined conflict resolution, but I would rather see something that relies on configuration the user controls (e.g., in case of conflicting resources, drop the one that has the lexicographically greater namespace+name).
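
To make that suggestion concrete, here is a purely illustrative sketch of such a tiebreak; the `ref` type and `pickWinner` function are invented for this example and exist nowhere in KIC:

```go
package main

import "fmt"

// ref identifies a Kubernetes resource by namespace and name.
type ref struct{ Namespace, Name string }

func (r ref) key() string { return r.Namespace + "/" + r.Name }

// pickWinner keeps the resource whose namespace/name sorts lexicographically
// lower and drops the other one.
func pickWinner(a, b ref) (keep, drop ref) {
	if a.key() <= b.key() {
		return a, b
	}
	return b, a
}

func main() {
	keep, drop := pickWinner(
		ref{Namespace: "team-b", Name: "jwt-cred"},
		ref{Namespace: "team-a", Name: "jwt-cred"},
	)
	fmt.Printf("keeping %s, dropping %s\n", keep.key(), drop.key())
}
```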

I recommend leaving this as-is and creating an issue to be solved later (deterministic conflict resolution based on user-controlled factors). WDYT?

mflendrich and others added 6 commits January 21, 2021 12:42
Co-authored-by: Shane Utt <shaneutt@users.noreply.github.com>
codecov bot commented Jan 21, 2021

Codecov Report

Merging #1007 (70dbb1a) into next (daa9afa) will increase coverage by 1.13%.
The diff coverage is 64.70%.


@@            Coverage Diff             @@
##             next    #1007      +/-   ##
==========================================
+ Coverage   49.56%   50.70%   +1.13%     
==========================================
  Files          32       32              
  Lines        3198     3205       +7     
==========================================
+ Hits         1585     1625      +40     
+ Misses       1483     1440      -43     
- Partials      130      140      +10     
Impacted Files Coverage Δ
...l/ingress/controller/parser/kongstate/kongstate.go 20.49% <0.00%> (ø)
internal/ingress/controller/kong.go 16.25% <56.52%> (+9.58%) ⬆️
...al/ingress/controller/parser/kongstate/consumer.go 73.25% <74.07%> (-1.75%) ⬇️
internal/ingress/controller/parser/parser.go 85.26% <0.00%> (+1.33%) ⬆️

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update daa9afa...70dbb1a.

mflendrich (Contributor, Author)

Solution blocked by a discrepancy between the DB-backed and DB-less modes of Kong (as described in Kong/kong#6732). The semantics of conflict resolution are to be defined once we reach agreement with the Kong/kong maintainers on the desired handling of duplicate entries.

rainest (Contributor) commented Jan 28, 2021

Re discussion elsewhere on whether to use creation timestamp or something else, I'd like to step back and reject the underlying premise:

> This is part of a bigger problem: given two conflicting entities that made it through our admission validations, KIC needs to distill some valid config from them and apply it to Kong.

This is hamstrung by a lack of any valid configuration in this scenario. If you've created two credentials that collide, we can only apply one, and which we choose to apply affects the proxy behavior. I originally suggested creation timestamp because it seemed least likely to result in a change in proxy behavior, since the older resource would have a greater chance of being applied first, but it's not a great solution either.

Any tiebreaker we choose is effectively arbitrary, but user-controlled tiebreakers pose an odd question: if you know that two resources collide, why would you set the tiebreaker such that one resource is used rather than either deleting one of the resources or choosing non-colliding values? If you don't know, you could try at random until your configuration wins, but I don't think it's something we'd want to encourage.

> Otherwise the only option is for KIC's sync loop to lock up, which is terrible user experience.

Agreed, but it at least fails relatively safely, since your proxy will continue using existing configuration. Assuming you're not aware of the conflict (again, if you are, you should resolve it by other means), using a tiebreaker results in one of the following:

  • New configuration loses the tiebreaker. The controller doesn't actually apply your configuration, and reports it applied configuration successfully. You don't know why your configuration isn't working.
  • New configuration wins the tiebreaker. The controller reports success, and as far as you're concerned, everything is working great. However, someone else's configuration has spontaneously stopped working, and they probably aren't happy. They don't know why this happened, because the controller just reports a successful sync, and their configuration wasn't deleted.

These are both bad user experience, and potentially more disruptive than a blocked sync, since the second case obliterates in-use configuration.

The real issue with the sync loop blocking is that we cannot discard configuration piecemeal: it'd be a much smaller problem if it did not lock up entirely and prevent sync of other (valid) configuration added after. This is hard to solve, but I don't think attempting to mask collisions is a good way to handle it. Absent some way to report (and ideally reject) the problem configuration, I think the only safe option is to let the status quo (sync block) stand.

mflendrich (Contributor, Author)

> Any tiebreaker we choose is effectively arbitrary, but user-controlled tiebreakers pose an odd question: if you know that two resources collide, why would you set the tiebreaker such that one resource is used rather than either deleting one of the resources or choosing non-colliding values? If you don't know, you could try at random until your configuration wins, but I don't think it's something we'd want to encourage.

I fully agree that we don't want to encourage feeding KIC invalid configurations.

That said, as a matter of fact, users will keep feeding invalid configurations to KIC. What can we do about it? I see three layers of defense here:

  1. prevent invalid configurations from being fed to KIC altogether
  2. given invalid configuration, warn the user visibly
  3. given invalid configuration, fail gracefully

(1) is achieved by CRD schemas and webhook validations. Neither of these is capable of validating uniqueness constraints across all resources in the cluster, other than on resource name. There are measures we can theoretically take to prevent some of these cases, but conflicts will still fall through the cracks. That's outside the scope of this issue, I think, but we can agree here to implement such a solution separately to improve UX.

(2) KIC does this today by printing warning log entries. A better way would be to report these in the status fields of the affected resources. A perfect way would also be to give KIC a Kubernetes resource for the generated configuration (likely happening as part of #702 - @shaneutt) and put the overall success/failure/errors/warnings in its status.
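
For illustration only, a status-field report could look something like the sketch below; the condition type and reason names are made up, and nothing in KIC sets such conditions today:

```go
package main

import (
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// Conditions that would live under the affected resource's .status.
	var conditions []metav1.Condition

	// Record that this resource was skipped because it conflicts with another one.
	conditions = append(conditions, metav1.Condition{
		Type:               "Programmed",            // hypothetical condition type
		Status:             metav1.ConditionFalse,
		Reason:             "ConflictingCredential", // hypothetical reason
		Message:            "credential skipped: ID2 collides with another credential on this consumer",
		LastTransitionTime: metav1.Now(),
	})

	fmt.Printf("%+v\n", conditions)
}
```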

(3) is the crux here.

It's quite clear that if the KIC config contains 2 or more conflicting entities, there's no theoretical possibility of applying a working and correct config for that subset of the configuration, because there's no "correct" way to do it. This is obviously an unrecoverable failure scenario.

In an unrecoverable failure scenario one thing that we have under control is the scope of that failure. And:

  • if KIC stops syncing config altogether, then that's equivalent to the entire KIC failing. The user loses the ability to configure any other workloads and consumers, possibly causing an outage elsewhere. Imagine a multi-tenant k8s cluster with one ingress controller where:
    • tenant A accidentally creates a route on a critical workload that causes a disaster (e.g. directs all traffic to a backend service that should be cached instead)
    • tenant B creates a consumer conflict in their toy app
    • tenant A reverts the route change, but the revert has no effect because the sync loop is blocked, and they see their application melt down further
  • if KIC contains the failure within the conflicting resources, operations for all other resources continue without degradation

So, the question here is whether, in case of failure, we choose to degrade only to the extent of the conflicted resources, or to degrade everything.

Here are the concerns about the proposed solution I've understood from #1007 (comment) (please correct me if I missed something):

  1. KIC syncing new configuration after a conflict was introduced can cause the user to overlook the conflict
  2. adding new conflicting config may obliterate the old conflicting config
  3. we cannot discard configuration piecemeal

On (1): I believe that we can solve this by providing better feedback about configuration errors, using status fields on (existing and future) resources. As soon as we have a resource for the entire rendered configuration, its status would be the most natural place to post such errors, as well as in the status of the conflicted resources themselves.

On (2): the premise that old config is more meaningful than new config, and should always win, is incompatible with the Kubernetes configuration model, which is based on eventual consistency and has no temporal relationships between entities.
Furthermore, I view preserving old configuration by locking up the entire sync loop as throwing the baby out with the bathwater. Alas, I don't see any better way to preserve old configuration.

On (3): why not? This is a serious question. Maybe I'm missing some hidden complexity. My thinking is: "walk through all the resources in a deterministic, well-known order. If the currently processed resource conflicts with the already-seen state, warn loudly and assume the fiction that it didn't exist in the config".
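
A sketch of that thinking, purely illustrative; the `resource` type and `buildConfig` function are invented here and nothing in this PR implements them:

```go
package main

import (
	"fmt"
	"sort"
)

// resource is a stand-in for any KIC-managed object carrying a uniqueness
// constraint (here, the ID2 field of a credential).
type resource struct {
	Namespace, Name string
	ID2             string
}

func (r resource) key() string { return r.Namespace + "/" + r.Name }

// buildConfig walks resources in a deterministic order and skips any resource
// that conflicts with the already-accepted state, warning loudly about it.
func buildConfig(resources []resource) []resource {
	// Deterministic, well-known order: lexicographic by namespace/name.
	sort.Slice(resources, func(i, j int) bool { return resources[i].key() < resources[j].key() })

	seen := map[string]resource{} // ID2 -> first accepted resource
	var accepted []resource
	for _, r := range resources {
		if prev, ok := seen[r.ID2]; ok {
			// Conflict: warn and act as if this resource didn't exist in the config.
			fmt.Printf("warning: skipping %s, ID2 %q already used by %s\n", r.key(), r.ID2, prev.key())
			continue
		}
		seen[r.ID2] = r
		accepted = append(accepted, r)
	}
	return accepted
}

func main() {
	out := buildConfig([]resource{
		{Namespace: "team-b", Name: "jwt-b", ID2: "secret"}, // skipped: sorts after team-a/jwt-a
		{Namespace: "team-a", Name: "jwt-a", ID2: "secret"}, // wins: sorts first
	})
	fmt.Println(out)
}
```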

Therefore I propose that:

  • we do improve user feedback by utilizing status fields to expose conflicts,
  • we do minimize the configuration impact of a configuration error by leaving everything but the conflicted resources unaffected,
  • we drop the idea of breaking ties by entity age (unless we find a way that doesn't use timestamps and doesn't lock up too many resources)

WDYT @rainest?

mflendrich (Contributor, Author)

Removed the "blocked" label because the blocking premise (need to rely on Kong/kong maintainers' input about the discrepancy between db and dbless) is obsolete.

mflendrich (Contributor, Author)

Closing this PR: the migration of KIC to Railgun will call for a different solution.

mflendrich closed this on Feb 11, 2021