Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Revision should manage activation #1997

Closed
mattmoor opened this issue Sep 11, 2018 · 30 comments · Fixed by #3915
Closed

Revision should manage activation #1997

mattmoor opened this issue Sep 11, 2018 · 30 comments · Fixed by #3915
Assignees
Labels
area/API API objects and controllers area/networking kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/spec Discussion of how a feature should be exposed to customers.
Milestone

Comments

@mattmoor
Copy link
Member

In my opinion, the fact that today Route manages the deactivation of Revisions feels like a layering violation.

In more detail, activation is widely viewed as a feature of Revisions, but today the semantic boundary is that the Revision surfaces inactivity via Active: False, Route(s) referring to inactive Revisions then wire in the activator, and after a grace period the Revision (really KPA) scales the Revision to zero.

However, this means that when inactivate the Revision is Ready: False, which is often a source of confusion, and presents another interesting race (during initial deployment) where the Revision must pass a readinessProbe before it is scaled to zero or the Configuration may never witness its readiness.

I believe that Revision (possibly through the KPA) should manage its own activation.

cc @vaikas-google @evankanderson @tcnghia @grantr @dprotaso for thoughts.

@knative-prow-robot knative-prow-robot added area/API API objects and controllers area/networking kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/spec Discussion of how a feature should be exposed to customers. labels Sep 11, 2018
@mattmoor
Copy link
Member Author

mattmoor commented Oct 8, 2018

I suspect that we want the networking abstractions to land first.

@mattmoor
Copy link
Member Author

mattmoor commented Oct 8, 2018

cc @tcnghia

@lichuqiang
Copy link
Contributor

There is no doubt about the issue itself. But we might want to see #1583 land first?

@imikushin
Copy link
Contributor

I believe this can be achieved this way:

  • Add another k8s-service for each revision, named something like hello-world-01-internal (where hello-world-01 is the revision name) and targeting the revision pods (when/if they exist!) via a label selector
  • the activator will proxy to hello-world-01-internal k8s-service instead of hello-world-01 k8s-service
  • hello-world-01 k8s-service will become the facade for the revision:
    • when the revision is Active, i.e. has ready pods, hello-world-01 will target the revision pods (via a label selector)
    • when the revision is Inactive, i.e. has no pods, hello-world-01 will target the activator (by changing the service type to ExternalName and pointing to the activator k8s-service)

In this design, it's the job of the revision controller to update the revision facade k8s-service. Route becomes completely activation-agnostic. In fact, it becomes even revision agnostic, because all it cares about is just a k8s-service to hit.

@mattmoor
Copy link
Member Author

This won't provide the activator with enough context to activate the revision. The way we provide this context today is decorating the request with an extra header.

@imikushin
Copy link
Contributor

What component does add the header now?

@mattmoor
Copy link
Member Author

When we change ClusterIngress (backed by Istio) to point at the Activator service, we include rewrite logic to decorate the request with the extra header.

@cppforlife
Copy link
Contributor

cppforlife commented Oct 30, 2018 via email

@imikushin
Copy link
Contributor

/assign

@imikushin
Copy link
Contributor

@cppforlife I've got the same question! Why not use Host header?

@imikushin
Copy link
Contributor

Why not append headers to all requests, not just to inactive revisions?

@tcnghia
Copy link
Contributor

tcnghia commented Oct 31, 2018

May be it is doable using a Route header, not a Host header because we can't add header per split. For example, for a split like 60% -> Blue, 40% -> Green we can only add the same header for Blue and Green.

However, with a Route header we could probably do this. Both Blue and Green are pointing to activator, and activator can flip coins to route to either Blue-internal or Green-internal.

@tcnghia
Copy link
Contributor

tcnghia commented Oct 31, 2018

Or, instead of a Route header we could probably send a Split header, like {"Blue-internal": 60, "Green-internal": 40} to decouple from knowledge of Route.

@lichuqiang
Copy link
Contributor

Trying to sum up the header stuff:

  1. In VitualService, we'll need to append route header info of all its splits: revisions name/namespace, percent.
  2. In activator, it flip coins to choose a revision. And also, I think it should replace the origin host header in request with destination service of the chosen revision (e.g. hello-world-01-internal). This can also avoid the issue caused by envoy proxy inconsistency (Need a Route prober to confirm effectiveness of VirtualService changes #1582). Any side effect of this?

Another more ideal approach is to add revision host header in VitualService. This way, activator does not need to do the choice, but honestly pass though the revision header.
Unfortunately, currently it is not supported to append header per destination in a split.
We can revisit this when it is supported in istio (envoy) side.

@imikushin
Copy link
Contributor

@lichuqiang How does splitting work now when all revisions are active?

@imikushin
Copy link
Contributor

Got it (after staring at the current code for a while): VirtualService doesn't currently support adding different headers to different splits of traffic! 🦆

@imikushin
Copy link
Contributor

Istio currently (as of v1.1-snapshot2) can't correctly work with ExternalName services: istio/istio#9950

But there's a workaround (also mentioned in the issue ☝️)

@lichuqiang
Copy link
Contributor

Thanks for digging this!
What I kind of concerned about is that to manage activation, revision controller would have to be aware of istio resource (ServiceEntry) this way, which is not what we expect to see.

@imikushin
Copy link
Contributor

Well, that's exactly my concern. So, my current plan is

  1. Have Incorrect handling of ExternalName k8s services istio/istio#9950 fixed (either myself or with some help)
  2. Get back to implementing this.

knative-prow-robot pushed a commit that referenced this issue Apr 10, 2019
For #1997.
This will be reused in HPA and KPA changes.
knative-prow-robot pushed a commit that referenced this issue Apr 10, 2019
* Use generic pkg/resources helpers for endpoint counting

* Move SKS helpers to a shared place.

For #1997.
This will be reused in HPA and KPA changes.

* some transient state

* remove public service and endpoint reconciliation

* remove public service and endpoint reconciliation

* formatting unification
nimakaviani pushed a commit to nimakaviani/serving that referenced this issue Apr 11, 2019
* Use generic pkg/resources helpers for endpoint counting

* Move SKS helpers to a shared place.

For knative#1997.
This will be reused in HPA and KPA changes.

* some transient state

* remove public service and endpoint reconciliation

* remove public service and endpoint reconciliation

* formatting unification
vagababov added a commit to vagababov/serving that referenced this issue Apr 17, 2019
Bring back public service and manual endpoint reconciliation
This is part of the knative#1997.
knative-prow-robot pushed a commit that referenced this issue Apr 18, 2019
)

* Revert the change that removed the public service creation in SKS

Bring back public service and manual endpoint reconciliation
This is part of the #1997.

* remove the transient state

* fix logging

* some random test
@mattmoor
Copy link
Member Author

@vagababov I think where we are qualifies. Let's close this now?

@vagababov
Copy link
Contributor

once the global resync tests merge in,

@vagababov
Copy link
Contributor

Now we can close this one :)

@vagababov
Copy link
Contributor

/close

@knative-prow-robot
Copy link
Contributor

@vagababov: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@imikushin
Copy link
Contributor

Yay!

vagababov added a commit to vagababov/serving that referenced this issue Apr 26, 2019
- move port related funcs and constants to its own file. register.go
  sounded wrong for that
- update the SKS code and tests to deal with the change
- also move a new function there that will be used for probing quite
  often
knative-prow-robot pushed a commit that referenced this issue Apr 27, 2019
- move port related funcs and constants to its own file. register.go
  sounded wrong for that
- update the SKS code and tests to deal with the change
- also move a new function there that will be used for probing quite
  often
duglin pushed a commit to duglin/serving that referenced this issue Dec 6, 2019
* Use generic pkg/resources helpers for endpoint counting

* Move SKS helpers to a shared place.

For knative#1997.
This will be reused in HPA and KPA changes.

* some transient state

* remove public service and endpoint reconciliation

* remove public service and endpoint reconciliation

* formatting unification
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/API API objects and controllers area/networking kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. kind/spec Discussion of how a feature should be exposed to customers.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants