Prevent creating a controller if a CRD does not exist. #840
Comments
/help
Using the RESTMapper is a reasonable choice if you want to continue on, but not create the controller. As for the "no longer fails on .Complete", I have a suspicion that this is due to 0fdf465, which moved around when we start Watches to avoid leaks.
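A minimal sketch of the "check the RESTMapper, then skip the optional controller" idea described above. To stay self-contained it does not import controller-runtime: `GroupKind`, `KindMapper`, `ErrNoMatch`, `staticMapper`, and `SetupIfInstalled` are all hypothetical names invented for illustration. In a real project the lookup would presumably go through `mgr.GetRESTMapper().RESTMapping(...)` and the error test through `meta.IsNoMatchError`, but that wiring is an assumption and is not shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// GroupKind is a simplified, hypothetical stand-in for schema.GroupKind.
type GroupKind struct {
	Group string
	Kind  string
}

// ErrNoMatch is a hypothetical sentinel standing in for the "no match"
// error a real RESTMapper reports for a kind the API server does not serve.
var ErrNoMatch = errors.New("no matches for kind")

// KindMapper is a trimmed-down, hypothetical view of a RESTMapper: it only
// answers whether a kind can currently be mapped.
type KindMapper interface {
	RESTMapping(gk GroupKind, versions ...string) error
}

// staticMapper is a fake mapper backed by a fixed set of installed kinds.
type staticMapper map[GroupKind]bool

func (m staticMapper) RESTMapping(gk GroupKind, versions ...string) error {
	if !m[gk] {
		return fmt.Errorf("%w %q in group %q", ErrNoMatch, gk.Kind, gk.Group)
	}
	return nil
}

// SetupIfInstalled runs setup only when the mapper knows the kind and
// reports whether the controller was created. A missing kind is not treated
// as an error: the caller carries on without that optional controller.
// Any other mapper error, and any error from setup, is returned.
func SetupIfInstalled(m KindMapper, gk GroupKind, version string, setup func() error) (bool, error) {
	if err := m.RESTMapping(gk, version); err != nil {
		if errors.Is(err, ErrNoMatch) {
			return false, nil
		}
		return false, err
	}
	if err := setup(); err != nil {
		return false, err
	}
	return true, nil
}
```

Here `setup` is where a call such as `NewControllerManagedBy(mgr)...Complete(r)` would go. Returning a boolean instead of an error lets the caller decide whether to retry later, for example after the CRD is installed.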
Based on the linked issue, this is mainly a "don't start this optional controller till this thing is installed", right?
Specifically, it should now fail when you start the manager instead.
or if the manager is already started (which it looks like from the stack trace), this error should induce manager shutdown
In retrospect this is semi-expected behavior, but it was not intended to break workflows. I've added a note to the v0.4.0 release. Technically, we're okay because that was a breaking release, but we shouldn't have done it without at least a note.
Reason for upgrade: The new version uses [DynamicRESTMapper](https://github.com/kubernetes-sigs/controller-runtime/blob/master/pkg/client/apiutil/dynamicrestmapper.go) as the default `RESTMapper` for the controller-runtime manager. `DynamicRESTMapper` will "reload the delegated `meta.RESTMapper` on a cache miss" (see kubernetes-sigs/controller-runtime#554). This solves the problem that, with controller-runtime v0.2.2, we need to restart HNC after adding a new CRD in order to create the corresponding object reconciler (see details in kubernetes-retired#488).

Incompatibility issues addressed in this PR:
- Upgrade the Go version in `go.mod` and `Dockerfile` to 1.13. The [errors.As](https://golang.org/pkg/errors/#As) in [dynamicrestmapper.go](https://github.com/kubernetes-sigs/controller-runtime/blob/bfc982769615817ee15db8a543662652d900d27b/pkg/client/apiutil/dynamicrestmapper.go#L48) requires Go 1.13.
- Higher versions of k8s.io/cli-runtime and k8s.io/client-go are required after upgrading controller-runtime to v0.5.0.
- Version changes of other packages in `go.mod` are updated automatically.
- serializer.DirectCodecFactory was renamed to [serializer.WithoutConversionCodecFactory](https://godoc.org/k8s.io/apimachinery/pkg/runtime/serializer#WithoutConversionCodecFactory) after k8s.io/apimachinery 1.16 (see [here](kubernetes/apimachinery@ed8af17) and [here](kubernetes/apimachinery@4fac835)).
- The default [LeaderElectionID](https://github.com/kubernetes-sigs/controller-runtime/blob/bfc982769615817ee15db8a543662652d900d27b/pkg/leaderelection/leader_election.go#L46) in the controller-runtime manager is removed after kubernetes-sigs/controller-runtime#446.
- [NewlineReporter](https://godoc.org/sigs.k8s.io/controller-runtime/pkg/envtest/printer) is moved from `sigs.k8s.io/controller-runtime/pkg/envtest/` to `sigs.k8s.io/controller-runtime/pkg/envtest/printer` by [this](kubernetes-sigs/controller-runtime@748f55d#diff-42de1d59fbe8f8b90154f01dd01f5748) commit.
- In controller-runtime v0.2.2, if a resource does not exist, a controller cannot be created successfully. After updating controller-runtime to v0.5.0, a controller can be created without error. However, when the `Reconcile` method is triggered, there will be an error complaining that the resource does not exist. Therefore, we explicitly check whether a resource exists before creating the corresponding object reconciler in `createObjectReconciler` in `hnc_config.go` (see details in kubernetes-sigs/controller-runtime#840).

Tested:
- Unit tests.
- Went through the [demo script](https://docs.google.com/document/d/1tKQgtMSf0wfT3NOGQx9ExUQ-B8UkkdVZB6m4o3Zqn64/edit#) to make sure HNC behaves as expected on a GKE cluster.
- Manually tested whether the PR solves the restart problem described in kubernetes-retired#488 with the following workflow:
  - Install HNC
  - Install a new CRD
  - Configure the new type in the `config` singleton

  Before this PR, the corresponding object reconciler for the new type is not created unless we restart HNC. After the change, the corresponding object reconciler is created and reconciles objects of the new type as expected without restarting HNC.

This partly solves: kubernetes-retired#488
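The "reload on a cache miss" behavior quoted in the upgrade notes can be illustrated with a small stand-alone sketch. This is not the controller-runtime implementation: `discoverFunc`, `reloadingMapper`, `newReloadingMapper`, and `errUnknownKind` are hypothetical names, kinds are plain strings, and the sketch omits the locking and any rate limiting a production mapper would want.

```go
package main

import (
	"errors"
	"fmt"
)

// errUnknownKind is a hypothetical sentinel for "the cluster does not serve
// this kind, even after a reload".
var errUnknownKind = errors.New("unknown kind")

// discoverFunc is a hypothetical stand-in for an API discovery call: it
// returns the set of kinds the cluster serves right now.
type discoverFunc func() (map[string]bool, error)

// reloadingMapper caches the last discovery result and refreshes it when a
// lookup misses. It is not safe for concurrent use.
type reloadingMapper struct {
	discover discoverFunc
	known    map[string]bool
	reloads  int // number of discovery calls, kept for illustration
}

func newReloadingMapper(d discoverFunc) *reloadingMapper {
	return &reloadingMapper{discover: d}
}

// reload replaces the cached set with a fresh discovery result.
func (m *reloadingMapper) reload() error {
	kinds, err := m.discover()
	if err != nil {
		return fmt.Errorf("discovery failed: %w", err)
	}
	m.known = kinds
	m.reloads++
	return nil
}

// Lookup returns nil if the kind is served. On a cache miss it reloads once
// and checks again, so a CRD installed after startup is picked up without a
// restart. A kind that is still missing after the reload is an error.
func (m *reloadingMapper) Lookup(kind string) error {
	if m.known[kind] {
		return nil
	}
	if err := m.reload(); err != nil {
		return err
	}
	if m.known[kind] {
		return nil
	}
	return fmt.Errorf("%w: %s", errUnknownKind, kind)
}
```

With a mapper like this, a lookup for a kind that was installed after startup succeeds on the next call, which is the property the upgrade relies on. A kind that is still absent keeps failing, which is why the explicit existence check before creating a reconciler is still needed.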
I do feel mildly uncomfortable about inducing manager failure here, but the general policy is that individual controller failure causes general manager failure, so it's in line with that.
Thanks so much for the explanation! See my replies below.
Yes that is correct!
The manager is already started. It looks like in v0.2.2, if the kind does not exist, we cannot create the new controller, but the manager is also not shut down. In the latest version, the controller can be created, but then it will fail at controller start time and cause manager shutdown (basically what you said in the release notes). Thanks for the answers and for updating the release notes.
Hey @DirectXMan12, wdyt about just having Obviously this can wait for a future release of controller-runtime.
… reconciler After upgrading sigs.k8s.io/controller-runtime version to v0.5.0, we can create reconciler successfully even when the resource does not exist in the cluster. Therefore, we explicitly check if the resource exists before creating the reconciler. See detailed discussion in: kubernetes-sigs/controller-runtime#840 Tested: unit tests, GKE cluster
Hmm... The whole weird thing here is that now non-existence is potentially transient, and that it doesn't actually matter if the type doesn't exist on a call to I think this is a question of semantics of On the other hand, delaying it till start doesn't really make sense when the manager is the one in charge of calling start on all the things -- there's no good way to capture the specific failure without wrapping the controller in a new runnable, and there's currently no good way to say "run this and don't stop the manager if it fails". All of this is a long-winded way of saying "I'm not sure". I can't help but think this is a symptom of something being wrong with the overall architecture of manager, but I'm not quite certain what (maybe error handling in Anyone have any thoughts? Making
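To make the "wrap the controller in a new runnable" idea concrete, here is a rough sketch of a wrapper that records a failure instead of propagating it. It is not a controller-runtime feature. The `Runnable` interface, `Tolerant`, `Tolerate`, `runAll`, and the `failing`/`succeeding` helpers are all hypothetical and only mimic the "one failure stops everything" policy discussed in this thread.

```go
package main

import (
	"context"
	"errors"
	"sync"
)

// Runnable is a hypothetical minimal interface for something a manager starts.
type Runnable interface {
	Start(ctx context.Context) error
}

type runnableFunc func(ctx context.Context) error

func (f runnableFunc) Start(ctx context.Context) error { return f(ctx) }

// failing and succeeding build trivial runnables for demonstration.
func failing(msg string) Runnable {
	return runnableFunc(func(context.Context) error { return errors.New(msg) })
}

func succeeding() Runnable {
	return runnableFunc(func(context.Context) error { return nil })
}

// Tolerant wraps a Runnable so that its failure is recorded, not returned.
type Tolerant struct {
	inner Runnable
	mu    sync.Mutex
	err   error
}

// Tolerate returns a wrapper whose Start always returns nil.
func Tolerate(r Runnable) *Tolerant { return &Tolerant{inner: r} }

// Start runs the wrapped runnable and stores any error for later inspection.
func (t *Tolerant) Start(ctx context.Context) error {
	if err := t.inner.Start(ctx); err != nil {
		t.mu.Lock()
		t.err = err
		t.mu.Unlock()
	}
	return nil
}

// Err returns the error captured from the wrapped runnable, if any.
func (t *Tolerant) Err() error {
	t.mu.Lock()
	defer t.mu.Unlock()
	return t.err
}

// runAll mimics a manager: it starts every runnable concurrently, cancels the
// shared context on the first failure, waits for all of them to return, and
// reports that first failure.
func runAll(rs ...Runnable) error {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	errs := make(chan error, len(rs))
	for _, r := range rs {
		go func(r Runnable) { errs <- r.Start(ctx) }(r)
	}
	var first error
	for range rs {
		if err := <-errs; err != nil && first == nil {
			first = err
			cancel()
		}
	}
	return first
}
```

The trade-off is the one raised above: the failure no longer stops the other runnables, but something still has to call `Err()` and decide what to do about it.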
FWIW, doing non-initialization things in
@fejta-bot: Closing this issue.
I want to create a controller for each GVK that I add dynamically. Previously I noticed that if I started the manager and then installed a new CRD on the fly, I would get an error saying the CRD did not exist when I called `NewControllerManagedBy(mgr)...Complete(r)`. Specifically, this line would return an error. The error message was the following, where `CronTab` was a new CRD that I installed after starting the controller manager:

Then I found #554 and updated controller-runtime to the latest version (v0.5.0) to use `DynamicRESTMapper`. After updating, I started the manager without installing the CRD for `CronTab`. I found that I could create the controller for `CronTab` successfully (i.e., this line does not return any error). However, when the `Reconcile` method for the `CronTab` controller was called, I got an error from pkg/source/source.go complaining that the CRD was not found. The error was the following:

My questions are:
Not sure if there is a better way (or common practice) to do that?