
Best practices for updating a CRD's spec #1128

Closed
dprotaso opened this issue Jun 11, 2018 · 8 comments


dprotaso commented Jun 11, 2018

Once we transition to k8s 1.11 and merge the subresources PR #786, we need some best practices around updating the CRD spec.

The code below is problematic because it may trigger a generation bump when it doesn't need to:

func (c *Controller) updateService(service *v1alpha1.Service) (*v1alpha1.Service, error) {
	serviceClient := c.ElaClientSet.ServingV1alpha1().Services(service.Namespace)
	existing, err := serviceClient.Get(service.Name, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}

	// Check if there is anything to update.
	if !reflect.DeepEqual(existing.Spec, service.Spec) {
		existing.Spec = service.Spec
		return serviceClient.Update(existing)
	}
	return existing, nil
}

Triggering a generation bump is inherently expensive for certain CRDs. For a Configuration it triggers a new Revision and, subsequently, the creation of its pods, services, etc.

Whether a generation bump occurs depends on the CRD definition. In our case it's very likely, since we have nested Kubernetes core types which, although they may be semantically equivalent, will not be considered equal by reflect.DeepEqual.

One approach to resolve this is to use k8s apimachinery's pkg/api/equality for the comparison when updating our CRD's spec.
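
For illustration, the check in the snippet above could become something like the sketch below (equality here refers to k8s.io/apimachinery/pkg/api/equality; the spec assignment is assumed from the surrounding update flow):

// Sketch: equality.Semantic registers custom comparisons (e.g. resource.Quantity,
// metav1.Time), so values that are semantically equivalent but not bitwise equal
// no longer force a spurious Update and generation bump.
if !equality.Semantic.DeepEqual(existing.Spec, service.Spec) {
	existing.Spec = service.Spec
	return serviceClient.Update(existing)
}
return existing, nil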

A second approach is to use the clientset's Patch operation.
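
A minimal sketch of that variant, assuming the 2018-era generated clientset signature Patch(name, patchType, data, subresources...) and a JSON merge patch built by the caller (json and types being encoding/json and k8s.io/apimachinery/pkg/types):

// Sketch: send only the spec we own as a JSON merge patch rather than a full
// Update, so fields defaulted elsewhere are left untouched on the server.
patch, err := json.Marshal(map[string]interface{}{"spec": service.Spec})
if err != nil {
	return nil, err
}
return serviceClient.Patch(service.Name, types.MergePatchType, patch)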

Related:
Best practices for updating [CRD] status (#1107)
Best practices for updating a CRD's metadata (#1127)

google-prow-robot added the area/API and kind/cleanup labels Jun 11, 2018

grantr commented Jun 11, 2018

Currently the webhook uses jsonpatch to compare specs and determine equality:

specPatches, err := jsonpatch.CreatePatch(oldSpecJSON, newSpecJSON)
if err != nil {
	fmt.Printf("Error creating JSON patch:%v", err)
	return err
}
if len(specPatches) > 0 {
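
For context, a self-contained version of that comparison might look like the sketch below; the helper name and the surrounding error handling are assumptions, with jsonpatch referring to github.com/mattbaird/jsonpatch and json to encoding/json:

// Sketch: two specs are considered equal when the JSON patch between their
// serialized forms is empty.
func specsEqual(oldSpec, newSpec v1alpha1.ServiceSpec) (bool, error) {
	oldSpecJSON, err := json.Marshal(oldSpec)
	if err != nil {
		return false, err
	}
	newSpecJSON, err := json.Marshal(newSpec)
	if err != nil {
		return false, err
	}
	specPatches, err := jsonpatch.CreatePatch(oldSpecJSON, newSpecJSON)
	if err != nil {
		return false, err
	}
	return len(specPatches) == 0, nil
}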


mattmoor commented Jul 8, 2018

@dprotaso Where was your example from?

The only places I would expect one of our controllers to update the Spec of one of our CRDs is the Service controller reconciling itself with Route and Configuration. We also now widely use equality.Semantic.DeepEqual, although I haven't triaged what might remain (we should).

For me, the "top of mind" remaining thing here isn't CRD specific: How do we avoid fighting with Spec defaulting done by another controller? This is the main reason we aren't reconciling changes to the Deployments we create in the Revision controller today. :(

dprotaso commented

Where was your example from?

I most likely tweaked the service's updateStatus for illustrative purposes:

func (c *Controller) updateStatus(service *v1alpha1.Service) (*v1alpha1.Service, error) {
	existing, err := c.serviceLister.Services(service.Namespace).Get(service.Name)
	if err != nil {
		return nil, err
	}
	// Check if there is anything to update.
	if !reflect.DeepEqual(existing.Status, service.Status) {
		existing.Status = service.Status
		serviceClient := c.ServingClientSet.ServingV1alpha1().Services(service.Namespace)
		// TODO: for CRD there's no updatestatus, so use normal update.
		return serviceClient.Update(existing)
	}
	return existing, nil
}

This becomes a non-issue once we start using the clientset's UpdateStatus against a 1.11 cluster with subresources enabled on our CRDs.
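
Concretely, the Update call in the snippet above would become an UpdateStatus call (a sketch, assuming the generated clientset gains UpdateStatus once the status subresource is enabled):

// Sketch: with the /status subresource enabled, this write can only touch
// status, so it can never bump metadata.generation.
existing.Status = service.Status
return serviceClient.UpdateStatus(existing)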

How do we avoid fighting with Spec defaulting done by another controller? This is the main reason we aren't reconciling changes to the Deployments we create in the Revision controller today. :(

@grantr and I had a discussion here that might be relevant https://github.com/grantr/serving/pull/3#pullrequestreview-129649072

One thought would be to reconcile updates to a Deployment explicitly using the clientset's Patch operation.
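
A rough sketch of that idea, assuming the pre-context clientset signature Patch(name, patchType, data, subresources...); the field name c.KubeClientSet and the patch payload are illustrative assumptions, not real reconciler logic:

// Sketch: patch only the fields the Revision controller owns so defaults
// filled in by other controllers are not overwritten or fought over.
patch := []byte(`{"spec":{"replicas":1}}`) // illustrative payload
_, err := c.KubeClientSet.AppsV1().Deployments(deployment.Namespace).Patch(
	deployment.Name, types.StrategicMergePatchType, patch)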

dprotaso commented

Also, we could potentially use a Deployment defaulter function when creating our Deployment:

https://github.com/kubernetes/kubernetes/blob/master/pkg/apis/apps/v1/zz_generated.defaults.go#L191
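
The defaulter in that file is SetObjectDefaults_Deployment (presumably the function being linked). A sketch of what using it could look like; importing k8s.io/kubernetes/pkg/apis/apps/v1 directly (aliased below as appsv1defaults) is an assumption here, since many projects copy the relevant defaulting logic instead:

// Sketch: run the upstream defaulter over the Deployment we are about to
// create, so a later semantic comparison against the live object does not
// report server-side defaulting as a diff. desired is the Deployment we built.
appsv1defaults.SetObjectDefaults_Deployment(desired)
if equality.Semantic.DeepEqual(desired.Spec, existing.Spec) {
	return existing, nil // nothing to reconcile
}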

mattmoor commented

@dprotaso do you want to tackle any follow-ups here as part of the 1.11 sub-resource scope?

@mattmoor mattmoor added this to the Needs Triage milestone Nov 28, 2018
dprotaso commented

@mattmoor @mattmoor-sockpuppet I think it's a requirement.

The only thing remaining with respect to this issue is to use UpdateStatus during reconciliation.

@mattmoor mattmoor modified the milestones: Needs Triage, Serving 0.3 Nov 28, 2018
mattmoor commented

/assign @dprotaso

mattmoor commented

Per the above, I think what's left here is tracked by #643, so closing to dedupe.

mgencur referenced this issue in openshift-knative/serving Oct 12, 2022