fix(controller): Improve resilience to transient errors. Fixes #3791 and #3217 #3800
Conversation
// we must check to see if the pod exists rather than just optimistically creating the pod and seeing if we get
// an `AlreadyExists` error, because we won't get that error if there are not enough resources
obj, exists, err := woc.controller.podInformer.GetStore().Get(cache.ExplicitKey(pod.Namespace + "/" + pod.Name))
This solution depends on the informer being correct when queried here.
If the informer is incorrect, i.e. it believes that a pod has not been created when it in fact has, then the workflow will error.
An alternative solution is to call podInterface.Get(pod.Name) and check what error it returns (i.e. is it apierr.IsNotFound(..)?).
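The "query the API directly" alternative can be sketched as follows. This is a minimal, self-contained illustration of the check-then-create pattern, not the controller's real code: `podStore` is a hypothetical map-backed stand-in for the Kubernetes pod client, and `errNotFound` stands in for the error that `apierr.IsNotFound` would match.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound stands in for the "not found" error the real Kubernetes
// client returns (matched by apierr.IsNotFound in client code).
var errNotFound = errors.New("not found")

// podStore is a hypothetical map-backed stand-in for the pod API.
type podStore map[string]struct{}

func (s podStore) Get(name string) error {
	if _, ok := s[name]; !ok {
		return errNotFound
	}
	return nil
}

func (s podStore) Create(name string) { s[name] = struct{}{} }

// ensurePod asks the store (the "API") directly instead of trusting a
// possibly stale informer cache: create only when Get reports "not found".
func ensurePod(s podStore, name string) (created bool) {
	if err := s.Get(name); errors.Is(err, errNotFound) {
		s.Create(name)
		return true
	}
	return false
}

func main() {
	s := podStore{}
	fmt.Println(ensurePod(s, "my-pod")) // first call creates the pod
	fmt.Println(ensurePod(s, "my-pod")) // second call sees it already exists
}
```

The trade-off the reviewer raises still applies: going to the API avoids informer staleness, but each reconciliation costs an extra API request.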
@@ -56,6 +56,7 @@ AUTH_MODE := client
 endif
 K3D := $(shell if [ "`which kubectl`" != '' ] && [ "`kubectl config current-context`" = "k3s-default" ]; then echo true; else echo false; fi)
 LOG_LEVEL := debug
+UPPERIO_DB_DEBUG := 0
Disabled the SQL logging by default (I found it confusing).
@@ -2,6 +2,6 @@ apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization

 resources:
-  - minio-pod.yaml
+  - minio-deployment.yaml
Changing to a Deployment allows us to disable MinIO by scaling it down.
@@ -0,0 +1,9 @@
+apiVersion: v1
+kind: PersistentVolumeClaim
We need a PVC so the data survives being scaled down and up.
@@ -5,7 +5,7 @@ metadata:
 data:
   containerRuntimeExecutor: pns
   executor: |
-    imagePullPolicy: Never
+    imagePullPolicy: IfNotPresent
Changes the default image pull policy for tests.
// * Fails the `reapplyUpdate` pre-condition - it can never recover.
// * It will double the number of Kubernetes API requests.
if woc.orig.ResourceVersion != woc.wf.ResourceVersion {
	panic("cannot re-apply update with unequal original and modified resource versions")
Instead of panicking, can we re-queue the workflow and return? Otherwise, log monitors (like Splunk) will keep reporting the panic.
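The "re-queue and return" pattern the reviewer suggests can be sketched like this. It is a simplified, self-contained illustration, not the controller's actual code: the `workflow` struct and the `requeue` callback are hypothetical stand-ins for the real Workflow object and workqueue.

```go
package main

import "fmt"

// workflow is a hypothetical stand-in for the controller's Workflow object,
// reduced to the two resource versions the pre-condition compares.
type workflow struct {
	name            string
	origVersion     string
	modifiedVersion string
}

// reapplyUpdate returns an error and re-queues instead of panicking when the
// pre-condition fails, so log monitors don't keep reporting a crash.
func reapplyUpdate(wf workflow, requeue func(string)) error {
	if wf.origVersion != wf.modifiedVersion {
		requeue(wf.name) // put the workflow back on the queue for a retry
		return fmt.Errorf("resource version conflict for %q, re-queued", wf.name)
	}
	return nil // versions match; safe to re-apply the update
}

func main() {
	var queue []string
	wf := workflow{name: "wf-1", origVersion: "100", modifiedVersion: "101"}
	if err := reapplyUpdate(wf, func(n string) { queue = append(queue, n) }); err != nil {
		fmt.Println(err)
	}
	fmt.Println("queue:", queue)
}
```

The design point is that a failed pre-condition becomes a recoverable event (an error plus a retry) rather than a process-level panic.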
Broken up into smaller PRs that will be easier to review and to revert if/when needed.
Checklist:
"fix(controller): Updates such and such. Fixes #1234"
Fixes #3791
Fixes #3217
Fixes #3645
See #1913
See #3812
See #3745
Depends on #3846
Changes
- Tolerate `exceeded quota` errors. Fixes #3791 #3851
- Use `IsTransientErr` to tolerate transient errors.