-
Notifications
You must be signed in to change notification settings - Fork 680
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: using domain-qualified finalizers #6023
Conversation
Signed-off-by: Roger Torrentsgenerós <rogert@spotify.com>
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #6023 +/- ##
==========================================
- Coverage 37.03% 36.99% -0.04%
==========================================
Files 1313 1317 +4
Lines 131622 132471 +849
==========================================
+ Hits 48742 49006 +264
- Misses 78652 79210 +558
- Partials 4228 4255 +27
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Signed-off-by: Roger Torrentsgenerós <rogert@spotify.com>
}, | ||
} | ||
|
||
assert.NoError(t, fakeKubeClient.GetClient().Create(ctx, o)) | ||
|
||
p.OnBuildIdentityResource(ctx, tctx.TaskExecutionMetadata()).Return(o, nil) | ||
pluginManager := PluginManager{plugin: &p, kubeClient: fakeKubeClient} | ||
pluginManager := PluginManager{plugin: &p, kubeClient: fakeKubeClient, updateBackoffRetries: 5} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This updateBackoffRetries
parameter ends up being e.updateBackoffRetries
in
retryBackoff := wait.Backoff{
Duration: time.Duration(e.updateBaseBackoffDuration) * time.Millisecond,
Factor: 2.0,
Jitter: 0.1,
Steps: e.updateBackoffRetries,
}
If this is unset (which it was in tests), then Steps: 0
which means wait.ExponentialBackoff(retryBackoff, func() (bool, error)
inside Finalize()
times out almost immediately, which effectively means e.clearFinalizer()
is never called and the finalizer is never removed. The default setting is 5 but I think it's too dangerous a) to allow users to tweak these backoff settings, or b) to use an exponential backoff at all.
Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original context of this change had to do with informers getting stale info in the case of array nodes.
cc: @pvditt
to allow users to tweak these backoff settings
Do you mean we should validate that it's a strictly positive value, right?
to use an exponential backoff at all.
Can you expand on that?
}, | ||
} | ||
|
||
assert.NoError(t, fakeKubeClient.GetClient().Create(ctx, o)) | ||
|
||
p.OnBuildIdentityResource(ctx, tctx.TaskExecutionMetadata()).Return(o, nil) | ||
pluginManager := PluginManager{plugin: &p, kubeClient: fakeKubeClient} | ||
pluginManager := PluginManager{plugin: &p, kubeClient: fakeKubeClient, updateBackoffRetries: 5} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The original context of this change had to do with informers getting stale info in the case of array nodes.
cc: @pvditt
to allow users to tweak these backoff settings
Do you mean we should validate that it's a strictly positive value, right?
to use an exponential backoff at all.
Can you expand on that?
That, or completely remove the ability to set an arbitrary value there. I can picture situations where a too short backoff expires when the k8s client inside the backoff func tries to talk to a busy/lagged/high network latency k8s apiserver.
What I meant is we have to decide first if it's ok to leave to-be-deleted objects in the cluster with a finalizer nobody is going to remove, or not. If that's ok we don't need a backoff, we just need to try once and we have to make sure no ultrafast backoff gives up before the one try has even been performed. But OTOH, if we don't want to leave any garbage behind maybe we should retry infinitely instead. I am up for exponentially spacing the retries not to hammer the apiserver, but I think such retries should keep happening until the removal finally succeeds. However I think all of this is outside the scope of this PR :D |
Signed-off-by: Roger Torrentsgenerós <rogert@spotify.com>
+1 to retry with backoff. We’ve noticed some pods staying in the namespace indefinitely, and we’ve had to manually remove them, which is not great. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM overall, @pvditt mind taking another look as well
I'm in favor of keeping the current retries w/ backoff as is for now. If that update fails after the currently hardcoded retries, it should bubble up in an error. We did run into issues as @pingsutw mentioned, but that was w/ an external service re-adding finalizers after propeller cleared them. Thanks for cleaning up a lot of the finalizers logic - I agree that propeller should only remove flyte specific finalizers. Changes look great. |
Tracking issue
Closes #6019
Why are the changes needed?
Switching to domain-qualified finalizers. Kubernetes introduced a warning in kubernetes/kubernetes#119508 so using old finalizers is harmless today, but updating them for the sake of clean Flyte admin logs and in advance of possible future enforcements of such finalizers.
What changes were proposed in this pull request?
flyte-finalizer
toflyte.org/finalizer
k8s
plugin finalizer fromflyte/flytek8s
toflyte.org/finalizer-k8s
array
plugin finalizer fromflyte/array
toflyte.org/finalizer-array
finalizers.go
andfinalizers_test.go
files and start leveraging the finalizer goodies in the upstreamcontrollerutil
packageHow was this patch tested?
Unit tests were modified to check for the presence/absence of the new finalizer. Some new tests were added and some existing tests were fixed so that
clearFinalizer()
func was actually run.Setup process
Screenshots
Check all the applicable boxes
Related PRs
Docs link