Change Plugin.stop signature to allow asynchronous stop lifecycle #83612
Comments
Pinging @elastic/kibana-platform (Team:Platform)
Agree. It doesn't have to be that I'm actually expecting that behavior, but in the opposite way - event log is used by several plugins, and so my hope is that by the time they've One thing we WILL need to do is put a timeout on the
We are stopping (and awaiting) plugins in reverse order, meaning that if pluginA depends on pluginB, pluginA will start AFTER pluginB but will be stopped BEFORE it, so you should be safe regarding this.
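To illustrate the ordering described above, here is a minimal sketch. It is not the actual code in `plugins_system.ts`; the `StoppablePlugin` shape and the `stopInReverseOrder` helper are hypothetical names used only for this example.

```typescript
// Hypothetical minimal plugin shape; not Kibana's actual types.
interface StoppablePlugin {
  name: string;
  stop?(): void | Promise<void>;
}

// Stops plugins in the reverse of the order they were started in,
// awaiting each one before moving on to the next.
// Returns the plugin names in the order they were stopped.
async function stopInReverseOrder(started: StoppablePlugin[]): Promise<string[]> {
  const stopped: string[] = [];
  for (const plugin of started.slice().reverse()) {
    if (plugin.stop) {
      await plugin.stop();
    }
    stopped.push(plugin.name);
  }
  return stopped;
}
```

If pluginA depends on pluginB, the start order is `[pluginB, pluginA]`, so this sketch stops pluginA first and pluginB only after pluginA's `stop` has resolved.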
The lifecycles unblocked RFC didn't actually touch on how we want However, to avoid data loss, we need a way for Kibana to gracefully shut down, like clearing all incoming connection queues or in-progress tasks. I mention this in the saved object migrations RFC, but it isn't totally fleshed out.

The longer Kibana takes to shut down gracefully, the more downtime a customer has when they're trying to upgrade Kibana. If Kibana is running inside an orchestration layer like Elastic Cloud / Kubernetes, then Kibana refusing to shut down after a SIGINT could cause a long downtime until someone manually intervenes. So I think it makes sense to have an upper bound: after SIGINT, Kibana won't spend more than X minutes trying to stop safely.

If we give each plugin a budget of 30s before we run the next plugin's stop, then it becomes hard to know the maximum time shutdown could take, because this depends on how deep the dependency graph is. I'm also not sure what we will do when a plugin's budget is over; we probably just want to go ahead and stop the other plugins in the dependency graph anyway. This means, at the end of the day, there's no guarantee that dependent plugins will be truly stopped, so when we call event log's

After we receive a SIGINT, core will reject all incoming requests so plugins won't accept new work, but a plugin might be busy processing work. So if event logs are truly critical, I think the plugin needs to flush its queue, resolve the stop promise, and then enter a non-queue mode where every incoming event is immediately persisted. That way it keeps accepting and writing logs as much as possible. Maybe this is overkill since a stop timeout is hopefully an edge case?
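The "flush, then switch to non-queue mode" idea could look roughly like the sketch below. This is not the event log plugin's actual implementation; `BufferedEventLogger` and its `write` callback are hypothetical and stand in for whatever persists events.

```typescript
// Hypothetical logger: buffers events while running, writes through after stop.
class BufferedEventLogger {
  private queue: string[] = [];
  private stopped = false;
  private readonly write: (events: string[]) => Promise<void>;

  constructor(write: (events: string[]) => Promise<void>) {
    this.write = write;
  }

  async log(event: string): Promise<void> {
    if (this.stopped) {
      // Non-queue mode: persist immediately, since no later flush will happen.
      await this.write([event]);
      return;
    }
    this.queue.push(event);
  }

  async stop(): Promise<void> {
    // Flip the flag first so events arriving during the flush are written through.
    this.stopped = true;
    const pending = this.queue;
    this.queue = [];
    if (pending.length > 0) {
      await this.write(pending);
    }
  }
}
```

With this shape, a dependent plugin that overruns its stop budget can still call `log` after the logger's own `stop` has resolved, at the cost of one write per event.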
Is waiting one more minute to stop Kibana really an issue when you know the time it takes to perform a 5k+ objects migration?
I can bet that any process unresponsive after a SIGINT in an orchestrated environment will nicely receive a SIGTERM after a given timeout. At least I know that for sure for K8S, or any container-based infra. Having an upper bound / timeout for plugins to stop still of course makes sense.
Quite easy actually.
Seems like the thing to do yea.
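One way to bound a single plugin's `stop` and then move on regardless is to race it against a timer. The sketch below is only an illustration of that idea; `stopWithTimeout` and `StopOutcome` are hypothetical names, not part of Kibana's core.

```typescript
type StopOutcome = 'stopped' | 'timed-out' | 'failed';

// Runs `stop` and reports how it ended, never waiting longer than `timeoutMs`.
// Errors are reported as 'failed' so the caller can carry on with other plugins.
async function stopWithTimeout(
  stop: () => void | Promise<void>,
  timeoutMs: number
): Promise<StopOutcome> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<StopOutcome>((resolve) => {
    timer = setTimeout(() => resolve('timed-out'), timeoutMs);
  });
  const attempt = (async (): Promise<StopOutcome> => {
    try {
      await stop();
      return 'stopped';
    } catch (e) {
      return 'failed';
    }
  })();
  try {
    return await Promise.race([attempt, timeout]);
  } finally {
    // Clear the timer so it does not keep the process alive after a fast stop.
    if (timer !== undefined) {
      clearTimeout(timer);
    }
  }
}
```

Note that a timed-out `stop` keeps running in the background; the race only decides when the caller stops waiting for it.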
The important thing is that we provide users with a fixed number which they can use to, for instance, set their Kubernetes In the example of audit logs, it will have to wait for all the plugins that depend on it before its shutdown handler is called, which means it potentially gets a very small piece of the shutdown budget. One solution could be to introduce a new
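The budget arithmetic behind this concern can be made concrete with a small sketch. The 30s per-plugin figure comes from the discussion above; the chain depth of 6, the 120s total budget, and both function names are assumptions chosen purely for illustration.

```typescript
// Worst case when every plugin in a dependency chain uses its full budget.
function worstCaseChainMs(chainDepth: number, perPluginMs: number): number {
  return chainDepth * perPluginMs;
}

// Budget left for the next plugin when one fixed deadline caps the whole shutdown.
function nextStopBudgetMs(
  totalBudgetMs: number,
  perPluginCapMs: number,
  elapsedMs: number
): number {
  const remainingMs = Math.max(0, totalBudgetMs - elapsedMs);
  return Math.min(perPluginCapMs, remainingMs);
}

// Six plugins deep at 30s each: 180000 ms, with no fixed number to document.
console.log(worstCaseChainMs(6, 30000));

// With a fixed 120s total, a plugin stopped after 100s have elapsed gets only 20000 ms.
console.log(nextStopBudgetMs(120000, 30000, 100000));
```

A fixed total gives users a number to configure against, but, as noted above, plugins that are stopped last (such as a widely depended-on logger) may be left with little or none of it.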
Might, not will. That is assuming that all plugins would be fully consuming their stop timeout period, which seems very far from being realistic:
Honestly, documenting the required grace period to be 3 or 4 minutes would probably cover the 99th percentile of our shutdown scenarios (but even that number is enormous to be honest. Give or take, but I'd bet 1min should be covering 95%). We could probably monitor that periodically to see if we need to update this documented fixed number. A Also, as already stated, as the Kibana process can be terminated for a lot of unexpected reasons (power outage, or more realistically
We already have a
To summarise: here are several teardown cases:
From the list above it doesn't look like the proposed
I think that's about as good as you can get - I'll take it!
## Summary

Fix #83612

This PR doesn't change any behavior, as we're already supporting (and awaiting) promises returned from `stop` calls to plugins; it just changes the type's signature to reflect that.

Also removed empty `stop` methods from existing plugins to make TypeScript happy.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
Initial discussion started because of #80941. In this issue, the `event_log` plugin needs to perform some asynchronous cleanup when Kibana stops.

Our current `Plugin` interface does not declare `stop` as possibly asynchronous:

kibana/src/core/server/plugins/types.ts
Lines 251 to 260 in f8d796a

Even if, technically, we are awaiting plugins when they stop:

kibana/src/core/server/plugins/plugins_system.ts
Lines 169 to 174 in f49ee06

I also know that we have plans to deprecate asynchronous `setup` and `start` methods. However, even when we do that, I have the feeling that allowing an asynchronous `stop` lifecycle would make sense: if `stop` is only synchronous, there is effectively no way to perform async cleanups during this stage (the process will just terminate if the method needs to perform an async call).

So my two questions:

- `Plugin.stop` signature to reflect that we currently allow plugins to return promises and that we do wait for them?
- `setup` and `start`?
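As a rough sketch of the signature change under discussion, a `stop` method typed as optionally asynchronous might look like the following. This is a simplified, hypothetical `PluginLike` interface, not the real `Plugin` type from `types.ts`, and the two example classes are invented for illustration.

```typescript
// Simplified stand-in for a plugin contract (assumption, not Kibana's real type).
interface PluginLike<TSetup = void, TStart = void> {
  setup(): TSetup;
  start(): TStart;
  // Optional, and allowed to return a promise that the platform awaits.
  stop?(): void | Promise<void>;
}

// A plugin that needs async cleanup can now say so in its types.
class EventLogLikePlugin implements PluginLike {
  private pending: string[] = ['event-1', 'event-2'];
  public flushed: string[] = [];

  setup(): void {}
  start(): void {}

  async stop(): Promise<void> {
    for (const event of this.pending) {
      await Promise.resolve(); // stands in for an asynchronous write
      this.flushed.push(event);
    }
    this.pending = [];
  }
}

// A plugin with nothing to clean up can simply omit `stop`.
class NoCleanupPlugin implements PluginLike {
  setup(): void {}
  start(): void {}
}
```

Making `stop` optional is what allows empty `stop` methods to be dropped from plugins that have no teardown work.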