[Alerting] Explanation and approach to removing the "siem.notification" saved object and rule type #112209
Comments
Pinging @elastic/security-detections-response (Team:Detections and Resp)
Task manager will mark unregistered task types as 'unrecognized' and stop claiming them. Here is the PR that introduced that behavior: #84273 So if you removed the task definition for
We have some precedent for using a task manager task to clean up saved objects. Here is the PR for that: #96971 The reason we did this was because previously, if an action execution failed, we kept the Perhaps a similar approach could be taken to clean up these
From discussions, if we delay this work past 7.16.0 (or never do it), the expected behavior would be that we would have at most 3 dead Saved Objects and task manager would stop doing claims against
Reading through past discussions it seems like we introduced Given that we expect there to be only 3 saved objects, conceptually this fits in better with a task that does cleanup after Kibana has started; there's no strong reason to do this during the migration. Pragmatically, even though we're removing the API in 8.0 we could use this in 7.16 to cleanup Would be interested to hear what the rest of @elastic/kibana-core thinks.
Ya, seems like that's the safest thing to do, and maybe it's time to have some sort of generalized "cleanup" task for alerting, to handle multiple types of things, and hopefully only run once per migration every Kibana version (or on a very long interval).
@rudolf why would we remove this API in 8.0? Users are able to upgrade from 7.x to 8.x without going through 7.last. Wouldn't we put these upgrades at risk?
@kobelb Further context can be found here: #106991 (comment).
Thanks everyone so far. Unless there are any strong objections, our current choice is 🌟 approach 1: delay the work and not do it for 7.16.0, or at the very least move it to the bottom of our 7.16.0 backlog. But please! Keep talking, and we will keep reading.
That's true, OTOH the second main usage of This is definitely not a great way to handle such cleanup, but AFAIK at the moment that's the only one we've got.
If we were to use this API to let type owners register 'cleanup' tasks, we could just make this more official and keep this API for this specific purpose on 8+?
Yeah, it's impossible to remove the risk that plugins at some point create a huge number of saved objects, and we've seen this happen at least 3 times during 7.x. We could hack a fix into core the next time there's a bug, but having an API that makes it easy to prevent upgrade downtime seems worth keeping. So in the short term it's maybe more a question of priorities: is it worth spending time on this now, given that there are only a few teams that could/would use this?
The API is already in place. What needs to be done apart from deciding to keep it in 8.0+?
@rylnd is this issue still necessary given your recent update here: #112327 (comment)?
@mikecote I think not. We've committed to maintaining these legacy saved objects for now, and migrating them to proper actions as users touch the rules. I believe this can be closed.
We removed the legacy notification system during part 1 here:
#109722
Which included the code for the rule type of:
You can see this alert/rule type through the query:
Starting in 7.16.0 we use the proper Kibana alerting notification system for all newly created alerts, so we no longer need to carry around this special rule type 🎉.
However, for the siem.notifications alert type we would like to ensure its removal/retirement to:
.kibana index if this is possible.
siem.notifications is now gone. I have not tested whether ghost fires would happen on an upgrade, but I want to ask: is it possible? At worst, I would expect just errors of the form:
has resulted in Error: Rule type "siem.notifications" is not registered.
but those errors might repeat endlessly. FWIW, I have not seen this on my local system since I removed the code.
Hence we feel it's important enough to try to remove this saved object during an upgrade to 7.16.0+.
I do not know if there is a suggested/recommended way for removing alert types that are no longer used or if there is a kibana core way that is recommended to remove saved objects in general. We have seen this slice of code:
https://github.com/elastic/kibana/blob/master/src/core/server/saved_objects/types.ts#L285
called excludeOnUpgrade, as one viable option, and read through the reasons why it was added for actions and task manager.
However, it appears that you cannot target a specific upgrade, meaning it would run on each and every upgrade. Also, it looks like you can only have one instance of it for your entire saved object, which means that if more rule types are retired in the future, you have to keep appending to or changing the query, or create some kind of "append" query pipe specific to the alert types you want to retire and then give us a way to plug into it.
But taking the query above of:
and making a pull request to add that to the excludeOnUpgrade for the alerting saved object would, on upgrades, remove our siem.notifications.
For task manager it does spawn the task as seen with this query:
We would also like to add this query to task manager's existing query to clean up tasks involving the SIEM notifications and avoid issues, unless there is a better suggested way to let task manager clean things up.
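To make the "append" idea concrete, here is a rough sketch of how filters for retired rule types could be built and OR-ed onto an existing filter. It is not Kibana code: the QueryClause type and the three helper functions are hypothetical, and the field names "alert.alertTypeId" and "task.taskType" and the "alerting:<ruleTypeId>" task type naming are assumptions about how these documents are stored, not something confirmed in this issue.

```typescript
// Minimal local type for an Elasticsearch query clause (illustrative only).
type QueryClause = Record<string, unknown>;

// Retired rule types. Only "siem.notifications" comes from this issue.
const RETIRED_RULE_TYPE_IDS = ['siem.notifications'];

// Assumption: alert saved objects keep their rule type in "alert.alertTypeId".
function retiredAlertsFilter(ruleTypeIds: string[]): QueryClause {
  return {
    bool: {
      must: [
        { term: { type: 'alert' } },
        { terms: { 'alert.alertTypeId': ruleTypeIds } },
      ],
    },
  };
}

// Assumption: alerting tasks are stored with task type "alerting:<ruleTypeId>".
function retiredTasksFilter(ruleTypeIds: string[]): QueryClause {
  return {
    bool: {
      must: [
        { term: { type: 'task' } },
        { terms: { 'task.taskType': ruleTypeIds.map((id) => `alerting:${id}`) } },
      ],
    },
  };
}

// Hypothetical "append" helper: match the existing filter OR any extra filter.
function appendFilters(existing: QueryClause | undefined, ...extra: QueryClause[]): QueryClause {
  const should = existing ? [existing, ...extra] : extra;
  return { bool: { should, minimum_should_match: 1 } };
}
```

With something like this, retiring another rule type later would only mean adding its id to RETIRED_RULE_TYPE_IDS rather than rewriting the whole query.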
Approaches:
🌟 1. We could do nothing for 7.16.0. This seems the safest. If we do not see repeating error messages, our only risk is that on other upgrades something goes wrong with the dead saved objects, such as getting in the way of some other migration. This seems a low risk though. From conversations below and through other communications, it seems like these would just be leftover dead saved objects. Later, when excludeOnUpgrade is either solidified or another mechanism is added, we would then complete this work. If kibana-alerting makes a more generic routine for us to plug into, then we don't have to do any work on our side.
2. We could use excludeOnUpgrade, but with the understanding that it will go away in 8.0.0 and be replaced with something else we migrate to. Risks are that excludeOnUpgrade runs on each upgrade cycle. Also, we have to make a PR against alerting and push it together with any others they already have for their task SO and their alert SOs.
3. We could use a task manager task and, during our plugin "start up", run a one-off task that will clean these up for us. The risk is that we are maintaining code that could get in the way later, or malfunction and remove all tasks or something silly. We would have to write more code for this effort or utilize the small startup migration framework we have in draft mode for this work.
So far it seems like most people are leaning towards approach 🌟 1 unless during an upgrade or testing we do see error messages repeating.
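For reference, the core of approach 3 could look roughly like the sketch below. Everything in it is hypothetical: StoredObject, ObjectStore and createInMemoryStore are stand-ins written for this example, not Kibana's saved objects client or task manager API, and the alertTypeId attribute name is an assumption. It is synchronous only to keep the example short; a real client would be asynchronous. The point is the guard that limits deletion to the retired rule type, which addresses the "remove all tasks or something silly" risk.

```typescript
// Hypothetical, minimal shapes; not Kibana's real saved objects client.
interface StoredObject {
  id: string;
  type: string;
  attributes: { alertTypeId?: string };
}

interface ObjectStore {
  find(type: string): StoredObject[];
  remove(type: string, id: string): void;
}

const LEGACY_RULE_TYPE_ID = 'siem.notifications';

// Deletes only alert objects of the retired rule type; returns the deleted ids.
function cleanupLegacyNotifications(store: ObjectStore): string[] {
  const deleted: string[] = [];
  for (const obj of store.find('alert')) {
    // Guard: never touch anything that is not the retired rule type.
    if (obj.attributes.alertTypeId !== LEGACY_RULE_TYPE_ID) continue;
    store.remove(obj.type, obj.id);
    deleted.push(obj.id);
  }
  return deleted;
}

// In-memory store used only to exercise the sketch.
function createInMemoryStore(initial: StoredObject[]): ObjectStore & { all(): StoredObject[] } {
  let objects = [...initial];
  return {
    find: (type: string) => objects.filter((o) => o.type === type),
    remove: (type: string, id: string) => {
      objects = objects.filter((o) => !(o.type === type && o.id === id));
    },
    all: () => objects,
  };
}
```

Running the cleanup a second time finds nothing left to delete, so a one-off task built this way would be safe to retry.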