-
Notifications
You must be signed in to change notification settings - Fork 299
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Limit number of alertmanager alerts in alert group to autoresolve #1779
Conversation
@@ -145,7 +145,7 @@ class AlertGroup(AlertGroupSlackRenderingMixin, EscalationSnapshotMixin, models. | |||
"GroupData", ["is_resolve_signal", "group_distinction", "web_title_cache", "is_acknowledge_signal"] | |||
) | |||
|
|||
SOURCE, USER, NOT_YET, LAST_STEP, ARCHIVED, WIPED, DISABLE_MAINTENANCE = range(7) | |||
SOURCE, USER, NOT_YET, LAST_STEP, ARCHIVED, WIPED, DISABLE_MAINTENANCE, NOT_YET_STOP_AUTORESOLVE = range(8) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not the most elegant solution, but we already have source NOT_YET
, so adding NOT_YET_STOP_AUTORESOLVE
.
I don't want to add the new flag for that
@@ -22,6 +22,11 @@ def resolve_alert_group_by_source_if_needed(alert_group_pk): | |||
alert_group.active_resolve_calculation_id | |||
) | |||
else: | |||
if alert_group.resolved_by == alert_group.NOT_YET_STOP_AUTORESOLVE: | |||
return "alert_group is too big to auto-resolve" | |||
if alert_group.alerts.count() > AlertGroupForAlertManager.MAX_ALERTS_IN_GROUP_FOR_AUTO_RESOLVE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if more than N alerts in alert group, set the flag and make the last attempt to autoresolve
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this design!
engine/apps/integrations/tasks.py
Outdated
alert_group = alert.group | ||
if alert_group.resoved_by != alert_group.NOT_YET_STOP_AUTORESOLVE: | ||
task = resolve_alert_group_by_source_if_needed.apply_async((alert.group.pk,), countdown=5) | ||
alert.group.active_resolve_calculation_id = task.id | ||
alert.group.save(update_fields=["active_resolve_calculation_id"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if alert group has flag set, don't even start the task. This will decrease the number of load to the database
@@ -22,6 +22,11 @@ def resolve_alert_group_by_source_if_needed(alert_group_pk): | |||
alert_group.active_resolve_calculation_id | |||
) | |||
else: | |||
if alert_group.resolved_by == alert_group.NOT_YET_STOP_AUTORESOLVE: | |||
return "alert_group is too big to auto-resolve" | |||
if alert_group.alerts.count() > AlertGroupForAlertManager.MAX_ALERTS_IN_GROUP_FOR_AUTO_RESOLVE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this design!
# What this PR does `SOURCE_CHOICES` was updated in #1779 but we forgot to include the accompanying database migration. If you run `make engine-manage CMD="makemigrations"`, this migration is output. This PR simply adds that. ## Checklist - [ ] Unit, integration, and e2e (if applicable) tests updated (N/A) - [ ] Documentation added (or `pr:no public docs` PR label added if not required) (N/A) - [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required) (N/A)
# What this PR does `SOURCE_CHOICES` was updated in #1779 but we forgot to include the accompanying database migration. If you run `make engine-manage CMD="makemigrations"`, this migration is output. This PR simply adds that. ## Checklist - [ ] Unit, integration, and e2e (if applicable) tests updated (N/A) - [ ] Documentation added (or `pr:no public docs` PR label added if not required) (N/A) - [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not required) (N/A)
What this PR does
This PR set the limit so that workers won't attempt to autoresolve too big alertmanager alert groups.
Which issue(s) this PR fixes
Checklist
pr:no public docs
PR label added if not required)CHANGELOG.md
updated (orpr:no changelog
PR label added if not required)