Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Limit number of alertmanager alerts in alert group to autoresolve #1779

Merged
merged 10 commits into from
Apr 24, 2023

Conversation

iskhakov
Copy link
Contributor

@iskhakov iskhakov commented Apr 18, 2023

What this PR does

This PR set the limit so that workers won't attempt to autoresolve too big alertmanager alert groups.

Which issue(s) this PR fixes

Checklist

  • Unit, integration, and e2e (if applicable) tests updated
  • Documentation added (or pr:no public docs PR label added if not required)
  • CHANGELOG.md updated (or pr:no changelog PR label added if not required)

@@ -145,7 +145,7 @@ class AlertGroup(AlertGroupSlackRenderingMixin, EscalationSnapshotMixin, models.
"GroupData", ["is_resolve_signal", "group_distinction", "web_title_cache", "is_acknowledge_signal"]
)

SOURCE, USER, NOT_YET, LAST_STEP, ARCHIVED, WIPED, DISABLE_MAINTENANCE = range(7)
SOURCE, USER, NOT_YET, LAST_STEP, ARCHIVED, WIPED, DISABLE_MAINTENANCE, NOT_YET_STOP_AUTORESOLVE = range(8)
Copy link
Contributor Author

@iskhakov iskhakov Apr 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not the most elegant solution, but we already have source NOT_YET, so adding NOT_YET_STOP_AUTORESOLVE.
I don't want to add the new flag for that

@@ -22,6 +22,11 @@ def resolve_alert_group_by_source_if_needed(alert_group_pk):
alert_group.active_resolve_calculation_id
)
else:
if alert_group.resolved_by == alert_group.NOT_YET_STOP_AUTORESOLVE:
return "alert_group is too big to auto-resolve"
if alert_group.alerts.count() > AlertGroupForAlertManager.MAX_ALERTS_IN_GROUP_FOR_AUTO_RESOLVE:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if more than N alerts in alert group, set the flag and make the last attempt to autoresolve

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this design!

Comment on lines 61 to 65
alert_group = alert.group
if alert_group.resoved_by != alert_group.NOT_YET_STOP_AUTORESOLVE:
task = resolve_alert_group_by_source_if_needed.apply_async((alert.group.pk,), countdown=5)
alert.group.active_resolve_calculation_id = task.id
alert.group.save(update_fields=["active_resolve_calculation_id"])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if alert group has flag set, don't even start the task. This will decrease the number of load to the database

@@ -22,6 +22,11 @@ def resolve_alert_group_by_source_if_needed(alert_group_pk):
alert_group.active_resolve_calculation_id
)
else:
if alert_group.resolved_by == alert_group.NOT_YET_STOP_AUTORESOLVE:
return "alert_group is too big to auto-resolve"
if alert_group.alerts.count() > AlertGroupForAlertManager.MAX_ALERTS_IN_GROUP_FOR_AUTO_RESOLVE:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this design!

@iskhakov iskhakov added the pr:no public docs Added to a PR that does not require public documentation updates label Apr 20, 2023
@iskhakov iskhakov marked this pull request as ready for review April 20, 2023 07:42
@iskhakov iskhakov requested a review from a team April 20, 2023 07:42
@iskhakov iskhakov enabled auto-merge April 24, 2023 05:23
@iskhakov iskhakov removed the pr:no public docs Added to a PR that does not require public documentation updates label Apr 24, 2023
@iskhakov iskhakov added this pull request to the merge queue Apr 24, 2023
Merged via the queue into dev with commit 6e61643 Apr 24, 2023
@iskhakov iskhakov deleted the iskhakov/alertmanager-autoresolve branch April 24, 2023 05:43
joeyorlando added a commit that referenced this pull request May 8, 2023
# What this PR does

`SOURCE_CHOICES` was updated in #1779 but we forgot to include the
accompanying database migration. If you run `make engine-manage
CMD="makemigrations"`, this migration is output. This PR simply adds
that.

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
iskhakov pushed a commit that referenced this pull request May 12, 2023
# What this PR does

`SOURCE_CHOICES` was updated in #1779 but we forgot to include the
accompanying database migration. If you run `make engine-manage
CMD="makemigrations"`, this migration is output. This PR simply adds
that.

## Checklist

- [ ] Unit, integration, and e2e (if applicable) tests updated (N/A)
- [ ] Documentation added (or `pr:no public docs` PR label added if not
required) (N/A)
- [ ] `CHANGELOG.md` updated (or `pr:no changelog` PR label added if not
required) (N/A)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants