-
Notifications
You must be signed in to change notification settings - Fork 301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use periodic task for heartbeats #2723
Merged
Merged
Changes from 3 commits
Commits
Show all changes
21 commits
Select commit
Hold shift + click to select a range
6027afd
Use periodic task for heartbeats
iskhakov 01579d2
Merge branch 'dev' into iskhakov/fix-heartbeats
iskhakov 472eceb
Update engine/apps/heartbeat/models.py
iskhakov edbaca3
Merge branch 'dev' into iskhakov/fix-heartbeats
mderynck 2a5ea67
Small fixes to process_heartbeat_task, add celery beat schedule
mderynck 1cdb5b7
Change comments formatting
iskhakov 67456c5
Cleanup and fix tests
iskhakov 07385b0
Edit tests
iskhakov 8417400
Edit tests
iskhakov 4c50497
Fix tests
iskhakov daa3c34
Merge branch 'dev' into iskhakov/fix-heartbeats
iskhakov 9ec678e
Fix tests
iskhakov 5ba1aa7
Fix tests
iskhakov b668e7e
Add a special case for sqlite
iskhakov 0a887cd
Fix
iskhakov f9e9103
Fix for DurationField
iskhakov a6e6db2
Merge branch 'dev' into iskhakov/fix-heartbeats
iskhakov f2705d0
Update views.py
iskhakov e1c120b
Update CHANGELOG.md
iskhakov 6612d98
Small fix
iskhakov b09ccd0
Merge branch 'dev' into iskhakov/fix-heartbeats
iskhakov File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,57 +1,92 @@ | ||
from time import perf_counter | ||
|
||
from celery.utils.log import get_task_logger | ||
from django.db import transaction | ||
from django.db.models import DurationField, ExpressionWrapper, F | ||
from django.db.models.functions import Now | ||
from django.utils import timezone | ||
|
||
from apps.heartbeat.models import IntegrationHeartBeat | ||
from apps.integrations.tasks import create_alert | ||
from common.custom_celery_tasks import shared_dedicated_queue_retry_task | ||
|
||
logger = get_task_logger(__name__) | ||
|
||
|
||
@shared_dedicated_queue_retry_task() | ||
def integration_heartbeat_checkup(heartbeat_id: int) -> None: | ||
from apps.heartbeat.models import IntegrationHeartBeat | ||
def check_heartbeats() -> None: | ||
""" | ||
Periodic task to check heartbeats status change and create alerts (or auto-resolve alerts) if needed | ||
""" | ||
# Heartbeat is considered enabled if it | ||
# * has timeout_seconds set to non-zero (non-default) value, | ||
# * received at least one checkup (last_heartbeat_time set to non-null value) | ||
enabled_heartbeats = ( | ||
IntegrationHeartBeat.objects.filter(last_heartbeat_time__isnull=False).exclude(timeout_seconds=0) | ||
# Convert integer `timeout_seconds`` to datetime.timedelta `timeout` | ||
# microseconds = seconds * 10**6 | ||
# TODO: consider migrate timeout_seconds from IntegerField to DurationField | ||
.annotate(timeout=(ExpressionWrapper(F("timeout_seconds") * 10**6, output_field=DurationField()))) | ||
) | ||
with transaction.atomic(): | ||
# Heartbeat is considered expired if it | ||
# * is enabled, | ||
# * is not already expired, | ||
# * has not received a checkup for timeout period | ||
expired_heartbeats = enabled_heartbeats.filter(last_heartbeat_time__lte=(timezone.now() - F("timeout"))).filter( | ||
previous_alerted_state_was_life=True | ||
) | ||
# Schedule alert creation for each expired heartbeat after transaction commit | ||
for heartbeat in expired_heartbeats: | ||
transaction.on_commit( | ||
lambda: create_alert.apply_async( | ||
kwargs={ | ||
"title": heartbeat.alert_receive_channel.heartbeat_expired_title, | ||
"message": heartbeat.alert_receive_channel.heartbeat_expired_message, | ||
"image_url": None, | ||
"link_to_upstream_details": None, | ||
"alert_receive_channel_pk": heartbeat.alert_receive_channel.pk, | ||
"integration_unique_data": {}, | ||
"raw_request_data": heartbeat.alert_receive_channel.heartbeat_expired_payload, | ||
}, | ||
) | ||
) | ||
# Update previous_alerted_state_was_life to False | ||
expired_heartbeats.update(previous_alerted_state_was_life=False) | ||
with transaction.atomic(): | ||
# Heartbeat is considered restored if it | ||
# * is enabled, expired, | ||
# * has received a checkup in timeout period from now, | ||
# * was is alerted state (previous_alerted_state_was_life is False) | ||
restored_heartbeats = ( | ||
enabled_heartbeats.select_for_update() | ||
.filter(last_heartbeat_time__gte=(Now() - F("timeout"))) | ||
.filter(previous_alerted_state_was_life=False) | ||
) | ||
# Schedule auto-resolve alert creation for each expired heartbeat after transaction commit | ||
for heartbeat in restored_heartbeats: | ||
transaction.on_commit( | ||
lambda: create_alert.apply_async( | ||
kwargs={ | ||
"title": heartbeat.alert_receive_channel.heartbeat_restored_title, | ||
"message": heartbeat.alert_receive_channel.heartbeat_restored_message, | ||
"image_url": None, | ||
"link_to_upstream_details": None, | ||
"alert_receive_channel_pk": heartbeat.alert_receive_channel.pk, | ||
"integration_unique_data": {}, | ||
"raw_request_data": heartbeat.alert_receive_channel.heartbeat_restored_payload, | ||
}, | ||
) | ||
) | ||
restored_heartbeats.update(previous_alerted_state_was_life=True) | ||
iskhakov marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
IntegrationHeartBeat.perform_heartbeat_check(heartbeat_id, integration_heartbeat_checkup.request.id) | ||
|
||
@shared_dedicated_queue_retry_task() | ||
def integration_heartbeat_checkup(heartbeat_id: int) -> None: | ||
"""Deprecated. TODO: Remove this task after this task cleared from queue""" | ||
pass | ||
|
||
|
||
@shared_dedicated_queue_retry_task() | ||
def process_heartbeat_task(alert_receive_channel_pk): | ||
start = perf_counter() | ||
from apps.heartbeat.models import IntegrationHeartBeat | ||
|
||
with transaction.atomic(): | ||
heartbeats = IntegrationHeartBeat.objects.filter( | ||
alert_receive_channel__pk=alert_receive_channel_pk, | ||
).select_for_update() | ||
if len(heartbeats) == 0: | ||
logger.info(f"Integration Heartbeat for alert_receive_channel {alert_receive_channel_pk} was not found.") | ||
return | ||
else: | ||
heartbeat = heartbeats[0] | ||
heartbeat_selected = perf_counter() | ||
logger.info( | ||
f"IntegrationHeartBeat selected for alert_receive_channel {alert_receive_channel_pk} in {heartbeat_selected - start}" | ||
) | ||
task = integration_heartbeat_checkup.apply_async( | ||
(heartbeat.pk,), | ||
countdown=heartbeat.timeout_seconds + 1, | ||
) | ||
is_touched = heartbeat.last_heartbeat_time is not None | ||
heartbeat.actual_check_up_task_id = task.id | ||
heartbeat.last_heartbeat_time = timezone.now() | ||
update_fields = ["actual_check_up_task_id", "last_heartbeat_time"] | ||
task_started = perf_counter() | ||
logger.info( | ||
f"heartbeat_checkup task started for alert_receive_channel {alert_receive_channel_pk} in {task_started - start}" | ||
) | ||
if is_touched: | ||
state_changed = heartbeat.check_heartbeat_state() | ||
state_checked = perf_counter() | ||
logger.info( | ||
f"state checked for alert_receive_channel {alert_receive_channel_pk} in {state_checked - start}" | ||
) | ||
if state_changed: | ||
update_fields.append("previous_alerted_state_was_life") | ||
heartbeat.save(update_fields=update_fields) | ||
IntegrationHeartBeat.objects.filter( | ||
alert_receive_channel__pk=alert_receive_channel_pk, | ||
).first().update(last_heartbeat=timezone().now()) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be safe to simply drop these two fields in this PR? Or at a minimum remove them from the model without dropping them from the db, so that a subsequent PR we can drop them? (Vadim docs have more info on this)