Fixed: Queued jobs are not considered in deferring logic #7907
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
# Copyright (C) 2018-2022 Intel Corporation | ||
# Copyright (C) 2022-2023 CVAT.ai Corporation | ||
# Copyright (C) 2022-2024 CVAT.ai Corporation | ||
# | ||
# SPDX-License-Identifier: MIT | ||
|
||
|
@@ -14,8 +14,11 @@ | |
from datetime import datetime | ||
from tempfile import NamedTemporaryFile | ||
from textwrap import dedent | ||
from contextlib import suppress | ||
|
||
import django_rq | ||
from rq.command import send_stop_job_command | ||
from rq.exceptions import InvalidJobOperation | ||
from attr.converters import to_bool | ||
from django.conf import settings | ||
from django.contrib.auth.models import User | ||
|
@@ -2965,6 +2968,7 @@ def _export_annotations( | |
tasks_update = list(map(lambda db_task: timezone.localtime(db_task.updated_date), db_instance.tasks.all())) | ||
last_instance_update_time = max(tasks_update + [last_instance_update_time]) | ||
|
||
|
||
timestamp = datetime.strftime(last_instance_update_time, "%Y_%m_%d_%H_%M_%S") | ||
is_annotation_file = rq_id.startswith('export:annotations') | ||
|
||
|
@@ -2975,6 +2979,8 @@ def _export_annotations( | |
# in case the server is configured with ONE_RUNNING_JOB_IN_QUEUE_PER_USER | ||
# we have to enqueue dependent jobs after canceling one | ||
rq_job.cancel(enqueue_dependents=settings.ONE_RUNNING_JOB_IN_QUEUE_PER_USER) | ||
with suppress(InvalidJobOperation): | ||
send_stop_job_command(rq_job.connection, rq_job.id) | ||
bsekachev marked this conversation as resolved.
Ensure proper handling of [...] The function [...]
Really? Check line number 20
@bsekachev, you are correct. The [...]
Analysis chain. The following script was executed for the analysis:
#!/bin/bash
# Search for the definition or import of 'send_stop_job_command' in the file 'cvat/apps/engine/views.py'
rg "send_stop_job_command" cvat/apps/engine/views.py
Length of output: 165
|
||
rq_job.delete() | ||
else: | ||
if rq_job.is_finished: | ||
|
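The hunk above cancels the job and then tries to stop it, ignoring `InvalidJobOperation`. A minimal sketch of that pattern follows. It does not import RQ: `InvalidJobOperation` and `send_stop` here are hypothetical stand-ins for `rq.exceptions.InvalidJobOperation` and `rq.command.send_stop_job_command`, and the assumption that the stop command raises when the job is not currently executing is mine, not something the diff states.

```python
from contextlib import suppress


class InvalidJobOperation(Exception):
    """Stand-in for rq.exceptions.InvalidJobOperation (assumption, RQ is not imported)."""


def send_stop(running_ids: set, job_id: str) -> None:
    """Hypothetical stand-in for rq.command.send_stop_job_command."""
    if job_id not in running_ids:
        # Assumed behaviour: stopping a job that is not executing is an error.
        raise InvalidJobOperation("Job is not currently executing")
    running_ids.discard(job_id)


def stop_if_running(running_ids: set, job_id: str) -> bool:
    """Return True if a stop was sent, False if the job was not running."""
    stopped = False
    with suppress(InvalidJobOperation):
        send_stop(running_ids, job_id)
        stopped = True  # reached only when send_stop did not raise
    return stopped
```

With `suppress`, a job that is still queued (or already finished) falls through silently, so the caller can go on to delete it in either case.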
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,16 +1,34 @@ | ||
# Copyright (C) 2018-2022 Intel Corporation | ||
# Copyright (C) 2022-2023 CVAT.ai Corporation | ||
# Copyright (C) 2022-2024 CVAT.ai Corporation | ||
# | ||
# SPDX-License-Identifier: MIT | ||
|
||
import os | ||
|
||
import signal | ||
from rq import Worker | ||
from rq.worker import StopRequested | ||
|
||
import cvat.utils.remote_debugger as debug | ||
|
||
class CVATWorker(Worker): | ||
# may be called from work horse's perform_job::except block | ||
# or from parent's Worker::monitor_work_horse_process | ||
# if parent process sees that work-horse is dead | ||
|
||
# This modification ensures that jobs stopped intentionally | ||
# do not get their status updated or placed in the failed registry | ||
# because the main server code deletes them anyway | ||
def handle_job_failure(self, job, queue, **kwargs): | ||
# pylint: disable=access-member-before-definition | ||
if self._stopped_job_id == job.id: | ||
self._stopped_job_id = None | ||
self.set_current_job_id(None) | ||
else: | ||
super().handle_job_failure(job, queue, **kwargs) | ||
I see one problem here: a started RQ job can also be stopped from the Django admin panel, and with these changes such a job won't be handled correctly.
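To make the behaviour under discussion concrete, here is a small sketch of the override with RQ replaced by a hypothetical `BaseWorker` stand-in (its `failed` list plays the role of the failed-job registry; none of these names come from RQ itself). It is only an illustration of the branch logic, not the PR's code.

```python
class BaseWorker:
    """Hypothetical stand-in for rq.Worker with only what the sketch needs."""

    def __init__(self):
        self._stopped_job_id = None
        self.current_job_id = None
        self.failed = []  # stands in for the failed-job registry

    def set_current_job_id(self, job_id):
        self.current_job_id = job_id

    def handle_job_failure(self, job_id):
        self.failed.append(job_id)


class StopAwareWorker(BaseWorker):
    def handle_job_failure(self, job_id):
        if self._stopped_job_id == job_id:
            # Intentional stop: reset worker state, skip failure bookkeeping.
            self._stopped_job_id = None
            self.set_current_job_id(None)
        else:
            super().handle_job_failure(job_id)
```

In this sketch the first branch is taken for any job whose ID matches `_stopped_job_id`, regardless of who requested the stop, which appears to be the situation the comment above is concerned about.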
||
|
||
|
||
DefaultWorker = Worker | ||
|
||
DefaultWorker = CVATWorker | ||
|
||
|
||
class BaseDeathPenalty: | ||
|
@@ -24,7 +42,7 @@ def __exit__(self, exc_type, exc_value, traceback): | |
pass | ||
|
||
|
||
class SimpleWorker(Worker): | ||
class SimpleWorker(CVATWorker): | ||
""" | ||
Allows to work with at most 1 worker thread. Useful for debugging. | ||
""" | ||
|
@@ -46,6 +64,27 @@ def execute_job(self, *args, **kwargs): | |
|
||
return self.perform_job(*args, **kwargs) | ||
|
||
def kill_horse(self, sig: signal.Signals = signal.SIGTERM): | ||
# In debug mode we send SIGTERM instead of the default SIGKILL, | ||
# because SIGKILL is not handled (and can't be handled) by RQ code, and it kills the debug process from VSCode. | ||
# All three signals (SIGKILL, SIGTERM, SIGINT) are regularly used in RQ code. | ||
super().kill_horse(sig) | ||
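The comment's point that SIGKILL "can't be handled" can be checked with the standard `signal` module alone. The helper below is a hypothetical illustration (`can_install_handler` is not part of RQ or CVAT) and assumes a POSIX system and that it is called from the main thread.

```python
import signal


def can_install_handler(sig: signal.Signals) -> bool:
    """Report whether this process may install a handler for `sig`."""
    current = signal.getsignal(sig)
    if current is None:
        # Handler was installed outside Python; leave it untouched.
        return False
    try:
        # Re-install the current handler, so nothing actually changes.
        signal.signal(sig, current)
    except (OSError, ValueError):
        # OSError: the OS refuses (e.g. SIGKILL); ValueError: not the main thread.
        return False
    return True
```

A process can therefore react to SIGTERM (clean up, let a debugger stay attached), whereas SIGKILL terminates it without giving any Python code a chance to run.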
|
||
def handle_exception(self, *args, **kwargs): | ||
# In a production environment the worker sends SIGKILL to the work horse process and this method is never called. | ||
# For development, however, we override the signal so that SIGTERM is sent to the process. | ||
# This modification ensures that exceptions are handled differently | ||
# when the job is stopped intentionally, avoiding incorrect exception handling. | ||
|
||
# PROBLEM: default "handle_exception" code saves meta with exception | ||
# It leads to bugs: | ||
# - we create another job with the same ID in the server process | ||
# - when exception message is saved in worker code, it also saves outdated datetime value as part of meta information | ||
# - this outdated value is then used in server code | ||
is_stopped_export_job = isinstance(args[2], (StopRequested, SystemExit)) | ||
if not is_stopped_export_job: | ||
super().handle_exception(*args, **kwargs) | ||
bsekachev marked this conversation as resolved.
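A stripped-down sketch of the `handle_exception` filter, again without importing RQ. `StopRequested` and `BaseWorker` below are stand-ins, and the sketch assumes the positional arguments arrive as `(job, exc_type, exc_value, traceback)`, which is what indexing `args[2]` in the diff implies.

```python
class StopRequested(Exception):
    """Stand-in for rq.worker.StopRequested (assumption, RQ is not imported)."""


class BaseWorker:
    """Hypothetical stand-in for rq.Worker that records handled exceptions."""

    def __init__(self):
        self.recorded = []

    def handle_exception(self, job, *exc_info):
        # exc_info == (exc_type, exc_value, traceback)
        self.recorded.append((job, exc_info[1]))


class QuietStopWorker(BaseWorker):
    def handle_exception(self, *args, **kwargs):
        # args == (job, exc_type, exc_value, traceback), so args[2] is the instance.
        is_stop = isinstance(args[2], (StopRequested, SystemExit))
        if not is_stop:
            super().handle_exception(*args, **kwargs)
```

Exceptions that merely signal an intentional stop are dropped, so nothing is written back for a job the server is about to delete; every other exception still reaches the default handler.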
|
||
|
||
|
||
if debug.is_debugging_enabled(): | ||
class RemoteDebugWorker(SimpleWorker): | ||
|
I added these settings previously, but now it looks like they cause more problems than they solve.