Summary:
When there are a few thousand tasks in the schedule, beat becomes completely unreliable and unstable: reconfiguration and syncing take a long time, and with frequent schedule changes, hours can pass without a single task being sent to the queue.
Celery Version: 5.3.4
Celery-Beat Version: 2.5.0
Exact steps to reproduce the issue:
1. Register 10k tasks with any config (we only use CronSchedule).
2. Apply a change to the configuration at random intervals (5s-300s), e.g. enabling or disabling tasks, or changing the cron schedule.
3. Observe that beat spends its time reconfiguring and is not applying any tasks.
Are you interested in applying performance-related optimisations?
Should we run multiple instances of beat, maybe with slices of the whole schedule? Are there any caveats?
Any other thoughts?
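To make the slicing question concrete, this is roughly the kind of setup we have in mind. Note that `shard_of` is a made-up helper and sharding is not an existing django-celery-beat feature; each beat instance would only load the tasks whose name hashes into its own shard:

```python
import hashlib

def shard_of(task_name: str, num_shards: int) -> int:
    """Stable shard assignment for a task name.

    md5 (rather than the builtin hash()) keeps the assignment identical
    across processes and restarts, so every beat instance agrees on who
    owns which task.
    """
    digest = hashlib.md5(task_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Each of 4 beat instances would only schedule the tasks in its shard.
tasks = [f"task-{i}" for i in range(10_000)]
shards = {n: [t for t in tasks if shard_of(t, 4) == n] for n in range(4)}

# Every task lands in exactly one shard: full coverage, no overlap.
assert sum(len(names) for names in shards.values()) == len(tasks)
```

One caveat we can already see: every instance must agree on the shard count, and changing that count silently reassigns tasks between instances.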
Detailed information
A few parts are clearly written without performance considerations, like:
def all_as_schedule(...), which could use the prefetch-related API: https://docs.djangoproject.com/en/4.2/ref/models/querysets/#prefetch-related
def sync(...), which could use the bulk-update API: https://docs.djangoproject.com/en/4.2/ref/models/querysets/#django.db.models.query.QuerySet.bulk_update
We actually tried running a forked version with the above modifications, but it was not enough; beat is still very slow and unreliable.
Do you have any general ideas about the correct way to continue the investigation?
This "Writing entries..." part takes most of the time, and it is not clearly database related. I can see a lot of

SELECT "django_celery_beat_periodictasks"."ident", "django_celery_beat_periodictasks"."last_update" FROM "django_celery_beat_periodictasks" WHERE "django_celery_beat_periodictasks"."ident" = 1 LIMIT 21

which I assume comes from def last_change(...), which is called in def schedule_changed(...), which is called from the @property def schedule(...), which is a bit harder to track.