Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add options to purge big workers #3252

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

TyeMcQueen
Copy link

Described in Issue #3251

@TyeMcQueen TyeMcQueen force-pushed the 3251-purge-big-workers branch 4 times, most recently from 4094992 to d7b3c4d Compare July 30, 2024 09:20
@TyeMcQueen TyeMcQueen force-pushed the 3251-purge-big-workers branch from d7b3c4d to 6e1ca03 Compare July 30, 2024 09:23
Comment on lines +613 to +615
worker = self.worker_class(
self.worker_age, self.pid, self.LISTENERS, self.app,
self.timeout / 2.0, self.cfg, self.log)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like unrelated whitespace changes

elif self.next_prune <= time.monotonic():
self.prune_worker()
self.next_prune += self.cfg.prune_seconds * (
0.95 + 0.10 * random.random())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does not appear to match documentation: 0.95+0.10 = 1.05 > 100%. But the general idea of averaging out late firing timers seems much more reasonable, so I'd change documentation to clarify that its not the pause between invocations but the average interval.

Comment on lines +207 to +213
if 0 < self.cfg.prune_seconds:
if self.next_prune is None:
self.next_prune = time.monotonic() + self.cfg.prune_seconds
elif self.next_prune <= time.monotonic():
self.prune_worker()
self.next_prune += self.cfg.prune_seconds * (
0.95 + 0.10 * random.random())
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would make sense to put all this logic into prune_worker? I mean, something like:

    def prune_worker(self):
        """\
        Kill the worker with highest prune score
        """
        if self.cfg.prune_seconds <= 0:
            return

        if self.next_prune is None:
            self.next_prune = time.monotonic() + self.cfg.prune_seconds

        if time.monotonic() < self.next_prune:
            return
        
        maxi = self.cfg.prune_floor
        victim = 0
        for pid in list(self.WORKERS.keys()):
            score = self.cfg.prune_function(pid)
            if maxi < score:
                maxi = score
                victim = pid
        if victim != 0:
            self.log.info(f"Pruning worker (pid: {victim}) with score {score}")
            self.kill_worker(victim, signal.SIGTERM)

        self.next_prune += self.cfg.prune_seconds * random.uniform(0.95, 1.05)

... and then call self.prune_worker() just after self.maybe_promote_master()? This has the benefit of keeping all worker prune logic at the same place.

I had two other minor suggestions as well (incorporated in the proposed method above):

  • Would random.uniform(0.95, 1.05) work instead of 0.95 + 0.10 * random.random()? I recon it's probably the same, but the former is a little bit shorter.
  • Since we're not interested in the workers themselves but only their PIDs, perhaps calling self.WORKERS.keys() would suffice?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants