Thread leaks in TimerTask for long-executing jobs #639

waynerobinson · 2017-03-21T07:26:43Z

For long-executing jobs and small execution_interval values, TimerTask will leak threads at the rate of execution_interval.

timeout_interval, whilst documented, doesn't seem to be referenced anywhere. There seems to be an old (but unmerged) PR (#526) that at least solves the problem of timeouts running at the rate of execution_interval instead of timeout_interval.

But that still wouldn't solve the problem of a long-running task still being rescheduled every timeout_interval, even if it couldn't be explicitly killed off. Nor can TimerTask be run without a timeout interval, if you were happy to just let tasks run until they're complete and reschedule them for another period afterwards.

The text was updated successfully, but these errors were encountered:

pitr-ch · 2017-04-02T18:26:55Z

Thanks I'll have a look. Unless you would have a time to dig in?

pitr-ch · 2017-04-17T10:47:32Z

Sorry, I don't have time at the moment to investigate.

shun0309-zz · 2017-06-20T15:18:26Z

@waynerobinson @pitr-ch Hi guys, I would like to fix the bug. I have reproduced the problem by using the following code:

task = Concurrent::TimerTask.new(execution_interval: 2, timeout_interval: 1) do
  sleep 5
end

task.execute

loop do
  sleeping = Thread.list.count do |t|
    t.status == "sleep"
  end

  puts "total threads #{Thread.list.count}, sleeping #{sleeping}"

  sleep 1
end

I can see the total number of threads is increasing with only two threads are running. One is the main thread, the other is the worker inside the thread pool which executes the task. It can be different from time to time as there are many workers competing for the task.

My initial guess is inside the thread pool, somehow workers are being created constantly. Since I am a total newbie on this project, the source code of timer task and scheduled task is a bit over whelming to me. Would you mind giving me some hints or brief information about the creation of workers?

shun0309-zz · 2017-06-20T15:35:29Z

@waynerobinson Sorry, just saw your comments on #526. You mentioned nothing actually seems to kill the running task if timeout. I guess I have to find somewhere to kill the running worker.

…d timer_task timeout specs Before this commit when a TimerTask task exceeded the timeout the job kept running. That could lead to thread leaking. Related to ruby-concurrency#639 This PR changes the TimerTask executor to be a SingleThreadExecutor. So whenever the task doesn't finish in time, the execution thread is killed (in this case the pool is killed) The spec ensures that this new behavior is working correctly. It fails on master. If the main job isn't killed after the timeout, the latch.wait(2) would raise, since the main task sleeps for 5 seconds. Closes ruby-concurrency#639

rubemz · 2018-07-13T15:53:31Z

@pitr-ch I'm looking at this issue and I'm up to be a contributor. We're using Shoryuken and I believe some problems we're seeing in production may be related to this issue. Please review #749 when you have some time.

…d timer_task timeout specs Before this commit when a TimerTask task exceeded the timeout the job kept running. That could lead to thread leaking. Related to ruby-concurrency#639 This PR changes the TimerTask executor to be a SingleThreadExecutor. So whenever the task doesn't finish in time, the execution thread is killed (in this case the pool is killed) The spec ensures that this new behavior is working correctly. It fails on master. If the main job isn't killed after the timeout, the latch.wait(2) would raise, since the main task sleeps for 5 seconds. Closes ruby-concurrency#639

pitr-ch · 2018-07-13T20:40:46Z

Great! Thanks @rubemz. I'll have a look.

ax-slad · 2019-05-01T00:57:56Z

Any update here? Is this something that may get resolved anytime soon?

mhenrixon · 2021-02-15T11:13:34Z

I did the following:

# frozen_string_literal: true

module SidekiqUniqueJobs
  # @see [Concurrent::TimerTask] https://www.rubydoc.info/gems/concurrent-ruby/Concurrent/TimerTask
  #
  class TimerTask < ::Concurrent::TimerTask
    private

    def ns_initialize(opts, &task)
      set_deref_options(opts)

      self.execution_interval = opts[:execution] || opts[:execution_interval] || EXECUTION_INTERVAL
      self.timeout_interval = opts[:timeout] || opts[:timeout_interval] || TIMEOUT_INTERVAL
      @run_now  = opts[:now] || opts[:run_now]
      @executor = Concurrent::RubySingleThreadExecutor.new
      @running  = Concurrent::AtomicBoolean.new(false)
      @task     = task
      @value    = nil

      self.observers = Concurrent::Collection::CopyOnNotifyObserverSet.new
    end

    # @!visibility private
    def execute_task(completion) # rubocop:disable Metrics/MethodLength
      return nil unless @running.true?

      timeout_task = ->{ timeout_task(completion) }

      Concurrent::ScheduledTask.execute(
        timeout_interval,
        args: [completion],
        &timeout_task
      )
      @thread_completed = Concurrent::Event.new

      @value = @reason  = nil
      @executor.post do
        @value = @task.call(self)
      rescue Exception => ex # rubocop:disable Lint/RescueException
        @reason = ex
      ensure
        @thread_completed.set
      end

      @thread_completed.wait

      if completion.try?
        schedule_next_task
        time = Time.now
        observers.notify_observers do
          [time, value, @reason]
        end
      end
      nil
    end

    # @!visibility private
    def timeout_task(completion)
      return unless @running.true?
      return unless completion.try?

      @executor.kill
      @executor.wait_for_termination
      @executor = Concurrent::RubySingleThreadExecutor.new

      @thread_completed.set

      schedule_next_task
      observers.notify_observers(Time.now, nil, Concurrent::TimeoutError.new)
    end
  end
end

It seems to fix the issue for me for now. Any chance we could see something like this implemented in the actual TimerTask? Doesn't seem correct that a timer task should ever be allowed to use more than one thread? mhenrixon/sidekiq-unique-jobs#576

jacobat · 2021-10-27T13:52:47Z

I think the right approach here would be to remove the timeout entirely from the TimerTask class. Given that the TimerTask doesn't actually do anything after the timeout has passed it doesn't make sense that TimerTask exists. And as described the timeout is causing thread leakage. In #749 it's mentioned that killing threads on timeout is not an option.

pitr-ch added this to the 1.1.0 milestone Apr 2, 2017

pitr-ch added the bug A bug in the library or documentation. label Apr 2, 2017

pitr-ch mentioned this issue Apr 2, 2017

TimerTask timeout_interval unused #525

Closed

pitr-ch added the looking-for-contributor label Apr 17, 2017

pitr-ch modified the milestones: 1.1.1, 1.1.0 Apr 17, 2017

pitr-ch closed this as completed Apr 17, 2017

pitr-ch reopened this Apr 17, 2017

pitr-ch modified the milestones: 1.1.1, 1.1.0 Feb 21, 2018

bnferguson mentioned this issue May 25, 2018

Thread leak in logs-drain (potentially due to TimerTask bug) travis-ci/travis-logs#178

Open

pitr-ch added the minor label Jun 18, 2018

pitr-ch added medium-priority Should be done soon. looking-for-contributor We are looking for a contributor to help with this issue. and removed looking-for-contributor labels Jun 29, 2018

pitr-ch removed this from the 1.1.0 milestone Jul 6, 2018

rubemz mentioned this issue Jul 13, 2018

TimerTask: kill the execution when it exceeds the task timeout #749

Closed

pitr-ch added in progress and removed looking-for-contributor We are looking for a contributor to help with this issue. labels Jul 13, 2018

pitr-ch assigned rubemz Jul 15, 2018

pitr-ch added high-priority Should be done ASAP. and removed medium-priority Should be done soon. labels Oct 10, 2018

pitr-ch removed the in progress label May 17, 2019

markiz mentioned this issue Jun 12, 2020

Infinite loop in ruby reaper mhenrixon/sidekiq-unique-jobs#515

Closed

mhenrixon mentioned this issue Feb 10, 2021

Please try v7.0.10 mhenrixon/sidekiq-unique-jobs#571

Closed

jacobat mentioned this issue Oct 27, 2021

Remove timeout from TimerTask #926

Merged

sl0thentr0py mentioned this issue Feb 18, 2022

feat(sessions): Implement automatic session tracking getsentry/sentry-ruby#1715

Merged

9 tasks

chrisseaton closed this as completed in #926 Mar 7, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Thread leaks in TimerTask for long-executing jobs #639

Thread leaks in TimerTask for long-executing jobs #639

waynerobinson commented Mar 21, 2017

pitr-ch commented Apr 2, 2017

pitr-ch commented Apr 17, 2017

shun0309-zz commented Jun 20, 2017

shun0309-zz commented Jun 20, 2017

rubemz commented Jul 13, 2018 •

edited

Loading

pitr-ch commented Jul 13, 2018

ax-slad commented May 1, 2019

mhenrixon commented Feb 15, 2021

jacobat commented Oct 27, 2021 •

edited

Loading

Thread leaks in TimerTask for long-executing jobs #639

Thread leaks in TimerTask for long-executing jobs #639

Comments

waynerobinson commented Mar 21, 2017

pitr-ch commented Apr 2, 2017

pitr-ch commented Apr 17, 2017

shun0309-zz commented Jun 20, 2017

shun0309-zz commented Jun 20, 2017

rubemz commented Jul 13, 2018 • edited Loading

pitr-ch commented Jul 13, 2018

ax-slad commented May 1, 2019

mhenrixon commented Feb 15, 2021

jacobat commented Oct 27, 2021 • edited Loading

rubemz commented Jul 13, 2018 •

edited

Loading

jacobat commented Oct 27, 2021 •

edited

Loading