Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Thread leaks in TimerTask for long-executing jobs #639

Closed
waynerobinson opened this issue Mar 21, 2017 · 9 comments · Fixed by #926
Closed

Thread leaks in TimerTask for long-executing jobs #639

waynerobinson opened this issue Mar 21, 2017 · 9 comments · Fixed by #926
Assignees
Labels
bug A bug in the library or documentation. high-priority Should be done ASAP.

Comments

@waynerobinson
Copy link

For long-executing jobs and small execution_interval values, TimerTask will leak threads at the rate of execution_interval.

timeout_interval, whilst documented, doesn't seem to be referenced anywhere. There seems to be an old (but unmerged) PR (#526) that at least solves the problem of timeouts running at the rate of execution_interval instead of timeout_interval.

But that still wouldn't solve the problem of a long-running task still being rescheduled every timeout_interval, even if it couldn't be explicitly killed off. Nor can TimerTask be run without a timeout interval, if you were happy to just let tasks run until they're complete and reschedule them for another period afterwards.

@pitr-ch pitr-ch added this to the 1.1.0 milestone Apr 2, 2017
@pitr-ch pitr-ch added the bug A bug in the library or documentation. label Apr 2, 2017
@pitr-ch
Copy link
Member

pitr-ch commented Apr 2, 2017

Thanks I'll have a look. Unless you would have a time to dig in?

@pitr-ch pitr-ch modified the milestones: 1.1.1, 1.1.0 Apr 17, 2017
@pitr-ch pitr-ch closed this as completed Apr 17, 2017
@pitr-ch pitr-ch reopened this Apr 17, 2017
@pitr-ch
Copy link
Member

pitr-ch commented Apr 17, 2017

Sorry, I don't have time at the moment to investigate.

@shun0309-zz
Copy link

@waynerobinson @pitr-ch Hi guys, I would like to fix the bug. I have reproduced the problem by using the following code:

task = Concurrent::TimerTask.new(execution_interval: 2, timeout_interval: 1) do
  sleep 5
end

task.execute

loop do
  sleeping = Thread.list.count do |t|
    t.status == "sleep"
  end

  puts "total threads #{Thread.list.count}, sleeping #{sleeping}"

  sleep 1
end

I can see the total number of threads is increasing with only two threads are running. One is the main thread, the other is the worker inside the thread pool which executes the task. It can be different from time to time as there are many workers competing for the task.

My initial guess is inside the thread pool, somehow workers are being created constantly. Since I am a total newbie on this project, the source code of timer task and scheduled task is a bit over whelming to me. Would you mind giving me some hints or brief information about the creation of workers?

@shun0309-zz
Copy link

@waynerobinson Sorry, just saw your comments on #526. You mentioned nothing actually seems to kill the running task if timeout. I guess I have to find somewhere to kill the running worker.

@pitr-ch pitr-ch modified the milestones: 1.1.1, 1.1.0 Feb 21, 2018
@pitr-ch pitr-ch added the minor label Jun 18, 2018
@pitr-ch pitr-ch added medium-priority Should be done soon. looking-for-contributor We are looking for a contributor to help with this issue. and removed looking-for-contributor labels Jun 29, 2018
@pitr-ch pitr-ch removed this from the 1.1.0 milestone Jul 6, 2018
rubemz added a commit to rubemz/concurrent-ruby that referenced this issue Jul 13, 2018
…d timer_task timeout specs

Before this commit when a TimerTask task exceeded the timeout the job
kept running. That could lead to thread leaking. Related to ruby-concurrency#639

This PR changes the TimerTask executor to be a SingleThreadExecutor. So
whenever the task doesn't finish in time, the execution thread is killed
(in this case the pool is killed)

The spec ensures that this new behavior is working correctly. It fails
on master. If the main job isn't killed after the timeout, the
latch.wait(2) would raise, since the main task sleeps for 5 seconds.

Closes ruby-concurrency#639
@rubemz
Copy link
Contributor

rubemz commented Jul 13, 2018

@pitr-ch I'm looking at this issue and I'm up to be a contributor. We're using Shoryuken and I believe some problems we're seeing in production may be related to this issue. Please review #749 when you have some time.

rubemz added a commit to rubemz/concurrent-ruby that referenced this issue Jul 13, 2018
…d timer_task timeout specs

Before this commit when a TimerTask task exceeded the timeout the job
kept running. That could lead to thread leaking. Related to ruby-concurrency#639

This PR changes the TimerTask executor to be a SingleThreadExecutor. So
whenever the task doesn't finish in time, the execution thread is killed
(in this case the pool is killed)

The spec ensures that this new behavior is working correctly. It fails
on master. If the main job isn't killed after the timeout, the
latch.wait(2) would raise, since the main task sleeps for 5 seconds.

Closes ruby-concurrency#639
rubemz added a commit to rubemz/concurrent-ruby that referenced this issue Jul 13, 2018
…d timer_task timeout specs

Before this commit when a TimerTask task exceeded the timeout the job
kept running. That could lead to thread leaking. Related to ruby-concurrency#639

This PR changes the TimerTask executor to be a SingleThreadExecutor. So
whenever the task doesn't finish in time, the execution thread is killed
(in this case the pool is killed)

The spec ensures that this new behavior is working correctly. It fails
on master. If the main job isn't killed after the timeout, the
latch.wait(2) would raise, since the main task sleeps for 5 seconds.

Closes ruby-concurrency#639
@pitr-ch
Copy link
Member

pitr-ch commented Jul 13, 2018

Great! Thanks @rubemz. I'll have a look.

@pitr-ch pitr-ch added in progress and removed looking-for-contributor We are looking for a contributor to help with this issue. labels Jul 13, 2018
@pitr-ch pitr-ch added high-priority Should be done ASAP. and removed medium-priority Should be done soon. labels Oct 10, 2018
@ax-slad
Copy link

ax-slad commented May 1, 2019

Any update here? Is this something that may get resolved anytime soon?

@mhenrixon
Copy link

I did the following:

# frozen_string_literal: true

module SidekiqUniqueJobs
  # @see [Concurrent::TimerTask] https://www.rubydoc.info/gems/concurrent-ruby/Concurrent/TimerTask
  #
  class TimerTask < ::Concurrent::TimerTask
    private

    def ns_initialize(opts, &task)
      set_deref_options(opts)

      self.execution_interval = opts[:execution] || opts[:execution_interval] || EXECUTION_INTERVAL
      self.timeout_interval = opts[:timeout] || opts[:timeout_interval] || TIMEOUT_INTERVAL
      @run_now  = opts[:now] || opts[:run_now]
      @executor = Concurrent::RubySingleThreadExecutor.new
      @running  = Concurrent::AtomicBoolean.new(false)
      @task     = task
      @value    = nil

      self.observers = Concurrent::Collection::CopyOnNotifyObserverSet.new
    end

    # @!visibility private
    def execute_task(completion) # rubocop:disable Metrics/MethodLength
      return nil unless @running.true?

      timeout_task = ->{ timeout_task(completion) }

      Concurrent::ScheduledTask.execute(
        timeout_interval,
        args: [completion],
        &timeout_task
      )
      @thread_completed = Concurrent::Event.new

      @value = @reason  = nil
      @executor.post do
        @value = @task.call(self)
      rescue Exception => ex # rubocop:disable Lint/RescueException
        @reason = ex
      ensure
        @thread_completed.set
      end

      @thread_completed.wait

      if completion.try?
        schedule_next_task
        time = Time.now
        observers.notify_observers do
          [time, value, @reason]
        end
      end
      nil
    end

    # @!visibility private
    def timeout_task(completion)
      return unless @running.true?
      return unless completion.try?

      @executor.kill
      @executor.wait_for_termination
      @executor = Concurrent::RubySingleThreadExecutor.new

      @thread_completed.set

      schedule_next_task
      observers.notify_observers(Time.now, nil, Concurrent::TimeoutError.new)
    end
  end
end

It seems to fix the issue for me for now. Any chance we could see something like this implemented in the actual TimerTask? Doesn't seem correct that a timer task should ever be allowed to use more than one thread? mhenrixon/sidekiq-unique-jobs#576

@jacobat
Copy link
Contributor

jacobat commented Oct 27, 2021

I think the right approach here would be to remove the timeout entirely from the TimerTask class. Given that the TimerTask doesn't actually do anything after the timeout has passed it doesn't make sense that TimerTask exists. And as described the timeout is causing thread leakage. In #749 it's mentioned that killing threads on timeout is not an option.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug in the library or documentation. high-priority Should be done ASAP.
Projects
None yet
7 participants