-
Notifications
You must be signed in to change notification settings - Fork 420
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Thread leaks in TimerTask for long-executing jobs #639
Comments
Thanks I'll have a look. Unless you would have a time to dig in? |
Sorry, I don't have time at the moment to investigate. |
@waynerobinson @pitr-ch Hi guys, I would like to fix the bug. I have reproduced the problem by using the following code: task = Concurrent::TimerTask.new(execution_interval: 2, timeout_interval: 1) do
sleep 5
end
task.execute
loop do
sleeping = Thread.list.count do |t|
t.status == "sleep"
end
puts "total threads #{Thread.list.count}, sleeping #{sleeping}"
sleep 1
end I can see the total number of threads is increasing with only two threads are running. One is the main thread, the other is the worker inside the thread pool which executes the task. It can be different from time to time as there are many workers competing for the task. My initial guess is inside the thread pool, somehow workers are being created constantly. Since I am a total newbie on this project, the source code of timer task and scheduled task is a bit over whelming to me. Would you mind giving me some hints or brief information about the creation of workers? |
@waynerobinson Sorry, just saw your comments on #526. You mentioned nothing actually seems to kill the running task if timeout. I guess I have to find somewhere to kill the running worker. |
…d timer_task timeout specs Before this commit when a TimerTask task exceeded the timeout the job kept running. That could lead to thread leaking. Related to ruby-concurrency#639 This PR changes the TimerTask executor to be a SingleThreadExecutor. So whenever the task doesn't finish in time, the execution thread is killed (in this case the pool is killed) The spec ensures that this new behavior is working correctly. It fails on master. If the main job isn't killed after the timeout, the latch.wait(2) would raise, since the main task sleeps for 5 seconds. Closes ruby-concurrency#639
…d timer_task timeout specs Before this commit when a TimerTask task exceeded the timeout the job kept running. That could lead to thread leaking. Related to ruby-concurrency#639 This PR changes the TimerTask executor to be a SingleThreadExecutor. So whenever the task doesn't finish in time, the execution thread is killed (in this case the pool is killed) The spec ensures that this new behavior is working correctly. It fails on master. If the main job isn't killed after the timeout, the latch.wait(2) would raise, since the main task sleeps for 5 seconds. Closes ruby-concurrency#639
…d timer_task timeout specs Before this commit when a TimerTask task exceeded the timeout the job kept running. That could lead to thread leaking. Related to ruby-concurrency#639 This PR changes the TimerTask executor to be a SingleThreadExecutor. So whenever the task doesn't finish in time, the execution thread is killed (in this case the pool is killed) The spec ensures that this new behavior is working correctly. It fails on master. If the main job isn't killed after the timeout, the latch.wait(2) would raise, since the main task sleeps for 5 seconds. Closes ruby-concurrency#639
Great! Thanks @rubemz. I'll have a look. |
Any update here? Is this something that may get resolved anytime soon? |
I did the following: # frozen_string_literal: true
module SidekiqUniqueJobs
# @see [Concurrent::TimerTask] https://www.rubydoc.info/gems/concurrent-ruby/Concurrent/TimerTask
#
class TimerTask < ::Concurrent::TimerTask
private
def ns_initialize(opts, &task)
set_deref_options(opts)
self.execution_interval = opts[:execution] || opts[:execution_interval] || EXECUTION_INTERVAL
self.timeout_interval = opts[:timeout] || opts[:timeout_interval] || TIMEOUT_INTERVAL
@run_now = opts[:now] || opts[:run_now]
@executor = Concurrent::RubySingleThreadExecutor.new
@running = Concurrent::AtomicBoolean.new(false)
@task = task
@value = nil
self.observers = Concurrent::Collection::CopyOnNotifyObserverSet.new
end
# @!visibility private
def execute_task(completion) # rubocop:disable Metrics/MethodLength
return nil unless @running.true?
timeout_task = ->{ timeout_task(completion) }
Concurrent::ScheduledTask.execute(
timeout_interval,
args: [completion],
&timeout_task
)
@thread_completed = Concurrent::Event.new
@value = @reason = nil
@executor.post do
@value = @task.call(self)
rescue Exception => ex # rubocop:disable Lint/RescueException
@reason = ex
ensure
@thread_completed.set
end
@thread_completed.wait
if completion.try?
schedule_next_task
time = Time.now
observers.notify_observers do
[time, value, @reason]
end
end
nil
end
# @!visibility private
def timeout_task(completion)
return unless @running.true?
return unless completion.try?
@executor.kill
@executor.wait_for_termination
@executor = Concurrent::RubySingleThreadExecutor.new
@thread_completed.set
schedule_next_task
observers.notify_observers(Time.now, nil, Concurrent::TimeoutError.new)
end
end
end It seems to fix the issue for me for now. Any chance we could see something like this implemented in the actual TimerTask? Doesn't seem correct that a timer task should ever be allowed to use more than one thread? mhenrixon/sidekiq-unique-jobs#576 |
I think the right approach here would be to remove the timeout entirely from the TimerTask class. Given that the TimerTask doesn't actually do anything after the timeout has passed it doesn't make sense that TimerTask exists. And as described the timeout is causing thread leakage. In #749 it's mentioned that killing threads on timeout is not an option. |
For long-executing jobs and small
execution_interval
values,TimerTask
will leak threads at the rate ofexecution_interval
.timeout_interval
, whilst documented, doesn't seem to be referenced anywhere. There seems to be an old (but unmerged) PR (#526) that at least solves the problem of timeouts running at the rate ofexecution_interval
instead oftimeout_interval
.But that still wouldn't solve the problem of a long-running task still being rescheduled every
timeout_interval
, even if it couldn't be explicitly killed off. Nor canTimerTask
be run without a timeout interval, if you were happy to just let tasks run until they're complete and reschedule them for another period afterwards.The text was updated successfully, but these errors were encountered: