Timed out after 0s while waiting for primed token #601
Comments
Hi @joshhubers, this is actually a feature, not a bug. Please see this comment: #571 (comment). This happens for locks that enforce uniqueness for a specific job during execution. If the server process can't achieve a lock in the allowed time, it logs a warning. I am looking into providing a better mechanism for when this occurs; in general, I am looking at letting users subscribe to an event and deal with it any way they want. Like I wrote in the linked comment, perhaps the log level is wrong, which people find confusing. Anyway, I will improve this next. I have just been swamped with paid work for a while.
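For context, a minimal sketch of where that timeout comes from (the worker is hypothetical; `lock:` and `lock_timeout:` are the gem's documented options, with `lock_timeout` defaulting to 0):

```ruby
require "sidekiq"

# Hedged sketch (hypothetical worker): with an execution lock, a second job
# with the same arguments must acquire the runtime lock before performing.
# lock_timeout is how many seconds it waits; the default of 0 means "do not
# wait", which is what produces "Timed out after 0s while waiting for primed token".
class ProcessFeedWorker
  include Sidekiq::Worker

  sidekiq_options lock: :until_and_while_executing,
                  lock_timeout: 0 # seconds to wait for the primed token

  def perform(feed_id)
    # work that must not run concurrently for the same feed_id
  end
end
```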
Thank you for the quick response. I had read that comment earlier; I guess I'm just struggling to understand why it's happening and what I can do to resolve the issue. Are there other logs that describe the root of the issue I'm missing? Are the client or server queues full for some reason? Am I submitting multiple jobs with non-unique arguments which cause multiple jobs with the same lock digest? Just trying to figure out the root of the issue. Would setting a longer `lock_timeout` help? Thank you for all your hard work.
It might help in certain scenarios; it depends on how many duplicates are coming in at the same time, etc. In your example above you'd just slow things down a little. There is also conflict resolution you can configure, and similar options.
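To make the conflict-resolution remark concrete, a hedged sketch (the worker is invented; `on_conflict` and the strategy names are the gem's documented options):

```ruby
require "sidekiq"

# Hedged sketch: on_conflict chooses what happens when a duplicate is found.
# :until_and_while_executing takes two locks (client + server), so a strategy
# can be set for each; strategy names are from the gem's README.
class SyncAccountWorker
  include Sidekiq::Worker

  sidekiq_options lock: :until_and_while_executing,
                  on_conflict: { client: :log, server: :reschedule }

  def perform(account_id)
    # ...
  end
end
```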
Hello, we have a similar issue with these warning logs, but changing the config value did not seem to help at first. When logging the config value at runtime, I got a different value than the one we had set. Changing the config so that the intended value actually applied resolved the warnings for us.
I've been looking at this issue all week and I can't seem to find the root of it. I've tried logging the worker arguments that are hashed into the unique lock digest, and all the arguments are unique, yet somewhere something is locking the worker out from performing. From logging I can see that I start about 10 workers with unique arguments; it's flaky, but roughly the first two-thirds will execute, and then I get the `Timed out after 0s while waiting for primed token` warning for the rest. I've tried looking in the web interface for anything that would stick out, but I haven't found anything: no exceptions from the workers, no retries occurring, no duplicated arguments or repeated performs. I'm stumped. I have yet to verify it, but it seems like the issue started when we upgraded from version 6 to 7.
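A sketch of the kind of argument logging described above (worker and arguments are invented): the unique digest is derived from the worker class, queue, and lock arguments, so printing the arguments right before enqueueing shows whether two jobs could ever collide.

```ruby
# Hedged sketch (hypothetical worker and args): log what feeds the digest.
args = [42, "feed-7"] # hypothetical unique arguments
Sidekiq.logger.info("enqueueing NestedWorker with #{args.inspect}")
NestedWorker.perform_async(*args)
```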
It should hopefully be solved in v7.0.10.
Fantastic! That was fast! No, thank you @mhenrixon!
Hi @mhenrixon,

Config: (omitted)

Worker: (omitted)

We are still running Sidekiq 5.2.9. Any thoughts or suggestions?
After further investigation, I am seeing this same warning, but the jobs still appear to run, so it seems to be just a warning.
@brycemelvin you are correct, it is just a warning.
I'm also seeing this warning. However, it doesn't seem to be restricted to the situation where an identical job is getting enqueued. I tried simulating this with a test that enqueues exactly 1 job and ran that test in isolation (flushing the Redis cache before and after running the test), and got the warning every time. Using Sidekiq 6.2.1 and sidekiq-unique-jobs 7.0.11. P.S. thank you for the amazing work!
Just ran another test where I enqueued 100 jobs that each updated their own database record. In every case where the warning was received, the database record was not updated, meaning the job itself didn't run. This does not seem to be an ignorable warning.
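A hedged reconstruction of the isolated single-job variant of this test (the worker name is assumed; run only against a disposable Redis database):

```ruby
require "sidekiq"

# Hedged sketch of the single-job repro described above.
Sidekiq.redis(&:flushdb)             # clean slate before the test (test env only!)
UpdateRecordWorker.perform_async(42) # enqueue exactly one job
# run the Sidekiq process, watch for the warning, then verify whether
# record 42 was actually updated
```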
@jerryjohnjacob thanks for the report. I will do some more debugging on the issue given your test example.
Sorry I never circled back on this issue. Upgrading to v7.0.10 seems to have resolved it for me.
@joshhubers That is super helpful feedback! Thank you.
Hey, did you resolve the problem?
@krutovgerman Not exactly. I had to downgrade to the version I was using before, which is quite old, so not a "solution" I would recommend.
@mhenrixon Did some additional testing. Looks like the issue was introduced in v7.0.8. I ran similar tests to what I'd previously mentioned and noticed that the issue of jobs not getting executed started with this version. With v7.0.7, the warnings would appear in cases where duplicate jobs were being enqueued, but at least one of those jobs would get properly executed. Hope this helps.
@mhenrixon: @joshhubers says v7.0.10 solves the problem for him, but it does not for me; I only get the behaviour he described with v7.0.7. So I'm wondering whether the issue is beyond the scope of just sidekiq-unique-jobs, e.g. there might be a dependency on the Sidekiq and/or Ruby versions. For reference, I'm using Sidekiq 6.2.1 and Ruby 2.7.1. If there is any other information that I can provide, please let me know. This is an amazing library and we appreciate you and all the work you do!
I'm getting this warning too. I understand this is the expected behaviour when there is already a job in the queue (the same job with the same args), but what I see is that the lock is not removed after the job finishes; the lock is still listed even after the job has completed. Using Sidekiq 6.2.0, sidekiq-unique-jobs 7.0.11, Ruby 3.0.1. P.S. thank you for this amazing gem @mhenrixon
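One way to observe the leftover lock described above is to inspect the gem's Redis keys directly (a hedged sketch; the `uniquejobs` prefix matches the gem's default key naming):

```ruby
require "sidekiq"

# Hedged sketch: list lock-related keys after the job has finished.
# Anything still listed here matches the "lock is not removed" observation.
Sidekiq.redis do |conn|
  conn.scan_each(match: "uniquejobs:*") { |key| puts key }
end
```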
@monsha @jerryjohnjacob Thanks for the details. I'll do some experiments this week.
This looks like debug information to me: (log excerpt omitted). There is no way I can use it, so its value to me is zero. But our server logs are literally flooded by these entries; we need a way to silence them completely. At the INFO log level it could show something short, like a note that a job with the given args was discarded as a duplicate. But since this is expected behavior, it shouldn't be a warning at all: I want all duplicate jobs to be silently discarded, and I don't need to know about that. It's enough that when a job fails to be queued I get a falsy return value.
@heaven would it work if I set the log level to debug, perhaps? There is also a PR to enable silencing these particular log entries, but I'd prefer to find a more suitable solution.
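For reference, a minimal sketch of the workaround being floated here, using standard Sidekiq logger configuration: once these entries are emitted at DEBUG, a logger kept at INFO will drop them.

```ruby
require "sidekiq"
require "logger"

# Hedged sketch: Sidekiq's logger defaults to INFO, so entries emitted at
# DEBUG are filtered out; switch to Logger::DEBUG to see them again.
Sidekiq.configure_server do |config|
  config.logger.level = Logger::INFO
end
```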
@mhenrixon there are 2 issues I'm facing: the warnings are too noisy, and they don't carry enough useful information.

Regarding the first problem, I think it would make sense to silence the warning when the timeout is 0. Our current settings:

```ruby
config.debug_lua = !Rails.env.prod? # true for debugging
config.lock_info = !Rails.env.prod? # true for debugging
```

For the second problem, I've found that more useful information could be added; e.g. I was missing the item/job that actually triggered the entry.
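For completeness, a minimal sketch of where such flags live, assuming the gem's documented `SidekiqUniqueJobs.configure` entry point and reusing the snippet above:

```ruby
# Minimal sketch: gem-level settings go through SidekiqUniqueJobs.configure.
SidekiqUniqueJobs.configure do |config|
  config.debug_lua = !Rails.env.prod? # verbose Lua script debugging
  config.lock_info = !Rails.env.prod? # store extra info about each lock
end
```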
@heaven I am working on a kind of event system for the gem that, instead of logging, will give you the event that generated the entry and the item/job itself. That way you can do whatever you want with it, and I will remove the logging. Would something like that work better, perhaps?
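To make the idea concrete, a purely hypothetical sketch of what such a subscription could look like; this API did not exist at the time of writing, and every name below is invented for illustration only.

```ruby
# Purely hypothetical API sketch (not the gem's real interface):
# subscribe to a lock event instead of scraping it from the logs.
SidekiqUniqueJobs.on(:lock_timeout) do |item|
  Rails.logger.debug("duplicate skipped: #{item['class']} #{item['args']}")
end
```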
@mhenrixon I'm sure that would be awesome. You could still keep the logging, but in that case make it use the debug level and allow turning it on/off like the config settings above; it might be useful in some scenarios for identifying that there's a problem, while the event system helps to deal with it. In the meantime, if you can allow us to silence this warning, that would be a huge relief, because our logs are bloated.
I will reduce the log level to debug immediately, @heaven. Sorry about the noise.
Released as v7.0.12.
@mhenrixon amazing, thank you, man!
My pleasure @heaven! Always happy to help.
Hi @mhenrixon! Were you able to find a solution for the issue of jobs getting dropped? I retried the same tests with the latest version of the gem and with more records (30,000+ instead of 100). The warnings no longer appear, but the issue still exists: some jobs were dropped and the database records were not updated.
@jerryjohnjacob it is on my radar. I've unfortunately had a bad stroke of luck in the last couple of months: been sick 3 weeks with the flu, lost my dog, and was out another week after tweaking my back. I'm back now and can't wait to dig into some of these problems.
Oh goodness, I'm so sorry to hear that! Please, take your time. No urgency on this matter.
Version 7.1.7 partly fixes a similar issue (#640) where jobs are silently dropped when using the replace strategy. In the original question no strategy is in use, so it is likely also something else.
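For context on the replace strategy mentioned above, a hedged sketch (the worker is invented; `:replace` is one of the gem's documented `on_conflict` strategies):

```ruby
require "sidekiq"

# Hedged sketch: with :replace, a conflicting new job removes the previous
# lock and job and takes their place rather than being dropped.
class ImportUserWorker
  include Sidekiq::Worker

  sidekiq_options lock: :until_executing, on_conflict: :replace

  def perform(user_id)
    # ...
  end
end
```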
@mhenrixon Hope you're doing better now! Just tried v7.1.7. Still seeing the issue. Here are the stats for a job count of 34493 (screenshot omitted). I still wonder if it has something to do with the configuration I'm using. Here are the latest details:

Sidekiq config:

```ruby
Sidekiq::Client.reliable_push!

Sidekiq.configure_client do |config|
  config.client_middleware do |chain|
    chain.add SidekiqUniqueJobs::Middleware::Client
  end
end

Sidekiq.configure_server do |config|
  config.reliable_scheduler!
  config.super_fetch!

  config.client_middleware do |chain|
    chain.add SidekiqUniqueJobs::Middleware::Client
  end

  config.server_middleware do |chain|
    chain.add SidekiqUniqueJobs::Middleware::Server
  end

  SidekiqUniqueJobs::Server.configure(config)
end
```

Worker:

```ruby
class UpdateRecordWorker
  include Sidekiq::Worker

  sidekiq_options lock: :until_and_while_executing, unique_across_queues: true, queue: 'high_priority'

  def perform(record_id)
    Record.find(record_id).update_column(:my_db_column, true)
  end
end
```

Final script:

```ruby
Record.update_all(my_db_column: false) # reset
Record.ids.each { |record_id| UpdateRecordWorker.perform_async(record_id) }
```

This configuration works with v7.0.7 of the gem, the only issue being the warning printed constantly in the logs, e.g. `Timed out after 0s while waiting for primed token`.
@mhenrixon Were you able to look into this afterwards? I ran into the issue reported here and had to upgrade, but I'm still facing the problem of workers randomly failing to run. In the latest test I found that 1978 out of 34493 jobs were not run. This only happens when I add the uniqueness options, so I think we can rule out the root cause coming directly from somewhere else.
@jerryjohnjacob not yet, but I am slowly getting to it. Starting a company and doing full-time freelancing isn't easy, but I finally have a project again that uses the gem, so I'll get to it at some point.
Wow, mega stuff @mhenrixon! 🎉
Describe the bug

When executing some workers we are getting a `Timed out after 0s while waiting for primed token` warning.

Expected behavior

Workers should be executed appropriately.

Current behavior

It appears that jobs are started, the warning is thrown, no worker code is executed, and then the job finishes without executing.

Worker class

(worker code omitted)

Additional context

This warning and the lack of execution happen only sometimes, not always. Inside of `FeedService::Publish.call(live_object)` we create nested workers from an iteration; the number of nested workers ranges from 1 to 20, and they also use `until_and_while_executing` locks. Usually the first few nested workers will execute and the rest will not perform; sometimes even the "top level" worker above does not perform. (A sketch of this setup follows below.)

Configuration:

version: 7.0.9

Any pointers or thoughts would be immensely appreciated! Thank you for your hard work on this spectacular gem!
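To make the reported setup concrete, a hedged sketch of the fan-out pattern described under Additional context (only `FeedService::Publish` and the lock type come from the report; all other names are invented):

```ruby
require "sidekiq"

# Hedged sketch: a locked top-level worker that fans out 1..20 nested
# workers, each also locked with :until_and_while_executing.
class PublishWorker
  include Sidekiq::Worker
  sidekiq_options lock: :until_and_while_executing

  def perform(live_object_id)
    live_object = LiveObject.find(live_object_id) # hypothetical model
    FeedService::Publish.call(live_object)        # enqueues the nested workers
  end
end

class NestedPublishWorker
  include Sidekiq::Worker
  sidekiq_options lock: :until_and_while_executing

  def perform(item_id)
    # per-item work; reportedly the first few run and the rest never perform
  end
end
```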