4.* is hurting background job workers performance #127

PikachuEXE · 2015-10-13T05:26:57Z

Background

We upgrade the gem to 4.* this week
And we use New Relic to monitor performance

Issue

We are seeing jobs being run 6~7 minutes for all small jobs
But by tailing the live log (STDOUT) of workers, they actually take much less time (< 1 min)
So we look at New Relic to try to find out what's going on, and we found the following data:

Before upgrade

After upgrade

mhenrixon · 2015-10-13T06:38:05Z

Doesn't really say much though. You'd have to share your job configuration as well. Only thing I can think of is if all your jobs are 'unique: :while_executing'. That is a lock that uses a mutex and will keep trying. The rest of the code should be faster since it is doing fewer redis requests. (1 request for lock, 1 for unlock instead of multiple for lock and a gazillion for unlock). I chose the default lock type to be while executing to get the most feedback. Maybe that was a bad choice.

Can you share your sidekick config, how you run your no config eg Foreman and some of the workers in question so that I can create a rails app trying to replicate the problem?

Mikael Henriksson
Sent with Airmail

PikachuEXE · 2015-10-13T07:35:51Z

I remove all config about unique jobs from all my workers.
So that would make all my workers unique by default? (and using while_executing)

My initializer for sidekiq:

require 'try_to'

# To config Redis server, please use REDIS_PROVIDER or REDIS_URL
# https://github.com/mperham/sidekiq/wiki/Advanced-Options#setting-the-location-of-your-redis-server

# Assign new concurrency from ENV if it equals to the default (25)
# @see https://github.com/mperham/sidekiq/wiki/Advanced-Options
DEFAULT_CONCURRENCY = 25
is_default_concurrency = (Sidekiq.options[:concurrency] == DEFAULT_CONCURRENCY)
sidekiq_concurrency_number = is_default_concurrency ? ENV![:SIDEKIQ_CONCURRENCY] : Sidekiq.options[:concurrency]

Sidekiq.configure_server do |config|
  config.redis = {
    namespace: "#{Rails.application.class.parent_name.underscore}:#{Rails.env.to_s.underscore}:sidekiq",
    size: sidekiq_concurrency_number + 1,
    url: Rails.configuration.redis_url,
  }

  # Change DB pool size for ActiveRecord
  # Replace with MongoDB or whatever
  if defined?(ActiveRecord::Base)
    # Note we use `ActiveRecord::Base.connection_pool`
    # Not `ActiveRecord::Base.connection`
    # Not sure why
    ActiveRecord::Base.connection_pool.disconnect!
    # Could be nil in 4.1
    config = ActiveRecord::Base.configurations[Rails.env] || {}
    config['reaping_frequency']   = ENV![:DATABASE_REAPING_FREQUENCY] # seconds
    config['pool']                = sidekiq_concurrency_number + 2 # Give it some buffer
    config['prepared_statements'] = ENV![:DATABASE_USE_PREPARED_STATEMENTS]
    ActiveRecord::Base.establish_connection(config)
    Rails.logger.info('Connected to ActiveRecord in Sidekiq')
  end
end

Sidekiq.configure_client do |config|
  config.redis = {
    namespace: "#{Rails.application.class.parent_name.underscore}:#{Rails.env.to_s.underscore}:sidekiq",
    size: 1, # Client only needs one connection
    url: Rails.configuration.redis_url,
  }
end

Sidekiq.options[:concurrency] = sidekiq_concurrency_number
# Sidekiq got a bug for this default option
# Some gems are depending on the default queue value to override the queue name
# So we don't set it here
# Default queue name is `default` in `String`
Sidekiq.default_worker_options = { retry: false, backtrace: true }.stringify_keys

My sidekiq config files (yes I got multiple files) are only for storing queue names for a worker to process.

Please tell me what else do you need.

wpolicarpo · 2015-10-13T16:03:21Z

@PikachuEXE did you notice any performance drop for jobs that were not using unique middleware?

mhenrixon · 2015-10-13T18:25:16Z

@PikachuEXE yes that would make the jobs unique: :while_executing which would explain your problem. I'll investigate this and see what can be done in terms of performance but I think it is a sensible default but maybe it should be some sensible way of breaking out of the loop.

In terms of performance it looks like it is extremely high considering your number of calls to evalsha is 298 000 instead of the around 4 000 (to get/set/multi and friends) that was before the upgrade so I am in fact pretty happy with that. It is unfortunate with your situation though that the locking is done.

There are a few ways forward on this as I see it.

Some time of retry configuration (yikes configuration 😢) for the worker
Something like sliding expiration for retrying to acquire the lock. Think how login methods should to be really secure add some additional wait between retries and then lock out the user to be really secure.
Putting the job back (yikes what if the jobs sort of needs to be processed in order)

Out of those options I only really view number 3 as a viable solution for your case since with any combination of the other two would cause the jobs to appear to complete slow when in fact it is just waiting to be able to lock the job. Thoughts?

PikachuEXE · 2015-10-14T01:34:17Z

I now removed the gem from production (and only load it in test for matchers)
The "graph" is normal again
I never expect the gem would set such "aggressive" option by default.
I think a gem should make users opt-in instead of opt-out.

PikachuEXE · 2015-10-14T01:59:32Z

Before putting this gem back on production, I want to know:

How to opt-out by default (I can't see how to opt-out in README)
The possible performance impact of each option (how often the custom scripts are called)

mhenrixon · 2015-10-14T03:06:22Z

@PikachuEXE I don't understand what you mean? If you don't configure unique: :while_executing the gem won't be used.

PikachuEXE · 2015-10-14T03:17:38Z

I have SidekiqUniqueJobs.config.unique_args_enabled = true in initializer
Other then that no other default worker config about uniqueness is supplied (I intentionally removed them after upgrade)

mhenrixon · 2015-10-14T03:44:13Z

@PikachuEXE I really don't understand what is going on then because unique_args_enabled was deprecated in 4.0.4. What does your worker really look like? You still haven't posted that code.

mhenrixon · 2015-10-14T03:52:03Z

oh darn you are right, I have accidentally removed the check if unique is enabled in the server middleware. Will have a fix in a few minutes!

mhenrixon · 2015-10-14T04:18:33Z

@PikachuEXE v4.0.6 should fix your issue

mhenrixon · 2015-10-14T04:59:49Z

Sorry partly fix your issue! :) There is one more thing to do!

wpolicarpo · 2015-10-14T13:53:33Z

@mhenrixon maybe you should yank all 4.0.x versions affected by this bug from rubygems. What do you think?

I had problems with 4.0.2 too. That's why I asked @PikachuEXE about having problems with non-unique jobs.

mhenrixon · 2015-10-14T16:55:37Z

So this should be closed now, I'll see about adding some type of running application eventually. Both for tracking performance and for spotting eager loading issues (missing constants).

mhenrixon · 2015-10-14T16:56:52Z

@PikachuEXE would it be possible to get verified that it is working better for you now. v4.0.7 is the one.

Bertg · 2015-10-14T21:39:42Z

Any ETA on the arrival of v4.0.7 ?

PikachuEXE · 2015-10-15T01:05:35Z

I can't verify anything until 4.0.7 is released :o

mhenrixon · 2015-10-15T09:36:50Z

😢 that was lame! I forgot to actually release the gem... @Bertg @PikachuEXE it has been released now.

PikachuEXE · 2015-10-15T09:38:11Z

Will test it soon

PikachuEXE · 2015-10-22T02:23:20Z

Performance seems fine when requiring the gem without any extra config.
I don't have plan to test the uniqueness options yet (the project might not even need it)

mhenrixon closed this as completed in 3a09329 Oct 14, 2015

mhenrixon reopened this Oct 14, 2015

mhenrixon closed this as completed in fb97902 Oct 14, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

4.* is hurting background job workers performance #127

4.* is hurting background job workers performance #127

PikachuEXE commented Oct 13, 2015

mhenrixon commented Oct 13, 2015

PikachuEXE commented Oct 13, 2015

wpolicarpo commented Oct 13, 2015

mhenrixon commented Oct 13, 2015

PikachuEXE commented Oct 14, 2015

PikachuEXE commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

PikachuEXE commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

wpolicarpo commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

Bertg commented Oct 14, 2015

PikachuEXE commented Oct 15, 2015

mhenrixon commented Oct 15, 2015

PikachuEXE commented Oct 15, 2015

PikachuEXE commented Oct 22, 2015

4.* is hurting background job workers performance #127

4.* is hurting background job workers performance #127

Comments

PikachuEXE commented Oct 13, 2015

Background

Issue

Before upgrade

After upgrade

mhenrixon commented Oct 13, 2015

PikachuEXE commented Oct 13, 2015

wpolicarpo commented Oct 13, 2015

mhenrixon commented Oct 13, 2015

PikachuEXE commented Oct 14, 2015

PikachuEXE commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

PikachuEXE commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

wpolicarpo commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

mhenrixon commented Oct 14, 2015

Bertg commented Oct 14, 2015

PikachuEXE commented Oct 15, 2015

mhenrixon commented Oct 15, 2015

PikachuEXE commented Oct 15, 2015

PikachuEXE commented Oct 22, 2015