
redis script cache gets bloated after update to bull 2.0 #426

Closed
prdn opened this issue Jan 3, 2017 · 32 comments

@prdn

prdn commented Jan 3, 2017

Hello,

I'm experiencing a growing Lua script cache after updating to bull 2.0.
After 20 days of uptime my Redis Lua cache has grown to 15GB.

Is anyone else experiencing the same issue?

Thanks

@manast
Member

manast commented Jan 3, 2017

Can you check what entries are in the cache? There must be a spurious entry that for some reason is not being reused but is created anew every time it is used.

@prdn
Author

prdn commented Jan 3, 2017

@manast I've added some console logs to the script caching function in lib/scripts.js. The hashes all seem to be the same.
Just to be sure we are on the same page: the value that is growing fast is used_memory_lua.
Reverting to v1.1.3 works fine.

Do you have a preferred way to analyse the Redis Lua script cache?

@prdn
Author

prdn commented Jan 3, 2017

@manast could it be that _retryAtOnce is using eval directly, instead of caching the script and using evalsha, so the same script accumulates again and again every time there is a retry?

@manast
Member

manast commented Jan 3, 2017

eval does not cache the script, so I don't think that is the reason for the problem. I am not aware of any other command in Redis besides SCRIPT LOAD that caches scripts. So if you have time you could put a console.log just before calling load, and you should be able to see which script is being cached continuously: https://redis.io/commands/script-load
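
(For illustration, a minimal sketch of that kind of logging, assuming an ioredis client and a helper name of my own, loadScript, rather than bull's actual internals:)

```js
const crypto = require('crypto');
const Redis = require('ioredis');

const client = new Redis();

// Log the SHA1 of every script body before caching it with SCRIPT LOAD.
// A healthy application should print only a handful of distinct hashes.
function loadScript(script) {
  const sha = crypto.createHash('sha1').update(script).digest('hex');
  console.log('SCRIPT LOAD', sha);
  return client.script('load', script); // Redis returns the same SHA
}
```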

@prdn
Author

prdn commented Jan 3, 2017

You're right. I added a console.log in my dev environment.
Everything looks normal, yet a single Node.js process (concurrency = 5) is bloating my Redis instance.
In production I need to revert to 1.1.3. The main change seems to be the move to ioredis, and that looks good. I will try to spend a bit of time on this in the next days, since it's crucial for me to keep bull as a dependency.

Thanks for helping

@manast
Member

manast commented Jan 4, 2017

The major new thing in 2.0 is the redlock algorithm; I would bet that is the root of the problem. @doublerebel any ideas?

@prdn
Author

prdn commented Jan 7, 2017

I think you're right @manast @doublerebel.
Is there an easy way to test the previous locking system?
Otherwise I will set up a branch for it.

@manast
Member

manast commented Jan 7, 2017

When the old locking system was replaced by redlock, many other changes were performed as well, so it is not easy to swap it back. I will investigate why the script cache is increasing when I have some time; there could be an issue in the redlock dependency as well: https://github.com/mike-marcacci/node-redlock

@manast
Member

manast commented Jan 8, 2017

I have analysed the code, and I cannot see anything in redlock or in bull that would currently cache a lot of scripts. Redlock actually runs eval directly (which is not very performant; I would like to improve this in the future), and in bull the code related to locking does not use any scripts of its own; it is delegated to redlock. If you put a console.log here https://github.com/OptimalBits/bull/blob/master/lib/scripts.js#L46 outputting the hashes, you should only see a few of them (correct behaviour); otherwise you will see a lot of them.
Now, is this cache bloating actually creating problems? It could be some Redis optimization that caches Lua data but gives that memory back to actual keys if needed, and would therefore be nothing to worry about.
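
(One way to keep an eye on the growth is to poll INFO memory for the used_memory_lua field; a small sketch, assuming ioredis:)

```js
const Redis = require('ioredis');
const client = new Redis();

// Print the size of Redis's Lua script cache once a minute.
setInterval(async () => {
  const info = await client.info('memory');
  const match = info.match(/used_memory_lua:(\d+)/);
  if (match) {
    console.log('used_memory_lua:', Number(match[1]), 'bytes');
  }
}, 60 * 1000);
```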

@manast
Member

manast commented Jan 8, 2017

From redis documentation https://redis.io/commands/eval

"The EVAL command forces you to send the script body again and again. Redis does not need to recompile the script every time as it uses an internal caching mechanism, however paying the cost of the additional bandwidth may not be optimal in many contexts.hanism, however paying the cost of the additional bandwidth may not be optimal in many contexts."

@prdn
Author

prdn commented Jan 8, 2017

@manast well, my Redis DB, which is around 1GB, has reached 15GB of memory. That's a pretty big overhead in my opinion, and it is a risk in a production environment since there is no clear limit on it.

I think I found the issue:

var lockAcquired = ['and redis.call("HSET", "', queue.toKey(job.jobId), '", "lockAcquired", "1")'].join('');

Here you call redlock passing a locking Lua script that varies per job.
This is probably what's bloating the Redis script cache.
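
(To make the failure mode concrete: because the job key is spliced into the Lua source, every job produces a byte-different script and therefore a different SHA1 entry in the script cache. A minimal sketch, with made-up key names:)

```js
const crypto = require('crypto');

// The job key is part of the Lua source, so each job ID yields a different
// script body and thus a different entry in Redis's script cache.
function lockAcquiredSnippet(jobKey) {
  return 'redis.call("HSET", "' + jobKey + '", "lockAcquired", "1")';
}

const sha1 = s => crypto.createHash('sha1').update(s).digest('hex');

console.log(sha1(lockAcquiredSnippet('bull:video:1'))); // one cache entry
console.log(sha1(lockAcquiredSnippet('bull:video:2'))); // a second, different entry
```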

@manast
Member

manast commented Jan 8, 2017

without checking redlock, I guess that line is there because it is not currently possible to specify extra keys to be used by redlock. That would certainly need a change in redlock as well, to be merged and released before we can fix it in bull...

@prdn
Author

prdn commented Jan 8, 2017

Sounds good. At least it seems to explain what I'm seeing. I'll see if I can prepare something for graylog.
I think that any real use of bull 2.0 in production will end up causing trouble. I'm handling around 1 million jobs per day, and with the current cache growth I reach 15GB in 3 days.

Thanks for this awesome lib btw, I truly like it

@manast
Member

manast commented Jan 8, 2017

Even if the 15GB is scary, it may not be a problem if Redis just reuses that memory when needed. I checked the redis.conf file, but unfortunately there is no setting to limit the maximum size of the Lua cache. Maybe @antirez could give some explanation of this behaviour.

@prdn
Author

prdn commented Jan 8, 2017

I agree that it would not be scary if the memory were recycled. I checked for that parameter as well.
From here https://redis.io/commands/eval

"The reason why scripts can be cached for long time is that it is unlikely for a well written application to have enough different scripts to cause memory problems. Every script is conceptually like the implementation of a new command, and even a large application will likely have just a few hundred of them. Even if the application is modified many times and scripts will change, the memory used is negligible."

I have the feeling that changing the script each time goes against Redis's design.

@manast manast added the bug label Jan 9, 2017
@gblock0

gblock0 commented Jan 17, 2017

We are seeing this as well; our used_memory_lua was at 1118090240 (about 1GB). Our backend locks up, and once we run SCRIPT FLUSH things start working again.
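
(As a stopgap, the script cache can be flushed by hand; a sketch with ioredis, equivalent to running SCRIPT FLUSH in redis-cli:)

```js
const Redis = require('ioredis');
const client = new Redis();

// Drop every cached Lua script. Scripts sent afterwards via EVAL or
// SCRIPT LOAD are simply re-cached, so the only cost is a recompile.
client.script('flush').then(() => console.log('Lua script cache flushed'));
```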

joeandrews added a commit to goradish/bull that referenced this issue Jan 17, 2017
…ts#426

NOTES: this will still violate the Redis Cluster rule that each script explicitly defines the keys it writes to in the KEYS array, however the current implementation also does this, and this should fix the growing lua_memory_cache issue.
@joeandrews
Contributor

joeandrews commented Jan 17, 2017

I think we have fixed this in our pull request.

The issue is that the line

var lockAcquired = ['and redis.call("HSET", "', queue.toKey(job.jobId), '", "lockAcquired", "1")'].join('');

was creating a new script for every single job ID: 10,000 jobs result in 10,000 scripts in the script cache, and hence this issue.
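
(For reference, one general way to avoid the per-job scripts, shown here as a hypothetical sketch rather than the actual PR: keep the Lua source constant and pass the job key in as a parameter, so Redis caches a single script no matter how many jobs run.)

```js
const Redis = require('ioredis');
const client = new Redis();

// Constant Lua source: the lock key and the job hash key arrive via KEYS,
// the lock token and TTL via ARGV, so the script body never changes.
const lockScript = [
  'if redis.call("SET", KEYS[1], ARGV[1], "NX", "PX", ARGV[2]) then',
  '  redis.call("HSET", KEYS[2], "lockAcquired", "1")',
  '  return 1',
  'end',
  'return 0'
].join('\n');

// Hypothetical usage for job 42 of a queue named "video"; the key names are
// illustrative, not bull's exact key layout.
client.eval(lockScript, 2, 'bull:video:42:lock', 'bull:video:42', 'some-token', 5000)
  .then(acquired => console.log('lock acquired:', acquired === 1));
```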

@jeltok

jeltok commented Jan 21, 2017

We are experiencing this issue as well with about 600k jobs a day. We use Redis 3.0.7 on Ubuntu 14.04.

After 7 days of use, Redis memory is around 8GB: data is less than 1GB, and the rest is reported as Lua script cache.

@prdn
Author

prdn commented Feb 3, 2017

Any update on this issue?
Thanks

@manast
Member

manast commented Feb 3, 2017

we have a PR that solves this issue but there is another issue with that PR that needs to be resolved before it can be merged.

@jeltok

jeltok commented Feb 25, 2017

This is how the memory on my Redis instance (the one backing Bull) looks. We are running about 1 million messages a day through the Bull queues. The server has 16 GB of memory and loses about 3GB a day to the Lua cache.
Running the SCRIPT FLUSH command every other day helps, but it is a hassle. Other than that, Bull works just great for us.
I hope this issue will be fixed soon.

[screenshot: Redis memory usage graph]

@manast
Member

manast commented Feb 27, 2017

@jeltok a fix for this issue is the next thing on my list.

manast added a commit that referenced this issue Mar 1, 2017
Fix Lock Redis script to not be a different script for every Job ID #426
@manast
Member

manast commented Mar 1, 2017

@jeltok can you please verify that the issue is resolved with bull 2.2.0 so that we can close it?

@engvik

engvik commented Mar 1, 2017

Looks like I'm still seeing the issue. Upgraded from 1.1.3 to 2.2.0 today. Have kept an eye on the used_memory_lua field and it's been steadily increasing since the upgrade. Flushing the scripts from the redis instance drops the usage down to normal.

[screenshot: Redis memory stats showing used_memory_lua still increasing]

@manast
Member

manast commented Mar 1, 2017

hmm, ok, I will need to dig more into it.

@engvik

engvik commented Mar 1, 2017

Thanks! Just switched from kue, and other than this it's been working great! Let me know if you need anything 😄

@jeltok

jeltok commented Mar 2, 2017

I installed version 2.2.0 yesterday. Unfortunately it does not improve the Lua cache issue by much.

[screenshot: Redis memory usage graph]

@manast
Member

manast commented Mar 2, 2017

@jeltok I would say nothing at all :). I will try to do a deeper analysis.

@joeandrews
Contributor

@manast @jeltok I have some time today and will try to look into it further as well.

Joe

manast added a commit that referenced this issue Mar 3, 2017
manast added a commit that referenced this issue Mar 3, 2017
@manast
Member

manast commented Mar 8, 2017

Ok, I have created a custom bull-redlock package and fixed one last issue. The queue works pretty well now without leaks. I will make a new release in the following hours.

@manast manast closed this as completed Mar 8, 2017
@jeltok

jeltok commented Mar 9, 2017

Thanks a lot!!! Looks much better now :-)

[screenshot: Redis memory usage after the fix]

@boatmeme

Sorry for the noise, just wanted to send some gratitude to @manast for fixing this issue.
