Rate Limiting plugin memory leak issue #3124

Closed
wjxcai opened this issue Dec 26, 2017 · 19 comments
@wjxcai

wjxcai commented Dec 26, 2017

Summary

After setting the config.second property in the rate-limiting plugin, the per-second counter entries (e.g. dict = kong_cache, key = ratelimit:561f23d4-23d2-4283-9f4c-7c077b5c77d3:118.116.111.5:1514260849000:second, value = 1) take up all of the cache memory and never expire. When a new API is added or called, there is no memory left for the new API and its plugins, and a "no memory" error shows up.

Steps To Reproduce

  1. vim kong.conf
    mem_cache_size = 100k
  2. kong start --vv
  3. Create 6 APIs
  4. Add the rate-limiting plugin with config.second set:
    $ curl -X POST http://kong:8001/apis/{api}/plugins \
        --data "name=rate-limiting" \
        --data "config.second=100" \
        --data "config.hour=10000"
  5. Keep calling the first 5 APIs.
  6. After a few minutes there is still no error.
  7. Call the 6th API; you will see the following error in error.log:
    [error] 19343#0: *2427 [lua] responses.lua:107: send_HTTP_INTERNAL_SERVER_ERROR(): failed to get from node cache: could not write to lua_shared_dict: no memory, client: 118.116.111.5, server: kong, request: "GET /V1/ben_test_1221?auth_key=097C993DF185099B2B7633F4C1778023&t=1514259888796 HTTP/1.1", host: "ec2-13-56-255-95.us-west-1.compute.amazonaws.com:8000"

Additional Details & Logs

  • Kong version (0.11.0)
  • Kong debug-level startup logs
    startup.log
  • Kong error logs (<KONG_PREFIX>/logs/error.log)
    error.log
  • Kong configuration (registered APIs/Plugins & configuration file)
  • Operating System
    Ubuntu 16.04 LTS
@wjxcai
Author

wjxcai commented Dec 26, 2017

A further observation: in https://github.com/Kong/kong/blob/master/kong/plugins/rate-limiting/policies/init.lua, lines 33 - 38 read:
local newval, err = shm:incr(cache_key, value, 0)
if not newval then
  ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
          "for period '", period, "': ", err)
  return nil, err
end
The function ngx.shared.DICT.incr has no expiration setting; when the shared memory zone runs out of storage, the shm instead overrides the (least recently used) unexpired items in the store. Please refer to https://github.com/openresty/lua-nginx-module#ngxshareddictincr
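For illustration, here is a minimal, hypothetical sketch of the resulting behavior (not Kong code): it assumes an OpenResty recent enough to expose shm:ttl() (1.13.x, with resty.core loaded) and a throwaway lua_shared_dict named counters.

    -- minimal sketch: a counter created through incr()'s init fallback has no TTL
    local shm = ngx.shared.counters

    local newval, err = shm:incr("ratelimit:demo:second", 1, 0)
    assert(newval, err)

    -- ttl() returns 0 for keys without an expiration: this counter stays in the
    -- zone until LRU eviction kicks in; it is never reclaimed by expiry
    ngx.say("remaining ttl: ", shm:ttl("ratelimit:demo:second"))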

I suggest changing the above code to the following, which uses ngx.shared.DICT.set to apply the respective expiration time, just like the logic in policy=redis:
local expireSecond = EXPIRATIONS[period]
local existVal, err = shm:get(cache_key)

if not existVal then
  local success, errNew = shm:set(cache_key, value, expireSecond)
  if not success then
    ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
            "for period '", period, "': ", errNew)
    return nil, errNew
  end
else
  local newVal = existVal + value
  local success, errUpdate = shm:set(cache_key, newVal, expireSecond)
  if not success then
    ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
            "for period '", period, "': ", errUpdate)
    return nil, errUpdate
  end
end

@wjxcai wjxcai changed the title from "ratelimit plugin memery leak issue" to "Rate Limiting plugin memory leak issue" Dec 26, 2017
@p0pr0ck5
Contributor

Well, this is somewhat embarrassing. It seems I made a rather short-sighted decision in fc9a96b76, removing the expiry field from these elements.

Good news, though! We can use the hot-off-the-press :expire call (https://github.com/openresty/lua-nginx-module#ngxshareddictexpire) once we upgrade to OpenResty 1.13.x.
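For the record, a rough, untested sketch of what that could look like in the policy code once :expire is available (reusing the shm, cache_key, value, period, and EXPIRATIONS names from the snippets above):

    local newval, err = shm:incr(cache_key, value, 0)
    if not newval then
      ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
              "for period '", period, "': ", err)
      return nil, err
    end

    -- if incr() just created the key via its init fallback, attach a TTL so the
    -- counter is reclaimed by expiry instead of lingering until LRU eviction
    if newval == value then
      local ok, exp_err = shm:expire(cache_key, EXPIRATIONS[period])
      if not ok then
        ngx_log(ngx.ERR, "[rate-limiting] could not set expiration ",
                "for period '", period, "': ", exp_err)
      end
    end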

@kikito kikito added the task/bug label Jan 2, 2018
@davidsiracusa

The rate-limiting plugins' call to shm:incr invokes ngx_http_lua_shdict_incr in ngx_http_lua_shdict.c, which performs an atomic increment across all worker processes. The underlying C code performs the locking/unlocking as needed. Performing a get/set outside of that C logic will lead to concurrency issues. I have submitted an issue to OpenResty: openresty/openresty#328

@davidsiracusa

davidsiracusa commented Jan 7, 2018

For clarity: with regard to the logic above, performing the arithmetic increment will not work, as there are multiple worker processes and threads involved. The increment must be performed in ngx_http_lua_shdict_incr, where locking ensures an atomic increment. A get/set will fall short.

. . .
if existVal then
  local newVal = existVal + value
  local success, errUpdate = shm:set(cache_key, newVal, expireSecond)
  if not success then
    ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
            "for period '", period, "': ", errUpdate)
    return nil, errUpdate
  end
end
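To make the concern concrete, a purely illustrative sketch of the lost-update window in a get/set counter (counters is a hypothetical lua_shared_dict; the key and TTL are arbitrary):

    local shm = ngx.shared.counters

    local function unsafe_incr(key, delta, ttl)
      local cur = shm:get(key) or 0   -- worker A and worker B can both read 10 here
      local new = cur + delta         -- both compute 11
      return shm:set(key, new, ttl)   -- last writer wins: one increment is lost
    end

    unsafe_incr("ratelimit:demo:second", 1, 1)

    -- shm:incr() does not have this window: the shared dict's lock is held across
    -- the whole read-increment-write inside ngx_http_lua_shdict_incr()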

@wjxcai
Author

wjxcai commented Jan 8, 2018

@davidsiracusa Yes, you are correct: shm:set will not work with multiple worker processes.
So using the :expire call (https://github.com/openresty/lua-nginx-module#ngxshareddictexpire) is what we can do for now. I hope OpenResty can add an expire parameter to shm:incr ASAP.

@davidsiracusa

@agentzh is now open to adding a ttl argument to incr. He has asked me to contribute it to OpenResty. Doing so is more efficient than making separate expire calls from Kong.

@p0pr0ck5
Contributor

p0pr0ck5 commented Jan 8, 2018

Prior art related to this subject: openresty/lua-nginx-module#942

@davidsiracusa

openresty/openresty#328 has been closed.
An initial time-to-live argument has been added to ngx_http_lua_ffi_shdict_incr in ngx_http_lua_shdict.c.
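Once that change ships in a released OpenResty, the increment and the expiration could be done in a single atomic call, roughly as follows (a sketch assuming the documented incr(key, value, init?, init_ttl?) Lua API and the same shm, cache_key, period, and EXPIRATIONS names as in the snippets above):

    -- init_ttl only applies when incr() creates the key via its init fallback,
    -- which is exactly the rate-limiting case: the counter is created with a TTL
    -- and subsequent increments leave that TTL untouched
    local newval, err = shm:incr(cache_key, value, 0, EXPIRATIONS[period])
    if not newval then
      ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
              "for period '", period, "': ", err)
      return nil, err
    end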

@jeremyjpj0916
Contributor

It has been a bit over a month, so I thought a check-in would be good: is this fix in the pipeline to be released in the 0.12.x series or the upcoming 0.13.x series?

@thibaultcha
Member

thibaultcha commented Feb 15, 2018

There is a lot of confusion and a few assumptions being made here. There is no memory leak in the rate-limiting plugin’s business logic code. The exptime argument helps reduce the footprint, but by no means resolves a fictitious memory leak. It is not planned to be added to 0.12 or 0.13, since it is not even released in OpenResty yet.

The initial error encountered here seems to be related to the host’s memory being filled because it simply isn’t large enough. We have encountered many such cases in the issues over the past few months. As a reminder, Kong will use roughly ~400 MB of memory per worker (maybe more) with the default settings.
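In that light, the 100k mem_cache_size used in the reproduction above is far below the documented 128m default; a more realistic kong.conf would look something like this (values illustrative only):

    # kong.conf -- illustrative values; 128m is the documented default.
    # In 0.11/0.12 this single shm holds the datastore cache as well as the
    # counters of the rate-limiting plugins in "local" policy.
    mem_cache_size = 128m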

@davidsiracusa

davidsiracusa commented Feb 15, 2018

Currently, request/response rate-limiting leverages kong_cache, whose entries accumulate over time and never expire, eventually leading to a "no memory" error. These never-freed entries accumulate faster when per-second rate limiting is used, because the key contains a fully qualified date/time. The issue is self-evident in the OpenResty code and has been acknowledged. I've read that exptime may be set separately, shortly after incr; I feel this can affect performance, wasting a round trip that involves locking/unlocking the zone. I suggested that the atomic increment include an exptime, as it is best set at creation. This change has been made in the OpenResty master.

@thibaultcha
Member

thibaultcha commented Feb 15, 2018

@davidsiracusa

of which entries accumulate over time, never to expire

That is simply not true, as I have been trying to tell you over and over again. The LRU eviction mechanism will eventually kick in and expire keys once the shm is full.

The issue is self-evident in the openresty code and has been acknowledged.

By whom? When? Having followed the entire conversation, and having contributed the init_ttl? argument addition you are talking about, I do not remember anybody acknowledging that Kong actually had a memory leak.

This change has been made in the openresty master.

I am very much aware :) But like I said, we will not use it until the next OpenResty release, as we try to avoid bundling our own "flavor"; and since this is not a memory leak, the priority is far too low to deserve such a short-notice patch.

There are several code paths that lead to this no memory error in the shdict code, and we have already debated them here: #3105

  1. You may be running Kong in an environment that does not have enough memory. Reduce the number of workers, or reduce the size of the shm before it reaches its limit. We will later on provide a way to disable the Lua-land caching to take full advantage of shared memory and reduce the footprint.
  2. You may be running into an edge case in which your shm is full, but the value retrieved from the cache is larger than the one scheduled for eviction, thus returning this error. It may be worth investigating whether this is what is happening; if so, you might want to use a different shm for rate-limiting.

The memory consumption of rate-limiting might be aggressive, yes, but there is no "memory leak" here - we see memory usage stabilizing over time. We will use the new init_ttl? argument when it is released by OpenResty. In the meantime, there are some alternatives, including rolling your own patch(es).

@davidsiracusa

Just so we're on the same page, we're using Kong 0.11.x:

In the OpenResty file ngx_http_lua_shdict.c, procedure ngx_http_lua_ffi_shdict_incr, the expiry (time to live) is hardcoded to 0, i.e. forever, so the LRU mechanism doesn't apply. The updated OpenResty with init_ttl is welcome, and will address this leak.

@p0pr0ck5
Contributor

@davidsiracusa have a look at https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L249, which is called from almost all dictionary func handlers, including the FFI incr wrapper (https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L2624). It is not accurate to say the LRU mechanism doesn't apply: in cases where there is memory pressure in the zone, the oldest entries will be forcefully evicted (see https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L2773-L2784). None of this behavior has changed between the OpenResty releases used for Kong 0.11.x and 0.12.x.

@davidsiracusa

init_ttl was recently added, and is not present in the OpenResty bundled with Kong 0.11.x.
The procedure looks like this:

    int ngx_http_lua_ffi_shdict_incr(ngx_shm_zone_t *zone, u_char *key,
        size_t key_len, double *value, char **err, int has_init, double init,
        int *forcible)...

@thibaultcha
Member

How is this even related? The init_ttl flag has nothing to do with the LRU eviction mechanism.

@davidsiracusa

Sorry, but as mentioned before, these incr key/value pairs cannot be reclaimed by the LRU; they persist forever. Only entries that do specify a ttl fall into the LRU expiry candidate category. Eventually incr uses enough memory that the zone is fully exhausted. There is a window, as the ceiling closes in, where LRU eviction of compliant entries may free up some holes; at this point the "no memory" errors are sporadic. As incr closes in, a fatal lack of memory occurs. This is easily reproduced.

@thibaultcha
Member

as mentioned before, these incr key/value pairs cannot be reclaimed by the LRU; they persist forever. Only entries that do specify a ttl fall into the LRU expiry candidate category.

For the last time, that is simply wrong.

error_log logs/error.log notice;

events {}

http {
    lua_shared_dict counters 12k;

    init_by_lua_block {
        require "resty.core"
    }

    server {
        listen 9000;

        location / {
            content_by_lua_block {
                local key = "key_" .. ngx.now() * 1000

                local newcount, err, forcible = ngx.shared.counters:incr(key, 1, 0)
                if err then
                    ngx.log(ngx.ERR, "failed to incr(): ", err)
                    return ngx.exit(500)
                end

                ngx.print(key, ": ", newcount)
                if forcible then
                    ngx.print(" (forcibly removed a value from the shm)")
                    ngx.log(ngx.NOTICE, "forcibly removed a value")
                end

                ngx.say()
            }
        }
    }
}

Yes, "no memory" errors can happen, but they do not happen because of a memory leak. Re-read my numerous answers to you in the Kong and OpenResty repositories, and the related thread I linked.

I am locking this topic.

@Kong Kong locked as too heated and limited conversation to collaborators Feb 15, 2018
thibaultcha added a commit that referenced this issue Mar 17, 2018
thibaultcha added a commit that referenced this issue Mar 20, 2018
thibaultcha added a commit that referenced this issue Mar 26, 2018
thibaultcha added a commit that referenced this issue Mar 28, 2018
thibaultcha added a commit that referenced this issue Apr 23, 2018
thibaultcha added a commit that referenced this issue Apr 24, 2018
This is part of a series of fixes:
- thibaultcha/lua-resty-mlcache#41
- thibaultcha/lua-resty-mlcache#42
- #3311
- #3341

Context
-------
In the `local` mode of the rate-limiting plugins, storing the
rate-limiting counters in the same shm used by Kong's database cache is
too invasive for the underlying shm, especially when the rate-limiting
plugins are used with a `seconds` precision.

On top of exhausting the database cache slots, this approach also
generates some form of fragmentation in the shm. This is due to the
side-by-side storage of values with sizes of different orders of
magnitude (JSON strings vs. an incremented double) and the LRU eviction
mechanism. When the shm is full and LRU kicks-in, it is highly probable
that several rate-limiting counters will be evicted (due to their
proliferation), thus not freeing enough space to store the retrieved
data, causing a `no memory` error to be reported by the shm.

Solution
--------
Declaring shms that are only used by some plugins is not very elegant.
Now, all users (even those not using rate-limiting plugins) have to pay
a memory cost (although small).
Unfortunately, and in the absence of a more dynamic solution to shm
configuration such as a more dynamic templating engine, or a
`configure_by_lua` phase, this is the safest solution.

Size rationale
--------------
Running a script generating similar keys and storing similar values
(double) indicates that an shm with 12Mb should be able to store about
~48,000 of those values at once. It is important to remind ourselves
that one Consumer/IP address might use more than one key (in fact, one
per period configured on the plugin), and both the rate-limiting and
response-ratelimiting plugins at once, and they use the same shms.

Even considering the above statements, ~48,000 keys per node seems
somewhat reasonable, considering keys of `second` precision will most
likely fill up the shm and be candidates for LRU eviction. Our concern
lies instead around long-lived limits (and thus, keys) set by the user.

Additionally, a future improvement upon this will be the setting of the
`init_ttl` argument for the rate-limiting keys, which will help **quite
considerably** in reducing the footprint of the plugins on the shm. As
of this day, this feature has been contributed to ngx_lua but not
released yet:

    openresty/lua-nginx-module#1226

Again, this limit only applies when using the **local** strategy, which
also likely means that a load-balancer is distributing traffic to a pool
of Kong nodes with some sort of consistent load-balancing technique.
Thus considerably reducing the number of concurrent Consumers a given
node needs to handle at once.

See also
--------
Another piece of the fixes for the `no memory` errors resides in the
behavior of the database caching module upon a full shm. See:

    thibaultcha/lua-resty-mlcache#41

This patch greatly reduces the likelihood of a full shm, but does not
remove it. The above patch ensures somewhat sane behavior should the
shm happen to be full again.

Fix #3124
Fix #3241
From #3311
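Concretely, the dedicated zones described above end up as plain lua_shared_dict declarations in the NGINX configuration Kong generates, along these lines (an illustrative excerpt; the exact zone name may differ between Kong versions):

    # excerpt of the generated nginx http{} block (illustrative)
    # a 12m zone dedicated to the counters of the rate-limiting plugins,
    # separate from the shm backing Kong's database cache
    lua_shared_dict kong_rate_limiting_counters 12m;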