Rate Limiting plugin memory leak issue #3124

Closed
wjxcai opened this issue Dec 26, 2017 · 19 comments
@wjxcai

wjxcai commented Dec 26, 2017

Summary

After setting the config.second property in the rate-limiting plugin, the per-second counter entries (e.g. dict = kong_cache, key = ratelimit:561f23d4-23d2-4283-9f4c-7c077b5c77d3:118.116.111.5:1514260849000:second, value = 1) take up all of the cache memory and never expire. When a new API is added or called, there is no memory left for the new API and its plugins, and a "no memory" error shows up.

Steps To Reproduce

  1. vim kong.conf
    mem_cache_size = 100k
  2. kong start --vv
  3. Create 6 APIs
  4. Add the rate-limiting plugin with config.second set:
    $ curl -X POST http://kong:8001/apis/{api}/plugins \
        --data "name=rate-limiting" \
        --data "config.second=100" \
        --data "config.hour=10000"
  5. Keep calling the first 5 APIs.
  6. After a few minutes there is still no error.
  7. Call the 6th API; you will see the following error in error.log:
    [error] 19343#0: *2427 [lua] responses.lua:107: send_HTTP_INTERNAL_SERVER_ERROR(): failed to get from node cache: could not write to lua_shared_dict: no memory, client: 118.116.111.5, server: kong, request: "GET /V1/ben_test_1221?auth_key=097C993DF185099B2B7633F4C1778023&t=1514259888796 HTTP/1.1", host: "ec2-13-56-255-95.us-west-1.compute.amazonaws.com:8000"

Additional Details & Logs

  • Kong version (0.11.0)
  • Kong debug-level startup logs
    startup.log
  • Kong error logs (<KONG_PREFIX>/logs/error.log)
    error.log
  • Kong configuration (registered APIs/Plugins & configuration file)
  • Operating System
    Ubuntu 16.04 LTS
@wjxcai
Author

wjxcai commented Dec 26, 2017

A further observation: in https://github.com/Kong/kong/blob/master/kong/plugins/rate-limiting/policies/init.lua, lines 33 - 38 read:
local newval, err = shm:incr(cache_key, value, 0)
if not newval then
  ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
          "for period '", period, "': ", err)
  return nil, err
end
The function ngx.shared.DICT.incr has no expiration setting; when the shared memory zone runs out of storage, the shm instead overrides the (least recently used) unexpired items in the store. Please refer to https://github.com/openresty/lua-nginx-module#ngxshareddictincr
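For illustration, here is a minimal, hypothetical sketch of the resulting behavior (not Kong code): it assumes an OpenResty recent enough to expose shm:ttl() (1.13.x, with resty.core loaded) and a throwaway lua_shared_dict named counters.

    -- minimal sketch: a counter created through incr()'s init fallback has no TTL
    local shm = ngx.shared.counters

    local newval, err = shm:incr("ratelimit:demo:second", 1, 0)
    assert(newval, err)

    -- ttl() returns 0 for keys without an expiration: this counter stays in the
    -- zone until LRU eviction kicks in; it is never reclaimed by expiry
    ngx.say("remaining ttl: ", shm:ttl("ratelimit:demo:second"))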

I suggest changing the above code to the following, which uses ngx.shared.DICT.set to apply the respective expiration time, just like the logic in policy=redis:
local expireSecond = EXPIRATIONS[period]
local existVal, err = shm:get(cache_key)

if not existVal then
  local success, errNew = shm:set(cache_key, value, expireSecond)
  if not success then
    ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
            "for period '", period, "': ", errNew)
    return nil, errNew
  end
else
  local newVal = existVal + value
  local success, errUpdate = shm:set(cache_key, newVal, expireSecond)
  if not success then
    ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
            "for period '", period, "': ", errUpdate)
    return nil, errUpdate
  end
end

@wjxcai wjxcai changed the title from "ratelimit plugin memery leak issue" to "Rate Limiting plugin memory leak issue" Dec 26, 2017
@p0pr0ck5
Contributor

Well, this is somewhat embarrassing. It seems I made a rather short-sighted decision in fc9a96b76, removing the expiry field from these elements.

Good news, though! We can use the hot-off-the-press :expire call (https://github.com/openresty/lua-nginx-module#ngxshareddictexpire) once we upgrade to OpenResty 1.13.x.
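For the record, a rough, untested sketch of what that could look like in the policy code once :expire is available (reusing the shm, cache_key, value, period, and EXPIRATIONS names from the snippets above):

    local newval, err = shm:incr(cache_key, value, 0)
    if not newval then
      ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
              "for period '", period, "': ", err)
      return nil, err
    end

    -- if incr() just created the key via its init fallback, attach a TTL so the
    -- counter is reclaimed by expiry instead of lingering until LRU eviction
    if newval == value then
      local ok, exp_err = shm:expire(cache_key, EXPIRATIONS[period])
      if not ok then
        ngx_log(ngx.ERR, "[rate-limiting] could not set expiration ",
                "for period '", period, "': ", exp_err)
      end
    end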

@kikito kikito added the task/bug label Jan 2, 2018
@davidsiracusa

The rate-limiting plugins' call to shm:incr invokes ngx_http_lua_shdict_incr in ngx_http_lua_shdict.c, which performs an atomic increment across all worker processes. The underlying C code performs the locking/unlocking as needed. Performing a get/set outside of that C logic will lead to concurrency issues. I have submitted an issue to OpenResty: openresty/openresty#328

@davidsiracusa

davidsiracusa commented Jan 7, 2018

For clarity: with regard to the logic above, performing the arithmetic increment will not work, as there are multiple worker processes and threads involved. The increment must be performed in ngx_http_lua_shdict_incr, where locking ensures an atomic increment. A get/set will fall short.

. . .
if existVal then
  local newVal = existVal + value
  local success, errUpdate = shm:set(cache_key, newVal, expireSecond)
  if not success then
    ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
            "for period '", period, "': ", errUpdate)
    return nil, errUpdate
  end
end
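To make the concern concrete, a purely illustrative sketch of the lost-update window in a get/set counter (counters is a hypothetical lua_shared_dict; the key and TTL are arbitrary):

    local shm = ngx.shared.counters

    local function unsafe_incr(key, delta, ttl)
      local cur = shm:get(key) or 0   -- worker A and worker B can both read 10 here
      local new = cur + delta         -- both compute 11
      return shm:set(key, new, ttl)   -- last writer wins: one increment is lost
    end

    unsafe_incr("ratelimit:demo:second", 1, 1)

    -- shm:incr() does not have this window: the shared dict's lock is held across
    -- the whole read-increment-write inside ngx_http_lua_shdict_incr()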

@wjxcai
Author

wjxcai commented Jan 8, 2018

@davidsiracusa Yes, you are correct: shm:set will not work with multiple worker processes.
So using the :expire call (https://github.com/openresty/lua-nginx-module#ngxshareddictexpire) is what we can do for now. I hope OpenResty can add an expire parameter to shm:incr ASAP.

@davidsiracusa

@agentzh is now open to adding a ttl argument to incr. He has asked me to contribute it to OpenResty. Doing so is more efficient than making separate expire calls from Kong.

@p0pr0ck5
Contributor

p0pr0ck5 commented Jan 8, 2018

Prior art related to this subject: openresty/lua-nginx-module#942

@davidsiracusa

openresty/openresty#328 has been closed.
An initial time-to-live argument has been added to ngx_http_lua_ffi_shdict_incr in ngx_http_lua_shdict.c.
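Once that change ships in a released OpenResty, the increment and the expiration could be done in a single atomic call, roughly as follows (a sketch assuming the documented incr(key, value, init?, init_ttl?) Lua API and the same shm, cache_key, period, and EXPIRATIONS names as in the snippets above):

    -- init_ttl only applies when incr() creates the key via its init fallback,
    -- which is exactly the rate-limiting case: the counter is created with a TTL
    -- and subsequent increments leave that TTL untouched
    local newval, err = shm:incr(cache_key, value, 0, EXPIRATIONS[period])
    if not newval then
      ngx_log(ngx.ERR, "[rate-limiting] could not increment counter ",
              "for period '", period, "': ", err)
      return nil, err
    end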

@jeremyjpj0916
Contributor

It has been a bit over a month, so I thought a check-in would be good: is this fix in the pipeline to be released in the 0.12.x series or the upcoming 0.13.x series?

@thibaultcha
Member

thibaultcha commented Feb 15, 2018

There is a lot of confusion and a few assumptions being made here. There is no memory leak in the rate-limiting plugin’s business logic code. The exptime argument helps reduce the footprint, but by no means resolves a fictitious memory leak. It is not planned to be added to 0.12 or 0.13, since it is not even released in OpenResty yet.

The initial error encountered here seems to be related to the host’s memory being filled because it simply isn’t large enough. We have encountered many such cases in the issues over the past few months. As a reminder, Kong will use roughly ~400 MB of memory per worker (maybe more) with the default settings.
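In that light, the 100k mem_cache_size used in the reproduction above is far below the documented 128m default; a more realistic kong.conf would look something like this (values illustrative only):

    # kong.conf -- illustrative values; 128m is the documented default.
    # In 0.11/0.12 this single shm holds the datastore cache as well as the
    # counters of the rate-limiting plugins in "local" policy.
    mem_cache_size = 128m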

@davidsiracusa

davidsiracusa commented Feb 15, 2018

Currently, request/response rate-limiting leverages kong_cache, whose entries accumulate over time and never expire, eventually leading to a "no memory" error. These never-freed entries accumulate faster when per-second rate limiting is used, because the key contains a fully qualified date/time. The issue is self-evident in the OpenResty code and has been acknowledged. I've read that exptime may be set separately, shortly after incr; I feel this can affect performance, wasting a round trip that involves locking/unlocking the zone. I suggested that the atomic increment include an exptime, as it is best set at creation. This change has been made in the OpenResty master.

@thibaultcha
Member

thibaultcha commented Feb 15, 2018

@davidsiracusa

of which entries accumulate over time, never to expire

That is simply not true, as I have been trying to tell you over and over again. The LRU eviction mechanism will eventually kick in and expire keys once the shm is full.

The issue is self-evident in the openresty code and has been acknowledged.

By whom? When? Having followed the entire conversation, and having contributed the init_ttl? argument addition you are talking about, I do not remember anybody acknowledging that Kong actually had a memory leak.

This change has been made in the openresty master.

I am very much aware :) But like I said, we will not use it until the next OpenResty release, as we try to avoid bundling our own "flavor"; and since this is not a memory leak, the priority is far too low to deserve such a short-notice patch.

There are several code paths that lead to this no memory error in the shdict code, and we have already debated them here: #3105

  1. You may be running Kong in an environment that does not have enough memory. Reduce the number of workers, or reduce the size of the shm before it reaches its limit. We will later on provide a way to disable the Lua-land caching to take full advantage of shared memory and reduce the footprint.
  2. You may be running into an edge case in which your shm is full, but the value retrieved from the cache is larger than the one scheduled for eviction, thus returning this error. It may be worth investigating whether this is what is happening; if so, you might want to use a different shm for rate-limiting.

The memory consumption of rate-limiting might be aggressive, yes, but there is no "memory leak" here - we see memory usage stabilizing over time. We will use the new init_ttl? argument when it is released by OpenResty. In the meantime, there are some alternatives, including rolling your own patch(es).

@davidsiracusa

Just so we're on the same page, we're using Kong 0.11.x:

In the OpenResty file ngx_http_lua_shdict.c, procedure ngx_http_lua_ffi_shdict_incr, the expiry (time to live) is hardcoded to 0, i.e. forever, so the LRU mechanism doesn't apply. The updated OpenResty with init_ttl is welcome, and will address this leak.

@p0pr0ck5
Contributor

@davidsiracusa have a look at https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L249, which is called from almost all dictionary func handlers, including the FFI incr wrapper (https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L2624). It is not accurate to say the LRU mechanism doesn't apply: in cases where there is memory pressure in the zone, the oldest entries will be forcefully evicted (see https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L2773-L2784). None of this behavior has changed between the OpenResty releases used for Kong 0.11.x and 0.12.x.

@davidsiracusa

init_ttl was recently added, and is not present in the OpenResty bundled with Kong 0.11.x.
The procedure looks like this:

    int ngx_http_lua_ffi_shdict_incr(ngx_shm_zone_t *zone, u_char *key,
        size_t key_len, double *value, char **err, int has_init, double init,
        int *forcible)...

@thibaultcha
Member

How is this even related? The init_ttl flag has nothing to do with the LRU eviction mechanism.

@davidsiracusa

Sorry, but as mentioned before, these incr key/value pairs cannot be reclaimed by the LRU; they persist forever. Only entries that do specify a ttl fall into the LRU expiry candidate category. Eventually incr uses enough memory that the zone is fully exhausted. There is a window, as the ceiling closes in, where LRU eviction of compliant entries may free up some holes; at this point the "no memory" errors are sporadic. As incr closes in, a fatal lack of memory occurs. This is easily reproduced.

@thibaultcha
Member

as mentioned before, these incr key/value pairs cannot be reclaimed by the LRU; they persist forever. Only entries that do specify a ttl fall into the LRU expiry candidate category.

For the last time, that is simply wrong.

error_log logs/error.log notice;

events {}

http {
    lua_shared_dict counters 12k;

    init_by_lua_block {
        require "resty.core"
    }

    server {
        listen 9000;

        location / {
            content_by_lua_block {
                local key = "key_" .. ngx.now() * 1000

                local newcount, err, forcible = ngx.shared.counters:incr(key, 1, 0)
                if err then
                    ngx.log(ngx.ERR, "failed to incr(): ", err)
                    return ngx.exit(500)
                end

                ngx.print(key, ": ", newcount)
                if forcible then
                    ngx.print(" (forcibly removed a value from the shm)")
                    ngx.log(ngx.NOTICE, "forcibly removed a value")
                end

                ngx.say()
            }
        }
    }
}

Yes, "no memory" errors can happen, but they do not happen because of a memory leak. Re-read my numerous answers to you in the Kong and OpenResty repositories, and the related thread I linked.

I am locking this topic.

@Kong Kong locked as too heated and limited conversation to collaborators Feb 15, 2018
thibaultcha added a commit that referenced this issue Mar 17, 2018
thibaultcha added a commit that referenced this issue Mar 20, 2018
thibaultcha added a commit that referenced this issue Mar 26, 2018
thibaultcha added a commit that referenced this issue Mar 28, 2018
thibaultcha added a commit that referenced this issue Apr 23, 2018
thibaultcha added a commit that referenced this issue Apr 24, 2018
This is part of a series of fixes:
- thibaultcha/lua-resty-mlcache#41
- thibaultcha/lua-resty-mlcache#42
- #3311
- #3341

Context
-------
In the `local` mode of the rate-limiting plugins, storing the
rate-limiting counters in the same shm used by Kong's database cache is
too invasive for the underlying shm, especially when the rate-limiting
plugins are used with a `seconds` precision.

On top of exhausting the database cache slots, this approach also
generates some form of fragmentation in the shm. This is due to the
side-by-side storage of values with sizes of different orders of
magnitude (JSON strings vs. an incremented double) and the LRU eviction
mechanism. When the shm is full and LRU kicks-in, it is highly probable
that several rate-limiting counters will be evicted (due to their
proliferation), thus not freeing enough space to store the retrieved
data, causing a `no memory` error to be reported by the shm.

Solution
--------
Declaring shms that are only used by some plugins is not very elegant.
Now, all users (even those not using rate-limiting plugins) have to pay
a memory cost (although small).
Unfortunately, and in the absence of a more dynamic solution to shm
configuration such as a more dynamic templating engine, or a
`configure_by_lua` phase, this is the safest solution.

Size rationale
--------------
Running a script generating similar keys and storing similar values
(double) indicates that an shm with 12Mb should be able to store about
~48,000 of those values at once. It is important to remind ourselves
that one Consumer/IP address might use more than one key (in fact, one
per period configured on the plugin), and both the rate-limiting and
response-ratelimiting plugins at once, and they use the same shms.

Even considering the above statements, ~48,000 keys per node seems
somewhat reasonable, considering keys of `second` precision will most
likely fill up the shm and be candidates for LRU eviction. Our concern
lies instead around long-lived limits (and thus, keys) set by the user.

Additionally, a future improvement upon this will be the setting of the
`init_ttl` argument for the rate-limiting keys, which will help **quite
considerably** in reducing the footprint of the plugins on the shm. As
of this day, this feature has been contributed to ngx_lua but not
released yet:

    openresty/lua-nginx-module#1226

Again, this limit only applies when using the **local** strategy, which
also likely means that a load-balancer is distributing traffic to a pool
of Kong nodes with some sort of consistent load-balancing technique.
Thus considerably reducing the number of concurrent Consumers a given
node needs to handle at once.

See also
--------
Another piece of the fixes for the `no memory` errors resides in the
behavior of the database caching module upon a full shm. See:

    thibaultcha/lua-resty-mlcache#41

This patch greatly reduces the likelihood of a full shm, but does not
remove it. The above patch ensures somewhat sane behavior should the
shm happen to be full again.

Fix #3124
Fix #3241
From #3311
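Concretely, the dedicated zones described above end up as plain lua_shared_dict declarations in the NGINX configuration Kong generates, along these lines (an illustrative excerpt; the exact zone name may differ between Kong versions):

    # excerpt of the generated nginx http{} block (illustrative)
    # a 12m zone dedicated to the counters of the rate-limiting plugins,
    # separate from the shm backing Kong's database cache
    lua_shared_dict kong_rate_limiting_counters 12m;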