Rate Limiting plugin memory leak issue #3124
One more observation. I suggest changing the above code to the following, which uses ngx.shared.DICT.set to set the respective expire time, just like the logic in policy=redis.
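The snippet this comment refers to was not preserved in the thread; below is a rough, hypothetical sketch of the kind of change being suggested, written against a plain ngx.shared.DICT. The dict name, key format, and 60-second window are placeholders, not Kong's actual code.

```lua
-- Hypothetical illustration of the suggested approach: write the counter
-- back with an exptime, mirroring how policy=redis sets a TTL on its keys.
-- Later comments in this thread explain why this is unsafe: the get/set
-- pair is not atomic across worker processes.
local dict = ngx.shared.kong_cache                -- placeholder shm name
local key  = "ratelimit:<identifier>:<period>"    -- placeholder key format

local current = dict:get(key) or 0
local ok, err = dict:set(key, current + 1, 60)    -- exptime = 60 seconds
if not ok then
    ngx.log(ngx.ERR, "could not set rate-limiting counter: ", err)
end
```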
Well, this is somewhat embarrassing. It seems I made a somewhat short-sighted decision there. Good news, though! We can use the hot-off-the-press feature.
The rate-limiting plugins' call to shm:incr invokes ngx_http_lua_shdict_incr in ngx_http_lua_shdict.c, which performs an atomic increment across all worker processes; the underlying C code performs the lock/unlock as needed. Performing a get/set outside of the C logic will lead to concurrency issues. I have submitted an issue to OpenResty: openresty/openresty#328
For clarity: with regard to the logic above, performing the arithmetic increment in Lua will not work, as there are multiple worker processes and threads involved. The increment must be performed in ngx_http_lua_shdict_incr, where locking ensures an atomic increment; a get/set will fall short.
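For illustration, a minimal contrast between the two approaches, written against a generic shared dict rather than the plugin's actual code (the dict and key names are assumptions):

```lua
local dict = ngx.shared.my_counters   -- assumed lua_shared_dict name
local key  = "some-counter-key"       -- assumed key

-- Non-atomic: two workers can both read the same value and both write
-- back value + 1, losing an increment (and racing on the expiry, too).
local v = dict:get(key) or 0
dict:set(key, v + 1, 60)

-- Atomic: the addition happens inside ngx_http_lua_shdict_incr while the
-- shm lock is held, so concurrent workers cannot lose updates.
-- The third argument (init = 0) creates the key when it does not exist.
local newval, err = dict:incr(key, 1, 0)
if not newval then
    ngx.log(ngx.ERR, "incr failed: ", err)
end
```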
@davidsiracusa Yes, you are correct. shm:set will not work with multiple worker processes.
@agentzh is now open to adding a ttl argument to incr. He has asked me to contribute it to OpenResty. Doing so is more efficient than calling expire separately from Kong.
Prior art related to this subject: openresty/lua-nginx-module#942
openresty/openresty#328 has been closed.
It has been a bit over a month and I thought it would be good to check in: is this fix in the pipeline to be released in the 0.12.x series, or in the upcoming 0.13.x series?
There is a lot of confusion and a few assumptions being made here. There is no memory leak in the rate-limiting plugin's business logic. The exptime argument helps reduce the footprint, but by no means resolves a fictitious memory leak. It is not planned to be added in 0.12 or 0.13, since it is not even released in OpenResty yet. The initial error encountered here seems to be caused by the host's memory simply not being large enough; we have encountered many such cases in the issues over the past few months. A reminder that Kong will use roughly ~400MB of memory per worker (maybe more) with the default settings.
Currently, request/response rate-limiting leverages kong_cache, whose entries accumulate over time and never expire, eventually leading to a "no memory" error. These never-freed entries accumulate faster if per-second rate limiting is used, because the key uses a fully qualified date/time. The issue is self-evident in the OpenResty code and has been acknowledged. I've read that exptime could be set separately, shortly after incr; I feel this can affect performance, as it wastes a round trip involving locking/unlocking the zone. I suggested that the atomic increment include an exptime, as it is best set at creation. This change has been made in the OpenResty master.
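With that change in the OpenResty master, the counter's expiry can be set atomically at creation time via a fourth argument to incr. A minimal sketch, assuming the documented `incr(key, value, init?, init_ttl?)` form and placeholder names:

```lua
local dict = ngx.shared.my_counters              -- assumed lua_shared_dict name
local key  = "ratelimit:<identifier>:<period>"   -- placeholder key format

-- init = 0 creates the key if it is missing; init_ttl = 60 sets the
-- expiry only on that initial creation, so no separate expire/set call
-- (and no extra lock/unlock round trip on the zone) is needed.
local newval, err = dict:incr(key, 1, 0, 60)
if not newval then
    ngx.log(ngx.ERR, "failed to increment rate-limiting counter: ", err)
end
```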
That is simply not true, as I have been trying to tell you over and over again. The LRU eviction mechanism will eventually kick in and expire keys once the shm is full.
By who? When? Having followed the entire conversation and contributed the
I am very much aware :) But like I said, we will not use it until the next OpenResty release, as we try to avoid bundling our own "flavor", and since this is not a memory leak, the priority is far too low to warrant such a short-notice patch. There are several code paths that lead to this
The memory consumption of rate-limiting might be aggressive, yes, but there is no case of a "memory leak" - we see memory usage stabilizing over time. We will use the new
Just so we're on the same page, we're using Kong 0.11.xx. In the OpenResty file ngx_http_lua_shdict.c, in the procedure ngx_http_lua_ffi_shdict_incr, the expiry (time-to-live) is hardcoded to 0, i.e. forever, so the LRU mechanism doesn't apply. The updated OpenResty with init_ttl is welcome, and will address this leak.
@davidsiracusa have a look at https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L249, which is called from almost all dictionary func handlers, including the FFI ones.
@davidsiracusa I am sorry to have to say it, but this conversation is seriously starting to bother me. https://github.com/openresty/lua-nginx-module/blob/master/src/ngx_http_lua_shdict.c#L266
init_ttl was recently added, and is not present in openresty for Kong 0.11.xx. |
How is this even related? The init_ttl flag has nothing to do with the LRU eviction mechanism. |
Sorry, as mentioned before, these incr key/value pairs cannot be reclaimed by the LRU; they persist forever. Only candidates that do specify a TTL fall into the LRU expiry category. Eventually incr uses enough memory that memory is fully exhausted. There is a window, as the ceiling closes in, where LRU eviction of compliant entries may free up holes; at that point "no memory" errors are sporadic. As incr closes in, a fatal lack of memory occurs. This is easily reproduced.
For the last time, that is simply wrong.
Yes, I am locking this topic.
This is part of a series of fixes:

- thibaultcha/lua-resty-mlcache#41
- thibaultcha/lua-resty-mlcache#42
- #3311
- #3341

Context
-------

In the `local` mode of the rate-limiting plugins, storing the rate-limiting counters in the same shm used by Kong's database cache is too invasive for the underlying shm, especially when the rate-limiting plugins are used with a `seconds` precision. On top of exhausting the database cache slots, this approach also generates some form of fragmentation in the shm. This is due to the side-by-side storage of values with sizes of different orders of magnitude (JSON strings vs. an incremented double) and the LRU eviction mechanism. When the shm is full and LRU kicks in, it is highly probable that several rate-limiting counters will be evicted (due to their proliferation), thus not freeing enough space to store the retrieved data, causing a `no memory` error to be reported by the shm.

Solution
--------

Declaring shms that are only used by some plugins is not very elegant: all users (even those not using the rate-limiting plugins) now have to pay a memory cost, although a small one. Unfortunately, in the absence of a more dynamic solution to shm configuration, such as a more dynamic templating engine or a `configure_by_lua` phase, this is the safest solution.

Size rationale
--------------

Running a script generating similar keys and storing similar values (doubles) indicates that a 12MB shm should be able to store about ~48,000 of those values at once. It is important to remind ourselves that one Consumer/IP address might use more than one key (in fact, one per period configured on the plugin), and might use both the rate-limiting and response-ratelimiting plugins at once, which share the same shms. Even considering the above, ~48,000 keys per node seems reasonable, considering that keys of `second` precision will most likely fill up the shm and be candidates for LRU eviction. Our concern lies instead with long-lived limits (and thus, keys) set by the user.

Additionally, a future improvement upon this will be the setting of the `init_ttl` argument for the rate-limiting keys, which will help **quite considerably** in reducing the footprint of the plugins on the shm. As of this day, this feature has been contributed to ngx_lua but not released yet: openresty/lua-nginx-module#1226

Again, this limit only applies when using the **local** strategy, which also likely means that a load balancer is distributing traffic to a pool of Kong nodes with some sort of consistent load-balancing technique, thus considerably reducing the number of concurrent Consumers a given node needs to handle at once.

See also
--------

Another piece of the fixes for the `no memory` errors resides in the behavior of the database caching module upon a full shm. See: thibaultcha/lua-resty-mlcache#41

This patch greatly reduces the likeliness of a full shm, but does not remove it. The above patch ensures a somewhat still sane behavior should the shm happen to be full again.

Fix #3124
Fix #3241
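In practice, this boils down to declaring a dedicated shared dictionary in the generated NGINX configuration and pointing the `local` policy counters at it instead of `kong_cache`. The directive below is an illustrative sketch; the zone name is an assumption, not taken from this thread.

```nginx
# Hypothetical excerpt of the generated NGINX config: a dedicated 12m zone
# for rate-limiting counters, kept separate from Kong's database cache shm.
lua_shared_dict kong_rate_limiting_counters 12m;
```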
Summary
After setting the config.second property in the rate-limiting plugin, the per-second cache entries (e.g. dict = kong_cache, key = ratelimit:561f23d4-23d2-4283-9f4c-7c077b5c77d3:118.116.111.5:1514260849000:second, value = 1) take up all the cache memory and never expire. When a new API is added or called, there is no memory left for the new API and its plugins, and a "no memory" error shows up.
Steps To Reproduce
mem_cache_size = 100k
$ curl -X POST http://kong:8001/apis/{api}/plugins \
    --data "name=rate-limiting" \
    --data "config.second=100" \
    --data "config.hour=10000"
[error] 19343#0: *2427 [lua] responses.lua:107: send_HTTP_INTERNAL_SERVER_ERROR(): failed to get from node cache: could not write to lua_shared_dict: no memory, client: 118.116.111.5, server: kong, request: "GET /V1/ben_test_1221?auth_key=097C993DF185099B2B7633F4C1778023&t=1514259888796 HTTP/1.1", host: "ec2-13-56-255-95.us-west-1.compute.amazonaws.com:8000"
Additional Details & Logs
startup.log
<KONG_PREFIX>/logs/error.log
Ubuntu 16.04 LTS