*127975840 [lua] responses.lua:107: do_authentication(): failed to get from node cache: could not write to lua_shared_dict: no memory #3105

Closed
Zeous9 opened this issue Dec 18, 2017 · 13 comments

Comments

Zeous9 commented Dec 18, 2017


Summary

Using the default settings results in a "no memory" error.

Steps To Reproduce

1. Create 4 APIs.
2. Create 100 consumers.
3. Have the consumers access those APIs.
4. The following error appears:
*127975840 [lua] responses.lua:107: do_authentication(): failed to get from node cache: could not write to lua_shared_dict: no memory

Additional Details & Logs

  • Kong version: 0.11.0
  • Operating system: Linux (Ubuntu 16.04 LTS)
kikito (Member) commented Dec 18, 2017

Hello and thanks for reporting this.

That message seems to indicate that you need to increase the value of the mem_cache_size variable. It's 128 MB by default.
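
For example, that setting can be raised in kong.conf, or via the corresponding environment variable when starting Kong; the 256m value below is only an illustrative choice, not a recommendation:

    # kong.conf -- in-memory cache size for database entities (default: 128m)
    mem_cache_size = 256m

    # or equivalently via the environment:
    # KONG_MEM_CACHE_SIZE=256m kong start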

kikito closed this as completed Dec 18, 2017
thibaultcha (Member) commented:

Because we do not use the "safe" setter in our shdict cache, we should expect the shdict to evict older items via its LRU mechanism, so I don't think this is related to the size of mem_cache_size.

It seems to me @Zeous9 like the NGINX slab allocator failed to allocate more pages to the shared memory zone, and you might be running into this code path. Are you running NGINX in some peculiar environment (memory limitations, containerized, or anything of the sort)?
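
For context, here is a minimal OpenResty sketch of the difference between the two setters; the dict name kong_cache is assumed purely for illustration and is not necessarily the zone name Kong uses:

    -- sketch only: assumes `lua_shared_dict kong_cache 128m;` is declared
    local dict = ngx.shared.kong_cache
    local value = string.rep("x", 1024) -- arbitrary payload

    -- set(): when the zone is full, least-recently-used items are evicted to
    -- make room; forcible == true means valid items were removed. It can still
    -- fail with "no memory" if, even after evicting tens of items, the slab
    -- allocator cannot find a large enough contiguous block.
    local ok, err, forcible = dict:set("some_key", value)

    -- safe_set(): never evicts valid items; it simply fails with "no memory"
    -- when the zone is full.
    local ok2, err2 = dict:safe_set("some_key", value)
    if not ok2 and err2 == "no memory" then
        ngx.log(ngx.ERR, "shared dict is full and nothing was evicted")
    end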

thibaultcha (Member) commented:

@Zeous9 Another possibility is that you may have a single value that is too large (larger than mem_cache_size), which prevents the LRU eviction mechanism from kicking in altogether because evicting items would be useless in that case. Do you have any such value? Maybe a large text field in some of these entities?

Could you also share the list of plugins you are using? That'd be great!

derrley commented Jan 8, 2018

We've also just run into this issue. We are running Kong in a containerized environment, but it appears to be living within its means just fine (screenshot attached). We run with mem_cache_size at its default value.

[screenshot: Kong memory usage graph, 2018-01-08 12:46 PM]

derrley commented Jan 8, 2018

plugin configuration:

    "plugins": {
        "available_on_server": {
            "acl": true,
            "aws-lambda": true,
            "basic-auth": true,
            "bot-detection": true,
            "correlation-id": true,
            "cors": true,
            "datadog": true,
            "file-log": true,
            "galileo": true,
            "hmac-auth": true,
            "http-log": true,
            "ip-restriction": true,
            "jwt": true,
            "key-auth": true,
            "ldap-auth": true,
            "loggly": true,
            "oauth2": true,
            "rate-limiting": true,
            "request-size-limiting": true,
            "request-termination": true,
            "request-transformer": true,
            "response-ratelimiting": true,
            "response-transformer": true,
            "runscope": true,
            "statsd": true,
            "syslog": true,
            "tcp-log": true,
            "udp-log": true
        },
        "enabled_in_cluster": [
            "key-auth",
            "acl",
            "rate-limiting"
        ]
    },

This is a test environment, so there are lots of consumers coming and going as test users are added and removed; the consumer count is currently "total": 39477.

thibaultcha (Member) commented:

@derrley We recently hit this issue internally as well, and we found out that in our case it would happen when our VMs' memory was full, preventing NGINX from allocating any additional memory, as I suspected in my comment above. Could you investigate this possibility on your side as well? I am having a hard time making sense of the graph you posted, or what it is supposed to show.
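
If it helps, a couple of standard (non-Kong-specific) checks on the host or inside the container:

    # overall memory usage
    free -m

    # any OOM-killer activity or allocation failures in the kernel log
    dmesg | grep -i -E 'out of memory|oom|killed process'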

derrley commented Jan 8, 2018

@thibaultcha The Kong pod is holding steady at 549M out of its allocated 576M. We'll try increasing the limit to see if that resolves the issue.
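
(Assuming the pod is managed by Kubernetes, raising the limit amounts to bumping the container's resources block; the values below are illustrative only.)

    # fragment of the container spec; values are placeholders
    resources:
      requests:
        memory: "512Mi"
      limits:
        memory: "1Gi"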

jeremyjpj0916 (Contributor) commented:

Is this related: #3124? I see you are using rate limiting as well.

derrley commented Jan 8, 2018

@jeremyjpj0916 -- I'm not sure. We only use one worker process. Does that mitigate the leak issue?

jeremyjpj0916 (Contributor) commented:

I am curious whether your problem disappears if you temporarily turn off rate limiting.

derrley commented Jan 8, 2018

@jeremyjpj0916 I've already lifted Kong's memory limit, which forced Kong to redeploy; that redeploy itself massively reduced its memory consumption. If the problem reproduces in our test environment, I will turn off rate limiting and see if that fixes it.

derrley commented Jan 8, 2018

@jeremyjpj0916 Is the bug only in the shared-memory rate limiting? If we used Redis-backed rate limiting, would that fix the issue?
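
(For reference, switching the existing rate-limiting plugin to the redis policy would look roughly like the Admin API call below; the plugin id and Redis address are placeholders.)

    # <plugin-id> and <redis-host> are placeholders
    curl -X PATCH http://localhost:8001/plugins/<plugin-id> \
      --data "config.policy=redis" \
      --data "config.redis_host=<redis-host>" \
      --data "config.redis_port=6379"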

Tieske (Member) commented Jan 9, 2018

@derrley Shared memory is used for many purposes, so even if you temporarily get away with using another rate-limiting mechanism, the issue will return sooner or later in another spot.

As such, I'd rather not classify it as a 'bug' just yet, when it might simply be an out-of-memory issue.
