Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Can't deploy on Kong 1.0.1, init errors out #4212

Closed
jeremyjpj0916 opened this issue Jan 17, 2019 · 15 comments
Closed

Can't deploy on Kong 1.0.1, init errors out #4212

jeremyjpj0916 opened this issue Jan 17, 2019 · 15 comments

Comments

@jeremyjpj0916
Copy link
Contributor

jeremyjpj0916 commented Jan 17, 2019

Summary

Can't start on 1.0.1

Output here:

2019/01/17 17:53:45 [debug] 22#0: [lua] globalpatches.lua:13: installing the globalpatches
2019/01/17 17:53:45 [debug] 22#0: [lua] globalpatches.lua:205: randomseed(): seeding PRNG from OpenSSL RAND_bytes()
2019/01/17 17:53:45 [debug] 22#0: [lua] globalpatches.lua:231: randomseed(): random seed: 182251240244 for worker nb 0
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:441: init(): [dns-client] (re)configuring dns client
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:446: init(): [dns-client] staleTtl = 4
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:449: init(): [dns-client] validTtl = nil
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:453: init(): [dns-client] noSynchronisation = false
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:472: init(): [dns-client] query order = LAST, SRV, A, CNAME
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-loopback = [::1]
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-mcastprefix = [fe00::0]
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localnet = [fe00::0]
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:512: init(): [dns-client] adding A-record from 'hosts' file: localhost = 127.0.0.1
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: localhost = [::1]
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localhost = [::1]
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:512: init(): [dns-client] adding A-record from 'hosts' file: kong-735-6kj7l = 10.xxx.xx.xxx
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allnodes = [fe00::1]
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allrouters = [fe00::2]
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:571: init(): [dns-client] nameserver 10.xx.x.xx
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:576: init(): [dns-client] attempts = 5
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:585: init(): [dns-client] timeout = 2000 ms
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:589: init(): [dns-client] ndots = 5
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:591: init(): [dns-client] search = kong-gateway-dev.svc.cluster.local, svc.cluster.local, cluster.local, company.com
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:597: init(): [dns-client] badTtl = 30 s
2019/01/17 17:53:45 [debug] 22#0: [lua] client.lua:599: init(): [dns-client] emptyTtl = 1 s
2019/01/17 17:53:45 [warn] 22#0: [lua] cluster.lua:496: refresh(): [lua-cassandra] found contact point with '0.0.0.0' as rpc_address, using 'server00466' to contact it instead. If this is incorrect you should avoid using '0.0.0.0' in rpc_address
nginx: [warn] [lua] cluster.lua:496: refresh(): [lua-cassandra] found contact point with '0.0.0.0' as rpc_address, using 'server00466' to contact it instead. If this is incorrect you should avoid using '0.0.0.0' in rpc_address
2019/01/17 17:53:45 [debug] 22#0: [lua] cluster.lua:427: next_coordinator(): [lua-cassandra] load balancing policy chose host at 10.xx.xx.xxx
2019/01/17 17:53:45 [debug] 22#0: [lua] plugins.lua:31: check_db_against_config(): Discovering used plugins
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:441: init(): [dns-client] (re)configuring dns client
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:446: init(): [dns-client] staleTtl = 4
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:449: init(): [dns-client] validTtl = nil
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:453: init(): [dns-client] noSynchronisation = false
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:472: init(): [dns-client] query order = LAST, SRV, A, CNAME
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-loopback = [::1]
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-mcastprefix = [fe00::0]
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localnet = [fe00::0]
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:512: init(): [dns-client] adding A-record from 'hosts' file: localhost = 127.0.0.1
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: localhost = [::1]
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-localhost = [::1]
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:512: init(): [dns-client] adding A-record from 'hosts' file: kong-735-6kj7l = 10.xxx.xx.xxx
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allnodes = [fe00::1]
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:527: init(): [dns-client] adding AAAA-record from 'hosts' file: ip6-allrouters = [fe00::2]
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:571: init(): [dns-client] nameserver 10.xx.x.xx
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:576: init(): [dns-client] attempts = 5
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:585: init(): [dns-client] timeout = 2000 ms
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:589: init(): [dns-client] ndots = 5
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:591: init(): [dns-client] search = kong-gateway-dev.svc.cluster.local, svc.cluster.local, cluster.local, company.com
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:597: init(): [dns-client] badTtl = 30 s
2019/01/17 17:53:46 [debug] 22#0: [lua] client.lua:599: init(): [dns-client] emptyTtl = 1 s
2019/01/17 17:53:46 [info] 22#0: [lua] mesh.lua:64: init(): initialising cluster ca...
2019/01/17 17:53:46 [info] 22#0: [lua] mesh.lua:131: init(): cluster ca initialised
2019/01/17 17:53:46 [debug] 22#0: [lua] mesh.lua:132: init(): mesh alpn is 'com.konghq/service-mesh/1/Cb5/lNJklIJaue8Z07GpTp6XTz0U5KjDy1dDgvbzMP8='
2019/01/17 17:53:46 [debug] 22#0: [lua] mesh.lua:133: init(): node certificate generated, fingerprint is 'vpdLPeURhT3uj80UfUatpvSbX8AxCdtsm8FRY/ma02w='
2019/01/17 17:53:46 [debug] 22#0: [lua] mesh.lua:182: init(): SSL_CTX* mesh ALPN selection setup complete
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: kong-path-based-routing
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: kong-spec-expose
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: kong-cluster-drain
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: correlation-id
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: kong-splunk-log
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: jwt
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:281: load_plugin_schemas(): Loading custom plugin entity: 'jwt.jwt_secrets'
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: cors
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: kong-oidc-multi-idp
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: kong-response-size-limiting
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: kong-oidc-implicit-token
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: kong-upstream-jwt
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: zipkin
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: kong-error-log
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: request-termination
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: request-size-limiting
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: acl
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:281: load_plugin_schemas(): Loading custom plugin entity: 'acl.acls'
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: kong-service-virtualization
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: request-transformer
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: rate-limiting
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: statsd
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:267: load_plugin_schemas(): Loading plugin: oauth2
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:281: load_plugin_schemas(): Loading custom plugin entity: 'oauth2.oauth2_credentials'
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:281: load_plugin_schemas(): Loading custom plugin entity: 'oauth2.oauth2_authorization_codes'
2019/01/17 17:53:46 [debug] 22#0: [lua] plugins.lua:281: load_plugin_schemas(): Loading custom plugin entity: 'oauth2.oauth2_tokens'
2019/01/17 17:53:49 [error] 22#0: init_by_lua error: /usr/local/share/lua/5.1/kong/db/init.lua:125: attempt to concatenate local 'err' (a nil value)
stack traceback:
	/usr/local/share/lua/5.1/kong/db/init.lua:125: in function 'prefix_err'
	/usr/local/share/lua/5.1/kong/db/init.lua:190: in function 'close'
	/usr/local/share/lua/5.1/kong/init.lua:360: in function 'init'
	init_by_lua:3: in main chunk
nginx: [error] init_by_lua error: /usr/local/share/lua/5.1/kong/db/init.lua:125: attempt to concatenate local 'err' (a nil value)
stack traceback:
	/usr/local/share/lua/5.1/kong/db/init.lua:125: in function 'prefix_err'
	/usr/local/share/lua/5.1/kong/db/init.lua:190: in function 'close'
	/usr/local/share/lua/5.1/kong/init.lua:360: in function 'init'
	init_by_lua:3: in main chunk

Same issue I reported here:
#4178
And here:
#4138

In this deployment of 1.0.1 I am not running any custom kong pr patches(since all those 1.0.0 perf PRs are now merged).

Steps To Reproduce

I don't know how to reproduce it, just happens on start every time. The fix for me was commenting out:
https://github.com/Kong/kong/blob/master/kong/init.lua#L360

C* Configuration:

- name: KONG_CASSANDRA_PORT
          value: '9042'
        - name: KONG_CASSANDRA_KEYSPACE
          value: kong_dev
        - name: KONG_CASSANDRA_TIMEOUT
          value: '5000'
        - name: KONG_CASSANDRA_SSL
          value: 'on'
        - name: KONG_CASSANDRA_SSL_VERIFY
          value: 'on'
        - name: KONG_CASSANDRA_USERNAME
          value: *****
        - name: KONG_CASSANDRA_PASSWORD
          value:  *****
        - name: KONG_CASSANDRA_CONSISTENCY
          value: LOCAL_QUORUM
        - name: KONG_CASSANDRA_LB_POLICY
          value: RequestDCAwareRoundRobin
        - name: KONG_CASSANDRA_LOCAL_DATACENTER
          value: DC2
        - name: KONG_CASSANDRA_CONTACT_POINTS
          value: 'server00466,server00467,server00468,server00469,server00470,server00471'

Additional Details & Logs

  • Kong version 1.0.1
  • Kong debug-level startup logs included debug log execution above.
  • Operating system Alpine from docker-kong repo
@jeremyjpj0916 jeremyjpj0916 changed the title Can't deployon Kong 1.0.1, errors out Can't deploy on Kong 1.0.1, errors out Jan 17, 2019
@jeremyjpj0916 jeremyjpj0916 changed the title Can't deploy on Kong 1.0.1, errors out Can't deploy on Kong 1.0.1, init errors out Jan 17, 2019
@bungle
Copy link
Member

bungle commented Jan 17, 2019

I guess changing this:
https://github.com/Kong/kong/blob/master/kong/db/strategies/cassandra/connector.lua#L241

to

return true

also fixes it.

@jeremyjpj0916
Copy link
Contributor Author

@bungle I will give that a go too and let you know if that fixes it too.

@hishamhm
Copy link
Contributor

@jeremyjpj0916 sorry about dropping the ball on those comments that reported this particular issue (lots of issues and PRs in flight at the same time leading up to release). We're trying to reproduce the issue locally to get a better hang of the circumstances triggering it.

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Jan 17, 2019

No dice on the return true patch at that line. Still seeing the error, if you have an ngx.log line you want me to pretty print or add somewhere to dump a lua table I will give it a go and save the reproducing time:


2019/01/17 20:09:47 [error] 22#0: init_by_lua error: /usr/local/share/lua/5.1/kong/db/init.lua:125: attempt to concatenate local 'err' (a nil value)
--
  | stack traceback:
  | /usr/local/share/lua/5.1/kong/db/init.lua:125: in function 'prefix_err'
  | /usr/local/share/lua/5.1/kong/db/init.lua:190: in function 'close'
  | /usr/local/share/lua/5.1/kong/init.lua:360: in function 'init'
  | init_by_lua:3: in main chunk
  | nginx: [error] init_by_lua error: /usr/local/share/lua/5.1/kong/db/init.lua:125: attempt to concatenate local 'err' (a nil value)
  | stack traceback:
  | /usr/local/share/lua/5.1/kong/db/init.lua:125: in function 'prefix_err'
  | /usr/local/share/lua/5.1/kong/db/init.lua:190: in function 'close'
  | /usr/local/share/lua/5.1/kong/init.lua:360: in function 'init'
  | init_by_lua:3: in main chunk


@bungle
Copy link
Member

bungle commented Jan 17, 2019

@jeremyjpj0916

That was unexpected, so it leaves us with this:
https://github.com/Kong/kong/blob/1.0.1/kong/db/strategies/cassandra/connector.lua#L249

Can you quickly try to change this:
https://github.com/Kong/kong/blob/1.0.1/kong/db/strategies/cassandra/connector.lua#L248

to:

if not ok and err then

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Jan 17, 2019

Want me to also leave the above return true attempt in place with that or revert it @bungle ?

@bungle
Copy link
Member

bungle commented Jan 17, 2019

@jeremyjpj0916 both can be there.

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Jan 17, 2019

@bungle gave it a go(but had already reverted the first recommendation before seeing your response). The second recommendation worked and fixed the stacktrace error and Kong is now running. Maybe a 1.0.1 hotfix release otw if that change does not cause unforeseen issues anywhere else 👍 ?

@bungle
Copy link
Member

bungle commented Jan 17, 2019

@jeremyjpj0916 thank you! It is a bit tricky still to reproduce. I have tried cluster and everything. But why the second proposal worked is because of:
http://w3.impa.br/~diego/software/luasocket/tcp.html#close

Lua Cassandra doesn't fully normalize OpenResty/Nginx socket and LuaSocket. LuaSocket has no return value on close(?). Though it really impresses me why I cannot reproduce it, or that test suite didn't reproduce it. (we need to fallback to LuaSocket on init as there are no co-sockets available at that point).

@jeremyjpj0916
Copy link
Contributor Author

Strange, promise I am not doing anything funky on my end lol. If I can add any debug trace lines to help you out in my env just let me know!

@bungle
Copy link
Member

bungle commented Jan 17, 2019

Now I am wondering if it only happens when LuaSec wraps the LuaSocket:
https://github.com/brunoos/luasec/wiki/LuaSec-0.7#conn_close

I'll test that next (aka issue only happens with SSL connection).

@thibaultcha
Copy link
Member

I have reproduced this and can confirm that it only occurs when connection to C* with client-to-node TLS enabled. A fix was opened here: #4214

We'll roll out 1.0.2 with this fix.

@jeremyjpj0916
Copy link
Contributor Author

jeremyjpj0916 commented Jan 17, 2019

Woot! Glad to know it was not me being stupid(I know that yall have seen my dumb moments), I had a sneaking suspicion that it may have had to do with us using TLS Kong->C*, hopefully something that can be added to a unit test to prevent a future regression. Will 1.0.2 be weeks away or will it just focus on this one bug?

@thibaultcha
Copy link
Member

We're aiming at an urgent release for 1.0.2 (possibly today or tomorrow).

thibaultcha added a commit that referenced this issue Jan 17, 2019
In the init phase, lua-cassandra fallbacks to LuaSocket (since cosockets
are not supported). When we ask Kong to connect to the C* peers over
TLS, we use LuaSec. LuaSec wraps the LuaSocket TCP object (which itself
wraps the kernel socket).

LuaSocket normally returns `1` when calling `sock:close()`, but when
wrapped by LuaSec, the latter dismisses that return value, and thus,
`sslsock:close()` does _not_ return anything.

We could work around this limitation by providing an additional fix to
the pgmoon and lua-cassandra LuaSocket metatable wrappers, but this
commit presents a faster fix for the sake of efficiency, in the spirit
of the 1.0.2 release.

Note also how we changed the return value of `db_conn:close()` to `true`
when no stored connection is found. While not following the initial
design of this API, this changes will help prevent misuses in
higher-level modules (such as the DAO).

Fix #4212
From #4214
thibaultcha added a commit that referenced this issue Jan 18, 2019
In the init phase, lua-cassandra fallbacks to LuaSocket (since cosockets
are not supported). When we ask Kong to connect to the C* peers over
TLS, we use LuaSec. LuaSec wraps the LuaSocket TCP object (which itself
wraps the kernel socket).

LuaSocket normally returns `1` when calling `sock:close()`, but when
wrapped by LuaSec, the latter dismisses that return value, and thus,
`sslsock:close()` does _not_ return anything.

We could work around this limitation by providing an additional fix to
the pgmoon and lua-cassandra LuaSocket metatable wrappers, but this
commit presents a faster fix for the sake of efficiency, in the spirit
of the 1.0.2 release.

Note also how we changed the return value of `db_conn:close()` to `true`
when no stored connection is found. While not following the initial
design of this API, this changes will help prevent misuses in
higher-level modules (such as the DAO).

Fix #4212
From #4214
@thibaultcha
Copy link
Member

We just released 1.0.2 which should address this regression. Thanks for the report @jeremyjpj0916!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants