Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(core) implement internal dns with loadbalancing, replace dnsmasq #1587

Merged
merged 38 commits into from
Nov 4, 2016

Conversation

Tieske
Copy link
Member

@Tieske Tieske commented Sep 2, 2016

Summary

implements an internal dns resolver, removes the dnsmasq dependency.

Features

  • uses the balancer_by_lua directive
  • dns resolution is cached in Lua memory
  • loadbalancing; lookups are rotated (round-robin) over the entire dns record. For example if the upstream url resolves to an A record with 3 entries, Kong will automatically round-robin the requests over those
  • resolver also resolves SRV records and takes into account the ports in those cases
  • the new connect method also resolves ports. So auxiliary systems can now also be configured using SRV records. The effect is that for example, the LDAP plugin, when connecting to a host named "myHost" and port "123", will be looked up against an SRV record as well. If the SRV record returns 3 hosts with ports, those will be used in round robin mode (the provided port '123' will be overridden by the dns results)

todo

additional todos, not making it here...

related to

instructions

To use this branch the dns.lua module must be manually installed

@thibaultcha
Copy link
Member

Cool stuff 👏

@@ -111,7 +112,7 @@ _send = function(premature, self, to_send)
local client = http.new()
client:set_timeout(self.connection_timeout)

local ok, err = client:connect(self.host, self.port)
local ok, err = connect(client, self.host, self.port)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could implement this in our globalpatches module now, to avoid diverging from the tcpsock regular API.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you see that? the connect method is metatable property of the socket userdata. I don't think we should mess with that. Or did you have something different in mind?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

-- globalpatches.lua
local tcp = ngx.socket.tcp
_G.ngx.socket.tcp = function(...)
  local sock = tcp(...)
  local connect = sock.connect
  sock.connect = function(...)
    return dns.connect(sock, ...)
  end
  return sock
end

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you try that? I'd expect sock to be a userdata, so the line sock.connect = ... will fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might just work

> resty -e "local s = ngx.socket.tcp(); print(type(s));s.connect = function() end"
table

Copy link
Member

@thibaultcha thibaultcha Sep 3, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does work, but was rejected by resty-cli

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean 'was rejected by resty-cli'?

@thibaultcha
Copy link
Member

Should we make proxy_next_upstream and proxy_next_upstream_tries configurable? They could be configured on an per-API basis just like proxy_pass = $upstream_host; (which btw, I think should be updated).

@thibaultcha
Copy link
Member

check necessity to update cli / db name resolving, would be nice to use SRV records here as well

Is there any reason why we couldn't do that? resty-cli runs in a timer context and we do not mind about blocking operations. I believe we should be able to resolve /etc/hosts entries as well as use dns.lua atop tcpsock:connect() (see my note about patching it in globalpatches too).

@thibaultcha thibaultcha added pr/wip A work in progress PR opened to receive feedback and removed pr/status/ongoing-discussion labels Sep 2, 2016
balancer_by_lua_block {
kong.balancer()
}
keepalive 60;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for polishing this PR: make this configurable.
Also worth considering allowing users to add custom upstream blocks (managed by our template) to allow for different upstream connection pools maybe? With different keepalive settings.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With different keepalive settings.

I think that making it globally configurable in the configuration file would be enough for now.

@sonicaghi
Copy link
Member

loadbalancing; lookups are rotated (round-robin) over the entire dns record

Would be cool if the LB will be customizable with custom policies in the future.

@thibaultcha thibaultcha changed the title feature(core) implement internal dns with loadbalancing, replace dnsmasq feat(core) implement internal dns with loadbalancing, replace dnsmasq Sep 2, 2016
@Tieske
Copy link
Member Author

Tieske commented Sep 2, 2016

Is there any reason why we couldn't do that? resty-cli runs in a timer context and we do not mind about blocking operations. I believe we should be able to resolve /etc/hosts entries as well as use dns.lua atop tcpsock:connect() (see my note about patching it in globalpatches too).

If patching the the connect method in globalpatches works, then it would be easy.

@subnetmarco
Copy link
Member

Currently having this error:

Missing dependencies for kong:
dns == 0.1


Error: Could not satisfy dependency: dns == 0.1
make: *** [install] Error 1

…rs an empty table if nothing is specified

instead of the previous `nil`. So if table length is 0, drop it and revert to defaults.
@Tieske Tieske mentioned this pull request Oct 11, 2016
7 tasks
@@ -1,16 +1,12 @@
#!/usr/bin/env resty
#!/usr/bin/env resty -c 65535
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would override whatever ulimit I have already set - do we really need it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed it, latest dns.lua master doesn't use it anymore

-- @param target the table with the target details
-- @return balancer if found, or nil if not found, or nil+error on error
local get_balancer = function(target)
return nil -- TODO: place holder, forces dns use to first fix regression
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tieske said we can merge with this TODO in place, because it's fixed in the other upstream PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

local ok, err = balancer_execute(balancer_address)
if not ok then
ngx.log(ngx.ERR, "failed the initial dns/balancer resolve: ", err)
return ngx.exit(500)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use return responses.send_HTTP_INTERNAL_SERVER_ERROR(err).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed


for _, row in ipairs(rows) do
if not row.retries then -- only if retries is not set already
local _, err = dao.apis:update(row, { id = row.id }, {full = true})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: this is not doing anything. It's probably missing a row.retries = 5 before executing the update(..) method.

@@ -163,7 +163,8 @@ return {
request_path = {type = "string", unique = true, func = check_request_path},
strip_request_path = {type = "boolean", default = false},
upstream_url = {type = "url", required = true, func = validate_upstream_url_protocol},
preserve_host = {type = "boolean", default = false}
preserve_host = {type = "boolean", default = false},
retries = {type = "number", default = 5, func = check_retries},
Copy link
Member

@subnetmarco subnetmarco Oct 11, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the previous row.retries = 5 in the Cassandra migration was not set because the expectation was that re-updating the same entity would populate the default values.

Not sure if this really happens, and if it doesn't, should it be fixed? This needs to be tested.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just checked it and it works as it currently is. Adding an api on 0.9.2, then restarting Kong with this branch (running the migrations), results in this api output;

Mashapes-MacBook-Pro-2:kong thijs$ http get localhost:8001/apis
HTTP/1.1 200 OK
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Wed, 12 Oct 2016 18:10:31 GMT
Server: kong/0.9.2
Transfer-Encoding: chunked

{
    "data": [
        {
            "created_at": 1476295721000,
            "id": "13af5ca7-a617-4e22-92b7-540fe9607e65",
            "name": "tieske",
            "preserve_host": false,
            "request_path": "/",
            "retries": 5,
            "strip_request_path": false,
            "upstream_url": "http://www.thijsschreijer.nl"
        }
    ],
    "total": 1
}

Retries default is now set.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^^ being with Cassandra configured as the database

local ok, err = balancer_execute(addr)
if not ok then
ngx.log(ngx.ERR, "failed to retry the balancer/resolver: ", err)
return ngx.exit(500)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use responses.send_HTTP..

local ok, err = set_current_peer(addr.ip, addr.port)
if not ok then
ngx.log(ngx.ERR, "failed to set the current peer: ", err)
return ngx.exit(500)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should use responses.send_HTTP..

@subnetmarco
Copy link
Member

We need to add more integration tests that check if SRV records are properly resolved, for example.

@Tieske
Copy link
Member Author

Tieske commented Oct 12, 2016

keepalive is now configurable (one global setting)

@pgieniec5tar
Copy link

@Tieske one thing I wanted to bring up here, for ELB's in AWS, they will resolve to an ip address, however that underlying ip can eventually change.

I noticed one of the features is dns resolution is cached in Lua memory, but it should be looked up every few seconds to mitigate this.

Here is a much better post describing the issue and solution:
http://gc-taylor.com/blog/2011/11/10/nginx-aws-elb-name-resolution-resolvers/

@Tieske
Copy link
Member Author

Tieske commented Oct 24, 2016

@pgieniec5tar apparently (what I read from the post) is that nginx does not honour the ttl by simply holding on to the IP address indefinitely.

This dns resolver branch will honour ttl so as long as the proper values are set, it will renew the ip addresses (and update its in memory cache) when the record expires.

initial merge commit, untested. Probably needs some fixes still.
@subnetmarco
Copy link
Member

subnetmarco commented Oct 25, 2016

@Tieske can you please fix the conflicts? I would like to execute some benchmarks too on the latest code.

@Tieske Tieske merged commit e121325 into next Nov 4, 2016
@Tieske Tieske deleted the feature/dns2 branch November 4, 2016 12:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants