Use workers instead of spawning goroutines for each incoming DNS request #565
Why?
server.go
Outdated
// Maximum number of incoming DNS messages in queue.
maxQueueSize = 1000000
// Maximum number of workers.
maxWorkers = 100
Why 100?
I observed that 100 is the optimal value; both fewer than 100 and more than 100 workers reduce performance.
But we probably need more tests.
server.go
Outdated
// Maximum number of TCP queries before we close the socket.
maxTCPQueries = 128
// Maximum number of incoming DNS messages in queue.
maxQueueSize = 1000000
Why 1000000?
CoreDNS cannot handle 1M requests, so this queue size does not affect anything in practice (it only prevents the queue from becoming a bottleneck).
Codecov Report
@@            Coverage Diff            @@
##           master    #565     +/-   ##
=========================================
+ Coverage   57.85%   57.93%   +0.08%
=========================================
  Files          37       37
  Lines        9984    10008     +24
=========================================
+ Hits         5776     5798     +22
  Misses       3158     3158
- Partials     1050     1052      +2
Couple of things:
What do others think of this? The speed is nice. Is the constant of '100' a problem?
I asked @UladzimirTrehubenka to put some more details (like those in the email) onto this PR or onto an issue; I think others will need to see that to weigh in. Knowing whether the constant is a problem probably requires more empirical tests on other platforms. But if this feature is disabled with workers == 0, then the cost is low (well, if you consider maintaining two different paths low cost). Silent crashes are bad... I like that it fixes that (it does, right?).
Good point, somehow I tunnel visioned on the constant.
Actually, this PR breaks nothing. By default, Workers and QueueSize are set to zero. These parameters are set during server object initialization (e.g. on the CoreDNS side). Finally, on an AWS cluster, using a handler that returns a random A record for any request, I got the following numbers (dnsperf against a test binary over the network):
BTW, the project's performance (with 100 workers) is 38K for size=0 vs. 47K for size=1000000.
server.go
Outdated
srv.lock.Lock()
defer srv.lock.Unlock()
if srv.started {
if srv.isRunning() {
Is this needed in this PR? (Not opposing the change - but it seems to do the same thing as the earlier code, or is there a bug fixed?)
[it does clear out the awful locking we had in server]
Can you make this a separate PR?
if srv.Handler == nil {
	srv.Handler = DefaultServeMux
}
Why is this one needed?
scrolls down
Ah you're pulling it out from the serveX functions; sensible change; can you make that also a new PR?
serveUDP() and serveTCP() have:
handler := srv.Handler
if handler == nil {
	handler = DefaultServeMux
}
...
go srv.serve(s.RemoteAddr(), handler, ...)
Why do we need to set the handler on each serveUDP() or serveTCP() call and then pass it into serve(), if we can set srv.Handler only once and serve() can just use srv.Handler?
server.go
Outdated
func (srv *Server) serveTCP(l net.Listener) error {
	srv.start()
Why is start() needed again here?
server.go
Outdated
if in.s != nil {
	a = in.s.RemoteAddr()
} else if in.t != nil {
	a = in.t.RemoteAddr()
if in.s != nil {
	a = in.s.RemoteAddr()
}
if in.t != nil {
	a = in.t.RemoteAddr()
}
All the reader stuff should apply for both cases and can be outdented.
server.go
Outdated
for q := 0; q < maxTCPQueries; q++ { // TODO(miek): make this number configurable?
	req := new(Msg)
	err := req.Unpack(in.m)
	if err != nil { // Send a FormatError back
I'm not convinced the if-elseif-else is better than the goto we had.
server.go
Outdated
// Number of workers, if set to zero - use spawn goroutines instead
Workers int
// Size of DNS requests queue
QueueSize int
The number of workers is one thing, but the QueueSize is quite another: if this doesn't perform better with QueueSize == 0, it's not worth adding. A buffered channel leads to weird "sometimes it is slow - or blocking" errors in prod when the thing finally fills up.
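To make the concern about a buffered queue concrete, here is a minimal sketch. It is not code from this PR: `enqueue` and the buffer size of 2 are hypothetical. Sends on a buffered channel succeed immediately until the buffer is full; from that point a plain send would block the caller. The non-blocking `select` makes that moment visible:

```go
package main

import "fmt"

// enqueue is a hypothetical helper that tries to put msg on the queue
// without blocking. It reports false when the buffer is full, which is the
// point at which a plain `queue <- msg` would start blocking the caller.
func enqueue(queue chan<- []byte, msg []byte) bool {
	select {
	case queue <- msg:
		return true
	default:
		return false
	}
}

func main() {
	// A tiny buffer so that it fills up quickly; nothing reads from it here.
	queue := make(chan []byte, 2)
	for i := 0; i < 3; i++ {
		fmt.Printf("message %d accepted: %v\n", i, enqueue(queue, []byte{byte(i)}))
	}
}
```

With no reader draining the queue, the first two messages are accepted and the third is rejected. A large buffer only delays that moment, which would be consistent with problems showing up late under sustained load.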
[ Quoting <[email protected]> in "Re: [miekg/dns] Use workers instead..." ]
UladzimirTrehubenka commented on this pull request.
> return &Error{err: "server already started"}
}
+
+ if srv.Handler == nil {
+ srv.Handler = DefaultServeMux
+ }
serveUDP() and serveTCP() have:
```
handler := srv.Handler
if handler == nil {
handler = DefaultServeMux
}
...
go srv.serve(s.RemoteAddr(), handler, ...)
```
Why do we need to set the handler on each serveUDP() or serveTCP() call and then pass it into serve(), if we can set srv.Handler only once and serve() can just use srv.Handler?
Good question, it does look a bit odd doing this in the serving path. And no
comments (yeah!) on why this makes sense.
Can't think of a good reason right now.
[ Quoting <[email protected]> in "Re: [miekg/dns] Use workers instead..." ]
Why do we need to set the handler on each serveUDP() or serveTCP() call and then pass it into serve(), if we can set srv.Handler only once and serve() can just use srv.Handler?
A quick hack to remove this crashes with 'go test'
It is not enough to just remove this: srv.serve() needs to be changed to use srv.Handler instead of the passed handler, and srv.Handler needs to be set to DefaultServeMux (if the handler is empty) in srv.ListenAndServe() and srv.ActivateAndServe(), as in the PR.
BTW, the PR passes all unit tests and doesn't change the default behavior.
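The handler change discussed above can be sketched as follows. This is only an illustration under stated assumptions: `Handler`, `Server` and `DefaultServeMux` here are simplified stand-ins that mirror the names in the discussion, not the real library types, and `setDefaultHandler` is a hypothetical helper name.

```go
package main

import "fmt"

// Handler, Server and DefaultServeMux are simplified stand-ins that only
// mirror the names used in the discussion; they are not the real dns types.
type Handler interface {
	ServeDNS(query string) string
}

type muxHandler struct{}

func (muxHandler) ServeDNS(query string) string { return "default mux: " + query }

var DefaultServeMux Handler = muxHandler{}

type Server struct {
	Handler Handler
}

// setDefaultHandler is a hypothetical helper that would be called once from
// ListenAndServe() or ActivateAndServe(), before any request is served.
func (srv *Server) setDefaultHandler() {
	if srv.Handler == nil {
		srv.Handler = DefaultServeMux
	}
}

// serve reads srv.Handler directly instead of receiving a handler argument.
func (srv *Server) serve(query string) string {
	return srv.Handler.ServeDNS(query)
}

func main() {
	srv := &Server{} // no handler configured by the caller
	srv.setDefaultHandler()
	fmt.Println(srv.serve("example.org."))
}
```

The point of the sketch is that the nil check runs once at startup, so the per-request path no longer has to resolve or pass the handler.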
There are two major issues:
unlimited resource usage - a performance test shows that under high load CoreDNS silently crashed;
a performance drop, because much time is spent on managing goroutines instead of serving DNS requests.