Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop when connection unexpectedly dies #11

Closed
nvx opened this issue Sep 26, 2019 · 5 comments
Closed

Infinite loop when connection unexpectedly dies #11

nvx opened this issue Sep 26, 2019 · 5 comments

Comments

@nvx
Copy link

nvx commented Sep 26, 2019

When the connection is interrupted unexpectedly (eg, stateful firewall loses track of the connection) it seems that when database/sql tries to signal to the driver to close the connection it gets stuck in a busy loop causing 100% CPU usage.

Stack trace of the offending goroutine running under go version go1.12.9 linux/amd64:

goroutine 855659 [runnable]:
syscall.Syscall(0x0, 0x7, 0xc0001565a8, 0x1, 0x0, 0x1, 0x0)
        /usr/local/go/src/syscall/asm_linux_amd64.s:18 +0x5
syscall.read(0x7, 0xc0001565a8, 0x1, 0x8, 0x0, 0x0, 0x0)
        /usr/local/go/src/syscall/zsyscall_linux_amd64.go:732 +0x5a
syscall.Read(...)
        /usr/local/go/src/syscall/syscall_unix.go:172
internal/poll.(*FD).Read(0xc00014eb00, 0xc0001565a8, 0x1, 0x8, 0x0, 0x0, 0x0)
        /usr/local/go/src/internal/poll/fd_unix.go:165 +0x131
net.(*netFD).Read(0xc00014eb00, 0xc0001565a8, 0x1, 0x8, 0x0, 0xe97160, 0xc000096050)
        /usr/local/go/src/net/fd_unix.go:202 +0x4f
net.(*conn).Read(0xc0000ac2b8, 0xc0001565a8, 0x1, 0x8, 0x0, 0x0, 0x0)
        /usr/local/go/src/net/net.go:177 +0x69
io.ReadAtLeast(0x7f47d8c79a08, 0xc0004741c0, 0xc0001565a8, 0x1, 0x8, 0x1, 0x0, 0xe97160, 0xc000096050)
        /usr/local/go/src/io/io.go:310 +0x88
io.ReadFull(...)
        /usr/local/go/src/io/io.go:329
github.com/thda/tds/binary.(*Encoder).Read(0xc000156588, 0xc0001565a8, 0x1, 0x8, 0x0, 0xe97160, 0xc000096050)
        /home/jenkins_slave/workspace/tds-test/src/github.com/thda/tds/binary/encoder.go:78 +0x6b
github.com/thda/tds/binary.(*Encoder).ReadByte(...)
        /home/jenkins_slave/workspace/tds-test/src/github.com/thda/tds/binary/encoder.go:306
github.com/thda/tds.(*header).read(0xc000156490, 0xc000156588, 0xe97160, 0xc000096050)
        /home/jenkins_slave/workspace/tds-test/src/github.com/thda/tds/buffer.go:42 +0x1ab
github.com/thda/tds.(*buf).readPkt(0xc000156480, 0xc000499001, 0xe97160, 0xc000096050)
        /home/jenkins_slave/workspace/tds-test/src/github.com/thda/tds/buffer.go:140 +0x54
github.com/thda/tds.(*buf).processCancel(0xc000156480, 0x0, 0x0)
        /home/jenkins_slave/workspace/tds-test/src/github.com/thda/tds/buffer.go:554 +0xcf
github.com/thda/tds.(*buf).sendPkt(0xc000156480, 0xeb3401, 0xc0003c8350, 0x0)
        /home/jenkins_slave/workspace/tds-test/src/github.com/thda/tds/buffer.go:187 +0x1c0
github.com/thda/tds.(*buf).send(0xc000156480, 0xeadfe0, 0xc0002fb620, 0xc0003c830f, 0xc000499180, 0x1, 0x1, 0x0, 0x0)
        /home/jenkins_slave/workspace/tds-test/src/github.com/thda/tds/buffer.go:398 +0xee
github.com/thda/tds.(*session).Close(0xc0003ff040, 0x0, 0x0)
        /home/jenkins_slave/workspace/tds-test/src/github.com/thda/tds/session.go:342 +0x192
database/sql.(*driverConn).finalClose.func2()
        /usr/local/go/src/database/sql/sql.go:521 +0x49
database/sql.withLock(0xea1e80, 0xc000119280, 0xc000499270)
        /usr/local/go/src/database/sql/sql.go:3097 +0x63
database/sql.(*driverConn).finalClose(0xc000119280, 0x0, 0x4)
        /usr/local/go/src/database/sql/sql.go:519 +0x130
database/sql.(*driverConn).Close(0xc000119280, 0xc000097390, 0xc000097390)
        /usr/local/go/src/database/sql/sql.go:500 +0x138
database/sql.(*DB).putConn(0xc00010e180, 0xc000119280, 0xe97160, 0xc000097390, 0xc000119201)
        /usr/local/go/src/database/sql/sql.go:1250 +0x2d5
database/sql.(*driverConn).releaseConn(...)
        /usr/local/go/src/database/sql/sql.go:421
database/sql.(*DB).pingDC(0xc00010e180, 0xeae020, 0xc000520e40, 0xc000119280, 0xc0004994a8, 0x0, 0x0)
        /usr/local/go/src/database/sql/sql.go:719 +0x84
database/sql.(*DB).PingContext(0xc00010e180, 0xeae020, 0xc000520e40, 0xc000499520, 0xd567e0)
        /usr/local/go/src/database/sql/sql.go:742 +0x170
<truncated>

In this instance, db.PingContext() was called which made the database/sql package realise there was a bad connection in the pool. Relevant bit of code in the database/sql package that kicks off the chain of events: https://github.com/golang/go/blob/release-branch.go1.12/src/database/sql/sql.go#L1250

Subsequent calls to db.PingContext() worked fine using a fresh connection while the offending go routine was still caught up in a busy loop.

@nvx
Copy link
Author

nvx commented Sep 26, 2019

Going through the stack trace, I suspect the offending loop is this one: https://github.com/thda/tds/blob/master/buffer.go#L549

@thda
Copy link
Owner

thda commented Sep 26, 2019

Indeed that for loop should break when readPkt returns an error. Will fix by tomorrow.

@thda thda closed this as completed in 49af1c6 Sep 27, 2019
@thda
Copy link
Owner

thda commented Sep 27, 2019

Hi,

I think I've fixed in master. Could you please confirm?
If it's working I'll cut a release.
Thanks for the detailed analysis BTW.

Cheers

@thda thda reopened this Sep 27, 2019
@nvx
Copy link
Author

nvx commented Sep 30, 2019

I agree that looks like it should fix the issue. We ran into this when our production load balancer was upgraded so I'm not sure how reliably I can reproduce the issue on demand unfortunately.

I'm happy enough to close this and I'll open a new ticket if I run into it again next time a similar change happens on our load balancer.

@thda
Copy link
Owner

thda commented Sep 30, 2019

Closing then. Thanks!

@thda thda closed this as completed Sep 30, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants