data race in events monitoring code #511
Just experienced this (or at least something similar) when playing around with the event listener:
This happened when I stopped the docker daemon it was streaming events from (connected through TCP+TLS). I had already received the channel close notification, which I could see from the output of my code before the panic. The docker daemon I stopped had no containers running, so I'm not sure what event it was even trying to send. It happened only once. I've been trying to reproduce it, but no luck so far. A race condition sounds very likely.
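For reference, here is a minimal sketch of the kind of listener loop described above, assuming go-dockerclient's NewTLSClient/AddEventListener API; the endpoint and certificate paths are placeholders, not taken from the original report:

```go
package main

import (
	"log"

	docker "github.com/fsouza/go-dockerclient"
)

func main() {
	// Placeholder endpoint and certificate paths; the report only says the
	// connection to the daemon was over TCP+TLS.
	client, err := docker.NewTLSClient("tcp://127.0.0.1:2376",
		"cert.pem", "key.pem", "ca.pem")
	if err != nil {
		log.Fatal(err)
	}

	events := make(chan *docker.APIEvents)
	if err := client.AddEventListener(events); err != nil {
		log.Fatal(err)
	}

	// As described above, the listener channel gets closed when the
	// connection to the daemon drops, so this loop ends with the
	// "channel close notification".
	for event := range events {
		log.Printf("event: %+v", event)
	}
	log.Println("event channel closed")
}
```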
After more testing, I was able to reproduce the issue described above. I traced the problem and fixed it in #514. I'm not sure whether it covers the possible race condition/trace that started this issue.
@cezarsa I've been failing to detect this race with the race detector.
@fsouza For some reason the race detector is only catching these data races on Linux with:
Running against master, I can see that there are still race conditions even after #514:
These data races should be fixed now. I enabled the race detector in Travis to help us detect this kind of error in the future. The fix is far from ideal; I want to take a deeper look at the event monitoring code and potentially refactor it soon.
I think there's still a race condition there. It took about 20 minutes of running the tests in a loop before it showed up:
Looks like Travis caught this one as well: https://travis-ci.org/fsouza/go-dockerclient/jobs/125598582.
Closing this, as I believe we've made progress fixing the data races in the event monitoring code. Also, I've run the `while ./go-dockerclient.test` experiment for ~30 minutes on both Mac and Linux, so we should be good.
Re-opening as I just got one more, lol.
This change causes a data race:

```
==================
WARNING: DATA RACE
Write at 0x00c0003b6b50 by goroutine 167:
  runtime.closechan()
      /usr/local/go/src/runtime/chan.go:334 +0x0
  github.com/fsouza/go-dockerclient.(*eventMonitoringState).disableEventMonitoring()
      /code/event.go:191 +0x10e
  github.com/fsouza/go-dockerclient.(*Client).RemoveEventListener()
      /code/event.go:113 +0xd1
  github.com/fsouza/go-dockerclient.testEventListeners.func2()
      /code/event_test.go:238 +0x46
  github.com/fsouza/go-dockerclient.testEventListeners()
      /code/event_test.go:269 +0x1022
  github.com/fsouza/go-dockerclient.TestTLSEventListeners()
      /code/event_test.go:28 +0x8e
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:862 +0x163

Previous read at 0x00c0003b6b50 by goroutine 410:
  runtime.chansend()
      /usr/local/go/src/runtime/chan.go:142 +0x0
  github.com/fsouza/go-dockerclient.(*Client).eventHijack.func1()
      /code/event.go:356 +0x65d

Goroutine 167 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:913 +0x699
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1154 +0xa8
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:862 +0x163
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1152 +0x523
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:1069 +0x2eb
  main.main()
      _testmain.go:656 +0x222

Goroutine 410 (finished) created at:
  github.com/fsouza/go-dockerclient.(*Client).eventHijack()
      /code/event.go:344 +0x56a
  github.com/fsouza/go-dockerclient.(*eventMonitoringState).connectWithRetry()
      /code/event.go:255 +0xdc
  github.com/fsouza/go-dockerclient.(*eventMonitoringState).monitorEvents()
      /code/event.go:217 +0xba
==================
```

My bad x) Related to #511. This reverts commit a107859.
Ok, I'm confident now :)
I could only get this race trace when running the tests with Go tip. After taking a quick look at this, it seems like a panic from sending on a closed channel is possible.
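This is not the library's actual code, just an illustration of the close-vs-send pattern the trace points at and one common way to guard it; the names (monitor, send, disable) are made up for the sketch, and the approach shown is to put the enabled check, the send, and the close under the same mutex:

```go
package main

import (
	"fmt"
	"sync"
)

// monitor stands in for the event monitoring state: one goroutine pushes
// events into a channel while another may close it on shutdown.
type monitor struct {
	mu      sync.Mutex
	enabled bool
	events  chan string
}

func newMonitor() *monitor {
	return &monitor{enabled: true, events: make(chan string, 16)}
}

// send delivers an event only while monitoring is enabled. Because the
// check and the send happen under the same lock as disable's close, a
// send on a closed channel cannot happen.
func (m *monitor) send(e string) {
	m.mu.Lock()
	defer m.mu.Unlock()
	if !m.enabled {
		return // already shut down; drop the event instead of panicking
	}
	select {
	case m.events <- e:
	default: // receiver is slow; drop rather than block while holding the lock
	}
}

// disable flips the flag and closes the channel under the same lock, so no
// sender can sneak in between the two steps.
func (m *monitor) disable() {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.enabled {
		m.enabled = false
		close(m.events)
	}
}

func main() {
	m := newMonitor()
	go func() {
		for i := 0; i < 100; i++ {
			m.send(fmt.Sprintf("event %d", i))
		}
	}()
	m.send("hello")
	m.disable()
	for e := range m.events {
		fmt.Println(e)
	}
}
```

The non-blocking send is only there so the sketch never blocks while holding the lock; the actual fix in the library may look different.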