CI flake: Panic in Spec Teardown: send on closed channel #6518
Comments
Still happening, and (as far as my logs can tell) still only on f32: Podman containers [AfterEach] podman wait to pause|unpause condition
A friendly reminder that this issue had no activity for 30 days.
@edsantiago Still an issue?
Yes, at least as of three days ago. Podman containers [AfterEach] podman wait to pause|unpause condition
And another one just now, on my own PR #7070
Another one yesterday:
Still no sign of it on anything other than f32
The "podman wait to pause|unpause condition" test is failing several times a day, always a flake. Issue containers#6518. Disable it until the cause can be identified and fixed. Signed-off-by: Ed Santiago <[email protected]>
A friendly reminder that this issue had no activity for 30 days.
@edsantiago Still an issue?
Uh, well, no, because it was flaking so much that I disabled the test (#7143). I haven't tried to generate a reproducer, but if today is calm I will try to do so. I've removed the stale-issue label because until the test is reenabled, we don't know.
Reference: containers#6518, a very-frequently-flaking CI test, disabled a month ago (containers#7143) because it was triggering so often in CI. Unfortunately, that seems to have simply swept the problem under the rug. AFAICT nobody has bothered to look at the root bug, so let's just reenable. If the problem persists, I'll let annoyed developers squeaky-wheel 6518 so there's some incentive to fix it. If the problem has miraculously gone away in the last month, that's a win too. (This test failure does not reproduce on my laptop, nor does it lend itself to devising a simple reproducer on a test VM.) Also: since containers#5325 appears to have been closed as fixed, remove a 'Skip' that references it. Unfortunately this also requires removing a lot of other cruft. This was an incidental oh-by-the-way addition that I thought would be trivial but ended up causing a much larger diff. Signed-off-by: Ed Santiago <[email protected]>
Yes, still happening, in post-merge testing on master:
Links, from most- to least-specific:
This is still happening. Recent failures:
A friendly reminder that this issue had no activity for 30 days.
@edsantiago still seeing this?
Sorry, I haven't had time to look (at this nor the other stale-issue that you haven't pinged me about yet). Won't have time until Thursday most likely. But I will, I promise.
Yes, still happening. I'll skip the September instances, and just list October/November:
Note to self: the search term for cirrus-flake-summarize is "unpause"
It's continuing to flake, and I see no activity on containers#6518. Flakes are evil. Let's just disable the test again, until someone takes the initiative to fix the bug. Signed-off-by: Ed Santiago <[email protected]>
Still flaking. I've filed #8536 to re-disable the offending test.
@edsantiago What do you think of this patch? I think the issue is that the second errChan is set up before the first one completes, causing the failure you are seeing. If we separate the channels, we should no longer close a channel before it is used.
I don't see how there could be a race here -- the code looks really sequential to me. But Go has subtleties I don't understand. I'm willing and even eager to give your approach a try for a few months, if you'd like to submit that! I will close this once your PR goes through CI and merges. Thank you!
I am wondering if threads would skip over the wait error channel, but you may be right. The only way I could see this happening would be if the second err = make(chan error) could fire before the close(errChan) in the first function happened.
The It("podman wait to pause|unpause condition"... test is flaking every so often when a message is sent in the second function to a channel. It is my belief that, between the time the first function sends a message to the channel and the time it closes the channel, the second errChan=make() has already happened. This would mean that the first function closes the second errChan, and then when the second function sends a message to the second errChan, it fails and blows up with the error you are seeing. By creating a different variable for the second channel, we eliminate the race. Fixes: containers#6518 Signed-off-by: Daniel J Walsh <[email protected]>
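For anyone trying to picture the suspected interleaving, here is a minimal sketch. It is not the code from containers_test.go: the function names `trySend`, `sharedVariable`, and `separateVariables` are hypothetical, and the interleaving is replayed sequentially in one goroutine so that it is deterministic. It assumes the first function's closure captures the `errChan` variable itself, so a late `close(errChan)` lands on whichever channel the variable points to at that moment.

```go
package main

import (
	"errors"
	"fmt"
)

// trySend sends err on ch and reports whether the send panicked
// (Go panics with "send on closed channel" if ch is already closed).
func trySend(ch chan error, err error) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	ch <- err
	return false
}

// sharedVariable replays the suspected ordering with one reused variable.
func sharedVariable() bool {
	errChan := make(chan error, 1)
	// Stand-in for the tail of the first function: the closure captures
	// the variable errChan, not the channel value it currently holds.
	lateClose := func() { close(errChan) }

	errChan <- nil // first function reports its result
	<-errChan      // the test body receives it and moves on

	errChan = make(chan error, 1) // second channel stored in the SAME variable
	lateClose()                   // first function's close now hits the second channel

	return trySend(errChan, errors.New("second result"))
}

// separateVariables is the same ordering, but the second channel gets
// its own variable, so the late close can only affect the first channel.
func separateVariables() bool {
	errChan := make(chan error, 1)
	lateClose := func() { close(errChan) }

	errChan <- nil
	<-errChan

	errChan2 := make(chan error, 1)
	lateClose()

	return trySend(errChan2, errors.New("second result"))
}

func main() {
	fmt.Println("shared variable panics:   ", sharedVariable())
	fmt.Println("separate variables panic: ", separateVariables())
}
```

In this sketch the shared-variable version reports a panic on the second send and the separate-variable version does not, which matches the reasoning in the commit message. Whether the real test hits exactly this ordering is still only a hypothesis.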
cirrus-flake-xref is reporting three instances of this since May 26:
Log links:
Link to containers_test.go:308.
All failures have been in fedora-32 special_testing_bindings.