-
Notifications
You must be signed in to change notification settings - Fork 2.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix CI flake on time of shutdown for API service #11927
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: jwhonce The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Hmmmm, is this (f34) helpful?
Or this (ubuntu)?
|
@edsantiago For debugging those messages tell me the which shutdown handlers have been called and properly returned. The |
I meant the 6-second delay in one, 4-second in the other |
@edsantiago Ah, I misunderstood. I wanted to get this PR in to take care of the CI flake. There will be a follow on PR to resolve what is taking so long to shutdown. |
a4a849f
to
3d25836
Compare
I can reproduce this on my laptop, 100%, by changing the timeout back to $ ~/go/bin/ginkgo -trace -debug -r -focus "are not available" test/e2e
...
time="2021-10-12T10:26:27-06:00" level=debug msg="API service shutdown request ignored as timeout Duration is UnlimitedService"
time="2021-10-12T10:26:28-06:00" level=debug msg="IdleTracker:new 0m+0h/0t connection(s)" X-Reference-Id=0xc0006aadd0
time="2021-10-12T10:26:28-06:00" level=debug msg="IdleTracker:new 0m+0h/1t connection(s)" X-Reference-Id=0xc0006aadd8
time="2021-10-12T10:26:28-06:00" level=debug msg="IdleTracker:new 0m+0h/2t connection(s)" X-Reference-Id=0xc0006aade0
time="2021-10-12T10:26:28-06:00" level=debug msg="IdleTracker:new 0m+0h/3t connection(s)" X-Reference-Id=0xc0006aade8
time="2021-10-12T10:26:28-06:00" level=info msg="Received shutdown signal \"interrupt\", terminating!" PID=3186527
time="2021-10-12T10:26:28-06:00" level=info msg="Invoking shutdown handler \"service\"" PID=3186527
time="2021-10-12T10:26:28-06:00" level=debug msg="API service forced shutdown, ignoring timeout Duration"
time="2021-10-12T10:26:28-06:00" level=debug msg="API service shutdown, 0/4 connection(s)"
time="2021-10-12T10:26:28-06:00" level=debug msg="IdleTracker:closed 0m+0h/4t connection(s)" X-Reference-Id=0xc0006aade8
time="2021-10-12T10:26:28-06:00" level=warning msg="IdleTracker: StateClosed transition by connection marked un-managed" X-Reference-Id=0xc0006aade8
time="2021-10-12T10:26:28-06:00" level=debug msg="API service forced shutdown, ignoring timeout Duration"
------------------------------
• Failure [6.735 seconds]
podman system service
/home/esm/src/atomic/2018-02.podman/libpod/test/e2e/system_service_test.go:18
verify pprof endpoints
/home/esm/src/atomic/2018-02.podman/libpod/test/e2e/system_service_test.go:54
are not available [It]
/home/esm/src/atomic/2018-02.podman/libpod/test/e2e/system_service_test.go:97
Timed out after 5.001s.
Expected process to exit. It did not. Unfortunately, the debug messages aren't helpful to me; but maybe you can try to reproduce, and if you can, track down the source of the hang? PS I focus on the "are not available" test because it is the (much) simpler cleaner one. |
* Increase timeout for tests to 10s * To aid in debugging add PID to shutdown package logging * Added new message for forced service shutdown * Always wait for HTTP server to shutdown, duration of 0 not friendly to clients Note: The log event "IdleTracker: StateClosed transition by connection marked un-managed" denotes a TCP connection has been initiated but no HTTP request was sent. And is expected during these tests. Fixes containers#11921 Signed-off-by: Jhon Honce <[email protected]>
This is the source of the 6-second delay: https://github.com/containers/podman/blob/main/pkg/api/server/server.go#L292 And, I think it's a problem with cleanup here: https://github.com/containers/podman/blob/main/test/e2e/system_service_test.go#L121-L135 If I replace this function with |
@edsantiago Turns out Shutdown() was waiting for the length of context.WithTimeout() rather than exiting early. For --time=0, I have changed the default to 1s and tests to Wait(2*time.Second). I have seen no flakes on my laptop. I'm leaving the extra debug logging in the code in case this becomes an issue on some upgrade of Go. Additionally, I removed some expensive debugging calls that no longer reported meaningful information. |
/lgtm @jwhonce thank you! |
Nice work @jwhonce and @edsantiago |
/hold cancel |
to clients
Note: The log event
IdleTracker: StateClosed transition by connection marked un-managed
denotes a TCP connection has been initiated but no HTTP request was sent.
Fixes #11921
Signed-off-by: Jhon Honce [email protected]
What this PR does / why we need it:
How to verify it
Which issue(s) this PR fixes:
Special notes for your reviewer: