Defunct git processes when running Gitea as PID 1 on docker #19077
Please upgrade to v1.16.3 |
I hope to have the time soon. Does that mean that this is not an issue in 1.16.3? |
If I'm not wrong, it should be. |
Same issue on 1.16.2 in Docker |
Please upgrade to v1.16.5 |
I'm assuming I'll just be told to upgrade, but for what it's worth, the issue persists in v1.16.3. I'm also using docker, if that makes a difference. |
Could you please share why you think upgrading to 1.16.5 fixes the problem? |
The below was a misunderstanding from when the title said "zombie processes" rather than "defunct". #13987 related to git processes stuck open, whereas this case relates to processes that are dead but have not been reaped.
|
@zeripath thanks for the info. I think I'm on 1.16.8 now, so when I get the time, I'll verify. |
@zeripath The zombie (defunct) processes still get created at a roughly 10-20 minute interval with 1.16.8.
If there's something I can provide, I'm more than willing to, but I'm not sure what I can give you that will help, or how to get it. It's also not a major issue for me, because it only happens in error cases, and I've just found out that running docker with an init process (the --init flag) works around it. However, the settings page of the mirror reports the latest update date as being the date of the last failed sync job, so it is not obvious that an error is occurring without comparing the two repos or looking at gitea's execution logs for an error. As best I can tell, the following steps should reproduce the issue (assuming a gitea instance running in a docker container):
After the above steps, every time the mirror update job runs in gitea, a zombie git process results (sample ps output)
and an error similar to below is generated in the log file
|
Ah, so you think that these defunct processes are related to SyncMirrors? I suspect that these defunct processes are actually due to child processes created by git. Does this happen on 1.17 too? What is your version of Git? (I wonder if it's possibly related to a version of Git?)
I have one further possible solution - we could try to set a process attribute (Setpgid) on the git commands we spawn. Could you apply the below patch?
diff --git a/modules/git/command.go b/modules/git/command.go
index 3dd12e421..b3266ce99 100644
--- a/modules/git/command.go
+++ b/modules/git/command.go
@@ -13,6 +13,7 @@ import (
"os"
"os/exec"
"strings"
+ "syscall"
"time"
"unsafe"
@@ -157,6 +158,7 @@ func (c *Command) Run(opts *RunOpts) error {
"GIT_NO_REPLACE_OBJECTS=1",
)
+ cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
cmd.Dir = opts.Dir
cmd.Stdout = opts.Stdout
cmd.Stderr = opts.Stderr
and see if that prevents these defunct processes? |
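For anyone wanting to see what the Setpgid attribute changes in isolation, here is a minimal standalone sketch (not Gitea code, Unix only; the sh/sleep commands merely stand in for git and its helpers). With Setpgid set, signalling the negative PID reaches the whole process group, including any grandchildren:
package main

import (
	"os/exec"
	"syscall"
	"time"
)

func main() {
	// The shell backgrounds a grandchild; with Setpgid both end up in one group.
	cmd := exec.Command("sh", "-c", "sleep 60 & sleep 60")
	cmd.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	time.Sleep(time.Second)
	// Signalling -pid targets the whole group, so the backgrounded grandchild
	// dies too; a plain cmd.Process.Kill() would only reach the direct child.
	_ = syscall.Kill(-cmd.Process.Pid, syscall.SIGKILL)
	_ = cmd.Wait() // reap the direct child
}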
It would help to know if the zombie processes are orphans or if their parent is the Gitea process. Running something like ps and checking the PPID of the defunct processes would tell. If the git process is indeed an orphan, setting the process group id as @zeripath suggests will help. Each Gitea child then becomes a process group leader, and killing it will also kill all of its children instead of leaving them behind.
That being said, I'd be surprised if that is the cause. A process is a zombie when its parent fails to wait(2), and it is more likely that said parent is Gitea itself. The reproducer you provide is a good one 👍 and I'll give it a try. |
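To illustrate that definition, here is a minimal sketch (not Gitea code; `true` is just a stand-in for a short-lived git command): start a child and never Wait on it, and ps shows the exited child as <defunct> until the parent reaps it or dies.
package main

import (
	"fmt"
	"os/exec"
	"time"
)

func main() {
	// Start a child that exits immediately, but never call Wait on it.
	cmd := exec.Command("true")
	if err := cmd.Start(); err != nil {
		panic(err)
	}
	fmt.Printf("child %d has exited but will never be reaped\n", cmd.Process.Pid)
	// While this program sleeps, ps shows the child in state Z (<defunct>),
	// because its exit status was never collected with wait(2).
	time.Sleep(5 * time.Minute)
}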
How exactly do you think it could happen that Gitea doesn't wait? Lines 164 to 177 in d002e3d
The only way that something doesn't wait() here is if there is a panic in PipelineFunc, but there's no PipelineFunc here and the panic would be logged. Looking deeper at cmd.Wait(), from os/exec/exec.go:
func (c *Cmd) Wait() error {
if c.Process == nil {
return errors.New("exec: not started")
}
if c.finished {
return errors.New("exec: Wait was already called")
}
c.finished = true
state, err := c.Process.Wait()
if c.waitDone != nil {
close(c.waitDone)
}
c.ProcessState = state
var copyError error
for range c.goroutine {
if err := <-c.errch; err != nil && copyError == nil {
copyError = err
}
}
c.closeDescriptors(c.closeAfterWait)
if err != nil {
return err
} else if !state.Success() {
return &ExitError{ProcessState: state}
}
return copyError
}
...
// Wait waits for the Process to exit, and then returns a
// ProcessState describing its status and an error, if any.
// Wait releases any resources associated with the Process.
// On most operating systems, the Process must be a child
// of the current process or an error will be returned.
func (p *Process) Wait() (*ProcessState, error) {
return p.wait()
}
and os/exec/exec_unix.go:
func (p *Process) wait() (ps *ProcessState, err error) {
if p.Pid == -1 {
return nil, syscall.EINVAL
}
// If we can block until Wait4 will succeed immediately, do so.
ready, err := p.blockUntilWaitable()
if err != nil {
return nil, err
}
if ready {
// Mark the process done now, before the call to Wait4,
// so that Process.signal will not send a signal.
p.setDone()
// Acquire a write lock on sigMu to wait for any
// active call to the signal method to complete.
p.sigMu.Lock()
p.sigMu.Unlock()
}
var (
status syscall.WaitStatus
rusage syscall.Rusage
pid1 int
e error
)
for {
pid1, e = syscall.Wait4(p.Pid, &status, 0, &rusage)
if e != syscall.EINTR {
break
}
}
if e != nil {
return nil, NewSyscallError("wait", e)
}
if pid1 != 0 {
p.setDone()
}
ps = &ProcessState{
pid: pid1,
status: status,
rusage: &rusage,
}
return ps, nil
}
Gitea is waiting for the process... (assuming the goroutine isn't blocked in cmd.Wait(), which we know it isn't here because we've got logs.) |
I have no clue. I'm betting on a bug as a more likely alternative to git spawning a child, but it's just a wild guess. I have the reproducer in place. To be continued :-) |
I guess an alternative would be that this isn't related to SyncMirrors at all and the defunct git processes are coming from something else - but what? The processes tab in /admin should show a stuck process if so - but they haven't reported this. If they are related to the SyncMirrors call, though, then I really think that these have to be child processes from the git call itself. You can see that we're reading the Stdout, the Stderr and the ExitStatus from the command. |
@jswolf19 I'm unable to reproduce the problem with the steps you listed. For the record, here is what I did:
@zeripath I did not configure the verbose logs just yet but I will as soon as I'm able to reproduce the problem, unless @jswolf19 does it first ;-) @jswolf19 would you be so kind as to let me know what I do differently from what you do? That will probably explain why I don't see the problem and you do. I'll let the container run for a few hours but I don't see any rational reason why it would start showing zombies now. |
The reported SyncMirrors call here runs a git remote update. Looking at the code for builtin/remote.c in git we can see that it spawns the actual fetches as child processes. If the parent git process exits (or is killed) without waiting for those children, they are orphaned and re-parented to PID 1 - which in this container is Gitea, and Gitea never wait()s for processes it did not start itself. |
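A rough, self-contained illustration of that orphaning mechanism (sh and sleep standing in for the git command and the helper it spawns; not Gitea code): cmd.Wait only reaps the direct child, so anything that child abandons gets re-parented to PID 1.
package main

import (
	"fmt"
	"os/exec"
)

func main() {
	// The shell exits immediately, abandoning its backgrounded child.
	cmd := exec.Command("sh", "-c", "sleep 300 &")
	if err := cmd.Run(); err != nil { // Run = Start + Wait, so sh itself is reaped
		panic(err)
	}
	fmt.Println("sh was reaped; its orphaned sleep now belongs to PID 1")
	// Under a real init the orphan is reaped when it exits; under Gitea as
	// PID 1 nothing ever wait()s for it, so it lingers as <defunct>.
}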
I stand corrected, your guess is better than mine 👍 |
docker exec -u git <gitea-container> gitea manager logging add console -n traceconsole -l TRACE -e '((modules/git)|(services/mirror))'
docker logs -f <gitea-container> |
I did this:
$ docker exec -u git gitea gitea manager logging add console -n traceconsole -l TRACE -e '((modules/git)|(services/mirror))'
Added
Then clicked on the monitor to force the mirror task, and looked at the logs with:
$ docker logs -n 10 gitea
2022/06/01 12:41:22 Completed POST /api/internal/manager/add-logger 200 OK in 161.111µs
2022/06/01 12:41:46 Started POST /admin for 172.17.0.1:40330
2022/06/01 12:41:46 ...rvices/cron/tasks.go:114:GetTask() [I] Getting update_mirrors in &{{0 0} update_mirrors 0xc0046a55c0 0x1e650e0 8}
2022/06/01 12:41:46 Completed POST /admin 302 Found in 2.680367ms
2022/06/01 12:41:46 ...ces/mirror/mirror.go:60:Update() [T] Doing: Update
2022/06/01 12:41:46 ...ces/mirror/mirror.go:141:Update() [T] Finished: Update: 0 pull mirrors and 0 push mirrors queued
2022/06/01 12:41:46 Started GET /admin/monitor for 172.17.0.1:40330
2022/06/01 12:41:46 Completed GET /admin/monitor 200 OK in 2.782788ms
2022/06/01 12:41:46 Started GET /assets/img/logo.svg for 172.17.0.1:40330
2022/06/01 12:41:46 Completed GET /assets/img/logo.svg 200 OK in 106.279µs
Which apparently means that there was no attempt to mirror. Shouldn't this trigger a mirror attempt? |
Not if one is not due... |
Ah, this "Run" button is in case one is due but the opportunity was missed because of a restart or something? Is there a way to force a mirror task to run or should I just wait for the delay to expire? |
Go to the repository itself and click the synchronize now button |
That's in the mirror settings: I need to set the delay to something shorter than 8h... I did try to force from the settings and it does not create a zombie. I'm now verifying whether that holds true when the scheduled task does the same thing, which is a slightly different code path. I don't think it makes a difference but I'm being thorough. |
Update: Hmm, maybe the conditions are not right in my test and sending the process to the background does something different. Let's scratch that. @zeripath let's say gitea runs
So... either: a) the zombie is not in the container and the environment is different from what we assume it is |
Here is an updated and better attempt to run the reproducer, no success though. @jswolf19 I'm unable to reproduce the problem with the steps you listed. For the record, here is what I did:
@jswolf19 would you be so kind as to let me know what I do differently from what you do? That will probably explain why I don't see the problem and you do. |
Ah but are you running docker with the --init flag? |
I do not, but since it was mentioned as a way to get rid of zombies, I deliberately did not follow this path. I'm unable to see zombies. This is good, right? 🧟 I'll stop obsessing over this and wait for more information to get a reproducer working. |
@zeripath @singuliere Sorry for not getting back sooner. I should have made it clearer, but I'm running a container from an image built from a custom Dockerfile. I'm pretty sure the exec process is gitea (when not using the --init flag).
I don't have access to it right now, but if you'd like, tomorrow I should be able to provide a Dockerfile and run command that will work with the steps I provided earlier. If the fact that I'm not using the official docker image is enough that you don't want to pursue this issue further, I completely understand. |
@jswolf19 it's actually good news! It means the fix that @zeripath proposed (making Gitea's children process group leaders) will solve your problem. While your situation is certainly an unusual edge case, because most systems will automatically adopt (and reap) orphaned children, the fix will also address another problem:
|
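For context, what a real init process (or docker's --init shim) does for adopted orphans, and what a plain server running as PID 1 does not, is roughly the following reaping loop (a sketch, not Gitea or tini code):
package main

import (
	"os"
	"os/signal"
	"syscall"
)

func main() {
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGCHLD)
	for range sigs {
		for {
			// -1 means "any child", adopted orphans included; WNOHANG returns
			// immediately once there is nothing left to reap.
			pid, err := syscall.Wait4(-1, nil, syscall.WNOHANG, nil)
			if pid <= 0 || err != nil {
				break
			}
		}
	}
}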
When Gitea is running as PID 1 git will occasionally orphan child processes leading to (defunct) processes. This PR simply sets Setpgid to true on these child processes meaning that these defunct processes will also be correctly killed. Fix go-gitea#19077 Signed-off-by: Andrew Thornton <[email protected]>
@singuliere You may be able to reproduce using the rootless Dockerfile, as this uses gitea as the root process. If you want me to provide a Dockerfile, though, I can. My current setup is using a package maintained by @ecsgh, so the rootless file may be easier to use for debugging and testing if you can reproduce with it. @zeripath My git version in the container is git 2.36.1-lp153.546.1 (OpenSuse Leap 15.3 package). I'm using a pre-built package, so patching will be difficult, as is. If you still want me to try out this patch, then I can, but it'll take me some time (probably sometime next week at the earliest). |
@jswolf19 I don't think the failed authentication causes the zombie because it goes like this:
In order for the zombie to happen, I think it involves a mirror that times out for some reason and forces Gitea to kill the git process. |
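A sketch of that scenario (hypothetical commands, not Gitea's actual mirror code): the timeout kills only the direct child, so anything it spawned survives as an orphan and, under Gitea as PID 1, later shows up as <defunct>.
package main

import (
	"context"
	"fmt"
	"os/exec"
	"time"
)

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// The outer sh stands in for the mirror's git command, the backgrounded
	// sleep for a helper process it spawned.
	cmd := exec.CommandContext(ctx, "sh", "-c", "sleep 300 & wait")
	err := cmd.Run()
	fmt.Println("run finished:", err) // typically "signal: killed"
	// CommandContext signals only the direct child (sh); without Setpgid the
	// backgrounded sleep is orphaned and re-parented to PID 1.
}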
@singuliere if you could help me know what to look for and where, I'll check. Would the timeout events be logged by gitea, or is there some other log I'd need to check? If they're logged by gitea, do I need to increase the log level in order to capture them? |
@jswolf19 a detailed explanation, including a reproducer based on the rootless image as you suggested 👍, was just published on the Hostea blog: https://hostea.org/blog/zombies/ It all makes perfect sense now, and @zeripath's patch is the definitive answer to this ✨ |
I appreciate the work you guys have put into looking into this and fixing it. I will be sure to verify once I've updated ^_^ |
It appears that #19865 does fix this at least by my reckoning. |
When Gitea is running as PID 1 git will occasionally orphan child processes leading to (defunct) processes. This PR simply sets Setpgid to true on these child processes meaning that these defunct processes will also be correctly reaped. Fix #19077 Signed-off-by: Andrew Thornton <[email protected]>
Backport go-gitea#19865 When Gitea is running as PID 1 git will occasionally orphan child processes leading to (defunct) processes. This PR simply sets Setpgid to true on these child processes meaning that these defunct processes will also be correctly reaped. Fix go-gitea#19077 Signed-off-by: Andrew Thornton <[email protected]>
When Gitea is running as PID 1 git will occasionally orphan child processes leading to (defunct) processes. This PR simply sets Setpgid to true on these child processes meaning that these defunct processes will also be correctly reaped. Fix go-gitea#19077 Signed-off-by: Andrew Thornton <[email protected]>
I think this bug (or a similar one) is still present. I recently restarted Gitea; it's been running for 53 minutes as of writing this, and I already have 20 zombie git processes.
I've seen this in the past few days too while I was inspecting Gitea processes, but probably it has been happening for longer. Should I open a new issue for this? |
I think we need to check whether every child process's termination status has been read by the parent process. |
Oh, I think I can tie this to something. I run several services - including Gitea - on a Raspberry Pi. I usually check its resource utilization with bpytop. Not a very resource-efficient tool, but whatever, still the best of the ones I've tried. I had accidentally hidden the process list, and noticed that this way bpytop can precisely keep up with the 1-second update rate. After Gitea has been running for 1 day and 21 hours, there are 970 defunct git processes on the system, out of 1167 total. |
Gitea Version
1.15.7
Git Version
2.34.1
Operating System
OpenSuse Leap 15.3 (Docker)
How are you running Gitea?
I used an rpm built using the repo at https://build.opensuse.org/package/show/devel:tools:scm/gitea and installed with zypper.
I'm running in a docker container based on OpenSuse Leap 15.3; the base OS is also OpenSuse Leap 15.3.
Database
MySQL
Can you reproduce the bug on the Gitea demo site?
No
Log Gist
No response
Description
I have zombie processes whose start time coincides with log entries such as below:
Upon updating my github login credentials, the zombie processes seem to no longer occur.
#13987 is documented as being fixed by #14006 in the 1.14.0 milestone, so I'm reporting this as a possible regression. I'm sorry I haven't been able to upgrade to the latest version to check if the issue still exists. If you need any additional information, please let me know.
Screenshots
No response