[bitnami/etcd] healthcheck.sh leaving behind etcdctl zombies on timeout #13989

RobinGeuze · 2022-12-16T08:48:50Z

Name and Version

bitnami/etcd 8.5.8

What steps will reproduce the bug?

We run a kubeadm cluster on physical machines, using Calico as the networking layer. We recently had some Calico issues that interrupted networking between pods. Whenever that happens, we end up with a bunch of etcdctl zombies.

My guess is that, because of the networking issues, the etcdctl command takes a long time, which causes healthcheck.sh to get killed. For reasons unclear to me, that doesn't kill the etcdctl command itself. Because its parent process is gone, etcdctl then gets reparented to the etcd process running as PID 1, which doesn't reap zombies. I suspect it is very similar to this issue seen with redis: #5328
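To illustrate the mechanism guessed at above, here is a minimal Python sketch of how a zombie appears and disappears. It is not the chart's healthcheck.sh and does not involve etcdctl: it only shows that an exited child stays in state `Z` until its parent calls `wait()`. The helper names `process_state` and `demo_zombie` are made up for this illustration, and the sketch assumes Linux because it reads `/proc`.

```python
import os
import subprocess
import sys
import time
from typing import Tuple


def process_state(pid: int) -> str:
    """Return the one-letter state from /proc/<pid>/stat ('Z' means zombie)."""
    with open(f"/proc/{pid}/stat") as stat_file:
        # The state field is the first field after the parenthesised command name.
        fields_after_name = stat_file.read().rsplit(")", 1)[1].split()
    return fields_after_name[0]


def demo_zombie(timeout: float = 5.0) -> Tuple[str, bool]:
    """Spawn a short-lived child, leave it unreaped, then reap it."""
    child = subprocess.Popen([sys.executable, "-c", "pass"])

    # Deliberately avoid child.poll()/child.wait() here: both would reap the child.
    deadline = time.monotonic() + timeout
    state = process_state(child.pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.05)
        state = process_state(child.pid)

    # Reaping: once the parent waits, the kernel drops the process table entry.
    child.wait()
    entry_gone = not os.path.exists(f"/proc/{child.pid}")
    return state, entry_gone


if __name__ == "__main__":
    state_before_wait, gone_after_wait = demo_zombie()
    print("state before wait():", state_before_wait)
    print("entry gone after wait():", gone_after_wait)
```

In the scenario from this issue the original parent no longer exists, so the responsibility to call `wait()` falls to PID 1 in the container. If that process never waits for children it did not start itself, the entries accumulate.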

Based on the redis issue, I think enabling shareProcessNamespace on the etcd pod should fix this, but the chart currently doesn't allow that.
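As background on why a shared process namespace is expected to help: with `shareProcessNamespace` enabled, PID 1 in the pod is the Kubernetes pause container rather than etcd, and that process reaps orphaned children. The following Python sketch shows what such a reaping loop looks like in principle. It is an illustration under stated assumptions, not the pause container's actual code, and `reap_exited_children` and `spawn_and_reap` are hypothetical names.

```python
import os
import sys
import time
from typing import List, Tuple


def reap_exited_children() -> List[int]:
    """Collect every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none has exited yet
        reaped.append(pid)
    return reaped


def spawn_and_reap(count: int = 3, timeout: float = 5.0) -> Tuple[List[int], List[int]]:
    """Start short-lived children, then reap them the way an init process would."""
    command = [sys.executable, "-c", "pass"]
    spawned = [os.posix_spawn(sys.executable, command, os.environ) for _ in range(count)]

    reaped: List[int] = []
    deadline = time.monotonic() + timeout
    while len(reaped) < count and time.monotonic() < deadline:
        reaped.extend(reap_exited_children())
        time.sleep(0.05)
    return spawned, reaped


if __name__ == "__main__":
    spawned_pids, reaped_pids = spawn_and_reap()
    print("spawned:", sorted(spawned_pids))
    print("reaped: ", sorted(reaped_pids))
```

A real init process would typically run this loop in response to `SIGCHLD` instead of polling. Polling keeps the sketch short.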

Are you using any custom parameters or values?

We use a replicaCount of 3.

What is the expected behavior?

No zombies

What do you see instead?

An etcdctl zombie for every timed out healthcheck

carrodher · 2022-12-19T10:28:18Z

This seems to be a very specific use case that is difficult to reproduce on our side and closely tied to your scenario.

For information regarding the application itself, customization of the content within the application, or questions about the use of the technology or infrastructure, we highly recommend checking the forums and user guides made available by the project behind the application or the technology.

That said, we will keep this ticket open until the stale bot closes it, in case someone from the community adds valuable information.

If you think there is something fixable at the Helm chart level and you would like to contribute by creating a PR to solve the issue, the Bitnami team will be happy to review it and provide feedback. Here you can find the contributing guidelines.

RobinGeuze · 2022-12-19T11:00:04Z

Hey @carrodher, I've created a pull request for a potential solution here: #14018

github-actions · 2023-01-04T01:30:33Z

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

RobinGeuze added the tech-issues label (The user has a technical issue about an application) Dec 16, 2022

github-actions bot added the triage label (Triage is needed) Dec 16, 2022

bitnami-bot assigned carrodher Dec 16, 2022

carrodher added the etcd label Dec 19, 2022

RobinGeuze mentioned this issue Dec 19, 2022 in [bitnami/etcd] Add support for shareProcessNamespace #14018 (Merged)

github-actions bot added the stale label (15 days without activity) Jan 4, 2023

CeliaGMqrz closed this as completed in #14018 Jan 9, 2023

github-actions bot added the solved label Jan 9, 2023

LoveEachDay mentioned this issue Oct 9, 2023 in [Bug]: The etcd service of milvus generates a lot of zombie processes milvus-io/milvus#27544 (Closed)
