Filebeat autodiscover stopping too early when kubernetes pod terminates #34789
This could potentially be a duplicate of #34045.
@cpaton do your pods contain a single container or multiple? Your analysis sounds as if the error would only be present if there are multiple containers?
In this example the pod has a single container. In the first Kubernetes event seen by Filebeat when the pod terminates, the single container's status has a non-empty running state. It's this non-empty running state which I think prevents the delay: the pod is deemed to be terminating but not terminated, resulting in the input stopping immediately. The code before #24742 explicitly checked for containers still running in a terminated pod and didn't emit an event.

```go
func (p *pod) emit(pod *kubernetes.Pod, flag string) {
	...
	delay := (flag == "stop" && podTerminated(pod, containers))
	...
}

// podTerminating returns true if a pod is marked for deletion or is in a phase beyond running.
func podTerminating(pod *kubernetes.Pod) bool {
	if pod.GetObjectMeta().GetDeletionTimestamp() != nil {
		return true
	}
	switch pod.Status.Phase {
	case kubernetes.PodRunning, kubernetes.PodPending:
	default:
		return true
	}
	return false
}

// podTerminated returns true if a pod is terminated, this method considers a
// pod as terminated if none of its containers are running (or going to be running).
func podTerminated(pod *kubernetes.Pod, containers []*containerInPod) bool {
	// Pod is not marked for termination, so it is not terminated.
	if !podTerminating(pod) {
		return false
	}

	// If any container is running, the pod is not terminated yet.
	for _, container := range containers {
		if container.status.State.Running != nil {
			return false
		}
	}

	return true
}
```
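To make this concrete, here is a minimal, self-contained sketch (my own illustration, not the beats code) that reproduces the two predicates against plain Kubernetes API types. The pod name, container name, and the adaptation of podTerminated to take `[]v1.ContainerStatus` instead of the internal `containerInPod` type are assumptions. It shows that a pod marked for deletion whose only container still reports a Running state is "terminating" but not "terminated", so `delay` evaluates to false and the stop event goes out immediately.

```go
// Hypothetical standalone reproduction of the delay predicate; not the actual
// beats package. Requires k8s.io/api and k8s.io/apimachinery.
package main

import (
	"fmt"
	"time"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// podTerminating mirrors the helper above: the pod is marked for deletion, or
// it is in a phase other than Running/Pending.
func podTerminating(pod *v1.Pod) bool {
	if pod.DeletionTimestamp != nil {
		return true
	}
	switch pod.Status.Phase {
	case v1.PodRunning, v1.PodPending:
	default:
		return true
	}
	return false
}

// podTerminated mirrors the helper above, adapted to take container statuses
// directly instead of the internal containerInPod type.
func podTerminated(pod *v1.Pod, statuses []v1.ContainerStatus) bool {
	if !podTerminating(pod) {
		return false
	}
	for _, status := range statuses {
		if status.State.Running != nil {
			return false
		}
	}
	return true
}

func main() {
	now := metav1.NewTime(time.Now())

	// First update event after the pod is deleted: the deletion timestamp is
	// set, but the single container still reports a Running state.
	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:              "example-pod", // hypothetical name
			DeletionTimestamp: &now,
		},
		Status: v1.PodStatus{
			Phase: v1.PodRunning,
			ContainerStatuses: []v1.ContainerStatus{
				{
					Name:  "app", // hypothetical container name
					State: v1.ContainerState{Running: &v1.ContainerStateRunning{StartedAt: now}},
				},
			},
		},
	}

	flag := "stop"
	delay := flag == "stop" && podTerminated(pod, pod.Status.ContainerStatuses)

	fmt.Println("terminating:", podTerminating(pod))                              // true
	fmt.Println("terminated: ", podTerminated(pod, pod.Status.ContainerStatuses)) // false
	fmt.Println("delay stop: ", delay)                                            // false -> stop is published immediately
}
```

Under these assumptions the delay condition never fires on that first event, which matches the immediate stop event described in this issue.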
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
I am using Filebeat with autodiscover within a Kubernetes cluster to capture logs. When a Kubernetes pod terminates, Filebeat immediately stops reading log entries, which can result in log lines at the end of the logs not being published.
The Filebeat configuration is set up to use the Kubernetes autodiscover provider.
This issue appears to be the same as the one reported in Filebeat autodiscover should not stop docker prospectors until it reads all lines · Issue #6694 · elastic/beats (github.com), but for the Kubernetes input. That was fixed by the introduction of the close_timeout configuration option, which is still in place today.
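For reference, this is a minimal sketch of where close_timeout sits in a Kubernetes autodiscover configuration. The path pattern and the 5m value are illustrative assumptions, not taken from this report.

```yaml
filebeat.autodiscover:
  providers:
    - type: kubernetes
      templates:
        - config:
            - type: container
              paths:
                # illustrative path pattern for container logs on the node
                - /var/log/containers/*${data.kubernetes.container.id}.log
              # per the report above, the option introduced to fix #6694;
              # the 5m value here is illustrative
              close_timeout: 5m
```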
From testing different versions, it works as expected up to Filebeat 7.12.1, but not from 7.13.0 onwards.
Logs from Filebeat 7.12.1 when a pod terminates:
Note the 60-second gap between the pod update event being received from Kubernetes as the pod terminates and the stop event being published within Filebeat to stop reading the logs.
Logs from Filebeat 7.13.0 when a pod terminates:
Note the stop event being published immediately.
From some investigation I think it relates to the changes in #24742 (Refactor kubernetes autodiscover to avoid skipping short-living pods). Before that change the stop event on pod termination would never be published immediately; it was always delayed by the close timeout. After that change the stop event is published immediately if any container within the pod has a non-empty running container status. This is the bit of logic which I think has changed; below is the snippet before the above change.
Now the stop event is not delayed, as the pod hasn't terminated; it's only in the process of terminating.
Note this was originally posted on the discussion forums but did not get any replies - https://discuss.elastic.co/t/filebeat-autodiscover-stopping-too-early-when-kubernetes-pod-terminates/325491