
Filebeat autodiscover stopping too early when kubernetes pod terminates #34789

Open · cpaton opened this issue Mar 9, 2023 · 4 comments
Labels: Team:Cloudnative-Monitoring, Team:Elastic-Agent

Comments

cpaton commented Mar 9, 2023

I am using Filebeat with autodiscover within a Kubernetes cluster to capture logs. When a Kubernetes pod terminates, Filebeat immediately stops reading log entries, which can result in log lines at the end of the log not being published.

The Filebeat configuration is set up as follows:

filebeat.autodiscover:
  providers:
    - type: kubernetes
      node: ${NODE_NAME}
      resource: pod
      scope: node
      hints.enabled: true
      hints.default_config:
        type: container
        paths:
          - /var/log/containers/*-${data.container.id}.log
        fields_under_root: true          
      include_annotations:
        - app.kubernetes.io/version
        - app-release

This issue appears to be the same as the one reported in Filebeat autodiscover should not stop docker prospectors until it reads all lines · Issue #6694 · elastic/beats (github.com), but for the kubernetes input. That issue was fixed by the introduction of the close_timeout configuration option, which is still in place today.

From testing different versions, the behaviour works as expected up to Filebeat 7.12.1, but not from 7.13.0 onwards.

Logs from Filebeat 7.12.1 when a pod terminates:

2023-02-13T20:27:52.120Z	DEBUG	[autodiscover.pod]	kubernetes/pod.go:129	Watcher Pod update for pod: log-provider-0, status: Running
2023-02-13T20:27:52.854Z	DEBUG	[autodiscover.pod]	kubernetes/pod.go:145	Watcher Pod update (terminating)
2023-02-13T20:28:52.856Z	DEBUG	[autodiscover]	autodiscover/autodiscover.go:253	Got a stop event
2023-02-13T20:28:52.857Z	INFO	input/input.go:136	input ticker stopped

Note the 60-second gap between the pod update event received from Kubernetes as the pod terminates and the stop event being published within Filebeat to stop reading the logs.

Logs from Filebeat 7.13.0 when a pod terminates:

2023-02-13T20:26:10.123Z	DEBUG	[autodiscover.pod]	kubernetes/pod.go:145	Watcher Pod update
2023-02-13T20:26:10.128Z	DEBUG	[autodiscover]	autodiscover/autodiscover.go:253	Got a stop event
2023-02-13T20:26:10.129Z	INFO	input/input.go:136	input ticker stopped

Note that the stop event is published immediately.

From some investigation I think it relates to the changes in #24742 - Refactor kubernetes autodiscover to avoid skipping short-living pods. Before that change, the stop event on pod termination was never published immediately; it was always delayed by the close timeout. After that change, the stop event is published immediately if any container within the pod has a non-empty running container status. This is the bit of logic which I think has changed - below is the snippet from before the change:

	if pod.GetObjectMeta().GetDeletionTimestamp() != nil {
		p.logger.Debugf("Watcher Pod update (terminating): %+v", obj)
		// Pod is terminating, don't reload its configuration and ignore the event
		// if some pod is still running, we will receive more events when containers
		// terminate.
		for _, container := range pod.Status.ContainerStatuses {
			if container.State.Running != nil {
				return
			}
		}
		time.AfterFunc(p.config.CleanupTimeout, func() { p.emit(pod, "stop") })
		return
	}

Now the stop event is not delayed, because the pod has not terminated yet; it is only in the process of terminating:

delay := (flag == "stop" && podTerminated(pod, containers))
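
To make the difference concrete, here is a minimal, self-contained sketch of the two decision paths for the state described above (a pod marked for deletion whose single container still reports a running state). The types and function names (podStatus, oldHandleStop, newStopDelay) are illustrative stand-ins of my own, not the actual Beats structures:

package main

import "fmt"

// Simplified stand-ins for the sketch; the real code uses the Kubernetes API structs.
type containerStatus struct {
	Running bool // corresponds to State.Running != nil
}

type podStatus struct {
	Terminating bool // corresponds to DeletionTimestamp != nil
	Containers  []containerStatus
}

// oldHandleStop mirrors the pre-#24742 branch (it assumes the pod already has a
// deletion timestamp): while any container is still running the event is ignored
// (more events will follow); otherwise the stop is scheduled after the cleanup timeout.
func oldHandleStop(p podStatus) string {
	for _, c := range p.Containers {
		if c.Running {
			return "ignore the event and keep reading logs"
		}
	}
	return "emit the stop event after the cleanup timeout"
}

// newStopDelay mirrors the post-#24742 delay flag: delay only when the pod is
// terminating AND none of its containers are running.
func newStopDelay(p podStatus) bool {
	if !p.Terminating {
		return false
	}
	for _, c := range p.Containers {
		if c.Running {
			return false
		}
	}
	return true
}

func main() {
	// The situation from this issue: a terminating pod whose single container
	// still reports a running state in the first watcher update.
	p := podStatus{Terminating: true, Containers: []containerStatus{{Running: true}}}
	fmt.Println("old code:", oldHandleStop(p))      // keeps reading logs
	fmt.Println("new delay flag:", newStopDelay(p)) // false -> stop emitted immediately
}

Running this shows the old path keeps reading logs (and only schedules the stop after the cleanup timeout once no container is running), while the new delay flag comes out false, so the stop event is emitted straight away.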

Note: this was originally posted on the discussion forums but did not get any replies: https://discuss.elastic.co/t/filebeat-autodiscover-stopping-too-early-when-kubernetes-pod-terminates/325491

botelastic bot added the needs_team label Mar 9, 2023
@StephanErb

This could potentially be a duplicate of #34045

@StephanErb

@cpaton do your pods contain a single container or multiple? Your analysis sounds as if the error would only be present if there are multiple containers?


cpaton commented Mar 29, 2023

In this example the pod has a single container.

In the first Kubernetes event seen by Filebeat when the pod terminates, the single container's status has a non-empty running state. It's this non-empty running state which I think prevents the delay: the pod is deemed to be terminating but not terminated, so the input stops immediately. The code before #24742 explicitly checked for containers still running in a terminating pod and didn't emit an event; the current logic looks like this:

func (p *pod) emit(pod *kubernetes.Pod, flag string) {
	...
	delay := (flag == "stop" && podTerminated(pod, containers))
	...
}

// podTerminating returns true if a pod is marked for deletion or is in a phase beyond running.
func podTerminating(pod *kubernetes.Pod) bool {
	if pod.GetObjectMeta().GetDeletionTimestamp() != nil {
		return true
	}

	switch pod.Status.Phase {
	case kubernetes.PodRunning, kubernetes.PodPending:
	default:
		return true
	}

	return false
}

// podTerminated returns true if a pod is terminated, this method considers a
// pod as terminated if none of its containers are running (or going to be running).
func podTerminated(pod *kubernetes.Pod, containers []*containerInPod) bool {
	// Pod is not marked for termination, so it is not terminated.
	if !podTerminating(pod) {
		return false
	}

	// If any container is running, the pod is not terminated yet.
	for _, container := range containers {
		if container.status.State.Running != nil {
			return false
		}
	}

	return true
}
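
To see the values these helpers produce for the single-container case described above, here is a self-contained sketch that re-implements podTerminating and podTerminated against the upstream k8s.io/api types rather than the Beats kubernetes package, with the containers parameter replaced by the pod's own container statuses for simplicity. It is an illustrative re-implementation under those assumptions, not the Beats code itself:

package main

import (
	"fmt"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// Standalone copy of the helper above, written against the upstream Kubernetes API types.
func podTerminating(pod *corev1.Pod) bool {
	if pod.GetObjectMeta().GetDeletionTimestamp() != nil {
		return true
	}
	switch pod.Status.Phase {
	case corev1.PodRunning, corev1.PodPending:
	default:
		return true
	}
	return false
}

// Standalone copy of podTerminated, using the pod's container statuses directly.
func podTerminated(pod *corev1.Pod) bool {
	if !podTerminating(pod) {
		return false
	}
	for _, container := range pod.Status.ContainerStatuses {
		if container.State.Running != nil {
			return false
		}
	}
	return true
}

func main() {
	// First watcher update after the pod is deleted: the deletion timestamp is
	// set, but the single container still reports a Running state.
	now := metav1.NewTime(time.Now())
	pod := &corev1.Pod{
		ObjectMeta: metav1.ObjectMeta{
			Name:              "log-provider-0",
			DeletionTimestamp: &now,
		},
		Status: corev1.PodStatus{
			Phase: corev1.PodRunning,
			ContainerStatuses: []corev1.ContainerStatus{
				{State: corev1.ContainerState{Running: &corev1.ContainerStateRunning{StartedAt: now}}},
			},
		},
	}

	fmt.Println("podTerminating:", podTerminating(pod)) // true: deletion timestamp is set
	fmt.Println("podTerminated:", podTerminated(pod))   // false: the container still reports Running
	fmt.Println("stop delayed:", podTerminated(pod))    // the delay flag -> false, input stops immediately
}

With k8s.io/api and k8s.io/apimachinery in the module, the delay flag comes out false, which matches the immediate "Got a stop event" in the 7.13.0 logs above.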

ycombinator added the Team:Elastic-Agent label Mar 27, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

botelastic bot removed the needs_team label Mar 27, 2024
cmacknz added the Team:Cloudnative-Monitoring label Mar 27, 2024