8.5.1 agents go unhealthy #1790
Here are the logs from one of the VMs:
[log output omitted]
Here is the status:
[status output omitted]
@belimawr has a bug tracked elsewhere for this problem; I couldn't find the link with a quick look. Does Filebeat eventually recover from this? That is usually what we observe. If it doesn't, we likely need to raise the severity of the bug.
Found the relevant bug: elastic/beats#33653
Might be unrelated, so I'm just posting to add a data point, but I've been chasing the same "operation skipped" loop messages for the last day or so. The difference in my case is that I can get new agents to start up if they enrol with an Elasticsearch output, but not with a Logstash output; they will sit forever and never reach healthy. Once they've enrolled with an Elasticsearch output I can switch back to Logstash and everything seems fine, unless I restart an agent, at which point I'm back in the same unhealthy/startup loop. If I then switch back to the Elasticsearch output, the unhealthy agents all come back online.
I also observed the issue with 8.5.2 agents. I left this setup running over the weekend and it did not recover; my agents are still marked by Fleet as unhealthy. The problem seems to happen whenever I make a change that results in a lot of network activity, often when moving the agents between policies.
@ceeeekay There is at least one known bug where the policy revision can get out of sync between the agent and Fleet using the Logstash output: elastic/fleet-server#2105. Possibly that is related to the problems you are observing.
@pjbertels looking at the error, this should have been fixed; the original issue was reopened here: elastic/beats#31670
@cmacknz That looks like the one - thanks.
@pjbertels what environment is Fleet running in here? K8s? Docker? Native? I've recently run into some issues with how this behaves inside certain k8s environments where the namespace's PID counter will reset but the filesystem remains the same, meaning the PID in the lockfile can be mapped to a new process. Also, have we confirmed that the Beats aren't trying to run multiple instances out of the same data directory?
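For context, here is a minimal Go sketch of the kind of PID liveness check this scenario defeats, assuming a plain PID lock file; the path, function name, and logic are illustrative assumptions, not the actual Beats locking code.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
	"syscall"
)

// lockHolderAlive reports whether the PID recorded in lockPath maps to a
// currently running process. Sending signal 0 checks for existence without
// actually signalling the process.
func lockHolderAlive(lockPath string) (bool, error) {
	data, err := os.ReadFile(lockPath)
	if err != nil {
		return false, err
	}
	pid, err := strconv.Atoi(strings.TrimSpace(string(data)))
	if err != nil {
		return false, err
	}
	// If the PID namespace resets while the filesystem persists, this stale
	// PID may now belong to a live but unrelated process, so the check
	// returns a false positive and the lock appears to still be held.
	return syscall.Kill(pid, 0) == nil, nil
}

func main() {
	// Hypothetical lock file path, for illustration only.
	alive, err := lockHolderAlive("/var/lib/filebeat/filebeat.lock")
	if err != nil {
		fmt.Println("could not read lock file:", err)
		return
	}
	fmt.Println("lock holder appears alive:", alive)
}
```

Under that assumption, a container restart that reuses the old data directory can find a recycled PID that passes the liveness check, so a new Beat instance would conclude another instance is still running and refuse to start, which would look exactly like an agent stuck unhealthy.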
This is an agent installed in an Ubuntu VM in GCP. The VM is created and the software is installed in the usual way by a tool we have (OGC). Fleet is running in ESS. There are a few issues that have been identified, and we will retest when we can get those fixes.
I can only imagine that the problem is indeed a change in the policy, which makes the agent restart, and for some reason it does not restart correctly.
We think the issue is this one ... elastic/beats#33653. |
More info from my side:
agent version:
No matter how or in which order I enroll it, if the Docker metrics and logs integration is present, Filebeat will fail (and the agent appears as unhealthy).
@osamu-kj
@cmacknz in the meantime I've upgraded ELK and all of the agents to 8.5.2 and now it's working fine. I'll let you know if something similar goes wrong on my side as well. Thanks for mentioning it :)
@pjbertels is this still occurring or can we close that issue?
Not relevant anymore, closing.
Issues encountered during Fleet Scaling testing with drones and a subset of real VMs.
Version: 8.5.1
Operating System: Linux Ubuntu VM (e2-standard-8)
Steps to Reproduce:
We used some tooling to bring up 200 VMs and 9800 Horde drones; some VMs report errors in the logs on the way up and take longer to come up. Once we begin testing, some VMs go unhealthy (33/200).