
[8.1.0-SNAPSHOT] Fleet Server can't enroll: FAILED: Missed two check-ins #1129

Closed
mtojek opened this issue Feb 3, 2022 · 32 comments · Fixed by elastic/beats#30197
Assignees: ph
Labels: bug, Team:Elastic-Agent-Control-Plane, Team:Fleet

Comments

@mtojek
Contributor

mtojek commented Feb 3, 2022

Hi,

We adopted elastic-package to use predefined agent policies and confirmed with @juliaElastic that we're ready for the switch (the main branch is green).

Since yesterday we've been facing problems with enrollment:

Attaching to elastic-package-stack_fleet-server_1
fleet-server_1 | Performing setup of Fleet in Kibana
fleet-server_1 | 
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:21.515Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":587},"message":"Spawning Elastic Agent daemon as a subprocess to complete bootstrap process.","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:21.712Z","log.origin":{"file.name":"application/application.go","file.line":78},"message":"Detecting execution mode","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:21.713Z","log.origin":{"file.name":"application/application.go","file.line":98},"message":"Agent is in Fleet Server bootstrap mode","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:22.031Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":62},"message":"Starting stats endpoint","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:22.031Z","log.origin":{"file.name":"application/fleet_server_bootstrap.go","file.line":134},"message":"Agent is starting","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:22.031Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":64},"message":"Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:22.034Z","log.origin":{"file.name":"application/fleet_server_bootstrap.go","file.line":144},"message":"Agent is stopped","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:22.141Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":48},"message":"New State ID is Wb5PhdQX","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:22.142Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 1 step(s)","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:22.733Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-02-03T08:05:22Z - message: Application: fleet-server--8.1.0-SNAPSHOT[]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:22.735Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":66},"message":"Updating internal state","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:05:24.523Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":792},"message":"Fleet Server - Starting","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"warn","@timestamp":"2022-02-03T08:06:27.040Z","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'degraded'","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:06:27.040Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-02-03T08:06:27Z - message: Application: fleet-server--8.1.0-SNAPSHOT[]: State changed to DEGRADED: Missed last check-in - type: 'STATE' - sub_type: 'RUNNING'","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:07:21.517Z","log.origin":{"file.name":"cmd/run.go","file.line":203},"message":"Shutting down Elastic Agent and sending last events...","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:07:21.519Z","log.origin":{"file.name":"operation/operator.go","file.line":223},"message":"waiting for installer of pipeline 'default' to finish","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:07:21.520Z","log.origin":{"file.name":"process/app.go","file.line":176},"message":"Signaling application to stop because of shutdown: fleet-server--8.1.0-SNAPSHOT","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"error","@timestamp":"2022-02-03T08:07:27.047Z","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'error'","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"error","@timestamp":"2022-02-03T08:07:27.047Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-02-03T08:07:27Z - message: Application: fleet-server--8.1.0-SNAPSHOT[]: State changed to FAILED: Missed two check-ins - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:07:51.570Z","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'online'","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:07:51.570Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-02-03T08:07:51Z - message: Application: fleet-server--8.1.0-SNAPSHOT[]: State changed to STOPPED: Stopped - type: 'STATE' - sub_type: 'STOPPED'","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:07:51.570Z","log.origin":{"file.name":"cmd/run.go","file.line":211},"message":"Shutting down completed.","ecs.version":"1.6.0"}
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-03T08:07:51.570Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":66},"message":"Stats endpoint (/usr/share/elastic-agent/state/data/tmp/elastic-agent.sock) finished: accept unix /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock: use of closed network connection","ecs.version":"1.6.0"}
fleet-server_1 | Error: fleet-server failed: context canceled
fleet-server_1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.1/fleet-troubleshooting.html
fleet-server_1 | Error: enrollment failed: exit status 1
fleet-server_1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.1/fleet-troubleshooting.html

More logs: https://beats-ci.elastic.co/job/Ingest-manager/job/integrations/job/main/98/artifact/build/elastic-stack-dump/synthetics/logs/

It affects the Integrations main branch, incl. synthetics, containerd, etc.

Steps to reproduce:

elastic-package stack update -v -d --version 8.1.0-SNAPSHOT
elastic-package stack up -v -d --version 8.1.0-SNAPSHOT

Thanks for any help with investigating this problem.

cc @jlind23 @joshdover

@mtojek mtojek added the Team:Elastic-Agent-Control-Plane and Team:Fleet labels Feb 3, 2022
@joshdover
Contributor

Curious that there are no logs from Fleet Server related to the policy it selected.

One possible workaround could be to add FLEET_SERVER_POLICY_ID=fleet-server-managed-ep to the env vars for Fleet Server here: https://github.com/elastic/elastic-package/blob/main/internal/profile/_static/docker-compose-stack.yml#L79

But we should figure out what the root issue is here, regardless of whether the workaround works.
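For illustration, here is a minimal sketch of what that workaround could look like in the stack's compose file. The service name, image tag, and the surrounding variables are assumptions based on a typical Fleet Server bootstrap setup, not the actual contents of docker-compose-stack.yml; only the last line is the proposed change.

# Hypothetical excerpt of docker-compose-stack.yml (values assumed for illustration)
  fleet-server:
    image: docker.elastic.co/beats/elastic-agent:8.1.0-SNAPSHOT
    environment:
      - FLEET_SERVER_ENABLE=1                            # bootstrap Fleet Server in this agent
      - KIBANA_FLEET_SETUP=1                             # ask Kibana to run Fleet setup
      - KIBANA_FLEET_HOST=http://kibana:5601
      - FLEET_SERVER_POLICY_ID=fleet-server-managed-ep   # proposed workaround: pin the enrollment policy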

@mtojek
Contributor Author

mtojek commented Feb 3, 2022

Hey @joshdover, we tried that with Julia while working on the migration to hosted policies. Setting FLEET_SERVER_POLICY_ID=fleet-server-managed-ep would break compatibility with the 7.x stack. If it isn't really necessary, I would postpone introducing this env var. Otherwise, we'll have to implement some hack or maintain a forked stack.

@mtojek
Contributor Author

mtojek commented Feb 3, 2022

We did some investigation and it seems that the root cause is in the Elastic Agent Docker image. The last stable one we managed to build is this one: elastic/elastic-package#683 (@sha256:a5c580573376d65ed2eba92d359b411cdae4bf52745af8e3bb8c0c91f8ce53a5).

It maps onto:

elastic-agent@cd978fd268e2:~$ ./elastic-agent diagnostics
elastic-agent  version: 8.1.0
               build_commit: 56b227d00945ae97d7e8663df048c29f311b8894  build_time: 2022-02-01 07:01:00 +0000 UTC  snapshot_build: true
Applications:
  *  name: metricbeat  route_key: default
     error: Get "http://unix/": dial unix /usr/share/elastic-agent/state/data/tmp/default/metricbeat/metricbeat.sock: connect: no such file or directory

@mtojek
Contributor Author

mtojek commented Feb 3, 2022

We suspect that problem might have been introduced with this PR: elastic/beats#29031

cc @ph @blakerouse

@ph ph self-assigned this Feb 3, 2022
@ph
Contributor

ph commented Feb 3, 2022

I will take a look, but I think the PR you linked is the only major thing I know of that could have impacted the agent.

@ph ph added the bug label Feb 3, 2022
@juliaElastic
Contributor

juliaElastic commented Feb 3, 2022

@criamico @mtojek this change might be related as well: elastic/kibana#108252

When starting elastic-package locally, I see this in the fleet-server logs:
Kibana Fleet setup failed: http POST request to http://kibana:5601/api/fleet/setup fails: Forbidden: <nil>. Response: {"statusCode":403,"error":"Forbidden","message":"Forbidden"}

What's more, the .fleet-agents index is not created. I suspect that fleet-server might not have access to the Fleet API at all.

@joshdover
Contributor

joshdover commented Feb 3, 2022

I don't think it's related to elastic/kibana#108252. I've been able to successfully run this on main without any issues as a manual test:

# Create new elastic/fleet-server token
curl --request POST \
  --url http://localhost:9200/_security/service/elastic/fleet-server/credential/token \
  -u elastic:changeme

# Copy token response into authz header below
curl --request POST \
  --url http://localhost:5601/api/fleet/setup \
  --header 'authorization: Bearer <token>' \
  --header 'content-type: application/json' \
  --header 'kbn-xsrf: x' 

Do we need to update the token that we're using? Maybe our manual hardcoded token isn't working anymore due to a change in ES?

@mtojek
Contributor Author

mtojek commented Feb 3, 2022

@joshdover In this PR I pinned a specific Docker image for the Elastic Agent and it passed. The Elasticsearch and Kibana images were the same.

@ph ph closed this as completed Feb 3, 2022
@ph ph reopened this Feb 3, 2022
@ph
Contributor

ph commented Feb 3, 2022

OK, I went through all the commits in fleet-server; the latest commit that adds actual code to the server is from 4 days ago (https://github.com/elastic/fleet-server/pulls?q=is%3Apr+is%3Amerged). I am going to concentrate on the Agent side of things.

ph added a commit to ph/beats that referenced this issue Feb 3, 2022
This reverts the APM instrumentation code of the Elastic Agent to unblock the build and CI for other teams. This will require more investigation to really understand the problem.

Fixes elastic/fleet-server#1129
@ph ph mentioned this issue Feb 3, 2022
@ph
Contributor

ph commented Feb 3, 2022

This was a really deep rabbit hole. I took some time to set up a fast, working test environment for debugging, using AGENT_DROP_PATH with part of the Elastic Agent precompiled and only building for the current platform and architecture. Looking at the behavior under elastic-package, Fleet Server was simply waiting for an initial configuration. Looking at Fleet in Kibana, I could see that the system was stuck waiting for the first enrollment. The other notable behavior of Fleet Server was heavy CPU usage; maybe fleet-server is stuck in a live loop? I also tested outside of the Docker environment, and the bug was present there too.

Because of this, I initially thought the problem was in fleet-server. I bisected from the last good commit of fleet-server and the bug was still present. At that point everything pointed to the Elastic Agent side, so I also bisected the agent between the last good build and the broken one, and I was able to narrow it down to the APM instrumentation.

Looking at the implementation, the traces should have been disabled by default and should not impact any behavior of the agent, yet if I remove the whole PR, the Elastic Agent is able to do the initial enrollment into Fleet without any problems. Looking more closely at the code, I tried removing the gRPC interceptor code, but that did not fix the situation. I've decided to revert the whole APM instrumentation implementation; we will need to look into it more. I also detected that importing apmgrpc has an init side effect, but removing the import didn't fix the problem either.

Reverting the PR was not a simple revert; another pull request applied afterwards had a conflicting change.

Looking at that PR, it was green except for the e2e CI; if the latter had been working, I am confident it would have caught this issue.

Action items:

  • Write better documentation for testing this scenario.
  • Make the build much quicker; it currently takes many minutes to get a working binary.
  • Write a post-mortem.
  • Re-enable E2E testing, or add a simple job that uses elastic-package to bring up the stack.
  • Investigate high CPU usage
  • Investigate APM Instrumentation of Elastic Agent with @stuartnelson3

mergify bot pushed a commit to elastic/beats that referenced this issue Feb 4, 2022
* Revert #29031

This reverts the APM instrumentation code of the Elastic Agent to unblock the build and CI for other teams. This will require more investigation to really understand the problem.

Fixes elastic/fleet-server#1129

* fix make update

* fix linter

(cherry picked from commit 718c923)
@mtojek mtojek reopened this Feb 4, 2022
@mtojek
Contributor Author

mtojek commented Feb 4, 2022

Let's keep it open until we confirm that it's fixed.

@ph
Contributor

ph commented Feb 4, 2022

Interesting that the issue was closed from a forked repository; I don't remember ever seeing that before.

@axw
Member

axw commented Feb 8, 2022

I'm seeing what appears to be the same issue with docker.elastic.co/beats/elastic-agent:8.2.0-5d69c4c3-SNAPSHOT, which is built from elastic/beats@5529c31 (after the revert).

To reproduce, clone elastic/apm-server#7227 and run docker-compose up -d. Fleet Server fails to enroll.

Logs:

$ docker-compose logs fleet-server
Attaching to apm-server_fleet-server_1
fleet-server_1      | Requesting service_token from Kibana.
fleet-server_1      | Created service_token named: token-1644310962231
fleet-server_1      | Performing setup of Fleet in Kibana
fleet-server_1      | 
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:43.549Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":572},"message":"Spawning Elastic Agent daemon as a subprocess to complete bootstrap process.","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:43.766Z","log.origin":{"file.name":"application/application.go","file.line":68},"message":"Detecting execution mode","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:43.774Z","log.origin":{"file.name":"application/application.go","file.line":88},"message":"Agent is in Fleet Server bootstrap mode","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:44.557Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":744},"message":"Waiting for Elastic Agent to start Fleet Server","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:44.576Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":62},"message":"Starting stats endpoint","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:44.576Z","log.origin":{"file.name":"application/fleet_server_bootstrap.go","file.line":131},"message":"Agent is starting","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:44.576Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":64},"message":"Metrics endpoint listening on: /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock (configured: unix:///usr/share/elastic-agent/state/data/tmp/elastic-agent.sock)","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:44.577Z","log.origin":{"file.name":"application/fleet_server_bootstrap.go","file.line":141},"message":"Agent is stopped","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:46.731Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":48},"message":"New State ID is nLbqrZoq","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:46.731Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 1 step(s)","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:48.286Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-02-08T09:02:48Z - message: Application: fleet-server--8.2.0-SNAPSHOT[]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:48.287Z","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":66},"message":"Updating internal state","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:49.368Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-02-08T09:02:49Z - message: Application: fleet-server--8.2.0-SNAPSHOT[]: State changed to STARTING: Waiting on default policy with Fleet Server integration - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:02:50.566Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":777},"message":"Fleet Server - Waiting on default policy with Fleet Server integration","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:04:43.551Z","log.origin":{"file.name":"cmd/run.go","file.line":185},"message":"Shutting down Elastic Agent and sending last events...","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:04:43.551Z","log.origin":{"file.name":"operation/operator.go","file.line":216},"message":"waiting for installer of pipeline 'default' to finish","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:04:43.551Z","log.origin":{"file.name":"process/app.go","file.line":176},"message":"Signaling application to stop because of shutdown: fleet-server--8.2.0-SNAPSHOT","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:04:45.053Z","log.origin":{"file.name":"cmd/run.go","file.line":193},"message":"Shutting down completed.","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:04:45.053Z","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-02-08T09:04:45Z - message: Application: fleet-server--8.2.0-SNAPSHOT[]: State changed to STOPPED: Stopped - type: 'STATE' - sub_type: 'STOPPED'","ecs.version":"1.6.0"}
fleet-server_1      | {"log.level":"info","@timestamp":"2022-02-08T09:04:45.053Z","log.logger":"api","log.origin":{"file.name":"api/server.go","file.line":66},"message":"Stats endpoint (/usr/share/elastic-agent/state/data/tmp/elastic-agent.sock) finished: accept unix /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock: use of closed network connection","ecs.version":"1.6.0"}
fleet-server_1      | Error: fleet-server failed: context canceled
fleet-server_1      | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.2/fleet-troubleshooting.html
fleet-server_1      | Error: enrollment failed: exit status 1
fleet-server_1      | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.2/fleet-troubleshooting.html

@mtojek
Contributor Author

mtojek commented Feb 8, 2022

@ph Could you please check the status of the elastic-agent Docker image? The issue still persists in Integrations.

@ph
Contributor

ph commented Feb 8, 2022

@mtojek I am taking another look.

@ph
Contributor

ph commented Feb 8, 2022

@mtojek Looking at the CI failure, this concerns the 8.1 snapshots, and I didn't merge elastic/beats#30209 yet. I will double-check the failures and merge it. Is there a job that tests on master?

@axw
Member

axw commented Feb 8, 2022

@ph if you don't care about running the specific steps that @mtojek mentioned: the steps I listed in #1129 (comment) are for main (8.2.0-SNAPSHOT).

@ph
Contributor

ph commented Feb 8, 2022

unix /usr/share/elastic-agent/state/data/tmp/elastic-agent.sock: use of closed network connection","ecs.version":"1.6.0"}
fleet-server_1 | Error: fleet-server failed: context canceled
fleet-server_1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.2/fleet-troubleshooting.html
fleet-server_1 | Error: enrollment failed: exit status 1
fleet-server_1 | For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.2/fleet-troubleshooting.html
fleet-server_1 | {"log.level":"info","@timestamp":"2022-02-08T16:42:09.461Z","log.origin":

@mtojek
Contributor Author

mtojek commented Feb 8, 2022

@mtojek Looking at the CI failure, this concerns the 8.1 snapshots, and I didn't merge elastic/beats#30209 yet. I will double-check the failures and merge it. Is there a job that tests on master?

For example, it fails on the Integrations master branch for the containerd integration.

@ph
Contributor

ph commented Feb 8, 2022

The build is 5529c31cf1bd68bf2ad089ef747186f9510ff3f1 and does include the revert:

❯ git show 5529c31cf1bd68bf2ad089ef747186f9510ff3f1                                                                                                                                                                             [11:48:31]
commit 5529c31cf1bd68bf2ad089ef747186f9510ff3f1 (HEAD)
Author: Elastic Machine <[email protected]>
Date:   Mon Feb 7 10:17:17 2022 -0600

    [Release] update version to next minor 8.2.0 (#30160)

diff --git a/libbeat/version/version.go b/libbeat/version/version.go
index 873ae40db0..38249106a4 100644
--- a/libbeat/version/version.go
+++ b/libbeat/version/version.go
@@ -18,4 +18,4 @@
 // Code generated by dev-tools/set_version
 package version
 
-const defaultBeatVersion = "8.1.0"
+const defaultBeatVersion = "8.2.0"

This is indeed strange behavior, because I was able to reproduce the bug every time with the instrumentation commit and never without it.

@mtojek
Contributor Author

mtojek commented Feb 8, 2022

I retriggered the main job. Let's see what the current status is: link

@ph
Contributor

ph commented Feb 8, 2022

It should fail, @mtojek. I can reproduce the bug with the Docker images; the debug statements are lacking, so I am shooting a bit in the dark at this point.

@ph
Contributor

ph commented Feb 8, 2022

Interesting logs on the Kibana side; I'm not sure why we have multiple "Fleet setup completed" statements:

[2022-02-08T16:39:23.863+00:00][INFO ][status] Kibana is now degraded (was available)
[2022-02-08T16:39:24.670+00:00][INFO ][plugins.fleet] Beginning fleet setup
[2022-02-08T16:39:24.759+00:00][INFO ][plugins.fleet] Fleet setup completed
[2022-02-08T16:39:24.798+00:00][INFO ][plugins.fleet] Beginning fleet setup
[2022-02-08T16:39:24.890+00:00][INFO ][plugins.fleet] Fleet setup completed
[2022-02-08T16:39:29.403+00:00][INFO ][status] Kibana is now available (was degraded)
[2022-02-08T17:00:25.540+00:00][INFO ][status] Kibana is now degraded (was available)
[2022-02-08T17:00:29.298+00:00][INFO ][status] Kibana is now available (was degraded)
[2022-02-08T17:02:38.529+00:00][INFO ][plugins.fleet] Beginning fleet setup
[2022-02-08T17:02:38.615+00:00][INFO ][plugins.fleet] Fleet setup completed
[2022-02-08T17:02:38.632+00:00][INFO ][plugins.fleet] Beginning fleet setup
[2022-02-08T17:02:38.718+00:00][INFO ][plugins.fleet] Fleet setup completed
[2022-02-08T17:07:14.094+00:00][INFO ][plugins.fleet] Beginning fleet setup
[2022-02-08T17:07:14.145+00:00][INFO ][plugins.fleet] Fleet setup completed
[2022-02-08T17:07:14.168+00:00][INFO ][plugins.fleet] Beginning fleet setup
[2022-02-08T17:07:14.217+00:00][INFO ][plugins.fleet] Fleet setup completed
[2022-02-08T17:14:45.142+00:00][INFO ][plugins.fleet] Beginning fleet setup
[2022-02-08T17:14:45.208+00:00][INFO ][plugins.fleet] Fleet setup completed
[2022-02-08T17:14:45.226+00:00][INFO ][plugins.fleet] Beginning fleet setup
[2022-02-08T17:14:45.349+00:00][INFO ][plugins.fleet] Fleet setup completed
[2022-02-08T17:25:17.553+00:00][ERROR][plugins.taskManager] Failed to poll for work: Error: work has timed out
[2022-02-08T17:25:17.580+00:00][INFO ][status] Kibana is now degraded (was available)
[2022-02-08T17:25:18.802+00:00][WARN ][plugins.kibanaUsageCollection] Average event loop delay threshold exceeded 350ms. Received 10813.265237333333ms. See https://ela.st/kibana-scaling-considerations for more information about scaling Kibana.
[2022-02-08T17:25:29.266+00:00][INFO ][status] Kibana is now available (was degraded)

@ph
Contributor

ph commented Feb 8, 2022

OK, I think we might have two different problems. Let's start with the APM Server: we recently removed the autogeneration of the fleet-server configuration without human 'intervention' (elastic/kibana#108456). Looking at the APM docker-compose file at https://github.com/elastic/apm-server/blob/main/docker-compose.yml#L41-L63, we never configure the default Fleet Server configuration. So this aligns with what we see in the log: Fleet Server is waiting on a configuration that will never exist. Elastic Package has created a PR for this: elastic/elastic-package#676

Now I will check with the elastic-package.
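For reference, a minimal sketch of the kind of preconfigured Fleet Server policy such a setup would now need to pass to Kibana. The policy name and id below are assumptions for illustration; a fuller, real example is pasted in a later comment in this thread.

# Hypothetical kibana.yml excerpt: preconfigure a policy containing the fleet_server package
# so Fleet Server has a policy to enroll into without manual intervention in the UI.
xpack.fleet.agentPolicies:
  - name: Fleet Server policy
    id: fleet-server-policy
    namespace: default
    package_policies:
      - name: Fleet Server
        package:
          name: fleet_server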

@ph
Contributor

ph commented Feb 8, 2022

When I tested #1129 (comment), I didn't use the container subcommand and used the link from the Kibana UI, so in that case Kibana generates the appropriate server configuration.

@ph
Contributor

ph commented Feb 8, 2022

Added notes here: using this configuration from elastic/kibana#108456 (comment) yields a few deprecation warnings:

xpack.fleet.agentPolicies:
  - name: Agent policy 1
    description: Agent policy 1
    is_managed: false
    namespace: default
    monitoring_enabled:
      - logs
      - metrics
    package_policies:
      - name: system-1
        id: default-system
        package:
          name: system
  - name: Fleet Server policy preconfigured
    id: fleet-server-policy
    namespace: default
    package_policies:
      - name: Fleet Server
        package:
          name: fleet_server

[2022-02-08T20:34:28.207+00:00][WARN ][config.deprecation] Config key [xpack.fleet.agentPolicies.is_default] is deprecated.
[2022-02-08T20:34:28.208+00:00][WARN ][config.deprecation] Config key [xpack.fleet.agentPolicies.is_default_fleet_server] is deprecated.
[2022-02-08T20:34:28.208+00:00][WARN ][config.deprecation] Config key [xpack.fleet.agents.elasticsearch.host] is deprecated and replaced by

@ph
Contributor

ph commented Feb 8, 2022

I still think it's something that only happens with an automated workflow; when I follow the user journey manually it seems to work, at least outside of containers.

@ph
Contributor

ph commented Feb 8, 2022

OK, 8.1.0 is stuck in a failure loop on Fleet Server; the server is not even started. This is exactly what Marcin had.
Going to do the same thing with the 8.2.0 artifacts.

@ph
Contributor

ph commented Feb 8, 2022

OK, 8.2.0 elastic-package stack works for me.

Creating network "elastic-package-stack_default" with the default driver
Creating elastic-package-stack_elasticsearch_1    ... done
Creating elastic-package-stack_package-registry_1 ... done
Creating elastic-package-stack_package-registry_is_ready_1 ... done
Creating elastic-package-stack_kibana_1                    ... done
Creating elastic-package-stack_elasticsearch_is_ready_1    ... done
Creating elastic-package-stack_fleet-server_1              ... done
Creating elastic-package-stack_kibana_is_ready_1           ... done
Creating elastic-package-stack_elastic-agent_1             ... done
Creating elastic-package-stack_fleet-server_is_ready_1     ... done
Creating elastic-package-stack_elastic-agent_is_ready_1    ... done
Done

Logging into Kibana shows both Elastic Agents connected to it; everything seems to be enrolled fine.
I don't know how it's used in CI, but 8.2.0 works here. I wonder if the assertions or the integration configuration have issues in CI.

@jlind23 @axw The main difference between 8.2.0 and 8.1.0 is really the instrumentation; fleet-server is identical.

@mtojek
Contributor Author

mtojek commented Feb 9, 2022

Thanks, @ph, for working on this to reduce the blast radius.

I opened a similar PR to verify the 8.2.0 stack: elastic/elastic-package#692

Hey @simitt @axw @stuartnelson3, I suppose you've already been researching the APM instrumentation issue. Could you please share more details or link the issue, so we can learn what went wrong here? My bet is an undetected library conflict somewhere around gRPC.

@ph
Contributor

ph commented Feb 9, 2022

The elastic-package and apm-server problems are fixed, so I am going to close this issue. If there is a problem we can reopen it.

@ph ph closed this as completed Feb 9, 2022
@simitt

simitt commented Feb 11, 2022

Hey @simitt @axw @stuartnelson3, I suppose you've already been researching the APM instrumentation issue. Could you please share more details or link the issue, so we can learn what went wrong here? My bet is an undetected library conflict somewhere around gRPC.

@stuartnelson3 is looking into this.

leweafan pushed a commit to leweafan/beats that referenced this issue Apr 28, 2023
This reverts the APM instrumentation code of the Elastic Agent to unblock the build and CI for other teams. This will require more investigation to really understand the problem.

Fixes elastic/fleet-server#1129