[Elastic Agent on Cloud] Fleet Server ends up shut down by Agent, so Cloud hosted Fleet Server is not started, can't use cloud #26588
Comments
Pinging @elastic/agent (Team:Agent)
I believe this is because of my recent change for HTTP2
Running the container locally shows that it crashes as it fails to enroll.
I've been digging into this from another direction. The current 7.14-SNAPSHOT container cannot be successfully created in Cloud QA (or Cloud master). In https://github.com/elastic/cloud/pull/83408 we have changed the container health check to use the agent In the latest 7.14-SNAPSHOT this API appears to be unresponsive, so the container never passes the health check.
Hi @EricDavisX Thanks
Validated and confirmed that #25219 fixes it in master. Just waiting for a green test run in 7.x for the backport, and then this will be fixed.
Seems that even with those fixes applied in 7.14, I am still seeing the following error:
I re-opened this issue because I kept hitting the same error that I commented on above. That was because I was starting the same broken container image each time (user error on my part). With a Docker image built from the 7.14 branch of the beats repo, with a 7.14 fleet-server included in the bundle, the container starts up correctly.
Hi @EricDavisX
Thanks
I’m seeing a problem with Fleet Server on 7.14 cloud in cloud-staging. It [Fleet Server Agent] isn’t standing up on its own [the rest of cloud env seems fine]. It does not seem to be a known issue, so I am logging it.
This is reproduced on latest 7.14 snapshot as of Jun 29 4PM
the kibana hash is: dcacd04872050ff322f7e9bb36af913e40d5977e
which should give us the timing for the whole stack and Agent...
edavis-mbp:kibana_elastic edavis$ git show -s dcacd04872050ff322f7e9bb36af913e40d5977e
commit dcacd04872050ff322f7e9bb36af913e40d5977e
Author: Kibana Machine [email protected]
Date: Tue Jun 29 00:33:06 2021 -0400
reproduced by using defaults in cloud staging, and picking 7.14-snapshot to deploy
From the Kibana UI, it manifests as the APM/Fleet container just isn't set up (although it is):
Brief conversation with Alex P from cloud team helped us find some logs which seem to indicate the problem needs review on Agent / Beats side.
Notes from slack, logs:
Alex Piggott 13 minutes ago
Failed to connect to backoff(elasticsearch(http://7cd47f69212147abb63f979fe801cd88.containerhost:9244)): Connection marked as failed because the onConnect callback failed: resource 'apm-7.14.0-transaction' exists, but it is not an alias
Alex Piggott 11 minutes ago
i assume that’s an unrelated issue?
oh wait wrong logs that’s APM
Alex Piggott 9 minutes ago
2021-06-29T17:08:03Z - message: Application: fleet-server--7.14.0-SNAPSHOT[]: State changed to STOPPED: Stopped - type: 'STATE' - sub_type: 'STOPPED'
Alex Piggott 9 minutes ago
so fleet server is stopped by agent
for this reason may be: 2021-06-29T17:08:02Z - message: Application: fleet-server--7.14.0-SNAPSHOT[]: State changed to DEGRADED: Running on policy with Fleet Server integration: policy-elastic-agent-on-cloud; missing config fleet.agent.id (expected during bootstrap process) - type: 'STATE' - sub_type: 'RUNNING'
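For anyone comparing more of these logs, here is a minimal sketch that pulls the state transitions out of lines like the two quoted above and orders them by timestamp. The line format is assumed from those two samples only, and `parse_state_changes` is a hypothetical helper, not part of Elastic Agent or the Cloud tooling.

```python
import re

# Pattern inferred from the two log lines quoted in this issue (an assumption,
# not a documented format). search() is used so a chat prefix such as
# "for this reason may be: " in front of the line is tolerated.
STATE_LINE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z) - message: "
    r"Application: (?P<app>[^\[]+)\[\]: "
    r"State changed to (?P<state>[A-Z]+): (?P<detail>.*?)"
    r" - type: '(?P<type>[^']*)' - sub_type: '(?P<sub_type>[^']*)'"
)


def parse_state_changes(lines):
    """Return (timestamp, app, state, detail) tuples, oldest first."""
    rows = []
    for line in lines:
        m = STATE_LINE.search(line)
        if m:
            rows.append((m["ts"], m["app"], m["state"], m["detail"]))
    # Timestamps share one fixed UTC format, so string order is time order.
    return sorted(rows)


SAMPLE = [
    "2021-06-29T17:08:03Z - message: Application: fleet-server--7.14.0-SNAPSHOT[]: "
    "State changed to STOPPED: Stopped - type: 'STATE' - sub_type: 'STOPPED'",
    "for this reason may be: 2021-06-29T17:08:02Z - message: Application: "
    "fleet-server--7.14.0-SNAPSHOT[]: State changed to DEGRADED: Running on policy "
    "with Fleet Server integration: policy-elastic-agent-on-cloud; missing config "
    "fleet.agent.id (expected during bootstrap process) - type: 'STATE' - sub_type: 'RUNNING'",
]

if __name__ == "__main__":
    for ts, app, state, detail in parse_state_changes(SAMPLE):
        print(ts, app, state, detail, sep=" | ")
```

Run against the two lines above, this lists the DEGRADED transition (17:08:02) before the STOPPED one (17:08:03), which matches the order described in the Slack notes.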