Elastic Agent won't start if the default rpc port is used #28118

psanz-estc · 2021-09-24T11:13:44Z

Version: Elastic Agent 7.14.2
Operating System: Ubuntu 20.04
Steps to Reproduce:

Elastic Agent binds to port 6789 by default. When another application is already using that port, the agent cannot start.

Enrollment/installation (via Fleet Server) doesn't return any error, only INFO messages:

root@server:~/elastic-agent-7.14.2-linux-x86_64# ./elastic-agent install -f --url=https://URL:443 --enrollment-token=TOKEN
2021-09-22T18:40:00.862+0200	INFO	cmd/enroll_cmd.go:396	Starting enrollment to URL: https://URL:443/
2021-09-22T18:40:02.024+0200	INFO	cmd/enroll_cmd.go:232	Elastic Agent might not be running; unable to trigger restart
2021-09-22T18:40:02.024+0200	INFO	cmd/enroll_cmd.go:234	Successfully triggered restart on running Elastic Agent.
Successfully enrolled the Elastic Agent.
Elastic Agent has been successfully installed.

The messages indicate that the agent has been restarted, enrolled, and installed, but the agent is not running and all we see in Kibana is that the agent is "Updating".

Only journalctl shows the real problem:

sep 22 18:52:05 server systemd[1]: Started Elastic Agent is a unified agent to observe, monitor and protect your system..
sep 22 18:52:05 server elastic-agent[2085818]: starting GRPC listener: listen tcp 127.0.0.1:6789: bind: address already in use
sep 22 18:52:05 server systemd[1]: elastic-agent.service: Main process exited, code=exited, status=1/FAILURE
sep 22 18:52:05 server systemd[1]: elastic-agent.service: Failed with result 'exit-code'

So we have to edit elastic-agent.yml under /opt/Elastic/Agent, and add a different grpc port:

agent.grpc:
  address: localhost
  port: 16789

And then run /opt/Elastic/Agent/elastic-agent restart

Elastic Agent should at least detect this port collision during installation and display an error message warning the user about the problem.
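
As an illustration of the kind of pre-flight check requested here, the following is a minimal sketch in Python (not the agent's own Go code, and not an existing Elastic Agent feature). It tries to bind 127.0.0.1:6789, the address from the journalctl output above, and reports whether the bind would succeed. The function name `port_is_free` and the constant `DEFAULT_GRPC_PORT` are hypothetical names chosen for this example.

```python
import socket

# Default port named in this issue; the constant name is illustrative only.
DEFAULT_GRPC_PORT = 6789


def port_is_free(host: str = "127.0.0.1", port: int = DEFAULT_GRPC_PORT) -> bool:
    """Return True if a TCP socket can currently be bound to host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            # bind() raises OSError (for example "address already in use")
            # when another process already holds the address.
            probe.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    if port_is_free():
        print(f"127.0.0.1:{DEFAULT_GRPC_PORT} is free")
    else:
        print(f"127.0.0.1:{DEFAULT_GRPC_PORT} is already in use; "
              "consider setting a different agent.grpc port")
```

A check like this is inherently racy (the port can be taken between the check and the real bind), so it can only improve the error message; it cannot guarantee that the later start succeeds.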

psanz-estc · 2021-09-24T11:15:22Z

cc: @EricDavisX

elasticmachine · 2021-09-28T00:31:37Z

Pinging @elastic/agent (Team:Agent)

EricDavisX · 2021-09-28T00:33:50Z

At a minimum, we can probably detect this and put better error logging in place to help triage.

botelastic · 2022-09-28T01:14:45Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

barrettsmits · 2022-10-05T18:38:39Z

👍

This is causing issues for my org: a Kubernetes Deployment of a standalone agent gets this error on a cluster where a DaemonSet is already running on each node.

barrettsmits · 2022-10-11T13:17:18Z

👍

This is causing issues for my org: a Kubernetes Deployment of a standalone agent gets this error on a cluster where a DaemonSet is already running on each node.

After going through the manifests line by line, we found that the Deployment and the DaemonSet cannot both use the hostNetwork: true flag:

Spec:
      hostNetwork: true

Changing it to false solved this error for our Deployment on a Kubernetes cluster with the DaemonSet already running.

Could this be documented to prevent others from running into it?

Per this page: https://www.elastic.co/guide/en/fleet/master/running-on-kubernetes-managed-by-fleet.html

Deploying Elastic Agent to collect cluster-level metrics in large cluster
The size and the number of nodes in a Kubernetes cluster can be fairly large at times, and in such cases the Pod that will be collecting cluster level metrics might face performance issues due to resources limitations. In this case users might consider to avoid using the leader election strategy and instead run a dedicated, standalone Elastic Agent instance using a Deployment in addition to the DaemonSet.
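
A likely explanation for the hostNetwork behaviour reported in this comment: with hostNetwork: true a Pod uses the node's network namespace, so two agent Pods on the same node would both try to listen on the same loopback address and port. The sketch below reproduces that collision with plain Python sockets. It is an illustration under that assumption, not agent code; the helper name `second_bind_errno` is hypothetical, and it uses an ephemeral port rather than 6789 so that it can run anywhere.

```python
import errno
import socket
from typing import Optional


def second_bind_errno(host: str = "127.0.0.1") -> Optional[int]:
    """Bind one listener, then try to bind a second socket to the same
    host:port. Return the errno of the second bind, or None if it succeeded."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0: let the OS pick a free port
        first.listen()
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same namespace, same address
        except OSError as exc:
            return exc.errno
        return None
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    code = second_bind_errno()
    if code == errno.EADDRINUSE:
        print("second bind failed: address already in use")
    else:
        print(f"second bind result: {code}")
```

With hostNetwork: false each Pod has its own network namespace and its own loopback interface, so the two listeners no longer share an address.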

slacksach · 2022-12-16T00:52:21Z

👍
This is causing issues for my org: a Kubernetes Deployment of a standalone agent gets this error on a cluster where a DaemonSet is already running on each node.

After going through the manifests line by line, we found that the Deployment and the DaemonSet cannot both use the hostNetwork: true flag:
Spec:
      hostNetwork: true
Changing it to false solved this error for our Deployment on a Kubernetes cluster with the DaemonSet already running.

Could this be documented to prevent others from running into it?

Per this page: https://www.elastic.co/guide/en/fleet/master/running-on-kubernetes-managed-by-fleet.html

Deploying Elastic Agent to collect cluster-level metrics in large cluster
The size and the number of nodes in a Kubernetes cluster can be fairly large at times, and in such cases the Pod that will be collecting cluster level metrics might face performance issues due to resources limitations. In this case users might consider to avoid using the leader election strategy and instead run a dedicated, standalone Elastic Agent instance using a Deployment in addition to the DaemonSet.

Thanks for this, it saved us a bunch of time. We wanted to run synthetics browser monitors, but the normal DaemonSet requires runAsUser: 0 while synthetics requires runAsUser: 1000, so we needed to combine hostNetwork: false and runAsUser: 1000 for that to work. Thanks

dmgeurts · 2023-01-12T16:59:01Z

Still experiencing this with v8.5.3. Editing /opt/Elastic/Agent/elastic-agent.reference.yml doesn't work, as the installation seems to have failed (Fleet says the agent status is "updating"), and a restart just throws the following socket error:

$ sudo /opt/Elastic/Agent/elastic-agent restart
Error: Failed trigger restart of daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/elastic-agent.sock: connect: no such file or directory"
Usage:
  elastic-agent restart [flags]

Flags:
  -h, --help   help for restart

Global Flags:
  -c, --c string                     Configuration file, relative to path.config (default "elastic-agent.yml")
  -d, --d string                     Enable certain debug selectors
  -e, --e                            Log to stderr and disable syslog/file output
      --environment environmentVar   set environment being ran in (default default)
      --path.config string           Config path is the directory Agent looks for its config file (default "/opt/Elastic/Agent")
      --path.downloads string        Downloads path contains binaries Agent downloads
      --path.home string             Agent root path (default "/opt/Elastic/Agent")
      --path.install string          Install path contains binaries Agent extracts
      --path.logs string             Logs path contains Agent log output (default "/opt/Elastic/Agent")
  -v, --v                            Log at INFO level

Failed trigger restart of daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/elastic-agent.sock: connect: no such file or directory"

EDIT: Just noticed I changed the wrong file: /opt/Elastic/Agent/elastic-agent.yml is the right one, and the suggested change from the original post works. However, sudo elastic-agent restart didn't work for me, whereas sudo systemctl restart elastic-agent did.

botelastic · 2024-01-12T17:10:20Z

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

botelastic bot added the needs_team label Sep 24, 2021

EricDavisX added the Team:Elastic-Agent label Sep 28, 2021

botelastic bot removed the needs_team label Sep 28, 2021

juliaElastic mentioned this issue Sep 28, 2021

[Fleet] cypress setup elastic/kibana#113106

Merged

9 tasks

botelastic bot added the Stalled label Sep 28, 2022

botelastic bot removed the Stalled label Oct 5, 2022

botelastic bot added the Stalled label Jan 12, 2024

botelastic bot closed this as completed Jul 10, 2024
