Elastic Agent won't start if the default rpc port is used #28118

Closed
psanz-estc opened this issue Sep 24, 2021 · 9 comments
Labels
Stalled, Team:Elastic-Agent

Comments

@psanz-estc

  • Version: Elastic Agent 7.14.2
  • Operating System: Ubuntu 20.04
  • Steps to Reproduce:

Elastic Agent binds to port 6789 by default. When that port is already in use by another application, the agent cannot start.

Enrollment/installation (via Fleet Server) doesn't return any error, only INFO messages:

root@server:~/elastic-agent-7.14.2-linux-x86_64# ./elastic-agent install -f --url=https://URL:443 --enrollment-token=TOKEN
2021-09-22T18:40:00.862+0200	INFO	cmd/enroll_cmd.go:396	Starting enrollment to URL: https://URL:443/
2021-09-22T18:40:02.024+0200	INFO	cmd/enroll_cmd.go:232	Elastic Agent might not be running; unable to trigger restart
2021-09-22T18:40:02.024+0200	INFO	cmd/enroll_cmd.go:234	Successfully triggered restart on running Elastic Agent.
Successfully enrolled the Elastic Agent.
Elastic Agent has been successfully installed.

The messages indicate that the agent has been restarted, enrolled, and installed (even though the two restart messages contradict each other). But the agent is not running, and all we see in Kibana is that the agent is "Updating".

Only journalctl shows the real problem:

sep 22 18:52:05 server systemd[1]: Started Elastic Agent is a unified agent to observe, monitor and protect your system..
sep 22 18:52:05 server elastic-agent[2085818]: starting GRPC listener: listen tcp 127.0.0.1:6789: bind: address already in use
sep 22 18:52:05 server systemd[1]: elastic-agent.service: Main process exited, code=exited, status=1/FAILURE
sep 22 18:52:05 server systemd[1]: elastic-agent.service: Failed with result 'exit-code'
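
Since the installer output hides the failure, the journal is the place to look. A minimal sketch for pulling the conflicting address out of the journal, assuming the message format shown above stays the same (the function name bind_conflict_addr is made up for this example and is not part of Elastic Agent):

```shell
# Reads log lines on stdin and prints every address that failed to bind
# with "address already in use". The pattern mirrors the journal line above;
# other agent versions may word the message differently.
bind_conflict_addr() {
  sed -n 's/.*listen tcp \([^ ]*\): bind: address already in use.*/\1/p' | sort -u
}

# Typical use on a systemd host (the unit name is taken from the journal output above):
#   journalctl -u elastic-agent --no-pager | bind_conflict_addr
```

For the journal lines in this report, the function prints 127.0.0.1:6789.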

So we have to edit elastic-agent.yml under /opt/Elastic/Agent and set a different gRPC port:

agent.grpc:
  address: localhost
  port: 16789

And then run /opt/Elastic/Agent/elastic-agent restart.

Elastic Agent should at least detect this port collision during installation and display an error message warning the user about the problem.
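
Until the installer performs such a check itself, a wrapper script could probe the port before running the install command. This is only a sketch under assumptions: it relies on bash's /dev/tcp redirection (not available in plain sh), it only detects listeners that accept connections, and the function name port_in_use is hypothetical.

```shell
#!/usr/bin/env bash
# Returns 0 if something accepts TCP connections on host:port, non-zero otherwise.
port_in_use() {
  local host="$1" port="$2"
  # Bash opens a TCP connection when /dev/tcp/HOST/PORT is used in a redirection.
  # The subshell keeps file descriptor 3 from leaking into the caller.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Example pre-flight check before "./elastic-agent install ...":
#   if port_in_use 127.0.0.1 6789; then
#     echo "port 6789 is already in use; set agent.grpc.port first" >&2
#     exit 1
#   fi
```

The address and port in the example are the defaults reported in this issue. A connect probe is used rather than a bind attempt so that the check works without extra tools.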

@botelastic botelastic bot added the needs_team label Sep 24, 2021
@psanz-estc
Author

cc: @EricDavisX

@EricDavisX EricDavisX added the Team:Elastic-Agent label Sep 28, 2021
@elasticmachine
Collaborator

Pinging @elastic/agent (Team:Agent)

@botelastic botelastic bot removed the needs_team label Sep 28, 2021
@EricDavisX
Contributor

At a minimum, we can probably detect this and put better error logging in place to help triage.

@botelastic

botelastic bot commented Sep 28, 2022

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Sep 28, 2022
@barrettsmits

👍

This is causing issues for my org: a Kubernetes Deployment of a standalone agent hits this error on a cluster that already runs the agent DaemonSet on each node.

@botelastic botelastic bot removed the Stalled label Oct 5, 2022
@barrettsmits

👍

This is causing issues for my org: a Kubernetes Deployment of a standalone agent hits this error on a cluster that already runs the agent DaemonSet on each node.

After going through the manifests line by line, we found that the Deployment and the DaemonSet cannot both use the hostNetwork: true flag. With host networking, both pods share the node's network namespace, so both agents try to bind 127.0.0.1:6789 on the same node.

Spec:
      hostNetwork: true

Changing it to false resolved the error for our Deployment on a Kubernetes cluster where the DaemonSet was already running.

Could this be documented to prevent others from running into it?

Per this page: https://www.elastic.co/guide/en/fleet/master/running-on-kubernetes-managed-by-fleet.html

"Deploying Elastic Agent to collect cluster-level metrics in large cluster
The size and the number of nodes in a Kubernetes cluster can be fairly large at times, and in such cases the Pod that will be collecting cluster level metrics might face performance issues due to resources limitations. In this case users might consider to avoid using the leader election strategy and instead run a dedicated, standalone Elastic Agent instance using a Deployment in addition to the DaemonSet."

@slacksach

👍
This is causing issues for my org when a kubernetes deployment of a standalone agent on a cluster with a daemonset running on each node gets this error.

After going line by line, the deployment and Daemonset cannot use the hostNetwork: True flag

Spec:
      hostNetwork: true

Changed to False solved this error on our deployment to a Kubernetes cluster with the Daemonset already running.

Possible documentation to prevent others from running into it?

Per this page: https://www.elastic.co/guide/en/fleet/master/running-on-kubernetes-managed-by-fleet.html

"Deploying Elastic Agent to collect cluster-level metrics in large cluster
The size and the number of nodes in a Kubernetes cluster can be fairly large at times, and in such cases the Pod that will be collecting cluster level metrics might face performance issues due to resources limitations. In this case users might consider to avoid using the leader election strategy and instead run a dedicated, standalone Elastic Agent instance using a Deployment in addition to the DaemonSet."

Thanks for this, it saved us a bunch of time. We wanted to run synthetics browser monitors, but the normal DaemonSet requires runAsUser: 0 while synthetics requires runAsUser: 1000, so we needed to combine hostNetwork: false and runAsUser: 1000 for that to work.

@dmgeurts

dmgeurts commented Jan 12, 2023

Still experiencing this with v8.5.3 and editing /opt/Elastic/Agent/elastic-agent.reference.yml doesn't work as the installation seems to have failed (Fleet says the agent status is "updating") and a restart just throws the following socket error:

$ sudo /opt/Elastic/Agent/elastic-agent restart
Error: Failed trigger restart of daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/elastic-agent.sock: connect: no such file or directory"
Usage:
  elastic-agent restart [flags]

Flags:
  -h, --help   help for restart

Global Flags:
  -c, --c string                     Configuration file, relative to path.config (default "elastic-agent.yml")
  -d, --d string                     Enable certain debug selectors
  -e, --e                            Log to stderr and disable syslog/file output
      --environment environmentVar   set environment being ran in (default default)
      --path.config string           Config path is the directory Agent looks for its config file (default "/opt/Elastic/Agent")
      --path.downloads string        Downloads path contains binaries Agent downloads
      --path.home string             Agent root path (default "/opt/Elastic/Agent")
      --path.install string          Install path contains binaries Agent extracts
      --path.logs string             Logs path contains Agent log output (default "/opt/Elastic/Agent")
  -v, --v                            Log at INFO level

Failed trigger restart of daemon: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /run/elastic-agent.sock: connect: no such file or directory"

EDIT: I just noticed that I changed the wrong file: /opt/Elastic/Agent/elastic-agent.yml is the right one, and the change suggested in the original post works. However, sudo elastic-agent restart didn't work for me, while sudo systemctl restart elastic-agent did.
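
The error output above suggests why: elastic-agent restart dials the running daemon over /run/elastic-agent.sock, and that socket does not exist when the daemon never came up. A rough sketch of a restart helper that falls back to systemd in that case, assuming the default install path and unit name seen in this thread (restart_agent and AGENT_BIN are illustrative names, not part of the agent CLI):

```shell
# Path of the installed agent binary; override AGENT_BIN for non-default installs.
AGENT_BIN="${AGENT_BIN:-/opt/Elastic/Agent/elastic-agent}"

# Try the agent's own restart command first. It needs a running daemon, so if it
# fails (for example because the control socket is missing), ask systemd to
# restart the service instead. Run as root.
restart_agent() {
  "$AGENT_BIN" restart >/dev/null 2>&1 || systemctl restart elastic-agent
}
```

Calling restart_agent after editing elastic-agent.yml covers both the running and the crashed case.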

@botelastic

botelastic bot commented Jan 12, 2024

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1:.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jan 12, 2024
@botelastic botelastic bot closed this as completed Jul 10, 2024