[All OS's]: Agent goes unhealthy when some other output is set to Default output. #257

Closed
amolnater-qasource opened this issue Mar 28, 2022 · 18 comments
Labels
  • bug: Something isn't working
  • impact:high: Short-term priority; add to current release, or definitely next.
  • QA:Validated: Validated by the QA Team
  • Team:Elastic-Agent-Control-Plane: Label for the Agent Control Plane team

Comments

@amolnater-qasource

Kibana version: 8.2 Snapshot Kibana cloud environment

Host OS: Windows

Build details:
VERSION: 8.2.0 Snapshot
BUILD: 51431
COMMIT: a743498436a863e142592cb535b43f44c448851a
Artifact link: https://artifacts-api.elastic.co/v1/search/8.2-SNAPSHOT

Preconditions:

  1. 8.2 Snapshot Kibana cloud environment should be available.
  2. Windows agent should be installed.

Integrations:
System and Endpoint Security.

Steps to reproduce:

  1. Log in to the Kibana environment.
  2. Navigate to the Fleet > Agent Policy tab.
  3. Set the output to "default" for the agent policy.
  4. Create a new output under Fleet Settings and set it to Default.
  5. Observe that the Windows Agent goes Unhealthy within 60 seconds.

Expected Result:
The Agent should not go unhealthy when the Default output is changed to another user-created output.

Logs:
logs.zip

Screen Recording:

Windows.Agent.Policy.-.Agent.policies.-.Fleet.-.Elastic.-.Google.Chrome.2022-03-28.12-48-34.mp4

Windows Output:
[screenshot 1]

Note:

  • Issue is not observed on Linux agent.
@amolnater-qasource amolnater-qasource added bug Something isn't working Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team impact:high Short-term priority; add to current release, or definitely next. labels Mar 28, 2022
@amolnater-qasource
Author

@manishgupta-qasource Please review.

@manishgupta-qasource

Secondary review for this ticket is Done

@ph
Contributor

ph commented Mar 30, 2022

@lykkin might this be related to your changes?

@amolnater-qasource amolnater-qasource changed the title from "[Windows Agent]: Agent goes unhealthy when some other output is set to Default output." to "[All OS's]: Agent goes unhealthy when some other output is set to Default output." Apr 21, 2022
@amolnater-qasource
Author

Hi Team
We have revalidated this issue on latest 8.2 BC-4 Kibana cloud environment and found it reproducible on all OS's.
We have updated the title to avoid any confusion.

  • We set another output under policy settings, and the agents go Unhealthy.

Integrations:
System and Endpoint Security

Platforms Validated:

  • Windows
  • Ubuntu
  • Centos
  • MAC

Build details:
BUILD: 52005
COMMIT: 9a5003d8cf0062bf24ef64d6712b44823888cc03
Artifact Link: https://staging.elastic.co/8.2.0-3b2b9b86/summary-8.2.0.html

Logs:
elastic-agent-diagnostics-2022-04-21T06-36-28Z-00.zip

Screenshot:
[screenshots 5, 6, 7]

Please let us know if anything else is required from our end.
Thanks

@jlind23
Contributor

jlind23 commented Apr 28, 2022

@lykkin any progress here?

@lykkin
Contributor

lykkin commented Apr 28, 2022

In the logs we see

{"log.level":"error","@timestamp":"2022-03-28T08:00:31.761Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-03-28T04:00:31-04:00 - message: Application: endpoint-security--8.2.0-SNAPSHOT[3674c80e-9c57-47c2-9f2e-1e967c53b755]: State changed to FAILED: failed to start connection credentials listener: listen tcp 127.0.0.1:6788: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted. - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}

repeated many times. Out of curiosity, if you remove endpoint from the agent does the issue still occur?

Also pinging @elastic/endpoint in case this is a known issue.
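
For anyone triaging similar diagnostics, here is a minimal sketch of counting how often this bind failure appears in an Agent log. It assumes the log is newline-delimited JSON carrying the `log.level` and `message` fields seen in the line quoted above. The function name `count_bind_failures` and the regular expression are illustrative only and are not part of Elastic Agent.

```python
import json
import re
from collections import Counter

# Matches the "listen tcp <ip>:<port>: bind:" fragment seen in the quoted log line.
BIND_FAILURE = re.compile(r"listen tcp (?P<addr>[\d.]+:\d+): bind:")


def count_bind_failures(lines):
    """Count error-level entries reporting a failed bind, keyed by address."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict) or entry.get("log.level") != "error":
            continue
        match = BIND_FAILURE.search(str(entry.get("message", "")))
        if match:
            counts[match.group("addr")] += 1
    return counts
```

Passing an open log file (for example `count_bind_failures(open(path, encoding="utf-8"))`, where `path` points at an extracted log from the diagnostics archive) returns a `Counter` keyed by the address that could not be bound.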

@amolnater-qasource
Author

Hi @lykkin
We have revalidated this issue on the latest 8.2 BC-4 and yes, the agent goes unhealthy only when Endpoint Security is added to it.
The issue is not reproducible without Endpoint Security.

Build details:
BUILD: 52005
COMMIT: 9a5003d8cf0062bf24ef64d6712b44823888cc03
Artifact Link: https://staging.elastic.co/8.2.0-3b2b9b86/summary-8.2.0.html

Screenshot:
[screenshot]

Logs:
elastic-agent-diagnostics-2022-04-29T11-31-28Z-00.zip
endpoint-000000.zip

Thanks

@ferullo

ferullo commented Apr 29, 2022

Can you share elastic-endpoint.yaml?

@amolnater-qasource
Author

Hi @ferullo
Please find below elastic-endpoint.yaml from 8.2 BC-4 Endpoint:
elastic-endpoint.zip

Please let us know if anything else is required from our end.
Thanks

@intxgo
Contributor

intxgo commented May 3, 2022

The endpoint log looks fine. Endpoint connects successfully to Agent.

The former log is an Agent log:

{"log.level":"error","@timestamp":"2022-03-28T08:00:31.761Z","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-03-28T04:00:31-04:00 - message: Application: endpoint-security--8.2.0-SNAPSHOT[3674c80e-9c57-47c2-9f2e-1e967c53b755]: State changed to FAILED: failed to start connection credentials listener: listen tcp 127.0.0.1:6788: bind: Only one usage of each socket address (protocol/network address/port) is normally permitted. - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}

It looks like the Agent is trying to start a duplicate TCP server on localhost:6788. This localhost port is used by Endpoint as its bootstrap connection port. It's really not related to any stack Output, and since it's not possible to run several Endpoints on the same host, it would make no sense to use advanced socket options like SO_REUSEADDR on the TCP server side. Most likely it's a bug in the Agent logic.
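
To illustrate the failure mode, here is a minimal sketch of what happens when a second TCP listener is started on a localhost port that is already in use. It is not the Agent's code: the helper names `try_listen` and `second_bind_fails` are made up for this example, and it uses an OS-assigned free port rather than 6788 so it does not collide with a running Endpoint.

```python
import errno
import socket


def try_listen(host, port):
    """Bind and listen on host:port; raises OSError if the address is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
    except OSError:
        sock.close()
        raise
    return sock


def second_bind_fails(host="127.0.0.1"):
    """Return True if a second listener on an in-use port is rejected."""
    first = try_listen(host, 0)  # port 0 lets the OS pick a free port
    port = first.getsockname()[1]
    try:
        second = try_listen(host, port)
    except OSError as exc:
        # "Address already in use": the same condition as the bind error in the log
        return exc.errno == errno.EADDRINUSE
    else:
        second.close()
        return False
    finally:
        first.close()
```

Calling `second_bind_fails()` starts one listener, attempts a second on the same address, and reports whether the operating system refused it, which is the situation the repeated Agent error describes.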

On a side note, I'm curious what the purpose of the custom output is and how it should work. Is it supposed to alter the behavior of Endpoint in any way? The only documentation I was able to find is rather shallow. Digging deeper from that page, I found the following snippet here:

Only a single output may be defined.

@jlind23
Contributor

jlind23 commented May 3, 2022

@intxgo the other output may have been Logstash in this case, but we need to keep the Elasticsearch one, as it is the one used for the Elastic Agent API key and so on.

@intxgo
Contributor

intxgo commented May 3, 2022

Is Endpoint expected to output data (events and alerts) to both Elasticsearch and Logstash at the same time, if configured?

@jlind23
Contributor

jlind23 commented May 4, 2022

Nope, only one should be used then.

@jlind23
Contributor

jlind23 commented May 4, 2022

@ph shouldn't we change something on Fleet-ui/Elastic Agent end to avoid having two default outputs?

@jlind23
Contributor

jlind23 commented May 11, 2022

ping @ph

@amolnater-qasource
Author

Bug Conversion

  • New testcase is not required for this scenario as it is covered under Exploratory Testing.

Thanks!

@ph
Contributor

ph commented May 19, 2022

@jlind23 This is interesting. I would not have expected this to be possible, because the default is a key in a map and the last one would have won. We should validate this in Fleet UI and in Fleet-Server; we shouldn't trust Fleet.
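
A minimal sketch of the "last one wins" expectation and of the kind of validation suggested here. The field names `id`, `is_default`, `type` and `hosts`, and both function names, are hypothetical and do not reflect the actual Fleet or Fleet-Server schema.

```python
def build_outputs_map(outputs):
    """Key outputs by name; any output flagged as default lands on the "default" key."""
    result = {}
    for out in outputs:
        key = "default" if out.get("is_default") else out["id"]
        # A later entry with the same key silently replaces the earlier one.
        result[key] = {"type": out["type"], "hosts": out["hosts"]}
    return result


def find_single_default(outputs):
    """Return the id of the only default output, or raise if there is not exactly one."""
    defaults = [out["id"] for out in outputs if out.get("is_default")]
    if len(defaults) != 1:
        raise ValueError(f"expected exactly one default output, found {len(defaults)}: {defaults}")
    return defaults[0]
```

With two outputs flagged as default, `build_outputs_map` keeps only the later one under the `default` key, while `find_single_default` rejects the input outright. The second behavior is the safer one for a server that should not trust what it receives.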

@amolnater-qasource Do you have the generated agent policy yml from Fleet? Please remove any credentials.

@amolnater-qasource
Author

Hi @ph
Please find below attached elastic-agent.yml:
elastic-agent.zip

We have revalidated this issue on latest 8.3 Snapshot and found it fixed now.

  • Agent remains Healthy on changing Default output to some other user created output.

Build details:
BUILD: 52935
COMMIT: 5d5603a57237d8fe9cf186916c713b9ddddf039d

Screenshots:
[screenshots 3, 4]

Hence we are closing this issue and marking this as QA:Validated.

cc: @jlind23
Thanks

7 participants