cannot apply a policy that changes output settings with Endpoint Security present #761

Closed
gbanasiak opened this issue Jul 21, 2022 · 13 comments
Labels
bug Something isn't working

Comments

@gbanasiak
Contributor

Version

8.3.2

Operating System

CentOS 7 (not verified on other OSes)

Description

It's impossible to apply a policy that changes the default Elasticsearch output to a non-default output while Endpoint Security is present. Endpoint collides with Elastic Agent on TCP port 6788.

Policy before:

outputs:
  default:
    type: elasticsearch
    hosts:
      - >-
        https://REDACTED.europe-west1.gcp.cloud.es.io:443

elastic-agent-before.yml.txt

Policy after:

outputs:
  703329b0-ebb2-11ec-816c-e300c95fdbc7:
    type: elasticsearch
    worker: 2
    bulk_max_size: 4096
    hosts:
      - >-
        https://REDACTED.europe-west1.gcp.cloud.es.io:443
  default:
    type: elasticsearch
    hosts:
      - >-
        https://REDACTED.europe-west1.gcp.cloud.es.io:443

elastic-agent-after.yml.txt
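
To make the difference between the two policies explicit, here is a minimal Python sketch that compares the two `outputs` sections. The dictionaries are transcribed by hand from the YAML above, and `diff_outputs` is a hypothetical helper written for this illustration, not part of Elastic Agent or Fleet.

```python
# Output sections transcribed from the two policies above. The host is the
# REDACTED placeholder used in the issue, not a real endpoint.
HOST = "https://REDACTED.europe-west1.gcp.cloud.es.io:443"

before = {
    "default": {"type": "elasticsearch", "hosts": [HOST]},
}
after = {
    "703329b0-ebb2-11ec-816c-e300c95fdbc7": {
        "type": "elasticsearch",
        "worker": 2,
        "bulk_max_size": 4096,
        "hosts": [HOST],
    },
    "default": {"type": "elasticsearch", "hosts": [HOST]},
}


def diff_outputs(old, new):
    """Hypothetical helper: return (added, removed, changed) output names."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in set(old) & set(new) if old[n] != new[n])
    return added, removed, changed


added, removed, changed = diff_outputs(before, after)
print("added:", added)
print("removed:", removed)
print("changed:", changed)
```

Under these assumptions the only difference is one additional named Elasticsearch output; the `default` output itself is identical in both policies.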

Screenshot 2022-07-21 at 17 04 32

Screenshot 2022-07-21 at 17 05 33

Symptoms

Elastic Agent status:

# elastic-agent status
Status: FAILED <--- HERE
Message: (no message)
Applications:
  * endpoint-security      (HEALTHY)
                           Protecting with policy {604429b8-889c-4517-95cd-b534d6492c41}
  * filebeat_monitoring    (HEALTHY)
                           Running
  * metricbeat_monitoring  (HEALTHY)
                           Running
  * endpoint-security      (FAILED)
                           failed to start connection credentials listener: listen tcp 127.0.0.1:6788: bind: address already in use <--- HERE
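
Note that `endpoint-security` is listed twice in the output above, once as HEALTHY and once as FAILED. A small Python sketch, assuming the text layout shown above (the regular expression and the `states_by_app` helper are illustrative assumptions, not an official parser for `elastic-agent status`), can flag such duplicate entries:

```python
import re
from collections import defaultdict

# Application lines copied from the status output above (annotations removed).
STATUS_TEXT = """\
Applications:
  * endpoint-security      (HEALTHY)
                           Protecting with policy {604429b8-889c-4517-95cd-b534d6492c41}
  * filebeat_monitoring    (HEALTHY)
                           Running
  * metricbeat_monitoring  (HEALTHY)
                           Running
  * endpoint-security      (FAILED)
                           failed to start connection credentials listener: listen tcp 127.0.0.1:6788: bind: address already in use
"""

# Matches lines of the form "  * <name>   (<STATE>)"; detail lines are skipped.
APP_LINE = re.compile(r"^\s*\*\s+(?P<name>\S+)\s+\((?P<state>[A-Z]+)\)\s*$")


def states_by_app(text):
    """Hypothetical helper: map each application name to its reported states."""
    states = defaultdict(list)
    for line in text.splitlines():
        match = APP_LINE.match(line)
        if match:
            states[match.group("name")].append(match.group("state"))
    return dict(states)


states = states_by_app(STATUS_TEXT)
duplicates = {name: s for name, s in states.items() if len(s) > 1}
print(duplicates)
```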

Target policy not applied:

# elastic-agent inspect
[..]
outputs:
  default:
    api_key: REDACTED
    hosts:
    - https://REDACTED.europe-west1.gcp.cloud.es.io:443
    type: elasticsearch

Logs:

{"log.level":"info","@timestamp":"2022-07-21T16:49:55.466+0200","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":48},"message":"New State ID is kt6zc5pb","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-21T16:49:55.466+0200","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 2 step(s)","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-21T16:49:55.641+0200","log.origin":{"file.name":"operation/operator.go","file.line":307},"message":"operation 'operation-install' skipped for endpoint-security.8.3.2","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2022-07-21T16:49:55.642+0200","log.origin":{"file.name":"log/reporter.go","file.line":40},"message":"2022-07-21T16:49:55+02:00 - message: Application: endpoint-security--8.3.2[31e595cb-6fa0-4d62-b5ba-772ec96e796d]: State changed to STARTING: Starting - type: 'STATE' - sub_type: 'STARTING'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-07-21T16:49:55.953+0200","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'error'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-07-21T16:49:55.953+0200","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":208},"message":"failed to dispatch actions, error: operator: failed to execute step sc-run, error: failed to start connection credentials listener: listen tcp 127.0.0.1:6788: bind: address already in use: failed to start connection credentials listener: listen tcp 127.0.0.1:6788: bind: address already in use","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-07-21T16:49:55.953+0200","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-07-21T16:49:55+02:00 - message: Application: endpoint-security--8.3.2[31e595cb-6fa0-4d62-b5ba-772ec96e796d]: State changed to FAILED: failed to start connection credentials listener: listen tcp 127.0.0.1:6788: bind: address already in use - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}

What stands out is: {"file.name":"log/reporter.go","file.line":36},"message":"2022-07-21T16:49:55+02:00 - message: Application: endpoint-security--8.3.2[31e595cb-6fa0-4d62-b5ba-772ec96e796d]: State changed to FAILED: failed to start connection credentials listener: listen tcp 127.0.0.1:6788: bind: address already in use - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
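
For longer log files, the relevant entries can be picked out programmatically. The following Python sketch assumes one JSON object per line, as in the excerpt above, and filters error-level entries whose message mentions the bind failure. The three sample lines are copied from the logs above; `bind_failures` is a hypothetical helper name.

```python
import json

# Three entries copied from the log excerpt above.
LOG_LINES = r"""
{"log.level":"info","@timestamp":"2022-07-21T16:49:55.466+0200","log.origin":{"file.name":"stateresolver/stateresolver.go","file.line":49},"message":"Converging state requires execution of 2 step(s)","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-07-21T16:49:55.953+0200","log.origin":{"file.name":"status/reporter.go","file.line":236},"message":"Elastic Agent status changed to: 'error'","ecs.version":"1.6.0"}
{"log.level":"error","@timestamp":"2022-07-21T16:49:55.953+0200","log.origin":{"file.name":"log/reporter.go","file.line":36},"message":"2022-07-21T16:49:55+02:00 - message: Application: endpoint-security--8.3.2[31e595cb-6fa0-4d62-b5ba-772ec96e796d]: State changed to FAILED: failed to start connection credentials listener: listen tcp 127.0.0.1:6788: bind: address already in use - type: 'ERROR' - sub_type: 'FAILED'","ecs.version":"1.6.0"}
"""


def bind_failures(text, needle="address already in use"):
    """Hypothetical helper: return (timestamp, source file) for matching errors."""
    hits = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between entries
        entry = json.loads(line)
        if entry.get("log.level") == "error" and needle in entry.get("message", ""):
            hits.append((entry["@timestamp"], entry["log.origin"]["file.name"]))
    return hits


hits = bind_failures(LOG_LINES)
for timestamp, source in hits:
    print(timestamp, source)
```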

TCP/6788 is used by Elastic Agent:

# netstat -anp | grep 6788
tcp        0      0 127.0.0.1:6788          0.0.0.0:*               LISTEN      989/elastic-agent <--- HERE

# ps auxwwf
[..]
root       982  1.9  2.2 3103352 177676 ?      Ssl  16:24   0:55 /opt/Elastic/Endpoint/elastic-endpoint run
[..]
root       989  6.8  0.6 1279912 53724 ?       Ssl  16:24   3:18 elastic-agent <--- HERE
root      1247  0.0  0.0      0     0 ?        Zs   16:24   0:00  \_ [elastic-agent] <defunct>
root      2619  0.0  0.0      0     0 ?        Zs   16:27   0:00  \_ [elastic-agent] <defunct>
root      3848  0.0  0.0      0     0 ?        Zs   16:46   0:00  \_ [elastic-agent] <defunct>
root      3877  5.5  2.3 1398108 189384 ?      Sl   16:46   1:25  \_ /opt/Elastic/Agent/data/elastic-agent-16c55b/install/filebeat-8.3.2-linux-x86_64/filebeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${FILEBEAT_GOGC:100} -E filebeat.config.modules.enabled=false -E logging.level=info -E path.data=/opt/Elastic/Agent/data/elastic-agent-16c55b/run/default/filebeat--8.3.2--36643631373035623733363936343635
root      3900  0.2  1.4 1428076 119660 ?      Sl   16:46   0:03  \_ /opt/Elastic/Agent/data/elastic-agent-16c55b/install/metricbeat-8.3.2-linux-x86_64/metricbeat -E setup.ilm.enabled=false -E setup.template.enabled=false -E management.enabled=true -E logging.level=debug -E gc_percent=${METRICBEAT_GOGC:100} -E metricbeat.config.modules.enabled=false -E logging.level=info -E path.data=/opt/Elastic/Agent/data/elastic-agent-16c55b/run/default/metricbeat--8.3.2--36643631373035623733363936343635
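
The `bind: address already in use` error is what the operating system returns when a second socket tries to bind an address that an existing listener already holds. The Python sketch below reproduces that mechanism in isolation. It is not the agent's code and deliberately uses an OS-assigned port rather than 6788, so it does not interfere with a running agent; `second_bind_errno` is a hypothetical helper.

```python
import errno
import socket


def second_bind_errno():
    """Bind one listener on loopback, then try to bind the same port again.

    Returns the errno of the second bind attempt, or None if it succeeded.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind(("127.0.0.1", port))  # same address as the listener
        except OSError as exc:
            return exc.errno
        return None
    finally:
        first.close()
        second.close()


err = second_bind_errno()
print("second bind failed with EADDRINUSE:", err == errno.EADDRINUSE)
```

This matches the picture in the `netstat` output: as long as the existing `elastic-agent` process keeps its listener on 127.0.0.1:6788, any further attempt to listen on that address is expected to fail the same way.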

Steps to Reproduce

  1. Create empty policy and add Endpoint Security with default settings.
  2. Deploy to 8.3.2 agent running on Linux and confirm agent healthy.
  3. Modify the policy changing integration output settings as shown above, and apply.

That is similar to #257.

@gbanasiak gbanasiak added the bug Something isn't working label Jul 21, 2022
@AndersonQ
Member

AndersonQ commented Jul 25, 2022

This seems related to elastic/fleet-server#1672. It should not be possible to have 2 Elasticsearch outputs on an agent policy.
By related, I mean that both add another ES output, which isn't really supported by fleet-server.

@nicpenning
Contributor

Same issue on a Windows deployment of Elastic Agent with the Elastic Endpoint Security integration.

Step 1. Have an initial Endpoint security integration on an agent
Step 2. Apply a different policy that has an Endpoint security integration and notice that the "listen tcp 127.0.0.1:6788: bind: address already in use" error log is recorded. It is almost as if the integration does not fully remove the current endpoint security integration and redeploy it from scratch. (The elastic-endpoint.exe process never stops)

The workaround is to deploy a different policy that removes the Endpoint Security integration, and then deploy the policy you wish to use.

❗However, if you try to deploy an Endpoint Security policy to an agent that already has one, this error leaves the endpoint stuck in a degraded state, and it won't take the new policies. A reboot (or a restart of the agent) is then required for the agent to return to a working state and pick up the correct policy, which is a significant issue.

@nicpenning
Contributor

Update: This is still a problem in 8.4.0.

@ferullo

ferullo commented Aug 29, 2022

@AndersonQ can you confirm that this is an Agent side issue and there's nothing in Endpoint that needs to be done to fix this? Based on your comment I think we're in agreement but I'd like to make sure I'm not misinterpreting you.

This issue seems the same as an old one which was marked as fixed in 8.3.0. Was it possibly fixed and the bug has reappeared?

@AndersonQ
Member

Hello folks, first let me take a step back: we're dealing with 2 problems here:

@nicpenning
Contributor

Sorry if I crossed the streams here! I will note that I was switching from a policy with the Endpoint Security integration and a Logstash output to a nearly identical policy that has an Elasticsearch output.

@AndersonQ
Member

OK, so to confirm, the steps to reproduce are:

  1. Create empty policy and add Endpoint Security with default settings and ES as output
  2. Deploy to 8.3.2 agent running on Linux and confirm agent healthy.
  3. Create a Logstash output
  4. Create another policy identical to the 1st one, but using Logstash as "Output for integrations"
  5. Swap the agent to the new policy

Is that correct, @nicpenning?

@nicpenning
Contributor

That should do it!

@nicpenning
Contributor

I haven't tested in 8.4.1 yet though.

@AndersonQ
Member

I did some more tests, and even a change of output permissions did not cause the problem. It really seems to be related to a change of the whole output.

@nimarezainia
Contributor

@AndersonQ & @pierrehilbert can this issue be closed? The main API key issues have long been addressed. Let me know if there's anything remaining.

@anderson you mention: "the other issue, indeed most likely is on the elastic-agent. Let me reproduce it to double check and be completely sure." I'm not quite sure what the other issue here is exactly. Thanks.

@AndersonQ
Member

I think it was the port collision, I'm not sure anymore. But yes, it seems ok to close it

@jlind23
Contributor

jlind23 commented May 27, 2024

I think it was the port collision, I'm not sure anymore. But yes, it seems ok to close it

Closing as per @AndersonQ's comment

@jlind23 jlind23 closed this as completed May 27, 2024
6 participants