Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Elastic-Agent] Modify output to be insecure if flag is provided #28007

Merged
merged 12 commits into from
Oct 13, 2021

Conversation

michalpristas
Copy link
Contributor

What does this PR do?

At the moment when we enter --insecure during install/enroll it applies only for communication between agent and fleet server.
This is a source of confusion as users often times expect all communication to be insecure.

This PR propagates insecure also to fleet-server to ES communication and after it is enrolled it also updates output definition passed to processes.

Why is it important?

in case fleet-server to ES communication is not insecure we often see x509 errors when self signed certs are used. Especially when ES is running on other machine than the one running fleet-server. In this case self signed cert is not validated as it's generated usually for localhost/127.0.0.1 and validation fails when dialing IP of ES machine.

Passing this to processes is important in case we have not only fleet-server running on edge but also monitoring or other integration. If we did not pass this, no events would be sent to ES (same failures as the one during enroll in fleet server scenario)

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test

  1. Create elastic stack instance with ES and Kibana enable TLS using self signed cert.
  2. On another machine try to enroll with --fleet-server-es pointing to ES from step 1 and --fleet-server-es-ca shared with the ES from step 1
  3. enroll should be successful
  4. logs are visible in logs tab/metrics dashboards are populated

@michalpristas michalpristas added Team:Elastic-Agent Label for the Agent team backport-v7.14.0 Automated backport with mergify backport-v7.15.0 Automated backport with mergify backport-v7.16.0 Automated backport with mergify labels Sep 20, 2021
@michalpristas michalpristas self-assigned this Sep 20, 2021
@elasticmachine
Copy link
Collaborator

Pinging @elastic/agent (Team:Agent)

@botelastic botelastic bot added needs_team Indicates that the issue/PR needs a Team:* label and removed needs_team Indicates that the issue/PR needs a Team:* label labels Sep 20, 2021
@michalpristas michalpristas changed the title Fix/certs insecure [Elastic-Agent] Modify output to be insecure if flag is provided Sep 20, 2021
@elasticmachine
Copy link
Collaborator

elasticmachine commented Sep 20, 2021

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2021-10-12T13:47:48.983+0000

  • Duration: 88 min 8 sec

  • Commit: 595fd40

Test stats 🧪

Test Results
Failed 0
Passed 7109
Skipped 16
Total 7125

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

@michalpristas
Copy link
Contributor Author

/package

@@ -147,7 +147,7 @@ func newManaged(
router,
&pipeline.ConfigModifiers{
Decorators: []pipeline.DecoratorFunc{modifiers.InjectMonitoring},
Filters: []pipeline.FilterFunc{filters.StreamChecker, modifiers.InjectFleet(rawConfig, sysInfo.Info(), agentInfo)},
Filters: []pipeline.FilterFunc{filters.StreamChecker, modifiers.InjectInsecureOutput(cfg.Fleet), modifiers.InjectFleet(rawConfig, sysInfo.Info(), agentInfo)},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it will affect more than just Fleet Server running under Elastic Agent, it will also affect all the other beats, correct?

If this does affect the other beats, I don't think we want this, because how will this work when it comes to multiple outputs? I believe it will have the effect that if --insecure is used all outputs will then become insecure.

Copy link
Contributor Author

@michalpristas michalpristas Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes that's the purpose. if we wont pass, events wont get consumed
this is to ease up on initial experience. you wont use insecure in prod. at least i hope nobody will

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They would get consumed if the policy is configured correctly. I don't think it should be Elastic Agent's job to override the output settings from Kibana.

If we had proper health reporting for outputs, it would be clear there is an issue. The real issue is that the Elastic Agent reports healthy when it is not.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if value is provided by kibana it should not be overriden, i'll add testcase to make it more visible

Copy link
Contributor Author

@michalpristas michalpristas Sep 20, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kibana on the other hand would needs to have the info that hey this cert we're using is self signed and may no be properly configured to support some config automatically (providing values to out config option).
insecure is our concept and i think it's up to us to help user using this with FRE.

if fleetConfig == nil ||
fleetConfig.Server == nil ||
fleetConfig.Server.TLS == nil ||
fleetConfig.Server.TLS.VerificationMode == tlscommon.VerifyFull {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem to have the affect that it will only do something if the Elastic Agent is running Fleet Server, but it will affect all beats.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you're right. changed to use client config instead

@michalpristas
Copy link
Contributor Author

/package

1 similar comment
@michalpristas
Copy link
Contributor Author

/package

@michalpristas
Copy link
Contributor Author

/package

@michalpristas
Copy link
Contributor Author

e2e passes but not within a time limit of 1 minute which needs to be increased, 2 minutes seems to work fine locally

@EricDavisX
Copy link
Contributor

e2e passes but not within a time limit of 1 minute which needs to be increased, 2 minutes seems to work fine locally

I know that the e2e-testing jenkins runs at night have a timeout factor, here:
https://github.com/elastic/e2e-testing/blob/master/.ci/Jenkinsfile#L47
The default is 5 mins. It may be that a specific area in the code was set up not using this variable, or something else. @mdelapenya can we hook up with Michal to review? I don't want to trigger tests again necessarily, but the last run is already aged-off so I don't have any quick poking in I could do.

@michalpristas
Copy link
Contributor Author

@EricDavisX i did a fix on e2e side, it should be fine now

@mdelapenya
Copy link
Contributor

@EricDavisX i did a fix on e2e side, it should be fine now

@michalpristas you mean this PR elastic/e2e-testing#1599, right? As this PR was merged two weeks ago, and we have not seen any instability related to that change since then, I'd say we are good to go from e2e-standpoint

@EricDavisX
Copy link
Contributor

I had a review with Michal over this and understand it better now. It looks like this can solve a lot of customer / user pain as is, and if we ever did decide to change the fix on the Kibana side this merge doesn't preclude that or make it harder.

I understand that we inject into each output with setting verification to 'none', but if the verification mode is set then that is skipped - so it seems good to me.

this is backed up by tests as well, so I'm giving it a thumb after discussion and fairly light literal code review

Copy link
Contributor

@EricDavisX EricDavisX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm; will help fix customer pain

@ruflin
Copy link
Contributor

ruflin commented Oct 12, 2021

++ on having 2 flags. Maybe we should rename also insecure to make it more obvious.

  • --insecure-fleet-server-connection
  • --insecure-fleet-server-elasticsearch-connection

We could deprecate --insecure or use it as --insecure-all.

I wrote out elasticsearch above as I'm not sure ES is obvious to everyone. For us it is as we use these abbrevations on a daily base like KB, LS but it is not obvious for users.

@michalpristas
Copy link
Contributor Author

/package

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add the option to the container command as well. Adding it to the setup_config.go and the environment variable FLEET_SERVER_ELASTICSEARCH_INSECURE=1.

Copy link
Contributor

@michel-laterman michel-laterman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, we should be explicit on what this does vs the --insecure flag does in the docs

Copy link
Contributor

@blakerouse blakerouse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks for all the fixes!

@michalpristas michalpristas merged commit 62d84db into elastic:master Oct 13, 2021
mergify bot pushed a commit that referenced this pull request Oct 13, 2021
)

[Elastic-Agent] Modify output to be insecure if flag is provided (#28007)

(cherry picked from commit 62d84db)
mergify bot pushed a commit that referenced this pull request Oct 13, 2021
)

[Elastic-Agent] Modify output to be insecure if flag is provided (#28007)

(cherry picked from commit 62d84db)

# Conflicts:
#	x-pack/elastic-agent/pkg/agent/cmd/enroll.go
#	x-pack/elastic-agent/pkg/agent/cmd/enroll_cmd.go
mergify bot pushed a commit that referenced this pull request Oct 13, 2021
)

[Elastic-Agent] Modify output to be insecure if flag is provided (#28007)

(cherry picked from commit 62d84db)
michalpristas added a commit that referenced this pull request Oct 14, 2021
…if flag is provided (#28387)

* [Elastic-Agent] Modify output to be insecure if flag is provided (#28007)

[Elastic-Agent] Modify output to be insecure if flag is provided (#28007)

(cherry picked from commit 62d84db)

# Conflicts:
#	x-pack/elastic-agent/pkg/agent/cmd/enroll.go
#	x-pack/elastic-agent/pkg/agent/cmd/enroll_cmd.go

* conflicts

Co-authored-by: Michal Pristas <[email protected]>
michalpristas added a commit that referenced this pull request Oct 14, 2021
) (#28386)

[Elastic-Agent] Modify output to be insecure if flag is provided (#28007)

(cherry picked from commit 62d84db)

Co-authored-by: Michal Pristas <[email protected]>
michalpristas added a commit that referenced this pull request Oct 14, 2021
) (#28388)

[Elastic-Agent] Modify output to be insecure if flag is provided (#28007)

(cherry picked from commit 62d84db)

Co-authored-by: Michal Pristas <[email protected]>
Icedroid pushed a commit to Icedroid/beats that referenced this pull request Nov 1, 2021
…stic#28007)

[Elastic-Agent] Modify output to be insecure if flag is provided (elastic#28007)
@dikshachauhan-qasource
Copy link

Hi @michalpristas cc @EricDavisX

I have re-attempted to validate this PR on 7.16 BC3 on-prem env as per steps mentioned in Ticket summary. However we are unable to reproduce desired results. Please check out the below steps and let us know if we are missing anything.

  1. On one VM, we enable elasticsearch and Kibana and created self assigned CA certs for running elasticsearch.
  2. Accessed another VM and downloaded agent artifact.
  3. Created in bound rules with Port 8220 on both VMs.

Scenario1: Ran a fleet server command with --insecure flag and observed below error. Screenshot:
image

Scenario2: Copied same certs at VM2 and re-attempted to run updated fleet-server command with --insecure flag
image

Scenario3: Created new certs at VM2 and re-attempted to run fleet-server command with --insecure flag
Option A: Fleet-server-es value not changed
Option B: Updated Fleet-server-es value with VM1 IP where ES is running
image

Build Details:

BUILD 45816
COMMIT acaa761f4ce46680fd7cfbeba03a652c72dc786b
Artifact: https://staging.elastic.co/7.16.0-8dc8b6a1/summary-7.16.0.html

Please let us know what we are still missing.

Thanks
QAS

@jlind23
Copy link
Collaborator

jlind23 commented Dec 2, 2021

ping @michalpristas could you have a look?

@michalpristas
Copy link
Contributor Author

michalpristas commented Dec 2, 2021

can you setup environment as this:
VM1:

  • elasticsearch
  • kibana
  • elastic agent with fleet server

VM2:

  • elastic agent without fleet server (default policy)

you may see agent from vm2 not reporing any events as ES is not configured to accept connections from non-local sources but enroll should work. agent2 should still be able to talk to fleet server running under agent1 and appear healthy.

also another thing is that we do not propagate insecure to beats so they may fail on cert validation and you need to update verification mode in output setting in fleet. this is what was agreed on in a PR.

@dikshachauhan-qasource
Copy link

dikshachauhan-qasource commented Dec 2, 2021

Hi @michalpristas

We have already testing this and are able to connect agent2 with agent1 as per scenario mentioned above, however, only issue at here is though agent2 seems up healthy but it doesn't send any logs to kibana UI or data streams or logs under discover tab.

Tested on Build 7.16 BC6:
BUILD 46139
COMMIT a35f543b45a71a3029c7b5e3624081b27532207f
Artifact link: https://staging.elastic.co/7.16.0-b878d709/summary-7.16.0.html

Steps used:

  • Deployed ES and Kibana on VM1
  • Installed Fleet server with CA certs on VM1 with Fleet server policy.
  • Created inbound rule for PORT 8220 on VM1
  • Installed Agent2 on VM2 with --insecure flag

Observations:

  • Agent Shows healthy on VM1
  • NO Logs on Kibana UI
  • No data stream
  • No logs on Discover tab.

Agent2 logs:
logs.zip

Screenshot:
image

Please let us know if more info is required from our end.

Thanks
QAS

@EricDavisX
Copy link
Contributor

@dikshachauhan-qasource I think what you have is a successful test! The data flow issue is, as Michal notes (presumably, as you did not mention it in your setup steps) due to the missing addition to the Fleet output settings. can you set:
ssl.verification_mode: none
in the Fleet output settings for ES and apply it and see if it works?

@dikshachauhan-qasource
Copy link

Hi @EricDavisX ,

We have tried with above settings too and found no change in Agent 2 or Fleet-server behavior.

Agent 2 still not able to send any logs to Kibana UI

Screenshot:
image

Agent2 logs:
logs.zip

Please let us know if anything is required from our end.

Thanks
QAS

leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
…secure if flag is provided (elastic#28387)

* [Elastic-Agent] Modify output to be insecure if flag is provided (elastic#28007)

[Elastic-Agent] Modify output to be insecure if flag is provided (elastic#28007)

(cherry picked from commit 9c6a0bb)

# Conflicts:
#	x-pack/elastic-agent/pkg/agent/cmd/enroll.go
#	x-pack/elastic-agent/pkg/agent/cmd/enroll_cmd.go

* conflicts

Co-authored-by: Michal Pristas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-v7.14.0 Automated backport with mergify backport-v7.15.0 Automated backport with mergify backport-v7.16.0 Automated backport with mergify Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants