Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Add ability to show FQDN of agents #150239

Merged
merged 12 commits into from
Feb 7, 2023

Conversation

hop-dev
Copy link
Contributor

@hop-dev hop-dev commented Feb 2, 2023

Summary

Closes #149059

Adds the agent_features agent policy field which is an array of feature flags. As part of this allow the FQDN feature to be configured in the agent policy settings.

Screenshot 2023-02-02 at 23 59 17

@hop-dev hop-dev changed the title 1490595 fqdn setting [Fleet] Add agent_features to agent policy + FQDN feature option Feb 2, 2023
@hop-dev
Copy link
Contributor Author

hop-dev commented Feb 3, 2023

@elasticmachine merge upstream

@hop-dev hop-dev marked this pull request as ready for review February 3, 2023 10:04
@hop-dev hop-dev requested review from a team as code owners February 3, 2023 10:04
@hop-dev hop-dev requested a review from AndersonQ February 3, 2023 10:06
@hop-dev
Copy link
Contributor Author

hop-dev commented Feb 3, 2023

@AndersonQ we should talk about this on monday and try and test it end to end, I have added agent.features to the agent policy passed to fleet server in the following format:

          fqdn: {
            enabled: true,
          }

Is that what you were expecting?

@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Feb 3, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@hop-dev hop-dev added the release_note:feature Makes this part of the condensed release notes label Feb 3, 2023
@hop-dev hop-dev changed the title [Fleet] Add agent_features to agent policy + FQDN feature option [Fleet] Add ability to show FQDN of agents Feb 3, 2023
@AndersonQ
Copy link
Member

@AndersonQ we should talk about this on monday and try and test it end to end, I have added agent.features to the agent policy passed to fleet server in the following format:

          fqdn: {
            enabled: true,
          }

Is that what you were expecting?

hi @hop-dev,

from what I could see, looking at the expected results in the tests, yes, it looks correct.

yes, we need to test it all together. I'll try to run your branch, but we could also jump on a call to test if you want to.

@hop-dev
Copy link
Contributor Author

hop-dev commented Feb 6, 2023

@elasticmachine merge upstream

Copy link
Member

@nchaulet nchaulet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🚀

@jlind23
Copy link
Contributor

jlind23 commented Feb 6, 2023

@hop-dev @AndersonQ were you able to test it end to end?

@hop-dev
Copy link
Contributor Author

hop-dev commented Feb 6, 2023

@hop-dev @AndersonQ were you able to test it end to end?

I think Anderson is having some issues on his end preventing him from testing at the moment

@AndersonQ
Copy link
Member

I'm testing it right now

@jlind23
Copy link
Contributor

jlind23 commented Feb 6, 2023

@AndersonQ @hop-dev any update on this?

@hop-dev
Copy link
Contributor Author

hop-dev commented Feb 6, 2023

@AndersonQ @hop-dev any update on this?

Had an issue with my code, pushed a fix so should be testing again soon

@kpollich
Copy link
Member

kpollich commented Feb 6, 2023

@hop-dev - Is this mergeable in its current state or are we waiting for @AndersonQ to confirm E2E behavior before this lands?

@hop-dev
Copy link
Contributor Author

hop-dev commented Feb 6, 2023

@hop-dev - Is this mergeable in its current state or are we waiting for @AndersonQ to confirm E2E behavior before this lands?

It's mergeable, would be nice to have confirmation that it works e2e but obviously any tweaks can be treated as bug. Need to get the core team's approval too

@@ -113,6 +113,13 @@ const getSavedObjectTypes = (
monitoring_output_id: { type: 'keyword' },
download_source_id: { type: 'keyword' },
fleet_server_host_id: { type: 'keyword' },
agent_features: {
dynamic: false,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@elastic/kibana-core would someone be able to review this SO change, the only thing of note is that I've set dynamic: false as we want to allow arbitrary values, not sure if there are any risks here?

Copy link
Contributor

@TinaHeiligers TinaHeiligers Feb 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rudolf has an open PR that cleans up saved object fields with mappings configured as enabled: false because it provides insufficient validation and allows indexing a string into these objects.

Allowing arbitrary values is risky regarding the goal of versioned APIs, particularly when supporting find operations. In the future, using find for any arbitrary field will make validation really hard in your versioned APIs.

Can you refactor to only index a known subset of searchable fields rather?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to add to that, having unknown fields is also going to make your life hard w.r.t BWC. How would you handle a backward-compatible migration for something that's unknown?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dynamic: false just means all the underlying fields won't be indexed. Unless I missed something in @rudolf's last work, this is still the recommended way of storing unindexed data?

@TinaHeiligers in term of BWC compliance, we need to be aware that schema != model. Having data being non-indexed doesn't really have an impact on BWC (or non-BWC).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'll wait to see what Rudolf thinks too, thanks so much for the input, it was a slight red flag to me which is why I called it out 👍

Copy link
Contributor

@rudolf rudolf Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah these mappings looks fine to me 👍 and these arbitrary values will only ever be stored or retrieved, there won't be any find/search over them.

This API is tightly coupled to the persistence which we don't recommend. This is because whatever is stored will be returned which could be a problem for BWC if ever any of the unknown keys in agent_features got transformed by a migration. But I suspect this is "by design" and fleet would never touch this data on the Kibana side.

Copy link
Contributor

@rudolf rudolf Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I just noticed these values come from the Kibana UI. So while the mappings still look perfect to me, I think if we can, having the API validate the input would be preferable over just allowing any unknowns and doing UI-side validation.

Copy link
Contributor

@TinaHeiligers TinaHeiligers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm doubtful the change is going to make life easy when it comes to BWC compliance.

@@ -113,6 +113,13 @@ const getSavedObjectTypes = (
monitoring_output_id: { type: 'keyword' },
download_source_id: { type: 'keyword' },
fleet_server_host_id: { type: 'keyword' },
agent_features: {
dynamic: false,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

to add to that, having unknown fields is also going to make your life hard w.r.t BWC. How would you handle a backward-compatible migration for something that's unknown?

Copy link
Contributor

@pgayvallet pgayvallet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but @TinaHeiligers made me doubt 😅 , so I'll refer to @rudolf for a double check

@pgayvallet pgayvallet requested a review from rudolf February 7, 2023 09:02
@@ -113,6 +113,13 @@ const getSavedObjectTypes = (
monitoring_output_id: { type: 'keyword' },
download_source_id: { type: 'keyword' },
fleet_server_host_id: { type: 'keyword' },
agent_features: {
dynamic: false,
Copy link
Contributor

@rudolf rudolf Feb 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah these mappings looks fine to me 👍 and these arbitrary values will only ever be stored or retrieved, there won't be any find/search over them.

This API is tightly coupled to the persistence which we don't recommend. This is because whatever is stored will be returned which could be a problem for BWC if ever any of the unknown keys in agent_features got transformed by a migration. But I suspect this is "by design" and fleet would never touch this data on the Kibana side.

@hop-dev
Copy link
Contributor Author

hop-dev commented Feb 7, 2023

@rudolf @pgayvallet @TinaHeiligers Thanks for all of your input. On balance, I think I am going to err on the side of "You aren't going to need it" and remove the dynamic config values for now, we can look at this again in the future when we have a concrete use case. From your team's point of view I will just be removing dynamic: false so I will not re-request your review if you're happy with that. Thanks again! 👍

@hop-dev hop-dev force-pushed the 1490595-fqdn-setting branch from 0fd7afd to 41f2357 Compare February 7, 2023 11:05
@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

Public APIs missing comments

Total count of every public API that lacks a comment. Target amount is 0. Run node scripts/build_api_docs --plugin [yourplugin] --stats comments for more detailed information.

id before after diff
fleet 963 964 +1

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
fleet 925.5KB 927.1KB +1.6KB

Saved Objects .kibana field count

Every field in each saved object type adds overhead to Elasticsearch. Kibana needs to keep the total field count below Elasticsearch's default limit of 1000 fields. Only specify field mappings for the fields you wish to search on or query. See https://www.elastic.co/guide/en/kibana/master/saved-objects-service.html#_mappings

id before after diff
ingest-agent-policies 20 23 +3
Unknown metric groups

API count

id before after diff
fleet 1068 1069 +1

ESLint disabled line counts

id before after diff
fleet 57 50 -7

Total ESLint disabled count

id before after diff
fleet 67 60 -7

History

  • 💚 Build #105720 succeeded 0fd7afdd276255c93ab8ab7be4053d1f3ed281fe
  • 💚 Build #105602 succeeded dd5b43fcd7ef29f46b0f13ca9613bfc653a0f463
  • 💚 Build #105431 succeeded bed1038a468d12d8d0883d15a0659e9627d1f0e6
  • 💔 Build #105403 failed e2d9fe6e08b198cb4fa487d6b3ab3e1e6dd4c14c
  • 💔 Build #105372 failed dff0bab124dcc7793b2820ddd3820befdb62ffaf
  • 💔 Build #105327 failed 5f5ab75540648d3db182dfa2a9c3236310923df9

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@hop-dev hop-dev merged commit 7e9a9bc into elastic:main Feb 7, 2023
@hop-dev hop-dev deleted the 1490595-fqdn-setting branch February 7, 2023 13:56
@kibanamachine kibanamachine added v8.7.0 backport:skip This commit does not require backporting labels Feb 7, 2023
@nicpenning
Copy link

This is great.

What will be the default?

@hop-dev
Copy link
Contributor Author

hop-dev commented Feb 8, 2023

Hi @nicpenning, thanks! we will be defaulting to the current behaviour which is hostname only. We are still waiting for the agent changes to be merged so there is a small chance that this won't be making it into the 8.7 release but 🤞

@nicpenning
Copy link

Today using Winlogbeat (7.17) or Elastic Agent (8.6) I see host.name as FQDN, but agent.hostname as short name. I will report back to confirm but the consistency of hostname fields is something I have wondered about for quite some time.

I will continue to monitor for this!

@hop-dev
Copy link
Contributor Author

hop-dev commented Feb 8, 2023

I will defer to @AndersonQ for the exact behaviour of the new mode as he is working on the agent side of things 👍

@AndersonQ
Copy link
Member

Today using Winlogbeat (7.17) or Elastic Agent (8.6) I see host.name as FQDN, but agent.hostname as short name. I will report back to confirm but the consistency of hostname fields is something I have wondered about for quite some time.

I will continue to monitor for this!

@nicpenning That is interesting. Can you reproduce it?

@nicpenning
Copy link

nicpenning commented Feb 8, 2023

For Winlogbeat 7.16+ here are some examples:
host.name : computer1.state.nd.local
host.hostname : computer1
agent.name : computer1
agent.hostname : computer1

For Elastic Agent with Windows/System/Sysmon Integrations:
host.name : computer1.state.nd.local
host.hostname : computer1
agent.name : computer1
agent.hostname : does_not_exist

For Elastic Agent with Elastic Agent/Fleet/OSQuery/Metricbeat/Endpoint Integrations:
host.name : computer1
host.hostname : computer1
agent.name : computer1
agent.hostname : does_not_exist

So it seems that the Windows event logs have the FQDN but everything else does not. A bit inconsistent.

@AndersonQ
Copy link
Member

yeah, Winlogbeat seems to use a different approach for host.name.

@nicpenning
Copy link

FYI, this was a bit of downer for us because the host.name started recording without the FQDN after we upgraded to 8.7.0 so I had to go to all of my policies and set this to FQDN which I firmly believe should have been default. The reason is because the Elastic agent already recorded this as FQDN (on Windows) and this change altered what analysts expected to see in this field (potentially breaking searches, alerting, etc.)

Thoughts? 🤔

@AndersonQ
Copy link
Member

Hi @nicpenning,

could you provide steps to reproduce it? I should not be the case that the behaviour changed, the FQDN feature is to be opt-in, not should have changed if you haven't turned it on.

@nicpenning
Copy link

nicpenning commented Jun 1, 2023

Certainly!

We were running Winlogbeat 7.16.2 and above you see that the FQDN was in host.name.

I went ahead and spot checked 8.1, 8.6, and 8.7 and found that host.name still included the FDQN. Here is winlogbeat downloaded and outputting to a file with FQDN in the host name:

{"@timestamp":"2023-04-13T16:26:54.136Z","@metadata":{"beat":"winlogbeat","type":"_doc","version":"8.7.0"},"host":{"ip":["10.202.130.15","169.254.186.221","169.254.124.54","169.254.194.156","192.168.86.20","172.25.96.1","172.28.128.1","172.30.32.1"],"mac":["00-15-5D-05-68-A5","00-15-5D-DB-C5-F6","00-15-5D-E0-93-F9","04-CF-4B-A8-52-2D","04-CF-4B-A8-52-2E","06-CF-4B-A8-52-2D","C8-4B-D6-5D-EE-32"],"hostname":"TEST12009LL","architecture":"x86_64","name":"TEST12009LL.state.sd.local","os":{"family":"windows","name":"Windows 11 Enterprise","kernel":"10.0.22000.1936 (WinBuild.160101.0800)","build":"22000.1936","type":"windows","platform":"windows","version":"10.0"},"id":"b941aa88-b947-4ffe-acaa-33aa428996b5"},"ecs":{"version":"8.0.0"},"agent":{"id":"a261584b-e712-47c0-ad66-dd5bdf0e31a3","name":"TEST12009LL","type":"winlogbeat","version":"8.7.0","ephemeral_id":"855ce9ba-ac31-4654-85be-66e083dc4016"},"winlog":{"computer_name":"TEST12009LL.state.sd.local","provider_name":"PowerShell","task":"Pipeline Execution Details","keywords":["Classic"],"process":{"pid":36152},"channel":"Windows PowerShell","event_id":"800","...omitted...}

So the problem is that if someone is migrating from Winlogeat to Filebeat via Elastic Agent this is a hard change and not from the default, so it may be hard to explain to people migrating to expect this new change.

Also, for when using Elastic Agent as noted above, I firmly believe if you are running Elastic Stack 8.6, all of your Windows events will contain the host.name : host.local.domain (FQDN). So to repeat, spin up a stack running 8.6 with the Elastic Agent and Windows integration and see that the FQDN was default.

So to summarize:

Winlogbeat by default up to version 8.7.0 is using FQDN for host.name and Elastic Agent Windows integrations in 8.6 use FQDN as well.

Does that help?

@AndersonQ
Copy link
Member

Hey, sorry for the delay. Yes that helps a lot.
From the top of my mind, there has been a difference on how winlogbeat reports the host.name in comparison with the other beats. So most likely the case here isn't related to the FQDN feature, but the differences between winlogbeat, filebeat and the Windows integrations.

I'll have a deeper look at it.
Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting release_note:feature Makes this part of the condensed release notes Team:Fleet Team label for Observability Data Collection Fleet team v8.7.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Fleet] Add a setting inside Agent Policy for FQDN