
Elastic-agent must be able to manage global variables within a beat #163

Closed
fearful-symmetry opened this issue Oct 28, 2021 · 17 comments
Labels
bug Something isn't working enhancement New feature or request Team:Elastic-Agent Label for the Agent team Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team

Comments

@fearful-symmetry
Contributor

fearful-symmetry commented Oct 28, 2021

CC @ruflin @jen-huang @joshdover @jlind23 @masci

What's this about?

This is a result of me trying to deal with the bug in elastic/beats#28546, where the problem stems from an inability to manage the global hostfs from within fleet. I'm going to be summarizing from an email thread a bit here.

Right now, we have no real way for a user to set and update global variables within metricbeat from the fleet UI. In addition, the config system that manages a beat config from fleet has no way to set or manage global variables; the OnReload callbacks that fire only affect config within a given module. In addition, libbeat's management of global config and state is highly piecemeal, and not well-suited to clean remote management by something like elastic-agent. These are all interrelated problems, and whatever fix comes out of this needs to deal with all of them in some way.

What do we need?

While there's a handful of use cases for management of global variables, the first and most immediate from a tech debt perspective is the system.hostfs variable that's used by various system components to specify an alternate root filesystem, in cases where a user wants to monitor a host system from within a container. Usage of this setting is spread across two metricbeat modules and libbeat, and will probably expand to other modules over time.

While I'm sure we could put something together in a week to fix hostfs, I'd like to approach this from a generic perspective, as this need is going to crop up in other areas over time, and we need a generic, scalable way of managing global state from fleet.

At minimum, it seems to me we need the following:

  • A UI element that sets variables at a Fleet Config Policy Level
  • Some component that transmits this config to beats and applies it at a global level
  • Some component(s) in libbeat that are capable of updating global config variables from fleet inputs

The workaround in the above PR is to set the global variable from the system module init, which is not a great solution, but it's the only one I have, considering we just need the bug fixed in the short term.

Analysis: What are the underlying issues?

When I was considering the requirements for a "proper" hostfs fix, I came up with a few requirements (copying here from an email):

  • Must work under metricbeat's Initialization Scheme, as well as Fleet
  • Must have an API that ensures different components are using the HostFS in a consistent manner
  • Any component that needs to know the HostFS must not carry it around in global state, and instead receive it from an API
  • It must be able to be updated with a new HostFS during runtime, and that updated value must be visible to any downstream consumers

It struck me halfway through that I was sort of describing a key-value store. Right now we don't have anything like that in fleet as far as I'm aware, but I think we should consider something like it (at least from an API perspective), as the fundamental problem this issue is trying to address is that global config within beats is done in a rather piecemeal, distributed fashion without any central gatekeeping or management. This makes sense for a CLI-based application like metricbeat/filebeat, but not so much for a remotely managed component of a multi-part system like the fleet stack.
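For illustration only, a store meeting the four requirements above could look something like the following Go sketch: a single gatekeeper for "global" settings such as `system.hostfs`, updatable at runtime, with changes visible to subscribers. This is hypothetical API design, not an existing beats or fleet component:

```go
package main

import (
	"fmt"
	"sync"
)

// Store is a minimal thread-safe key-value config store with change
// notification, the shape of API the comment above describes.
type Store struct {
	mu       sync.RWMutex
	values   map[string]string
	watchers map[string][]chan string
}

func NewStore() *Store {
	return &Store{
		values:   map[string]string{},
		watchers: map[string][]chan string{},
	}
}

// Get returns the current value; consumers call this instead of
// carrying a copy around in their own global state.
func (s *Store) Get(key string) string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.values[key]
}

// Set updates a value (e.g. from a fleet policy change) and notifies
// watchers, so downstream consumers see the new value at runtime.
func (s *Store) Set(key, value string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.values[key] = value
	for _, ch := range s.watchers[key] {
		select {
		case ch <- value:
		default: // never block Set on a slow consumer
		}
	}
}

// Watch returns a channel that receives future updates for key.
func (s *Store) Watch(key string) <-chan string {
	s.mu.Lock()
	defer s.mu.Unlock()
	ch := make(chan string, 1)
	s.watchers[key] = append(s.watchers[key], ch)
	return ch
}

func main() {
	store := NewStore()
	updates := store.Watch("system.hostfs")
	store.Set("system.hostfs", "/hostfs")
	fmt.Println(store.Get("system.hostfs"), <-updates)
}
```

The point of the sketch is the contract, not the implementation: one place to set a value, one place to read it, and a subscription path so nothing needs to cache it in a package-level variable.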

If we want a solution that's more elegant than "append a flag to the exec command, restart the process", we'll need to re-evaluate how we utilize libbeat from inside elastic-agent. A meaningful, long-term solution to this problem will probably require a lot of refactoring and new code, but I think it's worth it, as large portions of libbeat were built around the assumption of an independent application that is configured locally via a CLI and yaml file. The longer we go without creating a codebase that treats remote management as a first-class use case, the more tech debt we're gonna build up.

The fleet data streams -> metricbeat module architecture we have in place now works because modules within a beat operate in a relatively independent, generic fashion, allowing something like the central management code to come along and re-implement the underlying interface. This isn't really the case with libbeat, as CLI flags and init functions happen in different places. For example, excluding testing flags, there are about four different sub-libraries in libbeat that set and fetch their own CLI flags and (I imagine in some cases) merge them with yml values using their own logic.

There's a lot of potential overlapping solutions to this problem (the KV store, spinning up inputs as independent processes and not a global beat instance, plugging modules directly into elastic-agent without metricbeat, etc, etc), and I'm not pushing for one solution, but more using the KV-store idea as an illustration for the kinds of overarching, big-picture changes I think we should be looking at.

@fearful-symmetry fearful-symmetry added bug Something isn't working enhancement New feature or request Team:Elastic-Agent Label for the Agent team labels Oct 28, 2021
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@jlind23 jlind23 added the Team:Elastic-Agent-Control-Plane Label for the Agent Control Plane team label Oct 29, 2021
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)

@ruflin
Contributor

ruflin commented Nov 2, 2021

Thanks @fearful-symmetry for the write-up around the current issue(s) and for sharing ideas on the direction we could take this. As described, we should tackle the more general problem of "global variables" in Fleet and Elastic Agent, which then provides a solution for hostfs. I expect this to be a collaboration between the Control Plane and Fleet teams (@jen-huang) as it will also have an effect on how the policy is built, how users modify it, etc.

@jlind23
Contributor

jlind23 commented Feb 10, 2022

@ph I don't think we did enough preparatory work on our end to make this happen. I would rather deal with it in 8.3 or 8.4. Thoughts?

@ph
Contributor

ph commented Feb 10, 2022

I think I would be more drastic... Using environment variables for flag enablement is a fragile idea; I see this as tech debt that we should clean up. @fearful-symmetry got it right here.

In addition libbeat's management of global config and state is highly piecemeal, and not well-suited to clean remote management by something like elastic-agent. These are all interrelated problems, and whatever fix comes out of this needs to deal with all of these problems in some way.

I would prefer we look at this problem without having the need for an environment variable as a configuration in the context of input v2 and removing the requirement for a global variable both in input and libbeat.

@ph
Contributor

ph commented Feb 10, 2022

Just to be clear, I would suggest we have an issue removing the need for global variables completely which would fall into the data plane side.

@jlind23
Contributor

jlind23 commented Feb 10, 2022

@cmacknz would you take on this issue or should I create a new one following what @ph stated?

@cmacknz
Member

cmacknz commented Feb 10, 2022

I don't know enough about the interactions here to have an opinion on what the solution to this is. I can believe changing the metricbeat side will be easier than coordinating something across all of agent and fleet.

Hopefully @fearful-symmetry can give me a tour of the code involved here the next time we talk.

@fearful-symmetry
Contributor Author

I would prefer we look at this problem without having the need for an environment variable as a configuration in the context of input v2 and removing the requirement for a global variable both in input and libbeat.

@ph Seconded. Most of the global-variable logic floating around the code exists to manage either CLI flags or random configs thrown into the main whateverbeat.yml; once we're no longer tied to a beat with a command line, we get rid of the need for a lot of this. Considering how V2 will also isolate inputs from queues, shippers, etc., there might not be a lot of global variables left.

The most awkward component here (and the one I'm most familiar with), is hostfs. Until we refactor out all the vendored libraries like gosigar, some subcomponent in the metricbeat module will need to set hostfs as a global variable in order to make those vendor libraries happy.

@ph
Contributor

ph commented Feb 10, 2022

The most awkward component here (and the one I'm most familiar with), is hostfs. Until we refactor out all the vendored libraries like gosigar, some subcomponent in the metricbeat module will need to set hostfs as a global variable in order to make those vendor libraries happy.

Sadly, yes, it's been a while since I've gone down the go-sigar rabbit hole, so I don't know the size of that refactoring effort. But as we move forward we should reduce side-effect code (like env variables or func init()) and move to explicit configuration.

@fearful-symmetry
Contributor Author

Sadly, yes, it's been a while since I've gone down the go-sigar rabbit hole, so I don't know the size of that refactoring effort. But as we move forward we should reduce side-effect code (like env variables or func init()) and move to explicit configuration.

Agreed. That refactoring is currently in-progress (See elastic/beats#30076), but it's a huge slog, as there's tons of code to re-write and move over. There's also shirou/gopsutil, which requires a global hostfs in a few places as well.

@fearful-symmetry
Contributor Author

On the request of @cmacknz , here's a breakdown of the parts of the system code that still have discrepancies as far as hostfs handling:

  • Libraries still using shirou/gopsutil, which requires hostfs to be set globally at the library level:

    • libbeat/metric/system/diskio
    • metricbeat/module/system/network
    • metricbeat/module/system/socket_summary
  • Other uses of gopsutil that don't require hostfs:

    • The darwin implementation of "github.com/shirou/gopsutil/v3/cpu"
    • libbeat/cmd/platformcheck/platformcheck.go
  • Libraries still using elastic/go-sysinfo, which doesn't take any hostfs variable, and thus won't work under containers:

    • libbeat/metric/system/host
    • metricbeat/module/linux/memory
    • metricbeat/module/system/network_summary
  • Libraries using elastic/go-sysinfo that I don't think will care about an alternate hostfs:

    • libbeat/metric/system/process (used to fetch host memory stats, as the memory code lives in metricbeat/internal)
    • libbeat/processors/add_observer_metadata
    • libbeat/processors/add_host_metadata
    • metricbeat/module/docker/network_summary

go-sysinfo is also used in packetbeat, agent, and auditbeat. I'm not knowledgeable enough about auditbeat to say for sure, but there might be some code in there that would care about a procfs root; I'm not sure if auditbeat supports that at all.

  • Libraries still using gosigar, which requires hostfs to be set globally
    • libbeat/metric/system/cpu (the name is deceptive, this is for load averages)
    • libbeat/metric/system/diskio
    • metricbeat/module/system/filesystem
    • metricbeat/module/system/process_summary
  • Libraries still using gosigar that don't require hostfs:
    • libbeat/metric/system/cgroup/cgv1
    • libbeat/metric/system/process
    • metricbeat/internal/metrics/cpu (windows only)
    • metricbeat/module/system/socket
    • metricbeat/module/system/uptime
    • metricbeat/module/windows/service
    • packetbeat/procs

As far as the path forward here, go-sysinfo is the low-hanging fruit. We own the code, we can just copy it over with some changes and cleanup to support beat's internal hostfs resolver. However, a lot of the go-sysinfo code is used outside of metricbeat metrics gathering, and it raises the question of what parts of the beats ecosystem should use what. The original plan (which I'm still planning to stick to) is to extract any parts of go-sysinfo that are only used by metricbeat modules, and later add hostfs support to go-sysinfo in case we need it in other places.

The remaining gosigar libraries are also low-hanging fruit. It should be noted that a lot of the remaining uses of gosigar don't require any hostfs value, and are instead used to import legacy non-Linux implementations of various system APIs, as we aren't exactly overflowing with knowledgeable Darwin or Windows systems devs right now who can refactor this code. I can do it with a little self-training, but it's a lower priority.

gopsutil is going to be a bit of a snag, as we don't own the code, and a lot of the APIs it implements, like networking, aren't exactly a breeze. As an amusing aside: the last time I dug into these libraries, a lot of the code (particularly the Darwin and Windows implementations) seemed to be copy-and-pasted ad infinitum across GitHub, and I have no idea where some of it actually came from, from an authorship standpoint.

@jlind23
Contributor

jlind23 commented Mar 15, 2022

After discussing with @cmacknz, and with the current input design in V2 in mind, we decided to freeze this work for now and keep it for later, to see if the V2 design fixes these problems.

@cmacknz
Member

cmacknz commented Mar 15, 2022

Yes, there is some valuable refactoring to be done here, but completing all of the refactoring to solve this problem within metricbeat will take a long time. We are shifting focus to the V2 implementation, and we can hopefully address the root problem with V2 (via separate processes, an agent KV store, etc., as suggested in the description).

@jlind23
Contributor

jlind23 commented May 27, 2024

@cmacknz can this be closed as completed/outdated?

@cmacknz
Member

cmacknz commented May 27, 2024

This hasn't come up again since it was written, so we can close and reopen if needed.

If we still need to expose hostfs it can be added as a new setting in the agent policy. We can open a more targeted issue for that.

@cmacknz cmacknz closed this as completed May 27, 2024
@therealdwright
Contributor

@cmacknz I have been in discussion with your support team about this exact issue for almost a month now. I think a solution is still required.
