Elastic-agent must be able to manage global variables within a beat #163
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
Thanks @fearful-symmetry for the write-up around the current issue(s) and for sharing ideas on the direction we could take. As described, we should tackle the more general problem of "global variables" in Fleet and Elastic Agent, which would then provide a solution for hostfs. I expect this to be a collaboration between the Control Plane and Fleet teams (@jen-huang), as it will also affect how the policy is built, how users modify it, etc.
@ph I don't think we did enough preparatory work on our end to make this happen. I would rather deal with it in 8.3 or 8.4. Thoughts?
I think I would be more drastic... Using environment variables for flag enablement is a fragile idea; I see this as tech debt that we should clean up. @fearful-symmetry got it right here.
I would prefer we look at this problem without needing an environment variable as configuration, in the context of input v2, and remove the requirement for a global variable in both the input and libbeat.
Just to be clear, I would suggest we open an issue for removing the need for global variables completely, which would fall on the data plane side.
I don't know enough about the interactions here to have an opinion on what the solution to this is. I can believe changing the metricbeat side will be easier than coordinating something across all of agent and fleet. Hopefully @fearful-symmetry can give me a tour of the code involved here the next time we talk.
@ph Seconded. Most of the logic around global variables rolling around the code is for managing either CLI flags for or random configs thrown into the main The most awkward component here (and the one I'm most familiar with) is
Sadly, yes; it's been a while since I've gone down the go-sigar rabbit hole, so I don't know the size of that refactoring effort. But as we move forward we should reduce side-effect code (like env variables or
Agreed. That refactoring is currently in progress (see elastic/beats#30076), but it's a huge slog, as there's tons of code to rewrite and move over. There's also
On the request of @cmacknz, here's a breakdown of the parts of the system code that still have discrepancies in hostfs handling:
As far as the path forward here, the remaining
After discussing with @cmacknz, and with the current input design in Elastic Agent V2 in mind, we decided to freeze this work for now and revisit it later to see whether the V2 design fixes these problems.
Yes, there is some valuable refactoring to be done here, but completing all of it to solve this problem within metricbeat will take a long time. We are shifting focus to the V2 implementation, and we can hopefully address the root problem with V2 (via separate processes, an agent KV store, etc. as suggested in the description).
@cmacknz can this be closed as completed/outdated?
This hasn't come up again since it was written, so we can close and reopen if needed. If we still need to expose hostfs, it can be added as a new setting in the agent policy. We can open a more targeted issue for that.
@cmacknz I have been in discussion with your support team about this exact issue for almost a month now. I think a solution is still required.
CC @ruflin @jen-huang @joshdover @jlind23 @masci
What's this about?
This is a result of me trying to deal with the bug in elastic/beats#28546, where the problem stems from an inability to manage the global `hostfs` from within fleet. I'm going to be summarizing from an email thread a bit here.

Right now, we have no real way for a user to set and update global variables within metricbeat from the fleet UI. In addition, the config system that manages a beat config from fleet has no way to set or manage global variables. The `OnReload` callbacks that happen only impact config from within a given module. In addition, libbeat's management of global config and state is highly piecemeal, and not well-suited to clean remote management by something like elastic-agent. These are all interrelated problems, and whatever fix comes out of this needs to deal with all of them in some way.

What do we need?
While there's a handful of use cases for management of global variables, the first and most immediate from a tech-debt perspective is the `system.hostfs` variable that's used by various system components to specify an alternate root filesystem, in cases where a user wants to monitor a host system from within a container. Usage of this setting is spread across two metricbeat modules and libbeat, and will probably expand to other modules over time.

While I'm sure we could put something together in a week to fix `hostfs`, I'd like to approach this from a generic perspective, as this need is going to crop up in other areas over time, and we need a generic, scalable way of managing global state from fleet.

At minimum, it seems to me we need the following:
The workaround in the above PR is to set the global variable from the `system` module init, which is not a great solution, but it's the only one I have, considering we just need the bug fixed in the short term.

Analysis: What are the underlying issues?
When I was considering the requirements for a "proper" `hostfs` fix, I came up with a few (copying here from an email):

It struck me halfway through that I was sort of describing a key-value store. Right now we don't have anything like that in fleet as far as I'm aware, but I think we should consider something like it (at least from an API perspective), as the fundamental problem this issue is trying to address is that global config within beats is done in a rather piecemeal, distributed fashion, without any central gatekeeping or management. That makes sense for a CLI-based application like metricbeat/filebeat, but not so much for a remotely managed component of a multi-part system like the fleet stack.
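The key-value-store idea above could be sketched roughly as follows. Everything here is hypothetical (the interface, the names, the in-memory implementation); it is not an existing Fleet or elastic-agent API, just an illustration of centrally owned global settings with change notification instead of per-package globals.

```go
package main

import "fmt"

// GlobalStore is a hypothetical central registry for cross-component
// settings such as hostfs, owned by the agent rather than by
// individual packages.
type GlobalStore interface {
	Get(key string) (string, bool)
	Set(key, value string)
	// Watch invokes fn whenever key changes, so components can react
	// to policy updates without a process restart.
	Watch(key string, fn func(newValue string))
}

// memStore is a minimal, non-concurrent in-memory implementation,
// just enough to show the shape of the API.
type memStore struct {
	vals     map[string]string
	watchers map[string][]func(string)
}

func newMemStore() *memStore {
	return &memStore{
		vals:     map[string]string{},
		watchers: map[string][]func(string){},
	}
}

func (s *memStore) Get(key string) (string, bool) {
	v, ok := s.vals[key]
	return v, ok
}

func (s *memStore) Set(key, value string) {
	s.vals[key] = value
	for _, fn := range s.watchers[key] {
		fn(value)
	}
}

func (s *memStore) Watch(key string, fn func(string)) {
	s.watchers[key] = append(s.watchers[key], fn)
}

func main() {
	store := newMemStore()
	// A metricset would subscribe instead of reading a package global.
	store.Watch("system.hostfs", func(v string) {
		fmt.Println("hostfs updated to", v)
	})
	// This would be driven by a Fleet policy change, not local code.
	store.Set("system.hostfs", "/hostfs")
}
```

The point of the sketch is the ownership inversion: the value lives in one agent-owned place, and consumers subscribe to it, rather than each library initializing its own global from flags or env vars.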
If we want a solution that's more elegant than "append a flag to the `exec` command, restart the process", we'll need to re-evaluate how we utilize libbeat from inside `elastic-agent`. A meaningful, long-term solution to this problem will probably require a lot of refactoring and new code, but I think it's worth it, as large portions of libbeat were built around the assumption of an independent application configured locally via a CLI and a yaml file. The longer we go without a codebase that treats remote management as a first-class use case, the more tech debt we're gonna build up.

The fleet `data streams -> metricbeat module` architecture we have in place now works because modules within a beat operate in a relatively independent, generic fashion, allowing something like the central management code to come along and re-implement the underlying interface. This isn't really the case with libbeat, where CLI flags and init functions happen in different places. For example, excluding testing flags, there are about four different sub-libraries in libbeat that set and fetch their own CLI flags and (I imagine, in some cases) merge them with yml values using their own logic.

There are a lot of potentially overlapping solutions to this problem (the KV store, spinning up inputs as independent processes rather than a global beat instance, plugging modules directly into elastic-agent without metricbeat, etc.), and I'm not pushing for one solution, but rather using the KV-store idea as an illustration of the kinds of overarching, big-picture changes I think we should be looking at.
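The flag-scattering pattern described above looks roughly like this. The flag names are borrowed from real beats CLI flags for flavor, but the wiring is a simplified single-file sketch: in libbeat the equivalent registrations are spread across several packages' `init()` functions, so there is no single choke point for a remote manager to intercept.

```go
package main

import (
	"flag"
	"fmt"
)

// In the real codebase, "library A" would register this flag in its
// own package as a side effect of being imported…
var homePath = flag.String("path.home", "", "home path for the beat")

// …and "library B" would register this one elsewhere. Neither knows
// about the other, and each may merge its flag with yml config using
// its own logic.
var strictPerms = flag.Bool("strict.perms", true, "strict permission checks on config files")

// parse wraps flag parsing so the sketch can be driven with an
// explicit argument list.
func parse(args []string) error {
	return flag.CommandLine.Parse(args)
}

func main() {
	// Simulated CLI input for the sketch.
	parse([]string{"-path.home", "/tmp/home", "-strict.perms=false"})
	fmt.Println("path.home:", *homePath, "strict.perms:", *strictPerms)
}
```

Each such flag is effectively another global variable with its own initialization path, which is exactly what makes a uniform remote-management layer hard to bolt on afterward.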