-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
filestream: validate input id on startup #41731
filestream: validate input id on startup #41731
Conversation
This pull request does not have a backport label.
To fixup this pull request, you need to add the backport labels for the needed
|
|
During startup filebeat now validates the filestream inputs and fails to start if there are inputs without ID or with duplicated IDs
247727b
to
b6ecc4e
Compare
Pinging @elastic/elastic-agent-data-plane (Team:Elastic-Agent-Data-Plane) |
filebeat/input/filestream/config.go
Outdated
toJson := map[string]interface{}{} | ||
err := cfg.Unpack(&toJson) | ||
if err != nil { | ||
toJson["emptyID"] = fmt.Sprintf("failes to umpack config: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
toJson["emptyID"] = fmt.Sprintf("failes to umpack config: %v", err) | |
toJson["emptyID"] = fmt.Sprintf("failed to unpack config: %v", err) |
filebeat/input/filestream/config.go
Outdated
toJson := map[string]interface{}{} | ||
err := dupcfgs.Unpack(&toJson) | ||
if err != nil { | ||
toJson[id] = fmt.Sprintf("failes to umpack config: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Codewise looks good, left some comments about some minor spelling issues.
I tested locally following the issue description and confirmed that it validates the ids properly.
LGTM.
An empty ID is fine, the problem is if there are two inputs with an empty ID. Agent allows the empty ID as a valid ID. This avoids breaking the configuration unnecessarily where there is only a single filestream input with no ID. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All my comments have been addressed.
We might want to treat an empty ID as a regular ID value as @cmacknz pointed out. So, we can apply the duplication check to that as well.
I removed the bit forbidding an empty ID. It was already checking for multiple empty IDs and including them in the list of invalid inputs. |
@AndersonQ looks like the tests need adjusting?
|
filebeat/beater/filebeat.go
Outdated
@@ -291,6 +292,11 @@ func (fb *Filebeat) Run(b *beat.Beat) error { | |||
} | |||
defer stateStore.Close() | |||
|
|||
err = filestream.ValidateInputIDs(config.Inputs, logp.NewLogger("filestream")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The filestream
input actually uses input.filestream
as the logger name, so I suggest keeping the pattern.
err = filestream.ValidateInputIDs(config.Inputs, logp.NewLogger("filestream")) | |
err = filestream.ValidateInputIDs(config.Inputs, logp.NewLogger("input.filestream")) |
filebeat.WaitForLogs( | ||
"filestream inputs validation error", | ||
10*time.Second, | ||
"Filebeat did log a validation error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should it be "Filebeat did not log a validation error"?
"Filebeat did log a validation error") | |
"Filebeat did not log a validation error") |
filebeat.WaitForLogs( | ||
"Input 'filestream' starting", | ||
10*time.Second, | ||
"Filebeat did log a validation error") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe something else, like "filebeat did not start the Filestream input"
"Filebeat did log a validation error") | |
"Filebeat did not start the Filestream input") |
Honestly, as we're already doing a breaking change to enforce ID validation, I rather not allow empty IDs, it is a corner case and Elastic-Agent/Fleet already adds IDs to all inputs, so I don't see a real need for it. It only makes for an extra corner case to maintain. |
@AndersonQ, @cmacknz, Should we also cover the case of dynamic inputs like:
I like the way it's implemented here that gives this quick and precise feedback to the use, however it does not cover all cases. I believe we should, at least, also cover the |
Agreed, but also what was approved in the breaking change committee was indeed to enforce the uniqueness of IDs. Also considering the agent already allows a single empty ID, I agree with Craig it's fine to have an empty ID as long as it's just one.
Those are different things. Here we're tacking the start up and early feedback. In order to completely prevent an input with duplicated ID, we'd need to change it to return an error instead of just logging it. Which happens in a completely different part of the code. Having it on another issue and PR if we really want to do that is a better option. |
…ids' into 40540-filestream-require-unique-ids
Since 8.6.0 Elastic Agent has required that all inputs have a unique ID, and it considers an empty ID a unique ID. The constraint is that two inputs can't have the same ID, not that the ID has to be non-empty. func hasDuplicate(outputsMap map[string]outputI, id string) bool {
for _, o := range outputsMap {
for _, i := range o.inputs {
for _, j := range i {
if j.id == id {
return true
}
}
}
}
return false
} If we were designing this from scratch we'd probably forbid empty IDs, but allowing empty IDs solves the underlying functional problem without unnecessarily breaking people who started from our default configurations that didn't specify IDs a while ago. We could log a warning that IDs should be explicit if you want.
+1 to handling these as well, autodiscovery with the ID composed from variable substitutions with non-unique values is a common way people were ending up with duplicated logs on K8s. The original issue doesn't put a constraint on the execution environment :) This doesn't need to be fixed in this PR though, it can be done as a separate PR, but I don't think we can close #40540 until it's done, otherwise we have implemented "sometimes require unique IDs" and not "always require unique IDs". |
filebeat/input/filestream/config.go
Outdated
// containing the offending input configurations and returns an error containing | ||
// the duplicated IDs. | ||
func ValidateInputIDs(inputs []*conf.C, logger *logp.Logger) error { | ||
ids := make(map[string][]*conf.C) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[Suggestion]
Rename this map to something that better describe its contents. It is a map of inputs (or configs) grouped by ID, so ids
does not seem to be the better description to it.
Reviewed again, LGTM. |
@cmacknz, regarding
Whereas I agree this PR only enforces unique IDs at startup, it's exactly what #40540. Perhaps it'd be better to create another issue, or change the whole issue to be indeed about fielstream requiring unique IDs and add 2 sub issues, one with the current content of #40540 and another for enforce uniqueness at runtime. Also I believe when requiring uniqueness at runtime we should not crash filebeat, I believe it's better for the users to log an error and not start the offending input. Thinking about a standalone filebeat, which might have different modules running at the same time, I don't think it'd be good for the users to have filebeat crashing because a possible data duplication. Better to have just the offending issue not running. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 suggestions for code readability, the new changes look good to me.
…ids' into 40540-filestream-require-unique-ids
My guide for how this should be have is to test what Elastic Agent does and then follow its behavior because this problem is completely solved there. When dynamic variable substitution is involved, we don't need to fail at runtime because templated inputs with the same ID end up only creating a single input in agent. I will open an issue for you to either confirm Filebeat also does this or implement that behavior based on what you find Filebeat does in that situation. |
During startup filebeat now validates the filestream inputs and fails to start if there are inputs without ID or with duplicated IDs. Duplicated IDs might cause data duplication therefore now it's mandatory to have unique and non-empty ID for each filestream input.
Proposed commit message
Checklist
[ ] I have made corresponding change to the default configuration filesCHANGELOG.next.asciidoc
orCHANGELOG-developer.next.asciidoc
.Disruptive User Impact
Impact: Users who have not configured unique IDs for their filestream inputs will need to update their configurations to include unique IDs for each input. Failure to do so will prevent Filebeat from starting.
Previously: Filebeat would only log an error if filestream would find inputs had missing or duplicated IDs, potentially leading to data duplication.
Now: Filebeat will fail to start if any filestream input lacks an ID or has a duplicated ID.
How to test this PR locally
filebeat.yml
. Change as necessary to simulate" only unique IDs, valid empty ID, duplicated IDs, duplicated empty ID.verify filebeat:
Related issues
Logs