Errors loading inputs and outputs configured by the Elastic Agent should be reported per unit #35874

Closed
cmacknz opened this issue Jun 21, 2023 · 1 comment · Fixed by #36183
Labels
Team:Elastic-Agent Label for the Agent team

Comments

@cmacknz
Member

cmacknz commented Jun 21, 2023

Today, when we encounter an error loading an input or output configuration sent by the Elastic Agent, we cannot associate that error with a specific input or integration. Instead, we report the entire Beat component process as failed:

// any error during reload changes the whole state of the beat to failed
if len(errs) > 0 {
	cm.status = lbmanagement.Failed
	cm.message = fmt.Sprintf("%s", errs)
}

Instead of failing the entire component, errors should always be reported for the Unit representing the input configuration that failed. This can be done using the UpdateState method of the Elastic Agent V2 client.

For outputs this should be straightforward because there is only one output unit today:

// reload the output configuration
if err := cm.reloadOutput(outputUnit); err != nil {
	// Output creation failed, there is no point in going any further
	// because there is no output to read the events.
	//
	// Trying to start inputs will eventually lead them to deadlock
	// waiting for the output. Log input will deadlock when starting,
	// effectively blocking this manager.
	err = fmt.Errorf("could not start output: %w", err)
	outputUnit.UpdateState(client.UnitStateFailed, err.Error(), nil)
	cm.status = lbmanagement.Failed
	cm.message = err.Error()
	// If there are any other errors, set the status accordingly.
	// If len(errs) == 0, there were no previous errors and the only
	// error has been reported already.
	if len(errs) > 0 {
		errs = append(errs, err)
		cm.message = fmt.Sprintf("%s", errs)
	}

For inputs, we will need to propagate the input unit ID into the reloaded Beats object, because all input configurations are consolidated into a single Beat configuration representing all inputs:

if err := obj.Reload(inputBeatCfgs); err != nil {

Definition of Done:

  • All input and output errors are reported using the UpdateState method for the correct configuration Unit over the agent control protocol.
  • Input and output errors are associated with the correct units in the output of elastic-agent status.
  • Input and output errors are associated with the correct inputs and integrations in the Fleet UI (see [Fleet] Implement per-integration health reporting kibana#154634).
cmacknz added the Team:Elastic-Agent label Jun 21, 2023
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)


3 participants