[cmd/opampsupervisor] Report bad remote config to OpAMP server #21079
Comments
I can pick this up.
… to report remote config status (#34907)
**Description:** This pull request addresses the remote config status reporting issue discussed in #21079 by introducing the following options to the Agent config:
1. `config_apply_timeout`: config update is successful if we receive a healthy status and then observe no failure updates for the entire duration of the timeout period; otherwise, failure is reported.
**Link to tracking Issue:** #21079
**Testing:** Added e2e test
Component(s)
No response
Is your feature request related to a problem? Please describe.
Currently, the Supervisor reports that a remote configuration has been applied as soon as it has composed the effective configuration for the Collector. It does not check whether the Collector actually runs with that configuration. We should instead wait until the Collector either starts successfully or fails to start, and only then report the remote configuration as applied or failed, respectively.
Describe the solution you'd like
Wait to receive a healthcheck or crash from the Collector before reporting the final remote configuration status to the server. The Supervisor can still report that it is applying the remote configuration as it composes the effective configuration and signals the Collector to load the new configuration.
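The flow described above could be sketched roughly as follows. This is only an illustration, not the Supervisor's code: the `Status` values, the `Event` type, and `applyRemoteConfig` are hypothetical names made up for this sketch, and the event channel stands in for whatever mechanism the Supervisor uses to observe healthchecks and crashes.

```go
package main

import "fmt"

// Status names the three remote-config states discussed in this issue.
// These constants are local to the sketch (assumption).
type Status string

const (
	Applying Status = "APPLYING"
	Applied  Status = "APPLIED"
	Failed   Status = "FAILED"
)

// Event is a hypothetical signal observed from the Collector process.
type Event struct {
	Healthy bool   // true for a passing healthcheck
	Crashed bool   // true if the Collector exited
	Detail  string // optional error message
}

// applyRemoteConfig reports APPLYING right away, then blocks until the first
// passing healthcheck or crash decides the final status. An unhealthy report
// from a Collector that is still running does not decide anything yet.
func applyRemoteConfig(events <-chan Event, report func(Status, string)) Status {
	report(Applying, "")
	for ev := range events {
		if ev.Crashed {
			report(Failed, ev.Detail)
			return Failed
		}
		if ev.Healthy {
			report(Applied, "")
			return Applied
		}
	}
	// The channel closed without a verdict; treat that as a failure.
	report(Failed, "no healthcheck passed before the event stream closed")
	return Failed
}

func main() {
	events := make(chan Event, 2)
	events <- Event{Healthy: false}
	events <- Event{Crashed: true, Detail: "invalid receiver config"}
	close(events)
	applyRemoteConfig(events, func(s Status, msg string) { fmt.Println(s, msg) })
}
```

Keeping the intermediate status separate from the final one lets the server see that the configuration was received even if the Collector takes a while to come up.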
It should be clarified whether changing telemetry/other connection settings should also trigger the Supervisor to report that a remote config has been applied.
I would consider a configuration successfully applied once all Collector pipelines have started, even if the configuration later turns out to have bugs while handling telemetry records. We could also require a configurable waiting period or a configurable number of healthchecks before declaring the configuration applied.
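One way to combine a waiting period with a required number of healthchecks is sketched below. Everything here is an assumption for illustration: `Observation`, `configApplied`, the `at` helper, and `demoWindow` are hypothetical names, the window is assumed to start when the new configuration is handed to the Collector, and unhealthy reports before the first healthy one are assumed to be tolerated as startup noise.

```go
package main

import (
	"fmt"
	"time"
)

// Observation is a hypothetical health report, timestamped relative to the
// moment the new configuration was handed to the Collector.
type Observation struct {
	At      time.Duration
	Healthy bool
}

// demoWindow is an arbitrary waiting period used by the examples.
const demoWindow = 5 * time.Second

// at is a small helper for building observations in whole seconds.
func at(seconds int, healthy bool) Observation {
	return Observation{At: time.Duration(seconds) * time.Second, Healthy: healthy}
}

// configApplied decides whether a configuration counts as applied.
// obs must be sorted by At. Reports after the window are ignored. The result
// is true only if at least minHealthy healthy reports arrived within the
// window and no unhealthy report followed the first healthy one.
func configApplied(obs []Observation, window time.Duration, minHealthy int) bool {
	healthy := 0
	for _, o := range obs {
		if o.At > window {
			break
		}
		if o.Healthy {
			healthy++
		} else if healthy > 0 {
			// A failure after the Collector was already healthy.
			return false
		}
	}
	return healthy >= minHealthy
}

func main() {
	stable := []Observation{at(1, false), at(2, true), at(4, true)}
	flapping := []Observation{at(1, true), at(3, false)}
	fmt.Println(configApplied(stable, demoWindow, 2))
	fmt.Println(configApplied(flapping, demoWindow, 1))
}
```

Setting `minHealthy` to 1 reduces this to a pure waiting period, while a larger value also requires repeated healthy reports inside the window.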
Describe alternatives you've considered
No response
Additional context
No response