Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[cmd/opampsupervisor]: Supervisor waits for configurable healthchecks to report remote config status #33805

Closed
wants to merge 6 commits into from

Conversation

srikanthccv
Copy link
Member

Description:

This pull request addresses the remote config status reporting issue discussed in #21079. Currently, there's no way to determine if the agent terminated due to an incorrect configuration. While we can use the dry-run option, it only performs basic validation. This update introduces the following options to the Agent section of the supervisor config:

  1. successful_health_checks: The number of consecutive successful checks needed before reporting an APPLIED status.
  2. config_apply_timeout: If neither success nor failure is reported within this time frame, a FAILED status is reported.

Link to tracking Issue: #21079

Testing: Added e2e test

Documentation:

@srikanthccv srikanthccv marked this pull request as ready for review July 2, 2024 09:52
@srikanthccv srikanthccv requested a review from a team July 2, 2024 09:52
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jul 17, 2024
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Jul 31, 2024
@BinaryFissionGames
Copy link
Contributor

@srikanthccv I don't think I saw this was open 😞 - Could you re-open this/make a new PR? I'll give it a review.

@Asarew
Copy link

Asarew commented Aug 21, 2024

I don't think that while the supervisor is applying the new config it should report Healthy = false. We should report RemoteConfigStatus.Status = APPLYING together with Healthy = true as it's following a desired flow and nothing unusual is happening.

@srikanthccv
Copy link
Member Author

@BinaryFissionGames, I have been meaning to submit a new PR but have had a tight work schedule for quite some time. I promise to get back to this soon.

@tigrannajaryan
Copy link
Member

I don't think that while the supervisor is applying the new config it should report Healthy = false. We should report RemoteConfigStatus.Status = APPLYING together with Healthy = true as it's following a desired flow and nothing unusual is happening.

I would suggest a slight modification from this logic. While applying the new config the Supervisor should continue reporting the value of Healthy as it was known before we started applying the new config. If the agent was previously healthy then keep reporting it is healthy, if it was unhealthy keep reporting it is unhealthy, until we get a new health status.

I agree we should report RemoteConfigStatus.Status = APPLYING.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants