Client checkin spinlocks on RPC error #66

Closed
faec opened this issue Apr 5, 2023 · 1 comment · Fixed by #70
Labels: 8.9-candidate; bug (Something isn't working); Team:Elastic-Agent (Label for the Agent team)

Comments


faec commented Apr 5, 2023

In client/client.go, (*client).startCheckin runs a loop that calls through to (*client).checkinRoundTrip. In normal operation the rate of these round trips is gated by the pace of the Agent's checkin responses. However, if an error condition prevents the client from connecting to the Agent, each round trip fails immediately and the loop retries with no delay, spinning on the error. In live deployments this manifests as, for example, a gRPC PermissionDenied error occurring 5-10K times per second, which both worsens the effects of the error and floods the logs so that any clues about its original cause are lost.

This error loop should have some minimum retry interval to prevent this symptom.
