Operator is blocked while waiting for resources to stabilize #678

Closed
jpkrohling opened this issue Sep 27, 2019 · 2 comments
Labels
enhancement New feature or request hacktoberfest

Comments

@jpkrohling
Contributor

I'm not quite sure this is a bug, but it's certainly something we need to document and keep in mind. Right now, the operator has a lock, so only one reconciliation loop runs at a time. Each loop handles exactly one CR, which means that reconciling the CR for instance-a blocks CR updates for instance-b. It also blocks further updates to instance-a, which is probably why the lock exists in the first place.

Typically, this isn't a big problem, as CR updates are quick. Most of the time, the lock is held only until the deployment and its dependencies are handled, which happens almost instantly once the images are pulled. However, when there's a configuration problem (see #670 for an example), the lock is held until the reconciliation times out after 5 minutes.

The lock code comes from the operator-sdk, and we might be able to change it to a per-instance lock, so that instance-b won't wait for instance-a to be applied. Updates to instance-a, however, would still be blocked until the previous loop for instance-a has finished or timed out.

Right now, this is only a problem under error conditions, so it might not be that critical, but we certainly want to experiment with locking only the resources we need, and perhaps with signaling the running loop to cancel a reconciliation when a newer one is waiting in the queue.

@jpkrohling jpkrohling added enhancement New feature or request hacktoberfest labels Sep 27, 2019
@objectiser
Contributor

> we might be able to change it to a per-instance lock

I assume this means requesting an enhancement in the SDK? Or do you think it already supports this with the appropriate configuration?

@jpkrohling
Contributor Author

Good question. I don't remember seeing this in the scaffold for the OpenTelemetry operator, but I do remember reading about different leader election options.

@jpkrohling jpkrohling added needs-triage New issues, in need of classification and removed needs-triage New issues, in need of classification labels Dec 16, 2019

2 participants