[Bug]: The batchSyncEnabled flag is not reverted back to false when batch processing ends #6445

Closed
1 task done
dareste opened this issue Sep 17, 2024 · 1 comment · Fixed by #6446
Labels
bug An issue reporting a potential bug refined Issues that are ready to be prioritized

dareste commented Sep 17, 2024

Version

3.6.2

What Kubernetes platforms are you running on?

GKE Google Cloud

What happened?

One of the goals in PR #4371 was to avoid unnecessary nginx reloads in batch processing when the tasks in the queue are of kind endpointslice and are not referenced in any of the watched resources.

This is done via the batchSyncEnabled flag in the controller.sync(task) function. The flag controls whether an nginx reload will be done at the end of a batch processing event.

The flag starts as false and is set to true in two scenarios: the synced task is not an endpointslice, or the synced task is an endpointslice that is referenced by one of the watched resources. Unfortunately, the flag is never reset to false once it has been set to true, so every batch processed from that point on triggers an nginx reload.

This results in undesired nginx reloads in environments with the following conditions:

  • There has been an event that set batchSyncEnabled to true.
  • There is a high number of non-interesting (i.e. not referenced) endpointslice resources.
  • These resources change frequently, to the point where it is not unusual to enter batch processing mode.

In these environments, one can observe one or more of these behaviors:

  • Occasional reloads not associated with any relevant change.
  • Reloads in a loop every N seconds.
  • Reloads in loop for a period of time after a NIC pod restarts.

Steps to reproduce

  1. Create a k8s stack with a high number of endpointslice objects that also change frequently (*)
  2. Monitor the NIC pod and verify that the rate of changes is high enough for the system to enter batch processing mode from time to time.
  3. Change an existing config, or create a new one (try making several changes at once, as this increases the likelihood of observing the issue).

(*) Operators that implement endpointslicemirroring-controller.k8s.io, like https://github.com/zalando/postgres-operator, will help with that.

Expected behaviour

No response

Kubectl Describe output

No response

Log output

No response

Contributing Guidelines

  • I confirm that I have read the Report a Bug section of the Contributing Guidelines
@dareste dareste added bug An issue reporting a potential bug needs triage An issue that needs to be triaged labels Sep 17, 2024

Hi @dareste thanks for reporting!

Be sure to check out the docs and the Contributing Guidelines while you wait for a human to take a look at this 🙂

Cheers!

@shaun-nx shaun-nx added this to the v3.7.0 milestone Sep 18, 2024
@shaun-nx shaun-nx added refined Issues that are ready to be prioritized and removed needs triage An issue that needs to be triaged labels Sep 19, 2024
@shaun-nx shaun-nx linked a pull request Sep 23, 2024 that will close this issue