-
Notifications
You must be signed in to change notification settings - Fork 96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check resource generation when processing updates of some resources to skip config regeneration #1422
Conversation
Hey @kevin85421, saw your comment here and just wanted to give you a heads up that the majority of the engineers on the team are currently on holiday until January 2nd, @sjberman may be able to take a look tomorrow, but if not, this may just sit for a little while. Hope that's alright! |
@bjee19 Thanks for letting me know! Happy holidays! |
I'd just use one PR for all of these changes, and ensure the title of the PR defines what the high level fix is. |
At first glance I think the approach looks good. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevin85421 thanks for working on this!
I left a few comments, but I think you are on the right track 👍
d14b413
to
5206313
Compare
Hi @sjberman @kate-osborn, I have already addressed the comments. In addition, there are no Kubernetes resources that miss the generation check, so there is no commit for "Requirement 2". See the PR description for more details. Could you provide a pointer on where to add tests, and are there any existing tests that I can refer to? Thanks! |
@kevin85421 If you could update the initial commit message to include the problem/solution structure, this will make it easier on our team when we merge it in. This way you can supply the commit message and we don't need to write one for you. |
5206313
to
5a880c7
Compare
Thanks! I have updated the commit message. I originally planned to update it after adding enough tests 😅. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kevin85421 thanks for the details in the PR description.
The changes look good to me, I just left one small nitpicky comment.
As far as testing goes, we don't have any unit tests for the registerControllers
function because they don't add a lot of value. We would just be checking that we adding the correct options for each objectType.
In order to test this PR, I think we need to do the following:
- Run the conformance tests. I will kick off the pipeline which will run the conformance tests, so you don't need to run them locally.
- Manually verify that we don't reconcile events for GatewayClasses, Gateways, HTTPRoutes, and RefGrants when the generation doesn't change. This can be verified through logs ( You'll want to look for the log that starts with "Reconciling the resource" ) I would also double check that the GatewayClass
And
predicate works as expected.
@pleshakov does this seem like a good approach or do you think it's worth it to add a unit test for the registerControllers
function?
that makes sense to me If the controllers still reconcile when there is no gen change (because of the bug), will it trigger rebuild of the graph and reload of NGINX configuration? |
Thanks, @kate-osborn and @pleshakov! I have just written down how I test this PR manually. Please let me know if this is not sufficient. Thanks!
|
Hi @kevin85421
I wonder why we wait for 40 mins. Can we wait for less time? I think we can avoid waiting at all if we try to change GatewayClass/Gateway/HTTPRoute/ReferenceGrant in the API without changing their generation. We can do that by adding or updating an annotation on the resource - such operation doesn't change resource generation. For example, if I add an annotation to an HTTPRoute resource:
However, note that Currently, I see the following in the logs:
Which should not happen as a result of your PR. We can also look at Prometheus metrics (https://docs.nginx.com/nginx-gateway-fabric/how-to/monitoring/monitoring/#controller-runtime-metrics ). Their include controller-runtime metrics. So that we can check resource reconciliation count. For example:
I expect the following to not change if our controllers successfully filter out generation changes.:
|
@pleshakov Yes, it would trigger a rebuild and reload. I think an automated test is a good idea. I added an issue: #1463 |
@pleshakov Thank you for the reply!
No. I just want to ensure the NGF controller doesn't requeue k8s resources unconditionally. I have already updated #1422 (comment). I added annotations to all related Kubernetes resources and checked the Prometheus metrics. |
Gentle ping - Do the tests in #1422 (comment) make sense to you? cc @kate-osborn @pleshakov Thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🚀
Sorry for the delay, @kevin85421. The test results look good! Just approved ✅ Do you mind rebasing? |
Problem: When processing updates to cluster resources, for some resources, we check their generation, so that we don't trigger state change (graph rebuild) if the generation didn't change. This is a performance optimization so that we don't rebuild the graph and as a result do not regenerate NGINX config and reload it. This is not a K8s-native solution, and may introduce complexity for the codebase. Solution: Use `GenerationChangedPredicate` in controller-runtime to filter out the resource events at the controller level.
…cate Use the `And` controller-runtime function to create a composite predicate.
Rename generationChangedPredicate to alwaysTruePredicate
Fix coding-style issue
de143ef
to
37f6b2e
Compare
Thank you for the prompt reply! I have already rebased the branch with the |
Rename alwaysTruePredicate to alwaysProcess
@kevin85421 thanks!! |
Proposed changes
Because I lack sufficient context for this repository, I decided to approach this PR step-by-step and ensure each step is correct.
Requirement 1: Move generation check from ChangeProcessor to Controllers, for the types we already check generation
Step 1: Move generation check from ChangeProcessor to Controllers, for the types we already check generation 0d20c61
type generationChangedPredicate struct{}
. (Update: Add it back in Step 2. See Step 2 for more details.)change_processor.go
, there are four kinds of Kubernetes resources that usegenerationChangedPredicate
. These include GatewayClass, Gateway, HTTPRoute, and ReferenceGrant.controller.WithK8sPredicate(k8spredicate.GenerationChangedPredicate{})
for these four Kubernetes resources inmanager.go
in this commit.Step 2: Remove tests that do not involve a
generation
change fromchange_processor_test.go
. (9e01812)change_processor.go
should not handle any cases with the same generation, as these should already be filtered out by the controller'sGenerationChangedPredicate
.type generationChangedPredicate struct{}
and replaced thepredicate
for the four Kubernetes resources inchange_processor.go
withnil
. However,predicate: nil
indicates that the state change will not trigger a graph rebuild. See [Bug] Remove unused data from persistedGVKs #1358 for more details. This behavior is incompatible with the old behavior. Hence, I addedtype generationChangedPredicate struct{}
back and made it always returntrue
to keep the behavior compatible. We may need to remove it in the future.Requirement 2: Figure out which types miss generation check and add it.
TL;DR: There is no Kubernetes resources that miss generation check.
"Requirement 1" is entirely equivalent to the existing behavior, and "Requirement 2" introduces new behavior.
GatewayClass
: The existingGatewayClassPredicate
filters out all events that are not using NGF as the controller by checkingSpec.ControllerName
. Addk8spredicate.GenerationChangedPredicate
toGatewayClass
. See Check resource generation when processing updates of some resources to skip config regeneration #1422 (comment) for more details.Gateway
: Addk8spredicate.GenerationChangedPredicate
toGateway
. See Check resource generation when processing updates of some resources to skip config regeneration #1422 (comment) for more details.HTTPRoute
: Addk8spredicate.GenerationChangedPredicate
toHTTPRoute
. The behavior remains equivalent, with or without this PR. See "Requirement 1" for more details.Service
:ServicePortsChangedPredicate
filters out all events that do not have changes inSpec.Ports
. IfServicePortsChangedPredicate
is true,k8spredicate.GenerationChangedPredicate
should always be true. Hence, we don't need to addk8spredicate.GenerationChangedPredicate
.GatewayServicePredicate
filters out all events without changes inSpec.Type
andStatus.LoadBalancer.Ingress
. As I understand it, changes inStatus
do not increment the generation value. Therefore,k8spredicate.GenerationChangedPredicate
might filter out some changes related toStatus.LoadBalancer.Ingress
. Hence, we should not add the predicate for this case.metadata.generation
field. I am not 100% sure, but it's highly likely.Secret
:metadata.generation
field. I am not 100% sure, but it's highly likely.EndpointSlice
: Thek8spredicate.GenerationChangedPredicate
already exists.Namespace
:metadata.generation
field. I am not 100% sure, but it's highly likely.CRD
AnnotationPredicate
filters out all events without changes inmetadata.Annotations
. Changes to annotations on a custom resource will not increment themetadata.generation
. Therefore,k8spredicate.GenerationChangedPredicate
might filter out events that only involve changes inmetadata.Annotations
.Closes #825
Checklist
Before creating a PR, run through this checklist and mark each as complete.