Improve configuration change detection #2656
Conversation
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: aledbf. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
Branch force-pushed from ff26ed1 to 5ec16e3
Codecov Report
@@ Coverage Diff @@
## master #2656 +/- ##
==========================================
+ Coverage 40.82% 40.93% +0.11%
==========================================
Files 72 72
Lines 5078 5088 +10
==========================================
+ Hits 2073 2083 +10
+ Misses 2723 2721 -2
- Partials 282 284 +2
Continue to review full report at Codecov.
What about the forceReload we currently have? Is it not supposed to cover this case?
internal/ingress/status/status.go
Outdated
@@ -185,7 +185,7 @@ func NewStatusSyncer(config Config) Sync {
go st.syncQueue.Run(time.Second, stop)
wait.PollUntil(updateInterval, func() (bool, error) {
// send a dummy object to the queue to force a sync
st.syncQueue.Enqueue("sync status")
Why do we have to reload/regenerate the NGINX config when the controller leader changes?
This is unnecessary now. Removed.
internal/ingress/status/status.go
Outdated
@@ -185,7 +185,7 @@ func NewStatusSyncer(config Config) Sync {
go st.syncQueue.Run(time.Second, stop)
wait.PollUntil(updateInterval, func() (bool, error) {
// send a dummy object to the queue to force a sync
st.syncQueue.Enqueue("sync status")
What's currently happening here? A runtime error because of an incorrect type?
I see that it is an interface, so eventually it will get passed to cache.DeletionHandlingMetaNamespaceKeyFunc to generate a key, and https://github.com/kubernetes/client-go/blob/8aceb98010c1c18b6b54a35b52fd5b46905e3d7f/tools/cache/store.go#L77 will make sure the key is "sync status".
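For the curious, here is a minimal illustration of that client-go behaviour, assuming the dummy key reaches the key function wrapped as a cache.ExplicitKey (the short-circuit at the linked store.go line is what returns the string verbatim):

```go
package main

import (
	"fmt"

	"k8s.io/client-go/tools/cache"
)

func main() {
	// MetaNamespaceKeyFunc (and its deletion-handling wrapper) returns an
	// ExplicitKey verbatim instead of deriving a namespace/name key from
	// object metadata.
	key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(cache.ExplicitKey("sync status"))
	fmt.Println(key, err) // "sync status" <nil>
}
```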
internal/ingress/controller/nginx.go
Outdated
@@ -153,7 +152,7 @@ Error loading new template: %v
n.t = template
glog.Info("New NGINX configuration template loaded.")
n.SetForceReload(true)
n.syncQueue.Enqueue(task.GetDummyObject("template-change"), false)
For better readability, should we create helper functions for these two operations? I.e. something like:
enqueueTask(obj)
enqueueSkippableTask(obj)
Good point. Done.
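For reference, a minimal sketch of what those helpers might look like, assuming the queue keeps an internal enqueue(obj, skippable) as introduced in this PR (the merged code may differ in details):

```go
// EnqueueTask enqueues obj and guarantees the worker will process it.
func (t *Queue) EnqueueTask(obj interface{}) {
	t.enqueue(obj, false)
}

// EnqueueSkippableTask enqueues obj but allows the worker to drop it when a
// newer sync has already covered it.
func (t *Queue) EnqueueSkippableTask(obj interface{}) {
	t.enqueue(obj, true)
}
```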
I like the changes in the PR 👍
But I could not figure out which edge case the existing forceReload functionality does not cover but this does. Can you elaborate?
The force reload approach works only if we can enqueue objects. Without the new parameter in enqueue, a change in the configmap could be skipped. Also, to reach the force reload validation the model must be different, which is not true if there is no change in the Ingress rules; that's the reason for the new checksum field in the configuration type.
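A hypothetical illustration of the checksum idea (field and method names are illustrative, not necessarily the PR's exact ones): two models with identical Ingress-derived rules still compare unequal when the configmap contents changed.

```go
package main

import "fmt"

type Configuration struct {
	Servers               []string // stands in for the Ingress-derived rules
	ConfigurationChecksum string   // hash of the controller configmap contents
}

func (c *Configuration) Equal(other *Configuration) bool {
	// A configmap-only change now makes the models differ...
	if c.ConfigurationChecksum != other.ConfigurationChecksum {
		return false
	}
	// ...in addition to the usual rule comparison (heavily simplified here).
	return fmt.Sprint(c.Servers) == fmt.Sprint(other.Servers)
}

func main() {
	running := &Configuration{Servers: []string{"example.com"}, ConfigurationChecksum: "abc"}
	updated := &Configuration{Servers: []string{"example.com"}, ConfigurationChecksum: "def"}
	fmt.Println(running.Equal(updated)) // false, so a reload is triggered
}
```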
Branch force-pushed from cec6143 to da6ca01
In case of configmap, how can this happen? In the code I see:
Could you elaborate? 🤔 From what I read, force reload forces the sync method to always execute.
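For context, the gate being referenced looked roughly like this in the controller's sync path at the time (paraphrased, not an exact excerpt):

```go
// Inside syncIngress (simplified): skip the reload when the model is
// unchanged and no force reload is pending. Note this check only runs for
// events that actually make it into the sync queue, which is the crux of
// the discussion above.
if !n.isForceReload() && n.runningConfig.Equal(&pcfg) {
	glog.V(3).Infof("No configuration change detected, skipping backend reload.")
	return nil
}
```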
internal/task/queue.go
Outdated
Timestamp: ts,
Key: key,
Timestamp: ts,
IsSkippable: skippable,
Can we piggyback on the Timestamp field to implement the "skippable" logic? For example, if we want to enqueue an event that's not skippable, we can set its Timestamp to the current time + an hour. This way t.lastSync > item.Timestamp will not hold, and the worker won't skip the event.
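A minimal sketch of that suggestion, reusing the queue's fn, queue, and Element internals (names borrowed from the surrounding diff and assumed here, not necessarily exact):

```go
func (t *Queue) enqueue(obj interface{}, skippable bool) {
	ts := time.Now().UnixNano()
	if !skippable {
		// Push the timestamp into the future so that t.lastSync > item.Timestamp
		// can never hold; the worker will therefore never skip this item.
		ts = time.Now().Add(24 * time.Hour).UnixNano()
	}
	key, err := t.fn(obj)
	if err != nil {
		return // key extraction failed; nothing to enqueue
	}
	t.queue.Add(Element{Key: key, Timestamp: ts})
}
```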
done
@@ -311,10 +308,11 @@ func (n *NGINXController) Start() {
if evt, ok := event.(store.Event); ok {
glog.V(3).Infof("Event %v received - object %v", evt.Type, evt.Obj)
if evt.Type == store.ConfigurationEvent {
What's special with this event type? Why can we not simply enqueue a skippable task?
Because the event could be an endpoint, service, ingress, secret or configmap, but only a change in a configmap should escape the enqueue-skippable logic.
but only a change in a configmap should escape the enqueue-skippable logic
This is the part I don't understand completely. My current understanding of why we have to force reload in this case is that the model we currently use in the syncIngress function does not include this configmap data, and therefore when the change is only about the configmap it does not regenerate the NGINX configuration.
But in this PR you are adding a new field to that model (ConfigurationChecksum), which to my understanding means we don't need this special case anymore when the event type is ConfigurationEvent.
To reach the syncIngress function an item must be processed by the syncQueue. In this queue we use a time window to handle changes in batches and avoid multiple reloads. This means that in some scenarios, like simultaneous updates, we could skip a reload and therefore never reach the isForceReload check.
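A self-contained sketch of the batching behaviour described above (illustrative, not the repository's code): the worker drops items whose timestamp predates the last completed sync, so a burst of updates inside the window collapses into a single reload; a skippable configmap event could therefore be dropped before the isForceReload check is ever reached.

```go
package main

import "fmt"

// element mirrors the queue item type discussed above (simplified).
type element struct {
	key       string
	timestamp int64
}

// shouldProcess drops items that are older than the last completed sync.
func shouldProcess(lastSync int64, item element) bool {
	if lastSync > item.timestamp {
		fmt.Printf("skipping %s, already synced\n", item.key)
		return false
	}
	return true
}

func main() {
	lastSync := int64(100)
	fmt.Println(shouldProcess(lastSync, element{"configmap-change", 90})) // false: skipped
	fmt.Println(shouldProcess(lastSync, element{"ingress-update", 110})) // true: processed
}
```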
@ElvinEfendi you can test this easily following this example: #2567 (comment)
@aledbf as you can see in that example, the controller logs
which comes from
What are you using for the test, this PR or 0.15.0?
@aledbf I have not run the test myself; that's just the logs from @antoineco's test in the comment. Does it matter, though?
Can this be a race issue? I see that https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/store/store.go#L178 is not synchronized, and it is being written from one goroutine (https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/store/store.go#L700) and read from another goroutine (https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/nginx.go#L427). And you confirmed that this PR fixes the issue?
No, but I will add a mutex to prevent this from happening.
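A hedged sketch of that fix: guard the store's configmap-backed settings with a read-write mutex so the event goroutine and the sync goroutine don't race on it. Field and method names here are illustrative, not necessarily what was merged.

```go
package store

import (
	"sync"

	ngx_config "k8s.io/ingress-nginx/internal/ingress/controller/config"
)

type k8sStore struct {
	backendConfigMu sync.RWMutex
	backendConfig   ngx_config.Configuration
}

// setConfig is called from the configmap event goroutine.
func (s *k8sStore) setConfig(cfg ngx_config.Configuration) {
	s.backendConfigMu.Lock()
	defer s.backendConfigMu.Unlock()
	s.backendConfig = cfg
}

// GetBackendConfiguration is called from the sync goroutine.
func (s *k8sStore) GetBackendConfiguration() ngx_config.Configuration {
	s.backendConfigMu.RLock()
	defer s.backendConfigMu.RUnlock()
	return s.backendConfig
}
```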
If you check the generated nginx.conf you will see that the reload didn't change the configuration file.
I tried to test this branch but I'm getting
Also, this issue #2567 seems to be specifically about |
Branch force-pushed from 7bc5b3a to 0ec99f8
internal/ingress/controller/nginx.go
Outdated
@@ -311,10 +308,11 @@ func (n *NGINXController) Start() {
if evt, ok := event.(store.Event); ok {
glog.V(3).Infof("Event %v received - object %v", evt.Type, evt.Obj)
if evt.Type == store.ConfigurationEvent {
n.SetForceReload(true)
n.syncQueue.EnqueueTask(task.GetDummyObject("configmap-change"))
Can you explain why this special case is necessary after you added checksum to the model? Please also refer to https://github.com/kubernetes/ingress-nginx/pull/2656/files#r196736703.
Can you explain why this special case is necessary after you added checksum to the model?
If we don't add this we could skip one update (by not adding an element to the queue).
key := k8s.MetaNamespaceKey(ingKey)
ing, err := store.GetIngress(key)
if err != nil {
glog.Errorf("could not find Ingress %v in local store", key)
Is "not found" the only error GetIngress
returns? Would it be useful to include err
message itself in the log as well?
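For reference, including the underlying error could look like this (illustrative):

```go
glog.Errorf("could not find Ingress %v in local store: %v", key, err)
```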
@@ -479,6 +479,18 @@ func New(checkOCSP bool,
if key == configmap {
store.setConfig(cm)
}

ings := store.listers.IngressAnnotation.List()
While this is going to fix the issue, it seems hacky to me. I think the root of the issue is the way IP whitelisting is implemented at
defBackend := a.r.GetDefaultBackend()
This is adding an extra cost that we can avoid by fixing the root cause of the issue: reading https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/annotations/ipwhitelist/main.go#L80, what it does is: if the whitelist-source-range annotation is not set, then we use the whitelist-source-range value from the configmap; otherwise we use the one from the annotation. IMO that logic should not be part of the annotation parsing - annotation parsing should not be concerned with the configmap. I suggest we move that logic into the template; in other words, in the template we first check location.Whitelist.CIDR, and if it is not empty we use that to generate the geo ... config, otherwise we use $cfg.WhitelistSourceRange to do so.
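A simplified sketch of the fallback being discussed (illustrative names, not the repository's actual code): when the Ingress has no annotation, the parser falls back to the configmap's whitelist-source-range, which is why annotation objects must be re-parsed after a configmap change.

```go
package main

import (
	"fmt"
	"strings"
)

// parseWhitelist models the annotation-over-configmap precedence.
func parseWhitelist(annotation string, configmapDefault []string) []string {
	if annotation == "" {
		// No annotation set: use the configmap value. This coupling between
		// annotation parsing and configmap state is what the reviewer objects to.
		return configmapDefault
	}
	return strings.Split(annotation, ",")
}

func main() {
	fmt.Println(parseWhitelist("", []string{"10.0.0.0/8"}))               // configmap value wins
	fmt.Println(parseWhitelist("192.168.0.0/16", []string{"10.0.0.0/8"})) // annotation wins
}
```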
I share that opinion; the improvement to the config detection is independent from the issue related to whitelist-source-range, and this should be made clearer via a separate PR.
@antoineco my comment was mainly about the fact that this fix is a hack: it is not fixing the root of the issue and it unnecessarily introduces extra work on configmap updates.
While this is going to fix the issue, it seems hacky to me. I think the root of the issue is the way IP whitelisting is implemented at
This approach is used in the entire annotation parsing step. Changing that in this PR is not going to happen.
IMO that logic should not be part of the annotation parsing - annotation parsing should not be concerned about configmap
You are right here: this should not be part of the annotation parsing, but right now we have no alternative. This is the most important reason why I want to move to CRDs and stop using annotations: not only is it complex, but it doesn't scale, and it's impossible to add the semantics we need to the configuration options (like a list of IP addresses instead of a comma-separated string that can contain anything).
I suggest we move that logic into the template
No, that would make it impossible to understand; even now the templating step contains too much logic.
Expect(checksum).NotTo(BeEquivalentTo(newChecksum))
})
})
loving this e2e test 😍
What this PR does / why we need it:
Introduce additional information (a checksum) in the configuration to allow determining whether there was a change or not (see the sketch below). Until now, a change in the configuration configmap only triggered a reload after a change in some Ingress.
Which issue this PR fixes:
fixes #2567
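One plausible way to derive such a checksum (illustrative only; the PR's actual mechanism may differ) is to hash the configmap data in deterministic key order and store the result on the configuration model:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// configChecksum hashes configmap data in sorted key order, so equal
// contents always produce the same checksum regardless of map iteration.
func configChecksum(data map[string]string) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		fmt.Fprintf(h, "%s=%s;", k, data[k])
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	fmt.Println(configChecksum(map[string]string{"whitelist-source-range": "10.0.0.0/8"}))
}
```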