Starting in 1.25 clusters, services of type=LB and xTP=Local sometimes do not update node backends on load balancers #112793
Comments
@swetharepakula: This issue is currently awaiting triage. If a SIG or subproject determines this is a relevant issue, they will accept it by applying the `triage/accepted` label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/sig network
/cc
I have done a test with the code on master and looked through the changes that have been made since 1.25 was cut, and this issue seems to be fixed. I believe this issue is specific to 1.25.
/assign

I filed #112798, which provides an in-depth answer as to why that fixes the problem.
With #112807 merged, is this resolved?
Yes, it should be. I think this was verified by @swetharepakula last Friday?
This is resolved with #112807.

/close
@swetharepakula: Closing this issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Service handling logic in k8s.io/cloud-provider v0.25.(0,1,2) is broken. This issue was fixed upstream, but not released yet. We expect the fix to be included in v0.25.3, but for now we need to replace the released `cloud-provider` library version with the latest version from the `release-1.25` branch. Xref: kubernetes/kubernetes#112793. This commit is expected to be dropped during the next rebase.
@alexanderConstantinescu The revert that you put up in #112807 went directly into the release-1.25 branch. I believe I am seeing the same issue again on release-1.26 and release-1.27. Did a revert or fix go up to the master branch that I've missed?
@JoelSpeed: There was no revert PR needed on master, because master (at that time) didn't have the bug. This is the PR I filed which attempted to fix the issue on 1.25: #112798, but it later got superseded by the revert PR. It explains the issue in greater depth. What issue are you seeing?
Observing this on Azure, with the in-tree cloud provider in 1.27 (this could be a bug directly in the Azure code, to be fair, but I wanted to rule this bug out too). The symptoms look the same as this bug, but I'm guessing now that we are using …
I think I found the issue: kubernetes-sigs/cloud-provider-azure#4230
The revert PR (#112807) only landed in the release-1.25 branch, and #112798 never merged... what resolved this issue in master and 1.26+? Can you link to the later PRs in master that fixed the issue?
This bug was fixed on 1.26 during the feature cycle. Here's a timeline of events:
I.e., we could have applied #112798 in step 3, but chose to revert instead. Let me know if this is not clear.
What happened?
When upgrading nodes from 1.24 to 1.25, on a cluster where the master is already at 1.25, I notice that my Services with `type=LoadBalancer` and `xTP=Local` have an incorrect set of nodes after the nodes have been upgraded. The set contains only the old nodes that no longer exist, resulting in my service being unavailable through my load balancer.

What did you expect to happen?

I would expect the load balancer to be properly updated with the new set of nodes after the upgrade.
How can we reproduce it (as minimally and precisely as possible)?

Create a Service with `type=LoadBalancer` and `xTP=Local` (a sketch of such a Service is shown below), then upgrade the nodes from 1.24 to 1.25.
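For reference, here is a minimal sketch of such a Service built with the `k8s.io/api/core/v1` types. The name, namespace, selector, and port are placeholders invented for the example, not taken from the report.

```go
package main

import (
	"encoding/json"
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

func main() {
	// A Service of type=LoadBalancer with externalTrafficPolicy=Local,
	// the combination affected by this issue.
	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "example-lb", // placeholder
			Namespace: "default",    // placeholder
		},
		Spec: corev1.ServiceSpec{
			Type:                  corev1.ServiceTypeLoadBalancer,
			ExternalTrafficPolicy: corev1.ServiceExternalTrafficPolicyTypeLocal,
			Selector:              map[string]string{"app": "example"}, // placeholder
			Ports: []corev1.ServicePort{
				{Name: "http", Port: 80},
			},
		},
	}

	// Print the manifest so it can be applied to a test cluster.
	out, err := json.MarshalIndent(svc, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```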
Anything else we need to know?
The existing logging is not enough to diagnose the issue. I added some more logs and ran the KCM at log level 5 to find the root cause.
There was a change introduced to reduce the number of syncs for `xTP=Local` services: #109706. With this change, there are situations where the `xTP=Local` service never gets updated. The following is the chain of events.
Step 1: A node change during the upgrade (a node being added or removed) triggers `triggerNodeSync()`:
kubernetes/staging/src/k8s.io/cloud-provider/controllers/service/controller.go, lines 169 to 192 in a866cbe
Step 2: In `triggerNodeSync()`, the nodeLister (line 264) filters for Ready-only nodes, so it does not have the new node (or still contains the deleted node), which means that `c.needFullSync = false` when line 281 is executed.
kubernetes/staging/src/k8s.io/cloud-provider/controllers/service/controller.go, lines 260 to 288 in a866cbe
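The effect described in this step can be modeled with a small self-contained sketch. This is a simplified illustration, not the actual controller code; the types and helper names are invented. The point is that a lister which only returns Ready nodes hides a node that joined in a NotReady state, so the comparison concludes that nothing changed:

```go
package main

import "fmt"

// Node is a stripped-down stand-in for a Kubernetes node; only the
// fields relevant to this illustration are kept.
type Node struct {
	Name  string
	Ready bool
}

// readyNames mimics a lister that only returns Ready nodes.
func readyNames(nodes []Node) map[string]bool {
	out := map[string]bool{}
	for _, n := range nodes {
		if n.Ready {
			out[n.Name] = true
		}
	}
	return out
}

// sameSet reports whether two name sets are identical.
func sameSet(a, b map[string]bool) bool {
	if len(a) != len(b) {
		return false
	}
	for name := range a {
		if !b[name] {
			return false
		}
	}
	return true
}

func main() {
	// Hosts the controller knew about from the last sync.
	knownHosts := map[string]bool{"node-old": true}

	// Cluster state during the upgrade: the replacement node has joined
	// but has not become Ready yet.
	current := []Node{
		{Name: "node-old", Ready: true},
		{Name: "node-new", Ready: false},
	}

	// The readiness filter hides node-new, so the old and new host sets
	// look identical and no full sync is requested.
	needFullSync := !sameSet(knownHosts, readyNames(current))
	fmt.Println("needFullSync:", needFullSync) // prints: needFullSync: false
}
```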
Step 3: Following the chain of functions that are called (across goroutines communicating via nodeSyncCh), we end up at `nodeSyncInternal`. Because `c.needFullSync = false`, we will only sync services that were marked for retry. If the state was previously good, this means `c.servicesToUpdate` has 0 services before entering `updateLoadBalancerHosts`.
kubernetes/staging/src/k8s.io/cloud-provider/controllers/service/controller.go, lines 725 to 741 in a866cbe
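Sketching that branch in isolation (again simplified; the field names only loosely mirror the real controller): with `needFullSync` false and nothing queued for retry, the set of services handed to `updateLoadBalancerHosts` is empty.

```go
package main

import "fmt"

// controller is a toy model; the field names only loosely mirror the
// real service controller.
type controller struct {
	needFullSync     bool
	allServices      []string
	servicesToUpdate []string // services queued for retry after earlier failures
}

// servicesToSync picks which services will be handed to
// updateLoadBalancerHosts on this pass.
func (c *controller) servicesToSync() []string {
	if c.needFullSync {
		return c.allServices
	}
	// No full sync requested: only retry what failed previously. If the
	// previous state was healthy, this is empty.
	return c.servicesToUpdate
}

func main() {
	c := &controller{
		needFullSync:     false,             // as concluded in step 2
		allServices:      []string{"svc-a"}, // the xTP=Local Service
		servicesToUpdate: nil,               // nothing failed before the upgrade
	}
	fmt.Println("services to sync:", c.servicesToSync()) // prints: services to sync: []
}
```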
Step 4: Nodes are queried again from the NodeLister, but this time the new node or the deleted node is reflected.
kubernetes/staging/src/k8s.io/cloud-provider/controllers/service/controller.go, lines 782 to 811 in a866cbe
Step 5: `nodeSyncService` is then parallelized based on the number of services. In this case we have no services, so we do no updates, but on line 808 `c.lastSyncedNodes` is set to the nodes found in step 4.
kubernetes/staging/src/k8s.io/cloud-provider/controllers/service/controller.go, line 808 in a866cbe
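This interaction is the crux of the bug and can be modeled in a few lines (a sketch under the same caveats as above, not the real implementation): even with zero services to process, the observed node list is still recorded, so the node change is effectively consumed without any load balancer ever being updated.

```go
package main

import "fmt"

// controller again models only the pieces needed for the illustration.
type controller struct {
	lastSyncedNodes []string
}

// updateLoadBalancerHosts models the problematic shape of the real
// function: it loops over the given services, but records the node list
// unconditionally, even when the loop body never runs.
func (c *controller) updateLoadBalancerHosts(services, nodes []string) {
	for _, svc := range services {
		// In the real controller this is where the cloud provider would
		// reconcile the load balancer backends for svc.
		fmt.Println("would reconcile LB for", svc, "with nodes", nodes)
	}
	c.lastSyncedNodes = nodes // remembered even though nothing was reconciled
}

func main() {
	c := &controller{lastSyncedNodes: []string{"node-old"}}

	// Step 4: the lister now reflects the node change.
	nodes := []string{"node-old", "node-new"}

	// Step 3 left us with zero services to update.
	c.updateLoadBalancerHosts(nil, nodes)

	// The controller now believes it has already synced the new node set.
	fmt.Println("lastSyncedNodes:", c.lastSyncedNodes)
}
```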
Step 6: A subsequent full sync does not fix things. We go through steps 1 through 5 again, the difference being `c.needFullSync = true`. This means that in steps 4 and 5, `c.servicesToUpdate` will not be empty, resulting in the following: in `nodeSyncService` we filter based on predicates for `xTP=Local` and `xTP=Cluster`. In the case of `xTP=Local`, since we do not pay attention to `Ready` status, all of the nodes in `c.lastSyncedNodes` will be the `oldNodes`; and since no node creations or deletions have occurred, the `newNodes` will be the same. This results in no sync (line 767).
kubernetes/staging/src/k8s.io/cloud-provider/controllers/service/controller.go, lines 759 to 777 in a866cbe
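A sketch of the comparison (with deliberately simplified predicates; the real ones include additional checks) shows why the `xTP=Local` path sees nothing to do, while a readiness-aware predicate eventually detects the change:

```go
package main

import "fmt"

// Node is a minimal stand-in; only name and readiness matter here.
type Node struct {
	Name  string
	Ready bool
}

// Simplified predicates: the Local path keeps every node regardless of
// readiness, while the Cluster path keeps only Ready nodes.
func localPredicate(_ Node) bool   { return true }
func clusterPredicate(n Node) bool { return n.Ready }

func filter(nodes []Node, keep func(Node) bool) map[string]bool {
	out := map[string]bool{}
	for _, n := range nodes {
		if keep(n) {
			out[n.Name] = true
		}
	}
	return out
}

func sameSet(a, b map[string]bool) bool {
	if len(a) != len(b) {
		return false
	}
	for name := range a {
		if !b[name] {
			return false
		}
	}
	return true
}

func main() {
	// lastSyncedNodes was already overwritten in step 5, so it contains
	// the new (still NotReady) node as well.
	lastSynced := []Node{{Name: "node-old", Ready: true}, {Name: "node-new", Ready: false}}
	// Nodes seen during the later full sync, once node-new is Ready.
	current := []Node{{Name: "node-old", Ready: true}, {Name: "node-new", Ready: true}}

	// xTP=Local: readiness is ignored, so the old and new sets are
	// identical and the service is skipped. The LB is never repaired.
	localChanged := !sameSet(filter(lastSynced, localPredicate), filter(current, localPredicate))
	fmt.Println("xTP=Local needs sync:", localChanged) // false

	// xTP=Cluster: readiness is part of the predicate, so node-new is
	// absent from oldNodes but present in newNodes, and the sync proceeds.
	clusterChanged := !sameSet(filter(lastSynced, clusterPredicate), filter(current, clusterPredicate))
	fmt.Println("xTP=Cluster needs sync:", clusterChanged) // true
}
```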
In the `xTP=Cluster` case, the node ready status is used to filter the nodes, so `c.lastSyncedNodes` would already have the newly created node, but it would not be ready. This means the node will not be part of `oldNodes` but will exist in `newNodes`, which allows the sync to continue as expected.

Kubernetes version
Cloud provider
OS version
N/A
Install tools
N/A
Container runtime (CRI) and version (if applicable)
N/A
Related plugins (CNI, CSI, ...) and versions (if applicable)
N/A