Statuses not reported because no leader gets elected #1100

Closed
pleshakov opened this issue Sep 27, 2023 · 2 comments · Fixed by #1130
Labels: bug, refined, size/medium

@pleshakov
Contributor

Describe the bug
If an NGF pod stops being the leader, it cannot become the leader again.
This is a problem when only one NGF pod is running: once that pod stops being the leader, it no longer reports statuses, and because no other pod can take over, no statuses are reported at all until the pod is restarted.

To Reproduce
The problem was observed when the pod lost connectivity to the k8s API server:

kubectl logs -n nginx-gateway <pod-name> -c nginx-gateway | grep leader
I0926 20:58:42.883382       6 leaderelection.go:250] attempting to acquire leader lease nginx-gateway/nginx-gateway-leader-election...
I0926 20:58:43.073317       6 leaderelection.go:260] successfully acquired lease nginx-gateway/nginx-gateway-leader-election
{"level":"info","ts":"2023-09-26T20:58:43Z","logger":"leaderElector","msg":"Started leading"}
E0927 08:09:20.830614       6 leaderelection.go:332] error retrieving resource lock nginx-gateway/nginx-gateway-leader-election: Get "https://10.64.0.1:443/apis/coordination.k8s.io/v1/namespaces/nginx-gateway/leases/nginx-gateway-leader-election?timeout=5s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
E0927 08:09:25.829736       6 leaderelection.go:332] error retrieving resource lock nginx-gateway/nginx-gateway-leader-election: Get "https://10.64.0.1:443/apis/coordination.k8s.io/v1/namespaces/nginx-gateway/leases/nginx-gateway-leader-election?timeout=5s": context deadline exceeded
I0927 08:09:25.830070       6 leaderelection.go:285] failed to renew lease nginx-gateway/nginx-gateway-leader-election: timed out waiting for the condition
{"level":"info","ts":"2023-09-27T08:09:25Z","logger":"leaderElector","msg":"Stopped leading"}
E0927 08:09:35.862628       6 event.go:289] Unable to write event: '&v1.Event{TypeMeta:v1.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:v1.ObjectMeta{Name:"nginx-gateway-leader-election.1788b315c7bd90e5", GenerateName:"", Namespace:"nginx-gateway", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), DeletionTimestamp:<nil>, DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]v1.OwnerReference(nil), Finalizers:[]string(nil), ManagedFields:[]v1.ManagedFieldsEntry(nil)}, InvolvedObject:v1.ObjectReference{Kind:"Lease", Namespace:"nginx-gateway", Name:"nginx-gateway-leader-election", UID:"eb133a0d-7622-4b80-a0d1-d49755e52a1f", APIVersion:"coordination.k8s.io/v1", ResourceVersion:"1044977", FieldPath:""}, Reason:"LeaderElection", Message:"nginx-gateway-b6cdb65cd-bt7zg stopped leading", Source:v1.EventSource{Component:"nginx-gateway-fabric-nginx", Host:""}, FirstTimestamp:time.Date(2023, time.September, 27, 8, 9, 25, 831766245, time.Local), LastTimestamp:time.Date(2023, time.September, 27, 8, 9, 25, 831766245, time.Local), Count:1, Type:"Normal", EventTime:time.Date(1, time.January, 1, 0, 0, 0, 0, time.UTC), Series:(*v1.EventSeries)(nil), Action:"", Related:(*v1.ObjectReference)(nil), ReportingController:"nginx-gateway-fabric-nginx", ReportingInstance:""}': 'Post "https://10.64.0.1:443/api/v1/namespaces/nginx-gateway/events?timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)'(may retry after sleeping)
{"level":"info","ts":"2023-09-27T17:19:27Z","logger":"statusUpdater","msg":"Skipping updating Nginx Gateway status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:13Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:15Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:24Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}
{"level":"info","ts":"2023-09-27T19:54:25Z","logger":"statusUpdater","msg":"Skipping updating Gateway API status because not leader"}

Expected behavior

  • The pod becomes the leader again
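
A minimal sketch of the expected behavior, assuming the elector follows the contract client-go documents for `LeaderElector.Run`: the call blocks while campaigning or leading and returns once the lease is lost or the context is cancelled. Under that assumption the caller has to invoke it again to rejoin the election. The names `elector`, `flakyElector`, `runWithRetry`, and `simulate` are hypothetical and stdlib-only; this is an illustration, not the NGF code and not the change made in #1130.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// elector is a hypothetical stand-in for a leader elector whose Run method
// returns when leadership is lost or ctx is cancelled.
type elector interface {
	Run(ctx context.Context)
}

// runWithRetry re-enters the election every time Run returns, so a pod that
// loses the lease campaigns again instead of staying a non-leader forever.
// It returns how many times Run was invoked.
func runWithRetry(ctx context.Context, e elector, backoff time.Duration) int {
	runs := 0
	for {
		if ctx.Err() != nil {
			return runs
		}
		e.Run(ctx)
		runs++
		// Pause briefly before campaigning again, unless we are shutting down.
		select {
		case <-ctx.Done():
			return runs
		case <-time.After(backoff):
		}
	}
}

// flakyElector simulates losing the lease lossesLeft times. On the next run
// it triggers shutdown via stop so that the simulation terminates.
type flakyElector struct {
	lossesLeft int
	stop       context.CancelFunc
}

func (f *flakyElector) Run(ctx context.Context) {
	if f.lossesLeft > 0 {
		f.lossesLeft--
		return // simulates "Stopped leading"
	}
	f.stop()
	<-ctx.Done()
}

// simulate runs the retry loop against an elector that loses the lease
// the given number of times and reports the number of election rounds.
func simulate(losses int) int {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	e := &flakyElector{lossesLeft: losses, stop: cancel}
	return runWithRetry(ctx, e, time.Millisecond)
}

func main() {
	fmt.Println("election rounds:", simulate(2)) // prints "election rounds: 3"
}
```

Without the outer loop, the first return from `Run` would leave the pod permanently outside the election, which matches the repeated "Skipping updating ... because not leader" messages in the logs above.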

Your environment

NGF: "version":"edge","commit":"8e57fe86d311d6a618afa109999d80439d5ca9e9","date":"2023-09-22T17:16:36Z"
Server Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.3-gke.100", GitCommit:"6466b51b762a5c49ae3fb6c2c7233ffe1c96e48c", GitTreeState:"clean", BuildDate:"2023-06-23T09:27:28Z", GoVersion:"go1.20.5 X:boringcrypto", Compiler:"gc", Platform:"linux/amd64"}
@kate-osborn
Contributor

It doesn't eventually acquire the lease?

@pleshakov
Contributor Author

> It doesn't eventually acquire the lease?

Yep. Well, at least not yet: it has not reacquired the lease from 2023-09-27T08:09:25Z until 2023-09-27T19:54:25Z.

@mpstefan added the bug label Sep 28, 2023
@mpstefan added this to the v1.0.0 milestone Sep 28, 2023
@mpstefan added the size/medium and refined labels Sep 28, 2023
@kate-osborn moved this from 🆕 New to 🏗 In Progress in NGINX Gateway Fabric Oct 6, 2023
@kate-osborn self-assigned this Oct 6, 2023
@kate-osborn moved this from 🏗 In Progress to 👀 In Review in NGINX Gateway Fabric Oct 11, 2023
@github-project-automation bot moved this from 👀 In Review to ✅ Done in NGINX Gateway Fabric Oct 12, 2023