fix(log): add custom error handler for Kubernetes API errors (#1024)
# Description


This pull request includes changes to improve error handling and logging
in the Kubernetes watcher and to simplify error checking in the endpoint
reconciler. The most important changes include adding a custom error
handler for the Kubernetes watcher, importing necessary packages, and
simplifying error handling logic.

Improvements to error handling and logging:

* [`pkg/k8s/watcher_linux.go`](diffhunk://#diff-1769e0320129167654a2a0d5f382b63fb459aadf221d3ba04df1f1a56188f6d2R105-R123):
  Added a custom error handler, `k8sWatcherErrorHandler`, that logs specific
  Kubernetes API server errors and tags them with the affected resource for
  easier identification.
* [`pkg/k8s/watcher_linux.go`](diffhunk://#diff-1769e0320129167654a2a0d5f382b63fb459aadf221d3ba04df1f1a56188f6d2R23-R29):
  Registered the custom error handler in the `init` function to ensure it
  is used by the watcher.
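
The registration pattern described above can be sketched without client-go. This is an illustration under stated assumptions, not the Retina code: `errorHandlers` stands in for `runtime.ErrorHandlers` from `k8s.io/apimachinery/pkg/util/runtime`, and `handleError`, `report`, `watcherHandler`, and `fallbackHandler` are hypothetical names. It shows that assigning a new slice replaces whatever handlers were registered before, so the custom handler has to delegate unmatched errors itself.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// errorHandlers is a stand-in (assumption) for runtime.ErrorHandlers: a
// package-level slice of callbacks, each invoked for every reported error.
var errorHandlers []func(error)

// handled records what each handler saw so the demo can print it.
var handled []string

// handleError mimics how an error is dispatched: every registered handler
// is called in order.
func handleError(err error) {
	for _, h := range errorHandlers {
		h(err)
	}
}

// report wraps a message in an error and dispatches it to the handlers.
func report(msg string) { handleError(errors.New(msg)) }

// fallbackHandler is a hypothetical default handler for unmatched errors.
func fallbackHandler(err error) {
	handled = append(handled, "fallback: "+err.Error())
}

// watcherHandler tags Service watch failures and delegates everything else.
func watcherHandler(err error) {
	if strings.Contains(err.Error(), "Failed to watch *v1.Service") {
		handled = append(handled, "tagged v1.Service")
		return
	}
	fallbackHandler(err)
}

func init() {
	// Assigning the slice replaces all earlier handlers rather than appending.
	errorHandlers = []func(error){watcherHandler}
}

func main() {
	report("Failed to watch *v1.Service: services is forbidden")
	report("connection reset by peer")
	for _, line := range handled {
		fmt.Println(line)
	}
}
```

Delegating in the `default` branch keeps unrelated errors visible even though the handler slice was overwritten.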

Code simplification:

* [`pkg/controllers/operator/cilium-crds/endpoint/endpoint_controller.go`](diffhunk://#diff-0a6e7a396be9617c3c31afb9cf9f740b75e645a533833d049726db8321d13df9L536-R536):
  Simplified the error checking logic in `handlePodUpsert` by removing a
  redundant `err != nil` check.
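
Whether the extra `err != nil` test is redundant depends on context that the hunk does not show. The sketch below compares the two conditions using a hypothetical `isNotFound` helper and `notFoundError` type as stand-ins for `k8serrors.IsNotFound`, assuming (as with the real helper) that it returns false for a nil error. Under that assumption the conditions differ only when `err` is nil, so the simplification preserves behavior if the surrounding code has already established that `err` is non-nil.

```go
package main

import (
	"errors"
	"fmt"
)

// notFoundError is a hypothetical stand-in for a Kubernetes NotFound error.
type notFoundError struct{ name string }

func (e *notFoundError) Error() string { return e.name + " not found" }

// isNotFound mimics the shape of k8serrors.IsNotFound: it reports whether
// err is (or wraps) a notFoundError and returns false for nil.
func isNotFound(err error) bool {
	var nf *notFoundError
	return errors.As(err, &nf)
}

// oldCheck is the condition before the change.
func oldCheck(err error) bool { return !isNotFound(err) && err != nil }

// newCheck is the condition after the change.
func newCheck(err error) bool { return !isNotFound(err) }

// sample pairs a label with an error value to compare.
type sample struct {
	label string
	err   error
}

// samples returns the three cases worth comparing.
func samples() []sample {
	return []sample{
		{"nil", nil},
		{"not found", &notFoundError{name: "pod"}},
		{"other", errors.New("forbidden")},
	}
}

func main() {
	for _, s := range samples() {
		fmt.Printf("%-9s old=%v new=%v\n", s.label, oldCheck(s.err), newCheck(s.err))
	}
}
```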


## Checklist

- [X] I have read the [contributing
documentation](https://retina.sh/docs/contributing).
- [X] I signed and signed-off the commits (`git commit -S -s ...`). See
[this
documentation](https://docs.github.com/en/authentication/managing-commit-signature-verification/about-commit-signature-verification)
on signing commits.
- [X] I have correctly attributed the author(s) of the code.
- [X] I have tested the changes locally.
- [X] I have followed the project's style guidelines.
- [X] I have updated the documentation, if necessary.
- [X] I have added tests, if applicable.

## Testing

I removed the Retina agent's permission to read nodes and services. I can
see the complete error as well as our custom message coming from Retina.
```
time="2024-11-26T16:05:33Z" level=error msg="Potentially Network Error coming from K8s API Server failing to watch Services" actualError="pkg/mod/k8s.io/[email protected]/tools/cache/reflector.go:232: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:kube-system:retina-agent\" cannot list resource \"services\" in API group \"\" at the cluster scope" subsys=k8s-watcher
```
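
The log line above names the affected resource in the reflector message ("Failed to watch *v1.Service"). As an illustration only, the sketch below extracts that type from the message text with the standard library. `watchedResource` is a hypothetical helper, not part of this change, and matching on message text is inherently brittle because the wording is not a stable API.

```go
package main

import (
	"fmt"
	"strings"
)

// watchPrefix is the message fragment that precedes the resource type.
const watchPrefix = "Failed to watch *"

// watchedResource returns the resource type named in a reflector-style
// "Failed to watch *<type>: ..." message, or false if the pattern is absent.
func watchedResource(msg string) (string, bool) {
	_, rest, found := strings.Cut(msg, watchPrefix)
	if !found {
		return "", false
	}
	resource, _, hasColon := strings.Cut(rest, ":")
	if !hasColon || resource == "" {
		return "", false
	}
	return resource, true
}

func main() {
	msg := "reflector.go:232: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden"
	if r, ok := watchedResource(msg); ok {
		fmt.Println("resource:", r)
	}
}
```
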
---

Please refer to the [CONTRIBUTING.md](../CONTRIBUTING.md) file for more
information on how to contribute to this project.
ritwikranjan authored Dec 16, 2024
1 parent 31bf97f commit 37cade0
Showing 2 changed files with 38 additions and 1 deletion.
2 changes: 1 addition & 1 deletion pkg/controllers/operator/cilium-crds/endpoint/endpoint_controller.go
@@ -533,7 +533,7 @@ func (r *endpointReconciler) handlePodUpsert(ctx context.Context, newPEP *PodEnd
// May end up getting another endpoint ID below if we try to create the CEP below.
// No downside to this.

-if !k8serrors.IsNotFound(err) && err != nil {
+if !k8serrors.IsNotFound(err) {
r.l.WithError(err).WithFields(logrus.Fields{
"podKey": newPEP.key.String(),
"pep": newPEP,
37 changes: 37 additions & 0 deletions pkg/k8s/watcher_linux.go
@@ -2,9 +2,12 @@ package k8s

import (
	"context"
	"strings"
	"sync"
	"time"

	"k8s.io/apimachinery/pkg/util/runtime"

	agentK8s "github.com/cilium/cilium/daemon/k8s"
	"github.com/cilium/cilium/pkg/hive/cell"
	"github.com/cilium/cilium/pkg/ipcache"
@@ -15,8 +18,17 @@ import (
	"github.com/cilium/cilium/pkg/logging"
	"github.com/cilium/cilium/pkg/logging/logfields"
	"github.com/cilium/cilium/pkg/option"
	"github.com/sirupsen/logrus"
)

func init() {
	// Register custom error handler for the watcher
	// nolint:reassign // this is the only way to set the error handler
	runtime.ErrorHandlers = []func(error){
		k8sWatcherErrorHandler,
	}
}

const (
	K8sAPIGroupCiliumEndpointV2 = "cilium/v2::CiliumEndpoint"
	K8sAPIGroupServiceV1Core = "core/v1::Service"
@@ -92,3 +104,28 @@ func Start(ctx context.Context, k *watchers.K8sWatcher) {
	<-syncdCache
	logger.Info("Kubernetes watcher synced")
}

// k8sWatcherErrorHandler is a custom error handler for the watcher. It logs
// watch failures for known resources and tags each entry with the resource
// type so the failure is easy to identify. All other errors are delegated
// to k8s.K8sErrorHandler.
func k8sWatcherErrorHandler(e error) {
	errStr := e.Error()
	logError := func(er, r string) {
		logger.WithFields(logrus.Fields{
			"underlyingError": er,
			"resource":        r,
		}).Error("Error watching k8s resource")
	}

	switch {
	case strings.Contains(errStr, "Failed to watch *v1.Node"):
		logError(errStr, "v1.Node")
	case strings.Contains(errStr, "Failed to watch *v2.CiliumEndpoint"):
		logError(errStr, "v2.CiliumEndpoint")
	case strings.Contains(errStr, "Failed to watch *v1.Service"):
		logError(errStr, "v1.Service")
	case strings.Contains(errStr, "Failed to watch *v2.CiliumNode"):
		logError(errStr, "v2.CiliumNode")
	default:
		k8s.K8sErrorHandler(e)
	}
}
