Added Sysdig agent support and testing automations #315
@manuelbcd It looks like the namespace is incorrect, and our testing is failing. Please test it fully and push the updates to this PR. Also, share your license keys via email or Slack.
Namespace fixed. Thanks for the heads up, @elamaran11.
@manuelbcd I have more comments. Please also test it locally. One of the pods is failing:
ubuntu@ip-10-226-86-188:~$ kubectl logs -n sysdig-agent sysdig-agent-sysdig-node-analyzer-wsg4c
Defaulted container "sysdig-runtime-scanner" out of: sysdig-runtime-scanner, sysdig-host-scanner
{"level":"info","version":"v1.8.1","time":"2024-11-15T14:05:15Z","message":"Starting Runtime Scanner"}
{"level":"info","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","nodeInfo":{"RuntimeName":"containerd","RuntimeVersion":"1.7.12","Architecture":"amd64","KernelVersion":"5.15.0-122-generic","KubeletVersion":"v1.30.0-eks-036c24b","KubeProxyVersion":"v1.30.0-eks-036c24b","OSImage":"Ubuntu 22.04.4 LTS","OS":"linux","ServerGitVersion":"v1.30.5-eks-ce1d5eb","ServerGoVersion":"go1.22.6"},"platform":"linux/amd64","time":"2024-11-15T14:05:16Z","message":"node info detected"}
{"level":"info","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","ContainerRuntime":"containerd","time":"2024-11-15T14:05:16Z","message":"container runtime client built successfully"}
{"level":"info","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","containerRuntimeName":"containerd","time":"2024-11-15T14:05:16Z","message":"using default vulnerability db version: V2"}
{"level":"info","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","containerRuntimeName":"containerd","time":"2024-11-15T14:05:16Z","message":"starting probes server on :7002"}
{"level":"info","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","containerRuntimeName":"containerd","time":"2024-11-15T14:05:17Z","message":"Starting metrics server on :25001 exposing /metrics ..."}
{"level":"error","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","containerRuntimeName":"containerd","error":"failed to perform keepalive request: agents conf API returned invalid http status: 401","time":"2024-11-15T14:05:17Z","message":"failed to send keepalive"}
{"level":"info","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","containerRuntimeName":"containerd","seconds":106,"time":"2024-11-15T14:05:17Z","message":"startup sleep"}
{"level":"info","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","containerRuntimeName":"containerd","seconds":106,"time":"2024-11-15T14:07:03Z","message":"sleep finished"}
{"level":"info","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","containerRuntimeName":"containerd","time":"2024-11-15T14:07:03Z","message":"getting vulnerabilities DB"}
{"level":"info","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","containerRuntimeName":"containerd","time":"2024-11-15T14:07:03Z","message":"retrieving presigned url for DB"}
{"level":"error","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","containerRuntimeName":"containerd","error":"error refreshing db: failed refreshing vuln DB: failed to retrieve download url for the main db: vulns API returned invalid http status: 401","time":"2024-11-15T14:07:03Z","message":"error during startup"}
{"level":"info","version":"v1.8.1","scannerId":"myclusterName:f0b94697-9379-41ef-a215-4db9fabc57d0:node02","containerRuntimeName":"containerd","scheduler":"keepalive","time":"2024-11-15T14:07:03Z","message":"root context done. CtxErr: context canceled"}
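Both `error` entries in the log above report HTTP status 401 (Unauthorized), which usually points to a rejected or missing access key. To pull only the error-level entries out of JSON-lines output like this, a minimal sketch follows. The function name `errors_only` is hypothetical, and the filter assumes each entry is a single line containing the literal field `"level":"error"`, as in the output above.

```shell
# Hypothetical helper: keep only error-level entries from JSON-lines logs on stdin.
# Assumes one JSON object per line with a literal "level":"error" field.
errors_only() {
  grep -F '"level":"error"'
}
```

It could be used as `kubectl logs -n sysdig-agent sysdig-agent-sysdig-node-analyzer-wsg4c -c sysdig-runtime-scanner | errors_only`.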
Your test job is failing too; see the logs below. Also, the job should be a CronJob that runs on a schedule:
ubuntu@ip-10-226-86-188:~$ kubectl logs sysdig-agent-test-6t9vt -n sysdig-agent
# Validation process started #
timed out waiting for the condition on pods/sysdig-agent-sysdig-bmvv2
timed out waiting for the condition on pods/sysdig-agent-sysdig-d2t4v
timed out waiting for the condition on pods/sysdig-agent-sysdig-ng7dx
timed out waiting for the condition on pods/sysdig-agent-sysdig-zzbqr
error: expected 'logs [-f] [-p] (POD | TYPE/NAME) [-c CONTAINER]'.
POD or TYPE/NAME is a required argument for the logs command
See 'kubectl logs -h' for help and examples
# Error: Sysdig Agent couldn't connect with the server. Please check egress, region and token #aws-cloudsoft-sleek-collaboration
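The `kubectl logs` usage error above shows the command was invoked without a pod argument, presumably because the pod name resolved to an empty string. A minimal sketch of a guard for that case follows. The function name `agent_logs`, the `KUBECTL` override variable, and any label selector passed to it are assumptions for illustration, not part of the PR's validation script.

```shell
# Hypothetical helper: print logs from the first pod matching a label selector,
# or fail with a clear message when no pod matches.
# KUBECTL can be overridden (for example with a stub) and defaults to kubectl.
agent_logs() {
  ns="$1"
  selector="$2"
  pod="$("${KUBECTL:-kubectl}" get pods -n "$ns" -l "$selector" -o name | head -n 1)"
  if [ -z "$pod" ]; then
    echo "no pod matched selector '$selector' in namespace '$ns'" >&2
    return 1
  fi
  # $pod has the form pod/NAME, which kubectl logs accepts as TYPE/NAME.
  "${KUBECTL:-kubectl}" logs -n "$ns" "$pod"
}
```

A call would look like `agent_logs sysdig-agent <label-selector>`, where the selector must match the labels actually set on the agent pods.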
@manuelbcd Readiness probe is failing for all:
ubuntu@ip-10-226-86-188:~$ kubectl get all -n sysdig-agent
NAME READY STATUS RESTARTS AGE
pod/sysdig-agent-sysdig-5np5r 0/1 Init:0/1 0 2m44s
pod/sysdig-agent-sysdig-jmzcm 0/1 Running 0 2m48s
pod/sysdig-agent-sysdig-lw2r9 0/1 Running 0 2m50s
pod/sysdig-agent-sysdig-node-analyzer-76drd 2/2 Running 4 (3m50s ago) 43m
pod/sysdig-agent-sysdig-node-analyzer-t9bjc 1/2 Running 6 (40s ago) 43m
pod/sysdig-agent-sysdig-node-analyzer-wsg4c 1/2 CrashLoopBackOff 7 (59s ago) 43m
pod/sysdig-agent-sysdig-node-analyzer-wxbm2 2/2 Running 5 (11m ago) 43m
pod/sysdig-agent-sysdig-vmqm4 0/1 Pending 0 2m53s
pod/sysdig-agent-test-k9b88 0/1 Error 0 8m32s
pod/sysdig-agent-test-lvjl4 1/1 Running 0 18s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/sysdig-agent-sysdig ClusterIP 172.20.14.156 <none> 7765/TCP 43m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/sysdig-agent-sysdig 4 4 0 4 0 <none> 43m
daemonset.apps/sysdig-agent-sysdig-node-analyzer 4 4 2 4 2 <none> 43m
NAME STATUS COMPLETIONS DURATION AGE
job.batch/sysdig-agent-test Running 0/1 8m32s 8m32s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 2m44s default-scheduler Successfully assigned sysdig-agent/sysdig-agent-sysdig-lw2r9 to node03
Normal Pulled 2m43s kubelet Container image "quay.io/sysdig/agent-kmodule:13.5.0" already present on machine
Normal Created 2m43s kubelet Created container sysdig-agent-kmodule
Normal Started 2m43s kubelet Started container sysdig-agent-kmodule
Normal Pulled 2m28s kubelet Container image "quay.io/sysdig/agent-slim:13.5.0" already present on machine
Normal Created 2m28s kubelet Created container sysdig
Normal Started 2m28s kubelet Started container sysdig
Warning Unhealthy 3s (x7 over 53s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Hi. Test results were fine after some tuning. Thanks for your patience.
@manuelbcd The test ran fine this time, but you have not addressed my feedback about changing the Job to a CronJob that runs once a day.
Once you make that change, I will merge.
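For reference, a CronJob that runs once a day, as requested above, could look roughly like the sketch below. The name `sysdig-agent-test` and the namespace `sysdig-agent` come from the output earlier in this thread. The output file name, image, command, and `concurrencyPolicy` are placeholders and assumptions, not the values used in this PR.

```shell
# Write a CronJob manifest that runs the validation once a day at 00:00.
# The image and command are placeholders (assumptions), not taken from the PR.
cat > sysdig-agent-test-cronjob.yaml <<'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: sysdig-agent-test
  namespace: sysdig-agent
spec:
  schedule: "0 0 * * *"        # once each day, at midnight
  concurrencyPolicy: Forbid    # assumption: do not start a run while one is active
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: sysdig-agent-test
              image: example.com/placeholder/validation:latest   # hypothetical image
              command: ["/bin/sh", "-c", "echo validation placeholder"]
EOF

# Apply it with: kubectl apply -f sysdig-agent-test-cronjob.yaml
```

A single run can then be triggered manually with `kubectl create job --from=cronjob/sysdig-agent-test <job-name> -n sysdig-agent`, which avoids waiting for the next scheduled time.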
@manuelbcd The CronJob update is incorrect. Please fix it, run it once, and let me know.
Updated. Please check again, @elamaran11.
@manuelbcd I appreciate you working tirelessly on this. The job fails now.
All pods reach a ready state, the testing pod is a 24-hour CronJob with the requested focus on functionality, and the testers complete successfully in our environments. LGTM.
This should be good to merge. FYI, @elamaran11.
All pods reach a ready state, the testing pod is a 24-hour CronJob with the requested focus on functionality, and the testers complete successfully in our environments after the requested update to work with long-running pods. Deployment comments have been remediated. LGTM.
Requested edits have been made, reviewed, and tested.
Sysdig Agent for EKS anywhere / EKS hybrid / EKS baremetal
Automation tasks for QA