
CPU and memory usage of k3s #2278

Closed
sraillard opened this issue Sep 18, 2020 · 72 comments

@sraillard

Environmental Info:
K3s Version: k3s version v1.18.8+k3s1 (6b59531)
Running on CentOS 7.8

Node(s) CPU architecture, OS, and Version:
Linux k3s 3.10.0-1127.19.1.el7.x86_64 #1 SMP Tue Aug 25 17:23:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
AMD GX-412TC SOC with 2GB RAM

Cluster Configuration:
Single node installation

Describe the bug:

When deploying the latest stable k3s on a single node, the CPU and memory usage look significant.
I understand that Kubernetes isn't lightweight by definition, but k3s is really interesting for creating/deploying appliances.
On small (embedded) systems, the baseline CPU and memory usage is significant (I'm not talking about modern servers here).
Is there a way to optimize this resource usage, or at least to understand how k3s uses resources when nothing is deployed?

Steps To Reproduce:

  • Installed K3s:
    curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --no-deploy traefik" sh

Expected behavior:

Maybe less CPU and memory usage when nothing is deployed and running

Actual behavior:

500MB of memory used and 5% CPU usage on each core (4-core CPU) when idle

Additional context / logs:

@onedr0p
Contributor

onedr0p commented Sep 18, 2020

Duplicate #294

...This is a known issue for some people

@sraillard
Author

Thanks for the duplicate, I didn't see it. It is more focused on CPU usage than memory usage, whereas both are important on low-power embedded computers.

And what matters to me is understanding why these resources are consumed when k3s is idle, without any running pod.

@runningman84

From my point of view it is valid to have a new issue because the old one is closed without any solution.

Especially if you think about the primary use case, which is a low-powered edge computing setup.

Fixing or even responding to this issue should be a top priority for the k3s team.

@brandond
Member

brandond commented Sep 20, 2020

500MB of RAM and 5% cpu is about as good as you are going to get for an entire Kubernetes cluster + infrastructure pods running on a single node. What about this would you like to see fixed? Do you have an example of another Kubernetes distribution that offers similar features in a smaller footprint?

@runningman84

500MB of RAM and 5% cpu looks fine for me.

My own experience is more like this:
[screenshot: resource usage on my system]

@onedr0p
Contributor

onedr0p commented Sep 21, 2020

Here's a pretty much idle master with no workers and no deployments:

[screenshot: idle master resource usage]

Compared to my master that has 5 workers and about 50 deployments:
[screenshot: active master resource usage]

You can see the active master jumping from 20% to 150%+, sometimes greater than 300%, CPU usage on k3s.

Something seems off to me, but it's working /shrug

@sraillard
Author

@onedr0p and @runningman84, can you provide the CPU model of your systems? It looks like you both have systems with 16GB of memory. In my case, I have only 2GB of memory!

@onedr0p
Contributor

onedr0p commented Sep 21, 2020

My CPU is Intel® Core™ i3-8109U in a NUC8i3BEK, RAM is Crucial 16GB Kit (8GBx2) DDR4 2400 MT/S (PC4-19200) SR x8 SODIMM

@brandond
Member

brandond commented Sep 21, 2020

Note that unless you explicitly restrict it, golang will make use of additional memory to avoid having to do garbage collection too frequently. You can reduce the memory utilization at the cost of more aggressive GC cleanup by setting a MemoryLimit in the systemd unit:

[Service]
MemoryLimit=768M

Other users have similarly put CPU limits on the k3s service, although I personally think that this might have more negative effects (apiserver throttling, etc) than a memory limit.
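
If you want to try it, here is a minimal sketch of applying that limit as a systemd drop-in (the drop-in path is arbitrary, and the commented-out CPUQuota line only illustrates the CPU-limit approach mentioned above, it is not a recommendation):

# Create a drop-in instead of editing the k3s unit installed by the script
sudo mkdir -p /etc/systemd/system/k3s.service.d
sudo tee /etc/systemd/system/k3s.service.d/limits.conf <<'EOF'
[Service]
# Cap the k3s server's memory; the Go runtime will GC more aggressively to stay under it
MemoryLimit=768M
# Optional CPU cap - may cause apiserver throttling, see the caveat above
#CPUQuota=150%
EOF
sudo systemctl daemon-reload
sudo systemctl restart k3s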

@onedr0p
Contributor

onedr0p commented Sep 21, 2020

although I personally think that this might have more negative effects (apiserver throttling, etc) than a memory limit.

That's my mindset too. I do think there is an issue on k3s CPU usage, however, it isn't really affecting me much. Everything seems to be running fine.

@runningman84

I am running an Intel N5000 CPU in an MSI Cubi system.

@sraillard
Author

@runningman84: is your k3s running some pods? 50% CPU usage with an Intel N5000 seems like a lot when idle if really nothing is running.

@runningman84

I have three nodes running about 100 pods total.

The other two nodes with k3s agents have lower CPU usage because my actual workload is not very resource-heavy (Home Assistant, MQTT and other small services).

If you look at the total CPU time, it is strange that the value for Prometheus is way lower than for the k3s server.

@sraillard
Author

@runningman84 that's already a nice deployment! When I said idle, I also meant without any deployment!

@sraillard
Author

I would at least like to understand how the CPU and memory resources are used by the k3s-server process. This process runs multiple threads, and it isn't easy for me to tell which threads are consuming resources for which task.

@dkarlovi

I have the exact same issue, k3s server is burning through the CPU for whatever reason, without doing anything (nothing deployed, fresh install just now).

My processor is

Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz

and the main k3s server process never drops below 10%.

Attaching to it with strace gets me this:

strace: Process 151534 attached
futex(0x6f8fda8, FUTEX_WAIT_PRIVATE, 0, NULL

and it doesn't output anything, which is weird since it's definitely doing something.
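
One thing worth noting about that strace output: by default strace attaches only to the single thread ID given, and a Go binary like k3s tends to park its main thread on a futex while the runtime schedules work on other threads. A sketch for watching the whole process instead (reusing the PID from above; assumes strace and timeout are available):

# Follow all threads (-f) and print a per-syscall count/time summary (-c) after ~10 seconds
sudo timeout 10 strace -f -c -p 151534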

@brandond
Member

Even when kubernetes isn't "doing anything" it is still constantly running control loops to query the state of the system, compare that to the desired state, and determine if anything needs to be changed. It is also constantly health checking endpoints, collecting metrics, renewing locks, etc. It is never truly idle, which is why you will see a baseline amount of CPU usage even without a workload.

@dkarlovi

@brandond seems like quite a high baseline, no? This isn't a Raspberry Pi, it's a decent laptop which runs Java IDEs etc. effortlessly; a 10% baseline when "idle" seems high to me.

@sraillard
Author

I'm trying to search where the CPU cycles are going.

I have done some tests with the perf tool, running perf record -g -a sleep 10 then perf report to see what happens:
[screenshot: perf report output]

The perf tool can provide detailed information, but I'm lacking symbol information to identify the stack traces in the k3s process. Does anyone know how to get symbols?
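
For reference, these are the exact commands, plus a DWARF call-graph variant that may or may not help (I can't confirm whether the shipped k3s binary carries usable symbols):

sudo perf record -g -a sleep 10                    # sample all CPUs for 10 seconds
sudo perf report                                   # browse the recorded profile

sudo perf record --call-graph dwarf -a sleep 10    # alternative: unwind stacks via DWARF
sudo perf report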

@tim-brand

I'm seeing this behaviour too on my Pi4. Fresh installation with MariaDB as the external database.
The load average is about 2.xx.
When I enable verbose logging, I notice that it's continuously polling its own API, probably to get status updates.

Log of a couple of seconds:

Sep 23 20:01:06 pi04 k3s[2957]: I0923 20:01:06.497988    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s" latency=229.097323ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:06 pi04 k3s[2957]: I0923 20:01:06.501590    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path" latency=43.852989ms resp=200 UserAgent="local-path-provisioner/v0.0.0 (linux/arm) kubernetes/$Format" srcIP="10.42.0.20:51466":
Sep 23 20:01:06 pi04 k3s[2957]: I0923 20:01:06.674860    2957 httplog.go:90] verb="POST" URI="/apis/authorization.k8s.io/v1beta1/subjectaccessreviews" latency=3.125171ms resp=201 UserAgent="metrics-server/v0.0.0 (linux/arm) kubernetes/$Format" srcIP="10.42.0.21:58670":
Sep 23 20:01:06 pi04 k3s[2957]: I0923 20:01:06.694806    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s" latency=438.23441ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:06 pi04 k3s[2957]: I0923 20:01:06.924280    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s" latency=226.324982ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:06 pi04 k3s[2957]: I0923 20:01:06.925416    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/endpoints/cloud-controller-manager" latency=157.401198ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:06 pi04 k3s[2957]: I0923 20:01:06.968878    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager" latency=32.041685ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.048136    2957 trace.go:116] Trace[513042036]: "GuaranteedUpdate etcd3" type:*core.Endpoints (started: 2020-09-23 20:01:06.506811767 +0100 BST m=+320.072085415) (total time: 541.215729ms):
Sep 23 20:01:07 pi04 k3s[2957]: Trace[513042036]: [541.109545ms] [538.614941ms] Transaction committed
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.049541    2957 trace.go:116] Trace[1314711001]: "Update" url:/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path,user-agent:local-path-provisioner/v0.0.0 (linux/arm) kubernetes/$Format,client:10.42.0.20 (started: 2020-09-23 20:01:06.506106978 +0100 BST m=+320.071380459) (total time: 543.302078ms):
Sep 23 20:01:07 pi04 k3s[2957]: Trace[1314711001]: [543.05071ms] [542.561937ms] Object stored in database
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.050504    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path" latency=545.431297ms resp=200 UserAgent="local-path-provisioner/v0.0.0 (linux/arm) kubernetes/$Format" srcIP="10.42.0.20:51466":
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.153250    2957 trace.go:116] Trace[1034599144]: "GuaranteedUpdate etcd3" type:*coordination.Lease (started: 2020-09-23 20:01:06.531187866 +0100 BST m=+320.096461514) (total time: 621.93976ms):
Sep 23 20:01:07 pi04 k3s[2957]: Trace[1034599144]: [621.785706ms] [619.544582ms] Transaction committed
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.154835    2957 httplog.go:90] verb="PUT" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s" latency=225.27927ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.158783    2957 trace.go:116] Trace[2020735212]: "Update" url:/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager,user-agent:k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election,client:127.0.0.1 (started: 2020-09-23 20:01:06.530663186 +0100 BST m=+320.095936630) (total time: 627.755515ms):
Sep 23 20:01:07 pi04 k3s[2957]: Trace[2020735212]: [622.705401ms] [622.342794ms] Object stored in database
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.159369    2957 httplog.go:90] verb="PUT" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s" latency=657.25856ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.251811    2957 kubelet_pods.go:1360] Generating status for "helm-install-traefik-vn8mx_kube-system(c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f)"
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.252795    2957 status_manager.go:435] Ignoring same status for pod "helm-install-traefik-vn8mx_kube-system(c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f)", status: {Phase:Running Conditions:[{Type:Initialized Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-09-23 19:56:28 +0100 BST Reason: Message:} {Type:Ready Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-09-23 19:59:35 +0100 BST Reason:ContainersNotReady Message:containers with unready status: [helm]} {Type:ContainersReady Status:False LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-09-23 19:59:35 +0100 BST Reason:ContainersNotReady Message:containers with unready status: [helm]} {Type:PodScheduled Status:True LastProbeTime:0001-01-01 00:00:00 +0000 UTC LastTransitionTime:2020-09-23 19:56:28 +0100 BST Reason: Message:}] Message: Reason: NominatedNodeName: HostIP:192.168.1.24 PodIP:10.42.0.23 PodIPs:[{IP:10.42.0.23}] StartTime:2020-09-23 19:56:28 +0100 BST InitContainerStatuses:[] ContainerStatuses:[{Name:helm State:{Waiting:&ContainerStateWaiting{Reason:CrashLoopBackOff,Message:back-off 2m40s restarting failed container=helm pod=helm-install-traefik-vn8mx_kube-system(c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f),} Running:nil Terminated:nil} LastTerminationState:{Waiting:nil Running:nil Terminated:&ContainerStateTerminated{ExitCode:1,Signal:0,Reason:Error,Message:,StartedAt:2020-09-23 19:59:32 +0100 BST,FinishedAt:2020-09-23 19:59:34 +0100 BST,ContainerID:containerd://131f84560892f7e5f197bf36e1f9933a4c419058882b7222952acb582695e166,}} Ready:false RestartCount:5 Image:docker.io/rancher/klipper-helm:v0.2.5 ImageID:docker.io/rancher/klipper-helm@sha256:b694f931ffb70c4e0b6aedf69171936cad98e79a5df49372f0e553d7d610062d ContainerID:containerd://131f84560892f7e5f197bf36e1f9933a4c419058882b7222952acb582695e166 Started:0xd52717c}] QOSClass:BestEffort EphemeralContainerStatuses:[]}
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.254312    2957 volume_manager.go:372] Waiting for volumes to attach and mount for pod "helm-install-traefik-vn8mx_kube-system(c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f)"
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.254484    2957 volume_manager.go:403] All volumes are attached and mounted for pod "helm-install-traefik-vn8mx_kube-system(c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f)"
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.254568    2957 topology_manager.go:219] [topologymanager] RemoveContainer - Container ID: 131f84560892f7e5f197bf36e1f9933a4c419058882b7222952acb582695e166
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.255044    2957 kuberuntime_manager.go:579] Container {Name:helm Image:rancher/klipper-helm:v0.2.5 Command:[] Args:[install] WorkingDir: Ports:[] EnvFrom:[] Env:[{Name:NAME Value:traefik ValueFrom:nil} {Name:VERSION Value: ValueFrom:nil} {Name:REPO Value: ValueFrom:nil} {Name:VALUES_HASH Value:cfaa420eb38db08e829c9f1359fe1688d72fad5b816f2c99a4f23b58e94680fb ValueFrom:nil} {Name:HELM_DRIVER Value:secret ValueFrom:nil} {Name:CHART_NAMESPACE Value:kube-system ValueFrom:nil} {Name:CHART Value:https://%!{(MISSING)KUBERNETES_API}%!/(MISSING)static/charts/traefik-1.81.0.tgz ValueFrom:nil} {Name:HELM_VERSION Value: ValueFrom:nil} {Name:NO_PROXY Value:,10.42.0.0/16,10.43.0.0/16 ValueFrom:nil}] Resources:{Limits:map[] Requests:map[]} VolumeMounts:[{Name:values ReadOnly:false MountPath:/config SubPath: MountPropagation:<nil> SubPathExpr:} {Name:helm-traefik-token-cdrk2 ReadOnly:true MountPath:/var/run/secrets/kubernetes.io/serviceaccount SubPath: MountPropagation:<nil> SubPathExpr:}] VolumeDevices:[] LivenessProbe:nil ReadinessProbe:nil StartupProbe:nil Lifecycle:nil TerminationMessagePath:/dev/termination-log TerminationMessagePolicy:File ImagePullPolicy:IfNotPresent SecurityContext:nil Stdin:false StdinOnce:false TTY:false} is dead, but RestartPolicy says that we should restart it.
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.255616    2957 kuberuntime_manager.go:650] computePodActions got {KillPod:false CreateSandbox:false SandboxID:bc98db60dc7d81120daccaa892439205eac60be120d84b4ef1854ece022e2325 Attempt:0 NextInitContainerToStart:nil ContainersToStart:[0] ContainersToKill:map[] EphemeralContainersToStart:[]} for pod "helm-install-traefik-vn8mx_kube-system(c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f)"
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.256156    2957 kuberuntime_manager.go:853] checking backoff for container "helm" in pod "helm-install-traefik-vn8mx_kube-system(c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f)"
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.257872    2957 kuberuntime_manager.go:863] back-off 2m40s restarting failed container=helm pod=helm-install-traefik-vn8mx_kube-system(c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f)
Sep 23 20:01:07 pi04 k3s[2957]: E0923 20:01:07.258076    2957 pod_workers.go:191] Error syncing pod c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f ("helm-install-traefik-vn8mx_kube-system(c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f)"), skipping: failed to "StartContainer" for "helm" with CrashLoopBackOff: "back-off 2m40s restarting failed container=helm pod=helm-install-traefik-vn8mx_kube-system(c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f)"
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.258391    2957 event.go:278] Event(v1.ObjectReference{Kind:"Pod", Namespace:"kube-system", Name:"helm-install-traefik-vn8mx", UID:"c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f", APIVersion:"v1", ResourceVersion:"143488", FieldPath:"spec.containers{helm}"}): type: 'Warning' reason: 'BackOff' Back-off restarting failed container
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.308824    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/endpoints/cloud-controller-manager" latency=337.466715ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.329494    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager" latency=19.205301ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.363564    2957 configmap.go:188] Setting up volume values for pod c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f at /var/lib/kubelet/pods/c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f/volumes/kubernetes.io~configmap/values
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.363661    2957 secret.go:183] Setting up volume helm-traefik-token-cdrk2 for pod c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f at /var/lib/kubelet/pods/c4f4af74-ec2a-4ef8-a44d-34cfcaec7f0f/volumes/kubernetes.io~secret/helm-traefik-token-cdrk2
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.363727    2957 secret.go:207] Received secret kube-system/helm-traefik-token-cdrk2 containing (3) pieces of data, 1467 total bytes
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.363664    2957 configmap.go:212] Received configMap kube-system/chart-values-traefik containing (1) pieces of data, 340 total bytes
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.454765    2957 httplog.go:90] verb="PUT" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager" latency=123.683011ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.632406    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/default" latency=3.63161ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:50882":
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.637465    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/default/services/kubernetes" latency=3.603499ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:50882":
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.945710    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/default/endpoints/kubernetes" latency=74.195155ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:50882":
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.951429    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/configmaps/k3s" latency=234.064587ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:50854":
Sep 23 20:01:07 pi04 k3s[2957]: I0923 20:01:07.955366    2957 httplog.go:90] verb="GET" URI="/apis/discovery.k8s.io/v1beta1/namespaces/default/endpointslices/kubernetes" latency=3.309818ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:50882":
Sep 23 20:01:08 pi04 k3s[2957]: I0923 20:01:08.125626    2957 httplog.go:90] verb="GET" URI="/apis/batch/v1/jobs?limit=500" latency=11.610639ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/system:serviceaccount:kube-system:cronjob-controller" srcIP="127.0.0.1:51234":
Sep 23 20:01:08 pi04 k3s[2957]: I0923 20:01:08.138736    2957 httplog.go:90] verb="GET" URI="/apis/batch/v1beta1/cronjobs?limit=500" latency=9.640252ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/system:serviceaccount:kube-system:cronjob-controller" srcIP="127.0.0.1:51234":
Sep 23 20:01:08 pi04 k3s[2957]: I0923 20:01:08.205078    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/configmaps/k3s" latency=252.034436ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:50854":
Sep 23 20:01:08 pi04 k3s[2957]: I0923 20:01:08.262686    2957 httplog.go:90] verb="GET" URI="/apis/apps/v1/statefulsets?allowWatchBookmarks=true&resourceVersion=143375&timeoutSeconds=300&watch=true" latency=5m0.001024436s resp=0 UserAgent="kubernetic-backend/v0.0.0 (darwin/amd64) kubernetes/$Format" srcIP="192.168.1.168:60441":
Sep 23 20:01:08 pi04 k3s[2957]: I0923 20:01:08.267509    2957 get.go:251] Starting watch for /apis/apps/v1/statefulsets, rv=143375 labels= fields= timeout=8m25s
Sep 23 20:01:08 pi04 k3s[2957]: I0923 20:01:08.301780    2957 node_lifecycle_controller.go:1082] Node pi04 ReadyCondition updated. Updating timestamp.
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.073484    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path" latency=20.639581ms resp=200 UserAgent="local-path-provisioner/v0.0.0 (linux/arm) kubernetes/$Format" srcIP="10.42.0.20:51466":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.076581    2957 httplog.go:90] verb="PUT" URI="/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/pi04?timeout=10s" latency=220.10699ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:59706":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.192035    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s" latency=31.520894ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.197971    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s" latency=34.589919ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.219586    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s" latency=25.513772ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.227292    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s" latency=27.308975ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.299614    2957 httplog.go:90] verb="GET" URI="/api/v1/nodes?allowWatchBookmarks=true&resourceVersion=143375&timeout=5m7s&timeoutSeconds=307&watch=true" latency=5m7.059877608s resp=0 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:59630":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.318570    2957 get.go:251] Starting watch for /api/v1/nodes, rv=144516 labels= fields= timeout=8m59s
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.361156    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path" latency=284.717874ms resp=200 UserAgent="local-path-provisioner/v0.0.0 (linux/arm) kubernetes/$Format" srcIP="10.42.0.20:51466":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.504338    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/endpoints/cloud-controller-manager" latency=47.472451ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.533982    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager" latency=26.61226ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.548994    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s" latency=319.076813ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.548994    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s" latency=327.407356ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.578750    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s" latency=27.304513ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.580092    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s" latency=28.079431ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.746154    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/endpoints/cloud-controller-manager" latency=210.200222ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.746252    2957 httplog.go:90] verb="PUT" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s" latency=163.893241ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.747938    2957 httplog.go:90] verb="PUT" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s" latency=163.128601ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:09 pi04 k3s[2957]: I0923 20:01:09.767008    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager" latency=19.219356ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:10 pi04 k3s[2957]: I0923 20:01:10.006917    2957 httplog.go:90] verb="PUT" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager" latency=238.376022ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:10 pi04 k3s[2957]: I0923 20:01:10.230235    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/configmaps/k3s" latency=21.579832ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:50854":
Sep 23 20:01:10 pi04 k3s[2957]: I0923 20:01:10.401834    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/configmaps/k3s" latency=169.75692ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:50854":
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.148816    2957 httplog.go:90] verb="GET" URI="/api/v1/limitranges?allowWatchBookmarks=true&resourceVersion=143375&timeout=5m15s&timeoutSeconds=315&watch=true" latency=5m15.000963051s resp=0 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:50882":
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.152235    2957 get.go:251] Starting watch for /api/v1/limitranges, rv=143375 labels= fields= timeout=8m46s
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.378513    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path" latency=14.368536ms resp=200 UserAgent="local-path-provisioner/v0.0.0 (linux/arm) kubernetes/$Format" srcIP="10.42.0.20:51466":
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.695761    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/endpoints/rancher.io-local-path" latency=314.344491ms resp=200 UserAgent="local-path-provisioner/v0.0.0 (linux/arm) kubernetes/$Format" srcIP="10.42.0.20:51466":
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.782133    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s" latency=33.696187ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.784563    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s" latency=33.834333ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.817426    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s" latency=32.443867ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.818910    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s" latency=31.083158ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.850859    2957 prober.go:133] Readiness probe for "traefik-758cd5fc85-2dw2z_kube-system(6da4bd7e-8d80-4031-b71e-3541b30063b0):traefik" succeeded
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.976035    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/endpoints/kube-scheduler?timeout=10s" latency=155.174368ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.976070    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/endpoints/kube-controller-manager?timeout=10s" latency=157.059533ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.996951    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s" latency=18.381827ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:11 pi04 k3s[2957]: I0923 20:01:11.998867    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s" latency=20.373344ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:12 pi04 k3s[2957]: I0923 20:01:12.031955    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/endpoints/cloud-controller-manager" latency=21.399944ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:12 pi04 k3s[2957]: I0923 20:01:12.054114    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager" latency=20.474379ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:12 pi04 k3s[2957]: I0923 20:01:12.121568    2957 httplog.go:90] verb="PUT" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-controller-manager?timeout=10s" latency=121.114056ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50944":
Sep 23 20:01:12 pi04 k3s[2957]: I0923 20:01:12.121745    2957 httplog.go:90] verb="PUT" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/kube-scheduler?timeout=10s" latency=123.100423ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50942":
Sep 23 20:01:12 pi04 k3s[2957]: I0923 20:01:12.194719    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/endpoints/cloud-controller-manager" latency=138.934536ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:12 pi04 k3s[2957]: I0923 20:01:12.216255    2957 httplog.go:90] verb="GET" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager" latency=19.72485ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:12 pi04 k3s[2957]: I0923 20:01:12.423531    2957 httplog.go:90] verb="GET" URI="/api/v1/namespaces/kube-system/configmaps/k3s" latency=20.061588ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:50854":
Sep 23 20:01:12 pi04 k3s[2957]: I0923 20:01:12.434408    2957 httplog.go:90] verb="PUT" URI="/apis/coordination.k8s.io/v1/namespaces/kube-system/leases/cloud-controller-manager" latency=216.396787ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531/leader-election" srcIP="127.0.0.1:50946":
Sep 23 20:01:12 pi04 k3s[2957]: I0923 20:01:12.654247    2957 httplog.go:90] verb="PUT" URI="/api/v1/namespaces/kube-system/configmaps/k3s" latency=228.852009ms resp=200 UserAgent="k3s/v1.18.8+k3s1 (linux/arm) kubernetes/6b59531" srcIP="127.0.0.1:50854":

@sraillard
Author

Interesting. I know the control loop is checking state to keep it as desired, but maybe there is an optimization where, on low-end systems, the intervals could be longer?

@brandond
Member

These requests are happening all the time, you're seeing them logged because the request latency is over 100ms. This usually means that your database is on slow storage or does not have sufficient cpu.

@sraillard
Author

@brandond: @tim-brand wrote that he has enabled verbose logging, so maybe all the requests are logged even if they are under 100ms response time.

@tim-brand

Yes, that's correct, they only get logged when I enable verbose logging, using -v 6.
I just reinstalled my RPi4, this time with the Hypriot image, which includes Docker and some optimisations. Then I installed/started k3s with the --docker option. I now see that almost all of the CPU resources being used are coming from the k3s processes. I also checked docker stats, which shows me that all the containers are running at 0.50% CPU or below.
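
For anyone who wants to reproduce this, a rough sketch of turning the verbosity up (exact steps depend on how k3s was installed):

k3s server -v 6                           # one-off, run in the foreground
# or append '-v 6' to the ExecStart line of the k3s systemd unit, then:
sudo systemctl daemon-reload && sudo systemctl restart k3s
journalctl -u k3s -f                      # follow the resulting logs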

@tim-brand

I installed Ubuntu Server with MicroK8s, and it also has a high idle load. From what I can see, the load is a bit lower than with k3s, but still about 1.60 on average.
So I'm starting to think the 'default' load is due to the status and health checks of k8s.

@sraillard
Author

Can no k3s maintainer help us understand where these resources are going?

@brandond
Member

brandond commented Sep 30, 2020

Here's what my pi4b looks like with an external MariaDB database backend; I don't have one running with sqlite at the moment.

Tasks: 150 total,   1 running, 149 sleeping,   0 stopped,   0 zombie
%Cpu(s):  2.1 us,  1.3 sy,  0.0 ni, 96.4 id,  0.0 wa,  0.0 hi,  0.2 si,  0.0 st
MiB Mem :   3793.3 total,    752.8 free,   1283.4 used,   1757.1 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   2692.5 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
    874 root      20   0 1235084 541524  83436 S  12.6  13.9   5526:57 k3s-server
    889 root      20   0  813640 134020  70096 S   1.0   3.5 413:30.53 containerd
   2241 root      20   0  743496  46852  28468 S   0.7   1.2 138:51.05 coredns
   2464 nobody    20   0  731852  39308  21040 S   0.7   1.0 233:01.06 kube-state-metr
sysadm@pi01:~$ w
 23:08:29 up 17 days, 11:31,  1 user,  load average: 0.30, 0.61, 0.53

sysadm@pi01:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.10 (Groovy Gorilla)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu Groovy Gorilla (development branch)"
VERSION_ID="20.10"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=groovy
UBUNTU_CODENAME=groovy

sysadm@pi01:~$ uname -a
Linux pi01.lan.khaus 5.4.0-1018-raspi #20-Ubuntu SMP Sun Sep 6 05:11:16 UTC 2020 aarch64 aarch64 aarch64 GNU/Linux

sysadm@pi01:~$ cat /etc/fstab
LABEL=writable	/	 ext4	defaults,noatime,commit=3600	0 0
LABEL=system-boot       /boot/firmware  vfat    defaults        0       1

sysadm@pi01:~$ cat /etc/systemd/journald.conf
[Journal]
Storage=volatile

sysadm@pi01:~$ cat /etc/default/flash-kernel
LINUX_KERNEL_CMDLINE="apparmor=0 mitigations=off cgroup_memory=1 cgroup_enable=memory max_loop=32"

@tim-brand

tim-brand commented Sep 30, 2020

Here's what my pi4b looks like with an external MariaDB database backend; I don't have one running with sqlite at the moment.

sysadm@pi01:~$ w

 23:08:29 up 17 days, 11:31,  1 user,  load average: 0.30, 0.61, 0.53

That's a good load.
What OS and version of k3s are you using?

I've run k3s with MariaDB myself, but with a much higher idle load.

@tim-brand

Also, I've installed "plain" Kubernetes on Ubuntu 20.04 on my Pi4b, where I also notice an idle load of about 1.40, caused by the kube-apiserver.

@brandond
Member

I've updated my comment above to add some details. I tweaked the fstab mount options and journald settings to reduce writes to the SD card, as well as disabling AppArmor and CPU vulnerability mitigations. Not sure how much of a difference that makes, but you might give it a try.

@ngrilly

ngrilly commented Nov 29, 2020

@sraillard This is exactly what surprises me when I read about people using k8s and k3s for IoT/edge projects. The computers we're using at the edge are much less powerful than my laptop.

@mangelajo

Interesting, I know the control loop is checking state to keep it as wanted, but there is maybe some optimization where on low-end systems the intervals can be longer?

I suspect something like this would be the solution

@brandond
Member

brandond commented Nov 30, 2020

We've added some resource profiling figures to our documentation, feel free to take a look:
https://rancher.com/docs/k3s/latest/en/installation/installation-requirements/resource-profiling/

FWIW, when we talk about edge we usually mean things like cell towers, stores, restaurants, forward operating bases, etc. While K3s has been designed to optimize CPU and memory utilization compared to other Kubernetes distributions, there will probably continue to be many extreme low-end use cases (including mobile, battery-backed operation) in which the overhead of a complex control plane is not desirable.

@onedr0p
Contributor

onedr0p commented Nov 30, 2020

Excellent documentation @brandond thanks for sharing!

@sraillard
Author

@brandond thank you for sharing your tests. I agree that edge computing may need some clarification, as it's a very broad definition. In our case, it could be seen as a rugged, standalone, low-spec server.

@mangelajo

Thank you for sharing the excellent analysis @brandond. But still, if k3s had a way of modifying the default periods of the control loops (I suspect it would be health checks, node updates, etc.), that load could go drastically lower, at the expense of slower response times.

Those are parameters which are configurable on all the subsystems k3s embeds.

For example, here [1], but there are others like the kubelet, kube-proxy, etc.; it's all tunable.

Is it possible to access those settings from k3s?

--horizontal-pod-autoscaler-cpu-initialization-period duration
--horizontal-pod-autoscaler-downscale-stabilization duration 
--horizontal-pod-autoscaler-initial-readiness-delay duration
--horizontal-pod-autoscaler-sync-period duration   
--leader-elect-lease-duration duration    
--leader-elect-renew-deadline duration
--leader-elect-retry-period duration 
--log-flush-frequency duration   
--mirroring-endpointslice-updates-batch-period duration
--namespace-sync-period duration     
--node-monitor-grace-period duration   
--node-monitor-period duration    
--node-startup-grace-period duration  
--pod-eviction-timeout duration   
--pvclaimbinder-sync-period duration    
--resource-quota-sync-period duration
--route-reconciliation-period duration    

[1] https://kubernetes.io/docs/reference/command-line-tools-reference/kube-controller-manager/

@brandond
Member

brandond commented Dec 1, 2020

@mangelajo you can play with those via the flags here:
https://rancher.com/docs/k3s/latest/en/installation/install-options/server-config/#customized-flags

But I don't think it'll change as much as you're hoping.
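
For illustration, a sketch of what wiring a couple of those through the installer looks like (the controller-manager flag names come from the list above, while the kubelet flag and all values here are arbitrary examples, not recommendations):

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server \
  --kube-controller-manager-arg node-monitor-period=30s \
  --kube-controller-manager-arg resource-quota-sync-period=60s \
  --kubelet-arg node-status-update-frequency=30s" sh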

@KenMacD

KenMacD commented Dec 4, 2020

FWIW, when we talk about edge we usually mean things like cell towers, stores, restaurants, forward operating bases, etc.

Hey @brandond. Your product page says:

K3s is a lightweight, certified Kubernetes distribution built for running production workloads inside IoT appliances or at the network edge. K3s is packaged as a single < 40MB binary for both x86 and Arm processors, and it works great on something as small as a Raspberry Pi or as large as an AWS a1.4xlarge 32GiB server.

If this isn't an environment where K3s is expected to run then I'd suggest updating the marketing material. Otherwise people like me end up wasting a lot of time trying to figure out why installing this on their home lightweight IoT setups brings the server to a complete halt.

Also if this isn't the environment being targeted would you have any suggestions for a similar product that would work?

@brandond
Member

brandond commented Dec 4, 2020

@KenMacD The binary has grown a bit since that description was last updated, the repo itself currently states:

Production ready, easy to install, half the memory, all in a binary less than 100 MB.

We do target IoT appliances and the network edge, as I described in the statement you quoted. Note that an IoT appliance is NOT the same thing as an IoT device - an IoT appliance is a device that acts as a hub for IoT devices. Think NVIDIA Jetson, Google Coral, and the like. Lower-power but still capable systems designed to put processing and storage close to the devices generating data, capable of operating in a semi-detached state.

With regards to running on a Pi - I have three of them in my basement right now running k3s without issue. I'm not sure why anyone's surprised that k3s takes more than 25% of a 1.5Ghz embedded processor vs 10% of a 3.0Ghz enterprise datacenter chip. They do less work per cycle, so it takes more cycles to get the job done.

If you do find something that scales as well as k3s while using less CPU and memory, please let me know! I'd love to take notes. I apologize if you feel like you've wasted your time with our product. I generally take the view that any time spent learning or trying something new, is time well spent.

@sraillard
Author

I honestly think Rancher is doing a great job to provide/certify/maintain k3s and k3s is a really nice tool.

Edge computing and resource-constrained appliances are broad definitions; different people may have different opinions. It's like "unattended": I would think we could leave the system in the wild untouched, but it's recommended to upgrade and restart it to renew the one-year certificates issued (see #2342).

k3s provides a lightweight Kubernetes, clearly a lighter Kubernetes, but not something one would call light. Kubernetes provides a lot of services, at a cost. Resource usage optimization doesn't look like a top priority for the Kubernetes development team, which is certainly more focused on scalability and robustness.

I'm not speaking of storage, as it isn't a real issue even on resource-constrained servers, but detailing that a few hundred MB of files are downloaded would be more transparent than only writing that the binary is less than 100 MB (because it looks like everything is packaged in the binary file).

My final opinion is that k3s can work on resource-constrained servers, at the expense of a significant share of their resources being used to provide the Kubernetes services. Once understood, I think that can be dealt with.

More than 2 months ago, when I opened this ticket, I was looking for ways to optimize CPU and memory usage when deploying k3s. So far, I haven't found anything that really works (I have tried to tune some kubelet and controller settings without luck). If someone thinks there are interesting tuning options, I can leave the ticket open; otherwise I can close it.

@mangelajo

My two cents, I tried tuning those settings

ExecStart=/usr/local/bin/k3s \
    server \
	'--docker' \
#        '--kube-controller-manager-arg' 'deployment-controller-sync-period=60s' \
#        '--kube-controller-manager-arg' 'endpoint-updates-batch-period=60s' \
#        '--kube-controller-manager-arg' 'endpointslice-updates-batch-period=60s' \
#        '--kube-controller-manager-arg' 'leader-elect-lease-duration=60s' \
#        '--kube-controller-manager-arg' 'leader-elect-renew-deadline=50s' \
#        '--kube-controller-manager-arg' 'log-flush-frequency=60s' \
#        '--kube-controller-manager-arg' 'node-monitor-period=60s' \
#        '--kubelet-arg' 'log-flush-frequency=60s'

It made only a slight change (36% -> 25%), and the cluster was having issues with leader elections, so I commented them back out. I also tried using --docker to see if the management of pods was lightened, but there was no noticeable change either.

I will keep using k3s on an RPi4 (with another one coming), as I'm going to replace a couple of x86_64 servers I have at the office, which consume 10x the electricity.

I believe k3s does an outstanding job of getting where other k8s distros can't. I wish I were able to trace that 20-30% (of a core), but I understand it's a very complex system.

@brandond
Member

brandond commented Dec 4, 2020

@mangelajo if you're still experimenting, you might take a look at increasing the cadvisor stats collection interval - I think it's 10 seconds by default?
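
If anyone wants to experiment with that, my understanding (an assumption on my part) is that this maps to the kubelet's housekeeping-interval setting, which can be passed through k3s; the 30s value is just an example:

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server \
  --kubelet-arg housekeeping-interval=30s" sh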

@KenMacD

KenMacD commented Dec 4, 2020

@brandond hence my suggestion that maybe it's just the marketing material that needs updating. The "<40MB" and RPi mentions make it sound like it would be usable on an RPi 1, which is doubtful.

I tried on a board running an Allwinner A20 (Cubieboard 2). I'm running the board as what I'd consider an 'IoT appliance'. It's receiving per-second temperature data from an IoT mesh network, which is stored in InfluxDB with a Grafana dashboard. It's also running a minidlna server, Telegraf to pull weather data, and multiple Docker containers to control devices through the mesh network. My thought was that I could use k3s to make things more reliable.

Unfortunately I'm thinking it's the disk usage that killed it. The board keeps most data on a USB 2 external disk. I spent a bit of time trying to get some type of ramfs working before I found this ticket.

I understand you can't support everything, and that this is probably not a use case worth your dev time. Which is why I come back to the suggestion of updating the marketing material, so that when new users like me come along and see '40MB on an RPi' they know that isn't really a viable install target.

@brandond
Member

brandond commented Dec 4, 2020

Just to be clear, 40MB was referring to the size of the binary executable, not how much disk or memory it needs. I personally run a complete monitoring stack including Prometheus + Grafana on a 4GB Pi4b, but with an external USB3 SSD since Prometheus is quite demanding in terms of IO.

@jiayihu

jiayihu commented Dec 10, 2020

I think binary size is still important, but it matters the least compared to memory, CPU and network usage when talking about constrained environments. Like others, I was attracted by the marketing words, but I was left wondering about the high memory and CPU usage in my Raspberry Pi 3+ 1GB cluster. I would hardly consider a 4GB Pi 4B with an external SSD edge computing. Using a Pi 3 B+ was already a luxury for me, meant as a first step before actually moving to the cheaper Pi WH.

That being said, I really appreciate the project, and AFAIK k3s is the best state of the art. We would just like more consideration for truly constrained environments in the future.

@brandond
Member

brandond commented Dec 11, 2020

I would hardly consider a 4GB Pi 4B with an external SSD edge computing.

Jetsons, Corals, and (now) Pi4B/CM4s are what most people have in mind when they talk about edge computing. Pi3Bs and CMs are fairly capable for something like digital signage, or teaching STEM, but not so much for running an entire cluster.

@merlinvn

I have a similar problem here. Fresh k3s install with no pods running...

[screenshot: resource usage]

@brandond
Member

brandond commented Dec 11, 2020

@merlinvn please see #2278 (comment)

Also note that it is best practice to run Kubernetes with swap disabled. The stock Kubernetes kubelet will actually fail to start if you have swap enabled, but K3s disables this check and leaves it to the user to decide if they need it or not.
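
If you do want to follow that practice, a sketch for disabling swap on a typical Debian-based image (the exact commands vary by distribution):

# Turn swap off immediately
sudo swapoff -a
# Keep it off across reboots: comment out any swap entries in /etc/fstab,
# or on Raspberry Pi OS disable the swapfile service
sudo systemctl disable --now dphys-swapfile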

@er1z

er1z commented Dec 19, 2020

Still, no plans to conserve CPU? :(

There are people who don't need such frequent updates; any chance of letting us cut off some CPU usage?

@brandond
Member

brandond commented Dec 20, 2020

@er1z what would you suggest we remove that wouldn't negatively affect other users who still need those same things?

Kubernetes is a living project that will continue to grow, and that growth mostly targets larger deployments. K3s supports low-end hardware better than most distributions, but it is never going to be a zero-overhead option.

@er1z

er1z commented Dec 21, 2020

I'm not into the k3s architecture, but I assume being a Kubernetes distribution doesn't allow changing too many things. But I think there's room for:

  • allowing some update intervals to be tweaked (I'm pretty sure there are situations where updates not being performed instantly wouldn't be a big deal),
  • changing internal formats, for example adding support for Protobuf alongside JSON based on content negotiation (if k3s-to-k3s communication occurs, send via Protobuf, otherwise stick with JSON),
  • caching some responses, at least for a few seconds, in binary-friendly storage.

These are just thoughts and ideas; maybe some of them will lead you to other ideas worth at least discussing or even implementing.

@ericrange

ericrange commented Feb 1, 2021

I'm shocked that this loooong-existing issue seems like it will never be fixed :(

this issue is just the continuation of another loooong-existing issue :D

very funny fact: k8s itself needs much less cpu util. than k3s :D

@unixfox

unixfox commented Feb 1, 2021

very funny fact: k8s itself needs much less cpu util. than k3s :D

Can you prove that?

@onedr0p
Contributor

onedr0p commented Feb 1, 2021

I'm shocked that this loooong-existing issue seems like it will never be fixed :(

this issue is just the continuation of another loooong-existing issue :D

What exactly do you want fixed? Have you, yourself, provided any information to the maintainers that specifies an issue you are having?

@ericrange

I have the same problems that everybody does, in this and in the other issue.
Vanilla k3s is using up to 20-30% CPU.

Raspberry Pi 4B, 8GB RAM.

The exact specs do not really matter, and I've tried this on every Raspberry Pi I've got.
Steps to reproduce:

  1. Buy a Raspberry Pi
  2. Install k3s

@brandond
Member

brandond commented Feb 1, 2021

Running Kubernetes is not free. It is expected that it will take about 0.5 to 0.75 of a single core on an embedded system like a Raspberry Pi just as a baseline. If you don't want the overhead of constant health checking, reconciliation, and metrics collection you might look at something a little more lightweight like docker-compose.

very funny fact: k8s itself needs much less cpu util. than k3s :D

This is false. It might appear that way at first glance because individual components (etcd, kube-apiserver, kube-controller-manager, kube-scheduler, kubelet) each run in their own process, but the overall footprint in terms of CPU, memory, and disk space will be higher.

I am going to close this issue because at this point it seems to just be attracting more folks who want to argue that Kubernetes should run with zero overhead on any arbitrary low-end or embedded system. Anyone who is curious why this isn't feasible can read the conversation above and click through to the offered links, with https://docs.k3s.io/reference/resource-profiling describing a good baseline.

@brandond brandond closed this as completed Feb 1, 2021
@k3s-io k3s-io locked as resolved and limited conversation to collaborators Feb 1, 2021
@brandond
Member

brandond commented Jul 7, 2021

For anyone still following this, in addition to the resource profiling documentation we added, you might also check out this research paper: http://ceur-ws.org/Vol-2839/paper11.pdf
