
Unable to get the kubelet monitor to run #584

Closed
dylanlingelbach opened this issue Jun 17, 2021 · 7 comments
Labels
lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed.

Comments

@dylanlingelbach

I am running into a similar issue to #214 and #439.

I am using the k8s.gcr.io/node-problem-detector/node-problem-detector:v0.8.8 image (which doesn't have systemd installed) and deploying it to a Bottlerocket host that does have systemd.

Everything works great until I try to enable the kubelet monitor, since that shells out to systemctl to get the kubelet's uptime.

I've tried mounting /bin/systemctl and the other suggestions in those issues without luck.

Is mounting systemctl from the host the only way to get the kubelet monitor running? Or is installing node-problem-detector on the host itself a better solution?
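
For reference, the host-mount approach suggested in those issues looks roughly like the sketch below. The volume names and the /usr/bin/systemctl path are assumptions for a typical systemd host; Bottlerocket may not expose a systemctl binary at that path at all, and even where it does, the binary still needs the host's shared libraries and access to the host's D-Bus socket, which is why this route tends to be fragile:

    spec:
      containers:
      - name: node-problem-detector
        volumeMounts:
        # Sketch only: mount the host's systemctl plus what it needs to reach
        # the host's systemd over D-Bus; paths may differ per distro.
        - name: host-systemctl
          mountPath: /usr/bin/systemctl
          readOnly: true
        - name: host-run-systemd
          mountPath: /run/systemd
        - name: host-dbus
          mountPath: /var/run/dbus
      volumes:
      - name: host-systemctl
        hostPath:
          path: /usr/bin/systemctl
          type: File
      - name: host-run-systemd
        hostPath:
          path: /run/systemd
      - name: host-dbus
        hostPath:
          path: /var/run/dbus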

@hjkatz

hjkatz commented Jul 23, 2021

I ran into these same problems and was able to run node-problem-detector with access to the systemctl and docker binaries by building a wrapper image, like so:

FROM us.gcr.io/k8s-artifacts-prod/node-problem-detector/node-problem-detector:v0.8.9

RUN clean-install \
        curl \
        systemd \
        docker.io
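
(clean-install is the apt helper that ships with the image's Debian base; it is essentially apt-get install plus cache cleanup.) Build and push this image to your own registry, then point the DaemonSet's image: field at it; custom-registry/custom-image:0.8.9 below is just a placeholder for that wrapper image.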

I was also able to get the log-counter and health-checker scripts and custom plugins working (Node conditions show up and behave as expected in testing) with the following daemonset.yaml:

    spec:
      # health-checker for kubelet uses the local network to check kubelet's /healthz
      hostNetwork: true
      containers:
      - name: node-problem-detector
        command:
        - /node-problem-detector
        - --logtostderr
        - --config.system-log-monitor=[files]
        - --config.custom-plugin-monitor=[files]
        image: custom-registry/custom-image:0.8.9
        securityContext:
          privileged: true
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        volumeMounts:
        - name: log
          mountPath: /var/log
          readOnly: true
        - name: kmsg
          mountPath: /dev/kmsg
          readOnly: true
        # Make sure container is in the same timezone with the host.
        - name: localtime
          mountPath: /etc/localtime
          readOnly: true
        - name: config
          mountPath: /config
        - mountPath: /etc/machine-id
          name: machine-id
          readOnly: true
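        # systemctl inside the container talks to the host's systemd over D-Bus,
        # which is why the host's /run/systemd/system and D-Bus socket are
        # mounted below; the Docker socket is only needed if you enable the
        # docker health check.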
        - mountPath: /run/systemd/system
          name: systemd
        - mountPath: /var/run/docker.sock
          name: docker-sock
        - mountPath: /var/run/dbus/
          name: dbus
          mountPropagation: Bidirectional
      volumes:
      - name: log
        # Point `log` at your system's log directory
        hostPath:
          path: /var/log/
      - name: kmsg
        hostPath:
          path: /dev/kmsg
      - name: localtime
        hostPath:
          path: /etc/localtime
      - name: config
        configMap:
          defaultMode: 0744
          name: node-problem-detector-config
      - name: machine-id
        hostPath:
          path: /etc/machine-id
          type: File
      - name: systemd
        hostPath:
          path: /run/systemd/system/
          type: Directory
      - name: dbus
        hostPath:
          path: /var/run/dbus/
          type: Directory
      - name: docker-sock
        hostPath:
          path: /var/run/docker.sock
          type: Socket
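
The node-problem-detector-config ConfigMap mounted at /config above needs to carry whatever monitor configs you list in --config.system-log-monitor and --config.custom-plugin-monitor (the [files] placeholders would be paths like /config/health-checker-kubelet.json). Below is a rough sketch of the kubelet health-checker entry, adapted from config/health-checker-kubelet.json in this repo; treat the upstream file as authoritative, since the field values and the /home/kubernetes/bin path (where the stock image ships the helper binaries, as far as I can tell) are approximations:

apiVersion: v1
kind: ConfigMap
metadata:
  name: node-problem-detector-config
data:
  health-checker-kubelet.json: |
    {
      "plugin": "custom",
      "pluginConfig": {
        "invoke_interval": "10s",
        "timeout": "3m",
        "max_output_length": 80,
        "concurrency": 1
      },
      "source": "health-checker",
      "conditions": [
        {
          "type": "KubeletUnhealthy",
          "reason": "KubeletIsHealthy",
          "message": "kubelet on the node is functioning properly"
        }
      ],
      "rules": [
        {
          "type": "permanent",
          "condition": "KubeletUnhealthy",
          "reason": "KubeletUnhealthy",
          "path": "/home/kubernetes/bin/health-checker",
          "args": [
            "--component=kubelet",
            "--enable-repair=true",
            "--cooldown-time=1m",
            "--health-check-timeout=10s"
          ],
          "timeout": "3m"
        }
      ]
    }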

I hope this helps others who run into these challenges. Cheers!

Related Issue:

@peterrosell

peterrosell commented Aug 12, 2021

I ran into this issue because of error logs about a missing systemctl, and I now see that more binaries are missing.

I just played around with the Dockerfile in this repo to see what impact including these binaries in the default Docker image would have. The image size today is about 140 MB:

  • adding systemctl increases the image size by 18 MB
  • adding curl increases the image size by 4 MB
  • adding docker.io increases the image size by 146 MB

To me, adding systemctl and curl seems like no big deal. docker.io, on the other hand, I'm not sure about: many people have replaced docker with containerd, but if you enable the docker health check the binary is needed.

One idea could be to create a PR that adds systemctl and curl and let people build their own image if they need docker, but that's not very user friendly.

Any thoughts on this?

com6056 added a commit to com6056/node-problem-detector that referenced this issue Sep 3, 2021
A few issues have popped up where the provided image doesn't have the required packages for certain health checking operations (like kubernetes#584 (comment)).

This installs curl and systemd in the container to help alleviate these issues.
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 10, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 10, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@vteratipally
Collaborator

/lgtm
/approve
