
Add healthcheck command #1257

Closed
PiDroid-B opened this issue Mar 26, 2022 · 21 comments · Fixed by #1725

Comments

@PiDroid-B

Is your feature request related to a problem? Please describe.
No problem

Describe the solution you'd like
Add a healthcheck command to the image.

Describe alternatives you've considered
reopen #972 :)

Additional context
maybe something like this:

healthcheck.sh

#!/bin/sh

PERIOD_12H=43200

MAX_DELAY=$(( PERIOD_12H + 60 ))

# delay since last loop (seconds)
DELAY=$(($(date +%s) - $(date +%s -r /tmp/last_loop)))

[ "${DELAY}" -gt "${MAX_DELAY}" ] && exit 1

exit 0

And for the dockerfile :

HEALTHCHECK --interval=5m --timeout=2s --start-period=30s CMD sh healthcheck.sh

It's just an example to improve and adjust with env vars. The watchtower job just has to touch the file on every loop to update its timestamp.
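A slightly more robust variant of that sketch could read the period from the environment and handle a missing stamp file. This is purely illustrative: the WATCHTOWER_HC_PERIOD variable name and the stamp path are made up here, not actual watchtower settings.

```shell
#!/bin/sh
# Hypothetical refinement of the healthcheck.sh sketch above.
# WATCHTOWER_HC_PERIOD and /tmp/last_loop are illustrative names only.

# check_last_loop STAMP_FILE -> 0 if the file was touched recently enough
check_last_loop() {
  stamp="$1"
  period="${WATCHTOWER_HC_PERIOD:-43200}"   # seconds; default 12h
  max_delay=$(( period + 60 ))              # small grace margin

  # If the loop never ran, the stamp file does not exist: unhealthy.
  [ -f "$stamp" ] || return 1

  # Seconds since the stamp file's mtime (GNU date: -r FILE).
  delay=$(( $(date +%s) - $(date +%s -r "$stamp") ))
  [ "$delay" -le "$max_delay" ]
}
```

The container's HEALTHCHECK would then run `check_last_loop /tmp/last_loop` and let its exit code decide healthy/unhealthy.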

@github-actions

Hi there! 👋🏼 As you're new to this repo, we'd like to suggest that you read our code of conduct as well as our contribution guidelines. Thanks a bunch for opening your first issue! 🙏

@piksel
Member

piksel commented Mar 28, 2022

The thing is that since the watchtower container only contains a single binary, which is the entrypoint, either a) it is still running, or b) it has crashed and the container has stopped (and restarted, if that is configured).
If there ever is a situation where it is running but the scheduler is somehow broken, then this might serve a purpose. But I have not encountered such a problem.

@EVOTk

EVOTk commented Apr 2, 2022

Hello,
It would be really nice to have the integration of this healthcheck in the container! Just for the overall view of the containers, that helps put it in the "healthy" section on Portainer.

(screenshots: Portainer container list showing containers grouped under a "healthy" section)

@simskij
Member

simskij commented Apr 4, 2022

My reasoning is that to motivate adding additional complexity to the docker image, it needs to do something worthwhile, i.e. actually contribute to the operability or observability of the workload.

Adding a health check just to make it look nicer is not a sufficient argument to me.

@simskij simskij closed this as completed Apr 4, 2022
@mietzen

mietzen commented Dec 11, 2022

My reasoning is that to motivate adding additional complexity to the docker image, it needs to do something worthwhile, ie. actually contributing to the operability or observability of the workload.

Adding a health check just to make it look nicer is not a sufficient argument to me.

I get your point.
Nevertheless, if anybody searches for this, I've created my own version with a health check, for aesthetic reasons:
https://github.com/mietzen/watchtower-with-health-check

It's stupidly simple: build a binary that returns exit code 0 and use it as the health check. Thanks go to https://github.com/Soluto/golang-docker-healthcheck-example for the idea. But if anybody wants to add a more complex check, feel free to fork my repo.

@ninjamonkey198206

Health checks are simply good practice.
If there is a way for a system to monitor itself and restart failed processes, it should be added to increase the reliability of the system.
If, as is the case for Docker, external systems are available to monitor the health status of containers for monitoring, stat collection, debugging (health checks, together with the logs generated by the system that monitors them, can narrow down when issues occurred), and failover, then again the functionality should be in place.
The potential benefits vastly outweigh the negative aspects of adding in the function.

@ninjamonkey198206

Aside from that, this has made maintaining my server so much easier, so thank you all for your wonderful work.

@bugficks
Contributor

bugficks commented Jul 7, 2023

A workaround would be to, e.g., mount a static wget/curl binary into the container, enable http-api-metrics, and check the API endpoint:

services:
  watchtower:
    ...
    volumes:
      ...
      - ./wget:/usr/bin/wget:ro
    command: |
      ...
      --http-api-metrics
      --http-api-token BearerToken
    healthcheck:
      test: ["CMD", "wget", "--quiet", "--spider", "--tries=1", "--header", "Authorization: Bearer BearerToken", "http://127.0.0.1:8080/v1/metrics"]
      ...

@piksel
Member

piksel commented Jul 7, 2023

@bugficks I think you are missing the point a little bit here. Since the container only has a single binary, which is the sole execution target, the application is either running or has crashed, taking the container down with it. There is no known state where the container would be up but not responding to a metrics request (unless you disable it, of course).
You might as well add bash using a volume and just exit with a zero code.

@bugficks
Contributor

bugficks commented Jul 8, 2023

@piksel Well, not really. I am just saying that instead of modifying/forking the image, it might be better to simply mount e.g. wget, as that uses the original image, which results in less maintenance work.

@piksel
Member

piksel commented Jul 8, 2023

@bugficks Yeah, sorry, it was a bit aggressive. It's a good point, that doing this instead of maintaining a fork seems like a much easier solution.

@mietzen

mietzen commented Jul 8, 2023

@bugficks Yeah, sorry, it was a bit aggressive. It's a good point, that doing this instead of maintaining a fork seems like a much easier solution.

If by fork you mean my image, it's not a fork, it's just a derived image with an additional binary. It's built automatically every day based on the latest watchtower, so there is no need for maintenance.
But sure, my way is also just a hack.

@lonix1

lonix1 commented Aug 4, 2023

It's been said that a healthcheck is pointless because the container is either running or not.

I agree from watchtower's perspective: that is absolutely true.

But in a large infrastructure of many containers and services, on different servers, you'll use automation tools - and a standard way to monitor services is with healthchecks. (e.g. our ansible infrastructure uses healthchecks to determine container state and thus decide what actions to take.)

So although it adds zero benefit to watchtower's operation, the watchtower container is part of a larger system, and healthchecks are expected.
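To illustrate the kind of automation meant here: the status strings below are the ones Docker itself reports (e.g. via `docker inspect --format '{{.State.Health.Status}}'`), while the action names are made up for this sketch and not from any particular tool.

```shell
# Sketch of automation branching on a container's reported health status.
# Status strings are Docker's; the actions (keep/restart/wait) are illustrative.
act_on_health() {
  case "$1" in
    healthy)   echo "keep" ;;      # container is fine, do nothing
    unhealthy) echo "restart" ;;   # e.g. trigger a restart/failover task
    *)         echo "wait" ;;      # "starting", or no healthcheck defined
  esac
}
```

A container without a HEALTHCHECK never reaches "healthy" at all, which is exactly why such tooling treats watchtower as an odd one out.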

@denisz1

denisz1 commented Aug 4, 2023

@lonix1 I have the same opinion. But is there another endpoint to use? I don't use metrics, so I don't want to enable them just to abuse them for this.

Another idea is to use the HTTP API endpoint with bad credentials: it will respond 401, and your calling code can assume "401 is good".
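That "401 is good" idea boils down to treating any HTTP response as proof of liveness and only a failed connection as unhealthy. A minimal sketch with curl (the endpoint URL is whatever API path you point it at; this is not an official watchtower check):

```shell
# Sketch of the "any HTTP response counts as alive" probe. curl prints 000
# for %{http_code} when the connection itself fails, so even a 401 passes.
api_responds() {
  code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 2 "$1")
  [ "$code" != "000" ]
}
```

As noted in the next comment, though, every probe would land a 401 error in the watchtower logs.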

@lonix1

lonix1 commented Aug 4, 2023

@denisz1 Well, that would work for the healthcheck, but then you'd flood your logs with 401 errors.

I think those are the only two endpoints right now, but I'm unsure.

@denisz1

denisz1 commented Aug 5, 2023

Yes, true, I didn't think of that.

Actually, I just give up on this: mounting wget just feels bad. And you must "mount" a shell, and dependencies... it gets ugly fast.

The image needs its own healthcheck, and for that it must be based on at least busybox or alpine, not scratch.

@lonix1

lonix1 commented Aug 5, 2023

@denisz1

base on at least busybox or alpine, not scratch

The maintainers seem open to having a non-scratch image (e.g. alpine) as a secondary option. So if you feel strongly about it, might want to provide a PR for that. It would certainly be useful (and probably solve other issues/requests too).

@piksel
Member

piksel commented Aug 5, 2023

I think it would be fine to add a command line argument to the watchtower binary that just returns 0 (optionally, after first checking that another watchtower process is running).

Then we could add that as the HEALTHCHECK in the dockerfile.

This way, we still fulfill the contract of "when is the container considered healthy", even though the HEALTHCHECK command does essentially no actual work.

The reasoning stated above as to why we only do that should of course be added to the docs.
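Sketched as Dockerfile lines, the proposal would look roughly like this. The `--health-check` flag name is assumed here to match the compose example further down in the thread; interval and timeout values are arbitrary:

```dockerfile
# Hypothetical HEALTHCHECK using the proposed no-op self-check flag.
HEALTHCHECK --interval=5m --timeout=5s CMD ["/watchtower", "--health-check"]
```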

@lonix1

lonix1 commented Aug 5, 2023

What a clever idea!

That would solve the issue without expanding the image's attack surface.

(Do you think maybe the issue could be reopened and put on the backlog then? ....so it's not forgotten.)

@ninjamonkey198206

Woohoo! Thank you!

@OtenMoten

OtenMoten commented Nov 6, 2024

Here's a simple sample. With the compose file below, the container reports healthy:

CONTAINER ID   IMAGE                   COMMAND                  CREATED         STATUS                   PORTS      NAMES
34fa899bf061   containrrr/watchtower   "/watchtower --inter…"   2 minutes ago   Up 2 minutes (healthy)   8080/tcp   watchtower

services:

  watchtower:
    image: containrrr/watchtower
    restart: unless-stopped
    container_name: watchtower
    
    # Slightly adjusted resource limits based on typical production loads
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G
        reservations:
          cpus: '0.5'
          memory: 512M
          
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
      
    healthcheck:
      test: ["CMD", "/watchtower", "--health-check"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 5s
    
    cap_drop:
      - ALL
    cap_add:
      - CHOWN             # Only if you need to change file ownership
      - SETGID            # Only if you need to change group ID
      - SETUID            # Only if you need to change user ID
      - DAC_OVERRIDE      # Only if you need to bypass file permissions
    
    # Add logging limits
    logging:
      driver: "json-file"
      options:
        max-size: "100m"
        max-file: "5"
  
    environment:
      - REPO_USER=[ENTER YOUR DOCKER HUB USER]
      - REPO_PASS=[ENTER YOUR DOCKER HUB TOKEN]
      
    command: 
      --interval 15
      --include-restarting
      --include-stopped

10 participants