Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Built-in Docker Healthcheck #1126

Closed
firecow opened this issue Dec 16, 2023 · 10 comments · Fixed by #1135
Closed

Built-in Docker Healthcheck #1126

firecow opened this issue Dec 16, 2023 · 10 comments · Fixed by #1135
Labels
Priority: Normal Minor issue impacting one or more users Type: Feature Request A big idea that would be split into smaller pieces

Comments

@firecow
Copy link
Contributor

firecow commented Dec 16, 2023

Describe the feature you'd like
I would like the docker image to contain curl or wget so it's possible to create a docker healthcheck that docker-compose or docker swarm can use.

It would be nice the healthchecking is just native to the cloudflare/cloudflared image like so.

FROM cloudflare/cloudflared:2023.10.0 as cloudflared
FROM alpine:3.18.4

COPY --from=cloudflared /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt
COPY --from=cloudflared /usr/local/bin/cloudflared /usr/local/bin

ENV TUNNEL_METRICS="localhost:2000"

HEALTHCHECK --interval=5s --retries=6 --timeout=3s CMD wget -q ${TUNNEL_METRICS}/ready -O -

If building from alpine is something you don't want to do, then provide a cloudflare tunnel health command that can be used like so

HEALTHCHECK --interval=5s --retries=6 --timeout=3s CMD cloudflare tunnel health --ready-endpoint localhost:2000/ready

or like so

services:
  cloudflared:
    image: cloudflare/cloudflared
    environment:
      TUNNEL_METRICS: "localhost:2000"
    healthcheck:
      command: cloudflared tunnel health --ready-endpoint ${TUNNEL_METRICS}/ready
      interval: 5s
      retries: 6
      timeout: 3s

Describe alternatives you've considered
Build my own image built on-top of the cloudflare/cloudflared image or use third-party images built on top of cloudflare's

@firecow firecow added Priority: Normal Minor issue impacting one or more users Type: Feature Request A big idea that would be split into smaller pieces labels Dec 16, 2023
@mordilloSan
Copy link

I second this. Would be very helpful when using auto heal.

@Letgamer
Copy link

Using the following Dockerfile could be working:

FROM tarampampam/curl as curl

FROM erisamoe/cloudflared
COPY --from=curl /bin/curl /bin/

Credits:
https://github.com/sidevesh/cloudflared-with-health-check

@maggie44
Copy link

maggie44 commented Jan 3, 2024

I think that PR is helpful, although it may also be worth reevaluating whether the current base-debian11 containers still make sense as well. If the goal is to have a minimal container, it may as well be scratch. The Dockerfile (https://github.com/cloudflare/cloudflared/blob/master/Dockerfile) mentions # use a distroless base image with glibc but I'm not sure why glibc would be required for the Golang compiled binary.

However, I would find a more helpful base image to be busybox. It contains some basic tools that would help with debugging. I am currently running Cloudflared on a Kubernetes cluster and for some reason it isn't seeing my service which it is supposed to redirect to. It would be really helpful if I could get in to that container and be able to ping my service endpoint to help with debugging my Kubernetes infrastructure. I tend to favour busybox over Alpine as it is even more minimal than Alpine, and subsequently has very few updates. As a maintainer you often find yourself constantly bumping the Alpine container version for no reason just to try and keep up.

There is a case to be made that a scratch container has additional security benefits, but it's a really hard case to make. If someone has got as far as being able to send exec commands to your busybox container, it's game over either way.

@CodaBool
Copy link

CodaBool commented May 18, 2024

Using the following Dockerfile could be working COPY --from=tarampampam/curl /bin/curl /bin/

No one is going to use copy curl from a unreputable image like that. But the general method is right. It is better however, to use wget for something like this. It's much smaller in size (~1Mb when using busybox's version).

Here is an example

FROM gcr.io/distroless/java21-debian12:nonroot
COPY --from=busybox:stable-glibc /bin/wget /usr/bin/wget

If you wanted to save 500Kb-1Mb and add security the alternative is add healthcheck Go code to the repo. This could either be a separate static binary with a http call. Or add onto the CLI with a flag, like how watchtower does it (looking at this a second time, it seems watchtower is actually just checking if another watchtower process is running, which is not really a healthcheck). It would need to follow the docker spec and do exit 0 on healthy. Exit 1 on unhealthy.

I have 2 PRs I did recently with other dockerfiles that expand the app to have a healthcheck feature. So, those could be looked as general examples (but none of them are in go, they are in rust, and in node+python)

If the goal is to have a minimal container, it may as well be scratch

debian distroless is fine. No need to change off of it. Tons of Go projects work this way.

@maggie44
Copy link

maggie44 commented May 19, 2024

A good rule of thumb with Docker images, is to put in what you need and no more. Distroless image is far more than is needed. Not a deal breaker or concern, but unnecessary if it's a pure go build.

wget a good option if space saving is a concern.

Also a tiny-curl version, but would suggest building from source rather than importing packages from others: https://github.com/maggie44/docker-tiny-curl

Or, as mentioned above, make the base image busybox or Alpine which includes wget, more secure and stable than building and copying in your own, and is 4mb (busybox decompressed) instead of distroless image's 18mb even before copying in wget.

@firecow
Copy link
Contributor Author

firecow commented May 19, 2024

All of this is irrelevant, if the cloudflared binary could execute a command to check the /ready endpoint

#1135 We need this merged, to close this issue.

@kamaradski
Copy link

I'm testing the curl method:

FROM tarampampam/curl as curl
FROM erisamoe/cloudflared
COPY --from=curl /bin/curl /bin/

in AWS ECS like this:

resource "aws_ecs_task_definition" "ecs_cloudflared_task_dev" {
  family                   = "cloudflare-tunnel"
  network_mode             = "awsvpc"
  requires_compatibilities = ["FARGATE"]
  cpu                      = 512
  memory                   = 1024
  execution_role_arn       = aws_iam_role.ecs_task_execution_role_dev.arn
  task_role_arn            = aws_iam_role.ecs_task_execution_role_dev.arn
  tags                     = module.meta_ecs_cloudflared_task_dev.tags

  container_definitions = jsonencode([
    {
      name      = module.meta_ecs_cloudflared_task_dev.name
      image     = "${aws_ecr_repository.ecr_cloudflare_tunnel_dev.repository_url}:latest",
      essential = true
      command = [
        "tunnel",
        "--metrics", "0.0.0.0:3333",
        "--no-autoupdate",
        "run",
        "--token", cloudflare_tunnel.cloudflare_tunnel_dev.tunnel_token
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = module.meta_cloudwatch_loggroup_dev.name
          "awslogs-region"        = local.base.region
          "awslogs-stream-prefix" = "cloudflare-tunnel"
        }
      }

      portMappings = [
        {
          containerPort = 3333
          hostPort      = 3333
          protocol      = "tcp"
        }
      ]

      # https://docs.aws.amazon.com/AmazonECS/latest/developerguide/task_definition_parameters.html#container_definition_healthcheck
      healthCheck = {
        command     = ["CMD-SHELL", "curl -f http://localhost:3333/metrics || exit 1"],
        interval    = 10 # Interval in seconds between health checks
        timeout     = 10 # Timeout in seconds for each health check
        retries     = 3  # Number of retries before marking the container as unhealthy
        startPeriod = 10 # Optional: grace period to allow the container to start
      }
    }
  ])
}

But the healthcheck is not working in ECS. (it does locally... 🤷 ). So i'm really waiting for #1135 to get released and have a proper standard solution.

@huyz
Copy link

huyz commented Oct 6, 2024

This is badly needed. For me, the Cloudflared docker container locks up every few days for unknown reasons

@jahands
Copy link

jahands commented Oct 12, 2024

I ran into this and am surprised there's not a buitl-in way to do healthchecks.

I wrote up my workaround where I ended up adding a separate service to my docker-compose.yml for health checks.

@CodaBool
Copy link

CodaBool commented Oct 12, 2024

Wait @jahands don't you work at cloudflare?

Can you help have PR #1135 get official attention?

@firecow #1135 is the better PR right? Why did you open this one? #1135 is a no dependency or bloat solution, why add curl

Nvm 1135 adds the command this one runs it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Priority: Normal Minor issue impacting one or more users Type: Feature Request A big idea that would be split into smaller pieces
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants