Docker Swarm: healthcheck.test is not validated for CMD/CMD-SHELL/NONE, resulting in strange behavior when invalid #49034

Open
ryanhaney97 opened this issue Dec 4, 2024 · 0 comments
Labels: area/api, area/cli, area/stack, kind/bug, version/27.3


Description

healthcheck.test takes an array whose first element must be CMD, CMD-SHELL, or NONE. Docker Compose, at least, validates this: running docker compose up with an invalid value produces a clear error message:
healthcheck.test must start either by "CMD", "CMD-SHELL" or "NONE"

However, deploying the same invalid value with the docker stack command produces no error message at all, and it also results in some unusual behavior.

The requested containers are created successfully (and are always considered healthy, regardless of what the health test contains), but the service itself never finishes starting; presumably it defaults to treating the task as unhealthy. As a result, docker service ls shows 0 replicas, and the service is not reachable from other services on the same network. This contradictory behavior made the problem very confusing to debug.

Reproduce

  1. Create the following docker-compose.yml file:
services:
  db:
    image: postgres:alpine
    environment:
        POSTGRES_PASSWORD: password
    healthcheck:
      test: ["SOMETHINGINVALID", "pg_isready"]
      start_period: 3s
      interval: 1s
      timeout: 5s
      retries: 3

  2. Run docker compose up and note that it reports an error.
  3. Initialize a swarm with docker swarm init, if you don't already have one.
  4. Run docker stack up -c docker-compose.yml healthtest --detach=false. Note that it reports no error and hangs while creating the service.
  5. Exit the attached output to see what was created.
  6. Docker Desktop (or any other tool) shows that a container for the postgres image has been created and is not being restarted or recreated.
  7. Run docker service ls and note that the service shows 0/1 replicas, even though the container is clearly running.
  8. (Cleanup) Run docker stack down healthtest when finished.

Expected behavior

When deploying to a swarm, the first element of healthcheck.test should ideally be validated the same way docker compose already does. If that is not viable for some reason, the invalid test should AT LEAST produce more consistent behavior, such as the container failing its (invalid) health checks and being restarted repeatedly, which would be a much clearer sign that something is wrong.
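For illustration, a minimal Go sketch of the kind of check the stack deploy path could perform before creating the service. validateHealthcheckTest is a hypothetical helper, not the actual Docker CLI or Compose code; it only mirrors the rule stated in the Compose error message above. It also assumes an empty array should be accepted, following the Engine API's documented HealthConfig.Test semantics (an empty Test inherits the healthcheck from the image).

```go
package main

import (
	"fmt"
	"os"
)

// validateHealthcheckTest is a hypothetical helper that checks only the
// first element of a healthcheck "test" array, mirroring the Compose rule.
func validateHealthcheckTest(test []string) error {
	// Assumption: an empty array is allowed, since in the Engine API an
	// empty Test means "inherit the healthcheck from the image".
	if len(test) == 0 {
		return nil
	}
	switch test[0] {
	case "CMD", "CMD-SHELL", "NONE":
		return nil
	default:
		// Same wording as the Compose error, plus the offending value.
		return fmt.Errorf(`healthcheck.test must start either by "CMD", "CMD-SHELL" or "NONE", got %q`, test[0])
	}
}

func main() {
	cases := [][]string{
		{"CMD-SHELL", "pg_isready"},
		{"CMD_SHELL", "pg_isready"},        // the typo described in this report
		{"SOMETHINGINVALID", "pg_isready"}, // the value from the reproduction
	}
	for _, tc := range cases {
		if err := validateHealthcheckTest(tc); err != nil {
			fmt.Fprintln(os.Stderr, "invalid:", err)
			continue
		}
		fmt.Println("ok:", tc)
	}
}
```

With a check like this, the CMD_SHELL typo would be rejected up front instead of producing a running container with a service stuck at 0/1 replicas.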

docker version

Client:
 Version:           27.3.1
 API version:       1.47
 Go version:        go1.22.7
 Git commit:        ce12230
 Built:             Fri Sep 20 11:38:18 2024
 OS/Arch:           darwin/arm64
 Context:           desktop-linux

Server: Docker Desktop 4.36.0 (175267)
 Engine:
  Version:          27.3.1
  API version:      1.47 (minimum version 1.24)
  Go version:       go1.22.7
  Git commit:       41ca978
  Built:            Fri Sep 20 11:41:19 2024
  OS/Arch:          linux/arm64
  Experimental:     false
 containerd:
  Version:          1.7.21
  GitCommit:        472731909fa34bd7bc9c087e4c27943f9835f111
 runc:
  Version:          1.1.13
  GitCommit:        v1.1.13-0-g58aa920
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

docker info

Client:
 Version:    27.3.1
 Context:    desktop-linux
 Debug Mode: false
 Plugins:
  ai: Ask Gordon - Docker Agent (Docker Inc.)
    Version:  v0.1.0
    Path:     /Users/ryanhaney/.docker/cli-plugins/docker-ai
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.18.0-desktop.2
    Path:     /Users/ryanhaney/.docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.30.3-desktop.1
    Path:     /Users/ryanhaney/.docker/cli-plugins/docker-compose
  debug: Get a shell into any image or container (Docker Inc.)
    Version:  0.0.37
    Path:     /Users/ryanhaney/.docker/cli-plugins/docker-debug
  desktop: Docker Desktop commands (Alpha) (Docker Inc.)
    Version:  v0.0.15
    Path:     /Users/ryanhaney/.docker/cli-plugins/docker-desktop
  dev: Docker Dev Environments (Docker Inc.)
    Version:  v0.1.2
    Path:     /Users/ryanhaney/.docker/cli-plugins/docker-dev
  extension: Manages Docker extensions (Docker Inc.)
    Version:  v0.2.27
    Path:     /Users/ryanhaney/.docker/cli-plugins/docker-extension
  feedback: Provide feedback, right in your terminal! (Docker Inc.)
    Version:  v1.0.5
    Path:     /Users/ryanhaney/.docker/cli-plugins/docker-feedback
  init: Creates Docker-related starter files for your project (Docker Inc.)
    Version:  v1.4.0
    Path:     /Users/ryanhaney/.docker/cli-plugins/docker-init
  sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.)
    Version:  0.6.0
    Path:     /Users/ryanhaney/.docker/cli-plugins/docker-sbom
  scout: Docker Scout (Docker Inc.)
    Version:  v1.15.0
    Path:     /Users/ryanhaney/.docker/cli-plugins/docker-scout

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 24
 Server Version: 27.3.1
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 Swarm: active
  NodeID: pjuuugw26u7nlvsrsr628oms1
  Is Manager: true
  ClusterID: vizrmfw3ddm5sqmztzmfpxu9g
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8  
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.65.3
  Manager Addresses:
   192.168.65.3:2377
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 472731909fa34bd7bc9c087e4c27943f9835f111
 runc version: v1.1.13-0-g58aa920
 init version: de40ad0
 Security Options:
  seccomp
   Profile: unconfined
  cgroupns
 Kernel Version: 6.10.14-linuxkit
 Operating System: Docker Desktop
 OSType: linux
 Architecture: aarch64
 CPUs: 8
 Total Memory: 3.827GiB
 Name: docker-desktop
 ID: 3ab93200-4d61-4aaf-9261-7f848b076cf2
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 HTTP Proxy: http.docker.internal:3128
 HTTPS Proxy: http.docker.internal:3128
 No Proxy: hubproxy.docker.internal
 Labels:
  com.docker.desktop.address=unix:///Users/ryanhaney/Library/Containers/com.docker.docker/Data/docker-cli.sock
 Experimental: false
 Insecure Registries:
  hubproxy.docker.internal:5555
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: daemon is not using the default seccomp profile

Additional Info

  • I originally encountered this bug when I mistyped CMD-SHELL as CMD_SHELL. This is an easy mistake to make, and it would be much harder to debug in a larger environment, especially given the odd behavior that results from it.
  • The command ("pg_isready" in the example above) can be anything; even a command guaranteed to fail, such as "cat notafile", produces the same behavior.
  • I don't think the other healthcheck parameters matter much; they are just the values I was using.
ryanhaney97 added the kind/bug and status/0-triage labels on Dec 4, 2024.