[Kamal 2] Error: target failed to become healthy #1041
Comments
By default, Kamal Proxy checks port 80. Since you are using port 3000, specify the app_port setting in the proxy configuration:
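For reference, a minimal sketch of what that proxy setting might look like in deploy.yml (the host name is a placeholder):

```yaml
# deploy.yml -- route kamal-proxy to the container's port 3000
# instead of the default port 80
proxy:
  host: app.example.com
  app_port: 3000
```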
|
If you look at the bottom of the Dockerfile, you can see
The container is supposed to be using basecamp/thruster, which by default should expose port 80. Nowhere has port 3000 been defined. Adding your suggestion of app_port makes things worse. @dhh, Thruster doesn't need any fancy config, does it? |
I have the same issue as @randohinn, using Thruster, though my specific error was
and when I tried to remove Thruster and use just
Might help to mention that I use Cloudflare and have set SSL mode to Full, also set |
Try to replicate this on a new Rails 8 beta app. I've done a million deploys with that without trouble. Then compare the difference between that, if you get it working, and your own app. Then we can hopefully narrow down whether there's a bug or a configuration issue. But we need a minimal reproducible error to move forward. |
Cool, will do. I presume I cannot run two Kamal versions, so I can't use my staging setup with Kamal 1, yeah? I'll need to either wipe that or get a new box? |
Yeah, you can't run both versions on the same box at the same time. |
I've also encountered this problem.
# Skip http-to-https redirect for the default health check endpoint.
config.ssl_options = { redirect: { exclude: ->(request) { request.path == "/up" } } } |
Expanding on the previous comment - see https://nts.strzibny.name/upgrading-to-kamal-2/ - try
Also, do you have |
Thanks, @bergatron! NOTE: And do not forget to deploy those changes before the upgrade! |
On a fresh Hertz instance with a Rails 7.2.1 application and Kamal 2, I had to set
I've deployed a new Rails 8 app onto it and I didn't need to set that. I'm not sure why. |
FWIW, I ran into a similar issue (
The solution then is to skip this middleware for the health check endpoint (typically |
I'm facing the same issue using Rails 7.1.3 and Kamal 2.1.0; it just stopped working after migrating from version 1 to 2. The logs:
INFO [1ade11b4] Running docker run --detach --restart unless-stopped --name container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36 --network kamal --hostname web_deploy-5fbe94df2c1a -e KAMAL_CONTAINER_NAME="container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36" -e KAMAL_VERSION="b5bbe86b667959d01a09f5ae528df1a7cd78dw36" --env RAILS_MAX_THREADS="5" --env WEB_CONCURRENCY="5" --env APPLICATION_HOST="web" --env RACK_ENV="production" --env RAILS_ENV="production" --env RAILS_SERVE_STATIC_FILES="true" --env-file .kamal/apps/container-name/env/roles/web.env --log-opt max-size="10m" --volume $(pwd)/.kamal/apps/container-name/assets/volumes/web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36:/rails/public/assets --label service="container-name" --label role="web" --label destination destination/container-name:b5bbe86b667959d01a09f5ae528df1a7cd78dw36 on web_deploy
INFO [1ade11b4] Finished in 0.624 seconds with exit status 0 (successful).
INFO [0638ab65] Running docker container ls --all --filter name=^container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36$ --quiet on web_deploy
INFO [0638ab65] Finished in 0.136 seconds with exit status 0 (successful).
INFO [e452cf6d] Running docker exec kamal-proxy kamal-proxy deploy container-name-web --target="d4c42b8aa2ba:3000" --host="my-host.com" --tls --deploy-timeout="30s" --drain-timeout="30s" --health-check-interval="7s" --health-check-timeout="49s" --health-check-path="/up" --buffer-requests --buffer-responses --log-request-header="Cache-Control" --log-request-header="Last-Modified" --log-request-header="User-Agent" on web_deploy
ERROR Failed to boot web on web_deploy
INFO First web container is unhealthy on web_deploy, not booting any other roles
INFO [5af2ac24] Running docker container ls --all --filter name=^container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36$ --quiet | xargs docker logs --timestamps 2>&1 on web_deploy
INFO [5af2ac24] Finished in 0.171 seconds with exit status 0 (successful).
ERROR 2024-10-07T18:35:40.730291457Z == Preparing DB ==
2024-10-07T18:36:00.706423379Z == Setup authorization ==
2024-10-07T18:36:00.706484443Z == Adding roles ==
2024-10-07T18:36:00.706494663Z == Checking roles ==
2024-10-07T18:36:00.706504077Z == No new role was found ==
2024-10-07T18:36:00.706513000Z == Done ==
2024-10-07T18:36:00.706521316Z == Adding permissions ==
2024-10-07T18:36:00.706529314Z == Checking permissions ==
2024-10-07T18:36:00.706537506Z == No new permissions was found ==
2024-10-07T18:36:00.706545706Z == Adding attaching permissions to roles ==
2024-10-07T18:36:00.706553889Z == Attaching permissions to roles ==
2024-10-07T18:36:00.706561985Z == Done ==
INFO [8d453758] Running docker container ls --all --filter name=^container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36$ --quiet | xargs docker inspect --format '{{json .State.Health}}' on web_deploy
INFO [8d453758] Finished in 0.164 seconds with exit status 0 (successful).
ERROR null
INFO [74631586] Running docker container ls --all --filter name=^container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36$ --quiet | xargs docker stop on web_deploy
INFO [74631586] Finished in 10.531 seconds with exit status 0 (successful).
Releasing the deploy lock...
Finished all in 60.3 seconds
ERROR (SSHKit::Command::Failed): Exception while executing on host web_deploy: docker exit status: 1
docker stdout: Nothing written
docker stderr: Error: target failed to become healthy

deploy.yml:
# Name of your application. Used to uniquely configure containers.
service: my-service
image: KAMAL_REGISTRY_USER/my-service
servers:
web:
hosts:
- web_deploy
workers:
hosts:
- web_deploy
cmd: "sidekiq"
labels:
workers.service: sidekiq
# Credentials for your image host.
registry:
username: KAMAL_REGISTRY_USER
password:
- KAMAL_REGISTRY_PASSWORD
env:
clear:
RAILS_MAX_THREADS: 5
WEB_CONCURRENCY: 5
APPLICATION_HOST: web
RACK_ENV: production
RAILS_ENV: production
RAILS_SERVE_STATIC_FILES: true
secret:
- Secrets
ssh:
user: USER_HOST
port: SSH_PORT
# Configure builder setup.
builder:
arch: amd64
remote: ssh://USER_HOST@WEB_HOST:SSH_PORT
cache:
options: --no-cache
args:
GIT_REV: <%= `git rev-parse --short HEAD` %>
BUILD_DATE: <%= `date -u +"%Y-%m-%dT%H:%M:%S %Z"` %>
accessories:
db:
image: postgres:16
host: accessories_deploy
port: 5432
env:
secret:
- POSTGRES_DATABASE
- POSTGRES_USER
- POSTGRES_PASSWORD
directories:
- data/postgres:/var/lib/postgresql/data
files:
- infrastructure/postgres/postgresql.conf:/usr/local/share/postgresql/postgresql.conf.sample
redis:
image: redis:7-alpine
host: accessories_deploy
port: 6379
directories:
- data/redis:/data
files:
- infrastructure/redis/redis.conf:/etc/redis/redis.conf
- infrastructure/redis/redis-sysctl.conf:/etc/sysctl.conf
cmd: redis-server /etc/redis/redis.conf
proxy:
ssl: true
host: my-host.com
app_port: 3000
healthcheck:
path: /up
interval: 7
timeout: 49
asset_path: /rails/public/assets
primary_role: web

Dockerfile:
# syntax = docker/dockerfile:1
# Make sure RUBY_VERSION matches the Ruby version in .ruby-version and Gemfile
ARG RUBY_VERSION=3.3.0
FROM registry.docker.com/library/ruby:$RUBY_VERSION-slim as base
# Rails app lives here
WORKDIR /rails
# Set production environment
ENV RAILS_ENV="production" \
BUNDLE_DEPLOYMENT="1" \
BUNDLE_PATH="/usr/local/bundle" \
BUNDLE_WITHOUT="development"
# Throw-away build stage to reduce size of final image
FROM base as build
# Install packages needed to build gems
RUN apt-get update && apt-get -y install --no-install-recommends \
postgresql-client libpq-dev tar git libssl-dev cron \
zlib1g-dev libyaml-dev curl libreadline-dev \
build-essential gnupg2 imagemagick libjpeg-dev libpng-dev libtiff-dev \
libwebp-dev libvips tzdata gifsicle tmux nodejs redis-tools && \
rm -rf /var/lib/apt/lists /var/cache/apt/archives
# Install application gems
COPY Gemfile Gemfile.lock ./
RUN bundle install && \
rm -rf ~/.bundle/ "${BUNDLE_PATH}"/ruby/*/cache "${BUNDLE_PATH}"/ruby/*/bundler/gems/*/.git && \
bundle exec bootsnap precompile --gemfile
# Copy application code
COPY . .
# Precompile bootsnap code for faster boot times
RUN bundle exec bootsnap precompile app/ lib/
# Precompiling assets for production without requiring secret RAILS_MASTER_KEY
RUN DISABLE_DATABASE_ENVIRONMENT_CHECK=1 SECRET_KEY_BASE_DUMMY=1 ./bin/rails assets:precompile
# Final stage for app image
FROM base
# Install packages needed for deployment
RUN apt-get update && apt-get -y install --no-install-recommends \
build-essential gnupg2 tar git libssl-dev cron \
zlib1g-dev libyaml-dev libreadline-dev curl \
postgresql-client libpq-dev openssh-client nodejs \
imagemagick libjpeg-dev libpng-dev libtiff-dev python-is-python3 \
libwebp-dev libvips tzdata gifsicle tmux redis-tools acl && \
rm -rf /var/lib/apt/lists /var/cache/apt/archives
ARG NODE_VERSION=20.12.2
ENV PATH=/usr/local/node/bin:$PATH
RUN curl -sL https://github.com/nodenv/node-build/archive/master.tar.gz | tar xz -C /tmp/ && \
/tmp/node-build-master/bin/node-build "${NODE_VERSION}" /usr/local/node && \
npm install -g mjml && \
rm -rf /tmp/node-build-master
# Copy built artifacts: gems, application
COPY --from=build /usr/local/bundle /usr/local/bundle
COPY --from=build /rails /rails
# Cron service
RUN service cron start
RUN bundle exec whenever --set 'environment=production' --user 'root' --update-crontab
# Run and own only the runtime files as a non-root user for security
RUN useradd rails --create-home --shell /bin/bash && \
chown -R rails:rails db log storage tmp
USER rails:rails
# Entrypoint prepares the database.
ENTRYPOINT ["/rails/bin/docker-entrypoint"]
# Start the server by default, this can be overwritten at runtime
EXPOSE 3000
CMD ["./bin/rails", "server"] |
@guilhermecaixeta did you set force_ssl to be false in
as per #1041 (comment)?
Yep, I did. Still returning the same error... |
@guilhermecaixeta it would be great if you could also share the app logs with the error. I believe, as @sasharevzin mentioned, you will find there something like |
@guilhermecaixeta maybe it's a port 80 error. Thruster exposes port 80, and I think that's the new default port in Kamal 2. But you can try to specify port 3000 in the |
That's the point: there is no error in the app. I've started one of the exited Kamal containers; it just takes too long to start (that's another issue), but after some time it starts properly. It looks like the health check is not taking the configured interval into consideration... |
I'll try that also. |
@guilhermecaixeta or you can also add Thruster. You will need some changes in the entrypoint file. |
Would it be possible to create a rake task that could check whether a Rails 7.x app is set up correctly to work with Kamal 2? It could check the Dockerfile, deploy.yml, etc. for consistency.
The latest errors that I got from my pipeline:
2024-10-07T21:38:00.533423809Z => Rails 7.1.4 application starting in production
2024-10-07T21:38:00.533437229Z => Run `bin/rails server --help` for more startup options
2024-10-07T21:38:03.337230298Z [1] Puma starting in cluster mode...
2024-10-07T21:38:03.337372810Z [1] * Puma version: 6.4.3 (ruby 3.3.0-p0) ("The Eagle of Durango")
2024-10-07T21:38:03.337379098Z [1] * Min threads: 5
2024-10-07T21:38:03.337382920Z [1] * Max threads: 5
2024-10-07T21:38:03.337386702Z [1] * Environment: production
2024-10-07T21:38:03.337390328Z [1] * Master PID: 1
2024-10-07T21:38:03.337393882Z [1] * Workers: 5
2024-10-07T21:38:03.337397386Z [1] * Restarts: (✔) hot (✖) phased
2024-10-07T21:38:03.337410732Z [1] * Preloading application
2024-10-07T21:38:03.339236940Z [1] * Listening on http://0.0.0.0:3000
2024-10-07T21:38:03.339925538Z [1] Use Ctrl-C to stop
2024-10-07T21:38:03.372720317Z [1] - Worker 0 (PID: 104) booted in 0.02s, phase: 0
2024-10-07T21:38:03.372833943Z [1] - Worker 1 (PID: 107) booted in 0.02s, phase: 0
2024-10-07T21:38:03.373391481Z [1] - Worker 2 (PID: 112) booted in 0.01s, phase: 0
2024-10-07T21:38:03.373545035Z [1] - Worker 3 (PID: 118) booted in 0.01s, phase: 0
2024-10-07T21:38:03.373688959Z [1] - Worker 4 (PID: 124) booted in 0.0s, phase: 0
2024-10-07T21:38:09.999357642Z I, [2024-10-07T21:38:09.998846 #107] INFO -- : [72ca2793-bb8d-41ed-badc-05a3e3d08349] Started GET "/up" for 172.18.0.2 at 2024-10-07 21:38:09 +0000
2024-10-07T21:38:10.024854868Z I, [2024-10-07T21:38:10.024513 #107] INFO -- : [72ca2793-bb8d-41ed-badc-05a3e3d08349] Processing by Rails::HealthController#show as HTML
2024-10-07T21:38:10.031234764Z I, [2024-10-07T21:38:10.030760 #107] INFO -- : [72ca2793-bb8d-41ed-badc-05a3e3d08349] Completed 200 OK in 6ms (Views: 4.0ms | ActiveRecord: 0.0ms | Allocations: 691)
INFO [a2db0df9] Running docker container ls --all --filter name=^business-manager-web-production-b5d5feea8b183719004184c69ea63c8df7e5ae3f$ --quiet | xargs docker inspect --format '{{json .State.Health}}' on web_deploy
INFO [a2db0df9] Finished in 0.609 seconds with exit status 0 (successful).
ERROR null
INFO [5adbe91b] Running docker container ls --all --filter name=^business-manager-web-production-b5d5feea8b183719004184c69ea63c8df7e5ae3f$ --quiet | xargs docker stop on web_deploy
INFO [5adbe91b] Finished in 1.403 seconds with exit status 0 (successful).
Releasing the deploy lock...
Finished all in 63.1 seconds
ERROR (SSHKit::Command::Failed): Exception while executing on host web_deploy: docker exit status: 1
docker stdout: Nothing written
docker stderr: Error: host settings conflict with another service

TL;DR: If I restart this same container, it works fine. |
@dhh I've had no luck so far. I've updated my system to the Rails 8 beta and tried all the suggestions in this thread and others, and I am consistently getting the same error. I've added more details to basecamp/thruster#42 (comment) as it's looking more like a Thruster issue. |
My bad, so I was using this gem https://github.com/rameerez/allgood to cover more ground with my health checks, and some of my checks defined there were failing, but without a proper error to understand the cause. Everything works after removing it! |
@guilhermecaixeta For others' issues:
|
@guilhermecaixeta Have you tried using Thruster?

Gemfile:
# Add HTTP asset caching/compression and X-Sendfile acceleration to Puma [https://github.com/basecamp/thruster/]
gem "thruster", require: false

Dockerfile:
# Start the server by default, this can be overwritten at runtime
EXPOSE 3000
CMD ["./bin/thrust", "./bin/rails", "server"]

Running on Rails 7, I also had to make sure production.rb contains:
config.force_ssl = false
config.host_authorization = { exclude: ->(request) { request.path == "/up" } }

routes.rb:
# Reveal health status on /up that returns 200 if the app boots with no exceptions, otherwise 500.
# Can be used by load balancers and uptime monitors to verify that the app is live.
get "up" => "rails/health#show", as: :rails_health_check |
@dhh I am facing this same issue while deploying a WordPress site with
Logs of
|
@kishaningithub https://github.com/basecamp/kamal/blob/main/lib/kamal/cli/app/boot.rb#L54C7-L60C9
I think the health check is mandatory to release the deployment lock. |
I fixed this by just adding an endpoint that returns a 200 OK at "/up". I literally spent the whole weekend trying to figure it out, and the solution was so simple. My app is serving on port 80 (Dockerfile: EXPOSE 80) and my deploy.yml proxy settings:
If anyone needs any more info on this, let me know! |
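Since the actual proxy settings were elided above, here is a rough sketch of what a port-80 setup with the /up health check could look like (values are hypothetical, not the commenter's exact config):

```yaml
# deploy.yml -- app listens on port 80 inside the container,
# health check served at /up
proxy:
  host: app.example.com
  app_port: 80
  healthcheck:
    path: /up
    interval: 3
    timeout: 30
```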
@bpiroman for others the issue was not with implementing the endpoint, but with skipping SSL and increasing the timeouts before the health check marks the container as healthy. |
Container health check params must be the default, IMO; people can override them in Kamal if they want to. |
I was able to fix this error on my end by changing production.rb and deploy.yml:
Disabled force_ssl in Rails to make way for Kamal 2 SSL. |
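As a rough sketch (not the commenter's exact diff), the production.rb side of disabling force_ssl while letting kamal-proxy terminate TLS could look like:

```ruby
# config/environments/production.rb
config.assume_ssl = true   # trust X-Forwarded-Proto set by the proxy
config.force_ssl = false   # don't redirect to https inside the app
```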
Use |
Got hung up on this with a Django install. This post got me up and running. The summary is: create a middleware and put it first in the list to bypass the
# <project>/middleware.py
from django.http import HttpResponse

class HealthCheckMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.path == "/up":
            # Or perform any "real" health checking, if needed
            response = HttpResponse("OK")
        else:
            response = self.get_response(request)
        return response |
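For completeness, a sketch of how such a middleware might be registered in settings.py so it runs before Django's SecurityMiddleware https redirect (the module path is a placeholder for your project):

```python
# <project>/settings.py
MIDDLEWARE = [
    "myproject.middleware.HealthCheckMiddleware",  # first, so /up bypasses the https redirect
    "django.middleware.security.SecurityMiddleware",
    # ... the rest of the default middleware stack ...
]
```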
Yeah this is still a problem for other frameworks which don't let you do HTTP/S handling on a per-route basis. For instance, I am trying to host a Next 14 project and running into this problem. |
@dhh Can this be reopened? I feel the default of "/up" for health check is not right and instead the default should come from the HEALTHCHECK defined in the docker image |
I recall us investigating the healthcheck feature in the past and finding it wasn't suitable for our needs, especially around the gap-less deploys. Don't recall the specifics, though. Maybe @djmb remembers why we discarded the healthcheck as an option. Either way, it's not something that's on the menu to change in the short term. Tons of successful deployments in the wild now that are built around using /up as the healthcheck. |
We need some way of simply disabling the health check. Currently we are blocked on a deployment because of a healthcheck being unhealthy, probably because of the HTTPS problem. |
I don't see any path to deployment without a working healthcheck (whether that's /up or HEALTHCHECK inside the container) since that's the mechanism we need to ensure the gap-less deploy. Otherwise you're going to be throwing 500 errors from the proxy for the duration of the app boot, which isn't what anyone wants. |
@djmb Is the reasoning of preferring /up over docker HEALTHCHECK documented somewhere? Would be great to know the train of thought that went behind it. |
@dhh Thanks for explaining how health checks are used for gap-less deployments. That actually makes a lot of sense and inspired me to find a solution for this. Thanks to the post by @scuml, I have this solution for Next.js users struggling to get health checks working on Kamal 2.0. Place this in your middleware. Point your health check at
import { NextRequest, NextResponse } from "next/server";

export function middleware(request: NextRequest) {
  if (request.nextUrl.pathname === "/up") {
    return new Response("OK", { status: 200 });
  }
  return NextResponse.next();
} |
This is what worked for me on Rails 7.2.2, I mean setting
@jjatinggoyal you're the MVP! 😄 Thanks! |
Was definitely what I needed, but I also have other setup steps running in my
This stopped the following from happening again:
|
Yep, I found that as well #1041 (comment) |
I have the same problem with Rails 8. My deploy ends with "docker stderr: Error: target failed to become healthy". The problem is this command:
The JSON does not contain the Health key. My Dockerfile:
and my deploy.yml:
Please help! |
Hi guys, I'm having a similar problem, but I noticed it only occurs when the Cloudflare DNS proxy is on. When proxy mode is disabled, the deploy works. Does anyone have a clue about this? |
Be sure to take a look at your Cloudflare settings from https://developers.cloudflare.com/ssl/origin-configuration/ssl-modes/ and tweak them to match your production SSL settings. |
@pftg I think my settings are correct. The SSL Setting in Cloudflare is set to Full, the production.rb contains assume_ssl and force_ssl set as true, and the proxy config contains ssl: true. Am I missing something? |
I'm back! The problem was not related to Kamal. I was using a firewall in my EC2 that was not allowing the proxy to check if the server was healthy. After some tweaks to the firewall rules, it finally worked. |
Ran into the same problem. To resolve follow these steps:
Or a bit quicker:
In my case it had nothing to do with ssl settings, I needed to open the firewall for the database port. Smooth sailing from then on. |
@easydatawarehousing instructions worked for me, after hours of poking around. The issue was I needed to go into DigitalOcean managed Postgres instance and adjust my "Trusted Sources" so the "Droplet" could access the database. From my DigitalOcean "Droplet" (not even from within the Docker container)
Another peculiar thing that helped me understand what was going on is that
It's been a rough two days with Kamal 2 + Rails 8. I see the vision, though. I think there needs to be a much better handoff between Rails 8, the Kamal 2 proxy, and the Kamal CLI output. In this case I'd hope to see a "Couldn't connect to database" message. |
@pioz make sure your app can start in production. E.g.: set |
I ran into this for a different reason, in case anyone else encounters it. My problem was that my target instance was under-provisioned (DigitalOcean Basic / 512 MB / 1 vCPU). The behavior is that an initial
I noticed it was also quite slow to connect to the target box and/or run commands in an active shell. Running
It's an easy gotcha if you're experimenting with Kamal on the cheapest cloud instance you can find, which is what I was doing 😄 |
Thank you @eric-eye, I faced the same problem today. I could deploy my project, and then suddenly I couldn't, even though I didn't change anything. As you said, everything is slow on the remote machine. I connected to it via SSH, and it took time. What should I do? Do I need to upgrade the DO droplet? It's (DigitalOcean Basic / 1 GB / 1 vCPU). |
I turned my DO machine off and on. Tried |
I'm using Kamal 2 with a fresh config and a fresh Dockerfile.
Logs end with
Dockerfile:
deploy.yml
What am I missing? The devint domain root path serves a blue 404 page, which I assume comes from kamal-proxy.