[Kamal 2] Error: target failed to become healthy #1041

Closed
randohinn opened this issue Oct 3, 2024 · 65 comments

Comments

@randohinn

I'm using Kamal 2 with a fresh config and a fresh Dockerfile.

Logs end with

 "User-Agent" on xxx.xxx.x.xx
 ERROR Failed to boot web on xxx.xxx.x.xx
  INFO First web container is unhealthy on xxx.xxx.x.xx, not booting any other roles
  INFO [c43fba2b] Running docker container ls --all --filter name=^app-web-616499a7f846ac1b9e793e4b65fc0d620975af34_uncommitted_5974b526eb0bface$ --quiet | xargs docker logs --timestamps 2>&1 on xxx.xxx.x.xx
  INFO [c43fba2b] Finished in 0.086 seconds with exit status 0 (successful).
 ERROR 2024-10-03T10:12:09.396377559Z {"time":"2024-10-03T10:12:09.396233974Z","level":"INFO","msg":"Server started","http":":80"}
2024-10-03T10:12:10.030154028Z => Booting Puma
2024-10-03T10:12:10.030170436Z => Rails 7.1.3.2 application starting in devint
2024-10-03T10:12:10.030172811Z => Run `bin/rails server --help` for more startup options
2024-10-03T10:12:10.047581359Z {"time":"2024-10-03T10:12:10.047492864Z","level":"INFO","msg":"Unable to proxy request","path":"/up","error":"dial tcp 127.0.0.1:3000: connect: connection refused"}
2024-10-03T10:12:10.047600740Z {"time":"2024-10-03T10:12:10.047563654Z","level":"INFO","msg":"Request","path":"/up","status":502,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/plain; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:10.586403629Z Puma starting in single mode...
2024-10-03T10:12:10.586417892Z * Puma version: 6.4.2 (ruby 3.3.0-p0) ("The Eagle of Durango")
2024-10-03T10:12:10.586420350Z *  Min threads: 40
2024-10-03T10:12:10.586422283Z *  Max threads: 40
2024-10-03T10:12:10.586424126Z *  Environment: devint
2024-10-03T10:12:10.586426017Z *          PID: 17
2024-10-03T10:12:10.586552488Z * Listening on http://0.0.0.0:3000
2024-10-03T10:12:10.603315796Z Use Ctrl-C to stop
2024-10-03T10:12:11.048120983Z {"time":"2024-10-03T10:12:11.048046409Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":1,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:12.047512910Z {"time":"2024-10-03T10:12:12.04732821Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:13.047846408Z {"time":"2024-10-03T10:12:13.047729937Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:14.047807686Z {"time":"2024-10-03T10:12:14.047593626Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:15.047478119Z {"time":"2024-10-03T10:12:15.047384139Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:16.047969551Z {"time":"2024-10-03T10:12:16.047850513Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:17.047529146Z {"time":"2024-10-03T10:12:17.047391056Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:18.047754441Z {"time":"2024-10-03T10:12:18.047662475Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:19.047525119Z {"time":"2024-10-03T10:12:19.047406314Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:20.047555951Z {"time":"2024-10-03T10:12:20.047457548Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:21.048005701Z {"time":"2024-10-03T10:12:21.047904734Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:22.047418577Z {"time":"2024-10-03T10:12:22.047325291Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:23.047766252Z {"time":"2024-10-03T10:12:23.047654357Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:24.047382298Z {"time":"2024-10-03T10:12:24.047286567Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:25.048162887Z {"time":"2024-10-03T10:12:25.048084503Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:26.047750958Z {"time":"2024-10-03T10:12:26.047666361Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:27.047461131Z {"time":"2024-10-03T10:12:27.04729404Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:28.047998914Z {"time":"2024-10-03T10:12:28.047815193Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:29.047534102Z {"time":"2024-10-03T10:12:29.04739252Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:30.047297465Z {"time":"2024-10-03T10:12:30.047186348Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:31.047697501Z {"time":"2024-10-03T10:12:31.047600294Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:32.047398803Z {"time":"2024-10-03T10:12:32.047280902Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:33.047837629Z {"time":"2024-10-03T10:12:33.047736866Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:34.047391468Z {"time":"2024-10-03T10:12:34.047251666Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:35.047979415Z {"time":"2024-10-03T10:12:35.047800354Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:36.047563982Z {"time":"2024-10-03T10:12:36.047367517Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:37.047533440Z {"time":"2024-10-03T10:12:37.04743844Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-10-03T10:12:38.047773791Z {"time":"2024-10-03T10:12:38.047710488Z","level":"INFO","msg":"Request","path":"/up","status":301,"dur":0,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/html; charset=utf-8","remote_addr":"172.19.0.2:44120","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
  INFO [0d99e151] Running docker container ls --all --filter name=^app-web-616499a7f846ac1b9e793e4b65fc0d620975af34_uncommitted_5974b526eb0bface$ --quiet | xargs docker inspect --format '{{json .State.Health}}' on xxx.xxx.x.xx
  INFO [0d99e151] Finished in 0.079 seconds with exit status 0 (successful).
 ERROR null
  INFO [92917103] Running docker container ls --all --filter name=^app-web-616499a7f846ac1b9e793e4b65fc0d620975af34_uncommitted_5974b526eb0bface$ --quiet | xargs docker stop on xxx.xxx.x.xx
  INFO [92917103] Finished in 12.182 seconds with exit status 0 (successful).
Releasing the deploy lock...
  Finished all in 158.3 seconds
  ERROR (SSHKit::Command::Failed): Exception while executing on host xxx.xxx.x.xx: docker exit status: 1
docker stdout: Nothing written
docker stderr: Error: target failed to become healthy

Dockerfile:

# syntax = docker/dockerfile:1

# Make sure RUBY_VERSION matches the Ruby version in .ruby-version and Gemfile
ARG RUBY_VERSION=3.3.0
FROM ruby:$RUBY_VERSION-slim AS base

# Rails app lives here
WORKDIR /rails

# Set production environment
ENV BUNDLE_DEPLOYMENT="1" \
    BUNDLE_PATH="/usr/local/bundle" \
    #BUNDLE_WITHOUT="development:test" \
    RAILS_ENV="devint"

# Update gems and bundler
RUN gem update --system --no-document && \
    gem install -N bundler


# Throw-away build stage to reduce size of final image
FROM base AS build

# Install packages needed to build gems and node modules
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y build-essential curl default-libmysqlclient-dev libvips node-gyp pkg-config python-is-python3

# Install Node.js
ARG NODE_VERSION=18.15.0
ENV PATH=/usr/local/node/bin:$PATH
RUN curl -sL https://github.com/nodenv/node-build/archive/master.tar.gz | tar xz -C /tmp/ && \
    /tmp/node-build-master/bin/node-build "${NODE_VERSION}" /usr/local/node && \
    rm -rf /tmp/node-build-master

# Install application gems
COPY Gemfile Gemfile.lock ./
RUN bundle install && \
    bundle exec bootsnap precompile --gemfile && \
    rm -rf ~/.bundle/ "${BUNDLE_PATH}"/ruby/*/cache "${BUNDLE_PATH}"/ruby/*/bundler/gems/*/.git

# Install node modules
COPY package.json package-lock.json ./
RUN npm install

# Copy application code
COPY . .

# Precompile bootsnap code for faster boot times
RUN bundle exec bootsnap precompile app/ lib/

# Adjust binfiles to be executable on Linux
RUN chmod +x bin/* && \
    sed -i "s/\r$//g" bin/* && \
    sed -i 's/ruby\.exe$/ruby/' bin/*

# Precompiling assets for production without requiring secret RAILS_MASTER_KEY
RUN SECRET_KEY_BASE_DUMMY=1 ./bin/rails assets:precompile
RUN SECRET_KEY_BASE_DUMMY=1 ./bin/rails tailwindcss:build


# Final stage for app image
FROM base

# Install packages needed for deployment
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y curl default-mysql-client imagemagick libsqlite3-0 libvips && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives

# Copy built artifacts: gems, application
COPY --from=build "${BUNDLE_PATH}" "${BUNDLE_PATH}"
COPY --from=build /rails /rails

# Run and own only the runtime files as a non-root user for security
RUN groupadd --system --gid 1000 rails && \
    useradd rails --uid 1000 --gid 1000 --create-home --shell /bin/bash && \
    chown -R 1000:1000 db log storage tmp
USER 1000:1000

# Entrypoint prepares the database.
ENTRYPOINT ["/rails/bin/docker-entrypoint"]

# Start the server by default, this can be overwritten at runtime
EXPOSE 80
CMD ["bundle", "exec", "thrust", "./bin/rails", "server"]

deploy.yml

# Name of your application. Used to uniquely configure containers.
service: app
# Name of the container image.
image: mycomp/myapp

# Deploy to these servers.
servers:
  web:
    -  xxx.xxx.x.xx
  # job:
  #   hosts:
  #     - 192.168.0.1
  #   cmd: bin/jobs

# Enable SSL auto certification via Let's Encrypt (and allow for multiple apps on one server).
# Set ssl: false if using something like Cloudflare to terminate SSL (but keep host!).
proxy:
  ssl: false
  host: devint.mydomain.ee

registry:
  username: myusername
  password:
    - KAMAL_REGISTRY_PASSWORD

# Configure builder setup.
builder:
  arch: amd64
  context: .

ssh:
  user: rando


# Inject ENV variables into containers (secrets come from .kamal/secrets).
#
# env:
#   clear:
#     DB_HOST: 192.168.0.2
#   secret:
#     - RAILS_MASTER_KEY

env:
  clear:
    DB_HOST:  xxx.xxx.x.xx
    RAILS_LOG_TO_STDOUT: 1
    RUBY_YJIT_ENABLE: 1
    RAILS_SERVE_STATIC_FILES: true
    RAILS_MAX_THREADS: 40
  secret:
    - RAILS_MASTER_KEY
    - MYSQL_ROOT_PASSWORD


asset_path: /rails/public/assets

accessories:
  db:
    image: mariadb:11.4
    host: xxx.xxx.x.xx
    port: 3306
    env:
      clear:
        MYSQL_ROOT_HOST: '%'
      secret:
        - MYSQL_ROOT_PASSWORD
    files:
      #       - config/mysql/production.cnf:/etc/mysql/my.cnf
      - db/production.sql:/docker-entrypoint-initdb.d/setup.sql
    directories:
      - data:/var/lib/mysql

What am I missing? The root path of the devint domain serves a blue 404 page, which I assume comes from kamal-proxy.
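For anyone comparing against their own output: the container log above shows one 502 on /up while Puma was still booting, then a 301 on every later probe, and never a 200. Below is a minimal Ruby sketch for tallying that pattern from `docker logs --timestamps` output. It assumes the JSON log shape seen in the log above; the method name `health_check_statuses` is hypothetical, and the sample lines are abridged copies (fewer JSON fields) of lines from that log.

```ruby
require "json"

# Count the HTTP statuses logged for one path.
# Lines that are not "<timestamp> <json>" (for example Puma's boot banner) are skipped.
def health_check_statuses(log_text, path: "/up")
  counts = Hash.new(0)
  log_text.each_line do |line|
    _timestamp, payload = line.strip.split(" ", 2)
    next unless payload&.start_with?("{")

    begin
      entry = JSON.parse(payload)
    rescue JSON::ParserError
      next
    end
    next unless entry["msg"] == "Request" && entry["path"] == path

    counts[entry["status"]] += 1
  end
  counts
end

# Abridged sample: timestamps and statuses come from the log above,
# the remaining JSON fields were dropped for brevity.
SAMPLE_LOG = <<~LOG
  2024-10-03T10:12:09.396377559Z {"level":"INFO","msg":"Server started","http":":80"}
  2024-10-03T10:12:10.030154028Z => Booting Puma
  2024-10-03T10:12:10.047600740Z {"level":"INFO","msg":"Request","path":"/up","status":502}
  2024-10-03T10:12:11.048120983Z {"level":"INFO","msg":"Request","path":"/up","status":301}
  2024-10-03T10:12:12.047512910Z {"level":"INFO","msg":"Request","path":"/up","status":301}
LOG

p health_check_statuses(SAMPLE_LOG)
```

A tally dominated by 301 suggests the app is up but redirecting the probe, which is a different problem from a wrong port.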

@wcpaez

wcpaez commented Oct 3, 2024

By default, Kamal Proxy checks port 80. Since you are using port 3000, specify the app_port setting in the proxy configuration:

proxy:
  ssl: true
  host: api.abc.com
  app_port: 3000

@dhh dhh closed this as completed Oct 3, 2024
@randohinn
Author

By default, Kamal Proxy checks port 80. Since you are using port 3000, specify the app_port setting in the proxy configuration:

proxy:
  ssl: true
  host: api.abc.com
  app_port: 3000

If you look at the bottom of the Dockerfile, you can see

# Start the server by default, this can be overwritten at runtime
EXPOSE 80
CMD ["bundle", "exec", "thrust", "./bin/rails", "server"]

The container is supposed to be using basecamp/thruster, which by default should expose port 80. Port 3000 is not defined anywhere. Adding your suggestion of app_port makes things worse:

2024-10-03T14:23:01.810967693Z 2024-10-03 14:23:01 +0000 HTTP parse error, malformed request: #<Puma::HttpParserError: Invalid HTTP format, parsing fails. Are you trying to open an SSL connection to a non-SSL Puma?>

Thruster is included in the Gemfile.

@dhh, thruster doesn't need any fancy config, does it?

@kaka-ruto

I have the same issue as @randohinn, using Thruster, though my specific error was:

2024-10-03T13:18:51.587266188Z {"time":"2024-10-03T13:18:51.586672303Z","level":"INFO","msg":"Server started","http":":80"}
2024-10-03T13:18:51.859859612Z {"time":"2024-10-03T13:18:51.859393607Z","level":"INFO","msg":"Unable to proxy request","path":"/up","error":"dial tcp [::1]:3000: connect: connection refused"}

When I tried removing Thruster and using just app_port: 3000, I got the same error as rando.

It might help to mention that I use Cloudflare with SSL mode set to Full, and that I have set ssl: false per the Kamal suggestions I've seen elsewhere.

@dhh dhh reopened this Oct 3, 2024
@dhh
Member

dhh commented Oct 3, 2024

Try to replicate this on a new rails 8 beta app. I've done a million deploys with that without trouble. Then compare the difference between that, if you get it working, and your own app. Then we can hopefully narrow down if there's a bug or a configuration issue. But need a minimal reproducible error to move forward.

@kaka-ruto

Cool, will do. I presume I can't run two Kamal versions side by side in order to keep my staging setup on Kamal 1, yeah? So I'll need to either wipe that or get a new box?

@dhh
Member

dhh commented Oct 3, 2024

Yeah, you can't run both versions on the same box at the same time.

@BarnabeD

BarnabeD commented Oct 4, 2024

I've also encountered this problem.
The problem comes from the “/up” route which must remain reachable without SSL.
You just need to add a configuration line in production.rb.
I actually found this configuration in a fresh rails 8 app.

# production.rb

# Skip http-to-https redirect for the default health check endpoint.
config.ssl_options = { redirect: { exclude: ->(request) { request.path == "/up" } } }
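To make the effect of that lambda concrete, here is a small standalone Ruby sketch that runs the same predicate against a stand-in request object. `FakeRequest` and `SKIP_REDIRECT` are hypothetical names used only for illustration; in a real app Rails passes its own request object, and the option only matters when config.force_ssl is enabled.

```ruby
# Stand-in for the request object Rails passes to the lambda; only `path` is needed here.
FakeRequest = Struct.new(:path)

# Same predicate as in the config line above.
SKIP_REDIRECT = ->(request) { request.path == "/up" }

puts SKIP_REDIRECT.call(FakeRequest.new("/up"))    # prints true: health check is served over plain HTTP
puts SKIP_REDIRECT.call(FakeRequest.new("/login")) # prints false: still redirected to HTTPS
```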

@brendonrogers

Expanding on previous comment - see https://nts.strzibny.name/upgrading-to-kamal-2/ - try config.host_authorization = { exclude: ->(request) { request.path == "/up" } } (I didn't end up requiring this, YMMV).

Also do you have config.force_ssl = false?

@pftg

pftg commented Oct 4, 2024

Expanding on previous comment - see https://nts.strzibny.name/upgrading-to-kamal-2/ - try config.host_authorization = { exclude: ->(request) { request.path == "/up" } } (I didn't end up requiring this, YMMV).

Also do you have config.force_ssl = false?

Thanks, @bergatron!

NOTE: And do not forget to deploy those changes before the upgrade!

@i2chris

i2chris commented Oct 4, 2024

On a fresh Hertz instance with a Rails 7.2.1 application and Kamal 2, I had to set config.force_ssl = false to get the healthcheck working.

I've deployed a new Rails 8 app onto it and I didn't need to set that. I'm not sure why.

@marckohlbrugge

FWIW, I ran into a similar issue (/up returning a 301) when using Rack middleware to ensure a canonical hostname. This happens because the health check isn't requested through that hostname, so Rails tries to redirect the request.

The solution then is to skip this middleware for the health check endpoint (typically /up) or just get rid of the middleware altogether.
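A minimal sketch of the "skip the middleware for the health check" option, written as plain Rack-style Ruby with no gems. The class `CanonicalHost`, its `skip_paths` option, and the host `example.com` are all hypothetical; an existing canonical-host gem may offer its own way to exclude paths, and this version drops the query string for brevity.

```ruby
# Hypothetical canonical-host middleware that leaves the health check alone.
class CanonicalHost
  def initialize(app, host:, skip_paths: ["/up"])
    @app = app
    @host = host
    @skip_paths = skip_paths
  end

  def call(env)
    # Health checks arrive via the container address, not the public
    # hostname, so they are passed straight through without a redirect.
    return @app.call(env) if @skip_paths.include?(env["PATH_INFO"])
    return @app.call(env) if env["HTTP_HOST"] == @host

    [301, { "location" => "https://#{@host}#{env["PATH_INFO"]}" }, []]
  end
end

# Tiny inner app standing in for Rails.
INNER_APP = ->(_env) { [200, { "content-type" => "text/plain" }, ["OK"]] }
STACK = CanonicalHost.new(INNER_APP, host: "example.com")

status, = STACK.call({ "PATH_INFO" => "/up", "HTTP_HOST" => "172.18.0.5:80" })
puts status # prints 200: the health check is not redirected

status, = STACK.call({ "PATH_INFO" => "/", "HTTP_HOST" => "172.18.0.5:80" })
puts status # prints 301: other requests on the wrong host still are
```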

@guilhermecaixeta

I'm facing the same issue using Rails 7.1.3 and Kamal 2.1.0; it just stopped working after migrating from version 1 to 2.

The logs:

 INFO [1ade11b4] Running docker run --detach --restart unless-stopped --name container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36 --network kamal --hostname web_deploy-5fbe94df2c1a -e KAMAL_CONTAINER_NAME="container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36" -e KAMAL_VERSION="b5bbe86b667959d01a09f5ae528df1a7cd78dw36" --env RAILS_MAX_THREADS="5" --env WEB_CONCURRENCY="5" --env APPLICATION_HOST="web" --env RACK_ENV="production" --env RAILS_ENV="production" --env RAILS_SERVE_STATIC_FILES="true" --env-file .kamal/apps/container-name/env/roles/web.env --log-opt max-size="10m" --volume $(pwd)/.kamal/apps/container-name/assets/volumes/web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36:/rails/public/assets --label service="container-name" --label role="web" --label destination destination/container-name:b5bbe86b667959d01a09f5ae528df1a7cd78dw36 on web_deploy
  INFO [1ade11b4] Finished in 0.624 seconds with exit status 0 (successful).
  INFO [0638ab65] Running docker container ls --all --filter name=^container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36$ --quiet on web_deploy
  INFO [0638ab65] Finished in 0.136 seconds with exit status 0 (successful).
  INFO [e452cf6d] Running docker exec kamal-proxy kamal-proxy deploy container-name-web --target="d4c42b8aa2ba:3000" --host="my-host.com" --tls --deploy-timeout="30s" --drain-timeout="30s" --health-check-interval="7s" --health-check-timeout="49s" --health-check-path="/up" --buffer-requests --buffer-responses --log-request-header="Cache-Control" --log-request-header="Last-Modified" --log-request-header="User-Agent" on web_deploy
 ERROR Failed to boot web on web_deploy
  INFO First web container is unhealthy on web_deploy, not booting any other roles
  INFO [5af2ac24] Running docker container ls --all --filter name=^container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36$ --quiet | xargs docker logs --timestamps 2>&1 on web_deploy
  INFO [5af2ac24] Finished in 0.171 seconds with exit status 0 (successful).
 ERROR 2024-10-07T18:35:40.730291457Z == Preparing DB ==
2024-10-07T18:36:00.706423379Z == Setup authorization ==
2024-10-07T18:36:00.706484443Z == Adding roles ==
2024-10-07T18:36:00.706494663Z == Checking roles ==
2024-10-07T18:36:00.706504077Z == No new role was found ==
2024-10-07T18:36:00.706513000Z == Done ==
2024-10-07T18:36:00.706521316Z == Adding permissions ==
2024-10-07T18:36:00.706529314Z == Checking permissions ==
2024-10-07T18:36:00.706537506Z == No new permissions was found ==
2024-10-07T18:36:00.706545706Z == Adding attaching permissions to roles ==
2024-10-07T18:36:00.706553889Z == Attaching permissions to roles ==
2024-10-07T18:36:00.706561985Z == Done ==
  INFO [8d453758] Running docker container ls --all --filter name=^container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36$ --quiet | xargs docker inspect --format '{{json .State.Health}}' on web_deploy
  INFO [8d453758] Finished in 0.164 seconds with exit status 0 (successful).
 ERROR null
  INFO [74631586] Running docker container ls --all --filter name=^container-name-web-b5bbe86b667959d01a09f5ae528df1a7cd78dw36$ --quiet | xargs docker stop on web_deploy
  INFO [74631586] Finished in 10.531 seconds with exit status 0 (successful).
Releasing the deploy lock...
  Finished all in 60.3 seconds
  ERROR (SSHKit::Command::Failed): Exception while executing on host web_deploy: docker exit status: 1
docker stdout: Nothing written
docker stderr: Error: target failed to become healthy

Deploy.yml

# Name of your application. Used to uniquely configure containers.
service: my-service

image: KAMAL_REGISTRY_USER/my-service

servers:
  web:
    hosts:
      - web_deploy 

  workers:
    hosts:
      - web_deploy
    cmd: "sidekiq"
    labels:
      workers.service: sidekiq
    
# Credentials for your image host.
registry:
  username: KAMAL_REGISTRY_USER
  password:
    - KAMAL_REGISTRY_PASSWORD

env:
  clear:
    RAILS_MAX_THREADS: 5
    WEB_CONCURRENCY: 5
    APPLICATION_HOST: web
    RACK_ENV: production
    RAILS_ENV: production
    RAILS_SERVE_STATIC_FILES: true
  secret:
    - Secrets

ssh:
  user: USER_HOST
  port: SSH_PORT

# Configure builder setup.
builder:
  arch: amd64
  remote: ssh://USER_HOST@WEB_HOST:SSH_PORT
  cache:
    options: --no-cache
  args:
    GIT_REV: <%= `git rev-parse --short HEAD` %>
    BUILD_DATE: <%= `date -u +"%Y-%m-%dT%H:%M:%S %Z"` %>

accessories:
  db:
    image: postgres:16
    host: accessories_deploy
    port: 5432
    env:
      secret:
        - POSTGRES_DATABASE 
        - POSTGRES_USER
        - POSTGRES_PASSWORD
    directories:
      - data/postgres:/var/lib/postgresql/data
    files:
      - infrastructure/postgres/postgresql.conf:/usr/local/share/postgresql/postgresql.conf.sample    

  redis:
      image: redis:7-alpine
      host: accessories_deploy
      port: 6379
      directories:
        - data/redis:/data
      files:
        - infrastructure/redis/redis.conf:/etc/redis/redis.conf
        - infrastructure/redis/redis-sysctl.conf:/etc/sysctl.conf
      cmd: redis-server /etc/redis/redis.conf

proxy:
  ssl: true
  host: my-host.com
  app_port: 3000
  healthcheck:
    path: /up
    interval: 7
    timeout: 49
  
asset_path: /rails/public/assets

primary_role: web

Dockerfile:

# syntax = docker/dockerfile:1

# Make sure RUBY_VERSION matches the Ruby version in .ruby-version and Gemfile
ARG RUBY_VERSION=3.3.0
FROM registry.docker.com/library/ruby:$RUBY_VERSION-slim as base

# Rails app lives here
WORKDIR /rails

# Set production environment
ENV RAILS_ENV="production" \
    BUNDLE_DEPLOYMENT="1" \
    BUNDLE_PATH="/usr/local/bundle" \
    BUNDLE_WITHOUT="development"


# Throw-away build stage to reduce size of final image
FROM base as build

# Install packages needed to build gems
RUN apt-get update && apt-get -y install --no-install-recommends \
    postgresql-client libpq-dev tar git libssl-dev cron \
    zlib1g-dev libyaml-dev curl libreadline-dev \
    build-essential gnupg2 imagemagick libjpeg-dev libpng-dev libtiff-dev \
    libwebp-dev libvips tzdata gifsicle tmux nodejs redis-tools && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives

# Install application gems
COPY Gemfile Gemfile.lock ./
RUN bundle install && \
    rm -rf ~/.bundle/ "${BUNDLE_PATH}"/ruby/*/cache "${BUNDLE_PATH}"/ruby/*/bundler/gems/*/.git && \
    bundle exec bootsnap precompile --gemfile

# Copy application code
COPY . .

# Precompile bootsnap code for faster boot times
RUN bundle exec bootsnap precompile app/ lib/

# Precompiling assets for production without requiring secret RAILS_MASTER_KEY
RUN DISABLE_DATABASE_ENVIRONMENT_CHECK=1 SECRET_KEY_BASE_DUMMY=1 ./bin/rails assets:precompile

# Final stage for app image
FROM base

# Install packages needed for deployment
RUN apt-get update && apt-get -y install --no-install-recommends \
    build-essential gnupg2 tar git libssl-dev cron \
    zlib1g-dev libyaml-dev libreadline-dev curl \
    postgresql-client libpq-dev openssh-client nodejs \
    imagemagick libjpeg-dev libpng-dev libtiff-dev python-is-python3 \
    libwebp-dev libvips tzdata gifsicle tmux redis-tools acl && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives

ARG NODE_VERSION=20.12.2
ENV PATH=/usr/local/node/bin:$PATH
RUN curl -sL https://github.com/nodenv/node-build/archive/master.tar.gz | tar xz -C /tmp/ && \
    /tmp/node-build-master/bin/node-build "${NODE_VERSION}" /usr/local/node && \
    npm install -g mjml && \
    rm -rf /tmp/node-build-master     

# Copy built artifacts: gems, application
COPY --from=build /usr/local/bundle /usr/local/bundle
COPY --from=build /rails /rails

# Cron service
RUN service cron start
RUN bundle exec whenever --set 'environment=production' --user 'root' --update-crontab

# Run and own only the runtime files as a non-root user for security
RUN useradd rails --create-home --shell /bin/bash && \
    chown -R rails:rails db log storage tmp
USER rails:rails

# Entrypoint prepares the database.
ENTRYPOINT ["/rails/bin/docker-entrypoint"]

# Start the server by default, this can be overwritten at runtime
EXPOSE 3000
CMD ["./bin/rails", "server"]

@sasharevzin

@guilhermecaixeta did you set force_ssl to false, as per #1041 (comment)?

@guilhermecaixeta

@guilhermecaixeta did you set force_ssl to be false in as per #1041 (comment)

Yep, I did. Still returning the same error...

@pftg

pftg commented Oct 7, 2024

@guilhermecaixeta it would be great if you could share the app logs with the error, too. I believe, as @sasharevzin mentioned, you will find something like this there: Invalid HTTP format, parsing fails. Are you trying to open an SSL connection to a non-SSL Puma?

@BarnabeD

BarnabeD commented Oct 7, 2024

@guilhermecaixeta maybe it's a port 80 error. Thruster exposes port 80, and I think that's the new default port in Kamal 2. But you can try specifying port 3000 in deploy.yml.

@guilhermecaixeta

@guilhermecaixeta would be great if you would share app logs, too with error. I believe as @sasharevzin mentioned you will find there something like Invalid HTTP format, parsing fails. Are you trying to open an SSL connection to a non-SSL Puma?

That is the point: there is no error in the app. I started one of the containers that Kamal had exited; it just takes too long to start (that is a separate issue), but after some time it starts properly. It looks like the health check is not taking the configured interval into consideration...
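A rough back-of-the-envelope check of the timing, using only values from the deploy log above: the kamal-proxy command was run with --deploy-timeout="30s" and --health-check-interval="7s", and the entrypoint output spans from 18:35:40 to 18:36:00. The sketch below assumes, as a simplification, that the deploy timeout starts counting at about the time the container starts and that probes fire once per interval; the helper `probes_after_boot` is hypothetical and is not part of Kamal.

```ruby
require "time"

# Values taken from the kamal-proxy deploy command in the log above.
DEPLOY_TIMEOUT = 30        # --deploy-timeout="30s"
HEALTH_CHECK_INTERVAL = 7  # --health-check-interval="7s"

# First and last entrypoint log lines from the same deploy log.
ENTRYPOINT_STARTED  = Time.iso8601("2024-10-07T18:35:40.730291457Z")
ENTRYPOINT_FINISHED = Time.iso8601("2024-10-07T18:36:00.706561985Z")
ENTRYPOINT_SECONDS  = ENTRYPOINT_FINISHED - ENTRYPOINT_STARTED

# How many probes still fit between "app finished booting" and the deploy timeout.
def probes_after_boot(deploy_timeout:, boot_seconds:, interval:)
  remaining = deploy_timeout - boot_seconds
  return 0 if remaining <= 0

  (remaining / interval).floor
end

puts format("entrypoint took about %.1f s", ENTRYPOINT_SECONDS)
puts probes_after_boot(
  deploy_timeout: DEPLOY_TIMEOUT,
  boot_seconds: ENTRYPOINT_SECONDS,
  interval: HEALTH_CHECK_INTERVAL
)
```

Under these assumptions the entrypoint alone uses roughly two thirds of the 30 second window before Rails even begins to boot, so a slow start may hit the deploy timeout regardless of the health check interval and timeout values.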

@guilhermecaixeta

@guilhermecaixeta maybe it's a port 80 error. Thruster expose the port 80 and I think it's the new default port in kamal 2. But you can try to specify port 3000 in the deploy.yml

I'll try that also.

@BarnabeD

BarnabeD commented Oct 7, 2024

@guilhermecaixeta or you can also add thruster. You will need some change in the entrypoint file.

@i2chris

i2chris commented Oct 7, 2024

Would it be possible to create a rake task that could check whether a Rails 7.x app is set up correctly to work with Kamal 2? It could check the Dockerfile, deploy.yml, etc. for consistency.

@guilhermecaixeta

add thruster

The latest errors that I got from my pipeline

2024-10-07T21:38:00.533423809Z => Rails 7.1.4 application starting in production 
2024-10-07T21:38:00.533437229Z => Run `bin/rails server --help` for more startup options
2024-10-07T21:38:03.337230298Z [1] Puma starting in cluster mode...
2024-10-07T21:38:03.337372810Z [1] * Puma version: 6.4.3 (ruby 3.3.0-p0) ("The Eagle of Durango")
2024-10-07T21:38:03.337379098Z [1] *  Min threads: 5
2024-10-07T21:38:03.337382920Z [1] *  Max threads: 5
2024-10-07T21:38:03.337386702Z [1] *  Environment: production
2024-10-07T21:38:03.337390328Z [1] *   Master PID: 1
2024-10-07T21:38:03.337393882Z [1] *      Workers: 5
2024-10-07T21:38:03.337397386Z [1] *     Restarts: (✔) hot (✖) phased
2024-10-07T21:38:03.337410732Z [1] * Preloading application
2024-10-07T21:38:03.339236940Z [1] * Listening on http://0.0.0.0:3000
2024-10-07T21:38:03.339925538Z [1] Use Ctrl-C to stop
2024-10-07T21:38:03.372720317Z [1] - Worker 0 (PID: 104) booted in 0.02s, phase: 0
2024-10-07T21:38:03.372833943Z [1] - Worker 1 (PID: 107) booted in 0.02s, phase: 0
2024-10-07T21:38:03.373391481Z [1] - Worker 2 (PID: 112) booted in 0.01s, phase: 0
2024-10-07T21:38:03.373545035Z [1] - Worker 3 (PID: 118) booted in 0.01s, phase: 0
2024-10-07T21:38:03.373688959Z [1] - Worker 4 (PID: 124) booted in 0.0s, phase: 0
2024-10-07T21:38:09.999357642Z I, [2024-10-07T21:38:09.998846 #107]  INFO -- : [72ca2793-bb8d-41ed-badc-05a3e3d08349] Started GET "/up" for 172.18.0.2 at 2024-10-07 21:38:09 +0000
2024-10-07T21:38:10.024854868Z I, [2024-10-07T21:38:10.024513 #107]  INFO -- : [72ca2793-bb8d-41ed-badc-05a3e3d08349] Processing by Rails::HealthController#show as HTML
2024-10-07T21:38:10.031234764Z I, [2024-10-07T21:38:10.030760 #107]  INFO -- : [72ca2793-bb8d-41ed-badc-05a3e3d08349] Completed 200 OK in 6ms (Views: 4.0ms | ActiveRecord: 0.0ms | Allocations: 691)
  INFO [a2db0df9] Running docker container ls --all --filter name=^business-manager-web-production-b5d5feea8b183719004184c69ea63c8df7e5ae3f$ --quiet | xargs docker inspect --format '{{json .State.Health}}' on web_deploy
  INFO [a2db0df9] Finished in 0.609 seconds with exit status 0 (successful).
 ERROR null
  INFO [5adbe91b] Running docker container ls --all --filter name=^business-manager-web-production-b5d5feea8b183719004184c69ea63c8df7e5ae3f$ --quiet | xargs docker stop on web_deploy
  INFO [5adbe91b] Finished in 1.403 seconds with exit status 0 (successful).
Releasing the deploy lock...
  Finished all in 63.1 seconds
  ERROR (SSHKit::Command::Failed): Exception while executing on host web_deploy: docker exit status: 1
docker stdout: Nothing written
docker stderr: Error: host settings conflict with another service

TL;DR: if I restart this same container, it works fine.

@kaka-ruto

@dhh I've had no luck so far. I've updated my system to Rails 8 beta and tried all the suggestions in this thread and others, and I am consistently getting the same error. I've added more details to basecamp/thruster#42 (comment), as it's looking more like a thruster issue.

@kaka-ruto

My bad, so I was using this gem https://github.com/rameerez/allgood to cover more ground with my health checks, and some of my checks defined there were failing, but without a proper error to understand the cause.

Everything works after removing it!

@BarnabeD

BarnabeD commented Oct 8, 2024

@guilhermecaixeta
It looks like you have a new error message: host settings conflict.
I'm not sure, but maybe you have several deployments with similar configurations, and some containers from earlier attempts are still running.
You should check which app containers exist on your server with kamal app containers and stop the ones left over from old attempts.

For others' issues:

  • At some point, with Kamal 2.1, I encountered a Kamal proxy error. Kamal Proxy needed a restart. Maybe you can try that.
  • Launch a console on the server to check whether you have a database issue, such as no database connection or pending migrations.

@capripot

capripot commented Oct 8, 2024

@guilhermecaixeta Have you tried using thruster gem?

Gemfile

# Add HTTP asset caching/compression and X-Sendfile acceleration to Puma [https://github.com/basecamp/thruster/]
gem "thruster", require: false

Dockerfile

# Start the server by default, this can be overwritten at runtime
EXPOSE 3000
CMD ["./bin/thrust", "./bin/rails", "server"]

Running on Rails 7, I also had to make sure force_ssl was off, and /up route was properly configured.

production.rb

  config.force_ssl = false
  config.host_authorization = { exclude: ->(request) { request.path == "/up" } }

routes.rb

  # Reveal health status on /up that returns 200 if the app boots with no exceptions, otherwise 500.
  # Can be used by load balancers and uptime monitors to verify that the app is live.
  get "up" => "rails/health#show", as: :rails_health_check
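
The settings above can be checked by hand before deploying. What follows is a minimal Python sketch, not part of Kamal: it sends one plain-HTTP request to /up without following redirects and reports what came back. It assumes that only a 2xx response counts as healthy (an assumption, not confirmed in this thread); the names probe and classify and the address in the usage comment are hypothetical.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def probe(url: str, timeout: float = 5.0) -> Tuple[int, Optional[str]]:
    """Send one plain-HTTP GET; return (status, Location header) without following redirects."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.hostname, parts.port or 80, timeout=timeout)
    try:
        conn.request("GET", parts.path or "/")
        response = conn.getresponse()
        response.read()  # drain the body before closing the connection
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def classify(status: int, location: Optional[str]) -> str:
    """Interpret a probe result. Assumption: only 2xx counts as healthy."""
    if 200 <= status < 300:
        return "healthy"
    if 300 <= status < 400:
        return f"redirect to {location} (check force_ssl / assume_ssl)"
    return f"unhealthy (HTTP {status})"


# Usage (hypothetical address, adjust to where the app listens):
#   print(classify(*probe("http://127.0.0.1:3000/up")))
```

A redirect to an https:// URL here would suggest that the app is still forcing SSL on the health check path.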

@kishaningithub

kishaningithub commented Oct 8, 2024

@dhh I am facing this same issue while deploying a wordpress site with proxy: false

Logs of kamal deploy

As you can see below, the proxy looks alive and is continuously pinging the /up endpoint. This is incorrect behaviour when proxy: false is set.

2024-10-08T06:27:20.226839066Z 172.18.0.3 - - [08/Oct/2024:06:27:20 +0000] "GET /up HTTP/1.1" 404 35861 "http://9f3ecb8ffd88:80/up" "Go-http-client/1.1"
2024-10-08T06:27:21.039613333Z [Tue Oct 08 06:27:21.038484 2024] [php:warn] [pid 22:tid 22] [client 172.18.0.3:55608] PHP Warning:  Attempt to read property "is_valid" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 27
2024-10-08T06:27:21.039664440Z [Tue Oct 08 06:27:21.038538 2024] [php:warn] [pid 22:tid 22] [client 172.18.0.3:55608] PHP Warning:  Attempt to read property "support_renew_link" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 31
2024-10-08T06:27:21.078548953Z 172.18.0.3 - - [08/Oct/2024:06:27:21 +0000] "GET /up HTTP/1.1" 301 385 "-" "Go-http-client/1.1"
2024-10-08T06:27:21.087763451Z [Tue Oct 08 06:27:21.087639 2024] [php:warn] [pid 22:tid 22] [client 172.18.0.3:55608] PHP Warning:  Attempt to read property "is_valid" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 27, referer: http://9f3ecb8ffd88:80/up
2024-10-08T06:27:21.087906525Z [Tue Oct 08 06:27:21.087851 2024] [php:warn] [pid 22:tid 22] [client 172.18.0.3:55608] PHP Warning:  Attempt to read property "support_renew_link" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 31, referer: http://9f3ecb8ffd88:80/up
2024-10-08T06:27:21.209349936Z 172.18.0.3 - - [08/Oct/2024:06:27:21 +0000] "GET /up HTTP/1.1" 404 35861 "http://9f3ecb8ffd88:80/up" "Go-http-client/1.1"
2024-10-08T06:27:22.037620774Z [Tue Oct 08 06:27:22.036877 2024] [php:warn] [pid 23:tid 23] [client 172.18.0.3:55610] PHP Warning:  Attempt to read property "is_valid" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 27
2024-10-08T06:27:22.037676742Z [Tue Oct 08 06:27:22.036916 2024] [php:warn] [pid 23:tid 23] [client 172.18.0.3:55610] PHP Warning:  Attempt to read property "support_renew_link" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 31
2024-10-08T06:27:22.072997578Z 172.18.0.3 - - [08/Oct/2024:06:27:22 +0000] "GET /up HTTP/1.1" 301 385 "-" "Go-http-client/1.1"
2024-10-08T06:27:22.082678689Z [Tue Oct 08 06:27:22.082553 2024] [php:warn] [pid 23:tid 23] [client 172.18.0.3:55610] PHP Warning:  Attempt to read property "is_valid" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 27, referer: http://9f3ecb8ffd88:80/up
2024-10-08T06:27:22.082820318Z [Tue Oct 08 06:27:22.082769 2024] [php:warn] [pid 23:tid 23] [client 172.18.0.3:55610] PHP Warning:  Attempt to read property "support_renew_link" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 31, referer: http://9f3ecb8ffd88:80/up
2024-10-08T06:27:22.198417534Z 172.18.0.3 - - [08/Oct/2024:06:27:22 +0000] "GET /up HTTP/1.1" 404 35861 "http://9f3ecb8ffd88:80/up" "Go-http-client/1.1"
2024-10-08T06:27:23.038643181Z [Tue Oct 08 06:27:23.038086 2024] [php:warn] [pid 24:tid 24] [client 172.18.0.3:55612] PHP Warning:  Attempt to read property "is_valid" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 27
2024-10-08T06:27:23.038710088Z [Tue Oct 08 06:27:23.038136 2024] [php:warn] [pid 24:tid 24] [client 172.18.0.3:55612] PHP Warning:  Attempt to read property "support_renew_link" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 31
2024-10-08T06:27:23.079937251Z 172.18.0.3 - - [08/Oct/2024:06:27:23 +0000] "GET /up HTTP/1.1" 301 385 "-" "Go-http-client/1.1"
2024-10-08T06:27:23.088756298Z [Tue Oct 08 06:27:23.088657 2024] [php:warn] [pid 24:tid 24] [client 172.18.0.3:55612] PHP Warning:  Attempt to read property "is_valid" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 27, referer: http://9f3ecb8ffd88:80/up
2024-10-08T06:27:23.089577211Z [Tue Oct 08 06:27:23.088819 2024] [php:warn] [pid 24:tid 24] [client 172.18.0.3:55612] PHP Warning:  Attempt to read property "support_renew_link" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 31, referer: http://9f3ecb8ffd88:80/up
2024-10-08T06:27:23.201792327Z 172.18.0.3 - - [08/Oct/2024:06:27:23 +0000] "GET /up HTTP/1.1" 404 35861 "http://9f3ecb8ffd88:80/up" "Go-http-client/1.1"
2024-10-08T06:27:24.057137478Z [Tue Oct 08 06:27:24.056870 2024] [php:warn] [pid 19:tid 19] [client 172.18.0.3:55626] PHP Warning:  Attempt to read property "is_valid" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 27
2024-10-08T06:27:24.058564102Z [Tue Oct 08 06:27:24.057091 2024] [php:warn] [pid 19:tid 19] [client 172.18.0.3:55626] PHP Warning:  Attempt to read property "support_renew_link" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 31
2024-10-08T06:27:24.113243837Z 172.18.0.3 - - [08/Oct/2024:06:27:24 +0000] "GET /up HTTP/1.1" 301 385 "-" "Go-http-client/1.1"
2024-10-08T06:27:24.132089890Z [Tue Oct 08 06:27:24.131690 2024] [php:warn] [pid 19:tid 19] [client 172.18.0.3:55626] PHP Warning:  Attempt to read property "is_valid" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 27, referer: http://9f3ecb8ffd88:80/up
2024-10-08T06:27:24.132457011Z [Tue Oct 08 06:27:24.132208 2024] [php:warn] [pid 19:tid 19] [client 172.18.0.3:55626] PHP Warning:  Attempt to read property "support_renew_link" on null in /var/www/html/wp-content/plugins/luique-plugin/admin/dashboard-theme-activation.php on line 31, referer: http://9f3ecb8ffd88:80/up
  INFO [8c74c232] Running docker container ls --all --filter name=^wordtrial-web-5d4897ade85123848c0171c695f868f073dd7dd0$ --quiet | xargs docker inspect --format '{{json .State.Health}}' on 139.59.18.46
  INFO [8c74c232] Finished in 0.103 seconds with exit status 0 (successful).
 ERROR null
  INFO [a3309027] Running docker container ls --all --filter name=^wordtrial-web-5d4897ade85123848c0171c695f868f073dd7dd0$ --quiet | xargs docker stop on 139.59.18.46
  INFO [a3309027] Finished in 1.374 seconds with exit status 0 (successful).
Releasing the deploy lock...
  Finished all in 85.1 seconds
  ERROR (SSHKit::Command::Failed): Exception while executing on host 139.59.18.46: docker exit status: 1
docker stdout: Nothing written
docker stderr: Error: target failed to become healthy

Kamal config file

service: wordtrial

image: kishaningithub/wordtrial

servers:
  web:
    - 139.59.18.46

proxy: false

registry:
  server: ghcr.io
  username: kishaningithub
  password:
    - KAMAL_REGISTRY_PASSWORD

builder:
  arch: amd64

env:
  clear:
    WORDPRESS_DB_HOST: wordtrial-db:3306
    WORDPRESS_DB_USER: wordpress
    WORDPRESS_DB_NAME: wordpress
  secret:
    - WORDPRESS_DB_PASSWORD

volumes:
  - wordpress_data:/var/www/html

accessories:
  db:
    image: mariadb:11.4
    host: 139.59.18.46
    port: "127.0.0.1:3306:3306"
    options:
      restart: unless-stopped
    env:
      clear:
        MARIADB_DATABASE: wordpress
        MARIADB_USER: wordpress
      secret:
        - MARIADB_ROOT_PASSWORD
        - MARIADB_PASSWORD
    directories:
      - data:/var/lib/mysql

Docker file

FROM wordpress:6.6.2-apache

# https://www.aptgetlife.co.uk/docker-wordpress-increase-php-max-file-size/
COPY ./config/wordpress/uploads.ini /usr/local/etc/php/conf.d/

@BarnabeD

BarnabeD commented Oct 8, 2024

@kishaningithub
I think the app boot sequence needs a health check anyway, even with kamal-proxy off.

https://github.com/basecamp/kamal/blob/main/lib/kamal/cli/app/boot.rb#L54C7-L60C9

I think the health check is mandatory to release the deployment's lock.
It's a conservative approach, if the new app can't prove it's alive, the container is stopped.
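
The "prove it's alive or get stopped" behaviour described above can be sketched as a polling loop with a deadline. This is an illustration in Python, not Kamal's actual implementation (which is Ruby, linked above); the function name, its parameters and the 30-second default are assumptions for the example.

```python
import time
from typing import Callable


def wait_until_healthy(
    check: Callable[[], bool],
    timeout: float = 30.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `check` until it returns True or `timeout` seconds have passed.

    Returns True as soon as the check passes. Returns False when the next
    attempt would fall after the deadline; the caller would then stop the
    new container and keep the old one serving.
    """
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() + interval > deadline:
            return False
        sleep(interval)
```

The clock and sleep parameters are injectable only so the loop can be exercised without real waiting.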

@bpiroman

I fixed this by just adding an endpoint that returns a 200 ok for "/up" endpoint. Literally spent the whole weekend trying to figure it out and the solution was so simple.

My app is serving on port 80, Dockerfile EXPOSE 80 and my deploy.yml proxy settings:
proxy:
  ssl: true
  host: app.yourhostname.com

If anyone needs any more info on this, let me know!

@pftg

pftg commented Oct 13, 2024

@bpiroman the others here did not have an issue with implementing the endpoint; what fixed it for them was skipping SSL and increasing the timeouts so that the health check has time to mark the container as healthy.

@kishaningithub

The container's health check params must be the default, IMO. People can override them in Kamal if they want to.

@jasonmag

I was able to fix this error on my end.

production.rb
config.force_ssl = false

deploy.yml

proxy:
  ssl: true
  host: example.com
  # kamal-proxy connects to your container over port 80, use `app_port` to specify a different port.
  app_port: 3000

Disabled force_ssl from Rails to make way for Kamal2 SSL.

@dhh
Member

dhh commented Oct 15, 2024

Use config.force_ssl together with config.assume_ssl. Then Rails will still act as though you're on SSL, which is important for setting secure cookies etc. This is the default for Rails 8+.

@dhh dhh closed this as completed Oct 15, 2024
@scuml

scuml commented Oct 22, 2024

Got hung up on this with a Django install. This post got me up and running.
https://www.coryzue.com/writing/kamal-django/

The summary: create a middleware and put it first in the list to bypass the ALLOWED_HOSTS check and HTTPS redirects.

# <project>/middleware.py

from django.http import HttpResponse

class HealthCheckMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        if request.path == "/up":
            # Or perform any "real" health checking, if needed
            response = HttpResponse("OK")
        else:
            response = self.get_response(request)

        return response
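
"Put it first in the list" refers to Django's MIDDLEWARE setting. A sketch of the settings fragment, assuming the project package is called myproject (a hypothetical name; use your own dotted path). Only the first entry is new, the others stand in for whatever your project already lists:

```python
# <project>/settings.py (fragment)

MIDDLEWARE = [
    # Must come first so /up is answered before the HTTPS redirect and the
    # ALLOWED_HOSTS check run. "myproject" is a hypothetical package name.
    "myproject.middleware.HealthCheckMiddleware",
    "django.middleware.security.SecurityMiddleware",
    "django.middleware.common.CommonMiddleware",
    # ... the rest of your existing middleware, unchanged ...
]
```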

@SpBills

SpBills commented Oct 24, 2024

Yeah this is still a problem for other frameworks which don't let you do HTTP/S handling on a per-route basis. For instance, I am trying to host a Next 14 project and running into this problem.

@kishaningithub

kishaningithub commented Oct 24, 2024

@dhh Can this be reopened? I feel the default of "/up" for health check is not right and instead the default should come from the HEALTHCHECK defined in the docker image

https://docs.docker.com/reference/dockerfile/#healthcheck

@dhh
Member

dhh commented Oct 24, 2024

I recall us investigating the healthcheck feature in the past and finding it wasn't suitable for our needs, especially around the gap-less deploys. Don't recall the specifics, though. Maybe @djmb remembers why we discarded the healthcheck as an option.

Either way, it's not something that's on the menu to change in the short term. Tons of successful deployments in the wild now that are built around using /up as the healthcheck.

@SpBills

SpBills commented Oct 25, 2024

We need some way of simply disabling the health check. Currently we are blocked on a deployment because of a healthcheck being unhealthy, probably because of the HTTPS problem.

@dhh
Member

dhh commented Oct 25, 2024

I don't see any path to deployment without a working healthcheck (whether that's /up or HEALTHCHECK inside the container) since that's the mechanism we need to ensure the gap-less deploy. Otherwise you're going to be throwing 500 errors from the proxy for the duration of the app boot, which isn't what anyone wants.

@kishaningithub

@djmb Is the reasoning of preferring /up over docker HEALTHCHECK documented somewhere? Would be great to know the train of thought that went behind it.

@SpBills

SpBills commented Oct 26, 2024

@dhh Thanks for explaining how healthchecks are used for gap-less deployments. That actually makes a lot of sense and inspired me to find a solution for this.

Thanks to the post by @scuml , I have this solution for Nextjs users struggling with getting healthchecks working on Kamal 2.0.

Place this in your middleware. Point your healthcheck at /up.

import { NextRequest, NextResponse } from "next/server";

export function middleware(request: NextRequest) {

  if (request.nextUrl.pathname === "/up") {
    return new Response("OK", { status: 200 });
  }

  return NextResponse.next();
}

@janosrusiczki

config.assume_ssl = true
config.force_ssl = true

This is what worked for me on Rails 7.2.2, I mean setting config.assume_ssl to true.

@jjatinggoyal you're the MVP! 😄 Thanks!

@AxelTheGerman
Contributor

config.assume_ssl = true
config.force_ssl = true

Was definitely what I needed, but I also have other setup steps running in my bin/docker-entrypoint so I had to adjust my deploy_timeout in the kamal config:

# Deploy timeout
#
# How long to wait for a container to become ready, default 30:
deploy_timeout: 60

This stopped the following from happening again:

2024-11-16T15:43:30.463083222Z {"time":"2024-11-16T15:43:30.462675726Z","level":"INFO","msg":"Unable to proxy request","path":"/up","error":"dial tcp [::1]:3000: connect: connection refused"}
2024-11-16T15:43:30.464958159Z {"time":"2024-11-16T15:43:30.463473194Z","level":"INFO","msg":"Request","path":"/up","status":502,"dur":2,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/plain; charset=utf-8","remote_addr":"172.18.0.2:39756","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
2024-11-16T15:43:31.214349949Z => Booting Puma
2024-11-16T15:43:31.214412520Z => Rails 8.0.0.rc2 application starting in staging 
2024-11-16T15:43:31.214419656Z => Run `bin/rails server --help` for more startup options
2024-11-16T15:43:31.471120468Z {"time":"2024-11-16T15:43:31.47029138Z","level":"INFO","msg":"Unable to proxy request","path":"/up","error":"dial tcp [::1]:3000: connect: connection refused"}
2024-11-16T15:43:31.471175057Z {"time":"2024-11-16T15:43:31.470470487Z","level":"INFO","msg":"Request","path":"/up","status":502,"dur":6,"method":"GET","req_content_length":0,"req_content_type":"","resp_content_length":0,"resp_content_type":"text/plain; charset=utf-8","remote_addr":"172.18.0.2:39756","user_agent":"Go-http-client/1.1","cache":"miss","query":""}
[...]
Releasing the deploy lock...
  Finished all in 404.0 seconds
  ERROR (SSHKit::Command::Failed): Exception while executing on host ***: docker exit status: 1
docker stdout: Nothing written
docker stderr: Error: target failed to become healthy

@pftg

pftg commented Nov 16, 2024

Yep, I found that as well #1041 (comment)

@pioz

pioz commented Nov 19, 2024

I have the same problem with Rails 8. My deploy ends with "docker stderr: Error: target failed to become healthy".

The problem is this command:

docker container ls --all --filter name=^xxx-web-1b3e126f196e6e86114c6e7020e72ab7923e9fa6$ --quiet | xargs docker inspect --format '{{json .State.Health}}' 

The JSON does not contain the Health key.
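
For context, the output of that inspect command can be read as follows. This is a small Python sketch under the assumption that Docker prints the literal null for .State.Health when the image defines no HEALTHCHECK instruction, which matches the "ERROR null" lines in the logs earlier in this thread; the function name is hypothetical.

```python
import json
from typing import Optional


def health_status(inspect_output: str) -> Optional[str]:
    """Parse the output of `docker inspect --format '{{json .State.Health}}'`.

    Returns the Docker health status string (for example "healthy"), or None
    when the output is `null`, i.e. the container has no Docker-level health
    check configured at all.
    """
    data = json.loads(inspect_output.strip())
    if data is None:
        return None
    return data.get("Status")
```

Under that assumption, a null here only says that the image has no HEALTHCHECK; it does not by itself explain why the proxy's /up check failed.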

My dockerfile:

# syntax=docker/dockerfile:1
# check=error=true



# Make sure RUBY_VERSION matches the Ruby version in .ruby-version
ARG RUBY_VERSION=3.3.5
FROM docker.io/library/ruby:$RUBY_VERSION-slim AS base

# Rails app lives here
WORKDIR /rails

# Install base packages
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y curl libjemalloc2 libvips sqlite3 postgresql-client && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives

# Set production environment
ENV RAILS_ENV="production" \
    NODE_ENV="production" \
    BUNDLE_DEPLOYMENT="1" \
    BUNDLE_PATH="/usr/local/bundle" \
    BUNDLE_WITHOUT="development"



# Throw-away build stage to reduce size of final image
FROM base AS build

# Install packages needed to build gems
RUN apt-get update -qq && \
    apt-get install --no-install-recommends -y build-essential git libpq-dev pkg-config && \
    rm -rf /var/lib/apt/lists /var/cache/apt/archives

# Install Node.js v20.17.0
RUN curl -fsSL https://deb.nodesource.com/setup_20.x | bash - && \
    apt-get install --no-install-recommends -y nodejs && \
    npm install -g yarn@latest && \
    node -v && \
    yarn -v

# Install application gems
COPY Gemfile Gemfile.lock ./
RUN bundle install && \
    rm -rf ~/.bundle/ "${BUNDLE_PATH}"/ruby/*/cache "${BUNDLE_PATH}"/ruby/*/bundler/gems/*/.git && \
    bundle exec bootsnap precompile --gemfile

# Copy application code
COPY . .

# Precompile bootsnap code for faster boot times
RUN bundle exec bootsnap precompile app/ lib/

# Install node modules
RUN yarn install && \
    yarn cache clean

# Precompiling assets for production without requiring secret RAILS_MASTER_KEY
RUN SECRET_KEY_BASE_DUMMY=1 ./bin/rails assets:precompile



# Final stage for app image
FROM base

# Copy built artifacts: gems, application
COPY --from=build "${BUNDLE_PATH}" "${BUNDLE_PATH}"
COPY --from=build /rails /rails

# Run and own only the runtime files as a non-root user for security
RUN groupadd --system --gid 1000 rails && \
    useradd rails --uid 1000 --gid 1000 --create-home --shell /bin/bash && \
    chown -R rails:rails db log storage tmp
USER 1000:1000

# Entrypoint prepares the database.
ENTRYPOINT ["/rails/bin/docker-entrypoint"]

# Start server via Thruster by default, this can be overwritten at runtime
EXPOSE 8080
CMD ["./bin/thrust", "./bin/rails", "server"]

and my deploy.yml:

service: foo

image: pioz/foo

servers:
  web:
    - 1.1.1.1

proxy:
  ssl: true
  host: foo.bar.it
  app_port: 8080
  healthcheck:
    path: /up

registry:
  username: pioz
  password:
    - KAMAL_REGISTRY_PASSWORD

env:
  secret:
    - RAILS_MASTER_KEY
  clear:
    SOLID_QUEUE_IN_PUMA: true
    JOB_CONCURRENCY: 2
    WEB_CONCURRENCY: 10

aliases:
  console: app exec --interactive --reuse "bin/rails console"
  shell: app exec --interactive --reuse "bash"
  logs: app logs -f
  dbc: app exec --interactive --reuse "bin/rails dbconsole"

volumes:
  - "xxx_storage:/rails/storage"

asset_path: /rails/public/assets

builder:
  arch: arm64

ssh:
  user: ubuntu

deploy_timeout: 60

Please help!

@jeduardo824

Hi guys, I'm having a similar problem, but I noticed it only occurs when the Cloudflare DNS Proxy is on. When the Proxy mode is disabled, the deploy works.

Does anyone have a clue about this?

@pftg

pftg commented Nov 26, 2024

Hi guys, I'm having a similar problem, but I noticed it only occurs when the Cloudflare DNS Proxy is on. When the Proxy mode is disabled, the deploy works.

Does anyone have a clue about this?

Be sure to take a look at your Cloudflare settings from https://developers.cloudflare.com/ssl/origin-configuration/ssl-modes/ and tweak them to match your production SSL settings.

@jeduardo824

@pftg I think my settings are correct. The SSL Setting in Cloudflare is set to Full, the production.rb contains assume_ssl and force_ssl set as true, and the proxy config contains ssl: true. Am I missing something?

@jeduardo824

I'm back! The problem was not related to Kamal. I was using a firewall in my EC2 that was not allowing the proxy to check if the server was healthy. After some tweaks to the firewall rules, it finally worked.

@easydatawarehousing

Ran into the same problem. To resolve it, follow these steps:

  • ssh into your server
  • Manually retry the failed kamal docker command.
    Take it from kamal output and remove --detach --restart unless-stopped.
    Wait for it to finish, hopefully you will get an error message, like database not available.

Or a bit quicker:

  • ssh into your server
  • run docker image ls and grab the ID of the newest image that won't start
  • run docker run -it --network kamal --env-file .kamal/apps/<your-app-name>/env/roles/web.env <image-id>
  • if the last command doesn't work you can try to start the container with only a shell:
    docker run -it --network kamal --env-file .kamal/apps/<your-app-name>/env/roles/web.env <image-id> bash
    Now you can check whether the database is reachable: pg_isready -h <host ip or name>
    or any other command that the container should execute.

In my case it had nothing to do with ssl settings, I needed to open the firewall for the database port. Smooth sailing from then on.
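
If pg_isready is not installed in the image, a plain TCP check answers the same firewall question. A minimal Python sketch using only the standard library; the function name and the host and port in the usage comment are hypothetical. It only confirms that the port accepts connections, not that the database credentials work.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


# Usage (hypothetical values):
#   port_is_reachable("db.example.internal", 5432)
```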

@aguynamedben

aguynamedben commented Dec 3, 2024

@easydatawarehousing's instructions worked for me, after hours of poking around. The issue was that I needed to go into the DigitalOcean managed Postgres instance and adjust my "Trusted Sources" so the "Droplet" could access the database.

From my DigitalOcean "Droplet" (not even from within the Docker container)

$ cat .kamal/apps/<your-app-name>/env/roles/web.env
$ source .kamal/apps/<your-app-name>/env/roles/web.env
$ pg_isready -h $DB_HOST -p $DB_PORT -U $DB_USER -d $DB_USER # `man pg_isready` for help

# Before the fix I was getting:
> myapp-db-1-do-user-18470523-0.j.db.ondigitalocean.com:25060 - no response

# After the fix I get:
pg_isready -h $DB_HOST -p $DB_PORT -U $DB_USER -d $DB_USER
myapp-db-1-do-user-18470523-0.j.db.ondigitalocean.com:25060 - accepting connections

Another peculiar thing that helped me understand what was going on is that kamal app boot worked fine but kamal app stop then kamal app start would report the start as failing.

It's been a rough two days with Kamal 2 + Rails 8. I see the vision, though. I think there needs to be a lot better handoff between Rails 8, Kamal 2 proxy, and Kamal CLI output. In this case I'd hope to see a "Couldn't connect to database" message.

@varyform

varyform commented Dec 5, 2024

@pioz make sure your app can start in production. E.g.: set eager_load = true in development.rb

@eric-eye

eric-eye commented Dec 10, 2024

I ran into this for a different reason, in case anyone else encounters this.

My problem was that my target instance was under-provisioned (Digital Ocean Basic / 512 MB / 1 vCPU).

The behavior is that an initial kamal deploy or kamal setup would be successful, but subsequent deploys would not. The docker exec kamal-proxy kamal-proxy deploy my-app-here-web step would hang (for much longer than 30 seconds, which should have been a hint to me) and then ultimately complain with this error.

I noticed it was also quite slow to connect to the target box and/or run commands on an active shell. Running top showed that swap was very active. So, probably what happens is that running two containers during handoff eats up more resources and grinds the health check to a halt and blue-green cannot be verified.

It's an easy gotcha if you're experimenting with Kamal on the cheapest cloud instance you can find, which is what I was doing 😄

@enderahmetyurt

I ran into this for a different reason, in case anyone else encounters this.

My problem was that my target instance was under-provisioned (Digital Ocean Basic / 512 MB / 1 vCPU).

The behavior is that an initial kamal deploy or kamal setup would be successful, but subsequent deploys would not. The docker exec kamal-proxy kamal-proxy deploy my-app-here-web step would hang (for much longer than 30 seconds, which should have been a hint to me) and then ultimately complain with this error.

I noticed it was also quite slow to connect to the target box and/or run commands on an active shell. Running top showed that swap was very active. So, probably what happens is that running two containers during handoff eats up more resources and grinds the health check to a halt and blue-green cannot be verified.

It's an easy gotcha if you're experimenting with Kamal on the cheapest cloud instance you can find, which is what I was doing 😄

Thank you @eric-eye, I faced the same problem today. I could deploy my project, and then suddenly I couldn't, even though I hadn't changed anything. As you said, everything is slow on the remote machine. I connected to it via ssh, and even that took time. What should I do? I need to upgrade the DO droplet. It's a Digital Ocean Basic / 1 GB / 1 vCPU.

@enderahmetyurt

enderahmetyurt commented Dec 20, 2024

I ran into this for a different reason, in case anyone else encounters this.
My problem was that my target instance was under-provisioned (Digital Ocean Basic / 512 MB / 1 vCPU).
The behavior is that an initial kamal deploy or kamal setup would be successful, but subsequent deploys would not. The docker exec kamal-proxy kamal-proxy deploy my-app-here-web step would hang (for much longer than 30 seconds, which should have been a hint to me) and then ultimately complain with this error.
I noticed it was also quite slow to connect to the target box and/or run commands on an active shell. Running top showed that swap was very active. So, probably what happens is that running two containers during handoff eats up more resources and grinds the health check to a halt and blue-green cannot be verified.
It's an easy gotcha if you're experimenting with Kamal on the cheapest cloud instance you can find, which is what I was doing 😄

Thank you @eric-eye, I faced the same problem today. I could deploy my project, and then suddenly I couldn't, even though I hadn't changed anything. As you said, everything is slow on the remote machine. I connected to it via ssh, and even that took time. What should I do? I need to upgrade the DO droplet. It's a Digital Ocean Basic / 1 GB / 1 vCPU.

I turned my DO machine off and on, tried kamal deploy again, and it worked 😂
