Stop incorrect "Process exited with status 1" puma errors from occurring on SIGTERM #1026

david-a-wheeler · 2018-02-02T16:54:19Z

Every day we typically get incorrect error reports of the following form:

Feb 01 10:22:14 production-bestpractices heroku/web.1: Process exited with status 1

This is caused because Heroku cycles the webserver (puma) by sending SIGTERM, but when puma exits it incorrectly returns an error code of 1 ("error occurred"). That is simply wrong; a process that exits because it was specifically asked to do so (via SIGTERM) has not experienced an error. Instead, exiting on a request to do so, without any other error, is correct behavior and should return a status code of 0. However, because puma incorrectly reports a non-zero error code, lots of other logs whine about a significant problem (even though no problem has occurred). Here's more details showing this happening:

Feb 01 15:22:13 production-bestpractices heroku/web.1: Cycling 
Feb 01 15:22:13 production-bestpractices heroku/web.1: State changed from up to starting 
Feb 01 15:22:14 production-bestpractices heroku/web.1: Stopping all processes with SIGTERM 
Feb 01 15:22:14 production-bestpractices app/web.1: - Gracefully stopping, waiting for requests to finish 
Feb 01 15:22:14 production-bestpractices app/web.1: === puma shutdown: 2018-02-01 15:22:14 +0000 === 
Feb 01 15:22:14 production-bestpractices app/web.1: - Goodbye! 
Feb 01 15:22:14 production-bestpractices app/web.1: bundler: failed to load command: puma (/app/vendor/bundle/ruby/2.4.0/bin/puma) 
Feb 01 15:22:14 production-bestpractices app/web.1: SignalException: SIGTERM 
Feb 01 15:22:14 production-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/lib/puma/launcher.rb:397:in `block in setup_signals' 
Feb 01 15:22:14 production-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/lib/puma/single.rb:106:in `join' 
Feb 01 15:22:14 production-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/lib/puma/single.rb:106:in `run' 
Feb 01 15:22:14 production-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/lib/puma/launcher.rb:183:in `run' 
Feb 01 15:22:14 production-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/lib/puma/cli.rb:77:in `run' 
Feb 01 15:22:14 production-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/bin/puma:10:in `<top (required)>' 
Feb 01 15:22:14 production-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/bin/puma:23:in `load' 
Feb 01 15:22:14 production-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/bin/puma:23:in `<top (required)>' 
Feb 01 15:22:14 production-bestpractices heroku/web.1: Process exited with status 1 
Feb 01 15:22:24 production-bestpractices heroku/web.1: Starting process with command `bundle exec puma -C config/puma.rb`

We are running the current version of puma (3.11.2). See puma issue 1438 and bundler issue 6090 for more.

Note that we run puma using this Procfile:

web: bundle exec puma -C config/puma.rb

One solution (noted in the issues above) would be to make "bundle exec" use Kernel.exec instead of Kernel.load. This essentially disables a minor optimization that bundle usually uses. This is slightly slower on startup (it replaces a whole OS process, instead of just reloading from Ruby), but we rarely start processes so it would have practically no effect in the real world. Here's how to force that:

bundle config disable_exec_load true

Other solutions are welcome.

The text was updated successfully, but these errors were encountered:

david-a-wheeler · 2018-02-02T18:13:28Z

For more information, see bundle config documentation

david-a-wheeler · 2018-02-04T03:24:52Z

Per "bundle config --help", I believe we can just set the environment variable BUNDLE_DISABLE_EXEC_LOAD - the environment variables take precedence, and then we don't have to worry about race conditions setting a global variable if multiple procs start at the same time.

The documentation isn't clear about the expected value of the environment variable; it may not matter. Let's start with "true" as the value, that's clear.

Disable exec-load in Procfile to eliminate a false puma error at runtime on Heroku. This is a tiny but non-obvious change, so this commit message provides lots of detail. Every day we typically get incorrect error reports of the following form: > heroku/web.1: Process exited with status 1 This is caused because Heroku cycles the webserver (puma) by sending SIGTERM, but when puma exits it incorrectly returns an error code of 1 ("error occurred"). That is simply wrong; a process that exits because it was specifically asked to do so (via SIGTERM) has not experienced an error. Instead, exiting on a request to do so, without any other error, is correct behavior and should return a status code of 0. However, because puma incorrectly reports a non-zero error code, lots of other logs whine about a significant problem (even though no problem has occurred). At this time there seems to be some disagreement on whether the problem is really from puma or bundler. See puma issue 1438 and bundler issue 6090 for more. In a grand sense we don't care, we just don't want false positives. One solution (noted in the issues above) would be to make "bundle exec" use Kernel.exec instead of Kernel.load. This essentially disables a minor optimization that bundler usually uses. This is slightly slower on startup (it replaces a whole OS process, instead of just reloading from Ruby), but we rarely start web processes so it would have practically no effect in the real world. Per "bundle config --help", we do this just by setting the environment variable BUNDLE_DISABLE_EXEC_LOAD. The environment variables take precedence, and then we don't have to worry about race conditions setting a global variable if multiple procs start at the same time. This commit uses the "env" command; technically we probably don't need to, but this ensures that regardless of shell (or straight invocation of exec) we get our environment variable set. Signed-off-by: David A. Wheeler <[email protected]>

dankohn · 2018-02-04T04:00:52Z

OK.

…

-- Dan Kohn <[email protected]> Executive Director, Cloud Native Computing Foundation https://www.cncf.io +1-415-233-1000 https://www.dankohn.com

On Sat, Feb 3, 2018 at 10:24 PM, David A. Wheeler ***@***.***> wrote: Per "bundle config --help", I believe we can just set the environment variable BUNDLE_DISABLE_EXEC_LOAD - the environment variables take precedence, and then we don't have to worry about race conditions setting a global variable if multiple procs start at the same time. The documentation isn't clear about the expected value of the environment variable; it may not matter. Let's start with "true" as the value, that's clear. — You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub <#1026 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AC8MBiawIB36J0lR9O_U4OgzAbzE6yWYks5tRSMFgaJpZM4R3eTc> .

david-a-wheeler · 2018-02-04T14:24:57Z

Great! I've labelled this issue as "bug". It's not really a bug in our program, it's really a workaround for a bug in a program we depend on. That said, the actions manifest as a bug in our program, and in the end, we're responsible for the impact of the components we depend on .

david-a-wheeler · 2018-02-04T20:17:17Z

I think we've resolved it. With the environment variable change, here are example log entries we see:

Feb 04 15:11:50 master-bestpractices heroku/web.1: Restarting 
Feb 04 15:11:50 master-bestpractices heroku/web.1: State changed from up to starting 
Feb 04 15:11:50 master-bestpractices app/web.1: - Gracefully stopping, waiting for requests to finish 
Feb 04 15:11:50 master-bestpractices app/web.1: === puma shutdown: 2018-02-04 20:11:50 +0000 === 
Feb 04 15:11:50 master-bestpractices app/web.1: - Goodbye! 
Feb 04 15:11:51 master-bestpractices heroku/web.1: Stopping all processes with SIGTERM 
Feb 04 15:11:51 master-bestpractices heroku/web.1: Process exited with status 143 
Feb 04 15:12:02 master-bestpractices heroku/web.1: Starting process with command `env BUNDLE_DISABLE_EXEC_LOAD=true bundle exec puma -C config/puma.rb` 
Feb 04 15:12:04 master-bestpractices app/web.1: Puma starting in single mode... 
Feb 04 15:12:04 master-bestpractices app/web.1: * Version 3.11.2 (ruby 2.4.2-p198), codename: Love Song 
Feb 04 15:12:04 master-bestpractices app/web.1: * Min threads: 5, max threads: 5 
Feb 04 15:12:04 master-bestpractices app/web.1: * Environment: production 
Feb 04 15:12:06 master-bestpractices app/web.1: [Scout] [02/04/18 20:12:05 +0000 web.1 (4)] WARN : Deferring agent start. Standing by for first request 
Feb 04 15:12:07 master-bestpractices app/web.1: * Listening on tcp://0.0.0.0:52465 
Feb 04 15:12:07 master-bestpractices heroku/web.1: State changed from starting to up

That compares to these older messages, with the "failed to load" messages from bundler:

    Feb 04 15:10:24 staging-bestpractices heroku/web.1: State changed from starting to up 
    Feb 04 15:10:35 staging-bestpractices heroku/web.1: Restarting 
    Feb 04 15:10:35 staging-bestpractices heroku/web.1: State changed from up to starting 
    Feb 04 15:10:35 staging-bestpractices heroku/web.1: Stopping all processes with SIGTERM 
    Feb 04 15:10:35 staging-bestpractices app/web.1: - Gracefully stopping, waiting for requests to finish 
    Feb 04 15:10:35 staging-bestpractices app/web.1: === puma shutdown: 2018-02-04 20:10:35 +0000 === 
    Feb 04 15:10:35 staging-bestpractices app/web.1: - Goodbye! 
    Feb 04 15:10:35 staging-bestpractices app/web.1: bundler: failed to load command: puma (/app/vendor/bundle/ruby/2.4.0/bin/puma) 
    Feb 04 15:10:35 staging-bestpractices app/web.1: SignalException: SIGTERM 
    Feb 04 15:10:35 staging-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/lib/puma/launcher.rb:397:in `block in setup_signals' 
    Feb 04 15:10:35 staging-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/lib/puma/single.rb:106:in `join' 
    Feb 04 15:10:35 staging-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/lib/puma/single.rb:106:in `run' 
    Feb 04 15:10:35 staging-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/lib/puma/launcher.rb:183:in `run' 
    Feb 04 15:10:35 staging-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/lib/puma/cli.rb:77:in `run' 
    Feb 04 15:10:35 staging-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/gems/puma-3.11.2/bin/puma:10:in `<top (required)>' 
    Feb 04 15:10:35 staging-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/bin/puma:23:in `load' 
    Feb 04 15:10:35 staging-bestpractices app/web.1:   /app/vendor/bundle/ruby/2.4.0/bin/puma:23:in `<top (required)>' 
    Feb 04 15:10:36 staging-bestpractices heroku/web.1: Process exited with status 1 
    Feb 04 15:10:46 staging-bestpractices heroku/web.1: Starting process with command `bundle exec puma -C config/puma.rb` 
    Feb 04 15:10:49 staging-bestpractices app/web.1: Puma starting in single mode... 
    Feb 04 15:10:49 staging-bestpractices app/web.1: * Version 3.11.2 (ruby 2.4.2-p198), codename: Love Song 
    Feb 04 15:10:49 staging-bestpractices app/web.1: * Min threads: 5, max threads: 5 
    Feb 04 15:10:49 staging-bestpractices app/web.1: * Environment: production 
    Feb 04 15:10:54 staging-bestpractices app/web.1: [Scout] [02/04/18 20:10:54 +0000 web.1 (4)] WARN : Deferring agent start. Standing by for first request 
    Feb 04 15:10:57 staging-bestpractices heroku/web.1: State changed from starting to up 
    Feb 04 15:10:57 staging-bestpractices app/web.1: * Listening on tcp://0.0.0.0:27365 
    Feb 04 15:10:57 staging-bestpractices app/web.1: Use Ctrl-C to stop

david-a-wheeler · 2018-02-04T20:20:18Z

Strictly speaking the process now exiting with exit code 143, not 0. But that's a widely-expected signal value for "received and processed SIGTERM normally", and the rest of the system seems to handle it well. Our problem was that we were getting exit code 1, which is widely interpreted as "things went badly".

We'll keep monitoring, but seems okay now.

david-a-wheeler · 2018-02-08T15:43:58Z

Reopening. Heroku sees the exit code 143, and from its viewpoint that's non-zero and thus STILL an error.

We'll have to insert a little shim that captures error code 143, and turns it into error code 0.

david-a-wheeler · 2018-02-09T12:23:44Z

#1048 is intended to solve it.

david-a-wheeler · 2018-02-16T02:46:34Z

Completed. It's switched to error code 143, and the rest of the system now knows that 143 isn't a problem.

david-a-wheeler added the bug label Feb 4, 2018

david-a-wheeler closed this as completed in 54d714a Feb 4, 2018

david-a-wheeler reopened this Feb 8, 2018

david-a-wheeler closed this as completed Feb 16, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Stop incorrect "Process exited with status 1" puma errors from occurring on SIGTERM #1026

Stop incorrect "Process exited with status 1" puma errors from occurring on SIGTERM #1026

david-a-wheeler commented Feb 2, 2018

david-a-wheeler commented Feb 2, 2018

david-a-wheeler commented Feb 4, 2018

dankohn commented Feb 4, 2018 via email

david-a-wheeler commented Feb 4, 2018

david-a-wheeler commented Feb 4, 2018

david-a-wheeler commented Feb 4, 2018

david-a-wheeler commented Feb 8, 2018

david-a-wheeler commented Feb 9, 2018

david-a-wheeler commented Feb 16, 2018

Stop incorrect "Process exited with status 1" puma errors from occurring on SIGTERM #1026

Stop incorrect "Process exited with status 1" puma errors from occurring on SIGTERM #1026

Comments

david-a-wheeler commented Feb 2, 2018

david-a-wheeler commented Feb 2, 2018

david-a-wheeler commented Feb 4, 2018

dankohn commented Feb 4, 2018 via email

david-a-wheeler commented Feb 4, 2018

david-a-wheeler commented Feb 4, 2018

david-a-wheeler commented Feb 4, 2018

david-a-wheeler commented Feb 8, 2018

david-a-wheeler commented Feb 9, 2018

david-a-wheeler commented Feb 16, 2018