consul is dying from SIGPIPE under systemd with no log output #1688
Update: Still happens with 0.6.3 |
Here's the output from strace when it happens. Looks like
|
Figured out the "cause"; IMO it should be worked around in consul/golang. The new docker RPMs for EL7 required an update of systemd, which makes journald restart. EL7's journald doesn't handle the restart gracefully. There's a fix for that in systemd that hasn't been backported: https://bugs.freedesktop.org/show_bug.cgi?id=84923#c9 The problem was harder to track down because consul was still letting other servers join the cluster. That's fixed by golang/go#11845. Man, I thought today was Friday, not Monday. (Resolution: make sure yum update is run before starting consul.) |
Hi @lattwood if I understand what happened, our next release on Go 1.6 should fix this by causing Consul to exit immediately. Does that seem correct? Thanks! |
According to the change https://go-review.googlesource.com/#/c/18151/ you'll need to handle SIGPIPE or die with 1.6. Should a loss of stdout (and the ability to log) result in a crash? I'm looking into what's involved in getting the systemd patch backported, so journald doesn't close every fd on reload. |
It's kind of a weird situation and letting Consul die and restart (assuming you have a process monitor) will probably heal the problem. I'd hate to silently lose logging and soldier on. |
Unfortunately we're using expect 1 |
(Dev environment) |
We're running into this too. It should be noted that we do set
|
@justinclayton SIGPIPE is one of the 4 signals that systemd considers as OK when https://www.freedesktop.org/software/systemd/man/systemd.exec.html |
I'm getting this too on 0.6.4, is modifying the systemd service file to |
See here for a discussion of some of the issues: hashicorp/consul#1688 (comment). I hit this problem running CentOS 7 with Prom 0.18.0.
It looks like you could also use |
I'm not much of a systemd expert - do we need any changes on the Consul side to help here? |
Closing this as we never heard back. Please let us know if you still need help. |
I also encountered this problem with Consul v0.7.0. Right after Consul is started for the first time, registering a service with curl fails. After restarting Consul, registering the service with curl works normally.
curl: (52) Empty reply from server
service consul start
|
This happens randomly for us in a multi-VM Vagrant environment backed by the vmware-fusion plugin. We have seen it on 0.5.0 and 0.6.0 on CentOS 7. We are currently upgrading to 0.6.3 locally, but I don't have any way to reliably reproduce it.
I noticed there's nothing in the signal handling code for dealing with SIGPIPE.
consul/command/agent/command.go
Line 768 in eb27a02