Audit socket backend does not reconnect on error #2931
UDP sockets do not exhibit this behavior, which is expected since they're connectionless.
The backend already does try to reconnect; take a look at https://github.com/hashicorp/vault/blob/master/builtin/audit/socket/backend.go#L146. However, you may be hitting an error at https://github.com/hashicorp/vault/blob/master/builtin/audit/socket/backend.go#L75, in which case the backend fails to come up. If you have another backend configured that comes up successfully, Vault will still use that backend; if the socket backend is the only one, Vault will give up on taking over active duty. This is in line with the rule that Vault keeps working as long as any audit backend can write, and it's unclear what to do otherwise: if an audit backend fails to come up, retrying over and over is often not the right approach and can make the underlying problem worse. Since you said this happened after you failed over Vault, I believe this is the error you're seeing. In that case you have two options: add a second audit backend for redundancy, which will hopefully come up and keep logging, or fail over again, which should happen automatically if the socket backend is the only audit backend and the problem occurs during backend setup.
I do also have the file backend configured, which was working. However, I want both the file and socket backends up: file for reliable but hard-to-access logs, and socket for less reliable but easy-to-access logs. (I say less reliable because of the nature of networks and the dependence on external services.) So the socket backend must be up when the initial connection is made, or it is simply not used? Do I understand that correctly? I consider that bad behavior and would appreciate an option to always have it retry, even on initial errors. In my opinion this is a race: if Vault happens to fail over while the listener is down for some reason, Vault will never log to the socket without manual intervention, which is undesirable in my environment. EDIT: I confirmed that it does reconnect if the initial connection is successful. As mentioned above, I'd like an option to have it try to reconnect if the initial connection fails, in case the failure is transient. In my opinion, retrying on an interval is much better than being down indefinitely.
Yes. Any backend must work when it is set up, or Vault's post-unseal step fails. Audit backends have been a special case for a few versions now, though: we changed the rule so that only one of them must come up. The arguments in favor were persuasive, and it matches the model that any one backend must successfully log. I'm hesitant to build a system where the backend keeps retrying for some period of time or number of attempts. Possibly the right thing to do is to remove the initial connection when the backend is created. The first request would then always error out, but the error would be swallowed by a reconnect and retry, if that succeeds.
network failures are worked around. Also, during a reconnect always close the existing connection. Fixes #2931
Storage: consul (HA available)
Version: Vault v0.7.3
Version Sha: 0b20ae0
While experimenting with the socket audit backend, I've noticed it isn't reconnecting on error. I caused a failover to a node where the listener was down, and Vault logged the following error, which is expected:
However, after this the socket audit backend appears to just die. After starting the listener I see no logs, and no further socket errors are logged. Attempting to reconfigure the backend with an HTTP PUT to /v1/sys/audit/socket has no effect; I'm unable to get it to resume audit logging.
This is undesirable behavior: the backend should resume logging when the listener comes back up rather than failing indefinitely.
This is my complete backend config: