Audit socket backend does not reconnect on error #2931

Closed
CVTJNII opened this issue Jun 28, 2017 · 4 comments · Fixed by #2934

Comments

@CVTJNII

CVTJNII commented Jun 28, 2017

Storage: consul (HA available)
Version: Vault v0.7.3
Version Sha: 0b20ae0

In experimenting with the socket audit backend I've noticed it isn't reconnecting on error. I caused a failover to a node where the listener was down, and Vault logged the following error, which is expected:

2017/06/28 20:23:31.306148 [ERROR] core: failed to create audit entry: path=socket/ error=dial tcp 172.17.0.1:6500: getsockopt: connection refused

However, after this the socket audit backend appears to just die. After starting the listener I see no logs, and no further socket errors are logged. Attempting to reconfigure the backend with an HTTP PUT to /v1/sys/audit/socket has no effect; I'm unable to get it to resume audit logging.

This is undesirable behavior: the backend should resume logging when the listener comes back up, not fail indefinitely.

This is my complete backend config:

{
  "type": "socket",
  "options": {
    "address": "172.17.0.1:6500",
    "socket_type": "tcp",
    "log_raw": "false",
    "hmac_accessor": "false",
    "format": "json"
  }
}
@CVTJNII
Author

CVTJNII commented Jun 28, 2017

UDP sockets do not exhibit this behavior, which is expected as they're connectionless.
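
The TCP/UDP difference can be reproduced outside Vault. The following is a minimal sketch, not taken from Vault's code: `freeAddr` and `dialNoListener` are hypothetical helpers that dial a loopback port with nothing listening on it. The TCP dial is expected to fail with a connection error, while the UDP dial succeeds because no handshake takes place.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// freeAddr returns a loopback address on which nothing is listening, by
// binding an ephemeral port and releasing it immediately.
func freeAddr(network string) (string, error) {
	if network == "udp" {
		pc, err := net.ListenPacket("udp", "127.0.0.1:0")
		if err != nil {
			return "", err
		}
		defer pc.Close()
		return pc.LocalAddr().String(), nil
	}
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	return ln.Addr().String(), nil
}

// dialNoListener dials an address with no listener and returns the dial error.
func dialNoListener(network string) error {
	addr, err := freeAddr(network)
	if err != nil {
		return err
	}
	conn, err := net.DialTimeout(network, addr, time.Second)
	if err != nil {
		return err
	}
	return conn.Close()
}

func main() {
	fmt.Println("tcp dial error:", dialNoListener("tcp"))
	fmt.Println("udp dial error:", dialNoListener("udp"))
}
```

Note that a successful UDP dial says nothing about delivery; entries sent while the listener is down are simply lost.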

@jefferai
Member

The backend already does try to reconnect; take a look at https://github.com/hashicorp/vault/blob/master/builtin/audit/socket/backend.go#L146

However, you may be hitting an error at https://github.com/hashicorp/vault/blob/master/builtin/audit/socket/backend.go#L75 -- in that case the backend will fail to come up. If you have a different backend configured that comes up successfully Vault will still use that backend; if you have only the one, Vault will actually bail from taking over active duty.

This is in line with Vault working as long as any audit backend can write, and it's unclear what to do otherwise -- if an audit backend fails to come up, trying over and over again is often not the right approach and can simply make any underlying problems worse.

I believe this is the error you're seeing, since you said it happened after you failed over Vault. In that case you have two options: add a second audit backend for redundancy, which will hopefully come up and still log; or fail over again, which should happen automatically if it's the only audit backend and the problem is in backend setup.
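
The "works as long as any audit backend can write" rule can be illustrated with a small sketch. This is not Vault's implementation: `auditSink`, `funcSink`, `errDown`, and `logAny` are hypothetical names chosen for the example. The entry is offered to every sink, and the call only fails when no sink accepted it.

```go
package main

import (
	"errors"
	"fmt"
)

// auditSink is a hypothetical stand-in for an audit backend.
type auditSink interface {
	Log(entry string) error
}

// funcSink adapts a plain function to the auditSink interface.
type funcSink func(string) error

func (f funcSink) Log(entry string) error { return f(entry) }

// errDown simulates a sink whose listener is unreachable.
var errDown = errors.New("connection refused")

// logAny sends entry to every sink and returns nil if at least one of
// them succeeded. It returns an error only when all sinks failed.
func logAny(sinks []auditSink, entry string) error {
	var errs []error
	succeeded := false
	for _, s := range sinks {
		if err := s.Log(entry); err != nil {
			errs = append(errs, err)
			continue
		}
		succeeded = true
	}
	if succeeded {
		return nil
	}
	if len(errs) == 0 {
		return errors.New("no audit sinks configured")
	}
	return fmt.Errorf("all %d audit sinks failed: %v", len(errs), errs)
}

func main() {
	file := funcSink(func(string) error { return nil })
	socket := funcSink(func(string) error { return errDown })

	fmt.Println("file + socket:", logAny([]auditSink{file, socket}, "entry"))
	fmt.Println("socket only:", logAny([]auditSink{socket}, "entry"))
}
```

Under this model a failing socket sink goes unnoticed as long as the file sink keeps working, which matches the behavior described in this thread.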

@CVTJNII
Author

CVTJNII commented Jun 28, 2017

I do also have the file backend configured, which was working. However, I wish to have the file and socket backends up: file for reliable but difficult to access logs, and socket for less than reliable but easy to access logs. (I say less than reliable by the nature of networks and depending on external services.)

So the socket backend must be up when the initial connection is made or it is simply not used? Do I understand that correctly? I consider that bad behavior and would appreciate an option to always have it retry, even on initial errors. In my opinion this is a race: if Vault happens to fail over while the listener is down for some reason, Vault will never log to the socket without manual intervention, which is undesirable behavior in my environment.

EDIT: I did confirm it will reconnect if the initial connection is successful. As mentioned above I'd like to see an option to have it try and reconnect if the initial connection fails, in case that's something transient. Having a retry on an interval is much better in my opinion than being down indefinitely.

@jefferai
Member

> So the socket backend must be up when the initial connection is made or it is simply not used? Do I understand that correctly?

Yes. Any backend must work when being set up or it causes Vault post-unseal to fail. Audit backends are a special case as of a few versions ago though, when we changed it to any one must come up. The arguments in favor were persuasive, and it matches the model of any one backend must successfully log.

I'm hesitant to have a system whereby something causes the backend to try, try again for some period of time or number of tries.

Possibly the right thing to do is actually remove the initial connection when the backend is created. The first request would always error out but the error would be swallowed by a reconnect and retry, if successful.
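
A rough sketch of that idea follows. It is an illustration under stated assumptions, not the code from the eventual fix: `lazySink`, `newLazySink`, `Send`, `ioTimeout`, and `demo` are hypothetical names. Nothing is dialed at construction time; each send tries the current connection first and, on any error, closes the old connection, reconnects, and retries once.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"sync"
	"time"
)

// ioTimeout bounds both dialing and writing (an arbitrary value for the sketch).
const ioTimeout = 2 * time.Second

// lazySink is a hypothetical audit sink that does not dial when created.
type lazySink struct {
	mu      sync.Mutex
	network string
	address string
	conn    net.Conn
}

func newLazySink(network, address string) *lazySink {
	return &lazySink{network: network, address: address}
}

// reconnect always closes any existing connection before dialing again.
func (s *lazySink) reconnect() error {
	if s.conn != nil {
		s.conn.Close()
		s.conn = nil
	}
	conn, err := net.DialTimeout(s.network, s.address, ioTimeout)
	if err != nil {
		return err
	}
	s.conn = conn
	return nil
}

// write sends buf on the current connection, if there is one.
func (s *lazySink) write(buf []byte) error {
	if s.conn == nil {
		return errors.New("not connected")
	}
	if err := s.conn.SetWriteDeadline(time.Now().Add(ioTimeout)); err != nil {
		return err
	}
	_, err := s.conn.Write(buf)
	return err
}

// Send tries once; on failure it reconnects and retries exactly once.
func (s *lazySink) Send(buf []byte) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if err := s.write(buf); err == nil {
		return nil
	}
	if err := s.reconnect(); err != nil {
		return err
	}
	return s.write(buf)
}

// demo sends once with the listener down and once after it comes up.
func demo() (down, up error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	addr := ln.Addr().String()
	ln.Close() // nothing is listening on addr from here on

	sink := newLazySink("tcp", addr)
	down = sink.Send([]byte("entry 1\n"))

	ln, err = net.Listen("tcp", addr)
	if err != nil {
		panic(err)
	}
	defer ln.Close()
	up = sink.Send([]byte("entry 2\n"))
	return down, up
}

func main() {
	down, up := demo()
	fmt.Println("listener down:", down)
	fmt.Println("listener up:", up)
}
```

With this shape, a listener that is down at startup only costs the entries sent while it is down; the sink recovers on the first send after the listener returns, with no background retry loop.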

jefferai added a commit that referenced this issue Jun 28, 2017
network failures are worked around. Also, during a reconnect always
close the existing connection.

Fixes #2931
jefferai added a commit that referenced this issue Jul 6, 2017
…ent (#2934)

network failures are worked around. Also, during a reconnect always
close the existing connection.

Fixes #2931