Nomad version
Nomad v0.10.5
Issue
Vault accessors don't get deleted.
We found out that our Nomad servers in one environment had been using a lot more memory than they should, so we took a look at the usage with the Go profiler. The heap profile shows that most of the memory is allocated through the function VaultAccessorRestore.
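In case it helps with reproducing, grabbing the profile boils down to hitting the agent's pprof endpoint. Here is a minimal sketch in Go, assuming the agent runs with enable_debug = true and listens on 127.0.0.1:4646 (adjust the address and scheme for your setup); the resulting file can then be opened with `go tool pprof`:

```go
// heapdump.go: pull a heap profile from a Nomad server agent so it can be
// inspected with `go tool pprof`.
package main

import (
	"fmt"
	"io"
	"net/http"
	"os"
)

func main() {
	// Assumes the agent runs with enable_debug = true and listens on
	// 127.0.0.1:4646; adjust for your environment.
	resp, err := http.Get("http://127.0.0.1:4646/debug/pprof/heap")
	if err != nil {
		fmt.Fprintln(os.Stderr, "fetching heap profile:", err)
		os.Exit(1)
	}
	defer resp.Body.Close()

	out, err := os.Create("nomad-heap.pprof")
	if err != nil {
		fmt.Fprintln(os.Stderr, "creating output file:", err)
		os.Exit(1)
	}
	defer out.Close()

	if _, err := io.Copy(out, resp.Body); err != nil {
		fmt.Fprintln(os.Stderr, "writing profile:", err)
		os.Exit(1)
	}
	fmt.Println("wrote nomad-heap.pprof; inspect it with: go tool pprof nomad-heap.pprof")
}
```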
We had a problem a few weeks ago where Nomad was constantly creating Vault tokens for allocations. For context, we have some clients in the cluster that are not managed by our team. These clients access Vault via a load balancer, while the servers access it using Consul service DNS. The load balancer certificate changed and these clients stopped trusting Vault, so token derivation kept failing with errors like:
[ERR] client.vault: failed to derive token for allocation "437e671e-be25-b950-f22a-a84b649e9dfb" and tasks [task]: failed to unwrap the token for task "my-task": Put https://my-load-balancer.internal/v1/sys/wrapping/unwrap: x509: certificate signed by unknown authority
It seems that Nomad is storing the information for all of those tokens (about 500k of them) in the Raft database.
Looking at the code, it seems that when a server acquires leadership it revokes the outstanding Vault accessors, but because it fails to find the accessors in Vault (the tokens were already revoked) it stops and never deletes them from Raft:
{"@level":"warn","@message":"failed to revoke tokens. Will reattempt until TTL","@module":"nomad.vault","@timestamp":"2020-05-12T17:19:28.249653Z","error":"failed to revoke token (alloc: \"58c8f4dc-1d98-c6a0-4492-babbae44d9ed\", node: \"712c90c0-cbc9-432b-3e32-b1c72237be15\", task: \"manager\"): Error making API request.\n\nURL: POST https://vault.service.consul:8200/v1/auth/token/revoke-accessor\nCode: 400. Errors:\n\n* 1 error occurred:\n\t* invalid accessor\n\n"}
I think that Nomad should also remove those tokens when Vault doesn't know about them, along the lines of the sketch below. Any thoughts?
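To make that a bit more concrete, here is a rough sketch of the behaviour I have in mind. This is not Nomad's actual code; vaultRevoker, revokeAndPurge and fakeVault are made-up names for illustration. The idea is that an "invalid accessor" error from Vault is treated like a successful revocation, so the accessor still gets purged from Raft, while other errors keep the remaining accessors around for a retry:

```go
// Rough sketch (not Nomad's actual code) of treating "invalid accessor"
// responses from Vault as "already revoked" so the accessor can still be
// purged from Raft.
package main

import (
	"fmt"
	"strings"
)

// vaultRevoker stands in for the Vault client the leader uses.
type vaultRevoker interface {
	RevokeAccessor(accessor string) error
}

// revokeAndPurge tries to revoke every accessor and returns the ones that are
// safe to delete from Raft. "invalid accessor" means Vault no longer knows the
// token, so it is treated the same as a successful revocation instead of
// aborting the whole batch.
func revokeAndPurge(v vaultRevoker, accessors []string) ([]string, error) {
	var purge []string
	for _, acc := range accessors {
		err := v.RevokeAccessor(acc)
		switch {
		case err == nil:
			purge = append(purge, acc)
		case strings.Contains(err.Error(), "invalid accessor"):
			// Already revoked or expired on the Vault side: still purge it.
			purge = append(purge, acc)
		default:
			// A real failure (network, permissions, ...): keep the remaining
			// accessors so a later pass can retry.
			return purge, fmt.Errorf("failed to revoke accessor %q: %w", acc, err)
		}
	}
	return purge, nil
}

// fakeVault simulates Vault answering that the token is already gone.
type fakeVault struct{}

func (fakeVault) RevokeAccessor(string) error {
	return fmt.Errorf("Code: 400. Errors: * invalid accessor")
}

func main() {
	purge, err := revokeAndPurge(fakeVault{}, []string{"acc-1", "acc-2"})
	fmt.Println("purge from raft:", purge, "err:", err)
}
```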