nomad alloc exec fails in TLS enabled clusters. #7233
Comments
Thanks @henrikjohansen for reporting this. We'll take a closer look here. I'm a bit puzzled by the error message. It somewhat implies that the cert is only valid for 127.0.0.1 and fails to validate against the IP address the CLI uses to connect to the host. If that's true, then I would expect the connection to fail. Would you mind providing more info about the certs being used, their SAN values, and whether there is a difference between the server and client cert setup? I would appreciate any feedback or input you have.
That would explain it. I will follow up to ensure that handling roles is consistent. Thanks for the pointers and quick follow up. I'll consult with the team about the docs. I suspect adding host ip/domain would be beneficial for browsers (when using UI) as they don't special case certificate roles either. |
FYI - I have tried adding the relevant hostnames as subject alternative names to the client cert ... that did not make a difference. Besides, cert roles are what make Nomad TLS without Vault tolerable to manage :)
@henrikjohansen Thanks again for reporting the issue. I have fixed the bug where nomad alloc exec handling differed from nomad alloc logs. PR #7274 has some context on what the issue is. However, in a test cluster that follows the docs above, I can verify the failure in the following; note that the nomad alloc logs error message is misleading:
These correspond to error log messages like:
So next steps for us would be:
@notnoop Well, the docs specifically state that adding SANs is considered an anti-pattern for most Nomad deployments :
All It might be worth noting that |
Thanks for pointing this out - I'll discuss it with the team and follow up.
Can you try running the commands I posted above in my sample? Others were failing for me too.
All our clusters have the same behavior ... only Literally every single
Sorry, I meant the steps in #7233 (comment) where you switch |
We never query
I see - I'm afraid I'm still seeing other commands doing hostname validation. Thank you for your patience, as you walk me through this. Just to confirm, in your setup, the nomad server (or load balancer if any) servicing the FQDN is configured with a cert that doesn't have the FQDN in its SAN values? Can you run Here are my steps to check hostname validation when I use a custom FQDN and guide above with nomad 0.10.4 binary. In all cases, the cli command fails when I use an ip/host that doesn't match the cert SAN/CN values:
In my test cluster here, the certificate information from the command above is:
Ah, I see the confusion. The server certificates have their respective hostnames added as SANs since they are static. The client certificates however have not as they change rather frequently. |
Perfect - that clarifies everything. PR #7274 fixes your case, then: alloc exec now handles the case where the client cert isn't configured with the IP address, so it's in line with the other alloc commands. Indeed, the documentation you linked to is ambiguous. It implies you don't need SAN/IP for any node, not even the servers, for purposes of Nomad, and it only mentions SAN/IP for purposes of integration with other tools, e.g. curl. This is not correct. We should update it so that it calls out the benefit of having SAN values on the Nomad servers, and that the CLI acts just like other tools and doesn't special-case cert roles.
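To illustrate the difference between role-based and host-based validation discussed in this thread, here is a minimal Go sketch. It is not Nomad's actual verification code. It assumes a client certificate issued for a role name plus localhost and 127.0.0.1 only; the role name client.global.nomad and the helper newClientCert are illustrative assumptions. It then runs the standard library's hostname check against the role name and against two IP addresses.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"crypto/x509/pkix"
	"fmt"
	"math/big"
	"net"
	"time"
)

// newClientCert is a hypothetical helper: it builds a throwaway self-signed
// certificate whose SANs are a role name, localhost and 127.0.0.1 only.
// The role name "client.global.nomad" is an assumption for illustration.
func newClientCert() (*x509.Certificate, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber: big.NewInt(1),
		Subject:      pkix.Name{CommonName: "client.global.nomad"},
		NotBefore:    time.Now().Add(-time.Hour),
		NotAfter:     time.Now().Add(time.Hour),
		DNSNames:     []string{"client.global.nomad", "localhost"},
		IPAddresses:  []net.IP{net.ParseIP("127.0.0.1")},
	}
	// Self-signed: the template acts as its own parent.
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return x509.ParseCertificate(der)
}

func main() {
	cert, err := newClientCert()
	if err != nil {
		panic(err)
	}
	// A role-aware caller checks the role name; a generic tool checks the
	// address it actually dialed.
	for _, name := range []string{"client.global.nomad", "127.0.0.1", "1.2.3.4"} {
		fmt.Printf("%-20s -> %v\n", name, cert.VerifyHostname(name))
	}
}
```

Under these assumptions the role name and 127.0.0.1 verify cleanly, while the check against 1.2.3.4 returns an error of the same form as the one reported in this issue. A caller that validates against the dialed address therefore fails unless that address is in the SAN list.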
I'm going to lock this issue because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active issues. |
Nomad version
Nomad v0.10.4+ent (284fc3a)
Issue
We are running a TLS enabled cluster - all nomad CLI commands work with the exception of nomad alloc exec, which fails with a TLS error:
failed to exec into task: x509: certificate is valid for 127.0.0.1, not 1.2.3.4
Yes, you could set NOMAD_SKIP_VERIFY, but this is not something I can recommend to our internal users.
Running the example job from nomad init:
$ nomad alloc fs 74a25298
$ nomad alloc logs 74a25298
$ nomad alloc exec 74a25298 /bin/sh
Reproduction steps
See above