-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Task API via Unix Domain Socket #15864
Conversation
This changeset allows Workload Identities to authenticate to all the RPCs that support HTTP API endpoints, for use with PR #15864. * Extends the work done for pre-forwarding authentication to all RPCs that support a HTTP API endpoint. * Consolidates the auth helpers used by the CSI, Service Registration, and Node endpoints that are currently used to support both tokens and client secrets. Intentionally excluded from this changeset: * The Variables endpoint still has custom handling because of the implicit policies. Ideally we'll figure out an efficient way to resolve those into real policies and then we can get rid of that custom handling. * The RPCs that don't currently support auth tokens (i.e. those that don't support HTTP endpoints) have not been updated with the new pre-forwarding auth We'll be doing this under a separate PR to support RPC rate metrics.
// Initialize builtin API server here for use in the client, but it won't | ||
// accept connections until the HTTP servers are created. | ||
a.builtinServer = newBuiltinAPI() | ||
conf.APIListenerRegistrar = a.builtinServer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this necessary? The implemenation of builtinAPI
is clever and unfamiliar, which is usually a bad thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tl;dr I would love a better way of doing this.
Is this necessary?
Kind of? Basically giving the Client access to the Agent HTTP server creates a circular dependency that needs solving somehow:
- NewAgent creates Client
- agent.Command creates HTTP servers which accept the Agent as a dependency
That worked out fine when the Client was oblivious to the existence of the HTTP API.
When we needed access to the HTTP API for template
support we used bufconn
to create a Dialer connected to a Listener, returned the Listener to Agent, and Agent would slap the HTTP API on it. bufconn
blocks Dials until the Listener accepts them.
Unix Domain Sockets (UDS) are listeners themselves though, so we couldn't just reuse the Dialer (unless we wanted to create 1 Accept goroutine per Task and 2 proxy goroutines for every UDS connection to proxy reads and writes from the UDS to the bufconn
).
So I more or less copied bufconn
except instead of blocking a Dial attempt on its corresponding Accept, I block a Register(listener)
attempt on the HTTP Server existing.
schmichael why don't you just swap the order we start the api and client?
I think that could work. The Agent.{Client,Server}
getters for the API to access the client and server would need locks and setters since they would be nil until NewAgent returned them. The dependencies that the API has on Agent itself are harder to untangle: Agent.RPC(
being one of the harder bits. Do we do a bufconn
/builtinServer
style block RPCs from the HTTP API until the Agent is created
? We don't really gain much from swapping the order if we don't also detangle the circular dependencies. So it gets very complicated especially because both Client and Server have their own RPC implementations and methods of routing requests around and the HTTP API just grabs one and uses it.
Marked as ready for review despite some outstanding TODOs. All of them are fairly orthogonal to this and may be best suited to followup PRs anyway. |
@schmichael can you look into these test failures?
|
return nil | ||
} | ||
|
||
if err := os.Chmod(udsPath, 0o777); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think our logic here for why this is safe in the context of the task dir is sound, but it's also going to result in us getting dinged by third-party auditors armed with sophisticated source code analysis tools like grep
😀
Can we leave a comment above here about why it's ok in hopes of reducing the amount of noise we get from those discussions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, does it need to be world writable? Or can we apply the same logic made in https://github.com/hashicorp/nomad/blob/main/helper/users/lookup.go#L39
And does it need to be executable? some random sockets on my machine are srw-rw-rw-
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points. I was going to say well we require auth anyway so who cares
but...
These are the sorts of decisions we have to live with forever, and respecting Task.User
when its set (as nomad_token
does too) seems like a valuable road to go down.
So I followed #15755 and made this attempt to chown the socket to Task.User
with 0o600
perms. Like #15755 it falls back to 0o666
if that fails (which it always will on Windows or as a non-root user).
return nil | ||
} | ||
|
||
if err := os.Chmod(udsPath, 0o777); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, does it need to be world writable? Or can we apply the same logic made in https://github.com/hashicorp/nomad/blob/main/helper/users/lookup.go#L39
And does it need to be executable? some random sockets on my machine are srw-rw-rw-
Co-authored-by: Seth Hoenig <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Hi there, I realize I am kinda late to the party here -- sorry for that. I was wondering if it is a good idea to expose the API sockets to tasks by default. I realize that the access to it requires authentication, but not all tasks are trusted and a pre-auth vulnerability in the API would allow "malicious" tasks to exploit this. From an security POV I'd very much prefer this to be opt in (it would keep the attack surface smaller). |
That's a valid concern @apollo13! Mind opening a new issue? We discussed the ability to opt out internally but couldn't decide who should control it: Client Agent, jobspec, namespace, ...some combination of those to allow defaults in one place to be overridden in another place? |
The goal of this PR is to provide universal workload access to Nomad's HTTP API via Unix Domain Sockets (UDS).
Dynamic Node Metadata #15844 is one potential use case.
Inspired by e90807e
TODO