Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auth: use ACLsDisabledACL when ACLs are disabled #18754

Merged
merged 1 commit into from
Oct 16, 2023
Merged

Conversation

tgross
Copy link
Member

@tgross tgross commented Oct 13, 2023

The RPC handlers expect to see nil ACL objects whenever ACLs are disabled. By using nil as a sentinel value, we have the risk of nil pointer exceptions and improper handling of nil when returned from our various auth methods that can lead to privilege escalation bugs. This is the final patch in a series to eliminate the use of nil ACLs as a sentinel value for when ACLs are disabled.

This patch adds a new virtual ACL policy field for when ACLs are disabled and updates our authentication logic to use it. Included:

  • Extends auth package tests to demonstrate that nil ACLs are treated as failed auth and disabled ACLs succeed auth.
  • Adds a new AllowDebug ACL check for the weird special casing we have for pprof debugging when ACLs are disabled.
  • Removes the remaining unexported methods (and repeated tests) from the nomad/acl.go file.
  • Update the semgrep rules to detect improper nil ACL checking and remove the old invalid ACL checks.
  • Update the contributing guide for RPC authentication.

Ref: https://github.com/hashicorp/nomad-enterprise/pull/1218
Ref: #18703
Ref: #18715
Ref: #16799
Ref: #18730
Ref: #18744

Notes for reviewers:

  • I know this looks like a monster PR, but the bulk of it is removing if aclObj != nil lines all over the RPC endpoints 😀
  • This actually isn't quite the final patch, as I need to have a patch in ENT removing the if aclObj != nil from those RPC endpoints too.

The RPC handlers expect to see `nil` ACL objects whenever ACLs are disabled. By
using `nil` as a sentinel value, we have the risk of nil pointer exceptions and
improper handling of `nil` when returned from our various auth methods that can
lead to privilege escalation bugs. This is the final patch in a series to
eliminate the use of `nil` ACLs as a sentinel value for when ACLs are disabled.

This patch adds a new virtual ACL policy field for when ACLs are disabled and
updates our authentication logic to use it. Included:

* Extends auth package tests to demonstrate that nil ACLs are treated as failed
  auth and disabled ACLs succeed auth.
* Adds a new `AllowDebug` ACL check for the weird special casing we have for
  pprof debugging when ACLs are disabled.
* Removes the remaining unexported methods (and repeated tests) from the
  `nomad/acl.go` file.
* Update the semgrep rules to detect improper nil ACL checking and remove the
  old invalid ACL checks.
* Update the contributing guide for RPC authentication.

Ref: hashicorp/nomad-enterprise#1218
Ref: #18703
Ref: #18715
Ref: #16799
Ref: #18730
Ref: #18744
Copy link
Member

@jrasell jrasell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Just noting something I looked at when reviewing this PR for future: can/should we avoid building the iradix transactions when performing the client/server/disabled ACL init? Curious what the time/space gain would be, and even if it's safe.

@tgross
Copy link
Member Author

tgross commented Oct 16, 2023

can/should we avoid building the iradix transactions when performing the client/server/disabled ACL init? Curious what the time/space gain would be, and even if it's safe.

Oh, I bet we could. We just need to make sure the ACL struct is otherwise initialized. Maybe a separate constructor for "virtual" ACLs. I'll open a follow-up issue for that.

@tgross tgross merged commit cbd7248 into main Oct 16, 2023
21 checks passed
@tgross tgross deleted the no-nil-acls-final branch October 16, 2023 13:30
tgross added a commit that referenced this pull request Dec 1, 2023
In #18754 we accidentally fixed a bug that prevented poststop tasks from getting
access to Variables. This was fixed in the 1.6.x branch in #19270, at which
point we discovered the fix had been done in main already as part of the auth
refactor. Add a changelog entry for it.
tgross added a commit that referenced this pull request Dec 1, 2023
In #18754 we accidentally fixed a bug that prevented poststop tasks from getting
access to Variables. This was fixed in the 1.6.x branch in #19270, at which
point we discovered the fix had been done in main already as part of the auth
refactor. Add a changelog entry for it.
tgross added a commit that referenced this pull request Dec 1, 2023
tgross added a commit that referenced this pull request Mar 18, 2024
As of #18754 which shipped in Nomad 1.7, we no longer need to nil-check the
object returned by `ResolveACL` if there's no error return, because in the case
where ACLs are disabled we return a special "ACLs disabled" ACL object. Checking
nil is not a bug but should be discouraged because it opens us up to future bugs
that would bypass ACLs.

While working on an unrelated feature @lindleywhite discovered that we missed
removing the nil check from several endpoints with our semgrep linter. This
changeset fixes that.

Co-Author: Lindley <[email protected]>
tgross added a commit that referenced this pull request Mar 18, 2024
As of #18754 which shipped in Nomad 1.7, we no longer need to nil-check the
object returned by `ResolveACL` if there's no error return, because in the case
where ACLs are disabled we return a special "ACLs disabled" ACL object. Checking
nil is not a bug but should be discouraged because it opens us up to future bugs
that would bypass ACLs.

While working on an unrelated feature @lindleywhite discovered that we missed
removing the nil check from several endpoints with our semgrep linter. This
changeset fixes that.

Co-Author: Lindley <[email protected]>
philrenaud pushed a commit that referenced this pull request Apr 18, 2024
As of #18754 which shipped in Nomad 1.7, we no longer need to nil-check the
object returned by `ResolveACL` if there's no error return, because in the case
where ACLs are disabled we return a special "ACLs disabled" ACL object. Checking
nil is not a bug but should be discouraged because it opens us up to future bugs
that would bypass ACLs.

While working on an unrelated feature @lindleywhite discovered that we missed
removing the nil check from several endpoints with our semgrep linter. This
changeset fixes that.

Co-Author: Lindley <[email protected]>
tgross added a commit that referenced this pull request Apr 18, 2024
As of #18754 which shipped in Nomad 1.7, we no longer need to nil-check the
object returned by ResolveACL if there's no error return, because in the case
where ACLs are disabled we return a special "ACLs disabled" ACL object. Checking
nil is not a bug but should be discouraged because it opens us up to future bugs
that would bypass ACLs.

We fixed a bunch of these cases in #20150
but I didn't update the semgrep rule, which meant we missed a few more. Update
the semgrep rule and fix the remaining cases.
tgross added a commit that referenced this pull request Apr 18, 2024
As of #18754 which shipped in Nomad 1.7, we no longer need to nil-check the
object returned by ResolveACL if there's no error return, because in the case
where ACLs are disabled we return a special "ACLs disabled" ACL object. Checking
nil is not a bug but should be discouraged because it opens us up to future bugs
that would bypass ACLs.

We fixed a bunch of these cases in #20150
but I didn't update the semgrep rule, which meant we missed a few more. Update
the semgrep rule and fix the remaining cases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants