AuthorizationPolicy: add serviceAccounts field #3340

Open: wants to merge 3 commits into master

howardjohn
Member

This is a minor implementation complexity in favor of a dramatic
simplification to usage of Istio authorization.

Today, if a user wants to dive into zero-trust 101, they are presented
with a requirement to set `principals`: `A list of peer identities
derived from the peer certificate`, and write
`<TRUST_DOMAIN>/ns/<NAMESPACE>/sa/<SERVICE_ACCOUNT>`.

In my experience working with users, this simple sentence is a huge
cognitive overload, and it unnecessarily pushes SPIFFE, trust domains,
and other unnecessary concepts onto them. Additionally, the
requirement to set 'trust domain', which is overwhelmingly not desired
by users who just want SA auth, leads to all sorts of wonky workarounds
in Istio like `cluster.local` being a magic value.

Instead, we just add a SA field directly. This takes the format `ns/sa`,
as you cannot safely reference a SA without a namespace field as well.
Note we do this, rather than just require you to set 'service account' and 'namespace'
as individual fields, since you could have `namespace=[a,b],sa=[d,e]`
which is ambiguous.
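
To make the ambiguity concrete, here is a minimal Python sketch. It is not Istio code; `parse_service_account` is a hypothetical helper that only assumes the `ns/sa` format described above.

```python
from itertools import product


def parse_service_account(value: str) -> tuple[str, str]:
    """Split a `<namespace>/<serviceaccount>` string into its two parts."""
    namespace, sep, name = value.partition("/")
    if not sep or not namespace or not name or "/" in name:
        raise ValueError(f"expected <namespace>/<serviceaccount>, got {value!r}")
    return namespace, name


# Separate lists: does namespace=[a,b], sa=[d,e] mean every pairing,
# or only a/d and b/e? Both readings are plausible.
namespaces, accounts = ["a", "b"], ["d", "e"]
cross_product = sorted(product(namespaces, accounts))
pairwise = list(zip(namespaces, accounts))

# Combined form: each entry names exactly one service account.
explicit = [parse_service_account(v) for v in ["a/d", "b/e"]]
```

With separate lists the two readings give four and two identities respectively; the combined form leaves only one reading.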

If this is directionally approved, I will add some more documentation
and CEL validation and testing.
@howardjohn requested a review from a team as a code owner October 18, 2024 20:24
@howardjohn added the release-notes-none label (a PR that does not require release notes) Oct 18, 2024
@istio-testing added the size/S label (a PR that changes 10-29 lines, ignoring generated files) Oct 18, 2024
Member

@hzxuzhonghu left a comment

And you should document only one field can be specified

@@ -428,6 +428,8 @@ message Source {
// `"<TRUST_DOMAIN>/ns/<NAMESPACE>/sa/<SERVICE_ACCOUNT>"`, for example, `"cluster.local/ns/default/sa/productpage"`.
// This field requires mTLS enabled and is the same as the `source.principal` attribute.
//
// Usage of `serviceAccounts` is typically simpler and offers the same functionality.
Member

Nope. If the client is external, the SA is not known; in that case principals is still needed.

Member Author

I don't get it. If I know to write spiffe://cluster.local/ns/foo/sa/bar then surely I can know to write foo/bar?

Member

If the client is external, the identity could be in any format, not limited to spiffe://cluster.local/ns/foo/sa/bar

Member Author

In that case the user would not use this field then.

Member Author

It's not deprecating principals, just making the 99.9999% use case easier

Contributor

Maybe something like this would clarify?

Suggested change
- // Usage of `serviceAccounts` is typically simpler and offers the same functionality.
+ // Usage of `serviceAccounts` is typically simpler and offers similar functionality. For complex scenarios principals are still fully supported.

Contributor

@bleggett Oct 23, 2024

It's not even for "complex" scenarios - hardcoding principals in the bespoke Istio format in our Auth policies is one reason we can't currently support complex scenarios at all (custom SPIFFE IDs, SPIRE etc) - so we should just say that:

Suggested change
- // Usage of `serviceAccounts` is typically simpler and offers the same functionality.
+ // Usage of `serviceAccounts` is typically simpler and offers the same security guarantees. Principals are still fully supported, but not recommended, as encoding complete principal strings leads to fragile policies.

Contributor

@ilrudie left a comment

Direction looks good. This seems in line with ambient's overarching mission to simplify the things which can be simple.

@ilrudie
Contributor

ilrudie commented Oct 22, 2024

> And you should document only one field can be specified

Is this decided for sure? It seems to me that mixing and matching is plausible even if likely not recommended and more error prone.

@howardjohn
Member Author

> Is this decided for sure? It seems to me that mixing and matching is plausible even if likely not recommended and more error prone.

IMO you should be able to set both. The fields are not strictly related... it's fine to say I want to allow from 'foo/bar OR spiffe://something-else'

@bleggett
Contributor

bleggett commented Oct 23, 2024

> And you should document only one field can be specified

You should be able to include as many fields as Istio chooses to support in the AuthPolicy, ultimately - if they can be matched against the identity/principal, we will match them.

So this will probably eventually be a list of substrings to match against OR a whole SPIFFE ID.

SA is all we need to start, but this is also heavily related to istio/istio#43105 - effectively we cannot even properly support arbitrary SPIFFE IDs without this, so this is required for better SPIFFE/SPIRE support as well.

All that is really required is to match against substrings - whether Istio happens to be matching against a SPIFFE ID principal in the Istio format internally or not doesn't matter.

> IMO you should be able to set both. The fields are not strictly related... it's fine to say I want to allow from 'foo/bar OR spiffe://something-else'

as long as this is a strict OR, I agree - this will help encourage people to stop hardcoding SPIFFE IDs in auth policies in the Istio-specific format, which is the only real obstacle we have to supporting customizable SPIFFE IDs.

EDIT: actually wrong - this will still work just fine.

//
// This takes the format `<namespace>/<serviceaccount>`.
//
// If not set, any service account is allowed.
Contributor

@bleggett Oct 23, 2024

Suggested change
- // If not set, any service account is allowed.
+ // If not set, any service account is allowed.
+ // if both principal and this field are set, this field has precedence

If we are going to let people set both, we need to be explicit about whether it's an AND or an OR match, and what the precedence is if both are set.

(probably with a blurb on both principal and service account)

Contributor

I think setting both should be allowed. Presumably internally we can normalize the types and then can just append one list to the other, dedupe and move on so neither takes precedence.

Contributor

@bleggett Oct 23, 2024

> I think setting both should be allowed. Presumably internally we can normalize the types and then can just append one list to the other, dedupe and move on so neither takes precedence.

The problem is I don't think you can actually do that/it would make the existing problem worse to do that.

Given an AuthPol

principal: spiffe://example.org/ns/default/sa/my-sa
service_account: default/my-sa

How do I evaluate the AuthPol if the presented workload principal is actually (best case)

spiffe://example.org/ns/default/sa/my-sa/some/other/stuff

or (worst case)

spiffe://example.org/beep/boop/ns/default/sa/my-sa

which should win in that case? Neither?

If we change this, we should at least change it in a way that makes istio/istio#43105 easier, and not harder. Supporting things other than the principal in AuthPol definitely makes #43105 easier, but not if we ignore the current problem we have with encouraging people to put fixed/complete principal strings in their AuthPolicies, which creates the secondary problem of forcing everyone to use a very specific/exact/fixed principal format which is compatible with no other product.

We either need to make the combined semantics very clear in the API, or make them mutually exclusive - I don't much care which, but I think it has to be one or the other.
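
As a rough illustration of the exact-match problem above, a Python sketch follows. It models only the exact string comparison described in this comment and is not Istio's actual matcher; `exact_match` is a hypothetical name.

```python
def exact_match(policy_principals: list[str], presented: str) -> bool:
    """Allow only when the presented identity equals a listed principal exactly."""
    return presented in policy_principals


POLICY_PRINCIPALS = ["spiffe://example.org/ns/default/sa/my-sa"]

presented_ids = [
    "spiffe://example.org/ns/default/sa/my-sa",                 # Istio format
    "spiffe://example.org/ns/default/sa/my-sa/some/other/stuff",  # "best case"
    "spiffe://example.org/beep/boop/ns/default/sa/my-sa",        # "worst case"
]

results = {p: exact_match(POLICY_PRINCIPALS, p) for p in presented_ids}
```

Only the first identity is allowed, even though all three carry the same namespace and service account.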

Contributor

Maybe I misunderstand, but my guess was that internally we would use this as shorthand for a SPIFFE ID in the Istio format, which makes the conversion from ns/sa to SPIFFE pretty straightforward. I do agree that SPIFFE -> ns/sa presumes all SPIFFE IDs are in the Istio format, which we likely don't want. But SPIFFE -> ns/sa is lossy, so we probably don't want to do that conversion even if we were OK mandating the Istio format.
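
A small Python sketch of that guess, assuming the Istio format `spiffe://<td>/ns/<ns>/sa/<sa>`. Both function names are hypothetical and this is not how Istio implements it.

```python
import re
from typing import Optional

# Strict Istio-format SPIFFE ID: trust domain, then exactly ns/<ns>/sa/<sa>.
ISTIO_FORMAT = re.compile(r"spiffe://([^/]+)/ns/([^/]+)/sa/([^/]+)")


def to_istio_principal(service_account: str, trust_domain: str) -> str:
    """Expand `ns/sa` into an Istio-format SPIFFE ID for a given trust domain."""
    namespace, _, name = service_account.partition("/")
    return f"spiffe://{trust_domain}/ns/{namespace}/sa/{name}"


def to_service_account(spiffe_id: str) -> Optional[str]:
    """Reverse direction: works only for the Istio format and drops the trust domain."""
    match = ISTIO_FORMAT.fullmatch(spiffe_id)
    if match is None:
        return None
    _trust_domain, namespace, name = match.groups()
    return f"{namespace}/{name}"
```

The forward direction is mechanical once a trust domain is chosen. The reverse maps IDs from different trust domains to the same `ns/sa` and returns nothing for IDs with extra path segments, which is the lossiness mentioned above.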

Contributor

I think the use case for a mix and match would, in practice, be limited. If you want to require some advanced ID format which includes some/other/stuff then I don't think relying on ns/sa is going to work for you really at all and in that case you should NOT try to specify things that way. I just don't think our API can really track the user's intent in that way. If this is your scenario then you probably need additional policy to enforce it

Contributor

@bleggett Oct 24, 2024

Maybe I just fully misunderstand this. What do we expect the SANs in our certs will look like if not a spiffe ID?

We should expect it will look like a SPIFFE ID (it follows the spec, has a trust domain, and some fields we care about).

We should not require or assume it will look like an Istio SPIFFE ID (it follows the spec, has a trust domain, and ONLY contains the fields we care about, or we barf)

There's no particular reason they need to be "our" certs, and they may not be. They could be SPIRE's, or anybody's. Istio (especially ambient) really doesn't care what CA grants workload identities, as long as those identities have

  1. A SAN.
  2. Which is in the SPIFFE ID format.
  3. Which has AT-LEAST certain Istio-specific fields.

The problem is the current AuthPol API with principal implicitly requires that AT-LEAST to be an AT-MOST in all cases, because it only supports an exact-match principal - Istio does not require this. Just our (not great) AuthPol API, which this seeks to change with net-new fields (which is great).

Imagine the SAN is a SPIFFE ID, but you can't make assumptions about its complete format. You can assume it will have ns/sa/td parts - but you can't assume you will always have an exact string match against the Istio-only format.

  • for an AuthPol that is match exactPrincipal || ns/sa -> policy resolution is always unambiguous, great.
  • for an AuthPol that is match exactPrincipal -> policy resolution is always unambiguous, great.
  • for an AuthPol that is match ns/sa -> policy resolution is always unambiguous, great.
  • for an AuthPol that is match exactPrincipal && ns/sa -> this is effectively impossible to resolve unambiguously, unless the API excludes this condition or introduces an explicit precedence (or we make implicit assumptions elsewhere which will confuse people looking at the under-specified API - which I would like to avoid here).
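
The four cases can be sketched as a truth table in Python. This is only an illustration of the argument, not Istio behavior; `decide` and its `combine` parameter are hypothetical.

```python
from itertools import product


def decide(principal_hit: bool, sa_hit: bool, combine: str) -> bool:
    """Combine the two match results with the given operator ("or" / "and")."""
    if combine == "or":
        return principal_hit or sa_hit
    if combine == "and":
        return principal_hit and sa_hit
    raise ValueError(f"unknown combine mode: {combine!r}")


# Every combination of (exact principal matched?, ns/sa matched?) per operator.
table = {
    (p, s, c): decide(p, s, c)
    for p, s in product([False, True], repeat=2)
    for c in ("or", "and")
}

# A SPIFFE ID that is not in the exact Istio format but still carries ns/sa:
# the exact principal misses, the ns/sa matcher hits.
non_istio_format = (False, True)
allowed_with_or = table[(*non_istio_format, "or")]
allowed_with_and = table[(*non_istio_format, "and")]
```

Under OR the workload is allowed on the strength of `ns/sa` alone. Under AND it is denied, so the exact-format assumption comes back through the principal.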

Current state:

  1. AuthPol has the principal field.
  2. If specified, this must be a SPIFFE ID.
  3. If specified, it also (at this time) must be a SPIFFE ID (exact/strict match) in the Istio-specific format. Which is not desirable because it effectively single-handedly forces you to use Istio's workload CA - or use a different workload CA and retool it to use Istio's format for SANs. Which is a pain if you already have a workload CA in your env and you already have a format - we have had multiple bugs opened around this as well as user and customer complaints.
  4. Logically/codewise, Istio (especially Ambient) doesn't need to force you to use the exact Istio format, and we could fix that relatively easily.
  5. However, just fixing that in the code doesn't solve the problem - you still can't use a different SPIFFE ID format because now all people's extant AuthPols will break - because they all force you to encode the entire SPIFFE ID in the strict Istio format. We have talked about this before actually, it's come up repeatedly in SPIRE discussions as the main blocker for fully supporting SPIRE or other workload CAs that follow the SPIFFE ID standard - there are no good workarounds, just hard-to-maintain hacks that create operational burden.

This PR:

  1. Ditches the requirement to specify a principal field (great - I agree, nobody should put a raw SPIFFE ID in their AuthPol, and we don't need the full/exact SPIFFE ID to resolve Istio policy)
  2. Instead we can ignore the underlying format of the principal as a first-class API concern (also great) and just let people specify matchers like (minimally) serviceAccount+namespace. Maybe trust domain as an optional additional specifier later but that's out of scope as I think we can easily assume that if only the SA/NS is defined. Maybe other things later, whatever.
  3. Great, now we have removed the explicit and implicit assumption of a specific, fixed Istio-local SPIFFE ID format from the AuthPol API - perfect.
  4. Except if we allow exactPrincipal and <other stuff> to be AND-able in AuthPol, we reintroduce the implicit assumption again because the API is underspecified (not to mention we'll confuse the hell out of people reading the AuthPol docs - what's the operational point of ANDing those?)

Contributor

So then the expectation is that we change things and configure our proxies to do a match on both "ns/specified-ns" and "sa/specified-sa" being present in the SANs if we are using this field?

Contributor

@bleggett Oct 24, 2024

Ah never mind, I had forgotten we support N principal strings.

Principals:
spiffe://td/ns/<ns>/sa/<sa>
spiffe://td/ns/<ns>/sa/<sa>/foo/bar/baz

SA:
ns/sa

So the above would OR the principal(s) and AND the SA.

That's fine, then. The problem I was thinking of was

Principals:
spiffe://td/ns/<ns>/sa/<sa>

SA:
ns/sa

but my actual principal is spiffe://td/ns/<ns>/sa/<sa>/foo/bar/ba

Here this would still fail because of exact matching, but we could just say either define all possible principals, or none and only use SA - with none being preferred in most cases, as it makes using arbitrary SPIFFE IDs much easier.

Contributor

@bleggett Oct 24, 2024

> So then the expectation is that we change things and configure our proxies to do a match on both "ns/specified-ns" and "sa/specified-sa" being present in the SANs if we are using this field?

The API as I read it doesn't require the new service_account field to be a strict substring of the SPIFFE ID.

So that means if we have an AuthPol with

service_account: default/bar

we can internally map that against an identity principal of

spiffe://td/ns/default/sa/bar

OR

spiffe://td/ns/default/sa/bar/baz/beep/boop

pretty trivially with one AuthPolicy, and de-opinionate on the "SPIFFE ID format".

which means it's possible to now write an AuthPolicy that won't break if you change your SPIFFE ID format.

(impl is TBD but I am happy to have a way to express this in the API at all, which was lacking before)
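
One possible shape for such a mapping, as a Python sketch. The implementation is explicitly TBD above, so everything here is an assumption: `service_account_matches` is a hypothetical name, and accepting the `ns/<ns>/sa/<sa>` segments anywhere in the path is a design choice of this sketch, not a decided behavior.

```python
from urllib.parse import urlparse


def service_account_matches(service_account: str, spiffe_id: str) -> bool:
    """True if the SPIFFE ID path contains ns/<ns>/sa/<sa> as whole, consecutive segments."""
    namespace, _, name = service_account.partition("/")
    parsed = urlparse(spiffe_id)
    if parsed.scheme != "spiffe" or not namespace or not name:
        return False
    segments = [s for s in parsed.path.split("/") if s]
    wanted = ["ns", namespace, "sa", name]
    # Slide a 4-segment window over the path; compare whole segments, not substrings.
    return any(segments[i:i + 4] == wanted for i in range(len(segments) - 3))
```

Comparing whole path segments rather than raw substrings keeps `default/bar` from matching a service account named `barista`.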

Contributor

I largely agree with Ian's comment about semantic assumptions in this thread.

We clearly have limitations in how we interpret SPIFFE URIs when the principal has a multi-segment path, but leaving those aside for a moment, these are both ways to define principals and should be ORed.

After all, there's not a lot of logical difference between the new fields and allowing a new URI type in this field to reference SAs. E.g.:

k8s://ServiceAccount/{namespace}/{name}

This is also the same damn thing as targetRef which has the nice property of allowing for reference expansion without requiring the API to evolve. E.g. this would allow for the introduction of types which represented principal groups to be referenced if they were added to the system

Contributor

@bleggett left a comment

This LGTM and makes istio/istio#43105 much easier to boot.

It may be necessary to add something like a trust domain later, but I do think it's much better to have an API that looks like

  • service_account
  • trust_domain
  • ...etc

versus

  • exactPrincipal

or

  • matcherTupleWhichIsAKindOfIdentity (random fixed fields)

@istio-testing added the size/L label (a PR that changes 100-499 lines, ignoring generated files) and removed the size/S label Oct 28, 2024
@howardjohn
Member Author

I've added some tests and validation. I have blocked usage of SA with principals in the same `from`, since they are ANDed -- it does not make sense to say "from ns=a AND sa=foo/bar".
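
For readers following along, the rule can be sketched in Python. This is not the PR's actual validation; the dict shape and `validate_source` are hypothetical, and only the field names `serviceAccounts` and `principals` come from this thread.

```python
def validate_source(source: dict) -> list[str]:
    """Return validation errors for one `from` source (empty list means valid)."""
    errors = []
    service_accounts = source.get("serviceAccounts", [])

    # Fields within one source are ANDed, so combining these two is rejected.
    if service_accounts and source.get("principals"):
        errors.append("serviceAccounts cannot be set together with principals in the same source")

    # Each entry must be exactly <namespace>/<serviceaccount>.
    for value in service_accounts:
        parts = value.split("/")
        if len(parts) != 2 or not all(parts):
            errors.append(f"serviceAccounts entry {value!r} must be <namespace>/<serviceaccount>")
    return errors
```

Allowing "foo/bar OR some principal" would then mean writing two separate sources rather than one source with both fields.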

@bleggett
Contributor

bleggett commented Oct 28, 2024

> I've added some tests and validation. I have blocked usage of SA with principals, in the same from, since they are ANDed -- it does not make sense to say "from ns=a AND sa=foo/bar".

It occurs to me we have historically supported (in the API validation sense, not the logical sense)

from ns=a AND principal=spiffe://td/ns/b/sa/c

which is, by the same token, also ~always a validation error we should probably check for (but that's OOS for this PR - I agree we should proactively validate the net-new bits like you've done it here)

Member

@hzxuzhonghu left a comment

Thank you

Labels
release-notes-none: Indicates a PR that does not require release notes.
size/L: Denotes a PR that changes 100-499 lines, ignoring generated files.
6 participants