- Release Signoff Checklist
- Summary
- Motivation
- Proposal
- Design Details
- Production Readiness Review Questionnaire
- Implementation History
- Drawbacks
- Alternatives
- Infrastructure Needed (Optional)
Items marked with (R) are required prior to targeting to a milestone / release.
- (R) Enhancement issue in release milestone, which links to KEP dir in kubernetes/enhancements (not the initial KEP PR)
- (R) KEP approvers have approved the KEP status as `implementable`
- (R) Design details are appropriately documented
- (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- e2e Tests for all Beta API Operations (endpoints)
- (R) Ensure GA e2e tests meet requirements for Conformance Tests
- (R) Minimum Two Week Window for GA e2e tests to prove flake free
- (R) Graduation criteria is in place
- (R) all GA Endpoints must be hit by Conformance Tests
- (R) Production readiness review completed
- (R) Production readiness review approved
- "Implementation History" section is up-to-date for milestone
- User-facing documentation has been created in kubernetes/website, for publication to kubernetes.io
- Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
Token projection and alternative audiences on JWTs issued by the apiserver enable an external entity to validate the identity and certain properties (e.g. associated ServiceAccount or Pod) of the caller.
When attempting to verify a token associated with a Pod, it is not possible to verify that the Pod is associated with a specific Node without getting the relevant Pod object (whose reference is embedded as a private claim in the JWT) and cross-referencing the named `spec.nodeName`.
To allow for a robust chain of identity verification from the requester all the way through to the projected token, it would be beneficial if the Node object reference associated with the requesting Pod were embedded into the signed JWT.
This is especially useful in cases where the external software wants to avoid replay attacks with projected service account tokens. The external software can cross-reference the identity of the caller against the Node reference embedded in the JWT, which allows this verification to be rooted in the same root of trust that the kubelet/requesting entity uses.
By embedding the identity of the Node the Pod is running on, we can cross-reference this information with an identity passed along to the external service, thus removing the ability for a malicious actor to 'replay' a projected token from another Node.
This will be implemented as an additional `node` entry in the private claims embedded into each JWT returned by the TokenRequest API, in a similar manner to how the ServiceAccount, Pod or Secret is referenced.
Additionally, to provide a robust means of tracking token usage within the audit log, we can embed a unique identifier into each token which can then also be recorded in future audit entries made using that token.
As we are adding support for Node metadata associated with Pods, we will also add the ability to bind a token/JWT to a Node object directly, similar to how a token can be bound to a Pod or Secret resource today.
- Embedding information about the Node that a pod is running on into signed JWTs.
- Make it easier to track the actions a single token has taken, and cross-reference that back to the origin of the token (via audit log inspection).
- Provide a means of checking whether a Pod's token is associated with the same Node as it was associated with when the initial TokenRequest was made (via an extra field that can be observed from the TokenReview API).
- Embedding requester information. This is discussed further in the alternatives considered section, and a future KEP may revisit this.
- Embedding information beyond the immutable Node name and UID into the token. We aim to mimic what is done with the ref fields for secret, pod and serviceaccount (not introduce any additional properties).
- Changing the default behaviour of the SA authenticator to enforce that the referenced Node object still exists.
The kube-apiserver will be extended to automatically embed the `name` and `uid` of the Node a Pod is associated with (via `spec.nodeName`) in generated tokens when a TokenRequest `create` call is serviced.
As the Pod object, which contains the `nodeName`, is already available in this area of code, we will just need to plumb through a Getter for Node objects into the TokenRequest storage layer so the Node's UID can be fetched, similar to what is done for Pod & Secret objects.
Similar to how a token can be bound to a Pod or Secret object, we will also extend the TokenRequest API to allow binding directly to Node objects (without needing to bind to a Pod as well).
This allows users to obtain a token that is tied specifically to the Node object's lifecycle, i.e. when the Node object is deleted, the token will be invalidated.
The SA authenticator will be extended to check whether a token that is bound to a Node object is still valid, by first checking whether the Node object with the name given in the JWT still exists, and if it does, validating whether the UID of that Node is equal to the UID embedded in the token.
Tokens bound to Pod objects will continue to only validate the referenced pod. This avoids changing the previous behaviour for validation of tokens issued for pods. Deletion of a node triggers deletion of the pods associated with that node after a period of time, which ultimately invalidates those tokens.
Tokens that are directly bound to Node objects will always validate the name and UID, as binding tokens to Node objects is a new option and therefore enforcing this validation check from day 1 is non-breaking.
Including a UUID (JTI) on each issued JWT
When a TokenRequest is being issued/fulfilled, we will modify the issuing code to also generate and embed a UUID which can be later used to trace the requests that a specific issued token has made to the apiserver via the audit log.
This will require changing the JWT issuing code to actually generate this UUID, as well as extending the code around the audit log to have it record this information into audit entries when a token is issued (via the `authentication.kubernetes.io/issued-credential-id` audit annotation).
As this UUID will be embedded as part of a user's ExtraInfo, it'll automatically be persisted into audit events for all requests made using a token that embeds a credential identifier (as `authentication.kubernetes.io/credential-id`).
Alice hosts a service that verifies host identity using an out-of-band mechanism and also submits a bound token that contains a node assertion.
The node assertion can be checked to ensure the host identity matches the node assertion of the token.
Bob is an administrator of a cluster and has noticed some strange request patterns from an unknown service account.
Bob would like to understand who initially issued/authorised this token to be issued. To do so, Bob looks up the JTI of the token making the suspicious requests by looking inside the audit log entries at user's ExtraInfo for these suspect requests.
This JTI is then used for a further audit log lookup - namely, looking for the TokenRequest `create` call which contains the audit annotation with key `authentication.kubernetes.io/issued-credential-id` and the value set to that of the suspect token.
This allows Bob to determine precisely who made the original request for this token, and (depending on the 'chain' above this token), allows Bob to recursively perform this lookup to find all involved parties that led to this token being issued.
- Adding additional cross-referencing validation checks into the TokenReview API may break some user workflows that involve deleting Node objects and restarting kubelets to allow them to be recreated. As a result, the TokenReview API will NOT be modified to permit tightening this validation behaviour. Instead, we will rely on the existing protections & mechanisms for invalidating a Node<>Pod binding (i.e. auto-deletion of Pods a fixed time period after the Node object is deleted).
The `pkg/serviceaccount/claims.go` file's `Claims` function will be modified to accept a `core.Node`. This will be made available at the call-site for this function (`pkg/registry/core/serviceaccount/storage/token.go`) by passing through a Getter for Node objects, similar to how Secret objects are fetched.
The associated `Validator` used to validate and parse service account tokens will also be extended to extract this new information from tokens if it is available.
In `pkg/registry/core/serviceaccount/storage/token.go`, the `Create` function will also be extended to add an audit annotation including the generated service account token's JTI, to make it possible to map a future request which used this token back to the initial point at which the token was generated (i.e. to allow deeper inspection of who the requester is).
In the file `staging/src/k8s.io/apiserver/pkg/authentication/serviceaccount/util.go`, the `ServiceAccountInfo.UserInfo` method will be modified to also return this information in the returned `user.Info` struct.
These proposed changes can also be reviewed in the draft pull request.
[x] I/we understand the owners of the involved components may require updates to existing tests to make this code solid enough prior to committing the changes necessary to implement this enhancement.
`pkg/registry/core/serviceaccount/storage`:

- Coverage before (`release-1.28`): `k8s.io/kubernetes/pkg/registry/core/serviceaccount/storage 8.354s coverage: 10.7% of statements`
- Coverage after: `k8s.io/kubernetes/pkg/registry/core/serviceaccount/storage 8.394s coverage: 8.7% of statements`
- Test ensuring audit annotations are added to audit events for the `serviceaccounts/<name>/token` subresource.
- Tests verifying it's possible to bind a token to a Node object.
- Tests ensuring tokens bound to Pod objects also embed associated Node metadata.
- NOTE: the majority of this file is untested with unit tests (instead, using integration tests). #121515.
`staging/src/k8s.io/apiserver/pkg/authentication/serviceaccount`:

- Coverage before (`release-1.28`): `k8s.io/apiserver/pkg/authentication/serviceaccount 0.567s coverage: 60.8% of statements`
- Coverage after: `k8s.io/apiserver/pkg/authentication/serviceaccount 0.569s coverage: 70.1% of statements`
- Test ensuring that service account info (JTI, node name and UID) is correctly extracted from a presented JWT.
- Tests to ensure the information is NOT extracted when the feature gate is disabled.
`pkg/serviceaccount`:

- Coverage before (`release-1.28`): `k8s.io/kubernetes/pkg/serviceaccount 0.755s coverage: 72.4% of statements`
- Coverage after: `k8s.io/kubernetes/pkg/serviceaccount 0.786s coverage: 72.7% of statements`
- Extending tests to ensure Node info is embedded into extended claims (name and uid)
- Tests to ensure the `ID`/`JTI` field is always set to a random UUID.
- Tests to ensure the info embedded on a JWT is extracted from the token and into the ServiceAccountInfo when a token is validated.
- Tests to ensure the information is NOT embedded or extracted when the feature gate is disabled.
`staging/src/k8s.io/kubectl/pkg/cmd/create`:

- Coverage before (`release-1.28`): `k8s.io/kubectl/pkg/cmd/create 0.995s coverage: 55.1% of statements`
- Coverage after: `k8s.io/kubectl/pkg/cmd/create 0.949s coverage: 55.2% of statements`
- Add tests ensuring it's possible to request a token that is bound to a Node object (gated by environment variable during alpha)
- Test that calls the TokenRequest API to obtain a token that is bound to a Pod. It should assert that the token embeds a reference to the Pod object, as well as to the Node object that the Pod is assigned to.
- Test that calls the TokenRequest API to obtain a token that is bound to a Node. It should assert that the token embeds a reference to the Node object.
- Test that calls the TokenReview API with a token that is bound to a Node object that no longer exists. It should assert that the token does not validate once the Node has been deleted.
k8s.io/test/integration/sig-auth/svcacct_test.go
- TestServiceAccountTokenCreate_bound to a service account and pod
- TestServiceAccountTokenCreate_bound to service account and a pod with an assigned nodeName that does not exist
- TestServiceAccountTokenCreate_bound to service account and a pod with an assigned nodeName
- TestServiceAccountTokenCreate_fails to bind to a Node if the feature gate is disabled
- TestServiceAccountTokenCreate_bound to service account and node
- Extend existing TokenRequest e2e tests to check for embedded scheduled node name & UID + generated JTI is present.
- JTI feature implemented behind a feature flag `ServiceAccountTokenJTI`.
- Embedding Pod's assigned Node name/uid feature implemented behind a feature flag `ServiceAccountTokenPodNodeInfo`.
- Support verifying JWTs bound to Node objects with feature flag `ServiceAccountTokenNodeBindingValidation`.
- Allowing tokens bound to Node objects to be issued with feature flag `ServiceAccountTokenNodeBinding`.
- Initial e2e tests completed and enabled
- Decide what the default of the new flag should be
  - Decision: this flag was not added during alpha, and MAY be added post-beta, but will definitely default to off.
  - This does not need to block promotion of the `ServiceAccountTokenPodNodeInfo` feature as a result.
- Decide if using an audit annotation is the correct approach
  - Decision: audit annotation is the correct approach as this is only for `serviceaccounts/<name>/token` requests, not all requests
- Renaming audit annotation to `authentication.kubernetes.io/issued-credential-id` to disambiguate from `authentication.kubernetes.io/credential-id` in user's ExtraInfo
- Docs around the SA JWT schema (this does not exist today)
- Allowing time for feedback and any other user-experience reports.
- Conformance tests
- Consolidate the existing service account docs to be more coherent and avoid duplication, especially in regards to consuming service account tokens outside of Kubernetes:
Embedding a Pod's assigned Node name into a JWT does not require any coordination between clients and the apiserver, as no components require this information to be embedded. This is purely additive, and the only rollback concerns would be around third party software that consumes this information. This software should always verify whether a `node` claim is embedded into tokens if they require using it, and provide a fall-back behaviour (i.e. a GET to the apiserver to fetch the Pod & Node object) if they need to maintain compatibility with older apiservers.
Binding a token to a Node introduces a new validation mechanism, and therefore we must allow one release cycle after introducing the ability to validate tokens before we can begin permitting issuance of these tokens. This is a critical step from a security standpoint, as otherwise:

1. an administrator could upgrade their apiserver/control plane.
2. a user could request a token bound to a Node, expecting it to be invalidated when the Node is deleted.
3. the administrator could roll back the apiserver to an older version.
4. the Node object is deleted.
5. the token issued in (2) would now continue to be accepted/validated, despite the Node object no longer existing.
By graduating validation a release earlier than issuance, we can ensure any tokens that are bound to a Node object will be correctly validated even after a rollback.
- `ServiceAccountTokenJTI` feature flag will toggle including JTI information in tokens, as well as recording JTIs in the audit log / the SA user info.
- `ServiceAccountTokenPodNodeInfo` feature flag will toggle including node info associated with pods in tokens.
- `ServiceAccountTokenNodeBindingValidation` feature flag will toggle the apiserver validating Node claims in node bound service account tokens.
- `ServiceAccountTokenNodeBinding` feature flag will toggle allowing service account tokens to be bound to Node objects.
The `ServiceAccountTokenNodeBindingValidation` feature will graduate to beta in version v1.30, a release earlier than `ServiceAccountTokenNodeBinding`, to ensure a safe rollback from version v1.31 to v1.30 (more info below in the rollback considerations section).

The `ServiceAccountTokenNodeBinding` feature gate must only be enabled once the `ServiceAccountTokenNodeBindingValidation` feature has been enabled.
Disabling the `ServiceAccountTokenNodeBindingValidation` feature whilst keeping `ServiceAccountTokenNodeBinding` enabled would allow tokens that are expected to be bound to the lifetime of a particular Node to validate even if that Node no longer exists.

The rollout & rollback section below goes into further detail.

All other feature flags can be disabled without any unexpected adverse effects or coordination required.
- Feature gate
  - Feature gate name: `ServiceAccountTokenJTI`
  - Components depending on the feature gate: kube-apiserver
- Feature gate
  - Feature gate name: `ServiceAccountTokenPodNodeInfo`
  - Components depending on the feature gate: kube-apiserver
- Feature gate
  - Feature gate name: `ServiceAccountTokenNodeBinding`
  - Components depending on the feature gate: kube-apiserver
- Feature gate
  - Feature gate name: `ServiceAccountTokenNodeBindingValidation`
  - Components depending on the feature gate: kube-apiserver
Enabling the `ServiceAccountTokenPodNodeInfo` and/or `ServiceAccountTokenJTI` feature gate will cause additional information to be stored/persisted into service account JWTs, as well as new audit annotations being recorded in the audit log. This is all purely additive, so no changes to existing features, schemas or fields are expected.

Enabling the `ServiceAccountTokenNodeBinding` gate will permit binding tokens to Node objects, which is a change in behaviour (albeit not to an existing feature, so is not problematic).
Yes. Future tokens will then not embed this information. Any existing issued tokens will still have this information embedded, however.
If these fields are deemed to be problematic for other systems interpreting these tokens, users will need to re-issue these tokens before presenting them elsewhere.
Once the feature(s) have graduated to GA, it will not be possible to disable this behaviour.
Future tokens will once again include this information; there are no adverse effects.
Yes (as noted above in the test plan)
Rolling this out will be done by enabling the feature flag on all control plane hosts.
The `ServiceAccountTokenNodeBindingValidation` feature gate should be enabled and complete rollout before the `ServiceAccountTokenNodeBinding` gate is enabled, so all active servers will correctly validate tokens issued by any server.

The `ServiceAccountTokenNodeBindingValidation` gate will be defaulted to on one release before `ServiceAccountTokenNodeBinding` to account for this. Concretely, `ServiceAccountTokenNodeBindingValidation` will be enabled by default in v1.30 and `ServiceAccountTokenNodeBinding` will be enabled by default in v1.31.
This should not cause any issues during upgrades. Rollback is done by removing/disabling the feature gate(s).
During a rollback, there is a concern that tokens issued prior to the rollback that are bound directly to a Node object (i.e. not bound to a Pod that also embeds node info, which is informational) could be accepted by an older apiserver even if the bound Node object no longer exists (as it would not know to verify the new `node` claim).
To help avoid this, the feature will be graduated in two phases:
- First, graduating the acceptance/validation of explicitly node-scoped tokens in one release
- Secondly, graduating the issuance of explicitly Node bound tokens
This allows for a safe rollback in which the same security expectations are enforced once a token has been issued.
If a user explicitly disables `ServiceAccountTokenNodeBindingValidation` but keeps `ServiceAccountTokenNodeBinding` enabled, the node claims in the issued tokens will not be properly validated. This configuration will be explicitly denied by the kube-apiserver and will cause it to exit on startup.
authentication_attempts
authorization_attempts_total
serviceaccount_valid_tokens_total
New metrics that can be used to identify if the feature is in use:
serviceaccount_authentication_pod_node_ref_verified_total
serviceaccount_authentication_bound_object_verified_total{bound_object_kind="Node"}
serviceaccount_bound_tokens_issued_pod_with_node_tokens_total
serviceaccount_bound_tokens_issued_total{bound_object_kind="Node"}
serviceaccount_bound_tokens_issued_with_identifier_total
For the `ServiceAccountTokenJTI` feature (alpha v1.29, beta v1.30, GA v1.32):

Without the feature gate enabled, issued service account tokens will not have their `jti` field set to a random UUID, and the audit log will not persist the issued credential identifier when issuing a token.

With the feature gate enabled, issued service account tokens will have the `jti` field set to a random UUID. Additionally, the audit event recorded when issuing a new token will have a new annotation added (`authentication.kubernetes.io/issued-credential-id`).

As a service account token's JTI field is used to infer the credential identifier, which forms part of a user's `ExtraInfo`, audit events generated using this newly issued token will also include this JTI (persisted as `authentication.kubernetes.io/credential-id`).
If the feature is disabled and a token is presented that includes a credential identifier, it will still be persisted into the audit log as part of the UserInfo in the audit event.
As none of these fields are actually used for validating/verifying a token is valid, enabling & disabling the feature does not cause any adverse side effects.
For the `ServiceAccountTokenNodeBinding` (alpha v1.29, beta v1.31, GA v1.33) and `ServiceAccountTokenNodeBindingValidation` (alpha v1.29, beta v1.30, GA v1.32) features:

Without the feature gate enabled, service account tokens that have been bound to Node objects will not have their node reference claims validated (to ensure the referenced node exists).

With the feature gate enabled, if a token has a `node` claim contained within it, it'll be validated to ensure the corresponding Node object actually exists.
Disabling this feature will therefore relax the security posture of the cluster in an unexpected way, as tokens that may have been previously invalid (because their corresponding Node does not exist) may become valid again.
Node bound tokens may only be issued if the `ServiceAccountTokenNodeBinding` feature is enabled, and it is not possible to enable `ServiceAccountTokenNodeBinding` without `ServiceAccountTokenNodeBindingValidation` being enabled too.

This is further mitigated by graduating the `ServiceAccountTokenNodeBindingValidation` feature one release earlier than `ServiceAccountTokenNodeBinding`.
Tokens that are bound to objects other than Nodes are unaffected.
For the `ServiceAccountTokenPodNodeInfo` feature (alpha v1.29, beta v1.30, GA v1.32):
Without the feature gate enabled, tokens that are bound to Pod objects will not include information about the Node that the pod is scheduled/assigned to.
With the feature enabled, newly minted tokens that are bound to Pod objects will include metadata about the Node, namely the Node's name and UID.
These fields are not validated and therefore disabling the feature after enabling it will not cause any adverse side-effects.
Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
No.
New metrics:
- `serviceaccount_authentication_pod_node_ref_verified_total` - new metric that is incremented when a token bound to a Pod has its Node reference verified
- `serviceaccount_authentication_bound_object_verified_total{bound_object_kind="Node"}` - new metric that is incremented when a token bound to a Node has its reference verified
- `serviceaccount_bound_tokens_issued_pod_with_node_tokens_total` - new metric that is incremented when a node ref is embedded into a bound Pod token (aka implicitly added)
- `serviceaccount_bound_tokens_issued_total{bound_object_kind="Node"}` - new metric that is incremented whenever a bound token is issued that references a Node (explicitly added)
- `serviceaccount_bound_tokens_issued_with_identifier_total` - new metric that is incremented whenever a token that contains an identifier/JTI is issued
The metrics detailed above provide a clear signal as to whether these features are being used.
For the node info part, using the TokenRequest API and inspecting the contents of the issued JWTs for a token bound to a Pod. For JTIs, using the TokenRequest API and then inspecting the contents of the issued JWT for any ServiceAccount token.
For the validation/verification, the user can use the SelfSubjectAccessReview API to check whether the token is still valid. To do so, they'd need to obtain a token that is bound to a Pod, delete the corresponding Node object that the Pod is scheduled on, and observe that the token is no longer valid via the SelfSubjectAccessReview API.
A similar process could be used for tokens bound to Node objects directly.
N/A
What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
N/A
- Metrics
- Metric name:
- [Optional] Aggregation method:
- Components exposing the metric:
- Other (treat as last resort)
- Details:
Are there any missing metrics that would be useful to have to improve observability of this feature?
No
None
No
N/A
No
No
No
Additional audit log annotation keys, as well as extending the JWT claims we embed into service account tokens.
The maximum size of a UUID is 36 bytes. The maximum size of a Node object's name is 253 bytes. The maximum size of a Node object's UID is 36 bytes.
This additional data will be recorded into issued JWTs as well as audit log events.
Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs?
Fractionally increase the time spent issuing service account JWTs (UUID generation mainly). This is expected to be negligible.
Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components?
No
Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
No
Not applicable. This change is solely within the apiserver, and does not touch etcd.
After observing an issue (e.g. uptick in denied authentication requests or a significant shift in any metrics added for this KEP), kube-apiserver logs from the authenticator may be used to debug.
Additionally, manually attempting to exercise the affected codepaths would surface information that'd aid debugging. For example, attempting to issue a node bound token, or attempting to authenticate to the apiserver using a node bound token.
- KEP marked implementable and merged for the v1.29 release
- KEP implemented in an alpha state for v1.29
- Renamed audit annotation used for the `serviceaccounts/<name>/token` endpoint to be clearer: kubernetes/kubernetes#123098
- Added restrictions to disallow enabling `ServiceAccountTokenNodeBinding` without `ServiceAccountTokenNodeBindingValidation`: kubernetes/kubernetes#123135
- `ServiceAccountTokenJTI`, `ServiceAccountTokenNodeBindingValidation` and `ServiceAccountTokenPodNodeInfo` promoted to beta for the v1.30 release
- `ServiceAccountTokenNodeBinding` promoted to beta for the v1.31 release
- `ServiceAccountTokenJTI`, `ServiceAccountTokenPodNodeInfo` and `ServiceAccountTokenNodeBindingValidation` promoted to stable for the v1.32 release
- `ServiceAccountTokenNodeBinding` promoted to stable for the v1.33 release
- TBC
N/A