Add secure-pod-defaults flag to default Pods to 'restricted' profile by default #13398

evankanderson · 2022-10-15T18:29:59Z

Adds a feature flag (deactivated for the current release) which defaults launched pods to a mostly-secure version of the restricted profile that shouldn't be too onerous. (i.e. it does not set runAsNonRoot, because Docker defaults to building containers that run as root)

Proposed Changes

Adds a feature flag secure-pod-defaults, which is currently Disabled
When secure-pod-defaults is set, it will fill in unset container SecurityContexts with secure defaults
When the flag is set, if specific insecure SecurityContext features are requested, they will be retained.

Release Note

Adds the `secure-pod-defaults` feature, which is defaulted to Disabled in
this release.

When enabled, containers described by users will have best-practice
SecurityContext features enabled unless insecure settings are specifically
requested.

…by default

evankanderson · 2022-10-15T18:30:58Z

/assign @dprotaso @mattmoor @psschwei

I realize this is very last-minute, but I'd love to get this into 1.8 on Tuesday, so that we can move the flag to Beta (Enabled by default) for 1.9.

codecov · 2022-10-15T18:36:20Z

Codecov Report

Base: 86.21% // Head: 86.27% // Increases project coverage by +0.05% 🎉

Coverage data is based on head (3f6c947) compared to base (8722a63).
Patch coverage: 94.64% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main   #13398      +/-   ##
==========================================
+ Coverage   86.21%   86.27%   +0.05%     
==========================================
  Files         197      197              
  Lines       14722    14778      +56     
==========================================
+ Hits        12692    12749      +57     
+ Misses       1730     1729       -1     
  Partials      300      300

Impacted Files	Coverage Δ
pkg/apis/serving/fieldmask.go	`94.91% <76.92%> (-0.70%)`	⬇️
pkg/apis/config/features.go	`95.52% <100.00%> (+0.13%)`	⬆️
pkg/apis/serving/v1/revision_defaults.go	`97.43% <100.00%> (+0.91%)`	⬆️
pkg/autoscaler/statserver/server.go	`77.37% <0.00%> (+1.45%)`	⬆️
pkg/autoscaler/statforwarder/leases.go	`73.95% <0.00%> (+1.56%)`	⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

mattmoor

Generally I think this is a positive change. Most of my feedback is focused on a future where this defaults on, ways it might break folks (directly or indirectly), and whether folks have the tools to workaround any breakage.

FWIW, we run all of our services with the following today:

        securityContext:
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          capabilities:
            drop:
            - all

... and I'm looking forward to adding allowPrivilegeEscalation: false to this. As I said inline, I don't think we allow specifying seccomp profile today, which we should maybe allow? 🤔

If we wanted to encourage a better default security posture for folks, we could consider using the new warning capabilities in apis.FieldError to encourage folks to explicitly set security features (well in advance of a breaking default change, in line with our versioning/deprecation policies). We're using warnings in sigstore/policy-controller downstream and they are pretty neat!

pkg/apis/serving/v1/revision_defaults.go

mattmoor · 2022-10-15T18:43:33Z

pkg/apis/serving/v1/revision_defaults.go

+	if container.SecurityContext.SeccompProfile == nil {
+		container.SecurityContext.SeccompProfile = &corev1.SeccompProfile{}
+	}
+	if container.SecurityContext.SeccompProfile.Type == "" {
+		container.SecurityContext.SeccompProfile.Type = corev1.SeccompProfileTypeRuntimeDefault
+	}


We don't allow SeccompProfile thru the fieldmask today (it looks like it may be newer than our field triage because it isn't copied or explicitly cleared):

serving/pkg/apis/serving/fieldmask.go

Lines 623 to 639 in a18077c

// Allowed fields

out.Capabilities = in.Capabilities

out.ReadOnlyRootFilesystem = in.ReadOnlyRootFilesystem

out.RunAsUser = in.RunAsUser

out.RunAsGroup = in.RunAsGroup

// RunAsNonRoot when unset behaves the same way as false

// We do want the ability for folks to set this value to true

out.RunAsNonRoot = in.RunAsNonRoot

// AllowPrivilegeEscalation when unset can behave the same way as true

// We do want the ability for folks to set this value to false

out.AllowPrivilegeEscalation = in.AllowPrivilegeEscalation

// Disallowed

// This list is unnecessary, but added here for clarity

out.Privileged = nil

out.SELinuxOptions = nil

out.ProcMount = nil

So my main concern here would be that we are changing a default value that we don't allow folks to override.

Good point, I can make a PR to make this settable.

psschwei · 2022-10-15T19:46:20Z

Will give a full review on Monday, but wonder if there's any overlap with #13376

…later

evankanderson · 2022-10-16T17:29:37Z

I wrote #13399 for validation to warn about the insecure defaults.

skonto · 2022-10-17T10:25:01Z

if there's any overlap with #13376

Regarding overlapping with #13376 we can just do all required changes in this PR here or just keep the other for queue-proxy only changes.

Btw I left a comment here. I am in favor of using defaults that are more secure, avoid psp warnings, and we should also allow users to set/unset stuff on demand. It depends on our strategy, we can be as opinionated we want as long as users can change that and they know about it. The downside is that users might expect whatever K8s has as a default (breaks UX), but we have broken UX already by disabling certain stuff with several flags. Another issue is what it means to have certain combinations (have a comment below) and if we actually break anything (assuming users dont see that change coming, even with the flag).

Maybe for user containers it is safer/easier to allow users set their workloads scs as they want without defaults and touch only stuff we are controlling like Knative control plane, queue proxy etc. Wondering if we need the defaults at all.

skonto · 2022-10-17T10:44:25Z

pkg/apis/serving/v1/revision_defaults.go

+		updatedSC.Capabilities = &corev1.Capabilities{}
+	}
+	if updatedSC.Capabilities.Drop == nil {
+		updatedSC.Capabilities.Drop = []corev1.Capability{"ALL"}


If updatedSC.AllowPrivilegeEscalation is nil and we set it to false and later on we have updatedSC.Capabilities.Drop != nil with CAP_SYS_ADMIN present, would that be a problem or K8s will catch that as an invalid combination (kubernetes/website#30104) or K8s will ignore the capabilities value? The flag we introduce in this PR hopefully will help users migrate their services on time, since we allow them (via other PRs) to set explicitly these fields to avoid default overrides when later the flag is on by default. 🤔

Per that note, I don't think we allow privileged to be set, so I don't think we get into the forcing issue.

And yes, the idea is that:

The flag defaults to false or can be flipped to false, so users can avoid things altogether for a while.

The flag only causes the default values to be filled in, so there are ways to explicitly request the old behavior if needed.

Don't we allow updatedSC.AllowPrivilegeEscalation set via #13395?

Per that note, I don't think we allow privileged to be set, so I don't think we get into the forcing issue.

That's right, SecurityContext.privileged isn't allowed, though because a pod's security context is behind a feature gate, it fails silently vs. getting caught in the validation as in a container's security context.

with CAP_SYS_ADMIN present, would that be a problem or K8s

Since adding capabilities requires enabling a feature gate, we could put extra warnings in the release notes / documentation in for anyone who is flipping that gate on to further mitigate any risk.

evankanderson · 2022-10-17T14:35:59Z

This is one of those "I reserve the right to wake up smarter" moments. I realized thanks to Matt's PR that all the SecurityContext values were pointer-ish, so it was possible to distinguish between "I don't care about security" and "I need to do something dangerous". That tipped me into the "we should do something safer than the Kubernetes default, even though the experience is different".

config/core/configmaps/features.yaml

psschwei · 2022-10-18T01:15:00Z

pkg/apis/serving/v1/revision_defaults.go

+		updatedSC.Capabilities = &corev1.Capabilities{}
+	}
+	if updatedSC.Capabilities.Drop == nil {
+		updatedSC.Capabilities.Drop = []corev1.Capability{"ALL"}


Per that note, I don't think we allow privileged to be set, so I don't think we get into the forcing issue.

That's right, SecurityContext.privileged isn't allowed, though because a pod's security context is behind a feature gate, it fails silently vs. getting caught in the validation as in a container's security context.

with CAP_SYS_ADMIN present, would that be a problem or K8s

Since adding capabilities requires enabling a feature gate, we could put extra warnings in the release notes / documentation in for anyone who is flipping that gate on to further mitigate any risk.

psschwei · 2022-10-18T01:21:35Z

pkg/apis/serving/v1/revision_defaults.go

+	if psc.SeccompProfile == nil || psc.SeccompProfile.Type == "" {
+		if updatedSC.SeccompProfile == nil {
+			updatedSC.SeccompProfile = &corev1.SeccompProfile{}
+		}
+		if updatedSC.SeccompProfile.Type == "" {
+			updatedSC.SeccompProfile.Type = corev1.SeccompProfileTypeRuntimeDefault
+		}
+	}


#13401 needs to merge for this not to feel the wrath of the validating webhook...

in that vein, it probably wouldn't be a bad idea to have an e2e test for this before we default to enabling this (see also the issue above about needed to enable another feature to get this working)

psschwei · 2022-10-18T01:35:31Z

config/core/configmaps/features.yaml

+    #
+    # This value will default to "enabled" in a future release,
+    # probably Knative 1.8
+    secure-pod-defaults: "disabled"


I also ran into an issue where I was getting a validation error on seccompProfile, even after setting this to enabled and pulling in the other patch. Enabling the pod security context feature fixed it, but I wonder if we need to consolidate some of the behaviors... at the very least, maybe using secure defaults should turn on pod security contexts also (though something about that feels a little hacky...)

I think you're correct about having this feature enable the other one. That actually seems reasonable to me, and at some point if we make this field default-true or even always-enabled, we can retire the other flag, too, possibly.

evankanderson · 2022-10-29T18:03:09Z

/reopen

oops, that's not what I meant to do.

knative-prow · 2022-10-29T18:03:14Z

@evankanderson: Reopened this PR.

In response to this:

/reopen

oops, that's not what I meant to do.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

dprotaso · 2022-11-24T16:26:20Z

Note - K8s provides a knob for setting the default seccompProfile for all workloads (to RuntimeDefault)

https://kubernetes.io/docs/tutorials/security/seccomp/#enable-the-use-of-runtimedefault-as-the-default-seccomp-profile-for-all-workloads

I wonder if we should skip setting this property since it's breaking OpenShift users and people using gVisor (#13471)

dprotaso · 2022-11-24T16:28:26Z

Looks like long-term k8s will default to the RuntimeDefault profile as well

https://github.com/kubernetes/enhancements/tree/9a124fd29d1f9ddf2ff455c49a630e3181992c25/keps/sig-node/2413-seccomp-by-default#beta-to-ga-graduation

evankanderson · 2023-01-20T18:16:38Z

In addition to the dependency on #13565, I've added a the ability to explicitly request the Kubernetes defaults for SeccompProfile (at the PodSpec level) and Capabilities and AllowPrivilegeEscalation (both at the container level, and already allowed).

I'd originally written this so that you could request the default (Unconfined) behavior at the Pod level, but not the safe default (RuntimeDefault). I've changed it so you can request either value when the flag is enabled, so you can explicitly opt in to the safe behavior if desired.

evankanderson · 2023-01-20T23:54:31Z

(Tests are now passing, though that could be because I've got the behavior behind a flag.)

I'll try to get an integration test or two in for this, but I'd really like to get the flag in DEFAULT OFF for Knative 1.9, so we can talk about flipping the flag to on in 1.10 in April.

knative-prow · 2023-01-22T02:18:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: evankanderson

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~config/OWNERS~~ [evankanderson]
~~hack/OWNERS~~ [evankanderson]
~~pkg/apis/OWNERS~~ [evankanderson]
~~test/OWNERS~~ [evankanderson]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

…of nil.

evankanderson · 2023-01-22T04:26:01Z

It turns out capabilities.drop is marked omitempty, so I can't distinguish between nil and []; I've adjusted the code to default in capabilities only if entire object is empty.

evankanderson · 2023-01-23T16:05:54Z

@skonto @psschwei @dprotaso

I think this is ready for review again

psschwei

nit on one of the tests, but otherwise lgtm

pkg/apis/serving/fieldmask_test.go

psschwei

/lgtm

/hold
to give @skonto @dprotaso @mattmoor a chance to comment (otherwise will unhold in the morning before cutting the release)

skonto · 2023-01-23T23:29:21Z

Some questions:
a) Regarding runAsNonRoot is it that we might hit some image that is built with root and we will stop it from running if we set runAsNonRoot=true?
b) Users will have to enable kubernetes.containerspec-addcapabilities if they add some port < 1024 correct? I guess we don't test/document that part for now. Downstream some tests fail (grpc ping) if I dont set that flag, I will check.

evankanderson · 2023-01-24T00:29:57Z

Yeah, I didn't touch runAsNonRoot for now because Docker defaults to building images with UID=0, and some popular images such as nginx may be written to run as UID 0. I'd love to fix this, but I want another cycle or a separate knob for it.

evankanderson · 2023-01-24T00:31:38Z

For (b), it's intended that we will set NET_BIND_SERVICE to enable ports < 1024. It's possible that isn't sufficient for some cases -- that's one reason I'm leaving the default at false for right now (also, I'm not a monster that jumps straight to beta, but mostly I'm worried about ironing out these cases with real tests).

If you have an example image that doesn't work with port <1024 and otherwise does work, it will shorten my repro efforts.

evankanderson · 2023-01-24T00:32:22Z

(I'm going to cancel hold around 7-8pm, so this can get one round of nightlies before going into the release. I should have done it earlier, but... nothing like a deadline to sharpen the will.)

evankanderson · 2023-01-24T04:26:36Z

/hold cancel

evankanderson · 2023-01-24T04:32:56Z

/test unit-tests

This looks related to #13636 , but not sure if that's just a bad merge by the automation.

skonto · 2023-01-24T14:31:41Z

@evankanderson regarding b) it seems we are seeing failures because if only port name is defined then it tries to add the NET_BIND_SERVICE (probably because containerPort is zero). Downstream the fix was to set specific port numbers outside that range.

evankanderson · 2023-01-24T14:44:15Z

D'oh, I'd forgotten about the possibility of not setting a port number (which is pretty common) and then defaulting to zero.

I think we can simply adjust the test for port >0 and <1024.

Do you also want me to add something for opting out of seccompProfile? It wouldn't be difficult, but it wasn't clear to me the affected range of components.

In any case, I think those fixes are for 1.10, as this feature is disabled by default in in 1.9.

skonto · 2023-01-24T14:51:10Z

I think we can simply adjust the test for port >0 and <1024.

I think that should be the case. Let me test it. I am wondering why tests here don't fail, I guess because the feature is not enabled by default we missed that.

Do you also want me to add something for opting out of seccompProfile? It wouldn't be difficult, but it wasn't clear to me the affected range of components.

Yeah it would be nice because for OCP there are cases where you want to tag a service not to set seccompProfile, because if set, it does not allow you to run as the default image user (TestShouldRunAsUserContainerDefault fails). RuntimeDefault SeccompProfile on OCP brings restrictions that break stuff, for example check the blocked sys calls here, including the ones for setting uid to image user value.

In any case, I think those fixes are for 1.10, as this feature is disabled by default in in 1.9.

ok makes sense.

skonto · 2023-01-24T17:31:47Z

port >0 and <1024.

Verified it does the trick btw.

Add secure-pod-defaults flag to default Pods to 'restricted' profile …

7f344b7

…by default

knative-prow bot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. area/API API objects and controllers labels Oct 15, 2022

knative-prow bot assigned dprotaso, mattmoor and psschwei Oct 15, 2022

mattmoor reviewed Oct 15, 2022

View reviewed changes

evankanderson added 2 commits October 16, 2022 07:50

Merge remote-tracking branch 'upstream/main' into secured-by-default-…

625c30b

…later

Account for pod-level SecurityContext when setting defaluts

7316733

evankanderson mentioned this pull request Oct 16, 2022

Add validation to warn about insecure SecurityContext defaults #13399

Merged

Fix format that somehow slipped through auto-format-on-save

540f339

evankanderson mentioned this pull request Oct 16, 2022

Allow setting seccompProfile to enable using restricted security profile #13401

Merged

Fix lint unparam complaint

a2f323c

skonto mentioned this pull request Oct 17, 2022

Run queue proxy with restricted profile #13376

Merged

skonto reviewed Oct 17, 2022

View reviewed changes

knative-prow-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 18, 2022

psschwei reviewed Oct 18, 2022

View reviewed changes

knative-prow bot closed this in #13401 Oct 18, 2022

knative-prow bot reopened this Oct 29, 2022

Add seccompProfile to CRD schemas (it was accepted, but not documented)

8f6e31c

knative-prow bot added the area/test-and-release It flags unit/e2e/conformance/perf test issues for product features label Jan 21, 2023

Add e2e tests, checking CI combinations

15fef2f

Capabilities.Drop is "omitempty", so it will always be empty instead …

36e9e38

…of nil.

Fix e2e test comment.

3f6c947

psschwei reviewed Jan 23, 2023

View reviewed changes

pkg/apis/serving/fieldmask_test.go Show resolved Hide resolved

psschwei reviewed Jan 23, 2023

View reviewed changes

knative-prow bot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm Indicates that a PR is ready to be merged. labels Jan 23, 2023

knative-prow bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 24, 2023

knative-prow bot merged commit 0360850 into knative:main Jan 24, 2023

This was referenced Feb 28, 2023

[TEST-ONLY] [WIP] Test sec default openshift-knative/serving#134

Closed

[RELEASE-1.7][SRVKS-985] Add revision security defaults openshift-knative/serving#186

Closed

kauana mentioned this pull request Jul 17, 2023

feat: secure-pod-defaults is enabled by default #14168

Closed

ReToCode mentioned this pull request Sep 12, 2023

Deploying a model server creates a seccompProfile expression that prevents it from starting in our environment. #14299

Closed

	// Allowed fields
	out.Capabilities = in.Capabilities
	out.ReadOnlyRootFilesystem = in.ReadOnlyRootFilesystem
	out.RunAsUser = in.RunAsUser
	out.RunAsGroup = in.RunAsGroup
	// RunAsNonRoot when unset behaves the same way as false
	// We do want the ability for folks to set this value to true
	out.RunAsNonRoot = in.RunAsNonRoot
	// AllowPrivilegeEscalation when unset can behave the same way as true
	// We do want the ability for folks to set this value to false
	out.AllowPrivilegeEscalation = in.AllowPrivilegeEscalation

	// Disallowed
	// This list is unnecessary, but added here for clarity
	out.Privileged = nil
	out.SELinuxOptions = nil
	out.ProcMount = nil

Add secure-pod-defaults flag to default Pods to 'restricted' profile by default #13398

Add secure-pod-defaults flag to default Pods to 'restricted' profile by default #13398

Conversation

evankanderson commented Oct 15, 2022 • edited by psschwei Loading

Proposed Changes

evankanderson commented Oct 15, 2022

codecov bot commented Oct 15, 2022 • edited Loading

Codecov Report

mattmoor left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

psschwei commented Oct 15, 2022

evankanderson commented Oct 16, 2022

skonto commented Oct 17, 2022 • edited Loading

skonto Oct 17, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

skonto Oct 17, 2022 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evankanderson commented Oct 17, 2022

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evankanderson commented Oct 29, 2022

knative-prow bot commented Oct 29, 2022

dprotaso commented Nov 24, 2022

dprotaso commented Nov 24, 2022

evankanderson commented Jan 20, 2023

evankanderson commented Jan 20, 2023

knative-prow bot commented Jan 22, 2023

evankanderson commented Jan 22, 2023

evankanderson commented Jan 23, 2023

psschwei left a comment

Choose a reason for hiding this comment

psschwei left a comment

Choose a reason for hiding this comment

skonto commented Jan 23, 2023 • edited Loading

evankanderson commented Jan 24, 2023

evankanderson commented Jan 24, 2023

evankanderson commented Jan 24, 2023

evankanderson commented Jan 24, 2023

evankanderson commented Jan 24, 2023

skonto commented Jan 24, 2023 • edited Loading

evankanderson commented Jan 24, 2023

skonto commented Jan 24, 2023 • edited Loading

skonto commented Jan 24, 2023

evankanderson commented Oct 15, 2022 •

edited by psschwei

Loading

codecov bot commented Oct 15, 2022 •

edited

Loading

skonto commented Oct 17, 2022 •

edited

Loading

skonto Oct 17, 2022 •

edited

Loading

skonto Oct 17, 2022 •

edited

Loading

skonto commented Jan 23, 2023 •

edited

Loading

skonto commented Jan 24, 2023 •

edited

Loading

skonto commented Jan 24, 2023 •

edited

Loading