Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chart: Explicitly set runAsGroup. #11679

Merged
merged 2 commits into from
Jul 29, 2024

Conversation

ubergesundheit
Copy link
Contributor

@ubergesundheit ubergesundheit commented Jul 25, 2024

What this PR does / why we need it:

Set a default value for the runAsGroup in container securityContexts of the controller and default backend.

Also set the runAsGroup for opentelemetry and webhook Job container securityContexts.

This change is an improvement over the current suituation as it explicitly sets the group id through runAsGroup instead of relying on the default kubernetes behaviour.

According to the Kubernetes API reference the runAsGroup defaults to the runtime default if not set. In reality this might be the runAsUser ID, but it doesn't have to.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • CVE Report (Scanner found CVE and adding report)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation only

How Has This Been Tested?

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I've read the CONTRIBUTION guide
  • I have added unit and/or e2e tests to cover my changes.
  • All new and existing tests passed.

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels Jul 25, 2024
Copy link

netlify bot commented Jul 25, 2024

Deploy Preview for kubernetes-ingress-nginx canceled.

Name Link
🔨 Latest commit 89d8121
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-ingress-nginx/deploys/66a768234158c30008fe47b0

@k8s-ci-robot k8s-ci-robot added the area/helm Issues or PRs related to helm charts label Jul 25, 2024
@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 25, 2024
@ubergesundheit
Copy link
Contributor Author

/assign Gacko

Copy link
Member

@Gacko Gacko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please also set it in this comment?

Apart from that, your PR looks good to me!

@Gacko
Copy link
Member

Gacko commented Jul 25, 2024

Some context for readers: According to the Kubernetes API reference the runAsGroup defaults to the runtime default if not set. In reality this might be the runAsUser ID, but it doesn't have to.

@Gacko Gacko changed the title Make runAsGroup for controller and default backend configurable in helm chart Chart: Make runAsGroup configurable for controller & default backend. Jul 25, 2024
Copy link
Member

@Gacko Gacko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/triage accepted
/kind feature
/priority backlog
/lgtm

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence. lgtm "Looks good to me", indicates that a PR is ready to be merged. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority labels Jul 25, 2024
@longwuyuan
Copy link
Contributor

longwuyuan commented Jul 25, 2024

Does it mean that after this PR merges, the users can set any random gid of their choice ?

@Gacko
Copy link
Member

Gacko commented Jul 25, 2024

Yes, they can. But I think this PR is more about making explicit what's currently implicit.

@longwuyuan
Copy link
Contributor

And we want to allow any random gid/uid to be set ? If yes, is there a test needed to generate a random number and set that ?

@Gacko
Copy link
Member

Gacko commented Jul 25, 2024

/hold

@Gacko
Copy link
Member

Gacko commented Jul 25, 2024

to re-configuring/tweak/change runAsGroup, to any potentially random gid, that the end-user may fancy.

The intention is not to change it as this would probably break the image. The intention is to explicitly define it and therefore overcome the implicit runtime value defined by your container runtime.

@longwuyuan
Copy link
Contributor

longwuyuan commented Jul 25, 2024

  1. Is it possible to set gid for runAsGroup (hardcode in template etc) without adding a new key to the values file ?
  2. Does that proposed new key in the values file, add a protection for the until-now non-existent potential, for a user setting a random gid via values file, and creating a brand new problem ?
  3. When www-data has a gid of 82, it would help a lot to clearly lavout, precisely what and more importantly how something is broken/vulnerable here to begin with ?
  4. This link ,referenced earlier in posts here, is not elaborate, contextual or precise enough, in clearly demonstrating something is broken/vulnerable in the ingress-nginx controller, due to that behaviour of K8S.
  5. Is a new test needed to demonstrate/validate the current broken/vulnerable setting of runAsGroup ?

@Gacko
Copy link
Member

Gacko commented Jul 26, 2024

  1. No, this is not possible. We have places where we directly hand in the whole securityContext from values.yaml, e.g. the webhook patch job or OpenTelemetry. The hard-coded securityContext from _helpers.tpl is only being used for the controller itself and the default backend.
  2. No, it does not protect users from setting random GIDs and therefore breaking their deployment. Actually they could already do so right now by changing the UID. I once tried using a different UID for the controller container and that prevented it from accessing some important files like the certificates.

    I could imagine the controller to still work with a different group, but yeah, the long term impact is unclear.
  3. There is nothing vulnerable about GID 82. It's more the fact that the group is not explicitly set, so the "implicity" and not defining it causes unknown behavior (and security scanners to complain).
  4. Yes, you are right. This on its own does not expose any security risk. But it also hands the responsibility of deciding on the runAsGroup to your container runtime. Again, that behavior could vary and therefore adds uncertainty.
  5. The current behavior is not broken nor insecure. It probably works fine, but not defining and making clear the GID also doesn't make it any better.

So this PR is mostly about adding certainty and security where currently implicit and unknown behavior could be a security issue even though it might not be in most environments. Of course you can rely on your environment to handle this in a secure way, but the container runtime does not know all use cases and especially not our image.

@Gacko Gacko changed the title Chart: Make runAsGroup configurable for controller & default backend. Chart: Explicitly set runAsGroup for controller & default backend. Jul 26, 2024
@ubergesundheit
Copy link
Contributor Author

So from my point of view key takeaways from this are:

  • Allowing users to change the uid + gid maybe breaks their deployments if they use the official ingress-nginx container image
  • For people using custom ingress-nginx images that need a change in uid/gid will need the configurable uid+gid
  • Security scanners are "dumb" in the sense that they only see the securityContext configured in k8s and will complain regardless if the container image already has this in a secure state.

People (and tools) without deeper knowledge into the ingress-nginx project will see that the deployment doesn't have the securityContext field set and will think its insecure. This can be fixed using this PR.

So my proposal would be: We add the possibility to change the gid. Default 82 and add a line explaining the situation in the values.yaml. Something like The official ingress-nginx container image uses uid 101 and gid 82. Do not change this unless you're using a custom container image that requires the change.

WDYT?

@Gacko
Copy link
Member

Gacko commented Jul 29, 2024

For people using custom ingress-nginx images that need a change in uid/gid will need the configurable uid+gid

That's already possible via the values. You can completely override each and every security context in the chart. For the controller and the default backend you will also need to set the other parts, that are currently hard coded in the template, on your own then.

It's just not outlined explicitly so people hopefully would know they are doing / changing this on their own risk rather than in a provided and supported manner.

@Gacko
Copy link
Member

Gacko commented Jul 29, 2024

So my proposal would be: We add the possibility to change the gid. Default 82 and add a line explaining the situation in the values.yaml. Something like The official ingress-nginx container image uses uid 101 and gid 82. Do not change this unless you're using a custom container image that requires the change.

WDYT?

Sounds good to me. I would not set it to 82 without checking the process is currently running and working with this ID. This ID was taken from the passwd file, I don't know if it's also the GID the process is currently running with.

@longwuyuan
Copy link
Contributor

longwuyuan commented Jul 29, 2024 via email

@Gacko
Copy link
Member

Gacko commented Jul 29, 2024

Of course adding this value exposes an API users could use to change the GID, but we already have this API right now with controller.podSecurityContext and controller.securityContext. One can simply override them and make it even worse.

In the end values are just both sides of a coin: Public API for users and dynamic way of configuration for maintainers. We cannot simply hard-code stuff we don't want users to change. That's not a good code style. And also having our defaults set in the values instead of hard-coded makes our chart flexible for real issues we do not know of, yet.

So in the end I'm not afraid of adding this key to the values with a proper hint stating that changing this might break your deployment.

Right now there's an uncertainty in which GID the images are being executes with as we are not defining them and we therefore rely on the users runtime to work as expected. This also could cause issues. Apart from that and especially in the scope of recent discussion about the security of this project, I'd rather go the explicit way of setting the GID than leaving a question mark.

We had those insecure defaults and "up to the user" in the past and we decided to no longer leave it like this.

@Gacko Gacko changed the title Chart: Explicitly set runAsGroup for controller & default backend. Chart: Explicitly set runAsGroup. Jul 29, 2024
@ubergesundheit
Copy link
Contributor Author

Looks like 82 would be the right gid.

This is from a running ingress-nginx pod

ingress-nginx-app-controller-569dff6ccb-gpjgl:/etc/nginx$ id
uid=101(www-data) gid=82(www-data) groups=82(www-data)
ingress-nginx-app-controller-569dff6ccb-gpjgl:/etc/nginx$ ps
PID   USER     TIME  COMMAND
    1 www-data  0:00 /usr/bin/dumb-init -- /nginx-ingress-controller --enable-annotation-validation=true --publish-service=ingress-controllers/ingress-nginx-app-controller --election-id=ing
    6 www-data  4h30 /nginx-ingress-controller --enable-annotation-validation=true --publish-service=ingress-controllers/ingress-nginx-app-controller --election-id=ingress-nginx-app-leader 
   31 www-data  0:13 nginx: master process /usr/bin/nginx -c /etc/nginx/nginx.conf
 1594 www-data  1:20 nginx: worker process
 1627 www-data  1:17 nginx: worker process
 1660 www-data  1:14 nginx: worker process
 1693 www-data  1:13 nginx: worker process
 1726 www-data  0:00 nginx: cache manager process
 1846 www-data  0:00 bash
 1852 www-data  0:00 ps
ingress-nginx-app-controller-569dff6ccb-gpjgl:/etc/nginx$ head -11 /proc/1/status
Name:	dumb-init
Umask:	0022
State:	S (sleeping)
Tgid:	1
Ngid:	0
Pid:	1
PPid:	0
TracerPid:	0
Uid:	101	101	101	101
Gid:	82	82	82	82
FDSize:	64
ingress-nginx-app-controller-569dff6ccb-gpjgl:/etc/nginx$ head -11 /proc/6/status
Name:	nginx-ingress-c
Umask:	0022
State:	S (sleeping)
Tgid:	6
Ngid:	0
Pid:	6
PPid:	1
TracerPid:	0
Uid:	101	101	101	101
Gid:	82	82	82	82
FDSize:	64
ingress-nginx-app-controller-569dff6ccb-gpjgl:/etc/nginx$ head -11 /proc/31/status
Name:	nginx
Umask:	0022
State:	S (sleeping)
Tgid:	31
Ngid:	0
Pid:	31
PPid:	6
TracerPid:	0
Uid:	101	101	101	101
Gid:	82	82	82	82
FDSize:	64
ingress-nginx-app-controller-569dff6ccb-gpjgl:/etc/nginx$

The interesting lines are the Gid: 82 82 82 82 from the pid files

Set a default value for the runAsGroup in container securityContexts of
the controller and default backend.

Also set the runAsGroup for opentelemetry and webhook Job container
securityContexts.

Signed-off-by: Gerald Pape <[email protected]>
Copy link
Member

@Gacko Gacko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a more concise suggestion with the output of running id inside a container.

Also I added the double dashes so your hint is included in the README.md.

charts/ingress-nginx/values.yaml Outdated Show resolved Hide resolved
charts/ingress-nginx/values.yaml Show resolved Hide resolved
@ubergesundheit
Copy link
Contributor Author

Thanks for your suggestions. I ran helm-docs again and have pushed it. :)

@longwuyuan
Copy link
Contributor

@Gacko I completely ack that there is controller.podSecurityContext. And this my opinion.

  • Its not a improvement to first have controller.podSecurityContext and then add a second troublemaker key for gid setting. There are NO issues being reported frequently about this. There is no CVE around this.

  • Admins are rather accustomed to the field controller.podSecurityContext and know how to deal with it. The implications of adding a second lesser-known key, that causes the same trouble as the first one, is not a improvement.

  • Addition of this new key is not being coupled with validations. Possible values some un-suspecting user may put there is adding a new risk. This new key is a new risk, that is not existing currently, so its not a improvement.

  • Currently gid not being set, is not a high-enough threat or reason to adding a new risky configuration option. Not an improvement.

  • Gerard confirms the gid is 82 for pid of controller. So adding a new risky config option with zero validation, is not a improvement.

The current runtime-set-gid-of-82 is not a broken/insecure behavior needing a fix/change.

@Gacko
Copy link
Member

Gacko commented Jul 29, 2024

Once again I'd like to point out that this change is not only meant for giving users the possibility of changing the chart to their needs - because they should not do this in this particular case. If users want to override stuff and therefore break their deployment, they already can do so today.

This change is for us as maintainers being explicit about a security relevant setting of our product instead of leaving it open to users' environments.

Developing a secure solution should not be about reacting on issues and CVEs, it should be secure by default.

Also both controller.podSecurityContext and controller.securityContext lead to omitting all the defaults we are currently setting via the template in the _helpers.tpl. So as soon as someone is just setting controller.securityContext.runAsGroup, all the other security context relevant values are no longer being considered nor rendered. I therefore prefer another granular option like controller.image.runAsGroup over having to re-define the whole security context settings, also because they are partially dynamic in the template while they are fully static when being set through controller.podSecurityContext and controller.securityContext.

In the end we are telling users only to change this value if they want to run a custom image. So from a maintenance point of view nothing changes. Defining a runAsGroup is a best practice and I will not wait until someones raises an issue or - even worse - a CVE.

@longwuyuan
Copy link
Contributor

@Gacko thanks.

Copy link
Member

@Gacko Gacko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 29, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Gacko, ubergesundheit

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Gacko
Copy link
Member

Gacko commented Jul 29, 2024

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 29, 2024
@k8s-ci-robot k8s-ci-robot merged commit 36df47f into kubernetes:main Jul 29, 2024
46 checks passed
@ubergesundheit ubergesundheit deleted the runAsGroup branch July 29, 2024 13:31
@ubergesundheit
Copy link
Contributor Author

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/helm Issues or PRs related to helm charts cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. triage/accepted Indicates an issue or PR is ready to be actively worked on.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants