Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configure IC root filesystem as read-only #3548

Merged
merged 1 commit into from
Mar 16, 2023

Conversation

sigv
Copy link
Contributor

@sigv sigv commented Feb 13, 2023

KIC-1094

Proposed changes

The Nginx Ingress Controller has various protections against attacks, such as running the service as non-root to avoid changes to files. An additional industry best practice is having root filesystems set as read-only so that the attack surface is further reduced by limiting changes to binaries and libraries. This commit ensures such a defense in depth is enabled for the Nginx Ingress Controller.

To accomplish read-only rootfs, we will create emptyDir Volumes for

  • /etc/nginx
  • /var/cache/nginx
  • /var/lib/nginx
  • /var/log/nginx

We populate /etc mount with an Init Container, as there are generic configuration files here. If the base image is updated via Helm chart, then Init Container will pull files from the new base image.

The other three empty volumes are populated during runtime, as they are not expected to have valuable files as part of base image:

  • /var/cache mount will be used in case caching is enabled.
  • /var/lib is used for some sockets and temporary files.
  • /var/log is used for logging in the default configuration.

By default, the emptyDir will be persisted as long as the Pod is not disposed of. Simple Pod crashes will retain the data.

There is no maximum size limit implemented currently, but may be added at some point in time as a further protection. For improved performance, cache could be moved to ram tmpfs as well (currently backed by disk).

Do note, the initial commit does not set read-only root filesystem as default. Instead, this is an opt-in feature available on the chart. Users of non-Helm YAML manifests need to manually uncomment the aforementioned volumes and initContainers.

Resolves #1677.

Checklist

Before creating a PR, run through this checklist and mark each as complete.

  • I have read the CONTRIBUTING doc
  • I have added tests that prove my fix is effective or that my feature works
  • I have checked that all unit tests pass after adding my changes
  • I have updated necessary documentation
  • I have rebased my branch onto main
  • I will ensure my PR is targeting the main branch and pulling from my branch from my own fork

@sigv sigv requested a review from a team as a code owner February 13, 2023 14:37
@github-actions github-actions bot added the helm_chart Pull requests that update the Helm Chart label Feb 13, 2023
@sigv
Copy link
Contributor Author

sigv commented Feb 13, 2023

I have added tests that prove my fix is effective or that my feature works

Should I add tests that check whether the applied K8s pods have read-only filesystems?
I presume that's a no - in which case the bullet-point is not applicable.

I have checked that all unit tests pass after adding my changes

I have no explicit errors on test suite, and all tests are reported as passed/skipped/warning, when running from my Windows machine. But this is strange result after tweaking the rsplit call here which expects "/" as path separator. but path is being passed around further and \ errors out as invalid argument (OSError).
Will try to re-test later on from a Linux machine to see how it looks there.

@sigv
Copy link
Contributor Author

sigv commented Feb 13, 2023

There is a spike/readOnlyRootFileSystem branch from May 2022, which does not appear to have been picked up, so I am proposing a solution for that. The approach tries to be as non-intrusive as possible, by setting up empty scratch directories as emptyDirs, while pulling in /etc/nginx from base image via Init Container so that entrypoint or configuration itself does not need to have any changes.

This has potential of allowing the /etc/nginx files to be writeable by the nginx process. If this is not preferred, then further changes do need to be evaluated to work around this specific shortcoming.

@sigv sigv force-pushed the readOnlyRootFilesystem branch from 238dc80 to cb5a48f Compare February 13, 2023 15:19
@sigv
Copy link
Contributor Author

sigv commented Feb 13, 2023

Pushed a new commit which configures Nginx Plus with its own different base image in respective static YAML files.
@vepatel, sorry for wasted runner CPU time!

@sigv
Copy link
Contributor Author

sigv commented Feb 13, 2023

Spun up an Ubuntu 22.04 (Jammy) VM in Azure, installed Docker and Minikube, and ran tests.
The test suite fails at Prometheus for some reason - will try to figure that out:

suite/test_prometheus_metrics.py::TestPrometheusExporter::test_https_metrics[ingress_controller0-expected_metrics0] - Failed: Connection failed after several attempts

@sigv sigv marked this pull request as draft February 13, 2023 18:32
@sigv
Copy link
Contributor Author

sigv commented Feb 13, 2023

Might be multiple Nginx Pods spinning up in one Deployment, expecting empty /var/lib/nginx/ in new Pod and resulting in:

[emerg] 14#14: bind() to unix:/var/lib/nginx/nginx-status.sock failed (98: Address already in use)
[emerg] 14#14: bind() to unix:/var/lib/nginx/nginx-config-version.sock failed (98: Address already in use)
[emerg] 14#14: bind() to unix:/var/lib/nginx/nginx-502-server.sock failed (98: Address already in use)
[emerg] 14#14: bind() to unix:/var/lib/nginx/nginx-418-server.sock failed (98: Address already in use)

@lucacome lucacome added proposal An issue that proposes a feature request backlog Pull requests/issues that are backlog items labels Feb 14, 2023
@sigv
Copy link
Contributor Author

sigv commented Feb 14, 2023

I am thinking if this is somewhat similar to trac.nginx.org#753 (Nginx leaves UNIX domain sockets after SIGQUIT).
Not the easiest to troubleshoot as part of the test suite.

My main workstation is Windows-based - running Docker in WSL2, with K8s v1.25.4 on top.
It annoyingly succeeds when running suite/test_prometheus_metrics.py, while I am seeing the failure on Ubuntu.
So I'm still trying to narrow down the root cause of "Address already in use".

@sigv sigv force-pushed the readOnlyRootFilesystem branch from cb5a48f to a82bcd7 Compare February 14, 2023 10:03
@sigv
Copy link
Contributor Author

sigv commented Feb 14, 2023

Cause appears to be left-over sockets after an unclean shutdown for nginx/1.23.3. These are not properly handled when nginx restarts using the same /var/lib/nginx volume. A very basic example is deletion of the active Pod.

@sigv
Copy link
Contributor Author

sigv commented Feb 15, 2023

Not seeing a nice way how to get a clean /var/lib/nginx for Pod restarts.
So most likely course of action is to change the Nginx image entrypoint, to check there are no leftover sockets:

find /var/lib/nginx -type s -print | while read s; do
    if ! grep -qF " $s" /proc/net/unix; then
        entrypoint_log "$0: removing leftover socket: $s"
        rm -v "$s"
    fi
done

If it's preferred to integrate the socket logic in this K8s repo instead of upstream Docker image, I could set it as a separate Init Container in the Deployment/DaemonSet:

find /var/lib/nginx -type s -print0 | xargs -0 -I{} sh -c 'grep -qF " {}" /proc/net/unix || rm -v "{}"'

What are the thoughts on this? Any better approaches that come to mind before I open a PR on that project?

@sigv
Copy link
Contributor Author

sigv commented Feb 16, 2023

@shaun-nx, how should the self-assign be interpreted in this case? Will you handle the socket cleanup?

Noticed that there are new commits on main - rebased the WIP branch.

@shaun-nx
Copy link
Contributor

shaun-nx commented Feb 16, 2023

Hi @sigv
We're discussing internally in the team where this feature will line up with our current priorities.
I've assigned myself to it as I am driving that discussion in our team now. I'll update you as soon as I can.

@brianehlert
Copy link
Collaborator

Sorry about any perceived delays. The team is heads down on some existing work and we will take a more serious look as soon as we can guarantee some focus time.
Some pretty interesting things in here. I am really grateful for your effort, and we look forward to working with you to finish this off and taking it.

@sigv
Copy link
Contributor Author

sigv commented Feb 17, 2023

All good - thank you for the update! Hopefully all the active tasks go smoothly 🙏🏻

I will circle back around to this PR in two weeks, just to check on the availability of involved team members.
Not pursuing a change in the base image until I hear back from you as to avoid additional workload 👍🏻

@sigv sigv force-pushed the readOnlyRootFilesystem branch from ed69a82 to 3a0c643 Compare February 17, 2023 08:58
@sigv
Copy link
Contributor Author

sigv commented Feb 21, 2023

Interesting CI failure on Nginx Plus:

WARNING: Ignoring https://pkgs.nginx.com/plus/R28/alpine/v3.17/main: Protocol error
ERROR: unable to select packages:
  nginx-plus (no such package):
    required by: world[nginx-plus]
  nginx-plus-module-njs (no such package):
    required by: world[nginx-plus-module-njs]
  nginx-plus-module-opentracing (no such package):
    required by: world[nginx-plus-module-opentracing]

Doubt there is something suddenly wrong with the PR that would be relevant. Is something up with pkgs.nginx.com connectivity? 👀

@sigv sigv force-pushed the readOnlyRootFilesystem branch from 3a0c643 to 81d2a66 Compare February 21, 2023 15:17
@codecov-commenter
Copy link

codecov-commenter commented Feb 21, 2023

Codecov Report

Merging #3548 (af79964) into main (88864c3) will increase coverage by 0.03%.
The diff coverage is n/a.

❗ Current head af79964 differs from pull request most recent head bb17a8e. Consider uploading reports for the commit bb17a8e to get more accurate results

📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@            Coverage Diff             @@
##             main    #3548      +/-   ##
==========================================
+ Coverage   52.21%   52.24%   +0.03%     
==========================================
  Files          59       59              
  Lines       16877    16873       -4     
==========================================
+ Hits         8812     8816       +4     
+ Misses       7768     7762       -6     
+ Partials      297      295       -2     

see 2 files with indirect coverage changes

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@sigv sigv force-pushed the readOnlyRootFilesystem branch 2 times, most recently from ff4cb0f to 45a545f Compare February 23, 2023 15:23
@sigv sigv force-pushed the readOnlyRootFilesystem branch from 885668d to 1d3af4f Compare March 15, 2023 09:59
@shaun-nx
Copy link
Contributor

Hi @sigv just wanted to keep you posted. Sorry that it has been taking a bit of time to do re-testing on this PR. Some additional internal work has required my focus. I should have an update for you tomorrow. Thanks for your patience 🙏

@sigv
Copy link
Contributor Author

sigv commented Mar 15, 2023

Thanks for the update!

I'm just keeping the PR rebased and without merge conflicts in the meantime 🙂

@sigv sigv force-pushed the readOnlyRootFilesystem branch 2 times, most recently from 73d246b to bb17a8e Compare March 16, 2023 07:15
The Nginx Ingress Controller has various protections against attacks,
such as running the service as non-root to avoid changes to files.
An additional industry best practice is having root filesystems set
as read-only so that the attack surface is further reduced by limiting
changes to binaries and libraries. This commit ensures such a defense
in depth is enabled for the Nginx Ingress Controller.

To accomplish read-only rootfs, we will create emptyDir Volumes for
- `/etc/nginx`
- `/var/cache/nginx`
- `/var/lib/nginx`
- `/var/log/nginx`

We populate `/etc` mount with an Init Container, as there are generic
configuration files here. If the base image is updated via Helm chart,
then Init Container will pull files from the new base image.

The other three empty volumes are populated during runtime, as they
are not expected to have valuable files as part of base image:
- `/var/cache` mount will be used in case caching is enabled.
- `/var/lib` is used for some sockets and temporary files.
- `/var/log` is used for logging in the default configuration.

By default, the emptyDir will be persisted as long as the Pod is not
disposed of. Simple Pod crashes will retain the data.

There is no maximum size limit implemented currently, but may be added
at some point in time as a further protection. For improved performance,
cache could be moved to ram tmpfs as well (currently backed by disk).

Do note, the initial commit does not set read-only root filesystem
as default. Instead, this is an opt-in feature available on the chart.
Users of non-Helm YAML manifests need to manually uncomment the
aforementioned volumes and initContainers.

Co-authored-by: Shaun <[email protected]>
@sigv sigv force-pushed the readOnlyRootFilesystem branch from bb17a8e to a10301e Compare March 16, 2023 12:42
@sigv sigv requested review from vepatel and removed request for shaun-nx, lucacome and jasonwilliams14 March 16, 2023 12:43
@sigv
Copy link
Contributor Author

sigv commented Mar 16, 2023

And I feel stupid about my 're-review' button dropping others again. Sorry! 😐

Copy link
Contributor

@shaun-nx shaun-nx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks all good to me. Thanks again for this work @sigv 🚀

@sigv
Copy link
Contributor Author

sigv commented Mar 16, 2023

Although: the potential issue of nginx being killed/dying still stands. It could leave a socket file behind that is no longer used. This would block new nginx process start-up. I wonder if this should be documented in some page - "Known issues" for release?

This should be considered in the backlog as well.
I would not expect people to be hit often by this, and upgrade/redeployment would issue a new Pod with a fresh volume.. but I at the same time believe risk of hitting this is much higher in a containerized workflow, due to the "cattle not pets" mentality for Pods, than compared with "regular IaaS" webservers.

Copy link
Contributor

@vepatel vepatel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the patience during review process @sigv, we tested upgrades from previous version with and without readOnlyFileSystem and all looks okay to me 👍🏼

@ciarams87 ciarams87 merged commit 97ef3c3 into nginx:main Mar 16, 2023
@sigv
Copy link
Contributor Author

sigv commented Mar 16, 2023

@vepatel, it's a concern that crept up during my local testing. It should be all safe, but as there are more eyes on it now, wanted to highlight it.

Upgrade and redeployment process should be good. The potential issue I observed is when Pod dies abruptly leaving a socket behind. That seemed similar to trac.nginx.org#753 (Nginx leaves UNIX domain sockets after SIGQUIT).

KIC-813 issue title sounds promising.

Looking back at it, I did not test this on older versions.. And as this PR does not impact relevant behavior, it should be reproducible on versions before this PR, if the Pod is abused there. The socket file would simply be on root filesystem, which I believe is also persisted across restarts.

So the concern is most likely moot.

@sigv sigv deleted the readOnlyRootFilesystem branch March 16, 2023 16:13
@sigv
Copy link
Contributor Author

sigv commented Mar 16, 2023

Big thanks to the whole team! ❤️

Is there a calendar-based release schedule, or is it fully based on the items linked to a milestone? Asking as to understand if the official 3.1 tag is far out.

@vepatel
Copy link
Contributor

vepatel commented Mar 16, 2023

@sigv we're targeting 3.1.0 release by the end of this month!

@brianehlert
Copy link
Collaborator

We are planning on getting better with the Milestones. (I must have not pushed an update to that milestone, whoops)

@lucacome lucacome removed the backlog Pull requests/issues that are backlog items label Mar 17, 2023
@lucacome lucacome added this to the v3.1.0 milestone Mar 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation Pull requests/issues for documentation helm_chart Pull requests that update the Helm Chart proposal An issue that proposes a feature request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Run Ngninx Kubernetes Ingress controller with readOnlyRootFilesystem: true
7 participants