Fix MaxWorkerOpenFiles calculation on high cores nodes #7107

Merged
merged 3 commits into from
Jun 29, 2021

Contributor

@nanorobocop nanorobocop commented May 4, 2021

Re-open of my stale PR #5627.
Added tests and fixed doc.

What this PR does / why we need it:

The default MaxWorkerOpenFiles (max-worker-open-files) value becomes too low on high-spec nodes with many cores.
As a workaround, it is always possible to set an exact value through the ConfigMap: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#max-worker-open-files

Currently, the maximum number of open files is calculated as RLIMIT_NOFILE divided by the number of worker processes:

maxOpenFiles := (rlimitMaxNumFiles() / wp) - 1024
klog.V(3).Infof("Maximum number of open file descriptors: %d", maxOpenFiles)
if maxOpenFiles < 1024 {
	// this means the value of RLIMIT_NOFILE is too low.
	maxOpenFiles = 1024
}

The number of worker processes is derived from the number of cores:

WorkerProcesses: strconv.Itoa(runtime.NumCPU()),

Our example:

$ sysctl fs.file-max
fs.file-max = 6506158
$ ulimit -Sn
65536
$ ulimit -Hn
65536
$ lscpu  | grep 'CPU(s)' | head -n 1
CPU(s):              40

As a result, we have 65536 / 40 - 1024 = 614.4 (614 with Go's integer division). This is less than 1024, so 1024 is actually used.
This value is too small for a single worker on a heavily loaded server with thousands of connections in total.

Since RLIMIT_NOFILE is already a per-process limit, there is no need to divide it by worker_processes.
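The effect on the example node can be sketched as follows. oldMaxOpenFiles mirrors the snippet quoted earlier; newMaxOpenFiles is an assumed form of the fix that only drops the division and keeps the 1024 reserve and floor. Both function names are illustrative and are not taken from the controller code.

```go
package main

import "fmt"

// minOpenFiles is the floor used in the quoted snippet when the
// computed value falls below 1024.
const minOpenFiles = 1024

// oldMaxOpenFiles mirrors the quoted calculation: the limit is divided
// by the number of workers before 1024 descriptors are reserved.
func oldMaxOpenFiles(rlimit, workers int) int {
	n := rlimit/workers - 1024
	if n < minOpenFiles {
		n = minOpenFiles
	}
	return n
}

// newMaxOpenFiles is an assumed form of the fix: RLIMIT_NOFILE is
// treated as a per-process value, so it is not divided by workers.
func newMaxOpenFiles(rlimit int) int {
	n := rlimit - 1024
	if n < minOpenFiles {
		n = minOpenFiles
	}
	return n
}

func main() {
	fmt.Println(oldMaxOpenFiles(65536, 40)) // 1024 (614 before the floor is applied)
	fmt.Println(newMaxOpenFiles(65536))     // 64512
}
```

With the numbers from the example node, each worker would go from 1024 to 64512 descriptors under this assumption.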

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Which issue/s this PR fixes

How Has This Been Tested?

E2e test

Checklist:

  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I've read the CONTRIBUTION guide
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label May 4, 2021
@k8s-ci-robot
Contributor

Hi @nanorobocop. Thanks for your PR.

I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 4, 2021
@k8s-ci-robot k8s-ci-robot requested review from cmluciano and rikatz May 4, 2021 02:30
@tao12345666333
Member

/ok-to-test

@k8s-ci-robot k8s-ci-robot added ok-to-test Indicates a non-member PR verified by an org member that is safe to test. and removed needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. labels May 4, 2021
@nanorobocop
Contributor Author

/retest

@tao12345666333
Member

error log:

    nginx_test.go:154: creating tcp listener: listen tcp :10246: bind: address already in use
--- FAIL: TestConfigureDynamically (0.00s)
=== RUN   TestConfigureCertificates
    nginx_test.go:308: creating tcp listener: listen tcp :10246: bind: address already in use
--- FAIL: TestConfigureCertificates (0.00s)

@nanorobocop
Contributor Author

I guess that's a flaky test. It failed for me a couple of times out of a few dozen runs.

@nanorobocop
Contributor Author

/retest

@nanorobocop
Contributor Author

Hm, it seems other PRs have the same problem: https://prow.k8s.io/?repo=kubernetes%2Fingress-nginx&job=pull-ingress-nginx-test

@nanorobocop
Contributor Author

Tests passed after I disabled parallel test execution (26e267a).
t.Parallel() is not used anywhere in the tests, so that should be OK.

@joostschriek

joostschriek commented May 4, 2021

hey 👋 I was peeking at this too, my PR has the same issues. The testgrid pointed me to this PR. I haven't looked at it yet, but it could be the source :)

EDIT man i should've looked at it before posting 🙃

@nanorobocop
Contributor Author

Hi @cmluciano, @rikatz, could you please help to review this PR?
Is anything else required to proceed?

@nanorobocop
Contributor Author

/assign @rikatz

internal/ingress/controller/nginx.go (review thread: outdated, resolved)
build/test.sh (review thread: outdated, resolved)
@rikatz
Contributor

rikatz commented Jun 29, 2021

/lgtm
/approve

If you want this one also to be available in v0.X releases (the stable ones), please cherry pick this PR to branch release-v1beta1

Thanks!!

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jun 29, 2021
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: justinmchase, nanorobocop, rikatz

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 29, 2021
@k8s-ci-robot k8s-ci-robot merged commit 4bdb553 into kubernetes:master Jun 29, 2021
rikatz pushed a commit to rikatz/ingress-nginx that referenced this pull request Aug 21, 2021
* Fix MaxWorkerOpenFiles calculation on high cores nodes

* Add e2e test for rlimit_nofile

* Fix doc for max-worker-open-files
k8s-ci-robot pushed a commit that referenced this pull request Aug 21, 2021
* Drop v1beta1 from ingress nginx (#7156)

* Drop v1beta1 from ingress nginx

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Fix intorstr logic in controller

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* fixing admission

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* more intorstr fixing

* correct template rendering

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Fix e2e tests for v1 api

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Fix gofmt errors

* This is finally working...almost there...

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Re-add removed validation of AdmissionReview

* Prepare for v1.0.0-alpha.1 release

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Update changelog and matrix table for v1.0.0-alpha.1 (#7274)

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* add docs for syslog feature (#7219)

* Fix link to e2e-tests.md in developer-guide (#7201)

* Use ENV expansion for namespace in args (#7146)

Update the DaemonSet namespace references to use the `POD_NAMESPACE` environment variable in the same way that the Deployment does.

* chart: using Helm builtin capabilities check (#7190)

Signed-off-by: Jintao Zhang <[email protected]>

* Update proper default value for HTTP2MaxConcurrentStreams in Docs (#6944)

It should be 128 as documented in https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/config/config.go#L780

* Fix MaxWorkerOpenFiles calculation on high cores nodes (#7107)

* Fix MaxWorkerOpenFiles calculation on high cores nodes

* Add e2e test for rlimit_nofile

* Fix doc for max-worker-open-files

* ingress/tcp: add additional error logging on failed (#7208)

* Add file containing stable release (#7313)

* Handle named (non-numeric) ports correctly (#7311)

Signed-off-by: Carlos Panato <[email protected]>

* Updated v1beta1 to v1 as its deprecated (#7308)

* remove mercurial from build (#7031)

* Retry to download maxmind DB if it fails (#7242)

* Retry to download maxmind DB if it fails.

Signed-off-by: Sergey Shakuto <[email protected]>

* Add retries count arg, move retry logic into DownloadGeoLite2DB function

Signed-off-by: Sergey Shakuto <[email protected]>

* Reorder parameters in DownloadGeoLite2DB

Signed-off-by: Sergey Shakuto <[email protected]>

* Remove hardcoded value

Signed-off-by: Sergey Shakuto <[email protected]>

* Release v1.0.0-alpha.1

* Add changelog for v1.0.0-alpha.2

* controller: ignore non-service backends (#7332)

* controller: ignore non-service backends

Signed-off-by: Carlos Panato <[email protected]>

* update per feedback

Signed-off-by: Carlos Panato <[email protected]>

* fix: allow scope/tcp/udp configmap namespace to altered (#7161)

* Lower webhook timeout for digital ocean (#7319)

* Lower webhook timeout for digital ocean

* Set Digital Ocean value controller.admissionWebhooks.timeoutSeconds to 29

* update OWNERS and aliases files (#7365) (#7366)

Signed-off-by: Carlos Panato <[email protected]>

* Downgrade Lua modules for s390x (#7355)

Downgrade Lua modules to last known working version.

* Fix IngressClass logic for newer releases (#7341)

* Fix IngressClass logic for newer releases

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Change e2e tests for the new IngressClass presence

* Fix chart and admission tests

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Fix helm chart test

Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]>

* Fix reviews

* Remove ingressclass code from admission

* update tag to v1.0.0-beta.1

* update readme and changelog for v1.0.0-beta.1

* Release v1.0.0-beta.1 - helm and manifests (#7422)

* Change the order of annotation just to trigger a new helm release (#7425)

* [cherry-pick] Add dev-v1 branch into helm releaser (#7428)

* Add dev-v1 branch into helm releaser (#7424)

* chore: add link for artifacthub.io/prerelease annotations

Signed-off-by: Jintao Zhang <[email protected]>

Co-authored-by: Ricardo Katz <[email protected]>

* k8s job ci pipeline for dev-v1 br v1.22.0 (#7453)

* k8s job ci pipeline for dev-v1 br v1.22.0

Signed-off-by: Neha Lohia <[email protected]>

* k8s job ci pipeline for dev-v1 br v1.21.2

Signed-off-by: Neha Lohia <[email protected]>

* remove v1.21.1 version

Signed-off-by: Neha Lohia <[email protected]>

* Add controller.watchIngressWithoutClass config option (#7459)

Signed-off-by: Akshit Grover <[email protected]>

* Release new helm chart with certgen fixed (#7478)

* Update go version, modules and remove ioutil

* Release new helm chart with certgen fixed

* changed appversion, chartversion, TAG, image (#7490)

* Fix CI conflict

* Fix CI conflict

* Fix build.sh from rebase process

* Fix controller_test post rebase

Co-authored-by: Tianhao Guo <[email protected]>
Co-authored-by: Ray <[email protected]>
Co-authored-by: Bill Cassidy <[email protected]>
Co-authored-by: Jintao Zhang <[email protected]>
Co-authored-by: Sathish Ramani <[email protected]>
Co-authored-by: Mansur Marvanov <[email protected]>
Co-authored-by: Matt1360 <[email protected]>
Co-authored-by: Carlos Tadeu Panato Junior <[email protected]>
Co-authored-by: Kundan Kumar <[email protected]>
Co-authored-by: Tom Hayward <[email protected]>
Co-authored-by: Sergey Shakuto <[email protected]>
Co-authored-by: Tore <[email protected]>
Co-authored-by: Bouke Versteegh <[email protected]>
Co-authored-by: Shahid <[email protected]>
Co-authored-by: James Strong <[email protected]>
Co-authored-by: Long Wu Yuan <[email protected]>
Co-authored-by: Jintao Zhang <[email protected]>
Co-authored-by: Neha Lohia <[email protected]>
Co-authored-by: Akshit Grover <[email protected]>
rchshld pushed a commit to joomcode/ingress-nginx that referenced this pull request May 19, 2023
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. ok-to-test Indicates a non-member PR verified by an org member that is safe to test. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.