Fix MaxWorkerOpenFiles calculation on high cores nodes #7107
Hi @nanorobocop. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
/ok-to-test
/retest
error log:
I guess that's a flaky test. It failed for me a couple of times out of a few dozen runs.
/retest
Hm, it seems other PRs have the same problem: https://prow.k8s.io/?repo=kubernetes%2Fingress-nginx&job=pull-ingress-nginx-test
Tests passed after I disabled parallel test execution in 26e267a.
EDIT: man, I should've looked at it before posting 🙃
Hi @cmluciano, @rikatz, could you please help review this PR?
/assign @rikatz
/lgtm
If you want this one to also be available in the v0.X releases (the stable ones), please cherry-pick this PR to the release-v1beta1 branch. Thanks!!
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: justinmchase, nanorobocop, rikatz The full list of commands accepted by this bot can be found here. The pull request process is described here
* Fix MaxWorkerOpenFiles calculation on high cores nodes
* Add e2e test for rlimit_nofile
* Fix doc for max-worker-open-files
* Drop v1beta1 from ingress nginx (#7156) * Drop v1beta1 from ingress nginx Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]> * Fix intorstr logic in controller Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]> * fixing admission Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]> * more intorstr fixing * correct template rendering Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]> * Fix e2e tests for v1 api Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]> * Fix gofmt errors * This is finally working...almost there... Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]> * Re-add removed validation of AdmissionReview * Prepare for v1.0.0-alpha.1 release Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]> * Update changelog and matrix table for v1.0.0-alpha.1 (#7274) Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]> * add docs for syslog feature (#7219) * Fix link to e2e-tests.md in developer-guide (#7201) * Use ENV expansion for namespace in args (#7146) Update the DaemonSet namespace references to use the `POD_NAMESPACE` environment variable in the same way that the Deployment does. 
* chart: using Helm builtin capabilities check (#7190) Signed-off-by: Jintao Zhang <[email protected]> * Update proper default value for HTTP2MaxConcurrentStreams in Docs (#6944) It should be 128 as documented in https://github.com/kubernetes/ingress-nginx/blob/master/internal/ingress/controller/config/config.go#L780 * Fix MaxWorkerOpenFiles calculation on high cores nodes (#7107) * Fix MaxWorkerOpenFiles calculation on high cores nodes * Add e2e test for rlimit_nofile * Fix doc for max-worker-open-files * ingress/tcp: add additional error logging on failed (#7208) * Add file containing stable release (#7313) * Handle named (non-numeric) ports correctly (#7311) Signed-off-by: Carlos Panato <[email protected]> * Updated v1beta1 to v1 as its deprecated (#7308) * remove mercurial from build (#7031) * Retry to download maxmind DB if it fails (#7242) * Retry to download maxmind DB if it fails. Signed-off-by: Sergey Shakuto <[email protected]> * Add retries count arg, move retry logic into DownloadGeoLite2DB function Signed-off-by: Sergey Shakuto <[email protected]> * Reorder parameters in DownloadGeoLite2DB Signed-off-by: Sergey Shakuto <[email protected]> * Remove hardcoded value Signed-off-by: Sergey Shakuto <[email protected]> * Release v1.0.0-alpha.1 * Add changelog for v1.0.0-alpha.2 * controller: ignore non-service backends (#7332) * controller: ignore non-service backends Signed-off-by: Carlos Panato <[email protected]> * update per feedback Signed-off-by: Carlos Panato <[email protected]> * fix: allow scope/tcp/udp configmap namespace to altered (#7161) * Lower webhook timeout for digital ocean (#7319) * Lower webhook timeout for digital ocean * Set Digital Ocean value controller.admissionWebhooks.timeoutSeconds to 29 * update OWNERS and aliases files (#7365) (#7366) Signed-off-by: Carlos Panato <[email protected]> * Downgrade Lua modules for s390x (#7355) Downgrade Lua modules to last known working version. 
* Fix IngressClass logic for newer releases (#7341) * Fix IngressClass logic for newer releases Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]> * Change e2e tests for the new IngressClass presence * Fix chart and admission tests Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]> * Fix helm chart test Signed-off-by: Ricardo Pchevuzinske Katz <[email protected]> * Fix reviews * Remove ingressclass code from admission * update tag to v1.0.0-beta.1 * update readme and changelog for v1.0.0-beta.1 * Release v1.0.0-beta.1 - helm and manifests (#7422) * Change the order of annotation just to trigger a new helm release (#7425) * [cherry-pick] Add dev-v1 branch into helm releaser (#7428) * Add dev-v1 branch into helm releaser (#7424) * chore: add link for artifacthub.io/prerelease annotations Signed-off-by: Jintao Zhang <[email protected]> Co-authored-by: Ricardo Katz <[email protected]> * k8s job ci pipeline for dev-v1 br v1.22.0 (#7453) * k8s job ci pipeline for dev-v1 br v1.22.0 Signed-off-by: Neha Lohia <[email protected]> * k8s job ci pipeline for dev-v1 br v1.21.2 Signed-off-by: Neha Lohia <[email protected]> * remove v1.21.1 version Signed-off-by: Neha Lohia <[email protected]> * Add controller.watchIngressWithoutClass config option (#7459) Signed-off-by: Akshit Grover <[email protected]> * Release new helm chart with certgen fixed (#7478) * Update go version, modules and remove ioutil * Release new helm chart with certgen fixed * changed appversion, chartversion, TAG, image (#7490) * Fix CI conflict * Fix CI conflict * Fix build.sh from rebase process * Fix controller_test post rebase Co-authored-by: Tianhao Guo <[email protected]> Co-authored-by: Ray <[email protected]> Co-authored-by: Bill Cassidy <[email protected]> Co-authored-by: Jintao Zhang <[email protected]> Co-authored-by: Sathish Ramani <[email protected]> Co-authored-by: Mansur Marvanov <[email protected]> Co-authored-by: Matt1360 <[email protected]> Co-authored-by: Carlos Tadeu 
Panato Junior <[email protected]> Co-authored-by: Kundan Kumar <[email protected]> Co-authored-by: Tom Hayward <[email protected]> Co-authored-by: Sergey Shakuto <[email protected]> Co-authored-by: Tore <[email protected]> Co-authored-by: Bouke Versteegh <[email protected]> Co-authored-by: Shahid <[email protected]> Co-authored-by: James Strong <[email protected]> Co-authored-by: Long Wu Yuan <[email protected]> Co-authored-by: Jintao Zhang <[email protected]> Co-authored-by: Neha Lohia <[email protected]> Co-authored-by: Akshit Grover <[email protected]>
Re-open of my stale PR #5627.
Added tests and fixed doc.
What this PR does / why we need it:
The default MaxWorkerOpenFiles (`max-worker-open-files`) value becomes too low on high-spec nodes with many cores.
As a workaround, it is always possible to provide an exact value through the ConfigMap: https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/configmap/#max-worker-open-files
Currently, the maximum number of open files is calculated as `RLIMIT_NOFILE` divided by the number of worker processes.
ingress-nginx/internal/ingress/controller/nginx.go
Lines 539 to 544 in 07b70f6
The number of worker processes is calculated from the number of cores.
ingress-nginx/internal/ingress/controller/config/config.go
Line 751 in bef2efc
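For readers who want to check these two inputs on their own node, here is a minimal sketch in Go. It is not the controller's code: the helper names `nofileLimits` and `defaultWorkerProcesses` are hypothetical, and it assumes a Unix-like system where `syscall.Getrlimit` is available. Whether the controller reads the soft or the hard limit is not stated above, so the sketch prints both.

```go
package main

import (
	"fmt"
	"runtime"
	"syscall"
)

// nofileLimits returns the soft and hard RLIMIT_NOFILE values of the
// current process. Hypothetical helper, for illustration only.
func nofileLimits() (uint64, uint64, error) {
	var lim syscall.Rlimit
	if err := syscall.Getrlimit(syscall.RLIMIT_NOFILE, &lim); err != nil {
		return 0, 0, err
	}
	return uint64(lim.Cur), uint64(lim.Max), nil
}

// defaultWorkerProcesses assumes one worker process per logical CPU,
// which is how "based on the number of cores" is read here.
func defaultWorkerProcesses() int {
	return runtime.NumCPU()
}

func main() {
	soft, hard, err := nofileLimits()
	if err != nil {
		fmt.Println("could not read RLIMIT_NOFILE:", err)
		return
	}
	fmt.Println("RLIMIT_NOFILE soft:", soft, "hard:", hard)
	fmt.Println("worker processes (one per core):", defaultWorkerProcesses())
}
```

Both values are per process, which is the point the rest of the description relies on.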
Our example:
As a result, we have `65536 / 40 - 1024 = 614.4`. This is less than `1024`, so `1024` is actually used.
This value is too small for a single worker on a heavily loaded server with thousands of total connections.
Since RLIMIT_NOFILE is already a per-process value, there's no need to divide it by worker_processes.
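The arithmetic above can be reproduced with a small Go sketch. The function names are hypothetical and this is not the code from nginx.go. The `1024` headroom and the `1024` floor come from the numbers in this description. The undivided variant is an assumption about what "no need to divide" means in practice.

```go
package main

import "fmt"

const (
	reservedFiles = 1024 // headroom subtracted from the limit, per the formula above
	minOpenFiles  = 1024 // floor applied when the result is too small
)

// perWorkerLimitDivided mirrors the calculation described above:
// RLIMIT_NOFILE divided by the number of workers, minus the headroom.
func perWorkerLimitDivided(rlimit, workers int) int {
	if workers < 1 {
		workers = 1 // guard against division by zero
	}
	limit := rlimit/workers - reservedFiles
	if limit < minOpenFiles {
		return minOpenFiles
	}
	return limit
}

// perWorkerLimitUndivided treats RLIMIT_NOFILE as already per-process
// (assumed shape of the fix): only the headroom is subtracted.
func perWorkerLimitUndivided(rlimit int) int {
	limit := rlimit - reservedFiles
	if limit < minOpenFiles {
		return minOpenFiles
	}
	return limit
}

func main() {
	// Go integer division gives 65536/40 = 1638, so 614 rather than 614.4.
	fmt.Println(perWorkerLimitDivided(65536, 40)) // prints 1024
	fmt.Println(perWorkerLimitUndivided(65536))   // prints 64512
}
```

With 40 workers the divided form falls back to the floor of 1024, while the undivided form leaves each worker 64512 descriptors.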
Types of changes
Which issue/s this PR fixes
maxOpenFiles as 64000 #2055 - similar problem
How Has This Been Tested?
E2e test
Checklist: