kubeadm: Add a preflight check that the control-plane node has at least 1700MB of RAM #93275

Merged
merged 2 commits into kubernetes:master from xlgao-zju:check-mem
Sep 3, 2020

Conversation

xlgao-zju
Contributor

@xlgao-zju xlgao-zju commented Jul 21, 2020

What type of PR is this?
/kind bug

What this PR does / why we need it:
Add a preflight check that the control-plane node has at least 1700MB of RAM

Which issue(s) this PR fixes:
Fixes kubernetes/kubeadm#1052

Special notes for your reviewer:

Does this PR introduce a user-facing change?:

kubeadm: Add a preflight check that the control-plane node has at least 1700MB of RAM

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:


/assign @neolit123
/area kubeadm

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. release-note Denotes a PR that will be considered when it comes time to generate release notes. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 21, 2020
@k8s-ci-robot k8s-ci-robot requested review from brendandburns, kad and a team July 21, 2020 02:55
@k8s-ci-robot k8s-ci-robot added area/dependency Issues or PRs related to dependency changes area/kubeadm sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. and removed needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 21, 2020
@xlgao-zju
Contributor Author

/retest


// Check number of memory required by kubeadm
func (mc MemCheck) Check() (warnings, errorList []error) {
	actual := memory.TotalMemory() / 1024 / 1024 // TotalMemory returns bytes; convert to MB
Contributor

Since the control plane only runs on Linux it should be fine to use https://golang.org/pkg/syscall/#Sysinfo_t directly without adding another code dependency?

Contributor

+1 to avoid the extra dependency.

Member

ok, let's use https://golang.org/pkg/syscall/#Sysinfo_t
we should add a TODO if some day the control-plane is supported on Windows.

Contributor Author

is there any plan to support the control plane on Windows or macOS?

Member

the kubelet does not work on macOS and, as far as I know, there are no plans to support it soon.
control-plane on Windows is not supported and there are no plans to support it anytime soon, but maybe in the future.
the TODO you've added SGTM.
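The approach agreed on here, reading total RAM with syscall.Sysinfo instead of adding a dependency and leaving a TODO for Windows, could look roughly like the following minimal sketch. It is not the merged kubeadm code: the function name totalMemoryMB is hypothetical, and the sketch assumes Linux, since syscall.Sysinfo and syscall.Sysinfo_t exist only in the Linux build of Go's syscall package. Per sysinfo(2), totalram is counted in multiples of mem_unit bytes, so the sketch multiplies Totalram by Unit before converting to MB.

```go
package main

import (
	"fmt"
	"syscall"
)

// totalMemoryMB returns the host's total physical RAM in MB
// (1 MB = 1024 * 1024 bytes, matching the /1024/1024 conversion in the PR).
// Hypothetical helper; Linux only, because syscall.Sysinfo is not defined elsewhere.
// TODO: provide an alternative if the control plane is ever supported on Windows.
func totalMemoryMB() (uint64, error) {
	var info syscall.Sysinfo_t
	if err := syscall.Sysinfo(&info); err != nil {
		return 0, fmt.Errorf("failed to call sysinfo: %w", err)
	}
	// Totalram is expressed in units of info.Unit bytes. The Go field types
	// differ between 32-bit and 64-bit architectures, so widen both first.
	totalBytes := uint64(info.Totalram) * uint64(info.Unit)
	return totalBytes / 1024 / 1024, nil
}
```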

func (mc MemCheck) Check() (warnings, errorList []error) {
	actual := memory.TotalMemory() / 1024 / 1024 // TotalMemory returns bytes; convert to MB
	if actual < mc.Mem {
		errorList = append(errorList, errors.Errorf("the system RAM (%d MB) is less than the minimum %d MB", actual, mc.Mem))
Member

maybe we should treat it as a warning.

Contributor Author

@neolit123 I think too little memory may affect the stability of the cluster.

Also, the CPU check returns an error, so I think the memory check should behave the same way as the CPU check.

Member

/cc @rosti

Contributor

I would suggest keeping it as an error but lowering the error threshold a bit. Oftentimes folks deploy control planes on machines that have a bit less than 2GB of RAM (be it due to sharing RAM between the CPU and a built-in GPU, or some other system reservation). Therefore it might be better to make it ~1700 MB. Below that amount of RAM, running a stable control plane would be difficult.
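A simplified sketch of the check as suggested here: the comparison stays an error, with the threshold lowered to about 1700 MB. To make it testable without reading the host, the memory reader is injected as a function. The type memCheck, its totalMB field and the constant name are hypothetical, and fmt.Errorf stands in for the errors.Errorf call used in the PR.

```go
package main

import "fmt"

// minControlPlaneMemMB is the threshold suggested in review (~1700 MB).
const minControlPlaneMemMB = 1700

// memCheck is a hypothetical, simplified stand-in for kubeadm's MemCheck.
// totalMB is injected so the comparison can be exercised on any host.
type memCheck struct {
	Mem     uint64        // minimum required RAM in MB
	totalMB func() uint64 // returns the detected RAM in MB
}

// Check reports an error (not a warning) when the host has less RAM than Mem.
func (mc memCheck) Check() (warnings, errorList []error) {
	actual := mc.totalMB()
	if actual < mc.Mem {
		errorList = append(errorList, fmt.Errorf("the system RAM (%d MB) is less than the minimum %d MB", actual, mc.Mem))
	}
	return warnings, errorList
}
```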

Member

probably best to say "not recommended to have less than X memory"

Contributor Author

@neolit123 @rosti the CPU check returns an error, so I think we should keep the same behavior for the memory check.

If the CPU and memory checks behave differently, it will be a little confusing.

@neolit123
Member

/kind feature
/priority backlog

we are in code freeze for 1.19 and this should merge after that.
I will try to bring this up as a topic at the kubeadm office hours.

@k8s-ci-robot k8s-ci-robot added kind/feature Categorizes issue or PR as related to a new feature. priority/backlog Higher priority than priority/awaiting-more-evidence. and removed needs-kind Indicates a PR lacks a `kind/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 21, 2020
@k8s-ci-robot k8s-ci-robot requested a review from rosti July 21, 2020 16:11
Contributor

@rosti rosti left a comment

Thanks @xlgao-zju !

@@ -849,3 +849,24 @@ func TestNumCPUCheck(t *testing.T) {
		})
	}
}

func TestMemCheck(t *testing.T) {
Contributor Author

this test will ONLY pass on Linux. Should we remove it? @neolit123 @rosti

Contributor

You can skip it if the GOOS != "linux"

Contributor Author

done

@xlgao-zju xlgao-zju force-pushed the check-mem branch 3 times, most recently from 4dfbdf1 to 58707fa Compare July 23, 2020 09:26
@xlgao-zju xlgao-zju force-pushed the check-mem branch 2 times, most recently from 02d5929 to 87e09bf Compare July 23, 2020 12:00
func TestMemCheck(t *testing.T) {
	// skip this test, if OS is not Linux, since it will ONLY pass on Linux.
	if runtime.GOOS != "linux" {
		return
Contributor

Instead of return, wouldn't t.Skip("unsupported OS for memory check test") be clearer?

Contributor Author

done

@neolit123
Member

I guess it should be an error for consistency.
The difference is that this one is more likely to trigger on some setups.

kubeadm: Add a preflight check that the control-plane node has at least 2GB RAM

the release note needs an update 1700MB

@xlgao-zju xlgao-zju changed the title kubeadm: Add a preflight check that the control-plane node has at least 2GB RAM kubeadm: Add a preflight check that the control-plane node has at least 1700MB Jul 24, 2020
@xlgao-zju
Contributor Author

xlgao-zju commented Jul 24, 2020

the release note needs an update 1700MB

done. please take another look. @neolit123 @rosti @johscheuer

@neolit123
Member

@xlgao-zju another minor change for the release note:
should be "1700MB of RAM".

reminder we are in code freeze:

Pending — Not mergeable. Must be in milestone v1.19.

/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: neolit123, xlgao-zju

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 24, 2020
@xlgao-zju xlgao-zju changed the title kubeadm: Add a preflight check that the control-plane node has at least 1700MB kubeadm: Add a preflight check that the control-plane node has at least 1700MB of RAM Jul 27, 2020
@xlgao-zju
Contributor Author

should be "1700MB of RAM".

@neolit123 done; I will wait until we are out of code freeze.

Contributor

@rosti rosti left a comment

Thanks @xlgao-zju !
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Jul 28, 2020
@neolit123
Member

/milestone v1.20-phase-feature

@k8s-ci-robot k8s-ci-robot added this to the v1.20-phase-feature milestone Aug 27, 2020
@neolit123
Member

/milestone v1.20
/retest pull-kubernetes-e2e-kind-ipv6

@k8s-ci-robot
Contributor

@neolit123: The /retest command does not accept any targets.
The following commands are available to trigger jobs:

  • /test pull-kubernetes-bazel-build
  • /test pull-kubernetes-bazel-test
  • /test pull-kubernetes-conformance-image-test
  • /test pull-kubernetes-conformance-kind-ipv6-parallel
  • /test pull-kubernetes-dependencies
  • /test pull-kubernetes-dependencies-canary
  • /test pull-kubernetes-e2e-aws-eks-1-13-correctness
  • /test pull-kubernetes-files-remake
  • /test pull-kubernetes-e2e-gce
  • /test pull-kubernetes-e2e-gce-no-stage
  • /test pull-kubernetes-e2e-gce-kubetest2
  • /test pull-kubernetes-e2e-gce-canary
  • /test pull-kubernetes-e2e-gce-ubuntu
  • /test pull-kubernetes-e2e-gce-ubuntu-containerd
  • /test pull-kubernetes-e2e-gce-ubuntu-containerd-canary
  • /test pull-kubernetes-e2e-gce-rbe
  • /test pull-kubernetes-e2e-gce-alpha-features
  • /test pull-kubernetes-e2e-gce-device-plugin-gpu
  • /test pull-kubernetes-integration
  • /test pull-kubernetes-cross
  • /test pull-kubernetes-e2e-kind
  • /test pull-kubernetes-e2e-kind-canary
  • /test pull-kubernetes-e2e-kind-ipv6
  • /test pull-kubernetes-e2e-kind-ipv6-canary
  • /test pull-kubernetes-conformance-kind-ga-only
  • /test pull-kubernetes-conformance-kind-ga-only-parallel
  • /test pull-kubernetes-e2e-kops-aws
  • /test pull-kubernetes-bazel-build-canary
  • /test pull-kubernetes-bazel-test-canary
  • /test pull-kubernetes-bazel-test-integration-canary
  • /test pull-kubernetes-local-e2e
  • /test pull-publishing-bot-validate
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect
  • /test pull-kubernetes-e2e-gce-network-proxy-http-connect-canary
  • /test pull-kubernetes-e2e-gce-network-proxy-grpc
  • /test pull-kubernetes-e2e-gci-gce-autoscaling
  • /test pull-kubernetes-e2e-aks-engine-azure
  • /test pull-kubernetes-e2e-azure-disk
  • /test pull-kubernetes-e2e-azure-disk-vmss
  • /test pull-kubernetes-e2e-azure-file
  • /test pull-kubernetes-e2e-kind-dual-canary
  • /test pull-kubernetes-e2e-gci-gce-ipvs
  • /test pull-kubernetes-node-e2e
  • /test pull-kubernetes-e2e-containerd-gce
  • /test pull-kubernetes-node-e2e-containerd
  • /test pull-kubernetes-node-e2e-alpha
  • /test pull-kubernetes-node-kubelet-serial-cpu-manager
  • /test pull-kubernetes-node-kubelet-serial-topology-manager
  • /test pull-kubernetes-node-kubelet-serial-hugepages
  • /test pull-kubernetes-node-crio1-18-e2e
  • /test pull-kubernetes-e2e-gce-100-performance
  • /test pull-kubernetes-e2e-gce-big-performance
  • /test pull-kubernetes-e2e-gce-correctness
  • /test pull-kubernetes-e2e-gce-large-performance
  • /test pull-kubernetes-kubemark-e2e-gce-big
  • /test pull-kubernetes-kubemark-e2e-gce-scale
  • /test pull-kubernetes-e2e-gce-storage-slow
  • /test pull-kubernetes-e2e-gce-storage-snapshot
  • /test pull-kubernetes-e2e-gce-storage-slow-rbe
  • /test pull-kubernetes-e2e-gce-csi-serial
  • /test pull-kubernetes-e2e-gce-iscsi
  • /test pull-kubernetes-e2e-gce-iscsi-serial
  • /test pull-kubernetes-e2e-gce-storage-disruptive
  • /test pull-kubernetes-e2e-aks-engine-azure-windows
  • /test pull-kubernetes-e2e-azure-disk-windows
  • /test pull-kubernetes-e2e-azure-file-windows
  • /test pull-kubernetes-typecheck
  • /test pull-kubernetes-verify
  • /test pull-kubernetes-e2e-windows-gce

Use /test all to run the following jobs:

  • pull-kubernetes-bazel-build
  • pull-kubernetes-bazel-test
  • pull-kubernetes-dependencies
  • pull-kubernetes-e2e-gce-ubuntu-containerd
  • pull-kubernetes-integration
  • pull-kubernetes-e2e-kind
  • pull-kubernetes-e2e-kind-ipv6
  • pull-kubernetes-conformance-kind-ga-only-parallel
  • pull-kubernetes-node-e2e
  • pull-kubernetes-e2e-gce-100-performance
  • pull-kubernetes-e2e-azure-disk-windows
  • pull-kubernetes-e2e-azure-file-windows
  • pull-kubernetes-typecheck
  • pull-kubernetes-verify

In response to this:

/milestone v1.20
/retest pull-kubernetes-e2e-kind-ipv6

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@neolit123
Member

/test pull-kubernetes-e2e-kind-ipv6

@k8s-ci-robot k8s-ci-robot merged commit 92ba3eb into kubernetes:master Sep 3, 2020
@xlgao-zju xlgao-zju deleted the check-mem branch September 4, 2020 09:33
@abelbarrera15
Contributor

@xlgao-zju, I think something in this merge didn't quite work out. I just ran kubeadm init on a machine with 7.3GB of available memory and it threw:

[ERROR Mem]: the system RAM (1 MB) is less than the minimum 1700 MB

after running: sudo kubeadm init --pod-network-cidr=10.244.0.0/16

It ran once I added "--ignore-preflight-errors=Mem".

But this should not be the intended behavior. Evidence of the actual memory:

pi@k8s-master:~ $ free -h
              total        used        free      shared  buff/cache   available
Mem:          7.7Gi       112Mi       6.9Gi       8.0Mi       679Mi       7.3Gi
Swap:            0B          0B          0B
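One plausible explanation worth verifying in the follow-up issue (an assumption, not something confirmed in this thread): sysinfo(2) reports totalram in multiples of mem_unit bytes, and on 32-bit kernels with large amounts of RAM, mem_unit can be the page size rather than 1. If the check divides Totalram by 1024 twice without first multiplying by Unit, several GB collapse to a single-digit MB value. The sketch below compares both calculations using hypothetical values (2,000,000 units of 4096 bytes), not figures taken from this report.

```go
package main

// Hypothetical sysinfo(2)-style values: totalram counted in 4096-byte units.
const (
	exampleTotalram uint64 = 2_000_000
	exampleUnit     uint64 = 4096
)

// mbIgnoringUnit treats totalram as if it were already in bytes.
func mbIgnoringUnit(totalram uint64) uint64 {
	return totalram / 1024 / 1024
}

// mbWithUnit scales totalram by the memory unit first, as sysinfo(2) describes.
func mbWithUnit(totalram, unit uint64) uint64 {
	return totalram * unit / 1024 / 1024
}
```

With these values, mbIgnoringUnit(exampleTotalram) returns 1, while mbWithUnit(exampleTotalram, exampleUnit) returns 7812.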

@neolit123
Member

can you please log an issue in the kubernetes/kubeadm repository and fill in the details in the issue template?
thanks

@xlgao-zju
Contributor Author

@abelbarrera15 as @neolit123 said, let's dig into this in the k/kubeadm repo.

@abelbarrera15
Contributor

abelbarrera15 commented Dec 16, 2020

Will do! Thanks, guys. Will post there in a sec.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/dependency Issues or PRs related to dependency changes area/kubeadm cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. lgtm "Looks good to me", indicates that a PR is ready to be merged. priority/backlog Higher priority than priority/awaiting-more-evidence. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/cluster-lifecycle Categorizes an issue or PR as relevant to SIG Cluster Lifecycle. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Preflight checks shall check hardware requirements.
6 participants