Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable to sort kube_control_plane including the first node #9756

Conversation

HoKim98
Copy link
Contributor

@HoKim98 HoKim98 commented Feb 4, 2023

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespaces from that line:

/kind cleanup

What this PR does / why we need it:

The first kubernetes control plane node has a very important role in overall kubespray workflows. However, if the first node is out of order with the others, the kubernetes cluster configuration will fail catastrophically. For example, certificate loss due to accidental ETCD and Kubernetes SSL key renewal.

Keeping the order of the first nodes is very important, but there is always the potential for human error to cause problems. So, I want to force the dynamic discovery logic to consider one of the kubernetes control plane nodes that are still functioning as the first node, so that we don't run into issues with node order in the first place.

Fortunately, I have been able to find that these attempts are already being made locally. Therefore, this PR will be a generalization of them. (#7989)

> BEFORE

  • groups['kube_control_plane'][0]
  • groups['kube_control_plane'] | first

> AFTER

  • first_kube_control_plane
    • If there is one or more running nodes: Choose the first one of them
    • If there is no running nodes (fresh install): Choose the first one of kube_control_plane

And if control planes are changed via cluster.yml and etc, it will automatically update the cluster-info to the first control plane node.

> BUG FIXES

Which issue(s) this PR fixes:

Fixes #3471 (only for kube_control_plane)

Special notes for your reviewer:

This PR has many changes for usage of kube_control_plane | first.

So many various tests are needed, but I can't be sure that this changes are complete yet, so I want to review with others.

Does this PR introduce a user-facing change?:

Changing the order of kube_control_plane is now allowed freely.
cluster-info configmap is automatically updated to the first kube_control_plane.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Feb 4, 2023
@k8s-ci-robot
Copy link
Contributor

Hi @kerryeon. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Feb 4, 2023
@HoKim98 HoKim98 force-pushed the raplace-first-kube-control-plane branch from 2874158 to 0f66334 Compare February 8, 2023 05:20
@floryut floryut closed this Feb 9, 2023
@floryut floryut reopened this Feb 9, 2023
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: kerryeon
Once this PR has been reviewed and has the lgtm label, please assign liupeng0518 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@HoKim98 HoKim98 marked this pull request as ready for review February 13, 2023 20:35
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Feb 13, 2023
JRaver and others added 27 commits March 8, 2023 08:17
requirements-$ANSIBLE_VERSION.yml doesn't exist in Kubespray repo.
That was for supporting ansible 2.10-, and now Kubespray supports
2.11+. So this drops the part to avoid confusion.
crun_bin_dir was used to specify the destination of the crun binary during the
download process. This path must match with the value provided in the CRI-O
configuration file. So changing its value to bin_dir helps to mismatch errors.

Signed-off-by: Victor Morales <[email protected]>
* network_plugin/custom_cni: add CNI to apply provided manifests

Add a new simple custom_cni to install provided Kubernetes manifests.
This could be useful to use manifests directly provided by a CNI when
there are not support by Kubespray (i.e.: helm chart or any other manifests
generation method).

Co-authored-by: James Landrein <[email protected]>
Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

* network_plugin/custom_cni: add test with cilium

Co-authored-by: James Landrein <[email protected]>
Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

---------

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>
Co-authored-by: James Landrein <[email protected]>
…kubernetes-sigs#9834)

* node: fix default kubelet/runtime cgroups when kube_reserved is false (default)

Commit 1c4db61 introduced a notion of
kube_reserved. This introduced a breaking change defaulting to use
kube.slice for the container_manager and the kubelet as if kube_reserved
was always enabled whereas it is disabled by default.

This commit fixes this by bringing back system.slice whenever
kube_reserved is disabled.

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

* inventory/sample: change false for kube_reserved as its the default

Changing the commented value in sample inventory to the actual default
value.

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>

---------

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>
…kubernetes-sigs#9718)

This commit removes the variable `use_localhost_as_kubeapi_loadbalancer`
and rather detects that we are in a situation where we can use the
localhost apiserver loadbalancer (meaning that we use the localhost load
balancer and that the same ports are used for both the load balancer and
the kube-apiserver).

This also cleanups the calico code to use `kube_apiserver_global_endpoint`
rather than implementing the same logic all over again.

Signed-off-by: Arthur Outhenin-Chalandre <[email protected]>
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Mar 8, 2023
@HoKim98 HoKim98 closed this Mar 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. needs-ok-to-test Indicates a PR that requires an org member to verify it is safe to test. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to add new master/etcd node to cluster