
Update MetalLB deployment, wait for resource. #9995

Merged
2 commits merged into kubernetes-sigs:master from Jeroen0494:feat/metallbv0.13.4 on May 30, 2023

Conversation

@Jeroen0494 (Contributor) commented Apr 17, 2023

What type of PR is this?

/kind bug

Fixes #10003

What this PR does / why we need it:

Continuation of PR #9120.

Does this PR introduce a user-facing change?:

Update MetalLB deployment, wait for resource.

@k8s-ci-robot added the kind/bug and cncf-cla: yes labels Apr 17, 2023
@k8s-ci-robot (Contributor)

Hi @Jeroen0494. Thanks for your PR.

I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot added the needs-ok-to-test label Apr 17, 2023
@k8s-ci-robot added the size/M label Apr 17, 2023
@Jeroen0494 (Contributor, Author)

Also see my comment here:
#9120 (comment)

@floryut (Member) left a comment

@Jeroen0494 Thanks for the followup fix 🙇
/ok-to-test

@k8s-ci-robot added the ok-to-test and approved labels and removed the needs-ok-to-test label Apr 18, 2023
@liupeng0518 (Member)

#9990

Please rebase onto the master branch.

@Jeroen0494 force-pushed the feat/metallbv0.13.4 branch 2 times, most recently from ae9598a to 1beb29f on April 20, 2023 at 09:08
@Jeroen0494 (Contributor, Author)

> #9990
>
> Please rebase onto the master branch.

Hi, I rebased, pushed some more fixes and squashed my commits.

@eugene-marchanka (Contributor)

Hey! I have an issue running your branch:

TASK [kubernetes-apps/metallb : Kubernetes Apps | Install and configure MetalLB] **********************************************************************************************************
fatal: [test00-00]: FAILED! => 
  msg: '''dict object'' has no attribute ''results'''
skipping: [test00-01]
skipping: [test00-02]

Could you, please, take a look?
Thanks!

@Jeroen0494 (Contributor, Author)

> Hey! I have an issue running your branch:
>
> TASK [kubernetes-apps/metallb : Kubernetes Apps | Install and configure MetalLB] **********************************************************************************************************
> fatal: [test00-00]: FAILED! =>
>   msg: '''dict object'' has no attribute ''results'''
> skipping: [test00-01]
> skipping: [test00-02]
>
> Could you, please, take a look? Thanks!

Hi, I'm sorry this is happening. I'll look into it tomorrow afternoon when I'm at work and give the whole thing more thorough testing.

I'm also a bit surprised this wasn't noticed during CI/CD testing. @floryut is MetalLB included in the Kubespray CI tests?

@eugene-marchanka (Contributor)

I fixed it by modifying this:

- name: Kubernetes Apps | Lay Down MetalLB
  become: true
  template:
    src: "{{ item }}.j2"
    dest: "{{ kube_config_dir }}/{{ item }}"
    mode: 0644
  with_items:
  - metallb.yml
  register: metallb_rendering
  when:
  - "inventory_hostname == groups['kube_control_plane'][0]"

but faced another problem:

  msg: |-
    error running kubectl (/usr/local/bin/kubectl apply --force --wait --filename=/etc/kubernetes/metallb.yml) command (rc=1), out='customresourcedefinition.apiextensions.k8s.io/addresspools.metallb.io created
    customresourcedefinition.apiextensions.k8s.io/bfdprofiles.metallb.io created
    customresourcedefinition.apiextensions.k8s.io/bgpadvertisements.metallb.io created
    customresourcedefinition.apiextensions.k8s.io/bgppeers.metallb.io created
    customresourcedefinition.apiextensions.k8s.io/communities.metallb.io created
    customresourcedefinition.apiextensions.k8s.io/ipaddresspools.metallb.io created
    customresourcedefinition.apiextensions.k8s.io/l2advertisements.metallb.io created
    clusterrole.rbac.authorization.k8s.io/metallb-system:controller created
    clusterrole.rbac.authorization.k8s.io/metallb-system:speaker created
    clusterrolebinding.rbac.authorization.k8s.io/metallb-system:controller created
    clusterrolebinding.rbac.authorization.k8s.io/metallb-system:speaker created
    ', err='error parsing /etc/kubernetes/metallb.yml: error converting YAML to JSON: yaml: line 133: mapping values are not allowed in this context
    Error from server (NotFound): error when creating "/etc/kubernetes/metallb.yml": namespaces "metallb-system" not found
    Error from server (NotFound): error when creating "/etc/kubernetes/metallb.yml": namespaces "metallb-system" not found
    Error from server (NotFound): error when creating "/etc/kubernetes/metallb.yml": namespaces "metallb-system" not found
    Error from server (NotFound): error when creating "/etc/kubernetes/metallb.yml": namespaces "metallb-system" not found
    Error from server (NotFound): error when creating "/etc/kubernetes/metallb.yml": namespaces "metallb-system" not found
    Error from server (NotFound): error when creating "/etc/kubernetes/metallb.yml": namespaces "metallb-system" not found
    Error from server (NotFound): error when creating "/etc/kubernetes/metallb.yml": namespaces "metallb-system" not found
    Error from server (NotFound): error when creating "/etc/kubernetes/metallb.yml": namespaces "metallb-system" not found
    Error from server (NotFound): error when creating "/etc/kubernetes/metallb.yml": namespaces "metallb-system" not found
    Error from server (NotFound): error when creating "/etc/kubernetes/metallb.yml": namespaces "metallb-system" not found

@eugene-marchanka (Contributor) left a comment

Some refactoring is needed for this error:

TASK [kubernetes-apps/metallb : MetalLB | Create address pools configuration] *************************************************************************************************************
fatal: [test00-00]: FAILED! =>
  msg: |-
    The task includes an option with an undefined variable. The error was: 'item' is undefined

    The error appears to be in '..kubespray/roles/kubernetes-apps/metallb/tasks/main.yml': line 68, column 5, but may
    be elsewhere in the file depending on the exact syntax problem.

    The offending line appears to be:

      - name: MetalLB | Create address pools configuration
        ^ here
skipping: [test00-01]
skipping: [test00-02]

3 resolved review comments on roles/kubernetes-apps/metallb/tasks/main.yml (outdated)
@eugene-marchanka (Contributor)

another one:

TASK [kubernetes-apps/metallb : Kubernetes Apps | Delete MetalLB ConfigMap] ***************************************************************************************************************
skipping: [test00-01]
skipping: [test00-02]
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ModuleNotFoundError: No module named 'kubernetes'
fatal: [test00-00]: FAILED! => changed=false 
  error: No module named 'kubernetes'

@eugene-marchanka (Contributor)

> another one:
>
> TASK [kubernetes-apps/metallb : Kubernetes Apps | Delete MetalLB ConfigMap] ***************************************************************************************************************
> skipping: [test00-01]
> skipping: [test00-02]
> An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ModuleNotFoundError: No module named 'kubernetes'
> fatal: [test00-00]: FAILED! => changed=false
>   error: No module named 'kubernetes'

This works fine for me:

- name: Kubernetes Apps | Delete MetalLB ConfigMap
  command: "{{ bin_dir }}/kubectl delete configmap -n metallb-system config --ignore-not-found=true"
  when: inventory_hostname == groups['kube_control_plane'][0]
  no_log: "{{ not (unsafe_show_logs|bool) }}"

@floryut (Member) commented Apr 21, 2023

>> Hey! I have an issue running your branch:
>>
>> TASK [kubernetes-apps/metallb : Kubernetes Apps | Install and configure MetalLB] **********************************************************************************************************
>> fatal: [test00-00]: FAILED! =>
>>   msg: '''dict object'' has no attribute ''results'''
>> skipping: [test00-01]
>> skipping: [test00-02]
>>
>> Could you, please, take a look? Thanks!
>
> Hi, I'm sorry this is happening. I'll look into it tomorrow afternoon when I'm at work and give the whole thing more thorough testing.
>
> I'm also a bit surprised this wasn't noticed during CI/CD testing. @floryut is MetalLB included in the Kubespray CI tests?

Now that you mention it: after a quick search, we don't have any CI coverage for MetalLB use cases, not even a simple deployment :/

@k8s-ci-robot added the size/L label and removed the size/M label Apr 21, 2023
@Jeroen0494 (Contributor, Author)

@eugene-marchanka Thank you for the help! I've updated the installation, should be quite a bit better now. Could you give it a spin?

@eugene-marchanka (Contributor)

> @eugene-marchanka Thank you for the help! I've updated the installation, should be quite a bit better now. Could you give it a spin?

Hey!
Just did another run and got an error:

TASK [kubernetes-apps/metallb : MetalLB | Create address pools configuration] *************************************************************************************************************
skipping: [test00-01]
skipping: [test00-02]
fatal: [test00-00]: FAILED! => changed=false 
  msg: |-
    error running kubectl (/usr/local/bin/kubectl apply --force --filename=/etc/kubernetes/pools.yaml) command (rc=1), out='', err='Error from server (InternalError): error when creating "/etc/kubernetes/pools.yaml": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": dial tcp 10.233.14.18:443: connect: connection refused
    '

@Jeroen0494 (Contributor, Author)

>> @eugene-marchanka Thank you for the help! I've updated the installation, should be quite a bit better now. Could you give it a spin?
>
> Hey! Just did another run and got an error:
>
> TASK [kubernetes-apps/metallb : MetalLB | Create address pools configuration] *************************************************************************************************************
> skipping: [test00-01]
> skipping: [test00-02]
> fatal: [test00-00]: FAILED! => changed=false
>   msg: |-
>     error running kubectl (/usr/local/bin/kubectl apply --force --filename=/etc/kubernetes/pools.yaml) command (rc=1), out='', err='Error from server (InternalError): error when creating "/etc/kubernetes/pools.yaml": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": dial tcp 10.233.14.18:443: connect: connection refused
>     '

Yeah, this is the reason I want to use the kubernetes.core.k8s module: the playbook needs to wait until the controller pod is properly running before resources can be created.

@floryut any chance I can switch from using the self-written kube module to the kubernetes.core collection, and add this as a requirement in requirements.txt? The kube module doesn't have what I need to prevent any further errors in this regard. Without a proper wait-until-ready, this isn't going to work.
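For illustration, a minimal sketch of what the kubernetes.core.k8s approach could look like (task name, timeout, and manifest path are assumptions, not the code in this PR; note the module also requires the Python kubernetes package on the target host):

- name: MetalLB | Install and wait for resources    # hypothetical task name
  kubernetes.core.k8s:
    state: present
    src: "{{ kube_config_dir }}/metallb.yml"        # manifest assumed rendered by an earlier template task
    wait: true                                      # block until the created resources report ready
    wait_timeout: 300
  become: true
  when: inventory_hostname == groups['kube_control_plane'][0]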

@eugene-marchanka (Contributor)

>> @eugene-marchanka Thank you for the help! I've updated the installation, should be quite a bit better now. Could you give it a spin?
>
> Hey! Just did another run and got an error:
>
> TASK [kubernetes-apps/metallb : MetalLB | Create address pools configuration] *************************************************************************************************************
> skipping: [test00-01]
> skipping: [test00-02]
> fatal: [test00-00]: FAILED! => changed=false
>   msg: |-
>     error running kubectl (/usr/local/bin/kubectl apply --force --filename=/etc/kubernetes/pools.yaml) command (rc=1), out='', err='Error from server (InternalError): error when creating "/etc/kubernetes/pools.yaml": Internal error occurred: failed calling webhook "ipaddresspoolvalidationwebhook.metallb.io": failed to call webhook: Post "https://webhook-service.metallb-system.svc:443/validate-metallb-io-v1beta1-ipaddresspool?timeout=10s": dial tcp 10.233.14.18:443: connect: connection refused
>     '

A workaround is to add a simple sleep task before deploying the pools. It finally deployed without issues 🚀:

- name: Kubernetes Apps | Install and configure MetalLB
  kube:
    name: "MetalLB"
    kubectl: "{{ bin_dir }}/kubectl"
    filename: "{{ kube_config_dir }}/metallb.yml"
    state: "{{ metallb_rendering.changed | ternary('latest','present') }}"
    wait: true
  become: true
  when:
  - "inventory_hostname == groups['kube_control_plane'][0]"

- name: Sleep
  command: "sleep 300"
  when: inventory_hostname == groups['kube_control_plane'][0]
  no_log: "{{ not (unsafe_show_logs|bool) }}"

- name: MetalLB | Address pools
  block:
  - name: MetalLB | Layout address pools template
    ansible.builtin.template:
      src: pools.yaml.j2
      dest: "{{ kube_config_dir }}/pools.yaml"
      mode: 0644
    register: pools_rendering

@Jeroen0494 (Contributor, Author)

> Thank you so much for working hard to fix the issue @Jeroen0494 @eugene-marchanka!
> @Jeroen0494 Could you apply @eugene-marchanka's workaround into this pull request? That would help many people I guess.

I certainly can, but this would add a fixed wait of 300 seconds, even if the controller is already running. What's against using the kubernetes.core.k8s module? I already have working code for this.

@oomichi kind reminder.

@yankay (Member) commented May 18, 2023

Hi @Jeroen0494,

v2.22 will be released next week, so this PR can be included in v2.22 if it is merged before then.

@Jeroen0494 (Contributor, Author)

> Hi @Jeroen0494,
>
> v2.22 will be released next week, so this PR can be included in v2.22 if it is merged before then.

Thanks. I'm still waiting for an answer to my question, so I guess it won't be ready before the release.

@Nathanael-Mtd
Hi!

It seems the Kubespray v2.22.0 release broke upgrades/installs for MetalLB users because this PR isn't merged :/

Maybe it would be a good idea to add a warning telling users to skip the MetalLB upgrade for the v2.22.0 release?

@cosandr (Contributor) left a comment

Thanks for your effort. I've tried this PR out a bit and found some issues.

I'd also like to comment on the usage of kubernetes.core modules: these require the Python kubernetes library on the target host, which is likely not so trivial to set up considering the broad OS compatibility of Kubespray. You could attempt to delegate these tasks to localhost, but personally I don't think that's a great solution either.

Oh yeah, and I suggest you look into kubectl wait instead of an arbitrary sleep, i.e. kubectl -n metallb-system wait --for condition=established --timeout=300s crd/addresspools.metallb.io.
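As a hedged sketch, that suggestion could be expressed as an Ansible task like the following (task name and placement are illustrative; CRDs are cluster-scoped, so the -n flag is not strictly needed):

- name: MetalLB | Wait for CRDs to be established    # illustrative task, not the code in this PR
  command: >-
    {{ bin_dir }}/kubectl wait --for condition=established
    --timeout=300s crd/addresspools.metallb.io
  when: inventory_hostname == groups['kube_control_plane'][0]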

3 resolved review comments on roles/kubernetes-apps/metallb/templates/metallb.yml.j2 (outdated)
@Jeroen0494 Jeroen0494 mentioned this pull request May 26, 2023
@Jeroen0494 (Contributor, Author)

> Thanks for your effort. I've tried this PR out a bit and found some issues.
>
> I'd also like to comment on the usage of kubernetes.core modules: these require the Python kubernetes library on the target host, which is likely not so trivial to set up considering the broad OS compatibility of Kubespray. You could attempt to delegate these tasks to localhost, but personally I don't think that's a great solution either.
>
> Oh yeah, and I suggest you look into kubectl wait instead of an arbitrary sleep, i.e. kubectl -n metallb-system wait --for condition=established --timeout=300s crd/addresspools.metallb.io.

Indeed, we use connection: local for our internal Kubernetes deployments; the only change this would require for Kubespray is to add the following to requirements.txt:

ansible-galaxy collection install kubernetes.core

That's really everything.

But I will take another look at the kubectl wait command to see if there is something it can offer, and I will review your requested changes.
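For context, Ansible collections are usually pinned in a galaxy requirements file rather than the pip requirements.txt; a minimal sketch (the file name and version pin are assumptions, not part of this PR):

# requirements.yml (hypothetical)
collections:
  - name: kubernetes.core
    version: ">=2.4.0"

This would then be installed with: ansible-galaxy collection install -r requirements.yml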

@Jeroen0494 force-pushed the feat/metallbv0.13.4 branch from 9760fe9 to 720b393 on May 26, 2023 at 11:52
@Jeroen0494 (Contributor, Author)

@cosandr @yankay I've added a simple wait for the controller to be running.
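A hedged sketch of what such a wait could look like (the deployment name controller matches the upstream MetalLB manifests; the exact task in this PR may differ):

- name: MetalLB | Wait for the controller to be available    # illustrative task
  command: >-
    {{ bin_dir }}/kubectl wait -n metallb-system
    --for=condition=Available deployment/controller --timeout=300s
  when: inventory_hostname == groups['kube_control_plane'][0]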

@MrFreezeex (Member) left a comment

A few comments, but it looks good overall, thanks :). Could you also add a test, similar to the other PR?

2 resolved review comments on roles/kubernetes-apps/metallb/tasks/main.yml (outdated)
@MrFreezeex (Member) commented May 26, 2023

Ah, and could you also please clean up (or squash) your commit history?

@Jeroen0494 force-pushed the feat/metallbv0.13.4 branch from a145ff4 to d74d1b6 on May 30, 2023 at 07:20
@Jeroen0494 (Contributor, Author)

@MrFreezeex fixes applied and commits squashed.


@Jeroen0494 (Contributor, Author)

I've added a basic test.

@MrFreezeex (Member) left a comment

Thanks for your contribution and for going the extra mile to add tests and docs!
/lgtm

@k8s-ci-robot added the lgtm label May 30, 2023
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: floryut, Jeroen0494, MrFreezeex

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot merged commit ea7dcd4 into kubernetes-sigs:master May 30, 2023
@yankay yankay mentioned this pull request Jun 8, 2023
@Jeroen0494 Jeroen0494 deleted the feat/metallbv0.13.4 branch June 8, 2023 06:53
alekseyolg pushed a commit to alekseyolg/kubespray that referenced this pull request Jun 21, 2023
* Update MetalLB deployment, wait for resource.

Signed-off-by: Jeroen Rijken <[email protected]>

* yml to yaml, add basic test for metallb

Signed-off-by: Jeroen Rijken <[email protected]>

---------

Signed-off-by: Jeroen Rijken <[email protected]>
@yankay yankay mentioned this pull request Aug 24, 2023
pedro-peter pushed a commit to pedro-peter/kubespray that referenced this pull request May 8, 2024
* Update MetalLB deployment, wait for resource.

Signed-off-by: Jeroen Rijken <[email protected]>

* yml to yaml, add basic test for metallb

Signed-off-by: Jeroen Rijken <[email protected]>

---------

Signed-off-by: Jeroen Rijken <[email protected]>
Successfully merging this pull request may close these issues.

[MetalLB] playbook failed with msg: metallb_ip_range is mandatory to be specified for MetalLB