[BUG] WorkloadSpread topology distribution does not work as expected during updates #1194

Closed
a33151 opened this issue Feb 27, 2023 · 4 comments

a33151 commented Feb 27, 2023

What happened:
The WorkloadSpread topology distribution does not take effect during an update.

What I expected:
2 pods with the common label
8 pods with the spot label

How to reproduce:

My WorkloadSpread is as follows:


apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: test-project-zjj-test-co225-121321-cp
  namespace: test-product
spec:
  subsets:
  - maxReplicas: 2
    name: common
    patch:
      metadata:
        labels:
          biz.type: common
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: biz.type
        operator: In
        values:
        - common
  - maxReplicas: 8
    name: spot
    patch:
      metadata:
        labels:
          biz.type: spot
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: biz.type
        operator: In
        values:
        - spot
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
    name: test-project-zjj-test-co225-121321-cp

I deployed the workload with replicas set to 10.
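For context, the target CloneSet is roughly shaped like the minimal sketch below; the image, selector labels, and resource values are placeholders, not the exact manifest from this report.

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: test-project-zjj-test-co225-121321-cp
  namespace: test-product
spec:
  replicas: 10
  selector:
    matchLabels:
      app: test-project-zjj-test-co225-121321-cp   # placeholder selector
  template:
    metadata:
      labels:
        app: test-project-zjj-test-co225-121321-cp # placeholder label
    spec:
      containers:
      - name: app
        image: nginx:1.23                          # placeholder image
        resources:
          limits:
            cpu: "1"                               # the value later edited to trigger the rolling update
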
The first time I applied my test deployment, the topology distribution was correct:

root@k8s-master-670ba4fbb4:~/test# kubectl get po -n  test-product -l  biz.type=spot
NAME                                          READY   STATUS    RESTARTS   AGE
test-project-zjj-test-co225-121321-cp-7nnqd   1/1     Running   0          33s
test-project-zjj-test-co225-121321-cp-fkvql   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-mw6zb   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-s642l   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-slc8g   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-tmszn   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-w8tzm   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-zgspv   1/1     Running   0          32s
root@k8s-master-670ba4fbb4:~/test# kubectl get po -n  test-product -l  biz.type=common
NAME                                          READY   STATUS    RESTARTS   AGE
test-project-zjj-test-co225-121321-cp-74r25   1/1     Running   0          36s
test-project-zjj-test-co225-121321-cp-cjpq4   1/1     Running   0          36s

I edited my yaml and modified the cpu limit.

After applying again, one of the pods is not labeled with any biz.type label, and there is only 1 pod with the common label:

root@k8s-master-670ba4fbb4:~/test# kubectl get po -n  test-product -l  biz.type=common
NAME                                          READY   STATUS    RESTARTS   AGE
test-project-zjj-test-co225-121321-cp-2kxjh   1/1     Running   0          79s

root@k8s-master-670ba4fbb4:~/test# kubectl get po -n  test-product -l  biz.type=spot
NAME                                          READY   STATUS    RESTARTS   AGE
test-project-zjj-test-co225-121321-cp-2s2dq   1/1     Running   0          84s
test-project-zjj-test-co225-121321-cp-9pwnq   1/1     Running   0          96s
test-project-zjj-test-co225-121321-cp-bb2cr   1/1     Running   0          83s
test-project-zjj-test-co225-121321-cp-n8prs   1/1     Running   0          83s
test-project-zjj-test-co225-121321-cp-nmskh   1/1     Running   0          96s
test-project-zjj-test-co225-121321-cp-s2j95   1/1     Running   0          69s
test-project-zjj-test-co225-121321-cp-t9wxn   1/1     Running   0          97s
test-project-zjj-test-co225-121321-cp-v6lnd   1/1     Running   0          97s

root@k8s-master-670ba4fbb4:~/test# kubectl get po -n test-product
NAME                                          READY   STATUS    RESTARTS   AGE
test-project-zjj-test-co225-121321-cp-2kxjh   1/1     Running   0          91s
test-project-zjj-test-co225-121321-cp-2s2dq   1/1     Running   0          92s
test-project-zjj-test-co225-121321-cp-9pwnq   1/1     Running   0          104s
test-project-zjj-test-co225-121321-cp-bb2cr   1/1     Running   0          91s
test-project-zjj-test-co225-121321-cp-n8prs   1/1     Running   0          91s
test-project-zjj-test-co225-121321-cp-nmskh   1/1     Running   0          104s
test-project-zjj-test-co225-121321-cp-s2j95   1/1     Running   0          77s
test-project-zjj-test-co225-121321-cp-t9wxn   1/1     Running   0          105s
test-project-zjj-test-co225-121321-cp-v6lnd   1/1     Running   0          105s
test-project-zjj-test-co225-121321-cp-zkr7t   1/1     Running   0          105s
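
The miscount should also be visible in the WorkloadSpread status (kubectl get workloadspread test-project-zjj-test-co225-121321-cp -n test-product -o yaml). For comparison, a correctly reconciled status would report no missing replicas for either subset, roughly as in the sketch below (field names assumed from the v1alpha1 status; the values are the expected ones, not actual output):

status:
  subsetStatuses:
  - name: common
    missingReplicas: 0   # expected: 2 pods scheduled into the common subset
  - name: spot
    missingReplicas: 0   # expected: 8 pods scheduled into the spot subset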

Environment:

  • Kruise version: 1.3
  • Kubernetes version (use kubectl version): 1.20.10

Is this a problem with my configuration?

a33151 added the kind/bug label on Feb 27, 2023

veophi commented Feb 27, 2023

@a33151 It's not a problem on your side. WorkloadSpread does have this issue during rolling updates: the distribution can deviate by up to MaxUnavailable + MaxSurge. The cause is that a rolling update scales up before scaling down, or the Informer cache is not synced in time after a scale-down, so the WorkloadSpread replica counts do not get corrected.

For now, one way to reduce the deviation is to lower MaxUnavailable and MaxSurge, for example as in the sketch below.
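
A minimal sketch of that workaround applied to the CloneSet from this report (the exact values are illustrative, not a verified fix):

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: test-project-zjj-test-co225-121321-cp
  namespace: test-product
spec:
  replicas: 10
  updateStrategy:
    maxUnavailable: 1   # a small rolling window limits the counting error
    maxSurge: 0         # illustrative value
  # selector/template unchanged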

This issue is expected to be fixed within the next two weeks.


veophi commented Feb 27, 2023

/assign @veophi


veophi commented Feb 27, 2023

/unassign @FillZpp


a33151 commented Feb 27, 2023

Thanks! I'll keep an eye out for the version update.
