[BUG] WorkloadSpread topology distribution does not work as expected during updates #1194

Closed
a33151 opened this issue Feb 27, 2023 · 4 comments

a33151 commented Feb 27, 2023

What happened:
The WorkloadSpread topology distribution does not take effect during an update.

What I expected:
2 pods with the common label
8 pods with the spot label

How to reproduce:

My WorkloadSpread is as follows:


apiVersion: apps.kruise.io/v1alpha1
kind: WorkloadSpread
metadata:
  name: test-project-zjj-test-co225-121321-cp
  namespace: test-product
spec:
  subsets:
  - maxReplicas: 2
    name: common
    patch:
      metadata:
        labels:
          biz.type: common
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: biz.type
        operator: In
        values:
        - common
  - maxReplicas: 8
    name: spot
    patch:
      metadata:
        labels:
          biz.type: spot
    requiredNodeSelectorTerm:
      matchExpressions:
      - key: biz.type
        operator: In
        values:
        - spot
  targetRef:
    apiVersion: apps.kruise.io/v1alpha1
    kind: CloneSet
    name: test-project-zjj-test-co225-121321-cp

I deployed the workload with replicas set to 10.
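For context, the target CloneSet is roughly shaped like the minimal sketch below; the image, selector labels, and resource values are placeholders, not the exact manifest from this report.

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: test-project-zjj-test-co225-121321-cp
  namespace: test-product
spec:
  replicas: 10
  selector:
    matchLabels:
      app: test-project-zjj-test-co225-121321-cp   # placeholder selector
  template:
    metadata:
      labels:
        app: test-project-zjj-test-co225-121321-cp # placeholder label
    spec:
      containers:
      - name: app
        image: nginx:1.23                          # placeholder image
        resources:
          limits:
            cpu: "1"                               # the value later edited to trigger the rolling update
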
The first time I applied my test deployment, the topology distribution was correct:

root@k8s-master-670ba4fbb4:~/test# kubectl get po -n  test-product -l  biz.type=spot
NAME                                          READY   STATUS    RESTARTS   AGE
test-project-zjj-test-co225-121321-cp-7nnqd   1/1     Running   0          33s
test-project-zjj-test-co225-121321-cp-fkvql   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-mw6zb   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-s642l   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-slc8g   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-tmszn   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-w8tzm   1/1     Running   0          32s
test-project-zjj-test-co225-121321-cp-zgspv   1/1     Running   0          32s
root@k8s-master-670ba4fbb4:~/test# kubectl get po -n  test-product -l  biz.type=common
NAME                                          READY   STATUS    RESTARTS   AGE
test-project-zjj-test-co225-121321-cp-74r25   1/1     Running   0          36s
test-project-zjj-test-co225-121321-cp-cjpq4   1/1     Running   0          36s

I edited my yaml and modified the cpu limit.

After applying again, one of the pods is not labeled with any biz.type label, and there is only 1 pod with the common label:

root@k8s-master-670ba4fbb4:~/test# kubectl get po -n  test-product -l  biz.type=common
NAME                                          READY   STATUS    RESTARTS   AGE
test-project-zjj-test-co225-121321-cp-2kxjh   1/1     Running   0          79s

root@k8s-master-670ba4fbb4:~/test# kubectl get po -n  test-product -l  biz.type=spot
NAME                                          READY   STATUS    RESTARTS   AGE
test-project-zjj-test-co225-121321-cp-2s2dq   1/1     Running   0          84s
test-project-zjj-test-co225-121321-cp-9pwnq   1/1     Running   0          96s
test-project-zjj-test-co225-121321-cp-bb2cr   1/1     Running   0          83s
test-project-zjj-test-co225-121321-cp-n8prs   1/1     Running   0          83s
test-project-zjj-test-co225-121321-cp-nmskh   1/1     Running   0          96s
test-project-zjj-test-co225-121321-cp-s2j95   1/1     Running   0          69s
test-project-zjj-test-co225-121321-cp-t9wxn   1/1     Running   0          97s
test-project-zjj-test-co225-121321-cp-v6lnd   1/1     Running   0          97s

root@k8s-master-670ba4fbb4:~/test# kubectl get po -n test-product
NAME                                          READY   STATUS    RESTARTS   AGE
test-project-zjj-test-co225-121321-cp-2kxjh   1/1     Running   0          91s
test-project-zjj-test-co225-121321-cp-2s2dq   1/1     Running   0          92s
test-project-zjj-test-co225-121321-cp-9pwnq   1/1     Running   0          104s
test-project-zjj-test-co225-121321-cp-bb2cr   1/1     Running   0          91s
test-project-zjj-test-co225-121321-cp-n8prs   1/1     Running   0          91s
test-project-zjj-test-co225-121321-cp-nmskh   1/1     Running   0          104s
test-project-zjj-test-co225-121321-cp-s2j95   1/1     Running   0          77s
test-project-zjj-test-co225-121321-cp-t9wxn   1/1     Running   0          105s
test-project-zjj-test-co225-121321-cp-v6lnd   1/1     Running   0          105s
test-project-zjj-test-co225-121321-cp-zkr7t   1/1     Running   0          105s
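
The miscount should also be visible in the WorkloadSpread status (kubectl get workloadspread test-project-zjj-test-co225-121321-cp -n test-product -o yaml). For comparison, a correctly reconciled status would report no missing replicas for either subset, roughly as in the sketch below (field names assumed from the v1alpha1 status; the values are the expected ones, not actual output):

status:
  subsetStatuses:
  - name: common
    missingReplicas: 0   # expected: 2 pods scheduled into the common subset
  - name: spot
    missingReplicas: 0   # expected: 8 pods scheduled into the spot subset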

Environment:

  • Kruise version: 1.3
  • Kubernetes version (use kubectl version): 1.20.10

Is this a problem with my configuration?

a33151 added the kind/bug label on Feb 27, 2023

veophi commented Feb 27, 2023

@a33151 It's not a problem on your side. WorkloadSpread does have this issue during rolling updates: the distribution can deviate by up to MaxUnavailable + MaxSurge. The cause is that a rolling update scales up before scaling down, or the Informer cache is not synced in time after a scale-down, so the WorkloadSpread replica counts do not get corrected.

For now, one way to reduce the deviation is to lower MaxUnavailable and MaxSurge, for example as in the sketch below.
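
A minimal sketch of that workaround applied to the CloneSet from this report (the exact values are illustrative, not a verified fix):

apiVersion: apps.kruise.io/v1alpha1
kind: CloneSet
metadata:
  name: test-project-zjj-test-co225-121321-cp
  namespace: test-product
spec:
  replicas: 10
  updateStrategy:
    maxUnavailable: 1   # a small rolling window limits the counting error
    maxSurge: 0         # illustrative value
  # selector/template unchanged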

This issue is expected to be fixed within the next two weeks.


veophi commented Feb 27, 2023

/assign @veophi


veophi commented Feb 27, 2023

/unassign @FillZpp


a33151 commented Feb 27, 2023

Thanks! I'll keep an eye out for the version update.
