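Anything else we need to know?:
From reading the code, I found the following:
At broadcastjob_event_handler.go line 141, canOldNodeFit, err := checkNodeFitness(mockPod, oldNode)
If this call returns an err, the loop hits continue immediately and never checks whether the taint has already been removed in the new node state. Isn't that a problem?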
func (p *enqueueBroadcastJobForNode) updateNode(q workqueue.RateLimitingInterface, old, cur runtime.Object) {
	oldNode := old.(*v1.Node)
	curNode := cur.(*v1.Node)
	if shouldIgnoreNodeUpdate(*oldNode, *curNode) {
		return
	}
	jobList := &v1alpha1.BroadcastJobList{}
	err := p.reader.List(context.TODO(), jobList)
	if err != nil {
		klog.Errorf("Error enqueueing broadcastjob on updateNode %v", err)
	}
	for _, bcj := range jobList.Items {
		mockPod := NewMockPod(&bcj, oldNode.Name)
		canOldNodeFit, err := checkNodeFitness(mockPod, oldNode)
		if err != nil {
			klog.Errorf("failed to checkNodeFitness for job %s/%s, on old node %s, %v", bcj.Namespace, bcj.Name, oldNode.Name, err)
			continue
		}
		canCurNodeFit, err := checkNodeFitness(mockPod, curNode)
		if err != nil {
			klog.Errorf("failed to checkNodeFitness for job %s/%s, on cur node %s, %v", bcj.Namespace, bcj.Name, curNode.Name, err)
			continue
		}
		if canOldNodeFit != canCurNodeFit {
			// enqueue the broadcast job for matching node
			q.Add(reconcile.Request{
				NamespacedName: types.NamespacedName{
					Namespace: bcj.Namespace,
					Name:      bcj.Name}})
		}
	}
}
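One possible direction, sketched below in isolation from Kruise: treat a failed fitness check on the old node as "did not fit" instead of skipping the job, so the current node is still evaluated. This is only a sketch under stated assumptions, not the project's actual fix. The node type, checkFitness, and shouldEnqueue are hypothetical stand-ins for v1.Node, checkNodeFitness, and the comparison inside updateNode, and the sketch assumes, as described above, that the check returns an error for a cordoned node.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical stand-in for v1.Node, reduced to the single
// field this sketch needs.
type node struct {
	name          string
	unschedulable bool
}

// checkFitness is a hypothetical stand-in for checkNodeFitness. It is
// ASSUMED to return an error for a cordoned node, as the report describes.
func checkFitness(n node) (bool, error) {
	if n.unschedulable {
		return false, errors.New("node " + n.name + " is unschedulable")
	}
	return true, nil
}

// shouldEnqueue reports whether a node update changed the node's fitness.
// Unlike the handler quoted above, an error on either check is treated as
// "does not fit" rather than as a reason to skip the comparison.
func shouldEnqueue(oldNode, curNode node) bool {
	canOldFit, err := checkFitness(oldNode)
	if err != nil {
		canOldFit = false
	}
	canCurFit, err := checkFitness(curNode)
	if err != nil {
		canCurFit = false
	}
	return canOldFit != canCurFit
}

func main() {
	cordoned := node{name: "node-1", unschedulable: true}
	uncordoned := node{name: "node-1"}

	// Uncordon: fitness changes, so the job would be enqueued.
	fmt.Println(shouldEnqueue(cordoned, uncordoned))
	// Node stays cordoned: fitness is unchanged, nothing to enqueue.
	fmt.Println(shouldEnqueue(cordoned, cordoned))
}
```

With this shape, the uncordon transition is seen as a change from "not fit" to "fit" and the job is requeued, while an update that leaves the node cordoned is still ignored. Whether swallowing the error this way is acceptable depends on which other failures checkNodeFitness can report, which this sketch does not model.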
Environment:
Kruise version: master branch
Kubernetes version: 1.19.16
weldonlwz changed the title from "node has erased its taint,but broadcastjob won't make a pod on that node" to "[BUG] node has erased its taint,but broadcastjob won't make a pod on that node" on Mar 1, 2023
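What happened:
A node in the Kubernetes cluster had the unschedulable taint. A broadcastjob was submitted at that point and no pod was created on that node, which is expected.
However, after the taint was removed from the node, the broadcastjob still did not create a pod on that node.
What you expected to happen:
The broadcastjob creates a pod on the node once the node's taint has been removed.
How to reproduce it (as minimally and precisely as possible):
Cordon a node.
Submit a broadcastjob.
Then uncordon the node; no new pod is created.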