update Scheduled condition when failed scheduling #988
Conversation
pkg/scheduler/scheduler.go
Outdated
Status:  metav1.ConditionTrue,
Reason:  scheduleSuccessReason,
Message: scheduleSuccessMessage,
})
Why not make the success condition a function so it can be reused?
In the scheduler, there are two reasons scheduling may fail. One is that the algorithm returns an error, and you add the failed condition for that case. The other is that the placement has some problem; should we also add a failed condition there?
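For illustration, a minimal sketch of the kind of reusable helper being suggested; the name and exact shape are hypothetical (the diff later in this PR reuses util.NewCondition for the same purpose):

package scheduler

import (
	workv1alpha2 "github.com/karmada-io/karmada/pkg/apis/work/v1alpha2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newScheduledCondition builds the "Scheduled" condition in one place so the
// success and failure branches can share it. Hypothetical helper, shown only
// to make the suggestion concrete.
func newScheduledCondition(status metav1.ConditionStatus, reason, message string) metav1.Condition {
	return metav1.Condition{
		Type:    workv1alpha2.Scheduled,
		Status:  status,
		Reason:  reason,
		Message: message,
	}
}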
/hold waiting for PR #967 to be merged; may need to refactor
There is one more concern: karmada/pkg/controllers/binding/binding_controller.go, lines 72 to 78 in 7e35f9c.
I think we can replace it with isReady := binding.Annotations[util.PolicyPlacementAnnotation] != ""
Ping @mrlihanbo
Force-pushed from 79bcf7c to 3d6da07.
/remove-hold @qianjun1993 @dddddai @RainbowMango @Garrybest
/hold cancel
Force-pushed from 3d1218c to 6cf7f3f.
As I mentioned above, I guess we should update the (cluster) binding controller as well.
Sorry, I forgot it.
Force-pushed from 6cf7f3f to b019984.
I remember the annotation value would be "{}" if placement is empty
There might be a problem, for example:
If there is no cluster that fits an updated propagation policy, the scheduler will keep the schedule result of the previous propagation policy. Thus, the
Force-pushed from b019984 to b68b5f4.
Yes, I am not sure whether that is expected behavior (it may be fixed in the future), so I was thinking we might need to consider that case: karmada/pkg/scheduler/core/generic_scheduler.go, lines 59 to 61 in 3a8c15c.
Anyway, using len(util.PolicyPlacementAnnotation)==0 does no harm, does it? Please correct me if I'm wrong.
If a user creates a propagation policy that does not specify a placement, len(util.PolicyPlacementAnnotation) will be 0, but it is actually scheduled to all clusters.
I am not sure whether the length of the annotation value can be 0.
It can't be 0; anyway, if you worry about this, you may use: _, isReady := binding.Annotations[util.PolicyPlacementAnnotation]
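For reference, a standalone sketch (the Placement type below is illustrative, not Karmada's real one) of why a JSON-marshalled empty placement yields "{}" rather than an empty string, which is what the reply above relies on:

package main

import (
	"encoding/json"
	"fmt"
)

// Placement stands in for the real placement struct; the field is illustrative.
type Placement struct {
	ClusterAffinity *struct {
		ClusterNames []string `json:"clusterNames,omitempty"`
	} `json:"clusterAffinity,omitempty"`
}

func main() {
	// An empty placement marshals to "{}", so the stored annotation value has
	// length 2, not 0, even when no placement fields are set.
	data, _ := json.Marshal(Placement{})
	fmt.Println(string(data), len(data)) // prints: {} 2
}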
Force-pushed from b68b5f4 to 5d2f68d.
Sounds good to me.
@@ -69,7 +69,7 @@ func (c *ResourceBindingController) Reconcile(ctx context.Context, req controlle
 		return c.removeFinalizer(binding)
 	}

-	isReady := helper.IsBindingReady(&binding.Status)
+	_, isReady := binding.Annotations[util.PolicyPlacementAnnotation]
It looks weird, because why does having the annotation mean it is ready?
@RainbowMango Precisely, it does NOT mean the schedule succeeded, but that it has been scheduled at least once.
We have to aggregate status no matter whether the schedule succeeded or not, don't we?
Please look at the latest code. This check has been postponed to syncBinding.
pkg/scheduler/scheduler.go
Outdated
defer func() {
	if err != nil {
		failedSchedulingCondition := util.NewCondition(workv1alpha2.Scheduled, scheduleFailedReason, err.Error(), metav1.ConditionFalse)
		if updateErr := s.updateBindingScheduledConditionIfNeeded(rb, failedSchedulingCondition); updateErr != nil {
			klog.Errorf("failed to set failed scheduling condition for binding(%s/%s): %v", rb.Namespace, rb.Name, updateErr)
			err = fmt.Errorf("attempted to set failed scheduling condition after error %v, but failed due to error: %v", err, updateErr)
		}
	} else {
		successSchedulingCondition := util.NewCondition(workv1alpha2.Scheduled, scheduleSuccessReason, scheduleSuccessMessage, metav1.ConditionTrue)
		if updateErr := s.updateBindingScheduledConditionIfNeeded(rb, successSchedulingCondition); updateErr != nil {
			klog.Errorf("failed to set success scheduling condition for binding(%s/%s): %v", rb.Namespace, rb.Name, updateErr)
			err = fmt.Errorf("attempted to set success scheduling condition, but failed due to error: %v", updateErr)
		}
	}
}()
Suggested change (replacing the deferred block above):

// Update "Scheduled" condition according to schedule result.
defer func() {
	var condition metav1.Condition
	if err == nil {
		condition = util.NewCondition(workv1alpha2.Scheduled, scheduleSuccessReason, scheduleSuccessMessage, metav1.ConditionTrue)
	} else {
		condition = util.NewCondition(workv1alpha2.Scheduled, scheduleFailedReason, err.Error(), metav1.ConditionFalse)
	}
	if updateErr := s.updateBindingScheduledConditionIfNeeded(rb, condition); updateErr != nil {
		klog.Errorf("Failed update condition(%s) for ResourceBinding(%s/%s)", workv1alpha2.Scheduled, rb.Namespace, rb.Name)
		if err == nil {
			// schedule succeed but update condition failed, return err in order to retry in next loop.
			err = updateErr
		}
	}
}()
How about this?
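As an aside, a minimal sketch of what the "IfNeeded" part of such a helper could look like; this is not the helper added by this PR (which also persists the binding via the API), just the in-memory condition handling:

package scheduler

import (
	"k8s.io/apimachinery/pkg/api/meta"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// setScheduledConditionIfNeeded is a hypothetical helper: it rewrites the
// condition list only when the new condition actually differs, so repeated
// reconciles do not trigger redundant status updates.
func setScheduledConditionIfNeeded(conditions *[]metav1.Condition, newCond metav1.Condition) bool {
	current := meta.FindStatusCondition(*conditions, newCond.Type)
	if current != nil &&
		current.Status == newCond.Status &&
		current.Reason == newCond.Reason &&
		current.Message == newCond.Message {
		return false // already up to date; nothing to do
	}
	meta.SetStatusCondition(conditions, newCond)
	return true
}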
Force-pushed from 806f6e7 to cb7f351.
// the binding has not been scheduled, need schedule
klog.Infof("Start scheduling ResourceBinding(%s/%s)", namespace, name)
err = s.scheduleResourceBinding(rb)
metrics.BindingSchedule(string(FirstSchedule), metrics.SinceInSeconds(start), err)
@qianjun1993 Here I merged FirstSchedule into ReconcileSchedule because they share the same schedule logic; are you OK with it?
I am OK with that since they share the same schedule logic now. But as mentioned in #900, there may be some differences between the two kinds of schedule.
/assign @qianjun1993 @dddddai
Force-pushed from cb7f351 to 6cf17ab.
Signed-off-by: lihanbo <[email protected]>
Force-pushed from 6cf17ab to 435f32c.
lgtm
/lgtm @dddddai
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: RainbowMango.
Approvers can indicate their approval by writing /approve in a comment.
Signed-off-by: lihanbo [email protected]
What type of PR is this?
/kind cleanup
What this PR does / why we need it:
- update the Scheduled condition when scheduling fails
- merge FirstSchedule into ReconcileSchedule: reconcile the schedule once the applied policy is different from the latest propagation policy

Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?:
"NONE"