
update Scheduled condition when scheduling fails #988

Merged
1 commit merged into karmada-io:master on Dec 7, 2021

Conversation

mrlihanbo

@mrlihanbo mrlihanbo commented Nov 19, 2021

Signed-off-by: lihanbo [email protected]

What type of PR is this?
/kind cleanup

What this PR does / why we need it:

  1. update the Scheduled condition when scheduling fails
  2. remove FirstSchedule; reconcile the schedule once the applied policy differs from the latest propagation policy.

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:

  conditions:
  - lastTransitionTime: "2021-11-19T07:34:38Z"
    message: 'failed to assignReplicas: failed to scaleUp: clusters resources are
      not enough to schedule, max 6 replicas are support'
    reason: BindingFailedScheduling
    status: "False"
    type: Scheduled

Does this PR introduce a user-facing change?:
"NONE"

@karmada-bot karmada-bot added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Nov 19, 2021
@karmada-bot karmada-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Nov 19, 2021
Status: metav1.ConditionTrue,
Reason: scheduleSuccessReason,
Message: scheduleSuccessMessage,
})
Contributor

Why not make the success condition a function so it can be reused?

@qianjun1993
Contributor

qianjun1993 commented Nov 22, 2021

In the scheduler, there are two reasons that may cause the schedule to fail. One is that the algorithm has an error, and you add the failed condition for that. Another is that the placement has some problems; maybe we should also add a failed condition there?

func (s *Scheduler) scheduleResourceBinding(resourceBinding *workv1alpha2.ResourceBinding) (err error) {
	placement, placementStr, err := s.getPlacement(resourceBinding)
	if err != nil {
		// update condition here ?
		return err
	}

@mrlihanbo
Author

/hold waiting for pr #967 merged, may need to refactor

@karmada-bot karmada-bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 24, 2021
@dddddai dddddai mentioned this pull request Nov 26, 2021
@dddddai
Member

dddddai commented Nov 27, 2021

There is one more concern:
If the schedule failed, the binding controller would not sync the binding, which means it would not aggregate status (please refer to #768)

isReady := helper.IsBindingReady(&binding.Status)
if !isReady {
	klog.Infof("ResourceBinding(%s/%s) is not ready to sync", binding.GetNamespace(), binding.GetName())
	return controllerruntime.Result{}, nil
}
return c.syncBinding(binding)

I think we can replace it with

isReady := binding.Annotations[util.PolicyPlacementAnnotation] != ""

@RainbowMango
Member

Ping @mrlihanbo

@mrlihanbo
Author

mrlihanbo commented Nov 30, 2021

/remove-hold @qianjun1993 @dddddai @RainbowMango @Garrybest

@mrlihanbo
Author

/hold cancel

@karmada-bot karmada-bot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 30, 2021
@mrlihanbo mrlihanbo force-pushed the schedule-condition branch 2 times, most recently from 3d1218c to 6cf7f3f on November 30, 2021 06:48
@dddddai
Member

dddddai commented Nov 30, 2021

As I mentioned above, I guess we should update (cluster) binding controller as well

@mrlihanbo
Author

> As I mentioned above, I guess we should update (cluster) binding controller as well

Sorry, I forgot it.

@mrlihanbo
Author

> As I mentioned above, I guess we should update (cluster) binding controller as well

@dddddai hi, when binding.Annotations[util.PolicyPlacementAnnotation] == "", it may mean matching all clusters. So I chose to restore the original logic: len(targetClusters) != 0

@dddddai
Member

dddddai commented Nov 30, 2021

> @dddddai hi, when binding.Annotations[util.PolicyPlacementAnnotation] == "", it may mean matching all clusters.

I remember the annotation value would be "{}" if the placement is empty.

> So I chose to restore the original logic: len(targetClusters) != 0

There might be a problem, for example:
If there is no cluster that fits an updated propagation policy, len(targetClusters) would be 0, and then the binding controller would not be able to aggregate status.

@mrlihanbo
Author

> @dddddai hi, when binding.Annotations[util.PolicyPlacementAnnotation] == "", it may mean matching all clusters.
>
> I remember the annotation value would be "{}" if the placement is empty.
>
> So I chose to restore the original logic: len(targetClusters) != 0
>
> There might be a problem, for example: If there is no cluster that fits an updated propagation policy, len(targetClusters) would be 0, and then the binding controller would not be able to aggregate status.

If there is no cluster that fits an updated propagation policy, the scheduler will retain the schedule result of the previous propagation policy. Thus, len(targetClusters) won't be 0.

@dddddai
Member

dddddai commented Nov 30, 2021

> If there is no cluster that fits an updated propagation policy, the scheduler will retain the schedule result of the previous propagation policy. Thus, len(targetClusters) won't be 0.

Yes, but I doubt whether that is the expected behavior (it may be fixed in the future), so I was thinking we might want to consider that case:

if len(feasibleClusters) == 0 {
	return result, fmt.Errorf("no clusters fit")
}

Anyway, using len(util.PolicyPlacementAnnotation) == 0 does no harm, does it? Please correct me if I'm wrong.

@mrlihanbo
Author

> If there is no cluster that fits an updated propagation policy, the scheduler will retain the schedule result of the previous propagation policy. Thus, len(targetClusters) won't be 0.
>
> Yes, but I doubt whether that is the expected behavior (it may be fixed in the future), so I was thinking we might want to consider that case:
>
> if len(feasibleClusters) == 0 {
> 	return result, fmt.Errorf("no clusters fit")
> }
>
> Anyway, using len(util.PolicyPlacementAnnotation) == 0 does no harm, does it? Please correct me if I'm wrong.

If a user creates a propagation policy like:

apiVersion: policy.karmada.io/v1alpha1
kind: PropagationPolicy
metadata:
  name: nginx-propagation
  namespace: default
spec:
  resourceSelectors:
  - apiVersion: apps/v1
    kind: Deployment
    name: nginx
    namespace: default

With this propagation policy, len(util.PolicyPlacementAnnotation) will be 0, but the workload is actually scheduled to all clusters:

apiVersion: work.karmada.io/v1alpha2
kind: ResourceBinding
metadata:
  annotations:
    policy.karmada.io/applied-placement: '{}'
  creationTimestamp: "2021-11-30T03:01:30Z"
  finalizers:
  - karmada.io/binding-controller
  generation: 97
  labels:
    propagationpolicy.karmada.io/name: nginx-propagation
    propagationpolicy.karmada.io/namespace: default
  name: nginx-deployment
  namespace: default
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: Deployment
    name: nginx
    uid: 96c98f99-49ae-46e7-8bec-af83c9dec13a
  resourceVersion: "54740"
  uid: 29486e35-1dcf-4574-8af7-43cd61d51ae1
spec:
  clusters:
  - name: member1
  - name: member3
  replicaRequirements:
    resourceRequest:
      cpu: "1"
  replicas: 3
  resource:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
    namespace: default
    resourceVersion: "54737"

@mrlihanbo
Author

> With this propagation policy, len(util.PolicyPlacementAnnotation) will be 0, but the workload is actually scheduled to all clusters.

I am not sure though: would the length of policy.karmada.io/applied-placement: '{}' be 0?

@dddddai
Member

dddddai commented Nov 30, 2021

> I am not sure though: would the length of policy.karmada.io/applied-placement: '{}' be 0?

It can't be 0; anyway, if you worry about this, you may use:

_, isReady := binding.Annotations[util.PolicyPlacementAnnotation]

@mrlihanbo
Author

> I am not sure though: would the length of policy.karmada.io/applied-placement: '{}' be 0?
>
> It can't be 0; anyway, if you worry about this, you may use:
>
> _, isReady := binding.Annotations[util.PolicyPlacementAnnotation]

Sounds good to me.

@@ -69,7 +69,7 @@ func (c *ResourceBindingController) Reconcile(ctx context.Context, req controlle
 	return c.removeFinalizer(binding)
 }

-	isReady := helper.IsBindingReady(&binding.Status)
+	_, isReady := binding.Annotations[util.PolicyPlacementAnnotation]
Member
It looks weird: why does having the annotation mean the binding is ready?

Member
@dddddai dddddai Dec 1, 2021

@RainbowMango Precisely, it does NOT mean the schedule succeeded, but that the binding has been scheduled at least once.

We have to aggregate status no matter whether the schedule succeeded or not, don't we?

Member

Please look at the latest code. This check has been postponed to syncBinding.

Comment on lines 398 to 414
defer func() {
	if err != nil {
		failedSchedulingCondition := util.NewCondition(workv1alpha2.Scheduled, scheduleFailedReason, err.Error(), metav1.ConditionFalse)
		if updateErr := s.updateBindingScheduledConditionIfNeeded(rb, failedSchedulingCondition); updateErr != nil {
			klog.Errorf("failed to set failed scheduling condition for binding(%s/%s): %v", rb.Namespace, rb.Name, updateErr)
			err = fmt.Errorf("attempted to set failed scheduling condition after error %v, but failed due to error: %v", err, updateErr)
		}
	} else {
		successSchedulingCondition := util.NewCondition(workv1alpha2.Scheduled, scheduleSuccessReason, scheduleSuccessMessage, metav1.ConditionTrue)
		if updateErr := s.updateBindingScheduledConditionIfNeeded(rb, successSchedulingCondition); updateErr != nil {
			klog.Errorf("failed to set success scheduling condition for binding(%s/%s): %v", rb.Namespace, rb.Name, updateErr)
			err = fmt.Errorf("attempted to set success scheduling condition, but failed due to error: %v", updateErr)
		}
	}
}()

Member
@RainbowMango RainbowMango Dec 1, 2021

Suggested change:

// Update "Scheduled" condition according to schedule result.
defer func() {
	var condition metav1.Condition
	if err == nil {
		condition = util.NewCondition(workv1alpha2.Scheduled, scheduleSuccessReason, scheduleSuccessMessage, metav1.ConditionTrue)
	} else {
		condition = util.NewCondition(workv1alpha2.Scheduled, scheduleFailedReason, err.Error(), metav1.ConditionFalse)
	}
	if updateErr := s.updateBindingScheduledConditionIfNeeded(rb, condition); updateErr != nil {
		klog.Errorf("Failed update condition(%s) for ResourceBinding(%s/%s)", workv1alpha2.Scheduled, rb.Namespace, rb.Name)
		if err == nil {
			// schedule succeed but update condition failed, return err in order to retry in next loop.
			err = updateErr
		}
	}
}()

How about this?

@mrlihanbo mrlihanbo force-pushed the schedule-condition branch 2 times, most recently from 806f6e7 to cb7f351 on December 1, 2021 09:39
// the binding has not been scheduled, need schedule
klog.Infof("Start scheduling ResourceBinding(%s/%s)", namespace, name)
err = s.scheduleResourceBinding(rb)
metrics.BindingSchedule(string(FirstSchedule), metrics.SinceInSeconds(start), err)
Member

@qianjun1993
Here FirstSchedule is merged into ReconcileSchedule because they share the same schedule logic. Are you OK with it?

Contributor

I am OK with that, since they share the same schedule logic now. But as mentioned in #900, there may be some differences between the two kinds of schedule.

@RainbowMango
Member

/assign @qianjun1993 @dddddai
What do you say?

@dddddai
Member

dddddai commented Dec 2, 2021

lgtm

@RainbowMango
Member

/lgtm
/approve

@dddddai
All robot commands start with /.
Here is the command help document.

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 7, 2021
@karmada-bot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 7, 2021
@karmada-bot karmada-bot merged commit 09c0449 into karmada-io:master Dec 7, 2021
@mrlihanbo mrlihanbo deleted the schedule-condition branch March 2, 2022 07:31