Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix spreadconstraints[i].MaxGroups Invalidation when scaleup replicas #1324

Merged
merged 1 commit into from
Feb 8, 2022

Conversation

huone1
Copy link
Contributor

@huone1 huone1 commented Jan 27, 2022

Signed-off-by: huone1 [email protected]

What type of PR is this?
/kind bug
What this PR does / why we need it:
because the spreadconstraints[i].maxgroup define the max cluster number, So the target cluster number could not be greater than it in any scenario.

The difference between the current cluster and the scheduled cluster is not considered when calculating the scheduled replicas
Which issue(s) this PR fixes:
Fixes #1323

let's test it According to the issue #1323 operation

before scaleup

spec:
    clusters:
    - name: member1
      replicas: 55
    - name: member2
      replicas: 55
    replicas: 110

after scaleup

spec:
    clusters:
    - name: member2
      replicas: 95
    - name: member3
      replicas: 85
    replicas: 180

Special notes for your reviewer:
NONE
Does this PR introduce a user-facing change?:

fix spreadconstraints[i].MaxGroups is invalidated in some scenarios

@karmada-bot karmada-bot added kind/bug Categorizes issue or PR as related to a bug. do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 27, 2022
@karmada-bot karmada-bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jan 27, 2022
@huone1 huone1 force-pushed the fix/MaxGroups-invalidation branch from 1e2930a to e7fceb1 Compare January 27, 2022 07:35
@karmada-bot karmada-bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Jan 27, 2022
@huone1
Copy link
Contributor Author

huone1 commented Jan 27, 2022

/cc @Garrybest

@karmada-bot karmada-bot requested a review from Garrybest January 27, 2022 07:35
@karmada-bot karmada-bot added release-note-none Denotes a PR that doesn't merit a release note. and removed do-not-merge/release-note-label-needed Indicates that a PR should not merge because it's missing one of the release note labels. labels Jan 27, 2022
@Garrybest
Copy link
Member

Thanks @huone1, I will check it later.

@karmada-bot karmada-bot added release-note Denotes a PR that will be considered when it comes time to generate release notes. and removed release-note-none Denotes a PR that doesn't merit a release note. labels Jan 28, 2022
@Garrybest
Copy link
Member

/assign

@Garrybest
Copy link
Member

This PR generally LGTM. But I'm a little curious about how does the spreadConstraint choose the available clusters?

Taken #1323 as an example. We divided 56 replicas into member1 and 54 replicas member3. When scaling up additional 70 replicas, it seems like we can't accommodate more available replicas in member1. If resource is enough, we should scheduled all additional replicas in to member1 but the final replica is 56.

How does the spreadConstraint choose the right clusters? Why not still choose member1 and member3 again?

/cc @mrlihanbo

@huone1
Copy link
Contributor Author

huone1 commented Jan 30, 2022

This PR generally LGTM. But I'm a little curious about how does the spreadConstraint choose the available clusters?

Taken #1323 as an example. We divided 56 replicas into member1 and 54 replicas member3. When scaling up additional 70 replicas, it seems like we can't accommodate more available replicas in member1. If resource is enough, we should scheduled all additional replicas in to member1 but the final replica is 56.

How does the spreadConstraint choose the right clusters? Why not still choose member1 and member3 again?

/cc @mrlihanbo

the PR #1334 add a new score plugin clusterLocality to favors cluster that already have requested and supply a preferred policy to choose the available clusters

@Garrybest
Copy link
Member

Got it, nice work👍
/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Jan 30, 2022
break
}
}
}
return res

return res, validTarget
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The res could be calculated by validTarget, I suggest remove it.

diff --git a/pkg/scheduler/core/division_algorithm.go b/pkg/scheduler/core/division_algorithm.go
index e0b2c082..689b8231 100644
--- a/pkg/scheduler/core/division_algorithm.go
+++ b/pkg/scheduler/core/division_algorithm.go
@@ -210,7 +210,7 @@ func scaleUpScheduleByReplicaDivisionPreference(
        preference policyv1alpha1.ReplicaDivisionPreference,
 ) ([]workv1alpha2.TargetCluster, error) {
        // Step 1: Find the clusters that have old replicas, so we can prefer to assign new replicas towards them.
-       scheduledClusterNames, scheduledClusters := findOutScheduledCluster(spec.Clusters, clusters)
+       scheduledClusters := findOutScheduledCluster(spec.Clusters, clusters)

        // Step 2: calculate the assigned Replicas in scheduledClusters
        assignedReplicas := util.GetSumOfReplicas(scheduledClusters)
@@ -229,7 +229,7 @@ func scaleUpScheduleByReplicaDivisionPreference(
        // If not, the old replicas may be recreated which is not expected during scaling up.
        // The parameter `scheduledClusterNames` is used to make sure that we assign new replicas to them preferentially
        // so that all the replicas are aggregated.
-       result, err := divideReplicasByPreference(clusterAvailableReplicas, newSpec.Replicas, preference, scheduledClusterNames)
+       result, err := divideReplicasByPreference(clusterAvailableReplicas, newSpec.Replicas, preference, util.ConvertToClusterNames(scheduledClusters))
        if err != nil {
                return result, err
        }
diff --git a/pkg/scheduler/core/util.go b/pkg/scheduler/core/util.go
index 1ceab827..93e2b8e4 100644
--- a/pkg/scheduler/core/util.go
+++ b/pkg/scheduler/core/util.go
@@ -77,11 +77,10 @@ func calAvailableReplicas(clusters []*clusterv1alpha1.Cluster, spec *workv1alpha

 // findOutScheduledCluster will return a name set of clusters
 // which are a part of `feasibleClusters` and have non-zero replicas.
-func findOutScheduledCluster(tcs []workv1alpha2.TargetCluster, candidates []*clusterv1alpha1.Cluster) (sets.String, []workv1alpha2.TargetCluster) {
+func findOutScheduledCluster(tcs []workv1alpha2.TargetCluster, candidates []*clusterv1alpha1.Cluster) []workv1alpha2.TargetCluster {
        validTarget := make([]workv1alpha2.TargetCluster, 0)
-       res := sets.NewString()
        if len(tcs) == 0 {
-               return res, validTarget
+               return validTarget
        }

        for _, targetCluster := range tcs {
@@ -92,14 +91,13 @@ func findOutScheduledCluster(tcs []workv1alpha2.TargetCluster, candidates []*clu
                // must in `candidates`
                for _, cluster := range candidates {
                        if targetCluster.Name == cluster.Name {
-                               res.Insert(targetCluster.Name)
                                validTarget = append(validTarget, targetCluster)
                                break
                        }
                }
        }

-       return res, validTarget
+       return validTarget
 }

 // resortClusterList is used to make sure scheduledClusterNames are in front of the other clusters in the list of

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is better, and I have fixed it

@huone1 huone1 force-pushed the fix/MaxGroups-invalidation branch from e7fceb1 to 72a91ba Compare February 8, 2022 02:45
@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 8, 2022
@@ -77,11 +77,12 @@ func calAvailableReplicas(clusters []*clusterv1alpha1.Cluster, spec *workv1alpha

// findOutScheduledCluster will return a name set of clusters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comments should be updated as well. Sorry for missing it from my last comment.

@huone1 huone1 force-pushed the fix/MaxGroups-invalidation branch from 72a91ba to 9aa1349 Compare February 8, 2022 02:52
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 8, 2022
@karmada-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: RainbowMango

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@karmada-bot karmada-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 8, 2022
@huone1 huone1 force-pushed the fix/MaxGroups-invalidation branch from 9aa1349 to 75aa4ce Compare February 8, 2022 02:54
@karmada-bot karmada-bot removed the lgtm Indicates that a PR is ready to be merged. label Feb 8, 2022
Copy link
Member

@RainbowMango RainbowMango left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@karmada-bot karmada-bot added the lgtm Indicates that a PR is ready to be merged. label Feb 8, 2022
@karmada-bot karmada-bot merged commit b5493cc into karmada-io:master Feb 8, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/bug Categorizes issue or PR as related to a bug. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

spreadconstraints[i].MaxGroups invalidate when scaleup replicas
4 participants