-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
add annotation to pause autoscaling at a fixed count #2765
Conversation
Thanks for the PR! Would you mind opening a PR for the docs please? |
Does this support pausing without defining an amount (the scenario with just paused) or only to a fixed amount of replicas? |
It pauses at a fixed amount of replicas (which can be 0 as well) |
Maybe we could add another subsection under https://keda.sh/docs/2.6/concepts/scaling-deployments/ |
Do you think you can add e2e tests for this feature. A couple of scenarios come to my mind, initiate the scaling then pause to 0 replicas (and N replicas), then disable the pause and check that scaling is working. And another one is to start with paused scaling, then enable. |
I think we can merge these two tests into one. We can have a flow such as:
|
Yeah, though it would be great if we can add another simple scenario:
This way we have scenarios for different paused-replicas (0, 1 and N -> those are all distinct cases) and we have scenarios for creating ScaledObject with and without paused-replicas. You can use azure-queue test for this, there are already some additional tests around this scaler and it doesn't require creating and deploying the service (as you would have to do with Prometheus for example). |
I think it's redundant to test for 1 replica and N replicas separately. We could just set paused replicas as 3 instead of 1 in the last stage of the flow I suggested. Also, I think it'd be better for this test to be live separately. I was thinking of keeping it minimal using a simple memory scaler. |
1 and N replicas are distinct cases, because the way we KEDA scales workloads, we should really cover this. And I wasn't suggesting to extend the current azure-queue test but use it as a base. Memory scaler is a special case (it doesn't fully rely on KEDA and external metrics) so please don't use this one. |
Hm, I'd love to have the But if I recall correctly @zroubalik and @JorTurFer were not a fan of this? This is another use-case than specifying the amount imo which is equally important that others raised as well but I don't want to push this |
e756d83
to
526c261
Compare
/run-e2e pause* |
526c261
to
8ebc68b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Thanks for the contribution! ❤️ ❤️
Please, open a PR documenting the change in keda-docs 🙏
you're right. IMO, having only 1 annotation is easier than having more than one, but we can see the feedback from the users about how this approach is and iterate over it |
/run-e2e pause |
8ebc68b
to
a9c3a40
Compare
Different use cases though. My intention of this feature was "stop scaling, regardless of how many that we have now" which is not easy today as you have to check the amount of current replicas but not sure about @zroubalik's thoughts |
Aren't we overcomplicating the stuff? Getting the current number of replicas is very easy, even if user would like to automate the stuff 🤷♂️ I am not against adding a separate option for this, but it shouldn't overcomplicate the UX (too many options might be confusing). |
I don't see it as too complicated, either you just pause it and don't care or just specify the annotation with an amount next to it. It's not hard indeed, but another step that some people would want to avoid (see issue) Anyway let's leave it out and we can add it later on. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@aryan9600 e2e tests are failiing :(
/run-e2e azure-queue-pause* |
@aryan9600 is the problem in the e2e test or on the implementation? ie. if you run the test locally, do you see the failure? If you try to reproduce the behavior manually on some ScaledObject and Deployment, do you see the problem? |
I tried running the test locally, but after a lot of hours spent struggling, I couldn't get Azure storage to work as required on my machine and gave up. I used a CPU scaler to see if KEDA is actually scaling the deployment up when the load increases and that's working fine. |
So can you try with another scaler? (not CPU/Mem though) |
@zroubalik works perfectly fine with prometheus as well |
Let's use prometheus in the e2e tests then 👍 |
/run-e2e azure-queue-pause* |
/run-e2e azure-queue-pause* |
@@ -28,6 +29,8 @@ import ( | |||
"sigs.k8s.io/controller-runtime/pkg/client" | |||
|
|||
kedav1alpha1 "github.com/kedacore/keda/v2/apis/keda/v1alpha1" | |||
kedacontrollerutil "github.com/kedacore/keda/v2/controllers/keda/util" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just my thinking, we should unify the functions for Status & Conditions handling. Referencing controller related stuff here is suboptimal.
There are two locations with similar functions:
keda/controllers/keda/util/status.go
Lines 30 to 63 in 9f0ee58
func SetStatusConditions(ctx context.Context, client runtimeclient.StatusClient, logger logr.Logger, object interface{}, conditions *kedav1alpha1.Conditions) error { | |
var patch runtimeclient.Patch | |
runtimeObj := object.(runtimeclient.Object) | |
switch obj := runtimeObj.(type) { | |
case *kedav1alpha1.ScaledObject: | |
patch = runtimeclient.MergeFrom(obj.DeepCopy()) | |
obj.Status.Conditions = *conditions | |
case *kedav1alpha1.ScaledJob: | |
patch = runtimeclient.MergeFrom(obj.DeepCopy()) | |
obj.Status.Conditions = *conditions | |
default: | |
err := fmt.Errorf("unknown scalable object type %v", obj) | |
logger.Error(err, "Failed to patch Objects Status with Conditions") | |
return err | |
} | |
err := client.Status().Patch(ctx, runtimeObj, patch) | |
if err != nil { | |
logger.Error(err, "Failed to patch Objects Status with Conditions") | |
} | |
return err | |
} | |
// UpdateScaledObjectStatus patches the given ScaledObject with the updated status passed to it or returns an error. | |
func UpdateScaledObjectStatus(ctx context.Context, client runtimeclient.StatusClient, logger logr.Logger, scaledObject *kedav1alpha1.ScaledObject, status *kedav1alpha1.ScaledObjectStatus) error { | |
patch := runtimeclient.MergeFrom(scaledObject.DeepCopy()) | |
scaledObject.Status = *status | |
err := client.Status().Patch(ctx, scaledObject, patch) | |
if err != nil { | |
logger.Error(err, "Failed to patch ScaledObjects Status") | |
} | |
return err | |
} |
keda/pkg/scaling/executor/scale_executor.go
Lines 64 to 132 in 9f0ee58
func (e *scaleExecutor) updateLastActiveTime(ctx context.Context, logger logr.Logger, object interface{}) error { | |
var patch runtimeclient.Patch | |
now := metav1.Now() | |
runtimeObj := object.(runtimeclient.Object) | |
switch obj := runtimeObj.(type) { | |
case *kedav1alpha1.ScaledObject: | |
patch = runtimeclient.MergeFrom(obj.DeepCopy()) | |
obj.Status.LastActiveTime = &now | |
case *kedav1alpha1.ScaledJob: | |
patch = runtimeclient.MergeFrom(obj.DeepCopy()) | |
obj.Status.LastActiveTime = &now | |
default: | |
err := fmt.Errorf("unknown scalable object type %v", obj) | |
logger.Error(err, "Failed to patch Objects Status") | |
return err | |
} | |
err := e.client.Status().Patch(ctx, runtimeObj, patch) | |
if err != nil { | |
logger.Error(err, "Failed to patch Objects Status") | |
} | |
return err | |
} | |
func (e *scaleExecutor) setCondition(ctx context.Context, logger logr.Logger, object interface{}, status metav1.ConditionStatus, reason string, message string, setCondition func(kedav1alpha1.Conditions, metav1.ConditionStatus, string, string)) error { | |
var patch runtimeclient.Patch | |
runtimeObj := object.(runtimeclient.Object) | |
switch obj := runtimeObj.(type) { | |
case *kedav1alpha1.ScaledObject: | |
patch = runtimeclient.MergeFrom(obj.DeepCopy()) | |
setCondition(obj.Status.Conditions, status, reason, message) | |
case *kedav1alpha1.ScaledJob: | |
patch = runtimeclient.MergeFrom(obj.DeepCopy()) | |
setCondition(obj.Status.Conditions, status, reason, message) | |
default: | |
err := fmt.Errorf("unknown scalable object type %v", obj) | |
logger.Error(err, "Failed to patch Objects Status") | |
return err | |
} | |
err := e.client.Status().Patch(ctx, runtimeObj, patch) | |
if err != nil { | |
logger.Error(err, "Failed to patch Objects Status") | |
} | |
return err | |
} | |
func (e *scaleExecutor) setReadyCondition(ctx context.Context, logger logr.Logger, object interface{}, status metav1.ConditionStatus, reason string, message string) error { | |
active := func(conditions kedav1alpha1.Conditions, status metav1.ConditionStatus, reason string, message string) { | |
conditions.SetReadyCondition(status, reason, message) | |
} | |
return e.setCondition(ctx, logger, object, status, reason, message, active) | |
} | |
func (e *scaleExecutor) setActiveCondition(ctx context.Context, logger logr.Logger, object interface{}, status metav1.ConditionStatus, reason string, message string) error { | |
active := func(conditions kedav1alpha1.Conditions, status metav1.ConditionStatus, reason string, message string) { | |
conditions.SetActiveCondition(status, reason, message) | |
} | |
return e.setCondition(ctx, logger, object, status, reason, message, active) | |
} | |
func (e *scaleExecutor) setFallbackCondition(ctx context.Context, logger logr.Logger, object interface{}, status metav1.ConditionStatus, reason string, message string) error { | |
fallback := func(conditions kedav1alpha1.Conditions, status metav1.ConditionStatus, reason string, message string) { | |
conditions.SetFallbackCondition(status, reason, message) | |
} | |
return e.setCondition(ctx, logger, object, status, reason, message, fallback) | |
} |
We should refactor this to have it in one place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could move keda/controllers/keda/util/status.go
to pkg/util/status.go
, this way the SetStatusConditions
and UpdateScaldObjectStatus
can be used across the project. As for unifying the functions for Status and Conditions handling, I think that should be tackled in a different PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah a change like that. And I concur this should be tackled in a different PR.
There are 2 things we should cover, update documentation and implement similar functionality, it should be relatively trivial : a check on the annotation in the ScaleJob controller loop. |
There's a PR open for docs already: https://github.com/kedacore/keda-docs/pull/728/files. I think the only thing left is to update the CHANGELOG, let me do that 👍 |
Sorry, I missed the docs PR 🤦 😄 |
Also could you please solve the conflict in the Changelog? |
4352b3c
to
450242a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Signed-off-by: Sanskar Jaiswal <[email protected]>
Signed-off-by: Sanskar Jaiswal <[email protected]>
Signed-off-by: Sanskar Jaiswal <[email protected]>
Signed-off-by: Sanskar Jaiswal <[email protected]>
450242a
to
efe3118
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
waiting for the merge for other to take a look, @kedacore/keda-contributors
/run-e2e azure-queue-pause* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Only small nit in the changelog and one question about e2e tests:
What's the difference between them? I fell pause-at
as a case inside pause
.
I mean, in the first one you do the normal scaling cycle, and then pause in 5 so the instance count goes to 5, and in the second one, you pause in 1, and the instances goes to 1. In fact, the second one is more complete because it continues after the pause.
Am I missing anything?
Thanks a ton for this amazing feature! ❤️ ❤️ ❤️
Signed-off-by: Sanskar Jaiswal <[email protected]>
efe3118
to
77d7b12
Compare
@JorTurFer yeah there is some overlap between them, but the tests together cover all possible scenarios as suggested by @zroubalik in #2765 (comment) |
I was missing something xD |
Signed-off-by: Sanskar Jaiswal [email protected]
This PR adds a user configurable annotation
autoscaling.keda.sh/paused-replicas
, which lets users scale the target to a fixed replica count and pauses autoscaling.Checklist
Relates to #944