[HPA] Add targetCPUUtilization field to collector config #1066

Merged (1 commit) on Sep 23, 2022

moh-osman3
Contributor

Implementation for #1065

Adds a targetCPUUtilization field (a percentage) to the collector config instead of hard-coding the value.

@moh-osman3 requested a review from a team August 30, 2022 09:47

linux-foundation-easycla bot commented Aug 30, 2022

CLA Signed

The committers listed above are authorized under a signed CLA.

@pavolloffay
Member

@moh-osman3 please sign the CLA

if otelcol.Spec.TargetCPUUtilization != nil {
	cpuTarget = *otelcol.Spec.TargetCPUUtilization
} else {
	cpuTarget = defaultCPUTarget

Member

The default value of `otelcol.Spec.TargetCPUUtilization` should be set in the defaulting webhook.

Contributor Author

Thanks for the review! Moved the default to opentelemetrycollector_webhook.go.

Contributor Author

It seems that without this nil check, the unit tests begin failing with:

--- FAIL: TestHPA (0.00s)
    --- FAIL: TestHPA/v2 (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1cf2364]

Contributor

@jaronoff97 left a comment

Left a comment about where to add the default behavior, and a question about testing.

@@ -129,6 +129,10 @@ func (r *OpenTelemetryCollector) validateCRDSpec() error {
		return fmt.Errorf("the OpenTelemetry Spec autoscale configuration is incorrect, minReplicas should be one or more")
	}

	if r.Spec.TargetCPUUtilization != nil && (*r.Spec.TargetCPUUtilization < int32(1) || *r.Spec.TargetCPUUtilization > int32(99)) {

Contributor

We should also be setting a default in this file

Contributor Author

Thanks for the review! Setting the default in opentelemetrycollector_webhook.go now
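
The range check in the validation hunk above can be sketched in isolation. This is a minimal sketch, not the operator's code: `validateTargetCPU` and its error message are hypothetical, and only the 1 to 99 bounds and the "nil means unset" convention come from the diff.

```go
package main

import "fmt"

// validateTargetCPU is a hypothetical helper mirroring the bounds in the diff:
// a nil pointer means "not set" and is accepted, while any value outside
// 1..99 is rejected.
func validateTargetCPU(target *int32) error {
	if target == nil {
		return nil
	}
	if *target < 1 || *target > 99 {
		// Illustrative message only, not the operator's actual wording.
		return fmt.Errorf("targetCPUUtilization must be between 1 and 99, got %d", *target)
	}
	return nil
}

func main() {
	for _, v := range []int32{0, 60, 100} {
		v := v // take the address of a per-iteration copy
		fmt.Printf("%d -> %v\n", v, validateTargetCPU(&v))
	}
	fmt.Printf("unset -> %v\n", validateTargetCPU(nil))
}
```

Accepting nil here keeps validation independent of defaulting: an unset field is left for the defaulting step to fill in.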

kind: HorizontalPodAutoscaler
metadata:
  name: simplest-collector
spec:
  minReplicas: 1
  maxReplicas: 2

  metrics:

Contributor

Should we keep a test for the default behavior?

Contributor Author

Added a new OpenTelemetryCollector that does not set targetCPUUtilization for the default.

@@ -62,6 +62,12 @@ func (r *OpenTelemetryCollector) Default() {
		one := int32(1)
		r.Spec.Replicas = &one
	}

	// if autoscaling is enabled then set default targetCPUUtilization
	if r.Spec.MaxReplicas != nil && r.Spec.TargetCPUUtilization == nil {

Member

Is this guard needed? Perhaps we could always default.

Contributor Author

Hmm, based on your comment below, I think I included the guard because I was worried the default might override a user-set value (and I also think it makes clear that this setting is only used when autoscaling is enabled). Removed the guard here and the ones in your comment below.

Contributor Author

It seems this guard, and the ones in the next comment, are needed; otherwise the e2e tests begin to fail.
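
The effect of the guard discussed above can be sketched as follows. The `Spec` type is a trimmed, hypothetical stand-in for the collector spec, and the default of 90 is taken from the field comment in this PR. The sketch only shows that the guard preserves a user-set value and leaves collectors without autoscaling untouched; it does not explain the e2e failures.

```go
package main

import "fmt"

// defaultCPUTarget follows the "Defaults to 90 percent" field comment in this PR.
const defaultCPUTarget int32 = 90

// Spec is a hypothetical, trimmed stand-in for the collector spec.
type Spec struct {
	MaxReplicas          *int32
	TargetCPUUtilization *int32
}

// Default fills in the CPU target only when autoscaling is enabled
// (MaxReplicas is set) and the user has not chosen a value.
func (s *Spec) Default() {
	if s.MaxReplicas != nil && s.TargetCPUUtilization == nil {
		target := defaultCPUTarget
		s.TargetCPUUtilization = &target
	}
}

func main() {
	two, sixty := int32(2), int32(60)

	autoscaled := Spec{MaxReplicas: &two}
	autoscaled.Default()
	fmt.Println("autoscaled, unset:", *autoscaled.TargetCPUUtilization)

	userSet := Spec{MaxReplicas: &two, TargetCPUUtilization: &sixty}
	userSet.Default()
	fmt.Println("autoscaled, user-set:", *userSet.TargetCPUUtilization)

	fixed := Spec{}
	fixed.Default()
	fmt.Println("no autoscaling, still nil:", fixed.TargetCPUUtilization == nil)
}
```

Under this guard, any consumer that reads the field for a collector without `MaxReplicas` still has to tolerate nil.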

@@ -119,13 +120,23 @@ func setAutoscalerSpec(params Params, autoscalingVersion autodetect.AutoscalingV
		} else {
			updated.(*autoscalingv2beta2.HorizontalPodAutoscaler).Spec.MinReplicas = &one
		}
		if params.Instance.Spec.TargetCPUUtilization != nil {

Member

Is this check needed? The defaulting webhook should make sure the value is always set.

Contributor Author

Removed.

	} else {
		updated.(*autoscalingv2.HorizontalPodAutoscaler).Spec.MaxReplicas = *params.Instance.Spec.MaxReplicas
		if params.Instance.Spec.MinReplicas != nil {
			updated.(*autoscalingv2.HorizontalPodAutoscaler).Spec.MinReplicas = params.Instance.Spec.MinReplicas
		} else {
			updated.(*autoscalingv2.HorizontalPodAutoscaler).Spec.MinReplicas = &one
		}
		if params.Instance.Spec.TargetCPUUtilization != nil {

Member

The same question as above.

Contributor Author

Removed.

@pavolloffay
Member

Linting failed.

@moh-osman3
Contributor Author

> linting failed

I think that was because I had some spaces instead of tabs in the webhook file. `golangci-lint run` no longer reports errors for me locally. Can you run it again, please?

@pavolloffay
Member

@kevinearls could you please review as well?

Member

@kevinearls left a comment

LGTM

@pavolloffay
Member

@moh-osman3 CI failed

@moh-osman3
Contributor Author

> @moh-osman3 CI failed

@pavolloffay Sorry for the delay, I was on PTO last week! Yes, I noticed CI was failing due to this unit test error:

--- FAIL: TestHPA (0.00s)
    --- FAIL: TestHPA/v2 (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1cf2364]

This was resolved by adding back the nil check in pkg/collector/horizontalpodautoscaler.go.

I also noticed that after applying e54ef2a my e2e tests fail, but I have not been able to pin down the exact cause. I am confused about why the default set in the webhook isn't being picked up in horizontalpodautoscaler.go, so I reverted that change. Currently I think the only failing CI check is Security. Do you have any guidance on getting the tests passing?
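
One possible reading of the panic above, offered as an assumption rather than a diagnosis: a defaulting webhook only runs when an object passes through API server admission, so a unit test that builds the spec directly and never calls `Default()` hands the HPA builder a nil pointer. The sketch below uses hypothetical, trimmed types (`Spec`, `cpuTarget`) to reproduce that situation and recovers the panic so it can be printed.

```go
package main

import "fmt"

// Spec is a hypothetical, trimmed stand-in for the collector spec.
type Spec struct {
	TargetCPUUtilization *int32
}

// Default mimics what a defaulting webhook would do on admission.
func (s *Spec) Default() {
	if s.TargetCPUUtilization == nil {
		ninety := int32(90)
		s.TargetCPUUtilization = &ninety
	}
}

// cpuTarget dereferences the field without a nil check, as the HPA builder
// did once the check was removed. The panic is recovered and returned as an
// error so both cases can be shown in one run.
func cpuTarget(s Spec) (target int32, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return *s.TargetCPUUtilization, nil
}

func main() {
	raw := Spec{} // built directly, as a unit test might do

	_, err := cpuTarget(raw)
	fmt.Println("without Default():", err)

	raw.Default() // what admission would have done
	v, err := cpuTarget(raw)
	fmt.Println("with Default():", v, err)
}
```

If this reading is right, having the test call `Default()` before building the HPA would keep the default in a single place while avoiding the nil dereference.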

if otelcol.Spec.TargetCPUUtilization != nil {
	cpuTarget = *otelcol.Spec.TargetCPUUtilization
} else {
	cpuTarget = int32(90)

Member

I don't like that the default CPU target utilization is defined in two places: defaulting webhook and here. There should be only one place where defaults are defined - preferably in the defaulting webhook.

if params.Instance.Spec.TargetCPUUtilization != nil {
	updated.(*autoscalingv2beta2.HorizontalPodAutoscaler).Spec.Metrics[0].Resource.Target.AverageUtilization = params.Instance.Spec.TargetCPUUtilization
} else {
	updated.(*autoscalingv2beta2.HorizontalPodAutoscaler).Spec.Metrics[0].Resource.Target.AverageUtilization = &ninety

Member

The same here: the default is defined in two places.

@pavolloffay
Member

the PR needs to be rebased

@moh-osman3 force-pushed the main branch 3 times, most recently from b7c0650 to d2fa170 September 19, 2022 09:57

	// TargetCPUUtilization sets the target average CPU used across all replicas.
	// If average CPU exceeds this value, the HPA will scale up. Defaults to 90 percent.
	// +optional
	TargetCPUUtilization *int32 `json:"targetCPUUtilization,omitempty"`

Member

@kevinearls could you please review this PR.

I am curious if this should be moved into .spec.autoscaler.

Member

@pavolloffay @moh-osman3 Yes, sorry I didn't comment on this earlier. spec.autoscaler is the best place for this (and in the future we should figure out how to move maxreplicas.)

In the future we should put any other metrics in spec.autoscaler too.

Member

> In the future we should put any other metrics in spec.autoscaler too.

We should put this comment somewhere in the CRD or open a ticket.

Contributor Author

Thanks for pointing this out! I moved TargetCPUUtilization to the appropriate place in the spec now. I also created #1115 to track moving MaxReplicas and MinReplicas to .spec.autoscaler. I don't mind taking this issue on in a followup PR.
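
The resulting shape, with the CPU target nested under `.spec.autoscaler`, might look roughly like the sketch below. The type names `CollectorSpec` and `AutoscalerSpec` and the exact field layout are assumptions for illustration; only the `targetCPUUtilization` JSON tag and the `.spec.autoscaler` location come from this thread.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// AutoscalerSpec is a hypothetical container for autoscaling settings.
type AutoscalerSpec struct {
	// TargetCPUUtilization is the target average CPU across replicas, in percent.
	TargetCPUUtilization *int32 `json:"targetCPUUtilization,omitempty"`
}

// CollectorSpec is a hypothetical, trimmed collector spec. MinReplicas and
// MaxReplicas stay at the top level here, matching the follow-up tracked in #1115.
type CollectorSpec struct {
	MinReplicas *int32          `json:"minReplicas,omitempty"`
	MaxReplicas *int32          `json:"maxReplicas,omitempty"`
	Autoscaler  *AutoscalerSpec `json:"autoscaler,omitempty"`
}

// render serializes the spec so the nesting is visible.
func render(spec CollectorSpec) (string, error) {
	out, err := json.Marshal(spec)
	return string(out), err
}

func main() {
	two, sixty := int32(2), int32(60)
	s, err := render(CollectorSpec{
		MaxReplicas: &two,
		Autoscaler:  &AutoscalerSpec{TargetCPUUtilization: &sixty},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"maxReplicas":2,"autoscaler":{"targetCPUUtilization":60}}
}
```

Because `Autoscaler` is itself a pointer in this sketch, readers of the field would need to check both the struct and the value for nil.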

Member

@kevinearls left a comment

LGTM

@pavolloffay merged commit f7aafc5 into open-telemetry:main Sep 23, 2022
ItielOlenick pushed a commit to ItielOlenick/opentelemetry-operator that referenced this pull request May 1, 2024