[HPA] Add targetCPUUtilization field to collector config #1066

moh-osman3 · 2022-08-30T09:47:31Z

Implementation for #1065

Adds targetCPUUtilization (a percentage) to the collector config instead of hard-coding the value.

linux-foundation-easycla · 2022-08-30T09:47:34Z

The committers listed above are authorized under a signed CLA.

✅ login: moh-osman3 / name: Moh Osman (012fd4a, 08733b9)

pavolloffay · 2022-08-30T13:57:47Z

@moh-osman3 please sign the CLA

pavolloffay · 2022-08-30T13:59:20Z

pkg/collector/horizontalpodautoscaler.go

+	if otelcol.Spec.TargetCPUUtilization != nil {
+		cpuTarget = *otelcol.Spec.TargetCPUUtilization
+	} else {
+		cpuTarget = defaultCPUTarget


The default value of `otelcol.Spec.TargetCPUUtilization` should be set in the defaulting webhook.

Thanks for the review! I moved the default into opentelemetrycollector_webhook.go.

It seems that without this nil check, the unit tests fail with:

--- FAIL: TestHPA (0.00s)
    --- FAIL: TestHPA/v2 (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1cf2364]

jaronoff97

Left a comment about where to add the default behavior, and a question about testing.

jaronoff97 · 2022-08-30T15:16:34Z

apis/v1alpha1/opentelemetrycollector_webhook.go

@@ -129,6 +129,10 @@ func (r *OpenTelemetryCollector) validateCRDSpec() error {
 			return fmt.Errorf("the OpenTelemetry Spec autoscale configuration is incorrect, minReplicas should be one or more")
 		}

+		if r.Spec.TargetCPUUtilization != nil && (*r.Spec.TargetCPUUtilization < int32(1) || *r.Spec.TargetCPUUtilization > int32(99)) {


We should also be setting a default in this file

Thanks for the review! Setting the default in opentelemetrycollector_webhook.go now
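The range check in the hunk above (values below 1 or above 99 are rejected, and an unset field is allowed) can be sketched in isolation. This is a minimal illustration, not the operator's code: `validateTargetCPU` and its error message are hypothetical names chosen for the example.

```go
package main

import "fmt"

// validateTargetCPU is a hypothetical helper that mirrors the condition in
// the diff: a nil value is accepted, a set value must be within 1..99.
func validateTargetCPU(target *int32) error {
	if target != nil && (*target < 1 || *target > 99) {
		// The wording of this message is an assumption for illustration.
		return fmt.Errorf("targetCPUUtilization should be between 1 and 99, got %d", *target)
	}
	return nil
}

func main() {
	for _, v := range []int32{0, 1, 60, 99, 100} {
		v := v // take the address of a per-iteration copy
		fmt.Println(v, validateTargetCPU(&v))
	}
	fmt.Println("unset:", validateTargetCPU(nil))
}
```

Keeping the nil case valid is what lets the defaulting step, rather than validation, decide what an unset field means.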

jaronoff97 · 2022-08-30T15:17:50Z

tests/e2e/autoscale/00-assert.yaml

 kind: HorizontalPodAutoscaler
 metadata:
  name: simplest-collector
 spec:
  minReplicas: 1
  maxReplicas: 2
-
+  metrics:


Should we keep a test for the default behavior?

Added a new OpenTelemetryCollector that does not set targetCPUUtilization for the default.

pavolloffay · 2022-08-31T08:31:57Z

apis/v1alpha1/opentelemetrycollector_webhook.go

@@ -62,6 +62,12 @@ func (r *OpenTelemetryCollector) Default() {
 		one := int32(1)
 		r.Spec.Replicas = &one
 	}
+
+	// if autoscaling is enabled then set default targetCPUUtilization
+	if r.Spec.MaxReplicas != nil && r.Spec.TargetCPUUtilization == nil {


Is this guard needed? Perhaps we could always set the default.

Hmm, based on your comment below, I think I included the guard because I was worried the default might override a user-set value (and I also think it makes clear that this setting is only used when autoscaling is enabled). I removed the guard here and the ones in your comment below.

It seems this guard and the ones in the next comment are needed; otherwise the e2e tests begin to fail.
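The guard under discussion can be sketched as follows. This is a self-contained illustration under stated assumptions, not the operator's actual webhook: `collectorSpec`, `applyDefaults`, and `defaultCPUTarget` are hypothetical stand-ins, and the value 90 is taken from the default mentioned elsewhere in this PR.

```go
package main

import "fmt"

// defaultCPUTarget mirrors the 90 percent default mentioned in this PR.
const defaultCPUTarget int32 = 90

// collectorSpec is a hypothetical, trimmed-down stand-in for the collector spec.
type collectorSpec struct {
	MaxReplicas          *int32
	TargetCPUUtilization *int32
}

// applyDefaults sets TargetCPUUtilization only when autoscaling is enabled
// (MaxReplicas is set) and the user has not chosen a value themselves.
func applyDefaults(s *collectorSpec) {
	if s.MaxReplicas != nil && s.TargetCPUUtilization == nil {
		target := defaultCPUTarget
		s.TargetCPUUtilization = &target
	}
}

func main() {
	two := int32(2)

	withHPA := collectorSpec{MaxReplicas: &two}
	applyDefaults(&withHPA)
	fmt.Println("autoscaling on:", *withHPA.TargetCPUUtilization)

	withoutHPA := collectorSpec{}
	applyDefaults(&withoutHPA)
	fmt.Println("autoscaling off, field still unset:", withoutHPA.TargetCPUUtilization == nil)
}
```

With the guard in place, a collector that does not enable autoscaling keeps the field unset, so any code that reads it outside the autoscaling path still has to tolerate nil.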

pavolloffay · 2022-08-31T08:33:21Z

pkg/collector/reconcile/horizontalpodautoscaler.go

@@ -119,13 +120,23 @@ func setAutoscalerSpec(params Params, autoscalingVersion autodetect.AutoscalingV
 			} else {
 				updated.(*autoscalingv2beta2.HorizontalPodAutoscaler).Spec.MinReplicas = &one
 			}
+			if params.Instance.Spec.TargetCPUUtilization != nil {


Is this check needed? The defaulting webhook should make sure the value is always set.

pavolloffay · 2022-08-31T08:33:32Z

pkg/collector/reconcile/horizontalpodautoscaler.go

 		} else {
 			updated.(*autoscalingv2.HorizontalPodAutoscaler).Spec.MaxReplicas = *params.Instance.Spec.MaxReplicas
 			if params.Instance.Spec.MinReplicas != nil {
 				updated.(*autoscalingv2.HorizontalPodAutoscaler).Spec.MinReplicas = params.Instance.Spec.MinReplicas
 			} else {
 				updated.(*autoscalingv2.HorizontalPodAutoscaler).Spec.MinReplicas = &one
 			}
+			if params.Instance.Spec.TargetCPUUtilization != nil {


the same question as above.

pavolloffay · 2022-08-31T08:33:58Z

linting failed

moh-osman3 · 2022-08-31T22:21:46Z

linting failed

I think that was because I had some spaces instead of tabs in the webhook file. `golangci-lint run` no longer reports errors for me locally. Could you run it again, please?

pavolloffay · 2022-09-01T10:24:16Z

@kevinearls could you please review as well?

kevinearls

LGTM

pavolloffay · 2022-09-02T13:38:36Z

@moh-osman3 CI failed

moh-osman3 · 2022-09-13T01:12:27Z

@moh-osman3 CI failed

@pavolloffay Sorry for the delay, I was on PTO last week! Yes, I noticed CI was failing due to a unit test error:

--- FAIL: TestHPA (0.00s)
    --- FAIL: TestHPA/v2 (0.00s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x1cf2364]

This was resolved by adding back the nil check in pkg/collector/horizontalpodautoscaler.go

I also noticed that after I applied e54ef2a my e2e tests fail, but I have been unable to figure out the exact issue. I'm a little confused about why the default set in the webhook isn't being picked up in horizontalpodautoscaler.go, so I reverted that change. Currently I think the only failing check in CI is Security. Do you have any guidance for getting the tests to pass?
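One plausible reading of the panic, offered as an assumption since the thread does not confirm the root cause: a unit test that builds the object directly never passes through the defaulting webhook, so the pointer field is still nil when it is dereferenced. The sketch below reproduces that situation with hypothetical stand-ins (`collectorSpec`, `setDefaults`, `cpuTarget`, `panics`); none of these are the operator's real names.

```go
package main

import "fmt"

// collectorSpec is a hypothetical, trimmed-down stand-in for the collector spec.
type collectorSpec struct {
	TargetCPUUtilization *int32
}

// setDefaults stands in for the defaulting webhook.
func setDefaults(s *collectorSpec) {
	if s.TargetCPUUtilization == nil {
		ninety := int32(90)
		s.TargetCPUUtilization = &ninety
	}
}

// cpuTarget dereferences the field without a nil check.
func cpuTarget(s collectorSpec) int32 {
	return *s.TargetCPUUtilization
}

// panics reports whether calling f panics.
func panics(f func()) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	raw := collectorSpec{} // built directly, as a unit test fixture might be

	fmt.Println("panics before defaulting:", panics(func() { _ = cpuTarget(raw) }))

	setDefaults(&raw) // what the webhook would have done on admission
	fmt.Println("target after defaulting:", cpuTarget(raw))
}
```

If this reading is right, the options are to keep a nil check at the point of use or to have test fixtures run the defaulting step before building the HPA.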

pavolloffay · 2022-09-15T07:19:42Z

pkg/collector/horizontalpodautoscaler.go

+	if otelcol.Spec.TargetCPUUtilization != nil {
+		cpuTarget = *otelcol.Spec.TargetCPUUtilization
+	} else {
+		cpuTarget = int32(90)


I don't like that the default CPU target utilization is defined in two places: the defaulting webhook and here. Defaults should be defined in only one place, preferably the defaulting webhook.

pavolloffay · 2022-09-15T07:20:38Z

pkg/collector/reconcile/horizontalpodautoscaler.go

+			if params.Instance.Spec.TargetCPUUtilization != nil {
+				updated.(*autoscalingv2beta2.HorizontalPodAutoscaler).Spec.Metrics[0].Resource.Target.AverageUtilization = params.Instance.Spec.TargetCPUUtilization
+			} else {
+				updated.(*autoscalingv2beta2.HorizontalPodAutoscaler).Spec.Metrics[0].Resource.Target.AverageUtilization = &ninety


The same here - the default is defined in two places

pavolloffay · 2022-09-16T08:02:13Z

the PR needs to be rebased

pavolloffay · 2022-09-20T08:26:28Z

apis/v1alpha1/opentelemetrycollector_types.go

+	// TargetCPUUtilization sets the target average CPU used across all replicas.
+	// If average CPU exceeds this value, the HPA will scale up. Defaults to 90 percent.
+	// +optional
+	TargetCPUUtilization *int32 `json:"targetCPUUtilization,omitempty"`


@kevinearls could you please review this PR.

I am curious if this should be moved into .spec.autoscaler.

@pavolloffay @moh-osman3 Yes, sorry I didn't comment on this earlier. spec.autoscaler is the best place for this (and in the future we should figure out how to move maxreplicas.)

In the future we should put any other metrics in spec.autoscaler too.

We should put this comment somewhere in the CRD or book a ticket.

Thanks for pointing this out! I moved TargetCPUUtilization to the appropriate place in the spec now. I also created #1115 to track moving MaxReplicas and MinReplicas to .spec.autoscaler. I don't mind taking this issue on in a followup PR.

kevinearls

LGTM

moh-osman3 requested a review from a team August 30, 2022 09:47

pavolloffay reviewed Aug 30, 2022

jaronoff97 reviewed Aug 30, 2022

pavolloffay reviewed Aug 31, 2022

pavolloffay approved these changes Sep 1, 2022

kevinearls approved these changes Sep 2, 2022

pavolloffay reviewed Sep 15, 2022

moh-osman3 force-pushed the main branch 3 times, most recently from b7c0650 to d2fa170 Compare September 19, 2022 09:57

pavolloffay reviewed Sep 20, 2022

moh-osman3 force-pushed the main branch from 32dea9a to f32010e Compare September 21, 2022 02:12

pavolloffay approved these changes Sep 21, 2022

add targetCPUUTilization to HPA

ff51dc1

moh-osman3 force-pushed the main branch from f32010e to ff51dc1 Compare September 22, 2022 16:37

kevinearls approved these changes Sep 23, 2022

pavolloffay merged commit f7aafc5 into open-telemetry:main Sep 23, 2022

moh-osman3 mentioned this pull request Dec 5, 2022

Allow configuration of target CPU Utilization for HPA #1065

Closed

moh-osman3 mentioned this pull request Aug 15, 2023

REQUEST: New membership for moh-osman3 open-telemetry/community#1647

Closed

6 tasks

ItielOlenick pushed a commit to ItielOlenick/opentelemetry-operator that referenced this pull request May 1, 2024

Add targetCPUUTilization to HPA (open-telemetry#1066)

1e8a338
