Pod Disruption Budget implementation #900
base: master
Conversation
a1c32e5 to 1eb8ab0
pdbSpec := hnp.GetPodDisruptionBudget()
r.Log.Info("Entering PDB enforcement check", "nodePool", hnp.GetNodePoolName(), "pdbSpec", pdbSpec, "pdbSpec.Enabled", pdbSpec != nil && pdbSpec.Enabled)
if pdbSpec != nil && pdbSpec.Enabled {
	pdbName := fmt.Sprintf("%s-%s-pdb", hc.Name, hnp.GetNodePoolName())
You don't need to explicitly add hc.Name yourself, unless you want hc.Name added twice in the name. GetNodePoolName() returns strings.Join([]string{hnp.GetClusterName(), hnp.nodePoolName}, "-"), and GetClusterName() returns hnp.clusterName, which is set to hc.Name. So essentially, we're already adding hc.Name to the prefix returned by GetNodePoolName().
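To make the duplication concrete, here is a small standalone Go sketch (illustrative names only, not the operator's code) of what the two naming approaches produce:
package main

import (
	"fmt"
	"strings"
)

func main() {
	clusterName := "example-cluster"                                   // plays the role of hc.Name
	nodePoolName := strings.Join([]string{clusterName, "ingest"}, "-") // what GetNodePoolName() already returns

	// Prefixing hc.Name again, as in the reviewed line, doubles the cluster name:
	fmt.Printf("%s-%s-pdb\n", clusterName, nodePoolName) // example-cluster-example-cluster-ingest-pdb

	// Using the node pool name alone avoids the duplication:
	fmt.Printf("%s-pdb\n", nodePoolName) // example-cluster-ingest-pdb
}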
Instead of constructing pdbName here, perhaps you can just use GetPodDisruptionBudgetName(), which already constructs the full PDB name.
That's a good observation. However, fwiw I haven't finished refactoring the code
r.Log.Info("Fetching PDB",
	"pdbName", pdbName,
	"namespace", hnp.GetNamespace())
You don't need to add the namespace to the logs yourself, as we initialize the logger at the start of the Reconcile() function with the namespace: https://github.com/humio/humio-operator/blob/master/controllers/humiocluster_controller.go#L91-L96
With this, all logs using r.Log include both the name of the HumioCluster resource and the namespace where it is located.
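For context, a minimal sketch of that pattern (field and key names are assumptions, not necessarily the operator's exact code): the logger is scoped once at the top of Reconcile(), so every later event already carries the name and namespace.
package controllers

import (
	"context"

	"github.com/go-logr/logr"
	ctrl "sigs.k8s.io/controller-runtime"
)

type exampleReconciler struct {
	Log logr.Logger
}

func (r *exampleReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// Scope the logger once; every r.Log event below now includes name and namespace.
	r.Log = r.Log.WithValues("Request.Namespace", req.Namespace, "Request.Name", req.Name)

	// Later calls can therefore stay short:
	r.Log.Info("Fetching PDB", "pdbName", "example-cluster-ingest-pdb")

	return ctrl.Result{}, nil
}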
Thanks for the info. Actually, I was unsure about the log message structure and was going to ask about it. I will keep it simple as r.Log.Info("Fetching PDB", "pdbName", pdbName)
if k8serrors.IsNotFound(err) {
	r.Log.Info("PDB not found for node pool, proceeding without PDB check",
		"pdb", pdbName,
		"namespace", hnp.GetNamespace(),
The same thing is true here: you don't need to explicitly add namespace to the log events, since the logger already includes this in all events.
Same as above
@@ -1922,6 +1928,64 @@ func (r *HumioClusterReconciler) ensureMismatchedPodsAreDeleted(ctx context.Cont

	podsForDeletion := desiredLifecycleState.podsToBeReplaced

	pdbSpec := hnp.GetPodDisruptionBudget()
	r.Log.Info("Entering PDB enforcement check", "nodePool", hnp.GetNodePoolName(), "pdbSpec", pdbSpec, "pdbSpec.Enabled", pdbSpec != nil && pdbSpec.Enabled)
As far as I can tell, we don't really need to add anything to ensureMismatchedPodsAreDeleted, since ensureMismatchedPodsAreDeleted is handling the case of pod replacements due to config changes and version changes/upgrades.
The case for adding PDBs is more about ensuring components "outside the humio-operator", e.g. a k8s worker node drain, won't bring more pods/instances down than the configured amount. Config changes and version changes/upgrades are already controlled through the configurable update strategy (see https://github.com/humio/humio-operator/blob/master/api/v1alpha1/humiocluster_types.go#L302), and this update strategy logic is already included in the current master version of ensureMismatchedPodsAreDeleted.
Let me know if I missed something. Thank you.
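To illustrate the distinction, here is a hedged sketch (names, labels, and values are assumptions, not the PR's actual code) of the kind of object this feature manages: a PodDisruptionBudget that limits how many node-pool pods a voluntary disruption such as a node drain may evict, independently of the operator's own update strategy.
package controllers

import (
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/intstr"
)

func examplePDB() *policyv1.PodDisruptionBudget {
	minAvailable := intstr.FromInt(2)
	return &policyv1.PodDisruptionBudget{
		ObjectMeta: metav1.ObjectMeta{
			Name:      "example-cluster-ingest-pdb",
			Namespace: "logging",
		},
		Spec: policyv1.PodDisruptionBudgetSpec{
			// Keep at least two pods of this node pool running during voluntary disruptions.
			MinAvailable: &minAvailable,
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"app.kubernetes.io/instance": "example-cluster"},
			},
		},
	}
}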
Makes total sense as you explained it. What I understand from what you said is that ensureMismatchedPodsAreDeleted doesn't require explicit PDB handling because the function already coordinates pod replacements using:
- The cluster's update strategy (rolling updates)
- Max unavailable pods calculation
- Readiness checks
- Version/config mismatch detection
And PDBs are primarily for external Kubernetes-level disruptions, like node drains or maintenance, rather than operator-controlled replacements.
That said, I have removed the PDB section from ensureMismatchedPodsAreDeleted.
if !hc.DeletionTimestamp.IsZero() {
	if err := r.handlePDBFinalizers(ctx, hc); err != nil {
		return r.updateStatus(ctx, r.Client.Status(), hc, statusOptions().
			withMessage(fmt.Sprintf("failed to handle PDB finalizers: %s", err)))
	}
}
I believe handlePDBFinalizers can be removed, given that all it does is remove the finalizer.
As long as ownerReferences on the PDB object is configured to point at the HumioCluster object, the PDBs are garbage collected once the HumioCluster object is deleted, see https://kubernetes.io/docs/concepts/architecture/garbage-collection/#:~:text=Cascading%20deletion,a%20process%20called%20cascading%20deletion.
Thanks for correcting me. Conditional statement removed.
if err := r.cleanupOrphanedPDBs(ctx, hc, &humioNodePools); err != nil {
	return r.updateStatus(ctx, r.Client.Status(), hc, statusOptions().
		withMessage(err.Error()))
}
The same comment applies here as in https://github.com/humio/humio-operator/pull/900/files#r1930459818
PDB objects should already be garbage collected by k8s out of the box, as long as owner references are properly set up.
I wanted to be extra careful here. Deleted the conditional statement above.
api/v1alpha1/humiocluster_types.go
Outdated
type HumioPodDisruptionBudgetSpec struct {
	// +kubebuilder:validation:Type=string
	// +kubebuilder:validation:Format=int-or-string
	// +kubebuilder:validation:Immutable
What is the idea behind making this Immutable? I don't see why we would not allow users to change the configs after initially configuring.
I have marked those fields immutable to prevent runtime changes that could break the cluster during updates. However, if the operator can safely handle PDB updates through proper requeuing and rolling update coordination as you previously said, then we could safely remove the flag.
I am removing the flag "kubebuilder:validation:Immutable".
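For reference, a hedged sketch of what such an int-or-string field can look like once the Immutable marker is dropped (the struct and field names here are assumptions, not necessarily the PR's final shape): the type/format markers keep the int-or-string validation, while the field itself stays mutable like the rest of the spec.
package v1alpha1

import "k8s.io/apimachinery/pkg/util/intstr"

type ExamplePodDisruptionBudgetSpec struct {
	// Enabled toggles creation of a PodDisruptionBudget for the node pool.
	Enabled bool `json:"enabled,omitempty"`

	// MinAvailable is the minimum number (or percentage) of pods that must
	// remain available during voluntary disruptions.
	// +kubebuilder:validation:Type=string
	// +kubebuilder:validation:Format=int-or-string
	MinAvailable *intstr.IntOrString `json:"minAvailable,omitempty"`
}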
api/v1alpha1/humiocluster_types.go
Outdated
// +kubebuilder:validation:Type=string
// +kubebuilder:validation:Format=int-or-string
// +kubebuilder:validation:Immutable
What is the idea behind making this Immutable? I don't see why we would not allow users to change the configs after initially configuring.
Same answer as above
desiredPDB, err := r.constructPDB(hc, hnp, pdbSpec)
if err != nil {
	r.Log.Error(err, "failed to construct PDB", "pdbName", hnp.GetPodDisruptionBudgetName(), "namespace", hnp.GetNamespace())
One more place where you could skip adding the namespace yourself due to the logger already including this in all events.
Removed "namespace" from the Log statement.
Labels:          kubernetes.LabelsForHumio(hc.Name), // Add node pool name label if needed
Finalizers:      []string{HumioProtectionFinalizer},
OwnerReferences: []metav1.OwnerReference{
	*metav1.NewControllerRef(hc, humiov1alpha1.GroupVersion.WithKind("HumioCluster")),
Interesting. I wasn't aware of this being a valid way to configure the owner reference.
In other places, we use the controllerutil.SetControllerReference() function to set the ownerReference. Here's an example of this:
if err := controllerutil.SetControllerReference(hc, hbt, r.Scheme()); err != nil {
Do you know which one of these would be the better one to use? Perhaps we should be replacing the use of controllerutil.SetControllerReference() with metav1.NewControllerRef().
Thanks for bringing this up. Had a bit of reflection on this comment. Both approaches are valid but serve different purposes:
controllerutil.SetControllerReference() is preferred for most cases because it:
- Automatically sets the GVK (GroupVersionKind)
- Handles cross-namespace ownership prevention
- Provides safety checks for CRD scope compatibility
- Maintains consistency with controller-runtime patterns
metav1.NewControllerRef() is used here because:
- The PDB is explicitly namespaced with the cluster
- Direct control over owner reference creation was needed
- It avoids potential issues with automatic GVK resolution in some contexts
In this case, using metav1.NewControllerRef() directly ensured the owner reference would always have:
apiVersion: core.humio.com/v1alpha1
kind: HumioCluster
even if the type wasn't fully registered in the scheme. However, since we properly register the types in humiocluster_types.go, we can safely use controllerutil.SetControllerReference(), which provides better validation.
I would say that using controllerutil.SetControllerReference() would be the better choice really. I will switch the PDB-related code to use controllerutil.SetControllerReference()
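To summarize the two options side by side, here is a hedged sketch (the import paths, surrounding types, and scheme handling are simplified assumptions; only the two calls themselves are taken from the discussion above):
package controllers

import (
	humiov1alpha1 "github.com/humio/humio-operator/api/v1alpha1"
	policyv1 "k8s.io/api/policy/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// Manual construction: the GVK is spelled out by hand and applied directly.
func ownerRefManually(hc *humiov1alpha1.HumioCluster, pdb *policyv1.PodDisruptionBudget) {
	pdb.OwnerReferences = []metav1.OwnerReference{
		*metav1.NewControllerRef(hc, humiov1alpha1.GroupVersion.WithKind("HumioCluster")),
	}
}

// Via controller-runtime: the GVK is resolved from the scheme, and the helper
// also guards against an already-set controller reference and scope mismatches.
func ownerRefViaHelper(hc *humiov1alpha1.HumioCluster, pdb *policyv1.PodDisruptionBudget, scheme *runtime.Scheme) error {
	return controllerutil.SetControllerReference(hc, pdb, scheme)
}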
I see. Thank you a lot for the explanation.
// SemanticPDBsEqual compares two PodDisruptionBudgets and returns true if they are equal
func SemanticPDBsEqual(desired *policyv1.PodDisruptionBudget, current *policyv1.PodDisruptionBudget) bool {
	if !equality.Semantic.DeepEqual(desired.Spec.MinAvailable, current.Spec.MinAvailable) {
Cool. Wasn't aware of this equality package.
I see we have a total of 10 references to reflect.DeepEqual() in various places in the code base on the master branch, and maybe we should consider refactoring those to use equality.Semantic.DeepEqual() instead.
We also have 91 places on master where we use cmp.Diff(). Not all of those actually need the diff, so perhaps we could also replace and update those with equality.Semantic.DeepEqual().
I don't expect this to be incorporated in this PR though, just wanted to call it out as a possible cleanup we may consider later, as I wasn't aware of this equality package.
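As a small standalone illustration of the difference (example code, not taken from this PR): equality.Semantic.DeepEqual treats semantically identical Kubernetes quantities as equal, whereas reflect.DeepEqual compares their internal representation.
package main

import (
	"fmt"
	"reflect"

	"k8s.io/apimachinery/pkg/api/equality"
	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	a := resource.MustParse("1")
	b := resource.MustParse("1000m")

	fmt.Println(reflect.DeepEqual(a, b))           // false: string formats and scales differ
	fmt.Println(equality.Semantic.DeepEqual(a, b)) // true: both represent the same quantity
}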
I also would like to tackle those changes in a different PR. Thanks for pointing that out.
…ogging. A few code refactors
This is a new implementation for Pod Disruption Budget. This implements #104.