
Add NodePool support to Cluster resource #188

Merged: 35 commits merged into main from jb/nodepools on Nov 15, 2024

Conversation

birdayz (Contributor) commented Aug 15, 2024

Operator v1: support for multiple Node Pools

Overall design notes

  • Everything is driven by spec.nodePools. The default node pool, driven by the existing fields spec.replicas, spec.storage, etc., is still fully supported. It is automatically injected into the internal data structures as a node pool named "default". This way, the code needs (almost) no special treatment for the default node pool but still supports it in full (see the sketch after this list).
  • Deleted node pools, i.e. node pools removed from spec.nodePools, are handled by detecting their StatefulSet and injecting a synthetic NodePoolSpec into the internal data structures.
  • Replaced AdminAPIFactory with a factory that works with pods instead of ordinals, as we need to support pods with different prefixes (node pools).
  • Reduced cross-node-pool dependencies as much as possible and tried to avoid code that requires the overall replica count. This is not possible everywhere, since, for example, we support only one pod in decommission at a time.
  • Cloud storage percentage handling has been reworked. We no longer want to set the bytes-specific cluster property; instead, only the percentage-based property is used. This avoids problems with multiple node pools: the bytes-based value changes drastically when disk size changes (for example, when moving between different cloud tiers), whereas the percentage-based value stays mostly stable (typically exactly 15%) and will not cause immediate churn on the old node pool if disk size goes down.
  • Previously, if a broker ID was marked for decommission (status.decommissionBrokerID) but the pod for that broker ID was eventually not found, the decommission was flagged as done and the status field cleared. However, this is not safe: in a race condition the broker may still get registered while no longer being decommissioned, making it a ghost broker.
  • Made the feature gate EmptySeedStartCluster mandatory. Clusters <22.3.0 are not supported anymore. This significantly reduces complexity for multi-node-pool support: we would have to start the first node on initial startup and wait for it, but with multiple node pools, all node pools would have to coordinate this. Since such old versions are no longer supported anyway, this support is removed to simplify the code.
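
A minimal sketch of the default-pool injection, using simplified stand-in types rather than the operator's actual API (only the shape of the idea, not the real implementation):

package nodepools

// Simplified stand-in types; the real ClusterSpec / NodePoolSpec carry many
// more fields (storage, resources, etc.).
type NodePoolSpec struct {
    Name     string
    Replicas int32
}

type ClusterSpec struct {
    Replicas  int32
    NodePools []NodePoolSpec
}

// DefaultNodePoolName is the synthetic name given to the pool driven by the
// legacy top-level fields (spec.replicas, spec.storage, ...).
const DefaultNodePoolName = "default"

// getNodePoolsFromSpec returns all pools declared in the spec, with the
// legacy fields injected as a pool named "default", so the rest of the code
// needs no special casing for it.
func getNodePoolsFromSpec(spec ClusterSpec) []NodePoolSpec {
    pools := make([]NodePoolSpec, 0, len(spec.NodePools)+1)
    pools = append(pools, NodePoolSpec{
        Name:     DefaultNodePoolName,
        Replicas: spec.Replicas,
    })
    return append(pools, spec.NodePools...)
}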

Not directly related changes, made to reduce test flakiness:

  • Changed tests to require fewer resources (down from 1G/1 CPU). Some tests run longer, and if two such tests run concurrently (we have a concurrency limit of 2), one of them sometimes gets starved on CPU/memory and fails to schedule until timeout.
  • Bumped the readiness probe timeout to 5s (none was configured, so the default of 1s applied). This helped make the lower resource allocations work. It is also rarely a problem in production, so IMHO it is a no-brainer to bump it a little (see the sketch after this list).
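
For reference, the probe change is only the timeout; a minimal sketch with the corev1 types (handler and other values omitted, as they are unchanged):

import corev1 "k8s.io/api/core/v1"

// Readiness probe with an explicit 5s timeout; previously TimeoutSeconds was
// unset, so the Kubernetes default of 1s applied.
var readinessProbe = &corev1.Probe{
    TimeoutSeconds: 5,
    // ProbeHandler, InitialDelaySeconds, etc. omitted; only the timeout changed.
}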

CLAassistant commented Aug 15, 2024

CLA assistant check
All committers have signed the CLA.

birdayz force-pushed the jb/nodepools branch 5 times, most recently from b4b88ea to 386ec21 on August 20, 2024 17:54
birdayz force-pushed the jb/nodepools branch 3 times, most recently from aa676ba to 50b59d3 on August 26, 2024 13:56
birdayz force-pushed the jb/nodepools branch 7 times, most recently from 55a961b to 19a6c99 on October 9, 2024 07:34
birdayz marked this pull request as ready for review on October 9, 2024 08:13
birdayz requested a review from andrewstucki as a code owner on October 9, 2024 08:13
birdayz changed the title from "[WIP] Add NodePool support to Cluster resource" to "[DO NOT MERGE YET] Add NodePool support to Cluster resource" on Oct 9, 2024
birdayz (Contributor, Author) commented Nov 7, 2024

kind ping for review @chrisseto

// It returns 1 when not initialized (as fresh clusters start from 1 replica)
func (r *Cluster) GetCurrentReplicas() int32 {
if r == nil {
func (r *Cluster) SumNodePoolReplicas() int32 {
Contributor:

nit: This doesn't indicate if it's returning the current replicas or desired replicas. Please add a comment or rename the method.

Contributor Author:

Renamed to GetDesiredReplicas. It looks like it in the diff, but this function is not a replacement for GetCurrentReplicas; this can be seen in metric_controller.go, where a reference to Spec.Replicas is replaced with a call to this function.
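
For context, a rough sketch of what GetDesiredReplicas amounts to, reusing the simplified stand-in types from the sketch in the description (the real method lives on the operator's Cluster API type and may differ in detail):

type Cluster struct {
    Spec ClusterSpec
}

// GetDesiredReplicas (the rename of SumNodePoolReplicas) sums the replicas
// requested across all node pools, including the injected "default" pool.
// It is not a replacement for GetCurrentReplicas, which reflects observed
// rather than desired state.
func (r *Cluster) GetDesiredReplicas() int32 {
    if r == nil {
        return 0
    }
    var total int32
    for _, np := range getNodePoolsFromSpec(r.Spec) {
        total += np.Replicas
    }
    return total
}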

@@ -1422,3 +1419,41 @@ func (r *Cluster) GetDecommissionBrokerID() *int32 {
func (r *Cluster) SetDecommissionBrokerID(id *int32) {
r.Status.DecommissioningNode = id
}

// getNodePoolsFromSpec returns the NodePools defined in the spec.
// This contains the primary NodePool (driven by spec.replicas for example), and the
Contributor:

Suggested change
// This contains the primary NodePool (driven by spec.replicas for example), and the
// This contains the default NodePool (driven by spec.replicas for example), and the

Not sure what the terminology is but let's be consistent. If the primary pool has the name "default", let's rename the constant otherwise update this comment.

Contributor Author:

Right, this is an oversight; "default NodePool" is the correct terminology. Updated the comment.

operator/pkg/nodepools/pools.go (resolved review thread)
"k8s.io/apimachinery/pkg/api/resource"
"k8s.io/utils/ptr"
"sigs.k8s.io/controller-runtime/pkg/client"

Contributor:

nit: extra newline (we'll need to reconfigure our linters).

operator/pkg/nodepools/pools.go (resolved review thread)
operator/pkg/resources/statefulset_update.go (resolved review thread)
Comment on lines +12 to +17
- role: worker
image: kindest/node:v1.29.8@sha256:d46b7aa29567e93b27f7531d258c372e829d7224b25e3fc6ffdefed12476d3aa
- role: worker
image: kindest/node:v1.29.8@sha256:d46b7aa29567e93b27f7531d258c372e829d7224b25e3fc6ffdefed12476d3aa
- role: worker
image: kindest/node:v1.29.8@sha256:d46b7aa29567e93b27f7531d258c372e829d7224b25e3fc6ffdefed12476d3aa
Contributor:

+1 to Rafal. This isn't super sustainable. Can we not just schedule multiple brokers onto a single node for the purpose of testing?

@@ -17,11 +17,14 @@ import (
"io"
"time"

corev1 "k8s.io/api/core/v1"

Contributor:

nit: extra newline

Contributor Author:

fixed

// NodePoolKey is used to document the node pool associated with the StatefulSet.
NodePoolKey = "cluster.redpanda.com/nodepool"

// PodLabelNodeIDKey
Contributor:

nit: incomplete comment. Seems like PodNodeIDKey would be more consistent with the naming convention here? Label is already implied by the package.

Contributor Author:

done

@@ -30,6 +30,11 @@ const (
PartOfKey = "app.kubernetes.io/part-of"
// The tool being used to manage the operation of an application
ManagedByKey = "app.kubernetes.io/managed-by"
// NodePoolKey is used to document the node pool associated with the StatefulSet.
Contributor:

Suggested change
// NodePoolKey is used to document the node pool associated with the StatefulSet.
// NodePoolKey is used to denote the node pool associated with the StatefulSet.

Struggling to explain why "document" isn't the right word here. I think "document" implies a separation between the information and the thing being described. A label on something wouldn't document something about itself but rather denote something about itself.

Contributor Author:

updated

chrisseto (Contributor):

Thanks for the ping :)

chrisseto changed the title from "[DO NOT MERGE YET] Add NodePool support to Cluster resource" to "Add NodePool support to Cluster resource" on Nov 7, 2024
birdayz (Contributor, Author) commented Nov 8, 2024

Thanks for the review! Addressed most of the comments.
Ready for the next round.

Mostly, these are open and awaiting further feedback:

Also waiting for tests to finish; there is some problem with flakiness (here, but also in main).

- Change param to non-pointer, do conversion from pointer at the
  caller's site, where they already checked for nil
- Use loop to simplify
- The term Label is obsolete, as it's in the labels package.
- Update comment
birdayz (Contributor, Author) commented Nov 12, 2024

@chrisseto kind ping for the next round. It would be really important to bring this over the finish line this week!

birdayz force-pushed the jb/nodepools branch 3 times, most recently from ab001c3 to 9e08973 on November 13, 2024 16:08
chrisseto (Contributor) left a comment

Approving so I'm not in a blocking path. Some refactoring of GetNodePools before merging would be greatly appreciated.

I can't seem to find the comment again but I'm still curious about the management of the seed servers list and why it's populated from Status instead of the Spec.

Before merging could you either point me to the answer or comment again?

})
}

// Also add "virtual NodePools" based on StatefulSets found.
Contributor:

Ah, wow, I really misread this function the first time around. I think what's really throwing me for a loop (aside from using side-by-side view in GitHub) is the inconsistency in how the loops inside this loop break out, and the sheer number of them.

Could we pick a single method of iterating and short-circuiting, be it labeled breaks, slices.Contains, or a helper findFunc[T any]([]T, func(T) bool) (*T, bool) function?
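
A generic helper along these lines might look like the following (a sketch of the suggestion, not what the PR ended up using):

// findFunc returns a pointer to the first element of items for which match
// returns true, and a flag indicating whether such an element was found.
func findFunc[T any](items []T, match func(T) bool) (*T, bool) {
    for i := range items {
        if match(items[i]) {
            return &items[i], true
        }
    }
    return nil, false
}

Call sites could then short-circuit with a single if statement instead of nested labeled breaks.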

operator/pkg/nodepools/pools.go (resolved review thread)
operator/pkg/nodepools/pools.go (resolved review thread)
for i := range stsList.Items {
sts := stsList.Items[i]

for _, ownerRef := range sts.OwnerReferences {
Contributor:

I find this check and the later check for the existence of a redpanda container a bit odd. We're already using a label selector to filter our list of Sts's; is there a reason we're performing these extra checks? Are they checking for different things? If they're not, why does one error but the other does not?

I like to avoid "anxious" code when possible as it creates an ethical dilemma when it comes time to refactor. I'll have all the above questions with potentially no one to answer them nor test cases to validate them.

If these checks are in place for a reason, document what exactly they're checking for and ideally the situations in which they might trigger.

If these checks are just anxiety or paranoia that you can't shake, group them together and label them as such. A simple // Paranoid check to make sure we don't delete other STSs can save so much head-aching in the future.

Contributor Author (birdayz), Nov 14, 2024:

  1. Ownerref check

I replaced it with strings.ContainsFunc and improved the comment.

  2. Redpanda container check

There is no guarantee that a redpanda container exists. It's unlikely, but the STS could simply not have it for some reason, so I check for it and error out. It's not specifically paranoid; the container is required to build the NodePoolSpec of that deleted node pool (to extract resource requirements from it). I'll keep that unchanged.
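
Roughly what that check amounts to (an illustrative sketch; the actual function name and error message differ, and the container name "redpanda" is assumed):

import (
    "fmt"

    appsv1 "k8s.io/api/apps/v1"
    corev1 "k8s.io/api/core/v1"
)

// redpandaContainer looks up the redpanda container in the StatefulSet's pod
// template; it is needed to reconstruct the resource requirements of a
// deleted node pool, so its absence is an error rather than something to
// silently skip.
func redpandaContainer(sts *appsv1.StatefulSet) (*corev1.Container, error) {
    for i := range sts.Spec.Template.Spec.Containers {
        c := &sts.Spec.Template.Spec.Containers[i]
        if c.Name == "redpanda" {
            return c, nil
        }
    }
    return nil, fmt.Errorf("statefulset %s has no redpanda container", sts.Name)
}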

birdayz (Contributor, Author) commented Nov 14, 2024

> I can't seem to find the comment again but I'm still curious about the management of the seed servers list and why it's populated from Status instead of the Spec.

Based on your comment, I refactored it; it is no longer based on status.
See commit 7fce0e8.

It used to use status because the spec does not contain deleted node pools. However, a helper function was later added that always returns all node pools, including the "virtual pools" built from the deleted ones. Now we're using that, and status is not required anymore.
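
A hedged sketch of the resulting shape, again with the simplified stand-in types (helper names and the host format are illustrative assumptions, not the operator's actual naming):

import "fmt"

// allNodePools merges the pools declared in the spec with the "virtual"
// pools reconstructed from StatefulSets whose pool was removed from
// spec.nodePools.
func allNodePools(spec ClusterSpec, virtualPools []NodePoolSpec) []NodePoolSpec {
    return append(getNodePoolsFromSpec(spec), virtualPools...)
}

// seedHosts derives one host entry per replica across all pools. The naming
// scheme here is purely illustrative; the point is that seed servers are
// computed from the full pool list (declared plus virtual) instead of from
// status.
func seedHosts(clusterName string, pools []NodePoolSpec) []string {
    var hosts []string
    for _, p := range pools {
        for ord := int32(0); ord < p.Replicas; ord++ {
            hosts = append(hosts, fmt.Sprintf("%s-%s-%d", clusterName, p.Name, ord))
        }
    }
    return hosts
}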

birdayz merged commit 385cee7 into main on Nov 15, 2024
5 checks passed
RafalKorepta deleted the jb/nodepools branch on December 2, 2024 13:25