This repository has been archived by the owner on Apr 4, 2023. It is now read-only.

Delete left over pilots #328

Closed. Wants to merge 1 commit from the 322-pilot-sync branch.

Conversation

@wallrj (Member) commented Apr 11, 2018

Contrary to the original idea of creating pilots during scale-out and deleting them during scale-in,
I've instead extended the existing pilot sync method to delete pilots with an index higher than the Replicas value of the corresponding StatefulSet.

Fixes: #322

Release note:

NONE

@jetstack-bot (Collaborator) commented:
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To fully approve this pull request, please assign additional approvers.
We suggest the following additional approver: kragniz

Assign the PR to them by writing /assign @kragniz in a comment when ready.

The full list of commands accepted by this bot can be found here.

The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@wallrj (Member, Author) commented Apr 12, 2018

/retest

@munnerz (Contributor) commented Apr 17, 2018

I'm not sure we should be using currentReplicas here:

FIELD: currentReplicas <integer>

DESCRIPTION:
     currentReplicas is the number of Pods created by the StatefulSet controller
     from the StatefulSet version indicated by currentRevision.

During an update of the statefulset, this number will represent the total number of out of date replicas (i.e. it will gradually reduce from N to 0 during the upgrade, and then once the upgrade is complete will reset back to N).

FWIW, here are the various StatefulSet.status fields:

FIELDS:
   collisionCount	<integer>
     collisionCount is the count of hash collisions for the StatefulSet. The
     StatefulSet controller uses this field as a collision avoidance mechanism
     when it needs to create the name for the newest ControllerRevision.

   conditions	<[]Object>
     Represents the latest available observations of a statefulset's current
     state.

   currentReplicas	<integer>
     currentReplicas is the number of Pods created by the StatefulSet controller
     from the StatefulSet version indicated by currentRevision.

   currentRevision	<string>
     currentRevision, if not empty, indicates the version of the StatefulSet
     used to generate Pods in the sequence [0,currentReplicas).

   observedGeneration	<integer>
     observedGeneration is the most recent generation observed for this
     StatefulSet. It corresponds to the StatefulSet's generation, which is
     updated on mutation by the API Server.

   readyReplicas	<integer>
     readyReplicas is the number of Pods created by the StatefulSet controller
     that have a Ready Condition.

   replicas	<integer> -required-
     replicas is the number of Pods created by the StatefulSet controller.

   updateRevision	<string>
     updateRevision, if not empty, indicates the version of the StatefulSet used
     to generate Pods in the sequence [replicas-updatedReplicas,replicas)

   updatedReplicas	<integer>
     updatedReplicas is the number of Pods created by the StatefulSet controller
     from the StatefulSet version indicated by updateRevision.

I'd guess either replicas, or currentReplicas+updatedReplicas is what we need to use?

We could alternatively always base it on the statefulset.spec.replicas field - as once the StatefulSet has been scaled down, the cluster itself should already be in a state whereby it can scale down (i.e. all data evacuated from the node). Perhaps I've missed an edge case here though?
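
For illustration, a minimal sketch of the spec.replicas-based option (the helper name is invented here and it assumes the apps/v1 StatefulSet type; it is not code from this PR):

import appsv1 "k8s.io/api/apps/v1"

// desiredPilotCount reads the desired replica count from the StatefulSet
// spec rather than from any of the status counters above; spec.replicas is
// a *int32 which Kubernetes defaults to 1 when it is nil.
func desiredPilotCount(set *appsv1.StatefulSet) int32 {
    if set.Spec.Replicas != nil {
        return *set.Spec.Replicas
    }
    return 1
}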

@munnerz (Contributor) left a comment:

Sorry - I'd left this review sitting around in my browser cache/github somewhere!

-return err
+return errors.Wrap(err, "unable to create cluster nodepools selector")
 }
 pods, err := c.pods.Pods(cluster.Namespace).List(selector)
Contributor:

Why do we need a list of pods if we're just creating a Pilot per StatefulSet replica? Can we not infer the name of each pod from the name of the StatefulSet and its index?
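
For reference, StatefulSet pods follow the fixed <statefulset-name>-<ordinal> naming scheme, so the expected names can be derived without listing pods. A small sketch (the helper is hypothetical, not part of this PR):

import "fmt"

// podNamesForStatefulSet derives pod names purely from the StatefulSet name
// and its replica count, relying on the <name>-<ordinal> convention.
func podNamesForStatefulSet(setName string, replicas int32) []string {
    names := make([]string, 0, replicas)
    for i := int32(0); i < replicas; i++ {
        names = append(names, fmt.Sprintf("%s-%d", setName, i))
    }
    return names
}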

err, "unable to delete pilot %s/%s", pilot.Namespace, pilot.Name,
)
}
}
Contributor:

I'm confused by this function.

First, we get the statefulset. Then we list all pods managed by that set and create Pilots for each of them.
We then list all Pilots for the StatefulSet, and iterate over them, determining whether they should or shouldn't exist and deleting them.

Instead, we could create a function getExpectedPilotsForStatefulSet which could return a []string of Pilots that are expected to exist.

We could then get a list of all existing pilots for the StatefulSet, 'diff' the two lists into a 'to delete' and 'to create', and then go ahead and delete/create the Pilots.

I've done a similar thing in cert-manager: https://github.com/jetstack/cert-manager/blob/master/pkg/issuer/acme/prepare.go#L107-L108

Here, we first run a function that creates a list of challenges that should exist. We then pass that list to a cleanup function, which deletes any challenges that do exist but are not expected to exist anymore.

The later code then goes ahead and creates/realises those challenges (i.e. the 'pilot creation' part).
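
A rough sketch of that shape, with invented names and assuming pilots are keyed by name (not the PR's actual code):

import "k8s.io/apimachinery/pkg/util/sets"

// reconcilePilotNames diffs the pilot names that should exist against the
// pilots that actually exist, yielding the names to create and to delete.
func reconcilePilotNames(expected, actual []string) (toCreate, toDelete sets.String) {
    expectedSet := sets.NewString(expected...)
    actualSet := sets.NewString(actual...)
    // Expected but missing -> create; present but no longer expected -> delete.
    return expectedSet.Difference(actualSet), actualSet.Difference(expectedSet)
}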

Member Author:

You're right, I'd overcomplicated it.
I was trying to imagine (and handle) a case where the StatefulSet had been scaled in, the pods hadn't yet been deleted, but the pilot had been accidentally deleted.
I wanted to re-create the pilot in that case, so that the pod (C* node) could discover its desired configuration and decommission and remove itself from the cluster.
But after discussing with @kragniz, I now realise that the statefulset.spec.replicas value doesn't change until the C* node / Pod / Pilot has been decommissioned and reported that fact via the pilot status.

I've re-implemented as you suggested, as a couple of lists (actual / expected pilots) and a couple of sets (pilotsToCreate / pilotsToDelete).

See what you think.

glog.V(4).Infof(
"Not deleting pilot %s/%s because its pod still exists.",
pilot.Namespace, pilot.Name,
)
Contributor:

Can you collapse these sorts of things into single lines? My own text editor does line wrapping that works for me, and this makes it harder to quickly scan code for what is going on. What is actually a log message looks like a block of code when written like this and indented.

Member Author:

Yeah, but my editor doesn't wrap these lines (nor does GitHub in its web-based diffs).
I find it much easier to read the code this way.

Pilots(cluster.Namespace).Delete(pilot.Name, &metav1.DeleteOptions{})
if err == nil {
glog.V(4).Infof("Deleted pilot %s/%s.", pilot.Namespace, pilot.Name)
} else {
Contributor:

An else after an error check is a pattern avoided in Go (and linters will warn you about it).

This could be rewritten to:

if err != nil {
    if k8sErrors.IsNotFound(err) {
        continue
    }
    return errors.Wrapf(err, "unable to delete pilot %s/%s", pilot.Namespace, pilot.Name)
}

glog.V(4).Infof("Deleted pilot %s/%s.", pilot.Namespace, pilot.Name)

Member Author:

Agreed. Done.

return nil
}

func parsePodIndex(podName string) (int, bool) {
Contributor:

Why does this return (int, bool)? (int, error) makes sense (parsing the string may fail), but I can't work out how the bool is used.

A comment to explain this function's behaviour would also help 😄

Contributor:

Ah, I see the bool means 'found'. An error would make more sense here, and would allow you to provide more context on why something failed.
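
A minimal sketch of the (int, error) variant being suggested (illustrative only; the navigator code may have looked different):

import (
    "fmt"
    "strconv"
    "strings"
)

// parsePodIndex extracts the trailing ordinal from a StatefulSet pod name
// such as "mypool-2", returning an error rather than a bool so the caller
// can report why parsing failed.
func parsePodIndex(podName string) (int, error) {
    i := strings.LastIndex(podName, "-")
    if i < 0 || i == len(podName)-1 {
        return 0, fmt.Errorf("pod name %q does not end with -<ordinal>", podName)
    }
    index, err := strconv.Atoi(podName[i+1:])
    if err != nil {
        return 0, fmt.Errorf("parsing ordinal of pod name %q: %v", podName, err)
    }
    return index, nil
}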

Member Author:

I removed this function; it's no longer needed.

"k8s.io/api/core/v1"

metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"

"github.com/kr/pretty"
Contributor:

Can you fix up grouping of imports here?

import (
//stdlib

//external pkgs

//navigator pkgs
)
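
For example, grouped in that order (the navigator import path below is assumed for illustration):

import (
    // stdlib
    "fmt"
    "time"

    // external pkgs
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/labels"

    // navigator pkgs (path assumed)
    "github.com/jetstack/navigator/pkg/apis/navigator/v1alpha1"
)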

Member Author:

Seems a bit nitpicky 😉
My editor isn't set up to automatically format the code that way.
If it's important, then I suggest we make it part of verify-lint.sh.

Member Author:

Ok.

But it still doesn't format the imports as we want them.

There's a discussion of this here: golang/go#20818

Also tried https://github.com/aristanetworks/goarista/tree/master/cmd/importsort, but that sorts them differently again.

I've fixed it by hand ☹️

cluster2pod1 := clusterPod(cluster2, "c2p1")
cluster2np1 := &cluster2.Spec.NodePools[0]
cluster2np1set := nodepool.StatefulSetForCluster(cluster2, cluster2np1)
cluster2np1pod0 := clusterPod(cluster2, cluster2np1set, 0)
Contributor:

It'd be great if we can avoid such a huge stack of objects declared here. It's really difficult when looking at a particular test case to see what's being tested (i.e. the inputs and outputs).

Not sure if there's an easy way to do this, but right now I am definitely struggling to understand all this.

I managed to get around this in cert-manager by accepting a pointer to the fixture itself as part of a 'PreFn': https://github.com/jetstack/cert-manager/blob/master/pkg/issuer/acme/http/ingress_test.go#L21-L54

If not, comments throughout all of this may help make this more readable 😄
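
A stripped-down sketch of that fixture/PreFn idea, with every name invented for illustration:

import (
    "testing"

    "k8s.io/apimachinery/pkg/runtime"
)

// testFixture holds the objects a case needs; each case's preFn mutates the
// fixture (e.g. adds pods or pilots) immediately before the sync under test
// runs, keeping per-case setup next to its expectations.
type testFixture struct {
    kubeObjects []runtime.Object
    navObjects  []runtime.Object
}

type testCase struct {
    name     string
    preFn    func(t *testing.T, f *testFixture)
    expectFn func(t *testing.T, f *testFixture)
}

func runTestCases(t *testing.T, cases []testCase) {
    for _, tc := range cases {
        t.Run(tc.name, func(t *testing.T) {
            f := &testFixture{}
            if tc.preFn != nil {
                tc.preFn(t, f)
            }
            // ... run the sync under test against f here ...
            if tc.expectFn != nil {
                tc.expectFn(t, f)
            }
        })
    }
}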

Member Author:

With the revised implementation I found I could remove many of these.
I tried to make the variable names self-explanatory.
Not sure what more I can add with comments.

v1alpha1.CassandraNodePoolNameLabel,
selection.Exists,
nil,
)
Contributor:

Flatten to a single line

Contributor:

Also why is this being set to nil? Doesn't make sense to me? 🤔

Member Author:

Exists selections cannot have a value.
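
For context, this is roughly how an Exists requirement is built with the apimachinery labels package (illustrative, not the exact code under review):

import (
    "k8s.io/apimachinery/pkg/labels"
    "k8s.io/apimachinery/pkg/selection"
)

// existsSelector matches any object carrying the given label key, whatever
// its value; Exists (and DoesNotExist) requirements take no values, hence
// the nil slice.
func existsSelector(labelKey string) (labels.Selector, error) {
    req, err := labels.NewRequirement(labelKey, selection.Exists, nil)
    if err != nil {
        return nil, err
    }
    return labels.NewSelector().Add(*req), nil
}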

@@ -66,6 +68,35 @@ func SelectorForCluster(c *v1alpha1.CassandraCluster) (labels.Selector, error) {
return labels.NewSelector().Add(*clusterNameReq), nil
}

func SelectorForClusterNodePools(c *v1alpha1.CassandraCluster, nodePoolNames ...string) (labels.Selector, error) {
Contributor:

A brief comment would be great.

Member Author:

Done.

For each nodepool StatefulSet:
* Create a number of pilots to match the number of pods that will be created for the StatefulSet.
* Delete higher-index pilots which have been left behind after the StatefulSet has been scaled in.
* Do not delete a pilot if there is a pod with a matching name
  (that pod won't be able to decommission itself
   unless it can read its desired configuration from its pilot).
* Do not delete a pilot unless it is owned by the cluster that is being synchronised
  (this is not an expected state,
   but we don't want to delete anything unless it was created by us).

Fixes: jetstack#322
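
A condensed, hypothetical sketch of the deletion rule described in the commit message above (names invented; the ownership and pod checks are simplified to booleans):

// shouldDeletePilot applies the rules above: only delete a pilot whose
// ordinal is at or beyond the desired replica count, whose pod no longer
// exists, and which is owned by the cluster being synchronised.
func shouldDeletePilot(pilotIndex int, desiredReplicas int32, podExists, ownedByCluster bool) bool {
    if int32(pilotIndex) < desiredReplicas {
        return false // still within the StatefulSet's desired range
    }
    if podExists {
        return false // the pod still needs its pilot in order to decommission itself
    }
    return ownedByCluster // never delete anything we did not create
}
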
@wallrj (Member, Author) commented Apr 27, 2018

/retest

@munnerz (Contributor) left a comment:

It'd be good to see the API surface for this use annotations that aren't so Cassandra-specific. I think general creation of Pilots for pods should be fairly generic between implementations, else administrators will have to understand the behaviour of each controller's implementation when debugging problems.

Perhaps this could be refactored to have some form of watch on pods with a create-pilot label to trigger creation of a Pilot resource?

We can include all the existing create/delete logic, but have it triggered via watches on pods (i.e. its workqueue should be filled with Pod resources)

Then, each controller implementation (elasticsearch or cassandra) can just update the Pilot with any specific configuration that's required.

To do this, we'll need to somehow access the replicas field of the pod-owning resource (e.g. StatefulSet). If we expand beyond StatefulSets in future, the logic here can be extended to use either the 'scale' subresource or some other kind of polymorphic client.

  • If a pod is created with the required label, a blank Pilot resource is 'ensured' (i.e. created if it does not exist)
  • If a Pilot is deleted, trigger a resync of the pod it was created for

This means we create Pilots when Pods are created, and we delete them when Pods are deleted (provided their index < the replicas count on the owner). We can then also consider adding a 5m delay or similar on deletion, if required (which would apply universally across controllers)
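
Very roughly, the generic flow being proposed might look like the sketch below (the label key, helper functions, and their signatures are all placeholders, not an agreed design):

import (
    corev1 "k8s.io/api/core/v1"
    k8sErrors "k8s.io/apimachinery/pkg/api/errors"
)

// ensurePilotForPod illustrates a Pod-driven pilot controller: any pod
// carrying the (hypothetical) create-pilot label gets a blank Pilot ensured
// for it, keyed by the pod's own name. getPilot and createPilot stand in for
// whatever lister/client the real controller would use.
func ensurePilotForPod(
    pod *corev1.Pod,
    getPilot func(namespace, name string) error,
    createPilot func(pod *corev1.Pod) error,
) error {
    if pod.Labels["navigator.jetstack.io/create-pilot"] != "true" {
        return nil // not a pod this controller manages
    }
    err := getPilot(pod.Namespace, pod.Name)
    if err == nil {
        return nil // pilot already exists
    }
    if !k8sErrors.IsNotFound(err) {
        return err
    }
    // Create a blank Pilot; the database-specific controllers (cassandra,
    // elasticsearch) later update it with their own configuration.
    return createPilot(pod)
}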

CassandraNodePoolNameLabel = "navigator.jetstack.io/cassandra-node-pool-name"
CassandraClusterNameLabel = "navigator.jetstack.io/cassandra-cluster-name"
CassandraNodePoolNameLabel = "navigator.jetstack.io/cassandra-node-pool-name"
CassandraNodePoolIndexLabel = "navigator.jetstack.io/cassandra-node-pool-index"
Contributor:

This appears to be set, but not actually used (at least as part of this PR).

It also could introduce some issues around generalising this code between controllers if we ever have a controller that doesn't use a StatefulSet.

Can we remove it for now in favour of discussion at a later date of storing the index vs the pod name on the Pilot?

wallrj added a commit to wallrj/navigator that referenced this pull request May 15, 2018
* Watches for Pilots and deletes them if there isn't a corresponding Pod.
* Watches for Pods and creates a Pilot if there isn't already one.
* Creates blank pilots which are later updated by database specific controllers.

Fixes: jetstack#328
wallrj added a commit to wallrj/navigator that referenced this pull request Jun 7, 2018
* Watches for Pods and creates a Pilot if there isn't already one.
* Pilots have a pod as their controller reference.
* Garbage collector will delete the pilot when it deletes the pod.
* Pilots are later updated by database specific controllers.

Fixes: jetstack#328
@wallrj (Member, Author) commented Jun 7, 2018

Supplanted by #353

@wallrj wallrj closed this Jun 7, 2018
@wallrj wallrj deleted the 322-pilot-sync branch June 7, 2018 15:24