Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Annotate StrimziPodSets before rolling during CA key replacement #10876

Merged
merged 2 commits into from
Nov 25, 2024

Conversation

katheris
Copy link
Contributor

@katheris katheris commented Nov 25, 2024

Type of change

  • Enhancement / new feature

Description

This PR contains two related changes:

  1. Annotate StrimziPodSets before rolling during CA key replacement
  2. Use key generation annotation to track user managed key replacements

1
This change means that when the Pods are rolled for a new CA key the generation is updated immediately, rather than being done later by KafkaReconciler. This means if for some reason the rolling update stops half way through on the next reconciliation any Pods that were already rolled are not re-rolled.

2
Update CaReconciler to use the ca key generation on the Kafka pods to track when a user has incremented the key generation for user managed Cluster CA. We already use this mechanism when Strimzi is managing the Cluster CA to catch when the key was replaced in a previous reconciliation. This change means we use the same mechanism for both Strimzi managed and user managed CA.

This has the additional implication that when the user renewes the CA cert of a user managed Cluster CA there is only one rolling update (to use the new certificates), not two. So it removes an unnecessary rolling update in that scenario.

Checklist

Please go through this checklist and make sure all applicable tasks have been done

  • Make sure all tests pass
  • Try your changes from Pod inside your Kubernetes and OpenShift cluster, not just locally
  • Update CHANGELOG.md

Update CaReconciler to use the ca key generation on
the Kafka pods to track when a user has incremented the
key generation for user managed Cluster CA.

This has the additional implication that when the user
renewes the CA cert of a user managed Cluster CA there
is only one rolling update, not two.

Signed-off-by: Katherine Stanley <[email protected]>
@katheris katheris added this to the 0.45.0 milestone Nov 25, 2024
@scholzj
Copy link
Member

scholzj commented Nov 25, 2024

/azp run regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@katheris katheris requested a review from ppatierno November 25, 2024 13:22
return podRollReasons;
int clusterCaKeyGeneration = clusterCa.caKeyGeneration();
int podClusterCaKeyGeneration = Annotations.intAnnotation(pod, Ca.ANNO_STRIMZI_IO_CLUSTER_CA_KEY_GENERATION, clusterCaKeyGeneration);
if (clusterCaKeyGeneration == podClusterCaKeyGeneration) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity ... in this case we are just ignoring the podRollReasons. Did you check that it's empty or there is some code which is wrong somewhere which pass a non empty podRollReasons (so the pods should be rolled) but we decide to not roll them for, anyway, a good reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppatierno I'm not sure I fully follow the question, but will try my best to add more context.

If any of the pods don't have the correct cluster CA key generation annotation we call this method to rollKafka. However, we pass through to KafkaRoller all the pods, not just the pods that need to be rolled. So this code is basically skipping over the pods that are already up to date.

The alternative would be updating this class to build the list of pods that need rolling when we are checking the annotations in verifyClusterCaFullyTrustedAndUsed, however that would require a bigger refactor to this code so I chose this option instead. Also I think in general in Strimzi it's the convention to pass all nodes to KafkaRoller and then use this callback to specify which need to actually be rolled

Copy link
Contributor Author

@katheris katheris Nov 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could change this to have:

if (podRollReasons.getReasons().size() == 1 && podRollReasons.contains(RestartReason.CLUSTER_CA_CERT_KEY_REPLACED) && clusterCaKeyGeneration == podClusterCaKeyGeneration) {
  return RestartReasons.empty();
} else {
  return podRollReasons
}

but since the first two parts will always be true at the moment it seems maybe overkill

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah my question was mostly around ignoring podRollReasons in the first condition. If you think adding the additional check is overkilling, I would add a comment at least.

@scholzj
Copy link
Member

scholzj commented Nov 25, 2024

/azp run zookeeper-regression

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

Copy link
Member

@scholzj scholzj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks,

@scholzj scholzj merged commit 6ef088d into strimzi:main Nov 25, 2024
29 checks passed
OwenCorrigan76 pushed a commit to OwenCorrigan76/strimzi-kafka-operator that referenced this pull request Dec 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants