-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor EIP Operator #348
Conversation
a69caab
to
f659285
Compare
- transition eip/pod watchers to mostly adjust tags - update logic of eip controller to: 1. validate current claim or remove 2. detatch if attatched and unclaimed 3. find a pod/node to claim 4. attach to node or pod node 5. update status to refect current state
f659285
to
c8a9a95
Compare
a555827
to
e13d38a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get current status of EIP.
Get matching pods or nodes.
Sort list for consistency.
Choose single matching pod or node.
If associated, check if EIP is associated with matched pod or node. If not disassociate.
If not associated, associate with matched pod or node.
Update claim.
.map(|pod| { pod.metadata.name.clone().unwrap_or_default() }) | ||
.collect::<Vec<String>>() | ||
.join(",") | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be unreachable, as kubernetes does not allow multiple pods with the same name.
9e95f6c
to
34e5be5
Compare
@@ -20,9 +18,10 @@ skip = [ | |||
{ name = "hashbrown", version = "0.12.3" }, | |||
{ name = "hashbrown", version = "0.14.0" }, | |||
{ name = "nix", version = "0.26.4" }, | |||
{ name = "nix", version = "0.27.1" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we continue to add all versions of duplicate deps here? it makes it easier to track when things change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we can but cargo deny was yelling at me, maybe I need to adjust my configuration?
{ name = "ordered-float", version = "3.4.0" }, | ||
{ name = "fastrand", version = "2.0.1" }, | ||
{ name = "regex-automata", version = "0.4.6" }, | ||
{ name = "regex-syntax", version = "0.6.29" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i'm fairly sure we can get rid of these duplicates with some pretty easy selective downgrades, which i think would be preferable if possible
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's definitely possible, I spent like 2 hours fighting with versions and updating code after updates. I don't really care to hop down that rabbit hole again.
eip_operator/src/controller/eip.rs
Outdated
// get potential attachments | ||
let matched_pods: Vec<Pod> = if eip.selects_with_pod() { | ||
pod_api | ||
.list(&ListParams::default()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think this is true - manage=true means that the eip operator's pod operator should auto-create the eip, but if we just manually create an Eip resource, it should be able to attach to any pod, even without that label
eip_operator/src/controller/eip.rs
Outdated
} | ||
|
||
// Find new claim, it's possible it'll match the existing | ||
// association, in which case, we won't disassociate |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this comment still true? it looks like the disassociation logic is already handled separately above
let mut associatiate_with_node: Option<Node> = None; | ||
if status.claim.is_none() { | ||
match eip.spec.selector { | ||
EipSelector::Node { selector: _ } => match matched_nodes.len() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if matched_nodes.is_empty() {
maybe?
dfe117d
to
c77400b
Compare
); | ||
} | ||
(None, None) => { | ||
info!("Eip {} is correctly associated!", eip.name_unchecked()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
info!("Eip {} is correctly associated!", eip.name_unchecked()); | |
info!("Eip {} is correctly not associated!", eip.name_unchecked()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the node and ip are none then we don't want to modify our association or create a new association.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is correct that it is not associated. The phrasing of the log message makes it seem like it is associated and that that is correct, which isn't what we are trying to indicate. We are trying to indicate both that it is not associated, and that that is the correct state.
} | ||
} | ||
false | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, AWS is the source of truth for where things are associated, not the status field. The status field should only be a nice reference for the humans when looking at k9s, not part of the core decision logic.
This is especially true when you are making a clone of the status and making changes along the way, as it is easy to get confused about what the real state is vs the one stored on the EIP object, which hasn't been updated.
I'm pretty sure everywhere you use these functions, you have already described allocations, so you would not need to make any additional network calls to use the AWS values, rather than the status.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
matches_pod and matches_node, and are not used in the eip controller where we're using a cloned status... In the pod/node controller cases we don't actually have the AWS resource (except in the pod case to do the dns annotation in a very specific scenario). associated_with_node is used but only during disassociate which makes no sense to come after associate and should always use the eip objects status never a cloned status.
If we want to use AWS as the source of truth in the pod/node case we'll need to fetch every eip that matches the reconciled resource to check if we're associated with it. I don't believe that is a required change.
eip_operator/src/controller/pod.rs
Outdated
add_dns_target_annotation(&pod_api, name, &public_ip, allocation_id).await?; | ||
return Ok(None); | ||
} | ||
} else if let Some(eip) = eips.iter().find(|eip| eip.matches_pod_name(&pod.name_unchecked()) && !eip.attached()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why && !eip.attached()
? Don't we want it to re-reconcile if it is attached to the wrong pod?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we want to steal EIPs, in fact that's the entire reason we're in the predicament of needing this rewrite... there's probably an argument for pods that if one is matched and attached to something else we should take it, but that really only works because we only match on name for pods, if we matched on labels we wouldn't be able to make that assumption.
eip_operator/src/controller/pod.rs
Outdated
.ok_or(Error::MissingAddresses)? | ||
.swap_remove(0); | ||
let public_ip = eip_description.public_ip.ok_or(Error::MissingPublicIp)?; | ||
add_dns_target_annotation(&pod_api, name, &public_ip, allocation_id).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@alex-hunt-materialize
There is a previously unhandled case here where add_dns_target_annotation
only occurs when the EIP is added to a pod, if the eip no longer matches a the pod it is associated with we did not remove this annotation. My inclination is to let this one slip in this PR. I don't think it's likely we encounter this since we're not modifying EIPs this way and likely don't need to support pods much longer.
229c6b2
to
88560a2
Compare
Side note here: |
- remove claims
88560a2
to
e7b9df0
Compare
.ok_or(Error::MissingAddresses)? | ||
.swap_remove(0); | ||
let public_ip = eip_description.public_ip.ok_or(Error::MissingPublicIp)?; | ||
add_dns_target_annotation(&pod_api, name, &public_ip, allocation_id).await?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add this target annotation when we associate the EIP with the Pod? We could then check for the annotation when disassociating, and remove it if it matches the EIP address.
// we are not associated | ||
if let Some(eip) = eips | ||
.iter() | ||
.find(|eip| eip.matches_pod_name(&pod.name_unchecked()) && !eip.attached()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder about the !eip.attached()
part. EIPs with pod matchers may only match a single pod, so if it is attached, but not associated with this pod, that means it is incorrectly associated and should be re-reconciled.
Refactor EIP Operator to better handle multiple selector matches
I believe this change set represents the changes discussed here:
Concerns:
Testing:
This passed tests on the first test run which is either very surprising or an indicating that the things I broke our outside our local test scope. Either way I'd like to test this on a personal MZ stack, I'll reach out for some pointers and update this with results.
** update **
I've tested this on a personal stack.
I believe I've also handled the transition case to this new method.
To test this I patched the eip resource status to set claims to None, and ensured reconciliation of the EIP correctly maintained the association and only added the new claim value to the status.
Any happy to do any other recommended testing.