-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
policy: Serve EgressNetwork responses #13206
Conversation
Signed-off-by: Zahari Dichev <[email protected]>
@@ -33,7 +33,7 @@ impl Collector for Instrumented { | |||
None, | |||
MetricType::Gauge, | |||
)?; | |||
let service_infos = ConstGauge::new(this.service_info.len() as u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will come back and address metrics correctness in a follow-up if that is a non-blocker for now.
pub enum Kind { | ||
EgressNetwork { | ||
original_dst: SocketAddr, | ||
traffic_policy: TrafficPolicy, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm surprised to see TrafficPolicy here in the discover target. I think this means that the TrafficPolicy is looked up once at the beginning of the discovery lookup but then never updated. If the EgressNetwork resource is edited to change the TrafficPolicy, I don't think the discovery watches will see that change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This actually leads to a more difficult question about EgressNetwork mutability and ip address.
For services, a service's IP address will never change for the life of that services which means we can look up the service by IP once at the beginning of the discovery stream and never again.
For EgressNetworks, on the other hand, they encompass a range of IP addresses and that range can change if the EgressNetwork resource is edited. What happens when a discovery stream is started on an IP address that is in an EgressNetwork, but then the EgressNetwork is edited to no longer include that IP in it's network range? Or vice versa, what if an EgressNetwork is edited to include an IP address that was used for an already started discovery stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It really seems like we need to take a different approach for EgressNetworks compared to the approach we have for services in the outbound index. Perhaps we can take advantage of the fact that cluster networks are immutable. When a discovery lookup comes in, we can look at if it's in the cluster network and send it to the service index or if it's out of the cluster network, then we can send it to the egress index.
The egress index will likely need to have watches indexed by ip address so that we can update the appropriate watches whenever an EgressNetwork resource changes and may encompass watches that it did not before or stop encompassing watches that it used to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good points.. Will think about it a bit more.
// if we have no explicit routes attached to the parent, always attempt protocol detection | ||
if no_explicit_routes(&outbound) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this compare to the current approach of just starting with the most specific route type and falling back to less and less specific? ie if there are grpcRoutes, use those, else if there are httproutes, use those, else if there are tls routes, use those, else if there are tcp routes, use those, ELSE do http detect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. My approach is essentially the same but more complicated. Will incorporate your suggestion
fn timestamp_then_name<LM, LR, RM, RR>( | ||
(left_id, left_route): &(GroupKindNamespaceName, OutboundRoute<LM, LR>), | ||
(right_id, right_route): &(GroupKindNamespaceName, OutboundRoute<RM, RR>), | ||
fn timestamp_then_name<L: Route, R: Route>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think now we never compare routes of different types, so we can simplify this to only use one type parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will enforce, at the type level, that routes cannot be mixed.
Signed-off-by: Zahari Dichev <[email protected]>
bf26d69
to
4144b19
Compare
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
} | ||
|
||
fn update_resource( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this function needs to accept the traffic policy so that it can update its ParentMeta if the traffic policy changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, will add a test as well
pub enum OutboundPolicyKind { | ||
Fallback(SocketAddr), | ||
Resource(ResourceOutboundPolicy), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me a long time to fully understand the mapping of OutboundPolicy to OutboundPolicyKind and then finally to proxy-api protobuf. It seems to me that OutboundPolicy + orig_dst already contains enough information to construct the proxy-api response (since the OutboundPolicy contains ParentMeta which stores TrafficPolicy).
Can we simplify this all by skipping the step of transforming OutboundPolicy into OutboundPolicyKind and instead just directly transform OutboundPolicy into proxy-api response? This will require passing in the orig_dst as supplemental information, but I think that's okay.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To put this another way, OutboundPolicyKind::Fallback and OutboundPolicyKind::Resource will never be co-mingled in a single stream. Fallback will only be created if the OutboundDiscoverTarget is Fallback and Resource will only be created if the OutboundDiscoverTarget is Resource. So I think these can be total separate codepaths.
// ParentMeta carries information resource-specific | ||
// information about the parent to which outbound policy | ||
// is associated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand correctly, the purpose of this is that it's the information necessary to construct the default backend (minus the orig_dst since that needs to be filled in outside of the index). It might be helpful to have a comment explaining that that is it's purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might consider calling this something other than "Meta" too since, metadata makes me think of the resource metadata (i.e. group, kind, name)
#[derive(Clone, Debug)] | ||
pub enum OutboundDiscoverTarget { | ||
Resource(ResourceTarget), | ||
Fallback(SocketAddr), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A name like "External" might be more descriptive than Fallback.
}), | ||
|
||
protocol: Some(outbound::ProxyProtocol { | ||
kind: Some(outbound::proxy_protocol::Kind::Opaque( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This means that for external targets that are not covered by an EgressNetwork, we'll return Opaque.
Does this differ from the current behavior? I assume that today we return NotFound for these and then the proxy uses the profile resolution. Does this amount to being Detect? Should we return Detect here instead of Opaque?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So what happens today is:
- for addresses that we cannot find here and are potentially in the cluster we return NotFound. The proxy may then use profiles information if available. So this behavior is preserved with this logic.
- for addresses that are outside of the cluster we return NotFound and the proxy will not be able to find a profile. So what happens is that we will synthesize a default Detect policy that uses forward backends. So here we need to do the same as opposed to returning an opaque.
So you are right. Good catch. For context, here is where we do the synthesizing in the proxy: https://github.com/linkerd/linkerd2-proxy/blob/ef779d09191713c5f25682fc7a61212bb6752d21/linkerd/app/outbound/src/discover.rs#L144
), | ||
}; | ||
|
||
// This encoder sets deprecated timeouts for older proxies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this comment is accurate for TLSRoute
), | ||
}; | ||
|
||
// This encoder sets deprecated timeouts for older proxies. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this comment is accurate for TCPRoute
// holds information about resources. currently EgressNetworks and Services | ||
resource_info: HashMap<ResourceRef, ResourceInfo>, | ||
cluster_networks: Vec<linkerd_k8s_api::Cidr>, | ||
fallback_polcy_tx: watch::Sender<()>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might be worth some comments on the semantics of this Sender and what it's purpose is.
policy-test/src/lib.rs
Outdated
@@ -435,7 +435,7 @@ where | |||
drop(_tracing); | |||
} | |||
|
|||
if std::env::var("POLICY_TEST_NO_CLEANUP").is_err() { | |||
if std::env::var("POLICY_TEST_NO_CLEANUP").is_ok() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this right? it seems backwards. POLICY_TEST_NO_CLEANUP being ok means we should NOT delete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leftover
@@ -26,6 +26,7 @@ async fn service_does_not_exist() { | |||
let mut policy_api = grpc::OutboundPolicyClient::port_forwarded(&client).await; | |||
let rsp = policy_api.watch(&ns, &svc, 4191).await; | |||
|
|||
println!("{:?}", rsp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stray println
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
/// specific metadata that is used when putting together the final | ||
/// policy response. | ||
#[derive(Clone, Debug, PartialEq)] | ||
pub enum ResourceOutboundPolicy { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this type is unused and can be removed.
// This encoder sets deprecated timeouts for older proxies. | ||
#![allow(deprecated)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this can be removed if we use ..Default::default()
to fill in default values for request_timeout instead of setting to None explicitly.
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
Signed-off-by: Zahari Dichev <[email protected]>
This PR adds a few notable changes associated with the egress functionality of Linkerd:
EgressNetwork
objects are indexed into the outbound indexip:port
combinationTCPRoute
,TLSRoute
,GRPCRoute
andHTTPRoute
attachments are reflected for bothEgressNetwork
andService
targetsEgressNetwork
is honored by returning the appropriate default (failure/success) routes for all protocolsNote that this PR depends on an unreleased version of the linkerd2-proxy-api repo.
Signed-off-by: Zahari Dichev [email protected]