-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhanced gateway api topology #751
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #751 +/- ##
==========================================
+ Coverage 80.20% 82.90% +2.69%
==========================================
Files 64 77 +13
Lines 4492 6078 +1586
==========================================
+ Hits 3603 5039 +1436
- Misses 600 690 +90
- Partials 289 349 +60
Flags with carried forward coverage won't be shown. Click here to find out more.
|
if route == nil { | ||
return nil | ||
} | ||
return utils.Filter(route.Spec.ParentRefs, IsParentGateway) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been using github.com/samber/lo
for this kind of thing. It's quite good IMHO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not see any reason to change when utils.Filter()
does what it is expected to do. Anything I could miss?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
github.com/samber/lo
is immensely popular and IMO very good at this kind of generic widely used functions. Wherever we can leverage something like this from the community – provided it suits our needs of course – instead of keeping our own home-made version of the same thing, I personally would be in favour of.
At the same time, I don't want to create a fuss. I understand this is not the main goal of this PR. Just something to think about.
@@ -37,7 +39,8 @@ func ApplyOverrides(topology *kuadrantgatewayapi.Topology, gateway *gatewayapiv1 | |||
|
|||
for _, route := range topology.Routes() { | |||
if !slices.Contains(kuadrantgatewayapi.GetRouteAcceptedGatewayParentKeys(route.HTTPRoute), client.ObjectKeyFromObject(gateway)) { | |||
overriddenPolicies = append(overriddenPolicies, route.AttachedPolicies()...) | |||
routePolicies := utils.Map(route.AttachedPolicies(), func(pNode kuadrantgatewayapi.PolicyNode) kuadrantgatewayapi.Policy { return pNode.Policy }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid adding another dependency:
routePolicies := utils.Map(route.AttachedPolicies(), func(pNode kuadrantgatewayapi.PolicyNode) kuadrantgatewayapi.Policy { return pNode.Policy }) | |
routePolicies := lo.Map(route.AttachedPolicies(), func(pNode kuadrantgatewayapi.PolicyNode, _ int) kuadrantgatewayapi.Policy { return pNode.Policy }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as #751 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe, for other PR, we could remove all utils.Map
and utils.Filter
and rewrite using github.com/samber/lo (or something else). Then we can remove utils
code and have the external dependency only.
} | ||
|
||
// TopologyOpts allows to manipulate topologyOptions. | ||
type TopologyOpts func(*topologyOptions) | ||
|
||
func WithAcceptedRoutesLinkedOnly() TopologyOpts { | ||
return func(o *topologyOptions) { | ||
o.linkAcceptedRoutesOnly = true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess we don't ever want non-accepted routes in the topology as we may occasionally want non-programmed gateways? I actually wonder what would be the use case for adding non-programmed gateways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are some use cases I found that non-accepted routes should be linked to parent gateways: mappers. It allows getting events for "disconnected" policies, which status can be affected from the "disconnection".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually wonder what would be the use case for adding non-programmed gateways.
this is a use case #751 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are some use cases I found that non-accepted routes should be linked to parent gateways
So why not giving the option here then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry. To clarify...
My point is that we have WithProgrammedGatewaysOnly(bool)
for when we want to enforce programmed gateways only, i.e. WithProgrammedGatewaysOnly(true)
. I don't see why one would use WithProgrammedGatewaysOnly(false)
though – versus simply not calling WithProgrammedGatewaysOnly
at all.
I guess that was your thinking for WithAcceptedRoutesLinkedOnly
. Nobody will ever need WithAcceptedRoutesLinkedOnly(false)
. The effect in this case is virtually the same as not using WithAcceptedRoutesLinkedOnly
, right?
So why not designing WithProgrammedGatewaysOnly
the same way? If there's a reason I struggling to see, then there may be for WithAcceptedRoutesLinkedOnly
too perhaps?
IOW, why are the signatures of these 2 functions different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes all the sense. Fixing it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -200,6 +200,8 @@ func (r *RateLimitPolicyEnforcedStatusReconciler) buildTopology(ctx context.Cont | |||
} | |||
|
|||
return kuadrantgatewayapi.NewTopology( | |||
kuadrantgatewayapi.WithAcceptedRoutesLinkedOnly(), | |||
kuadrantgatewayapi.WithProgrammedGatewaysOnly(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand a gateway that transitions from Programmed: True
to Programmed: False
will NOT get the statuses of its affecting policies updated then?
I'm thinking this line invoked for a gateway that has not been added to the topology:
policies := indexes.PoliciesFromGateway(gw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch, thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
b654bca
to
3b95992
Compare
controllers/authpolicy_status.go
Outdated
@@ -169,6 +169,8 @@ func (r *AuthPolicyReconciler) generateTopology(ctx context.Context) (*kuadrantg | |||
}) | |||
|
|||
return kuadrantgatewayapi.NewTopology( | |||
kuadrantgatewayapi.WithAcceptedRoutesLinkedOnly(), | |||
kuadrantgatewayapi.WithProgrammedGatewaysOnly(true), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what problem this is solving, other than skipping applying the getaway overrides on top of route policies when the getaway is not in a programmed state. The getaway policy will still be reconciled as valid, the gateway AuthConfig will still be created (along and possibly clashing with the route one if it exists), and all the policy statuses will be reported as fine.
I'm afraid we may be here hiding the actual problem which in this case is that the gateway is not ready, therefore we shouldn't configure policies for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, the described problem is not introduced by this PR, but neither it's fixed by it though. In fact, I'm afraid the change may even aggravate the problem.
The more fundamental problem is of course that, in this case, the topology is only marginal to the decisions we make. Most of the business logic for reconciling AuthPolicies still won't leverage the topology graph built by this function, while some already do.
By trying to fix only the topology graph, we're effectively providing two views of world, thus splitting the business logic into the portion that creates configuration based on one reality and another based on a different reality.
I fear that, by trying to do the right thing here, we're actually bringing the system to an inconsistent state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤯
I am confused. I will kindly ask you to elaborate more on this because I do not get it.
I will try to clarify some of your statements:
BTW, the described problem is not introduced by this PR, but neither it's fixed by it though.
This PR does not pretend to change a bit the behavior of any of the controllers. All the tests pass. There is no problem being described and thus, no problem being fixed.
This PR is adding few more options to the DAG topology, yet keeping the controllers behavior as they are. The intention is to add more tools/capabilities on the topology lib to enable following up PR's introducing the Limitador limits controller.
The authcontroller in current main
branch is building a topology with only programmed gateways and linking only accepted routes. So is doing the code with the new changes in this PR. Nothing changes. And if something changes, let me know and I will fix it.
In fact, I'm afraid the change may even aggravate the problem.
First of all, what problem are you talking about?
Configuration for "programmed gateways only" option was introduced by #715. This PR is adding (among other public methods) yet another configuration option which is opt out from "link accepted routes only". As you know, currently the topology has hard-coded linking only accepted routes. This may not be always desirable. This feature is not needed now, but will be needed by the limits controller. The reasoning being that the controller will need a topology in its mappers which routes are linked to the gateway no matter they are accepted or not. But this is something I can explain further when I open PR for the limits controller.
I do not see any problem being aggravated. Again, this is not changing any controller behavior. And if it does, I will fix it.
By trying to fix only the topology graph, we're effectively providing two views of world, thus splitting the business logic into the portion that creates configuration based on one reality and another based on a different reality.
This is by no means fixing the topology. AFAIK, there is no issue with the current topology for the needs of the current controllers. This is about adding more configurability to the topology. Some business logic requires a type of topology with not programmed gateways as well as linked routes that have not been accepted. Specifically I am thinking on the mappers. Mappers need to walk the topology. And having disconnections because routes are not accepted prunes events to reconcile kuadrant policies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Configuration for "programmed gateways only" option was introduced by #715. […] As you know, currently the topology has hard-coded linking only accepted routes.
This indeed confirms the PR does not aggravate the problem, and it's enough for me to approve it. Thanks for explaining it and for collecting the evidences in the code to support it, @eguzki .
To elaborate on my main criticism, which I still stand for...
Disclaimer for all readers: please try to read in between the lines and focus on the generalized form of the problem I'm trying to describe, rather than the individual examples I use. |
I missed #715. I think this is a very good example of how a well-intended change, that we needed indeed and that, looked individually, made our code better than before, but, at the same time, if observed holistically, emerges nonetheless as a change that creates a problem (or aggravates one) we didn't have before (or that simply had a smaller surface before.)
We spotted we were treating non-programmed gateways as if they were ready and thus we were signaling the policies configured on those gateways as enforced. This was clearly a bug and had to be fixed. We have tons of code that implement business logic based on a "view of the world" provided by this topology struct. We made that topology struct to only account for programmed gateways. Done. Problem solved. We fixed all those business logic with one move.
However, unfortunately there are controllers (possibly all of them AFAIK!) that mix business logic based on the topology struct with business logic that does not use the topology struct, where sometimes the status of the Gateways and HTTPRoutes are properly taken into account, sometimes not. (I mentioned the reconciliation of AuthConfigs before, but look at ComputeGatewayDiff
for plenty of more examples.)
With the topology struct now doing what seems to be the right thing, we created two "views of the world" – one that we believe is right and another one that may or may not be right. These two views (as well as the decisions based on them) are often intertwined together, leading to a situation where we can no longer say "Kuadrant overlooks the state of the gateways. As a user, it's up to you to make sure to bring all your gateways to a programmed state. Until then, do not trust status of Kuadrant policies." Unfortunately, neither we can say today, after #715, "No worries! Kuadrant makes sure to always configure and report status of your policies right, according to the state of the gateways." Between the two statements, obviously we want to issue the latter, but if that's not true, then we're probably better off with the former IMO.
I used "programmed gateways" as example, but we can run over again replacing it with "accepted routes", or "listener status", and possibly others that do not occur to me right now. The important part is, while we are not consistent about the views of the world that the code bases its decisions upon, it becomes very hard to debug behaviors such as:
- why a policy with status enforced doesn't seem to be enforced at all
- why an override that seems not have been accepted appears to be taking over my route-level rules (or the other way around)
- etc
To sum up, let's be careful with changes that seem to be improving the code but until they are rolled out completely they may actually put us (as well as our users) in a difficult position. This is not the case of this PR as I'm now convinced, and neither it is a negative criticism to the DAG/topology approach in general. Rather, I'm concerned with how we roll out these changes.
What
WithAcceptedRoutesLinkedOnly()
: only accepted routes will be linked to the gatewayWithProgrammedGatewaysOnly()
: only programmed (valid) gateways can have routes and attached policies. Not programmed gateways will be isolated nodes in the topology.Policies()
that returns all policies included in the topology. Returned objects arePolicyNodes
that allow to jump to topology nodes targeted by the policy (akaPolicyTargetNode
).Work being done as part of #530
This is part of a series of PR's that will end with a new controller that will only reconcile limitador limits configuration.