-
Notifications
You must be signed in to change notification settings - Fork 33
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
enhanced gateway api topology #751
Merged
Merged
Changes from 4 commits
Commits
Show all changes
5 commits
Select commit
Hold shift + click to select a range
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what problem this is solving, other than skipping applying the getaway overrides on top of route policies when the getaway is not in a programmed state. The getaway policy will still be reconciled as valid, the gateway AuthConfig will still be created (along and possibly clashing with the route one if it exists), and all the policy statuses will be reported as fine.
I'm afraid we may be here hiding the actual problem which in this case is that the gateway is not ready, therefore we shouldn't configure policies for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, the described problem is not introduced by this PR, but neither it's fixed by it though. In fact, I'm afraid the change may even aggravate the problem.
The more fundamental problem is of course that, in this case, the topology is only marginal to the decisions we make. Most of the business logic for reconciling AuthPolicies still won't leverage the topology graph built by this function, while some already do.
By trying to fix only the topology graph, we're effectively providing two views of world, thus splitting the business logic into the portion that creates configuration based on one reality and another based on a different reality.
I fear that, by trying to do the right thing here, we're actually bringing the system to an inconsistent state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤯
I am confused. I will kindly ask you to elaborate more on this because I do not get it.
I will try to clarify some of your statements:
This PR does not pretend to change a bit the behavior of any of the controllers. All the tests pass. There is no problem being described and thus, no problem being fixed.
This PR is adding few more options to the DAG topology, yet keeping the controllers behavior as they are. The intention is to add more tools/capabilities on the topology lib to enable following up PR's introducing the Limitador limits controller.
The authcontroller in current
main
branch is building a topology with only programmed gateways and linking only accepted routes. So is doing the code with the new changes in this PR. Nothing changes. And if something changes, let me know and I will fix it.First of all, what problem are you talking about?
Configuration for "programmed gateways only" option was introduced by #715. This PR is adding (among other public methods) yet another configuration option which is opt out from "link accepted routes only". As you know, currently the topology has hard-coded linking only accepted routes. This may not be always desirable. This feature is not needed now, but will be needed by the limits controller. The reasoning being that the controller will need a topology in its mappers which routes are linked to the gateway no matter they are accepted or not. But this is something I can explain further when I open PR for the limits controller.
I do not see any problem being aggravated. Again, this is not changing any controller behavior. And if it does, I will fix it.
This is by no means fixing the topology. AFAIK, there is no issue with the current topology for the needs of the current controllers. This is about adding more configurability to the topology. Some business logic requires a type of topology with not programmed gateways as well as linked routes that have not been accepted. Specifically I am thinking on the mappers. Mappers need to walk the topology. And having disconnections because routes are not accepted prunes events to reconcile kuadrant policies.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This indeed confirms the PR does not aggravate the problem, and it's enough for me to approve it. Thanks for explaining it and for collecting the evidences in the code to support it, @eguzki .
To elaborate on my main criticism, which I still stand for...
I missed #715. I think this is a very good example of how a well-intended change, that we needed indeed and that, looked individually, made our code better than before, but, at the same time, if observed holistically, emerges nonetheless as a change that creates a problem (or aggravates one) we didn't have before (or that simply had a smaller surface before.)
We spotted we were treating non-programmed gateways as if they were ready and thus we were signaling the policies configured on those gateways as enforced. This was clearly a bug and had to be fixed. We have tons of code that implement business logic based on a "view of the world" provided by this topology struct. We made that topology struct to only account for programmed gateways. Done. Problem solved. We fixed all those business logic with one move.
However, unfortunately there are controllers (possibly all of them AFAIK!) that mix business logic based on the topology struct with business logic that does not use the topology struct, where sometimes the status of the Gateways and HTTPRoutes are properly taken into account, sometimes not. (I mentioned the reconciliation of AuthConfigs before, but look at
ComputeGatewayDiff
for plenty of more examples.)With the topology struct now doing what seems to be the right thing, we created two "views of the world" – one that we believe is right and another one that may or may not be right. These two views (as well as the decisions based on them) are often intertwined together, leading to a situation where we can no longer say "Kuadrant overlooks the state of the gateways. As a user, it's up to you to make sure to bring all your gateways to a programmed state. Until then, do not trust status of Kuadrant policies." Unfortunately, neither we can say today, after #715, "No worries! Kuadrant makes sure to always configure and report status of your policies right, according to the state of the gateways." Between the two statements, obviously we want to issue the latter, but if that's not true, then we're probably better off with the former IMO.
I used "programmed gateways" as example, but we can run over again replacing it with "accepted routes", or "listener status", and possibly others that do not occur to me right now. The important part is, while we are not consistent about the views of the world that the code bases its decisions upon, it becomes very hard to debug behaviors such as:
To sum up, let's be careful with changes that seem to be improving the code but until they are rolled out completely they may actually put us (as well as our users) in a difficult position. This is not the case of this PR as I'm now convinced, and neither it is a negative criticism to the DAG/topology approach in general. Rather, I'm concerned with how we roll out these changes.