
Namespace access #201

Merged · 18 commits · Aug 18, 2021
Conversation

@njbarber commented Jul 29, 2021

This closes #147

Note to reviewers: remember to look at the commits in this PR and consider if they can be squashed
Note to contributors: remember to re-generate client set if there are any API changes

Summary Of Changes

When creating a new resource, the namespace field of the cluster reference is checked against the cluster's allowedTopologyNamespaces field (this effectively inverts the changes made as part of #83). If a namespace is not specified, the cluster is assumed to be in the same namespace as the resource being created.
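As a rough illustration of the defaulting behaviour described above, here is a minimal sketch; the RabbitmqClusterReference type name comes from the operator's API, but the exact field set and helper below are assumptions for illustration:

```go
package main

import "fmt"

// RabbitmqClusterReference mirrors the cluster reference mentioned above;
// the exact field set here is illustrative.
type RabbitmqClusterReference struct {
	Name      string
	Namespace string
}

// clusterNamespace resolves where the referenced RabbitmqCluster is looked
// up: an explicit namespace on the reference wins; otherwise the cluster is
// assumed to live in the same namespace as the topology resource itself.
func clusterNamespace(ref RabbitmqClusterReference, resourceNamespace string) string {
	if ref.Namespace == "" {
		return resourceNamespace
	}
	return ref.Namespace
}

func main() {
	ref := RabbitmqClusterReference{Name: "my-cluster"} // no namespace set
	fmt.Println(clusterNamespace(ref, "vmware-functions")) // prints "vmware-functions"
}
```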

Additional Context

Related to rabbitmq/cluster-operator#785.

@embano1 commented Jul 31, 2021

This is a great and needed change: we require it as part of our Knative RabbitMQ eventing setup in the VEBA project, which has the cluster in the vmware-system namespace but the broker in a different namespace (vmware-functions).

@Zerpet self-requested a review August 10, 2021

@Zerpet (Contributor) left a comment

Some initial feedback; I'm still reviewing part of the code.

I ran the integration/controller tests from this branch, and almost all (perhaps all?) of the controller tests are red in the cases where X references a cluster from an allowed namespace and where X references a cluster that allows all namespaces.

I had to slightly tweak the Go mod file to compile the operator (see below). I also think some of the assertions in those tests are incorrect. In the "happy case", when an object references a RabbitmqCluster in the allow-list, the object should be created and the status condition should reflect success. I left comments in the sections for the binding controller, and I believe the same applies to the others (I still have to go through all the code).

go mod edit -replace=github.com/rabbitmq/cluster-operator@<version>=github.com/njbarber/cluster-operator@namespace-access

Two inline review threads on controllers/binding_controller_test.go (outdated, resolved).
@njbarber (Author) commented

> go mod edit -replace=github.com/rabbitmq/cluster-operator@<version>=github.com/njbarber/cluster-operator@namespace-access

Yes, that's what I did locally. I also imagine https://github.com/njbarber/messaging-topology-operator/blob/main/controllers/suite_test.go#L65 will have to be changed based on the version increment for the cluster operator.

@Zerpet (Contributor) left a comment

I finished reviewing the code; everything is looking good with the latest changes. There's one last item I'm unsure about:
By adding the field .spec.messagingTopologyNamespaces to the Cluster Operator, we are adding a field that only makes sense in the context of the Topology Operator. I feel we are creating a sort of dependency/relationship in the Cluster Operator towards the Topology Operator. Right now, the Cluster Operator doesn't "know" anything about the Topology Operator (which is a good thing).

I'm wondering: in order to maintain this zero-topology-operator-knowledge in the Cluster Operator, should we take a different approach to declaring the allow list?
For example, using annotations.

@njbarber (Author) commented

Sure, that makes sense to me. Let me know if the recent changes match what you had in mind.

Since I'm not familiar with the annotations pattern in general, my initial questions are:

  • Where would the relevant documentation for this live? Would the error message in the operator logs and the status message on the relevant resource be sufficient, or should it be documented somewhere else as well? I would think someone creating a RabbitmqCluster would want to know that the annotation exists, independently.
  • Is there a convention for the name of the annotation itself? I just used rabbitmq.topology.allowed-namespaces (sort of modeled after https://kubernetes.github.io/ingress-nginx/user-guide/nginx-configuration/annotations/).

@Zerpet (Contributor) commented Aug 16, 2021

The error message in the logs and the status condition seem sufficient to me from the Operator side. We will have to document this feature in the RabbitMQ docs; I'm happy to take on that task.

The current approach looks good to me. I usually look at the AWS Load Balancer Controller for inspiration on annotations. Annotations have an optional prefix, which has to be a DNS domain; we could use rabbitmq.com as the prefix. The annotation could then look like rabbitmq.com/topology-allowed-namespaces (rabbitmq removed from the name because it's already in the domain part).
But I don't feel too strongly about it, since rabbitmq.topology.allowed-namespaces is very specific and it contains rabbitmq.
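For context, a minimal sketch of how the agreed annotation could be consumed on the Topology Operator side; the annotation key comes from the discussion above, while the "*" wildcard token and the comma-separated parsing are assumptions (the thread only confirms that a cluster can allow all namespaces):

```go
package main

import (
	"fmt"
	"strings"
)

// Annotation key as agreed in this thread.
const topologyAllowedNamespacesAnnotation = "rabbitmq.com/topology-allowed-namespaces"

// allowedNamespace reports whether a topology object in requestNamespace may
// reference a RabbitmqCluster living in clusterNamespace with the given
// annotations.
func allowedNamespace(clusterNamespace, requestNamespace string, annotations map[string]string) bool {
	if clusterNamespace == requestNamespace {
		return true // same-namespace references are always permitted
	}
	for _, ns := range strings.Split(annotations[topologyAllowedNamespacesAnnotation], ",") {
		if ns == "*" || ns == requestNamespace {
			return true
		}
	}
	return false
}

func main() {
	anns := map[string]string{topologyAllowedNamespacesAnnotation: "vmware-functions"}
	fmt.Println(allowedNamespace("vmware-system", "vmware-functions", anns)) // true
	fmt.Println(allowedNamespace("vmware-system", "default", nil))           // false
}
```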

@njbarber (Author) commented

I updated the annotation name with your suggestion and rebased. Let me know if there's anything else.

I also closed rabbitmq/cluster-operator#785 as it's no longer relevant.

@Zerpet (Contributor) commented Aug 17, 2021

I'm happy with the current state. I will go through the PR one last time with somebody else on our team, and we'll merge if nothing else stands out :shipit:

@Zerpet (Contributor) left a comment

The PR workflow in GitHub Actions found a small issue. We have to include the following bits in the Deployment manifest, otherwise the Topology Operator won't start when you deploy it in a Kubernetes cluster:

https://github.com/rabbitmq/cluster-operator/blob/fec7f9c666097ea2cf9a99d8dfc35e08bfe4432c/config/manager/manager.yaml#L45-L49
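This is the kind of wiring the operator needs at startup: a minimal sketch, assuming the linked manifest lines expose the pod's own namespace to the process via an OPERATOR_NAMESPACE environment variable (the variable name is an assumption here, not confirmed by the thread):

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Hypothetical startup check: the Deployment manifest is expected to
	// inject the pod's own namespace into an environment variable (e.g.
	// via the Kubernetes downward API). Failing fast here is preferable
	// to misbehaving later.
	operatorNamespace := os.Getenv("OPERATOR_NAMESPACE")
	if operatorNamespace == "" {
		fmt.Fprintln(os.Stderr, "OPERATOR_NAMESPACE is not set; check the Deployment manifest")
		os.Exit(1)
	}
	fmt.Println("operator running in namespace:", operatorNamespace)
}
```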

Inline review thread on main.go (resolved).
@Zerpet (Contributor) commented Aug 17, 2021

Hi @njbarber, I was doing some QA with somebody from our team, and we found that, because we return an error when the resource is not allowed to be created, Kubernetes requeues the request and the Operator keeps retrying indefinitely. If we know the request will never succeed (because it is a permission violation), we should stop trying. Retrying indefinitely floods the logs with the same message over and over, and we can't know whether the annotation on the RabbitmqCluster will ever change; therefore, it is best to stop trying once we have logged the error and updated the status condition.

If the annotation on the RabbitmqCluster is updated so that the request would then succeed, the topology object (e.g. User) will have to be updated manually, for example by adding a label, in order to trigger a new reconcile request. I believe this is an acceptable experience.
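A minimal sketch of the behaviour requested here, using controller-runtime; the reconciler type and the checkAllowedNamespace helper are hypothetical, and status-condition handling is elided:

```go
package main

import (
	"context"
	"errors"

	ctrl "sigs.k8s.io/controller-runtime"
	logf "sigs.k8s.io/controller-runtime/pkg/log"
)

// errNotAllowed stands in for the permission violation discussed above.
var errNotAllowed = errors.New("cluster is not in an allowed namespace")

type TopologyReconciler struct{}

func (r *TopologyReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	logger := logf.FromContext(ctx)

	if err := checkAllowedNamespace(req.Namespace); err != nil {
		// A permission violation cannot resolve by itself: log it, update
		// the status condition (elided here), and return a nil error so
		// controller-runtime does NOT requeue the request with exponential
		// backoff. A later manual edit to the topology object triggers a
		// fresh reconcile once the RabbitmqCluster annotation is fixed.
		logger.Error(err, "refusing to reconcile", "request", req.NamespacedName)
		return ctrl.Result{}, nil
	}

	// ...normal reconciliation; transient errors are still returned so
	// that controller-runtime retries them.
	return ctrl.Result{}, nil
}

// checkAllowedNamespace is a hypothetical stand-in for the annotation check.
func checkAllowedNamespace(namespace string) error {
	if namespace == "forbidden" {
		return errNotAllowed
	}
	return nil
}

func main() {}
```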

@njbarber (Author) commented

So just return nil?

I believe the default behavior when returning an error is to go into exponential backoff (which is counterintuitive here, since there's no reason to retry at all when there's no chance of the problem being fixed). It does eventually space out the retries, though. Were you seeing something different in your testing?

I also agree that forcing a reconcile on the relevant resource is an acceptable approach, if you want to go that route.

@michaelklishin (Member) commented

@njbarber we have no reason to believe that the back-off was not happening in our testing. However, retrying an operation that you don't have permission to perform just seems generally pointless and counterproductive (it will pollute logs, potentially generate alerts over and over, that kind of thing).

@Zerpet (Contributor) commented Aug 17, 2021

Just returning nil in that case sounds good to me. As Michael mentioned, we do observe an exponential backoff, but the initial wait between requeues is very short (milliseconds), and it causes a considerable amount of logging before the retries space out enough to stop spamming the log.

Edit: The error in the checks seems to be a flake; I can run the same locally in a GKE cluster and it succeeds.

@Zerpet (Contributor) left a comment

I did the last pass of QA. I'm happy with the current state. I will trigger another run of the PR workflow and merge once it passes.

@Zerpet merged commit ae26ca6 into rabbitmq:main on Aug 18, 2021
gvmw added a commit to gvmw/docs that referenced this pull request Dec 1, 2021
This is what most users expect. Without this, the Messaging Topology
Operator will not be able to manage objects from a different namespace.
The implications are that all Brokers / Triggers / Sinks need to be
created in the same namespace as the RabbitmqCluster. This annotation
addresses that limitation.

More context:
- knative-extensions/eventing-rabbitmq#454
- rabbitmq/messaging-topology-operator#201

Signed-off-by: Gerhard Lazu <[email protected]>
knative-prow-robot pushed the same commit to knative/docs on Dec 1, 2021 (#4532).
Successfully merging this pull request may close these issues:

  • Allow sourcing Queues, Users and Permissions from different namespaces (#147)

4 participants