Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

restrict manager resource cache based on namespaces from environment #186

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

leelavg
Copy link
Contributor

@leelavg leelavg commented Dec 11, 2024

always cache resources from operator deployed namespace and provider an option to user for caching resources from other namespaces via WATCH_NAMESPACE env var with a comma separated namespace values.

fixes: #184

@leelavg
Copy link
Contributor Author

leelavg commented Dec 11, 2024

tested, peak usage was 50 Mi for startup down from ~ 1.5 Gi on a semi-prod cluster.

@Madhu-1
Copy link
Collaborator

Madhu-1 commented Dec 11, 2024

The namespace(s) name needs to be configurable as this operator is developed to work with multiple namespaces

https://sdk.operatorframework.io/docs/building-operators/golang/operator-scope/#configuring-watch-namespaces-dynamically is the way we need to look to fix this one.

@leelavg
Copy link
Contributor Author

leelavg commented Dec 11, 2024

Yes, I want to refactor when we actually need it and so reused operator namespace.

@Madhu-1
Copy link
Collaborator

Madhu-1 commented Dec 11, 2024

Yes, I want to refactor when we actually need it and so reused operator namespace.

IMO, it should be implemented in the above-requested manner instead of immediately changing it and later refactoring it. @nb-ohad WDYT?

@leelavg leelavg changed the title restrict manager resource cache to only operator namespace restrict manager resource cache based on namespaces from environment Dec 11, 2024
@leelavg
Copy link
Contributor Author

leelavg commented Dec 11, 2024

The namespace(s) name needs to be configurable as this operator is developed to work with multiple namespaces

https://sdk.operatorframework.io/docs/building-operators/golang/operator-scope/#configuring-watch-namespaces-dynamically is the way we need to look to fix this one.

  • multi namespace is not working, need to go through controller-runtime code on what changed.

@leelavg leelavg force-pushed the oom-fix branch 2 times, most recently from 37772c2 to e69eb50 Compare December 19, 2024 04:04
@leelavg
Copy link
Contributor Author

leelavg commented Dec 19, 2024

done.

@leelavg
Copy link
Contributor Author

leelavg commented Dec 19, 2024

/assign @nb-ohad @Madhu-1

}

// getOperatorNamespace returns the Namespace the operator is running
func getOperatorNamespace() (string, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we already have OperatorNamespace in utils. can you please use it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you pls point, didn't find it. There's something similar in controllers but not usable.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry i was looking at my local copy where the getOperatorNamespace was exposed, can you please update

var operatorNamespace = utils.Call(func() string {
to use it outside it would duplicate functions

Copy link
Contributor Author

@leelavg leelavg Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as mentioned in controllers but not usable, that function serves different purpose.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to avoid duplication of the ENV var, so if we change something tomorrow, I don't need to change it in multiple places.


// getWatchNamespace returns the Namespace the operator should be watching for changes
func getWatchNamespace() (string, error) {
var watchNamespaceEnvVar = "WATCH_NAMESPACE"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please change for WATCH_NAMESPACE in makefile and container ENV as we did for serviceaccount_name_prefix #116?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying not to touch manifests as during backports it's always creating extra work and that should be fixed first.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please open a tracker if you are planning to fix it as followupl

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please change for WATCH_NAMESPACE in makefile

  • done.

os.Exit(1)
}
// ensure we always cache items from operator namespace
defaultNamespaces[operatorNamespace] = cache.Config{}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how will this change work where the operator should be able to manage the csi drivers in all the namespaces? https://github.com/ceph/ceph-csi-operator/blob/main/docs/design/operator.md#goals-and-guidelines

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in my tests probably csi driver was already deployed, do you know how rook does it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rook isn't doing anything explicitly for cluster scoped resources and based on my testing in red-hat-storage/ocs-operator#2911 we do get events for them, ocs-op only watches 2 namespaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

retested, it's working w/o any changes

> date; k get csidriver  -w
Fri Jan  3 06:26:07 UTC 2025
NAME               ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
rbd.csi.ceph.com   true             false            false             <unset>         false               Persistent   64s
rbd.csi.ceph.com   true             false            false             <unset>         false               Persistent   87s
rbd.csi.ceph.com   true             false            false             <unset>         false               Persistent   0s

> date; k delete csidriver rbd.csi.ceph.com; date
Fri Jan  3 06:26:31 UTC 2025
csidriver.storage.k8s.io "rbd.csi.ceph.com" deleted
Fri Jan  3 06:26:31 UTC 2025

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its not for Rook, its for standalone deployment of csi driver

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I meant rook also deploys csidriver even if it is watching single namespace.

always cache resources from operator deployed namespace and provider an
option to user for caching resources from other namespaces via
`WATCH_NAMESPACE` env var with a comma separated namespace values.

Signed-off-by: Leela Venkaiah G <[email protected]>
Signed-off-by: Leela Venkaiah G <[email protected]>

watchNamespace, err := getWatchNamespace()
if err != nil {
setupLog.Error(err, "manager will only watch for resources in the operator deployed namespace")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os.Exit(1) needs to be done as its an actual error and the user might expect that things are working for the watch namespace but the operator is silently ignoring it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the namespace exists we wouldn't get an error and watchnamespace is additional env as we always use operator namespace in the cache.

}

// getOperatorNamespace returns the Namespace the operator is running
func getOperatorNamespace() (string, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am trying to avoid duplication of the ENV var, so if we change something tomorrow, I don't need to change it in multiple places.

@@ -7,6 +7,8 @@ IMAGE_NAME ?= ceph-csi-operator
# Allow customization of the name prefix and/or namespace
NAME_PREFIX ?= ceph-csi-operator-
NAMESPACE ?= $(NAME_PREFIX)system
# A comma separated list of namespaces for operator to cache objects from
WATCH_NAMESPACE ?= ""
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use NAMESPACE as default instead of an empty string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we can rename this to ADDITIONAL_WATCH_NAMESPACES or always have operator ns in this.

I didn't go with latter as the ns is known by downward api which has different semantics and so left this as user configurable option.

@Madhu-1 Madhu-1 requested a review from nb-ohad January 3, 2025 08:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

High memory usage due to watching all namespaces
2 participants