Limit objects cached by DevWorkspace controller to reduce memory usage #652

Merged: 14 commits into devfile:main, Nov 5, 2021

Conversation

amisevsk (Collaborator)

What does this PR do?

Limits the internal controller cache for objects managed by the DevWorkspace Operator (cache filtering was introduced in controller-runtime v0.9.0; see doc). Since each object type gets a single informer, we can specify only one selector per type for limiting the cache, so we use the following (see the sketch after this list):

  • Existence of the devworkspace_id label for most objects (deployments, services, etc.)
  • New labels controller.devfile.io/watch-configmap and controller.devfile.io/watch-secret that must be applied to secrets/configmaps we use
    • This changes the flow for automounting secrets/configmaps and git credentials
  • For roles/rolebindings, we limit the cache to the workspace role/binding based on an internal constant
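
For reference, a minimal sketch of what this restriction can look like with controller-runtime v0.9's cache options. The exact wiring in this PR may differ; the selectors shown are illustrative and not exhaustive (roles/rolebindings and Routes are omitted here):

```go
package setup

import (
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/labels"
	"k8s.io/apimachinery/pkg/selection"
	"sigs.k8s.io/controller-runtime/pkg/cache"
)

// newRestrictedCache builds a cache.NewCacheFunc that limits which objects the
// controller's informers store. The result is passed to the manager via
// ctrl.Options{NewCache: ...}.
func newRestrictedCache() (cache.NewCacheFunc, error) {
	// Workspace-owned objects always carry the devworkspace_id label, so an
	// existence requirement is enough for deployments, services, etc.
	idExists, err := labels.NewRequirement(
		"controller.devfile.io/devworkspace_id", selection.Exists, nil)
	if err != nil {
		return nil, err
	}
	idSelector := labels.NewSelector().Add(*idExists)

	// Secrets/configmaps need the dedicated watch labels, since user-provided
	// automount or git-credential resources have no devworkspace_id label.
	watchSecret := labels.SelectorFromSet(labels.Set{
		"controller.devfile.io/watch-secret": "true",
	})
	watchConfigMap := labels.SelectorFromSet(labels.Set{
		"controller.devfile.io/watch-configmap": "true",
	})

	return cache.BuilderWithOptions(cache.Options{
		SelectorsByObject: cache.SelectorsByObject{
			&appsv1.Deployment{}: {Label: idSelector},
			&corev1.Service{}:    {Label: idSelector},
			&corev1.Secret{}:     {Label: watchSecret},
			&corev1.ConfigMap{}:  {Label: watchConfigMap},
		},
	}), nil
}
```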

The downside of doing this is that any objects that do not match the selector cannot be read in the controller. This has two impacts on our design:

  1. As mentioned above, it changes the flow for automounting secrets/configmaps, as applying an additional label is required. Without it, the controller won't know those resources exist
  2. It requires refactoring how we manage syncing objects to the cluster, as there are two new edge cases to worry about
    • We must make sure any resources we create/update have the appropriate label
    • We need to treat failures to Create an object specially; otherwise we enter a loop if someone removes the required label from an object. Getting an AlreadyExists error when trying to create means we have to try to update the object instead.

To address the second point above, rather than rewrite the reconcile logic in each place it's used, I consolidated all syncing (spec vs. cluster) into one package and reworked everything else to use it.
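
A condensed sketch of that sync flow, assuming a controller-runtime client. The real code lives in pkg/provision/sync and also handles diffing, type-specific update functions, and requeue signaling; the function name below is illustrative:

```go
package sync

import (
	"context"

	k8serrors "k8s.io/apimachinery/pkg/api/errors"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// SyncObjectSketch ensures the object defined by spec exists on the cluster
// and stays visible to the restricted cache.
func SyncObjectSketch(ctx context.Context, c client.Client, spec client.Object) error {
	cluster := spec.DeepCopyObject().(client.Object)
	err := c.Get(ctx, client.ObjectKeyFromObject(spec), cluster)
	if k8serrors.IsNotFound(err) {
		// Not in the cache: either the object doesn't exist, or it exists on
		// the cluster without the label that admits it to the cache.
		createErr := c.Create(ctx, spec)
		if k8serrors.IsAlreadyExists(createErr) {
			// Present but invisible: update it so it regains the required
			// label and re-enters the cache on the next reconcile.
			return c.Update(ctx, spec)
		}
		return createErr
	}
	if err != nil {
		return err
	}
	// In the cache: update if it has drifted from spec (diff logic elided).
	return c.Update(ctx, spec)
}
```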

The main benefit of this change is drastically reduced memory usage on large clusters: for a cluster with 1000 stopped DevWorkspaces, ~1000 deployments/services/routes, ~5000 configmaps, ~24000 secrets, and ~26000 rolebindings, we have

  • Current main branch: ~1750Mi memory usage
  • This PR branch: ~90Mi memory usage

This represents an approximate memory use reduction of 18-19x. In the specific case of the cluster I tested against, it appears (unsurprisingly) that secrets are the main culprit. Testing a variant image that only restricts the cache for secrets reduced memory usage to ~350Mi.

I'm opting to restrict the cache for all objects as otherwise memory usage of DWO depends on the objects that exist on the cluster. With the cache restriction, memory use should be governed mainly by how many DevWorkspaces exist on the cluster.

Additional info

Graph of memory usage for DWO while testing various cases (each image is restarted 5 times, as memory usage spikes on startup):

[Image: DWO-cache-restriction-metrics]

Note the numbers here don't match the ones listed above exactly, as these are internal metrics while the numbers above use pod metrics from the cluster. The cases being tested are:

  1. Main branch
  2. Restrict only objects that use the devworkspace_id label (easiest case)
  3. Same as 2. but also restrict roles and rolebindings based on name
  4. Same as 2. but also restrict secrets and configmaps by new label
  5. Restrict everything we can (2 + 3 + 4)
  6. Restrict only secrets from the cache to see the effect there.

Diagram for the new sync object flow:

[Image: reconcile-flow-alt]

What issues does this PR fix or reference?

Is it tested? How?

Testing might be tricky:

  • We need to make sure I didn't miss adding the required label to any secrets/configmaps.
  • We need to make sure all objects that get created can be read (e.g. workspace ServiceAccounts weren't previously getting the devworkspace_id label)
  • We ideally need to make sure that updating from a branch that doesn't restrict the cache to the current commit functions as expected and that all owned objects are updated if necessary

I've tried testing the above locally and haven't seen any issues.

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

Modify cache used in controller in order to restrict cache to items with
the devworkspace_id label. This has the downside of making all objects
_without_ that label invisible to the controller, but has the benefit of
reduced memory usage on large clusters.

Signed-off-by: Angel Misevski <[email protected]>
Add labels

  - controller.devfile.io/watch-configmap
  - controller.devfile.io/watch-secret

which must be set to "true" in order for the DevWorkspace Operator to
see the corresponding secret/configmap.

This is required (compare to the previous commit) because the controller
is not only interested in secrets and configmaps it creates, but also
any configmap/secret on the cluster with e.g. the automount label
attached.

Since each type in the controller gets a single informer, we can only
specify a single label selector for the objects we are interested in.
This means we cannot have e.g. "has devworkspace_id label OR has
mount-to-devworkspace label".

Signed-off-by: Angel Misevski <[email protected]>
Restricting the cache to only configmaps with the new label results in
existing workspaces failing to reconcile. This occurs because attempting
to Get() the configmap from the cluster returns an IsNotFound error,
whereas attempting to Create() the configmap returns an AlreadyExists
error (Create interacts with the cluster, Get interacts with the cache).

To avoid this, if we encounter an AlreadyExists error when attempting to
create an object, we optimistically try to update the object (thus
adding the required label). This resolves the issue above, since once the
object is updated, the subsequent Get() call will return the object as
expected.

Signed-off-by: Angel Misevski <[email protected]>
Restricting the controller-runtime cache to specific objects means that
once-tracked objects can disappear from the controller's knowledge if
the required label is removed. To work around this, it is necessary to
update how we sync objects to specifically handle the case where:

  * client.Get(object) returns IsNotFound
  * client.Create(object) returns AlreadyExists

This occurs because we can't read objects that aren't in the cache, but
attempting to create objects collides with the actual object on the
cluster.

Since the basic flow of Get -> Create/Update is repeated for each type
we handle, this commit collects that repeated logic into one package
(pkg/provision/sync), allowing object handling to be done in one place.

Signed-off-by: Angel Misevski <[email protected]>
Adapt the metadata and storage cleanup tasks to use the new sync flow

Signed-off-by: Angel Misevski <[email protected]>
@amisevsk changed the title from "Limit cache" to "Limit objects cached by DevWorkspace controller to reduce memory usage" on Oct 20, 2021
@sleshchenko (Member)

/test v8-devworkspace-operator-e2e, v8-che-happy-path

@sleshchenko (Member) left a comment

Great job!

Regarding the changes: nothing to comment, apart from the fact that I don't like (and "don't like" here is just a personal opinion that can be ignored) that the sync func returns errors when an object is updated successfully.

Going to test now.

@@ -26,6 +26,14 @@ const (
// DevWorkspaceNameLabel is the label key to store workspace name
DevWorkspaceNameLabel = "controller.devfile.io/devworkspace_name"

// DevWorkspaceWatchConfigMapLabel marks a configmap so that it is watched by the controller. This label is required on all
// configmaps that should be seen by the controller
DevWorkspaceWatchConfigMapLabel = "controller.devfile.io/watch-configmap"
Member:

I wonder if we need a dedicated label for each object type?

Collaborator (Author):

WDYM? For most objects, the devworkspace_id label is sufficient, but I had to add separate ones for configmaps/secrets since those by default do not have one label they always use. We might run into a similar issue to work around for Deployments if we continue supporting async storage, but we could fudge that by using devworkspace_id: all or something.

We could use one label for both secrets and configmaps, but then we run into the issue of how to name it -- watch-resource may be unclear as it only applies to configmaps/secrets, and we use different labels for other objects.

Member:

I think you got my question)

We could use one label for both secrets and configmaps, but then we run into the issue of how to name it -- watch-resource may be unclear as it only applies to configmaps/secrets, and we use different labels for other objects.

That's exactly what I have in mind, including the concern ) So then I think it makes sense to leave it as is.

Member:

One tiny +1 for using a common watch label: it would allow us to avoid having different articles/sections in some docs, like here https://docs.google.com/document/d/1IR78XlxO37VTWXOu-uE-2nKC93D1GNhZomGoRaN518o/edit?usp=sharing

So, we'll just provide a single patch command instead of two.

The concern:

We could use one label for both secrets and configmaps, but then we run into the issue of how to name it -- watch-resource may be unclear as it only applies to configmaps/secrets, and we use different labels for other objects.

may be addressed by the following explanation:

DevWorkspace Operator watches objects owned by a DevWorkspace CR (ones which are additionally labeled with the workspace id) or standalone additional objects labeled with watch: true

Contributor:

I assume that there always is at least some kind of label on the secret/configmap that DWO handles, even if such labels differ depending on the purpose.

I think (I have not tried this out) it should be possible to write an "OR" label selector - if not using the existing code, then by implementing a custom labels.Selector.

I personally think requiring 2 labels on a single object for a single purpose is a little bit weird from the UX perspective.

Contributor:

an "or" selector is apparently impossible, so please ignore me :)

Collaborator (Author):

I personally think requiring 2 labels on a single object for a single purpose is a little bit weird from the UX perspective.

The reality is that there are multiple labels that can get applied to configmaps or secrets, and they each serve a different purpose:

  1. controller.devfile.io/watch-[secret|configmap]: mark this secret/configmap as "of interest" to the controller; necessary due to caching change
  2. controller.devfile.io/mount-to-devworkspace: mount this resource to the workspace; used by external tools/users to share info across multiple workspaces
  3. controller.devfile.io/git-credential: mark a secret as holding git credentials, which is handled differently from above
  4. controller.devfile.io/devworkspace_id: associate this resource with the workspace whose ID is specified.

Cases 2, 3 and 4 can exist independently of each other, e.g. a user-defined mounted configmap won't have the devworkspace_id label, and the metadata configmap we provision for workspaces won't have the mount-to-devworkspace label. As a result, there's no label selector we can use here, so we have to add the watch label to cover all use cases. Moreover, there will be cases when there are secrets/configmaps on the cluster that we're interested in that only have the controller.devfile.io/watch-[secret|configmap] label and no others.


The concern [...] may be addressed by the following explanation:

DevWorkspace Operator watches objects owned by a DevWorkspace CR (ones which are additionally labeled with the workspace id) or standalone additional objects labeled with watch: true

Potentially, but it's still somewhat unclear: we watch PVCs without the label applied, and secrets/configmaps become invisible if the label is removed, even if they have the workspace ID label. I'm open to using one label for both, but I'm not sure it's a big gain in terms of documentation burden.
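
For illustration, a user-provided automount secret under this scheme carries both its purpose label and the watch label. This is only a sketch; the name, namespace, and contents are hypothetical:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// exampleAutomountSecret is a hypothetical user-provided secret. The
// mount-to-devworkspace label states its purpose (automounting), while the
// watch-secret label is what admits it to the controller's restricted cache.
var exampleAutomountSecret = &corev1.Secret{
	ObjectMeta: metav1.ObjectMeta{
		Name:      "shared-settings", // illustrative
		Namespace: "user-namespace",  // illustrative
		Labels: map[string]string{
			"controller.devfile.io/mount-to-devworkspace": "true",
			"controller.devfile.io/watch-secret":          "true",
		},
	},
	StringData: map[string]string{
		"settings.xml": "<!-- ... -->",
	},
}
```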

(Additional review threads on pkg/provision/sync/sync.go and pkg/cache/cache.go were marked resolved.)
@sleshchenko (Member) left a comment

I haven't tested through a proper load-testing flow, but with the usual testing it works fine.
Please, after it's merged, open a PR for the Che operator to update the DWO go dependency, or create an issue for them.

@mmorhun commented Oct 21, 2021

Is it possible to use a non-caching client (like here) instead of adding labels just to manage caches?

@amisevsk (Collaborator, Author)

Is it possible to use a non-caching client (like here) instead of adding labels just to manage caches?

It would be possible, but goes against the intention of the controller design. With caching, we can avoid making API calls out to the cluster most of the time (only really hitting the API server on create/update calls), so I'd be concerned about performance/behavior if we went that route.

As a rough example, it takes around 10-15 reconciles to start a workspace, and all of these reconciles tend to happen within the first couple of seconds. If during that process we have to list/get secrets/configmaps multiple times (in the case of mounting git credentials, we might make two secrets, two configmaps, and read secrets + configmaps 2-3 times per reconcile), we're looking at potentially 50+ API requests per started workspace that are not present with caching.

We do use a non-caching client in a few places, but those are reserved for startup or one-time tasks. In load testing, I've seen 200+ reconciles per second in the controller, which could result in thousands of get/list calls per second.

The other concern I'd have is in throughput, as making actual calls to the API will necessarily be slower than reading from the cache. This could directly impact startup time when under load, which we'd prefer to avoid.
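
For reference, controller-runtime's manager exposes both flavours: cached reads via GetClient() and direct API-server reads via GetAPIReader(). A small sketch of the "reserved for startup or one-time tasks" pattern (the configmap name is hypothetical):

```go
package example

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/types"
	ctrl "sigs.k8s.io/controller-runtime"
)

// readAtStartup bypasses the (restricted) informer cache by using the manager's
// API reader, which issues a direct GET to the API server. The reconcile loop
// itself keeps using mgr.GetClient(), whose reads are served from the cache.
func readAtStartup(ctx context.Context, mgr ctrl.Manager, namespace string) (*corev1.ConfigMap, error) {
	cm := &corev1.ConfigMap{}
	err := mgr.GetAPIReader().Get(ctx, types.NamespacedName{
		Namespace: namespace,
		Name:      "example-startup-config", // hypothetical name
	}, cm)
	if err != nil {
		return nil, err
	}
	return cm, nil
}
```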

Update sync methods in devworkspaceRouting controller to use updated
sync package where appropriate. Note deleting out-of-date network
objects must still be done in the controller, as it's not possible to
iterate through a generic list of client.Objects.

Signed-off-by: Angel Misevski <[email protected]>
On Kubernetes, we can't restrict the cache for Routes since they are not
part of the included scheme. As a result, we have to special-case Routes,
adding them to the cache configuration only on OpenShift.

Signed-off-by: Angel Misevski <[email protected]>
Pass around the full clusterAPI struct to methods in the automount package
to allow for shared Context/Logging.

Signed-off-by: Angel Misevski <[email protected]>
For most objects, we can client.Update() using the spec object without
issue. However, for Services, updates are rejected if they try to unset
spec.ClusterIP. This means we need to copy the ClusterIP from the
cluster service before updating.

This commit adds an extensible mechanism for specifying type-specific
update functions that are called whenever we attempt to update a cluster
object.

Signed-off-by: Angel Misevski <[email protected]>
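
A sketch of the type-specific update hook described in the commit above; the registration mechanism is omitted and the function shape is illustrative:

```go
package example

import (
	corev1 "k8s.io/api/core/v1"
)

// preserveServiceFields copies cluster-assigned fields from the existing
// Service into the spec Service before an update, so the API server does not
// reject the request for trying to clear spec.clusterIP.
func preserveServiceFields(spec, cluster *corev1.Service) {
	// ClusterIP is assigned by the API server and cannot be unset on update,
	// so copy it from the cluster object before calling client.Update().
	spec.Spec.ClusterIP = cluster.Spec.ClusterIP
}
```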
Use diffOpts when printing spec vs cluster object diffs when updates are
required.

Signed-off-by: Angel Misevski <[email protected]>
@amisevsk (Collaborator, Author)

/test v8-devworkspace-operator-e2e, v8-che-happy-path

@openshift-ci bot commented Oct 26, 2021

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: amisevsk, JPinkney, sleshchenko

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [JPinkney,amisevsk,sleshchenko]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@amisevsk (Collaborator, Author) commented Nov 1, 2021

/retest

@openshift-ci bot commented Nov 1, 2021

@amisevsk: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name: ci/prow/v8-che-happy-path | Commit: 8e07871 | Required: true | Rerun command: /test v8-che-happy-path

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@amisevsk merged commit cd7b1e4 into devfile:main on Nov 5, 2021
@amisevsk deleted the limit-cache branch on November 5, 2021 at 18:41
@nickboldt (Collaborator)

Is this fixed in DWO 0.10? Asking because

[Image]

@amisevsk (Collaborator, Author)

v0.10 was branched before this PR was merged; it'll be included in v0.11
