Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Watcher anti-pattern (temporary watcher lives forever)? #1906

Open
jefflill opened this issue Apr 21, 2024 · 0 comments
Open

Watcher anti-pattern (temporary watcher lives forever)? #1906

jefflill opened this issue Apr 21, 2024 · 0 comments
Assignees

Comments

@jefflill
Copy link
Collaborator

jefflill commented Apr 21, 2024

I'm seeing this pattern in a handful of places, where we're using a watcher to wait for a Kubernetes resource to be created:

// Start the watcher.

_ = K8s.WatchAsync<V1ConfigMap>(
    async (@event) =>
    {
        await SyncContext.Clear;

        ClusterInfo = Neon.K8s.TypedConfigMap<ClusterInfo>.From(@event.Value).Data;

        Logger.LogInformationEx("Updated cluster info");
    },
    namespaceParameter: KubeNamespace.NeonStatus,
    fieldSelector:      $"metadata.name={KubeConfigMapName.ClusterInfo}",
    retryDelay:         TimeSpan.FromSeconds(30),
    logger:             Logger);

// Wait for the watcher to discover the [ClusterInfo].

NeonHelper.WaitFor(() => ClusterInfo != null, timeout: TimeSpan.FromSeconds(60), timeoutMessage: "Timeout obtaining: cluster-info.");

The problem here is that the watcher is started but is never explicitly terminated, In theory, the watcher task will eventually be garbage collected sometime after the method containing this returns, but in real life GCs are delayed until the process experiences memory pressure which may take a while and might never happen if the process isn't doing much.

It doesn't make sense to keep connections open to the API Server unnecessarily, and I believe watchers will periodically reconnect, making this even worse.

NOTE: I'm not actually convinced that GC will actually stop the watcher though. Disposing a running Task actually throws an exception saying that only completed or failed tasks can be disposed, So I suspect this means that the watcher will live forever, even after GCs.

We should add an extension method that implements this logic and invokes a callback when the resource if discovered and then uses a CancellationToken to explicitly abort the watcher after the callback returns.

I've marked the places in the code where I've found this with a $todo(jefflill) and the link to this issue,

@jefflill jefflill self-assigned this Apr 21, 2024
@jefflill jefflill changed the title Watcher discovery anti-pattern? Watcher anti-pattern (temporary watcher lives forever)? Apr 21, 2024
@marcusbooyah marcusbooyah transferred this issue from nforgeio/operator-sdk Apr 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant