
Opentelemetry Collector is not removed on stack uninstall #365

Closed
paulfantom opened this issue May 12, 2022 · 11 comments · Fixed by #568

@paulfantom (Contributor)

What happened?

Ran tobs uninstall and the OpenTelemetry Collector deployment was still running.

Did you expect to see something different?

All components should be removed.

How to reproduce it (as minimally and precisely as possible):

tobs install -y && tobs uninstall

Environment

  • tobs installation method: cli / helm chart
  • tobs version: 0.10.0
  • Kubernetes version information: 1.23
  • Kubernetes cluster kind: kind

@paulfantom paulfantom added the bug Something isn't working label May 12, 2022
@paulfantom paulfantom added this to the 0.11.0 milestone May 12, 2022
@paulfantom (Contributor, Author)

Seems like the operator either doesn't have enough time to remove the collector Deployment or it doesn't implement this feature at all.

This issue may slip to the next milestone.

@onprem (Contributor) commented Jun 30, 2022

This is happening because we annotate the OpenTelemetryCollector CR as a Helm hook, and by default Helm hooks are not managed as part of the release, i.e. they don't get removed when the release is uninstalled [1].

The only solution I can see right now is instructing users to manually delete the CR (plus any other resources annotated as hooks, if required), similar to how we instruct them for secrets etc. WDYT about this @paulfantom?

[1] https://helm.sh/docs/topics/charts_hooks/#hook-resources-are-not-managed-with-corresponding-releases
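
For context, here is a rough sketch of what a hook-annotated CR in a chart looks like; the resource name and hook values are illustrative assumptions, not necessarily what the tobs chart uses:

```yaml
# Illustrative only; name and hook values are assumptions.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: tobs-opentelemetry-collector
  annotations:
    # The hook annotation makes Helm create the object at a hook point
    # instead of tracking it as part of the release.
    "helm.sh/hook": post-install,post-upgrade
    # A delete policy only controls cleanup around hook execution;
    # nothing removes the object when the release itself is uninstalled.
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  mode: deployment
```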

@onprem onprem removed their assignment Jun 30, 2022
@paulfantom paulfantom modified the milestone: 0.12.0 Jun 30, 2022
@paulfantom
Copy link
Contributor Author

I agree, we'll need to document the uninstallation process in detail and point to the Helm shortcomings as to why this is needed. I also think it would be good to instruct users to use a dedicated namespace for tobs, as removal would then be trivial (a simple kubectl delete ns <name> should suffice).
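
Something along these lines could go into the docs; this is only a sketch, with placeholder names rather than the actual release values:

```sh
# Hypothetical cleanup steps; adjust names to the actual release/namespace.
tobs uninstall

# Remove the hook-managed CR that Helm leaves behind.
kubectl delete opentelemetrycollectors.opentelemetry.io --all -n <namespace>

# Or, if tobs was installed into a dedicated namespace, remove everything at once.
kubectl delete ns <namespace>
```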

Either way, let's put this in the backlog for now.

@paulfantom (Contributor, Author) commented Jun 30, 2022

I wonder if we could tie the CR to some other object with Kubernetes metadata.ownerReferences. 🤔 That way we would move the removal responsibility from Helm to Kubernetes.
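
As a sketch of the idea, assuming the operator Deployment as the owner (the name and uid below are placeholders), the CR would carry something like:

```yaml
# Hypothetical: make the operator Deployment own the CR so Kubernetes
# garbage collection deletes the collector when the Deployment goes away.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: tobs-opentelemetry-collector
  ownerReferences:
    - apiVersion: apps/v1
      kind: Deployment
      name: opentelemetry-operator
      # Must be the uid of the live Deployment object; this is the part
      # that is hard to fill in from a Helm template.
      uid: 00000000-0000-0000-0000-000000000000
```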

@nhudson (Contributor) commented Aug 9, 2022

Just adding to this: it makes it pretty much impossible to run multiple tests using the ct CLI tool. Not only is the opentelemetry-collector left behind, many ConfigMap and Secret objects are left behind as well. When ct moves on to test n+1, it fails because of these leftover objects.

@nhudson (Contributor) commented Aug 10, 2022

#535 is related, as it instructs the user to delete the opentelemetrycollector CR manually.

@nhudson (Contributor) commented Aug 10, 2022

After some research and testing I don't think we can use metadata.ownerReferences to make this work. This is due to the required uid field: the uid has to be the uid of the resource that owns the object we are setting it on.

So, for example, if we set the ownerReferences to the Deployment of the otel-operator, we would need a way to look up the uid of that Deployment when we apply the OpenTelemetryCollector and set it there. From what I understand of the lookup() function in Helm, this will never be possible during a template, install, upgrade, delete, etc. Maybe I am misreading this.

From the Helm docs:

"Keep in mind that Helm is not supposed to contact the Kubernetes API Server during a helm template or a helm install|upgrade|delete|rollback --dry-run, so the lookup function will return an empty list (i.e. dict) in such a case."
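
For illustration, a sketch of the lookup-based template this rules out (the Deployment name is a placeholder, not the actual chart value); whenever Helm does not contact the API server, lookup returns an empty dict and the ownerReference is never rendered:

```yaml
# Hypothetical template fragment; not part of the actual chart.
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: tobs-opentelemetry-collector
  {{- $owner := lookup "apps/v1" "Deployment" .Release.Namespace "opentelemetry-operator" }}
  {{- if $owner }}
  # Only rendered when Helm actually queries the cluster; helm template
  # and any --dry-run get an empty result back and skip this block.
  ownerReferences:
    - apiVersion: apps/v1
      kind: Deployment
      name: {{ $owner.metadata.name }}
      uid: {{ $owner.metadata.uid }}
  {{- end }}
```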

@nhudson (Contributor) commented Aug 11, 2022

Related otel issue: open-telemetry/opentelemetry-helm-charts#69

@nhudson (Contributor) commented Aug 17, 2022

The main difference between otel-operator and prometheus-operator is that when prometheus-operator is removed, either via the Helm chart or through the release manifest, the CRD is removed as well. With otel-operator the CRD is left intact after the operator is removed, which leaves behind any collectors that were configured.

@nhudson (Contributor) commented Aug 17, 2022

@paulfantom what if we switched from using the operator to using the collector Helm chart instead? I assume that at some point we would want to add support for OpenTelemetryInstrumentation?

@paulfantom (Contributor, Author)

We cannot switch away from the operator Helm chart, as we want the ability to use the auto-instrumentation feature that only the operator provides.
