invalid argument issue after 0.69.0 upgrade #1339

Open
jihuiyang opened this issue Sep 9, 2024 · 15 comments
Labels
question Further information is requested

Comments

@jihuiyang

> kubectl -n otel-collector logs po/otel-collector-collector-zzs7q
Error: invalid argument "-confmap.unifyEnvVarExpansion,-component.UseLocalHostAsDefaultHost" for "--feature-gates" flag: feature gate "confmap.unifyEnvVarExpansion" is stable, can not be disabled
2024/09/09 23:14:08 collector server run finished with error: invalid argument "-confmap.unifyEnvVarExpansion,-component.UseLocalHostAsDefaultHost" for "--feature-gates" flag: feature gate "confmap.unifyEnvVarExpansion" is stable, can not be disabled

OTel collectors are failing after the upgrade to 0.69.0.

When I describe the container I can see -confmap.unifyEnvVarExpansion in the args; it looks like the new collector version, 0.108.0, rejects it.

  Containers:
   otc-container:
    Image:       ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-k8s:0.108.0
    Ports:       8888/TCP, 4317/TCP, 4318/TCP
    Host Ports:  0/TCP, 0/TCP, 0/TCP
    Args:
      --config=/conf/collector.yaml
      --feature-gates=-confmap.unifyEnvVarExpansion,-component.UseLocalHostAsDefaultHost
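
To pick out just the disabled gates from output like the above, a small filter along these lines could help. This is a minimal sketch: `disabled_gates` is a hypothetical helper name, not part of kubectl or the operator, and it assumes each argument sits on its own line.

```shell
#!/bin/sh
# disabled_gates: hypothetical helper, not part of kubectl or the operator.
# Reads container args on stdin, one per line, and prints every feature gate
# that the --feature-gates flag explicitly disables (those prefixed with "-").
disabled_gates() {
  sed -n 's/^[[:space:]]*--feature-gates=//p' \
    | tr ',' '\n' \
    | sed -n 's/^-//p'
}

# Sample input copied from the describe output in this issue.
printf '%s\n' \
  '--config=/conf/collector.yaml' \
  '--feature-gates=-confmap.unifyEnvVarExpansion,-component.UseLocalHostAsDefaultHost' \
  | disabled_gates
```

The same filter can be fed from `kubectl describe` output, since leading whitespace before the flag is ignored.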

dosubot bot commented Sep 9, 2024

Hey @jihuiyang! I'm here to help you with any bugs, questions, or contributions. Let's tackle this together!

I found a similar open issue that might be relevant to your problem: [operator] Collector fails with featureGate errors when Upgrading the Operator to chart version 0.68.1. The issue you're encountering with the OpenTelemetry collector failing with featureGate errors is due to changes in the OpenTelemetryCollector CRD. Specifically, the confmap.unifyEnvVarExpansion feature gate is now stable and cannot be disabled.

To resolve this issue, you need to ensure that your CRDs are properly annotated and labeled before upgrading. Here are the steps:

  1. Annotate and label the CRDs:

    RELEASE_NAME=my-opentelemetry-operator
    RELEASE_NAMESPACE=opentelemetry-operator-system
    kubectl annotate crds instrumentations.opentelemetry.io opentelemetrycollectors.opentelemetry.io opampbridges.opentelemetry.io \
      meta.helm.sh/release-name=${RELEASE_NAME} \
      meta.helm.sh/release-namespace=${RELEASE_NAMESPACE}
    kubectl label crds instrumentations.opentelemetry.io opentelemetrycollectors.opentelemetry.io opampbridges.opentelemetry.io app.kubernetes.io/managed-by=Helm
  2. Follow the TLS certificate requirements and upgrade instructions provided in the OpenTelemetry Operator Helm Chart documentation.

Additionally, if you are upgrading from version 0.67 to 0.68, note that the preset.kubernetesEvents preset now excludes DELETED watch types. If you want to keep ingesting DELETED updates for Kubernetes Events, you will need to configure the k8sobjectsreceiver manually.

This issue is a known bug and will be fixed by this PR in the operator: open-telemetry/opentelemetry-operator#3074. This fix will be included in the next operator helm release [1].


@dosubot dosubot bot added the bug Something isn't working label Sep 9, 2024
@jihuiyang
Author

The bug still exists in the latest operator version.

@TylerHelmuth
Member

@jihuiyang You cannot disable the confmap.unifyEnvVarExpansion feature gate anymore.

Passing --feature-gates=-confmap.unifyEnvVarExpansion is not allowed.

@TylerHelmuth TylerHelmuth added question Further information is requested and removed bug Something isn't working labels Sep 10, 2024
@jihuiyang
Author

I did not specify --feature-gates=-confmap.unifyEnvVarExpansion. All I did was helm install 0.69.0, and somehow it had this argument.

@jihuiyang
Author

Maybe I should uninstall the whole thing and reinstall?

@TylerHelmuth
Member

@jihuiyang I haven't been able to reproduce your issue yet with a clean install or with a helm upgrade. Can you provide more details?

@jaronoff97
Contributor

@jihuiyang the latest version of the operator should resolve this by removing the gate from the collector (we previously needed to add it in the operator's code to prevent users' configs from breaking because of a collector change).

Can you link any logs you are seeing from the operator?
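
As a rough illustration of what removing a single gate from the flag value involves (this is not the operator's actual code, and `strip_gate` is a hypothetical helper), the gate list can be split on commas, filtered, and rejoined:

```shell
#!/bin/sh
# strip_gate VALUE GATE
# Hypothetical helper, not the operator's implementation: drop GATE from a
# comma-separated --feature-gates value, whether it appears bare or with a
# leading "-" or "+", and print the remaining gates.
strip_gate() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | grep -v -x -F -e "$2" -e "-$2" -e "+$2" \
    | paste -s -d, -
}

# Value taken from the failing collector in this issue.
strip_gate '-confmap.unifyEnvVarExpansion,-component.UseLocalHostAsDefaultHost' \
  'confmap.unifyEnvVarExpansion'
```

With the value from this issue, only `-component.UseLocalHostAsDefaultHost` remains. Matching is on fixed strings (`-F`) so the dots in gate names are not treated as regex wildcards.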

@jihuiyang
Author

I tried a clean install and it worked, with no --feature-gates=-confmap.unifyEnvVarExpansion flag present. Let me try just the upgrade.

@jaronoff97
Contributor

@jihuiyang thanks for trying that out. I have been debugging the upgrade process in this issue. If you run into a similar issue during the upgrade, I would really appreciate it if you could share the steps to reproduce. I've tried a few different ways of doing this and have yet to make it happen.

@jihuiyang
Author

Still running into the issue with the upgrade, going from 0.65.1 to 0.69.0.

> helm --namespace otel-operator-system ls
NAME                  	NAMESPACE           	REVISION	UPDATED                             	STATUS  	CHART                        	APP VERSION
opentelemetry-operator	otel-operator-system	9       	2024-09-10 12:16:27.747165 -0700 PDT	deployed	opentelemetry-operator-0.69.0	0.108.0

The collector still has the feature gate:

> kubectl -n otel-collector describe ds/otel-collector-collector | grep feature
      --feature-gates=-confmap.unifyEnvVarExpansion,-component.UseLocalHostAsDefaultHost

Operator log

kubectl -n otel-operator-system logs po/opentelemetry-operator-595855cd5c-jx9hj -f
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","message":"Starting the OpenTelemetry Operator","opentelemetry-operator":"0.108.0","opentelemetry-collector":"ghcr.io/open-telemetry/opentelemetry-collector-releases/opentelemetry-collector-k8s:0.108.0","opentelemetry-targetallocator":"ghcr.io/open-telemetry/opentelemetry-operator/target-allocator:0.108.0","operator-opamp-bridge":"ghcr.io/open-telemetry/opentelemetry-operator/operator-opamp-bridge:0.108.0","auto-instrumentation-java":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-java:1.33.5","auto-instrumentation-nodejs":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-nodejs:0.52.1","auto-instrumentation-python":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-python:0.48b0","auto-instrumentation-dotnet":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-dotnet:1.2.0","auto-instrumentation-go":"ghcr.io/open-telemetry/opentelemetry-go-instrumentation/autoinstrumentation-go:v0.14.0-alpha","auto-instrumentation-apache-httpd":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.4","auto-instrumentation-nginx":"ghcr.io/open-telemetry/opentelemetry-operator/autoinstrumentation-apache-httpd:1.0.4","feature-gates":"-operator.golang.flags,operator.observability.prometheus","build-date":"2024-09-05T17:19:14Z","go-version":"go1.22.6","go-arch":"amd64","go-os":"linux","labels-filter":[],"annotations-filter":[],"enable-multi-instrumentation":false,"enable-apache-httpd-instrumentation":true,"enable-dotnet-instrumentation":true,"enable-go-instrumentation":false,"enable-python-instrumentation":true,"enable-nginx-instrumentation":false,"enable-nodejs-instrumentation":true,"enable-java-instrumentation":true,"create-openshift-dashboard":false,"zap-message-key":"message","zap-level-key":"level","zap-time-key":"timestamp","zap-level-format":"uppercase"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"setup","message":"the env var WATCH_NAMESPACE isn't set, watching all namespaces"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"setup","message":"Prometheus CRDs are installed, adding to scheme."}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"setup","message":"Openshift CRDs are not installed, skipping adding to scheme."}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a mutating webhook","GVK":"opentelemetry.io/v1beta1, Kind=OpenTelemetryCollector","path":"/mutate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a validating webhook","GVK":"opentelemetry.io/v1beta1, Kind=OpenTelemetryCollector","path":"/validate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/validate-opentelemetry-io-v1beta1-opentelemetrycollector"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/convert"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Conversion webhook enabled","GVK":"opentelemetry.io/v1beta1, Kind=OpenTelemetryCollector"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=Instrumentation","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-instrumentation"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-v1-pod"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a mutating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpAMPBridge","path":"/mutate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/mutate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.builder","message":"Registering a validating webhook","GVK":"opentelemetry.io/v1alpha1, Kind=OpAMPBridge","path":"/validate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Registering webhook","path":"/validate-opentelemetry-io-v1alpha1-opampbridge"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"setup","message":"starting manager"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.metrics","message":"Starting metrics server"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","message":"starting server","name":"health probe","addr":"[::]:8081"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.metrics","message":"Serving metrics server","bindAddress":"0.0.0.0:8080","secure":false}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Starting webhook server"}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.certwatcher","message":"Updated current TLS certificate"}
I0910 19:17:10.847314       1 leaderelection.go:254] attempting to acquire leader lease otel-operator-system/9f7554c3.opentelemetry.io...
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.webhook","message":"Serving webhook server","host":"","port":9443}
{"level":"INFO","timestamp":"2024-09-10T19:17:10Z","logger":"controller-runtime.certwatcher","message":"Starting certificate watcher"}
I0910 19:18:05.154198       1 leaderelection.go:268] successfully acquired lease otel-operator-system/9f7554c3.opentelemetry.io
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","logger":"collector-upgrade","message":"looking for managed instances to upgrade"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","logger":"instrumentation-upgrade","message":"looking for managed Instrumentation instances to upgrade"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1alpha1.OpAMPBridge"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.ConfigMap"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.ServiceAccount"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.Service"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1beta1.OpenTelemetryCollector"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","source":"kind source: *v1.Deployment"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting Controller","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ConfigMap"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceAccount"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Service"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Deployment"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.DaemonSet"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.StatefulSet"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.Ingress"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v2.HorizontalPodAutoscaler"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.PodDisruptionBudget"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.ServiceMonitor"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting EventSource","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","source":"kind source: *v1.PodMonitor"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","message":"Starting Controller","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector"}
{"level":"INFO","timestamp":"2024-09-10T19:18:05Z","logger":"instrumentation-upgrade","message":"no instances to upgrade"}
{"level":"INFO","timestamp":"2024-09-10T19:18:06Z","message":"Starting workers","controller":"opampbridge","controllerGroup":"opentelemetry.io","controllerKind":"OpAMPBridge","worker count":1}
{"level":"INFO","timestamp":"2024-09-10T19:18:06Z","message":"Starting workers","controller":"opentelemetrycollector","controllerGroup":"opentelemetry.io","controllerKind":"OpenTelemetryCollector","worker count":1}

@jaronoff97
Contributor

jaronoff97 commented Sep 10, 2024

If you run this, do you see the string 'managed'?

k get otelcol -n otel-collector otel-collector -o yaml | grep 'managementState'
  managementState: managed

@jihuiyang
Author

yes i do

> kubectl -n otel-collector get otelcol otel-collector -o yaml | grep 'managementState'
  managementState: managed

@fernandonogueira

I also experienced this. I had to delete and recreate the OpenTelemetryCollector resource (type = sidecar, in my case).

@ethanmdavidson
Contributor

I'm also having a similar issue upgrading from 0.69.0 to 0.74.2: the collector fails to start and logs Error: invalid argument "-component.UseLocalHostAsDefaultHost" for "--feature-gates" flag: no such feature gate "component.UseLocalHostAsDefaultHost". I applied this upgrade across multiple clusters and only some had this issue. I've confirmed that all affected clusters have spec.args.feature-gates: -component.UseLocalHostAsDefaultHost in the OpenTelemetryCollector resource, and the unaffected clusters do not. However, I did not specify this option anywhere, and I'm deploying and upgrading the operator using the terraform helm provider, so I would expect to see the same behavior across all clusters.
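
One possible way to clear the stale argument from an affected resource is a JSON Patch against `spec.args`. The sketch below only builds the patch document. `remove_arg_patch` is a hypothetical helper, and the namespace and resource name in the commented command are placeholders borrowed from earlier in this thread.

```shell
#!/bin/sh
# remove_arg_patch KEY
# Hypothetical helper: print a JSON Patch (RFC 6902) that removes KEY from
# spec.args. Assumes KEY contains no "/" or "~", which would need JSON
# Pointer escaping.
remove_arg_patch() {
  printf '[{"op":"remove","path":"/spec/args/%s"}]\n' "$1"
}

# Show the patch that would be sent.
remove_arg_patch feature-gates

# Applying it (untested sketch; adjust namespace and name to your cluster):
#   kubectl -n otel-collector patch otelcol otel-collector \
#     --type=json -p "$(remove_arg_patch feature-gates)"
```

Note that this removes the whole `feature-gates` argument, including any gates set on purpose, and it is an assumption that an up-to-date operator will not add the entry back.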

@jaronoff97
Contributor

@ethanmdavidson this should have been fixed in release v0.110.0 (https://github.com/open-telemetry/opentelemetry-operator/releases/tag/v0.110.0). I think this can happen if the operator rollout is slow and an old version is still adding the gate. I tried to reproduce this a few times and was never able to.
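
To rule out an old operator still running, the version it reports at startup can be compared against 0.110.0. This is the operator (app) version, not the Helm chart version. A minimal sketch, assuming the startup log line carries an `"opentelemetry-operator":"<version>"` field as in the operator log earlier in this thread, and that `sort -V` (a GNU/BSD extension) is available. Both function names are hypothetical.

```shell
#!/bin/sh
# operator_version: read operator log lines on stdin and print the version
# from the first "opentelemetry-operator":"X.Y.Z" field found.
operator_version() {
  sed -n 's/.*"opentelemetry-operator":"\([^"]*\)".*/\1/p' | head -n 1
}

# version_at_least VERSION MINIMUM: succeed if VERSION >= MINIMUM.
# Relies on sort -V, which is not POSIX but ships with GNU and BSD sort.
version_at_least() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Shortened sample of the startup line from the operator log in this thread.
sample='{"level":"INFO","message":"Starting the OpenTelemetry Operator","opentelemetry-operator":"0.108.0"}'
ver=$(printf '%s\n' "$sample" | operator_version)

if version_at_least "$ver" 0.110.0; then
  echo "operator $ver includes the v0.110.0 fix"
else
  echo "operator $ver predates v0.110.0"
fi
```

Against a live cluster, the operator pod's logs could be piped into `operator_version` in place of the sample line.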


5 participants