
vCenter receiver: TLS settings for the vCenter receiver don't work as expected #13447

Closed
manjunath-batakurki opened this issue Aug 22, 2022 · 8 comments
Labels: bug (Something isn't working), priority:p2 (Medium), receiver/vcenter

Comments


manjunath-batakurki commented Aug 22, 2022

Describe the bug

The vCenter receiver throws an error even though the TLS settings are set as follows (note that this is only for verification; we would have TLS enabled in production environments):
tls:
  insecure: true
  insecure_skip_verify: true

The strange thing is that this error only gets reported after some amount of time, and the only way we have found to recover so far is to kill the Docker container and start a new one.

Can someone help with how to investigate this issue?
otel-collector_1 | 2022-08-20T06:26:18.532Z error scraperhelper/scrapercontroller.go:197 Error scraping metrics {"kind": "receiver", "name": "vcenter", "error": "unable to get datacenter lists: Post \"https://172.16.5.13/sdk\": x509: cannot validate certificate for 172.16.5.13 because it doesn't contain any IP SANs", "scraper": "vcenter"}
otel-collector_1 | go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).scrapeMetricsAndReport
otel-collector_1 | go.opentelemetry.io/[email protected]/receiver/scraperhelper/scrapercontroller.go:197
otel-collector_1 | go.opentelemetry.io/collector/receiver/scraperhelper.(*controller).startScraping.func1
otel-collector_1 | go.opentelemetry.io/[email protected]/receiver/scraperhelper/scrapercontroller.go:172
otel-collector_1 | 2022-08-20T06:26:18.532Z INFO loggingexporter/logging_exporter.go:57 MetricsExporter {"#metrics": 0}
otel-collector_1 | 2022-08-20T06:26:18.532Z DEBUG loggingexporter/logging_exporter.go:67
otel-collector_1 | 2022-08-20T06:26:18.533Z info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "prometheusremotewrite", "error": "invalid tsMap: cannot be empty map", "interval": "55.233015ms"}
otel-collector_1 | 2022-08-20T06:26:28.591Z info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "prometheusremotewrite", "error": "invalid tsMap: cannot be empty map", "interval": "114.858199ms"}
otel-collector_1 | 2022-08-20T06:26:38.779Z info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "prometheusremotewrite", "error": "invalid tsMap: cannot be empty map", "interval": "245.836146ms"}
otel-collector_1 | 2022-08-20T06:26:48.802Z info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "prometheusremotewrite", "error": "invalid tsMap: cannot be empty map", "interval": "259.441708ms"}
otel-collector_1 | 2022-08-20T06:26:58.873Z info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "prometheusremotewrite", "error": "invalid tsMap: cannot be empty map", "interval": "289.038953ms"}
otel-collector_1 | 2022-08-20T06:27:08.907Z info exporterhelper/queued_retry.go:215 Exporting failed. Will retry the request after interval. {"kind": "exporter", "name": "prometheusremotewrite", "error": "invalid tsMap: cannot be empty map", "interval": "291.204126ms"}
otel-collector_1 | 2022-08-20T06:27:18.462Z error exporterhelper/queued_retry_inmemory.go:107 Exporting failed. No more retries left. Dropping data. {"kind": "exporter", "name": "prometheusremotewrite", "error": "max elapsed time expired invalid tsMap: cannot be empty map", "dropped_items": 0}
otel-collector_1 | go.opentelemetry.io/collector/exporter/exporterhelper.onTemporaryFailure
otel-collector_1 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry_inmemory.go:107
otel-collector_1 | go.opentelemetry.io/collector/exporter/exporterhelper.(*retrySender).send
otel-collector_1 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry.go:199
otel-collector_1 | go.opentelemetry.io/collector/exporter/exporterhelper.(*metricsSenderWithObservability).send
otel-collector_1 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/metrics.go:132
otel-collector_1 | go.opentelemetry.io/collector/exporter/exporterhelper.(*queuedRetrySender).start.func1
otel-collector_1 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/queued_retry_inmemory.go:119
otel-collector_1 | go.opentelemetry.io/collector/exporter/exporterhelper/internal.consumerFunc.consume
otel-collector_1 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:82
otel-collector_1 | go.opentelemetry.io/collector/exporter/exporterhelper/internal.(*boundedMemoryQueue).StartConsumers.func2
otel-collector_1 | go.opentelemetry.io/[email protected]/exporter/exporterhelper/internal/bounded_memory_queue.go:69
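
For reference, one way to confirm what the server certificate actually contains (e.g. whether it really has no IP SANs, as the error above says) is to inspect it with openssl; the address here is just the one from the log:

# Dump the certificate presented by the vCenter endpoint and show its SAN entries
openssl s_client -connect 172.16.5.13:443 </dev/null 2>/dev/null \
  | openssl x509 -noout -text \
  | grep -A1 "Subject Alternative Name"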

Steps to reproduce
Start the collector with docker-compose up, following the steps described here:
https://opentelemetry.io/docs/collector/getting-started/

What did you expect to see?
Even though we have disabled TLS certificate verification, we still get an error about an untrusted certificate. We expect the receiver to skip certificate verification.

What did you see instead?

What version did you use?
Version: 0.53.0

What config did you use?
Config:
[root@testotel gl-gateway-opentelemetry-collector]# cat /etc/otel-collector-config.yaml
receivers:
  vcenter:
    endpoint: "https://<vcenter_ip>"
    username: ''
    password: ''
    collection_interval: 1m
    tls:
      insecure: true
      insecure_skip_verify: true

exporters:
  logging:
    loglevel: debug
  prometheusremotewrite:
    endpoint: "https://"
    headers:
      Authorization: "Bearer "

service:
  pipelines:
    metrics:
      receivers: [vcenter]
      exporters: [logging, prometheusremotewrite]

Environment
OS: Debian


manjunath-batakurki added the bug (Something isn't working) label on Aug 22, 2022
@djaglowski (Member)

cc @schmikei

@schmikei (Contributor)

Thanks for the bug report @manjunath-batakurki!

I might not be able to get to this until next week, but I will be looking into this!

In the meantime I'm going to see if I can replicate it. I'm guessing it has something to do with the session cookies that vCenter uses for its SOAP API, with x509 being used as a fallback auth mechanism that may not be set up, which leads to a poor error message. My theory is that we need to re-establish the session intermittently to keep the receiver going, but I will dig deeper when I have the chance.
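
Roughly, the idea (just a sketch with govmomi, not the receiver's actual code; the interval and wildcard datacenter lookup are only illustrative) would be something like:

// Sketch only: re-login to vCenter before each scrape if the SOAP session has
// expired, instead of letting the request fall back to certificate auth.
package main

import (
	"context"
	"log"
	"net/url"
	"time"

	"github.com/vmware/govmomi"
	"github.com/vmware/govmomi/find"
)

func scrapeLoop(ctx context.Context, u *url.URL, interval time.Duration) error {
	// Initial connection; "true" skips certificate verification,
	// roughly the equivalent of insecure_skip_verify in the collector config.
	client, err := govmomi.NewClient(ctx, u, true)
	if err != nil {
		return err
	}

	ticker := time.NewTicker(interval)
	defer ticker.Stop()

	for range ticker.C {
		// Re-establish the session if it is no longer active.
		active, err := client.SessionManager.SessionIsActive(ctx)
		if err != nil || !active {
			if err := client.Login(ctx, u.User); err != nil {
				return err
			}
		}

		// List datacenters, the call that was failing in the log above.
		finder := find.NewFinder(client.Client)
		dcs, err := finder.DatacenterList(ctx, "*")
		if err != nil {
			return err
		}
		log.Printf("found %d datacenters", len(dcs))
	}
	return nil
}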

Do you mind sharing how long it usually takes for the error to show up inside the container, @manjunath-batakurki?

@manjunath-batakurki (Author)

Thank you @schmikei for the response. I do not know the exact time, as the issue appears inconsistently, but in general it is seen after at least 30 minutes.

Please let me know if you need more details

@schmikei (Contributor)

@manjunath-batakurki I submitted a draft PR that I think might resolve this; I'm going to do some more long-form testing to make sure it actually resolves your issue 😄

@manjunath-batakurki (Author) commented Sep 1, 2022

Thank you @schmikei. I will also verify from my end and get back.

@schmikei (Contributor) commented Sep 1, 2022

> Thank you @schmikei. I will also verify from my end and get back.

Sweet, let me know what you find!

If you still run into issues, you may want to play around with the environment variable GOVMOMI_INSECURE_COOKIES, similar to how govc handles communicating with non-TLS endpoints: https://github.com/vmware/govmomi/blob/master/govc/README.md#notauthenticated
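
For example, one way to set that on the collector container with docker-compose (just a sketch; the service name and image tag below are placeholders for whatever you use):

# docker-compose.yaml fragment (sketch)
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.53.0
    environment:
      - GOVMOMI_INSECURE_COOKIES=true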

I had to do this while recording my vCenter via a proxy because the security settings of the client did not want to trust the proxy connection.

@manjunath-batakurki (Author)

Thank you. The issue is not seen anymore. I will monitor the receiver for some more time.

@schmikei (Contributor) commented Sep 7, 2022

Closed by #13733
