[Bug]: Shortly after a (re)start of the ADOT pod, the metrics flow to managed Prometheus stops for most metrics #171
Comments
I opened the same issue here: aws-observability/aws-otel-collector#2091
We will have a workaround for that in a PR shortly; thanks for also opening it on the ADOT repo, @ktibi
PR submitted; it might need to run for a few days to confirm @dzilbermanvmw @ktibi
@bonclay7 I have been running your fix since this morning. So far everything is good.
Awesome #174 (comment)
We merged a fix in #174; it will go into the next release. Please reopen if you observe the same behaviour.
Many thanks @bonclay7 and team for addressing this issue. Happy to confirm it is working just fine and I'm seeing an uninterrupted stream of metrics all the way from ADOT to AMP and AMG!
Welcome to the AWS Observability Accelerator
AWS Observability Accelerator Release version
2.4.0
What is your environment, configuration and the example used?
EKS: v1.24.13-eks-0a21954
Managed Grafana: version 9.4
Managed Prometheus:
ADOT: v0.74.0-eksbuild.1
What did you do and what did you see instead?
I deployed the AWS Observability Accelerator blueprint example following the instructions. Right after the ADOT pod (re)starts, ALL metrics show up fine; after about 4 minutes most of them stop flowing, and warnings like the following are logged by the ADOT collector pod (namespace adot-collector-kubeprometheus):
2023-06-02T23:27:24.158Z warn internal/transaction.go:121 Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1685748444141, "target_labels": "{__name__=\"up\", cluster=\"do-eks-tf-dz\", instance=\"10.11.3.29:9100\", job=\"node-exporter\", nodename=\"ip-10-11-3-29.ec2.internal\", region=\"us-east-1\"}"}
2023-06-02T23:27:25.083Z warn scrape/scrape.go:1372 Append failed {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_pool": "kubelet", "target": "https://kubernetes.default.svc.cluster.local:443/api/v1/nodes/ip-10-11-11-126.ec2.internal/proxy/metrics/cadvisor", "error": "invalid sample: non-unique label names: \"cluster\""}
2023-06-02T23:27:25.083Z warn internal/transaction.go:121 Failed to scrape Prometheus endpoint {"kind": "receiver", "name": "prometheus", "data_type": "metrics", "scrape_timestamp": 1685748445038, "target_labels": "{__name__=\"up\", beta_kubernetes_io_arch=\"arm64\", beta_kubernetes_io_instance_type=\"c7g.4xlarge\", beta_kubernetes_io_os=\"linux\", cluster=\"do-eks-tf-dz\", eks_amazonaws_com_capacityType=\"ON_DEMAND\", eks_amazonaws_com_nodegroup=\"cpu-graviton-man\", eks_amazonaws_com_nodegroup_image=\"ami-06326b7ef5c114349\", failure_domain_beta_kubernetes_io_region=\"us-east-1\", failure_domain_beta_kubernetes_io_zone=\"us-east-1a\", instance=\"ip-10-11-11-126.ec2.internal\", job=\"kubelet\", k8s_io_cloud_provider_aws=\"51d0ed1b12453098a108c272e71e962f\", kubernetes_io_arch=\"arm64\", kubernetes_io_hostname=\"ip-10-11-11-126.ec2.internal\", kubernetes_io_os=\"linux\", node_cpu=\"graviton\", node_kubernetes_io_instance_type=\"c7g.4xlarge\", node_role=\"compute\", region=\"us-east-1\", scale_model=\"bert\", topology_kubernetes_io_region=\"us-east-1\", topology_kubernetes_io_zone=
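For context on the "Append failed" message: "invalid sample: non-unique label names: \"cluster\"" means the kubelet/cadvisor scrape is producing two labels named cluster on the same sample, which can happen when the scrape or relabel configuration attaches the same label name more than once. The snippet below is a minimal illustrative sketch only (not the change merged in #174) of an OpenTelemetry/ADOT collector prometheus receiver where cluster is set exactly once, as a target label; the job name and the replacement value are taken from the logs above purely for illustration, and the rest of a real kubelet scrape job (auth, TLS, proxy path) is omitted.

```yaml
# Illustrative sketch only, not the change merged in #174.
# Assumption: the duplicate "cluster" label is introduced by the scrape/relabel
# configuration attaching the same label name more than once; this sketch sets
# it exactly once, as a target label.
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kubelet              # job name taken from the log output above
          scheme: https
          kubernetes_sd_configs:
            - role: node                 # discover node/kubelet targets via the Kubernetes API
          relabel_configs:
            # Attach the cluster name once, at the target-label stage.
            - action: replace
              target_label: cluster
              replacement: do-eks-tf-dz  # cluster name as seen in the logs above; adjust as needed
          # No second rule that writes a "cluster" label (for example under
          # metric_relabel_configs); a repeated label name on the same sample
          # is what produces "invalid sample: non-unique label names: \"cluster\"".
```

As general Prometheus behaviour, with honor_labels left at its default of false, a cluster label exposed by the scraped target itself would be renamed to exported_cluster rather than colliding with the target label, so the merged fix in #174 remains the authoritative resolution for this particular collision.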
Additional Information