add_cloud_metadata detecting wrong cloud provider (aws as openstack) #13816

Closed
aidan- opened this issue Sep 26, 2019 · 20 comments · Fixed by #41636

@aidan-

aidan- commented Sep 26, 2019

I am running a large number of instances in AWS which have multiple beat agents running on them (metricbeat, winlogbeat and filebeat). A small percentage of instances are starting up and detecting that they are running on 'openstack' instead of 'ec2' (ie, meta.cloud.provider=openstack).

Looking through the code and the way the cloud platform is detected, it's not very surprising that this is occurring, as it looks like the endpoints/paths used by EC2 and OpenStack collide with each other. This appears to have been briefly discussed as a potential issue in the original pull request that added OpenStack as a cloud provider: #7663 (comment), but it doesn't look like the concern was addressed.

Perhaps using the non-ec2 compatible Openstack metadata endpoint would be a simple solution to avoid this?

https://docs.openstack.org/nova/latest/user/metadata.html#metadata-openstack-format
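
To make the collision concrete, here is a minimal Python simulation. It is not the libbeat code: it assumes that each provider's fetcher reads the same EC2-compatible paths and that the processor keeps whichever provider answers first. The `fetch` and `first_success` helpers and the sample values are hypothetical.

```python
from typing import Dict, List, Optional

# Simulated metadata service of an EC2 instance (no session token needed).
# The paths follow the EC2 metadata layout; the values are made up.
EC2_METADATA: Dict[str, str] = {
    "/latest/meta-data/instance-id": "i-0123456789abcdef0",
    "/latest/meta-data/instance-type": "r5d.2xlarge",
    "/latest/meta-data/placement/availability-zone": "ap-southeast-2a",
}


def fetch(provider: str, service: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Hypothetical fetcher: both providers read the same EC2-compatible paths."""
    try:
        return {
            "provider": provider,
            "instance_id": service["/latest/meta-data/instance-id"],
            "machine_type": service["/latest/meta-data/instance-type"],
            "availability_zone": service["/latest/meta-data/placement/availability-zone"],
        }
    except KeyError:
        return None  # a path is missing, so this provider does not match


def first_success(arrival_order: List[str], service: Dict[str, str]) -> Optional[Dict[str, str]]:
    """Return the first provider result that succeeds, in order of arrival."""
    for provider in arrival_order:
        result = fetch(provider, service)
        if result is not None:
            return result
    return None


# Both fetchers succeed on EC2, so the label depends only on arrival order.
aws_first = first_success(["aws", "openstack"], EC2_METADATA)
openstack_first = first_success(["openstack", "aws"], EC2_METADATA)
print(aws_first["provider"], openstack_first["provider"])
```

In this model the metadata values are identical in both cases and only the `provider` label differs, which matches the log line below where an EC2 instance ID is reported under `openstack`.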

Version:
libbeat v6.8.2

Operating System:
Experienced on Windows but would affect all.

Discuss Forum URL:
Not created by me but a pre-existing one:
https://discuss.elastic.co/t/add-cloud-metadata-wrong-provider/189780

Steps to reproduce:
Starting beat on AWS EC2 instances can sometimes result in openstack being identified as the provider:
INFO add_cloud_metadata/add_cloud_metadata.go:323 add_cloud_metadata: hosting provider type detected as openstack, metadata={"availability_zone":"ap-southeast-2a","instance_id":"i-xxxxxxxxxxxxxxx","instance_name":"ip-10-xx-xx-xx","machine_type":"r5d.2xlarge","provider":"openstack"}

@urso

urso commented Oct 2, 2019

Beats 7.4 introduced a new setting to select the providers to query. Original PR #13812

If all your instances run on AWS, you can configure the processor as follows:

processors:
- add_cloud_metadata:
    providers: ["aws"]

@aidan-
Author

aidan- commented Oct 10, 2019

Thanks for the information. We are currently running beats v6.x and it's unlikely we will upgrade to v7 in the short term, so we may have to live with this one.

@botelastic

botelastic bot commented Sep 9, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@botelastic botelastic bot added the Stalled and needs_team labels Sep 9, 2020
@inqueue
Member

inqueue commented Sep 11, 2020

Is there any way to fix? Another user has reported the issue.

@botelastic botelastic bot removed the Stalled label Sep 11, 2020
@jsoriano jsoriano added the Team:Integrations label May 10, 2021
@elasticmachine
Collaborator

Pinging @elastic/integrations (Team:Integrations)

@botelastic botelastic bot removed the needs_team label May 10, 2021
@botelastic

botelastic bot commented May 10, 2022

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label May 10, 2022
@VimCommando

👍

@botelastic

botelastic bot commented Jul 27, 2023

Hi!
We just realized that we haven't looked into this issue in a while. We're sorry!

We're labeling this issue as Stale to make it hit our filters and make sure we get back to it as soon as possible. In the meantime, it'd be extremely helpful if you could take a look at it as well and confirm its relevance. A simple comment with a nice emoji will be enough :+1.
Thank you for your contribution!

@botelastic botelastic bot added the Stalled label Jul 27, 2023
@andrewkroh
Member

The suggested workaround is not available to Elastic Agent users who cannot modify the configuration for the add_cloud_metadata processor. We need a code fix and I think @aidan- 's suggestion is promising:

> Perhaps using the non-ec2 compatible Openstack metadata endpoint would be a simple solution to avoid this?

@botelastic botelastic bot removed the Stalled label Aug 2, 2023
@andrewkroh andrewkroh changed the title from "add_cloud_metadata detecting wrong cloud provider" to "add_cloud_metadata detecting wrong cloud provider (aws as openstack)" Aug 8, 2023
@bplies-ATX

> The suggested workaround is not available to Elastic Agent users who cannot modify the configuration for the add_cloud_metadata processor. We need a code fix and I think @aidan- 's suggestion is promising:
>
> > Perhaps using the non-ec2 compatible Openstack metadata endpoint would be a simple solution to avoid this?

We just upgraded Elastic Agent from 8.7.1 to 8.9.1 and started to notice misidentifications as well. Notice cloud.provider and cloud.service.name are now wrong.

    "cloud": {
      "availability_zone": "us-east-1b",
      "instance": {
        "name": "ip-10-102-2-203.ec2.internal",
        "id": "i-0639f8a4c790252e4"
      },
      "provider": "openstack",
      "machine": {
        "type": "t3.2xlarge"
      },
      "service": {
        "name": "Nova"
      }
    }

@renzedj

renzedj commented Nov 21, 2023

I'm encountering this with 8.11.x. AWS is being misidentified as Openstack.

@it-ops-liron

Same here. Just tested migrating to 8.11 from 8.5 and found out some data was mislabeled as "openstack"

@BenB196

BenB196 commented Dec 28, 2023

Also hitting this issue on 8.11 agent.

Edit: One thing that I found was a bit more helpful in generally fixing this issue was to ensure the IMDSv2 was set to required, not entirely sure why that makes a difference, but at least in my case it did.

@udayshingwekar

I am hitting the same issue in 8.11 and resolved it by adding the processor to each of the system integration outputs in the Fleet-managed Elastic Agents (very painful, as I could not find a global way to do so).

- add_cloud_metadata:
    providers: ["aws"]

@toddferg
Contributor

toddferg commented Feb 2, 2024

I think I found a workaround that uses the @custom ingest pipelines and will require fewer specific workarounds.

PUT _ingest/pipeline/metrics-aws.ec2_metrics@custom
{
  "description": "Custom pipeline for AWS EC2 metrics with failure handling",
  "processors": [
    {
      "script": {
        "source": """
        if (ctx.cloud?.provider != null && ctx.cloud.provider == 'openstack') {
          ctx.cloud.provider = 'aws';
        }
        """,
        "on_failure": [
          {
            "set": {
              "field": "_ingest._failure_message",
              "value": "{{ _ingest.on_failure_message }}"
            }
          }
        ]
      }
    }
  ]
}

PUT _ingest/pipeline/logs-aws.ec2_logs@custom
{
  "description": "Custom pipeline for AWS EC2 logs with failure handling",
  "processors": [
    {
      "script": {
        "source": """
        if (ctx.cloud?.provider != null && ctx.cloud.provider == 'openstack') {
          ctx.cloud.provider = 'aws';
        }
        """,
        "on_failure": [
          {
            "set": {
              "field": "_ingest._failure_message",
              "value": "{{ _ingest.on_failure_message }}"
            }
          }
        ]
      }
    }
  ]
}

@axw
Member

axw commented Feb 22, 2024

> Perhaps using the non-ec2 compatible Openstack metadata endpoint would be a simple solution to avoid this?

I think this makes sense, but may require a substantial amount of testing. Another option that involves fewer changes would be to check if the OpenStack-specific endpoint exists, and then continue using the EC2-compatible endpoint for returning the values.
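
The second option could be sketched roughly as follows in Python. This is only an illustration under assumptions, not the Beats implementation: it assumes the OpenStack-native document lives at `/openstack/latest/meta_data.json` (per the Nova metadata docs) and that an EC2 metadata service does not serve that path. The `detect_openstack` helper and the `Probe` type are hypothetical.

```python
from typing import Callable, Dict, Optional

# Path of the OpenStack-native metadata document (per the Nova metadata docs).
OPENSTACK_NATIVE_PATH = "/openstack/latest/meta_data.json"
# EC2-compatible path, served by both EC2 and OpenStack.
EC2_COMPAT_INSTANCE_ID = "/latest/meta-data/instance-id"

# A probe takes a path and returns the response body, or None on 404/timeout.
Probe = Callable[[str], Optional[str]]


def detect_openstack(probe: Probe) -> Optional[Dict[str, str]]:
    """Report OpenStack only if the native endpoint exists (hypothetical helper)."""
    if probe(OPENSTACK_NATIVE_PATH) is None:
        return None  # not OpenStack: do not claim the instance
    instance_id = probe(EC2_COMPAT_INSTANCE_ID)
    if instance_id is None:
        return None
    # Values still come from the EC2-compatible endpoint, as before.
    return {"provider": "openstack", "instance_id": instance_id}


# Simulated services: EC2 lacks the native OpenStack path (assumption),
# while OpenStack serves both paths.
ec2 = {EC2_COMPAT_INSTANCE_ID: "i-0123456789abcdef0"}
nova = {EC2_COMPAT_INSTANCE_ID: "i-00000001", OPENSTACK_NATIVE_PATH: "{}"}

on_ec2 = detect_openstack(ec2.get)
on_nova = detect_openstack(nova.get)
```

The existence check only gates detection, so the fields returned for real OpenStack hosts would stay the same as today.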

Ideally we should have some automated integration testing for this. Probably not running all the time, possibly just on-demand. I was looking for an easy way to test against OpenStack and found https://microstack.run/docs/single-node; I tried it in an EC2 instance and it's timing out, so not sure if that's a viable option.

EDIT: managed to get it installed; I was using the wrong instance type earlier.
EDIT2: even after it's installed, it's still not working... trying to create an OpenStack VM fails

@george-viaud

george-viaud commented Jul 30, 2024

Experiencing this using fleet, agent v8.14.3

Our infra is AWS, EC2

Still seeing:

cloud.provider: openstack

We have tried to find a way to force configuration via fleet to:

processors:
  - add_cloud_metadata:
      providers: ["aws"]

but so far no luck.

If it makes a difference, we are registering our ephemeral ec2 instances via cron on startup:

./elastic-agent install --url=https://[OBFUSCATED]:8220 --insecure --force --enrollment-token=[OBFUSCATED]

Any advice would be greatly appreciated

@george-viaud

Some additional information (for my case, at least): I noticed that our Fleet Server instance is getting the correct cloud.provider and service. It appears to be the instances using the Fleet-configured Apache HTTP Server, as well as the system integration access log entries (and perhaps others), that are getting the wrong info. Not sure how to debug this; I wish I could help myself and others further.

@Kavindu-Dodan
Contributor

Kavindu-Dodan commented Oct 31, 2024

I had a look into this and the following are my observations.

Background

The root cause has a few aspects. First, the OpenStack implementation [1] relies on the EC2-compatible metadata endpoints [2]. Second, both the OpenStack and EC2/AWS implementations are enabled by default [3] [4] (note: Local is a misleading name).

For AWS EC2 instances that enforce IMDSv2, the OpenStack metadata fetch fails because IMDSv2 requires a session token [5] to access the endpoints (as observed here - #13816 (comment)). The OpenStack implementation therefore fails and the EC2 metadata fetch wins the race.
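
The IMDSv2 effect can be illustrated with a toy in-memory model in Python. This is a sketch under assumptions, not a real client and not the Beats code: `FakeIMDS` and both fetch helpers are hypothetical, while the token path and header names are the ones documented for IMDSv2.

```python
from typing import Dict, Optional, Tuple

TOKEN_PATH = "/latest/api/token"
TOKEN_TTL_HEADER = "X-aws-ec2-metadata-token-ttl-seconds"
TOKEN_HEADER = "X-aws-ec2-metadata-token"
INSTANCE_ID_PATH = "/latest/meta-data/instance-id"


class FakeIMDS:
    """Toy in-memory stand-in for the EC2 metadata service (not a real client)."""

    def __init__(self, require_token: bool) -> None:
        self.require_token = require_token
        self.token = "fake-session-token"
        self.data = {INSTANCE_ID_PATH: "i-0123456789abcdef0"}

    def request(self, method: str, path: str,
                headers: Optional[Dict[str, str]] = None) -> Tuple[int, str]:
        headers = headers or {}
        if method == "PUT" and path == TOKEN_PATH:
            # IMDSv2 session start: a TTL header is required.
            if TOKEN_TTL_HEADER in headers:
                return 200, self.token
            return 400, ""
        if method != "GET":
            return 405, ""
        if self.require_token and headers.get(TOKEN_HEADER) != self.token:
            return 401, ""  # IMDSv2 enforced: a GET without a valid token is rejected
        if path in self.data:
            return 200, self.data[path]
        return 404, ""


def fetch_without_token(imds: FakeIMDS) -> Optional[str]:
    """Plain GET, as a client of the EC2-compatible endpoints would send."""
    status, body = imds.request("GET", INSTANCE_ID_PATH)
    return body if status == 200 else None


def fetch_with_token(imds: FakeIMDS) -> Optional[str]:
    """IMDSv2 flow: PUT for a session token, then GET with the token header."""
    status, token = imds.request("PUT", TOKEN_PATH, {TOKEN_TTL_HEADER: "21600"})
    if status != 200:
        return None
    status, body = imds.request("GET", INSTANCE_ID_PATH, {TOKEN_HEADER: token})
    return body if status == 200 else None


optional = FakeIMDS(require_token=False)  # IMDSv2 optional: both styles work
required = FakeIMDS(require_token=True)   # IMDSv2 required: only the token flow works
```

With the token optional, both fetch styles succeed and the race is open. With the token required, only the token-aware fetch returns data, which is consistent with the report above that setting IMDSv2 to required made the mislabeling go away.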

Action

I am looking into migrating the OpenStack implementation to use the Nova metadata service [6], as proposed by many. While this is being investigated, a workaround is to use the providers selector in the processor. For example, in metricbeat.yml:

processors:
  - add_cloud_metadata:
      providers:
        - aws

Footnotes

  1. https://github.com/elastic/beats/blob/v8.15.3/libbeat/processors/add_cloud_metadata/provider_openstack_nova.go#L26-L29

  2. https://docs.openstack.org/nova/latest/user/metadata.html#ec2-compatible-metadata

  3. https://github.com/elastic/beats/blob/v8.15.3/libbeat/processors/add_cloud_metadata/provider_openstack_nova.go#L37

  4. https://github.com/elastic/beats/blob/v8.15.3/libbeat/processors/add_cloud_metadata/provider_aws_ec2.go#L58

  5. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/instancedata-data-retrieval.html#imds-considerations

  6. https://docs.openstack.org/nova/latest/user/metadata.html#nova-metadata

@Kavindu-Dodan
Contributor

#41636 attempts to fix this by giving priority to the AWS/EC2 and Azure metadata fetch mechanisms. I had to do this as I was unable to get a stable OpenStack instance to validate its dedicated metadata endpoints.
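
As a rough illustration of what prioritizing providers can look like, here is a minimal Python sketch. It is not taken from the PR: the `PRIORITY` list and the `pick_provider` helper are hypothetical, and it assumes all successful responses are collected before one is chosen.

```python
from typing import Dict, List, Optional

# Hypothetical priority list; providers not listed are treated as lowest priority.
PRIORITY: List[str] = ["aws", "azure"]


def pick_provider(responses: List[Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Choose among all successful responses, preferring prioritized providers."""
    if not responses:
        return None

    def rank(response: Dict[str, str]) -> int:
        provider = response["provider"]
        return PRIORITY.index(provider) if provider in PRIORITY else len(PRIORITY)

    # min() returns the first minimal item, so ties keep arrival order.
    return min(responses, key=rank)


# On EC2 both fetchers may answer; the prioritized provider wins
# even if the other response arrived first.
both = [
    {"provider": "openstack", "instance_id": "i-0123456789abcdef0"},
    {"provider": "aws", "instance_id": "i-0123456789abcdef0"},
]
chosen = pick_provider(both)
only_openstack = pick_provider([{"provider": "openstack", "instance_id": "i-00000001"}])
```

The trade-off in this model is that selection has to wait for the slower fetchers (or a timeout) instead of returning on the first success.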
