
New Component: ethtool receiver #9593

Closed
MovieStoreGuy opened this issue May 2, 2022 · 10 comments

MovieStoreGuy commented May 2, 2022

The purpose and use-cases of the new component

When monitoring network-bound compute nodes, it becomes important to understand per-interface network statistics to check for saturation.

Example configuration for the component

The goal is to make transitioning from the Telegraf agent to the OpenTelemetry Collector straightforward.

I would like the configuration to look something like this:

ethtool:
  collection_interval: 10s
  interfaces:
    include:
      - pattern1
      - pattern2
    exclude:
      - lo0

If a conflict occurs, the exclude patterns override any include patterns.

The default configuration will monitor all interfaces excluding the loopback interface.
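The precedence described above (exclude wins over include, and everything except excluded interfaces is monitored by default) could be sketched as follows. This is an illustrative sketch, not the actual receiver implementation; the helper name and the use of glob-style patterns via filepath.Match are assumptions.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// shouldMonitor reports whether an interface name passes the
// include/exclude filters. Exclude patterns always win, matching
// the precedence proposed in this issue. An empty include list
// means "monitor everything not excluded" (the proposed default).
func shouldMonitor(name string, include, exclude []string) bool {
	for _, p := range exclude {
		if ok, _ := filepath.Match(p, name); ok {
			return false
		}
	}
	if len(include) == 0 {
		return true
	}
	for _, p := range include {
		if ok, _ := filepath.Match(p, name); ok {
			return true
		}
	}
	return false
}

func main() {
	include := []string{"eth*"}
	exclude := []string{"lo0"}
	for _, ifc := range []string{"eth0", "lo0", "wlan0"} {
		fmt.Printf("%s: %v\n", ifc, shouldMonitor(ifc, include, exclude))
	}
}
```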

Telemetry data types supported

Metrics Only

Sponsor (Optional)

Looking for a sponsor 🙏🏽

Open to any suggestions.

@codeboten codeboten added the Sponsor Needed New component seeking sponsor label May 2, 2022
@codeboten

@MovieStoreGuy thanks for proposing the component. Could this receiver become a scraper in hostmetrics receiver or do you think it's specific enough that it should be a separate receiver?

@MovieStoreGuy

That sounds like a reasonable way forward, I don't intend for it to do much more than what is described here:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Agent-network-performance.html

However, I wanted it to be adoptable by any infrastructure vendor.

@jamesmoessis

Having it as a scraper in the hostmetrics receiver could make sense. It would be Linux-only, but everything that ethtool queries is, I believe, kernel-level, so it seems general enough.

@github-actions

github-actions bot commented Nov 9, 2022

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

@github-actions github-actions bot added the Stale label Nov 9, 2022
@atoulme

atoulme commented Jan 7, 2023

What metrics would you capture?

@bmcalary-atlassian

[ec2-user ~]$ ethtool -S eth0
bw_in_allowance_exceeded: 0
bw_out_allowance_exceeded: 0
pps_allowance_exceeded: 0
conntrack_allowance_exceeded: 0
linklocal_allowance_exceeded: 0

From https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/monitoring-network-performance-ena.html

These metrics reveal when an EC2 instance in AWS has hit otherwise-hidden, AWS-imposed network allowances/limits.

For posterity, copied from that document:

- bw_in_allowance_exceeded: The number of packets queued or dropped because the inbound aggregate bandwidth exceeded the maximum for the instance.
- bw_out_allowance_exceeded: The number of packets queued or dropped because the outbound aggregate bandwidth exceeded the maximum for the instance.
- conntrack_allowance_exceeded: The number of packets dropped because connection tracking exceeded the maximum for the instance and new connections could not be established. This can result in packet loss for traffic to or from the instance.
- linklocal_allowance_exceeded: The number of packets dropped because the PPS of the traffic to local proxy services exceeded the maximum for the network interface. This impacts traffic to the DNS service, the Instance Metadata Service, and the Amazon Time Sync Service.
- pps_allowance_exceeded: The number of packets queued or dropped because the bidirectional PPS exceeded the maximum for the instance.
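Turning the `ethtool -S` output shown above into metric name/value pairs could look roughly like this. The function name is hypothetical, and a production scraper would more likely query the SIOCETHTOOL ioctl directly rather than parse command output; this is only a sketch of the data shape involved.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// parseEthtoolStats parses the "name: value" lines printed by
// `ethtool -S <iface>` into a map of counter values. Lines that
// do not fit the pattern (such as the "NIC statistics:" header)
// are skipped.
func parseEthtoolStats(output string) map[string]uint64 {
	stats := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		parts := strings.SplitN(sc.Text(), ":", 2)
		if len(parts) != 2 {
			continue
		}
		v, err := strconv.ParseUint(strings.TrimSpace(parts[1]), 10, 64)
		if err != nil {
			continue
		}
		stats[strings.TrimSpace(parts[0])] = v
	}
	return stats
}

func main() {
	out := `NIC statistics:
     bw_in_allowance_exceeded: 0
     pps_allowance_exceeded: 42`
	fmt.Println(parseEthtoolStats(out))
}
```

Each parsed counter would then be emitted as a cumulative-sum metric point with the interface name as an attribute.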

@github-actions github-actions bot removed the Stale label May 26, 2023
@github-actions

This issue has been closed as inactive because it has been stale for 120 days with no activity.

@github-actions github-actions bot closed this as not planned Sep 24, 2023
@diranged

I think this should get re-opened. These are critical metrics that I was surprised to find are not yet supported in the hostmetrics receiver, though I agree that's where they should go.

@diranged

@atoulme Is there a process for asking for this to be re-opened and evaluated?
