Should system.net.tcp.retrans_segs actually use a monotonic count? #2630
@jalaziz Indeed, currently … So from Datadog there's no easy way to query, for instance, the number of segment retransmits that occurred over a certain timeframe. If I understand correctly, this is the kind of query you'd like to be able to make? In that case, yes, a monotonic counter would make sense. As an aside, since changing the metric type would not be backwards-compatible, if we decide to make the change, we'll need to either use another metric name for the counter or wait for a major release of the Agent.
Totally understand that we'd have to use a different name. Some things that could be done now though are:
To be able to query the number of segments (and datagrams for udp), I agree that a new set of metrics using a monotonic counter would be ideal. I'm happy to submit a PR for this. I'll try to come up with a reasonable name for the new metrics. Maybe simply suffix them with …
@jalaziz: The default units for the … Feel free to open a PR for the new metrics. A …
Add monotonic counts for tcp segments and udp datagrams. This allows more precise counting of incoming and outgoing segments and datagrams for a given time period. Fixes DataDog#2630
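A minimal sketch of what the new count metrics would report, assuming monotonic-count semantics of "difference between consecutive samples of the cumulative counter, skipping any interval where the counter went backwards (a reset)". The class and function names are hypothetical, not the Agent's API, and the readings are made up.

```python
from typing import Dict, List, Optional


class MonotonicCounter:
    """Hypothetical helper: turns samples of a cumulative kernel counter
    (e.g. RetransSegs) into per-interval counts."""

    def __init__(self) -> None:
        self._last: Dict[str, int] = {}

    def sample(self, name: str, value: int) -> Optional[int]:
        previous = self._last.get(name)
        self._last[name] = value
        if previous is None:
            # First sample: no baseline yet, so nothing is reported.
            return None
        delta = value - previous
        if delta < 0:
            # Counter reset (assumed handling): skip rather than report a negative count.
            return None
        return delta


def total_over_window(deltas: List[Optional[int]]) -> int:
    """Summing per-interval counts gives the exact number of events in the window."""
    return sum(d for d in deltas if d is not None)


if __name__ == "__main__":
    counter = MonotonicCounter()
    # Made-up cumulative readings taken at successive check runs; 1015 -> 3 is a reset.
    readings = [1000, 1004, 1004, 1015, 3, 9]
    deltas = [counter.sample("system.net.tcp.retrans_segs.count", r) for r in readings]
    print(deltas)                     # [None, 4, 0, 11, None, 6]
    print(total_over_window(deltas))  # 21
```

Because every interval reports whole events, summing the counts over any window answers the "how many retransmits happened in this timeframe" query discussed above.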
Could it also be a flag in network.yaml? I'd prefer not to have the rates at all, since they aren't as useful as counts, and …
@ejholmes Are you suggesting that the flag would disable the rates and not use the suffix? Or would it just disable the rates? The reason for the suffix is to keep these metrics backwards-compatible. If we remove the suffix, you'd probably have to fix the units in your account manually. @olivielpeau thoughts?
@jalaziz Yeah, I'm advocating for the default to use …
Add monotonic counts for tcp segments and udp datagrams. This allows more precise counting of incoming and outgoing segments and datagrams for a given time period. Count and rate metrics are also configurable, allowing users to prefer one type over the other. Fixes DataDog/dd-agent#2630
* [network] Add monotonic counts for some metrics
  Add monotonic counts for tcp segments and udp datagrams. This allows more precise counting of incoming and outgoing segments and datagrams for a given time period. Count and rate metrics are also configurable, allowing users to prefer one type over the other. Fixes DataDog/dd-agent#2630
* [network] Use '.count' as the count suffix
* [network] Add new count metrics to metadata.csv
* [network] Fix typo in example config
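One way the configurable choice between rates and counts could look, as a sketch only: the option names collect_rate_metrics and collect_count_metrics and their defaults are assumptions for illustration (this thread doesn't show the merged config), while the '.count' suffix comes from the commit above. Rates keep the old metric name so existing dashboards are unaffected.

```python
from typing import Dict, List, Tuple

# Hypothetical option names and defaults (rates only, i.e. the pre-existing behavior).
DEFAULTS = {"collect_rate_metrics": True, "collect_count_metrics": False}


def plan_submissions(
    instance: Dict[str, bool], metric: str, value: int
) -> List[Tuple[str, str, int]]:
    """Return (submission_type, metric_name, raw_value) tuples for one cumulative counter."""
    opts = {**DEFAULTS, **instance}
    plan: List[Tuple[str, str, int]] = []
    if opts["collect_rate_metrics"]:
        # Existing name, so the metric stays backwards-compatible.
        plan.append(("rate", metric, value))
    if opts["collect_count_metrics"]:
        # Suffixed name, so the count never collides with the old rate metric.
        plan.append(("monotonic_count", metric + ".count", value))
    return plan


if __name__ == "__main__":
    # A user who only wants counts, as suggested in the thread.
    print(plan_submissions(
        {"collect_rate_metrics": False, "collect_count_metrics": True},
        "system.net.tcp.retrans_segs",
        1015,
    ))
```

Keeping the suffix even when rates are disabled means a user can flip the flags without the same metric name silently changing type.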
The documentation seems to indicate that system.net.tcp.retrans_segs and similar metrics are measured as gauges. However, they actually seem to be submitted as rates in the network check. I understand that rates are stored as gauges, and that a rate simply stores the difference between the previous and current values, but since it's normalized by time, it can produce odd values. For example, we typically see things like 10.62 retransmitted segments even when using a rollup with sum. This could also simply be due to the fact that the interval used for normalization can differ from the flush interval passed into the flush method.

Using a monotonic count seems more appropriate here, although I could see the rate being useful as well.
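To make the fractional-value effect concrete, here is a small sketch, assuming a rate is the delta between two samples of the cumulative counter divided by the seconds between them (as described above; the Agent's exact normalization isn't shown in this thread). The timestamps and counter values are made up, with check runs roughly 15 s apart plus jitter.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, cumulative counter value)


def per_interval_counts(samples: List[Sample]) -> List[int]:
    """What a monotonic count reports: raw deltas between consecutive samples."""
    return [v1 - v0 for (_, v0), (_, v1) in zip(samples, samples[1:])]


def per_interval_rates(samples: List[Sample]) -> List[float]:
    """What a rate reports: deltas divided by the elapsed time (events per second)."""
    return [(v1 - v0) / (t1 - t0) for (t0, v0), (t1, v1) in zip(samples, samples[1:])]


if __name__ == "__main__":
    samples = [(0.0, 1000), (15.0, 1160), (31.0, 1330)]
    counts = per_interval_counts(samples)
    rates = per_interval_rates(samples)
    print(counts, sum(counts))   # [160, 170] 330
    print(rates)                 # [10.666666666666666, 10.625]
    print(round(sum(rates), 2))  # 21.29
```

A sum rollup over the rates adds events-per-second figures, so the result (about 21.29) is neither an integer nor the 330 retransmits that actually occurred; recovering the count would require multiplying each rate by its exact sampling interval, whereas the per-interval counts sum to the true total directly.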