What are you trying to achieve?
Improve the definition of the hw.errors metric in the guidelines for hardware network adapters and physical disks. Add one or more attributes to qualify the type of error.
What did you expect to see?
The definition of the hw.errors metric for hardware network adapters should not be ambiguous. I'm guessing the intent is to count the packets that contained errors preventing them from being delivered, i.e., I/O errors, but the spec is not clear. Maybe the intent was to count any error that could happen on the adapter, including packet errors (I/O errors), hardware component failures and chipset errors.
Additional context.
General Case
In the general case, the spec defines the hw.errors metric as the number of errors encountered by the component. Since the type of error is not qualified, this could include any type of error, such as hardware failure, firmware bugs, I/O bus errors, I/O device errors, machine check exceptions, etc. It's a bit odd to use a single metric for all of these types of errors.
hw.errors in Network adapter metrics
In the network adapter metrics, hw.errors is defined as the number of errors encountered by the network adapter.
For packet I/O errors, it's useful to know the direction of the error, i.e., to count the received packets that couldn't be delivered (malformed packet, CRC error, buffer full, etc.) separately from the packets that could not be transmitted.
If the spec is not clear, different instrumentations may report network and disk hw.errors in different ways. Some implementations may only report I/O errors, while others could report hardware errors, which are very different.
For example, if a chipset error occurs, the network adapter may have to be replaced. If packets with wrong CRC are received, the network adapter has most likely nothing to do with the problem. It could be a software issue or a hardware problem on a remote system. The errors could also be injected intentionally to test how systems handle network I/O errors.
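To make the distinction concrete, here is a minimal sketch of how a single hw.errors counter could carry an attribute that qualifies the kind of error. It does not use any OpenTelemetry SDK; ErrorCounter is a stand-in written for this example, and the attribute key error.type and its values are hypothetical, not something the spec defines today.

```python
from collections import Counter

# Hypothetical attribute key; the spec does not currently define one.
ERROR_TYPE_KEY = "error.type"


class ErrorCounter:
    """Stand-in for a monotonic counter whose data points are keyed by attributes."""

    def __init__(self, name):
        self.name = name
        self.points = Counter()

    def add(self, amount, attributes):
        if amount < 0:
            raise ValueError("a monotonic counter cannot decrease")
        # Sort the attribute items so the key does not depend on dict order.
        self.points[tuple(sorted(attributes.items()))] += amount

    def value(self, attributes):
        return self.points[tuple(sorted(attributes.items()))]


hw_errors = ErrorCounter("hw.errors")
nic = {"hw.type": "network", "hw.id": "eth0"}

# One chipset fault: the adapter itself may need to be replaced.
hw_errors.add(1, {**nic, ERROR_TYPE_KEY: "chipset"})
# Many bad-CRC packets: the cause is likely outside this adapter.
hw_errors.add(250, {**nic, ERROR_TYPE_KEY: "crc"})
```

With the attribute in place, a backend can alert on the chipset series alone instead of on a total that mixes both kinds of error.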
hw.errors in Physical Disk Errors
This is similar to network errors. Physical disks expose many error counters through SMART, including I/O errors (e.g. read errors, write errors) and errors that are not about I/O.
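As a rough illustration of the same problem for disks, the sketch below groups a few SMART counters into categories before they would be reported. The attribute names follow the style of smartctl output, but the mapping, the category names and the summarize helper are assumptions made for this example. Raw SMART values are vendor-specific, so treat the numbers as illustrative only.

```python
# Assumed mapping from smartctl-style attribute names to an illustrative category.
SMART_CATEGORY = {
    "Raw_Read_Error_Rate": "io",
    "UDMA_CRC_Error_Count": "io",
    "Reallocated_Sector_Ct": "media",
    "Spin_Retry_Count": "mechanical",
}


def summarize(smart_raw):
    """Sum raw SMART counter values per category; unknown names go to 'other'."""
    totals = {}
    for name, raw in smart_raw.items():
        category = SMART_CATEGORY.get(name, "other")
        totals[category] = totals.get(category, 0) + raw
    return totals


sample = {
    "Raw_Read_Error_Rate": 5,
    "UDMA_CRC_Error_Count": 2,
    "Reallocated_Sector_Ct": 1,
}
totals = summarize(sample)
```

An instrumentation that reports only the "io" total and one that reports the sum of everything would both be publishing "hw.errors" under the current wording.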
sebastien-rosset changed the title from "Definition of 'hw.errors' metric for hardware network adapter is ambiguous" to "Definition of 'hw.errors' metric for hardware network adapter and physical disks are ambiguous" on Jan 23, 2023
Proposal
Add a direction attribute for hw.errors. This will help to distinguish between ingress and egress errors on the network interface. But see Attribute names in semantic guidelines should be hierarchical #3131.
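A minimal sketch of what the proposal could look like on the reporting side, assuming the attribute values receive and transmit. The function hw_errors_points and the data-point dictionary layout are hypothetical and only meant to show one hw.errors series per direction.

```python
def hw_errors_points(adapter_id, rx_errors, tx_errors):
    """Build one hw.errors data point per direction for a network adapter."""
    base = {"hw.type": "network", "hw.id": adapter_id}
    return [
        {
            "name": "hw.errors",
            # Assumed value for ingress errors.
            "attributes": {**base, "direction": "receive"},
            "value": rx_errors,
        },
        {
            "name": "hw.errors",
            # Assumed value for egress errors.
            "attributes": {**base, "direction": "transmit"},
            "value": tx_errors,
        },
    ]


points = hw_errors_points("eth0", rx_errors=12, tx_errors=3)
```

Consumers that do not care about direction can still sum the two series, so the attribute adds detail without losing the existing total.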