This repository has been archived by the owner on Sep 7, 2020. It is now read-only.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Feature/collect link metrics #831
Feature/collect link metrics #831
Changes from all commits
775d45b
0a43c89
ca4c7f7
289f2b7
11243d0
ea254fc
ea13ebc
29da78e
dcd990c
bc90856
542d1f0
86fa1c8
2c40dcd
8919a56
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
Large diffs are not rendered by default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me some time to understand that this is only for ethernet links. I guess I should know by heart that 802.3 is Ethernet, but consider changing the file name to
eth_link_metrics_collector.cpp
(nitpick)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we have to do this right...
All the stats fields are uint32_t and wrap. tx_errors probably won't wrap, but the number of rx/tx packets will wrap if the device stays up for a couple of days.
Therefore, we have to keep the previously measured value and subtract it (wrapping subtraction, I'm not entirely sure what the standard C++ way of doing that is).
While we're at it, we should probably also keep the timestamp. In 1905.1 the time of the measurement isn't specified (which renders it completely useless), but in R2 there is a timestamp included. And to make the metrics somewhat useful even in R1, we should probably divide by the sampling time so the numbers are at least comparable with each other.
So, I think it would be better to keep an object for the link metrics for each interface.
That also means you can keep the netlink fd open. Minor optimisation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but I cannot see the timestamp field you mentioned. Maybe I'm not looking at the right document. Can you please clarify which are those R1 and R2 docs?
I agree with you that if we cannot include the time measurement window along with the counter, then it is useless to provide the number of packets/errors during such measurement period.
I think however that, even if we could provide timestamp or elapsed time since last measure, we have no way to know if the number of packets we read with the netlink socket have wrapped (and how many times) since last call to get link metrics. Although such wrapping is quite improbable to go unnoticed, given the frequency with which calls to get link metrics will be probably issued, we cannot guarantee that wrapping won't happen. Keeping timestamps is useless if they are not provided by the kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should have written "we should keep a timestamp".
Stupid me again, I thought these were 64-bit numbers, but they're only 64-bit if you get them through netlink, not if you get them from IFLA_STATS64. Which apparently requires special handling because they're unaligned...
Which brings me to the question: why don't we use rtnl for this? E.g. https://gist.github.com/beejeebus/cc4fb07472cf5a0afd41 rtnl_link takes care of the 64-bit handling and fallback to 32-bit if the 64-bit stats are not available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear: since the code is there now anyway, I would merge it as is. But when later we refactor with #900, I think we should also look into using rtnl_link instead of doing all the parsing manually. However, before going into that, we should check if the libnl-tiny used in openwrt also has it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read in https://wireless.wiki.kernel.org/en/developers/documentation/nl80211
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not needed, just return true of false where needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Single vs. Multiple exit point, that's the question.
Have you agreed on which one to use or is it up to the programmer? I haven't seen anything about this topic in
CODINGSTYLE.md
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in general we favour multiple exit points, it keeps the code indentation shorter and is what we have in most of prplmesh. I will update the
CODYINGSTULE.md
if agreed - @fvbogaert you have a chance for another poll :)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use multiple exit points everywhere. Because normally, all cleanup is automatic since we use smart pointers etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Get rid of unnecessary indentation:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
result
fromget_link_metrics(fd, ...)
is being ignored with your proposalThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we opt for multiple exit point we must call
close
twice.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going for single exit point then we should go all the way and do it like in the kernel, with
goto
. @fvbogaert should we usegoto
? Another poll..There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's my preferred pattern if there is any cleanup to do. However, we really should use automatic cleanup everywhere. So the better approach would be to wrap the file descriptor so it gets closed automatically by the destructor.