-
Notifications
You must be signed in to change notification settings - Fork 32
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed up to "split link metric query TLV into two TLVs".
* @return True on success and false otherwise. | ||
*/ | ||
virtual bool get_link_metrics(const std::string &local_interface_name, | ||
const sMacAddr &neighbor_interface_address, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in practice it will never be possible to distinguish between different neighbours that are connected on the same interface. This is a serious shortcoming of the 1905.1 spec, but not much we can do about it... @mariomaz do you think you can ask your colleagues who are more familiar with 1905.1 how you are supposed to deal with interfaces where there are multiple neighbours, or none?
Anyway, my point is: I think in practice we will only use the local_interface_name.
(nitpick)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you.
I've checked our 1905.1 implementation at this point and found this TODO which also confirms your opinion:
// TODO: In Linux there is no easy way to obtain this
// information. We will just report the amount of *total* packets
// transmitted from our local interface, no matter the
// destination.
// This information will only be correct when the local interface
// is connected to one single remote interface... however we
// better report this than nothing at all.
However, be aware that this is an interface which will be implemented by different collectors, not only the 802.3 collector. In the 802.11 collector it is possible to distinguish between different neighbors and thus we do need the neighbor_interface_address parameter.
req.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtgenmsg)); | ||
req.hdr.nlmsg_type = RTM_GETLINK; | ||
req.hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; | ||
req.hdr.nlmsg_seq = 1; | ||
req.hdr.nlmsg_pid = 0; | ||
req.gen.rtgen_family = AF_PACKET; /* no preferred AF, we will get *all* interfaces */ | ||
|
||
io.iov_base = &req; | ||
io.iov_len = req.hdr.nlmsg_len; | ||
rtnl_msg.msg_iov = &io; | ||
rtnl_msg.msg_iovlen = 1; | ||
rtnl_msg.msg_name = &kernel; | ||
rtnl_msg.msg_namelen = sizeof(kernel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to open-code all this, rather than using libnl3 genl? In fact, we could move base_wlan_hal_nl80211::send_nl80211_msg to network_utils and use that function. The nl80211 part of it is only hardcoded in the genl_ctrl_resolve call.
Anyway, something that can be done later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you.
This is something that could probably be done using changes in "Feature/add support for nl80211 standard commands to BWL #900". However, be aware that such changes have been made into BWL and not in network_utils.
Anyway, I agree with you that this is a refactoring that could be done later (I think it is not going to be so easy and quick as might seem at first glance).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not indeed move base_wlan_hal_nl80211::send_nl80211_msg
to BCL? just that function, shouldn't be too hard even as a followup commit in this PR. I really prefer that we don't duplicate code when we really don't need to. Still, as you say, it can be done later, so (nitpoick)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Method base_wlan_hal_nl80211::send_nl80211_msg() is going to disappear after we had approved the design and merged "Feature/add support for nl80211 standard commands to BWL #900" and done a refactoring of the NL80211 version of the BWL.
The work to be done is then to implement the get_link_metrics() method using the new classes defined in #900.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of the new classes, the question remains why you would do all the parsing manually rather than use one of the rtnl_link_get_ functions to get all info in a nice struct.
/** | ||
* This message contains the stats for the interface we are interested in | ||
*/ | ||
if (0 == strcmp(local_interface_name.c_str(), (char *)RTA_DATA(attribute))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe there is a way to specify this already in the GET message, so instead of getting all interfaces, you get just one.
You would need to convert it to ifindex first though, IIRC.
(nitpick).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is a nitpick comment, the change could probably be done as part of the previous refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strcmp is baned C function if I'm not mistaken.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it appears I was mistaken and apparently the list of banned C functions by Intel SDL is listed in github:
https://github.com/intel/safestringlib/wiki/SDL-List-of-Banned-Functions
We should add this to our CONTRIBUTING.md, perhaps @vitalii-komisarenko @abelog or @VladyslavTupikin can do that as good quick first PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although strcmp is not in the list of banned C functions, I've replaced it with std::strncmp() C++ function which is safer.
} | ||
} | ||
|
||
link_metrics.transmitter.packet_errors = stats->tx_errors; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe we have to do this right...
All the stats fields are uint32_t and wrap. tx_errors probably won't wrap, but the number of rx/tx packets will wrap if the device stays up for a couple of days.
Therefore, we have to keep the previously measured value and subtract it (wrapping subtraction, I'm not entirely sure what the standard C++ way of doing that is).
While we're at it, we should probably also keep the timestamp. In 1905.1 the time of the measurement isn't specified (which renders it completely useless), but in R2 there is a timestamp included. And to make the metrics somewhat useful even in R1, we should probably divide by the sampling time so the numbers are at least comparable with each other.
So, I think it would be better to keep an object for the link metrics for each interface.
That also means you can keep the netlink fd open. Minor optimisation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but I cannot see the timestamp field you mentioned. Maybe I'm not looking at the right document. Can you please clarify which are those R1 and R2 docs?
I agree with you that if we cannot include the time measurement window along with the counter, then it is useless to provide the number of packets/errors during such measurement period.
I think however that, even if we could provide timestamp or elapsed time since last measure, we have no way to know if the number of packets we read with the netlink socket have wrapped (and how many times) since last call to get link metrics. Although such wrapping is quite improbable to go unnoticed, given the frequency with which calls to get link metrics will be probably issued, we cannot guarantee that wrapping won't happen. Keeping timestamps is useless if they are not provided by the kernel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but I cannot see the timestamp field you mentioned.
I should have written "we should keep a timestamp".
we have no way to know if the number of packets we read with the netlink socket have wrapped (and how many times) since last call to get link metrics.
Stupid me again, I thought these were 64-bit numbers, but they're only 64-bit if you get them through netlink, not if you get them from IFLA_STATS64. Which apparently requires special handling because they're unaligned...
Which brings me to the question: why don't we use rtnl for this? E.g. https://gist.github.com/beejeebus/cc4fb07472cf5a0afd41 rtnl_link takes care of the 64-bit handling and fallback to 32-bit if the 64-bit stats are not available.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be clear: since the code is there now anyway, I would merge it as is. But when later we refactor with #900, I think we should also look into using rtnl_link instead of doing all the parsing manually. However, before going into that, we should check if the libnl-tiny used in openwrt also has it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Read in https://wireless.wiki.kernel.org/en/developers/documentation/nl80211
libnl tiny
OpenWrt folks created a tiny version of libnl based on a git snapshot, which only contains genl, not rtnetlink or any of the netfilter stuff, and compiles down to less than 30k in binary size. You can find it here:
https://git.openwrt.org/?p=openwrt/openwrt.git;a=tree;f=package/libs/libnl-tiny;hb=HEAD
agent/src/beerocks/slave/backhaul_manager/backhaul_manager_thread.cpp
Outdated
Show resolved
Hide resolved
agent/src/beerocks/slave/backhaul_manager/backhaul_manager_thread.cpp
Outdated
Show resolved
Hide resolved
agent/src/beerocks/slave/backhaul_manager/backhaul_manager_thread.cpp
Outdated
Show resolved
Hide resolved
agent/src/beerocks/slave/backhaul_manager/backhaul_manager_thread.cpp
Outdated
Show resolved
Hide resolved
agent/src/beerocks/slave/backhaul_manager/backhaul_manager_thread.cpp
Outdated
Show resolved
Hide resolved
I've also replaced strcmp() banned function with std::strncmp() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review only for now, so just comments
/** | ||
* This message contains the stats for the interface we are interested in | ||
*/ | ||
if (0 == strcmp(local_interface_name.c_str(), (char *)RTA_DATA(attribute))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
strcmp is baned C function if I'm not mistaken.
agent/src/beerocks/slave/link_metrics/ieee802_3_link_metrics_collector.cpp
Show resolved
Hide resolved
agent/src/beerocks/slave/link_metrics/ieee802_3_link_metrics_collector.cpp
Outdated
Show resolved
Hide resolved
/** | ||
* This message contains the stats for the interface we are interested in | ||
*/ | ||
if (0 == strcmp(local_interface_name.c_str(), (char *)RTA_DATA(attribute))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So it appears I was mistaken and apparently the list of banned C functions by Intel SDL is listed in github:
https://github.com/intel/safestringlib/wiki/SDL-List-of-Banned-Functions
We should add this to our CONTRIBUTING.md, perhaps @vitalii-komisarenko @abelog or @VladyslavTupikin can do that as good quick first PR.
agent/src/beerocks/slave/link_metrics/ieee802_3_link_metrics_collector.cpp
Show resolved
Hide resolved
agent/src/beerocks/slave/link_metrics/ieee802_3_link_metrics_collector.cpp
Outdated
Show resolved
Hide resolved
req.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtgenmsg)); | ||
req.hdr.nlmsg_type = RTM_GETLINK; | ||
req.hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; | ||
req.hdr.nlmsg_seq = 1; | ||
req.hdr.nlmsg_pid = 0; | ||
req.gen.rtgen_family = AF_PACKET; /* no preferred AF, we will get *all* interfaces */ | ||
|
||
io.iov_base = &req; | ||
io.iov_len = req.hdr.nlmsg_len; | ||
rtnl_msg.msg_iov = &io; | ||
rtnl_msg.msg_iovlen = 1; | ||
rtnl_msg.msg_name = &kernel; | ||
rtnl_msg.msg_namelen = sizeof(kernel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not indeed move base_wlan_hal_nl80211::send_nl80211_msg
to BCL? just that function, shouldn't be too hard even as a followup commit in this PR. I really prefer that we don't duplicate code when we really don't need to. Still, as you say, it can be done later, so (nitpoick)
agent/src/beerocks/slave/link_metrics/ieee802_3_link_metrics_collector.cpp
Show resolved
Hide resolved
e2e175f
to
f42ee33
Compare
60f68a6
to
7485d32
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except for doing the test in python instead of shell, all my comments are nitpicky.
} | ||
} | ||
|
||
link_metrics.transmitter.packet_errors = stats->tx_errors; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry but I cannot see the timestamp field you mentioned.
I should have written "we should keep a timestamp".
we have no way to know if the number of packets we read with the netlink socket have wrapped (and how many times) since last call to get link metrics.
Stupid me again, I thought these were 64-bit numbers, but they're only 64-bit if you get them through netlink, not if you get them from IFLA_STATS64. Which apparently requires special handling because they're unaligned...
Which brings me to the question: why don't we use rtnl for this? E.g. https://gist.github.com/beejeebus/cc4fb07472cf5a0afd41 rtnl_link takes care of the 64-bit handling and fallback to 32-bit if the 64-bit stats are not available.
req.hdr.nlmsg_len = NLMSG_LENGTH(sizeof(struct rtgenmsg)); | ||
req.hdr.nlmsg_type = RTM_GETLINK; | ||
req.hdr.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; | ||
req.hdr.nlmsg_seq = 1; | ||
req.hdr.nlmsg_pid = 0; | ||
req.gen.rtgen_family = AF_PACKET; /* no preferred AF, we will get *all* interfaces */ | ||
|
||
io.iov_base = &req; | ||
io.iov_len = req.hdr.nlmsg_len; | ||
rtnl_msg.msg_iov = &io; | ||
rtnl_msg.msg_iovlen = 1; | ||
rtnl_msg.msg_name = &kernel; | ||
rtnl_msg.msg_namelen = sizeof(kernel); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regardless of the new classes, the question remains why you would do all the parsing manually rather than use one of the rtnl_link_get_ functions to get all info in a nice struct.
* The IEEE 1905.1 standard says about the Link Metric Query TLV and the neighbor type | ||
* octet that "If the value is 0, then the EUI48 field is not present; if the value is 1, | ||
* then the EUI-48 field shall be present." | ||
* | ||
* However, optional fields are not currently supported by TLVF. | ||
* | ||
* As a workaround, instead of defining a tlvLinkMetricQuery TLV with an optional field, | ||
* we have defined two different TLVs, one with the optional field and the other one | ||
* without it. Application must then check the length of received TLV to know if optional | ||
* field (MAC address of neighbor device) is present or not and then create an instance of | ||
* either tlvLinkMetricQuery or tlvLinkMetricQueryAllNeighbors respectively. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know if I said it before, but this is really an excellent explanation.
/** | ||
* Set alias flag to true if link metrics for a specific neighbor have been requested | ||
*/ | ||
bool specific_neighbor = | ||
ieee1905_1::eLinkMetricNeighborType::SPECIFIC_NEIGHBOR == neighbor_type; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nitpick) I would set this directly in the condition above instead of saving neighbor_type.
7485d32
to
20eee01
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have only nitpicks - this is an amazing work @mariomaz - well done!
agent/src/beerocks/slave/link_metrics/ieee802_3_link_metrics_collector.cpp
Outdated
Show resolved
Hide resolved
agent/src/beerocks/slave/link_metrics/ieee802_3_link_metrics_collector.cpp
Outdated
Show resolved
Hide resolved
@@ -0,0 +1,279 @@ | |||
/* SPDX-License-Identifier: BSD-2-Clause-Patent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It took me some time to understand that this is only for ethernet links. I guess I should know by heart that 802.3 is Ethernet, but consider changing the file name to eth_link_metrics_collector.cpp
(nitpick)
* If the media type of the link is IEEE 802.11, then this value is the estimated RSSI in dB at | ||
* the receive side of the link expressed in dB; otherwise, it is set to 0xFF. | ||
*/ | ||
uint8_t rssi = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So why not set it to UINT8_MAX by default instead of 0? Then you don't need to set it in the ethernet link collector implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This struct is going to be used in the WiFi link collector implementation too. UINT8_MAX is not a default value, but the value to use for ethernet links (according to standard)
agent/src/beerocks/slave/link_metrics/ieee802_3_link_metrics_collector.cpp
Show resolved
Hide resolved
/** | ||
* Get link metrics information | ||
*/ | ||
if (collector && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is better to split to 2 if conditions (nitpick)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some static code analyzers would complain in such case.
agent/src/beerocks/slave/backhaul_manager/backhaul_manager_thread.cpp
Outdated
Show resolved
Hide resolved
agent/src/beerocks/slave/backhaul_manager/backhaul_manager_thread.cpp
Outdated
Show resolved
Hide resolved
* tlvLinkMetricQueryAllNeighbors without it. Application must check which of both TLVs has | ||
* been received inside the message. | ||
*/ | ||
std::shared_ptr<ieee1905_1::tlvLinkMetricQueryAllNeighbors> tlvLinkMetricQueryAllNeighbors; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps we can handle each option in separate functions? (nitpick)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most of the code is common to both options. Separated functions would contain lots of duplicated code so I prefer to leave it as it is now.
agent/src/beerocks/slave/backhaul_manager/backhaul_manager_thread.cpp
Outdated
Show resolved
Hide resolved
c64e2d1
to
75f286f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I pressed the wrong button
agent/src/beerocks/slave/link_metrics/ieee802_3_link_metrics_collector.cpp
Outdated
Show resolved
Hide resolved
75f286f
to
636e1cf
Compare
Define a structure to hold collected link metrics. This structure will later be used to fill the TLVs of the Link Metric response message. Use uint32_t instead of uint16_t for max throughput capacity and bit rate fields to allow for bigger values. Spec document uses uint16_t so a saturation to the maximum value is required when copying values to the TLV. Link metrics have to be collected differently for wired and wireless links. Define an interface class for a link metrics collector. Derived implementation classes will override pure virtual methods. Create an implementation class for Ethernet links. This implementation reads network interface stats using a Netlink socket and the NETLINK_ROUTE protocol. TODO in future tasks: - Implement link metrics collector for WiFi links in ([TASK] Link metric collection for WiFi links #792) - Move all link metrics related classes to BPL ([TASK] Move link metric related classes to BPL #910) - Implement Ethernet link metrics collector using and extending Netlink classes defined in "Feature/add support for nl80211 standard commands to BWL #900" Signed-off-by: Mario Maz <[email protected]>
Signed-off-by: Mario Maz <[email protected]>
Media type group is the MSB of the media type field. It is not included in any message as is. It is used internally to decide which mechanism has to be used to compute the media type. Signed-off-by: Mario Maz <[email protected]>
The Link Metric Query TLV contains an optional field. However, TLVF does not currently support optional fields. As a workaround we create two different TLVs, one that includes the optional field and another one which does not. When parsing received message, the application will check the length of included TLV to know which one is it. Signed-off-by: Mario Maz <[email protected]>
Signed-off-by: Mario Maz <[email protected]>
Method getNextTlvLength() returns an uit16_t with the length of next TLV in message. However, the method is not considering the endianness and returns the 2 bytes swapped. Fix getNextTlvLength to consider endianness swapping bytes of returned length value. Swapping is only correct in the context of CmduMessageRx::parse(). Remove getNextTlvData and getNextTlvType(eTlvType) from CmduMessage. Move getNextTlvType() and getNextTlvLength down into CmduMessageRx and make them private. Signed-off-by: Mario Maz <[email protected]>
Check the length of received TLV to know if optional field (MAC address of neighbor device) is present or not and then create an instance of either tlvLinkMetricQuerySpecificNeighbor or tlvLinkMetricQuery respectively. Signed-off-by: Mario Maz <[email protected]>
After having created different TLVs to cope with optional field in tlvLinkMetricQuery and moved some inner enums out of their classes, update test that was was using such TLV. Signed-off-by: Mario Maz <[email protected]>
Add handler for LINK_METRIC_QUERY_MESSAGE message. On reception, obtain requested link metrics, build LINK_METRIC_RESPONSE_MESSAGE and send it. Currently only wired interfaces are supported. Obtaining link metrics for wireless interfaces will be done in a future task. Topology database is required to know which neighbor nodes are connected to each interface. Since we do not have such database now, a temporary solution has been implemented: a function that returns a list with a single (neighbor, interface) pair for the controller. In the future, once we have the topology database available, this function will completed to return all links between interfaces and neighbors. Signed-off-by: Mario Maz <[email protected]>
Modify the strings written to log so test_flows can later check which messages and TLVs have been received. Fix some typos in log messages. Signed-off-by: Mario Maz <[email protected]>
636e1cf
to
3133c23
Compare
Remove agent's emulated response because now the Link Metric query message is processed by agent. Also, as the message handler is now in the backhaul_manager_thread, the log file to look for messages has to be changed. Signed-off-by: Mario Maz <[email protected]>
Update test_flows.py to create test for LINK_METRIC_QUERY_MESSAGE reception in agent. Check also that controller receives LINK_METRIC_RESPONSE_MESSAGE. Signed-off-by: Mario Maz <[email protected]>
Link Metric Query message is handled now in backhaul manager thread so remove its (empty) handler from son slave thread. Signed-off-by: Mario Maz <[email protected]>
Now that we have a get_media_type() function, use it wherever it was previously being computed a MediaType field. Signed-off-by: Mario Maz <[email protected]>
3133c23
to
8919a56
Compare
See "[TASK] Link metric collection for Ethernet links #390" for a detailed description of the task.
Currently we only support collecting link metrics for Ethernet interfaces and it is done partially, because we need topology information to know which are the neighbors of the node. This is not available yet.
Collecting metrics for WiFi interfaces requires adding support for standard commands in DWPAL. See "Send standard NL80211 commands using DWPAL #782" for more information.
For tests_flows.sh to pass, "Fix several problems with the Device Information TLV #825" should be merged first.
I preferred to create a draft-PR so you can verify the design of the solution and give early feedback. Also to be particularly reviewed is the way MAC address of requested specific neighbor is compared against MAC address of the controller.