Skip to content
This repository has been archived by the owner on Sep 7, 2020. It is now read-only.

Update the controller on son_slave connect/disconnect event #931

Merged
merged 6 commits into from
Mar 16, 2020

Conversation

morantr
Copy link
Collaborator

@morantr morantr commented Mar 8, 2020

In the past, we used to have a TCP/IP socket between the Controller and the Agent.
When a radio went down, the son_slave perform reset and closed the socket to the controller.
When that happened, the socket on the controller side was closed too, and the controller removed the node of the son_slave from the database.

Nowadays, we have a BUS infrastructure that replaces the TCP/IP socket. This means that if the son_slave perform reset,
No socket is closed and the controller doesn’t aware that and Agent radio is down.

This PP adds a Topology Query message send after the controller receives M1
and add missing fronthaul radio information to Topology Response message, so the Controller
could remove from its database inactive radio (son_salve) nodes.

#925

@morantr morantr force-pushed the feature/detect-son-slave-reset-with-topology-flow branch from 2073992 to ee9f665 Compare March 8, 2020 17:45
@morantr morantr added the don't merge This PR is not ready for merge or review label Mar 8, 2020
@morantr morantr force-pushed the feature/detect-son-slave-reset-with-topology-flow branch from e5a5b5a to c530f4e Compare March 10, 2020 12:42
@morantr morantr removed the don't merge This PR is not ready for merge or review label Mar 10, 2020
@morantr morantr force-pushed the feature/detect-son-slave-reset-with-topology-flow branch from c530f4e to 2398b0d Compare March 10, 2020 16:45
Copy link
Collaborator

@tomereli tomereli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly nitpick comments, but not approving yet cause I think this should be rebased on #900 instead of using dummy data.

@morantr morantr force-pushed the feature/detect-son-slave-reset-with-topology-flow branch 3 times, most recently from 3f8e8fd to 68a848b Compare March 12, 2020 08:25
@morantr morantr requested a review from tomereli March 12, 2020 08:25
Copy link
Collaborator

@tomereli tomereli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since support for WIPHY commands via NL80211 is not ready yet - approved, but please add a task and a TODO comment in the code and commit message which mention that.

@morantr morantr force-pushed the feature/detect-son-slave-reset-with-topology-flow branch 3 times, most recently from 6cfea97 to 0d50037 Compare March 12, 2020 16:23
@arnout arnout added the don't merge This PR is not ready for merge or review label Mar 12, 2020
Copy link
Collaborator

@arnout arnout left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I have a few important comments, but everything can be fixed easily, so approve with don't merge.

Comment on lines +2050 to +2072
// For now, put zeros and when the Agent management will be move to unified Agent thread
// this field will be filled. #435
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not true. The unified agent is about unifying the ap manager and monitor threads. There would still be a separate process for each radio. The idea is that the per-radio process needs to link to bwl and is more prone to crashing, plus it may need more capabilities or different SELinux configuration.

The topology response will always be constructed by the main agent process, not by the per-radio process, so #435 will not solve this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unify Agent is the entity that will connect to two fronthaul managers (per radio) that each one will have a monitor and ap manager. Meaning, it will be the main Agent process.
The backhaul manager will be responsible only to link establishment and remove from him all the Agent stuff.

auto fronthaul_radios_on_db = database.get_node_children(
network_utils::mac_to_string(al_mac), beerocks::TYPE_SLAVE, beerocks::STATE_CONNECTED);

std::unordered_set<std::string> reported_fronthaul_radios;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::unordered_set<std::string> reported_fronthaul_radios;
std::unordered_set<sMacAddress> reported_fronthaul_radios;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it because of the database return set of std::string.
I don't want to change the database in this PR scope.
Let's give #524 more priority.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still, the decision is to use sMacAddress instead of std::string even if we don't change everything at once, so please change to sMacAddress as part of this PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. It's OK for fronthaul_radios_on_db because that is indeed returned by the database. But reported_fronthaul_radios gets filled with the MAC addresses that come from the device information so they're not strings. And where you do reported_fronthaul_radios.find(fronthaul_radio_on_db) you can easily do reported_fronthaul_radios.find(network_utils::string_to_mac(fronthaul_radio_on_db)) instead.

Anyway, I'm not going to block the merge for this. But please take it into account next time.

Comment on lines +1670 to +1684
// const auto &iface_bssid = media_info->network_membership;
// const auto iface_bw = media_info->ap_channel_bandwidth;
// const auto iface_cf1 = media_info->ap_channel_center_frequency_index1;
// const auto iface_cf2 = media_info->ap_channel_center_frequency_index2;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// const auto &iface_bssid = media_info->network_membership;
// const auto iface_bw = media_info->ap_channel_bandwidth;
// const auto iface_cf1 = media_info->ap_channel_center_frequency_index1;
// const auto iface_cf2 = media_info->ap_channel_center_frequency_index2;
// In MultiAP context, bssid, bw, cf1 and cf2 are not actually useful since they are more accurately
// represented in other TLVs (operating channel report, AP operational BSS).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIR, the current channel selection task does use it. I'm not sure that we will not use it. Keep it for now.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our mindset is to delete what we don't use now - so I don't see why we should keep it. In addition, the channel selection task will be stripped down to nothing very soon so we don't really care if it uses it since its obsolete.

Copy link
Collaborator

@arnout arnout Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Channel selection task will use this information - but coming from the AP Radio Basic Capabilities and the Channel Preference Report, which is a lot more accurate than the limited information given in this TLV.

I hoped that my suggested edit expressed this feeling accurately...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further discusion - it turns out that p_channel_center_frequency_index2 is missing from the Operating Channel Report TLV, so we will need to use this for 80+80MHz operation... Which is super-weird...

@morantr morantr force-pushed the feature/detect-son-slave-reset-with-topology-flow branch from 0d50037 to e009eb7 Compare March 15, 2020 12:04
@morantr morantr requested a review from arnout March 15, 2020 12:15
@arnout arnout removed the don't merge This PR is not ready for merge or review label Mar 16, 2020
morantr added 5 commits March 16, 2020 11:40
Send topology message query upon receiving auto-config m1 message.

Signed-off-by: Moran Shoeg <[email protected]>
Change the name of the member "ap_channel_band" to "ap_channel_bandwidth".
The term "band" is wrong and not reflects the field's meaning.
Update auto-generated code.

Signed-off-by: Moran Shoeg <[email protected]>
The controller needs to be able to add and remove the radio node from its database based on
the data that is coming from the TOPOLOGY RESPONSE message.
Add dummy data on the wireless fronthaul interfaces in the tlvDeviceInformation which is in
the Topology Response message.
The reason that it is dummy values is that this is something that cannot be done at this moment
because the MediaType field must be computed with information obtained through
NL80211_CMD_GET_WIPHY command of DWPAL, which is currently not supported.
See "Send standard NL80211 commands using DWPAL #782

The filled band data is based on the information we have about the band but it is not accurate.
The fields ap_channel_bandwidth, ap_channel_center_frequency_index1 and
ap_channel_center_frequency_index2 we could get (the bw for sure, the others not so sure),
but it not worth it since it will require to create more internal messaging which will be deleted
after moving to a unified Agent (#435).

Signed-off-by: Moran Shoeg <[email protected]>
When son_slave socket disconnects or connects, send a TOPOLOGY NOTIFICATION with no TLVs to
the controller. The controller will send TOPOLOGY QUERY and on the response, it will be notified
which radio (son_slave) when up or down.

Signed-off-by: Moran Shoeg <[email protected]>
Subscribe the controller to TOPOLOGY RESPONSE MESSAGE and
add handler function to it with parsing of device information type TLV.
Add a piece of code that checks whether an unreported fronthaul radio
interface exists on the controller database, and if not, remove its node.

Signed-off-by: Moran Shoeg <[email protected]>
Send TOPOLOGY_QUERY_MESSAGE when the controller receives TOPOLOGY_NOTIFICATION_MESSAGE
with no Client Association Event TLV. This will result in getting TOPOLOGY_RESPONSE_MESSAGE
with all the information the controller needs to update the topology on its database.

Signed-off-by: Moran Shoeg <[email protected]>
@arnout arnout force-pushed the feature/detect-son-slave-reset-with-topology-flow branch from e009eb7 to 39c59c6 Compare March 16, 2020 11:41
@mergify mergify bot merged commit 68dbd33 into master Mar 16, 2020
@mergify mergify bot deleted the feature/detect-son-slave-reset-with-topology-flow branch March 16, 2020 11:52
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants