Problem
In the current implementation, mDNS is an abstraction layer that exposes an mDNS-specific API and delegates to platform-specific implementations such as minimal mDNS, or SRP for Thread devices. That architecture is backwards: mDNS and SRP are equivalent protocols that both provide DNS-SD functionality, so DNS-SD should be the abstraction layer, providing a common API for mDNS and SRP.
In practice, this means the mDNS API assumes that operations such as publishing or stopping services (in other words, adding or removing them) complete immediately and that the result is available right after the request. That is not true for SRP: a Thread device has to send the request to an SRP server running on a separate device (the Thread Border Router), wait for the answer, and only then report the result, so the interaction is asynchronous. The current mDNS implementation pushes new requests to the SRP platform layer without tracking what happens to them, so they may or may not succeed, and they complete at different times depending on many factors such as the current server/client state, communication problems, or other processes running on the device. That seems to be the major reason why operational/commissionable discovery is unstable on Thread devices.
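To make the mismatch concrete, here is a minimal sketch of the asynchronous model described above. `FakeSrpClient` and its methods are hypothetical stand-ins written for this illustration, not the Matter or OpenThread API; the point is only that the request call returns before the outcome is known.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Hypothetical stand-in for an SRP client (not a real SDK class).
// Requests are queued and complete only when a server response is processed.
class FakeSrpClient
{
public:
    using Callback = std::function<void(const std::string & name, bool success)>;

    // Returns immediately; the outcome is reported later through onDone.
    void AddService(std::string name, Callback onDone)
    {
        assert(onDone != nullptr); // a result must always have somewhere to go
        mPending.push(Request{ std::move(name), std::move(onDone) });
    }

    // Simulates the arrival of one answer from the SRP server.
    // Returns false if no request was waiting for an answer.
    bool ProcessServerResponse(bool success)
    {
        if (mPending.empty())
            return false;
        Request request = std::move(mPending.front());
        mPending.pop();
        request.onDone(request.name, success);
        return true;
    }

    std::size_t PendingCount() const { return mPending.size(); }

private:
    struct Request
    {
        std::string name;
        Callback onDone;
    };

    std::queue<Request> mPending;
};
```

A caller written against a synchronous API would treat the service as published as soon as `AddService()` returns, while in this model nothing is known until `ProcessServerResponse()` runs the callback.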
Proposed Solution
The ideal solution would be to replace the mDNS API with a new DNS-SD API that fits both the SRP and mDNS interaction models; the open question is how long that would take and how big such a refactor would be.
In my opinion, the following conditions should be met to make SRP work reliably:
After boot, the upper layer should wait until SRP is ready to work. That means the device needs to join the Thread network, detect the SRP server, and ask it to remove obsolete data related to this host (e.g. the pseudo-random commissionable discovery instance, which gets a different name after a reset). Once the answer arrives, the platform layer should notify the upper layer that it is initialized (e.g. through a callback).
The upper layer should know which services it manages and what state each one is in. Currently we just add or remove services multiple times, requesting those operations without checking whether the previous ones completed and without using that to decide what should be done next.
For SRP (I am not sure about mDNS), it could be valuable to gather consecutive requests related to one host and send them together within a short period of time. For example, if StartServer() needs to remove services and then add two services again when advertising restarts, it may be better to collect all the information first and call something like a "publish" method at the end of StartServer() to trigger sending it.
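The three conditions above could be combined roughly as follows. This is only a sketch under my own assumptions: `ServiceRegistry`, `Op`, `Batch` and every method name are hypothetical and do not exist in the SDK, and services are reduced to plain names (no TXT records, ports or hosts).

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// All names here are hypothetical; nothing below comes from the Matter SDK.
enum class Op { Register, Unregister };
using Batch = std::vector<std::pair<Op, std::string>>;

class ServiceRegistry
{
public:
    // To be called by the platform layer once SRP is usable (network joined,
    // server found, stale host data cleared).
    void OnPlatformReady() { mReady = true; }

    // Add/Remove only record the intent; nothing is sent until Publish().
    void Add(const std::string & name)
    {
        assert(!name.empty());
        Entry & entry = mServices[name];
        entry.desired = true;
        entry.dirty   = true; // contents may have changed, so send it again
    }

    void Remove(const std::string & name)
    {
        auto it = mServices.find(name);
        if (it != mServices.end())
            it->second.desired = false;
    }

    // Returns all pending changes as one batch. The batch is empty while the
    // platform is not ready or the previous batch has not been answered.
    Batch Publish()
    {
        if (!mReady || !mInFlight.empty())
            return {};
        for (auto & item : mServices)
        {
            Entry & entry = item.second;
            if (entry.desired && (entry.dirty || !entry.registered))
                mInFlight.emplace_back(Op::Register, item.first);
            else if (!entry.desired && entry.registered)
                mInFlight.emplace_back(Op::Unregister, item.first);
            entry.dirty = false;
        }
        return mInFlight;
    }

    // To be called when the server answers the batch returned by Publish().
    void OnBatchResult(bool success)
    {
        for (const auto & op : mInFlight)
        {
            Entry & entry = mServices[op.second];
            if (success)
                entry.registered = (op.first == Op::Register);
            else if (op.first == Op::Register)
                entry.dirty = true; // retry on the next Publish()
        }
        mInFlight.clear();
        // Forget services that are neither wanted nor known to the server.
        for (auto it = mServices.begin(); it != mServices.end();)
        {
            if (!it->second.desired && !it->second.registered)
                it = mServices.erase(it);
            else
                ++it;
        }
    }

    bool IsRegistered(const std::string & name) const
    {
        auto it = mServices.find(name);
        return it != mServices.end() && it->second.registered;
    }

private:
    struct Entry
    {
        bool desired    = false; // what the upper layer wants
        bool registered = false; // what the server has confirmed
        bool dirty      = false; // needs to be (re)sent
    };

    bool mReady = false;
    std::map<std::string, Entry> mServices;
    Batch mInFlight;
};
```

With something like this, removing and re-adding a service during an advertising restart collapses into a single register operation in the next batch, and a second `Publish()` before the server has answered sends nothing.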
@tcarmelveilleux @andy31415 @Damian-Nordic @cecille As discussed at the SWTT stand-up, I created this issue to describe the problem with SRP <-> mDNS interactions. I would love to hear your opinions and suggestions on the topic.
I discussed this with @Damian-Nordic. To avoid one big PR that changes many things at once, and to allow simultaneous work on different problems, the following tasks could be considered:
Implement delayed mDNS platform initialization that will:
provide a way to notify the Mdns module when the platform layer is ready to register services
Move all common mDNS/SRP constants to one file (to avoid situations where only the mDNS or only the SRP implementation is changed without adjusting the other, and so prevent regressions)
PR: [DNS-SD] Clean up common constants #10553
Redesign the mDNS API to allow refreshing services in bundles (say, "lock" a service when it is going to be modified in several steps, such as being removed and added again, and "unlock" it once all the actions have been applied)
PR: [DNS-SD] Redesign ServiceAdvertiser and Resolver interfaces #10181
Refactor Mdns.h/Mdns.cpp naming:
Make sure that methods do what their names suggest
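For the "lock"/"unlock" bundling task in the list above, one possible shape is a scoped guard. The sketch below is an assumption of mine for discussion, not the interface from PR #10181: `BundlingAdvertiser` and `ScopedBundle` are hypothetical names, and "sending" is modelled as appending to a list of messages.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical advertiser used only to illustrate bundling.
class BundlingAdvertiser
{
public:
    void Lock() { ++mLockDepth; }

    void Unlock()
    {
        assert(mLockDepth > 0); // Unlock() must pair with an earlier Lock()
        if (--mLockDepth == 0)
            Flush();
    }

    void AddService(const std::string & name) { Queue("add " + name); }
    void RemoveService(const std::string & name) { Queue("remove " + name); }

    // Each element is one message that would go to the platform layer.
    const std::vector<std::vector<std::string>> & SentMessages() const { return mSent; }

private:
    void Queue(std::string operation)
    {
        mQueued.push_back(std::move(operation));
        if (mLockDepth == 0)
            Flush(); // not locked: one message per operation
    }

    void Flush()
    {
        if (mQueued.empty())
            return;
        mSent.push_back(std::move(mQueued));
        mQueued.clear(); // leave the moved-from vector in a known empty state
    }

    int mLockDepth = 0;
    std::vector<std::string> mQueued;
    std::vector<std::vector<std::string>> mSent;
};

// Locks in the constructor and unlocks in the destructor, so the bundle is
// sent when the scope ends, including on early return.
class ScopedBundle
{
public:
    explicit ScopedBundle(BundlingAdvertiser & advertiser) : mAdvertiser(advertiser) { mAdvertiser.Lock(); }
    ~ScopedBundle() { mAdvertiser.Unlock(); }

    ScopedBundle(const ScopedBundle &) = delete;
    ScopedBundle & operator=(const ScopedBundle &) = delete;

private:
    BundlingAdvertiser & mAdvertiser;
};
```

A method such as StartServer() could create a `ScopedBundle` at the top, so that all removals and additions made inside it leave as a single message when the method returns.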
By the way:
Unfortunately, I don't have a ready solution to share; I would rather discuss my doubts and work out a common position on this problem.