mDNS <-> SRP interaction architecture needs refactorization #9879

kkasperczyk-no · 2021-09-22T07:51:36Z

Problem

In the current implementation, mDNS is an abstract layer that provides an mDNS-specific API and uses platform-specific implementations such as minimal mDNS, or SRP for Thread devices. That architecture is somewhat wrong, because mDNS and SRP are actually equivalent protocols that both provide DNS-SD functionality, so DNS-SD should be the abstract layer providing a common API for mDNS and SRP.

Beyond the theory, in practice this means that the mDNS API assumes operations such as publishing or stopping services (in other words, adding or removing them) are performed immediately and that the result is available right after the request. This is not true for SRP: a Thread device has to send the request to the SRP server running on a separate device (the Thread Border Router), wait for its answer, and only then report the result, so the interaction is asynchronous.

The current mDNS implementation pushes new requests to the SRP platform layer without really tracking what happens to them, so they may or may not succeed, and may complete at different times depending on many factors such as the current server/client state, communication problems, or other processes running on the device. This seems to be a major reason why operational/commissionable discovery is unstable on Thread devices.
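To make the difference between the two interaction models concrete, here is a minimal sketch of a DNS-SD abstraction whose publish call always reports its result through a callback, so a caller never assumes the result is available when the call returns. All names (`DnssdPublisher`, `MdnsPublisher`, `SrpPublisher`, `PublishResult`, `OnServerResponse`) are hypothetical and are not the SDK's actual API. The mDNS backend follows the issue's description of the current API and completes immediately (real mDNS probing and conflict handling are ignored here); the SRP backend completes only when the Border Router's answer arrives.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>

// Hypothetical result type; a stand-in for the SDK's own error codes.
enum class PublishResult { kSuccess, kFailure };
using PublishCallback = std::function<void(const std::string & instance, PublishResult)>;

// Hypothetical DNS-SD abstraction: the result always arrives via a callback,
// so the same API fits both synchronous and asynchronous backends.
class DnssdPublisher {
public:
    virtual ~DnssdPublisher() = default;
    virtual void Publish(const std::string & instance, PublishCallback onDone) = 0;
};

// mDNS backend: treated as completing locally, so the callback runs at once.
class MdnsPublisher : public DnssdPublisher {
public:
    void Publish(const std::string & instance, PublishCallback onDone) override
    {
        assert(onDone && "a completion callback is required");
        onDone(instance, PublishResult::kSuccess);
    }
};

// SRP backend: the request must go to the SRP server on the Border Router,
// so returning from Publish() does NOT mean the service is registered.
class SrpPublisher : public DnssdPublisher {
public:
    void Publish(const std::string & instance, PublishCallback onDone) override
    {
        assert(onDone && "a completion callback is required");
        mPending.emplace_back(instance, std::move(onDone));
    }

    // Simulates the server's answer to the oldest outstanding request.
    void OnServerResponse(PublishResult result)
    {
        if (mPending.empty())
            return;
        auto request = std::move(mPending.front());
        mPending.pop_front();
        request.second(request.first, result);
    }

    std::size_t PendingCount() const { return mPending.size(); }

private:
    std::deque<std::pair<std::string, PublishCallback>> mPending;
};
```

With this shape, code above the DNS-SD layer is written once against the callback and behaves correctly whether the backend answers immediately (mDNS) or later (SRP).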

Proposed Solution

The ideal solution would be to create a new DNS-SD API to replace the mDNS one, designed to fit both the SRP and mDNS interaction models; the open question is how long that would take and how large such a refactor would be.

In my opinion, the following conditions should be met to make SRP work reliably:

- After boot, the upper layer should wait for SRP to be ready. That means the device needs to join the Thread network, discover the SRP server, and send it a request to remove obsolete data related to this host (e.g. the pseudo-random instance name used for commissionable discovery, which will be different after a reset). Only after receiving the answer should the platform layer notify the upper layer that it is initialized (e.g. through a callback).
- The upper layer should know which services it manages and what their states are. Currently we just add or remove services repeatedly and request those operations without checking whether the previous ones have completed and deciding what should be done next.
- For SRP (and possibly for mDNS too), it could be valuable to gather subsequent requests related to one host and send them together within a short period of time. For example, if StartServer() needs to remove services and then add two services again when advertising restarts, it may be better to collect all the information first and, at the end of the method, call something like a "publish" method that triggers sending it.
- Restarting advertising takes place in the OnPlatformEvent handler related to commissioning: https://github.com/project-chip/connectedhomeip/blob/master/src/app/server/CommissioningWindowManager.cpp#L45, so it does not take into account the situation where the device was added to the network in another way and could start commissionable advertising. I think all events that result in an advertising restart should be handled in the module responsible for advertising, not outside of it: https://github.com/project-chip/connectedhomeip/blob/master/src/app/server/Mdns.cpp#L43
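The first three points (wait for readiness, know the state of each service, send related changes together) can be illustrated with a minimal sketch. `DnssdAdvertiser`, `OnPlatformReady`, `Commit` and the string-based `Batch` are hypothetical names chosen for illustration, not the SDK's API. As a stated simplification, the sketch marks services as published when the batch is sent; a real SRP client should do that only after the server acknowledges the update.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

using Batch = std::vector<std::string>; // e.g. {"remove a", "add b"}

// Hypothetical advertiser; the names are illustrative, not the SDK's API.
class DnssdAdvertiser {
public:
    explicit DnssdAdvertiser(std::function<void(const Batch &)> send) : mSend(std::move(send))
    {
        assert(mSend && "a send function is required");
    }

    // Called by the platform layer once the device has joined the Thread
    // network, found the SRP server and cleared stale records for its host.
    void OnPlatformReady()
    {
        mReady = true;
        Commit(); // flush whatever was staged before the platform was ready
    }

    // Stage changes only; nothing is sent until Commit().
    void AddService(const std::string & name)
    {
        mDesired.insert(name);
        mDirty.insert(name); // (re)send its records on the next commit
    }

    void RemoveService(const std::string & name)
    {
        mDesired.erase(name);
        mDirty.erase(name);
    }

    // Sends all staged changes for this host as one update.
    void Commit()
    {
        if (!mReady)
            return; // keep changes staged until OnPlatformReady()
        Batch batch;
        for (const auto & name : mPublished)
            if (mDesired.count(name) == 0)
                batch.push_back("remove " + name);
        for (const auto & name : mDirty)
            batch.push_back("add " + name);
        // Simplification: a real SRP client should update mPublished only
        // after the server acknowledges the update.
        mPublished = mDesired;
        mDirty.clear();
        if (!batch.empty())
            mSend(batch);
    }

private:
    std::function<void(const Batch &)> mSend;
    bool mReady = false;
    std::set<std::string> mDesired;   // services the upper layer wants
    std::set<std::string> mPublished; // services last sent to the server
    std::set<std::string> mDirty;     // services whose records must be (re)sent
};
```

Because changes are staged, a "remove and add again" sequence during an advertising restart collapses into a single update of the same service instead of two separate round trips to the SRP server.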

By the way:

I believe the naming could be refactored so that methods do what their names suggest. Currently the StartServer() method begins by stopping all services (thus removing them): https://github.com/project-chip/connectedhomeip/blob/master/src/app/server/Mdns.cpp#L410, and then we call the Start() method, which initializes mDNS only the first time and removes the services again: https://github.com/project-chip/connectedhomeip/blob/master/src/lib/mdns/Discovery_ImplPlatform.cpp#L62. So a start method that is called many times performs initialization and removes services, which I find somewhat confusing.
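One way to read the naming suggestion is to give each responsibility its own explicitly named method, so that one-time initialization, removal and advertising are never hidden inside a "start" call. The sketch below assumes that interpretation; `DnssdServer`, `InitPlatformOnce`, `RemoveAllServices`, `AdvertiseService` and `RestartAdvertising` are hypothetical names, not the SDK's methods.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical server whose method names each describe one responsibility.
class DnssdServer {
public:
    // Initializes the platform layer once; later calls are no-ops.
    void InitPlatformOnce()
    {
        if (mInitialized)
            return;
        mInitialized = true;
        ++mInitCount;
    }

    void RemoveAllServices() { mServices.clear(); }

    void AdvertiseService(const std::string & name)
    {
        assert(mInitialized && "InitPlatformOnce() must run first");
        mServices.push_back(name);
    }

    // The only composite operation, and its name says that it re-advertises.
    void RestartAdvertising(const std::vector<std::string> & names)
    {
        InitPlatformOnce();
        RemoveAllServices();
        for (const auto & name : names)
            AdvertiseService(name);
    }

    int InitCount() const { return mInitCount; }
    const std::vector<std::string> & Services() const { return mServices; }

private:
    bool mInitialized = false;
    int mInitCount = 0;
    std::vector<std::string> mServices;
};
```

Calling `RestartAdvertising()` repeatedly then initializes the platform exactly once, and the removal of old services is visible in the method's name rather than being a side effect of "start".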

Unfortunately, I don't have a ready solution to share; rather, I would like to discuss my doubts and work out a common position on this problem.

kkasperczyk-no · 2021-09-22T09:10:59Z

@tcarmelveilleux @andy31415 @Damian-Nordic @cecille as discussed at the SWTT stand-up, I have created an issue describing the problem with SRP <-> mDNS interactions. I would love to hear your opinions/suggestions on this topic.

kkasperczyk-no · 2021-09-28T05:57:49Z

@Damian-Nordic and I discussed that, to avoid creating one big PR that changes many things at once and to allow work on different problems in parallel, the following tasks could be considered:

- Implement delayed mDNS platform initialization that will:
  - provide a way to notify the Mdns module when the platform layer is ready to register services
  - allow SRP to clean up old services related to the given host on init, before processing further requests
  PR: [mdns/srp] Implemented delayed mDNS platform initialization #10222
- Move all common mDNS/SRP constants to one file (to avoid situations where only the mDNS or only the SRP implementation is changed without adjusting the other, and to prevent regressions)
  PR: [DNS-SD] Clean up common constants #10553
- Redesign the mDNS API to allow refreshing services in bundles (e.g. "lock" a service when it is going to be modified in several steps, such as being removed and added again, and "unlock" it once all actions have been applied)
  PR: [DNS-SD] Redesign ServiceAdvertiser and Resolver interfaces #10181
- Refactor Mdns.h/Mdns.cpp naming:
  - Make sure that methods do what their names suggest
  - Rename the Mdns module to DNS-SD to avoid the misconception that SRP is an mDNS implementation
  PR: [mdns] Renamed mDNS abstract layer to DNS-SD #10381
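The "lock"/"unlock" idea from the API redesign task can be sketched with an RAII guard: while a bundle is open, changes are only staged, and they are sent together when the last bundle closes. This is a minimal illustration under assumed names (`ServiceRegistry`, `UpdateBundle`, `Apply`, `Lock`, `Unlock`), not the interface introduced by the referenced PR.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

using Bundle = std::vector<std::string>;

// Hypothetical registry: changes are sent at once unless a bundle is open.
class ServiceRegistry {
public:
    explicit ServiceRegistry(std::function<void(const Bundle &)> send) : mSend(std::move(send)) {}

    void Apply(std::string change)
    {
        mStaged.push_back(std::move(change));
        if (mLockDepth == 0)
            Flush();
    }

    void Lock() { ++mLockDepth; }

    void Unlock()
    {
        assert(mLockDepth > 0 && "Unlock() without matching Lock()");
        if (--mLockDepth == 0)
            Flush(); // all steps applied: send them together
    }

private:
    void Flush()
    {
        if (mStaged.empty())
            return;
        mSend(mStaged);
        mStaged.clear();
    }

    std::function<void(const Bundle &)> mSend;
    int mLockDepth = 0;
    Bundle mStaged;
};

// RAII helper: "locks" on construction and "unlocks" on scope exit, so the
// bundle is always released (and flushed), even on an early return.
class UpdateBundle {
public:
    explicit UpdateBundle(ServiceRegistry & registry) : mRegistry(registry) { mRegistry.Lock(); }
    ~UpdateBundle() { mRegistry.Unlock(); }
    UpdateBundle(const UpdateBundle &) = delete;
    UpdateBundle & operator=(const UpdateBundle &) = delete;

private:
    ServiceRegistry & mRegistry;
};
```

A restart would open an `UpdateBundle`, remove and re-add its services, and let the guard's destructor send everything as a single bundle; the lock depth counter allows bundles to nest safely.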

kkasperczyk-no added the SDK Discussion Required label Sep 22, 2021

kkasperczyk-no added this to the TE7 milestone Sep 22, 2021

kkasperczyk-no added api TE7 labels Sep 22, 2021

franck-apple modified the milestones: TE7, Test Event 7 Sep 22, 2021

kkasperczyk-no assigned kkasperczyk-no and Damian-Nordic Sep 28, 2021

kkasperczyk-no closed this as completed Oct 22, 2021

kkasperczyk-no mentioned this issue Oct 27, 2021

Fix instability in the commissionable node feature on nordic platform #9833

Closed
