Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mDNS <-> SRP interaction architecture needs refactorization #9879

Closed
kkasperczyk-no opened this issue Sep 22, 2021 · 2 comments
Closed

mDNS <-> SRP interaction architecture needs refactorization #9879

kkasperczyk-no opened this issue Sep 22, 2021 · 2 comments
Assignees
Labels
Milestone

Comments

@kkasperczyk-no
Copy link
Contributor

kkasperczyk-no commented Sep 22, 2021

Problem

In current implementation mDNS is an abstract layer providing mDNS specific API and using platform specific implementations like minimal mDNS or SRP for Thread devices. That architecture is kind of wrong as mDNS and SRP are actually equivalent protocols providing DNS-SD functionalities, so the DNS-SD should be abstract layer providing common API for mDNS and SRP.

Apart from theory, in practice it means that mDNS API assumes that operations like publishing or stopping services (in other words adding/removing) are done immediately and result is available just after requesting it. For SRP it is not true, as Thread devices need to send operation request to the SRP server that runs on a separate device (Thread Border Router), then wait for an answer from it and only then provide result, so interaction is asynchronous. Current mDNS implementation pushes new requests to SRP platform layer doesn't really taking care of what happens with them, so they may succeed or not and be done in different time depending on plenty of factors like current server/client state, communication problems, other processes running on device. That seems to be major reason why operational/commissionable discovery is unstable on Thread devices.

Proposed Solution

Perfect solution would be to create a new DNS-SD API instead of the mDNS one, that would fit to both SRP and mDNS interaction models, but another question is how long can it take and how big such refactor could be.

In my opinion the following things should be met to make SRP work stable:

  • After boot, upper layer should wait for SRP to be ready to work. That means the device needs to join the Thread network, detect SRP server, send request to it that obsolete data related to the specific host should be removed (e.g. pseudo-random instance of commissionable discovery, that after reset will have different name) and after getting answer it should notify upper layer about initialized state (e.g. by some callback).
  • Upper layer should know what services it manages and what are their states. Currently we just add or remove services multiple times and request those operations without checking if previous ones were completed and decide what should be done next.
  • For SRP (not sure if for mDNS too) it could be valuable to gather following requests related to one host and send in short period of time together. For example if in StartServer() method we need to remove services and add again two services on advertising re-start, maybe it would be better to get all information together and on the method end call something like "publish" method that would trigger sending it.
  • Restarting advertising tooks a place in the OnPlatformEvent related to the commissioning: https://github.com/project-chip/connectedhomeip/blob/master/src/app/server/CommissioningWindowManager.cpp#L45, so it doesn't takes into account situation when device was added to the network in other way and could start commissionable advertisements. I think that handling all events resulting in advertising restart should be implemented in the module responsible for advertising not outside of it: https://github.com/project-chip/connectedhomeip/blob/master/src/app/server/Mdns.cpp#L43

By the way:

Unfortunately I don't have ready solution to share, but I would like to rather discuss my doubts and work out common statement on that problem.

@kkasperczyk-no
Copy link
Contributor Author

kkasperczyk-no commented Sep 22, 2021

@tcarmelveilleux @andy31415 @Damian-Nordic @cecille as discussed on the SWTT stand-up, I created an issue describing the problem with SRP <-> mDNS interactions. I would love to see your opinions/suggestions on that topic.

@kkasperczyk-no
Copy link
Contributor Author

kkasperczyk-no commented Sep 28, 2021

We discussed with @Damian-Nordic that to avoid creating one big PR changing plenty of things at once and allow simultaneous work on different problems following tasks could be considered:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants