-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Update DNS-SD advertisements on fail-safe timeout #23199
Conversation
When we remove a fabric during a pending add with fail-safe timeout, we aren't updating DNS-SD adevertisements. In addition to the obvious consequence of continuing to advertise a now-stale identity, this can also leave the advertisement slots full if the failed fabric addition was to the Nth slot in an N-fabric device. This then breaks advertisement when recommissioning to the Nth slot thereafter for builds that do not set this: CHIP_CONFIG_MINMDNS_DYNAMIC_OPERATIONAL_RESPONDER_LIST For those builds, this condition persists until reboot or some other action triggers DnssdServer::StartServer. This commit fixes the problem by moving the DnssdServer::StartServer call from the RemoveFabric invoke handler to the server's OnFabricRemoved callback. That will result in server restart both for RemoveFabric invocations and fail-safe expiry. Fixes project-chip#23169
Is there a test that could be added to make sure this never regresses again? |
PR #23199: Size comparison from d520b39 to 7f9ccd9 Increases (9 builds for bl702, cc13x2_26x2, psoc6)
Decreases (21 builds for bl602, bl702, cc13x2_26x2, cyw30739, k32w, nrfconnect, psoc6, qpg, telink)
Full report (31 builds for bl602, bl702, cc13x2_26x2, cyw30739, k32w, linux, mbed, nrfconnect, psoc6, qpg, telink)
|
It's not obvious to me how we would unit test. It is difficult because some of the communication between the fabric table and dns-sd server runs through the app server. That actually is probably the source of this bug in a way: the app shouldn't need to know how to tell the dns-sd server what to do / when to do things (particularly not when the dns-sd server's source of truth is the fabric table anyway). It would probably make more sense for the dns-sd server to register for fabric table add / update / remove callbacks directly so that the dns-sd can just 'do the right thing' based upon the state of the fabric table. And without the app in the middle, unit testing dns-sd server + fabric table seems plausible. But absent such a refactor, perhaps we could add an integration test to make sure from the commissioner side, advertisements disappear after a fail-safe timeout. ttl will complicate things, but it still may be doable. Can somebody point me to the existing commissioning integration tests? |
Discussed offline. Adding something to our existing Python testing could work, but would have some inherent variability across platforms, and could also be unreliable. And per my assessment above, I therefore think it is not feasible for me to add a regression test immediately and we should probably merge this obvious fix as is. |
PR #23199: Size comparison from 9f08fc1 to 309693d Increases (10 builds for bl702, cc13x2_26x2, esp32, psoc6)
Decreases (23 builds for bl602, bl702, cc13x2_26x2, cyw30739, efr32, esp32, k32w, nrfconnect, psoc6, qpg, telink)
Full report (38 builds for bl602, bl702, cc13x2_26x2, cyw30739, efr32, esp32, k32w, linux, mbed, nrfconnect, psoc6, qpg, telink)
|
PR #23199: Size comparison from b96f795 to fe4062e Increases (8 builds for cc13x2_26x2, psoc6)
Decreases (22 builds for bl602, bl702, cc13x2_26x2, cyw30739, k32w, nrfconnect, psoc6, qpg, telink)
Full report (30 builds for bl602, bl702, cc13x2_26x2, cyw30739, k32w, linux, nrfconnect, psoc6, qpg, telink)
|
When we remove a fabric during a pending add with fail-safe timeout, we aren't updating DNS-SD adevertisements. In addition to the obvious consequence of continuing to advertise a now-stale identity, this can also leave the advertisement slots full if the failed fabric addition was to the Nth slot in an N-fabric device. This then breaks advertisement when recommissioning to the Nth slot thereafter for builds that do not set this: CHIP_CONFIG_MINMDNS_DYNAMIC_OPERATIONAL_RESPONDER_LIST For those builds, this condition persists until reboot or some other action triggers DnssdServer::StartServer. This commit fixes the problem by moving the DnssdServer::StartServer call from the RemoveFabric invoke handler to the server's OnFabricRemoved callback. That will result in server restart both for RemoveFabric invocations and fail-safe expiry. Fixes project-chip#23169
When we remove a fabric during a pending add with fail-safe timeout, we aren't updating DNS-SD adevertisements. In addition to the obvious consequence of continuing to advertise a now-stale identity, this can also leave the advertisement slots full if the failed fabric addition was to the Nth slot in an N-fabric device. This then breaks advertisement when recommissioning to the Nth slot thereafter for builds that do not set this: CHIP_CONFIG_MINMDNS_DYNAMIC_OPERATIONAL_RESPONDER_LIST For those builds, this condition persists until reboot or some other action triggers DnssdServer::StartServer. This commit fixes the problem by moving the DnssdServer::StartServer call from the RemoveFabric invoke handler to the server's OnFabricRemoved callback. That will result in server restart both for RemoveFabric invocations and fail-safe expiry. Fixes project-chip#23169
When we remove a fabric during a pending add with fail-safe timeout, we aren't updating DNS-SD adevertisements. In addition to the obvious consequence of continuing to advertise a now-stale identity, this can also leave the advertisement slots full if the failed fabric addition was to the Nth slot in an N-fabric device.
This then breaks advertisement when recommissioning to the Nth slot thereafter for builds that do not set this:
CHIP_CONFIG_MINMDNS_DYNAMIC_OPERATIONAL_RESPONDER_LIST
For those builds, this condition persists until reboot or some other action triggers DnssdServer::StartServer.
This commit fixes the problem by moving the DnssdServer::StartServer call from the RemoveFabric invoke handler to the server's OnFabricRemoved callback. That will result in server restart both for RemoveFabric invocations and fail-safe expiry.
Fixes #23169