-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
minmdns - perform exponental backoff retry for discovering nodes #9900
minmdns - perform exponental backoff retry for discovering nodes #9900
Conversation
…for better code organization
As of 97d9f7e ("Address 9154 and CM/AC DNS-SD convergence (project-chip#9348)") operational discovery during chip-tool's commissioning flow typically fails with minimal mDNS. This occurs because this change unintentionally broke the logic to send unsolicited mDNS announcements as required by RFC 6762. During commissioning, the initial mDNS query to resolve the operational service on the network is generally sent before the device is able to join the network. The is perfectly fine because RFC 6762 accounts for this case and requires devices joining the network to advertise all of their records. These unsolicited responses provide the IP address and allow the final commissioning steps to execute. The minimal mDNS implementation sends its unsolicited announcements in Start() and not at any other time. In the above change, we started calling StopPublishDevice() immediately prior to (re-)starting the server, leaving no records to announce. The records are added immediately afterwards, but currently adding records does not trigger additional announcements. Thus, a response never arrives and commissioning times out. Fix this by adding an announcement after the records are added. Unfortunately this will quite spammy since we announce everything and advertise is called several times in a row. What should happen is announcements only happen with new or updated records, or when new interfaces are enabled. (Note that this problem is greatly exacerbated by the lack of retransmissions of queries, which is being addressed in project-chip#9900).
As of 97d9f7e ("Address 9154 and CM/AC DNS-SD convergence (#9348)") operational discovery during chip-tool's commissioning flow typically fails with minimal mDNS. This occurs because this change unintentionally broke the logic to send unsolicited mDNS announcements as required by RFC 6762. During commissioning, the initial mDNS query to resolve the operational service on the network is generally sent before the device is able to join the network. The is perfectly fine because RFC 6762 accounts for this case and requires devices joining the network to advertise all of their records. These unsolicited responses provide the IP address and allow the final commissioning steps to execute. The minimal mDNS implementation sends its unsolicited announcements in Start() and not at any other time. In the above change, we started calling StopPublishDevice() immediately prior to (re-)starting the server, leaving no records to announce. The records are added immediately afterwards, but currently adding records does not trigger additional announcements. Thus, a response never arrives and commissioning times out. Fix this by adding an announcement after the records are added. Unfortunately this will quite spammy since we announce everything and advertise is called several times in a row. What should happen is announcements only happen with new or updated records, or when new interfaces are enabled. (Note that this problem is greatly exacerbated by the lack of retransmissions of queries, which is being addressed in #9900).
As of 97d9f7e ("Address 9154 and CM/AC DNS-SD convergence (#9348)") operational discovery during chip-tool's commissioning flow typically fails with minimal mDNS. This occurs because this change unintentionally broke the logic to send unsolicited mDNS announcements as required by RFC 6762. During commissioning, the initial mDNS query to resolve the operational service on the network is generally sent before the device is able to join the network. The is perfectly fine because RFC 6762 accounts for this case and requires devices joining the network to advertise all of their records. These unsolicited responses provide the IP address and allow the final commissioning steps to execute. The minimal mDNS implementation sends its unsolicited announcements in Start() and not at any other time. In the above change, we started calling StopPublishDevice() immediately prior to (re-)starting the server, leaving no records to announce. The records are added immediately afterwards, but currently adding records does not trigger additional announcements. Thus, a response never arrives and commissioning times out. Fix this by adding an announcement after the records are added. Unfortunately this will quite spammy since we announce everything and advertise is called several times in a row. What should happen is announcements only happen with new or updated records, or when new interfaces are enabled. (Note that this problem is greatly exacerbated by the lack of retransmissions of queries, which is being addressed in #9900).
As of 97d9f7e ("Address 9154 and CM/AC DNS-SD convergence (project-chip#9348)") operational discovery during chip-tool's commissioning flow typically fails with minimal mDNS. This occurs because this change unintentionally broke the logic to send unsolicited mDNS announcements as required by RFC 6762. During commissioning, the initial mDNS query to resolve the operational service on the network is generally sent before the device is able to join the network. The is perfectly fine because RFC 6762 accounts for this case and requires devices joining the network to advertise all of their records. These unsolicited responses provide the IP address and allow the final commissioning steps to execute. The minimal mDNS implementation sends its unsolicited announcements in Start() and not at any other time. In the above change, we started calling StopPublishDevice() immediately prior to (re-)starting the server, leaving no records to announce. The records are added immediately afterwards, but currently adding records does not trigger additional announcements. Thus, a response never arrives and commissioning times out. Fix this by adding an announcement after the records are added. Unfortunately this will quite spammy since we announce everything and advertise is called several times in a row. What should happen is announcements only happen with new or updated records, or when new interfaces are enabled. (Note that this problem is greatly exacerbated by the lack of retransmissions of queries, which is being addressed in project-chip#9900).
@msandstedt @mspang - would you be able to review this change still? I believe exponential backoff is a desirable mdns functionality regardless of auto-broadcast working or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this mostly makes sense.
My biggest questions are around the larger overall design. I am surprised to see that the Resolver_ImplMinimalMdns.cpp
does not appear to have any types of callbacks for resolution success or failure.
This seems divergent from the interfaces we have in inet/DNSResolver.h
. Is there a reason these aren't more unified?
Second question: is it presumed that we only need to implement this in the sdk for Minimal mDNS? Do Avahi and Bonjour provide this type of retry behavior on their own?
minmdns only has the 'success' part. Failure likely can be implemented only after this PR, since we are effectively saying 'we keep retrying until we give up' and we only have this now. I expect Avahi and Bonjour to implement more of the spec than minmdns, so retries should be there, however I have not explicitly tried it so far. |
node was evicted here, if/when resolution failures areconnectedhomeip/src/lib/mdns/minimal/ActiveResolveAttempts.cpp Lines 110 to 120 in 8bf12cc
This comment was generated by todo based on a
|
* V1: very hardcoded logic for retries * Move the active resolve attempts class to a separate header/cpp file for better code organization * Restyle fixes * Remove unused include in minmdns resolver implementation * Update code to use Optional for potentially missing values * use a clock base for the active resolve attempts, to allow for testability * Start adding unit tests for the resolve attempt queue * More unit tests * Restyle fixes * Fix signed/unsigned for optional, to make android compile * Code review updates, add a more complex test for scheduling check
Problem
A single mDNS query may be insufficient to discover nodes:
Change overview
Adds retries for resolution with exponential backoff suppport.
Testing