Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] ESP32 sends message to the unique local address before it has route to the address and is able to reach the ULA #25570

Closed
wqx6 opened this issue Mar 8, 2023 · 13 comments
Labels

Comments

@wqx6
Copy link
Contributor

wqx6 commented Mar 8, 2023

Reproduction steps

In non-apple devices, the unique local address has higher score than the link local address. But the score might cause an issue. If the OTA Requestor applies the OTA image and reboot, after rebooting, it will send a NotifyUpdateApplied command to the Provider (Assuming the command is sent when the event kDnssdInitialized is post).

If the Provider has an ULA , and the requestor only gets its link local address (The requestor might not receive the RA message and add the ULA when it wants to send the command), the Provider is unreachable for the Requestor.

Here is my log.
ipaddr_resolve_issue.log

In the log, the device wants to send message to addr [FDDE:AD00:BEEF:DAD0:A276:4EFF:FE68:BF00] before it gets an IP address [fdde:ad00:beef:dad0:f612:faff:fe03:89dc].

Bug prevalence

Whenever I do this

GitHub hash of the SDK that was being used

536a0d9

Platform

esp32

Platform Version(s)

No response

Anything else?

No response

@bzbarsky-apple
Copy link
Contributor

Assuming the command is sent when the event kDnssdInitialized is post

This is odd. After #25173 I would expect that kDnssdInitialized would not happen until we have an ipv6 address, for minimal mdns, which is what that log is using.

Is that logic failing somehow? @wqx6 Could you check what's happening in the interface iteration there, if you can reproduce this every time?

@bzbarsky-apple
Copy link
Contributor

So on ESP32 it looks like InitOTARequestorHandler is called off a timer via OTAHelpers::InitOTARequestor. The timer is set for 3 seconds.

InitOTARequestor is called from any of the following cases:

  • kThreadConnectivityChange
  • event->InternetConnectivityChange.IPv4 == kConnectivity_Established
  • event->InternetConnectivityChange.IPv6 == kConnectivity_Established

None of that is waiting for kDnssdInitialized. So that's the real issue here: instead of a hard-coded 3s wait, this should be waiting for kDnssdInitialized to have happened.

@bzbarsky-apple bzbarsky-apple changed the title [BUG] Logic for ScoreIpAddress [BUG] ESP32 startup of OTA Requestor bits does not wait for kDnssdInitialized Mar 8, 2023
@bzbarsky-apple
Copy link
Contributor

@wqx6 I expect #25563 will fix this.

@wqx6
Copy link
Contributor Author

wqx6 commented Mar 9, 2023

@bzbarsky-apple #25563 will not fix this issue, I met this issue even if I apply this patch. #25563 only fixes the issue the device might resolve no address. But my issue is that the device has resolve some IPv6 addresses, but sends the message to the wrong IPv6 address which is unreachable for the device at that time. Can we make the link local address have higher score than the ULA in https://github.com/project-chip/connectedhomeip/blob/master/src/lib/dnssd/IPAddressSorter.h#L46

@wqx6
Copy link
Contributor Author

wqx6 commented Mar 9, 2023

I think other platforms might have the same issue, because it might send message to the unique local address before it has route to the address and is able to reach the ULA.

@wqx6 wqx6 changed the title [BUG] ESP32 startup of OTA Requestor bits does not wait for kDnssdInitialized [BUG] ESP32 sends message to the unique local address before it receives a RA message and is able to reach the ULA Mar 9, 2023
@wqx6 wqx6 changed the title [BUG] ESP32 sends message to the unique local address before it receives a RA message and is able to reach the ULA [BUG] ESP32 sends message to the unique local address before it has route to the address and is able to reach the ULA Mar 9, 2023
@bzbarsky-apple
Copy link
Contributor

But my issue is that the device has resolve some IPv6 addresses, but sends the message to the wrong IPv6 address which is unreachable for the device at that time.

Just to be clear, that's not what the attached log is showing, right? I would be somewhat curious to see a log showing the failure after #25563 is applied.

Can we make the link local address have higher score than the ULA

I clearly think so, since the Apple case does this. But there were objections to changing it in general, which is how we ended up with the ifdef....

@wqx6
Copy link
Contributor Author

wqx6 commented Mar 9, 2023

Just to be clear, that's not what the attached log is showing, right?

This is what the attached log shows.

I (15847) chip[DIS]: UDP:[FE80::A276:4EFF:FE68:BF00%st1]:5540: new best score: 3
I (15847) chip[DIS]: UDP:[FDDE:AD00:BEEF:DAD0:A276:4EFF:FE68:BF00%st1]:5540: new best score: 4
I (15867) chip[DIS]: UDP:192.168.1.102%st1:5540: score has not improved: 2
I (15877) chip[DIS]: Checking node lookup status after 124 ms
I (15877) chip[DIS]: Keeping DNSSD lookup active
I (15947) chip[DIS]: Checking node lookup status after 200 ms
I (15957) chip[SC]: Initiating session on local FabricIndex 1 from 0x0000000000000002 -> 0x0000000000000001
I (16127) chip[EM]: <<< [E:11097i S:0 M:241576368] (U) Msg TX to 0:0000000000000000 [0000] --- Type 0000:30 (SecureChannel:CASE_Sigma1)
I (16147) chip[IN]: (U) Sending msg 241576368 to IP address 'UDP:[FDDE:AD00:BEEF:DAD0:A276:4EFF:FE68:BF00]:5540'
E (16147) chip[SVR]: Failed to establish connection to node 0x0000000000000001
I (18077) ROUTE_HOOK: Received RIO
I (18077) ROUTE_HOOK: prefix FD2A:48DF:CA8C:1:: lifetime 1800
I (19627) chip[DL]: IP_EVENT_GOT_IP6
I (19627) chip[DL]: IPv6 addr available. Ready on WIFI_STA_DEF interface: fdde:ad00:beef:dad0:f612:faff:fe03:89dc

The ULA FDDE:AD00:BEEF:DAD0:A276:4EFF:FE68:BF00 has higher score than LLA FE80::A276:4EFF:FE68:BF00 so the ESP32 sends message to the ULA but it has no route for the ULA before it gets IP address fdde:ad00:beef:dad0:f612:faff:fe03:89dc.

I would be somewhat curious to see a log showing the failure after #25563

A simple way to see the log after #25563

  • start and pair a lighting example
  • start and pair a light switch example
  • start a Thread BR on the same network
  • bind the light to the light switch
  • reboot the light switch

@bzbarsky-apple
Copy link
Contributor

This is what the attached log shows.

Right, and that's showing the message being sent before IP_EVENT_GOT_IP6 happens. That part should be prevented by #25563; after that we should get IP_EVENT_GOT_IP6 and the IPv6 addr available. Ready on WIFI_STA_DEF interface: bits before we try sending the Sigma1 message. Is that what you see?

@wqx6
Copy link
Contributor Author

wqx6 commented Mar 9, 2023

that's showing the message being sent before IP_EVENT_GOT_IP6 happens

Before the error, the device gets a link local IPv6 address, which triggers the DnssdInitialized event.

@bzbarsky-apple
Copy link
Contributor

Ah, I see, there are two IP_EVENT_GOT_IP6. Alright, yeah, then the options are only to either delay things until we are sure out addresses are set up or to change the prioritization.

I wonder whether we can somehow do the prioritization based on what sorts of addresses we have on the interface that the resolve was received on, so that we make sure to only use LL addresses if we ourselves only have an LL address, no matter what else we do with the priority.

Past that, @andy31415 had opinions on this, but I think he's on vacation right now...

@wqx6
Copy link
Contributor Author

wqx6 commented Mar 20, 2023

@andy31415 Can we make some changes to the IP address score?

@andy31415
Copy link
Contributor

We could add more logic into https://github.com/project-chip/connectedhomeip/blob/master/src/lib/dnssd/IPAddressSorter.cpp#L44 (which is called from https://github.com/project-chip/connectedhomeip/blob/master/src/lib/address_resolve/AddressResolve_DefaultImpl.cpp#L120).

Overall the initial thought was that ULA are more likely to be "correct" compared to link local, however if we need to have more heuristics we could add/change thing with a comment explaining. I am all for doing what works best in more cases.

@wqx6
Copy link
Contributor Author

wqx6 commented Aug 10, 2023

Close as #27981 merged

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants