dns_clear_cache() causes lockup (IDFGH-13375) #14287
Comments
Sorry, I was in a rush earlier and I misread that forum post. Even if dns_clear_cache() runs in the LwIP thread, it may still cause a lockup. I'm also checking whether clearing the DNS cache by setting the TTL to 0 would work, which was initially mentioned by Linetkux Wang. |
I dug in a bit more. It looks like if there's a DNS query ongoing, and the |
Any updates on this? @espressif-abhikroy |
@huming2207 Thank you for bringing this issue to our attention. It seems that the dns_clear_cache(void) function currently performs a simple memset() without executing dns_call_found(i, NULL);. This leads to the dns_table database being erased, along with the stored callback, which in turn causes tcpip_send_msg_wait_sem() to block indefinitely. As a result, functions like gethostbyname() and other netdb APIs will experience blocking. I am actively working on a fix, and it will be available shortly. Thank you for your patience. |
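The failure mode described in this comment can be modelled off-target with a few lines of C. This is only a sketch under stated assumptions: `struct entry`, `struct table`, `start_query()`, `clear_with_memset_only()`, `clear_and_notify()` and `count_wakeup()` are hypothetical stand-ins invented for illustration, not lwIP's real `dns_table`, `dns_call_found()` or the actual patch. It shows why wiping the table with a bare memset() leaves a waiter unnotified, while notifying each pending callback with a NULL address first lets it wake up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 4  /* mirrors DNS_TABLE_SIZE being 4 in the report */

/* Hypothetical stand-ins for lwIP internals; not the real definitions. */
typedef void (*found_cb)(const char *name, const void *addr, void *arg);

struct entry {
    int in_progress;   /* nonzero while a query is outstanding */
    char name[32];
    found_cb cb;       /* waiter to notify when the query ends */
    void *arg;
};

struct table {
    struct entry e[TABLE_SIZE];
};

/* Starts a fake query in slot i and records who is waiting on it. */
static void start_query(struct table *t, size_t i, const char *name,
                        found_cb cb, void *arg)
{
    assert(i < TABLE_SIZE);
    struct entry *e = &t->e[i];
    e->in_progress = 1;
    strncpy(e->name, name, sizeof(e->name) - 1);
    e->name[sizeof(e->name) - 1] = '\0';
    e->cb = cb;
    e->arg = arg;
}

/* Buggy clear: wipes the table, so pending callbacks are never invoked. */
static void clear_with_memset_only(struct table *t)
{
    memset(t, 0, sizeof(*t));
}

/* Safer clear: report failure (NULL address) to every waiter, then wipe. */
static void clear_and_notify(struct table *t)
{
    for (size_t i = 0; i < TABLE_SIZE; i++) {
        struct entry *e = &t->e[i];
        if (e->in_progress && e->cb != NULL) {
            e->cb(e->name, NULL, e->arg);
        }
    }
    memset(t, 0, sizeof(*t));
}

/* Example waiter: counts wake-ups. On a real system the callback would
 * release the semaphore that the blocked caller is waiting on. */
static void count_wakeup(const char *name, const void *addr, void *arg)
{
    (void)name;
    (void)addr;
    *(int *)arg += 1;
}
```

In this model, a caller that started a query before `clear_with_memset_only()` ran never sees `count_wakeup()` fire, which corresponds to the caller blocking forever in the description above.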
Thanks for the reply!
That sounds a bit problematic...😅 I'll wait for the fix. Thanks. Regards, |
May I also ask whether this fix is planned to be backported to IDF v5.3? |
Any updates on this issue? Sorry for pushing, but this is indeed affecting one of our products... |
We are in the process of merging the fix, and it will be available soon. I apologize for the delay. In the meantime, you can apply this patch locally to resolve the issue: |
Hi @espressif-abhikroy Thanks for the follow-up. I think I have done something similar to your patch, but it didn't quite work for us. Maybe I was wrong. I will see if I can arrange some experiments later and let you know the outcome. |
Also @espressif-abhikroy another idea I came up with is that maybe in our case we don't need to (and we shouldn't need to) cache DNS results on the ESP32. I think it would be nice if we could add a Kconfig option to disable this cache feature completely. I will try working on a pull request later if I have some time. |
@espressif-abhikroy BTW, on Aug 16 you mentioned the fix would be available shortly: #14287 (comment). The fix is so slow. |
Yeah, agreed. I have to point out that this issue should have been escalated and prioritised, as it really hurts anyone who uses two or more different network PHYs with two or more different ISPs. For example, using WiFi to connect to a corporate network and cellular for backup. It will lock up the whole firmware while switching between the networks, not just the network stack. CC @igrr |
Also, I think this needs to be backported to esp-lwip 2.1.x with IDF v5.2 and v5.3, not just v2.2.0. We can't take the risk of using ESP-IDF master branches. |
Fixed in master: e4c9285. |
@espressif-abhikroy |
@david-cermak |
Answers checklist.
IDF version.
v5.2.2
Espressif SoC revision.
ESP32-S3 (QFN56) (revision v0.1)
Operating System used.
Linux
How did you build your project?
Command line with idf.py
If you are using Windows, please specify command line type.
None
Development Kit.
Custom board
Power Supply used.
External 5V
What is the expected behavior?
Calling dns_clear_cache() should just clear the DNS cache, so that the next lookup does a fresh DNS query to the remote DNS server.
What is the actual behavior?
If dns_clear_cache() is called from another thread (rather than the LWIP task), it may cause lockups.
Someone on the forum also spotted the same issue, see: https://www.esp32.com/viewtopic.php?t=25239
Steps to reproduce.
Call dns_clear_cache() and switch to another network interface.
The log then shows "sending DNS request ID" forever, if LWIP debugging log is enabled.
Debug Logs.
More Information.
Also see: https://www.esp32.com/viewtopic.php?t=25239
A possible workaround for now is to issue DNS_TABLE_SIZE DNS requests by repeatedly calling gethostbyname() with different valid host names. Currently DNS_TABLE_SIZE is 4. But this will waste more data, and the IT team definitely isn't happy to see that when they do security auditing!
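The workaround above could be sketched roughly as follows. This is an illustration only, not a tested fix. `evict_cache_by_lookups()`, `resolve_fn`, `resolve_with_gethostbyname()`, `resolve_stub_always_ok()` and `ASSUMED_DNS_TABLE_SIZE` are hypothetical names introduced here, and the value 4 is taken from the report rather than read from the lwIP configuration. The sketch uses the POSIX `<netdb.h>` header so it builds on a host; on ESP-IDF, gethostbyname() would presumably come from `lwip/netdb.h` instead.

```c
#include <assert.h>
#include <netdb.h>
#include <stddef.h>

/* Assumption taken from the report: DNS_TABLE_SIZE is currently 4. */
#define ASSUMED_DNS_TABLE_SIZE ((size_t)4)

/* Resolver hook so the loop can be exercised without a network. */
typedef int (*resolve_fn)(const char *host);

/* Default hook: returns 1 when gethostbyname() resolves the name. */
static int resolve_with_gethostbyname(const char *host)
{
    return gethostbyname(host) != NULL;
}

/* Offline stand-in: pretends every lookup succeeds, sends no traffic. */
static int resolve_stub_always_ok(const char *host)
{
    return host != NULL;
}

/* Looks up at most ASSUMED_DNS_TABLE_SIZE distinct names, with the aim of
 * letting each fresh result take over one cache slot.
 * Returns the number of lookups that succeeded. */
static size_t evict_cache_by_lookups(const char *const hosts[], size_t count,
                                     resolve_fn resolve)
{
    assert(hosts != NULL && resolve != NULL);
    size_t ok = 0;
    for (size_t i = 0; i < count && i < ASSUMED_DNS_TABLE_SIZE; i++) {
        if (resolve(hosts[i])) {
            ok++;
        }
    }
    return ok;
}
```

On a device, the caller would pass `resolve_with_gethostbyname` and a list of host names that it needs to resolve anyway, which keeps the extra traffic mentioned above as small as possible.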