Influx DB always tries to use an IPv6 address if the influx server is set as a hostname using v12.4.0 on ESP32 #18015
Comments
There have been quite a few changes around DNS resolution in Tasmota lately, related or not to IPv6. The InfluxDB driver is using Espressif's NTP and MQTT are using Tasmota's An easy fix would be to do the DNS resolution from the InfluxDB to the IP using Tasmota's Thinking about it .... |
I've implemented the 2nd case, doing the resolution at each InfluxDB publication; this is what the original code was doing. You can try it, without any warranty: tasmota32 compiled with USE_INFLUXDB : Or compile it yourself from my branch : https://github.com/barbudor/Tasmota/tree/influxdb-with-new-dns-resolv Anyhow, I'll try to test it tomorrow evening. |
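To illustrate the idea of resolving the hostname to an IPv4 address at every publication, here is a minimal host-side sketch in Python. It is not the Tasmota code: the helper names `resolve_ipv4` and `influx_base_url` are hypothetical, and port 8086 is taken from the URL quoted later in this thread. It only shows the general technique of restricting a lookup to the IPv4 address family.

```python
import socket


def resolve_ipv4(hostname: str) -> str:
    """Return the first IPv4 address for hostname (hypothetical helper)."""
    # family=AF_INET restricts the lookup to IPv4 results, so an IPv6
    # address can never be returned even if the resolver offers one.
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET the sockaddr is an (address, port) pair.
    return infos[0][4][0]


def influx_base_url(hostname: str, port: int = 8086) -> str:
    """Build the server URL from a fresh lookup (hypothetical helper).

    Resolving on every call mirrors "doing the resolution at each
    publication": a changed DNS record is picked up at the next send.
    """
    return f"http://{resolve_ipv4(hostname)}:{port}"
```

Calling `influx_base_url("influx.home")` on a PC in the same network should give the address the local DNS server hands out, which can then be compared with what the device logs.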
Many thanks barbudor, but that doesn't seem to completely solve the problem. I used your branch to build locally and therefore keep the same test conditions, but it still uses an IPv6 address for the influx hostname when testing "health" even though it now correctly gives an IPv4 for the server address. [09:17:24]00:00:00.002 HDW: ESP32-D0WDQ6 status 0 data at point of crash/reboot |
Having managed to get logging level 4 enabled for long enough to capture the crash properly, the key thing was an entry along the lines of src:button. I found that from somewhere I had a Button 2 set to GPIO35 in my default ESP32 template config. I have no idea where it came from, but the odd thing is it has been there from as long ago as my v10 builds, but has never caused an issue until 12.4.0. Very weird. |
Thanks for the update. I will prepare the PR and we will wait for your confirmation to merge. Regarding buttons, Theo has done large refactoring on buttons and switches management since v12.3.1 |
Just to clarify, I didn't have 2x Button2, only a button1 (on GPIO 0) & a button2. The button2 was set to GPIO35 which is input-only on an ESP32, and does not have any pull-up or pull-down resistors, so is sensitive to noise. I do not have anything connected to this pin. All I can say is that, having removed this erroneous button2, the system does not reboot repeatedly. I had used the word crash as that's what it seemed like via the web console, but via serial monitoring I can see that the system appears to be triggering a watchdog (?), as it says restarting or resetting - I need to capture this to make sure of the exact wording. So I don't think @arendst needs to look at any button code for the time being. |
Having thought about this more and made the influx period an odd number so that it rarely coincided with mqtt, wifi checking or any other possible timer type event, I was able to capture this from your build. I am alternating between main menu and information on the web ui, so that is why you see the switching between the two in the logs. I am now going to try the same thing, but with a valid IPv6 address for the influx server in my local DNS server to see if there is any residual IPv6 checking going on that shouldn't be. 09:14:24.799 HTP: Main Menu The web interface freezes at this point and the entries below do not appear until the blocking ends 09:14:36.857 HTP: Main Menu Logging and web interface become usable again at this point, and it is only at this point that the logs from the freeze point up to here appear in the console 09:14:53.150 HTP: Information |
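The reasoning behind choosing an odd influx period can be sketched with a little arithmetic: two periodic timers that start together fire together again every least common multiple of their periods. The 60-second period used for the "other" timer below is purely an assumption for illustration, not a value from this thread; 25 is the ifxperiod value mentioned in this thread.

```python
from math import gcd


def coincidence_interval(period_a: int, period_b: int) -> int:
    """Seconds between simultaneous firings of two periodic timers.

    This is the least common multiple of the two periods, assuming both
    timers start at the same moment.
    """
    return period_a * period_b // gcd(period_a, period_b)


# Assumed 60 s period for some other periodic task (illustrative only).
other = 60
for ifx_period in (30, 25, 29):
    every = coincidence_interval(ifx_period, other)
    print(f"ifxperiod {ifx_period}: coincides every {every} s "
          f"(every {every // ifx_period} influx sends)")
```

Under that assumption, a period of 30 collides on every second send, while 25 or 29 collide far less often, which makes it easier to attribute a freeze to the influx send alone.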
Alternating between info and menu quicker, I can see that the web interface freezes perhaps 1-2 seconds before the berry log entry, as once the system unblocks, those web ui entries appear at the correct point in the sequence. So it is something that happens before the berry stuff that does not generate logging info even at level 4. I can confirm that adding a valid IPv6 to my local DNS makes no difference to the apparent blocking situation. Important info. I have also made a build of your branch where I disable all IPv6 by commenting out -DUSE_IPV6 in platformio_tasmota32.ini as I did with the original master release of 12.4.0. Sadly I have to report that this still gives the long blocking periods with your branch, something that the 12.4.0 master build does not seem to exhibit when all IPv6 was disabled. So you have fixed the issue of influx always giving an IPv6 address, but may have introduced a different issue as a result - sorry, but please don't merge this PR into the main release until I have checked more on this. Update: 12.4.0 DOES exhibit the blocking even with IPv6 disabled in the build, but it was only obvious when I reduced the ifxperiod to 25, so you have not introduced any new issues, my apologies. However there is something still not right with influx in version 12.4.0.x as this blocking definitely does not occur in v12.3.1, nor does the button2 issue. You can see a similar blocking period of approx 15 seconds of apparent inactivity during bootup when influx "health" is checked:- [09:49:39]00:00:02.597 WIF: Connecting to AP1 sensory Channel 6 BSSId 28:EE:52:9B:E9:58 in mode 11n as tasmota-1726E0-1760... I also added back a Button 2, to the template, and the system reboots, but ONLY if influx is enabled and only around the period of influx data send :- When the reboot occurs it resets pretty much everything back to defaults, so logging goes back to level2, and my backlog script to set ifxperiod to 600 as well as setting my default device template also runs. 
In the previous fault entries, I had this erroneous button2 in my default template, so the problem would reappear every time it reached the influx send, and this was adding to the confusion of "constantly rebooting". An FYI, the GPS does NOT have this erroneous Button2 on GPIO35, what I thought was the same rebooting is in fact the very long "blocking" we are seeing in the IPv6 enabled builds. So my theory that it might be external triggers during the blocking that cause this is evidently wrong. It is very difficult for me to physically access the GPS device, so I can only get console access via the web interface. If you need any more logs or data please let me know. [09:59:09]09:59:13.480 HTP: Configuration |
I think it's best if I summarise what I believe my findings are thus far... I am still investigating the button2 issue as I am getting some odd findings, and need to verify with the stock build from the website. But to summarise that briefly, I am getting reboots with or without influx enabled, on both ESP32 and ESP8266, and also (never observed before) on release 12.3.1 custom build. I think the switch issue, if it exists in the stock build, needs its own issue report, so I will not cloud this influx DNS problem any further with this. Blocking delay during ifx data send/lookup issue: v12.4.0 master release, custom option build (only minor variance from official to give my local system defaults), IPv6 enabled for ESP32
v12.4.0 master release, custom option build (only minor variance from official to give my local system defaults), IPv6 disabled for ESP32
v12.4.0.1 your branch, custom option build (only minor variance from official to give my local system defaults), IPv6 enabled for ESP32
v12.4.0.1 your branch, custom option build (only minor variance from official to give my local system defaults), IPv6 disabled for ESP32
v12.3.1 master release, custom option build (only minor variance from official to give my local system defaults), IPv6 disabled for ESP32
|
Haven't read anything regarding blocking but your button issue is as designed. What happens is if you define a wrong button type it reports pressed instead of not pressed. If tasmota detects a button pressed for over 40 seconds it resets the device as requested. See docs. To get rid of this define a correct button type reporting the correct state. So if you defined a See docs about buttons and switches. Happy hunting for the blocking issue. |
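The effect described above can be pictured with a small simulation. This is only a sketch of the general idea (a wrong button type makes an idle pin read as "pressed", and a long enough press triggers a reset); it is not Tasmota's implementation. The function names are hypothetical, the one-reading-per-second sampling is an assumption, and the 40-second threshold is the figure quoted in this comment.

```python
HOLD_RESET_SECONDS = 40  # threshold quoted in the comment above


def is_pressed(pin_high: bool, active_low: bool) -> bool:
    """Interpret a raw pin level according to the configured button type."""
    # An active-low button counts as pressed when the pin reads low;
    # an active-high (inverted) button counts as pressed when it reads high.
    return (not pin_high) if active_low else pin_high


def seconds_until_reset(samples, active_low, hold=HOLD_RESET_SECONDS):
    """Return the second at which a continuous press exceeds `hold`.

    `samples` holds one raw pin reading per second (an assumption made
    for this sketch). Returns None if the press never lasts long enough.
    """
    held = 0
    for second, pin_high in enumerate(samples, start=1):
        # Any "not pressed" reading restarts the hold counter.
        held = held + 1 if is_pressed(pin_high, active_low) else 0
        if held > hold:
            return second
    return None


# A pin that idles low, configured as an active-low button, looks held down.
idle_low = [False] * 60
print(seconds_until_reset(idle_low, active_low=True))   # press is detected
print(seconds_until_reset(idle_low, active_low=False))  # never pressed
```

With the matching button type the same idle pin never registers a press, which is the point of the advice to define a button type that reports the correct state.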
Thank-you @arendst , I was in the process of eliminating any custom bits from a test build to find "the cause" and was chasing my own tail wasting time as the results weren't always consistent. I can go back to doing something more useful now :-) |
A lot of details in which I'm a bit lost
Sorry, but this sentence doesn't make much sense to me. I thought we were talking about InfluxDB. Thanks |
I can see the blocking on an ESP32 with only InfluxDB (no buttons whatsoever). It looks like the http_POST remains blocked for 15 seconds after the end of the transaction (influx has replied with 204 No Content). |
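One way to check whether a delay like this comes from the server or from the device is to time the same request from a PC on the same network. The sketch below is a host-side diagnostic, not part of Tasmota; the helper names `timed` and `check_health` are hypothetical, and the URL in the usage note is the health URL quoted elsewhere in this thread.

```python
import time
import urllib.request


def timed(fn):
    """Call fn() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


def check_health(url: str, timeout: float = 30.0):
    """Request `url` and return (http_status, elapsed_seconds).

    The timeout is deliberately longer than the 15 seconds seen on the
    device, so a slow server shows up as a long time, not an error.
    """
    def request():
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status

    return timed(request)
```

For example, `check_health("http://influx.home:8086/health")` returning within a few milliseconds from a PC would suggest the 15 seconds are spent on the device side rather than in the InfluxDB server.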
I have occasionally seen the same blocking on an ESP8266 with your branch too unfortunately, but it may be completely unrelated to this and is not consistently repeatable for me. The GPS I mentioned was in reference to the now discredited button2 issue - I had thought that perhaps the reboots were caused by external triggers during influx data send, and the GPS production device which I mentioned has another possible source of an external trigger (as would a button2 be) on which I had seen the reboots. It was never part of the tests here, but I thought it might be relevant if the reboots were happening when there was an influx send - see Theo's comments about incorrectly assigned buttons deliberately causing the system to reboot, something I never knew about. |
I see where the problem is: it takes 15 sec to connect to the server. I will be off starting Friday evening and may not have much time to investigate more during the coming week. And the one who could help me will also be off for a week. Some delay is to be expected here too. |
I am in no rush as I have a workaround, I am just pleased that you can reproduce the problem that I am seeing. |
@therobveiller The dev branch has been reverted to before that new Arduino Core, so it may be possible that the InfluxDB problem is also solved. Thanks |
@barbudor |
@barbudor, I managed to get some time late last night. The latest development branch files used were timestamped 4 March at 22:52 and downloaded at 13:42 on 5 March. For build 1) influx resolves an IPv4 address as would be expected (doesn't show in log, but it connects fine, so logically must be the IPv4 address). For build 2) influx still incorrectly resolves an IPv6 address instead of IPv4. For build 3) influx now correctly resolves an IPv4 address, but we still get the 15-second delay in connecting to the server for every influx send, during which time the unit hangs. So in summary, the situation is no different than with 12.4.0 or with your branch that we tested before. The reverted Arduino core does not solve the problem. The following observations may be of some help, I hope: I noticed that even in build 1 (the current development build that has the reverted Arduino core, and with IPv6 disabled by me) and also with v12.4.0, there is a delay of 15 seconds on the very first connection to the influx server (where http://influx.home:8086/health is called) after boot. |
Thanks |
This issue has been automatically marked as stale because it hasn't any activity in last few weeks. It will be closed if no further activity occurs. Thank you for your contributions. |
Bump up. I need to restart investigations |
This issue has been automatically marked as stale because it hasn't any activity in last few weeks. It will be closed if no further activity occurs. Thank you for your contributions. |
Bump |
This issue has been automatically marked as stale because it hasn't any activity in last few weeks. It will be closed if no further activity occurs. Thank you for your contributions. |
Bump. |
- Add IPv4 DNS lookup to influxdb (#18015) - Add response to influxdb send
Added local DNS lookup (as suggested by @barbudor) to solve possible IPv6 resolves. Even with this change I still observe the delays on ESP32 at initial connect for both Validating and Data requests. After that they respond swiftly. I've added On ESP8266 the response is always swift. I think we can conclude the issue is with Arduino Core HTTPClient. Let me check if we can use Tasmota's own ESP32 HTTPClientLight as a solution..... |
Indeed, that was my conclusion too. I planned to try with Tasmota's HTTPClientLight but I'm currently snowed under with real-life work. Just passing by from time to time ;) |
@barbudor There is probably something fishy in underlying IDF SDK. Not the first time with issues here. Maybe it is not just the HTTPClient. There is an unsolved issue with DNS from underlying IDF lwip when IPv6 is enabled. espressif/arduino-esp32#8221 |
Fix ESP32 InfluxDb initial connection delays using HTTPClient (#18015)
Try the latest dev branch. For ESP32 it now uses our light HTTPClient. In my tests it responds as swiftly as the ESP8266 does. Pls report back. |
Tried on my testbed ESP32 and all seems to behave properly as far as I can see, with IPv6 enabled in the build, setoption149 set to either 1 or 0, and on a network with no IPv6 address for the influx server. Thank you very much for your efforts, they are much appreciated. Rob |
Seems to be behaving as it should now. It connects via IPv4 even with an IPv6 address on the server & DNS, with setoption149 set to 0, so I am very happy. Thanks again. Rob |
PROBLEM DESCRIPTION
A clear and concise description of what the problem is.
Influx DB always yields an IPv6 address (after about 30 seconds) if the influx server is set as a hostname using v12.4.0 on ESP32, even though setoption149 is set to 0 (default). If -DUSE_IPV6 is commented out in platformio_tasmota32.ini, then it yields an IPv4 address, but takes 15 seconds to do so (this can be seen in the delay before the next log entry is generated, and the device hangs and is unresponsive during this delay). In v12.3.1 the name-to-IP lookup yielding an IPv4 address is almost instantaneous.
REQUESTED INFORMATION
Make sure you have performed every step and checked the applicable boxes before submitting your issue. Thank you!
Backlog Template; Module; GPIO 255
Backlog Rule1; Rule2; Rule3
Status 0
Set weblog to 4 and then, when you experience your issue, provide the output of the Console log:
TO REPRODUCE
Steps to reproduce the behavior:
EXPECTED BEHAVIOUR
A clear and concise description of what you expected to happen.
SCREENSHOTS
If applicable, add screenshots to help explain your problem.
ADDITIONAL CONTEXT
Add any other context about the problem here.
The IPv4 address is correct as resolved from my local DNS server for all entries (mqtt.home, ntp.home, and influx.home).
The IPv6 address that 12.4.0 produces is "fictional" and does not come from my DNS server, and it is NOT the correct address for the influx server either. This also causes the device to reboot itself often.
If I add the correct IPv6 address for influx.home to my local DNS, the v12.4.0 with IPv6 does connect to the server, but it still takes it 15seconds before moving on to the mqtt section.
(Please, remember to close the issue when the problem has been addressed)