Unresponsive controller recovery: Improvements #6402
Comments
Thanks for your work on resolving these issues.
You can share them here
I misspoke when I said I was running this latest commit. I realized I was running the master branch of zwave-js-ui, not the node application. I'm not actually sure how to do that.
Happy to post more if these aren't the helpful bits. I also just ordered a new stick (Zooz 800 Series) as I know my 500 series is getting old at this point. Thanks in advance.
What you posted are the UI/Application logs. I'll need to see a driver log, on loglevel
Sorry about that, I didn't realize it was a separate log. The full day's log is attached. The upgrade to 12.x is near the bottom.
Here's one with some more usage, including it resetting after the failed ping
Here's a fresh log on 12.2.0 with soft restart both enabled and disabled. 12.2.0 starts at 2023-10-18T18:12:23.217Z.
I installed the ZST39 (800 series) tonight and excluded then included 40 of my devices. So far it’s been smooth sailing on the new controller and hoping it stays that way. The rest I’ll do over the weekend. Smart Start has been pretty nice. I won't be able to provide any new logs as I solved my problem with the new controller, but I hope the prior ones help others.
@jschollenberger looks like Z-Wave JS is now almost doing what it's supposed to when the controller does not respond in time. There's something I don't understand though...
A similar thing happens the next time - the recovery fails at Relevant part for further investigation:
Hi @AlCalzone, do you think it is now time to plan a controller change? BTW, today's log is attached; there were at least 20 reset events today. The Z-Wave hangs are currently causing me a lot of trouble: there are so many reset events that I can't deal with them (last night I had to disable the alarm based on Z-Wave sensors). Anyway, thanks for your work helping us with home automation.
Make sure you aren't also being affected by home-assistant/core#102637. If you have any thermostats, you may need to re-interview them. You can confirm in ZUI if there are any problems with them (unknown setpoints).
@candrea77 going by that log, your controller randomly stops responding to commands. Normally it acknowledges commands it got from Z-Wave JS within 2-3 milliseconds. Then at times it takes 3-5 seconds, or doesn't send an acknowledgement at all. That's when Z-Wave JS tries to recover it by restarting the controller (soft-reset), which usually works but takes a couple of seconds. Disabling soft-reset means Z-Wave JS is going to restart itself and re-open the serial port, maybe even a few times, until the stick becomes responsive again on its own. This is definitely going to take longer than leaving soft-reset enabled. Not sure how the situation was before the latest updates, but I guess it was the same - you just didn't notice it that well. You'd probably end up with a couple of random "dead" nodes in this situation, because Z-Wave JS would just continue trying commands that fail, until the stick recovered on its own. I'm not sure why this keeps happening - there isn't really an obvious trigger. Sometimes it happens when sending commands, sometimes it happens when reading the noise levels. I should get my hands on a new 800 series stick today or tomorrow, but I can't make any guarantees if and when the migration will be supported.
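To make the escalation described above concrete, here is a minimal TypeScript sketch. It is not Z-Wave JS's actual implementation: the names `chooseRecoveryAction`, `AckObservation` and `RecoveryOptions`, and the 1000 ms timeout in the usage lines, are assumptions made purely for illustration.

```typescript
// Hypothetical sketch; none of these names come from the Z-Wave JS API.
type RecoveryAction = "none" | "soft-reset" | "restart-and-reopen-port";

interface AckObservation {
  // Time the controller took to acknowledge a command, in milliseconds.
  // undefined means no acknowledgement arrived at all.
  ackDelayMs: number | undefined;
}

interface RecoveryOptions {
  ackTimeoutMs: number; // assumed threshold, not a documented default
  softResetEnabled: boolean;
}

function chooseRecoveryAction(
  obs: AckObservation,
  opts: RecoveryOptions,
): RecoveryAction {
  const timedOut =
    obs.ackDelayMs === undefined || obs.ackDelayMs > opts.ackTimeoutMs;
  // A healthy controller acknowledges within a few milliseconds.
  if (!timedOut) return "none";
  // With soft-reset enabled, restart the controller; otherwise fall back
  // to restarting the driver and re-opening the serial port.
  return opts.softResetEnabled ? "soft-reset" : "restart-and-reopen-port";
}

const opts: RecoveryOptions = { ackTimeoutMs: 1000, softResetEnabled: true };
console.log(chooseRecoveryAction({ ackDelayMs: 3 }, opts)); // none
console.log(chooseRecoveryAction({ ackDelayMs: 4000 }, opts)); // soft-reset
console.log(
  chooseRecoveryAction(
    { ackDelayMs: undefined },
    { ...opts, softResetEnabled: false },
  ),
); // restart-and-reopen-port
```

The sketch only models the decision; the repeated re-open attempts and the time each recovery step takes are left out.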
Not sure how the situation was before the latest updates, but I guess it was the same - you just didn't notice it that well. You'd probably end up with a couple of random "dead" nodes in this situation, because Z-Wave JS would just continue trying commands that fail, until the stick recovered on its own. Switching to a different controller would probably solve that, but then you'll have to deal with the 700 series EU range issues or start over with an 800 series controller (due to missing migration functionality), where I'm not sure the range is actually better. I will probably wait a few weeks (I've planned to move my hardware to a different floor of my house) and then start with a new 700 series controller (and in case it is worse than the 500 series, switch back to the 500). @AlCalzone: Is there a way to disable not only the soft reset but also the hard reset of the controller, and let it wait for its own recovery?
I am considering adding an option to disable the automatic recovery feature altogether.
I've got an UZB (https://z-wave.me/products/uzb/) and I'm experiencing these issues as well. Previously I never had any issues (no random dead nodes either as far as I am aware) but now I need to restart my zwave docker at least daily or it completely stops working. I'm currently considering making a docker healthcheck to auto restart when it stops working again. I'm not sure if my controller has a soft reset option but the button in the interface gives me an error :)
UZB is one of the sticks that are blacklisted from soft reset because it shuts down completely and has to be physically re-plugged. Do you have driver logs of your issue anyways?
As far as I understood: BEFORE v12: Z-Wave JS never tried to reset the Z-Stick. NOW: When a node does not reply to the controller for a specific amount of time or number of retries (?), the "zwavejs software" tries to reset (soft/hard) the controller. @AlCalzone: Is my assumption correct?
I've enabled the driver logs now. I'll let you know as soon as it breaks again. But I should mention that I've switched from the latest to the master docker tag in the meantime. I'm not sure if that makes any difference :)
Not quite @candrea77 BEFORE v12: NOW: As you see the new behavior is more correct, although it can be pretty disruptive if the controller often becomes unresponsive like yours.
I've had the same issue for a couple of versions. Do you still need any logs? zwave-js-ui: 9.2.3.1454eca. The Z-Wave stick is an Aeotec Z-Stick Gen5.
@AlCalzone:
So, if I may, here is my suggestion: is it possible to stop reporting the "restart action" to Home Assistant and "freeze" the node status until the controller has restarted?
I think zwave-js just shouldn't restart and instead try re-opening the serial port (see opening post, the unchecked point). That way what you're seeing doesn't happen.
Unfortunately I'm not sure when it stopped working exactly, but this log should cover it all
@wolph your problem seems to be Node 17 - the stick takes extraordinarily long to communicate with that one (> 30s), so Z-Wave JS often aborts these attempts. I've raised two issues:
Aside from that you may want to try and fix the situation with Node 17 yourself, e.g. by excluding it, factory resetting, including again. Not sure if that helps, but even when we handle the situation more gracefully, that device will still keep the controller busy and block all other communication while it's ongoing.
I've noticed the slowness with node 17 as well but I've got no clue what the cause is to be honest. Perhaps it's simply a broken node, I've noticed it being flaky in the past as well. If I may, I would like to raise another issue as well although I'm not sure if this is the right place for that. When it currently fails it doesn't indicate to Home Assistant that it has failed. So all devices appear to work but they simply stop responding. It would be nice if the devices would be set to unavailable if the controller is unavailable.
The controller status is exposed by this library, so this would be a feature request for HA.
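As a rough illustration of what such a feature could look like on the consuming side, the sketch below maps a controller status to entity availability. It assumes a three-state status (ready, unresponsive, jammed); the local `ControllerStatus` type and the `entitiesAvailable` function are hypothetical and self-contained, not the library's or Home Assistant's API.

```typescript
// Hypothetical, self-contained model of a controller status.
// The three states are an assumption made for this sketch.
type ControllerStatus = "ready" | "unresponsive" | "jammed";

// Design choice for this sketch: entities count as available only while
// the controller is ready. Any other state marks them unavailable, so
// devices do not appear to work while commands cannot reach them.
function entitiesAvailable(status: ControllerStatus): boolean {
  return status === "ready";
}

const statuses: ControllerStatus[] = ["ready", "unresponsive", "jammed"];
for (const status of statuses) {
  const label = entitiesAvailable(status) ? "available" : "unavailable";
  console.log(`${status}: ${label}`);
}
```

Whether a jammed controller should also mark entities unavailable is a judgment call; a consumer could just as well treat only the unresponsive state that way.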
Re: home-assistant/addons#3234 (comment)