Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unresponsive controller recovery: Improvements #6402

Closed
4 tasks done
AlCalzone opened this issue Oct 16, 2023 · 27 comments · Fixed by #6480
Closed
4 tasks done

Unresponsive controller recovery: Improvements #6402

AlCalzone opened this issue Oct 16, 2023 · 27 comments · Fixed by #6480

Comments

@AlCalzone
Copy link
Member

AlCalzone commented Oct 16, 2023

Re: home-assistant/addons#3234 (comment)

  • just increasing the response timeout is not enough. We should probably go back to 10s, issue a Send Data Abort when this is triggered and then wait for the callback as usual.
  • Try reopening the serial port if soft-reset failed. If that doesn't help, move on.
  • Introduce a soft timeout for missing callback. Once elapsed, issue SendDataAbort, then wait until the callback timeout elapses or a callback is received.
  • Add setting to disable this.
@AlCalzone AlCalzone changed the title Unresponsive controller (missing Send Data Response): Abort transmission and then wait for callback Unresponsive controller recovery: Improvements Oct 16, 2023
@jschollenberger
Copy link

Thanks for your work on resolving these issues.
I just updated to 12.x and encountering issues with my HUSBZB-1 controller. I tested this latest commit and while everything loads and works at the start, I then get the "Driver: Recovering unresponsive controller failed. Restarting the driver... (ZW0100)" error even with soft reset disabled. I rolled back to 11.14.3 where everything works, but unfortunately Home Assistant 2023.10 requires the newer version. Should I start a new issue with the logs or would you prefer we use this one?

@AlCalzone
Copy link
Member Author

You can share them here

@jschollenberger
Copy link

I misspoke when I said I was running this latest commit. I realized I was running the master branch of zwave-js-ui, not the node application. I'm not actually sure how to do that.
Anyway, here are the relevant portions of my log file where it times out after sending a command and loops. Perhaps this is already what you addressed here, but I'm not sure how to build and test it.

...
2023-10-16 17:16:58.120 INFO Z-WAVE: [Node 017] Value added 17-114-0-manufacturerId => 99
2023-10-16 17:16:58.120 INFO Z-WAVE: [Node 017] Value added 17-114-0-productType => 18756
2023-10-16 17:16:58.120 INFO Z-WAVE: [Node 017] Value added 17-114-0-productId => 12337
2023-10-16 17:16:58.137 INFO Z-WAVE: [Node 017] Value added 17-119-0-name => Bedroom Dimmer
2023-10-16 17:16:58.137 INFO Z-WAVE: [Node 017] Value added 17-119-0-location => Bedroom
2023-10-16 17:16:58.148 INFO Z-WAVE: [Node 017] Value added 17-134-0-libraryType => 6
2023-10-16 17:16:58.148 INFO Z-WAVE: [Node 017] Value added 17-134-0-protocolVersion => 3.67
2023-10-16 17:16:58.149 INFO Z-WAVE: [Node 017] Value added 17-134-0-firmwareVersions => 3.37
2023-10-16 17:16:58.164 INFO Z-WAVE: [Node 017] Ready: GE - 12724 / ZW3005 (In-Wall Paddle Dimmer, 300S)
2023-10-16 17:17:53.918 INFO Z-WAVE: Calling api writeValue with args: [
  { nodeId: 9, commandClass: 37, endpoint: 0, property: 'targetValue' },
  true,
  null,
  [length]: 3
]
2023-10-16 17:17:53.920 INFO Z-WAVE: Writing true to 9-37-0-targetValue
2023-10-16 17:18:26.762 INFO Z-WAVE: Controller status: Controller is unresponsive
2023-10-16 17:18:26.787 INFO Z-WAVE: Controller status: Driver: Recovering unresponsive controller failed. Restarting the driver... (ZW0100)
2023-10-16 17:18:26.791 INFO Z-WAVE: Restarting client in 1 seconds, retry 1
2023-10-16 17:18:27.801 INFO GATEWAY: Driver is CLOSED
2023-10-16 17:18:27.840 INFO Z-WAVE: Client closed
2023-10-16 17:18:27.842 INFO GATEWAY: Driver is CLOSED
2023-10-16 17:18:27.883 INFO Z-WAVE: Connecting to /dev/serial/by-id/usb-Silicon_Labs_HubZ_Smart_Home_Controller_813007C8-if00-port0
2023-10-16 17:18:27.885 INFO Z-WAVE: Setting user callbacks
2023-10-16 17:18:35.012 INFO Z-WAVE: Restarting client in 2 seconds, retry 2
2023-10-16 17:18:35.016 INFO Z-WAVE: Controller status: Driver: Failed to initialize the driver: ZWaveError: Timeout while waiting for an ACK from the controller (ZW0200)
    at Driver.sendMessage (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:5028:23)
    at ZWaveController.identify (/usr/src/app/node_modules/zwave-js/src/lib/controller/Controller.ts:897:37)
    at Driver.initializeControllerAndNodes (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:1392:26)
    at Immediate.<anonymous> (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:1186:16) (ZW0100)
2023-10-16 17:18:35.017 INFO Z-WAVE: Restarting client in 4 seconds, retry 3
2023-10-16 17:18:37.019 INFO GATEWAY: Driver is CLOSED
2023-10-16 17:18:37.022 INFO Z-WAVE: Client closed
2023-10-16 17:18:37.023 INFO GATEWAY: Driver is CLOSED
2023-10-16 17:18:37.036 INFO Z-WAVE: Connecting to /dev/serial/by-id/usb-Silicon_Labs_HubZ_Smart_Home_Controller_813007C8-if00-port0
2023-10-16 17:18:37.037 INFO Z-WAVE: Setting user callbacks
2023-10-16 17:18:43.785 INFO Z-WAVE: Controller status: Driver: Recovering unresponsive controller failed. Restarting the driver... (ZW0100)
2023-10-16 17:18:43.786 INFO Z-WAVE: Restarting client in 8 seconds, retry 4
2023-10-16 17:18:43.790 INFO Z-WAVE: Controller status: Driver: Failed to initialize the driver: ZWaveError: Timeout while waiting for an ACK from the controller (ZW0200)
    at Driver.sendMessage (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:5028:23)
    at ZWaveController.identify (/usr/src/app/node_modules/zwave-js/src/lib/controller/Controller.ts:897:37)
    at Driver.initializeControllerAndNodes (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:1392:26)
    at Immediate.<anonymous> (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:1186:16) (ZW0100)
2023-10-16 17:18:43.791 INFO Z-WAVE: Restarting client in 15 seconds, retry 5
2023-10-16 17:18:51.791 INFO GATEWAY: Driver is CLOSED
2023-10-16 17:18:51.800 INFO Z-WAVE: Client closed
2023-10-16 17:18:51.800 INFO GATEWAY: Driver is CLOSED
2023-10-16 17:18:51.811 INFO Z-WAVE: Connecting to /dev/serial/by-id/usb-Silicon_Labs_HubZ_Smart_Home_Controller_813007C8-if00-port0
2023-10-16 17:18:51.812 INFO Z-WAVE: Setting user callbacks
2023-10-16 17:18:58.316 INFO Z-WAVE: Controller status: Driver: Recovering unresponsive controller failed. Restarting the driver... (ZW0100)
2023-10-16 17:18:58.319 INFO Z-WAVE: Restarting client in 15 seconds, retry 6
2023-10-16 17:18:58.321 INFO Z-WAVE: Controller status: Driver: Failed to initialize the driver: ZWaveError: Timeout while waiting for an ACK from the controller (ZW0200)
    at Driver.sendMessage (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:5028:23)
    at ZWaveController.identify (/usr/src/app/node_modules/zwave-js/src/lib/controller/Controller.ts:897:37)
    at Driver.initializeControllerAndNodes (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:1392:26)
    at Immediate.<anonymous> (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:1186:16) (ZW0100)
2023-10-16 17:18:58.323 INFO Z-WAVE: Restarting client in 15 seconds, retry 7
2023-10-16 17:19:13.322 INFO GATEWAY: Driver is CLOSED
2023-10-16 17:19:13.325 INFO Z-WAVE: Client closed
2023-10-16 17:19:13.326 INFO GATEWAY: Driver is CLOSED
2023-10-16 17:19:13.337 INFO Z-WAVE: Connecting to /dev/serial/by-id/usb-Silicon_Labs_HubZ_Smart_Home_Controller_813007C8-if00-port0
2023-10-16 17:19:13.338 INFO Z-WAVE: Setting user callbacks
2023-10-16 17:19:19.991 INFO Z-WAVE: Controller status: Driver: Recovering unresponsive controller failed. Restarting the driver... (ZW0100)
2023-10-16 17:19:19.991 INFO Z-WAVE: Restarting client in 15 seconds, retry 8
2023-10-16 17:19:19.997 INFO Z-WAVE: Controller status: Driver: Failed to initialize the driver: ZWaveError: Timeout while waiting for an ACK from the controller (ZW0200)
    at Driver.sendMessage (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:5028:23)
    at ZWaveController.identify (/usr/src/app/node_modules/zwave-js/src/lib/controller/Controller.ts:897:37)
    at Driver.initializeControllerAndNodes (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:1392:26)
    at Immediate.<anonymous> (/usr/src/app/node_modules/zwave-js/src/lib/driver/Driver.ts:1186:16) (ZW0100)
2023-10-16 17:19:19.998 INFO Z-WAVE: Restarting client in 15 seconds, retry 9
2023-10-16 17:19:35.009 INFO GATEWAY: Driver is CLOSED
2023-10-16 17:19:35.012 INFO Z-WAVE: Client closed
2023-10-16 17:19:35.013 INFO GATEWAY: Driver is CLOSED
2023-10-16 17:19:35.024 INFO Z-WAVE: Connecting to /dev/serial/by-id/usb-Silicon_Labs_HubZ_Smart_Home_Controller_813007C8-if00-port0
2023-10-16 17:19:35.025 INFO Z-WAVE: Setting user callbacks
2023-10-16 17:19:38.586 INFO GATEWAY: Driver is READY
2023-10-16 17:19:38.591 INFO Z-WAVE: Z-Wave driver is ready
2023-10-16 17:19:38.593 INFO Z-WAVE: Controller status: Driver ready
2023-10-16 17:19:38.676 INFO Z-WAVE: Scanning network with homeid: 0xfa4fb395
2023-10-16 17:19:38.680 INFO Z-WAVE: [Node 001] Is alive
2023-10-16 17:19:38.683 INFO Z-WAVE: Powerlevel is not supported by controller
2023-10-16 17:19:38.684 INFO Z-WAVE: RF region is not supported by controller
2023-10-16 17:19:38.687 INFO Z-WAVE: [Node 001] Ready: Sigma Designs (Former Zensys) - HUSBZB-1 (QuickStick Combo)
2023-10-16 17:19:38.779 INFO Z-WAVE: [Node 031] Is asleep
2023-10-16 17:19:38.795 INFO Z-WAVE: [Node 031] Value added 31-48-0-Motion => false
2023-10-16 17:19:38.796 INFO Z-WAVE: [Node 031] Value added 31-49-0-Air temperature => 68.7
2023-10-16 17:19:38.797 INFO Z-WAVE: [Node 031] Value added 31-49-0-Illuminance => 53
2023-10-16 17:19:38.797 INFO Z-WAVE: [Node 031] Value added 31-112-0-1 => 30
2023-10-16 17:19:38.798 INFO Z-WAVE: [Node 031] Value added 31-112-0-2 => 240
2023-10-16 17:19:38.799 INFO Z-WAVE: [Node 031] Value added 31-112-0-3 => 8
2023-10-16 17:19:38.799 INFO Z-WAVE: [Node 031] Value added 31-112-0-4 => 1
2023-10-16 17:19:38.800 INFO Z-WAVE: [Node 031] Value added 31-112-0-5 => 3
...

Happy to post more if this isn't the bits which are helpful. I also just ordered a new stick (Zooz 800 Series) as I know my 500 series is getting old at this point.

Thanks in advance.

@AlCalzone
Copy link
Member Author

What you posted are the UI/Application logs. I'll need to see a driver log, on loglevel debug. Ideally attach it here as a file (drag & drop into the text field).

@jschollenberger
Copy link

jschollenberger commented Oct 17, 2023

Sorry about that, I didn't realize it was a separate log. Full days log is attached. Upgrade to 12.x near the bottom.
zwavejs_2023-10-17.log

@jschollenberger
Copy link

Here's one with some more usage, including it resetting after the failed ping
zwavejs_2023-10-17-ctd.log

2023-10-17T12:49:13.798Z CNTRLR   [Node 018] ping failed: Timeout while waiting for a callback from the controll
                                  er (ZW0200)
2023-10-17T12:49:13.803Z DRIVER   all queues idle
2023-10-17T12:49:13.887Z DRIVER   driver instance destroyed

@jschollenberger
Copy link

Here's a fresh log on 12.2.0 with soft restart both enabled and disabled. 12.2.0 starts at 2023-10-18T18:12:23.217Z.
zwavejs_2023-10-18.log

@jschollenberger
Copy link

I installed the ZST39 (800 series) tonight and excluded then included 40 of my devices. So far it’s been smooth sailing on the new controller and hoping it stays that way. The rest I’ll do over the weekend. Smart Start has been pretty nice. I won't be able to provide any new logs as I solved my problem with the new controller, but I hope the prior ones help others.

@AlCalzone
Copy link
Member Author

@jschollenberger looks like Z-Wave JS is now almost doing what it's supposed to when the controller does not respond in time.

There's something I don't understand though...

  • At 18:12:23, you're starting the new version.
  • Between 18:13:06 and 18:13:47 it tries to ping Node 18, which takes too long. The recovery does not quite work as intended, partly because the controller does not acknowledge the abort command. However this process ends with "all queues idle", and finally the controller responding again at 18:14:00
  • At 18:17:38 the driver restarts - did you trigger that?

A similar thing happens the next time - the recovery fails at 18:18:46, this time the controller does not seem to come back. At 18:19:48 the driver is restarted again.


Relevant part for further investigation:

2023-10-18T18:13:06.849Z SERIAL » 0x01080013120100251ecc                                              (10 bytes)
2023-10-18T18:13:06.849Z DRIVER » [Node 018] [REQ] [SendData]
                                  │ transmit options: 0x25
                                  │ callback id:      30
                                  └─[NoOperationCC]
2023-10-18T18:13:06.859Z SERIAL « [ACK]                                                                   (0x06)
2023-10-18T18:13:16.864Z SERIAL » 0x01030016ea                                                         (5 bytes)
2023-10-18T18:13:16.867Z DRIVER » [REQ] [SendDataAbort]
2023-10-18T18:13:30.171Z SERIAL « 0x0104011301e8                                                       (6 bytes)
2023-10-18T18:13:30.172Z SERIAL » [ACK]                                                                   (0x06)
2023-10-18T18:13:30.173Z SERIAL « [CAN]                                                                   (0x18)
2023-10-18T18:13:30.177Z DRIVER « [RES] [SendData]
...
2023-10-18T18:13:36.867Z SERIAL » 0x01030016ea                                                         (5 bytes)
2023-10-18T18:13:36.869Z DRIVER » [REQ] [SendDataAbort]
2023-10-18T18:13:46.871Z SERIAL » 0x01030016ea                                                         (5 bytes)
2023-10-18T18:13:46.873Z DRIVER » [REQ] [SendDataAbort]
2023-10-18T18:13:47.376Z CNTRLR   The controller is unresponsive
2023-10-18T18:13:47.379Z DRIVER   Controller missed Send Data callback. Attempting to recover...
2023-10-18T18:13:47.386Z CNTRLR   The controller does not support soft reset or the soft reset feature has been 
                                  disabled with a config option or the ZWAVEJS_DISABLE_SOFT_RESET environment va
                                  riable.
2023-10-18T18:13:47.388Z CNTRLR   [Node 018] ping failed: Timeout while waiting for a callback from the controll
                                  er (ZW0200)
2023-10-18T18:13:47.390Z DRIVER   all queues idle

@candrea77
Copy link
Contributor

Hi @AlCalzone
a couple of months ago we've talked about the option to change the controller from 500 series to 700 series.
In that case you suggested me to stay stick with my 500 series controller because 700s EU series had some problems.
At this time, it seems that the situation is flipped over (or at least, 500 series seems to having more problems )

Do you think is now time to plan a controller change ?
In case of change, by your pov, is better to change to 700 or to 800 series ?
( My MUST-HAVE is the ability to do Backup/Restore for migration all the nodes from 500 to the new one)

BTW : You can find my today's log attached ... there is at least 20 reset event today.
I was using soft reset until 18.00 when I switch back to hard reset due to unresponsive network.
I'm using zwavejs on a container that can see the USB via /dev/ttyACM0

Currently the zwave hang is causing my a lot of trouble .... there are soo many reset event that I can't deal with they (last night I had to disable alarm based on zwave sensor)
Problems started with v12.x

Anyway : Thanks for your work for helping us with Home Automation.

zwavejs_2023-10-24.zip

@candrea77
Copy link
Contributor

candrea77 commented Oct 25, 2023

This is another example (found this morning @06.00)
in Home Assistenat the zwave network seems broken (lost all zwave device , marked as unavailable)
image
image
image

If I "tail" the log, the network seems doing something
If I try to execute some command from zwavejs-ui it works
image

Here is the today's log :

zwavejs_2023-10-25.zip

cat /share/SSD-VM/ZWaveJS/store/logs/zwavejs_current.log | grep reset
2023-10-24T22:28:20.359Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-24T22:28:31.250Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-24T22:28:41.176Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-24T22:53:04.504Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-24T23:09:14.145Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-24T23:09:23.450Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-24T23:15:27.563Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-24T23:15:35.953Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T00:01:49.031Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T00:02:26.435Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T00:09:19.610Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T00:09:58.822Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T00:18:59.731Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T00:19:36.720Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T00:47:17.118Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T00:59:47.321Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T01:04:02.020Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T01:17:04.233Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T01:18:04.536Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T01:35:05.592Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T01:35:28.490Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T01:36:16.316Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T01:36:33.826Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T01:39:22.991Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T02:01:14.087Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T02:08:04.429Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T02:08:12.200Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T02:08:21.271Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T02:27:26.982Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T02:32:04.512Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T02:55:04.527Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T03:16:53.375Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T03:20:09.641Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T03:20:51.209Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T03:29:17.286Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T03:29:24.684Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T03:35:49.842Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T03:53:43.964Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T03:54:18.263Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T03:54:24.878Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T04:26:44.306Z CNTRLR The controller does not support soft reset or the soft reset feature has been
2023-10-25T04:27:24.565Z CNTRLR The controller does not support soft reset or the soft reset feature has been

@kpine
Copy link
Contributor

kpine commented Oct 25, 2023

If I try to execute some command from zwavejs-ui it works

Make sure you aren't also being affected by home-assistant/core#102637. If you have any thermostats, you may need to re-interview them. You can confirm in ZUI if there are any problems with them (unknown setpoints).

@AlCalzone
Copy link
Member Author

@candrea77 going by that log, your controller randomly stops responding to commands. Normally it acknowledges commands it got from Z-Wave JS within a 2-3 milliseconds. Then at times it takes 3-5 seconds, or doesn't send one at all. That's when Z-Wave JS tries to recover it by restarting the controller (soft-reset), which usually works but takes a couple of seconds. Disabling it means Z-Wave JS is going to restart itself, and re-open the serial port, maybe even a few times until the stick becomes responsive again on its own. This is definitely going to take longer than leaving soft-reset enabled.

Not sure how the situation was before the latest updates, but I guess it was the same - you just didn't notice it that well. You'd probably end up with a couple of random "dead" nodes in this situation, because Z-Wave JS would just continue trying commands that fail, until the stick recovered on its own.

I'm not sure why this keeps happening - there isn't really an obvious trigger. Sometimes it happens when sending commands, sometimes it happens when reading the noise levels.
Switching to a different controller would probably solve that, but then you'll have to deal with the 700 series EU range issues or start over with an 800 series controller (due to missing migration functionality), where I'm not sure the range is actually better.

I should get my hands on a new 800 series stick today or tomorrow, but I can't make any guarantees if and when the migration will be supported.

@candrea77
Copy link
Contributor

candrea77 commented Oct 25, 2023

Not sure how the situation was before the latest updates, but I guess it was the same - you just didn't notice it that well. You'd probably end up with a couple of random "dead" nodes in this situation, because Z-Wave JS would just continue trying commands that fail, until the stick recovered on its own.
Yep, you are right.
Before updates, I've got random dead nodes .... this is way you can notice a "ping" action button on my "Home Assistant" screen ... it was needed to wake up dead nodes.

Switching to a different controller would probably solve that, but then you'll have to deal with the 700 series EU range issues or start over with an 800 series controller (due to missing migration functionality), where I'm not sure the range is actually better.
Yep, clear to me.
800 isn't ready right now for a live migration.

I will probability wait some week (I've planned to move my hardware to a different floor of my house) and then start with a new 700 controller (and in case it was worst then the 500 series .... switch back to 500)

@AlCalzone : Is there a way to disable not only to SoftReset but also the HardReset of the controller, let him waite for its own recover ?

@AlCalzone
Copy link
Member Author

I am considering adding an option to disable the automatic recovery feature altogether.

@wolph
Copy link

wolph commented Oct 25, 2023

I've got an UZB (https://z-wave.me/products/uzb/) and I'm experiencing these issues as well. Previously I never had any issues (no random dead nodes either as far as I am aware) but now I need to restart my zwave docker at least daily or it completely stops working.

I'm currently considering making a docker healthcheck to auto restart when it stops working again.

I'm not sure if my controller has a soft reset option but the button in the interface gives me an error :)

@AlCalzone
Copy link
Member Author

AlCalzone commented Oct 25, 2023

UZB is one of the sticks that are blacklisted from soft reset because it shuts down completely and has to be physically re-plugged.

Do you have driver logs of your issue anyways?

@candrea77
Copy link
Contributor

As far as I undeserstood :

BEFORE v12 : Zwavejs never try to reset the zstick.
When a node does not reply to the controller, the controller marked the node as "dead" by itself.
User can wake up a "dead node", triyng to ping it.
If it reply to the ping request, the controller mark this node as alive.

NOW : When a node does not reply to the controller for a specific amount of (time/number of retry ??) , the "zwavejs software" try to reset (soft / hard) the controller.

@AlCalzone : Is my assumption correct ?

@wolph
Copy link

wolph commented Oct 26, 2023

UZB is one of the sticks that are blacklisted from soft reset because it shuts down completely and has to be physically re-plugged.

Do you have driver logs of your issue anyways?

I've enabled the driver logs now. I'll let you know as soon as it breaks again. But I should mention that I've switched from the latest to the master docker tag in the mean time. I'm not sure if that makes any difference :)

@AlCalzone
Copy link
Member Author

AlCalzone commented Oct 26, 2023

Not quite @candrea77

BEFORE v12:
Z-Wave JS never tried to reset the stick.
When the controller did not respond to Z-Wave JS, and the command where it did not respond was sending something to a node, Z-Wave JS considered the node to be dead, although the controller is the problem.
Attempting communication with the node later will likely succeed, causing Z-Wave JS to consider the node alive again.

NOW:
When the controller does not respond to Z-Wave JS, Z-Wave JS tries to reset the controller to make it responsive again, either through the soft-reset command, or by restarting itself and reopening the serial port.
Z-Wave JS no longer considers the node to be dead when the controller is the problem.

As you see the new behavior is more correct, although it can be pretty disruptive if the controller often becomes unresponsive like yours.

@2Fake
Copy link

2Fake commented Oct 26, 2023

Having the same issue since a couple of versions. Do you still need any logs?
I'm running in a docker container

zwave-js-ui: 9.2.3.1454eca
zwave-js: 12.2.1

Z Wave stick is a Aeotec Z-Stick Gen5

@candrea77
Copy link
Contributor

@AlCalzone :
My problems with this new function, is mostly based on how Home Assistant see the network during the restart,
It become unavailable (all sensors)
image
This cause a lot of trouble :

    • In the interface (visibile artifact .... se next two images)
      image
      image
    • Also cause troubles to Alarm as in my previous post (that most important) :sensors that switch between ON --> UNAVAILABLE --> ON seems to trigger an alarm.
    • Sometimes (as yestarday) despite zwaves restart and work corrrectly (example via zwave-ui I can turn on-off sensors) , home assistant seems to lose the link with zwavejs (have to restart homeassistant to let it work again)
      image

So , if I may , here you are my suggestion : is it possibile to stop reporting the "restart action" to home assistant and "freeze" the node status until controller restart ?
In fact I suppose that due to high frequency controller restart, it cause troubles to home assistant (in my network more that 5 times/hour)

@AlCalzone
Copy link
Member Author

I think zwave-js just shouldn't restart and instead try re-opening the serial port (see opening post, the unchecked point). That way what you're seeing doesn't happen.

@wolph
Copy link

wolph commented Oct 29, 2023

UZB is one of the sticks that are blacklisted from soft reset because it shuts down completely and has to be physically re-plugged.

Do you have driver logs of your issue anyways?

Unfortunately I'm not sure when it stopped working exactly, but this log should cover it all
zwavejs_2023-10-27.log.gz

@AlCalzone
Copy link
Member Author

@wolph your problem seems to be Node 17 - the stick takes extraordinarily long to communicate with that one (> 30s), so Z-Wave JS often aborts these attempts.
There's one attempt where the abort takes just a fraction of a second too long, and it looks like Z-Wave JS doesn't handle this well.

I've raised two issues:

  1. We shouldn't try and poll meter values from nodes that are considered dead Do not refresh stale values from dead nodes #6468
  2. In your situation (disabled soft reset), we shouldn't stop transmitting indefinitely Missing Send Data callback with disabled soft reset causes all transmissions to stop #6469

Aside from that you may want to try and fix the situation with Node 17 yourself, e.g. by excluding it, factory resetting, including again. Not sure if that helps, but even when we handle the situation more gracefully, that device will still keep the controller busy and block all other communication while it's ongoing.

@wolph
Copy link

wolph commented Oct 30, 2023

I've noticed the slowness with node 17 as well but I've got no clue what the cause is to be honest. Perhaps it's simply a broken node, I've noticed it being flaky in the past as well.
Thank you for the help in any case, I'll see what I can do with it :)

If I may, I would like to raise another issue as well although I'm not sure if this is the right place for that. When it currently fails it doesn't indicate to Home Assistant that it has failed. So all devices appear to work but they simply stop responding. It would be nice if the devices would be set to unavailable if the controller is unavailable.

@AlCalzone
Copy link
Member Author

It would be nice if the devices would be set to unavailable if the controller is unavailable.

The controller status is exposed by this library, so this would be a feature request for HA.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants