
General Z-Wave Instability with 2.2.0 and Later #602

Closed
jbhorner opened this issue Oct 19, 2023 · 34 comments
Labels
stale There has not been activity on this issue or PR for quite some time.

Comments

@jbhorner

jbhorner commented Oct 19, 2023

Problem/Motivation

After upgrading from 2.1.2 to 2.2.0 and subsequently 2.2.1, my Z-Wave network has been unstable. Nodes frequently go dead at random. Prior to these updates, the network was reliable. (From a user perspective... things behind the scenes may have been different, but I observed no negative effects on functionality.)

Expected behavior

Stability in the network. Nodes stay "alive."

Actual behavior

After updating, there were nodes that just started to "die." Toggling the node on and off (the nodes that were noted as dead are all powered) brings the node back online for a period of time. I have performed network repairs through the ZW UI to see if that helped. It did not.

Steps to reproduce

Update to 2.2.0 or 2.2.1. Observe node status, or set up an automation to notify you when a node's status changes from "alive" to "dead."

Proposed changes

No thoughts on this. I did read the release notes and saw that there were changes that were made to mark nodes as dead under certain circumstances.
zwave-js-ui-store (1).zip
zwave-js-ui-store.zip

@zharling

+1

@jprates

jprates commented Oct 21, 2023

+1

There also seem to be quite a few random restarts of the add-on during the day.

I'd also like to add, though I don't know if it's relevant to this case, that some more complex devices like the Qubino ZMNHXD (3-phase meter) lost a lot of entities, which became unavailable, and a few others seem to have been renamed. Re-querying doesn't solve the problem.

I started renaming the entities back to their original names, since many of my automations were now obviously failing, until I noticed the unavailable ones and just gave up.

I came here to see if anyone had already reported it and whether there was good news already, but that does not seem to be the case.
I'm reverting to the standard Z-Wave add-on, hoping things work there.

@jbhorner
Author

jbhorner commented Oct 21, 2023

> I came here to see if anyone had already reported it and whether there was good news already, but that does not seem to be the case.
> I'm reverting to the standard Z-Wave add-on, hoping things work there.

It will be interesting to hear your success with that. My belief is that it is the driver, which I believe is shared by both add-ons. What Z-Wave controller are you using? I'm using a Zooz 800 Series. I specifically selected this one recently after seeing the problems (another issue) with 500 Series controllers, and some firmware version problems with the 700 Series controllers.

With all of the problems I have had over the past two months with Z-Wave, I'm looking to migrate away from it completely. I don't know what changes started in August, but it seems to be a series of ongoing problems now versus the past.

@davidcoulson

I'm using the Aeotec Gen7+ stick and 2.2.x is basically unusable. I rolled back to 2.1.2 and it's working again.

@jprates

jprates commented Oct 21, 2023

> It will be interesting to hear your success with that.

You guessed it right.
All the same with the standard/default add-on.
What a disappointment!

I personally had a stable z-wave system for over 1 year, perhaps 2.
I agree with you: someone has done more harm to Z-Wave users on HASS over these last few weeks than in all the years before combined.

I understand this is freeware software, I understand this is open source, we can't complain for something we did not pay a cent, I understand all of that.
However, I should also point out there are a lot of users using this code, so the first rule should always be test, the second rule should be test again, and the third rule must be test once more, out of respect for those end-users.

We come to expect enhancements in each release; never in a million years do we expect new releases to render our systems unusable.

Sorry for the rant.
I'm available to help solve this mess, but I would much prefer that it hadn't occurred in the first place.

@jprates

jprates commented Oct 21, 2023

> I'm using the Aeotec Gen7+ stick and 2.2.x is basically unusable. I rolled back to 2.1.2 and it's working again.

I don't know if I have backups from 2.1.2, I'll check, thanks for the info!

EDIT: I have! Hurray! Rolling back now...

@davidcoulson

> I don't know if I have backups from 2.1.2, I'll check, thanks for the info!

It would be great if there were an option in HA to roll back the image to a specific version. I keep all my add-on backups for a month because of all the zwave issues lately :(

@jprates

jprates commented Oct 21, 2023

Well, for a few minutes all was well, I had voltage and amps values again, I was about to come here to thank you @davidcoulson , then they went away again. STRANGE!

I'm starting to suspect the problem is in the definition of the device, perhaps someone edited the properties for the device and broke it. Somehow it seems shortly after the rollback the device properties were updated automatically and once again I lost attributes that became unavailable again.

What a mess.
Next I'll try to remove the device from the network and add it back with the same name to see if it works.
Let me tell you, the next time I get this thing working I'll immediately disable updates and never update it again!

@jprates

jprates commented Oct 21, 2023

> I don't know if I have backups from 2.1.2, I'll check, thanks for the info!

> It would be great if there were an option in HA to roll back the image to a specific version. I keep all my add-on backups for a month because of all the zwave issues lately :(

I'm using "Home Assistant Google Drive Backup" add-on and it gives you just that, try it out, it works like a charm!

@davidcoulson

> I'm starting to suspect the problem is in the definition of the device, perhaps someone edited the properties for the device and broke it. Somehow it seems shortly after the rollback the device properties were updated automatically and once again I lost attributes that became unavailable again.

Did you try reinterviewing the node?

@davidcoulson

> I'm using "Home Assistant Google Drive Backup" add-on and it gives you just that, try it out, it works like a charm!

Yeah, that is what I am using too, but you have to make sure it doesn't remove the backup, otherwise it's impossible to roll back.

@jprates

jprates commented Oct 21, 2023

Another possible hint about the problem: of the 33 devices I have, after one of the more recent Z-Wave updates I suddenly got notifications for 20 repairs to perform on those devices. Quoting one paragraph:

"Z-Wave JS discovers a lot of device metadata by interviewing the device. However, some of the information has to be loaded from a configuration file. Some of this information is only evaluated once, during the device interview."

Something tells me those configuration files got beaten up pretty well and that's the root cause of all our issues... worth investigating IMHO, it's definitely not normal to have 20 repairs to do on almost all of my z-wave devices!

[Screenshot]

EDIT: Some minutes after it went up to 30 devices needing repair... WTF...

@jprates

jprates commented Oct 21, 2023

> I'm starting to suspect the problem is in the definition of the device, perhaps someone edited the properties for the device and broke it. Somehow it seems shortly after the rollback the device properties were updated automatically and once again I lost attributes that became unavailable again.

> Did you try reinterviewing the node?

Yeah, the attributes are still there, but unavailable... a small cut from the properties:

[Screenshot]
[Screenshot]

@jprates

jprates commented Oct 21, 2023

Update: After excluding and including the Qubino devices (I have 2 of them), all entities are back available and reporting values.

[Screenshot]

The only strange thing is that the firmware versions are very different, even though both devices are the same model and were bought at the same time.
It's quite hard to imagine they have firmware versions that different, but I guess it can happen... and they both report being up to date... :-/

[Screenshot]

I'd say it's the update/upgrade/repair process that messes up the device altogether. I don't know; I'm just reporting what happened to me, hoping it will help other users or even the developers.

Cheers,

@jbhorner
Author

I've opened this issue under HA Core as well, as I'm not sure if I was right to open it here directly. (I'd used the link in the add-on documentation.)

When I first installed the 2.2.0, I was also presented with several "repairs" by HA Core. Each of the repairs said it was necessary to interview several devices. I performed this for each device. I didn't think much about it at the time, as it was conceivable that the add-on was fixing issues that earlier versions caused or didn't address themselves.

I echo the frustration noted above, but also balance that with the knowledge that the developers volunteer their time here, and do not have a means by which they can test every conceivable configuration. They also have their "day jobs/activities." I think there was a driver architecture change/reconciliation that started in July/August, and since that time some latent problems might have come up. Pure conjecture on my part.

For those who have not created and uploaded logs, I'd encourage you to do so. This is what helps the developers the most.

@jprates

jprates commented Oct 21, 2023

> developers volunteer their time here, and do not have a means by which they can test every conceivable configuration. They also have their "day jobs/activities."

You are absolutely correct, I think we all understand, appreciate, and value that!
However, that has always been the case, and never in the past, at least that I recall, has Z-Wave been as broken as in these last weeks.

It's not just Z-Wave; the Signal add-on is literally unusable now as well. Perhaps it's just a sign of the times, or a passing phase, to quote Pink Floyd, but I think it's worth signaling our disappointment so that developers realize we're going downhill here.

I'll stop commenting now, I just realized we're on github, not on HASS Community... apologies to all for this.

@jbhorner
Author

> I don't know if I have backups from 2.1.2, I'll check, thanks for the info!

> It would be great if there were an option in HA to roll back the image to a specific version. I keep all my add-on backups for a month because of all the zwave issues lately :(

> I'm using "Home Assistant Google Drive Backup" add-on and it gives you just that, try it out, it works like a charm!

I've never had luck restoring Z-Wave add-ons. Every time, after restoring the backup, I get: Image ghcr.io/hassio-addons/zwave-js-ui/amd64:2.1.2 does not exist for addon_a0d7b954_zwavejs2mqtt. Is there a process outside of restoring the backup to make this work? (I usually just restore a VM snapshot.)

@davidcoulson

> I've never had luck restoring Z-Wave add-ons. Every time, after restoring the backup, I get: Image ghcr.io/hassio-addons/zwave-js-ui/amd64:2.1.2 does not exist for addon_a0d7b954_zwavejs2mqtt. Is there a process outside of restoring the backup to make this work? (I usually just restore a VM snapshot.)

Usually you just need to wait longer for it to restore. Or just run the restore again.

@alistair23

home-assistant/core#102477 for reference

@JtwoA

JtwoA commented Oct 23, 2023

Piling on here. Current version: 2.2.3 and this morning every single device is "Dead". I use this primarily for my iBlinds and a couple other ancillary items but the blinds are kinda critical.

@JtwoA

JtwoA commented Oct 23, 2023

> Piling on here. Current version: 2.2.3 and this morning every single device is "Dead". I use this primarily for my iBlinds and a couple other ancillary items but the blinds are kinda critical.

Update: restarting the add-on restored all but two iBlinds. Then a third dropped. Then one of the two dead nodes came back. It has a life of its own atm.

@christianreiss

Any update on this? Any fixes in the last days?

@tsf0x13

tsf0x13 commented Oct 27, 2023

+1

2023-10-22 12:48:59.245 INFO Z-WAVE: [Node 063] Is dead
2023-10-22 12:48:59.292 INFO Z-WAVE: [Node 054] Is alive
2023-10-22 12:48:59.371 INFO Z-WAVE: [Node 039] Is alive
2023-10-22 12:48:59.499 INFO Z-WAVE: [Node 038] Is alive
2023-10-22 12:48:59.611 INFO Z-WAVE: [Node 053] Is alive
2023-10-22 12:48:59.660 INFO Z-WAVE: [Node 044] Is alive
2023-10-22 12:48:59.807 INFO Z-WAVE: [Node 055] Is alive
2023-10-22 12:48:59.849 INFO Z-WAVE: [Node 065] Is alive
2023-10-22 12:48:59.895 INFO Z-WAVE: [Node 064] Is alive
2023-10-22 12:48:59.943 INFO Z-WAVE: [Node 048] Is alive
2023-10-22 12:49:07.942 INFO Z-WAVE: [Node 026] Is dead
2023-10-22 12:49:20.319 INFO APP: GET /health/zwave 301 2.076 ms - 191
2023-10-22 12:49:38.483 INFO Z-WAVE: Controller status: Controller is unresponsive
2023-10-22 12:49:50.472 INFO APP: GET /health/zwave 301 5.830 ms - 191
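For anyone collecting logs like the excerpt above, a short script can summarize which nodes were last reported dead and whether the controller went unresponsive. This is a minimal sketch of a hypothetical helper (not part of Z-Wave JS UI); it assumes status lines look exactly like "[Node 063] Is dead" / "Is alive" and that the controller message contains "Controller is unresponsive", as in the lines shown here.

```python
import re
from typing import Dict, Iterable, Tuple

# Hypothetical helper. The patterns assume the log format seen in the excerpt above,
# e.g. "INFO Z-WAVE: [Node 063] Is dead".
NODE_STATUS = re.compile(r"\[Node (\d+)\] Is (alive|dead)")
UNRESPONSIVE = "Controller is unresponsive"


def summarize(lines: Iterable[str]) -> Tuple[Dict[int, str], bool]:
    """Return the last reported status per node and whether the controller was reported unresponsive."""
    last_status: Dict[int, str] = {}
    controller_unresponsive = False
    for line in lines:
        match = NODE_STATUS.search(line)
        if match:
            # Later lines overwrite earlier ones, so the dict keeps the most recent status.
            last_status[int(match.group(1))] = match.group(2)
        elif UNRESPONSIVE in line:
            controller_unresponsive = True
    return last_status, controller_unresponsive


if __name__ == "__main__":
    sample = [
        "2023-10-22 12:48:59.245 INFO Z-WAVE: [Node 063] Is dead",
        "2023-10-22 12:48:59.292 INFO Z-WAVE: [Node 054] Is alive",
        "2023-10-22 12:49:07.942 INFO Z-WAVE: [Node 026] Is dead",
        "2023-10-22 12:49:38.483 INFO Z-WAVE: Controller status: Controller is unresponsive",
    ]
    status, unresponsive = summarize(sample)
    dead = sorted(node for node, state in status.items() if state == "dead")
    print("dead nodes:", dead)  # dead nodes: [26, 63]
    print("controller unresponsive:", unresponsive)  # controller unresponsive: True
```

Feeding it a full exported log file (for example, `summarize(open("zwave.log"))`, where the file name is only a placeholder) gives a quick list of nodes to ping or re-interview.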

@davidcoulson

Has anyone tried the 3.0.0 Add-On update yet? I'll give it a go tomorrow, but wasn't sure if anyone had beat me to it :)

@jbhorner
Copy link
Author

> Has anyone tried the 3.0.0 Add-On update yet? I'll give it a go tomorrow, but wasn't sure if anyone had beat me to it :)

I installed it yesterday and haven't had any issues. I was on 2.1.2 before due to issues I was having with later versions, with devices moving to a "dead" status randomly. Though after installation of 3.0.0 I was still presented with several repairs that were necessary (all tied to my motion sensors), I executed those and everything has been stable.

I can't speak, of course, to issues others were having. Mine, for the moment, seems to have been resolved.

@JtwoA

JtwoA commented Nov 1, 2023

> Has anyone tried the 3.0.0 Add-On update yet? I'll give it a go tomorrow, but wasn't sure if anyone had beat me to it :)

I updated immediately and for a few days everything was fine. Now I'm right back to 12-15/17 devices going dead and requiring everything from a simple "ping" to bring them back to completely removing/repairing.... which is a major PITA because it means retouching every single automation rule they were in.

I wish someone involved in this would bother to acknowledge this issue.

@tsf0x13

tsf0x13 commented Nov 2, 2023

Updating to 3.0.1 does not resolve the problem: after an HA reboot, the status is "Controller is unresponsive".

@tsf0x13

tsf0x13 commented Nov 12, 2023

Updating to 3.0.2 and Home Assistant to 2023.11.2 resolved the problem. 12 hours after the update, all is OK.

@Wiigian

Wiigian commented Nov 13, 2023

I can also confirm that 3.0.2 is working for me after lots of issues with the earlier versions. Running 500-series stick in a Proxmox VM. I am still running Home Assistant 2023.9.1 in case I needed to roll back the z-wave driver.
After disabling the soft reset, I have been running for 72 hours without problems so far.

@tsf0x13

tsf0x13 commented Nov 15, 2023

So, bad news. =(
After a few hours, the Z-Wave instability is back.
Rebooting the entire node resolves the problem, but it appears again after a few hours. =(

                             n 100 ms.

2023-11-15T18:36:47.667Z DRIVER » [REQ] [GetPriorityRoute]
node ID: 44
2023-11-15T18:36:47.671Z CNTRLR Failed to execute controller command after 2/3 attempts. Scheduling next try in 1100 ms.
2023-11-15T18:36:48.774Z DRIVER » [REQ] [GetPriorityRoute]
node ID: 44
2023-11-15T18:36:48.779Z CNTRLR Retrieving priority route failed: Failed to send the message after 3 attempts (ZW0202)
2023-11-15T18:36:48.785Z DRIVER » [REQ] [GetPriorityRoute]
node ID: 54
2023-11-15T18:36:48.787Z CNTRLR Failed to execute controller command after 1/3 attempts. Scheduling next try in 100 ms.
2023-11-15T18:36:48.889Z DRIVER » [REQ] [GetPriorityRoute]
node ID: 54
2023-11-15T18:36:48.892Z CNTRLR Failed to execute controller command after 2/3 attempts. Scheduling next try in 1100 ms.

Reverting back. =(

@tsf0x13

tsf0x13 commented Nov 16, 2023

Soft reset had re-enabled itself after the reboot; disabling it again solved the problems. Thanks a lot @Wiigian!

Soft reset is disabled through the software interface. The Z-Stick Gen5 does not support soft reset, which is why it can cause issues. If you have ZWaveJS UI:

  1. Open ZWaveJS UI
  2. Open the Menu -> Settings -> Z-Wave
  3. Disable (grey out) the switch next to Soft Reset.


There hasn't been any activity on this issue recently, so we clean up some of the older and inactive issues.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by leaving a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thanks!

@github-actions github-actions bot added the stale There has not been activity on this issue or PR for quite some time. label Dec 16, 2023
@JtwoA

JtwoA commented Dec 16, 2023

Funny the bot bumped this. Updated to HA 12.3 last night and my ZWaveJS lost 13/15 devices. My other automation system initiated a ping to no avail. Manually restarting ZWaveJS brought all but one back. I got that one back this morning by manually intervening.

@github-actions github-actions bot removed the stale There has not been activity on this issue or PR for quite some time. label Dec 17, 2023

There hasn't been any activity on this issue recently, so we clean up some of the older and inactive issues.
Please make sure to update to the latest version and check if that solves the issue. Let us know if that works for you by leaving a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thanks!

@github-actions github-actions bot added the stale There has not been activity on this issue or PR for quite some time. label Jan 16, 2024
@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Jan 23, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Feb 22, 2024
9 participants