Devices unavailable after Conbee to SkyConnect migration #86231

Closed
prnzngr opened this issue Jan 19, 2023 · 79 comments

@prnzngr

prnzngr commented Jan 19, 2023

The problem

I switched from deCONZ/ConBee to ZHA/SkyConnect and the problems started after that.
Every day some random devices (Aqara sensors, IKEA bulbs) become unavailable. I have to pair them again, and the next day a different device is unavailable.
With deCONZ/ConBee I never had these problems.

What version of Home Assistant Core has the issue?

core-2023.1.5

What was the last working version of Home Assistant Core?

core-2022.11

What type of installation are you running?

Home Assistant OS

Integration causing the issue

zha

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha

Diagnostics information

zha-f63193a751e4f18abbdbeaba6257af97-LUMI lumi.sensor_magnet.aq2-38e364b71addd13e8f72c896cb83e544.json.txt
zha-f63193a751e4f18abbdbeaba6257af97-Zigbee Coordinator-aabc7e236d33f421590e46eb61ea0400.json.txt

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2023-01-19 12:35:59.016 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x99DE](lumi.sensor_magnet.aq2): last_seen is 141280.53930068016 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:03.510 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xAD16](lumi.sensor_wleak.aq1): last_seen is 1038776.234536171 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:04.011 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x0CBA](SmokeSensor-EF-3.0): last_seen is 167550.2345740795 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:28.082 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x6378](lumi.sensor_wleak.aq1): last_seen is 144684.77171301842 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:30.030 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x3BBB](TRADFRI remote control): last_seen is 144489.35309433937 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:30.057 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xDB20](TRADFRI control outlet): last_seen is 614320.0762073994 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:32.193 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xAE69](lumi.sensor_magnet.aq2): last_seen is 169974.5131304264 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:47.953 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xC7D4](TRADFRI bulb E27 WS opal 1000lm): last_seen is 20564.596658945084 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:52.008 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x2A27](lumi.sensor_magnet.aq2): last_seen is 108069.15588212013 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:36:52.032 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x360D](lumi.sensor_magnet.aq2): last_seen is 170186.2264046669 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:04.510 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xAD16](lumi.sensor_wleak.aq1): last_seen is 1038837.235011816 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:12.012 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x0CBA](SmokeSensor-EF-3.0): last_seen is 167618.23617124557 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:12.019 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x99DE](lumi.sensor_magnet.aq2): last_seen is 141353.54190301895 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:32.083 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x6378](lumi.sensor_wleak.aq1): last_seen is 144748.77293539047 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:36.059 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xDB20](TRADFRI control outlet): last_seen is 614386.0775065422 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:45.031 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x3BBB](TRADFRI remote control): last_seen is 144564.3546833992 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:37:51.194 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xAE69](lumi.sensor_magnet.aq2): last_seen is 170053.51456308365 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:38:05.511 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xAD16](lumi.sensor_wleak.aq1): last_seen is 1038898.2359614372 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:38:11.954 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0xC7D4](TRADFRI bulb E27 WS opal 1000lm): last_seen is 20648.59741950035 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:38:20.009 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x2A27](lumi.sensor_magnet.aq2): last_seen is 108157.15664672852 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:38:20.013 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x0CBA](SmokeSensor-EF-3.0): last_seen is 167686.2369081974 seconds ago and ping attempts have been exhausted, marking the device unavailable
2023-01-19 12:38:25.020 DEBUG (MainThread) [homeassistant.components.zha.core.device] [0x99DE](lumi.sensor_magnet.aq2): last_seen is 141426.5432062149 seconds ago and ping attempts have been exhausted, marking the device unavailable
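
For anyone who wants to summarize log excerpts like the one above, here is a minimal Python sketch. It assumes only the line format visible in these DEBUG messages; the names `LOG_PATTERN`, `summarize_unavailable` and `format_age` are made up for illustration and are not part of Home Assistant or ZHA.

```python
import re

# Matches the "[0xNWK](model): last_seen is N seconds ago" part of each line.
LOG_PATTERN = re.compile(
    r"\[(?P<nwk>0x[0-9A-Fa-f]{4})\]\((?P<model>.+?)\): "
    r"last_seen is (?P<seconds>[0-9.]+) seconds ago"
)


def summarize_unavailable(lines):
    """Return {(nwk, model): largest last_seen age in seconds} for matching lines."""
    ages = {}
    for line in lines:
        match = LOG_PATTERN.search(line)
        if match is None:
            continue  # not an "unavailable" message in the assumed format
        key = (match["nwk"], match["model"])
        ages[key] = max(ages.get(key, 0.0), float(match["seconds"]))
    return ages


def format_age(seconds):
    """Render an age in seconds as whole days and hours."""
    days, remainder = divmod(int(seconds), 86400)
    return f"{days}d {remainder // 3600}h"


if __name__ == "__main__":
    sample = [
        "2023-01-19 12:36:03.510 DEBUG (MainThread) "
        "[homeassistant.components.zha.core.device] [0xAD16](lumi.sensor_wleak.aq1): "
        "last_seen is 1038776.234536171 seconds ago and ping attempts have been "
        "exhausted, marking the device unavailable",
    ]
    for (nwk, model), age in summarize_unavailable(sample).items():
        print(nwk, model, format_age(age))
```

Run against the excerpt above, this would show, for example, that 0xAD16 (lumi.sensor_wleak.aq1) had not been seen for roughly 12 days when the log was captured.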

Additional information

No response

@home-assistant

Hey there @dmulcahey, @Adminiuga, @puddly, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!

Code owner commands

Code owners of zha can trigger bot actions by commenting:

  • @home-assistant close Closes the issue.
  • @home-assistant rename Awesome new title Change the title of the issue.
  • @home-assistant reopen Reopen the issue.
  • @home-assistant unassign zha Removes the current integration label and assignees on the issue, add the integration domain after the command.

(message by CodeOwnersMention)


zha documentation
zha source
(message by IssueLinks)

@puddly
Contributor

puddly commented Jan 19, 2023

Please download diagnostics for the ZHA integration after letting it run for a few hours:

image

@prnzngr
Author

prnzngr commented Jan 19, 2023

@puddly
Contributor

puddly commented Jan 19, 2023

Aqara sensors becoming unavailable is not unusual: when joining a network, they pick the very first parent they detect, which rarely is a physically close one. If they picked a good parent (at random) when joining your Conbee network but picked a bad one with the new network, that would be an issue. You can force them to pick a new parent by re-joining them to your network via a specific, physically-close routing device:

image

Is your SkyConnect in exactly the same position as the Conbee, plugged into the exact same USB extension cable? Is it away from USB 3.0 devices, SSDs, 2.4GHz routers, etc.?

@snike3

snike3 commented Jan 20, 2023

I can confirm this bug report.

I have 8 Aqara water sensors and 2 temperature sensors. I've been using ZHA with a Conbee II USB stick (latest firmware) for a couple of years without issues. Using 2023.1.3 all my sensors were working. After upgrading to 2023.1.5 none of the Aqara sensors will stay connected for longer than a few hours.

I have 76 total zigbee devices (Jasco, SmartThings, Centralite, Aqara) with a large number of them being routing devices. Looking at the ZHA device map, none of the Aqara devices are showing connections to any routing device or the coordinator. They're just kind of floating.

I went through and removed and re-paired all the Aqara devices using the "Add via this device" on the closest routing device (typically 5-15 ft total distance with no walls). Unfortunately, that didn't help. All the devices went to unavailable after the ZigBee battery timeout found in the ZHA configuration settings.
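
As a rough illustration of that timeout behaviour (not the actual ZHA code): the log lines in this issue suggest a device is marked unavailable once `last_seen` is older than a configured threshold and ping attempts have failed. The Python sketch below models only the threshold comparison. The two threshold values are assumptions standing in for whatever is set in the ZHA configuration, and the function name is hypothetical.

```python
# Assumed thresholds in seconds; check the values in your own ZHA configuration.
CONSIDER_UNAVAILABLE_MAINS_S = 2 * 60 * 60
CONSIDER_UNAVAILABLE_BATTERY_S = 6 * 60 * 60


def is_considered_unavailable(
    seconds_since_last_seen,
    battery_powered,
    mains_timeout=CONSIDER_UNAVAILABLE_MAINS_S,
    battery_timeout=CONSIDER_UNAVAILABLE_BATTERY_S,
):
    """Return True if the device has been silent longer than its timeout.

    This ignores the ping attempts mentioned in the log messages and only
    compares the silence period against the relevant threshold.
    """
    timeout = battery_timeout if battery_powered else mains_timeout
    return seconds_since_last_seen > timeout


if __name__ == "__main__":
    # A battery sensor silent for one hour versus one silent for a full day.
    print(is_considered_unavailable(3600, battery_powered=True))
    print(is_considered_unavailable(86400, battery_powered=True))
```

Under these assumed values, a sleepy battery sensor that cannot reach its parent simply stays silent until the battery threshold passes, which would match devices dropping out a few hours after re-pairing.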

For more debugging I swapped in a Sonoff ZBDongle Plus E for the Conbee II, but I get the same result.

Tonight I'll try reverting to an older version of HA Core.

System Details:
Raspberry Pi 4 4G
Deconz Conbee II 26780700 / Sonoff ZBDongle Plus E 1.0.1
Home Assistant OS 9.4

  • Core 2023.1.6
  • Supervisor 2022.12.1

@Adminiuga
Contributor

Joining Aqara devices through just any router won't work. You have to join them only through an Aqara-compatible router. There was a list somewhere on the forums.

@prnzngr
Author

prnzngr commented Jan 21, 2023

You can force them to pick a new parent by re-joining them to your network via a specific, physically-close routing device:

I already tried this method, it doesn't help

Is your SkyConnect in exactly the same position as the Conbee, plugged into the exact same USB extension cable? Is it away from USB 3.0 devices, SSDs, 2.4GHz routers, etc.?

Yes, everything is the same, and it is away from routers, SSDs and so on.

@prnzngr
Author

prnzngr commented Jan 21, 2023

Joining aqara through any router won't work. You have to join it only through an aqara compatible router. There was a list somewhere on the forums

It is not only affecting Aqara. IKEA bulbs, motion sensors and so on are also affected.

@prnzngr
Author

prnzngr commented Jan 21, 2023

Today I again had some unavailable devices and re-joined them via nearby bulbs (as you suggested).
In the visualization they still have no link to a router, unlike other devices.
image

@Rogue136198

Joining aqara through any router won't work. You have to join it only through an aqara compatible router. There was a list somewhere on the forums

It is not only affecting aqara. Ikea Bulbs, Motion Sensors and so on are also affected

I can confirm. I have been having nothing but issues with my hue motion sensors since switching to the skyconnect.

@MattWestb
Contributor

Hue motion sensors (I have 2 in my production system) are real jumpers and in the end use the worst parent they can find, but then they are stable. I think you need to have some routers around them so they can do their jumping thing, otherwise they leave the network.

There is a very long issue in Z2M about this problem: Koenkk/zigbee2mqtt#2693.

@prnzngr
Author

prnzngr commented Jan 23, 2023

Hue motion sensors (I have 2 in my production system) are real jumpers and in the end use the worst parent they can find, but then they are stable. I think you need to have some routers around them so they can do their jumping thing, otherwise they leave the network.

There is a very long issue in Z2M about this problem: Koenkk/zigbee2mqtt#2693.

As I have already written, there were no problems at all with deCONZ.

@kenwiens

kenwiens commented Jan 23, 2023

I cannot confirm exactly when my Zigbee devices stopped working, but it has been since Jan 1 2023. Every day I have a different set of Aqara motion sensors, Zigbee switches and other Zigbee devices that become unavailable. I tried some IKEA plugs as routers but that didn't make any difference. I wish I had seen this thread earlier, as I assumed the problem was that my house just couldn't handle Zigbee, and I have been busy returning my Zigbee stuff and migrating to YoLink.

The only devices that haven't gone offline at some point in the last month were those within 2m of the stick (with no walls). An Aqara motion sensor about 4m away (1 simple wall) goes offline intermittently. Beyond that, my devices come "online" intermittently.

I am using a Sonoff Zigbee 3 stick. I did try moving it from USB3 to USB2, and I tried a 1 m and a 2 m extension cable with multiple different locations for the stick. None of this had changed prior to the disconnect issue arising, but I was going through various troubleshooting scenarios so decided to try this.

The timing of my issues does seem to correlate with the revision history discussed above in this thread. These devices were relatively solid last year (Aug - Dec 2022)

I'm running HA 2022.1.7 on a Raspberry Pi

@atmezferix

I'm also having multiple sensors dropping off as of late. Usually the Samsung Centralite ones; I've been lucky with my Aqara ones by the sound of it. I'm also unable to get one of the Samsung sensors working again despite re-pairing it, and it won't go through a re-configure either. I couldn't tell you exactly which version it started with, but I've had Philips Hue bulbs dropping off, which has never happened in the last 2 years of using Home Assistant, so to me something is definitely wrong. Hope it gets fixed soon, as before long I'm likely to fall down the stairs in the dark.

@snike3

snike3 commented Jan 23, 2023

Alright, I've been doing some testing and researching all weekend... Here's my findings.

TL;DR;
The Sonoff ZBDongle Plus E does not appear to be compatible with Aqara devices, but deconz devices are (at least the Conbee II). Ensure Aqara devices are directly connected to compatible routers even if your coordinator is compatible. It seems as though something in the HA Core 2023.1.x update can cause devices to switch to a different routing device which may not be compatible with Aqara. Using a Conbee II with the deCONZ/Phoscon integration & add-on allows for stable Aqara devices.

Full History & Troubleshooting:
I've been operating with a Conbee II using ZHA for over 2 years. I believe all my Aqara sensors (10 of them) were directly connected to the coordinator, and have had no issues with anything Zigbee (76 total devices - Centralite, Jasco/GE, Orbit, SmartThings, Innr, PEQ/Centralite, SmartThings/Centralite, Iris/Centralite) in that time. All my routing devices are Jasco/GE or Centralite branded.
I was running HA Core 2023.1.3 for several days with my Conbee II stick without any issues.

Upgraded to HA Core 2023.1.5. All of a sudden multiple Aqara sensors went unavailable, but others stayed connected. I was in this state about a week, so I purchased the Sonoff ZBDongle Plus E after reading a few forums (should have read more).

Used the ZHA migration to switch to the Sonoff dongle and everything seemed good, but by the next morning ALL the Aqara devices were unavailable.

I've been trying to force specific routing devices to the Aqara sensors (like mentioned in an earlier post) while using the Sonoff dongle. Turns out that neither the Jasco/GE nor the Centralite routing devices I have are Aqara friendly. Connecting the sensors to those devices or the Sonoff dongle directly will not result in a stable connection.

I tried reverting back to HA Core 2022.12.9, but got the same results.
Next I wanted to go back to the last 100% working configuration (HA Core 2022.12.9 using the Conbee II). Unfortunately, migrating from the Sonoff to Conbee II did not work. ZHA said it was successful and all my devices showed up, but nothing worked. It seemed as though I was going to have to completely rebuild my Zigbee network to get back to this state, so I thought a bit about that...

Decided I wanted to use the new Zigbee 3.0 stick instead of the old Conbee II, so I investigated using both sticks at the same time.

First, I completely rebuilt the Zigbee network from scratch using the Sonoff dongle and ZHA (this took a long time) to make sure my ZHA setup was up-to-date/clean. I left the Aqara sensors disconnected.
Then, I set up the Conbee II using zigbee2mqtt (Z2M). Unfortunately when I added an Aqara sensor, Z2M reported that it was "unsupported" and gave no information.
So I uninstalled Z2M and installed deCONZ/Phoscon. The Phoscon interface sucks to use on a mobile device, but it's workable by rotating the phone to landscape occasionally. HA didn't auto-discover the service like it's supposed to, but it was easy enough to configure the Integration to point to the Add-on (the information is in the Add-on documentation).

Using deCONZ/Phoscon I was able to add all the Aqara sensors. It correctly identified them and even had a pretty image of the device. In HA I then refreshed the integration and all the sensors showed up.

My HA has been sitting this way for over a day now. I've had no unavailable devices on ZHA or deCONZ/Phoscon. Reviewing the history for the Aqara devices I can see that they're periodically updating (temperature graph shows changes).

So.... there's my solution to the problem... Not pretty or optimal, but functional. I now have 2 Zigbee networks. One on the Sonoff ZBDongle Plus E running ZHA and one on the Conbee II running Phoscon/deCONZ that I'm only putting Aqara devices.

After reviewing LOTS of forums, I believe that there are some Aqara devices that will work with non-Aqara Zigbee 3.0 routers/coordinators, but it appears that the temperature/weather and leak sensors are not in that list. It sounds like this is due to an older method of keep alive used by Aqara vs the Zigbee 3.0 protocol.

Essentially, if you have a deconz coordinator and are using ZHA then Aqara devices should work (and have for me for years), but something in the new 2023.1.x versions is causing them to no longer work (guessing route optimization and updated reporting configuration). Going to keep watching for updates on the issue, but given I've spent many hours troubleshooting/debugging this issue on my setup I'm out for now.

@MattWestb
Contributor

RaspBee/ConBee is a very dominant coordinator, and normally all end devices have it as their parent, which is both good and bad.

EZSP works a little differently and normally causes no problems if devices behave as Zigbee devices should, but Aqara devices do not. The problem is that when EZSP restarts, it does not know whether any sleepy devices are its children, and they must successfully poll their parent after the restart for things to be OK.
If they poll their parent while it is not online, Aqara devices leave the network and must be forced to rejoin.
It is strongly recommended to have good routers in the network and to connect end devices to them so the mesh works well. With EZSP it is possible to block the coordinator from having direct children, which forces them to use a router as parent.
I have lumi weather and magnet sensors of different versions, and only one has a problem with battery draining and leaving the network, but it is outside on the balcony where it is -1°C with snow around it.

I was on holiday for 3 weeks with the laptop. After coming back and connecting it, all devices were online within one hour, including all 10 Aqara sensors, because they had routers online and kept working well while the coordinator was offline during my time away.

It is up to you whether you build a star network or a mesh network, but Zigbee should be a mesh network so it can heal and work OK.

@danTHAman152000

I wanted to comment that my HA set up has been unstable since recent updates. Everything seems to work fine after a reboot, with the exception of my Zigbee aqara sensors. Sometimes none of them connect after the reboot, sometimes some of them do. But eventually all of HA locks up, all devices (aqara plus everything else) go offline, and I have to manually turn off the Pi and back. I am investigating on how to get logs, in case that can be of help. Is there any private info in these logs that I need to remove first?

@hitokiri8x

hitokiri8x commented Jan 29, 2023

I'm in the same situation as everyone else: I bought a ZBDongle Plus E, but my Aqara devices don't stay connected. The temperature sensors are somewhat stable, but the button and magnetic door switch are not stable at all.

Alright, I've been doing some testing and researching all weekend... Here's my findings.

TL;DR; The Sonoff ZBDongle Plus E does not appear to be compatible with Aqara devices, but deconz devices are (at least the Conbee II). Ensure Aqara devices are directly connected to compatible routers even if your coordinator is compatible. It seems as though something in the HA Core 2023.1.x update can cause devices to switch to a different routing device which may not be compatible with Aqara. Using a Conbee II with the deCONZ/Phoscon integration & add-on allows for stable Aqara devices.

Could you link, even in private, some of the resources where you found this information? I have 25 days before I can return the device (thanks Amazon), and I refuse to keep two networks and use my old CC2531.
I want to keep digging to find out whether this is solvable or whether it is better to return it and switch to something else (not sure what is new and stable with Aqara).

@HarvsG
Contributor

HarvsG commented Jan 30, 2023

I switched from Conbee 2 to SkyConnect (Both ZHA). I am having the same issues. Chiefly with IKEA shortcut buttons and 2-button controllers. I also have some issues with aqara contact sensors (but they were quite unreliable before). On at least one of the shortcut buttons even after I re-join it to the network clicks don't produce events - almost as if it has become immediately unavailable (pending the time out). Last Seen doesn't update.

I have my SkyConnect connected to the same USB2 extension cord that I had used for the conbee stick. RPI3B+ on HAOS.

Edit: I have transitioned back to the ConBee II and the devices are working well now

@maguiresf

Copying this comment from another issue ticket. It seems to be a duplicate of this one, but this one seems like the most active, so I'm going to try following along here:

"Same problem here, just migrated from Deconz / Conbee II to ZHA / SkyConnect. Hue motion sensors get "stuck" in some state, occupancy may be either true or false but they never change. Require a re-pair to get them working again. Aqara magnet sensors seem to either stop sending updates or just become unavailable. This can happen twice a day on some of these devices. Latest HA version, latest SkyConnect firmware."

Everything was stable on Deconz / Conbee II and had been for about 3 years.

@prnzngr
Author

prnzngr commented Feb 13, 2023

I switched back to Deconz / Conbee and everything works fine.
Seems like the Devs are not interested in this topic

@Hedda
Contributor

Hedda commented Feb 18, 2023

Before even starting to troubleshoot any problems with those kinds of symptoms I always highly recommend following this in-depth best practice guide regarding reception optimization and interference avoidance -> https://community.home-assistant.io/t/guide-for-zigbee-interference-avoidance-and-network-range-coverage-optimization/515752/

In addition, switch to a less noisy Zigbee channel (which is part of that guide) -> https://www.home-assistant.io/integrations/zha#defining-zigbee-channel-to-use

Then also follow these other related best practices to at least re-pair your devices again in their final location after trying to take on all the suggested actions -> https://www.home-assistant.io/integrations/zha#best-practices-to-avoid-pairingconnection-difficulties

Note that those are actions that you need to take regardless of which Zigbee Coordinator radio adapter and Zigbee gateway solution you use.

If you still have issues, you will need to enable debug logging and replicate the issue so that you can provide debug logs that show the exact time when the issues occur.

Again, please understand and remember that all and any problems will be much easier to narrow down and troubleshoot if you have already taken actions to reduce any sources of interference and changed to a Zigbee channel with less noise.

@prnzngr
Author

prnzngr commented Feb 18, 2023

For these kinds of symptoms I always highly recommend following this in-depth best practice guide regarding reception optimization and interference avoidance, regardless of which Zigbee Coordinator radio adapter and Zigbee gateway solution that you use -> https://community.home-assistant.io/t/guide-for-zigbee-interference-avoidance-and-network-range-coverage-optimization/515752/ (and then if still have issues then also follow these other related best practices to at least re-pair your devices again in their final location after tried to take on all the suggested actions -> https://www.home-assistant.io/integrations/zha#best-practices-to-avoid-pairingconnection-difficulties)

If you had read this thread carefully, you would see it is not because of radio issues.
With deCONZ there were no problems; with ZHA it is chaos.

@austwhite

I went from Deconz to ZHA and had similar issues at first. What I did was make sure all Aqara devices are paired close to the Co-ordinator so they don't try to pair through a router. I then put the Aqara devices in their place after they were paired and never had a problem again. I know that wasn't necessary with Deconz, so it may be a limitation with ZHA, but none of my Aqara devices have ever fallen off the network after doing that.
Might be a band-aid, but it worked for me :)

@timiman

timiman commented Feb 20, 2023

I'm also having issues with unavailable devices all over since the day I switched from ConBee II to SkyConnect (both through ZHA). It seems that something is very wrong with SkyConnect, because using the ConBee II did not give me this issue. The reason to change to SkyConnect was future Matter support and better support in general or of specific devices (like the Aqara Plug). First the migration process went sideways, leaving devices out of the Zigbee network, and then I had to re-pair them again and again. I'll move back to the ConBee II. I hope the backup will work without having to re-pair 40 devices around the house.
I hope SkyConnect will become some time in the future more stable than ConBeeII, so I'll switch to it again.

@timiman

timiman commented Feb 28, 2023

I want to report that I've had zero disconnects using 2023.3.0b5, except for two TRADFRI repeaters which flip between unavailable and available all day long, but without hurting any End Devices that are connected to the coordinator via them. It might be that the connected End Devices put themselves into "power saving sleep" due to inactivity, and thus so does the TRADFRI repeater itself.
In my case, I've changed the position of one TRADFRI repeater to be a little further away from a WiFi repeater, so that might have helped a bit, too.
Also, I really don't know if anything from .0b4 or .0b5 helped with that, but the only other parameter that changed on the HA installation was the manual removal of HACS. The removal of HACS from within the HA UI, which happened some months ago, was not fully completed: while searching the logs for info regarding the issue I had with the .0b3 update, I found that HA was reporting that an installation of HACS was present on the system, although it could not be found in the HA UI.
So, I've 'rmdir' the HACS installation folder manually from Terminal.
I repeat, I do not know if this has to do anything with the devices becoming unavailable, but it was the only thing that was changed in between.
Finally, I've also enabled debugging mode for ZHA, but no End Device became unavailable during this time.

@HarvsG
Contributor

HarvsG commented Mar 1, 2023

I strongly suggest anyone affected by "devices randomly stop working/go offline" issues please take a look at https://skyconnect.home-assistant.io/connectivity/, especially the "How to counter interference" section. In exceptionally noisy environments, the threshold between "just barely working" and "not working" is low and may have been exacerbated by switching coordinators. Try a different coordinator placement, orientation, a different USB extension cable, a second USB extension cable, etc. RF issues aren't intuitive and won't be revealed with just a WiFi scan.

another thing that has silently changed is the radio channel of the ZigBee network

If you used the ZHA migration flow, the network channel did not change. However, if you set up a new network from scratch, it will be formed on channel 15.

The only known "issue" with Conbee migration is that a relatively-recent firmware version is required, as otherwise the Conbee doesn't provide a way to read the network key frame counter with older firmwares. If that counter isn't migrated, some brands of devices will refuse to receive commands from the coordinator and will only be able to send updates.
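
To illustrate why a missing frame counter matters, here is a simplified Python sketch of replay protection. It assumes a receiver that drops any frame whose counter is not higher than the last one it accepted from that sender. The class and method names are hypothetical, and this is not how any particular Zigbee stack or device firmware is implemented.

```python
class FrameCounterFilter:
    """Toy model of a device that rejects frames with stale counters."""

    def __init__(self):
        # Highest frame counter accepted so far, keyed by sender address.
        self._last_accepted = {}

    def accept(self, sender, frame_counter):
        """Return True and remember the counter if the frame looks fresh."""
        last = self._last_accepted.get(sender)
        if last is not None and frame_counter <= last:
            return False  # looks like a replayed or stale frame
        self._last_accepted[sender] = frame_counter
        return True


if __name__ == "__main__":
    device = FrameCounterFilter()
    # The old coordinator had already sent many frames.
    print(device.accept("coordinator", 50000))
    # A new coordinator with the same address restarts counting near zero.
    print(device.accept("coordinator", 1))
    # A migrated counter continues above the old value.
    print(device.accept("coordinator", 50001))
```

In this toy model the device keeps ignoring the coordinator until its counter passes the old value, while nothing stops the device from sending its own updates, which is the one-way symptom described above.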

Can also confirm that this wasn't the issue for me. Both sticks were on the same USB 2 ext lead. WiFi access points on channels that don't overlap with Zigbee 15, commands still reached distant parts of the network with a similar reliability to the Conbee II. Issue was exclusively with battery-powered devices dropping off the network and/or having rapid battery drain.
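
For reference, a quick way to check which 2.4 GHz WiFi channels overlap a given Zigbee channel, sketched in Python. The centre-frequency formulas are the standard ones for IEEE 802.15.4 channels 11-26 and WiFi channels 1-13; the channel widths (22 MHz for WiFi, 2 MHz for Zigbee) are nominal assumptions, real signals bleed beyond them, and the function names are made up for this example.

```python
def zigbee_center_mhz(channel):
    """Centre frequency of a 2.4 GHz Zigbee (IEEE 802.15.4) channel, 11-26."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Centre frequency of a 2.4 GHz WiFi channel, 1-13."""
    if not 1 <= channel <= 13:
        raise ValueError("only WiFi channels 1-13 are handled here")
    return 2407 + 5 * channel


def overlapping_wifi_channels(zigbee_channel, wifi_width_mhz=22, zigbee_width_mhz=2):
    """List WiFi channels whose nominal band overlaps the Zigbee channel."""
    z_center = zigbee_center_mhz(zigbee_channel)
    z_low = z_center - zigbee_width_mhz / 2
    z_high = z_center + zigbee_width_mhz / 2
    overlapping = []
    for wifi_channel in range(1, 14):
        w_center = wifi_center_mhz(wifi_channel)
        w_low = w_center - wifi_width_mhz / 2
        w_high = w_center + wifi_width_mhz / 2
        # Bands that merely touch at an edge are not counted as overlapping.
        if z_low < w_high and w_low < z_high:
            overlapping.append(wifi_channel)
    return overlapping


if __name__ == "__main__":
    print(zigbee_center_mhz(15), overlapping_wifi_channels(15))
```

With these nominal widths, Zigbee channel 15 sits in the gap between WiFi channels 1 and 6, so access points on 1, 6 or 11 would not overlap it on paper.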

@lougreenwood

lougreenwood commented Mar 1, 2023

Likewise for me - I'm now ~5 days in with no dropouts from 5 SML003 Hue motion sensors whereas 13/14 Hue SML001 are unavailable.

So seems that it's a subset of battery devices that are affected.

@timiman

timiman commented Mar 1, 2023

Also, I do not know how to check if any firmware on the TRADFRI repeaters was silently applied OTA during the last 10 days or so. It might be a parameter that we've missed. I've IKEA OTA enabled on configuration.yaml. I will search for an official online release list of IKEA firmwares.

@Hedda
Contributor

Hedda commented Mar 1, 2023

Also, I do not know how to check if any firmware on the TRADFRI repeaters was silently applied OTA during the last 10 days or so. It might be a parameter that we've missed. I've IKEA OTA enabled on configuration.yaml. I will search for an official online release list of IKEA firmwares.

FYI, @MattWestb tries to maintain an unofficial release list of IKEA Zigbee OTA files under zigpy discussions -> zigpy/zigpy#660

@d-0l

d-0l commented Mar 1, 2023

Well, I can say that reverting to a backup from 2 days prior to the SkyConnect migration didn't help.

The problem persists between ZHA and zigbee2mqtt, even with an updated firmware on the Sonoff stick.

The problem: All 3 motion sensors are simply not connecting to the rest of the zigbee network, despite correctly reporting occupancy under devices. They can't be used to trigger automations.

@TheFelix93

Hello, another update from my side.

The problem has not reoccurred for days since the update to 2023.2.5, so I decided to restore my full backup/snapshot of the VM in Proxmox to the point where the problem existed in the first place.

Result: Still no problems for two days... As I wrote before, the measured humidity values are changing all the time, so the sensor may not go to sleep, and sleeping may be what triggers the problem.

I keep the debug logging active and will wait for the problem to reoccur.

@MattWestb
Contributor

@TheFelix93 Is your sensor connected to a router or to the coordinator at the moment?
Look at the visualization / network map in ZHA.

@TheFelix93

TheFelix93 commented Mar 1, 2023

@TheFelix93 Is your sensor currently connected to a router or to the coordinator? Look at the visualization / network map in ZHA.

Directly to the coordinator "ZNP = Texas Instruments Z-Stack ZNP protocol: CC253x, CC26x2, CC13x2"
I have only one sensor/device connected.

@lougreenwood

Just installed 2023.3 hoping it might fix things based on some earlier replies. But within 6 minutes of re-pairing one of my sensors, it became unavailable. 🤦

@austwhite

Well, I got a chance to run a couple of tests again and for some stupid reason my issue is now resolved.
It sounds absolutely ridiculous, but I mounted my SkyConnect turned 90 degrees from how it was before, so instead of the stick lying in the horizontal plane, it is now vertical. For 3 days I have had no devices going unavailable at all. I have 6 Philips Hue Motion Sensors, the SML001 version, and all of them have been rock solid for 3 days now. I am still running 2023.2.5 and haven't updated to 2023.3.

@rchiileea

rchiileea commented Mar 1, 2023 via email

@Hedda
Contributor

Hedda commented Mar 2, 2023

It sounds absolutely ridiculous, but I mounted my SkyConnect turned 90 degrees from how it was before, so instead of the stick lying in the horizontal plane, it is now vertical. For 3 days I have had no devices going unavailable at all.

FYI, Zigbee Coordinator radio adapter and antenna orientation is covered here -> https://community.home-assistant.io/t/guide-for-zigbee-interference-avoidance-and-network-range-coverage-optimization/515752/

@lougreenwood

@puddly I don't mean to pester, but since you mentioned being a maintainer related to ZHA I thought it OK to ping you.

Is this issue being looked into or is there any news to share?

I keep coming across the same story: issues with ZHA and Sonoff/SkyConnect and Philips SML001 sensors (SML003 not affected), where the sensor drops out after a short time (minutes to 1 day) while the rest of the network is stable. It's also the same story that the user moved from some other platform to ZHA with one of these coordinators; previously the network was fine (same device positions), but now it is seemingly broken.

It seems something is fundamentally going wrong in many setups.

In my case, I have 14 SML001 sensors that keep dropping out. I've moved the coordinator (Sonoff E), it's already on an extension cable, and I have 40 router devices (Hue bulbs/lightstrips/plugs). I've tried changing channels twice (first to 20, which my Hue hub was using previously, then to 25 after inspecting Wi-Fi interference and researching good channels to use to co-exist with an Eero mesh Wi-Fi network). Nothing has made any meaningful difference; the behaviour of the devices on the network is fine except for the SML001 sensors.

I'm concerned with what appears to be silence given the number of issues here and threads on HA forums - and aside from the usual (useful and well intentioned) advice to fix up the network fundamentals it seems no one knows what is going on.

Are you able to provide any more info on this? I've spent a lot of time trying to figure this out (probably 10+ hours researching, tinkering, changing channels, pairing, etc.). This has taught me some useful stuff WRT Zigbee network health, but I haven't solved the issue and now I'm considering either moving back to my rock solid Hue hub (I literally had it under a sofa, on top of a Pi, next to a stack of network switches, PSUs, modem, firewall and Ring alarm system for the last year of its life with no noticeable issues aside from speed of comms for automations from Hue > HA > NR > HA > Hue) or trying zigbee2mqtt.

I have logs for the past few days, including channel switching if it's useful for debugging.

Thanks in advance, and thanks for your work on the project!

@puddly
Contributor

puddly commented Mar 4, 2023

Is this issue being looked into or is there any news to share?

I have one of each on order and will see if I can replicate the problem on my network.

Can you please open a new issue specifically for connectivity issues with the Philips SML001 with SiLabs coordinators (i.e. SkyConnect/Yellow/Sonoff E)? I'm afraid that these general "devices unavailable" GitHub issues become unmanageable after a while due to them aggregating so many unrelated problems (general network issues, problems with specific devices, other coordinator hardware, etc.).

I have logs for the past few days, including channel switching if it's useful for debugging.

How specifically did you switch channels? In the ZHA configuration page, does Channel: show what you expect? The YAML config is only used when forming a new network, since Zigbee doesn't handle channel switching smoothly.

@TheJulianJES
Member

@lougreenwood Can you check the Basic: sw_build_id attribute on your SML001 sensors?
And also check current_file_version on the Ota cluster please. (The sensor should wake up automatically when reading those attributes)

(To read them: Settings -> Integrations -> ZHA: Configure -> Devices -> select your SML001 -> three dots on the left -> Manage Zigbee device -> cluster -> select Basic in the first drop-down, then sw_build_id in the second drop-down menu and click "Get Zigbee attribute". Then, Ota in the first menu and current_file_version in the second, press read again and paste the results here.)

@lougreenwood

lougreenwood commented Mar 7, 2023

@puddly / @TheJulianJES - I created a new issue here and included the info you requested in that issue - thanks for your help!

@austwhite

Just as a note, this issue seems to have multiple tickets open under various titles. Maybe they need to be combined, as the issue is not just related to Hue SML001 or SML003 detectors. Hue RWL021 and RWL022 dimmer switches are affected, and also Aqara motion/light sensors, some Aqara temp/humidity sensors and Tuya Zigbee motion sensors.
It seems to affect a lot of battery-operated devices.

@Ultra9k

Ultra9k commented Apr 11, 2023

Same issue. Two of my four Philips motion sensors constantly become unavailable and I have to remove and re-add them, even though one of them is 1 m away from the SkyConnect.

@austwhite

@Ultra9k
See the new issue. It could be the Hue Motion Sensors. It has been proven, also in my testing, that Hue Motion Sensors, particularly the SML001 version, like to attach to the worst parent, the weakest link so to speak.
That said, mine magically started working reliably in 2023.4, but they still like to attach to the weakest parent. Mind you, they do that on the Hue Bridge also.

@Caligo82

Just as a note, this issue seems to have multiple tickets open under various titles. Maybe they need to be combined, as the issue is not just related to Hue SML001 or SML003 detectors. Hue RWL021 and RWL022 dimmer switches are affected, and also Aqara motion/light sensors, some Aqara temp/humidity sensors and Tuya Zigbee motion sensors. It seems to affect a lot of battery-operated devices.

This is really my experience as well. I can't tell if it's down to a ZHA update or, in my case, the switch from a Conbee II stick to a SkyConnect. In my case I have Hue RWL021, SML001 and some Aqara AQ2 motion sensors acting up recently. My routers are mainly Tuya TS011F wall plugs. The funny thing is that some of these devices are rock solid while others just refuse to work reliably no matter how many times I re-pair them. I also noticed that they don't always seem to get paired the same way. I have one AQ2 that just has the entity "motion" while others have "Iaszone" as a motion entity.

To be fair, I thought it was all my mistake, as the Conbee II to SkyConnect migration was quite the disaster. (It failed to re-pair the devices, as the ZHA migration process didn't respect the previous Zigbee channel and just decided to change it. And when I used a manual Zigbee backup, it switched up all my identical devices on the network. So switch RWL021 A was suddenly switch RWL021 B and so on. This was especially messy with all my TS011F wall plugs, which I used for my Energy Monitoring. It basically wiped all my cost history for the past months. But I guess, as we always do, I bit the bullet and rebuilt it. =] )

I'm glad to find out though that there is some larger problem here. Since I didn't get some of the Hue devices working reliably on the Skyconnect I switched them to the Hue Bridge. They're a bit slower now... but they work. One of the AQ2 is killing me though. It's like tossing a coin if it works or not. I'm at the point where I start betting with myself if it's going to work so I know it's time to ditch them. Don't know if I can wait for the fix of this. LOL

I have the following suspects atm:

  • Wi-Fi interference from neighbours on my Zigbee channel 15 (Hue Bridge is on 25). To be fair, overall it's a mess as it's in a relatively dense city. For example, I have one prick neighbour who now has three access points on 2.4 GHz, all on the same channel with HE80. One can only imagine...
  • TS011F wall plug not acting as a decent router
  • Skyconnect having more problems with weaker clients compared to the Conbee II
  • ZHA quirks not fully developed or adapted for some devices
  • some or all of the factors above combined

I wouldn't say it was 100% reliable with the Conbee II, especially some HUE RWL021 acted up similarly. But it's worse now. That's for sure.
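
As a rough aid for the Wi-Fi interference point above, here is a minimal sketch that estimates which 2.4 GHz Wi-Fi channels overlap a given Zigbee channel. The center-frequency formulas are the standard ones for IEEE 802.15.4 channels 11 to 26 and Wi-Fi channels 1 to 13. The overlap test is a simplifying assumption: it treats Wi-Fi as an ideal 20 MHz block and Zigbee as an ideal 2 MHz block, whereas real transmitters leak energy beyond those edges. The function names are hypothetical.

```python
def zigbee_center_mhz(channel):
    """Center frequency of a 2.4 GHz Zigbee (IEEE 802.15.4) channel, 11 to 26."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels run from 11 to 26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel):
    """Center frequency of a 2.4 GHz Wi-Fi channel, 1 to 13."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only covers Wi-Fi channels 1 to 13")
    return 2407 + 5 * channel


def overlapping_wifi_channels(zigbee_channel, wifi_width_mhz=20.0):
    """List Wi-Fi channels whose idealized band overlaps the Zigbee channel.

    Assumption: rectangular spectra, 2 MHz wide for Zigbee and
    wifi_width_mhz wide for Wi-Fi, both centered on the channel frequency.
    """
    zigbee_half_width = 1.0
    limit = wifi_width_mhz / 2 + zigbee_half_width
    zigbee_freq = zigbee_center_mhz(zigbee_channel)
    return [
        wifi
        for wifi in range(1, 14)
        if abs(wifi_center_mhz(wifi) - zigbee_freq) < limit
    ]


# Channels mentioned in this thread: 15, 20 and 25.
for zb in (15, 20, 25):
    print(zb, zigbee_center_mhz(zb), overlapping_wifi_channels(zb))
```

Under these idealized assumptions, Zigbee channels 15, 20 and 25 fall in the gaps between Wi-Fi channels 1, 6 and 11, so a neighbour on a non-standard Wi-Fi channel or a wider channel width would be the more likely source of overlap.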

@issue-triage-workflows

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@pschneider87

I still have some sensors that regularly become unavailable and need to be re-paired, after which they work for some hours or days.
Not sure if others are experiencing the same.

It's happening with all the latest software updates installed.

@github-actions github-actions bot removed the stale label Aug 10, 2023
@puddly
Contributor

puddly commented Aug 10, 2023

The author of the original issue is no longer using ZHA, so there's no way to debug this further.

@pschneider87 The issue with your Hue sensors is unrelated. Take a look at #89311 (comment) for more info and for a potential fix.

@puddly puddly closed this as completed Aug 10, 2023
@github-actions github-actions bot locked and limited conversation to collaborators Sep 9, 2023