Firmware 20240710 repair Devices every day #23869

Open
Barneybaer84 opened this issue Sep 6, 2024 · 30 comments
Labels
problem Something isn't working

Comments

@Barneybaer84

Barneybaer84 commented Sep 6, 2024

What happened?

Since the firmware update on my Sonoff Zigbee 3.0 Dongle-P, I have devices that go offline every day and need to be re-paired. This is frustrating.

What did you expect to happen?

No response

How to reproduce it (minimal and precise)

No response

Zigbee2MQTT version

1.40.0

Adapter firmware version

20240710

Adapter

SONOFF ZigBee 3.0 USB Dongle Plus

Setup

X86 Home Assistant

Debug log

No response

@Barneybaer84 Barneybaer84 added the problem Something isn't working label Sep 6, 2024
@raaaf

raaaf commented Sep 6, 2024

Same here.

@rursache

rursache commented Sep 7, 2024

same issue here, i downgraded the adapter to 20221226 but it still crashed. so i also downgraded zigbee2mqtt to 1.39.1 and i plan to go as low as 1.38.0

my zigbee network was flawless for 2 years until these 2-3 days when both zigbee2mqtt and adapter fw got updated

we are not alone Koenkk/Z-Stack-firmware#518

@Koenkk
Owner

Koenkk commented Sep 7, 2024

Could you provide the debug log from starting z2m until the device drops?

@jymorel

jymorel commented Sep 7, 2024

Same issue here after firmware and z2m updates. Especially sonoff temp and humidity sensor (snzb-02), but not only. Re-pairing is not enough. Need to delete the device and re-pair again

@Barneybaer84
Author

Barneybaer84 commented Sep 7, 2024

I updated the firmware on 05.09., but my next log is from 06.09.
In the log you can ignore Alarm Sirene, Alarm Sirene defekt and Nachttisch Schatz. All other devices are always online.
06.09.24_log.zip
Most of the devices that go offline are Aqara sensors, such as "Kinderzimmer Temperatur", "Felix Zimmer Fenster Rechts" or "SZ Fenster Links".

@rursache

rursache commented Sep 8, 2024

@Koenkk: Could you provide the debug log from starting z2m until the device drops?

here are my logs: zigbee2mqtt_logs.txt

the issue happened at exactly 2024-09-08T13:50:08.712364154Z and i think the relevant part is this:

2024-09-08T13:50:08.712364154Z [2024-09-08 16:50:08] debug: 	zh:controller:endpoint: Error: ZCL command 0x00124b00259aa8d0/8 genBasic.read(["zclVersion"], {"timeout":10000,"disableResponse":false,"disableRecovery":true,"disableDefaultResponse":true,"direction":0,"reservedBits":0,"writeUndiv":false}) failed (SRSP - AF - dataRequest after 6000ms)
2024-09-08T13:50:08.712384320Z     at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:59:23)
2024-09-08T13:50:08.712387080Z     at /app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:300:45
2024-09-08T13:50:08.712389245Z     at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:26)
2024-09-08T13:50:08.712391411Z     at Znp.request (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:291:27)
2024-09-08T13:50:08.712393557Z     at ZStackAdapter.dataRequest (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:1201:24)
2024-09-08T13:50:08.712395752Z     at ZStackAdapter.sendZclFrameToEndpointInternal (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:446:46)
2024-09-08T13:50:08.712398067Z     at /app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:380:25
2024-09-08T13:50:08.712400231Z     at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:26)
2024-09-08T13:50:08.712402356Z     at ZStackAdapter.sendZclFrameToEndpoint (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:378:27)
2024-09-08T13:50:08.712404589Z     at Request.func (/app/node_modules/zigbee-herdsman/src/controller/model/endpoint.ts:296:36)

after these, my light bulbs and smart plugs + the router all dropped, bringing down the entire zigbee network

with a container restart, the zigbee network is back online.. until the next time (between 5mins and 8h)

i'm using the Slaesh Zigbee 3.0 USB Stick (i have two of them, one coordinator and one router). i switched them around to rule out a hardware issue. the problem happens on both
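
For anyone digging through a long debug log for the same failure, here is a minimal Python sketch that pulls out the `SRSP - AF - dataRequest` timeouts. The regular expression is an assumption derived from the single log line quoted above, not from any documented Zigbee2MQTT log format, and the function name is made up for illustration.

```python
import re

# Pattern inferred from the one sample line in this thread (an assumption,
# not a documented log format): local timestamp, IEEE address/endpoint,
# and the timeout reported by the adapter.
TIMEOUT_RE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\].*?"
    r"ZCL command (?P<addr>0x[0-9a-fA-F]+)/(?P<ep>\d+) .*"
    r"failed \(SRSP - AF - dataRequest after (?P<ms>\d+)ms\)"
)


def find_data_request_timeouts(lines):
    """Return (timestamp, address, endpoint, timeout_ms) for each matching line."""
    hits = []
    for line in lines:
        m = TIMEOUT_RE.search(line)
        if m:
            hits.append((m["ts"], m["addr"], int(m["ep"]), int(m["ms"])))
    return hits


# The log line quoted in the comment above, split only for readability.
SAMPLE = (
    '2024-09-08T13:50:08.712364154Z [2024-09-08 16:50:08] debug: \t'
    'zh:controller:endpoint: Error: ZCL command 0x00124b00259aa8d0/8 '
    'genBasic.read(["zclVersion"], {"timeout":10000,"disableResponse":false,'
    '"disableRecovery":true,"disableDefaultResponse":true,"direction":0,'
    '"reservedBits":0,"writeUndiv":false}) '
    'failed (SRSP - AF - dataRequest after 6000ms)'
)

if __name__ == "__main__":
    for hit in find_data_request_timeouts([SAMPLE]):
        print(hit)
```

To scan a real file, pass an open file object instead of the list, e.g. `find_data_request_timeouts(open("zigbee2mqtt_logs.txt"))`. The first hit before a network-wide drop is usually the interesting one.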

@Koenkk
Owner

Koenkk commented Sep 9, 2024

@Barneybaer84 does the issue also happen when you keep the devices really close to the coordinator? Z2M tries to reach it but fails, could indicate an issue with one of the routers

@rursache does the issue still happen with the availability feature disabled? Did the 20221226 firmware work?

@rursache

rursache commented Sep 9, 2024

@Koenkk I always had "Availability" enabled as I want to see that my devices are available. i can try with it off next week but I don't want to keep it disabled. the latest working fw was 'CC2652RB_coordinator_20221226' (almost 2 years, butter smooth and stable) but now downgrading to it does not fix it!

@Barneybaer84
Author

After 5 days of device drops and an update of z2m to 1.40.1, my Zigbee network works perfectly again.

@Skeletorjus

Skeletorjus commented Sep 10, 2024

I'm facing the same, but I'm not so sure that 20240710 is causing this - I had the same with 20230507 (#23329 (comment)).
Updated to 20240410 five days ago and am still having the same thing happening with the network going offline. The way I usually notice it is that some of my bulbs (Namrons, to be exact) start panicking and go into some flashing disco mode 😆

I can't remember when it started, but I'm tempted to say that it was around 1.38.0.

Log from a couple of minutes before the crash and the crash itself, pretty much the same as #23869 (comment).
z2m_error.txt

Zigbee2MQTT 1.40.1 commit: 403d3c0
Sonoff Plus P
zStack3x0
20240710.

@rursache

@Koenkk any update on this?

@jymorel

jymorel commented Sep 17, 2024

everything ok for a few days after re-pairing several devices

@rursache

rursache commented Sep 17, 2024

i still have to restart the zigbee2mqtt docker container every hour otherwise the coordinator and all the routers go offline, bringing everything down. its instantly fixed after the container restarts

@rursache

as a workaround, and fed up with the sloppy cronjob, i made this HomeAssistant automation to restart the zigbee2mqtt docker container when my philips hue light goes offline (unavailable in HASS):

alias: Fix Zigbee2MQTT
description: ""
trigger:
  - platform: state
    entity_id:
      - light.living_room_philips_hue_color
    from: null
    to: unavailable
    for:
      hours: 0
      minutes: 0
      seconds: 5
condition: []
action:
  - action: shell_command.restart_zigbee2mqtt
    metadata: {}
    data: {}
mode: single

please note that you need to create an entry for shell_command.restart_zigbee2mqtt in your HASS configuration.yaml file like this:

shell_command:
  restart_zigbee2mqtt: >
    nohup curl -X POST URL $1 > /dev/null 2>&1 &

make sure to replace URL with your portainer or whatever else webhook you have and light.living_room_philips_hue_color with your zigbee entity from HASS

now when the light goes offline (the simpler way of detecting when the entire zigbee network is down) it will restart the z2m container bringing everything up and running in under 15 seconds

i really can't wait for a proper fix tho!
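
For setups without Home Assistant automations, the same idea can be sketched in plain Python: restart only once the watched entity has been `unavailable` for at least 5 seconds, then POST to the restart webhook. This is a rough sketch under stated assumptions, not a tested replacement for the automation above. The function names are made up for illustration, the webhook URL is a placeholder you must supply, and how you obtain the entity state is left out entirely.

```python
from datetime import datetime, timedelta
import urllib.request

# Mirrors the `for: seconds: 5` hold time in the automation above.
HOLD = timedelta(seconds=5)


def should_restart(state, unavailable_since, now, hold=HOLD):
    """True once the entity has been 'unavailable' for at least `hold`."""
    if state != "unavailable" or unavailable_since is None:
        return False
    return now - unavailable_since >= hold


def trigger_restart(webhook_url, timeout=10):
    """Send an empty POST to the restart webhook and return the HTTP status.

    `webhook_url` is a placeholder (e.g. a Portainer webhook); nothing here
    is specific to any particular container manager.
    """
    req = urllib.request.Request(webhook_url, data=b"", method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.status


if __name__ == "__main__":
    went_down = datetime(2024, 9, 17, 12, 0, 0)
    # Hypothetical observation taken 6 seconds after the light went unavailable.
    print(should_restart("unavailable", went_down, went_down + timedelta(seconds=6)))
```

Keeping the decision (`should_restart`) separate from the side effect (`trigger_restart`) makes it easy to check the timing logic without actually restarting anything.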

@Skeletorjus

Very likely a red herring, but do any of you utilize bindings?
My network has been crashing on and off for a while (as described in #23869 (comment)), but it hasn't been too bad lately.

I have two IKEA Parasoll devices that have each been bound to their own bulb (Namron 3802952). The contact sensors have been out of use for a while due to empty batteries. Today I replaced the batteries, and as soon as I did a couple of tests to ensure that the bindings to the bulbs worked, my whole network went down. Had to replug the Sonoff and restart Zigbee2MQTT. Did this multiple times.

I have removed the bindings, and for the time being the network seems stable.

@rursache

being super frustrated with the lack of support or work being done to fix this, i bought a new ZigStar UZG-01 (CC2652P7) which arrived with FW 20230507. i switched the old slaesh CC2652RB with the UZG-01 and my zigbee network has been stable ever since: 72h so far with 0 drops or crashes, where before i had crashes every 20min-8h. will flash the slaesh as a router and use it like that.

i think the new firmware ruins the coordinator somehow but it's just my guess. i tried flashing the slaesh CC2652RB with each firmware starting with 2022 until the latest, none fixed it. a new device did. well 🤷🏻‍♂️

@Skeletorjus i don't use bindings, don't think it's related

@Koenkk
Owner

Koenkk commented Sep 22, 2024

Could you see if the 99240914 firmware fixes it? If yes, then try the next fws (e.g. 99240915) until the problem reappears. fws.zip (SONOFF Dongle P only)
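
The stepping procedure described here (start from a build that works, move forward one build at a time until the problem comes back) can be written down as a small Python sketch. Only the two build numbers mentioned above come from this thread. The function name and the `is_stable` callback are hypothetical, and in practice "is stable" means running each build long enough to be confident, which no script can decide for you.

```python
def first_failing_build(builds, is_stable):
    """Walk `builds` in order and return (last_good, first_bad).

    `is_stable` is a caller-supplied check (hypothetical), e.g. a lookup in
    your own notes of which builds ran without device drops.
    Returns (last_good, None) if every build in the list was stable.
    """
    last_good = None
    for build in builds:
        if not is_stable(build):
            return last_good, build
        last_good = build
    return last_good, None


if __name__ == "__main__":
    # Build numbers from the comment above; the observations are made up.
    builds = ["99240914", "99240915"]
    observed_stable = {"99240914": True, "99240915": False}
    print(first_failing_build(builds, observed_stable.get))
```

The pair of neighbouring builds it returns is what narrows down which firmware change introduced the drops.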

@rursache

i do not own a sonoff device, just two slaesh sticks.

i'm also not interested in fiddling again after i finally have a functioning network

@thogens

thogens commented Sep 23, 2024

I seem to have the same issue, but need to investigate further.
So far the only way for me to fix the crash was to restart HA, but that takes 5mins at least.
Next time I'll check if a simple restart of the Zigbee2Mqtt Addon will do the trick.
If yes -> @rursache : I find your script promising, but I don't know how to restart the Addon in my case. I'm not running it on a dedicated portainer, but as std. HA Addon... Any idea?

@Barneybaer84
Author

Could you see if the 99240914 firmware fixes it? If yes, then try the next fws (e.g. 99240915) until the problem reappears. fws.zip (SONOFF Dongle P only)

I will test the Firmware.

@rursache

@thogens i'm sorry, i run homeassistant, zigbee2mqtt, mosquitto and all my services in docker. not sure if you could reload an HASS addon if you have HASS OS installed

@Barneybaer84
Author

Barneybaer84 commented Sep 24, 2024

I haven't flashed the test fw yet, but I switched from USB 3.0 to USB 2.0 using a USB 2.0 cable, and I have had no device disconnects for two days.
I hope it stays that way.

@Barneybaer84
Author

Barneybaer84 commented Sep 26, 2024

Day 4 after the switch to USB 2.0: 2 devices are offline :( I will test the fws.zip

@Luca1996O

Same issue here, with ZBDongle-P latest firmware version and ZigBee2MQTT latest version (1.40.1).

@Barneybaer84
Author

Same problem with test fw 99240914. I have downgraded to 20230507 and re-paired the same devices, and it has now been working for 4 days. No device goes offline anymore.

@fsedarkalex

Are the fws.zip still a thing @Koenkk ? Currently on 20240710 and having random drops of routers (I think multiple at a time, probably dropping faster the more of them are already down)

@stewepylon

In my environment, latency was caused by the automatic update checks. Many BTicino devices are reporting hundreds of messages due to firmware version mismatches, which seems to be slowing everything down.

@fsedarkalex

I have just disabled the OTA check to test if this fixes the device "drops"

@fsedarkalex

fsedarkalex commented Nov 11, 2024

Would like to add... For me currently those devices are definitely affected:

Also I am constantly losing battery powered sensors, which could be based on the same issue or probably a follow-up issue:

My network consists of total 83 devices right now...

by manufacturer

IKEA of Sweden: 41
eWeLink: 9
frient A/S: 9
OSRAM: 5
Paulmann Licht GmbH: 3
Signify Netherlands B.V.: 3
_TZ3000_okaz9tjs: 3
LUMI: 3
Paulmann LichtGmbH: 2
GLEDOPTO: 2
undefined: 1
_TZ3000_zmy1waw6: 1
_TZE204_qasjif9e: 1

by model

DS01: 7
SMSZB-120: 6
501.34: 5
Plug 01: 4
TS011F: 4
TRADFRIbulbE27WSglobeclear806lm: 4
TRADFRI bulb GU10 CWS 345lm: 4
TRADFRI Driver 30W: 4
TRADFRI bulb E27 CWS globe 806lm: 4
TRADFRI bulb E27 WW globe 806lm: 4
TRADFRIbulbE27WWclear250lm: 3
SML004: 3
HESZB-120: 3
TRADFRI control outlet: 3
RODRET Dimmer: 2
GL-SD-001: 2
lumi.sensor_wleak.aq1: 2
TRADFRIbulbE14WWclear250lm: 2
TH01: 2
TRADFRI bulb GU10 WW 345lm8: 2
TRADFRI SHORTCUT Button: 2
SYMFONISK sound remote gen2: 2
undefined: 1
lumi.vibration.aq1: 1
Remote Control N2: 1
VALLHORN Wireless Motion Sensor: 1
TRADFRIbulbE27WSglobeopal1055lm: 1
Lightify Switch Mini: 1
STARKVIND Air purifier: 1
STARKVIND Air purifier table: 1
TS0601: 1

Probably worth a mention:
I am running a second Z2M instance, configured identical, also on the same coordinator FW and HW.
This second network is dedicated for AwoX devices AND their related wall remotes.

I have no drops in this network at all so far. But of course it is much smaller and more homogeneous:

by manufacturer

AwoX: 10
Paulmann LichtGmbH: 3
Sunricher: 1
Paulmann Licht GmbH: 1

by model

EGLO_ZM_TW: 9
501.34: 4
TLSR82xx: 1
ZGRC-KEY-004: 1

@thogens

thogens commented Nov 15, 2024

Yesterday the whole network crashed again, even though I had disabled the automatic OTA check just a few hours earlier.
Currently I have some sort of auto-detection mechanism in place that automatically reboots HA when nothing in the Zigbee network works anymore.
Downside is that this takes 15mins.
But after rebooting HA, everything works fine again...

10 participants