I believe there is an issue with Z-stack 20240710 FW #518

Open
habitats-tech opened this issue Sep 4, 2024 · 79 comments

@habitats-tech

habitats-tech commented Sep 4, 2024

Flashed SLZB-06P7 with 20240710 FW a couple of weeks ago. Since the new FW was applied the device randomly stops receiving updates from devices, while Zigbee2MQTT reports no issues except communication errors.

Restarting the ESP32 or Zigbee gets the SLZB-06P7 processing packets again. The last time, restarting Zigbee was not enough and I had to restart the ESP32 before the devices started responding. It may be that I did not give the devices enough time to start reporting: I waited a couple of minutes, which had been enough in earlier failures.

The time between failures varies: one took 2 days and another 6 days. Over the last 2 weeks the device has failed with the same issue 3 times.

Initially I thought it was an isolated incident, but now I am more confident it is a FW issue.

Are these types of issues related to TI chipsets only?

@Koenkk
Owner

Koenkk commented Sep 4, 2024

Please post the debug log from starting z2m until this error.

See this on how to enable debug logging.

@habitats-tech
Author

I need to do some work first, as the logs total more than 100 MB in size. Hopefully I will submit them a few hours from now.

@habitats-tech
Author

Unfortunately I cannot post debug info yet as I downgraded FW last night and only had the log at info level. Today I will upgrade again to 20240716 and will update once I have the first failure.

@nicolasvila

I am facing a strange issue too with my CC2652RB on 20240710 FW, with ZigBee2MQTT and HomeAssistant.
Devices are not able to pair anymore. The device map shows no link between the coordinator and the devices, and only one sensor is sending values.
Resetting a Zigbee device makes it disappear, but it is then unable to join again.

@devchristof

I have updated my Sonoff USB Dongle to 20240710 FW, with ZigBee2MQTT and Home Assistant, and I need to reboot every day because I lose all devices.

Zigbee2MQTT version
1.40.0 commit: unknown
Coordinator type
zStack3x0
Coordinator revision
20240710
Coordinator IEEE Address
0xxxxxxxxxxxx
Frontend version
0.7.4
zigbee-herdsman-converters version
20.8.4
zigbee-herdsman version
0.57.1
Stats
Total 51
By device type
Router: 28
End devices: 23
By power source
Mains (single phase): 30
Battery: 19
DC Source: 2
By vendor
SONOFF: 7
LUMI: 5
eWeLight: 4
GLEDOPTO: 4
_TZ3000_qeuvnohg: 4
Niko NV: 2
_TZ3000_xr3htd96: 2
frient A/S: 2
_TZE200_2aaelwxk: 2
_TZ3000_ko6v90pg: 2
zbeacon: 2
_TZ3000_cayepv1a: 2
_TZ3000_5e235jpa: 1
_TZ3000_typdpbpg: 1
ADUROLIGHT: 1
_TZE204_t1blo2bj: 1
_TZ3000_hhiodade: 1
_TZ3000_axpdxqgu: 1
_TZE200_hue3yfsn: 1
_TZ3210_0zabbfax: 1
_TZE200_yvx5lh6k: 1
ptvo.info: 1
_TZE200_81isopgh: 1
_TZ3210_95txyzbx: 1
_TZ3000_xwh1e22x: 1

I have changed the config to debug logging and will send the log before the next reboot.

@rursache

rursache commented Sep 6, 2024

i'm having issues as well, CC2652RB with 20240710 FW, with ZigBee2MQTT and HomeAssistant.
woke up with half (8/19) devices offline. had to stop the ZigBee2MQTT container, unplug the zigbee adapter and then plug back in. started ZigBee2MQTT fine and devices joined back but in 8-9h it did it again.

my devices were working smooth for almost 2 years until i updated.

i flashed 20221226 back and will continue monitoring but so far so good.

EDIT: woke up with the zigbee network down again, at this point i regret updating so bad

@habitats-tech
Author

habitats-tech commented Sep 7, 2024

I have done some deeper digging. For some reason the SLZB-06P7 coordinator decided not to join the network following a power off, so I replaced it with a UZG-01 flashed with 20240710. Trying different coordinator FW versions, I have observed that routing is wildly different. Versions earlier than 20240710 do not cluster well around routers. 20240710 seems to elect the key routers properly, and all routers seem to connect to the coordinator as expected.
However, I get constant communication errors with all routers and I lean towards the issue being device related rather than firmware related. Possibly someone could take a deep dive into the code behind https://www.zigbee2mqtt.io/devices/QS-zigbee-S08-16A-RF.html; I have 30+ of these, plus some other Tuya relay switches. For anyone reading this: DO NOT ever buy QS products, as they are not reliable short or long term. They stop functioning at some point.

Images with the map and the errors are provided below. All of the routers, even those really close to the coordinator (<5m, no obstacles), seem to have low LQI (<99). If I re-pair them they go to triple-digit LQI, but eventually they settle on low double-digit LQI, so I concluded that re-pairing is not useful.

image

image

image

Although the battery powered devices are not connected they all seem to work flawlessly. Possibly some improvements required to get the map right. It seems to take a couple of hours for connection routing to settle.

image

It takes close to 20 mins for the above map to be generated.

One last thought. It would be a great addition if we have an option to clear all error messages displayed. Sometimes it takes more than 2 mins to manually clear error messages, which obscure action buttons.

@habitats-tech
Author

habitats-tech commented Sep 7, 2024

Good news is the map does eventually generate an accurate connection layout. The following map is after 2 days of operation. Additionally it now takes less than 4 mins to produce the map using 20240710 UZG-01 combo.

image

@habitats-tech
Author

After almost 2 days of operation Z2M lost connection to the UZG, but automatically recovered. I timed map generation at just 2.5 mins.

image

This zip file has all the debug logs. Logs dated 2024-09-05 were created using SLZB-06P7. Logs dated 2024-09-06 were created using UZG-01/20240710. The logs also include the instance where connection to UZG was reset by something.

log.zip

@habitats-tech
Author

I see a (0xc7: NWK_TABLE_FULL) error in the debug logs, if this is of any assistance.
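For anyone who wants to check their own debug logs for the same status code, here is a minimal Python sketch. It assumes log lines look like `[2024-09-09 03:42:34] error: z2m: ...` (the format of logs posted elsewhere in this thread) and that status codes appear as `(0xc7: NWK_TABLE_FULL)`. The function name `summarize_log` is made up for illustration.

```python
import re
from collections import Counter

# Assumed line shape: "[timestamp] level: namespace: message"
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+):\s+(?P<ns>[\w:.-]+):\s+(?P<msg>.*)$"
)
# Assumed status shape: "(0xc7: NWK_TABLE_FULL)"
STATUS_RE = re.compile(r"\(0x(?P<code>[0-9a-fA-F]{2}): (?P<name>[A-Z_]+)\)")


def summarize_log(lines):
    """Count log lines per level and occurrences of each status code."""
    levels = Counter()
    statuses = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            levels[match.group("level")] += 1
        # Status codes are counted wherever they appear in the line.
        for status in STATUS_RE.finditer(line):
            key = f"0x{status.group('code').lower()} {status.group('name')}"
            statuses[key] += 1
    return levels, statuses
```

To run it on a real file, pass the open file object, for example `summarize_log(open("log.log", encoding="utf-8"))`. A high count for one status code compared with the rest is a hint about where to look, not a diagnosis.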

@Koenkk
Owner

Koenkk commented Sep 7, 2024

20240315 allowed more devices to connect to the coordinator; 20240710 allows fewer and thus relies more on routers, to improve the stability of the coordinator. If those routers are crap, you will get very poor performance. I would suggest first powering off some spammy devices, e.g. '00901-33-SM', and seeing if that improves your network.

@habitats-tech
Author

> 20240315 allowed more devices to connect to the coordinator; 20240710 allows fewer and thus relies more on routers, to improve the stability of the coordinator. If those routers are crap, you will get very poor performance. I would suggest first powering off some spammy devices, e.g. '00901-33-SM', and seeing if that improves your network.

I have already done this twice: powered off the entire 00901 section, waited a few mins and powered the entire section on again. I will try once more and provide feedback.

@habitats-tech
Author

habitats-tech commented Sep 7, 2024

Do we (or can we) have any tools which allow us to influence affinity to certain routers?

For SLZB, which router FW version works better with coordinator FW 20240710?

I was thinking of a tool which allows us to group routers and lets the system automatically balance between them, so that certain routers can be avoided during pairing.

@habitats-tech
Author

I have a question for Koenkk. The device below re-interviews successfully with no errors or warnings. How come it still shows offline with a 2-week last seen status?

image

@habitats-tech
Author

I suggest an option to set the line colours in the map. The dark blue on the dark theme is difficult to visualise on a busy map.

image

image

@starox

starox commented Sep 8, 2024

I am wondering if it is possible to save the map in a nice format.
It would be nice to track route changes or process it for better analysis.

@starox

starox commented Sep 8, 2024

I have had 20240710 working for some weeks now.
At first I had trouble with some Tuya TS011F meter plugs, which were seen by the coordinator but were not responding or would not pair. I swapped some of these plugs and re-paired them.
Now everything works perfectly, except one plug that does not pair anymore.

Thanks for the good job!

@starox

starox commented Sep 8, 2024

Oops, I forgot to explain better what I did.
Some TS011 plugs did not pair anymore.
I replaced one with a spare new one (never paired on my network): it worked.
I tried to pair the other faulty ones at a different location in my home: it partially worked. One was alive again, the other one refused to pair.
Maybe it is a routing problem?

@habitats-tech
Author

habitats-tech commented Sep 8, 2024

I confirm there is an issue with FW20240710. The coordinator resets sometimes after 20 mins, sometimes after hours, but it resets nevertheless. I will keep FW20240710 to assist in troubleshooting and because, since v1.40.0, Z2M automatically reconnects. My experience is that I completely lose Ethernet connectivity and I know it is not a network issue as there are hundreds of devices on the network (Ethernet and WiFi) with no issues.

Please let me know how I can help debug this issue.

image

image

Log file attached. The crash debug is at the top of the log.

log.log

@habitats-tech
Author

This info might be useful to some.

I have lots of Tuya QS relay devices which in theory can act as routers. When the UZG-01 was directly connected to some of them I had constant communication errors, although everything seemed to work with no issues, but with delays, sometimes considerable.

Once I paired a SLZB-06P7 router to the UZG-01 coordinator most of the communication issues between QS routers and coordinator vanished. I get the occasional error now, but the errors are not show stoppers.

The SLZB-06P7 router (00902-Router) is configured as follows:

image

Since I introduced the SLZB-06P7 router everything seems to have smoothed out. However, because I have issues with the UZG-01 (the coordinator) restarting at random intervals, I will upgrade to the latest SLZB-06P7 router FW and give an update if the UZG-01 issues have been eliminated or some other gremlins were introduced.

image

@habitats-tech
Author

For those interested to know the SLZB-06P7 router FW differences here they are:

image

@dankocrnkovic

dankocrnkovic commented Sep 8, 2024

Zigbee2mqtt has the latest update: 1.40.1-1
Coordinator: UZG-01 20240707

Again last night a total adapter crash... (The adapter web GUI is working, and a Zigbee reset does not help.) After a PoE reset I let it stabilize for 1 hour, but the network was a mess. Ikea bulbs were reporting no network route, and I had to reset all router devices with the apartment's electrical breaker. Yesterday, before the crash, I noticed that everything was very laggy.

It has been like this every 1-2 weeks since the day this firmware version was released. I have already rejoined all devices.
As far as I can see SONOFF dongles are the winners here, but P7 chips are simply not working.

Are there any ongoing actions on this, as reports like this have been posted since the first day it was released as beta?

@habitats-tech
Author

So I updated the SLZB-06P7 router FW from 20240315 to 20240716. The device goes offline. To get it back online, follow these steps:

  1. Forcibly remove the SLZB-06P7 from Z2M
  2. Enable joining through the coordinator
  3. Reboot the SLZB-06P7
  4. Hopefully you should see the SLZB-06P7 pairing, and it should go online instantly

Will update once I am confident of any positive/negative/neutral changes to the Zigbee network.

image

@rursache

rursache commented Sep 8, 2024

@Koenkk my logs are posted here: Koenkk/zigbee2mqtt#23869 (comment)

@cloudbr34k84

cloudbr34k84 commented Sep 8, 2024

Running Zigbee2MQTT Edge
POE UZG-01
150 Devices

Where to start? I updated a week or so ago. The update went well. I noticed Aqara devices dropped off a day later. Repaired. No issue. I then started seeing a few devices drop off. I then started to get complete network crashes. Fast forward to now and my network is struggling to stay up.
I have gone back 1 FW version and the outcome is still the same. The annoying part is everything was fine until FW 20240710.
This is what I'm seeing

[2024-09-09 03:42:34] error: 	z2m: Error while starting zigbee-herdsman
[2024-09-09 03:42:34] error: 	z2m: Failed to start zigbee
[2024-09-09 03:42:34] error: 	z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
[2024-09-09 03:42:34] error: 	z2m: Exiting...
[2024-09-09 03:42:34] error: 	z2m: Error: SRSP - ZDO - startupFromApp after 40000ms
    at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:59:23)
    at /app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:300:45
    at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:26)
    at Znp.request (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:291:27)
    at ZnpAdapterManager.beginStartup (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:279:28)
    at processTicksAndRejections (node:internal/process/task_queues:95:5)
    at ZnpAdapterManager.beginRestore (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:330:9)
    at ZnpAdapterManager.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:74:21)
    at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:138:29)
    at Zigbee.start (/app/lib/zigbee.ts:65:27)

Update
I'm now back on the latest FW and I'm seeing this in the logs

[2024-09-09 04:02:11] debug: 	zh:controller: Data is from unknown device with address '63441', skipping...

I tried to find any device with this address and got nothing
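One possible reason a search finds nothing is that the log prints the network address in decimal, while other views may show it in hex (63441 is 0xf7d1). Below is a minimal Python sketch for looking it up. It assumes that Zigbee2MQTT's `database.db` holds one JSON object per line with `nwkAddr` and `ieeeAddr` fields. The helper names `format_nwk` and `find_by_nwk` are hypothetical.

```python
import json


def format_nwk(addr: int) -> str:
    """Render a decimal network address as 4-digit hex, e.g. 63441 -> 0xf7d1."""
    return f"0x{addr:04x}"


def find_by_nwk(db_lines, addr: int):
    """Return the IEEE addresses of entries whose nwkAddr equals addr.

    Assumes each line is a standalone JSON object (an assumption about
    the database.db layout); lines that do not parse are skipped.
    """
    matches = []
    for line in db_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict) and entry.get("nwkAddr") == addr:
            matches.append(entry.get("ieeeAddr"))
    return matches
```

An empty result would suggest the address belongs to a device that is no longer (or not yet) in the database, such as one that was removed but is still transmitting.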

Update
Good news is that clicking the pair button is not resulting in my coordinator crashing. This is the most stable it's been in 12 hrs. I feel a bit more relaxed, as it's 4:20am and I'll need to get up soon lol

@dankocrnkovic

> I have gone back 1 FW version and the outcome is still the same. The annoying part is everything was fine until FW 20240710. This is what I'm seeing

That is the weirdest thing here.. rolling back is not fixing the problems.. it's like the updated coordinator pushed something to the router devices and the network is not stable with older firmwares any more...

@rursache

rursache commented Sep 8, 2024

having the same thoughts as @cloudbr34k84 and @dankocrnkovic.
@Koenkk how is it possible that downgrading the coordinator back to the previous working version does not fix the issues? i even tried downgrading the coordinator, the main router AND the zigbee2mqtt container to no avail..

@cloudbr34k84

> > I have gone back 1 FW version and the outcome is still the same. The annoying part is everything was fine until FW 20240710. This is what I'm seeing
>
> That is the weirdest thing here.. rolling back is not fixing the problems.. it's like the updated coordinator pushed something to the router devices and the network is not stable with older firmwares any more...

i seem to have stabilised for now, but I flashed the last 3 firmware versions on the device, restarting the device each time, then I flashed the latest and it seems to be stable again. i have 2 devices offline but that's because they are bulbs whose lamps my wife switches off manually

@dankocrnkovic

> i seem to have stabilised for now, but I flashed the last 3 firmware versions on the device, restarting the device each time, then I flashed the latest and it seems to be stable again. i have 2 devices offline but that's because they are bulbs whose lamps my wife switches off manually

Give it time. I thought that too, but then in 5-10 days the router devices start to die again and bring the network down.
After a few iterations I decided to stay on this latest version, as downgrading does not help me stabilise my home. Now I have a chair near my electricity breaker box and have made instructions for my partner on how to power cycle the home so we can have lights....

I will never again update the firmware once (and if) it's stable again with some fix.

@rursache

as a workaround, and fed up with the sloppy cronjob, i made this HomeAssistant automation to restart the zigbee2mqtt docker container when my philips hue light goes offline (unavailable in HASS):

alias: Fix Zigbee2MQTT
description: ""
trigger:
  - platform: state
    entity_id:
      - light.living_room_philips_hue_color
    from: null
    to: unavailable
    for:
      hours: 0
      minutes: 0
      seconds: 5
condition: []
action:
  - action: shell_command.restart_zigbee2mqtt
    metadata: {}
    data: {}
mode: single

please note that you need to create an entry for shell_command.restart_zigbee2mqtt in your HASS configuration.yaml file like this:

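shell_command:
  # POST to the webhook that restarts the container; replace URL with your own.
  # The command is not wrapped in an extra pair of quotes, so the shell runs it as a command line.
  restart_zigbee2mqtt: "nohup curl -X POST URL > /dev/null 2>&1 &"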

make sure to replace URL with your portainer or whatever else webhook you have and light.living_room_philips_hue_color with your zigbee entity from HASS

now when the light goes offline (the simpler way of detecting when the entire zigbee network is down) it will restart the z2m container bringing everything up and running in under 15 seconds
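The `for: seconds: 5` hold in the automation above keeps a brief blip from triggering a restart. Here is a minimal Python sketch of that hold logic, assuming the state is available as a time-ordered list of `(timestamp_seconds, state)` samples. The function name and the sampling model are illustrative. This is not how Home Assistant implements it internally.

```python
def unavailable_long_enough(samples, hold_seconds=5.0):
    """Return True if the entity has been continuously 'unavailable'
    for at least hold_seconds as of the last sample.

    samples: iterable of (timestamp_seconds, state), ordered by time.
    """
    since = None    # timestamp at which the current unavailable run began
    last_ts = None  # timestamp of the most recent sample
    for ts, state in samples:
        last_ts = ts
        if state == "unavailable":
            if since is None:
                since = ts
        else:
            # Any other state resets the hold timer.
            since = None
    return since is not None and last_ts - since >= hold_seconds
```

A longer hold reduces false restarts if the watched light is sometimes unavailable for unrelated reasons, but it delays recovery by the same amount.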

i really can't wait for a proper fix tho!

@nicolasvila

Just in case others are facing the same issues as me, maybe this could help someone.
I do not have a clear answer as to which of the points below fixed my Zigbee network and brought it back to normal.

  1. I reinstalled the following coordinator versions:
  • CC2652RB_coordinator_20210120.hex (originally installed on slaesh's stick)
  • CC2652RB_coordinator_20221226.hex
  • CC2652RB_coordinator_20230507.hex
  • CC2652RB_coordinator_20240710.hex
  2. Z2M was complaining that a network already existed and the pan_id was already assigned. So I did the following:
  • started Z2M without the antenna, so it could not communicate with devices (it started), but that did not solve the issue
  • disconnected all the Zigbee devices (removed batteries)
  • the issue was still there, so I erased and reflashed my CC2531 routers, still with no luck. So I took them offline too
  3. Maybe Z2M had a corrupted configuration or something like that
  • I reinstalled Z2M from the latest repository. I had been stuck on 1.18.1 as I never noticed the repository had moved. The latest version had the same behaviour as the old one...
  • After a backup, I erased the Z2M configuration from HomeAssistant. Still no luck
  4. Maybe a flash issue? I tried various options to reflash the 20240710 firmware
  • using JelmerT's serial bootloader tool (cc2538-bsl.py script on Linux)
  • I also tried ZigStar Multi Tool on Windows. This time I totally erased the NVRAM of the CC2652RB coordinator and reflashed the latest firmware. The pan_id conflict was gone
  5. With a working coordinator, the Zigbee network was still down. I was unable to pair my devices with my clean coordinator...
  • I tried different wiring and checked that the USB cable was properly plugged in.
  • The coordinator was on a USB3 port of my RaspberryPi4, so I decided to move it to a USB2 port... and TADAAAAAAAA !!! All my Zigbee devices were able to pair with the coordinator again !!!

This has been driving me crazy during 2 weeks (!!!!)
Maybe the issue was caused by USB3 on the RPi4 but why did it work during the past 2 years?
Maybe a subtle mix of different causes?

My advice:

  • USB extension cable (20cm) is highly RECOMMENDED
  • Use an USB2 slot and avoid USB3 !
  • Use full path for the configuration ! /dev/serial/by-id/..... (not /dev/USB1)
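To find the stable path for the last point, here is a minimal Python sketch that lists the symlinks under `/dev/serial/by-id` and the device node each one resolves to. It assumes a Linux host where udev populates that directory. The function name `list_serial_by_id` is made up for illustration.

```python
from pathlib import Path


def list_serial_by_id(by_id_dir="/dev/serial/by-id"):
    """Map each by-id symlink to the device node it currently points to.

    Returns an empty dict if the directory does not exist, for example
    when no USB serial adapter is plugged in.
    """
    root = Path(by_id_dir)
    if not root.is_dir():
        return {}
    return {
        str(link): str(link.resolve())
        for link in sorted(root.iterdir())
        if link.is_symlink()
    }
```

The keys are the paths to put in the Zigbee2MQTT serial port setting. The values show which tty node the adapter is using right now, and that can change after a reboot or a re-plug.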

Hope it helps.

@emaayan

emaayan commented Sep 20, 2024

good thing i didn't update; i was having problems updating it with the add-on repository

@rursache

rursache commented Sep 22, 2024

being super frustrated with the lack of support or work being done to fix this, i bought a new ZigStar UZG-01 (CC2652P7 ) which arrived with FW 20230507. i switched the old slaesh CC2652RB with the UZG-01 and my zigbee network has been stable ever since. 72h so far, had crashes every 20min-8h. so far 0 drops or crashes. will flash the slaesh as a router and use it like that. god knows i won't ever update the coordinator fw ever again after this.

i think the new firmware ruins the coordinator somehow but it's just my guess. i tried flashing the slaesh CC2652RB with each firmware starting with 2022 until the latest, none fixed it. a new device did. well 🤷🏻‍♂️

@nicolasvila what you described was the most basic troubleshooting one can do, the same instructions are found everywhere on the web, including in the docs. your issue is not related to the one here and does not help but dilute the actual problem

@emaayan i strongly recommend you to not update!

EDIT: one month in, still no issues with the new zigstar
EDIT 2: two months check in, all smooth sailing!

i still see people here updating and then having issues. STOP UPDATING THE ZIGBEE FW!!! you gain nothing but cause issues

@emaayan

emaayan commented Sep 22, 2024

Thanks, now i have to figure out how to switch the sonoff from zha to z2m, cause i think the zigbee led usb light bars aren't compatible with zha

@habitats-tech
Author

habitats-tech commented Sep 24, 2024

Following testing over the last couple of weeks I now have feedback, which I hope might be of some value to some.

The physical area where the ZB network has been deployed is as follows:

  • one level
  • 1150sqm surface area
  • divided into 4 quartiles

Initial deployment, experiencing lots of device communication errors and coordinator disconnections every few mins:

  • around 160 devices (all Tuya) / 40+ routing-devices and 120 end-devices - all routers directly connected to coordinator
  • one coordinator (UZG-01/SLZB06x) - flashed with coordinator-FW 20240710
  • Z2M 1.40.1

Re-deployment, no disconnections of any sort with coordinator or devices:

  • around 160 devices (all Tuya) / 40+ routing-devices and 120 end-devices - devices connected to respective quartile routers
  • one coordinator (UZG-01/SLZB06x) - flashed with coordinator-FW 20240710
  • 4 dedicated routers (UZG-01/SLZB06x) - flashed with router-FW 20231201
  • Z2M 1.40.1

I have therefore concluded that the coordinator (UZG-01 and/or SLZB06x) when flashed with FW 20240710 runs stable when it communicates with dedicated routers (using better router-devices might have solved the issue as well, but have not tested this scenario). To stress test the system, all end-devices have been configured to report instantly on every change and they have been running faultless for a few days. No disconnections or errors of any sort.

One last observation is that Tuya devices are definitely not the right way to go, especially on larger ZB networks, for a number of reasons, the primary being stability, configurability and reliability. I am slowly replacing Tuya with Sonoff devices and so far all is good.

If I find any subsequent issues related to FW20240710 over the long run I will post on this thread again.

The only issue which remains unsolved, but not related to Z2M, is the coordinator (UZG-01) disconnects whenever I try to access its webUI; tried latest ESP32 FW 20240915 but issue persists.

@devchristof

devchristof commented Sep 26, 2024

After 3 weeks of trying to run 20240710 on the Sonoff Dongle-P, reflashing the firmware and re-pairing all devices, it was always the same problem: after a couple of hours I was losing devices, commands lagged and it was not possible to pair new devices. I needed to unplug the dongle twice a day. I also tried to downgrade to the 2023 FW, but finally I downgraded to the 20221226 FW and everything is working fine now! No problems for 7 days!

Zigbee2MQTT version
1.40.1 commit: unknown
Coordinator type
zStack3x0
Coordinator revision
20221226
Coordinator IEEE Address

Frontend version
0.7.4
zigbee-herdsman-converters version
20.12.1
zigbee-herdsman version
0.57.3
Stats
Total 52
By device type
Router: 28
End devices: 24
By power source
Mains (single phase): 29
Battery: 21
DC Source: 2
By vendor
SONOFF: 7
LUMI: 5
eWeLight: 4
GLEDOPTO: 4
_TZ3000_qeuvnohg: 4
frient A/S: 3
Niko NV: 2
_TZ3000_xr3htd96: 2
_TZE200_2aaelwxk: 2
_TZ3000_ko6v90pg: 2
zbeacon: 2
_TZ3000_cayepv1a: 2
_TZ3000_5e235jpa: 1
_TZ3000_typdpbpg: 1
ADUROLIGHT: 1
_TZE204_t1blo2bj: 1
_TZ3000_hhiodade: 1
_TZ3000_axpdxqgu: 1
_TZE200_hue3yfsn: 1
_TZE200_yvx5lh6k: 1
ptvo.info: 1
_TZE200_81isopgh: 1
_TZ3000_xwh1e22x: 1
_TZ3210_95txyzbx: 1
_TZ3210_0zabbfax: 1

@SVH-Powel

After updating to 20240710, all my sensors nearby and directly connected to the coordinator started to fail. I installed the old firmware from May 2023 and have not experienced any problems in the last week. So rolling back to the previous version worked fine for me.

@kafisc1

kafisc1 commented Oct 8, 2024

I'm now using launchpad_coordinator_20221226 and will report back if this solves the issue.

@Great-Chart

@rursache - By reference to your "success" comment (#518 (comment)) I fear I'm at that same point you reached (but likely less technically adept myself). I suspect there has been an element of user error on my side with how I've migrated from different Sonoff Dongle-P coordinators leap-frogging one FW date onto a "spare" and swapping out the "original" or flashing firmware (on Windows via python method - I cannot get into bootloader mode with the button pressed on plugging in USB etc) that has resulted in some element of corruption lingering.

Of late my network has devices going offline as quickly as circa 2 hrs and as each eureka moment of a perceived fix does little more than increase randomness I'm in need of a fresh start.

Can I ask if your network has remained stable since your replacement coordinator and what FW is that running?
I've got a SLZB-06 (not M) that I plan to run via USB initially and need to further research the best approach with that and the preferred firmware

@habitats-tech - Noting you've had success with this coordinator, could I ask what FW you flashed it to (noting that the SM Light Webflasher naming convention differs, with v2.5.6 seemingly the most recent)?

I've become somewhat uncertain where to turn and what nuggets of information to take from assorted posts, as the feedback on coordinator FW flashing results is hugely varied. As such I'm likely to start a topic to gauge what is considered best practice to ensure a clean end result.

Did either of you try and retain the IEEE address of a previous coordinator or rebuild from scratch?

I'm concerned I might need to remove all devices; reset them and re-build progressively to avoid falling back into previous traps. With you both having had success (applying different methods) I thought I'd ask for additional clarity on how you went about restoring your network post plugging in "new coordinator" and thus whether I should be removing devices, powering them down, deleting coordinator_backup.json file or any other such step in any specific sequence.

Everyone appears either to have had minimal issues and fathomed a permanent fix, or to continue to struggle (and suffer). I started with issues that seemed related to one device losing connection too regularly and ended up with a largely unusable network that I'm not likely to fix by attempting similar methods.

Log attached is the pair of logs merged from a restart this morning circa 6am that had the network down again by 8am.
It doesn't seem to indicate that the coordinator itself has crashed, nor are there evident signs of interference (that I can tell). The failure mode is nominally consistent: devices start going offline, ping errors arise and it quietly dies (unless anyone can indicate otherwise for me).
I've removed as many of the devices that didn't seem to self recover from a restart of Z2M or seemed otherwise excessively chatty and now a simple restart of Z2M restores things but it's hardly useable in the short term let alone the long term!

_FULL-log.log

Koenkk/zigbee2mqtt#24387
Koenkk/zigbee2mqtt#24401
Koenkk/zigbee2mqtt#23329
#505

@rursache

rursache commented Oct 23, 2024

> @Great-Chart Can I ask if your network has remained stable since your replacement coordinator and what FW is that running?

it did, perfectly stable. fw and details are in my initial comment

> Did either of you try and retain the IEEE address of a previous coordinator or rebuild from scratch?

i did not but it didn't seem to matter to any accessory, everything works fine with a new IEEE address. it was just a simple swap for me, nothing else

@Great-Chart

> > @Great-Chart Can I ask if your network has remained stable since your replacement coordinator and what FW is that running?
>
> it did, perfectly stable. fw and details are in my initial comment
>
> > Did either of you try and retain the IEEE address of a previous coordinator or rebuild from scratch?
>
> i did not but it didn't seem to matter to any accessory, everything works fine with a new IEEE address. it was just a simple swap for me, nothing else

Many thanks - great to hear it was somewhat painless for you, and it gives me confidence to attempt the same (perhaps playing safe and seeking out the appropriate 20230507 FW as a starting point for my SLZB-06 trial).

@gcs8

gcs8 commented Oct 24, 2024

I tried the 20230507 FW and now Z2M just does not start.

[09:04:28] INFO: Preparing to start...
[09:04:28] INFO: Socat not enabled
[09:04:29] INFO: Starting Zigbee2MQTT...
Starting Zigbee2MQTT without watchdog.
[2024-10-24 09:04:31] info: z2m: Logging to console, file (filename: log.log)
[2024-10-24 09:04:31] info: z2m: Starting Zigbee2MQTT version 1.40.2 (commit #unknown)
[2024-10-24 09:04:31] info: z2m: Starting zigbee-herdsman (2.1.3)
[2024-10-24 09:04:31] info: zh:zstack:znp: Opening TCP socket with 192.168.100.137:6638
[2024-10-24 09:04:31] info: zh:zstack:znp: Socket connected
[2024-10-24 09:04:31] info: zh:zstack:znp: Socket ready
[2024-10-24 09:04:31] info: zh:zstack:znp: Writing CC2530/CC2531 skip bootloader payload
[2024-10-24 09:04:32] info: zh:zstack:znp: Skip bootloader for CC2652/CC1352
[2024-10-24 09:05:38] error: z2m: Error while starting zigbee-herdsman
[2024-10-24 09:05:38] error: z2m: Failed to start zigbee
[2024-10-24 09:05:38] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
[2024-10-24 09:05:38] error: z2m: Exiting...
[2024-10-24 09:05:38] error: z2m: Error: network commissioning timed out - most likely network with the same panId or extendedPanId already exists nearby (Error: AREQ - ZDO - stateChangeInd after 60000ms
at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:59:23)
at ZnpAdapterManager.beginCommissioning (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:370:31)
at processTicksAndRejections (node:internal/process/task_queues:95:5)
at ZnpAdapterManager.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:91:21)
at ZStackAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:158:16)
at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:137:29)
at Zigbee.start (/app/lib/zigbee.ts:69:27)
at Controller.start (/app/lib/controller.ts:161:27)
at start (/app/index.js:154:5))
at ZnpAdapterManager.beginCommissioning (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:372:23)
at ZnpAdapterManager.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/manager.ts:91:21)
at ZStackAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:158:16)
at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:137:29)
at Zigbee.start (/app/lib/zigbee.ts:69:27)
at Controller.start (/app/lib/controller.ts:161:27)
at start (/app/index.js:154:5)

@gcs8

gcs8 commented Oct 24, 2024

After doing the below, it's still having issues re-adding devices.

```
network_key: GENERATE
# Let Zigbee2MQTT generate a pan_id on first start
pan_id: GENERATE
# Let Zigbee2MQTT generate a ext_pan_id on first start
ext_pan_id: GENERATE
```

```
info 2024-10-24 09:18:40 z2m: Zigbee: allowing new devices to join.
info 2024-10-24 09:18:40 z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/response/permit_join', payload '{"data":{"time":254,"value":true},"status":"ok","transaction":"pz7yr-1"}'
info 2024-10-24 09:18:55 zh:controller: Interview for '0x00158d008afe16cf' started
info 2024-10-24 09:18:55 z2m: Device 'Gcs8 office temp' joined
info 2024-10-24 09:18:55 z2m: Starting interview of 'Gcs8 office temp'
info 2024-10-24 09:18:55 z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"Gcs8 office temp","ieee_address":"0x00158d008afe16cf"},"type":"device_joined"}'
info 2024-10-24 09:18:55 z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"Gcs8 office temp","ieee_address":"0x00158d008afe16cf","status":"started"},"type":"device_interview"}'
error 2024-10-24 09:19:55 zh:controller: Interview failed for '0x00158d008afe16cf with error 'Error: Interview failed because can not get node descriptor ('0x00158d008afe16cf')'
error 2024-10-24 09:19:55 z2m: Failed to interview 'Gcs8 office temp', device has not successfully been paired
info 2024-10-24 09:19:55 z2m:mqtt: MQTT publish: topic 'zigbee2mqtt/bridge/event', payload '{"data":{"friendly_name":"Gcs8 office temp","ieee_address":"0x00158d008afe16cf","status":"failed"},"type":"device_interview"}'
```
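
For anyone who would rather pin explicit values than rely on `GENERATE`, here is a rough Python sketch of what fresh network parameters look like. This is not Zigbee2MQTT's own generation code; the function name `generate_network_params` and the exact PAN ID range are my assumptions, chosen only to stay clear of the 0xFFFF broadcast PAN ID.

```python
import secrets


def generate_network_params():
    """Return random Zigbee network parameters (illustrative helper, not Z2M code)."""
    # 128-bit network key expressed as 16 byte values
    network_key = list(secrets.token_bytes(16))
    # 64-bit extended PAN ID expressed as 8 byte values
    ext_pan_id = list(secrets.token_bytes(8))
    # 16-bit PAN ID in 1..0xFFFE; 0xFFFF is the broadcast PAN ID and is avoided
    pan_id = secrets.randbelow(0xFFFE) + 1
    return {"network_key": network_key, "ext_pan_id": ext_pan_id, "pan_id": pan_id}


if __name__ == "__main__":
    params = generate_network_params()
    print("pan_id:", hex(params["pan_id"]))
    print("ext_pan_id:", params["ext_pan_id"])
    print("network_key:", params["network_key"])
```

Changing any of these means every device has to be paired again, so this only makes sense when starting from scratch anyway.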

@dpgh947

dpgh947 commented Oct 24, 2024

I flashed a Sonoff Dongle-P (CC2652P) with 20240710. It worked, and everything came up, but switching lights was randomly very laggy, particularly if I switched one light (instant) and then another a few seconds later (which took 10-15 seconds). Viewing the map in Z2M also took minutes instead of the usual few seconds, and I only have 16 devices. I re-flashed back to 20230507 and all is well again.

@mpuff

mpuff commented Nov 17, 2024

I also have issues with an SLZB-06P7 on the latest firmware:
Koenkk/zigbee2mqtt#24332

When I start pairing, I get the error:
JavaScript heap out of memory

When I restart zigbee2mqtt, I also often get the error:
extendedPanId already exists nearby (Error: AREQ - ZDO - stateChangeInd after 60000ms

I have 125 devices, and after a week about 50 of them (mostly router devices) show as offline.

@gcs8

gcs8 commented Nov 17, 2024

So, I ended up nuking everything again, moved from the SLZB-06 to the SLZB-06M I had, and it's more or less been fine.

For anyone playing along at home who needs a workaround for now, this is what's been working for me and has been decently stable. The one exception was a time I did something odd and had to go around and push the buttons on all the battery-operated stuff 1-2 times (just a click to wake it up?), and it's been fine since.

SLZB-06M
Firmware: core: v2.5.8 / zigbee: 20240510

port: tcp://192.168.100.26:6638
baudrate: 115200
adapter: ember
disable_led: false
rtscts: false
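
If you want to sanity-check that the adapter is reachable at the address in `port` before blaming the firmware, a small Python sketch like the following may help. The helper names `parse_adapter_port` and `is_reachable` are made up for illustration, and it only proves that the TCP port accepts a connection, not that the Zigbee stack behind it is healthy. I'd run it with Zigbee2MQTT stopped, on the assumption that the adapter may not like a second client.

```python
import socket
from urllib.parse import urlparse


def parse_adapter_port(port_url):
    """Split a 'tcp://host:port' value into (host, port)."""
    parsed = urlparse(port_url)
    if parsed.scheme != "tcp" or parsed.hostname is None or parsed.port is None:
        raise ValueError(f"not a tcp://host:port value: {port_url!r}")
    return parsed.hostname, parsed.port


def is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts
        return False


if __name__ == "__main__":
    host, port = parse_adapter_port("tcp://192.168.100.26:6638")
    print(f"{host}:{port} reachable:", is_reachable(host, port))
```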

@rursache

@Helgimagg01 there is no fix once you've updated and have issues. See my comment.

@vascozorrinho

I managed to revert the update. Only one of my shade covers didn't fully recover, and I ended up replacing the device itself. I guess I was lucky.

@mpuff

mpuff commented Nov 18, 2024

So what firmware is the most stable now for usage? 20230507?

@gerard33

gerard33 commented Nov 18, 2024

I updated my SMLIGHT Zigbee LAN Adapter CC2652P Model SLZB-05 with ZigStarGW-MT.exe from 20230507 to 20240710, and after a few hours all my Aqara/Xiaomi door sensors and motion sensors stopped reacting. A restart of the smlight-slzb-05 made them work again for a few hours, but after that they either didn't respond or reacted with a lag.
All other devices (battery powered or lights) were still working normally.

In Z2M the devices that didn't work anymore showed up normally as online.

I have now reverted back to 20230507 and everything works fine so far.

Network: 87 devices
Z2M: 1.41.0 (but also tried before with 1.40.0 with the same results)
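
Since the affected devices still show as online, one way to spot them is to look at when each device last reported. Below is a minimal Python sketch, assuming you can export per-device last-seen timestamps (for example with Zigbee2MQTT's `last_seen` setting enabled). The function name `stale_devices`, the device names and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_devices(last_seen, now, max_age):
    """Return the sorted names of devices whose last report is older than max_age."""
    return sorted(name for name, seen in last_seen.items() if now - seen > max_age)


if __name__ == "__main__":
    now = datetime(2024, 11, 18, 12, 0, tzinfo=timezone.utc)
    # Hypothetical sample data: device name -> time of the last received message
    last_seen = {
        "front door sensor": now - timedelta(hours=5),
        "hall motion sensor": now - timedelta(hours=3),
        "kitchen light": now - timedelta(minutes=2),
    }
    # Door and motion sensors normally check in at least hourly; adjust to taste
    print(stale_devices(last_seen, now, timedelta(hours=2)))
```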

@manormachine2207

Has no one else come across this error after updating an SLZB-06?

All I did was apply the latest firmware 20240710 through the SLZB-06 web UI, and HA Zigbee2MQTT immediately started seeing multiple errors coming in.

Going back to firmware 20221226 didn't fix the issue. I can't get it to work anymore, not even with a plain vanilla instance.

[07:30:41] INFO: Preparing to start...
[07:30:41] INFO: Socat not enabled
[07:30:41] INFO: Starting Zigbee2MQTT...
Starting Zigbee2MQTT without watchdog.
[2024-11-23 07:30:43] info: z2m: Logging to console, file (filename: log.log)
[2024-11-23 07:30:43] info: z2m: Starting Zigbee2MQTT version 1.41.0 (commit #unknown)
[2024-11-23 07:30:43] info: z2m: Starting zigbee-herdsman (2.1.7)
[2024-11-23 07:30:43] info: zh:zstack:znp: Opening TCP socket with 192.168.178.173:6638
[2024-11-23 07:30:43] info: zh:zstack:znp: Socket connected
[2024-11-23 07:30:43] info: zh:zstack:znp: Socket ready
[2024-11-23 07:30:43] info: zh:zstack:znp: Writing CC2530/CC2531 skip bootloader payload
[2024-11-23 07:30:44] info: zh:zstack:znp: Skip bootloader for CC2652/CC1352
[2024-11-23 07:30:51] error: z2m: Error while starting zigbee-herdsman
[2024-11-23 07:30:51] error: z2m: Failed to start zigbee
[2024-11-23 07:30:51] error: z2m: Check https://www.zigbee2mqtt.io/guide/installation/20_zigbee2mqtt-fails-to-start.html for possible solutions
[2024-11-23 07:30:51] error: z2m: Exiting...
[2024-11-23 07:30:51] error: z2m: Error: SRSP - SYS - stackTune after 6000ms
at Object.start (/app/node_modules/zigbee-herdsman/src/utils/waitress.ts:59:23)
at /app/node_modules/zigbee-herdsman/src/adapter/z-stack/znp/znp.ts:285:45
at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:26)
at /app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:966:13
at Queue.execute (/app/node_modules/zigbee-herdsman/src/utils/queue.ts:36:20)
at ZStackAdapter.setTransmitPower (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:965:16)
at ZStackAdapter.start (/app/node_modules/zigbee-herdsman/src/adapter/z-stack/adapter/zStackAdapter.ts:155:13)
at Controller.start (/app/node_modules/zigbee-herdsman/src/controller/controller.ts:137:29)
at Zigbee.start (/app/lib/zigbee.ts:69:27)
at Controller.start (/app/lib/controller.ts:161:27)
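
The startup failures in this thread end in differently named timeouts (`AREQ - ZDO - stateChangeInd after 60000ms` earlier, `SRSP - SYS - stackTune after 6000ms` here), so it can help to pull those out of a long log. The following is a rough Python sketch; the regular expression is inferred from the log lines posted in this thread only, and `find_znp_timeouts` is a made-up helper name, not part of Zigbee2MQTT or zigbee-herdsman.

```python
import re

# Matches fragments such as "SRSP - SYS - stackTune after 6000ms"
TIMEOUT_RE = re.compile(r"(AREQ|SRSP) - (\w+) - (\w+) after (\d+)ms")


def find_znp_timeouts(log_text):
    """Return one dict per timeout found in the given log text."""
    results = []
    for match in TIMEOUT_RE.finditer(log_text):
        kind, subsystem, command, timeout_ms = match.groups()
        results.append(
            {
                "type": kind,
                "subsystem": subsystem,
                "command": command,
                "timeout_ms": int(timeout_ms),
            }
        )
    return results


if __name__ == "__main__":
    sample = "[2024-11-23 07:30:51] error: z2m: Error: SRSP - SYS - stackTune after 6000ms"
    for hit in find_znp_timeouts(sample):
        print(hit)
```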

@indomitorum

indomitorum commented Nov 24, 2024

> So what firmware is the most stable now for usage? 20230507?

20221226 is what I use. Works well and the releases that have come afterwards seem to be riddled with issues looking at the comments.

@kafisc1

kafisc1 commented Nov 24, 2024

> > So what firmware is the most stable now for usage? 20230507?
>
> 20221226 is what I use. Works well and the releases that have come afterwards seem to be riddled with issues looking at the comments.

Can confirm 20221226 works well. I've been running 20221226 for a couple of weeks and everything runs well.

@mpuff

mpuff commented Nov 24, 2024

I have now switched to an SLZB-06P7, and I can't find a 20221226 firmware for the new Zigbee chip in that device :(
Is it possible to create one?

@deviantintegral

Hey folks, in general 20240710 has been the most solid firmware for me in my network of ~110 devices. While I've had some issues in the past few months, every time I've traced it down to a bad router or device.

Today, I had one of my light switches disconnect (https://www.zigbee2mqtt.io/devices/43076.html#enbrighten-43076).

The switch would respond to commands from the coordinator. However, no responses from the switch would make it back to zigbee2mqtt. I sniffed the traffic and saw it was routing through another Enbrighten switch, which was then sending it to the coordinator. However, there were no debug logs from herdsman indicating it got the packet.

First, I simply restarted zigbee2mqtt and it made no change. Then, I shut down zigbee2mqtt, pulled the Sonoff adapter out, plugged it back in, and restarted.

After zigbee2mqtt started up, and I sent an "on" command to the switch, after a second or two the response showed back up in zigbee2mqtt. However, the routing path remained exactly the same. The only thing that changed was that the firmware on the stick was restarted.

So far, this seems like a once-every-few-months issue, and I bet it's dependent on how often I restart my server.

Anyone else see anything similar?

@drjjr2

drjjr2 commented Nov 27, 2024
