Devices Offline / Trouble Shooting Guide #419
Comments
If you receive on/off events from the device, that means the device is on the network and has the current network key. You are not saying which device it is, so it's hard to guess, but battery-operated devices, i.e. end devices, usually go to sleep and don't reply to requests. ZHA/zigpy keeps the neighbor table of each router; you could use that information to construct the topology, or use the built-in visualization, or install the third-party zig-zag component. Keep debug logging running and monitor for the trustCenterHandler message: it tells you when a device leaves or joins the network. Alternatively, set up a packet sniffer and troubleshoot individual devices: are they getting the packet? Do they acknowledge it? Etc.
PS: Troubleshooting why Xiaomi devices leave the network is a lost cause. They need compatible routers.
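A minimal sketch of turning per-router neighbor tables into a topology, assuming you have already dumped them into a plain Python dict of the form {router_ieee: [(neighbor_ieee, lqi), ...]}. That dict format, the function names and the LQI threshold of 100 are hypothetical choices for illustration, not zigpy's API or an official cutoff.

```python
# Hypothetical input: each router's neighbor table, dumped however you like
# (e.g. from ZHA debug logs) into {router_ieee: [(neighbor_ieee, lqi), ...]}.
# This is NOT a zigpy data structure.


def build_topology(tables):
    """Return an adjacency map {node: {neighbor: lqi}} built from neighbor tables."""
    graph = {}
    for node, neighbors in tables.items():
        graph.setdefault(node, {})
        for neighbor, lqi in neighbors:
            graph[node][neighbor] = lqi
            # End devices keep no table of their own; still list them as nodes.
            graph.setdefault(neighbor, {})
    return graph


def weak_links(graph, threshold=100):
    """Return sorted (node, neighbor, lqi) tuples whose reported LQI is below threshold.

    The default threshold is an arbitrary placeholder, not an official cutoff.
    """
    return sorted(
        (node, neighbor, lqi)
        for node, neighbors in graph.items()
        for neighbor, lqi in neighbors.items()
        if lqi < threshold
    )
```

For example, with tables for a coordinator and two routers, weak_links(build_topology(tables)) lists every direction of a link that some device reports as poor, which is a quick way to decide where to point the sniffer first.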
I'm not a software engineer but more of a radio/telecom engineer, and from the links I can see, you don't have generally bad radio connections. It could be RF interference blocking your mesh, but that's not so likely: if that were happening, unicast would fail while multicast/broadcast (group commands) would very likely still work. I think you have one or more badly behaving devices in your mesh that are causing the problems. One "classic problem" is OSRAM plugs, which have very poor RF and buggy firmware that doesn't forward all packets, so end devices leave the network because they don't get ACKs from the coordinator.

It's not the easy way, but the best approach is sniffing traffic with an external sniffer device, so you can see all packets, including the IEEE 802.15.4 ACKs between devices along the chain that you can't see at the Zigbee level. Building a routing map sounds great, so you can see where the traffic breaks, or at least roughly where the problem shows up. If you have a spare EM35X or EFR32 device, you can use it as a good Zigbee sniffer with normal NCP firmware on it. Good docs describing source routing are not easy to find; I'm quite sure they exist, but I haven't seen any.

There is also a nasty firmware bug in older Silabs router devices, like IKEA and many others, that crashes the firmware so the router can't route traffic. It is triggered by a parent announcement (fixed in the EZSP 6.7.7.0 stack), and the router must be power cycled to start working again (which in turn makes the device broadcast a parent announcement).
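To illustrate what a sniffer shows that the Zigbee layer hides: an IEEE 802.15.4 MAC frame starts with a 2-byte little-endian frame control field (bits 0-2 are the frame type, 1 = data, 2 = ACK; bit 5 is the ACK request flag), followed by a 1-byte sequence number that the ACK echoes. The sketch below assumes raw MAC frames in capture order (for example exported from Wireshark without the sniffer's encapsulation header) and flags data frames that asked for an ACK but were not immediately followed by one. Treating "the very next captured frame" as the ACK candidate is a simplification; a busy channel or a sniffer that misses frames will produce false positives.

```python
from __future__ import annotations

FRAME_TYPE_DATA = 0b001
FRAME_TYPE_ACK = 0b010
ACK_REQUEST_BIT = 1 << 5


def parse_header(frame: bytes) -> tuple[int, bool, int]:
    """Return (frame_type, ack_requested, sequence_number) of a raw 802.15.4 MAC frame."""
    fcf = int.from_bytes(frame[0:2], "little")  # frame control field is little-endian
    return fcf & 0b111, bool(fcf & ACK_REQUEST_BIT), frame[2]


def unacked_sequence_numbers(frames: list[bytes]) -> list[int]:
    """Sequence numbers of data frames that requested an ACK but were not
    immediately followed by an ACK carrying the same sequence number.

    MAC retransmissions reuse the sequence number, so a frame that needed
    retries shows up once per unanswered attempt.
    """
    missing = []
    for i, frame in enumerate(frames):
        frame_type, ack_requested, seq = parse_header(frame)
        if frame_type != FRAME_TYPE_DATA or not ack_requested:
            continue
        if i + 1 < len(frames):
            next_type, _, next_seq = parse_header(frames[i + 1])
            if next_type == FRAME_TYPE_ACK and next_seq == seq:
                continue  # acknowledged by the next hop
        missing.append(seq)
    return missing
```

Grouping the missing sequence numbers by the frame's destination short address would then point at the hop that is not answering.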
Most devices are Jasco-family in-wall devices. I see the bad links in Digi's XCTU, but not really in ZigZag. There are some for sure... but they all worked great with Hubitat as the controller. My idea is that I could test all routes by traversing each router. 95% of my devices are wired; I only have 6 battery-powered motion sensors on the network. Everything else is a router or a mains-powered endpoint. There has to be a good testing methodology to walk the routes and see which devices say they are good vs. which ones actually are. I gave up on all Xiaomis 30 minutes after pairing them. Ugh..... I have been looking at the neighbor table from ZHA. How is it constructed? SmartThings sent a one-to-many routing packet every sixty seconds. I feel like there is a liar in the network. Maybe a sociopath: it thinks it's telling the truth... but it isn't. If I can find it and kill or replace it, I'll fix the black hole, and that will fix the routes. I'll totally get an extra EM35X or EFR32. What do you use for sniffing with those? Is it different from XCTU? I have been running XCTU for 6 hours. 6 edge devices; everything else is a router.
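One way to hunt for the "liar" is to compare both ends of each router-to-router link: a router that reports a strong link to a neighbor which in turn reports a weak link (or doesn't list it at all) is a candidate. A sketch, assuming neighbor tables dumped into a hypothetical {router: {neighbor: lqi}} dict; the good/bad LQI thresholds are arbitrary placeholders to tune. Only neighbors with their own table are checked, because sleepy end devices don't keep one.

```python
from collections import Counter


def asymmetric_links(tables, good=200, bad=100):
    """Find links where two routers' neighbor tables disagree.

    tables: hypothetical {router: {neighbor: lqi}} dump.
    Returns (reporter, neighbor, lqi_reported, lqi_back) tuples, sorted by name,
    where the reporter claims LQI >= good but the neighbor reports LQI < bad
    or does not list the reporter at all (lqi_back is None).
    """
    suspicious = []
    for reporter, neighbors in tables.items():
        for neighbor, lqi in neighbors.items():
            # Only routers with their own table can be cross-checked.
            if neighbor not in tables or lqi < good:
                continue
            lqi_back = tables[neighbor].get(reporter)
            if lqi_back is None or lqi_back < bad:
                suspicious.append((reporter, neighbor, lqi, lqi_back))
    return sorted(suspicious, key=lambda link: (link[0], link[1]))


def liar_candidates(suspicious):
    """Rank devices by how many disagreeing links they take part in."""
    counts = Counter()
    for reporter, neighbor, _, _ in suspicious:
        counts[reporter] += 1
        counts[neighbor] += 1
    return counts.most_common()
```

A device that shows up at the top of liar_candidates across several dumps taken at different times is a better suspect than one that appears once.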
You could enable source routing for EZSP with the following in your configuration:

```yaml
zha:
  zigpy_config:
    source_routing: True
    ezsp_config:
      CONFIG_SOURCE_ROUTE_TABLE_SIZE: 32
```
ZHA sends … For sniffing, you could use a CC2531 with the ZBOSS sniffer firmware, but that requires a flasher. Otherwise, if you don't have an SWD flasher, any Silabs (EZSP) based USB device would work, like the Elelabs stick. If you do have one, there are more options, like the EBYTE E180-Z120B module or other MG21-based USB dongles from AliExpress. A Sonoff Zigbee Bridge flashed with EZSP firmware could be used too, but it runs over Wi-Fi and may not work reliably.
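For context on the source_routing option in the config: with many-to-one routing the coordinator (concentrator) broadcasts a many-to-one route request, devices then send route records listing the relays their packets passed through, and the coordinator caches those relay lists so it can put the whole path into outgoing unicasts. A toy model of such a cache, where size plays the role of CONFIG_SOURCE_ROUTE_TABLE_SIZE; the class is hypothetical and the least-recently-used eviction is an assumption of this sketch, not documented EmberZNet behavior.

```python
from __future__ import annotations

from collections import OrderedDict


class SourceRouteTable:
    """Toy model of a coordinator's source route table (not bellows/EmberZNet code).

    size plays the role of CONFIG_SOURCE_ROUTE_TABLE_SIZE. Evicting the least
    recently used entry when full is an assumption of this sketch.
    """

    def __init__(self, size: int = 32) -> None:
        self.size = size
        # destination NWK address -> relay NWK addresses as reported
        self._routes: OrderedDict[int, list[int]] = OrderedDict()

    def record(self, destination: int, relays: list[int]) -> None:
        """Store or refresh the path reported by a route record."""
        self._routes[destination] = list(relays)
        self._routes.move_to_end(destination)
        if len(self._routes) > self.size:
            self._routes.popitem(last=False)  # evict the least recently used entry

    def route_for(self, destination: int) -> list[int] | None:
        """Relays to place in a source-routed frame, or None to fall back to
        normal mesh routing (route discovery)."""
        relays = self._routes.get(destination)
        if relays is not None:
            self._routes.move_to_end(destination)
        return relays
```

The point of the model: with ~100 devices and a 32-entry table, some destinations will not have a cached path at any given moment, and a stale cached path through a black-holing router keeps failing until a fresh route record replaces it.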
I was looking for FCC docs, but most are filed under the Chinese manufacturer's name rather than Jasco; it looks like they use CEL modules with EM357 chips, which is good. A good guide for sniffing is: https://www.zigbee2mqtt.io/how_tos/how_to_sniff_zigbee_traffic.html. If you go with an EM35X / EFR32, you only need EZSP coordinator firmware running on the hardware, and you can sniff with the program in the link above. The Elelabs USB dongle works out of the box and can be updated without problems if needed. If you have an SWD flasher (original J-Link), you can flash any EFR32 device, including the new IKEA N2 (E2000 / E2001) remote, which has a very good EFR32MG21 module and all the SWD pads on the PCB; after flashing EZSP firmware, you only need a USB-TTL adapter for communication.
@MrYutz did you succeed in making your network stable?
Any feedback or experience sharing would be great.
In 10 s there were 24 RX unicasts, 23 successful TX unicasts and 1 failed TX unicast. There are about 120 devices, at least 3/8 of which are mains powered.
That should not happen. I mean: even if bellows loses the connection to the stick, restarting bellows should re-establish it. On the other hand, I'm not fond of Pi shields: there is too much interference close to the radio, and you can't really separate the radio from the Pi. Try turning off Wi-Fi and Bluetooth on the RPi.
Also pay attention to the PHY_CCA_FAIL_COUNT counter; it would indicate interference. It is quite possible that the single failed unicast TX was caused by a CCA failure.
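Putting the numbers from the 10 s sample into ratios: the unicast TX failure rate is failed / (successful + failed) = 1 / 24 ≈ 4.2%, and CCA failures can be normalized per transmission the same way. A sketch, assuming the counters are available as a dict keyed by EmberZNet counter names without the EMBER_COUNTER_ prefix (MAC_TX_UNICAST_SUCCESS, MAC_TX_UNICAST_FAILED, MAC_TX_UNICAST_RETRY, MAC_TX_BROADCAST, PHY_CCA_FAIL_COUNT); check the exact keys against your own debug output.

```python
from __future__ import annotations


def unicast_tx_failure_rate(counters: dict[str, int]) -> float:
    """Failed / attempted MAC unicast transmissions (0.0 if nothing was sent)."""
    ok = counters.get("MAC_TX_UNICAST_SUCCESS", 0)
    failed = counters.get("MAC_TX_UNICAST_FAILED", 0)
    attempted = ok + failed
    return failed / attempted if attempted else 0.0


def cca_failures_per_tx(counters: dict[str, int]) -> float:
    """PHY CCA failures per MAC transmission attempt (a rough normalization).

    Counts unicast successes, failures and retries plus broadcasts as
    attempts; key names are assumed, see the lead-in.
    """
    attempts = sum(
        counters.get(name, 0)
        for name in (
            "MAC_TX_UNICAST_SUCCESS",
            "MAC_TX_UNICAST_FAILED",
            "MAC_TX_UNICAST_RETRY",
            "MAC_TX_BROADCAST",
        )
    )
    cca = counters.get("PHY_CCA_FAIL_COUNT", 0)
    return cca / attempts if attempts else 0.0
```

A large absolute PHY_CCA_FAIL_COUNT means little on its own; the ratio to transmissions over the same window is what can be compared between setups.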
I'll re-enable debug logging to capture those metrics. But here is what I found in past logs:
Keep in mind that the counters are reset about every 1800 s (bellows/bellows/zigbee/application.py, line 930 in 7dc68b0: state.counters['ezsp_counters']). Check the counter values just before they reset, because 10 s could be too short.
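Because the counters reset periodically, comparing raw values across a reset gives nonsense. A sketch of accumulating increments across periodic snapshots (hypothetical dicts of counter name to value), assuming that a value going down means a reset happened in between. Anything counted between the last snapshot and the reset is lost, which is why sampling just before the reset matters, and a reset followed by a rise above the old value goes undetected.

```python
from __future__ import annotations


def counter_deltas(samples: list[dict[str, int]]) -> dict[str, int]:
    """Accumulate counter increments across snapshots taken in time order.

    If a counter drops between two snapshots, assume a reset happened and
    count the new value as the increment since the reset. Counters missing
    from the previous snapshot are treated as starting from 0.
    """
    totals: dict[str, int] = {}
    for prev, cur in zip(samples, samples[1:]):
        for name, value in cur.items():
            before = prev.get(name, 0)
            increment = value - before if value >= before else value
            totals[name] = totals.get(name, 0) + increment
    return totals
```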
Is there any documentation on all those counters? For instance, what would be a reasonable value for PHY_CCA_FAIL_COUNT? In your example I see PHY_CCA_FAIL_COUNT = 19777, so I guess I shouldn't worry about a large number. Should I compare it to the number of TX sent?
Here is an example of a crash. What should I do to fix "ERROR_EXCEEDED_MAXIMUM_ACK_TIMEOUT_COUNT"?
First, use a mobile app to scan the Wi-Fi channels around the coordinator, and check the wiki for how Zigbee channels map onto Wi-Fi channels: https://github.com/zigpy/zigpy/wiki/Zigbee---Changing-channel. Also keep USB 3 cables and devices, and other noisy hardware, away from the coordinator.
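The channel mapping follows from the center frequencies: Zigbee (IEEE 802.15.4) channel k in 11-26 sits at 2405 + 5(k − 11) MHz, and 2.4 GHz Wi-Fi channel n in 1-13 at 2407 + 5n MHz. A sketch that lists Zigbee channels clear of the Wi-Fi channels your scanning app reports, assuming each Wi-Fi channel occupies about ±11 MHz and a Zigbee channel about ±1 MHz; real spectral masks have skirts, so "clear" only means "not under the main lobe".

```python
def zigbee_center_mhz(channel: int) -> int:
    """Center frequency of an IEEE 802.15.4 2.4 GHz channel (11-26)."""
    if not 11 <= channel <= 26:
        raise ValueError("Zigbee 2.4 GHz channels are 11-26")
    return 2405 + 5 * (channel - 11)


def wifi_center_mhz(channel: int) -> int:
    """Center frequency of a 2.4 GHz Wi-Fi channel (1-13; channel 14 not handled)."""
    if not 1 <= channel <= 13:
        raise ValueError("this sketch only handles Wi-Fi channels 1-13")
    return 2407 + 5 * channel


def clear_zigbee_channels(wifi_channels, wifi_half_width=11, zigbee_half_width=1):
    """Zigbee channels whose band does not overlap any of the given Wi-Fi channels."""
    busy = [wifi_center_mhz(c) for c in wifi_channels]
    min_gap = wifi_half_width + zigbee_half_width
    return [
        z
        for z in range(11, 27)
        if all(abs(zigbee_center_mhz(z) - f) >= min_gap for f in busy)
    ]
```

With the common Wi-Fi plan of channels 1, 6 and 11, clear_zigbee_channels([1, 6, 11]) returns [15, 20, 25, 26].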
All,
I have been trying to move my ~100 devices from Hubitat to Home Assistant for several months without success. I started with Insteon, moved to SmartThings, then Hubitat, then Hubitat + HA. Now I'd like to get it all on one platform.
The problem is that random devices in my network become unavailable. I am an RF engineer and have been trying to figure out what is going on. All devices are within 20 m of a router. The coordinator is within 10 m of 10 routers. Most links show very high quality in HA, but XCTU paints a different picture.
I have tried both the ConBee II and the Nortek HUSBZB-1 with the latest 115k firmware. I have an XSTICK2 joined to the network for troubleshooting.
I think that one or more of my router devices are black-holing traffic. I am getting weird symptoms like the following:
90% of my devices are Jasco / GE in-wall dimmers and switches. I also have a pile of smart plugs from different manufacturers that I can use for additional coverage or for Christmas lights, depending on the time of year.
I am not sure this is the right place to post this, but because bellows does source routing, I thought this community might be the place to get involved.
So my first question is: where should I start in the docs to create a route-mapping Python script? Is there a FAQ on source routing? I am reasonably familiar with Digi's Zigbee docs, but I haven't written anything beyond simple P2P communications.
I'd like to build some docs and tools to help the community better understand their networks and to allow me to consolidate on one box.
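As a starting point for a route-mapping script, a breadth-first search from the coordinator over neighbor-table links gives each device's hop count and one shortest path, and devices that drop out when weak links are filtered are worth a closer look. A sketch, assuming the tables have been dumped into a hypothetical {node: {neighbor: lqi}} dict; fewest hops is only an approximation of the route the stack actually uses, since Zigbee routing weighs link costs.

```python
from collections import deque


def hop_map(tables, coordinator, min_lqi=0):
    """Breadth-first search from the coordinator over neighbor-table links.

    tables: hypothetical {node: {neighbor: lqi}} dump.
    Returns {node: (hops, path)} for every node reachable through links
    whose reported LQI is at least min_lqi.
    """
    paths = {coordinator: [coordinator]}
    queue = deque([coordinator])
    while queue:
        node = queue.popleft()
        for neighbor, lqi in tables.get(node, {}).items():
            if lqi < min_lqi or neighbor in paths:
                continue  # weak link, or already reached by a path at least as short
            paths[neighbor] = paths[node] + [neighbor]
            queue.append(neighbor)
    return {node: (len(path) - 1, path) for node, path in paths.items()}
```

Comparing hop_map(tables, coordinator) with hop_map(tables, coordinator, min_lqi=100) shows which devices are only reachable through weak links.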