ZHA - Individual Philips Hue Lights Unresponsive/Inconsistent, except within a Zigbee Group #116104

Closed
nerdyninny opened this issue Apr 24, 2024 · 15 comments

Comments

@nerdyninny

nerdyninny commented Apr 24, 2024

The problem

ZHA - Individual Lights Unresponsive/Inconsistent, except within a Zigbee Group

I recently started seeing a lot of unresponsive/inconsistent Philips Hue lights. It all started when I migrated off the Philips Hue bridge and added around 25 devices (mostly bulbs) to ZHA. I currently have 95 nodes in total.

Something I noticed is that when the same unresponsive/inconsistent Philips Hue lights are controlled via a Zigbee group, they work very fast and snappy. But when I control them individually, that's when I see error messages like this one (see screenshot):
Failed to call service light/turn_off. Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

Things I've tried:

  1. Replaced and migrated the Zigbee coordinator from HUSBZB-1 to SkyConnect (multi-protocol not enabled). The HUSBZB-1 had been working fine for many months until I added a lot of devices.
  2. The SkyConnect is connected to a 5 m USB extension cord, via a USB 2.0 hub, which is connected to a USB 2.0 port.
  3. Re-paired many devices that did not migrate properly. Added an additional 4 routers in an attempt to beef up mesh stability.

[two screenshots omitted]

What version of Home Assistant Core has the issue?

core-2024.4.3

What was the last working version of Home Assistant Core?

Unsure

What type of installation are you running?

Home Assistant OS

Integration causing the issue

ZHA

Link to integration documentation on our website

https://www.home-assistant.io/integrations/zha/

Diagnostics information

home-assistant_zha_2024-04-24T12-53-22.693Z.log

Debug log captured during on/off failures of individual Hue lights, and during a successful on/off sent to the group containing the same lights.

Example YAML snippet

No response

Anything in the logs that might be useful for us?

2024-04-24 08:53:16.261 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received messageSentHandler: [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 21065, EmberApsFrame(profileId=260, clusterId=6, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=5), 210, <EmberStatus.DELIVERY_FAILED: 102>, b'']
2024-04-24 08:53:16.261 DEBUG (MainThread) [bellows.zigbee.application] Received messageSentHandler frame with [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 21065, EmberApsFrame(profileId=260, clusterId=6, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=5), 210, <EmberStatus.DELIVERY_FAILED: 102>, b'']
2024-04-24 08:53:16.262 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection] [140178771012160] Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>
Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 64, in wrap_zigpy_exceptions
    yield
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 84, in wrapper
    return await RETRYABLE_REQUEST_DECORATOR(func)(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/util.py", line 131, in retry
    return await func()
           ^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/zcl/__init__.py", line 377, in request
    return await self._endpoint.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/endpoint.py", line 253, in request
    return await self.device.request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/zigpy/device.py", line 339, in request
    await send_request()
  File "/usr/local/lib/python3.12/site-packages/zigpy/application.py", line 841, in request
    await self.send_packet(
  File "/usr/local/lib/python3.12/site-packages/bellows/zigbee/application.py", line 931, in send_packet
    raise zigpy.exceptions.DeliveryError(
zigpy.exceptions.DeliveryError: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/src/homeassistant/homeassistant/components/websocket_api/commands.py", line 239, in handle_call_service
    response = await hass.services.async_call(
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2543, in async_call
    response_data = await coro
                    ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/core.py", line 2580, in _execute_service
    return await target(service_call)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 971, in entity_service_call
    single_response = await _handle_entity_call(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/helpers/service.py", line 1043, in _handle_entity_call
    result = await task
             ^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/light/__init__.py", line 642, in async_handle_light_off_service
    await light.async_turn_off(**filter_turn_off_params(light, params))
  File "/usr/src/homeassistant/homeassistant/components/zha/light.py", line 472, in async_turn_off
    result = await self._on_off_cluster_handler.off()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 83, in wrapper
    with wrap_zigpy_exceptions():
  File "/usr/local/lib/python3.12/contextlib.py", line 158, in __exit__
    self.gen.throw(value)
  File "/usr/src/homeassistant/homeassistant/components/zha/core/cluster_handlers/__init__.py", line 75, in wrap_zigpy_exceptions
    raise HomeAssistantError(message) from exc
homeassistant.exceptions.HomeAssistantError: Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>
2024-04-24 08:53:16.469 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'75c9b1a96b2a15c1d9904b23aa5e99099c4e27a23ba867cdd37e'
2024-04-24 08:53:16.469 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8070787e'
2024-04-24 08:53:16.470 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received messageSentHandler: [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 32883, EmberApsFrame(profileId=260, clusterId=6, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=9), 214, <EmberStatus.DELIVERY_FAILED: 102>, b'']
2024-04-24 08:53:16.471 DEBUG (MainThread) [bellows.zigbee.application] Received messageSentHandler frame with [<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 32883, EmberApsFrame(profileId=260, clusterId=6, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=9), 214, <EmberStatus.DELIVERY_FAILED: 102>, b'']
2024-04-24 08:53:16.472 DEBUG (MainThread) [zigpy.zcl] [0x8073:11:0x0008] Sending request header: ZCLHeader(frame_control=FrameControl<0x00>(frame_type=<FrameType.GLOBAL_COMMAND: 0>, is_manufacturer_specific=False, direction=<Direction.Client_to_Server: 0>, disable_default_response=0, reserved=0, *is_cluster=False, *is_general=True), tsn=252, command_id=<GeneralCommand.Read_Attributes: 0>, *direction=<Direction.Client_to_Server: 0>)
2024-04-24 08:53:16.472 DEBUG (MainThread) [zigpy.zcl] [0x8073:11:0x0008] Sending request: Read_Attributes(attribute_ids=[0])
2024-04-24 08:53:16.473 DEBUG (MainThread) [bellows.zigbee.application] Sending packet ZigbeePacket(timestamp=datetime.datetime(2024, 4, 24, 12, 53, 16, 472991, tzinfo=datetime.timezone.utc), src=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x0000), src_ep=11, dst=AddrModeAddress(addr_mode=<AddrMode.NWK: 2>, address=0x8073), dst_ep=11, source_route=[0x6851, 0x9c03], extended_timeout=False, tsn=252, profile_id=260, cluster_id=8, data=Serialized[b'\x00\xfc\x00\x00\x00'], tx_options=<TransmitOptions.NONE: 0>, radius=0, non_member_radius=0, lqi=None, rssi=None)
2024-04-24 08:53:16.473 DEBUG (MainThread) [bellows.ezsp.protocol] Send command setSourceRoute: (0x8073, [0x6851, 0x9c03])
2024-04-24 08:53:16.474 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'50ce21a9fa2a66325bc5222636ca527e'
2024-04-24 08:53:16.482 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'06cea1a9fa2a1578667e'
2024-04-24 08:53:16.482 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'8160597e'
2024-04-24 08:53:16.483 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received setSourceRoute: [<EmberStatus.SUCCESS: 0>]
2024-04-24 08:53:16.484 DEBUG (MainThread) [bellows.ezsp.protocol] Send command sendUnicast: (<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 0x8073, EmberApsFrame(profileId=260, clusterId=8, sourceEndpoint=11, destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_RETRY: 64>, groupId=0, sequence=252), 238, b'\x00\xfc\x00\x00\x00')
2024-04-24 08:53:16.485 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'61cf21a9602a15c1d9904b2daa5e99099c4e275703cb6777fdc663d3617e'
2024-04-24 08:53:16.494 DEBUG (bellows.thread_0) [bellows.uart] Data frame: b'17cfa1a9602a159346777e'
2024-04-24 08:53:16.495 DEBUG (bellows.thread_0) [bellows.uart] Sending: b'82503a7e'
2024-04-24 08:53:16.495 DEBUG (MainThread) [bellows.ezsp.protocol] Application frame received sendUnicast: [<EmberStatus.SUCCESS: 0>, 33]
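
When reading the frames above, note that the second field of each messageSentHandler frame appears to be the destination network address written in decimal, so 21065 corresponds to 0x5249 and 32883 to 0x8073. A minimal Python sketch for pulling the failed destinations out of a log like this one is below; the regular expression is modelled only on the lines quoted here and is an assumption about the log format, not a documented bellows interface.

```python
import re

# Pattern modelled on the bellows.ezsp.protocol lines quoted in this issue.
# Assumption: other bellows versions may format these frames differently.
SENT_RE = re.compile(
    r"Application frame received messageSentHandler: "
    r"\[<EmberOutgoingMessageType\.\w+: \d+>, (\d+), .*?<EmberStatus\.(\w+): \d+>"
)


def failed_destinations(lines):
    """Return the hex NWK address of every delivery report that was not SUCCESS."""
    failed = []
    for line in lines:
        match = SENT_RE.search(line)
        if match and match.group(2) != "SUCCESS":
            # The log prints the destination in decimal; device pages show hex.
            failed.append(f"0x{int(match.group(1)):04x}")
    return failed


SAMPLE = [
    "2024-04-24 08:53:16.261 DEBUG (MainThread) [bellows.ezsp.protocol] "
    "Application frame received messageSentHandler: "
    "[<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 21065, "
    "EmberApsFrame(profileId=260, clusterId=6, sourceEndpoint=11, "
    "destinationEndpoint=11, options=<EmberApsOption.APS_OPTION_RETRY: 64>, "
    "groupId=0, sequence=5), 210, <EmberStatus.DELIVERY_FAILED: 102>, b'']",
]

print(failed_destinations(SAMPLE))
```

The resulting addresses can then be compared with the Nwk values shown on each device's page in ZHA.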

Additional information

Diagnostic Logs
config_entry-zha-65910acf658adfe741b482ea10beeb3f.json

@home-assistant

Hey there @dmulcahey, @Adminiuga, @puddly, @TheJulianJES, mind taking a look at this issue as it has been labeled with an integration (zha) you are listed as a code owner for? Thanks!


@TheJulianJES
Member

TheJulianJES commented Apr 24, 2024

What are the model names of the affected Hue lights?

@nerdyninny
Author

What are the model names of the affected Hue lights?

  1. Philips Hue (HA name: Hue Kitchen Counter Corner Lamp): IEEE: 00:17:88:01:00:1d:a7:eb, Nwk: 0x5249, LLC011 by Signify Netherlands B.V. | Firmware: 0x43005d0b

  2. Philips Hue (HA name: Hue Kitchen Counter Strip Light): IEEE: 00:17:88:01:00:cc:0c:13, Nwk: 0x4256, LST001 by Signify Netherlands B.V. | Firmware: 0x43005d0b

@nerdyninny
Author

nerdyninny commented Apr 24, 2024

Update: I just moved my coordinator to a central part of my home (1st floor, instead of the basement). I already have a lot of mains-powered routers/repeaters on the network (27 Hue Bulbs, 2 Hue Strip Lights, 1 Hue Bloom Lamp, 1 Tuya Air Monitoring Sensor, 17 ThirdReality Outlets). 13 of the ThirdReality Outlets were already present before I migrated the Hue devices, and I added another 4 afterwards to see if it would improve anything. After moving my coordinator, I still have several Hue devices that only work intermittently when controlled directly (EmberStatus.DELIVERY_FAILED: 102), but work very fast and consistently when I use the Zigbee group they happen to be part of. I've also noticed the same 'delivery failed' issue on some of my ThirdReality outlets, but I haven't bothered to add them to a Zigbee group.

@dmulcahey
Contributor

How many devices total are on the network? Source routing could possibly help if the network is a decent size.

@nerdyninny
Author

nerdyninny commented Apr 25, 2024

95 total. Source routing is already enabled.

# enable zha quirks
zha:
  enable_quirks: true
  custom_quirks_path: /config/zha_quirks/
  zigpy_config:
    source_routing: true
    ezsp_config:
      CONFIG_MAX_END_DEVICE_CHILDREN: 0
      CONFIG_TX_POWER_MODE: 3
      CONFIG_NEIGHBOR_TABLE_SIZE: 16
      CONFIG_SOURCE_ROUTE_TABLE_SIZE: 110

—-

RPReplay_Final1714010601.mov

I figured a video of the issue might help.

@dmulcahey
Contributor

Ok, let’s try something. Set zigpy logging to debug (you can do this with the logger.set_level service), then try to control a couple of the individual devices that give you trouble. Then get the logs and look at the source routes logged for the bad transactions. Maybe there is a router on the network misbehaving. We can try power cycling and/or pulling the devices whose NWK addresses appear in the logged route.
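
For anyone following along, here is a rough Python sketch of that kind of log inspection. It assumes the debug log contains `setSourceRoute` and `messageSentHandler` lines in the same shape as the ones in the original report, and it pairs each failed delivery with the route most recently set for that destination, which is only a heuristic. The function name and the abbreviated sample lines are made up for illustration.

```python
import re
from collections import Counter

# Patterns modelled on the bellows debug lines quoted earlier in this issue.
ROUTE_RE = re.compile(
    r"Send command setSourceRoute: \((0x[0-9a-fA-F]+), \[([^\]]*)\]\)"
)
FAILED_RE = re.compile(
    r"Application frame received messageSentHandler: "
    r"\[<EmberOutgoingMessageType\.\w+: \d+>, (\d+), "
    r".*<EmberStatus\.DELIVERY_FAILED: 102>"
)


def tally_relays_in_failed_routes(lines):
    """Count how often each relay sits on a route whose delivery failed."""
    last_route = {}  # destination NWK -> relay list most recently set for it
    relay_failures = Counter()
    for line in lines:
        match = ROUTE_RE.search(line)
        if match:
            destination = int(match.group(1), 16)
            relays = [
                int(part.strip(), 16)
                for part in match.group(2).split(",")
                if part.strip()
            ]
            last_route[destination] = relays
            continue
        match = FAILED_RE.search(line)
        if match:
            destination = int(match.group(1))  # logged in decimal
            for relay in last_route.get(destination, []):
                relay_failures[relay] += 1
    return relay_failures


# Abbreviated, hand-ordered sample lines (hypothetical arrangement).
sample = [
    "[bellows.ezsp.protocol] Send command setSourceRoute: (0x8073, [0x6851, 0x9c03])",
    "[bellows.ezsp.protocol] Application frame received messageSentHandler: "
    "[<EmberOutgoingMessageType.OUTGOING_DIRECT: 0>, 32883, EmberApsFrame(...), "
    "214, <EmberStatus.DELIVERY_FAILED: 102>, b'']",
]

for relay, count in tally_relays_in_failed_routes(sample).most_common():
    print(f"0x{relay:04x} appeared in {count} failed route(s)")
```

A relay that shows up in many failed routes is a candidate for power cycling first, though a high count alone does not prove that router is at fault.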

@puddly
Contributor

puddly commented Apr 25, 2024

Also, get rid of CONFIG_TX_POWER_MODE.

@nerdyninny
Author

So I went ahead and just removed all the ZHA entries under ezsp_config in configuration.yaml, and disabled source routing for now.

Moving the coordinator made somewhat of a positive difference in terms of the total number of unresponsive Hue lights.

I haven’t done the zigpy debug, mostly because I don’t know how and I’m crossing my fingers that things will stabilize.

I do have a question though:

Why are the lights unresponsive via the GUI as a Device, but consistently responsive when the same Device (or even Devices, plural) is added to a Zigbee Group? I get that a single command can be sent to a set of Zigbee grouped lights, which decreases network traffic, but even when I power on a single bulb (one command), it is also unresponsive? And even with source routing enabled, wouldn’t the path be the same regardless?

@dmulcahey
Contributor

So I went ahead and just removed all the ZHA entries under ezsp_config in configuration.yaml, and disabled source routing for now.

Moving the coordinator made somewhat of a positive difference in terms of the total number of unresponsive Hue lights.

I haven’t done the zigpy debug, mostly because I don’t know how and I’m crossing my fingers that things will stabilize.

I do have a question though:

Why are the lights unresponsive via the GUI as a Device, but consistently responsive when the same Device (or even Devices, plural) is added to a Zigbee Group? I get that a single command can be sent to a set of Zigbee grouped lights, which decreases network traffic, but even when I power on a single bulb (one command), it is also unresponsive? And even with source routing enabled, wouldn’t the path be the same regardless?

No

  1. Source-Routed Packets:

    • Definition: In source routing, the entire path through which the packet is to travel through the network is determined at the source. This route is specified in the packet header, which means the packet carries the addresses of all the intermediate nodes it must pass through en route to the destination.
    • Purpose: Source routing is typically used in mesh networks, like those formed by Zigbee, to enhance routing efficiency and reliability. It helps in scenarios where network topology is stable, and the best routes are known and can be pre-determined, often based on previous interactions.
    • Advantages: Reduces the routing overhead on intermediate nodes, as they do not need to make routing decisions, just forward the packet based on the pre-determined path. It also can help avoid routing loops and decrease latency.
    • Disadvantages: Requires knowledge of the network topology, which may not always be current. It also increases the packet header size due to the inclusion of the list of node addresses.
  2. Broadcasts:

    • Definition: Broadcasting in Zigbee sends a message to all nodes within the network or within a certain radius. The packet does not specify any particular route or destination node addresses; instead, it is simply propagated by each node to all of its neighbors.
    • Purpose: Broadcasts are used for tasks like network-wide announcements, searching for a specific node (device discovery), or configuration commands that need to reach all nodes.
    • Advantages: Simple to implement as it does not require the source to know the network topology or the route to specific nodes. It ensures that all nodes in the area (or the entire network) will receive the message.
    • Disadvantages: Can lead to high network traffic and increased collisions, a phenomenon known as the "broadcast storm problem," especially in dense networks. It is less efficient in terms of network resource usage compared to directed routing methods.
  3. Group addressing:

    • Definition: Group addressing in Zigbee is a method used to efficiently manage communication among multiple devices within a Zigbee network. It allows a single message to be sent to multiple devices that are configured to listen to the same group address. This is particularly useful in home automation and IoT applications where multiple devices, like lights or sensors, need to receive the same command simultaneously. Here's how it works in detail:

Communication Using Group Addresses

  • Sending Commands: When a command or message needs to be sent to all devices in a group, the sender (such as a Zigbee coordinator or a smart home hub) sends a single message addressed to the group address.
  • Reception by Devices: Only the devices that have subscribed to that particular group address will process the message and act upon it. Other devices that have not subscribed to the addressed group will ignore the message.
  • Efficiency: This method is efficient because it reduces the number of messages sent through the network, thereby decreasing traffic and increasing the responsiveness of devices.

So the TL;DR: they are very different. Group messaging is technically multicast (a broadcast that only some devices will act on), essentially spray and pray… whereas source routing is a message sent along a predetermined path to a particular device. This is why I wanted you to enable debug, so we could see which devices were in the path and determine whether a particular device is swallowing messages or your routes aren’t updating for some reason.
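
A toy Python model can make that difference concrete. It is only a sketch of the idea and not how zigpy or the Zigbee stack implements routing: the relay addresses 0x6851 and 0x9c03 are taken from the route in the log above, while the extra router 0x1111, the link map, and the assumption that 0x9c03 is the one dropping frames are all hypothetical.

```python
from collections import deque


def unicast_delivered(route, bad_relays):
    """A source-routed unicast follows one fixed relay list.

    If any relay on that list drops frames, the message is lost.
    """
    return not any(relay in bad_relays for relay in route)


def flood_reaches(links, source, target, bad_relays):
    """A group/broadcast-style frame is repeated by every healthy router.

    It arrives if at least one chain of healthy routers leads to the target.
    """
    if source == target:
        return True
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in links.get(node, ()):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            if neighbor == target:
                return True
            if neighbor not in bad_relays:
                queue.append(neighbor)  # only healthy routers rebroadcast
    return False


# Hypothetical topology: the coordinator (0x0000) can reach the bulb (0x8073)
# either through 0x6851 -> 0x9c03 or through a made-up second router 0x1111.
links = {
    0x0000: [0x6851, 0x1111],
    0x6851: [0x9C03],
    0x9C03: [0x8073],
    0x1111: [0x8073],
}
bad = {0x9C03}  # assumption for the demo: this relay swallows frames

print("unicast via [0x6851, 0x9c03]:", unicast_delivered([0x6851, 0x9C03], bad))
print("flooded group message:", flood_reaches(links, 0x0000, 0x8073, bad))
```

In this model the unicast fails because its single fixed path crosses the bad relay, while the flooded message still arrives through the other router, which would be consistent with the group working while direct control fails.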

@MirekDusinojc

MirekDusinojc commented May 1, 2024

I think I might be observing the same issue. I have a few Philips Hue lights. After the 2024.4 update I noticed that one of my Hue lights, a light strip, started misbehaving and becoming non-responsive from time to time. I have it as part of a scene, and when the automation fires this light sometimes does not turn on; other times it fires without any issue. There weren't any issues before the update; it was working quite reliably. My other Hue lights seem to be working well though, so I am not sure why only this one misbehaves.
When I try to fix the light using reconfiguration, it very often fails. I did a factory reset of it once.
Also, this should not be a signal issue, as the device is a few meters from the hub and there are multiple other routers in the room.

@nerdyninny
Author

My Zigbee network seems to have mostly stabilized, except for one Hue Spotlight Color bulb.

I can control it consistently only via a Zigbee Group (it’s in a room group with 3 total). If I change the problem bulb’s color, or turn it on/off, I get the following error:

Failed to call service light/turn_on. Failed to send request: Failed to deliver message: <EmberStatus.DELIVERY_FAILED: 102>

Kinda wonky and I don’t understand why a Zigbee group command works on it, but not a direct command by itself. It’s sporadic too as I’ve reset the bulb several times now. It’ll work for a day, or a few days, and then stop working (except when in a zigbee group).

Bulb stats:
LCT002
by Signify Netherlands B.V.
Firmware: 0x43006502

Device Type: Router
LQI: 140
RSSI: -65
Last seen: 2024-05-28T21:58:25

@TheJulianJES
Member

The LLC011 is apparently TI/CC2530-based and is known to cause issues (especially with source routing) on anything but very old firmware. You might want to remove these devices from your network for now if you experience stability issues.
The Hue firmware development team is informed of the issue.

Related thread:

@nerdyninny
Author

especially with source routing

Does going back to broadcast routing (instead of source routing) fix it?

I read the thread you linked to. I downloaded the oldest OTA firmware, but I can’t seem to downgrade, and the debug logs say I already have the latest firmware. Any tips on how to downgrade using ZHA?

@issue-triage-workflows

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue has now been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@issue-triage-workflows issue-triage-workflows bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 20, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Oct 20, 2024