
Explicit tasks cancel #383

Merged
merged 23 commits into xZetsubou:master on Nov 19, 2024

Conversation

@Lurker00 commented Nov 7, 2024

I started this set of changes when I realized that TuyaDevice._call_on_close grows permanently. Consider a Wi-Fi battery-powered T&H sensor that wakes up and sends measurements every 20 minutes, i.e. 3 times per hour. If it is configured with a sleep time of 6 hours (21600 seconds), it generates 3 _shutdown_entities tasks per hour, or 72 items in TuyaDevice._call_on_close per day. Add one _new_entity_handler per successful connect and one _async_reconnect per re-connect, and you get 3x72=216 items per day from that one sensor alone, not counting other devices that may disconnect and re-connect. This is a memory leak.

Moreover, with this configuration, after 6 hours 18 _shutdown_entities async tasks have piled up in the task queue, and from then on the queue always holds at least 18 tasks to serve from that single sensor. This is a waste of queue resources.

When I implemented one TuyaDevice member per async task (_task_reconnect, _task_shutdown_entities, _unsub_new_entity) instead of _call_on_close, so they can be stopped explicitly in close(), I found out that cancel() does not actually stop a task that is waiting on await asyncio.sleep! asyncio.CancelledError is raised inside the task only if the await asyncio.sleep is executed in a try block, directly or indirectly. This means that, for low-power devices, _shutdown_entities never stopped, and _async_reconnect stopped with a delay, during a LocalTuya restart or shutdown. This could cause unexpected misbehavior. Probably this was the reason why the original #363 code didn't work for you. So, self._sleep() was implemented and tested to confirm that all the tasks now stop instantly.
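A minimal sketch of that layout (hypothetical member names and bodies, assuming Python 3.10+; the real coordinator.py is more involved):

    import asyncio


    class TuyaDevice:
        # Sketch only: one member per background task instead of an ever-growing
        # _call_on_close list, so close() can cancel each task explicitly.

        def __init__(self) -> None:
            self._is_closing = False
            self._task_connect: asyncio.Task | None = None
            self._task_reconnect: asyncio.Task | None = None
            self._task_shutdown_entities: asyncio.Task | None = None

        async def _sleep(self, seconds: float) -> bool:
            """Cancellation-aware sleep: returns False as soon as the task is cancelled."""
            try:
                await asyncio.sleep(seconds)
                return True
            except asyncio.CancelledError:
                return False

        async def close(self) -> None:
            self._is_closing = True
            # _task_connect goes first: it may start _task_reconnect, which in
            # turn may start _task_shutdown_entities.
            for task in (
                self._task_connect,
                self._task_reconnect,
                self._task_shutdown_entities,
            ):
                if task is not None and not task.done():
                    task.cancel()
                    try:
                        await task
                    except asyncio.CancelledError:
                        pass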

Please remember that all the async tasks now run in the same event loop. This means that while close() is running, the other tasks in the queue are suspended at their await statements; but whenever close() itself executes an await, any other task may continue running until it either finishes or reaches another await. That's why I inserted several "and not self._is_closing" checks into _make_connection(): to avoid redundant work while waiting for _task_connect to stop. Alternatively, an "except asyncio.CancelledError: return" could be added to each and every try block, but that would take more lines of code.
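For illustration, such a guard could look roughly like this, continuing the sketch above (the helper and the retry delay are hypothetical):

        async def _make_connection(self):
            # Re-check _is_closing after every await: close() may have started
            # running in the meantime.
            while not self._is_closing:
                try:
                    await self._connect_once()  # hypothetical helper
                    return
                except OSError:
                    pass
                # An await just completed, so check again before doing more work.
                if self._is_closing or not await self._sleep(5):  # 5 stands in for RECONNECT_INTERVAL
                    return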

I've renamed _connect_task -> _task_connect to have uniform names.

Frankly, I have never seen _new_entity_handler being called, and I don't quite understand the conditions under which it gets called. But I believe it's enough to set it only once, like the other HASS callbacks.

@xZetsubou (Owner) commented Nov 9, 2024

Also, instead of closing the tasks and waiting for them one by one, why not cancel them all and then wait? e.g.

        tasks = [self._task_reconnect, self._task_shutdown_entities, self._task_connect] 
        for task in tasks:
            task.cancel()

        await asyncio.gather(*tasks)

@Lurker00 (Author) commented Nov 9, 2024

Also, instead of closing the tasks and waiting for them one by one, why not cancel them all and then wait? e.g.

        tasks = [self._task_reconnect, self._task_shutdown_entities, self._task_connect] 
        for task in tasks:
            task.cancel()

        await asyncio.gather(*tasks)

How would it deal with None values? A device that connected successfully on the first attempt has None for all of them. Also, it is essential to completely stop _task_connect before cancelling the other tasks, because _task_connect can start _task_reconnect, and _task_reconnect can start _task_shutdown_entities. So the current order guarantees a full stop.
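For reference, a None-safe variant of that suggestion could look like the sketch below; it still wouldn't address the ordering concern, which is why the one-by-one order was kept:

    # Sketch of a None-safe "cancel all, then gather" variant (not what the PR
    # does, since _task_connect must be fully stopped before the others).
    tasks = [
        t
        for t in (self._task_reconnect, self._task_shutdown_entities, self._task_connect)
        if t is not None and not t.done()
    ]
    for task in tasks:
        task.cancel()
    # return_exceptions=True keeps one task's CancelledError from aborting the
    # wait for the remaining tasks.
    await asyncio.gather(*tasks, return_exceptions=True)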

Actually, they finish almost instantly, so there is no real performance to gain.

@Lurker00 (Author) commented

Now I believe it is done. I even remembered the disconnected sub-devices that remove themselves from their gateways! I think the goal of this PR is met.

@Lurker00 (Author) commented

Here is what I'm currently testing. I'm against making def connected(self) inconsistent and misleading by including the _is_closing check in it; I prefer explicit checks in the places where they are really required.

I can't prove that the check I added
https://github.com/Lurker00/hass-localtuya/blob/709b1050e4c29e96827b149eb0cba8ae49c0e25c/custom_components/localtuya/coordinator.py#L192-L193
can actually trigger, but I decided it's better to check than to regret, because not gateway.connected may mean that self._interface is None.

@Lurker00 (Author) commented Nov 19, 2024

While testing BLE device detach/attach, when LocalTuya self-restarted because the local key changed for the detached sub-device, I got

2024-11-19 16:50:28.543 WARNING (MainThread) [homeassistant.config_entries] Unloading localtuya (localtuya) config entry. Task <Task pending name='None localtuya localtuya 944c1616039a1d1ec6b9745179bff74e' coro=<TuyaDevice.close() running at /config/custom_components/localtuya/coordinator.py:364> wait_for=<_GatheringFuture pending cb=[Task.task_wakeup()]> cb=[set.remove(), set.remove()]> did not complete in time

in the log. From the previous records I can see that the sub-device finished updating its local key while the other devices were closing. It should then sleep for scale * RECONNECT_INTERVAL seconds, and, obviously, that sleep was not cancelled.

So, I'm still curious when a sleep gets cancelled and when it does not... I'd bring back the _sleep with the try block, like all the cancel() examples have.

This is a minor issue that needs more investigation, and, I believe, it does not block merging the existing changes.

I made a fix based on this test (a forgotten return).
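In the spirit of the earlier sketch, the fix roughly amounts to catching the cancellation around the sleep and returning, so the coroutine does not keep working during shutdown (hypothetical loop body, not the actual coordinator code):

        async def _async_reconnect(self):
            # Without the return below, a cancelled sleep would fall through and
            # the loop would keep scheduling reconnect attempts during shutdown.
            while not self._is_closing:
                try:
                    await asyncio.sleep(5)  # stand-in for scale * RECONNECT_INTERVAL
                except asyncio.CancelledError:
                    return
                if self._is_closing:
                    return
                await self._make_connection()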

@Lurker00 (Author) commented

With commit 08ec9e1, the problem with the detached BLE sub-device is gone! So I've done the same with _shutdown_entities, to be on the safe side. There are also minor related changes in _make_connection.

Now it looks and behaves well, for my taste.

@xZetsubou (Owner) commented Nov 19, 2024

So, I'm still curious when a sleep gets cancelled and when it does not... I'd bring back the _sleep with the try block, like all the cancel() examples have.

From my understanding, this happens with while loops: when writing a while loop, the whole loop should usually be wrapped inside a try block.
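Roughly, that pattern would be (hypothetical example, not code from the PR):

    import asyncio

    POLL_INTERVAL = 30  # hypothetical constant


    async def poll_loop(device):
        # Wrapping the whole loop means that wherever the cancellation lands
        # (the sleep or the poll call), the coroutine exits instead of iterating again.
        try:
            while True:
                await asyncio.sleep(POLL_INTERVAL)
                await device.poll_once()  # hypothetical method
        except asyncio.CancelledError:
            return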

I think this is good now, thank you 😸

* Merged abort connection w/ pending tasks.
* added cancelled asyncio in pytuya connect.
* rename subdevice_state var/func
@xZetsubou merged commit 6a902e6 into xZetsubou:master on Nov 19, 2024
2 checks passed
xZetsubou pushed a commit that referenced this pull request Nov 22, 2024
* abort_connect() shall be called at the end: revert #383 abort interface close before sub-devices

* subdevice_state_updated calls

* Prevent out of order commands

* Better serialization of writes

* Check for _is_closing before going to sleep