Terminate scripts with until and while conditions that execute more than 10000 times #115110

bdraco · 2024-04-07T12:41:11Z

Usually we don't tag breaking changes for backport. However, the impact here is the whole system hanging, and because these loops run ~153x faster in 2024.4.x, the condition usually can't become true soon enough to avoid the hang the way it could in earlier versions.

This one seems worth making an exception for given the result is a hang of the full system. If someone disagrees feel free to untag the milestone. Removing the milestone could also be justified since the problem could still happen on older versions, but was much less likely.

Breaking change

To prevent the system from locking up when an automation is looping, there is now a hard limit of 10000 loops for repeat until and while conditions in scripts and automations. A warning will be logged when the loop reaches 5000 iterations.

Proposed change

TLDR: Polling loops (usually without a delay) in automations can hang the system. Before 2024.4.x these loops executed a lot slower, and users relied on them finishing before the system ran out of memory.

Extremely high numbers were chosen here to minimize the number of use cases affected.

It is expected that this will cause some automations to stop working, because there are cases where users expect their automation to check the condition in a loop for many iterations, even if it consumes nearly all of the system's CPU time. Affected automations would likely crash the system anyway, so the impact should be minimal.

Additionally, we now yield to the event loop between iterations of until and while conditions. This keeps the system responsive while all the CPU time is being consumed and gives the user a chance to terminate the script themselves. Yielding to the event loop has the unfortunate side effect of adding latency to every until and while condition, but the cost is limited to scripts that use these conditions.
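The effect of yielding between iterations can be illustrated with a minimal asyncio sketch. This is not the code from homeassistant/helpers/script.py; the function names are hypothetical, and `asyncio.sleep(0)` is used here only as one common way to hand control back to the event loop.

```python
import asyncio


async def polling_loop(counter):
    """Hypothetical stand-in for a repeat loop whose condition never becomes true."""
    while True:
        counter["n"] += 1
        # Yield to the event loop so other tasks (and a stop request) can run.
        await asyncio.sleep(0)


async def main():
    counter = {"n": 0}
    task = asyncio.create_task(polling_loop(counter))
    # Because the loop yields, this coroutine keeps getting scheduled.
    for _ in range(5):
        await asyncio.sleep(0)
    # Simulate the user stopping the script.
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return counter["n"]


if __name__ == "__main__":
    print("iterations before cancel:", asyncio.run(main()))
```

Without the `await` inside the loop, `main()` would never be scheduled again after the task starts, so the cancel request could not be delivered.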

These loop designs are typically created because the automation author copied the code from somewhere or had not yet learned a better way to write the automation. They did not expect that the loop design could lock up the system.

Ideally, these types of automation should be replaced with wait_template instead, as it generally will not consume all the CPU time while waiting for a condition to be true. However, we need to ensure the system remains stable even with a less-than-ideal automation design since blueprints exist that use loops to poll conditions instead.
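As an analogy for why waiting on a condition is cheaper than polling it, here is a small asyncio sketch. It does not show how wait_template is implemented; `wait_for_switch` and the `asyncio.Event` standing in for the switch state are assumptions made for illustration.

```python
import asyncio


async def wait_for_switch(switch_on: asyncio.Event, timeout: float) -> bool:
    """Suspend until the event is set instead of checking it in a tight loop."""
    try:
        await asyncio.wait_for(switch_on.wait(), timeout)
    except asyncio.TimeoutError:
        return False
    return True


async def main():
    switch_on = asyncio.Event()
    # Simulate the switch turning on shortly after we start waiting.
    asyncio.get_running_loop().call_later(0.01, switch_on.set)
    return await wait_for_switch(switch_on, timeout=1.0)


if __name__ == "__main__":
    print("switch turned on:", asyncio.run(main()))
```

While the coroutine is suspended in `switch_on.wait()`, it uses no CPU time; it is resumed only when the state changes or the timeout expires.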

The Python equivalent of what these scripts are doing is:

while True:
    # Busy-wait: poll the state with no delay between checks.
    if switch.is_on():
        break

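Example log output from the first revision of this PR, which warned at 10000 iterations and terminated at 100000:

2024-04-07 02:37:57.336 WARNING (MainThread) [homeassistant.helpers.script] Until condition [{condition: template, value_template: Template<template=({{1 != 1}}) renders=20000>}] in script loop is looping more than 10000 times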
2024-04-07 02:38:11.902 CRITICAL (MainThread) [homeassistant.helpers.script] Until condition [{condition: template, value_template: Template<template=({{1 != 1}}) renders=200000>}] in script loop terminated because it looping more than 100000 times
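The warn-then-terminate behaviour can be sketched as follows, using the final thresholds from the breaking-change note (warn at 5000, terminate at 10000) rather than the earlier values in the log output. The constant names, the exception class, and `repeat_until` are hypothetical and are not taken from homeassistant/helpers/script.py.

```python
import logging

_LOGGER = logging.getLogger(__name__)

# Values from the breaking-change note; the names are hypothetical.
WARN_ITERATIONS = 5000
TERMINATE_ITERATIONS = 10000


class LoopLimitExceeded(RuntimeError):
    """Hypothetical error raised when a repeat loop runs too many times."""


def repeat_until(condition, step, name="loop"):
    """Run step() until condition() is true, guarding against runaway loops.

    Returns the number of iterations that were executed.
    """
    iteration = 0
    while True:
        iteration += 1
        step()
        if condition():
            return iteration
        if iteration == WARN_ITERATIONS:
            _LOGGER.warning(
                "Until condition in script %s is looping more than %d times",
                name,
                WARN_ITERATIONS,
            )
        if iteration >= TERMINATE_ITERATIONS:
            raise LoopLimitExceeded(
                f"Until condition in script {name} terminated after "
                f"{TERMINATE_ITERATIONS} iterations"
            )
```

Raising an exception, rather than silently returning, matches the intent of the "raise so it shows in trace" commit in this PR, although the exception type used here is only a placeholder.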

Type of change

Dependency upgrade
Bugfix (non-breaking change which fixes an issue)
New integration (thank you!)
New feature (which adds functionality to an existing integration)
Deprecation (breaking change to happen in the future)
Breaking change (fix/feature causing existing functionality to break)
Code quality improvements to existing code or addition of tests

Additional information

This PR fixes or closes issue: fixes Infinite loop in automation/script repeat causes Home Assistant to freeze #115042
This PR is related to issue:
Link to documentation pull request:

Checklist

The code change is tested and works locally.
Local tests pass. Your PR cannot be merged unless tests pass
There is no commented out code in this PR.
I have followed the development checklist
I have followed the perfect PR recommendations
The code has been formatted using Ruff (ruff format homeassistant tests)
Tests have been added to verify that the new code works.

If user exposed functionality or configuration variables are added/changed:

Documentation added/updated for www.home-assistant.io

If the code communicates with devices, web services, or third-party tools:

The manifest file has all fields filled out correctly.
Updated and included derived files by running: python3 -m script.hassfest.
New or updated dependencies have been added to requirements_all.txt.
Updated by running python3 -m script.gen_requirements_all.
For the updated dependencies - a link to the changelog, or at minimum a diff between library versions is added to the PR description.
Untested files have been added to .coveragerc.

To help with the load of incoming pull requests:

I have reviewed two other open pull requests in this repository.

…than 100000 times

- warns at 10000
- terminates at 100000

2024-04-07 02:37:57.336 WARNING (MainThread) [homeassistant.helpers.script] Until condition [{condition: template, value_template: Template<template=({{1 != 1}}) renders=20000>}] in script `loop` is looping more than 10000 times
2024-04-07 02:38:11.902 CRITICAL (MainThread) [homeassistant.helpers.script] Until condition [{condition: template, value_template: Template<template=({{1 != 1}}) renders=200000>}] in script `loop` terminated because it looping more than 100000 times

fixes #115042

homeassistant/helpers/script.py

bdraco · 2024-04-07T13:49:31Z

almost 0400 here so will have to write tests tomorrow

bdraco · 2024-04-07T18:11:07Z

100000 is likely too high. I can still get a blue to crash before the protection kicks in

jbouwh · 2024-04-07T18:12:44Z

> 100000 is likely too high. I can still get a blue to crash before the protection kicks in

It also depends on what is done in each step

bdraco · 2024-04-07T18:44:27Z

> 100000 is likely too high. I can still get a blue to crash before the protection kicks in

> It also depends on what is done in each step

Yes, and the amount of RAM in the system. 100000 was fine because I have ~16GB of RAM in my test env, but the Blue only has ~4GB

bdraco · 2024-04-07T18:48:06Z

The system gets unstable at 27000 iterations and hard crashes at

Apr 07 18:47:15 homeassistant homeassistant[512]: 2024-04-07 08:46:11.930 INFO (MainThread) [homeassistant.components.automation.loop] loop: Repeating sequence: Iteration 54246

bdraco · 2024-04-07T18:51:30Z

It's also having trouble with the trace being so large.

I dropped it to a 25000 cap

bdraco · 2024-04-07T18:52:38Z

Now the system is having trouble coming back up because the traces are too large

homeassistant:/config/.storage# du -sch trace.saved_traces 
112.4M	trace.saved_traces
112.4M	total

bdraco · 2024-04-07T18:54:56Z

25000 is enough to keep the system stable if they only run it once. But since we store multiple traces, if it runs again, we still end up in a bad place

bdraco · 2024-04-07T19:01:11Z

system crashes on shutdown writing traces now

bdraco · 2024-04-07T19:06:27Z

10000 nodes seems to be the most a trace can handle if we save 5x of them

bdraco · 2024-04-07T19:10:51Z

with 10000, the traces only get to

56.2M	trace.saved_traces
56.2M	total

bdraco · 2024-04-07T19:22:18Z

56.2M still seems like a lot for 5 traces, but it does stay stable on the blue now
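For a rough sense of scale, the figures above can be turned into a per-iteration estimate. This is a back-of-the-envelope sketch that assumes all five stored traces hit the 10000-iteration cap and that the `M` printed by `du -h` means MiB; neither assumption is confirmed in this thread.

```python
MIB = 1024 * 1024

saved_traces_bytes = 56.2 * MIB  # size of trace.saved_traces reported by du
stored_traces = 5                # "if we save 5x of them"
iterations_per_trace = 10_000    # assumes every stored trace reached the cap

per_trace = saved_traces_bytes / stored_traces
per_iteration = per_trace / iterations_per_trace

print(f"{per_trace / MIB:.1f} MiB per trace")
print(f"{per_iteration / 1024:.2f} KiB per iteration")
```

Under those assumptions each trace is about 11.2 MiB, or a little over 1 KiB per loop iteration.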

bdraco · 2024-04-07T19:23:12Z

Even 10000 iterations seems like it's pushing the limits of the system design, but it does seem to be OK.

jbouwh · 2024-04-07T19:23:50Z

> Even 10000 iterations seems like it's pushing the limits of the system design, but it does seem to be OK.

Looks like a nice limit for a breaking change

bdraco · 2024-04-07T19:33:43Z

Usually we don't tag breaking changes for backport. However, the impact here is the whole system hanging, and because these loops run ~153x faster in 2024.4.x, the condition usually can't become true soon enough to avoid the hang the way it could in earlier versions.

This one seems worth making an exception for given the result is a hang of the full system. If someone disagrees feel free to untag the milestone. Removing the milestone could also be justified since the problem could still happen on older versions, but was much less likely.

jbouwh

LGTM,
Thnx @bdraco 👍

bdraco · 2024-04-07T21:02:48Z

thanks

50494554524F · 2024-04-08T19:58:00Z

make it configurable + a default value

joostlek · 2024-04-08T20:39:50Z

home-assistant bot added breaking-change cla-signed core labels Apr 7, 2024

bdraco added 4 commits April 7, 2024 02:45

tweak

9bcc7cb

Merge branch 'dev' into terminate_until_while_more_than_100000

7e0460c

tweak

a1a1e03

tweak

4ff1484

jbouwh reviewed Apr 7, 2024

bdraco added 6 commits April 7, 2024 03:37

raise so it shows in trace

0a4379e

grammar

8965aea

grammar

622f1d6

order

359a5db

grammar

606fa2b

grammar

6c63f78

lower cap to 25000

e385af7

bdraco changed the title ~~Terminate until and while conditions in scripts if they execute more than 100000 times~~ Terminate until and while conditions in scripts if they execute more than 25000 times Apr 7, 2024

lower

645b706

bdraco changed the title ~~Terminate until and while conditions in scripts if they execute more than 25000 times~~ Terminate until and while conditions in scripts if they execute more than 10000 times Apr 7, 2024

bdraco added this to the 2024.4.2 milestone Apr 7, 2024

bdraco changed the title ~~Terminate until and while conditions in scripts if they execute more than 10000 times~~ Terminate scripts with until and while conditions that execute more than 10000 times Apr 7, 2024

coverage, fix count for while

ef76bdf

jbouwh approved these changes Apr 7, 2024

bdraco marked this pull request as ready for review April 7, 2024 21:02

bdraco requested a review from a team as a code owner April 7, 2024 21:02

bdraco merged commit 569f54d into dev Apr 7, 2024
38 checks passed

bdraco deleted the terminate_until_while_more_than_100000 branch April 7, 2024 21:02

frenck added the cherry-picked label Apr 8, 2024

frenck pushed a commit that referenced this pull request Apr 8, 2024

Terminate scripts with until and while conditions that execute more t…

19f3ef7

…han 10000 times (#115110)

frenck mentioned this pull request Apr 8, 2024

2024.4.2 #115186

Merged

ic-dev21 mentioned this pull request Apr 8, 2024

Super high CPU usage with 2024.4.1 #115072

Closed

github-actions bot locked and limited conversation to collaborators Apr 9, 2024
