Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hassio does not load correctly #21774

Closed
daenny opened this issue Mar 7, 2019 · 22 comments · Fixed by #21794
Closed

Hassio does not load correctly #21774

daenny opened this issue Mar 7, 2019 · 22 comments · Fixed by #21794

Comments

@daenny
Copy link
Contributor

daenny commented Mar 7, 2019

Home Assistant release with the issue:
0.89.0

Last working Home Assistant release (if known):
0.88.2 (partly)

Operating environment (Hass.io/Docker/Windows/etc.):
Hassio on Ubuntu host

Component/platform:
Maybe related to:
#21569

Description of problem:
Hassio frontend does not come up. Even multiple restarts do not solve it.

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):

http:
#  api_password: !secret http_password
  ssl_certificate: !secret ssl_certificate
  ssl_key: !secret ssl_key
  server_port: 8123
  base_url: something.dyndns.org:443
  trusted_networks:
    - 127.0.0.1
    - 192.168.2.0/24
    - 172.30.32.0/24
  ip_ban_enabled: True
  login_attempts_threshold: 5

Traceback (if applicable):

019-03-07 23:46:32 ERROR (MainThread) [homeassistant.core] Error doing job: SSL error errno:1 reason: KRB5_S_INIT
Traceback (most recent call last):
  File "uvloop/sslproto.pyx", line 504, in uvloop.loop.SSLProtocol.data_received
  File "uvloop/sslproto.pyx", line 204, in uvloop.loop._SSLPipe.feed_ssldata
  File "uvloop/sslproto.pyx", line 189, in uvloop.loop._SSLPipe.feed_ssldata
  File "/usr/local/lib/python3.7/ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2605)
2019-03-07 23:46:33 INFO (MainThread) [homeassistant.components.device_tracker.asuswrt] Checking Devices
2019-03-07 23:46:43 INFO (MainThread) [homeassistant.components.automation] Executing Ambilight On With Switch
2019-03-07 23:46:43 INFO (MainThread) [homeassistant.helpers.script] Script Ambilight On With Switch: Running script
2019-03-07 23:46:43 INFO (MainThread) [homeassistant.helpers.script] Script Ambilight On With Switch: Executing step call service
2019-03-07 23:46:44 INFO (MainThread) [homeassistant.components.device_tracker.asuswrt] Checking Devices
2019-03-07 23:46:44 INFO (MainThread) [homeassistant.components.automation] Executing Alarm Off Schuuur
2019-03-07 23:46:44 INFO (MainThread) [homeassistant.helpers.script] Script Alarm Off Schuuur: Running script
2019-03-07 23:46:44 INFO (MainThread) [homeassistant.helpers.script] Script Alarm Off Schuuur: Executing step call service
2019-03-07 23:46:46 INFO (MainThread) [homeassistant.components.device_tracker.asuswrt] Checking Devices
2019-03-07 23:46:49 WARNING (MainThread) [homeassistant.components.binary_sensor] Updating zoneminder binary_sensor took longer than the scheduled update interval 0:00:30
2019-03-07 23:46:49 WARNING (MainThread) [homeassistant.components.switch] Updating zoneminder switch took longer than the scheduled update interval 0:00:30
2019-03-07 23:46:50 WARNING (MainThread) [homeassistant.components.light] Updating hyperion light took longer than the scheduled update interval 0:00:30
2019-03-07 23:46:51 WARNING (MainThread) [homeassistant.components.light] Updating yeelight light took longer than the scheduled update interval 0:00:30
2019-03-07 23:46:51 WARNING (MainThread) [homeassistant.components.sensor] Updating zoneminder sensor took longer than the scheduled update interval 0:00:30
2019-03-07 23:46:51 WARNING (MainThread) [homeassistant.components.camera] Updating zoneminder camera took longer than the scheduled update interval 0:00:30
2019-03-07 23:46:54 INFO (SyncWorker_5) [homeassistant.components.sensor.miflora] Polling error 
2019-03-07 23:46:54 INFO (MainThread) [homeassistant.components.automation] Initialized trigger Bedroom: Toggle lights with button
2019-03-07 23:46:54 ERROR (MainThread) [homeassistant.setup] Error during setup of component hassio
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/homeassistant/setup.py", line 151, in _async_setup_component
    hass, processed_config)
  File "/usr/local/lib/python3.7/site-packages/homeassistant/components/hassio/__init__.py", line 175, in async_setup
    user = await hass.auth.async_get_user(data['hassio_user'])
  File "/usr/local/lib/python3.7/site-packages/homeassistant/auth/__init__.py", line 114, in async_get_user
    return await self._store.async_get_user(user_id)
  File "/usr/local/lib/python3.7/site-packages/homeassistant/auth/auth_store.py", line 69, in async_get_user
    await self._async_load()
  File "/usr/local/lib/python3.7/site-packages/homeassistant/auth/auth_store.py", line 277, in _async_load
    self._store.async_load(),
concurrent.futures._base.CancelledError

I have a automation which restarts nodered add-on 60 seconds after home_assistant start. This is also not done correctly.

2019-03-07 23:49:54 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/homeassistant/helpers/script.py", line 131, in async_run
    await self._handle_action(action, variables, context)
  File "/usr/local/lib/python3.7/site-packages/homeassistant/helpers/script.py", line 210, in _handle_action
    action, variables, context)
  File "/usr/local/lib/python3.7/site-packages/homeassistant/helpers/script.py", line 299, in _async_call_service
    context=context
  File "/usr/local/lib/python3.7/site-packages/homeassistant/helpers/service.py", line 85, in async_call_from_config
    domain, service_name, service_data, blocking=blocking, context=context)
  File "/usr/local/lib/python3.7/site-packages/homeassistant/core.py", line 1110, in async_call
    raise ServiceNotFound(domain, service) from None
homeassistant.exceptions.ServiceNotFound: (ServiceNotFound(...), 'Service hassio.addon_restart not found')
2019-03-07 23:50:02 INFO (MainThread) [homeassistant.components.device_tracker.asuswrt] Checking Devices
2019-03-07 23:50:02 INFO (MainThread) [homeassistant.components.device_tracker.asuswrt] Checking Devices
2019-03-07 23:50:12 WARNING (MainThread) [homeassistant.components.binary_sensor] Updating zoneminder binary_sensor took longer than the scheduled update interval 0:00:30
2019-03-07 23:50:12 WARNING (MainThread) [homeassistant.components.switch] Updating zoneminder switch took longer than the scheduled update interval 0:00:30
2019-03-07 23:50:12 WARNING (MainThread) [homeassistant.components.light] Updating hyperion light took longer than the scheduled update interval 0:00:30
2019-03-07 23:50:13 WARNING (MainThread) [homeassistant.components.light] Updating yeelight light took longer than the scheduled update interval 0:00:30
2019-03-07 23:50:13 WARNING (MainThread) [homeassistant.components.sensor] Updating zoneminder sensor took longer than the scheduled update interval 0:00:30
2019-03-07 23:50:13 WARNING (MainThread) [homeassistant.components.camera] Updating zoneminder camera took longer than the scheduled update interval 0:00:30
2019-03-07 23:50:14 INFO (MainThread) [homeassistant.components.device_tracker.asuswrt] Checking Devices
2019-03-07 23:50:14 INFO (MainThread) [homeassistant.components.device_tracker.asuswrt] Checking Devices
2019-03-07 23:50:24 INFO (SyncWorker_1) [homeassistant.components.sensor.miflora] Polling error 
2019-03-07 23:50:24 INFO (SyncWorker_5) [homeassistant.components.sensor.miflora] Polling error Could not read data from Mi Flora sensor C4:7C:8D:65:DA:C8
2019-03-07 23:50:24 INFO (SyncWorker_4) [homeassistant.components.sensor.miflora] Polling error Could not read data from Mi Flora sensor C4:7C:8D:65:DA:C8
2019-03-07 23:50:24 INFO (SyncWorker_9) [homeassistant.components.sensor.miflora] Polling error Could not read data from Mi Flora sensor C4:7C:8D:65:DA:C8
2019-03-07 23:50:24 INFO (MainThread) [homeassistant.components.automation] Initialized trigger Light Trigger Both attic Light
2019-03-07 23:50:24 ERROR (MainThread) [homeassistant.setup] Error during setup of component hassio
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/homeassistant/setup.py", line 151, in _async_setup_component
    hass, processed_config)
  File "/usr/local/lib/python3.7/site-packages/homeassistant/components/hassio/__init__.py", line 175, in async_setup
    user = await hass.auth.async_get_user(data['hassio_user'])
  File "/usr/local/lib/python3.7/site-packages/homeassistant/auth/__init__.py", line 114, in async_get_user
    return await self._store.async_get_user(user_id)
  File "/usr/local/lib/python3.7/site-packages/homeassistant/auth/auth_store.py", line 69, in async_get_user
    await self._async_load()
  File "/usr/local/lib/python3.7/site-packages/homeassistant/auth/auth_store.py", line 277, in _async_load
    self._store.async_load(),
concurrent.futures._base.CancelledError

Additional information:

@awarecan
Copy link
Contributor

awarecan commented Mar 8, 2019

As my research in #21619

Other helpers.storage based solution also need check potential issues. Not everyone will throw exception though, for example, auth_store handle is fine, it may load data from harddisk twice if there were two get_groups() called at same time, but no exception will throw. So I didn't change it in this pull request.

It looks like I got the conclusion too soon for your case. I will look at it tonight.

If possible, @daenny could you check if you have a log entry for hassio begin to setup? How long it take till the CancelledError occurred.

@awarecan
Copy link
Contributor

awarecan commented Mar 8, 2019

Could you share your full DEBUG level log to me? You can ping me at discord channel.

@daenny
Copy link
Contributor Author

daenny commented Mar 8, 2019

Hi, thanks for looking into it. I see the warning the setup of Hassio is taking longer than 10 seconds. I can have a look if I can get the debug logs tonight. Any components in particular or the complete debug log?

@awarecan
Copy link
Contributor

awarecan commented Mar 8, 2019

I feel that something special on your system caused dead lock in the hassio component setup. I opened a pr #21794 to fix the auth_store race condition, but I am not sure this will fix you root cause, so a full debug log will be helpful.

@ghitchen
Copy link

ghitchen commented Mar 8, 2019

This looks similar to #21811.

19-03-08 17:46:10 INFO (SyncWorker_16) [hassio.docker.homeassistant] Start homeassistant homeassistant/raspberrypi3-homeassistant with version 0.89.1
19-03-08 17:56:10 WARNING (MainThread) [hassio.homeassistant] Don't wait anymore of Home Assistant startup!
19-03-08 17:56:10 CRITICAL (MainThread) [hassio.homeassistant] HomeAssistant update fails -> rollback!
19-03-08 17:56:10 INFO (MainThread) [hassio.homeassistant] Update Home Assistant to version 0.88.2

@awarecan
Copy link
Contributor

awarecan commented Mar 8, 2019

@ghitchen they are totally different

@ghost ghost removed the in progress label Mar 8, 2019
@awarecan
Copy link
Contributor

awarecan commented Mar 8, 2019

@daenny let me know if it can be resolved by #21794

@naamah75
Copy link

naamah75 commented Mar 9, 2019

Same problem for me.
Virtual Machine with 16GB VHD and HASSos 2.10.

@daenny
Copy link
Contributor Author

daenny commented Mar 10, 2019

@awarecan sadly I think this does not solve my problem.

To test the changes, I did the following:

  • login to the docker from my ubuntu host with docker exec -it homeassistant /bin/bash
  • wget the changed auth_store.py to the /usr/local/xxx/site-packages/homeassistant/auth folder
  • commit the docker container as homeassistant:latest image and restart
  • I verified that the newly started container contains the changed auth_store.py.
    That should be sufficient for testing right?

I also disabled the mosquitto broker and node-red plugin for testing. Since I though that those might be the problem as I think it heavily query the auth_store (I use authenticated mosquitto messages)

@awarecan
Copy link
Contributor

awarecan commented Mar 10, 2019

We only load auth_store from file system once. Write it back every minute if there are any changes.

What is your current error message?

@awarecan awarecan reopened this Mar 10, 2019
@daenny
Copy link
Contributor Author

daenny commented Mar 10, 2019

Same error. CancelledError.
I think it is some race condition, when I enable Debug logging it works more often than with Info logging.

@awarecan
Copy link
Contributor

awarecan commented Mar 10, 2019

I think the CancelledError might be caused by the hassio component not be able to finish setup in a long time. Maybe because your zwave network is starting up at the same time. zwave writes very verbose debug log, which will slow down the already struggled system more.

@daenny
Copy link
Contributor Author

daenny commented Mar 10, 2019

I pinged you on discord, my zwave network is very small, only 4 nodes. On a successful start I see a lot of these messages:

2019-03-10 22:02:48 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:48 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:49 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:49 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:49 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:49 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:50 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:50 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:51 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:51 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:52 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:52 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:53 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:53 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:53 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:53 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:54 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:54 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:55 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:55 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:56 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:56 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:56 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:56 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:57 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:57 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:58 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:58 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:02:59 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:02:59 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)
2019-03-10 22:03:00 INFO (MainThread) [homeassistant.components.http.view] Serving /api/ to 172.30.32.2 (auth: True)
2019-03-10 22:03:00 INFO (MainThread) [homeassistant.components.http.view] Serving /api/hassio_auth to 172.30.32.2 (auth: True)

The IP is of the supervisor. They are multiple per seconds.

@daenny
Copy link
Contributor Author

daenny commented Mar 10, 2019

Behaviour is better for me, when I disable the plugins of node-red and zigbee2mqtt. I start them with an automation 60s after HA started and stop them on shutdown. It still does not always work but at least success rate of restarts is now at ca 50%:

- id: addon-restart
  alias: "Addons (re)-start"
  initial_state: True
  trigger:
    platform: homeassistant
    event: start
  action:
  - delay:
      seconds: 60
  - service: hassio.addon_restart
    data:
      addon: a0d7b954_nodered
  - service: hassio.addon_restart
    data:
      addon: 7ad98f9c_zigbee2mqtt
- id: addon-stop
  alias: "Addons stop"
  initial_state: True
  trigger:
    platform: homeassistant
    event: shutdown
  action:
  - service: hassio.addon_stop
    data:
      addon: a0d7b954_nodered
  - service: hassio.addon_stop
    data:
      addon: 7ad98f9c_zigbee2mqtt

@awarecan
Copy link
Contributor

@daenny do you have hassio: in your configuration.yaml? Could you add it and try again?

@awarecan
Copy link
Contributor

awarecan commented Mar 11, 2019

Alright, I think I found the root cause.

There are several issues actually, but in short, put hassio: into configuration.yaml will allow HA load and set up hassio component during the earlier phase of system start up.

hassio has a feature to allow user don't put hassio: in the configuration.yaml, but be discovered after system startup. However if there are too many tasks concurrently running (lots of sensor polling in your case), the hassio set up might be blocked then failed later. (EDIT: I think the root cause should be the race condition in miflora lib caused deadlock of all then threads in your thread pool, see my finding in #21925)

Another way is to increase the size of thread pool. You should have 10 threads in your HA's task execution pool. I guess you will get rid of the problem by increase max_works, say 20, e.g. change
https://github.com/home-assistant/home-assistant/blob/4e7dbf9ce5e5e2eb471fb289048a17ee0d7ccc02/homeassistant/core.py#L134
to

       executor_opts = {'max_workers': 20}  # type: Dict[str, Any]

@awarecan
Copy link
Contributor

You can also disable the problematic Miflora sensor.

@daenny
Copy link
Contributor Author

daenny commented Mar 12, 2019

Thanks for you deep insights. I am currently away from my system until the weekend. It's currently running, so I will keep it that way until I am back. I will when I am back with your settings, i.e. adding hassio: and removing the miflora sensors.
I was planning to remove the miflora sensors from HA and use a esp32 with esphome for those, as I they are barely in the bluetooth range of my server. So now, I have another reason to do this.

EDIT: And as I have a dual-core CPU I have 10 threads, as I am using 2 sensors, those indeed will completely exhaust all available threads...

@awarecan
Copy link
Contributor

awarecan commented Mar 19, 2019

@daenny can you try to apply my fix in #22149

It is a one line core change, I want to get some validation before merge it.

You can ignore all those test change when you apply it.

@daenny
Copy link
Contributor Author

daenny commented Mar 19, 2019

No problem. I will have a look tonight.

@awarecan
Copy link
Contributor

Just ignore it, my PR has a problem, need redo.

@daenny
Copy link
Contributor Author

daenny commented Mar 25, 2019

Ok. Adding ’hassio:’ to the config resolves the problem of not booting up. With that I can reliable restart even with zigbee2mqtt and node red enabled. I also updated to v0.90.1.
Thanks again for all your help. The miflora problem remains. So I will close this issue as the other one is the root cause.

@daenny daenny closed this as completed Mar 25, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants