Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Total fuel burned not updating #45

Closed
3 of 10 tasks
svenlange2 opened this issue Nov 11, 2023 · 14 comments
Closed
3 of 10 tasks

Total fuel burned not updating #45

svenlange2 opened this issue Nov 11, 2023 · 14 comments
Labels
bug Something isn't working

Comments

@svenlange2
Copy link

Is there an existing issue for this?

  • I have searched the existing issues

I'm having the following issue:

Sicen last update the Total fuel runed has stopped changing on ecomax 920 device. I just ofund out that the graph im having on HA is flat. Restarting HS didnt help.

I have following devices connected:

  • ecoMAX 3xx series
  • ecoMAX 8xx series
  • ecoMAX 9xx series
  • Expansion module B
  • Expansion module C
  • ecoSTER 200/ecoSTER Touch
  • ecoLAMBDA
  • ecoNET 300

I'm connecting to my devices using:

Ethernet/WiFi to RS-485 converter

I'm seeing following log messages:

No response

My diagnostics data:

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@svenlange2 svenlange2 added the bug Something isn't working label Nov 11, 2023
@denpamusic
Copy link
Owner

denpamusic commented Nov 11, 2023

Hi,

Thank you for the feedback!

That's weird, since this sensor is not calculated by EM device. Instead it's platform independent and is calculated by the PyPlumIO and the integration, so if it runs for me, it should run for you. There was no change to this logic since forever...

These are steps that can be taken to troubleshoot the issue:

  • Does your fuel consumption sensor (in kg/h) work? Total fuel burned calculator is based on it's value.
  • Is there messages like "Skipping outdated fuel consumption data, was (number) seconds old" in your log?
  • When boiler is running, download integration diagnostics data a look for a fuel_burned key. It should have non-zero value.
  • Visit Developer Tools -> Statistics, search for "fuel_total_burned" sensor and ensure that there's no problems reported.
  • (optional) Use plum_ecomax.calibrate_meter service in Developer Tools -> Services to manually set a value for the total_fuel_burned sensor. Value should be correctly set and no errors reported.

This will help you to pin the issue to specific component. Then we can decide on how to resolve it.

@svenlange2
Copy link
Author

Good that you mentioned its calculated in the plugin. And now as i hasve reset the HA I see it started to udate again. Problably something was not working between the restarts (it was ~2 weeks). The fuel consumption data that i derived from fuel level variable was updating as normal in between the restarts. I did not monitor fuel-burnt - so i dont know how that behaved. Thank you for the insight anyway. As it works now - case closed :)

@denpamusic
Copy link
Owner

denpamusic commented Nov 11, 2023

Thanks for heads up!

Great thinking with deriving burned fuel from fuel level btw. I was thinking about doing this, but ultimately decided against it, since it would require asking users how many kilos of fuel is 100%. Instead it is calculated by adding (current_fuel_consumption * seconds_passed_since_last_received_consumption_data) / 3600 to the total counter. Not very accurate method, I presume, since it deals with floats, but doesn't require any input from the user.

@svenlange2
Copy link
Author

It is broken again. It seems to be related to random disconnects of the rs485 dongle or reboots of the ecomax920 device itself. I dont know which as I only find out much later that the value has stopped updating. I know that i did power on/off cycle once for the device and now 5 days later i see that the value has been flat for 5 days on graph. I suggest to remove any checks on the logic calculating the value and take derivative in as fuel furn rate is detected. Last winter the deivce worked flawlessly for 5 months without any of such problems and i did numerous power cycles.

@denpamusic
Copy link
Owner

denpamusic commented Nov 16, 2023

Strange thing is there isn't any checks done by the integration. We simply extend RestoreSensor from HomeAssistant core with one function that adds value and nothing else:

self._attr_native_value += value

I believe the issue is probably introduced by one of the new HASS releases. There was no changes to fuel logic for well over the year in this integration or PyPlumIO.

I'll try to find an issue with RestoreSensor, but sadly chances are low, since it never happens to me even after power outages :(

edit:
It seems I was wrong, last changes to fuel logic were on February 12, so it less than a year ago. These changes were limited to converting time to nanoseconds by multiplying everything by 1000000000 to avoid float calculations as much as possible.

One other suspect is actually aggregate filter, that is used to collect data for thirty seconds before passing it to HASS...

@denpamusic denpamusic reopened this Nov 16, 2023
@denpamusic
Copy link
Owner

Could you please still provide diagnostics data when the sensor is broken. It will greatly help to eliminate at least some possibilities. Thanks beforehand!

@svenlange2
Copy link
Author

What kind of diagnostics data ? HASS log or something else ? (I need to catch it when its broken - i dont have it now as I restarted)

@denpamusic
Copy link
Owner

The diagnostics, obtained via HomeAssistant's built-in component. Please follow steps below to obtain it, once sensor is broken. You can attach it via comments to this issue, or email it to me via [email protected]. You can also open file with any text editor to review and remove any information you don't like me to see, although sensitive information like "UID" and device's IP address are already redacted.

  1. Open the integrations page in HASS and click the integration name "Plum ecoMAX".

Screenshot showing one element of integration panel. Element is named "Plum ecoMAX"

  1. Click three dots in the upper right corner and select "Download diagnostics".

Modal menu, showing options for "Plum ecoMAX" integration. "Download diagnostics" is highlighted

Thanks again for you help!
Hopefully we will be able to figure out what's going on with this sensor.

@svenlange2
Copy link
Author

config_entry-plum_ecomax-6b4fb62816ce81405437142431183f73.json.txt

This is the current state. I dont know if it makes sense to upload/send it again when it becomes broken ?

@denpamusic
Copy link
Owner

denpamusic commented Nov 16, 2023

Thank you!

Please do upload it again when it becomes broken.

See those values from your diagnostics below. fuel_consumption is your current boiler fuel consumption in kg/h, fuel_burned is difference counter. It contains the amount of fuel in kilograms since PyPlumIO last received fuel_consumption value (this frame is being sent about once per second or so). As you can see, your boiler managed to burn about 1.3 grams of fuel during this period. These tiny value are being aggregated for 30 seconds and then sent to HASS, hence Total Burned Fuel counter in HASS updates once every 30 seconds.

If Total Burned Fuel in HASS breaks and we see 0 or null for fuel_burned in diagnostic data, we'll know for sure that the issue is with PyPlumIO or EM controller and not with integration or HASS itself. So we eliminate half of possible failure points just from this.

{
      "fuel_burned": 0.0013145052387522774,
      "fuel_consumption": 1.7245032787322998,
}

denpamusic added a commit that referenced this issue Jan 5, 2024
Ensure, that connection is properly closed when retrying via async_setup_entry call, otherwise connection task gets cacelled by the HASS with timeout, which in turn breaks some stuff (notably burned fuel counter).

Related to: #45
@denpamusic
Copy link
Owner

denpamusic commented Jan 5, 2024

I've probably found a root cause for this issue by accident during power outage...

When HomeAssistant (HASS) starts up and loads the integration, it calls async_setup_entry(), which tries to setup the integration and establish the connection. This mechanism retries async_setup_entry() call if previous one raises ConfigEntryNotReady exception. This behavior conflicts with PyPlumIO's own re-connection logic, which tries to reconnect every 20 seconds on any connection failure. As PyPlumIO doesn't know anything about HASS'es retry attempt, it keeps trying to reconnect every 20 seconds and becomes a stale global task, which gets cancelled by HASS leading to incomplete setup.

This only happens, if HASS fails to connect to the converter during initial integration setup (i. e. after HASS restart), if connection to the converter is lost at any time after the setup is done, the issue doesn't occur.

We'll fix it by closing PyPlumIO connection when raising ConfigEntryNotReady exception from async_setup_entry(). This way PyPlumIO wouldn't attempt to reconnect by itself after setup has failed, leaving this task to the HASS.

homeassistant.log
  File "/usr/local/lib/python3.11/site-packages/pyplumio/connection.py", line 74, in _connect
  reader, writer = await self._open_connection()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyplumio/helpers/timeout.py", line 21, in wrapper
  return await asyncio.wait_for(func(*args, **kwargs), timeout=seconds)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/tasks.py", line 489, in wait_for
  return fut.result()
         ^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyplumio/connection.py", line 163, in _open_connection
  return await asyncio.open_connection(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/streams.py", line 48, in open_connection
  transport, _ = await loop.create_connection(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1045, in create_connection
  infos = await self._ensure_resolved(
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 1419, in _ensure_resolved
  return await loop.getaddrinfo(host, port, family=family, type=type,
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/asyncio/base_events.py", line 867, in getaddrinfo
  return await self.run_in_executor(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
  result = self.fn(*self.args, **self.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/socket.py", line 962, in getaddrinfo
  for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
socket.gaierror: [Errno -2] Name does not resolve

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/pyplumio/connection.py", line 86, in _reconnect
  return await self._connect()
         ^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/pyplumio/connection.py", line 81, in _connect
  raise ConnectionFailedError from connection_error
pyplumio.exceptions.ConnectionFailedError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/usr/src/homeassistant/homeassistant/config_entries.py", line 406, in async_setup
  result = await component.async_setup_entry(hass, self)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/config/custom_components/plum_ecomax/__init__.py", line 79, in async_setup_entry
  await connection.async_setup()
File "/config/custom_components/plum_ecomax/connection.py", line 141, in async_setup
  await self.connect()
File "/usr/local/lib/python3.11/site-packages/pyplumio/connection.py", line 107, in connect
  await self._reconnect()
File "/usr/local/lib/python3.11/site-packages/pyplumio/connection.py", line 88, in _reconnect
  await self._connection_lost()
File "/usr/local/lib/python3.11/site-packages/pyplumio/connection.py", line 88, in _reconnect
  await self._connection_lost()
File "/usr/local/lib/python3.11/site-packages/pyplumio/connection.py", line 88, in _reconnect
  await self._connection_lost()
[Previous line repeated 8 more times]
File "/usr/local/lib/python3.11/asyncio/tasks.py", line 649, in sleep
  return await future
         ^^^^^^^^^^^^
asyncio.exceptions.CancelledError: Global task timeout

@svenlange2
Copy link
Author

I see this in log now:

2024-01-17 14:32:41.245 WARNING (MainThread) [pyplumio.devices.ecomax] Skipping outdated fuel consumption data, was 247675 seconds old

15 minute gap in RS485 dongle connection to the Ecomax920 device. The seconds value is big as it happened some days ago already.

@denpamusic
Copy link
Owner

denpamusic commented Jan 17, 2024

Hi,

Thanks for the feedback!

I've fixed this issue in 4e03ad2

Basically due to an error, integration creates multiple ecoMAX objects on reconnect, which blocks fuel counter from being updated correctly as all instances try to call same callback at the same time.

This fix will be released as part of v0.4.2.

edit.
There's no ETA on the release yet, since it will include a biggest change yet - adding a menu to create custom entities via UI: #58

If you don't want to wait, you can safely install and use "Main" branch it includes this fix, but requires at least HASS 2024.1.0.

@denpamusic
Copy link
Owner

denpamusic commented Feb 12, 2024

Hi again,

I found another issue with the fuel consumption tracker itself.

As I though, I just needed a bit of distraction from this project to find it...
It turns out it was really simple issue.
The 4e03ad2 partially fixed this, by initiating full reconnect when setting up the integration, however if integration could recover without reconnecting the issue can still occur.

Previously when the consumption tracker encountered outdated fuel consumption data, it skipped processing it, which is good, but it also skipped setting a new current timestamp, which means that it has marked any frames received after received the outdated data, as outdated as well.

Detailed example of the issue below:

current_timestamp = 0, frame_timestamp = 10, difference = 10 -> (processing fuel data, 10 seconds old)
current_timestamp = 10, frame_timestamp = 20, difference = 10 -> (processing fuel data, 10 seconds old)
current_timestamp = 20, frame_timestamp = 320, difference = 300 (outdated fuel data >= 300 seconds, skipping)
On the next frames the issue happens, due to not updating timestamp in previous step.
current_timestamp = 20 (should've been 320), frame_timestamp = 330, difference = 310 (fuel data incorrectly marked as outdated, despite it being only 10 seconds old)
current_timestamp = 20 (should've been 330), frame_timestamp = 340, difference = 320 (fuel data incorrectly marked as outdated, despite it still being only 10 seconds old)
and so on...

This has now been completely fixed in denpamusic/PyPlumIO@79970c3 with modification to the test case, so that it doesn't ever happen again.

I'll release v0.4.2 as minor that will incorporate this fix tomorrow, moving custom entities UI feature to v0.5.0.

Sorry for all the inconvenience caused, this was embarrassingly hard to track :(

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants