Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IKEA double battery percentage for old/new firmware #2948

Merged

Conversation

TheJulianJES
Copy link
Collaborator

@TheJulianJES TheJulianJES commented Jan 29, 2024

Proposed change

This fixes the issue of some battery-powered IKEA devices having double the percentage reported.
The "double battery pct" cluster is still used on some devices which can have new or old firmware (where doubling may or may not be needed depending on the installed firmware version).

The fix would be simple if ZHA/zigpy cached the sw_build_id attribute when pairing/reconfiguring a device.
But that doesn't happen. So, we need to read the attribute once to cache it while the device is awake.
(That would also be required if we want to use sw_build_id with quirks v2, because we can't magically read an attribute of battery-powered devices that are already paired and asleep.)

The diff might look huge at first (but ignore the tests) and the fix is mostly somewhat simple actually.
-> Start by looking at _update_attribute()

The "fix"

The "fix" works as follows:

  • a battery percentage report comes in from the device
    • check sw_build_id attribute value from cache like:
      • if the value is known (e.g. 24.0.0), we split on . and check the first part, here: 24
      • when the value is 24 or greater, we do not double the percentage.
      • when the value is below 24, we double the percentage.
      • if the value is not known, double the percentage AND start a task to try and read sw_build_id in the background
  • if a task was started because of sw_build_id being unknown AND the read is successful:
    • and if the task returned that new firmware is installed (>= 24), we update the attribute cache again to correct the wrong doubled value and replace it with the correct one
    • and if the task returned that old firmware is installed, we do not do anything, as the value is already correct because we doubled it by default

Note: If the attribute read for sw_build_id is successful (or it's already known), no (further) read calls will ever be made.

EDIT: For (new) pairings or reconfigurations, sw_build_id is now also read on pairing/bind().

Additional information

Note part 1 / TODO:

I did not yet test if an IKEA device responds to attribute reads when it sends an attribute report with the remaining battery percentage. I assume it would, as it should be "awake" at that point, but I still have to test this.
(So far, I only wrote the unit tests to test this behavior.)

If, for some reason, it does not wake up to receive message in that case, I'd move the logic to read the sw_build_id attribute once (until it's in cache) to happen when a button is pressed (for a remote).
Because then, a remote is definitely ready to receive commands for a short period.

Seems to work as expected. A button push during longer inactivity of the remote also triggers an attribute report, thus causing us to read sw_build_id while the device is awake.

Note part 2:

Or, if we do not want this "read schedule logic" at all (for now), we can also just check the sw_build_id from attribute cache and if it's unknown, not schedule a read call.

This might cause weird behavior though if a user who previously manually read sw_build_id and has correct percentages then removes and re-pairs the device, only now to have doubled percentages (because sw_build_id isn't in the attribute cache).
(That's fixed by scheduling the read at that point.)

Note part 3 (EDIT):

We're now also reading sw_build_id during pairing/reconfiguration, so during bind().
This will not automatically fix the issue for existing users who currently have doubled percentages, but it will fix it if the device is paired again or reconfigured.
I'm hoping that the "task to schedule a read" will still work to fix existing installations automatically.
If it doesn't work, the "making users reconfigure their device to read sw_build_id and fix the issue"-approach is also fine I guess.

(I'm waiting for an attribute report from the IKEA remote for now to see if it also receives commands for a short period then.)

Initial approach

My initial approach did not re-update the attribute cache after a successful attribute read if the initial doubling was wrong.
The successful attribute read only updated the attribute cache, so only next time the device sends something, the percentage would be right. You can still see this approach in my initial commit.


cc @dmulcahey just to know if this approach/PR would be somewhat okay. I know it might not be ideal to have to read attributes in the background, but it's only done once (if successful) and we need the sw_build_id sooner or later anyway.

A similar-ish approach with the background tasks is used for some Tuya devices and EU Xiaomi plugs (although not for firmware checking).

Checklist

  • The changes are tested and work correctly
  • pre-commit checks pass / the code has been formatted using Black
  • Tests have been added to verify that the new code works

We always double the battery pct first if the firmware is unknown (for devices which have the doubling clusters, very new IKEA devices do not at all).
But if the firmware is unknown, we cause a read to be sent (as the device is hopefully awake).
If that read now returns a new firmware, we "undouble" the battery pct again, so it's correct.

With the previous commit, it would only be correct if the device sends another update.
With this commit, we have a short flash of the doubled battery pct, then go to the correct value IF the device responded to our read.
@TheJulianJES TheJulianJES added the bugfix This PR fixes a bug label Jan 29, 2024
Copy link

codecov bot commented Jan 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (e40f71a) 87.64% compared to head (b9bacbe) 87.67%.
Report is 1 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #2948      +/-   ##
==========================================
+ Coverage   87.64%   87.67%   +0.03%     
==========================================
  Files         297      297              
  Lines        9055     9079      +24     
==========================================
+ Hits         7936     7960      +24     
  Misses       1119     1119              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@TheJulianJES
Copy link
Collaborator Author

TheJulianJES commented Jan 29, 2024

"New": sw_build_id is now also read during bind(), no matter if it's already cached or not.
(we do it from the PowerConfig cluster, but that's fine. The Basic cluster isn't bound by ZHA and we also kind of only want to read this for device where we need it which exactly match where this power cluster class is used.)

The device is very likely to be awake when bind() is called and it also reads the attribute on (first) pairing.
For existing pairing, the "task approach" would still be needed. But if that somehow doesn't work out, we can at least keep the bind() part to read the attribute once and tell users to repair/reconfigure affected IKEA devices.

This is for reading the attribute on a new join/repairing/reconfiguration, or refreshing the attribute in the two later cases.
The `sw_build_id` attribute is not updated after zigpy performed an OTA update. It will still be in cache, so our existing code won't update it in that case, only a reconfiguration will with this code.
@MattWestb
Copy link
Contributor

If the controller is on current production firmware its Zigbee3 and have / using pull control.
Way not using the checkin the controller is doing to the coordinator and hocking it for reading the attribute. ZHA is getting it and putting the device in fast pull until sending fast pull pull end or timing out (from the time set by the system in the device). IKEA controllers is doing checkin every 50 - 55 minutes and is logged in HA so its working well.
Z2M have implanted some of this logic but i dont knowing if its working 100 % fir all things.
Its one of the best Zigbee 3 futures for end devices and shall working OK if implanting it well.
Also possible sending fast pull to the device for extending the time then doing "other things" (IKEA Dirigera is doing it then pairin devices).

@TheJulianJES TheJulianJES marked this pull request as ready for review January 29, 2024 21:55
@TheJulianJES
Copy link
Collaborator Author

TheJulianJES commented Jan 29, 2024

This seems to be working nicely.
So, this should automatically fix any issues for double percentages reported on newer firmware. No need to re-pair/reconfigure the device.

@TheJulianJES
Copy link
Collaborator Author

I've added another small commit to guard against possible future version formatting where sw_build_id might contain letters (we only expect numbers).
The firmware is handled as "new" in that case (so no double percentage), and a warning also gets printed.

Again, this should be good to go and if nobody has anything against this, I'll merge it tomorrow and release/bump quirks for the next HA beta.

@MattWestb
Copy link
Contributor

I can normally testing 90% of IKEA controllers with current production firmware (and reversing to old if needed with SWD flashing if very needed) but im 810 Km from my systems so cant doing testing and im going back in one week if all is going well.

Great work done !!

@ProfDrYoMan
Copy link

Just a side note: This version is running on my side since the release and still I see double battery.
I also tried to query that sw_build_id manually and also tried to rebind the device(s) (not deleting them before).

Still same issue. Will wait for some more time and if it does not change will report in a new ticket with logs.
What type of loggings should be done and included?

@TheJulianJES
Copy link
Collaborator Author

Were you able to read sw_build_id successfully? (Remotes need to be waken up by pressing some buttons whilst trying to read the attribute).

Please open a new issue and provide the diagnostic information for the affected devices.
Tag me in that issue (@TheJulianJES).

@ProfDrYoMan
Copy link

Yes. I could read with success with the help of a press on the button(s).

And rebinding is anyhow completely waking the device.

Since you are so quick answering I will try to get you the diagnostics this evening. :)

@TheJulianJES
Copy link
Collaborator Author

Nice. New issue would indeed be good for that, but do you roughly remember the version that it showed off the top of your head? (e.g. 2.3.075 or 24.4.5)

And which device is affected?

lgraf pushed a commit to lgraf/zha-device-handlers that referenced this pull request May 6, 2024
We always double the battery pct first if the firmware is unknown (for devices which have the doubling clusters, very new IKEA devices do not at all).
But if the firmware is unknown, we cause a read to be sent (as the device is hopefully awake).
If that read now returns a new firmware, we "undouble" the battery pct again, so it's correct.

We also always read `sw_build_id` on `bind()`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugfix This PR fixes a bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants