Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error in lutron.py -- Hass.io loses ability to control LutronRA2 entities #20348

Closed
grantalewis opened this issue Jan 23, 2019 · 145 comments · Fixed by #26815
Closed

Error in lutron.py -- Hass.io loses ability to control LutronRA2 entities #20348

grantalewis opened this issue Jan 23, 2019 · 145 comments · Fixed by #26815

Comments

@grantalewis
Copy link

Home Assistant release with the issue:
0.85.1

Last working Home Assistant release (if known):
0.84.6

Operating environment (Hass.io/Docker/Windows/etc.):
RPi3, Hass.io

Component/platform:
https://www.home-assistant.io/components/lutron/

Description of problem:

Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/service.py", line 287, in _handle_service_platform_call
    await getattr(entity, func)(**data)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/light/lutron.py", line 73, in turn_off
    self._lutron_device.level = 0
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 591, in level
    Output._ACTION_ZONE_LEVEL, "%.2f" % new_level)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 392, in send
    self._conn.send(op + out_cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 91, in send
    self._send_locked(cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 81, in _send_locked
    self._telnet.write(cmd.encode('ascii') + b'\r\n')
  File "/usr/local/lib/python3.6/telnetlib.py", line 290, in write
    self.sock.sendall(buffer)
TimeoutError: [Errno 110] Operation timed out

Problem-relevant configuration.yaml entries and (fill out even if it seems unimportant):

lutron:
  host: 192.168.1.225
  username: lutron
  password: integration

Traceback (if applicable):


Additional information:

@grantalewis
Copy link
Author

grantalewis commented Jan 26, 2019

Still occurring in 0.86.1

ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/service.py", line 287, in _handle_service_platform_call
    await getattr(entity, func)(**data)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/light/lutron.py", line 73, in turn_off
    self._lutron_device.level = 0
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 591, in level
    Output._ACTION_ZONE_LEVEL, "%.2f" % new_level)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 392, in send
    self._conn.send(op + out_cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 91, in send
    self._send_locked(cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 81, in _send_locked
    self._telnet.write(cmd.encode('ascii') + b'\r\n')
AttributeError: 'NoneType' object has no attribute 'write'

@cdheiser
Copy link
Contributor

This sounds like thecynic/pylutron#17

@cdheiser
Copy link
Contributor

I've created thecynic/pylutron#23 to patch the pylutron library and hopefully resolve this bug. Once that patch is accepted, we still need to wait for a new version to get published and update home-assistant to use the new version.

@grantalewis
Copy link
Author

Still seeing this in 0.87.1

Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/helpers/service.py", line 289, in _handle_service_platform_call
    await getattr(entity, func)(**data)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/light/lutron.py", line 73, in turn_off
    self._lutron_device.level = 0
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 591, in level
    Output._ACTION_ZONE_LEVEL, "%.2f" % new_level)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 392, in send
    self._conn.send(op + out_cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 91, in send
    self._send_locked(cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 81, in _send_locked
    self._telnet.write(cmd.encode('ascii') + b'\r\n')
AttributeError: 'NoneType' object has no attribute 'write'

(Apologies if my attempts to get this noticed are in the wrong place. I'm surprised this isn't getting much uptake. Is no one else is seeing this problem?)

@JonGilmore
Copy link
Contributor

I'll not seeing this error. Can you describe the behavior in more detail? Does it work at all? Does it stop working?

@grantalewis
Copy link
Author

I'll not seeing this error. Can you describe the behavior in more detail? Does it work at all? Does it stop working?

After rebooting things seem to be OK for a short period of time -- maybe as much as 30 minutes. Then on/off commands from Hass.io start failing. There's a second or two of hesitation when clicking a light toggle, no result, and then the toggle returns to its prior position. Checking the logs shows the above error.

@cdheiser
Copy link
Contributor

Do you reprogram or make other changes to your RadioRA setup at all? I've only experienced the behavior you describe when updating the programming in the main repeater, or otherwise suffer some odd network problem.

I know it's annoying, and I'd love to get this fixed, but right now it's at the mercy of the pylutron maintainer. We could consider forking the library, or asking the maintainer if they would consider adding additional people to accept/reject pull requests.

@grantalewis
Copy link
Author

@cdheiser I really do appreciate the feedback and willingness to help. Yes, I make changes to my setup fairly frequently, but before now it seemed that cold-restarting everything usually resolved any issues. This problem does seem different.

Yesterday I did a completely new setup of HA 0.87.1 on a newly formatted SD card. It ran well for a few hours, but by bedtime, toggling lights was beginning to flake out again. Then by this morning the toggles were completely unresponsive. Slightly different symptom: the toggle would stay in the ON position, but the Lutron entity was unaffected.

The logs show this familiar info:

2019-02-17 06:53:07 ERROR (MainThread) [homeassistant.components.websocket_api.http.connection.1875282512] Error handling message: {'type': 'call_service', 'domain': 'light', 'service': 'turn_on', 'service_data': {'entity_id': 'light.main_stairs_main_hall'}, 'id': 16}
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/websocket_api/decorators.py", line 17, in _handle_async_response
    await func(hass, connection, msg)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/websocket_api/commands.py", line 148, in handle_call_service
    connection.context(msg))
  File "/usr/local/lib/python3.6/site-packages/homeassistant/core.py", line 1130, in async_call
    self._execute_service(handler, service_call))
  File "/usr/local/lib/python3.6/site-packages/homeassistant/core.py", line 1152, in _execute_service
    await handler.func(service_call)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/light/__init__.py", line 287, in async_handle_light_on_service
    await light.async_turn_on(**pars)
  File "/usr/local/lib/python3.6/concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/local/lib/python3.6/site-packages/homeassistant/components/light/lutron.py", line 69, in turn_on
    self._lutron_device.level = to_lutron_level(brightness)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 591, in level
    Output._ACTION_ZONE_LEVEL, "%.2f" % new_level)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 392, in send
    self._conn.send(op + out_cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 91, in send
    self._send_locked(cmd)
  File "/usr/local/lib/python3.6/site-packages/pylutron/__init__.py", line 81, in _send_locked
    self._telnet.write(cmd.encode('ascii') + b'\r\n')
  File "/usr/local/lib/python3.6/telnetlib.py", line 290, in write
    self.sock.sendall(buffer)
TimeoutError: [Errno 110] Operation timed out

I'll just revert to 0.84.6 and wait it out. That version is pretty much rock-solid for me.

Again, my thanks.

@ToddNJ
Copy link

ToddNJ commented Feb 28, 2019

I had this same problem. Works for me in 0.86.4, failed when I updated to 0.88.1, I rolled back.
I'm running HASS.IO in docker on Ubuntu 18.0.4 on a Laptop.
Lutron Bridge is model L-BDG2

Error Log Entry:

2019-02-27 20:02:14 ERROR (MainThread) [homeassistant.core] Error doing job: Task exception was never retrieved
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/site-packages/homeassistant/helpers/service.py", line 289, in _handle_service_platform_call
    await getattr(entity, func)(**data)
  File "/usr/local/lib/python3.7/site-packages/homeassistant/components/lutron_caseta/switch.py", line 33, in async_turn_on
    self._smartbridge.turn_on(self._device_id)
  File "/usr/local/lib/python3.7/site-packages/pylutron_caseta/smartbridge.py", line 190, in turn_on
    return self.set_value(device_id, 100)
  File "/usr/local/lib/python3.7/site-packages/pylutron_caseta/smartbridge.py", line 182, in set_value
    return self._writer.write(cmd)
AttributeError: 'NoneType' object has no attribute 'write'

@stale
Copy link

stale bot commented Jul 7, 2019

There hasn't been any activity on this issue recently. Due to the high number of incoming GitHub notifications, we have to clean some of the old issues, as many of them have already been resolved with the latest updates.
Please make sure to update to the latest Home Assistant version and check if that solves the issue. Let us know if that works for you by adding a comment 👍
This issue now has been marked as stale and will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 7, 2019
@grantalewis
Copy link
Author

This is still an unresolved issue in 95.4.

@stale stale bot removed the stale label Jul 8, 2019
@JonGilmore
Copy link
Contributor

@ToddNJ your issue looks different than this one, appears your using caseta? @grantalewis is using RadioRa2, and a different library altogether. I'd suggest you open another issue for yours.

@grantalewis can you describe your setup a bit? I'm afraid the troubleshooting that @cdheiser is doing may possibly provide a workaround for your issue, but I don't think it addresses the root cause: why is your HASS instance losing connection to your Ra2 main receiver?

I believe we've chatted about this before on Reddit, but I'll add that my RadioRa2 install is working well with the latest home assistant release (still), so I think we need to diagnose the why.

@grantalewis
Copy link
Author

The core of my system is OmniPro II / Lutron RadioRA2. Nearly all of my light and fan controls are Lutron. I got my start with Home Assistant in 2017 (v0.7x-ish). I started on Hasbian, moved to Hass.io, and then have gone back/forth a couple of times since. I enjoyed a very smooth HA experience until January, 2019 with v0.85x at which time I had to revert to 0.84.6. I've been stuck on that version since January. (Despite its increasing age and ever-growing list of missing features, 0.84.6 still performs really well for me.)

I needed an environment that would let me experiment more and so moved Hass.io to VirtualBox earlier this year (Ubuntu 64). That didn't provide me with a solution to my Lutron problem, but it at least gave me a way to test new versions while still having an easy road back to my stable 0.84.6.

I also keep an RPi3 updated with the latest HA release in case a solution suddenly becomes available. (I hope to eventually be able to use Hass.io on RPi3 as my day-to-day system.)

I've tried literally every release since 0.84.6, and they all exhibit exactly the same issue. After rebooting, the system seems to be fine for the first hour or two and then I'll notice that the UI begins losing the ability to control entities. This video demonstrates the problem ( <-- Dropbox link to MP4 video). When the issue begins happening, I can control an entity 1 more time using its toggle. The entity will reflect the change, but the toggle will revert to its prior state as if somehow indicating that the change was unsuccessful. At that point, the entity will not respond to any additional changes from the UI. The only way to regain control is to reboot.

Sorry if this is long-winded but I didn't want to leave anything out. I'm happy to help in any way I'm able, so please feel free to call on me for testing, more information, etc.

@JonGilmore
Copy link
Contributor

Thats a good explanation, I appreciate it. The video helps me understand the behavior, which is different than I was understanding from your previous stuff in writing. Going to continue our convo on discord to keep this issue clean, if we come up with anything, we can post it here.

@Tangston311
Copy link

I’m glad I found this because I’m experiencing exactly the same issue with HA and Ra2. At first everything seems to work just fine, but after an hour or so I experience the exact behavior seen in the video. My automations and service calls seems to still work just fine, but controlling through the front end becomes unusable.

Happy to do what I can to provide more info to help troubleshoot.

I’m on 93.1 using Hass.io.

@thecynic
Copy link
Contributor

@Tangston311 Interesting. So in your case the actual communication with the lutron controller itself is not affected? Just that frontend gets out of sync? If it's the latter than I'll need some help from someone who's more higher-level stack in HASS oriented to help figuring that part out.

@grantalewis Do you happen to have logs for when this happens? I'd also be curious to see if manual service calls work fine as well. Would be good to identify where in the stack the issue is creeping in.

@thecynic
Copy link
Contributor

@grantalewis Oh I see the stacks above. Ok, yes, my pylutron PR should help with that. The connection management previously was very optimistic with respect to how well things remain connected, and thus was not really robust. I wonder if your network is somewhat flaky. Maybe you have things on wifi and they drop, etc? Doesn't really matter, just curious as to why you see it so much more often.

@grantalewis
Copy link
Author

grantalewis commented Aug 14, 2019

@thecynic, @JonGilmore suggested debug logging via Discord DM, so I'll do that. Going with

logger:
   default: debug

for now but let me know if you want other options enabled, etc.

It seems that @Tangston311 and I are seeing the exact same symptoms. UI misbehaves but entities still respond to homeassistant.toggle, homeassistant.turn_on, homeassistant.turn_off, etc. via the Services panel.

Network: I suppose it could be network-related, but that doesn't account for rock-solid performance from 0.84.6. It's like night and day: fantastic performance with 0.84.6 and before; completely predictable failure with every version since.

@Tangston311
Copy link

@thecynic , @grantalewis is correct, I think we're both experiencing the same behavior. I have a few automatons that work by simply turning things on or off (regardless of current state) with "turn_on" or "turn_off" service calls, but controlling through the UI becomes inoperable. Additionally, when this happens, HA no longer knows the current state of a light, so if I attempt to create an automation based on the current state of an entity it won't work because it thinks the light is currently off when it's actually on, or vice versa.

Network issues could definitely be a factor for me: I noticed that sometimes the Ra2 Inclusive software couldn't find the main repeater, so I tried relocated the main repeater to a different location. Now the Ra2 software can always find the main repeater, and I noticed this lengthened the period of time that the HA UI would function: it went from bombing out after ~30 minutes to lasting most of the day, but inevitably it still exhibits the same behavior. I can't explain why because the main repeater was hard-wired in both locations, but it does point to some network instability that may be causing the issue.

@grantalewis
Copy link
Author

I don't mean to be stubborn, but unless 0.84.6 is just more tolerant of borderline network issues, I just don't think it's network related. I have no problems with 0.84.6 whatsoever -- not one, not ever.

@grantalewis
Copy link
Author

OK, let 0.97.2 run for a while today. As soon as I saw the problem occur, I grabbed the logs from 3 or so minutes prior. I deleted a lot of log entries that I'm fairly certain are not involved in the issue. The .txt file is attached. Thanks for taking a look.

log.txt

@JonGilmore
Copy link
Contributor

OK, let 0.97.2 run for a while today. As soon as I saw the problem occur, I grabbed the logs from 3 or so minutes prior. I deleted a lot of log entries that I'm fairly certain are not involved in the issue. The .txt file is attached. Thanks for taking a look.

Hm, this log doesn't include the "normal" stack trace. I'm curious, did it show up again this time? I'll go through it just in case, but I'm not sure this capture is going to be much help.

@grantalewis
Copy link
Author

grantalewis commented Aug 15, 2019

In my configuration.yaml:

logger:
   default: debug

As soon as I saw the problem occur I grabbed the logs from the prior 3 minutes. Maybe I need to back up further?

@cdheiser
Copy link
Contributor

cdheiser commented Sep 4, 2019 via email

cdheiser added a commit to thecynic/pylutron that referenced this issue Sep 4, 2019
Fix a logging format error in the keypad component to use %s instead of %d for the keypad name.

Also add a try/except around the main loop to catch and log exceptions with the lutron thread for future debugging.
@cdheiser
Copy link
Contributor

cdheiser commented Sep 4, 2019 via email

@cdheiser
Copy link
Contributor

cdheiser commented Sep 4, 2019 via email

@cdheiser
Copy link
Contributor

cdheiser commented Sep 4, 2019

Note once pylutron is updated, it will still require another push to Pypi and an update of the component dependencies.

@JonGilmore
Copy link
Contributor

I found the bug. Proposing an edit now.

.... wait - is this the cause of all of @grantalewis issues? Or just a logging bug that is hiding his issues?

@cdheiser
Copy link
Contributor

cdheiser commented Sep 5, 2019 via email

@JonGilmore
Copy link
Contributor

Interesting - I don't think it has anything to do with fans as I have 5 of them, and don't see these issues either. FWIW, I have picos, hybrid seetouch keypads, 10 key seetouch keypad, and dimmers/switches and have never seen this. Regardless, awesome find!

@grantalewis
Copy link
Author

grantalewis commented Sep 5, 2019

Pretty excited about this! Fingers crossed. I've got the new custom component installed on 0.98.3 and am going to let it run overnight. I also re-enabled my customary (non-mimimal) configuration.yaml, the OmniProBridge, and all automations, so I admit to throwing caution to the wind a bit. If things are still working in the AM, that'll be a very good sign.

I still have all of the loggers enabled in case something goes wrong.

@cdheiser
Copy link
Contributor

cdheiser commented Sep 5, 2019 via email

@grantalewis
Copy link
Author

Everything's still working this morning! I will of course keep tabs on it throughout the day, but the outlook seems positive.

@thecynic
Copy link
Contributor

thecynic commented Sep 5, 2019 via email

@grantalewis
Copy link
Author

Checking in to say that it's been smooth sailing all day long. Completely stoked to have this problem finally taken care of! Thanks to @cdheiser and @thecynic for your determination and patience.

I'd appreciate a recommendation on how long to leave the issue open. And also I'd appreciate your giving me a heads up on when to remove the custom component. (I know it's not yet, but if you happen to know the version # that will include the fixed pylutron, please let me know.)

Once again, my thanks. If you guys were in my neighborhood I'd buy you a beer. Or three.

@Tangston311
Copy link

This is awesome - just want to echo @grantalewis comments to say it’s amazing to see how dedicated all of you have been to solving this. This is why HA and this community are so incredible.

@cdheiser
Copy link
Contributor

cdheiser commented Sep 5, 2019 via email

@JonGilmore
Copy link
Contributor

Checking in to say that it's been smooth sailing all day long. Completely stoked to have this problem finally taken care of! Thanks to @cdheiser and @thecynic for your determination and patience.

I'd appreciate a recommendation on how long to leave the issue open. And also I'd appreciate your giving me a heads up on when to remove the custom component. (I know it's not yet, but if you happen to know the version # that will include the fixed pylutron, please let me know.)

Once again, my thanks. If you guys were in my neighborhood I'd buy you a beer. Or three.

WOOHHOOO! This is great news! I assume this is the longest you've gone without issues since 0.86.4? If so, do you think you can confidently say it's solved?

@grantalewis
Copy link
Author

Yes, this is definitely the longest I've gone without problems, and I am nearly certain the problem is solved. I couldn't say this to most people and expect them to understand, but I know you guys can relate: this is huge. It's like having a weight off my shoulders. I am a happy man!

thecynic pushed a commit to thecynic/pylutron that referenced this issue Sep 8, 2019
Fix a logging format error in the keypad component to use %s instead of %d for the keypad name.
thecynic pushed a commit to thecynic/pylutron that referenced this issue Sep 22, 2019
Fix a logging format error in the keypad component to use %s instead of %d for the keypad name.
@thecynic
Copy link
Contributor

I've pushed version 0.2.5 to pypi, will need to update HASS for new version and hopefully we can close this.

@Tangston311
Copy link

Thanks @thecynic ! Does this mean 99.2 contains the fix, or will it be the next version?

@thecynic
Copy link
Contributor

Thanks @thecynic ! Does this mean 99.2 contains the fix, or will it be the next version?

@Tangston311 Will be in the next version.

@amattas
Copy link
Contributor

amattas commented Jan 13, 2020

Still happening in Home Assistant 0.101.3

@grantalewis
Copy link
Author

grantalewis commented Jan 13, 2020

I'm running 0.103.6 and have not encountered the problem I first described since 9/5/2019.

@amattas I've seen something similar if my RA2 system loses power/connectivity. I suggest shutting down HA, restarting your Lutron system, waiting for ~5 mins, and then rebooting HA.

@amattas
Copy link
Contributor

amattas commented Jan 13, 2020 via email

@grantalewis
Copy link
Author

The initial issue was due to a typo:

Uncaught exception: %d format: a number is required, not str

That's been fixed. It appears you're after something else, so you might want to create a new thread.

@amattas
Copy link
Contributor

amattas commented Jan 13, 2020 via email

@grantalewis
Copy link
Author

I'm neither a Github guru nor a competent lutron.py programmer so I'm going to leave this in others' hands. I'll just say that from my perspective the initial problem mentioned in this thread is resolved and is unrelated to your #29642. I'm guessing you're likely to get more traction in that other thread.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants