Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AttributeError: 'NoneType' object has no attribute 'close' #1345

Closed
rolette opened this issue May 15, 2020 · 12 comments
Closed

AttributeError: 'NoneType' object has no attribute 'close' #1345

rolette opened this issue May 15, 2020 · 12 comments

Comments

@rolette
Copy link

rolette commented May 15, 2020

Unfortunately, the fix for #732 doesn't correctly handle master node failover for multi-threaded code.

Traceback (most recent call last):
  # snip
  File "/usr/local/lib/python3.5/site-packages/redis/client.py", line 1870, in blpop
    return self.execute_command('BLPOP', *keys)
  File "/usr/local/lib/python3.5/site-packages/redis/client.py", line 884, in execute_command
    conn.send_command(*args)
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 721, in send_command
    check_health=kwargs.get('check_health', True))
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 692, in send_packed_command
    self.connect()
  File "/usr/local/lib/python3.5/site-packages/redis/sentinel.py", line 44, in connect
    self.connect_to(self.connection_pool.get_master_address())
  File "/usr/local/lib/python3.5/site-packages/redis/sentinel.py", line 107, in get_master_address
    self.disconnect()
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 1241, in disconnect
    connection.disconnect()
  File "/usr/local/lib/python3.5/site-packages/redis/connection.py", line 669, in disconnect
    self._sock.close()
AttributeError: 'NoneType' object has no attribute 'close'

Scenario is this:

  • multi-threaded process running, using a SentinelConnectionPool to get their separate connections to the Redis master
  • Redis master goes down, Sentinel does its job and elects a new master
  • Subsequent redis-py call ends up calling SentinelConnectionPool.get_master_address()
  • get_master_address() notices that there is a new master and calls self.disconnect() to flush the pool
  • ConnectionPool.disconnect() calls disconnect() on each of the connections in the pool.
  • Connection.disconnect() sees a matching PID and calls shutdown() on the socket, then sets it back to None
  • "Fun" ensues...

There is thread-locking to protect the pool's management of its members, but ConnectionPool.disconnect() is ripping sockets out from other threads in the middle of other operations.

The actual stack trace and error you'll get will vary depending on timing.

To fix it in my product code, I'm re-integrating the deferred disconnect using a generation id from PR #784.

Version: redis-py v3.4.1, redis/sentinel v3.2.13
Platform: Python 3.5.2 on Linux

@andymccurdy
Copy link
Contributor

andymccurdy commented May 19, 2020

I've been digging into this for a bit this afternoon. I didn't write redis-py's sentinel implementation so this is a good exercise for me to become more familiar with what's happening.

First, a question: Are you calling Sentinel.master_for() each time you need a connection to the master? Or do you call it once when your app starts and then use that object through the life of the application?

Here are some observations after reading the code.

  1. If the Sentinels elect a new master, it seems likely that the clients connected to the old master are going to have various connection problems. Even if they are still connected to the old master (e.g., a network partition occurred where clients can still see the old master but the sentinels cannot), allowing them to write to the old master would likely result in data loss. Given that, it seems like calling SentinelConnectionPool.disconnect() might be the correct thing to do, though perhaps we could do this more gracefully.

  2. However, it seems like the SentinelConnectionPool doesn't work well when a new master is elected. SentinelConnectionPool.master_address is only assigned in SentinelConnectionPool.reset() and SentinelConnectionPool.get_master_address(). reset() only gets called from SentinelConnectionPool.__init__(), and get_master_address() only assigns a value to master_address if the current value is None. So after a new master is elected, every subsequent call to SentinelConnectionPool.get_master_address() is going to fail in the master_address != self.master_address check and subsequently disconnect all the connections in the pool. This seems terrible, but easily fixable by simply assigning the new master_address in this step.

If we fix item 2, there will still be a hiccup when the new master is elected, but then all the clients should get connected to the new master and work correctly.

@rolette
Copy link
Author

rolette commented May 20, 2020

Hi Andy,

First, a question: Are you calling Sentinel.master_for() each time you need a connection to the master? Or do you call it once when your app starts and then use that object through the life of the application?

I'm just calling redis.hget() (for example), where the Redis instance was instantiated using a SentinelConnectionPool. The SentinelManagedConnection ends up doing a get_master_address() in its connect() method. Just stock redis-py behavior with sentinel.

  1. If the Sentinels elect a new master, it seems likely that the clients connected to the old master are going to have various connection problems.

Yep. Clients have to deal with various errors here (mostly timeout and connection errors). However, since execute_command() ends up calling get_master_address() each time instead of caching it, the connection automatically disconnects from the old master and connects to the new one. Occasionally you have to deal with a MasterNotFoundError, but in our product code, we have a wrapper that automatically retries the operation in that case.

Even if they are still connected to the old master (e.g., a network partition occurred where clients can still see the old master but the sentinels cannot), allowing them to write to the old master would likely result in data loss. Given that, it seems like calling SentinelConnectionPool.disconnect() might be the correct thing to do, though perhaps we could do this more gracefully.

I haven't found a good way to safely disconnect in-use connections in the pool other than using a generation id and reaping them when they are released back to the pool. Clients expect to deal with normal network and service outage errors, but they shouldn't have to be dealing with TypeError and AttributeError exceptions on a connection they own.

If it's helpful, I can post my patch that I'm using to fix the disconnect() problem. It's not materially different than the PR I submitted previously other than leaving out the fixes for forked processes, which your fixes seem to handle fine.

  1. However, it seems like the SentinelConnectionPool doesn't work well when a new master is elected...

Ah, I forgot PR #847 hasn't been merged into master yet. I have that merged into my tree. It fixes that problem :)

@andymccurdy
Copy link
Contributor

Merged #847 as that clearly needed to be fixed. If you want to post what you have for the generation ID stuff I'll take a look.

@rolette
Copy link
Author

rolette commented May 20, 2020

Great. Patch is attached, very simple.
redis_py_1345.patch.txt

@andymccurdy
Copy link
Contributor

@rolette I pushed the https://github.com/andymccurdy/redis-py/tree/sentinel-1345 branch that I believe fixes this. It uses a slightly different approach as your solution, but I believe it results in the same basic behavior of throwing a stale connection out when it's returned to the pool. It also fixes a bug where the base ConnectionPool class wasn't decrementing its _created_connections counter when it needs to throw connections away.

@rolette
Copy link
Author

rolette commented May 30, 2020

@andymccurdy Reviewed your changes and added my $0.02.

Thanks!

@andymccurdy
Copy link
Contributor

@rolette I added an optional inuse_connections=True arg to ConnectionPool.disconnect. By default it will work identical to before. Passing inuse_connections=False will only disconnect idle connections that aren't leased out by the pool.

When the SentinelConnectionPool detects the master address has changed, it will call self.disconnect(inuse_connections=False) to disconnect all the idle connections.

Have I missed anything?

@rolette
Copy link
Author

rolette commented Jun 1, 2020

@andymccurdy I'd make inuse_connections default to False since the old behavior would result in the same errors that you are fixing here. If they really want to force the disconnect and deal with the errors, then make that the exception case.

Have I missed anything?
I think the only feedback you haven't addressed is my comment about restoring the self.disconnect() in get_master_address() when a new master is detected. From my experience, it would be better to flush "available" connections in the pool right then rather than having to hit them individually as they are requested down the road.

@andymccurdy
Copy link
Contributor

@rolette I made inuse_connection=True the default to preserve backwards compatibility. I'm happy to change that behavior in the next major version.

I am calling self.disconnect(inuse_connections=False) in get_master_address() here: 811cdd0#diff-ea3ce4482d1ecb8a23d1cce2ec6e9b48R113

@rolette
Copy link
Author

rolette commented Jun 1, 2020

@andymccurdy Ok, that looks good. Appreciate the fixes!

@andymccurdy
Copy link
Contributor

Great, I'll merge this and release 3.5.3

@andymccurdy
Copy link
Contributor

Just released 3.5.3 with this fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants