Redis cluster doesn't reconnect when node returns online #37348
Comments
/cc @cescoffier (redis), @gsmet (redis), @machi1990 (redis)
If I'm interpreting this RedisCacheImpl code correctly, then when the Vert.x Redis client throws a ConnectionException, the RedisCache in Quarkus should retrieve the non-cached value from the value loader. So maybe Quarkus is already supposed to do the second part of my suggested implementation, i.e. retrieve the cat fact by going out to the external service when the cache fails. Line 235 in 670b43c
@Ladicek Looking at https://vertx.io/docs/vertx-redis-client/java/#_implementing_reconnect_on_error: when a failure happens, we must re-create the client (and the pool). If so, we would need a facade that handles that. I'm worried that an error on one connection requires completely recreating the client and pools, even though the other connections may still be fine. WDYT?
The example code in Vert.x Redis client documentation is pretty naive. There's a good reason for that: failure detection is hard. However, I believe that the simple failure mode (Redis has gone) should be handled transparently by the connection pool. If a connection fails, it should be evicted from the pool, and a new one should be added, which should effectively implement reconnection. I need to check why it doesn't work like that. EDIT: of course, what I'm suggesting leads to propagating errors to user code. I don't see an issue with that.
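To make the eviction idea concrete, here is a minimal sketch in plain Java. It is not the Vert.x Redis client's pool; `EvictingPool`, `Op` and `withConnection` are hypothetical names invented for illustration. A connection goes back to the idle queue only after a successful operation. A failed one is simply dropped, so the next caller gets a fresh connection, and the error still propagates to user code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

/** Hypothetical sketch of a pool that evicts failed connections; not Vert.x code. */
final class EvictingPool<C> {
    /** An operation run against a pooled connection. */
    interface Op<C, R> {
        R apply(C conn);
    }

    private final Deque<C> idle = new ArrayDeque<>();
    private final Supplier<C> factory; // creates a brand-new connection

    EvictingPool(Supplier<C> factory) {
        this.factory = factory;
    }

    private synchronized C acquire() {
        C conn = idle.poll();
        return conn != null ? conn : factory.get();
    }

    private synchronized void release(C conn) {
        idle.push(conn);
    }

    <R> R withConnection(Op<C, R> op) {
        C conn = acquire();
        // If op throws, release() is never reached: the broken connection is
        // evicted by omission, and the exception propagates to the caller.
        R result = op.apply(conn);
        release(conn); // healthy connection: keep it for reuse
        return result;
    }
}
```

With this shape, a single failed connection costs one failed request and one new connection; the rest of the pool is untouched.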
Heh, this is funny. It actually works exactly as I expect (in my previous comment) when using a standalone Redis connection. When I configure a cluster connection, it falls apart:
The 1st issue mentioned above is easy to solve. There are 2 basic ways to do it, either on the Quarkus side (detect [...]
The 2nd issue took me a while to figure out. When the Redis client connects to a cluster, it first obtains the hash slot assignment. To prevent overloading the first node in the list, the hash slot assignment is cached for a brief period of time (1 second by default). However, we only set up a timer to expire that miniature cache when we obtain the hash slot assignment successfully. If the [...]
I'll submit PRs to Vert.x Redis client in a bit.
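The caching problem described for the 2nd issue can be modelled in a few lines of plain Java. This is a hedged sketch, not the Vert.x Redis client's code: `SlotCache`, the `expireFailures` flag and the injected clock are all assumptions made for illustration. With `expireFailures = false` a failed lookup is never expired, so the failure is served forever; with `true`, failures expire like any other entry and the next request after the TTL retries.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Hypothetical model of a short-lived hash slot cache; not Vert.x code. */
final class SlotCache<T> {
    private final Supplier<CompletableFuture<T>> loader; // would ask a node for the slot assignment
    private final LongSupplier clockMillis;              // injected so the demo is deterministic
    private final long ttlMillis;
    private final boolean expireFailures;                // false reproduces the bug

    private CompletableFuture<T> cached;
    private long loadedAt;

    SlotCache(Supplier<CompletableFuture<T>> loader, LongSupplier clockMillis,
              long ttlMillis, boolean expireFailures) {
        this.loader = loader;
        this.clockMillis = clockMillis;
        this.ttlMillis = ttlMillis;
        this.expireFailures = expireFailures;
    }

    synchronized CompletableFuture<T> get() {
        long now = clockMillis.getAsLong();
        boolean expired = cached != null && now - loadedAt >= ttlMillis;
        boolean failed = cached != null && cached.isCompletedExceptionally();
        // Buggy variant: expiry only applies to successful results.
        if (cached == null || (expired && (expireFailures || !failed))) {
            cached = loader.get();
            loadedAt = now;
        }
        return cached;
    }

    /** Simulates: cluster down, one failed lookup, cluster back, lookup after the TTL. */
    static boolean recoversAfterOutage(boolean expireFailures) {
        long[] now = {0};
        boolean[] clusterUp = {false};
        SlotCache<String> cache = new SlotCache<String>(() -> {
            if (clusterUp[0]) {
                return CompletableFuture.completedFuture("slots");
            }
            return CompletableFuture.failedFuture(new IllegalStateException("cluster down"));
        }, () -> now[0], 1000, expireFailures);

        cache.get();          // fails while the cluster is down
        clusterUp[0] = true;  // cluster comes back
        now[0] = 5000;        // well past the 1 second TTL
        return !cache.get().isCompletedExceptionally();
    }

    public static void main(String[] args) {
        System.out.println("buggy recovers: " + recoversAfterOutage(false));
        System.out.println("fixed recovers: " + recoversAfterOutage(true));
    }
}
```

The real client uses a timer rather than a timestamp check, but the failure mode is the same: if nothing ever invalidates a failed lookup, restarting the cluster cannot help.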
I ended up amending my existing PRs that were not merged yet, because it's essentially the same area of improvements:
Awesome! Thanks @Ladicek
The issues here were fixed in Vert.x Redis client 4.5.1. Quarkus updated to Vert.x 4.5.1 in 3.7.0 (see #38034), hence closing this.
Description
Creating issue per #37041 (comment)
There is an issue with reconnects. Take the starter code from issue #37041. If you shut down the Redis cluster, issue a request that tries to connect to Redis (which fails), and then restart the cluster, subsequent requests don't reconnect to Redis.
The Vert.x Redis client purposely doesn't implement client reconnects, so Quarkus should probably do that.
Reproduction steps:
2023-11-17 22:27:36,551 ERROR [io.qua.ver.htt.run.QuarkusErrorHandler] (vert.x-eventloop-thread-1) HTTP Request to /cat-fact failed, error id: 736e1d37-7d84-43fb-8e37-2d4d34f4eda6-11: io.vertx.core.impl.NoStackTraceThrowable: Cannot connect to any of the provided endpoints
Implementation ideas
Solution:
Quarkus should automatically handle reconnecting to Redis if the Redis cluster has become unavailable during some requests. At a minimum, in this application, Quarkus should (in the absence of the cache) retrieve the cat fact by going out to the external service when the cache fails.
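A rough sketch of the minimum behaviour asked for here, in plain Java. It does not use the Quarkus cache API: `AsyncCache` and `getOrLoad` are hypothetical names, and the value loader stands in for the call to the external cat fact service. Any cache failure is treated like a cache miss, so the request is still answered from the value loader.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch of "fall back to the value loader when the cache fails". */
final class CacheFallback {
    /** Minimal stand-in for an asynchronous cache; not the Quarkus RedisCache interface. */
    interface AsyncCache<K, V> {
        CompletableFuture<V> get(K key);
    }

    static <K, V> CompletableFuture<V> getOrLoad(AsyncCache<K, V> cache, K key,
                                                 Function<K, V> valueLoader) {
        CompletableFuture<V> lookup;
        try {
            lookup = cache.get(key);
        } catch (RuntimeException e) {
            // The cache may also fail synchronously, e.g. while connecting.
            lookup = CompletableFuture.failedFuture(e);
        }
        return lookup
                .exceptionally(err -> null) // a failed lookup is treated as a miss
                .thenApply(hit -> hit != null ? hit : valueLoader.apply(key));
    }
}
```

This sketch does not write the loaded value back to the cache and does not attempt to reconnect; it only keeps the request from failing while Redis is unavailable.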