kv: more actively refresh range descriptor cache when nodes crash or are removed abruptly #68225
Hello, I am Blathers. I am here to help you get the issue triaged. I have CC'd a few people who may be able to assist you:
If we have not gotten back to your issue within a few business days, you can try the following:
🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
BTW, I see there is a "RANGEFEED" in the latest release that can detect updates to a specified range. Could we apply it to the span that holds the range descriptors?
Blathers didn't quite get it right this time since the queries team doesn't touch the range descriptor cache. I think this looks like a question for the @cockroachdb/kv team. Feel free to redirect if I'm mistaken.
Yes, this belongs with the KV team. It has caused an SLA problem for our business: although the data remains available while any one node is down, the high latency effectively means the service is unavailable for the business team.
What version are you using?
@ajwerner v2.1, a bit of an older version ;) Is this issue fixed in the latest release?
We've done quite a bit in this area since 2.1. I believe we at least filter decommissioned replicas. I'm going to close this as stale. If you can produce a problem on a supported version and can demonstrate bad behavior, feel free to open a new issue.
Is your feature request related to a problem? Please describe.
We run CRDB in containers, so the cluster can easily be scaled up or down and instances can easily be replaced. However, whenever a node exits unexpectedly (for any of a variety of reasons) or is decommissioned, the range descriptor caches on the other nodes are not updated immediately, or even shortly afterward. A cache entry is only refreshed when a query touches the affected range, so the first queries to hit a stale entry pay the cost of discovering it. This raises SQL latency to an unacceptable level, especially when more than one node exits.
Describe the solution you'd like
We are considering adding a strategy that updates the cache proactively, rather than waiting for a query to hit a stale entry.
Are you considering any plans like this? I've looked through the latest release but found nothing.
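The proactive strategy described in this request can be sketched in Go. This is a minimal, hypothetical illustration and not CockroachDB's actual range descriptor cache: the types `NodeID`, `RangeID`, `RangeDescriptor`, `DescriptorCache` and the method `OnNodeRemoved` are invented for this example, the cache is keyed by range ID rather than by key span, and how the node-removal signal arrives (node liveness, gossip, or a rangefeed over the descriptor span as suggested in the thread) is left out.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical, simplified stand-ins for CockroachDB's node and range IDs.
type NodeID int
type RangeID int

// RangeDescriptor is a hypothetical, trimmed-down descriptor: just the
// range ID and the nodes that hold its replicas.
type RangeDescriptor struct {
	RangeID  RangeID
	Replicas []NodeID
}

// DescriptorCache is a toy cache keyed by range ID. It is NOT the real
// CockroachDB cache; it only illustrates proactive eviction.
type DescriptorCache struct {
	entries map[RangeID]RangeDescriptor
}

func NewDescriptorCache() *DescriptorCache {
	return &DescriptorCache{entries: make(map[RangeID]RangeDescriptor)}
}

// Insert adds or overwrites the cached descriptor for a range.
func (c *DescriptorCache) Insert(d RangeDescriptor) { c.entries[d.RangeID] = d }

// Lookup returns the cached descriptor for a range, if present.
func (c *DescriptorCache) Lookup(id RangeID) (RangeDescriptor, bool) {
	d, ok := c.entries[id]
	return d, ok
}

// OnNodeRemoved is the hypothetical "active" hook: when a node is reported
// dead or decommissioned, evict every cached descriptor that still lists a
// replica on that node instead of waiting for a query to hit the stale
// entry. It returns the evicted range IDs in ascending order.
func (c *DescriptorCache) OnNodeRemoved(node NodeID) []RangeID {
	var evicted []RangeID
	for id, d := range c.entries {
		for _, n := range d.Replicas {
			if n == node {
				evicted = append(evicted, id)
				break
			}
		}
	}
	for _, id := range evicted {
		delete(c.entries, id)
	}
	sort.Slice(evicted, func(i, j int) bool { return evicted[i] < evicted[j] })
	return evicted
}

func main() {
	c := NewDescriptorCache()
	c.Insert(RangeDescriptor{RangeID: 1, Replicas: []NodeID{1, 2, 3}})
	c.Insert(RangeDescriptor{RangeID: 2, Replicas: []NodeID{2, 4, 5}})
	c.Insert(RangeDescriptor{RangeID: 3, Replicas: []NodeID{4, 5, 6}})

	// Node 2 crashes: ranges 1 and 2 are evicted up front.
	fmt.Println(c.OnNodeRemoved(2)) // [1 2]

	// Range 3 never had a replica on node 2, so it stays cached.
	_, ok := c.Lookup(3)
	fmt.Println(ok) // true
}
```

Evicting rather than rewriting the entries is a deliberate choice in this sketch: the cache does not know the new replica set, so the next lookup for an evicted range would go back to the meta ranges for a fresh descriptor. A real implementation would also need to weigh the extra meta-range traffic when many ranges are evicted at once.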