kv: more actively refresh range descriptor cache when nodes crash or removed abruptly #68225

Closed
cindyzqtnew opened this issue Jul 29, 2021 · 7 comments
Labels
C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-community Originated from the community T-kv KV Team X-blathers-triaged blathers was able to find an owner

@cindyzqtnew

Is your feature request related to a problem? Please describe.
We run CockroachDB in containers, so the cluster can easily be scaled up or down and instances can easily be replaced. However, whenever a node exits unexpectedly, for whatever reason, or is decommissioned, the range descriptor cache on every other node is not updated immediately (or even within a short time). A cache entry is only updated when a query touches the affected range. This increases SQL latency to a level that is unacceptable for us, especially when more than one node exits.

Describe the solution you'd like
We are thinking of adding a strategy that updates the cache proactively, so that it does not wait until a query arrives.
Are you considering a plan like this? I have looked at the latest release but found nothing.
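
To make the difference between the two behaviors concrete, here is a toy Python model. It is only a sketch and not CockroachDB code: `ToyRangeCache`, `send`, and `on_node_removed` are hypothetical names, and a plain dict stands in for the authoritative range metadata. The lazy path discovers a stale entry only when a request hits a dead node, while the proactive path evicts affected entries as soon as a node is known to be gone.

```python
class ToyRangeCache:
    """Toy model of a per-node range descriptor cache (hypothetical, for illustration)."""

    def __init__(self, authoritative):
        # authoritative: dict mapping range_id -> tuple of node IDs holding replicas.
        # It stands in for the source of truth that a real lookup would consult.
        self.authoritative = authoritative
        self.cached = {}
        self.failed_attempts = 0  # requests that first went to a dead node

    def lookup(self, range_id):
        # Fill the cache on a miss, otherwise return the (possibly stale) entry.
        if range_id not in self.cached:
            self.cached[range_id] = self.authoritative[range_id]
        return self.cached[range_id]

    def send(self, range_id, live_nodes):
        # Lazy path: use the cached replicas; if the first one is dead,
        # pay for a failed attempt, evict the entry, and look it up again.
        target = self.lookup(range_id)[0]
        if target not in live_nodes:
            self.failed_attempts += 1
            del self.cached[range_id]
            target = self.lookup(range_id)[0]
        return target

    def on_node_removed(self, node_id):
        # Proactive path: evict every cached entry that references the removed node,
        # so later requests re-read fresh metadata instead of failing first.
        stale = [rid for rid, replicas in self.cached.items() if node_id in replicas]
        for rid in stale:
            del self.cached[rid]
        return stale


if __name__ == "__main__":
    truth = {1: (1, 2, 3), 2: (1, 4, 5)}
    cache = ToyRangeCache(truth)
    cache.lookup(1)
    cache.lookup(2)

    # Node 1 disappears and its replicas are moved to node 6.
    truth[1], truth[2] = (2, 3, 6), (4, 5, 6)
    live = {2, 3, 4, 5, 6}

    cache.send(1, live)            # lazy: one failed attempt before the retry
    cache.on_node_removed(1)       # proactive: evicts the stale entry for range 2
    cache.send(2, live)            # no failed attempt this time
    print(cache.failed_attempts)
```

In this model the extra latency we observe corresponds to `failed_attempts`: each stale entry costs one failed request unless something evicts it first.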

@cindyzqtnew cindyzqtnew added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jul 29, 2021
@blathers-crl

blathers-crl bot commented Jul 29, 2021

Hello, I am Blathers. I am here to help you get the issue triaged.

I have CC'd a few people who may be able to assist you:

  • @cockroachdb/sql-queries (found keywords: plan)

If we have not gotten back to your issue within a few business days, you can try the following:

  • Join our community slack channel and ask on #cockroachdb.
  • Try to find someone from here if you know they worked closely on the area and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@blathers-crl blathers-crl bot added O-community Originated from the community X-blathers-triaged blathers was able to find an owner labels Jul 29, 2021
@cindyzqtnew
Author

By the way, I see there is a "RANGEFEED" in the latest release that can detect updates to a specified range. Could we apply it to the span of the range descriptors?

@rytaft
Collaborator

rytaft commented Jul 29, 2021

Blathers didn't quite get it right this time since the queries team doesn't touch the range descriptor cache. I think this looks like a question for the @cockroachdb/kv team. Feel free to redirect if I'm mistaken.

@blathers-crl blathers-crl bot added the T-kv KV Team label Jul 29, 2021
@knz knz changed the title from "actively refresh range descriptor cache?" to "kv: more actively refresh range descriptor cache when nodes crash or removed abruptly" Jul 29, 2021
@cindyzqtnew
Author

Yes, this belongs to the KV team. It has caused an SLA problem for our business. Although the data remains available when any node is taken down, high latency amounts to unavailability for the business team.

@ajwerner
Contributor

What version are you using?

@cindyzqtnew
Author

@ajwerner v2.1, a somewhat older version ;) Is this issue fixed in the latest release?

@ajwerner
Contributor

ajwerner commented Aug 2, 2021

We've done quite a bit in this area since 2.1. I believe we at least filter decommissioned replicas. I'm going to close this as stale. If you can produce a problem on a supported version and can demonstrate bad behavior, feel free to open a new issue.

@ajwerner ajwerner closed this as completed Aug 2, 2021