kv: more actively refresh range descriptor cache when nodes crash or removed abruptly #68225

cindyzqtnew · 2021-07-29T08:28:28Z

Is your feature request related to a problem? Please describe.
We are running crdb in containers, so the cluster can easily be scaled up or down and instances can easily be replaced. However, whenever a node exits unexpectedly, for whatever reason, or is decommissioned, the range descriptor caches on the other nodes are not updated immediately (or even within a short time). A cache entry is only updated when a query touches the affected range. This increases SQL latency to an unacceptable degree, especially when more than one node exits.

Describe the solution you'd like
We are thinking of adding a strategy that updates the cache proactively, so that it does not have to wait for a query.
Are you considering any plan like this? I've looked at the latest release but found nothing.

blathers-crl · 2021-07-29T08:28:31Z

Hello, I am Blathers. I am here to help you get the issue triaged.

I have CC'd a few people who may be able to assist you:

@cockroachdb/sql-queries (found keywords: plan)

If we have not gotten back to your issue within a few business days, you can try the following:

Join our community slack channel and ask on #cockroachdb.
Try to find someone from here if you know they worked closely on the area, and CC them.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

cindyzqtnew · 2021-07-29T08:52:37Z

By the way, I see there is a "RANGEFEED" feature in the latest release that can detect updates to a specified range. Could we apply it to the span of the range descriptors?

rytaft · 2021-07-29T12:11:20Z

Blathers didn't quite get it right this time since the queries team doesn't touch the range descriptor cache. I think this looks like a question for the @cockroachdb/kv team. Feel free to redirect if I'm mistaken.

cindyzqtnew · 2021-07-30T02:28:58Z

Yes, this should belong to the kv team. It has caused an SLA problem for our business. Although the data is still available while any one node is down, high latency means unavailability as far as the business team is concerned.

ajwerner · 2021-07-30T14:22:28Z

What version are you using?

cindyzqtnew · 2021-08-02T02:25:00Z

@ajwerner v2.1, a somewhat older version ;) Is this issue fixed in the latest release?

ajwerner · 2021-08-02T06:17:59Z

We've done quite a bit in this area since 2.1. I believe we at least filter decommissioned replicas. I'm going to close this as stale. If you can produce a problem on a supported version and can demonstrate bad behavior, feel free to open a new issue.

cindyzqtnew added the C-enhancement (Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception)) label Jul 29, 2021

blathers-crl bot added the O-community (Originated from the community) and X-blathers-triaged (blathers was able to find an owner) labels Jul 29, 2021

blathers-crl bot added the T-kv (KV Team) label Jul 29, 2021

tbg assigned mwang1026 Jul 29, 2021

knz changed the title ~~actively refresh range descriptor cache?~~ kv: more actively refresh range descriptor cache when nodes crash or removed abruptly Jul 29, 2021

cindyzqtnew mentioned this issue Jul 30, 2021

core: Latency spike when starting a previously node killed using pkill #36397

Closed

ajwerner closed this as completed Aug 2, 2021
