cdc,roachtest: add test with changefeeds over a large number of ranges #95236

Closed
irfansharif opened this issue Jan 13, 2023 · 4 comments
Labels
A-admission-control A-cdc Change Data Capture C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs T-cdc

Comments

@irfansharif
Contributor

irfansharif commented Jan 13, 2023

Is your feature request related to a problem? Please describe.

In internal incidents, we've seen sharp spikes in runnable goroutines per P (with CPU admission control (AC) kicking in as a result) on the nodes that were acting as changefeed coordinators. Given the way we publish closed timestamp updates, are we waking up many rangefeeds all at once and, as a result, causing large spikes in runnable goroutines? We observed an effect on SQL tail latency when this happened, and suspected elevated Go scheduling latency as the cause (we now have metrics for it: #87883). We also observed that pausing the changefeed helped reduce the latency impact. Baseline CPU utilization throughout was low (<25%).

This issue tracks reproducing a similar setup ourselves, perhaps by introducing 200k+ splits on a single table, disabling the merge queue, and running a changefeed over it. A reproduction will help drive improvements. The motivating incident is https://github.com/cockroachlabs/support/issues/1997, which was also discussed internally. See also https://github.com/cockroachlabs/support/issues/2036.
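A rough sketch of what such a reproduction could look like, written as a small Go program that only generates the SQL statements. Everything specific here is an assumption rather than something taken from the incident: the table name `t`, its schema, the batch size, the key stride, and the `null://` sink URI are placeholders, and the cluster setting names (`kv.rangefeed.enabled`, `kv.range_merge.queue_enabled`) should be checked against the CockroachDB version under test.

```go
package main

import "fmt"

// splitStatements returns ALTER TABLE ... SPLIT AT statements that together
// request numSplits split points on an integer primary key, batchSize split
// points per statement, spaced stride keys apart.
func splitStatements(table string, numSplits, batchSize, stride int) []string {
	var stmts []string
	for first := 1; first <= numSplits; first += batchSize {
		last := first + batchSize - 1
		if last > numSplits {
			last = numSplits
		}
		stmts = append(stmts, fmt.Sprintf(
			"ALTER TABLE %s SPLIT AT SELECT generate_series(%d, %d, %d)",
			table, first*stride, last*stride, stride))
	}
	return stmts
}

// reproStatements assembles the whole hypothetical setup: enable rangefeeds,
// disable the merge queue so the splits are not merged away, create the
// table, split it, and start a changefeed over it.
func reproStatements(table string, numSplits, batchSize, stride int, sinkURI string) []string {
	stmts := []string{
		// Setting names are assumptions; verify them for your version.
		"SET CLUSTER SETTING kv.rangefeed.enabled = true",
		"SET CLUSTER SETTING kv.range_merge.queue_enabled = false",
		// Hypothetical schema, chosen only so integer split keys make sense.
		fmt.Sprintf("CREATE TABLE IF NOT EXISTS %s (k INT PRIMARY KEY, v INT)", table),
	}
	stmts = append(stmts, splitStatements(table, numSplits, batchSize, stride)...)
	stmts = append(stmts, fmt.Sprintf(
		"CREATE CHANGEFEED FOR TABLE %s INTO '%s'", table, sinkURI))
	return stmts
}

func main() {
	// 200k split points in batches of 1000; the sink URI is a placeholder.
	for _, s := range reproStatements("t", 200000, 1000, 10, "null://") {
		fmt.Println(s + ";")
	}
}
```

The output can be fed to a SQL shell against a test cluster. Batching the splits keeps each statement short; whether this is enough to trigger the goroutine spikes described above is exactly what the reproduction would need to establish.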

Jira issue: CRDB-23415

Epic CRDB-23738

@irfansharif irfansharif added the C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) label Jan 13, 2023
@irfansharif irfansharif added O-support Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs A-admission-control labels Jan 14, 2023
@blathers-crl blathers-crl bot added the T-kv KV Team label Jan 14, 2023
@erikgrinaker erikgrinaker added T-kv-replication and removed T-kv KV Team labels Jan 17, 2023
@blathers-crl

blathers-crl bot commented Jan 17, 2023

cc @cockroachdb/replication

@irfansharif irfansharif added the T-kv KV Team label Jan 17, 2023
@erikgrinaker erikgrinaker removed the T-kv KV Team label Jan 17, 2023
@erikgrinaker
Contributor

I believe @miretskiy is looking into this, but let me know if that's not the case.

@blathers-crl

blathers-crl bot commented Jan 23, 2023

cc @cockroachdb/cdc

@miretskiy
Contributor

Closing; we have cdc tests over 100k ranges.
