jobs: canceling a job fails if the job's range is backpressuring writes #62627
We could pretty easily allow any high-priority write to skip the backpressure mechanism, regardless of the range size. Maybe we expose that from KV and then hook up certain operations like …
You can already run it at high priority if you just do it in a transaction. One thing to watch out for is that if you let it get big enough, then the backpressure will totally turn off...
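The high-priority workaround described above could be sketched like this (illustrative only; `123` is a placeholder job ID, and whether `CANCEL JOB` is allowed inside an explicit transaction may depend on the CockroachDB version):

```sql
-- Run the cancellation inside an explicit high-priority transaction so
-- its KV writes win out over contending transactions.
BEGIN PRIORITY HIGH;
CANCEL JOB 123;  -- placeholder job ID
COMMIT;
```

Note the caveat in the comment above: this raises transaction priority for contention purposes, and this issue is about whether such writes should also bypass range-size backpressure.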
After discussing this in an internal Slack thread, we've decided to put this on the backburner and see if this is still a problem with 21.2. We've implemented lots of mitigations in the jobs infra and components that use it, and we have known workarounds (e.g. increasing the range size).
@erikgrinaker @nvanbenschoten should we close this? I don't remember anything like this coming up recently, do you?
I don't know if this is still a problem. I think it might be, but it's possible that we have mitigations in the jobs code. @cockroachdb/cdc owns the jobs infra now.
We have addressed this by moving the job updates to a separate job_info table and never overwriting rows, so splits are always possible.
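The append-only pattern that makes splits possible could be sketched roughly as follows (column names and values are illustrative, not the exact system.job_info schema):

```sql
-- Instead of repeatedly UPDATE-ing one ever-growing row in system.jobs
-- (a single row can never be split across ranges), each update INSERTs
-- a fresh row, so there is always a row boundary at which the range
-- can split and backpressure never wedges the job's writes.
INSERT INTO system.job_info (job_id, info_key, written, value)
VALUES (123, 'progress', now(), 'opaque progress payload');  -- placeholder values
```

Because no single row grows without bound, the range containing a pathological job's state can always split, so writes such as a cancellation are never refused for want of a split point.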
In https://github.com/cockroachlabs/support/issues/900 we saw that we couldn't cancel a pathological job once its row in system.jobs had gotten large enough that the range was refusing writes (futilely waiting for a split). One way or another, canceling needs to work. We need, I guess, to find a way of marking the respective write as immune to write backpressure.
cc @nvanbenschoten @dt
Jira issue: CRDB-2756