-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changefeedccl: Backfill checkpointing appears to be broken #96959
Labels
A-cdc
Change Data Capture
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-postmortem
Originated from a Postmortem action item.
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
T-cdc
Comments
miretskiy
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
A-cdc
Change Data Capture
T-cdc
backport-22.1.x
labels
Feb 10, 2023
cc @cockroachdb/cdc |
miretskiy
pushed a commit
to miretskiy/cockroach
that referenced
this issue
Feb 12, 2023
An over than 2 year old change (cockroachdb#71848) that added support for checkpointing during backfill after schema change, inadvertently broke initial scan checkpointing funcitonality Exacerbating the problem, the existing test `TestChangefeedBackfillCheckpoint` continued to work fine. Treason why it was passing was because the test was looking for a checkpoint whose timestamp matched bacfill timestamp. The bug involved incorrect initialize/use of 0 timestamp. It just so happens, that after initial scan completes, the rangefeed starts, and the very first thing it does is to generate a 0 timestamp checkpoint. So, the test was observing this event, and continued to work. This PR does not have a dedicated test because the existing tests work fine -- provided we ignore 0 timestamp checkpoint, which is what this PR does in addition to addressing the root cause of the bug. Informs cockroachdb#96959 Release note (enterprise change): Fix a bug in changefeeds, where long running initial scans will fail to generate checkpoint. Failure to generate checkpoint is particularly bad if the changefeed restarts for whatever reason. Without checkpoints, the changefeed will restart from the beginning, and in the worst case, when exporting substantially sized tables, changefeed initial scan may have hard time completing.
craig bot
pushed a commit
that referenced
this issue
Feb 13, 2023
96995: changefeedccl: Fix initial scan checkpointing r=miretskiy a=miretskiy An over than 2 year old change (#71848) that added support for checkpointing during backfill after schema change, inadvertently broke initial scan checkpointing functionality Exacerbating the problem, the existing test `TestChangefeedBackfillCheckpoint` continued to work fine. The reason why it was passing was because the test was looking for a checkpoint whose timestamp matched backfill timestamp. The bug involved incorrect initialize/use of 0 timestamp. It just so happens, that after initial scan completes, the rangefeed starts, and the very first thing it does is to generate a 0 timestamp checkpoint. So, the test was observing this event, and continued to work. This PR does not have a dedicated test because the existing tests work fine -- provided we ignore 0 timestamp checkpoint, which is what this PR does in addition to addressing the root cause of the bug. Informs #96959 Release note (enterprise change): Fix a bug in changefeeds, where long running initial scans will fail to generate checkpoint. Failure to generate checkpoint is particularly bad if the changefeed restarts for whatever reason. Without checkpoints, the changefeed will restart from the beginning, and in the worst case, when exporting substantially sized tables, changefeed initial scan may have hard time completing. 97037: acceptance: skip TestDockerCLI test_demo_partitioning.tcl only r=tbg a=herkolategan Renamed `test_demo_partitioning.tcl` to `test_demo_partitioning.tcl.disabled` which will cause TestDockerCLI to skip the test file. Refs: #96797 Reason: flaky test Epic: None Release note: None Co-authored-by: Yevgeniy Miretskiy <[email protected]> Co-authored-by: Herko Lategan <[email protected]>
blathers-crl bot
pushed a commit
that referenced
this issue
Feb 13, 2023
An over than 2 year old change (#71848) that added support for checkpointing during backfill after schema change, inadvertently broke initial scan checkpointing funcitonality Exacerbating the problem, the existing test `TestChangefeedBackfillCheckpoint` continued to work fine. Treason why it was passing was because the test was looking for a checkpoint whose timestamp matched bacfill timestamp. The bug involved incorrect initialize/use of 0 timestamp. It just so happens, that after initial scan completes, the rangefeed starts, and the very first thing it does is to generate a 0 timestamp checkpoint. So, the test was observing this event, and continued to work. This PR does not have a dedicated test because the existing tests work fine -- provided we ignore 0 timestamp checkpoint, which is what this PR does in addition to addressing the root cause of the bug. Informs #96959 Release note (enterprise change): Fix a bug in changefeeds, where long running initial scans will fail to generate checkpoint. Failure to generate checkpoint is particularly bad if the changefeed restarts for whatever reason. Without checkpoints, the changefeed will restart from the beginning, and in the worst case, when exporting substantially sized tables, changefeed initial scan may have hard time completing.
This was referenced Feb 14, 2023
miretskiy
pushed a commit
to miretskiy/cockroach
that referenced
this issue
Feb 22, 2023
An over than 2 year old change (cockroachdb#71848) that added support for checkpointing during backfill after schema change, inadvertently broke initial scan checkpointing funcitonality Exacerbating the problem, the existing test `TestChangefeedBackfillCheckpoint` continued to work fine. Treason why it was passing was because the test was looking for a checkpoint whose timestamp matched bacfill timestamp. The bug involved incorrect initialize/use of 0 timestamp. It just so happens, that after initial scan completes, the rangefeed starts, and the very first thing it does is to generate a 0 timestamp checkpoint. So, the test was observing this event, and continued to work. This PR does not have a dedicated test because the existing tests work fine -- provided we ignore 0 timestamp checkpoint, which is what this PR does in addition to addressing the root cause of the bug. Informs cockroachdb#96959 Release note (enterprise change): Fix a bug in changefeeds, where long running initial scans will fail to generate checkpoint. Failure to generate checkpoint is particularly bad if the changefeed restarts for whatever reason. Without checkpoints, the changefeed will restart from the beginning, and in the worst case, when exporting substantially sized tables, changefeed initial scan may have hard time completing.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-cdc
Change Data Capture
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
O-postmortem
Originated from a Postmortem action item.
O-support
Would prevent or help troubleshoot a customer escalation - bugs, missing observability/tooling, docs
T-cdc
Customer reports that a reasonably sized table (40k ranges) cannot complete backfill;
towards the end changefeed restarts, and no backfill checkpoints appear to be written.
It appears that backfill checkpointing functionality regressed at some point w/out
tests picking up on that.
Jira issue: CRDB-24433
Epic CRDB-11783
The text was updated successfully, but these errors were encountered: