changefeedcc: Alter changefeed w/ Resolved has murky semantics. #84102

miretskiy · 2022-07-08T20:59:04Z

Altering a changefeed to add a new table with an initial scan has murky
"resolved" semantics.

Consider an existing changefeed, running with the resolved option, that
is paused and has a checkpoint as of time T
(and thus a resolved event has been emitted at time T).

We then alter the feed and add a new table with an initial scan.
The previously watched tables will not be rescanned, but the new table will be scanned
as of time T. This means that we will be emitting events with timestamp
set to T, which seems to violate (or at least make murky) the semantics
of the previously emitted resolved event. That is: we have promised that all
events up to and including T have been emitted, and yet we are emitting
more events with timestamp T.
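
To make the concern concrete, here is a minimal consumer-side sketch of the promise a resolved event carries. It is not changefeed code: `ResolvedChecker`, the table names, and the integer timestamps are all hypothetical stand-ins used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResolvedChecker:
    """Tracks the highest resolved timestamp seen and flags late row events."""

    resolved: int = -1  # hypothetical integer timestamps; -1 means "none yet"
    violations: list = field(default_factory=list)

    def on_resolved(self, ts: int) -> None:
        # A resolved event at ts promises that no further row events
        # with timestamp <= ts will arrive.
        self.resolved = max(self.resolved, ts)

    def on_row(self, table: str, ts: int) -> bool:
        # Returns True if the row respects the promise, False otherwise.
        if ts <= self.resolved:
            self.violations.append((table, ts))
            return False
        return True


checker = ResolvedChecker()
T = 100
checker.on_row("orders", 95)   # existing table, before the checkpoint
checker.on_resolved(T)         # feed is paused with its checkpoint at T

# After the alter, the newly added (hypothetical) table is scanned as of T,
# so its rows carry timestamp T even though T was already resolved.
late_ok = checker.on_row("customers", T)
```

A consumer that enforces the promise strictly, as this sketch does, would treat the initial-scan rows of the new table as late, which is the ambiguity described above.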

This might not be a violation per se, but it is confusing.
We should do two things:

1. Emit a warning when altering a changefeed that uses the resolved option to add new tables with an initial scan.
2. Consider updating the docs to clarify these semantics.

Jira issue: CRDB-17461

Epic CRDB-14988

blathers-crl · 2022-07-08T20:59:05Z

cc @cockroachdb/cdc

Address multiple sources of flakes in changefeed tests.

cockroachdb#83530 made a change to ensure that changefeeds do not fail when they are in a transient (e.g. pause-requested) state. Unfortunately, the PR made a mistake: even if the checkpoint could not be completed because the changefeed was in the "pause requested" state, we would still proceed to emit a resolved event. This is wrong; the resolved event should never be emitted if we failed to checkpoint.

In addition, alter changefeed can be used to add new tables to an existing changefeed, with an initial scan. In such cases, the newly added table will emit events as of the timestamp of the "alter changefeed" statement. When this happens, the semantics around resolved events are murky, as documented in cockroachdb#84102. Address this issue by making the cloud storage sink more permissive in its handling of the resolved timestamp.

When completing the initial scan for newly added tables, fix an "off by 1" error where the frontier was advanced to the next timestamp. This was wrong since cockroachdb#82451 clarified that the rangefeed start time is exclusive.

Informs cockroachdb#83882
Fixes cockroachdb#83946

Release Notes: None

84109: changefeedcc: De-flake changefeed tests. r=miretskiy a=miretskiy

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
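
The two fixes described in the commit message can be illustrated with a small model. This is a sketch under stated assumptions, not the changefeed implementation: `advance_frontier`, `rangefeed_events`, and the integer timestamps are hypothetical names and values chosen for the example.

```python
from typing import Callable, List


def advance_frontier(
    ts: int,
    checkpoint: Callable[[int], bool],
    emit_resolved: Callable[[int], None],
) -> bool:
    """Emit a resolved event only after the checkpoint has succeeded."""
    if not checkpoint(ts):
        # E.g. the job is in a pause-requested state: skip the resolved event.
        return False
    emit_resolved(ts)
    return True


def rangefeed_events(events: List[int], start_ts: int) -> List[int]:
    """Model an exclusive start time: only events with ts > start_ts are delivered."""
    return [ts for ts in events if ts > start_ts]


emitted: List[int] = []

# Checkpoint fails, so nothing is emitted.
failed = advance_frontier(100, lambda ts: False, emitted.append)
# Checkpoint succeeds, so the resolved event goes out.
succeeded = advance_frontier(100, lambda ts: True, emitted.append)

# The initial scan covered everything up to and including 100.
events = [100, 101, 102]
correct = rangefeed_events(events, 100)      # frontier left at the scan time
off_by_one = rangefeed_events(events, 101)   # frontier advanced to the next timestamp
```

In this model, starting from the scan timestamp itself delivers every later event, while starting from the next timestamp silently skips the event at 101, which is the shape of the "off by 1" error the message describes.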

amruss · 2022-07-13T16:24:03Z

closing and making this a docs issue


miretskiy added the C-bug (Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.), A-cdc (Change Data Capture), and T-cdc labels Jul 8, 2022

miretskiy mentioned this issue Jul 8, 2022

changefeedcc: De-flake changefeed tests. #84109

Merged

amruss closed this as completed Jul 13, 2022

miretskiy mentioned this issue Jul 27, 2022

release-22.1: changefeedcc: De-flake changefeed tests #85152

Merged
