Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

changefeedcc: Alter changefeed w/ Resolved has murky semantics. #84102

Closed
2 tasks
miretskiy opened this issue Jul 8, 2022 · 2 comments
Closed
2 tasks

changefeedcc: Alter changefeed w/ Resolved has murky semantics. #84102

miretskiy opened this issue Jul 8, 2022 · 2 comments
Labels
A-cdc Change Data Capture C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-cdc

Comments

@miretskiy
Copy link
Contributor

miretskiy commented Jul 8, 2022

Alter changefeed to add new table with initial scan has murky
"resolved" semantics.

Consider the case when existing changefeed, running with resolved option,
is paused , and has the checkpoint as of time T
(and thus resolved event has been emitted at time T).

We then subsequently alter the feed, and add a new table with initial scan.
All previous tables will not be scanned; but the new table will be scanned
as of time T. This means that we will be emitting events with timestamp
set to T -- which seem to violate (or at least make murky) semantics
around previously emitted resolved event. That is: we have promised that all
events up to and including T have been emitted, and yet, we are emitting
more events with timestamp T.

This might not be a violation per se -- but it is confusing.
We should do 2 things:

  • Emit warning when altering changefeed with resolved option when adding new tables with initial scan
  • Consider updating docs to clarify these semantics.

Jira issue: CRDB-17461

Epic CRDB-14988

@miretskiy miretskiy added C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. A-cdc Change Data Capture T-cdc labels Jul 8, 2022
@blathers-crl
Copy link

blathers-crl bot commented Jul 8, 2022

cc @cockroachdb/cdc

miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Jul 8, 2022
Address multiple source of flakes in changefeed tests.

cockroachdb#83530 made a change
to ensure that changefeed do not fail when they are in the transient
(e.g. pause-requested) state.  Unfortunately, the PR made a mistake
where even if the checkpoint could not be completed because
the cangefeed is in the "pause requested" state, we would still
proceed to emit resolved event.  This is wrong, and the resolved
event should never be emitted if we failed to checkpoint.

In addition, alter changefeed can be used to add new tables to existing
changefeed, with initial scan.  In such cases, the newly added table
will emit events as of the timestamp of "alter changefeed statement".
When this happens, the semantics around resolved events are murky
as document in cockroachdb#84102
Address this issue by making cloud storage sink more permissive around
it's handling of resolved timestamp.

When completing initial scan for newly added tables, fix an "off by 1"
error when frontier was advanced to the next timestamp.
This was wrong since cockroachdb#82451
clarified that the rangefeed start time is exclusive.

Informs cockroachdb#83882
Fixes cockroachdb#83946

Release Notes: None
craig bot pushed a commit that referenced this issue Jul 9, 2022
84109: changefeedcc: De-flake changefeed tests. r=miretskiy a=miretskiy

Address multiple source of flakes in changefeed tests.

#83530 made a change
to ensure that changefeed do not fail when they are in the transient
(e.g. pause-requested) state.  Unfortunately, the PR made a mistake
where even if the checkpoint could not be completed because
the cangefeed is in the "pause requested" state, we would still
proceed to emit resolved event.  This is wrong, and the resolved
event should never be emitted if we failed to checkpoint.

In addition, alter changefeed can be used to add new tables to existing
changefeed, with initial scan.  In such cases, the newly added table
will emit events as of the timestamp of "alter changefeed statement".
When this happens, the semantics around resolved events are murky
as documented in #84102
Address this issue by making cloud storage sink more permissive around
its handling of resolved timestamp.

When completing initial scan for newly added tables, fix an "off by 1"
error when frontier was advanced to the next timestamp.
This was wrong since #82451
clarified that the rangefeed start time is exclusive.

Informs #83882
Fixes #83946

Release Notes: None

Co-authored-by: Yevgeniy Miretskiy <[email protected]>
@amruss
Copy link
Contributor

amruss commented Jul 13, 2022

closing and making this a docs issue

@amruss amruss closed this as completed Jul 13, 2022
miretskiy pushed a commit to miretskiy/cockroach that referenced this issue Jul 27, 2022
Address multiple source of flakes in changefeed tests.

cockroachdb#83530 made a change
to ensure that changefeed do not fail when they are in the transient
(e.g. pause-requested) state.  Unfortunately, the PR made a mistake
where even if the checkpoint could not be completed because
the cangefeed is in the "pause requested" state, we would still
proceed to emit resolved event.  This is wrong, and the resolved
event should never be emitted if we failed to checkpoint.

In addition, alter changefeed can be used to add new tables to existing
changefeed, with initial scan.  In such cases, the newly added table
will emit events as of the timestamp of "alter changefeed statement".
When this happens, the semantics around resolved events are murky
as document in cockroachdb#84102
Address this issue by making cloud storage sink more permissive around
it's handling of resolved timestamp.

When completing initial scan for newly added tables, fix an "off by 1"
error when frontier was advanced to the next timestamp.
This was wrong since cockroachdb#82451
clarified that the rangefeed start time is exclusive.

Informs cockroachdb#83882
Fixes cockroachdb#83946

Release Notes: None

Release note (<category, see below>): <what> <show> <why>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-cdc Change Data Capture C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. T-cdc
Projects
None yet
Development

No branches or pull requests

2 participants