-
Notifications
You must be signed in to change notification settings - Fork 3.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
changefeedccl: Improve protected timestamp handling #103539
Conversation
d4a2504
to
a90f15c
Compare
var MaxProtectedTimestampAge = settings.RegisterDurationSetting( | ||
settings.TenantWritable, | ||
"changefeed.protect_timestamp.max_age", | ||
"fail the changefeed is the protected timestamp age exceeds this threshold; 0 disables expiration", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"fail the changefeed is the protected timestamp age exceeds this threshold; 0 disables expiration", | |
"fail the changefeed if the protected timestamp age exceeds this threshold; 0 disables expiration", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done; FYI: I also updated changefeed_stmt warning message to produce better warning when this setting is too low.
func (s StatementOptions) DeprecationWarnings() []string { | ||
if newFormat, ok := NoLongerExperimental[s.m[OptFormat]]; ok { | ||
return []string{fmt.Sprintf(`%[1]s is no longer experimental, use %[2]s=%[1]s`, | ||
newFormat, OptFormat)} | ||
} | ||
for retiredOpt := range RetiredOptions { | ||
if _, isSet := s.m[retiredOpt]; isSet { | ||
return []string{fmt.Sprintf("%s option is no longer needed", retiredOpt)} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be nice if we explained what the current behaviour is now that the option is not needed, in this case it always being enabled
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will have to be done on the docs side; based on the PR description.
(otherwise, we'll have to add more configurations for each option we decide to retire.
f5eb1bf
to
32ea568
Compare
Changefeeds utilize protected timestamp (PTS) in order to ensure that the data is not garbage collected (GC) prematurely. This subsystem underwent through many rounds of changes, resulting in an unintuitive, and potentially dangerous behavior. This PR updates and improves PTS handling as follows. PR cockroachdb#97148 introduce capability to cancel jobs that hold on to stale PTS records. This PR expands this functionality to apply to all jobs -- not just paused jobs. This is necessary because due to cockroachdb#90810, changefeeds will retry almost every error -- and that means that if a running changefeed jobs fails to make progress for very long time, it is possible that a PTS record will protect GC collection for many days, weeks, or even months. To guard against this case, introduce a new cluster setting `changefeed.protect_timestamp.max_age`, which defaults to generous 4 days, to make sure that even if the explicit changefeed option `gc_protect_expires_after` was not specified, the changefeed will fail after `changefeed.protect_timestamp.max_age` if no progress is made. This setting only applies to newly created changefeeds. Use `ALTER CHANGEFEED` statement to set `gc_protect_expires_after` option for existing changefeeds to enable PTS expiration. The fail safe can be disabled by setting `changefeed.protect_timestamp.max_age` to 0; Note, however, that doing so could result in stability issues once stale PTS record released. In addition, this PR deprecates `protect_data_from_gc_on_pause` option. This option is not needed since we now employ "active protected timestamp" management (meaning: there is always a PTS record when running changefeed jobs), and the handling of this record is consistent for both running and paused jobs. Fixes cockroachdb#103464 Release note (enterprise change): Introduce a new `changefeed.protect_timestamp.max_age` setting (default 4 days), which will cancel running changefeed jobs if they fail to make forward progress for much time. This setting is used if the explicit `gc_protect_expires_after` option was not set. In addition, deprecate `protect_data_from_gc_on_pause` option. This option is no longer needed since changefeed jobs always protect data.
bors r+ |
Build succeeded: |
Changefeeds utilize protected timestamp (PTS) in order to ensure that the data is not garbage collected (GC) prematurely.
This subsystem underwent through many rounds of changes, resulting in an unintuitive, and potentially dangerous behavior.
This PR updates and improves PTS handling as follows. PR #97148 introduce capability to cancel jobs that hold on to stale PTS records. This PR expands this functionality to apply to all jobs -- not just paused jobs.
This is necessary because due to #90810, changefeeds will retry almost every error -- and that means that if a running changefeed jobs fails to make progress for very long time, it is possible that a PTS record will protect GC collection for many days, weeks, or even months.
To guard against this case, introduce a new cluster setting
changefeed.protect_timestamp.max_age
, which defaults to generous 4 days, to make sure that even if the explicit changefeed optiongc_protect_expires_after
was not specified, the changefeed will fail afterchangefeed.protect_timestamp.max_age
if no progress is made.The fail safe can be disabled by setting
changefeed.protect_timestamp.max_age
to 0; Note, however, that doing so could result in stability issues once stale PTS record released.In addition, this PR deprecates
protect_data_from_gc_on_pause
option. This option is not needed since we now employ "active protected timestamp" management (meaning: there is always a PTS record when running changefeed jobs), and the handling of this record is consistent for both running and paused jobs.Fixes #103464
Release note (enterprise change): Introduce a new
changefeed.protect_timestamp.max_age
setting (default 4 days), which will cancel running changefeed jobs if they fail to make forward progress for much time. This setting is used if the explicitgc_protect_expires_after
option was not set. In addition, deprecateprotect_data_from_gc_on_pause
option. This option is no longer needed since changefeed jobs always protect data.