
sql,protectedts: mark tables as ephemeral to exclude them from backups #73536

Closed
6 tasks
adityamaru opened this issue Dec 6, 2021 · 9 comments · Fixed by #77406
Assignees
Labels
A-disaster-recovery branch-master Failures and bugs on the master branch. C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-disaster-recovery

Comments

@adityamaru
Contributor

adityamaru commented Dec 6, 2021

This is a tracking issue that captures an idea proposed by @dt.

Some users have encountered challenges with tables that contain high-churn, ephemeral data, where they want to GC history in that table quickly, but cannot simply configure it to do so because it is contained within a database, tenant or cluster that is backed up incrementally, which requires that history be preserved until that backup can back it up.

In many such cases, the user does not actually want data in these tables to be backed up at all, and is only backing it up because it is part of the larger database, tenant, or cluster being backed up. Thus one possible solution would be to provide a mechanism for a table like this to opt out of inclusion in backups, freeing it to configure more aggressive GC without affecting the ability of its containing database to be backed up. Simply excluding the table entirely could be challenging -- it may still be referenced in jobs, logs, views, etc. However, excluding its row data would allow the backup to succeed even if that table's span has been GC'ed, and would result in the table simply being empty when restored.

Approach:

One proposed approach would be to mark a table as ephemeral. The simplest solution would be to store this information on the TableDescriptor but this does not work because of the way tenants are backed up. A BACKUP TENANT from the system tenant simply backs up a single-tenant span, and does not look inside the tenant to be able to exclude certain tables. Thus, this information needs to be written to the ZoneConfig, and will subsequently rely on the transport infrastructure that was introduced to support zcfgs in a multi-tenant environment.

The tasks can then largely be broken up into:

  • Add an ephemeral bit to ZoneConfig for a table, and the corresponding SpanConfig created as a result of reconciliation.

  • Teach ExportRequest to return an empty ExportResponse when attempting to export a replica that has an associated SpanConfig marked as ephemeral.

  • Teach GC to ignore any protected timestamp records that might be written to a replica that is marked as ephemeral. This way, even if a backup were to protect a set of schema objects that included the ephemeral table, it would allow the low GC TTL to continue garbage collecting the data.

  • Add a migration to mark system tables that opt out of a cluster backup as ephemeral so that they are never backed up or protected.

  • Add SQL to set/unset a table as ephemeral. This might work out of the box with ALTER ZONE CONFIGURATION.

  • Think about how to make the fact that a table is ephemeral observable to the user during a backup, to avoid surprises where they expect a table to be part of the backup but it is silently ignored. This might be solved via documentation, but it is worth calling out explicitly.
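The export and GC behavior the second and third tasks describe can be sketched in Go. All names here are hypothetical stand-ins for illustration, not CockroachDB's actual types or request plumbing:

```go
package main

import "fmt"

// SpanConfig is a hypothetical stand-in for the reconciled zone config
// that KV sees for a range.
type SpanConfig struct {
	Ephemeral bool
}

// evalExport sketches the proposed ExportRequest behavior: a range whose
// SpanConfig is marked ephemeral returns an empty response instead of row data.
func evalExport(cfg SpanConfig, rows []string) []string {
	if cfg.Ephemeral {
		return nil // empty ExportResponse; the backup records no row data
	}
	return rows
}

// gcRespectsProtection sketches the proposed GC behavior: protected
// timestamp records are ignored on ephemeral ranges, so a low GC TTL
// keeps collecting garbage even while a backup is running.
func gcRespectsProtection(cfg SpanConfig, hasPTSRecord bool) bool {
	return hasPTSRecord && !cfg.Ephemeral
}

func main() {
	ephemeral := SpanConfig{Ephemeral: true}
	fmt.Println(len(evalExport(ephemeral, []string{"k1", "k2"})))
	fmt.Println(gcRespectsProtection(ephemeral, true))
}
```

The two checks key off the same bit, which is why the bit must live on the SpanConfig rather than the TableDescriptor: both ExportRequest evaluation and the GC queue run in KV, which cannot look inside tenant descriptors.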

Epic: CRDB-10306

Jira issue: CRDB-11627

@adityamaru adityamaru added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) A-disaster-recovery labels Dec 6, 2021
@blathers-crl

blathers-crl bot commented Dec 6, 2021

cc @cockroachdb/bulk-io

@adityamaru adityamaru changed the title sql,protectedts: mark tables as epehemeral to exclude them from backups sql,protectedts: mark tables as ephemeral to exclude them from backups Dec 6, 2021
@exalate-issue-sync exalate-issue-sync bot changed the title sql,protectedts: mark tables as ephemeral to exclude them from backups sql,protectedts: mark tables as epehemeral to exclude them from backups Dec 6, 2021
@shermanCRL
Contributor

Is this synonymous with exclusion? Is ephemeral a hint?

@adityamaru
Contributor Author

adityamaru commented Dec 7, 2021

> Is this synonymous with exclusion? Is ephemeral a hint?

Yep, ephemeral is an indicator that will be used to determine if the row data is considered short-lived and should therefore not be backed up.

@adityamaru
Contributor Author

With the new protected timestamp subsystem potentially protecting schema objects instead of spans, there is an increased importance to addressing this issue. In particular, the new protection scheme is going to unlock the ability to chain protected timestamps between backups (#67282). By chaining protected timestamp records we decouple backup cadence from GC TTL which has been a longstanding thorn in our side. This way we can move to a future where cluster-wide TTLs do not have to default to 25 hours, and CRDB is not burdened by garbage buildup.

In this chaining future, we do not want to hold up GC on high churn tables with short TTLs for the duration of a backup.

@irfansharif
Contributor

The KV side of things LGTM, just a minor change for the MVCC GC queue.

@adityamaru
Contributor Author

adityamaru commented Dec 11, 2021

Wondering if there is really a need to restrict this ephemeral bit to just tables. With the new reconciler, if we were to set the ephemeral bit on the zone config of a database, it would propagate to the table span configs, and so both GC and ExportRequest would just skip all tables in the database. This seems okay 🤷

On the other hand, we probably don't want to allow setting this on an INDEX or PARTITION level, because then you might end up partially backing up a non-ephemeral table with an ephemeral index.

@shermanCRL shermanCRL changed the title sql,protectedts: mark tables as epehemeral to exclude them from backups sql,protectedts: mark tables as ephemeral to exclude them from backups Dec 12, 2021
adityamaru added a commit to adityamaru/cockroach that referenced this issue Dec 16, 2021
This change adds an `is_ephemeral` field to the zone config
proto definition. A schema object marked as `ephemeral` is
expected to have high-churn, ephemeral data with a short GC TTL.

Bulk operations such as BACKUP write protected timestamp records
to prevent GC from running on the target keyspace during job execution.
Today, we protect the spans that are part of the target being backed
up, and in the near future we will switch to protecting entire schema
objects. Consequently, a database backup would protect all tables in it,
including those high-churn tables with a short TTL. This could lead to
the buildup of a large amount of garbage during the execution of
a backup, which could be on the order of hours depending on the size of
the cluster. Users often want to exclude such tables from backups
altogether, but are forced to include them because a larger target is
being backed up.

While it is more involved to completely exclude a table from a backup
because of references in jobs, views etc, we can return empty
row data if a range is explicitly marked as ephemeral. This way, a
RESTORE would bring back an empty table. The GC queue can also be
taught not to respect any pts records that might overlap a range
marked as ephemeral when proposing a new GC threshold, thereby allowing
GC to clean up older revisions in these high-churn tables. These
changes will be added in the following commit.

To begin with, the `is_ephemeral` field cannot be set on
Subzones (index/partition zone configurations) since the UX of partially
backing up a non-ephemeral table with an ephemeral index/partition
is not convincing.

Informs: cockroachdb#73536

Release note (sql change): Add `is_ephemeral` field to zone configurations
to allow users to mark the data in certain schema objects as `ephemeral`.
adityamaru added a commit to adityamaru/cockroach that referenced this issue Dec 16, 2021
This change leverages the `is_ephemeral` flag set on zone configurations
to exclude the row data of marked tables from being backed up.

An `ExportRequest` on a range marked as `ephemeral` will return
an empty `ExportResponse`.

Since the row data will not be backed up, this change also teaches
the GC queue to ignore pts records that cover the `ephemeral` range
when proposing a new GC threshold. As a result of this, it is possible
that the target spans of the `ExportRequest` have already been gc'ed when
we go to evaluate the request. To ensure that this read before GC does
not fail the backup, we decorate the `BatchTimestampBeforeGCError` with
information about the range being `ephemeral` and handle the error in the
backup processor.

Lastly, we make AdminVerifyProtectedTimestampRequest a noop on an `ephemeral`
range since pts records are no longer respected by the GC queue for these
ranges.

Informs: cockroachdb#73536

Release note (sql change): Row data for tables marked as `is_ephemeral`
via zone configs will no longer be backed up. A RESTORE of this backup
would result in an empty table being created in the restoring cluster.
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jan 21, 2022
This change adds SQL syntax to be able to mark a table's
row data as ephemeral. It adds two new statements:

`ALTER TABLE ... SET EPHEMERAL DATA` and
`ALTER TABLE ... SET NOT EPHEMERAL DATA`

Informs: cockroachdb#73536

Release note (sql change): Add SQL statement
`ALTER TABLE ... SET [NOT] EPHEMERAL DATA` to set and
unset the row data in a table as ephemeral.
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jan 21, 2022
This change powers the newly added `ALTER TABLE SET EPHEMERAL DATA`
SQL queries. It sets or unsets the `ephemeral` field on a table descriptor
based on the query. Data in temporary tables or tables with inbound foreign
key constraints cannot be marked as ephemeral.

Every `set_ephemeral` schema change is emitted to the event log.

This change does not teach any part of the system to use this ephemeral
bit; that will come in follow-up PRs.

Informs: cockroachdb#73536

Release note: None
adityamaru added a commit to adityamaru/cockroach that referenced this issue Jan 24, 2022
This change is the first of two changes that gets us to the goal of backup
ignoring ephemeral table row data, and not holding up GC on these ranges.

This change does a few things:

- It sets up the transport of the ephemeral bit set on a table descriptor
via `ALTER TABLE ... SET EPHEMERAL DATA`, to the span configuration applied
in KV.

- It teaches ExportRequest on a range marked as ephemeral to return
an empty ExportResponse. In this way, a backup processor will receive no row
data to back up for an ephemeral table.

- A follow up change will also teach the SQLTranslator
to not populate the protected timestamp field on the SpanConfig for ephemeral
tables. This way, a long running backup will not hold up GC on such high-churn
tables. With no protection on ephemeral ranges, it is possible that an
ExportRequest targeting an ephemeral range has a StartTime
below the range's GCThreshold. To avoid the returned BatchTimestampBeforeGCError
from failing the backup, we decorate the error with information about the
range being ephemeral and handle the error in the backup processor.

Informs: cockroachdb#73536

Release note (sql change): BACKUP of a table marked as `ephemeral` via
`ALTER TABLE ... SET EPHEMERAL DATA` will no longer back up that table's row
data. The backup will continue to back up the table's descriptor and related
metadata, and so on restore we will end up with an empty version of the backed
up table.
@petermattis
Collaborator

Naming nit: the term "ephemeral" suggests something beyond not including the table in backups. For example, I could imagine an "ephemeral" table as not surviving a cluster crash (in-memory only), or being automatically deleted after some time period. Unfortunately, I don't have a better suggestion than ephemeral. Did you consider trying to find a term/syntax that explicitly mentions "backup"?

@adityamaru
Contributor Author

> Did you consider trying to find a term/syntax that explicitly mentions "backup"?

We had a round of bikeshedding the term in this internal thread, and similar concerns were brought up by @dt about ephemeral getting confused with temporary tables, which we already support. The one argument against naming this something specific to backups is that we will be ignoring protected timestamps that apply to ephemeral ranges so as to not hold up GC on these high-churn tables. CDC and the soon-to-arrive tenant-to-tenant streaming are also users of the protected timestamp subsystem (PTS). While they are initially going to configure the PTS records they write to not be ignored on ephemeral ranges, there might be a future where other systems want behavior similar to backups. Maybe it is okay to name it something backup-specific to begin with, and if and when we see a more general use case we can revisit the syntax.

adityamaru added a commit to adityamaru/cockroach that referenced this issue Feb 24, 2022
…backup to SpanConfig

This change is a follow-up to cockroachdb#75451, which taught ExportRequests
to no-op on ranges marked as exclude_data_from_backup.

This change does two things:

- It adds an `ignore_if_excluded_from_backup` bit to ptpb.Target that is set
on PTS records written by backup schedules and jobs.

- It adds an `ignore_if_excluded_from_backup` bit to the ProtectionPolicy that
is shipped to KV as part of the SpanConfig.

In a follow up PR, this bit on the SpanConfig will be used in conjunction with
`exclude_data_from_backup` to decide whether or not to ignore the ProtectionPolicy
when making GC decisions on a span. All other consumers of PTS records will
default to setting this bit to false, and so their ProtectionPolicies will always
influence GC even if `exclude_data_from_backup` is set to true.

Informs: cockroachdb#73536

Release note: None
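The decision rule this commit sets up can be sketched in Go. The types here are hypothetical stand-ins for the SpanConfig and ProtectionPolicy protos described above:

```go
package main

import "fmt"

// protectionPolicy is a hypothetical stand-in for the ProtectionPolicy
// shipped to KV as part of the SpanConfig.
type protectionPolicy struct {
	IgnoreIfExcludedFromBackup bool
}

// spanConfig is a hypothetical stand-in for the reconciled span config.
type spanConfig struct {
	ExcludeDataFromBackup bool
	Protections           []protectionPolicy
}

// applicableProtections sketches the GC-time filter: a policy is dropped
// only when the span is excluded from backup AND the policy opted in to
// being ignored on such spans. PTS consumers like CDC leave the bit false,
// so their policies keep protecting the span.
func applicableProtections(cfg spanConfig) int {
	n := 0
	for _, p := range cfg.Protections {
		if cfg.ExcludeDataFromBackup && p.IgnoreIfExcludedFromBackup {
			continue // backup-written policy on an excluded span: ignore
		}
		n++
	}
	return n
}

func main() {
	cfg := spanConfig{
		ExcludeDataFromBackup: true,
		Protections: []protectionPolicy{
			{IgnoreIfExcludedFromBackup: true},  // written by a backup: ignored
			{IgnoreIfExcludedFromBackup: false}, // written by CDC: still applies
		},
	}
	fmt.Println(applicableProtections(cfg))
}
```

Requiring both bits keeps the change opt-in on both sides: the table owner opts the span out of backups, and each PTS writer separately opts its records into being ignored there.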
craig bot pushed a commit that referenced this issue Feb 25, 2022
76831: spanconfigsqltranslator,jobsprotectedts: add `ignore_if_excluded_from_backup` to SpanConfig r=dt,irfansharif a=adityamaru

This change is a follow-up to #75451, which taught ExportRequests
to no-op on ranges marked as exclude_data_from_backup.

This change does two things:

- It adds an `ignore_if_excluded_from_backup` bit to ptpb.Target that is set
on PTS records written by backup schedules and jobs.

- It adds an `ignore_if_excluded_from_backup` bit to the ProtectionPolicy that
is shipped to KV as part of the SpanConfig.

In a follow up PR, this bit on the SpanConfig will be used in conjunction with
`exclude_data_from_backup` to decide whether or not to ignore the ProtectionPolicy
when making GC decisions on a span. All other consumers of PTS records will
default to setting this bit to false, and so their ProtectionPolicies will always
influence GC even if `exclude_data_from_backup` is set to true.

Informs: #73536

Release note: None

Co-authored-by: Aditya Maru <[email protected]>
@adityamaru
Contributor Author

There is a final PR required to teach the GC queue to ignore protection policies that apply to spans where both `exclude_data_from_backup` (introduced in #75295) and `ignore_if_excluded_from_backup` (introduced in #76831) are true. This can only happen once #73727 is complete, therefore this is marked as a release blocker.

@adityamaru adityamaru added branch-master Failures and bugs on the master branch. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Feb 28, 2022
adityamaru added a commit to adityamaru/cockroach that referenced this issue Mar 5, 2022
Previously, GetProtectionTimestamps would return all the protection
timestamps that would apply over a span. This commit renames the
method to GetProtectionPolicies, and instead returns the protection
policies that apply over a span. This will be required in a follow-up
commit that needs more fields from the ProtectionPolicy than just
the protected timestamp.

Informs: cockroachdb#73536

Release note: None

Release justification: low risk updates for new functionality
adityamaru added a commit to adityamaru/cockroach that referenced this issue Mar 5, 2022
This is the last of the changes needed to achieve cockroachdb#73536.
It teaches the helper used to read PTS records that apply to a
replica, to ignore ProtectionPolicies that were written by a backup
if the replica has been marked as `exclude_data_from_backup`.

From a user's point of view, this allows them to mark a table whose
row data will be excluded from backup, and to set that table's gc.ttl
to a very low value. Backups that write PTS records will no longer
hold up GC on such low GC TTL tables.

Fixes: cockroachdb#73536

Release note: None

Release justification: low risk update to new functionality
adityamaru added a commit to adityamaru/cockroach that referenced this issue Mar 6, 2022
This change teaches the `GetProtectionTimestamps` method in the KVSubscriber
to ignore ProtectionPolicies that were written by a backup and apply to a span
that has been marked as excluded from backup. This ensures that the ProtectionPolicy
written by a backup does not hold up GC on the span, since the backup will not be
exporting its row data.

Informs: cockroachdb#73536

Release note: None

Release justification: low risk update to new functionality
RajivTS pushed a commit to RajivTS/cockroach that referenced this issue Mar 6, 2022
… from backup

This change is the first of two changes that gets us to the goal of backup
ignoring certain table row data, and not holding up GC on these ranges.

This change does a few things:

- It sets up the transport of the exclude_data_from_backup bit set on a
table descriptor, to the span configuration applied in KV.

- It teaches ExportRequest on a range marked as excluded to return
an empty ExportResponse. In this way, a backup processor will receive no row
data to back up for an excluded table.

- A follow up change will also teach the SQLTranslator
to not populate the protected timestamp field on the SpanConfig for such
tables. This way, a long running backup will not hold up GC on such high-churn
tables. With no protection on such ranges, it is possible that an
ExportRequest targeting the range has a StartTime
below the range's GCThreshold. To avoid the returned BatchTimestampBeforeGCError
from failing the backup, we decorate the error with information about the
range being excluded from backup and handle the error in the backup processor.

Informs: cockroachdb#73536

Release note (sql change): BACKUP of a table marked with `exclude_data_from_backup`
via `ALTER TABLE ... SET (exclude_data_from_backup = true)` will no longer back up
that table's row data. The backup will continue to back up the table's descriptor
and related metadata, and so on restore we will end up with an empty version of
the backed up table.
craig bot pushed a commit that referenced this issue Mar 8, 2022
77392: spanconfigkvsubscriber: conditionally ignore ProtectionPolicies r=arulajmani a=adityamaru

This change teaches the `GetProtectionTimestamps` method in the KVSubscriber
to ignore ProtectionPolicies that were written by a backup and apply to a span
that has been marked as excluded from backup. This ensures that the ProtectionPolicy
written by a backup does not hold up GC on the span, since the backup will not be
exporting its row data.

Informs: #73536

Release note: None

Release justification: low risk update to new functionality

Co-authored-by: Aditya Maru <[email protected]>
craig bot pushed a commit that referenced this issue Mar 10, 2022
72991: server,sql: implement connection_wait for graceful draining r=ZhouXing19 a=ZhouXing19

Currently, the draining process consists of three consecutive phases:

1. Server enters the "unready" state: The `/health?ready=1` http endpoint starts to show that the node is shutting down, but new SQL connections and new queries are still allowed. The server does a hard wait till the timeout. This phase's duration is set with cluster setting `server.shutdown.drain_wait`.

2. Drain SQL connections: New SQL connections are not allowed. SQL connections with no queries in flight are closed by the server immediately. The rest of these SQL connections are terminated by the server as soon as their queries finish, with an early exit if all queries are finished. This phase's maximum duration is set with cluster setting `server.shutdown.query_wait`.

3. Drain range lease: the server keeps retrying forever until all range leases on this draining node have been transferred. Each retry iteration's duration is specified by the cluster setting `server.shutdown.lease_transfer_timeout`.

This commit reorganizes the draining process by adding a phase in which the server waits for SQL connections to close; once all SQL connections are closed before the timeout, the server proceeds to the next draining phase.

The newly proposed draining process is:

1. (unchanged) Server enters the "unready" state: The `/health?ready=1` http endpoint starts to show that the node is shutting down, but new SQL connections and new queries are still allowed. The server does a hard wait till the timeout. This phase's duration is set with cluster setting `server.shutdown.drain_wait`.

2. (new phase) Wait for SQL connections to close: New SQL connections are not allowed now. The server waits for the remaining SQL connections to close, or for the timeout to elapse. Once all SQL connections are closed, draining proceeds to the next phase. The maximum duration of this phase is determined by the cluster setting `server.shutdown.connection_wait`.

3. (unchanged) Drain SQL connections: New SQL connections are not allowed. SQL connections with no queries in flight are closed by the server immediately. The rest of these SQL connections are terminated by the server as soon as their queries finish, with an early exit if all queries are finished. This phase's maximum duration is set with cluster setting `server.shutdown.query_wait`.

4. (unchanged) Drain range lease: the server keeps retrying forever until all range leases on this draining node have been transferred. Each retry iteration's duration is specified by the cluster setting `server.shutdown.lease_transfer_timeout`.

The duration of the new phase ("Wait SQL connections to close") can be set similarly to the other 3 existing draining phases:
```
SET CLUSTER SETTING server.shutdown.connection_wait = '40s'
```

Resolves #66319

Release note (ops change): add `server.shutdown.connection_wait` to the
draining process configuration. This provides a workaround for customers
who encountered intermittent blips and failed requests while performing
operations related to restarting nodes.

Release justification: Low risk, high benefit changes to existing functionality
(optimize the node draining process).

76430: [CRDB-9550] kv: adjust number of voters needed calculation when determining replication status r=Santamaura a=Santamaura

Currently, when a range has non-voting replicas and it is queried through replication
stats, it will be reported as underreplicated. This is because, when a
zone is configured to have non-voting replicas, the over/under-replicated counts
compare the number of current voters to the total number of replicas, which is
erroneous. Instead, we will compare the current number of voters to the total number of
voters if voters has been set, and otherwise will defer to the total number of replicas.
This patch ignores the desired non-voters count for the purposes of this report, for
better or worse. Resolves #69335.

Release justification: low risk bug fix

Release note (bug fix): use total number of voters if set when determining replication
status

Before change:
![Screen Shot 2022-02-11 at 10 03 57 AM](https://user-images.githubusercontent.com/17861665/153615571-85163409-5bac-40f4-9669-20dce77185cf.png)

After change:
![Screen Shot 2022-02-11 at 9 53 04 AM](https://user-images.githubusercontent.com/17861665/153615316-785b156b-bd23-4cfa-a76d-7c9fa47fbf1e.png)
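The corrected calculation above can be sketched in Go. The types are hypothetical stand-ins, not the actual zone config proto:

```go
package main

import "fmt"

// zoneConfig is a hypothetical stand-in for the replication settings the
// report reads.
type zoneConfig struct {
	NumReplicas int
	NumVoters   int // 0 means "not set"; all replicas are then voters
}

// neededVoters sketches the fix: compare current voters against the
// configured voter count when it is set, and only fall back to the total
// replica count otherwise.
func neededVoters(cfg zoneConfig) int {
	if cfg.NumVoters > 0 {
		return cfg.NumVoters
	}
	return cfg.NumReplicas
}

func underreplicated(currentVoters int, cfg zoneConfig) bool {
	return currentVoters < neededVoters(cfg)
}

func main() {
	// 3 voters + 2 non-voters: the old code compared 3 voters against
	// 5 total replicas and wrongly reported the range as underreplicated.
	cfg := zoneConfig{NumReplicas: 5, NumVoters: 3}
	fmt.Println(underreplicated(3, cfg))
}
```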

77315: backupccl: backup correctly tries reading in from base directory if latest/checkpoint files aren't found r=DarrylWong a=DarrylWong

Before, we only tried reading from the base directory if we caught an ErrFileDoesNotExist error. However,
this does not account for the potential error thrown when the progress/latest directories don't exist.
This changes it so we now correctly retry reading from the base directory.

We also put the latest directory inside of a metadata directory, in order to avoid any potential
conflicts with there being a latest file and latest directory in the same base directory.

Also wraps errors in findLatestFile and readLatestCheckpointFile for more clarity when both base and
latest/progress directories fail to read.

Fixes #77312

Release justification: Low risk bug fix
Release note: none

77406: backupccl: test ignore ProtectionPolicy for exclude_data_from_backup r=dt a=adityamaru

This change adds an end-to-end test to ensure that a table excluded
from backup will not hold up GC on its replica, even in the presence
of a protected timestamp record covering the replica.

From a user's point of view, this allows them to mark a table whose
row data will be excluded from backup, and to set that table's gc.ttl
to a very low value. Backups that write PTS records will no longer
hold up GC on such low GC TTL tables.

Fixes: #73536

Release note: None

Release justification: low risk update to new functionality

77450: ui: add selected period as part of cached key r=maryliag a=maryliag

Previously, the fingerprint id and the app names were used
as a key for the statement details cache. This commit adds
the start and end time (when they exist) to the key, so
the details are correctly assigned to the selected period.

This commit also rounds the selected period to the hour,
since that is what is used in the persisted statistics, with
the start value keeping its hour and the end value adding one
hour, for example:
start: 17:45:23  ->  17:00:00
end:   20:14:32  ->  21:00:00

Partially addresses #72129

Release note: None
Release Justification: Low risk, high benefit change

77597: kv: Add `created` column to `active_range_feeds` table. r=miretskiy a=miretskiy

Add `created` column to `active_range_feeds` table.
This column is initialized to the time when the partial range feed
was created. This allows us to determine, among other things,
whether or not the rangefeed is currently performing a catchup scan
(i.e. its resolved column is 0), and how long the scan has been running
for.

Release Notes (enterprise): Add a created time column
to the `crdb_internal.active_range_feeds` virtual table to improve observability
and debuggability of the rangefeed system.

Fixes #77581

Release Justification: Low impact observability/debuggability improvement.

Co-authored-by: Jane Xing <[email protected]>
Co-authored-by: Santamaura <[email protected]>
Co-authored-by: Darryl <[email protected]>
Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Marylia Gutierrez <[email protected]>
Co-authored-by: Yevgeniy Miretskiy <[email protected]>
@craig craig bot closed this as completed in 3470def Mar 10, 2022