backupccl: generate fewer spans for protected timestamps #54747

Closed
1 of 2 tasks
ajwerner opened this issue Sep 24, 2020 · 6 comments

@ajwerner
Contributor

ajwerner commented Sep 24, 2020

Describe the problem

The protected timestamp subsystem was designed, perhaps mistakenly, under the assumption that the total number and size of protected spans would be bounded and relatively small (megabytes at most). This assumption partly stemmed from the expectation that the protected entity would be an entire table; table spans tend to be quite small in byte size, as they are just a prefix byte and a varint-encoded integer. In order to deal with interleaved tables appropriately, backup jobs generate spans for every table index, which can lead to a major explosion in the total number of spans. When too many spans are created, the record can exceed the default size limits of the protected timestamp subsystem. While the setting on the maximum number of spans now seems too small, at some point we'll begin to worry about the actual size of the records. While not a general, infinitely scalable solution, generating smaller sets of spans and then compressing them is likely to yield a very large reduction in the total number of spans.

As a first pass this means:

  • Protecting the full table span for any non-interleaved table being backed up.
  • Merging contiguous table spans into a single span.

The difficulty will be dealing with the edge cases of overlapping spans, but I believe we have the technology for this. The above optimizations should take a full cluster backup from protecting every index span to protecting very few table spans.

Epic CRDB-7078
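A minimal sketch of the contiguous-span merge described above, with plain strings standing in for CockroachDB's `roachpb.Span` keys (all names here are hypothetical, not the actual backupccl implementation):

```go
package main

import "fmt"

// Span is a simplified [Key, EndKey) key range; plain strings stand in
// for CockroachDB's roachpb.Span (this whole sketch is hypothetical).
type Span struct {
	Key, EndKey string
}

// mergeContiguous collapses adjacent spans whose boundaries touch, so
// {/Table/51 - /Table/52} and {/Table/52 - /Table/53} become one span.
// Input is assumed sorted and non-overlapping.
func mergeContiguous(spans []Span) []Span {
	var merged []Span
	for _, s := range spans {
		if n := len(merged); n > 0 && merged[n-1].EndKey == s.Key {
			merged[n-1].EndKey = s.EndKey // extend the previous span
			continue
		}
		merged = append(merged, s)
	}
	return merged
}

func main() {
	spans := []Span{
		{"/Table/51", "/Table/52"},
		{"/Table/52", "/Table/53"},
		{"/Table/55", "/Table/56"},
	}
	fmt.Println(mergeContiguous(spans))
	// [{/Table/51 /Table/53} {/Table/55 /Table/56}]
}
```

Protecting the full table span for non-interleaved tables would feed this merge table-sized spans rather than one span per index, which is where the bulk of the reduction comes from.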

@blathers-crl

blathers-crl bot commented Sep 24, 2020

Hi @ajwerner, please add a C-ategory label to your issue. Check out the label system docs.

While you're here, please consider adding an A- label to help keep our repository tidy.

🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.

@ajwerner
Contributor Author

cc @adityamaru

@adityamaru adityamaru self-assigned this Sep 24, 2020
@adityamaru
Contributor

Is my understanding correct that the only case where we should see overlapping spans is when a table's index is interleaved in another table?
The code seems to suggest that when selecting the spans to back up, if any of the indexes of the current table are interleaved, then we construct a span from the greatest ancestor's {tableID, indexID}, which would overlap with one of the spans output when processing that ancestor table itself.

@ajwerner
Contributor Author

Is my understanding correct that the only case where we should see overlapping spans is when a table's index is interleaved in another table?

That sounds right to me.

adityamaru added a commit to adityamaru/cockroach that referenced this issue Sep 25, 2020
Previously, to account for interleaved tables, backup would generate
spans for every table index, which would then be used by the protected
ts record. Recently we saw a couple of instances where this was
resulting in a very large number of spans being generated during backup.
A direct consequence of this is that the record can exceed the default
size limits of the protected timestamp subsystem.

With this change we break down the span generation into two cases:

- If the table is non-interleaved, then we can simply protect a full
  table span rather than a span for each index.
- If the table is interleaved, the behavior remains the same as before.

This optimization is one of a few that should help us reduce the number
of spans we try to protect.

Informs: cockroachdb#54747

Release note: None
@adityamaru
Contributor

adityamaru commented Nov 18, 2020

Just leaving a note for myself to think about a further optimization where we merge contiguous full table spans as well, on top of the work done in the above PR. Will need to think about the invariants to apply when deciding if two table spans should be merged or not.

craig bot pushed a commit that referenced this issue Nov 23, 2020
55794: backupccl: optimize spans selected for backup and ts protection r=pbardea a=adityamaru

Previously, to account for interleaved tables, backup would generate
spans for every table index, which would then be used by the protected
ts record. Recently we saw a couple of instances where this was
resulting in a very large number of spans being generated during backup.
A direct consequence of this is that the record can exceed the default
size limits of the protected timestamp subsystem.

In this new scheme we attempt to merge spans using the following
rules:

- Contiguous index spans are merged.
- Two non-contiguous index spans are merged if a scan request for the index
IDs between them does not return any results.

E.g.: {/Table/51/1 - /Table/51/2}, {/Table/51/3 - /Table/51/4} => {/Table/51/1 - /Table/51/4},
provided the dropped index represented by the span
{/Table/51/2 - /Table/51/3} has been GC'ed.

The resultant merged spans are what we BACKUP and what we protect from
gc using a protected ts record.

Informs: #54747

Release note: None
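The two merge rules above can be sketched as follows, working with bare index IDs within one table; `gapIsEmpty` is a hypothetical callback standing in for the scan of the intervening index key space (none of these names come from the actual PR):

```go
package main

import "fmt"

// indexSpan is a simplified index span within one table, identified by
// its index ID bounds; it covers [startID, endID). Sketch only: the
// real code works with roachpb keys, not integer IDs.
type indexSpan struct{ startID, endID int }

// mergeIndexSpans applies the two rules from the commit message:
//  1. contiguous index spans are merged;
//  2. a gap between two spans is bridged when scanning the intervening
//     index IDs returns no data (e.g. a dropped index already GC'ed).
func mergeIndexSpans(spans []indexSpan, gapIsEmpty func(lo, hi int) bool) []indexSpan {
	var merged []indexSpan
	for _, s := range spans {
		if n := len(merged); n > 0 &&
			(merged[n-1].endID == s.startID || gapIsEmpty(merged[n-1].endID, s.startID)) {
			merged[n-1].endID = s.endID // extend across the (empty) gap
			continue
		}
		merged = append(merged, s)
	}
	return merged
}

func main() {
	// Indexes 1 and 3 exist; index 2 was dropped and its data GC'ed,
	// so the scan of the gap finds nothing and the spans merge.
	spans := []indexSpan{{1, 2}, {3, 4}}
	gapEmpty := func(lo, hi int) bool { return true }
	fmt.Println(mergeIndexSpans(spans, gapEmpty))
	// [{1 4}]
}
```

If the gap scan does return data (the dropped index has not been GC'ed yet), the spans stay separate, matching the "provided ... has been gc'ed" condition in the example.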

56298: pgwire: rework DecodeOidDatum to DecodeDatum to parse OidFamily types r=rafiss a=otan

Resolves #56193 

pgwire: rework DecodeOidDatum to DecodeDatum to parse OidFamily types

Reworked DecodeOidDatum to DecodeDatum to take in a type, which
encodes additional useful information necessary for ENUMs and oid
family types.

Release note (bug fix): Fixed a bug where reg* types were not parsed
properly over pgwire, COPY or prepared statements.

tree: create ParseDOid method

* Move ParseDOid and associated methods to new function.
* Move datum_test.go to datum_integration_test.go, as it does not import
  datum.go tests.
* Move some of datum_invariants_test.go out into datum_test.go.
* Created new datum_test.go with pure unit tests.

Release note: None



56920: importccl: Add DROP TABLE [IF EXISTS] support for import pgdump. r=adityamaru a=mokaixu

Previously, whenever a DROP TABLE statement was parsed, an
error was thrown and the import would fail since DROP TABLE
statements were not supported.

Now, when we encounter a DROP TABLE statement for a target table
foo, if foo exists, then we throw an error indicating to the user
to drop the table foo. Otherwise, if foo does not exist, we silently
ignore the DROP statement and proceed with the pgdump import.

Resolves: #53112

Release note: None

57004: geos: link to docs if GEOS is not installed r=rytaft,sumeerbhola a=otan

Release note (sql change): Introduce a hint, shown when GEOS is improperly
installed, pointing to the docs instructions on installing CockroachDB.

Co-authored-by: Aditya Maru <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
Co-authored-by: Monica Xu <[email protected]>
annezhu98 pushed a commit to annezhu98/cockroach that referenced this issue Jun 8, 2021
…tection

Previously, within each table, contiguous index spans were merged if there were no dropped or adding indexes between them. However, when a BACKUP involves a large number of tables, the protected ts cluster limit is still hit very frequently.

To address this, this commit logically merges spans between tables to reduce the number of spans generated.

To determine if two tables can be merged we need to check if the following MVCC and Non-MVCC invariants are met:

* No dropping or adding indexes between LHS and RHS
* LHS and RHS are either contiguous tables, or non-contiguous and all tables between them have been dropped and gc'ed

Resolves: cockroachdb#58134
Informs: cockroachdb#54747

Release note: None
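The cross-table invariants above can be sketched as a single predicate; `tableGCed` is a hypothetical lookup standing in for a scan of the intervening table key space, and the field names are illustrative, not the real descriptor fields:

```go
package main

import "fmt"

// tableInfo captures the per-table facts the cross-table merge needs
// (sketch only; not the actual descriptor fields).
type tableInfo struct {
	id                  int
	hasAddingOrDropping bool // any index in an ADD or DROP state
}

// canMergeAcrossTables checks the two invariants from the commit message:
// neither LHS nor RHS has adding/dropping indexes, and every table ID
// strictly between them has been dropped and GC'ed.
func canMergeAcrossTables(lhs, rhs tableInfo, tableGCed func(id int) bool) bool {
	if lhs.hasAddingOrDropping || rhs.hasAddingOrDropping {
		return false
	}
	for id := lhs.id + 1; id < rhs.id; id++ {
		if !tableGCed(id) {
			return false
		}
	}
	return true
}

func main() {
	gced := func(id int) bool { return id == 52 } // table 52 dropped and GC'ed
	fmt.Println(canMergeAcrossTables(tableInfo{id: 51}, tableInfo{id: 53}, gced))
	// true: tables 51 and 53 can be covered by one protected span
}
```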
annezhu98 pushed the same commit to annezhu98/cockroach again on Jun 11, Jun 15, Jun 17, Jun 25, and Jun 28, 2021, noting that it further optimizes the previous work in cockroachdb#55794.
@adityamaru
Contributor

adityamaru commented Sep 22, 2021

We addressed part of this in #55794. We did attempt a change that coalesced spans across table boundaries as well, but the added complexity in backup's span selection did not give us enough confidence to check that change in.

We would like to pursue #70582 instead, since cluster backups are what most customers use and where this is probably most impactful. Database backups would also benefit from coalescing across table boundaries, but that is a lower priority in my opinion. I'm closing this issue in favor of the one linked above.
