backupccl: generate fewer spans for protected timestamps #54747
Hi @ajwerner, please add a C-ategory label to your issue. Check out the label system docs. While you're here, please consider adding an A- label to help keep our repository tidy. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is otan.
cc @adityamaru
Is my understanding correct that the only case where we should see overlapping spans is when a table's index is interleaved in another table?
That sounds right to me.
Previously, to account for interleaved tables, backup would generate spans for every table index, which would then be used by the protected ts record. Recently we saw a couple of instances where this resulted in a very large number of spans being generated during backup. A direct consequence is that the record can exceed the default limits of the protected timestamp subsystem. With this change we break the span generation down into two cases:
- If the table is non-interleaved, we can simply protect the full table span rather than a span for each index.
- If the table is interleaved, the behavior remains the same as before.
This optimization is one of a few that should help us reduce the number of spans we try to protect. Informs: cockroachdb#54747 Release note: None
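The two cases above can be sketched roughly as follows. This is a simplified, hypothetical model (the real code operates on table descriptors and roachpb.Spans; `table`, `index`, and `spansForTable` here are illustrative names, not CockroachDB APIs):

```go
package main

import "fmt"

// index and table are simplified stand-ins for CockroachDB's descriptors.
type index struct{ id int }

type table struct {
	id          int
	indexes     []index
	interleaved bool // true if any index is interleaved in another table
}

// span endpoints are rendered in the /Table/<id>[/<index>] key style
// used elsewhere in this issue.
type span struct{ start, end string }

// spansForTable protects one full table span for a non-interleaved table,
// and falls back to a span per index for interleaved tables.
func spansForTable(t table) []span {
	if !t.interleaved {
		return []span{{
			fmt.Sprintf("/Table/%d", t.id),
			fmt.Sprintf("/Table/%d", t.id+1),
		}}
	}
	var out []span
	for _, idx := range t.indexes {
		out = append(out, span{
			fmt.Sprintf("/Table/%d/%d", t.id, idx.id),
			fmt.Sprintf("/Table/%d/%d", t.id, idx.id+1),
		})
	}
	return out
}

func main() {
	plain := table{id: 51, indexes: []index{{1}, {2}, {3}}}
	fmt.Println(len(spansForTable(plain))) // 1: whole-table span

	inter := plain
	inter.interleaved = true
	fmt.Println(len(spansForTable(inter))) // 3: one span per index
}
```

The point of the optimization is visible in the counts: a table with many indexes contributes a single span to the protected ts record as long as it is not interleaved.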
Just leaving a note for myself to think about a further optimization where we merge contiguous full table spans as well, on top of the work done in the above PR. Will need to think about the invariants to apply when deciding if two table spans should be merged or not.
55794: backupccl: optimize spans selected for backup and ts protection r=pbardea a=adityamaru

Previously, to account for interleaved tables, backup would generate spans for every table index, which would then be used by the protected ts record. Recently we saw a couple of instances where this resulted in a very large number of spans being generated during backup. A direct consequence is that the record can exceed the default limits of the protected timestamp subsystem. In this new scheme we attempt to merge spans using the following rules:
- Contiguous index spans are merged.
- Two non-contiguous index spans are merged if a scan request for the index IDs between them does not return any results. E.g.: {/Table/51/1 - /Table/51/2}, {/Table/51/3 - /Table/51/4} => {/Table/51/1 - /Table/51/4}, provided the dropped index represented by the span {/Table/51/2 - /Table/51/3} has been gc'ed.

The resultant merged spans are what we BACKUP and what we protect from gc using a protected ts record. Informs: #54747 Release note: None

56298: pgwire: rework DecodeOidDatum to DecodeDatum to parse OidFamily types r=rafiss a=otan

Resolves #56193. Reworked DecodeOidDatum to DecodeDatum to take in a type, which encodes additional useful information necessary for ENUMs and oid family types. Release note (bug fix): Fixed a bug where reg* types were not parsed properly over pgwire, COPY, or prepared statements.

tree: create ParseDOid method
- Move ParseDOid and associated methods to a new function.
- Move datum_test.go to datum_integration_test.go, as it does not import datum.go tests.
- Move some of datum_invariants_test.go out into datum_test.go.
- Created a new datum_test.go with pure unit tests.
Release note: None

56920: importccl: Add DROP TABLE [IF EXISTS] support for import pgdump. r=adityamaru a=mokaixu

Previously, whenever a DROP TABLE statement was parsed, an error was thrown and the import would fail, since DROP TABLE statements were not supported. Now, when we encounter a DROP TABLE statement for a target table foo: if foo exists, we throw an error indicating that the user should drop the table foo; otherwise, if foo does not exist, we silently ignore the DROP statement and proceed with the pgdump import. Resolves: #53112 Release note: None

57004: geos: link to docs if GEOS is not installed r=rytaft,sumeerbhola a=otan

Release note (sql change): Introduce a hint when GEOS is improperly installed, pointing to the docs instructions on installing CockroachDB.

Co-authored-by: Aditya Maru <[email protected]> Co-authored-by: Oliver Tan <[email protected]> Co-authored-by: Monica Xu <[email protected]>
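The index-span merging rules in #55794 can be sketched like this. This is a hypothetical simplification: index IDs stand in for real keys, and the emptiness check (a scan of the gap between two spans, e.g. a dropped index that has been gc'ed) is injected as a predicate rather than issued as an actual KV scan:

```go
package main

import "fmt"

// span covers index IDs [start, end) within one table.
type span struct{ start, end int }

// mergeSpans merges adjacent index spans: two spans are coalesced when
// they are contiguous, or when the key range between them is known to be
// empty (gapIsEmpty), mirroring the two rules described above.
func mergeSpans(spans []span, gapIsEmpty func(lo, hi int) bool) []span {
	if len(spans) == 0 {
		return nil
	}
	merged := []span{spans[0]}
	for _, s := range spans[1:] {
		last := &merged[len(merged)-1]
		if s.start == last.end || gapIsEmpty(last.end, s.start) {
			last.end = s.end // extend the previous span over the gap
			continue
		}
		merged = append(merged, s)
	}
	return merged
}

func main() {
	// {/Table/51/1 - /Table/51/2} and {/Table/51/3 - /Table/51/4}:
	// mergeable only if dropped index 2's key range is empty.
	spans := []span{{1, 2}, {3, 4}}

	gced := func(lo, hi int) bool { return true }      // gap gc'ed
	notGCed := func(lo, hi int) bool { return false }  // gap may have data

	fmt.Println(mergeSpans(spans, gced))    // [{1 4}]
	fmt.Println(mergeSpans(spans, notGCed)) // [{1 2} {3 4}]
}
```

The key design point is that merging across a gap is only safe when the gap is provably empty; otherwise the merged span would back up (and protect) data that was never requested.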
…tection Previously, within each table, contiguous index spans are merged if there are no dropped or adding indexes between them. However, when a BACKUP involves a large number of tables, the protected ts cluster limit is still hit very frequently. To address this, this commit logically merges spans between tables to reduce the number of spans generated, further optimizing the previous work in cockroachdb#55794. To determine whether two tables can be merged, we need to check that the following MVCC and non-MVCC invariants are met:
- No dropping or adding indexes between LHS and RHS
- LHS and RHS are either contiguous tables, or non-contiguous and all tables between them have been dropped and gc'ed
Resolves: cockroachdb#58134 Informs: cockroachdb#54747 Release note: None
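The cross-table merge invariants above can be sketched as a predicate. Again, this is illustrative only (`tableDesc`, `canMerge`, and the gc'ed-ID set are hypothetical names, not CockroachDB's actual descriptor or API):

```go
package main

import "fmt"

// tableDesc is a simplified stand-in for a table descriptor.
type tableDesc struct {
	id                     int
	hasDroppingOrAddingIdx bool
}

// canMerge reports whether the protected-ts spans of lhs and rhs may be
// coalesced into one span, per the invariants in the commit message:
// neither table may have dropping/adding indexes, and every table ID
// between them must correspond to a table that was dropped and gc'ed.
func canMerge(lhs, rhs tableDesc, gced map[int]bool) bool {
	if lhs.hasDroppingOrAddingIdx || rhs.hasDroppingOrAddingIdx {
		return false
	}
	for id := lhs.id + 1; id < rhs.id; id++ {
		if !gced[id] {
			return false
		}
	}
	return true // contiguous tables (empty loop) trivially satisfy this
}

func main() {
	a, b := tableDesc{id: 52}, tableDesc{id: 55}
	fmt.Println(canMerge(a, b, map[int]bool{53: true, 54: true})) // true
	fmt.Println(canMerge(a, b, map[int]bool{53: true}))           // false: 54 not gc'ed
}
```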
We addressed part of this in #55794. We did attempt a change that coalesced spans across table boundaries as well, but the added complexity in backup's span selection did not instill confidence in checking in that change. We would like to pursue #70582 instead, since cluster backups are what most customers use, and where this is probably most impactful. Database backups would also benefit from coalescing across table boundaries, but this is a lower priority in my opinion. I'm closing this issue in favor of the one linked above.
Describe the problem
The protected timestamp subsystem was designed, perhaps mistakenly, under the assumption that the total number and size of protected spans would be bounded and relatively small (megabytes at most). This stemmed partly from the assumption that the entity being protected would be an entire table. Table spans tend to be quite small in terms of byte size, as their keys are just a byte and a varint. In order to deal with interleaved tables appropriately, backup jobs generate spans for every table index. This can lead to a major explosion in the total number of spans. When too many spans are created, the record can exceed the default limits of the protected timestamp subsystem. While the setting on the maximum number of spans now seems too small, at some point we'll begin to worry about the actual size of the records. While not a general and infinitely scalable solution, creating smaller sets of spans and then compressing those spans is likely to make for a very large improvement in the total number of spans generated.
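To illustrate why table-span endpoints are so small, here is a rough sketch of the size of a table-prefix key. This is not CockroachDB's actual key encoding (the tag byte is a placeholder); it only shows the "a byte plus a varint" shape mentioned above, using Go's standard varint encoder:

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// tableKeyLen returns the byte length of a toy table-prefix key:
// one tag byte followed by a varint-encoded table ID. The tag value
// is a placeholder, not the real CockroachDB key prefix.
func tableKeyLen(tableID uint64) int {
	buf := make([]byte, 1+binary.MaxVarintLen64)
	buf[0] = 0x89 // placeholder tag byte
	n := binary.PutUvarint(buf[1:], tableID)
	return 1 + n
}

func main() {
	fmt.Println(tableKeyLen(51))     // 2 bytes for a small table ID
	fmt.Println(tableKeyLen(100000)) // 4 bytes even for a large ID
}
```

Even a span with two such endpoints is a handful of bytes, which is why the original design expected protected-ts records to stay small: the blow-up comes from the *number* of spans, not their individual size.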
As a first pass this means:
- If a table has no interleaved indexes, protect a single span for the whole table instead of one span per index.
- If a table is interleaved, fall back to generating a span per index, as before.
The difficulty will be dealing with the edge cases of overlapping spans but I believe we have the technology for this. The above optimizations should lead to full cluster backup going from all index spans to very few table spans.
Epic CRDB-7078