
backupccl: optimize inter-table spans for backup and ts protection #66010

Open · wants to merge 1 commit into master
Conversation

@annezhu98 annezhu98 commented Jun 2, 2021

Previously, within each table, contiguous index spans were merged if there were no dropped or adding indexes between them. However, when a BACKUP involves a large number of tables, the protected ts cluster limit is still hit very frequently.

To address this, this commit logically merges spans across tables to reduce the number of spans generated, building on the previous work in #55794.

To determine whether two tables can be merged, we check that the following MVCC and non-MVCC invariants hold (a short sketch follows the list):

  • No dropping or adding indexes between LHS and RHS
  • LHS and RHS are either contiguous tables, or non-contiguous and all tables between them have been dropped and gc'ed
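As a rough, self-contained Go sketch of the cross-table merge loop this commit adds (the span type and canMerge callback are stand-ins; the real code operates on roachpb.Span and runs the invariant checks above in pkg/ccl/backupccl/backup_planning.go):

package main

import "fmt"

// span is a stand-in for roachpb.Span: a half-open key interval [Key, EndKey).
type span struct{ Key, EndKey string }

// mergeAcrossTables merges spans across adjacent tables. tableIDs must be the
// sorted keys of spans, and each table's slice holds its already-merged,
// non-empty index spans. canMerge stands in for the invariant checks above.
func mergeAcrossTables(tableIDs []int, spans map[int][]span, canMerge func(lhsID, rhsID int) bool) []span {
	if len(tableIDs) == 0 {
		return nil
	}
	var out []span
	for i := 0; i < len(tableIDs)-1; i++ {
		lhsID, rhsID := tableIDs[i], tableIDs[i+1]
		lhs, rhs := spans[lhsID], spans[rhsID]
		if !canMerge(lhsID, rhsID) {
			out = append(out, lhs...)
			continue
		}
		// Emit all but the last LHS span unchanged, then widen the first RHS
		// span to start where the last LHS span starts; the widened span is
		// emitted on a later iteration.
		out = append(out, lhs[:len(lhs)-1]...)
		rhs[0].Key = lhs[len(lhs)-1].Key
	}
	return append(out, spans[tableIDs[len(tableIDs)-1]]...)
}

func main() {
	tableIDs := []int{52, 53, 55}
	spans := map[int][]span{
		52: {{"/52/1", "/52/3"}},
		53: {{"/53/1", "/53/2"}},
		55: {{"/55/1", "/55/4"}},
	}
	// Pretend adjacent IDs are mergeable (52 and 53), while the gap at table
	// 54 (say, an offline table) blocks merging 53 with 55.
	canMerge := func(lhs, rhs int) bool { return rhs == lhs+1 }
	fmt.Println(mergeAcrossTables(tableIDs, spans, canMerge))
	// Output: [{/52/1 /53/2} {/55/1 /55/4}]
}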

Resolves: #58134
Informs: #54747

Release note: None

@annezhu98 annezhu98 requested review from pbardea, adityamaru and a team June 2, 2021 21:01
@cockroach-teamcity
Member

This change is Reviewable

@annezhu98 annezhu98 marked this pull request as draft June 2, 2021 21:08
@annezhu98 annezhu98 force-pushed the 58134 branch 2 times, most recently from 10973b7 to b500a04 on June 8, 2021 16:31
@annezhu98 annezhu98 marked this pull request as ready for review June 8, 2021 16:32
Contributor

@adityamaru adityamaru left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @annezhu98 and @pbardea)


pkg/ccl/backupccl/backup_planning.go, line 82 at r1 (raw file):

}

type tableAndSpan struct {

nit: rename to tableAndMergedSpans, and then rename the field to mergedSpans?


pkg/ccl/backupccl/backup_planning.go, line 181 at r1 (raw file):

// - Two non-contiguous index spans are merged if a scan request for the index
// IDs between them does not return any results.
// getLogicallyMergedTableSpans also returns the first and last index of the given table

nit: I don't think this is the case?


pkg/ccl/backupccl/backup_planning.go, line 296 at r1 (raw file):

// it does not make any changes to the output of getLogicallyMergedTableSpans except possibly the
// first and last spans of each table
func getLogicallyMergedInterTableSpans(

nit: maybe rename to getLogicallyMergedSpansAcrossTables


pkg/ccl/backupccl/backup_planning.go, line 308 at r1 (raw file):

	}

	var mergedInterTableSpans []roachpb.Span

I think this method can be simplified. We want to perform 3 checks:

  1. CheckForTablesInBounds(startTableID, endTableID): If we find any table descriptor between the LHS and RHS table IDs we are considering for a merge, then we don't merge. This table descriptor would either be of an offline table, a public table with no public indexes (not sure if this is possible), or a table not included in the backup.

  2. CheckForKVInBounds(lhs, rhs): This will scan the key range and find any dropped but not gc'ed indexes. If this method returns true, abort the merge. Here LHS is the EndKey of the last span in the table's spans, and RHS is the StartKey of the first span in the next table's spans.

  3. Check that the maxAddingIndexID for lhs (as defined above) is < lhs.indexID, and that the minAddingIndexID for rhs is > rhs.indexID.

Something like this for lhs, and similar for rhs but with the condition idx.Adding() && idx.GetID() < rhs.indexID:

canMerge := true
allPhysicalIndexOpts := catalog.IndexOpts{AddMutations: true}
if err := catalog.ForEachIndex(table, allPhysicalIndexOpts, func(idx catalog.Index) error {
	if idx.GetID() < lhs.indexID {
		// Skip; we don't care about these indexes.
		return nil
	}
	if idx.Adding() && idx.GetID() > lhs.indexID {
		canMerge = false
		return nil
	}
	return nil
}); err != nil {
	return false, err
}

So this method would check whether any of these helpers return true. If they do, we cannot merge the two table IDs under consideration: take all the spans of the LHS table ID as is, and move on to the next pair of table IDs.
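For check (2), a sketch of what the bounds scan could look like, modeled on the existing checkForKVInBounds helper from #55794 (the surrounding execCfg/ctx and exact signatures here are assumptions):

checkForKVInBounds := func(start, end roachpb.Key, endTime hlc.Timestamp) (bool, error) {
	var foundKV bool
	err := execCfg.DB.Txn(ctx, func(ctx context.Context, txn *kv.Txn) error {
		// Read as of the backup end time so we see what the backup would see.
		txn.SetFixedTimestamp(ctx, endTime)
		// Scanning at most one row is enough: any result means the bounds
		// are not empty.
		res, err := txn.Scan(ctx, start, end, 1 /* maxRows */)
		if err != nil {
			return err
		}
		foundKV = len(res) != 0
		return nil
	})
	return foundKV, err
}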


pkg/ccl/backupccl/backup_planning.go, line 412 at r1 (raw file):

	// getIndexesInBounds returns all adding indexes and public indexes found in the given table
	getIndexesInBounds := func(table catalog.TableDescriptor) (

let's not inline this method. The only reason we inlined checkForKVInBounds was so that we could inject our own implementation in the unit test. I don't think we need to do that here, so let's move this to a method.

Also, I would change the name to getPublicAndAddingIndexes cause we don't really have any "bounds".


pkg/ccl/backupccl/backup_planning.go, line 439 at r1 (raw file):

	}

	// checkForNonMVCCInvariants scans all tables between startTableID and endTableID and returns true iff for any table ID

for both these inline methods, see the above comment about how we can restructure getLogicallyMergedInterTableSpans. I'm okay injecting some of the helper methods described there so that we can unit test that method in isolation, but I think the logic can be simplified.

I'm also okay if you want to pull each of these out into helper methods while we iterate on the code, and once we are happy with them, we can make them inline.


pkg/ccl/backupccl/backup_planning.go, line 528 at r1 (raw file):

		if _, ok := tableSpans[tableID]; !ok {
			tableIDs = append(tableIDs, tableID)
		}

Do we expect to see the same tableID twice? I believe all the tables in `tables` are unique descriptors? I'd understand this check for revs below, cause you can have multiple revisions of a table since the last backup, but I'm not sure I see the need for this check here?


pkg/ccl/backupccl/backup_test.go, line 6305 at r1 (raw file):

		tableID2 := getTableID(db, "test", "foo2")
		require.Equal(t, []string{fmt.Sprintf("/Table/{%d/1-%d/4}", tableID, tableID2)},
			actualResolvedSpans)

I'm going to hold off on looking at tests until we are happy with the code. We can then discuss how we want to remodel some of the unit tests!

@adityamaru
Contributor

Nice work on this first stab @annezhu98! I've done the first pass, but I want to do a few more once we chat and restructure the code a little.

@adityamaru adityamaru self-requested a review June 10, 2021 03:51
@annezhu98 annezhu98 force-pushed the 58134 branch 2 times, most recently from 391c724 to 67eb4f5 on June 11, 2021 16:31
Author

@annezhu98 annezhu98 left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @annezhu98, and @pbardea)


pkg/ccl/backupccl/backup_planning.go, line 308 at r1 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

I think this method can be simplified. We want to perform 3 checks:

  1. CheckForTablesInBounds(startTableID, endTableID): If we find any table descriptor between the LHS and RHS table IDs we are considering for a merge, then we don't merge. This table descriptor would either be of an offline table, a public table with no public indexes (not sure if this is possible), or a table not included in the backup.

  2. CheckForKVInBounds(lhs, rhs): This will scan the key range and find any dropped but not gc'ed indexes. If this method returns true, abort the merge. Here LHS is the EndKey of the last span in the table's spans, and RHS is the StartKey of the first span in the next table's spans.

  3. Check that the maxAddingIndexID for lhs (as defined above) is < lhs.indexID, and that the minAddingIndexID for rhs is > rhs.indexID.

Something like this for lhs, and similar for rhs but with the condition idx.Adding() && idx.GetID() < rhs.indexID:

canMerge := true
allPhysicalIndexOpts := catalog.IndexOpts{AddMutations: true}
if err := catalog.ForEachIndex(table, allPhysicalIndexOpts, func(idx catalog.Index) error {
	if idx.GetID() < lhs.indexID {
		// Skip; we don't care about these indexes.
		return nil
	}
	if idx.Adding() && idx.GetID() > lhs.indexID {
		canMerge = false
		return nil
	}
	return nil
}); err != nil {
	return false, err
}

So this method would check whether any of these helpers return true. If they do, we cannot merge the two table IDs under consideration: take all the spans of the LHS table ID as is, and move on to the next pair of table IDs.

Done.


pkg/ccl/backupccl/backup_planning.go, line 412 at r1 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

let's not inline this method. The only reason we inlined checkForKVInBounds was so that we could inject our own implementation in the unit test. I don't think we need to do that here, so let's move this to a method.

Also, I would change the name to getPublicAndAddingIndexes cause we don't really have any "bounds".

Done.


pkg/ccl/backupccl/backup_planning.go, line 439 at r1 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

for both these inline methods, see the above comment about how we can restructure getLogicallyMergedInterTableSpans. I'm okay injecting some of the helper methods described there so that we can unit test that method in isolation, but I think the logic can be simplified.

I'm also okay if you want to pull each of these out into helper methods while we iterate on the code, and once we are happy with them, we can make them inline.

Kept checkForKVInBounds and checkForTablesInBounds inline, as we need to override them for unit tests.


pkg/ccl/backupccl/backup_planning.go, line 528 at r1 (raw file):

Previously, adityamaru (Aditya Maru) wrote…
		if _, ok := tableSpans[tableID]; !ok {
			tableIDs = append(tableIDs, tableID)
		}

Do we expect to see the same tableID twice? I believe all the tables in `tables` are unique descriptors? I'd understand this check for revs below, cause you can have multiple revisions of a table since the last backup, but I'm not sure I see the need for this check here?

Done.

Contributor

@adityamaru adityamaru left a comment


Done another pass; it's getting there! Thanks for sticking with this 🙂

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @annezhu98, and @pbardea)


pkg/ccl/backupccl/backup_planning.go, line 412 at r1 (raw file):

Previously, annezhu98 (Anne Zhu) wrote…

Done.

hmm, how do you feel about putting this back inside getLogicallyMergedSpans (as it was prior to this PR), just to reduce the diff? I think earlier we expected to use this method in our new logic, but I don't think we do anymore.


pkg/ccl/backupccl/backup_planning.go, line 189 at r2 (raw file):

	codec keys.SQLCodec,
	endTime hlc.Timestamp,
	added map[tableAndIndex]bool,

can we revert this change? It doesn't seem needed anymore.


pkg/ccl/backupccl/backup_planning.go, line 314 at r2 (raw file):

		rhsTableID := tableIDs[curIndex+1]

		lhs := spans[lhsTableID]

nit: do we need lhs, rhs? Can we just store lhsSpan and rhsSpan, and rename them to:

lastSpanInLHSMergedSpans := lhs.mergedSpans[len(lhs.mergedSpans)-1]
firstSpanInRHSMergedSpans := rhs.mergedSpans[0]


pkg/ccl/backupccl/backup_planning.go, line 321 at r2 (raw file):

Quoted 6 lines of code…
		// LHS and RHS are safe to merge iff the following MVCC and Non-MVCC invariants hold:
		// 		1) MVCC: no dropping/adding indexes were found from the EndKey of the last span
		// 				 on the lhs table to the StartKey of the first span on the rhs table
		//		2) Non-MVCC: we can safely merge lhs and rhs if they're contiguous or there exists
		// 				 dropped and gc'ed tables between them. We should not merge if we are able
		//				 to fetch a table descriptor for any given table ID that is between lhs and rhs.

let's delete this comment and add a comment above each if check indicating what it is checking for?

  • Check for any tables between the table of the LHS and RHS span being considered for a merge. If found, we do not want to merge as the backup did not resolve the table as a target. Currently, this could either be because the table was not specified in the targets, or it was offline.

  • Check for any keys between the EndKey between the EndKey of the LHS span, and StartKey of the RHS span. If found, we do not want to merge as the backup will include unexpected data. These could be keys corresponding to tables that have been dropped, but not gc'ed.

  • Check for any adding indexes in the tables of the LHS and RHS that might be unexpectedly picked up


pkg/ccl/backupccl/backup_planning.go, line 327 at r2 (raw file):

		// 				 dropped and gc'ed tables between them. We should not merge if we are able
		//				 to fetch a table descriptor for any given table ID that is between lhs and rhs.
		for i := 0; i < 3 && canMerge; i++ {

any reason we can't restructure this as:

if foundTable, err := checkForTablesInBounds(lhsTableID, rhsTableID); foundTable || err != nil {
	mergedInterTableSpans = append(mergedInterTableSpans, lhs.mergedSpans...)
	continue
}

if foundKVInBounds, err := checkForKVInBounds(lhsSpan.EndKey, rhsSpan.Key, endTime); foundKVInBounds || err != nil {
	mergedInterTableSpans = append(mergedInterTableSpans, lhs.mergedSpans...)
	continue
}

if foundAddingIndexes, err := checkForAddingIndexes(...); foundAddingIndexes || err != nil {
	mergedInterTableSpans = append(mergedInterTableSpans, lhs.mergedSpans...)
	continue
}

I personally find it easier to read at a glance? Less bool bookkeeping too 🙂


pkg/ccl/backupccl/backup_planning.go, line 346 at r2 (raw file):

		if canMerge {
			mergedInterTableSpans = append(mergedInterTableSpans, lhs.mergedSpans[:len(lhs.mergedSpans)-1]...)

let's add a comment about what we're doing here. Something that captures that:

If a merge is possible, we add everything but the last span on the LHS, unchanged, to the result slice. We then modify the first span on the RHS to include the last span on the LHS, thereby performing a merge. This merged span will be added to the result slice in future iterations.


pkg/ccl/backupccl/backup_planning.go, line 352 at r2 (raw file):

		if curIndex == len(tableIDs)-2 {
			mergedInterTableSpans = append(mergedInterTableSpans, rhs.mergedSpans...)
		}

might need to pull this out of the loop if we go with the continue approach above?
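For illustration, a sketch of that restructuring under the names used in the snippets above (the spans map and mergedSpans field are assumptions about the surrounding code):

for curIndex := 0; curIndex < len(tableIDs)-1; curIndex++ {
	// ... the three checks, each appending lhs.mergedSpans and continuing
	// when a merge is not possible ...
}
// The final table's (possibly already-widened) spans are never an LHS in the
// loop, so append them exactly once after it.
mergedInterTableSpans = append(mergedInterTableSpans, spans[tableIDs[len(tableIDs)-1]].mergedSpans...)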


pkg/ccl/backupccl/backup_planning.go, line 487 at r2 (raw file):

	// checkForTablesInBounds scans all tables between startTableID and endTableID and returns true iff
	// a non-IsUndefinedRelationError was returned for a table ID

nit: returns true if it successfully resolves a table in (startTableID, endTableID).

Also, should the check start at startTableID+1?


pkg/ccl/backupccl/backup_planning.go, line 490 at r2 (raw file):

	checkForTablesInBounds := func(startTableID, endTableID descpb.ID) (bool, error) {
		for tableID := startTableID; tableID < endTableID; tableID++ {
			_, err := p.LookupTableByID(ctx, tableID)

I think this might need a little bit of restructuring to be safer.

_, err := p.LookupTableByID(ctx, tableID)
// We found a table, return true.
if err == nil {
	return true, nil
}
// We got an error that was not a "table not found" error. Return that error.
if !sqlerrors.IsUndefinedRelationError(err) {
	return true, err
}
// No table was found, safe to continue checking.


pkg/ccl/backupccl/backup_planning.go, line 507 at r2 (raw file):

			execCfg.Codec,
			endTime,
			added,

nit: can we revert this change to the original? It will reduce the scope of this change.


pkg/ccl/backupccl/backup_planning.go, line 536 at r2 (raw file):

	)
	if err != nil {
		return nil, err

just a thought: since this is strictly an optimization, maybe we shouldn't fail the whole backup if this method returns an error, and instead just continue with mergedIndexSpans. Curious what @pbardea thinks.

@adityamaru adityamaru self-requested a review June 15, 2021 14:38
Author

@annezhu98 annezhu98 left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @annezhu98, and @pbardea)


pkg/ccl/backupccl/backup_planning.go, line 412 at r1 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

hmm, how do you feel about putting this back inside getLogicallyMergedSpans (as it was prior to this PR), just to reduce the diff? I think earlier we expected to use this method in our new logic, but I don't think we do anymore.

Done.


pkg/ccl/backupccl/backup_planning.go, line 189 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

can we revert this change? It doesn't seem needed anymore.

Done.


pkg/ccl/backupccl/backup_planning.go, line 314 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

nit: do we need lhs, rhs? Can we just store lhsSpan and rhsSpan, and rename them to:

lastSpanInLHSMergedSpans := lhs.mergedSpans[len(lhs.mergedSpans)-1]
firstSpanInRHSMergedSpans := rhs.mergedSpans[0]

Done. I kept lhs and rhs since they are being referenced a few times in the code.


pkg/ccl/backupccl/backup_planning.go, line 321 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…
		// LHS and RHS are safe to merge iff the following MVCC and Non-MVCC invariants hold:
		// 		1) MVCC: no dropping/adding indexes were found from the EndKey of the last span
		// 				 on the lhs table to the StartKey of the first span on the rhs table
		//		2) Non-MVCC: we can safely merge lhs and rhs if they're contiguous or there exists
		// 				 dropped and gc'ed tables between them. We should not merge if we are able
		//				 to fetch a table descriptor for any given table ID that is between lhs and rhs.

let's delete this comment and add a comment above each if check indicating what it is checking for?

  • Check for any tables between the table of the LHS and RHS span being considered for a merge. If found, we do not want to merge as the backup did not resolve the table as a target. Currently, this could either be because the table was not specified in the targets, or it was offline.

  • Check for any keys between the EndKey between the EndKey of the LHS span, and StartKey of the RHS span. If found, we do not want to merge as the backup will include unexpected data. These could be keys corresponding to tables that have been dropped, but not gc'ed.

  • Check for any adding indexes in the tables of the LHS and RHS that might be unexpectedly picked up

Done.


pkg/ccl/backupccl/backup_planning.go, line 327 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

any reason we can't restructure this as:

if foundTable, err := checkForTablesInBounds(lhsTableID, rhsTableID); foundTable || err != nil {
	mergedInterTableSpans = append(mergedInterTableSpans, lhs.mergedSpans...)
	continue
}

if foundKVInBounds, err := checkForKVInBounds(lhsSpan.EndKey, rhsSpan.Key, endTime); foundKVInBounds || err != nil {
	mergedInterTableSpans = append(mergedInterTableSpans, lhs.mergedSpans...)
	continue
}

if foundAddingIndexes, err := checkForAddingIndexes(...); foundAddingIndexes || err != nil {
	mergedInterTableSpans = append(mergedInterTableSpans, lhs.mergedSpans...)
	continue
}

I personally find it easier to read at a glance? Less bool bookkeeping too 🙂

Done.


pkg/ccl/backupccl/backup_planning.go, line 346 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

let's add a comment about what we're doing here. Something that captures that:

If a merge is possible, we add everything but the last span on the LHS, unchanged, to the result slice. We then modify the first span on the RHS to include the last span on the LHS, thereby performing a merge. This merged span will be added to the result slice in future iterations.

Done.


pkg/ccl/backupccl/backup_planning.go, line 352 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…
		if curIndex == len(tableIDs)-2 {
			mergedInterTableSpans = append(mergedInterTableSpans, rhs.mergedSpans...)
		}

might need to pull this out of the loop if we go with the continue approach above?

Done.


pkg/ccl/backupccl/backup_planning.go, line 490 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

I think this might need a little bit of restructuring to be safer.

_, err := p.LookupTableByID(ctx, tableID)
// We found a table, return true.
if err == nil {
	return true, nil
}
// We got an error that was not a "table not found" error. Return that error.
if !sqlerrors.IsUndefinedRelationError(err) {
	return true, err
}
// No table was found, safe to continue checking.

Done.


pkg/ccl/backupccl/backup_planning.go, line 507 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

nit: can we revert this change to the original? It will reduce the scope of this change.

Done.

Author

@annezhu98 annezhu98 left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @annezhu98, and @pbardea)


pkg/ccl/backupccl/backup_planning.go, line 536 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

just a thought: since this is strictly an optimization, maybe we shouldn't fail the whole backup if this method returns an error, and instead just continue with mergedIndexSpans. Curious what @pbardea thinks.

It seems like the only place where an error would be returned is when doing the 3 checks; I changed it up a bit so that whenever an error is returned, we just continue with mergedIndexSpans without merging.

@annezhu98 annezhu98 force-pushed the 58134 branch 2 times, most recently from dba47a0 to f4bd801 on June 17, 2021 17:17
Contributor

@adityamaru adityamaru left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @annezhu98, and @pbardea)


pkg/ccl/backupccl/backup_planning.go, line 536 at r2 (raw file):

Previously, annezhu98 (Anne Zhu) wrote…

It seems like the only place where an error would be returned is when doing the 3 checks; I changed it up a bit so that whenever an error is returned, we just continue with mergedIndexSpans without merging.

let's modify this call site to account for an error here and log it before ignoring it.
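Something like this, as a sketch (the arguments to getLogicallyMergedSpansAcrossTables are placeholders; mergedIndexSpans is the per-table result discussed above):

spans, err := getLogicallyMergedSpansAcrossTables(tableIDs, tableSpans, endTime)
if err != nil {
	// The cross-table merge is strictly an optimization: log the error and
	// fall back to the per-table merged spans instead of failing the backup.
	log.Warningf(ctx, "not merging spans across tables: %v", err)
	spans = mergedIndexSpans
}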


pkg/ccl/backupccl/backup_planning.go, line 332 at r3 (raw file):

		firstSpanInRHSMergedSpans := rhs.mergedSpans[0]

		// Check for any tables between the table of the LHS and RHS span being considered for a merge.

nit: we usually wrap comments at 80 characters. There's a way to configure this in GoLand (we can talk offline).


pkg/ccl/backupccl/backup_planning.go, line 335 at r3 (raw file):

		// If found, we do not want to merge as the backup did not resolve the table as a target.
		// Currently, this could either be because the table was not specified in the targets, or it was offline.
		if foundTable, err := checkForTablesInBounds(tableIDs[curIndex], tableIDs[curIndex+1]); foundTable || err != nil {

here, and in the checks below, I'm not sure we should be swallowing the error. If we do want to fall back to using the spans returned by getLogicallyMergedSpans, that should be done outside of this method. I do think if err != nil, we should return the error and allow the user of this method to deal with it.


pkg/ccl/backupccl/backup_planning.go, line 340 at r3 (raw file):

		}

		// Check for any keys between the EndKey between the EndKey of the LHS span, and StartKey of the RHS span.

nit: repetition


pkg/ccl/backupccl/backup_planning.go, line 348 at r3 (raw file):

		}

		// Check for any adding indexes in the tables of the LHS and RHS that might be unexpectedly picked up

nit: modify comment to - Check for adding indexes as described in the comment above checkForAddingIndexes. The presence of such an index makes the LHS and RHS unmergeable as the BACKUP could unexpectedly pick up non-public index entries.


pkg/ccl/backupccl/backup_planning.go, line 364 at r3 (raw file):

}

// checkForAddingIndexes returns true iff adding indexes were found after the end key of the last span on lhsTable

nit: wrap comment


pkg/ccl/backupccl/backup_planning.go, line 465 at r3 (raw file):

	// checkForTablesInBounds returns true iff it successfully resolves a table
	// between startTableID and endTableID (both exclusive)

nit: full stop after comments (here and everywhere 🙂)


pkg/ccl/backupccl/backup_planning.go, line 393 at r4 (raw file):

		key := tableAndIndex{tableID: lhs.table.GetID(), indexID: idx.GetID()}
		isAdding := added[key] || idx.Adding()
		if isAdding && idx.GetID() >= lhsIndexID {

should this be idx.GetID() > lhsIndexID?


pkg/ccl/backupccl/backup_planning.go, line 395 at r4 (raw file):

		if isAdding && idx.GetID() >= lhsIndexID {
			foundAddingIndex = true
			added[key] = true

I'm not sure I understand the need for `added` in this method?


pkg/ccl/backupccl/backup_planning.go, line 432 at r4 (raw file):

Quoted 9 lines of code…
	lhsIndex, err := lhs.table.FindIndexWithID(lhsIndexID - 1)
	if err != nil {
		return false, err
	}
	rhsIndex, err := rhs.table.FindIndexWithID(rhsIndexID)
	if err != nil {
		return false, err
	}
	return lhsIndex.IsInterleaved() || rhsIndex.IsInterleaved(), nil

I believe that by the time this code is released, we will not have interleaved tables, so we can delete this logic.

Contributor

@adityamaru adityamaru left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @annezhu98, and @pbardea)


pkg/ccl/backupccl/backup_test.go, line 6494 at r4 (raw file):

	unusedMap := make(map[tableAndIndex]bool)

	checkForTablesInBoundsOverride := func(startTableID, endTableID descpb.ID) (bool, error) {

let's move this inline for every test case.


pkg/ccl/backupccl/backup_test.go, line 6502 at r4 (raw file):

	testCases := []struct {
		name                       string

nit: instead of having an xxx1 and xxx2 for each field, can we define a struct tableDef with id, pkIndex, indexes, adding, and dropping as fields? Then each test case can have a tables []tableDef.
We can then have test cases that test merging more than just 2 tables.
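A sketch of that test-case shape (the field types are assumptions based on the names above):

type tableDef struct {
	id       descpb.ID
	pkIndex  descpb.IndexID
	indexes  []descpb.IndexID
	adding   []descpb.IndexID
	dropping []descpb.IndexID
}

testCases := []struct {
	name          string
	tables        []tableDef
	expectedSpans []string
}{
	// ...
}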


pkg/ccl/backupccl/backup_test.go, line 6530 at r4 (raw file):

		},
		{
			name: "dropped-index-before-start-key-on-rhs-span",

fix based on offline discussion. Potentially, index 4 on table 65 should return true on a scan, so that we do not merge the two table spans.


pkg/ccl/backupccl/backup_test.go, line 6544 at r4 (raw file):

		},
		{
			name: "adding-index-between-contiguous-tables",

rename the test to "adding-index-after-lhs", and also add a test case for "adding-index-before-rhs"


pkg/ccl/backupccl/backup_test.go, line 6559 at r4 (raw file):

		},
		{
			name: "offline-table-between-spans",

inline the checkForTableInBounds method to return true for tableID 70.


pkg/ccl/backupccl/backup_test.go, line 6586 at r4 (raw file):

			expectedSpans: []string{"/Table/7{2/1-4/5}"},
		},
	}

some more test ideas:

  • merging multiple contiguous tables
  • interleaving mergeable and non-mergeable spans, e.g. table1, offline table2, table3, dropped and gc'ed table4, table5

This should not merge 1 and 3, but should merge 3 to 5.

Author

@annezhu98 annezhu98 left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @pbardea)


pkg/ccl/backupccl/backup_planning.go, line 536 at r2 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

let's modify this call site to account for an error here and log it before ignoring it.

Done.


pkg/ccl/backupccl/backup_planning.go, line 335 at r3 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

here, and in the checks below, I'm not sure we should be swallowing the error. If we do want to fall back to using the spans returned by getLogicallyMergedSpans, that should be done outside of this method. I do think if err != nil, we should return the error and allow the user of this method to deal with it.

Done.


pkg/ccl/backupccl/backup_planning.go, line 393 at r4 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

should this be idx.GetID() > lhsIndexID?

lhsIndexID is the index ID of a span's EndKey, which is exclusive, hence why lhsIndexID itself should be checked as well. For example, a span ending at /Table/52/3 excludes index 3, so an adding index with ID 3 would sit just past the span and get swept into a merge.


pkg/ccl/backupccl/backup_planning.go, line 395 at r4 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

I'm not sure I understand the need for `added` in this method?

We need this here to keep track of adding indexes found in `getLogicallyMergedTableSpans`.


pkg/ccl/backupccl/backup_planning.go, line 432 at r4 (raw file):

Previously, adityamaru (Aditya Maru) wrote…
	lhsIndex, err := lhs.table.FindIndexWithID(lhsIndexID - 1)
	if err != nil {
		return false, err
	}
	rhsIndex, err := rhs.table.FindIndexWithID(rhsIndexID)
	if err != nil {
		return false, err
	}
	return lhsIndex.IsInterleaved() || rhsIndex.IsInterleaved(), nil

I believe that by the time this code is released, we will not have interleaved tables, so we can delete this logic.

Done.


pkg/ccl/backupccl/backup_test.go, line 6305 at r1 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

I'm going to hold off on looking at tests until we are happy with the code. We can then discuss how we want to remodel some of the unit tests!

Done.


pkg/ccl/backupccl/backup_test.go, line 6494 at r4 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

let's move this inline for every test case.

Done.


pkg/ccl/backupccl/backup_test.go, line 6530 at r4 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

fix based on offline discussion. Potentially, index 4 on table 65 should return true on a scan, so that we do not merge the two table spans.

Done.


pkg/ccl/backupccl/backup_test.go, line 6544 at r4 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

rename the test to "adding-index-after-lhs", and also add a test case for "adding-index-before-rhs"

Done.


pkg/ccl/backupccl/backup_test.go, line 6559 at r4 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

inline the checkForTableInBounds method to return true for tableID 70.

Done.


pkg/ccl/backupccl/backup_test.go, line 6586 at r4 (raw file):

Previously, adityamaru (Aditya Maru) wrote…

some more test ideas:

  • merging multiple contiguous tables
  • interleaving mergeable and non-mergeable spans, e.g. table1, offline table2, table3, dropped and gc'ed table4, table5

This should not merge 1 and 3, but should merge 3 to 5.

Done.

Contributor

@pbardea pbardea left a comment


After debugging around with some of the span-merging logic, I realized that I may have missed some considerations for revision_history backups. Curious what your thoughts are here, @adityamaru and @annezhu98. I think the table existence and KV checks would still be valuable (since we don't want to protect huge swaths of unrelated tables when doing database backups), but separating the set of spans we protect from the set we back up could give us more leeway on the more fragile checks (such as those that look for Adding indexes).

We could also have a sanity check that the protectedTS spans are a superset of those that we're backing up.
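As a minimal sketch of that sanity check (coversAll is hypothetical; it assumes each backed-up span is covered by a single protected span rather than by a union of several):

// coversAll reports whether every backed-up span is contained within at
// least one protected span.
func coversAll(protected, backedUp []roachpb.Span) bool {
	for _, b := range backedUp {
		covered := false
		for _, p := range protected {
			if bytes.Compare(p.Key, b.Key) <= 0 && bytes.Compare(b.EndKey, p.EndKey) <= 0 {
				covered = true
				break
			}
		}
		if !covered {
			return false
		}
	}
	return true
}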

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @annezhu98)


pkg/ccl/backupccl/backup_planning.go, line 314 at r5 (raw file):

// each table
func getLogicallyMergedSpansAcrossTables(
	tableIDs []descpb.ID,

Since tableIDs should always be the keys of the spans, it would probably be safer to create tableIDs inside this method to ensure that this assumption is always the case (rather than needing to check it at all callsites). If the concern is the cost of re-sorting the list of tableIDs I think it would be useful to at least add a comment about the relationship between the 2 args.
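For instance, a sketch of deriving the IDs inside the method itself (names assumed), which removes the implicit contract between the two arguments:

tableIDs := make([]descpb.ID, 0, len(spans))
for id := range spans {
	tableIDs = append(tableIDs, id)
}
sort.Slice(tableIDs, func(i, j int) bool { return tableIDs[i] < tableIDs[j] })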


pkg/ccl/backupccl/backup_planning.go, line 354 at r5 (raw file):

		// include unexpected data. These could be keys corresponding to tables
		// that have been dropped, but not gc'ed.
		if foundKVInBounds, err := checkForKVInBounds(lastSpanInLHSMergedSpans.EndKey, firstSpanInRHSMergedSpans.Key, endTime); foundKVInBounds || err != nil {

might be clearer to read as:

if foundKVInBounds, err := ...; err != nil {
   return nil, err
} else if foundKVInBounds {
   ...
}

here (and above and below)


pkg/ccl/backupccl/backup_planning.go, line 389 at r5 (raw file):

// end key of the last span on lhsTable or before the start key of the first
// span of the rhsTable
func checkForAddingIndexes(

I worry about interactions between revision_history backups and this method. This method will always resolve the current version of the descriptor, which might not be what we want when considering descriptor revisions. Debugging logic around merging indexes on revision_history backups this week started to make me wonder if we can loosen the checks that we perform here and maintain a broader set of spans whose timestamps we'd protect, in parallel to the set of spans we'd back up...

Sorry for not catching this earlier, but debugging this week has made me more aware of the sensitivity of this merging...


pkg/ccl/backupccl/backup_test.go, line 6499 at r5 (raw file):

func TestLogicallyMergedSpansAcrossTables(t *testing.T) {
	defer leaktest.AfterTest(t)()
	codec := keys.TODOSQLCodec

This could probably use keys.SystemSQLCodec

Author

@annezhu98 annezhu98 left a comment


Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @annezhu98, and @pbardea)


pkg/ccl/backupccl/backup_planning.go, line 314 at r5 (raw file):

Previously, pbardea (Paul Bardea) wrote…

Since tableIDs should always be the keys of the spans, it would probably be safer to create tableIDs inside this method to ensure that this assumption is always the case (rather than needing to check it at all callsites). If the concern is the cost of re-sorting the list of tableIDs I think it would be useful to at least add a comment about the relationship between the 2 args.

Left a comment with the reasoning for the 2 args.


pkg/ccl/backupccl/backup_planning.go, line 354 at r5 (raw file):

Previously, pbardea (Paul Bardea) wrote…

might be clearer to read as:

if foundKVInBounds, err := ...; err != nil {
   return nil, err
} else if foundKVInBounds {
   ...
}

here (and above and below)

Done.


pkg/ccl/backupccl/backup_test.go, line 6499 at r5 (raw file):

Previously, pbardea (Paul Bardea) wrote…

This could probably use keys.SystemSQLCodec

Done.

backupccl: optimize inter-table spans for backup and ts protection

Previously, within each table, contiguous index spans were merged if there were no dropped or adding indexes between them. However, when a BACKUP involves a large number of tables, the protected ts cluster limit is still hit very frequently.

To address this, this commit logically merges spans across tables to reduce the number of spans generated, building on the previous work in cockroachdb#55794.

To determine whether two tables can be merged, we check that the following MVCC and non-MVCC invariants hold:

* No dropping or adding indexes between LHS and RHS
* LHS and RHS are either contiguous tables, or non-contiguous and all tables between them have been dropped and gc'ed

Resolves: cockroachdb#58134
Informs: cockroachdb#54747

Release note: None
Contributor

@pbardea pbardea left a comment


Sorry -- this fell off my review queue. I think I've convinced myself that this is actually okay in the face of descriptor revisions, but adding some test cases that test descriptors with multiple versions of the same ID would be useful here.

@cockroach-teamcity
Member

cockroach-teamcity commented Aug 24, 2021

CLA assistant check
All committers have signed the CLA.

Successfully merging this pull request may close these issues.

backupccl: Coalesce spans to backup across table boundaries