backupccl: empty splits in tpc-c 5k restore of new_order table #26375
I'm also seeing these empty ranges on one of the

This is used by the

@mjibson Can someone from bulkio take a look soon?

Btw, I created a smaller

So whatever the problem is, it isn't due to an old fixture or old restore code.
Definitely seems to be something coming out of the restore code. Run the following commands on a local one-node cluster:
In the logs I see:
I'm tracking down where these splits are coming from in the restore code.

The splits are coming from

So this split code is doing the right thing. What is calling this code?

The backup files cover the following key spans:

Notice that the key boundaries do not align precisely. The code that creates the import spans from the backup spans seems to be creating empty ranges between these files.
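The mechanism described above can be sketched in isolation. This is a minimal Go sketch, not the actual `restore.go` loop: all names are hypothetical and plain strings stand in for encoded `roachpb.Key`s. It shows how flattening sorted file spans whose boundaries don't align into a contiguous cover manufactures entries that no backup file contributes data to:

```go
package main

import "fmt"

// span is a simplified half-open [start, end) key span. Plain strings stand
// in for CockroachDB's encoded roachpb.Keys; all names here are hypothetical.
type span struct{ start, end string }

// coverSpans flattens sorted file spans into a contiguous cover: every file
// boundary becomes a split point, so when one file's end key (e.g. "/1753/0")
// doesn't match the next file's start key ("/1754"), the cover picks up an
// entry with no files behind it.
func coverSpans(files []span) (entries, empty []span) {
	for i, f := range files {
		entries = append(entries, f)
		if i+1 < len(files) && f.end < files[i+1].start {
			// Boundaries don't align: record the gap as its own entry,
			// even though no file contributes any data to it.
			gap := span{f.end, files[i+1].start}
			entries = append(entries, gap)
			empty = append(empty, gap)
		}
	}
	return entries, empty
}

func main() {
	// Key spans shaped like the backup files above: end keys carry a
	// trailing "/0" that the next file's start key lacks.
	files := []span{{"/1", "/1753/0"}, {"/1754", "/3492/0"}, {"/3493", "/5000/0"}}
	entries, empty := coverSpans(files)
	fmt.Printf("%d entries, %d permanently empty\n", len(entries), len(empty))
	for _, g := range empty {
		fmt.Printf("empty range: [%s, %s)\n", g.start, g.end)
	}
	// Prints: 3 entries, 2 permanently empty -> 5 entries total, 2 of
	// which ([/1753/0, /1754) and [/3492/0, /3493)) can never hold data.
}
```

If each cover entry later becomes a range split, the two gap entries become the worthless empty ranges seen after RESTORE.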
@dt I'm tossing this your direction as I've reached the limits of my naive debugging. Backup seems to be creating a

I haven't given this bug a severity label, but it is somewhat serious. AFAICT, we're creating double the number of ranges on restore that we should be.

I'm looking into this now.
A thought for a fix: |
@mjibson This "works" (for some definition of I-don't-know-what-I'm-doing-but-it-produces-the-result-I-want):

```diff
diff --git a/pkg/ccl/backupccl/restore.go b/pkg/ccl/backupccl/restore.go
index 279d1795c..eacef0ebb 100644
--- a/pkg/ccl/backupccl/restore.go
+++ b/pkg/ccl/backupccl/restore.go
@@ -716,13 +716,17 @@ rangeLoop:
 					return nil, hlc.Timestamp{}, err
 				}
 			}
+			// We only need to create import entries for spans that contain files.
+			if len(files) > 0 {
+				requestEntries = append(requestEntries, importEntry{
+					Span:      roachpb.Span{Key: importRange.Start, EndKey: importRange.End},
+					entryType: request,
+					files:     files,
+				})
+			}
+		} else {
 			// If needed is false, we have data backed up that is not necessary
 			// for this restore. Skip it.
-			requestEntries = append(requestEntries, importEntry{
-				Span:      roachpb.Span{Key: importRange.Start, EndKey: importRange.End},
-				entryType: request,
-				files:     files,
-			})
 		}
 	}
 	return requestEntries, maxEndTime, nil
```
This currently looks like an IMPORT problem, since the generated SSTs don't align perfectly. Am checking.

I think IMPORT might have created them here, but emptied ranges in a backed-up cluster would too. I think @petermattis's patch makes sense -- there's no reason to make a range that we're not going to import any data into, so filtering the importEntry saves doing some pointless make-work.

Note that the backups were created via:

I believe internally that creates a backup via some transform magic from CSVs. I used the following script to create the backup:

(To clarify, 👍 on filtering in RESTORE is in addition to, not instead of, fixing in

Looking into IMPORT, it creates these keys with extra /0 in

Unless it has broken, backup doesn't make a

Ah, right, forgot about that. I think given that, we should probably actually warn if this filter ever sees anything, since that indicates we messed up somewhere.

Sounds like you guys have this in hand. Please take over my patch, as I'm not planning to wrap it into a PR and add tests.
26452: importccl: make spans overlap during IMPORT with transform r=mjibson a=mjibson

Force the backup descriptor spans to overlap correctly. This prevents various extra ranges from being created during RESTORE of data created with IMPORT with transform. The IMPORT commands will need to be re-run to generate correct spans in the descriptors.

Fixes #26375

Release note: None

Co-authored-by: Matt Jibson <[email protected]>
We decided your change isn't necessary to fix the underlying issue. Regenerate the fixtures and things should be fine.

I agree, though there are a crap-ton of fixtures to regenerate. The

I can submit another PR with your patch then, if you want. Our concern was that it would possibly hide real problems, since there shouldn't be anything that generates backup descriptors with non-adjacent spans.
OK, I did a test with your patch, importing the old data. It kind of works. Original duplicated ranges for some test data after a RESTORE:
Applied your patch, did the same RESTORE:
So we reduced the number of ranges by N/2-1. Not quite the expected N/2, due to the initial NULL - /1/0 range. And in addition, all of the ranges have the bogus /0 at the end. I'm not convinced this halfway-good experience is worth putting in a patch for this specific problem (IMPORT with transform on data produced before yesterday). We could make further changes to RESTORE so that it does smarter things, but I'm relatively scared to do this because messing with keys and split locations isn't the safest thing. I don't think that risk outweighs the cost of just regenerating this data, especially if we consider the amount of dev time it'd take to come up with a safe patch.
The
I wouldn't want to do anything beyond my patch. It completely solves the empty range problem for my use cases. I'll defer to your judgment on whether it is too risky or fragile.

@mjibson do you want to do anything further here? Thanks

Nope, I think we can close this.
As brought up in #26059 (comment), when I restore the TPC-C 5k workload fixture onto a 15-node roachprod cluster, I get a bunch of empty splits in the `tpcc.new_order` table. Specifically, I run

Which internally runs:

Once the restore is done, I always get:

All those ranges from `key` to `key/0` are worthless and will always be empty. As mentioned on #26059, I think there's a decent chance that it's just because the restore was generated from a cluster that had partitioned the table and was running on a build before #24896 was fixed, but it was suggested that the RESTORE team check into it because it happens consistently.
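On why a `key` to `key/0` range can never hold data: in CockroachDB's key encoding, a row's actual KV entries carry a column-family suffix, so the bare row prefix itself is never a stored key. This is a rough Go sketch under simplified assumptions (string keys rather than the real binary encoding; the `/Table/61/...` values are made up for illustration):

```go
package main

import (
	"bytes"
	"fmt"
)

func main() {
	// Simplified stand-ins for encoded keys; real roachpb.Keys are binary,
	// but the ordering argument is the same.
	prefix := []byte("/Table/61/1/1753")    // bare row prefix (never stored)
	family0 := []byte("/Table/61/1/1753/0") // first real KV for the row

	// The bare prefix sorts immediately before the /0 family key, and every
	// actual KV for the row is >= family0. A range split at the bare prefix
	// therefore creates [prefix, family0), which can never contain data.
	fmt.Println(bytes.Compare(prefix, family0) < 0) // true
}
```

So each split at a bare prefix, paired with the real split at the `/0` key, yields one permanently empty range -- which is why roughly half the ranges created by this restore are empty.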