Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Name Snapshot Data Blobs by UUID #40652

Merged
Original file line number Diff line number Diff line change
Expand Up @@ -135,11 +135,11 @@
* |- Ac1342-B_x/ - data for index "foo" which was assigned the unique id of Ac1342-B_x in the repository
* | |- meta-20131010.dat - JSON Serialized IndexMetaData for index "foo"
* | |- 0/ - data for shard "0" of index "foo"
* | | |- __1 \
* | | |- __2 |
* | | |- __3 |- files from different segments see snapshot-* for their mappings to real segment files
* | | |- __4 |
* | | |- __5 /
* | | |- __1 \
* | | |- __2 |
* | | |- __VPO5oDMVT5y4Akv8T_AO_A |- files from different segments see snapshot-* for their mappings to real segment files
* | | |- __1gbJy18wS_2kv1qI7FgKuQ | (files with numeric names were created by older ES versions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps add this comment to the files above (i.e. __1and __2)

* | | |- __R8JvZAHlSMyMXyZc2SS8Zg /
* | | .....
* | | |- snap-20131010.dat - JSON serialized BlobStoreIndexShardSnapshot for snapshot "20131010"
* | | |- snap-20131011.dat - JSON serialized BlobStoreIndexShardSnapshot for snapshot "20131011"
Expand Down Expand Up @@ -1069,41 +1069,6 @@ protected void finalize(final List<SnapshotFiles> snapshots,
}
}

/**
* Generates blob name
*
* @param generation the blob number
* @return the blob name
*/
protected String fileNameFromGeneration(long generation) {
return DATA_BLOB_PREFIX + Long.toString(generation, Character.MAX_RADIX);
}

/**
* Finds the next available blob number
*
* @param blobs list of blobs in the repository
* @return next available blob number
*/
protected long findLatestFileNameGeneration(Map<String, BlobMetaData> blobs) {
long generation = -1;
for (String name : blobs.keySet()) {
if (!name.startsWith(DATA_BLOB_PREFIX)) {
continue;
}
name = canonicalName(name);
try {
long currentGen = Long.parseLong(name.substring(DATA_BLOB_PREFIX.length()), Character.MAX_RADIX);
if (currentGen > generation) {
generation = currentGen;
}
} catch (NumberFormatException e) {
logger.warn("file [{}] does not conform to the '{}' schema", name, DATA_BLOB_PREFIX);
}
}
return generation;
}

/**
* Loads all available snapshots in the repository
*
Expand Down Expand Up @@ -1196,7 +1161,6 @@ public void snapshot(final IndexCommit snapshotIndexCommit) {
throw new IndexShardSnapshotFailedException(shardId, "failed to list blobs", e);
}

long generation = findLatestFileNameGeneration(blobs);
Tuple<BlobStoreIndexShardSnapshots, Integer> tuple = buildBlobStoreIndexShardSnapshots(blobs);
BlobStoreIndexShardSnapshots snapshots = tuple.v1();
int fileListGeneration = tuple.v2();
Expand Down Expand Up @@ -1264,7 +1228,7 @@ public void snapshot(final IndexCommit snapshotIndexCommit) {
indexIncrementalSize += md.length();
// create a new FileInfo
BlobStoreIndexShardSnapshot.FileInfo snapshotFileInfo =
new BlobStoreIndexShardSnapshot.FileInfo(fileNameFromGeneration(++generation), md, chunkSize());
new BlobStoreIndexShardSnapshot.FileInfo(DATA_BLOB_PREFIX + UUIDs.randomBase64UUID(), md, chunkSize());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this violates the snapshot validation where snapshots must be lowercase? (I ran into this with SLM) Is this going to have problems for non-case-sensitive filesystems since different UUIDs could collide there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blob-name is not related to the name of the snapshot in any way so no such limitations apply here. These are segment data files.

Is this going to have problems for non-case-sensitive filesystems since different UUIDs could collide there?

I'm not sure that's possible in theory (just from the UUID specification, but I could be wrong here), but I'd say it's unlikely to the point of being impossible even if there were two such UUIDs. But that said, we currently make the same assumption about the safety of these UUIDs for other parts of the blob naming (index and snapshot metadata blobs) as well and so far haven't seen problems arise from that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for confirming! I only ask because we have to come up with an alternative for SLM (

// TODO: we are breaking the rules of UUIDs by lowercasing this here, find an alternative (snapshot names must be lowercase)
return candidates.get(0) + "-" + UUIDs.randomBase64UUID().toLowerCase(Locale.ROOT);
)

indexCommitPointFiles.add(snapshotFileInfo);
filesToSnapshot.add(snapshotFileInfo);
} else {
Expand Down