Name Snapshot Data Blobs by UUID #40652

original-brownbear · 2019-03-29T19:55:21Z

There is no functional reason why we need incremental naming for these files, but:
- As explained in S3 Snapshot Repository Erroneously Assumes Consistent List Operation #38941, it is a possible source of corrupting the repository
- It wastes API calls on the list operation (i.e. time and money)
- It is just needless complication
Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations.
- Even in the worst-case scenario of a downgrade, the functionality would continue working: the incremental names wouldn't conflict with the UUIDs, and the number parsing that finds the next incremental name suppresses the exception when it encounters a non-numeric value after the double-underscore prefix.
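The difference between the two naming schemes can be illustrated with a minimal, self-contained sketch. It is not the actual BlobStoreRepository code: DataBlobNamingSketch, nextGeneration, incrementalBlobName and uuidBlobName are hypothetical names, the generation is parsed as a plain decimal number for simplicity, and java.util.UUID plus java.util.Base64 stand in for Elasticsearch's UUIDs.randomBase64UUID(). It shows that the old scheme needs a listing of existing blobs (which may be stale on an eventually consistent store) to pick the next name, that the UUID scheme needs no listing at all, and that a parser which skips non-numeric suffixes tolerates UUID-named blobs after a downgrade.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.List;
import java.util.UUID;

// Hypothetical sketch contrasting incremental and UUID-based data blob names.
final class DataBlobNamingSketch {
    static final String DATA_BLOB_PREFIX = "__";

    // Old scheme (simplified): take the highest numeric suffix among the listed
    // blobs and add one. If the listing is stale, the "next" name may already exist.
    static long nextGeneration(List<String> listedBlobNames) {
        long latest = 0;
        for (String name : listedBlobNames) {
            if (!name.startsWith(DATA_BLOB_PREFIX)) {
                continue; // not a data blob (e.g. index-N or snap-*.dat)
            }
            try {
                latest = Math.max(latest, Long.parseLong(name.substring(DATA_BLOB_PREFIX.length())));
            } catch (NumberFormatException e) {
                // Non-numeric suffix, e.g. a UUID-named blob written by a newer version: skip it.
            }
        }
        return latest + 1;
    }

    static String incrementalBlobName(List<String> listedBlobNames) {
        return DATA_BLOB_PREFIX + nextGeneration(listedBlobNames);
    }

    // New scheme: prefix plus a random UUID in URL-safe base64 without padding
    // (16 bytes -> 22 characters). No listing of existing blobs is required.
    static String uuidBlobName() {
        UUID uuid = UUID.randomUUID();
        ByteBuffer bytes = ByteBuffer.allocate(16);
        bytes.putLong(uuid.getMostSignificantBits());
        bytes.putLong(uuid.getLeastSignificantBits());
        return DATA_BLOB_PREFIX + Base64.getUrlEncoder().withoutPadding().encodeToString(bytes.array());
    }
}
```

For example, nextGeneration over the names __1, __2 and __VPO5oDMVT5y4Akv8T_AO_A yields 3: the UUID-named blob is simply ignored, which is the downgrade behaviour described above. Restores are unaffected either way because the metadata records each blob's exact name.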

@ywelsch @andrershov this seems like an absolute freebie with (as far as I can tell) no downsides? That's why I also marked this for 6.7.1 and 7.0, to lower the risk of corrupting data on those versions as well.


elasticmachine · 2019-03-29T19:55:23Z

Pinging @elastic/es-distributed

dakrone · 2019-03-29T20:10:16Z

server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

@@ -1264,7 +1228,7 @@ public void snapshot(final IndexCommit snapshotIndexCommit) {
                        indexIncrementalSize += md.length();
                        // create a new FileInfo
                        BlobStoreIndexShardSnapshot.FileInfo snapshotFileInfo =
-                            new BlobStoreIndexShardSnapshot.FileInfo(fileNameFromGeneration(++generation), md, chunkSize());
+                            new BlobStoreIndexShardSnapshot.FileInfo(DATA_BLOB_PREFIX + UUIDs.randomBase64UUID(), md, chunkSize());


I believe this violates the snapshot validation rule that snapshot names must be lowercase? (I ran into this with SLM.) Is this going to have problems for non-case-sensitive filesystems, since different UUIDs could collide there?

original-brownbear

This blob name is not related to the name of the snapshot in any way, so no such limitations apply here. These are segment data files.

Is this going to have problems for non-case-sensitive filesystems since different UUIDs could collide there?

I'm not sure whether that's possible in theory (going just by the UUID specification; I could be wrong here), but even if two such UUIDs existed, I'd say a collision is unlikely to the point of being practically impossible. That said, we already make the same assumption about the safety of these UUIDs for other parts of the blob naming (index and snapshot metadata blobs), and so far we haven't seen problems arise from that.
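The theoretical side of this question can be made concrete. URL-safe base64 uses both upper- and lower-case letters, so two distinct names can in principle fold to the same string on a case-insensitive filesystem; the practical risk then reduces to a birthday-bound estimate. A minimal sketch follows. CaseFoldingSketch and its methods are hypothetical, lowercasing with Locale.ROOT only approximates how a real filesystem folds case, and the number of random bits that survive folding is an input you have to estimate yourself (assuming uniformly distributed characters, about 52/64, or roughly 0.8, bits are lost per base64 character when the 52 letters fold into 26).

```java
import java.util.Locale;

// Hypothetical helpers for reasoning about UUID blob names on case-insensitive filesystems.
final class CaseFoldingSketch {
    // Two distinct names collide after case folding if they differ only in letter case
    // (approximated here by lowercasing with Locale.ROOT).
    static boolean collideWhenCaseFolded(String a, String b) {
        return !a.equals(b) && a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    // Union-bound (birthday) approximation of the probability that any two of n names,
    // drawn uniformly from a space of 2^bits values, are equal: p <= n(n-1)/2 / 2^bits.
    // Accurate when the result is small.
    static double birthdayCollisionProbability(double n, int bits) {
        return n * (n - 1) / 2.0 / Math.pow(2, bits);
    }
}
```

For instance, with an assumed, deliberately conservative 100 surviving bits and a billion data blobs in one repository, the estimate is on the order of 10^-13, which is in line with "unlikely to the point of being practically impossible".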

dakrone

Okay, thanks for confirming! I only ask because we have to come up with an alternative for SLM (

elasticsearch/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/snapshotlifecycle/SnapshotLifecyclePolicy.java

Lines 126 to 127 in 1f5811b

// TODO: we are breaking the rules of UUIDs by lowercasing this here, find an alternative (snapshot names must be lowercase)

return candidates.get(0) + "-" + UUIDs.randomBase64UUID().toLowerCase(Locale.ROOT);

)
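The TODO points at a real loss: lowercasing a base64 UUID folds distinct characters together and discards randomness. One possible alternative, sketched here and not necessarily what SLM ended up adopting, is an encoding that has no case to begin with, such as the hexadecimal form produced by java.util.UUID.toString(), which keeps all of the UUID's bits after lowercasing. LowercaseSnapshotNameSketch, snapshotName and its prefix parameter (standing in for candidates.get(0)) are hypothetical.

```java
import java.util.Locale;
import java.util.UUID;

// Hypothetical alternative to lowercasing a base64 UUID for snapshot names.
final class LowercaseSnapshotNameSketch {
    // Hex digits carry no case information, so lowercasing loses no randomness.
    // The UUID part is always 36 characters (32 hex digits plus 4 hyphens).
    static String snapshotName(String prefix) {
        return prefix + "-" + UUID.randomUUID().toString().toLowerCase(Locale.ROOT);
    }
}
```

The trade-off is a longer name: 36 characters for the UUID part instead of the 22 characters of the base64 form.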


andrershov

Small, but a nice change. Can you please update the Javadoc for BlobStoreRepository as well? (Not __1, but __-UUID; you might also mention that previously they were sequential.)


original-brownbear · 2019-04-01T14:03:37Z

@andrershov fixed docs in c09e648 :)

andrershov

I left one more comment on the docs. Other than that, LGTM.


original-brownbear · 2019-04-01T15:05:02Z

@andrershov fixed :)
343efe6

original-brownbear · 2019-04-01T15:05:37Z

@ywelsch wdyt here?

Are you good with this change or do you see any problem with it?
How far do you think we can/should backport this?


ywelsch

Let's backport up to 6.7 for now.

ywelsch · 2019-04-02T08:36:12Z

server/src/main/java/org/elasticsearch/repositories/blobstore/BlobStoreRepository.java

+ *      |  |  |- __1                      \
+ *      |  |  |- __2                      |
+ *      |  |  |- __VPO5oDMVT5y4Akv8T_AO_A |- files from different segments see snap-* for their mappings to real segment files
+ *      |  |  |- __1gbJy18wS_2kv1qI7FgKuQ |  (files with numeric names were created by older ES versions)


perhaps add this comment to the files above (i.e. __1 and __2)


original-brownbear · 2019-04-02T13:48:12Z

thanks @andrershov + @ywelsch


original-brownbear added >non-issue :Distributed Coordination/Snapshot/Restore v7.0.0 >refactoring v8.0.0 v7.2.0 v6.7.1 labels Mar 29, 2019

original-brownbear requested review from andrershov and ywelsch March 29, 2019 19:55

dakrone reviewed Mar 29, 2019


Merge remote-tracking branch 'elastic/master' into uuid-naming-data-blobs

18e5703

colings86 added v6.7.2 and removed v6.7.1 labels Mar 30, 2019

andrershov suggested changes Apr 1, 2019


original-brownbear added 2 commits April 1, 2019 15:47

Merge remote-tracking branch 'elastic/master' into uuid-naming-data-blobs

ef5e406

CR: update Javadoc

c09e648

original-brownbear requested a review from andrershov April 1, 2019 14:03

andrershov approved these changes Apr 1, 2019


original-brownbear added 2 commits April 1, 2019 17:02

Merge remote-tracking branch 'elastic/master' into uuid-naming-data-blobs

81d40c9

fix prefix

343efe6

original-brownbear added 2 commits April 1, 2019 20:59

Merge remote-tracking branch 'elastic/master' into uuid-naming-data-blobs

a6d989a

Merge remote-tracking branch 'elastic/master' into uuid-naming-data-blobs

83962a4

ywelsch approved these changes Apr 2, 2019


original-brownbear added 2 commits April 2, 2019 10:42

Merge remote-tracking branch 'elastic/master' into uuid-naming-data-blobs

e975cb1

CR: move comment about legacy naming

7c844b4

original-brownbear merged commit 3ecfd9b into elastic:master Apr 2, 2019

original-brownbear deleted the uuid-naming-data-blobs branch April 2, 2019 13:48

original-brownbear added the backport pending label Apr 2, 2019

colings86 added v6.7.1 v6.7.2 and removed v6.7.2 v6.7.1 labels Apr 3, 2019

jakelandis added v7.0.0-rc2 v7.0.0 and removed v7.0.0 v7.0.0-rc2 labels Apr 3, 2019

original-brownbear mentioned this pull request Apr 5, 2019

Implement Eventually Consistent Mock Repository for SnapshotResiliencyTests #40893

Merged

This was referenced Apr 25, 2019

Name Snapshot Data Blobs by UUID (#40652) #41523

Merged

Name Snapshot Data Blobs by UUID (#40652) #41524

Merged

original-brownbear removed the backport pending label Apr 25, 2019

original-brownbear mentioned this pull request Apr 25, 2019

Name Snapshot Data Blobs by UUID (#40652) #41525

Merged

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021
