Name Snapshot Data Blobs by UUID #40652
* There is no functional reason why we need incremental naming for these files, but:
  * As explained in elastic#38941, it is a possible source of corrupting the repository
  * It wastes API calls for the list operation
  * It is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
  * Even in the worst-case scenario of a downgrade, the functionality would continue working, since the incremental names wouldn't conflict with the UUIDs and the number parsing for finding the next incremental name suppresses the exception when encountering a non-numeric value after the double underscore prefix
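The downgrade argument above can be illustrated with a minimal sketch. This is not the Elasticsearch implementation: the class and method names are hypothetical, and the radix used for parsing is an assumption made for illustration. It only shows the general idea of scanning a blob listing for the highest numeric suffix while skipping any suffix that does not parse as a number.

```java
import java.util.List;

// Hypothetical helper for illustration only, not the Elasticsearch code.
final class LegacyGenerationScan {
    static final String DATA_BLOB_PREFIX = "__";

    /** Returns the highest numeric suffix among data blobs, or -1 if none parse. */
    static long findLatestGeneration(List<String> blobNames) {
        long latest = -1;
        for (String name : blobNames) {
            if (!name.startsWith(DATA_BLOB_PREFIX)) {
                continue; // not a data blob, e.g. snapshot metadata
            }
            String suffix = name.substring(DATA_BLOB_PREFIX.length());
            try {
                // The radix is an assumption of this sketch.
                latest = Math.max(latest, Long.parseLong(suffix, Character.MAX_RADIX));
            } catch (NumberFormatException e) {
                // Suffix is not a number (such as the sample UUID names below): skip it.
            }
        }
        return latest;
    }

    public static void main(String[] args) {
        List<String> blobs = List.of(
            "__1", "__2", "__VPO5oDMVT5y4Akv8T_AO_A", "__1gbJy18wS_2kv1qI7FgKuQ", "snap-example.dat");
        System.out.println(findLatestGeneration(blobs)); // prints 2
    }
}
```

In this sketch an older node that still assigns incremental names would continue from `__3`, and the UUID-named blobs are simply ignored by the scan.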
Pinging @elastic/es-distributed
@@ -1264,7 +1228,7 @@ public void snapshot(final IndexCommit snapshotIndexCommit) {
                         indexIncrementalSize += md.length();
                         // create a new FileInfo
                         BlobStoreIndexShardSnapshot.FileInfo snapshotFileInfo =
-                            new BlobStoreIndexShardSnapshot.FileInfo(fileNameFromGeneration(++generation), md, chunkSize());
+                            new BlobStoreIndexShardSnapshot.FileInfo(DATA_BLOB_PREFIX + UUIDs.randomBase64UUID(), md, chunkSize());
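For readers without the Elasticsearch sources at hand, the two naming schemes in this hunk can be contrasted with a standalone sketch. Everything here is an assumption for illustration: `DataBlobNames` is a hypothetical class, the radix in `fileNameFromGeneration` is assumed, and `randomBlobName` uses the JDK's `SecureRandom` and URL-safe `Base64` as a stand-in for `UUIDs.randomBase64UUID()` rather than reproducing it.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical stand-in for illustration only, not the Elasticsearch code.
final class DataBlobNames {
    static final String DATA_BLOB_PREFIX = "__";
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Old scheme: the caller must already know the latest generation in use. */
    static String fileNameFromGeneration(long generation) {
        // The radix is an assumption of this sketch.
        return DATA_BLOB_PREFIX + Long.toString(generation, Character.MAX_RADIX);
    }

    /** New scheme: a random name that needs no knowledge of existing blobs. */
    static String randomBlobName() {
        byte[] bytes = new byte[16];
        RANDOM.nextBytes(bytes);
        // 16 bytes encode to 22 URL-safe characters without padding.
        return DATA_BLOB_PREFIX + Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(fileNameFromGeneration(1));
        System.out.println(randomBlobName());
    }
}
```

The design point is that the random name can be produced without first listing the shard's blobs to find the next free number.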
I believe this violates the snapshot validation where snapshots must be lowercase? (I ran into this with SLM) Is this going to have problems for non-case-sensitive filesystems since different UUIDs could collide there?
This blob-name is not related to the name of the snapshot in any way so no such limitations apply here. These are segment data files.
> Is this going to have problems for non-case-sensitive filesystems since different UUIDs could collide there?
I'm not sure that's possible in theory (just from the UUID specification, but I could be wrong here), but I'd say it's unlikely to the point of being impossible even if there were two such UUIDs. But that said, we currently make the same assumption about the safety of these UUIDs for other parts of the blob naming (index and snapshot metadata blobs) as well and so far haven't seen problems arise from that.
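To make the case-sensitivity question concrete, here is a small sketch under stated assumptions: the names use the URL-safe Base64 alphabet (64 symbols), which folds to 38 distinct symbols when case is ignored (26 letters, 10 digits, `-` and `_`). `CaseFoldingCheck` and its methods are hypothetical helpers, not part of Elasticsearch.

```java
import java.util.Locale;

// Hypothetical helper for illustration only.
final class CaseFoldingCheck {
    /** True when two distinct names differ only by letter case. */
    static boolean collidesIgnoringCase(String a, String b) {
        return !a.equals(b) && a.toLowerCase(Locale.ROOT).equals(b.toLowerCase(Locale.ROOT));
    }

    /** Upper bound, in bits, on what a name of this length can encode over the given alphabet. */
    static double maxBits(int length, int alphabetSize) {
        return length * (Math.log(alphabetSize) / Math.log(2));
    }

    public static void main(String[] args) {
        // Two distinct names that a case-insensitive file system would treat as one.
        System.out.println(collidesIgnoringCase("__VPO5oDMVT5y4Akv8T_AO_A", "__vpo5odmvt5y4akv8t_ao_a"));
        // Alphabet-based upper bounds for a 22-character suffix, with and without case.
        System.out.printf(Locale.ROOT, "%.1f vs %.1f bits%n", maxBits(22, 64), maxBits(22, 38));
    }
}
```

Such pairs can exist in principle, but for randomly generated names the folded space is still large enough that hitting one by chance remains extremely unlikely, in line with the reply above.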
Okay, thanks for confirming! I only ask because we have to come up with an alternative for SLM:
Lines 126 to 127 in 1f5811b
// TODO: we are breaking the rules of UUIDs by lowercasing this here, find an alternative (snapshot names must be lowercase)
return candidates.get(0) + "-" + UUIDs.randomBase64UUID().toLowerCase(Locale.ROOT);
Small, but a nice change. Can you please update the JavaDoc for BlobStoreRepository as well? (Not __1, but __-UUID; you might also specify that previously they were sequential.)
@andrershov fixed docs in c09e648 :)
I left one more comment on the docs. Other than that, LGTM.
@andrershov fixed :)
@ywelsch wdyt here?
Let's backport up to 6.7 for now.
* | | |- __1                      \
* | | |- __2                      |
* | | |- __VPO5oDMVT5y4Akv8T_AO_A |- files from different segments see snap-* for their mappings to real segment files
* | | |- __1gbJy18wS_2kv1qI7FgKuQ |  (files with numeric names were created by older ES versions)
Perhaps add this comment to the files above (i.e. __1 and __2).
thanks @andrershov + @ywelsch
* master:
  add reason to DataFrameTransformState and add hlrc protocol tests (elastic#40736)
  Remove timezone validation on rollup range queries (elastic#40647)
  Fix testRunStateChangePolicyWithAsyncActionNextStep race condition (elastic#40707)
  Don't mark shard as refreshPending on stats fetching (elastic#40458)
  Name Snapshot Data Blobs by UUID (elastic#40652)
  SQL: [TEST] Mute TIME related failing tests
  [TEST] RecoveryWithConcurrentIndexing test (elastic#40733)
* Name Snapshot Data Blobs by UUID
  * There is no functional reason why we need incremental naming for these files, but:
    * As explained in elastic#38941, it is a possible source of corrupting the repository
    * It wastes API calls for the list operation
    * It is just needless complication
  * Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
    * Even in the worst-case scenario of a downgrade, the functionality would continue working, since the incremental names wouldn't conflict with the UUIDs and the number parsing for finding the next incremental name suppresses the exception when encountering a non-numeric value after the double underscore prefix
@ywelsch @andrershov this seems like an absolute freebie with (as far as I can tell) no downsides? That's why I marked this as 6.7.1 and 7.0 too, to lower the risk of corrupting data for those as well.