Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Name Snapshot Data Blobs by UUID #40652

Merged

Conversation

original-brownbear
Copy link
Member

  • There is no functional reason why we need incremental naming for these files but
  • Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
    • Even on the worst case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountring a non-numeric value after the double underscore prefix

@ywelsch @andrershov this seems like an absolute freeby with (as far as I can tell) no downsides? That's why I marked this as 6.7.1 and 7.0 too to lower the risk of corrupting data for those as well.

* There is no functional reason why we need incremental naming for these files but
  * As explained in elastic#38941 it is a possible source of corrupting the repository
  * It wastes API calls for the list operation
  * Is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
  * Even on the worst case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountring a non-numeric value after the double underscore prefix
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@@ -1264,7 +1228,7 @@ public void snapshot(final IndexCommit snapshotIndexCommit) {
indexIncrementalSize += md.length();
// create a new FileInfo
BlobStoreIndexShardSnapshot.FileInfo snapshotFileInfo =
new BlobStoreIndexShardSnapshot.FileInfo(fileNameFromGeneration(++generation), md, chunkSize());
new BlobStoreIndexShardSnapshot.FileInfo(DATA_BLOB_PREFIX + UUIDs.randomBase64UUID(), md, chunkSize());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this violates the snapshot validation where snapshots must be lowercase? (I ran into this with SLM) Is this going to have problems for non-case-sensitive filesystems since different UUIDs could collide there?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This blob-name is not related to the name of the snapshot in any way so no such limitations apply here. These are segment data files.

Is this going to have problems for non-case-sensitive filesystems since different UUIDs could collide there?

I'm not sure that's possible in theory (just from the UUID specification, but I could be wrong here), but I'd say it's unlikely to the point of being impossible even if there were two such UUIDs. But that said, we currently make the same assumption about the safety of these UUIDs for other parts of the blob naming (index and snapshot metadata blobs) as well and so far haven't seen problems arise from that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, thanks for confirming! I only ask because we have to come up with an alternative for SLM (

// TODO: we are breaking the rules of UUIDs by lowercasing this here, find an alternative (snapshot names must be lowercase)
return candidates.get(0) + "-" + UUIDs.randomBase64UUID().toLowerCase(Locale.ROOT);
)

@colings86 colings86 added v6.7.2 and removed v6.7.1 labels Mar 30, 2019
Copy link
Contributor

@andrershov andrershov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small, but a nice change. Can you please update JavaDoc for BlobStoreRepository as well? (not __1, but __-UUID, you might also specify that previously they were sequential).

@original-brownbear
Copy link
Member Author

@andrershov fixed docs in c09e648 :)

Copy link
Contributor

@andrershov andrershov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left on more comment on the docs. Except that LGTM.

@original-brownbear
Copy link
Member Author

@andrershov fixed :)
343efe6

@original-brownbear
Copy link
Member Author

@ywelsch wdyt here?

  1. Are you good with this change or do you see any problem with it?
  2. How far do you think we can/should back-port this?

Copy link
Contributor

@ywelsch ywelsch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's backport up to 6.7 for now.

* | | |- __1 \
* | | |- __2 |
* | | |- __VPO5oDMVT5y4Akv8T_AO_A |- files from different segments see snap-* for their mappings to real segment files
* | | |- __1gbJy18wS_2kv1qI7FgKuQ | (files with numeric names were created by older ES versions)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps add this comment to the files above (i.e. __1and __2)

@original-brownbear
Copy link
Member Author

thanks @andrershov + @ywelsch

@original-brownbear original-brownbear merged commit 3ecfd9b into elastic:master Apr 2, 2019
@original-brownbear original-brownbear deleted the uuid-naming-data-blobs branch April 2, 2019 13:48
jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Apr 2, 2019
* master:
  add reason to DataFrameTransformState and add hlrc protocol tests (elastic#40736)
  Remove timezone validation on rollup range queries (elastic#40647)
  Fix testRunStateChangePolicyWithAsyncActionNextStep race condition (elastic#40707)
  Don't mark shard as refreshPending on stats fetching (elastic#40458)
  Name Snapshot Data Blobs by UUID (elastic#40652)
  SQL: [TEST] Mute TIME related failing tests
  [TEST] RecoveryWithConcurrentIndexing test (elastic#40733)
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 25, 2019
* Name Snapshot Data Blobs by UUID

* There is no functional reason why we need incremental naming for these files but
  * As explained in elastic#38941 it is a possible source of corrupting the repository
  * It wastes API calls for the list operation
  * Is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
  * Even on the worst case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountring a non-numeric value after the double underscore prefix
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 25, 2019
* Name Snapshot Data Blobs by UUID

* There is no functional reason why we need incremental naming for these files but
  * As explained in elastic#38941 it is a possible source of corrupting the repository
  * It wastes API calls for the list operation
  * Is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
  * Even on the worst case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountring a non-numeric value after the double underscore prefix
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 25, 2019
* Name Snapshot Data Blobs by UUID

* There is no functional reason why we need incremental naming for these files but
  * As explained in elastic#38941 it is a possible source of corrupting the repository
  * It wastes API calls for the list operation
  * Is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
  * Even on the worst case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountring a non-numeric value after the double underscore prefix
original-brownbear added a commit that referenced this pull request Apr 25, 2019
* Name Snapshot Data Blobs by UUID

* There is no functional reason why we need incremental naming for these files but
  * As explained in #38941 it is a possible source of corrupting the repository
  * It wastes API calls for the list operation
  * Is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
  * Even on the worst case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountring a non-numeric value after the double underscore prefix
original-brownbear added a commit that referenced this pull request Apr 25, 2019
* Name Snapshot Data Blobs by UUID

* There is no functional reason why we need incremental naming for these files but
  * As explained in #38941 it is a possible source of corrupting the repository
  * It wastes API calls for the list operation
  * Is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
  * Even on the worst case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountring a non-numeric value after the double underscore prefix
original-brownbear added a commit that referenced this pull request Apr 25, 2019
* Name Snapshot Data Blobs by UUID

* There is no functional reason why we need incremental naming for these files but
  * As explained in #38941 it is a possible source of corrupting the repository
  * It wastes API calls for the list operation
  * Is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
  * Even on the worst case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountring a non-numeric value after the double underscore prefix
gurkankaymak pushed a commit to gurkankaymak/elasticsearch that referenced this pull request May 27, 2019
* Name Snapshot Data Blobs by UUID

* There is no functional reason why we need incremental naming for these files but
  * As explained in elastic#38941 it is a possible source of corrupting the repository
  * It wastes API calls for the list operation
  * Is just needless complication
* Since we store the exact names of the data blobs in all the metadata anyway, we can make this change without any BwC considerations
  * Even on the worst case scenario of a downgrade the functionality would continue working since the incremental names wouldn't conflict with the uuids and the number parsing for finding the next incremental name suppresses the exception when encountring a non-numeric value after the double underscore prefix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants