-
Notifications
You must be signed in to change notification settings - Fork 372
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove Metadata from VolumeHandle #116
Comments
Same thing with |
I like this change, but I do worry it removes the ability to have some flexibility with a single readable field and then additional metadata. I was hoping the checksum change I proposed in #112 could be the compromise. This primary issue I have with this is that it encourages some plug-ins to treat this field as opaque, and that will result in COs not having a clear idea on what to display to users when volumes are listed. Before the ID could have been the name with additional unique data in the metadata. Now the ID may be encoded JSON or some other data that, while uniquely IDs the volume, is unhelpful to users. I still think the checksum is the best solution as it keeps the ID as s readable field and the metadata for additional data. This change completely removes the ability to return a volume’s unique name from ListVolumes unless the name is unique to the storage platform as well. Otherwise the forces plug-ins to possibly maintain a persistent mapping table of CO unique names to volume IDs, and That’ll I don’t think we should place that burden on all plug-ins. |
I'd suggest we still keep We definitely need to define "equality" for |
Hi @jieyu, Most map implementations are not ordered by design as the underlying bucket systems they use prevent such a thing. And if they are ordered they're much slower. In fact, #112 calls out that you must manually enforce a lexicographical ordering of the keys when calculating the checksum for this very reason. Again, I think #112 is the fastest way and easiest solution to provide a single piece of unique, identifiable information while still maintaining the ID and metadata fields for their original, intended purpose. I worry without the metadata the ID field will almost never be a human-readable piece of information, and that it will make supporting existing volumes (not created by CSI) even more difficult. |
Using a hash to determine equality is poor practice as two different strings are not guaranteed not to have the same hash. So even if the hashes are equal, you still need a way to test equality on the stuff you are hashing otherwise you are leaving your code open to false matches. To me it is much better to just declare that the ID returned by the plugin must be unique. If a plugin has to cope with some back-end that cannot guarantee uniqueness, then it can just put that non-unique key into the metadata and generate a guid for the ID. Even if the plugin never actually uses the id for anything, it gives the CO a unique key to hang its hat on. |
Hi Julian, I’m very confused by your reply. Hashes are used every day for comparisons. See Java for literally all of its equality checks. Did you read #112? Please do. It works. As for this proposed change is problematic as it renders an ID meaningless as a piece of human readable info and forces all plug-ins to maintain some persistent mapping table. |
see https://docs.oracle.com/javase/7/docs/api/java/lang/Object.html
|
Also note that if the plug-in puts a usable ID into the metadata, then it need not make a persistent mapping table. Since the metadata is always passed with the ID, the plug-in will always get it. I am curious about why the ID is supposed to be human readable...is that just so that we can find it in the logs easily? |
Hi Julian, I still dont understand. The javadocs do state that if the objects are equal then the hash code should be the same. That’s what I said. The reason is because the hashcode is used to store objects in a map. If two objects produce the same hash code and you store them both in a map the second object replaces the first. |
The ID should be human readable because COs need to support existing volumes returned by list volumes via some CO UX. How else is a CO going to display a list to a user if not via the handle? |
Again, the point I’m making is that if two strings are equal and the metadata is equal the same hash will always be produced. Read the original PR and I provide examples on how to consume the metadata to do so. |
In Java land, if two objects have the same hash code, but This is a fundamental aspect of hashing that people mess up routinely. The use case you're citing isn't really the interesting one. The interesting use case where two objects have different ids or metadata or whatever, but happen to have the same hash or checksum. This case is rare with most checksum algorithms, but then again, most checksums won't guarantee that it cannot happen. |
No, I get all that. I did Java for years. On phone so maybe not explaining well. My point is that hash in case I proposed is sufficient for what I proposed. Sure, there are possible collisions, but it’s rare enough that the value outweighs the potential issue. |
Hi @julian-hj / @saad-ali, Julian is absolutely correct that a hash is susceptible to false-positives, so if we put that aside for a moment, let's reexamine the original use case I was targeting -- a unique, string ID. @saad-ali 's proposed change in this issue provides that, but in the process very probably removes a simple , human readable string field that can be used to identify the volume. How about this proposal then be modified so that a volume handle is thus: message VolumeHandle {
// ID is UNIQUE and REQUIRED. MAY NOT be human readable.
string id = 1;
// Name MAY NOT be UNIQUE and is OPTIONAL. MUST BE human
// readable. MAY NOT be the same name value provided when
// creating a volume via CreateVolume.
string name = 2;
} The above proposal simply adds an optional |
Hi @julian-hj, I apologize for arguing about the hash stuff. I had oral surgery yesterday and was flying on what they gave me for a while. Please see my above addendum regarding this issue. |
Hi @saad-ali and @julian-hj, This relates to #117 and @julian-hj's comment there regarding a plug-in leveraging Both Julian and James have noted they understood that This is where I'm confused. Isn't state, by its definition, something that changes? If If However, if I guess I don't really care one way or the other as long as we hammer this out much, much sooner than later. cc @codenrhoden |
The spec clearly states that the metadata will not change over the life of the volume, and that seems totally fine with me. From my perspective, the more important use case for metadata was to aggregate volume metadata gathered from inputs to the create request and also potentially from whatever happens when the volume is provisioned, and package it up in such a way that the Node plugin can consume it easily without a back channel. So for example, in the case of a simple plugin that just mounts existing volumes, the We don't need to be able to change the metadata after the create is done, we just need some way to package up some information after create time and make sure it gets to the node. That way, even if the ControllerPlugin requires a backing store, the Node plugin can be extremely simple. Or, in another example, the controller plugin might create a ceph cluster and a keyring to access it, and then put that keyring into the metadata. The node plugin would consume that keyring from the ceph-fuse client during NodePublish. Neither of these use cases requires metadata to ever change after the Create is done. Later, we might like a way to rotate credentials for a Ceph cluster, but given that we do not have anything like "ModifyVolume" today, the spec does not support that use case regardless. I have no opinion about whether the metadata should be included in VolumeHandle or passed separately, except that it seems like metadata has gotten itself entangled in volume identity, which was never the intent, so maybe folding it into the handle was what caused this confusion. |
Hi Julian, Ah. Thank you for adding your perspective. It felt a bit like a blind discussion without your input. If you never intended to modify the metadata returned when creating a volume, then why can you not encode it, along with any platform-specific ID, as a string and send it back to the CO as the unique ID? It still gets persisted and your plug-in is able to easily decode it when given back to you. Or am I missing something? |
It may contain secrets and it may be largish so encoding it into an id seems like maybe not a great idea. We specifically separated metadata from I'd so that it would not leak into the logs.
…---
Sent from VMware Boxer
On October 6, 2017 at 6:05:19 PM PDT, Schley Andrew Kutz <[email protected]> wrote:
Hi Julian,
Ah. Thank you for adding your perspective. It felt a bit like a blind discussion without your input.
If you never intended to modify the metadata returned when creating a volume, then why can you not encode it, along with any platform-specific ID, as a string and send it back to the CO as the unique ID? It still gets persisted and your plug-in is able to easily decode it when given back to you.
Or am I missing something?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#116 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AIF0_H8vHj4C7Qu81vMPVI5wGu5f-GY8ks5sps5LgaJpZM4Psiqx>.
|
But then you’re relying on the CO to encrypt that data. Seems like a better idea to encrypt it before sending it to be CO?
I’ve no horse in the race as long as we can had some form of tags to Vol info for making available free form data.
I just need to know if metadata in the handle takes part in identity. Sounds like it does, and it’s not considered state as it were.
I still wouldn’t trust a CO to hold sensitive info that I didn’t protect before hand.
…--
-a
On Oct 6, 2017, at 8:21 PM, Julian Hjortshoj ***@***.***> wrote:
It may contain secrets and it may be largish so encoding it into an id seems like maybe not a great idea. We specifically separated metadata from I'd so that it would not leak into the logs.
---
Sent from VMware Boxer
On October 6, 2017 at 6:05:19 PM PDT, Schley Andrew Kutz ***@***.***> wrote:
Hi Julian,
Ah. Thank you for adding your perspective. It felt a bit like a blind discussion without your input.
If you never intended to modify the metadata returned when creating a volume, then why can you not encode it, along with any platform-specific ID, as a string and send it back to the CO as the unique ID? It still gets persisted and your plug-in is able to easily decode it when given back to you.
Or am I missing something?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub<#116 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AIF0_H8vHj4C7Qu81vMPVI5wGu5f-GY8ks5sps5LgaJpZM4Psiqx>.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
It sounds like we all agree that volume handle should should be opaque to the CO and contain all the information a SP needs to uniquely identify (and easily reference) a volume, and therefore we should collapse volume handle down into a simple string. This will allow CO to use the string as a key in a map very easily in most languages. @akutz does that in PR #117 along with a similar change for node ID. Doing this, however, also removes the ability for an SP to return non-identifying, decorative information (aka "tags" or "label") which would be nice for UX. To fix this, @akutz also introduced #116. Tag (in #116) are not intended to be a cookie, meaning they should not be passed back from the CO to the SP on subsequent requests. This means that if the SP was relying on using metadata as a means to persist state in the CO, they no longer have any means to do so (other than possibly encoding that information into the volumeHandle string). So the big open question now remains, should we allow some mechanism, outside of the volumeHandle, for an SP to use the CO to persist state (like a cookie)? I would argue we should not. I would prefer to keep the CSI interface as small and simple as possible. To help move the discussion forward, I'd like to hear about more concrete cases instead of designing towards hypotheticals. I'd also like to point out that if a volume plugin really needs to persist state, it could set up a CO specific sidecar container that enables it to do so in a CO specific manner--this would isolate the complexity to a handful of plugins and not leak it into the CSI API. Considering how contentious this topic, I suggest we discuss and finalize at the next CSI meeting? |
Ach--that could be tough, as I am shortly gonna get on a plane for CF Summit Europe, so I may not be in the next working group meeting, and there doesn't seem to be anyone else who is really holding the torch for metadata passing. I will repeat the argument I made in #117 though--metadata is less about persisting state on behalf of the plugin, and more about passing metadata from the Controller Plugin to the Node Plugin. Since there are potentially many instances of the node plugin, if we require the node plugin to have backchannel access to the controller plugin, or access to the same backing store, then the whole plugin architecture gets quite a lot more difficult to manage. So we might simplify the CSI a little by removing metadata from it, but we make even the simplest plugin whole a lot more complex to implement, particularly if we want to do things like handle node failure, network segmentation, backup & restore, etc. I am fine if we want to take metadata out of the handle--I agree that it should not be part of the volume identity--but I think it is a mistake to discard it altogether. I will see if I can make it to the working group meeting, but that will be 10PM for me, so no promises. |
Hi Julian, I can definitely argue your case and will even outline when and if we’d be able to leverage it as well. I just doubt I would be able to give voice to your perspective with as much passion as you have as it’s simply not as critical to me. I agree it would be useful, but we also have to start depending on etcd for other features anyway. |
I'm +1 on what @julian-hj suggested. Yes, plugin can use etcd to pass data from controller plugin to node plugin, but that'll be a big pain operationally. Alternatively, plugin can piggyback those information in the volume ID, but that's pretty hacky, and is not what ID is intended for. @saad-ali, from k8s's perspective, is persisting volume metadata along with volume id hard? If not, i am leaning toward just providing this functionality. |
Personally a message bus would be far more valuable to me than just static data. Some spec method for bi-directional communication between controller and nodes... |
I'm perfectly happy with this change. I would even go so far as to mention that the id not to be displayable directly to the user. |
+1 for @julian-hj Being able to have the CO pass metadata from the controller seems fairly critical to simplifying the management of these plugins. |
CSI metadata discussion/resolution: |
The spec currently defines a VolumeHandle as a string ID and a key/value metadata map. Where the combination of ID and metadata is the unique identifier for a volume.
However, proto/struct equality is a pain and not natively supported in most (all?) libraries/languages. String equality, on the other hand, is supported, and very easy to understand.
Therefore, to make it easier to use the VolumeHandle as a key, I strongly suggest removing metadata, and having the VolumeHandle be just a string.
The text was updated successfully, but these errors were encountered: