In datastore.NewKey calling path.Clean is dangerous #2601
Comments
@kevina yeah, this is a known issue: #994. It's the cause of the following failure (and a few others I'm not finding right now). I'd say we can keep this issue open as it will add more pressure towards us getting this fixed.
I should also note that currently this is much less of an issue than you would think. Since the mangling of keys is consistent, we can always access the keys as we expect. The only noticeable issue this causes is that when running a gc, any key affected by this bug won't be removed. (also you won't be able to see keys affected by this in an
There is also the small possibility that this bug will create a collision when there normally would not be one, although the probability will be low. For example, let's assume we have these two (8-byte) keys:
Both keys will get mangled to
The probability of this happening with 32-byte keys is extremely low and I doubt I could create a collision. But the possibility still exists, and it is certainly higher than the probability of a SHA-256 collision.
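To make the collision scenario concrete, here is a minimal Go sketch. The two 8-byte keys are hypothetical values chosen for illustration (they are not the keys from the comment above), and cleanKey is a made-up helper that only mimics the normalization described in this issue: prefix with "/" and run path.Clean.

```go
package main

import (
	"fmt"
	"path"
)

// cleanKey is a hypothetical stand-in for the normalization discussed in this
// issue: prefix the raw bytes with "/" and pass the result through path.Clean.
func cleanKey(raw []byte) string {
	return path.Clean("/" + string(raw))
}

func main() {
	// Two distinct, hypothetical 8-byte keys. 0x2f is the ASCII code for '/'.
	a := []byte{0x01, 0x2f, 0x2f, 0x02, 0x03, 0x04, 0x05, 0x06} // contains "//"
	b := []byte{0x01, 0x2f, 0x02, 0x03, 0x04, 0x05, 0x06, 0x2f} // ends with "/"

	fmt.Println("inputs equal: ", string(a) == string(b))
	fmt.Println("cleaned equal:", cleanKey(a) == cleanKey(b))
	fmt.Printf("cleaned a: %q\ncleaned b: %q\n", cleanKey(a), cleanKey(b))
}
```

path.Clean collapses the doubled slash in the first key and drops the trailing slash in the second, so both end up as the same string even though the inputs differ.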
@whyrusleeping would there be any performance degradation to just using the base58 encoding for datastore keys? This will mesh much better with how go-datastore was designed. The flatfs datastore might need some special-case code to decode the base58 encoding and re-encode it as a hex string. This will of course introduce a repo change, so I imagine that we should coordinate the change with a major release. I am against just removing the call to path.Clean because mixing ASCII paths with binary keys is just a bad idea in my view. It could be made to work if we are very careful, but it is not something I would be happy with at all.
@kevina there is a huge performance degradation using base58 encoding for the datastore keys. The question is at what layer do we do the encoding? Do we force users to sanitize their inputs? Or do we force something on all inputs? I'm personally in favor of requiring the user to sanitize all inputs, otherwise, if we always encode inputs, we won't be able to easily see paths in the keys,
I am in favor of using base64 encoding then, as it will be shorter and can be used directly in the leveldb (and filestore) datastore. It will only be the flatfs datastore that really would need to concern itself with how the keys are encoded. I am not 100% sure I am following you, but I would think that datastore.NewKey() should fail if it sees binary data (anything in the range 0x00 to 0x1F). Is this what you mean by forcing users to sanitize their inputs?
@kevina Yeah, that would be one way to force users to sanitize input. Although that has the annoying side effect of making
@whyrusleeping Binary keys are used all over the place, not just in the
As a temporary measure we could
@kevina all of those are places where we are passing a hash value into the key name. Those should be pretty easy to find and eliminate. I think that is a fairly easy step forward, but it will require a repo version bump and another migration.
@whyrusleeping I agree. It's the migration step that would be the biggest blocker. Do you think base64 is better than hex? The only advantage of hex would be that the flatfs datastore won't need to re-encode, but it would still need to check that the keys are in hex. For the filestore I would prefer base64. We can use the
I can likely help with this once we decide on what to do.
Unless we encounter significant opposition, here's the plan moving forward:
Very nice to have:
Note: As an implementation detail we should use
@kevina good point. I just noticed that I used the standard encoding in my other PR just now.
wow. The key encoding change was literally a one-liner. EDIT: okay, maybe there were two lines we needed. But still, way simpler than I expected
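For context on why the choice of base64 variant matters for datastore keys, here is a small Go sketch. The thread as captured here does not show which encoding was recommended, so this only compares the two alphabets that Go's encoding/base64 package provides; the input bytes and the helper names encodeStd and encodeURL are made up for illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// encodeStd uses the standard base64 alphabet, which includes '+' and '/'.
func encodeStd(raw []byte) string {
	return base64.StdEncoding.EncodeToString(raw)
}

// encodeURL uses the URL-safe alphabet, which substitutes '-' and '_'.
func encodeURL(raw []byte) string {
	return base64.URLEncoding.EncodeToString(raw)
}

func main() {
	// Hypothetical bytes picked so that the encoded form hits the last two
	// symbols of the alphabet.
	raw := []byte{0xfb, 0xff, 0xfe}

	std := encodeStd(raw)
	url := encodeURL(raw)

	fmt.Printf("standard: %s (contains '/': %v)\n", std, strings.Contains(std, "/"))
	fmt.Printf("url-safe: %s (contains '/': %v)\n", url, strings.Contains(url, "/"))
}
```

With the standard alphabet the encoded key can itself contain "/" (here even "//"), which brings back the path.Clean problem this issue is about. The URL-safe alphabet never emits "/".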
It causes AllKeysChan not to return all keys, which breaks bloom filter. License: MIT Signed-off-by: Jakub Sztandera <[email protected]>
Fixes ipfs#2601 Also bump version to 0.4.3-dev License: MIT Signed-off-by: Jeromy <[email protected]>
[I am not sure of the best place to post this, as this issue is related to both go-ipfs and go-datastore. I can move this to go-datastore if necessary.]
I may be completely missing something here, but it seems datastore keys are represented via binary []byte strings, possibly prefixed with a "/", and are converted to a datastore.Key by calling datastore.NewKey().
However, datastore.NewKey() also calls path.Clean(). This is a dangerous thing to do with binary strings. For example, the binary []byte representation of the hash might just happen to contain a "//", which path.Clean will convert to a "/". In fact it seems that all of datastore.Key is written with the assumption that the keys are text and not binary blobs.
Here is the relevant code in go-datastore/key.go:
The probability of a []byte string for a multihash containing something that path.Clean() will remove or change is low, which is why I imagine this issue has not come up yet.
I am not sure of the best way to fix this. Maybe it would be better to keep the multihash in its b58 encoding to mesh better with how datastore.Key was (supposedly) designed. Or, as a quick fix, maybe it would be better to remove the call to path.Clean in the datastore.Key.Clean() method.
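The mangling described in this report can be reproduced with a few lines of Go. This is a sketch, not the go-datastore source: cleanKey is a hypothetical helper that assumes the key is prefixed with "/" and passed through path.Clean, and the raw bytes are invented for the example.

```go
package main

import (
	"fmt"
	"path"
)

// cleanKey is a hypothetical helper that approximates the behavior described
// in this report: prefix with "/" and normalize with path.Clean.
func cleanKey(raw []byte) string {
	return path.Clean("/" + string(raw))
}

func main() {
	// Invented binary key containing "//" (0x2f 0x2f) and "/./" (0x2f 0x2e 0x2f).
	raw := []byte{0x12, 0x20, 0x2f, 0x2f, 0xab, 0x2f, 0x2e, 0x2f, 0xcd}

	cleaned := cleanKey(raw)

	fmt.Printf("raw:     %2d bytes  %x\n", len(raw), raw)
	fmt.Printf("cleaned: %2d bytes  %x\n", len(cleaned), cleaned)
}
```

Even though one byte is added for the leading "/", the cleaned key is shorter than the raw input, because path.Clean drops the doubled slash and the "." element. The original bytes can no longer be recovered from the stored key.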