-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Case insensitivity #73
Comments
I would caution against strictness here. In e.g. DNS there are things like the 0x20 encoding. Yes it is a very old draft, but it is being guaranteed to work in many reference-implementations, and there are cases in the wild where not supporting case sensitivity on the resolution-boundary bleeds into the app-logic, for instance https://forum.mikrotik.com/viewtopic.php?t=128082 While we should (and already are) emit consistent casing at all times, I think we should continue accepting any-case strings in any context where DNS or other case-insensitive-by-the-book systems are being interfaced. |
Yes, but wouldn't this be a level higher than the multibase implementations? |
Technically: yes. But because everything is terrible I am worried that there are other cross-stack parts operating over the DNS boundary that are case sensitive, and more importantly are outside of our control. Say you could decree: any time we get a b32/36 encoded string from inside a HTTP header, be it a redirect or something cookie related, we invariably lowercase the string first. Then someone comes around and tries to do something cute like the Does it matter whether the above scenario is left open? Probably not... it is rather implausible But what is the benefit of not decoding "whatever comes" ? If anything the extra TLDR: what is the motivation to be lc-strict? |
You could argue that it's a performance hit doing it on the lowest level, even if it might not be needed. Though this is not about the performance implications.
This is based on experience that generally spoken, failing early and being strict reduces hard to find bugs. |
i would love to be strict a follow the spec without blacklist/whitelist's |
Reframing this issue here is how I read the proposal:
This is definitely doable, but it needs a justification. I am not saying there isn't one, my objection is that this justification hasn't been conveyed clearly yet. |
I updated the previous comment, cc-ing @lidel in case he hasn't seen this issue yet |
No the proposal is to have a strict parser. I don't think the uppercase versions need to be removed. I'm asking for having them case sensitive, i.e. a Multibase implementation would accept Base encoded string with prefix |
Hrm... I find it logically inconsistent that we effectively provide support for case insensitive encodings, but only accept them if the casing is the same in the entire string. We should:
The "same case throughout" feels like strictness-for-strictness-sake. Could we bullet-point a few benefits? |
We support several variations for each base, supporting mix case for all feels wrong and adds complexity. Plus i personally don't see the value besides been nice for hand written typos but maybe im missing something here. |
To me it sounded strange to have a uppercase and a lowercase variant, but still being case-insensitive. I was only thinkingabout decoding step (from Base encoded to the original data). But if I take the encoding (data to Base encoded version) into account, it makes sense. You might want to encode something to specifically uppercase or lowercase. So that would speak for having both variants.
Update: Added bullet point about roundtrips |
It seems we need to figure out why @whyrusleeping originally added case-insensitive support for what we now refer to as "4648-derived encodings". In fact it wasn't until 2d108367e5e3d30 that @Stebalien separated them into distinct entities. |
To rephrase: originally the multibase table had noting to do with rfc4648, aside from coincidentally using the same alphabet (which notably is not ideal for CIDs, as it interferes with sorting, and thus indexing). We seem to be converging to "let's be 4648-strict", whereas the original design rationale appears lost. This is my main objection: insufficient data for meaningful answer :) |
From a user perspective, matching the loose (case insensitive) rules found elsewhere would allow people to easily take existing base encoded strings, pre-pend a character for the multibase identifier, and be fine. All of our own code works with strings that are entirely produced and consumed by multibase, but a lot of people on existing systems will have strings that are already encoded using whatever the dominant rules are in their language/library of choice. The choice here is really about how much work we’re asking people to do in order to migrate to multibase from existing base encodings. Can they prepend a string or do we need them to do a full decode and then encode again with a proper multibase library. |
Having slept on this, I am still very much 👎 on removing the requirement for a compliant implementation to handle mixed-case decoding where the alphabet permits doing so. That said, I won't lose sleep if the specs change goes through, and will look into implementing the (non-trivial) extra code in go-b36 to return errors on mixcase. |
Currently we have Base encodings which are currently spec'ed as case-insensitive (Base32 and Base36). I can see that this is convenient from an application perspective. But wouldn't it make sense to be strict at the lowest level (the level of this spec) and reject wrongly cased encodings?
The convenient normalization can always happen at an application level, it would also always have enough information to do so.
Hence I'm in favour in being more strict and making things case sensitive.
PS: Thanks @hugomrdias for bringing this up.
The text was updated successfully, but these errors were encountered: