-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fixed typo in test vectors, clarified proquint and identity, added links to specs. #85
Conversation
I have edited the multibase proquint RFC to clarify how the full prefix works, and I have added an explicit reference to the RFC in the table (because the proquint data encoded by multibase is different from the data encoded by the original proquint spec). I have also added an explicit reference to the RFC for base8 in the table, because base8 according to the multibase RFC is significantly different from the base8 provided by other standard implementations (e.g. Python).
The multibase identity prefix is the character non-printable ASCII/UTF-8 character with codepoint 0x00. Note that this is different from the multibase prefix 0 listed for base2, which is the ASCII/UTF-8 character "0" with codepoint 0x30. | ||
|
||
|
||
## Encoding |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work with invalid utf8, unfortunately.
Really, 0x00 (or NUL) isn't a true multibase, that's why it's so hard to describe. It exists to distinghish between text and binary in a context where 0x0 means "everything following is binary".
Really, the right way to do this would be to use a raw/utf8/ascii/etc. multicodec to specify the encoding of the "following" data. E.g.:
0x55 ...
- binary.0xXX ...
- utf8 where XX is some marker byte. Unfortunately, the BOM (0xFEFF) isn't a valid varint, so we'd probably need to pick another.
All this is saying... I'd rather just drop it entirely. I don't think we're actually using it anywhere (for real, at least).
Thoughts @vmx?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd be happy to remove the identity Multibase in case there are no objections. I think in the realm of Multibase it doesn't really makes sense as it is really about string encoding things.
Though things are sadly a bit complicated. Technically the identity Multibase is used in every DAG-CBOR encoded CID. The reason is that CID is specified with the Multibased prefix being part of the actual CID: https://github.com/multiformats/cid/tree/97ff4a329f04b70c1ab7255c62af48192146b025#how-does-it-work
Though talking with other folks, hardly anyone I know thinks of CIDs that way. I (and the people I've talked to) think of CIDs without the Multibase prefix and there there is of course Multibase prefixed CIDs.
To conclude, I'd remove the the identity Multibase and update the CID spec to reflect how CIDs are thought of today.
Added the Python module `multiformats` to the list of implementations.
@vmx @Stebalien Was the rest of this PR other than the |
@bumblefudge I created a new pull request that fixes some of these typos. The |
Closes #76, closes #80, closes #83, closes #84.