diff --git a/.github/ISSUE_TEMPLATE/BUG-REPORT.yml b/.github/ISSUE_TEMPLATE/BUG-REPORT.yml
new file mode 100644
index 0000000..7b3dffa
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/BUG-REPORT.yml
@@ -0,0 +1,31 @@
+name: "Bug Report - documentation or registry"
+description: Report possible bugs in multibase spec, process docs, and/or the multibase registry.
+title: "🐛 [DOC/PROCESS BUG] - "
+labels: [
+  "bug"
+]
+body:
+  - type: textarea
+    id: description
+    attributes:
+      label: "Description"
+      description: Please enter an explicit description of your issue.
+      placeholder: Short and explicit description of your incident, ideally with commit-specific link to lines
+    validations:
+      required: true
+  - type: input
+    id: reprod-url
+    attributes:
+      label: "Reproduction URL"
+      description: Please enter your GitHub URL to provide a reproduction of the issue
+      placeholder: ex. https://github.com/multiformats/multibase/
+    validations:
+      required: false
+  - type: textarea
+    id: context
+    attributes:
+      label: "Context"
+      description: Please provide additional context
+      placeholder: "Context or external links needed to explain the possible mistake"
+    validations:
+      required: false
\ No newline at end of file
diff --git a/.github/ISSUE_TEMPLATE/NEW-REGISTRATION.yml b/.github/ISSUE_TEMPLATE/NEW-REGISTRATION.yml
new file mode 100644
index 0000000..a7d1524
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/NEW-REGISTRATION.yml
@@ -0,0 +1,75 @@
+name: "New Registration"
+description: Express interest in registering a new encoding
+title: "📚 [NEW REGISTRATION] - "
+labels: [
+  "Registration"
+]
+body:
+  - type: input
+    id: encoding-name
+    attributes:
+      label: "Name of encoding"
+      description: Name of the encoding to register
+      placeholder: acronyms and abbreviations are fine
+    validations:
+      required: false
+  - type: checkboxes
+    attributes:
+      label: "Have read contributing"
+      description: I have read the [contributing](https://github.com/multiformats/multiformats/blob/master/contributing.md) document
+      options:
+        - label: I read it!
+    validations:
+      required: true
+  - type: checkboxes
+    attributes:
+      label: "Have checked table"
+      description: I have reviewed the [multiformats mega-table](https://github.com/multiformats/multicodec/blob/master/table.csv) to assess viable sub-namespace for a registry if applicable
+      options:
+        - label: I read it!
+  - type: checkboxes
+    attributes:
+      label: "Willing to open a PR"
+      description: Once my questions are answered and my plan is confirmed, I will open a PR myself that adds the registration and be its change controller, or close this issue myself if I cannot
+      options:
+        - label: I will own this registration
+  - type: input
+    id: codepoint
+    attributes:
+      label: "Proposed codepoint"
+      description: Please put here the prefix in the target encoding. By tradition, the highest binary value in the encoding alphabet works well and has a built-in mnemonic if it doesn't conflict with any other entries
+      placeholder: x
+    validations:
+      required: true
+  - type: input
+    id: varint-value
+    attributes:
+      label: "Proposed varint value for registration in multiformats"
+      description: Please put here the UTF-8 value that corresponds to that target encoding, for inclusion in the multiformats table, formatted as an [unsigned varint](https://github.com/multiformats/unsigned-varint)
+      placeholder: See mf/unsigned-varint
+    validations:
+      required: true
+  - type: textarea
+    id: use-case
+    attributes:
+      label: "use-case"
+      description: Please describe the possible use-cases where this additional codec would be helpful, where this encoding is used currently in the wild, etc.
+      placeholder: Feel free to provide links for context and use-case descriptions
+    validations:
+      required: true
+  - type: textarea
+    id: specification
+    attributes:
+      label: "Description of relevant prior art and status quo"
+      description: Please describe relevant prior art and, if already specified in a static public document, the algorithms and configurations needed to deterministically encode/decode
+      placeholder: Links welcome
+    validations:
+      required: true
+  - type: textarea
+    id: solution_and_rationale
+    attributes:
+      label: "Proposed solution and rationale"
+      description: Please describe at a high level what you are exploring building and current open research questions.
+      placeholder: Detail welcome
+    validations:
+      required: true
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 0000000..4943f9b
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,5 @@
+blank_issues_enabled: true
+contact_links:
+  - name: Protocol Labs Vulnerability Disclosure Team
+    url: mailto:security@ipfs.io
+    about: Please do NOT open issues related to security of implementations or spec here without contacting the IPFS security team first.
\ No newline at end of file
diff --git a/README.md b/README.md
index 64aeecd..f2d9218 100644
--- a/README.md
+++ b/README.md
@@ -5,39 +5,37 @@
[![](https://img.shields.io/badge/freenode-%23ipfs-blue.svg?style=flat-square)](https://webchat.freenode.net/?channels=%23ipfs)
[![](https://img.shields.io/badge/readme%20style-standard-brightgreen.svg?style=flat-square)](https://github.com/RichardLitt/standard-readme)
-> Self identifying base encodings
+> Self-identifying base encodings
-Multibase is a protocol for disambiguating the encoding of base-encoded (e.g.,
-base32, base36, base64, base58, etc.) binary appearing in text.
+Multibase is a protocol for disambiguating the "base encoding" used to express binary data in text formats (e.g., base32, base36, base64, base58, etc.) from the expression alone.
-When text is encoded as bytes, we can usually use a one-size-fits-all encoding
-(UTF-8) because we're always encoding to the same set of 256 bytes (+/- the NUL
-byte). When that doesn't work, usually for historical or performance reasons, we
-can usually infer the encoding from the context.
+When text is encoded as bytes, we can usually use a one-size-fits-all encoding (UTF-8) because we're always encoding to the same set of 256 bytes (+/- the NUL byte).
+When that doesn't work, usually for historical or performance reasons, we can usually infer the encoding from the context.
-However, when bytes are encoded as text (using a base encoding), the base choice
-of base encoding is often restricted by the context. Worse, these restrictions
-can change based on where the data appears in the text. In some cases, we can
-only use `[a-z0-9]`. In others, we can use a larger set of characters but need a
-compact encoding. This has lead to a large set of "base encodings", one for
-every use-case. Unlike when encoding text to bytes, we can't just standardize
-around a single base encoding because there is no optimal encoding for all
-cases.
+However, when bytes are encoded as text (using a base encoding), the choice of base encoding (and alphabet, and other factors) is often restricted by the context.
+Worse, these restrictions can change based on where the data appears in the text.
+In some cases, we can only use `[a-z0-9]`; in others, we can use a larger set of characters but need a compact encoding.
+This has led to a large set of "base encodings", almost one for every use-case.
+Unlike the case of encoding text to bytes, it is impractical to standardize widely around a single base encoding because there is no optimal encoding for all cases.
-Unfortunately, it's not always clear *what* base encoding is used; that's where
-multibase comes in. It answers the question:
+As data travels beyond its context, it becomes quite hard to ascertain *which* base encoding of the many possible ones was used; that's where multibase comes in.
+Where the data has been prefixed before leaving its context behind, multibase answers the question:
-> Given data d encoded into text s, what base is it encoded with?
+> Given binary data `d` encoded into text `s`, what base `b` was used to encode it?
+
+To answer this question, a single [code point] is prepended to `s` at time of encoding, which signals in that new context which `b` can be used to reconstruct `d`.
## Table of Contents
- [Format](#format)
- [Multibase Table](#multibase-table)
+- [Specifications](#specifications)
+- [Status](#status)
+ - [Reserved Terms](#reserved-terms)
- [Multibase By Example](#multibase-by-example)
- [FAQ](#faq)
- [Implementations:](#implementations)
- [Disclaimers](#disclaimers)
-- [Maintainers](#maintainers)
- [Contribute](#contribute)
- [License](#license)
@@ -46,45 +44,48 @@ multibase comes in. It answers the question:
The Format is:
```
-
+
```
-Where `` is used according to the multibase table.
+Where `` is a code representing an entry in the multibase table.
### Multibase Table
The current multibase table is [here](multibase.csv):
```
-encoding, code, description, status
-identity, 0x00, 8-bit binary (encoder and decoder keeps data unmodified), default
-base2, 0, Binary (01010101), candidate
-base8, 7, Octal, draft
-base10, 9, Decimal, draft
-base16, f, Hexadecimal (lowercase), default
-base16upper, F, Hexadecimal (uppercase), default
-base32hex, v, RFC4648 case-insensitive - no padding - highest char, candidate
-base32hexupper, V, RFC4648 case-insensitive - no padding - highest char, candidate
-base32hexpad, t, RFC4648 case-insensitive - with padding, candidate
-base32hexpadupper, T, RFC4648 case-insensitive - with padding, candidate
-base32, b, RFC4648 case-insensitive - no padding, default
-base32upper, B, RFC4648 case-insensitive - no padding, default
-base32pad, c, RFC4648 case-insensitive - with padding, candidate
-base32padupper, C, RFC4648 case-insensitive - with padding, candidate
-base32z, h, z-base-32 (used by Tahoe-LAFS), draft
-base36, k, Base36 [0-9a-z] case-insensitive - no padding, draft
-base36upper, K, Base36 [0-9a-z] case-insensitive - no padding, draft
-base58btc, z, Base58 bitcoin, default
-base58flickr, Z, Base58 flicker, candidate
-base64, m, RFC4648 no padding, default
-base64pad, M, RFC4648 with padding - MIME encoding, candidate
-base64url, u, RFC4648 no padding, default
-base64urlpad, U, RFC4648 with padding, default
-proquint, p, Proquint (https://arxiv.org/html/0901.4016), draft
-base256emoji, 🚀, Base256 with custom alphabet using variable-sized-codepoints, draft
+Unicode, character, encoding, description, status
+U+0000, NUL, none, (No base encoding), reserved
+U+0030, 0, base2, Binary (01010101), experimental
+U+0031, 1, none, (No base encoding), reserved
+U+0037, 7, base8, Octal, draft
+U+0039, 9, base10, Decimal, draft
+U+0066, f, base16, Hexadecimal (lowercase), final
+U+0046, F, base16upper, Hexadecimal (uppercase), final
+U+0076, v, base32hex, RFC4648 case-insensitive - no padding - highest char, experimental
+U+0056, V, base32hexupper, RFC4648 case-insensitive - no padding - highest char, experimental
+U+0074, t, base32hexpad, RFC4648 case-insensitive - with padding, experimental
+U+0054, T, base32hexpadupper, RFC4648 case-insensitive - with padding, experimental
+U+0062, b, base32, RFC4648 case-insensitive - no padding, final
+U+0042, B, base32upper, RFC4648 case-insensitive - no padding, final
+U+0063, c, base32pad, RFC4648 case-insensitive - with padding, draft
+U+0043, C, base32padupper, RFC4648 case-insensitive - with padding, draft
+U+0068, h, base32z, z-base-32 (used by Tahoe-LAFS), draft
+U+006b, k, base36, Base36 [0-9a-z] case-insensitive - no padding, draft
+U+004b, K, base36upper, Base36 [0-9a-z] case-insensitive - no padding, draft
+U+007a, z, base58btc, Base58 Bitcoin, final
+U+005a, Z, base58flickr, Base58 Flickr, experimental
+U+006d, m, base64, RFC4648 no padding, final
+U+004d, M, base64pad, RFC4648 with padding - MIME encoding, experimental
+U+0075, u, base64url, RFC4648 no padding, final
+U+0055, U, base64urlpad, RFC4648 with padding, final
+U+0070, p, proquint, Proquint (https://arxiv.org/html/0901.4016), experimental
+U+0051, Q, none, (no base encoding), reserved
+U+002F, /, none, (no base encoding), reserved
+U+1F680, 🚀, base256emoji, base256 with custom alphabet using variable-sized-codepoints, experimental
```
-**NOTE:** Multibase-prefixes are encoding agnostic. "z" is "z", not 0x7a ("z" encoded as ASCII/UTF-8). For example, in UTF-32, "z" would be `[0x7a, 0x00, 0x00, 0x00]`. Also note the difference between `0x00` (codepoint 0 or 0x00) and `0` (codepoint 48 or 0x30).
+**NOTE:** Multibase-prefixes are encoding agnostic. "z" is "z", not 0x7a ("z" encoded as ASCII/UTF-8). In UTF-32, for example, that same "z" would be `[0x7a, 0x00, 0x00, 0x00]` not `[0x7a]`, so detecting and dropping an initial byte of `0x7a` would not suffice to confirm the rest was `base58btc`-encoded bytes; `[0x7a, 0x00, 0x00, 0x00]` would instead be the UTF-32 bytes that correspond to the `z` codepoint for that entry, and the entire byte array would need to be detected and dropped. Also note the difference between `0x00` (codepoint 0 or 0x00) and `0` (codepoint 48 or 0x30).
## Specifications
@@ -102,24 +103,26 @@ Below is a list of specs for the underlying base encodings:
- `base58flickr` https://datatracker.ietf.org/doc/html/draft-msporny-base58-02, but using a different alphabet
- `proquint` [Proquint RFC](rfcs/Proquint.md), which is the [original spec](https://arxiv.org/html/0901.4016) with an added prefix for legibility
-## Reserved
-
-The following codes are _reserved_ for (backwards) compatibility with existing systems.
-* `/` - Separator used by [multiaddr](https://github.com/multiformats/multiaddr).
-* `1` - Base58 encoded identity multihashes used by libp2p peer IDs.
-* `Q` - Base58 encoded sha2-256 multihashes used by libp2p/ipfs for peer IDs and CIDv0.
-
-If you'd like to switch a project over to multibase and would also like to
-reserve a prefix for compatibility, please file an issue.
## Status
Each multibase encoding has a status:
-* draft - these encodings have been proposed but are not widely implemented and may be removed.
-* candidate - these encodings are mature and widely implemented but may not be implemented by all implementations.
-* default - these encodings should be implemented by all implementations and are widely used.
+* reserved - for functional reasons or to avoid collisions with other multi-* registries, this registry cannot accept registrations at this code-point, and implementing an unregistered encoding there is discouraged for interoperability reasons.
+* experimental - these encodings have been proposed but are not widely implemented and may be removed.
+* draft - these encodings are mature and widely implemented but may not be implemented by all implementations.
+* final - these encodings should be implemented by all implementations and are widely used.
+* deprecated - this entry will likely be removed and reassigned in the future and is unlikely to become a `final` registration.
+
+### Reserved Terms
+
+The following codes are _reserved_ and cannot be registered in the `multibase` table. Note that all three of the Unicode entries, when expressed as the [unsigned varint] form of their Unicode code-points in UTF-8, correspond to widely-used entries in the [multiformats registry group], which could create confusion for legacy systems handling both binary and multibase-encoded structures from other multiformats. While the multibase registry is technically not part of the [multiformats registry group], these reservations minimize the risk of confusion when composing multiple multiformats in one data system.
+
+* `NUL` (U+0000) - Legacy data may be found with null-byte-prefixed binary structures mixed in among multibase-encoded ones in arrays of data, although conformant implementations are no longer required to support this.
+* `/` (U+002F) - Separator used by [multiaddr].
+* `1` (U+0031) - Base58-encoded identity multihashes used by libp2p peer IDs.
+* `Q` (U+0051) - Base58-encoded sha2-256 multihashes used by libp2p/ipfs for peer IDs and CIDv0.
## Multibase By Example
@@ -157,11 +160,15 @@ Yes. If i give you `"1214314321432165"` is that decimal? or hex? or something el
> Why the strange selection of codes / characters?
-The code values are selected such that they are included in the alphabets of the base they represent. For example, `f` is the base code for `base16 (hex)`, because `f` is in hex's 16 character alphabet. Note that the alphabets can be encoded in UTF8, and most can be encoded in ASCII. We have not found a case needing something else.
+The code values are selected such that they are included in the alphabets of the base they represent.
+For example, `f` is the base code for `base16 (hex)`, because `f` is in hex's 16 character alphabet.
+Note that most of the alphabets used can be encoded in UTF-8, and most but not all can be encoded in ASCII.
+We have not yet found a case needing something else.
> Don't we have to agree on a table of base encodings?
-Yes, but we already have to agree on base encodings, so this is not hard. The table even leaves some room for custom encodings.
+Yes, but we already have to agree on base encodings, so this is not hard.
+The table even leaves some room for custom encodings and is intended to work both in contexts where the encodings are known or agreed on and in open-world or brownfield contexts where these may vary.
## Implementations:
@@ -188,16 +195,26 @@ Yes, but we already have to agree on base encodings, so this is not hard. The ta
## Disclaimers
-Warning: **obviously multibase changes the first character depending on the encoding**. Do not expect the value to be exactly the same. Remove the multibase prefix before using the value.
+Warning: **obviously multibase changes the first character depending on the encoding**.
+Do not expect the value to be exactly the same.
+Remove the multibase prefix before using the value.
## Contribute
-Contributions welcome. Please check out [the issues](https://github.com/multiformats/multibase/issues).
+Contributions welcome.
+Please check out [the issues](https://github.com/multiformats/multibase/issues) and read the [contributing document](https://github.com/multiformats/multiformats/blob/master/contributing.md) for the greater multiformats project before opening your first issue, as the workflow and the relation of multibase to the greater project both benefit from this context.
+The contributing document also has more information on how we work, and about contributing in general.
-Check out our [contributing document](https://github.com/multiformats/multiformats/blob/master/contributing.md) for more information on how we work, and about contributing in general. Please be aware that all interactions related to multiformats are subject to the IPFS [Code of Conduct](https://github.com/ipfs/community/blob/master/code-of-conduct.md).
-
-Small note: If editing the README, please conform to the [standard-readme](https://github.com/RichardLitt/standard-readme) specification.
+If you'd like to switch a project over to multibase, whether by creating a new multibase implementation or building on one of those listed above, please file an issue in this repository using the "Interested in implementing" issue template.
+If you would also like to reserve a prefix for compatibility, please file a separate issue in this repository using the "New Registration" issue template.
## License
-This repository is only for documents. All of these are licensed under the [CC-BY-SA 3.0](https://ipfs.io/ipfs/QmVreNvKsQmQZ83T86cWSjPu2vR3yZHGPm5jnxFuunEB9u) license © 2016 Protocol Labs Inc. Any code is under a [MIT](LICENSE) © 2016 Protocol Labs Inc.
+This repository is only for documents.
+All of these are licensed under the [CC-BY-SA 3.0](https://ipfs.io/ipfs/QmVreNvKsQmQZ83T86cWSjPu2vR3yZHGPm5jnxFuunEB9u) license © 2016 Protocol Labs Inc.
+Any code is under the [MIT](LICENSE) license © 2016 Protocol Labs Inc.
+
+[multiaddr]: https://github.com/multiformats/multiaddr
+[multiformats registry group]: https://github.com/multiformats/multicodec/blob/master/table.csv
+[unsigned varint]: https://github.com/multiformats/unsigned-varint
+[code point]: https://infra.spec.whatwg.org/#code-points
\ No newline at end of file
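
The prefixing scheme described in the README changes above (a single code point prepended to the encoded text) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the `CODECS` subset, the `RESERVED` set, and the helper names `mb_encode` and `mb_decode` are made up for this example, and only `base16` and unpadded `base64` are covered.

```python
import base64
from typing import Tuple


def _b64_nopad_encode(data: bytes) -> str:
    # The "m" entry is RFC4648 base64 without padding, so strip any "=".
    return base64.b64encode(data).decode("ascii").rstrip("=")


def _b64_nopad_decode(text: str) -> bytes:
    # Restore the padding that the unpadded encoding omits before decoding.
    return base64.b64decode(text + "=" * (-len(text) % 4))


# Hypothetical subset of the multibase table: prefix -> (name, encoder, decoder).
CODECS = {
    "f": ("base16", lambda data: data.hex(), bytes.fromhex),
    "m": ("base64", _b64_nopad_encode, _b64_nopad_decode),
}

# Code points the table marks as reserved (assumed handling: reject them).
RESERVED = {"\x00", "1", "Q", "/"}


def mb_encode(prefix: str, data: bytes) -> str:
    """Encode data and prepend the code point that identifies the base."""
    _, encode, _ = CODECS[prefix]
    return prefix + encode(data)


def mb_decode(text: str) -> Tuple[str, bytes]:
    """Split off the first code point (not the first byte) and decode the rest."""
    if not text:
        raise ValueError("empty string has no multibase prefix")
    prefix, payload = text[0], text[1:]
    if prefix in RESERVED:
        raise ValueError(f"prefix {prefix!r} is reserved, not a base encoding")
    if prefix not in CODECS:
        raise ValueError(f"prefix {prefix!r} is not in this sketch's table")
    name, _, decode = CODECS[prefix]
    return name, decode(payload)
```

Because the sketch works on Python strings, `text[0]` is a whole code point regardless of how the text was serialized to bytes, which matches the note that prefixes are encoding agnostic. For instance, `mb_decode("f796573206d616e692021")` would yield the `base16` name together with the bytes of `yes mani !`.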
diff --git a/multibase.csv b/multibase.csv
index 8ba3b06..2472ed9 100644
--- a/multibase.csv
+++ b/multibase.csv
@@ -1,26 +1,29 @@
-encoding, code, description, status
-identity, 0x00, 8-bit binary (encoder and decoder keeps data unmodified), default
-base2, 0, Binary (01010101), candidate
-base8, 7, Octal, draft
-base10, 9, Decimal, draft
-base16, f, Hexadecimal (lowercase), default
-base16upper, F, Hexadecimal (uppercase), default
-base32hex, v, RFC4648 case-insensitive - no padding - highest char, candidate
-base32hexupper, V, RFC4648 case-insensitive - no padding - highest char, candidate
-base32hexpad, t, RFC4648 case-insensitive - with padding, candidate
-base32hexpadupper, T, RFC4648 case-insensitive - with padding, candidate
-base32, b, RFC4648 case-insensitive - no padding, default
-base32upper, B, RFC4648 case-insensitive - no padding, default
-base32pad, c, RFC4648 case-insensitive - with padding, candidate
-base32padupper, C, RFC4648 case-insensitive - with padding, candidate
-base32z, h, z-base-32 (used by Tahoe-LAFS), draft
-base36, k, Base36 [0-9a-z] case-insensitive - no padding, draft
-base36upper, K, Base36 [0-9a-z] case-insensitive - no padding, draft
-base58btc, z, Base58 bitcoin, default
-base58flickr, Z, Base58 flicker, candidate
-base64, m, RFC4648 no padding, default
-base64pad, M, RFC4648 with padding - MIME encoding, candidate
-base64url, u, RFC4648 no padding, default
-base64urlpad, U, RFC4648 with padding, default
-proquint, p, Proquint (https://arxiv.org/html/0901.4016), draft
-base256emoji, 🚀, Base256 with custom alphabet using variable-sized-codepoints, draft
\ No newline at end of file
+Unicode, character, encoding, description, status
+U+0000, NUL, none, (No base encoding), reserved
+U+0030, 0, base2, Binary (01010101), experimental
+U+0031, 1, none, (No base encoding), reserved
+U+0037, 7, base8, Octal, draft
+U+0039, 9, base10, Decimal, draft
+U+0066, f, base16, Hexadecimal (lowercase), final
+U+0046, F, base16upper, Hexadecimal (uppercase), final
+U+0076, v, base32hex, RFC4648 case-insensitive - no padding - highest char, experimental
+U+0056, V, base32hexupper, RFC4648 case-insensitive - no padding - highest char, experimental
+U+0074, t, base32hexpad, RFC4648 case-insensitive - with padding, experimental
+U+0054, T, base32hexpadupper, RFC4648 case-insensitive - with padding, experimental
+U+0062, b, base32, RFC4648 case-insensitive - no padding, final
+U+0042, B, base32upper, RFC4648 case-insensitive - no padding, final
+U+0063, c, base32pad, RFC4648 case-insensitive - with padding, draft
+U+0043, C, base32padupper, RFC4648 case-insensitive - with padding, draft
+U+0068, h, base32z, z-base-32 (used by Tahoe-LAFS), draft
+U+006b, k, base36, Base36 [0-9a-z] case-insensitive - no padding, draft
+U+004b, K, base36upper, Base36 [0-9a-z] case-insensitive - no padding, draft
+U+007a, z, base58btc, Base58 Bitcoin, final
+U+005a, Z, base58flickr, Base58 Flickr, experimental
+U+006d, m, base64, RFC4648 no padding, final
+U+004d, M, base64pad, RFC4648 with padding - MIME encoding, experimental
+U+0075, u, base64url, RFC4648 no padding, final
+U+0055, U, base64urlpad, RFC4648 with padding, final
+U+0070, p, proquint, Proquint (https://arxiv.org/html/0901.4016), experimental
+U+0051, Q, none, (no base encoding), reserved
+U+002F, /, none, (no base encoding), reserved
+U+1F680, 🚀, base256emoji, base256 with custom alphabet using variable-sized-codepoints, experimental
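
Because each row of the new table layout carries both a `Unicode` column and a `character` column, the two can be cross-checked mechanically. The following is a hedged sketch of such a check in Python. `check_table` is a hypothetical helper, not part of any multiformats tooling, and it assumes comma-separated rows with no commas inside fields and `NUL` as the only spelled-out character name.

```python
import csv
import io
from typing import List


def check_table(csv_text: str) -> List[str]:
    """Return human-readable problems found in a multibase-style CSV table."""
    problems = []
    rows = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(rows)
    for line_no, row in enumerate(rows, start=2):
        if not row:
            continue  # ignore blank lines
        row = [cell.strip() for cell in row]
        if len(row) != len(header):
            # A missing comma merges two columns and shows up as a short row.
            problems.append(
                f"line {line_no}: expected {len(header)} fields, got {len(row)}"
            )
            continue
        code, char = row[0], row[1]
        if char == "NUL":
            char = "\x00"  # the table spells out the null code point
        if len(char) != 1:
            problems.append(f"line {line_no}: {char!r} is not a single code point")
            continue
        expected = f"U+{ord(char):04X}"
        if code.upper() != expected:
            problems.append(
                f"line {line_no}: {code} does not match {char!r} (expected {expected})"
            )
    return problems
```

Run against the contents of `multibase.csv`, an empty result would indicate that every row has the expected number of columns and that each `Unicode` value agrees with its `character`.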