From 246742f0800ada5bc5c969ff6a1f788298052cfa Mon Sep 17 00:00:00 2001
From: sg495
Date: Wed, 20 Oct 2021 12:00:12 +0100
Subject: [PATCH 1/6] Fix incorrect test bytestring in case_insensitivity.csv

Closes #84
---
 tests/case_insensitivity.csv | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/case_insensitivity.csv b/tests/case_insensitivity.csv
index 3037d9c..e824ff5 100644
--- a/tests/case_insensitivity.csv
+++ b/tests/case_insensitivity.csv
@@ -1,4 +1,4 @@
-non-canonical encoding, "hello world"
+non-canonical encoding, "yes mani !"
 base16, "f68656c6c6f20776F726C64"
 base16upper, "F68656c6c6f20776F726C64"
 base32, "bnbswy3dpeB3W64TMMQ"

From faf8272a970584eca27e74278e9f55c26675379e Mon Sep 17 00:00:00 2001
From: sg495
Date: Wed, 20 Oct 2021 12:19:29 +0100
Subject: [PATCH 2/6] Clarified pro-quints RFC, added RFC notes to table.

I have edited the multibase proquint RFC to clarify how the full prefix works, and I have added an explicit reference to the RFC in the table (because the proquint data encoded by multibase is different from the data encoded by the original proquint spec). I have also added an explicit reference to the RFC for base8 in the table, because base8 according to the multibase RFC is significantly different from the base8 provided by other standard implementations (e.g. Python).
---
 multibase.csv | 4 ++--
 rfcs/PRO-QUINT.md | 15 ++++++++++++---
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/multibase.csv b/multibase.csv
index 33f4f09..6a934fc 100644
--- a/multibase.csv
+++ b/multibase.csv
@@ -1,7 +1,7 @@
 encoding, code, description, status
 identity, 0x00, 8-bit binary (encoder and decoder keeps data unmodified), default
 base2, 0, binary (01010101), candidate
-base8, 7, octal, draft
+base8, 7, octal (see RFC), draft
 base10, 9, decimal, draft
 base16, f, hexadecimal, default
 base16upper, F, hexadecimal, default
@@ -22,4 +22,4 @@ base64, m, rfc4648 no padding,
 base64pad, M, rfc4648 with padding - MIME encoding, candidate
 base64url, u, rfc4648 no padding, default
 base64urlpad, U, rfc4648 with padding, default
-proquint, p, PRO-QUINT https://arxiv.org/html/0901.4016, draft
+proquint, p, pro-quint https://arxiv.org/html/0901.4016 (see RFC), draft
diff --git a/rfcs/PRO-QUINT.md b/rfcs/PRO-QUINT.md
index 5d59aa4..64de275 100644
--- a/rfcs/PRO-QUINT.md
+++ b/rfcs/PRO-QUINT.md
@@ -1,7 +1,16 @@
 # PRO-QUINT
 
-See: https://arxiv.org/html/0901.4016 ([/ipfs/bafybeib5jsyi5igjwhi7hzkfebpvnq2ykbwpxeaaxlkyfyxqvcecoao4qa](https://dweb.link/ipfs/bafybeib5jsyi5igjwhi7hzkfebpvnq2ykbwpxeaaxlkyfyxqvcecoao4qa)).
+For the original proquint specification, see: https://arxiv.org/html/0901.4016 ([/ipfs/bafybeib5jsyi5igjwhi7hzkfebpvnq2ykbwpxeaaxlkyfyxqvcecoao4qa](https://dweb.link/ipfs/bafybeib5jsyi5igjwhi7hzkfebpvnq2ykbwpxeaaxlkyfyxqvcecoao4qa)).
 
-While the multibase prefix is `p`, the "full" prefix is actually `pro-`. This way, proquints are always easily pronouncable. For example
+The multibase prefix for proquints is the character `p`. The base encoded data is the encoded data according to the original specification, with an additional `ro-` prefix:
 
-`127.0.0.1`, as a multibase proquint encoded number, is `pro-lusab-babad`.
+```
+
+```
+
+The resulting full prefix for the actual proquint encoded data is `pro-`, making multibase-encoded proquints easily pronounceable.
+For example, the proquint encoding of the bytestring `[127, 0, 0, 1]` (the data for the IPv4 address `127.0.0.1`) is `lusab-babad`, so the corresponding multibase-encoded proquint bytestring is:
+
+```
+pro-lusab-babad
+```

From dd8f0f547aa2ed56d6420ca0f7bcfba0da378e0a Mon Sep 17 00:00:00 2001
From: sg495
Date: Wed, 20 Oct 2021 12:26:36 +0100
Subject: [PATCH 3/6] Update README.md

Closes #83
---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 6fddb64..a652e54 100644
--- a/README.md
+++ b/README.md
@@ -59,7 +59,7 @@ The current multibase table is [here](multibase.csv):
 encoding, code, description, status
 identity, 0x00, 8-bit binary (encoder and decoder keeps data unmodified), default
 base2, 0, binary (01010101), candidate
-base8, 7, octal, draft
+base8, 7, octal (see RFC), draft
 base10, 9, decimal, draft
 base16, f, hexadecimal, default
 base16upper, F, hexadecimal, default
@@ -80,10 +80,10 @@ base64, m, rfc4648 no padding,
 base64pad, M, rfc4648 with padding - MIME encoding, candidate
 base64url, u, rfc4648 no padding, default
 base64urlpad, U, rfc4648 with padding, default
-proquint, p, PRO-QUINT https://arxiv.org/html/0901.4016, draft
+proquint, p, pro-quint https://arxiv.org/html/0901.4016 (see RFC), draft
 ```
 
-**NOTE:** Multibase-prefixes are encoding agnostic. "z" is "z", not 0x7a ("z" encoded as ASCII/UTF-8). For example, in UTF-32, "z" would be `[0x7a, 0x00, 0x00, 0x00]`.
+**NOTE:** Multibase-prefixes are encoding agnostic: "z" is "z", not 0x7a ("z" encoded as ASCII/UTF-8). For example, in UTF-32, "z" would be `[0x7a, 0x00, 0x00, 0x00]`. In particular, the multibase code 0x00 listed for the identity encoding is the non-printable ASCII/UTF-8 character with codepoint 0x00, while the multibase code 0 listed for base2 is the ASCII/UTF-8 character "0" (which has codepoint 0x30).
 
 ## Reserved
 

From bf712035f404695fe10fbbdde35812a83b0a10b8 Mon Sep 17 00:00:00 2001
From: sg495
Date: Wed, 20 Oct 2021 12:30:22 +0100
Subject: [PATCH 4/6] Update Base2.md

Closes #80
---
 rfcs/Base2.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/rfcs/Base2.md b/rfcs/Base2.md
index 5352f83..df510c1 100644
--- a/rfcs/Base2.md
+++ b/rfcs/Base2.md
@@ -16,7 +16,7 @@ order, where each byte of the array is set to the character `1`, if the
 corresponding bit in the byte is set, and the character `0` if the
 corresponding bit is unset.
 
-For example, `[0x58, 0x59, 0x60]` can be converted to multibase base2 as
+For example, `[0x58, 0x59, 0x5a]` can be converted to multibase base2 as
 follows:
 
 ```

From b2cec76bb03f3a78a4be3d84a81d27ee9dd1d50c Mon Sep 17 00:00:00 2001
From: sg495
Date: Wed, 20 Oct 2021 13:13:57 +0100
Subject: [PATCH 5/6] Added links to specs, created an explicit identity spec for clarification

Closes #76
---
 README.md | 17 +++++++++++++++++
 rfcs/identity.md | 41 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 58 insertions(+)
 create mode 100644 rfcs/identity.md

diff --git a/README.md b/README.md
index a652e54..7b1b894 100644
--- a/README.md
+++ b/README.md
@@ -85,6 +85,23 @@ proquint, p, pro-quint https://arxiv.org/html/0901.4016 (see RFC),
 
 **NOTE:** Multibase-prefixes are encoding agnostic: "z" is "z", not 0x7a ("z" encoded as ASCII/UTF-8). For example, in UTF-32, "z" would be `[0x7a, 0x00, 0x00, 0x00]`. In particular, the multibase code 0x00 listed for the identity encoding is the non-printable ASCII/UTF-8 character with codepoint 0x00, while the multibase code 0 listed for base2 is the ASCII/UTF-8 character "0" (which has codepoint 0x30).
 
+## Specifications
+
+Below is a list of specs for the underlying base encodings:
+
+- `identity` [identity RFC](rfcs/identity.md)
+- `base2` [base2 RFC](rfcs/Base2.md)
+- `base8` [base8 RFC](rfcs/Base8.md), similar to [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
+- `base10` [base10 RFC](rfcs/Base10.md)
+- `base36` [base36 RFC](rfcs/Base36.md)
+- `base16*` [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
+- `base32*` (except for `base32z`) [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
+- `base32z` [human-oriented base32 spec](https://philzimmermann.com/docs/human-oriented-base-32-encoding.txt)
+- `base64*` [rfc4648](https://datatracker.ietf.org/doc/html/rfc4648.html)
+- `base58btc` https://datatracker.ietf.org/doc/html/draft-msporny-base58-02
+- `base58flickr` https://datatracker.ietf.org/doc/html/draft-msporny-base58-02, but using alphabet `123456789abcdefghijkmnopqrstuvwxyzABCDEFGHJKLMNPQRSTUVWXYZ`
+- `proquint` [proquint RFC](rfcs/PRO-QUINT.md), which is the [original spec](https://arxiv.org/html/0901.4016) with an added prefix for legibility
+
 ## Reserved
 
 The following codes are _reserved_ for backwards compatibility with existing systems.
diff --git a/rfcs/identity.md b/rfcs/identity.md
new file mode 100644
index 0000000..7880bba
--- /dev/null
+++ b/rfcs/identity.md
@@ -0,0 +1,41 @@
+# Identity
+
+The multibase identity prefix is the non-printable ASCII/UTF-8 character with codepoint 0x00. Note that this is different from the multibase prefix 0 listed for base2, which is the ASCII/UTF-8 character "0" with codepoint 0x30.
+
+
+## Encoding
+
+A byte array `b` is encoded by converting it to the Unicode string `s` having as its UTF-8 bytes the byte array `b` prefixed with a single zero byte.
+
+Below is a minimal implementation in Python, for clarification:
+
+```py
+def encode_identity(b: bytes) -> str:
+    utf8_bytes = b"\x00"+b
+    return utf8_bytes.decode("utf-8")
+```
+
+## Decoding
+
+A Unicode string `s` is decoded by obtaining its UTF-8 bytes and dropping the leading byte. The UTF-8 byte array must be non-empty and the leading byte must be zero.
+
+Below is a minimal implementation in Python, for clarification:
+
+```py
+def decode_identity(s: str) -> bytes:
+    utf8_bytes = s.encode("utf-8")
+    if not utf8_bytes or utf8_bytes[0] != 0:
+        raise ValueError("String not identity-encoded.")
+    return utf8_bytes[1:]
+```
+
+## Examples
+
+```py
+>>> encode_identity(bytes([0x31, 0x63, 0x57]))
+'\x001cW'
+>>> decode_identity("\x001cW")
+b'1cW'
+>>> list(decode_identity("\x001cW"))
+[49, 99, 87] # [0x31, 0x63, 0x57]
+```

From 3a414e6a88f7d27b7d20d364a26869c93cd16f5b Mon Sep 17 00:00:00 2001
From: sg495
Date: Thu, 21 Jul 2022 14:18:44 +0100
Subject: [PATCH 6/6] Update README.md

Added the Python module `multiformats` to the list of implementations.
---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 7b1b894..942ba9b 100644
--- a/README.md
+++ b/README.md
@@ -177,6 +177,7 @@ Yes, but we already have to agree on base encodings, so this is not hard. The ta
 - [scala-multibase](//github.com/fluency03/scala-multibase)
 - [cpp-multibase](//github.com/cpp-ipfs/cpp-multibase)
 - [ruby-multibase](//github.com/sleeplessbyte/ruby-multibase)
+- `multibase` sub-module of Python module [multiformats](//github.com/hashberg-io/multiformats)
 - [Add yours here!](//github.com/multiformats/multibase/edit/master/README.md)
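
Note on PATCH 2/6: the proquint wording in `rfcs/PRO-QUINT.md` (prefix `p`, then `ro-`, then the proquint data from the original specification) can be checked with a short Python sketch in the same style as the identity RFC. This is only an illustration and not part of the series: the names `proquint_word` and `encode_multibase_proquint` are hypothetical, the consonant and vowel tables follow the original proquint specification, and rejecting odd-length input is an assumption made here because the RFC text in this series does not cover that case.

```python
# Alphabets from the original proquint specification:
# 16 consonants (4 bits each) and 4 vowels (2 bits each).
CONSONANTS = "bdfghjklmnprstvz"
VOWELS = "aiou"


def proquint_word(word: int) -> str:
    """Encode one 16-bit word as con-vow-con-vow-con, most significant bits first."""
    return (
        CONSONANTS[(word >> 12) & 0xF]
        + VOWELS[(word >> 10) & 0x3]
        + CONSONANTS[(word >> 6) & 0xF]
        + VOWELS[(word >> 4) & 0x3]
        + CONSONANTS[word & 0xF]
    )


def encode_multibase_proquint(data: bytes) -> str:
    """Hypothetical helper: multibase prefix 'p', then 'ro-', then the proquints."""
    if len(data) % 2 != 0:
        # Assumption for this sketch only: odd-length input is rejected.
        raise ValueError("This sketch only handles an even number of bytes.")
    words = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    return "pro-" + "-".join(proquint_word(w) for w in words)


print(encode_multibase_proquint(bytes([127, 0, 0, 1])))  # pro-lusab-babad
```

The output matches the `pro-lusab-babad` example given in the patched RFC for the bytestring `[127, 0, 0, 1]`.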