From bbee2b1a9fba47ddf4ddbed7cc26b017c3afade7 Mon Sep 17 00:00:00 2001
From: Steven Allen
Date: Wed, 30 Aug 2017 13:39:24 -0700
Subject: [PATCH 1/2] Update the spec from the implementation.

1. CIDv0 only supports SHA256 multihashes.
2. In CIDv0, the multibase *can* be specified but defaults to base58btc.

This commit also describes the proper algorithm for decoding CIDs as it's
non-obvious.

Fixes #11
---
 README.md | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 18c52b1..8a75a80 100644
--- a/README.md
+++ b/README.md
@@ -89,7 +89,7 @@ CIDv0 is a backwards-compatible version, where:
 - the `multibase` is always `base58btc` and implicit (not written)
 - the `multicodec` is always `protobuf-mdag` and implicit (not written)
 - the `cid-version` is always `cidv0` and implicit (not written)
-- the `multihash` is written as is.
+- the `multihash` is written as is but is always a full (length 32) sha256 hash.
 
 ```
 cidv0 ::=
@@ -103,6 +103,29 @@ See the section: [How does it work? - Protocol Description](#how-does-it-work-pr
 ::=
 ```
 
+## Decoding Algorithm
+
+To decode a CID, follow the following algorithm:
+
+1. If it's a string (ASCII/UTF-8):
+  * If it is 46 characters long and starts with `Qm...`, it's a CIDv0. Decode it as base58btc and:
+    * The CID's multihash is the decoded CID.
+    * The CID's multicodec is DagProtobuf.
+    * The CID's version is 0.
+  * Otherwise, decode it according to the multibase spec.
+2. Given a (binary) CID (`cid`):
+  * If it's 34 bytes long with the leading bytes `[0x12, 0x20, ...]`, it's a CIDv0.
+    * The CID's multihash is `cid`.
+    * The CID's multicodec is DagProtobuf
+    * The CID's version is 0.
+  * Otherwise, let `N` be the first varint in `cid`. This is the CID's version.
+    * If `N == 1` (CIDv1):
+      * THe CID's multicodec is the second varint in `cid`
+      * The CID's multihash is the rest of the `cid` (after the second varint).
+      * The CID's version is 1.
+    * If `N <= 0`, the CID is malformed.
+    * If `N > 1`, the CID version is reserved.
+
 ## Implementations
 
 - [go-cid](https://github.com/ipfs/go-cid)

From 177f8d308d5060c7e5b2049c0b428d215fc9656c Mon Sep 17 00:00:00 2001
From: Steven Allen
Date: Thu, 31 Aug 2017 20:27:50 -0700
Subject: [PATCH 2/2] Update the CID decoding algorithm to explicitly forbid multibase encoded CIDv0s

---
 README.md | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 8a75a80..216bbd6 100644
--- a/README.md
+++ b/README.md
@@ -108,11 +108,10 @@ See the section: [How does it work? - Protocol Description](#how-does-it-work-pr
 To decode a CID, follow the following algorithm:
 
 1. If it's a string (ASCII/UTF-8):
-  * If it is 46 characters long and starts with `Qm...`, it's a CIDv0. Decode it as base58btc and:
-    * The CID's multihash is the decoded CID.
-    * The CID's multicodec is DagProtobuf.
-    * The CID's version is 0.
-  * Otherwise, decode it according to the multibase spec.
+  * If it is 46 characters long and starts with `Qm...`, it's a CIDv0. Decode it as base58btc and continue to step 2.
+  * Otherwise, decode it according to the multibase spec and:
+    * If the first decoded byte is 0x12, return an error. CIDv0 CIDs may not be multibase encoded and there will be no CIDv18 (0x12 = 18) to prevent ambiguity with decoded CIDv0s.
+    * Otherwise, you now have a binary CID. Continue to step 2.
 2. Given a (binary) CID (`cid`):
   * If it's 34 bytes long with the leading bytes `[0x12, 0x20, ...]`, it's a CIDv0.
     * The CID's multihash is `cid`.
@@ -120,7 +119,7 @@ To decode a CID, follow the following algorithm:
     * The CID's version is 0.
   * Otherwise, let `N` be the first varint in `cid`. This is the CID's version.
     * If `N == 1` (CIDv1):
-      * THe CID's multicodec is the second varint in `cid`
+      * The CID's multicodec is the second varint in `cid`
       * The CID's multihash is the rest of the `cid` (after the second varint).
       * The CID's version is 1.
     * If `N <= 0`, the CID is malformed.
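The decoding algorithm as it stands after both patches can be sketched roughly as follows. This is an illustration only, not go-cid's implementation, and it rests on several assumptions:

- The names `Cid`, `decode_string`, `decode_binary`, `b58decode` and `read_uvarint` are hypothetical.
- Of all multibase prefixes, only `z` (base58btc) is handled.
- The multihash itself is not validated.
- "DagProtobuf" is taken to be the dag-pb multicodec, `0x70`.

```python
from dataclasses import dataclass
from typing import Tuple

# Bitcoin base58 alphabet, as used by base58btc.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# Assumption: "DagProtobuf" in the spec text is the dag-pb multicodec, 0x70.
DAG_PROTOBUF = 0x70


@dataclass
class Cid:
    """Hypothetical container for the three decoded fields."""
    version: int
    codec: int
    multihash: bytes


def b58decode(s: str) -> bytes:
    """Decode a base58btc string (no multibase prefix) into bytes."""
    n = 0
    for ch in s:
        idx = B58_ALPHABET.find(ch)
        if idx < 0:
            raise ValueError(f"invalid base58 character {ch!r}")
        n = n * 58 + idx
    # Each leading '1' stands for one leading zero byte.
    zeros = len(s) - len(s.lstrip("1"))
    return b"\x00" * zeros + n.to_bytes((n.bit_length() + 7) // 8, "big")


def read_uvarint(buf: bytes, pos: int) -> Tuple[int, int]:
    """Read an unsigned LEB128 varint at pos and return (value, new_pos).

    Sketch only: no maximum-length or minimal-encoding checks.
    """
    value = shift = 0
    while pos < len(buf):
        b = buf[pos]
        pos += 1
        value |= (b & 0x7F) << shift
        if b < 0x80:
            return value, pos
        shift += 7
    raise ValueError("malformed CID: truncated varint")


def decode_binary(cid: bytes) -> Cid:
    """Step 2: decode a binary CID."""
    if len(cid) == 34 and cid[0] == 0x12 and cid[1] == 0x20:
        # CIDv0: the whole value is a sha2-256 multihash.
        return Cid(version=0, codec=DAG_PROTOBUF, multihash=cid)
    version, pos = read_uvarint(cid, 0)
    if version == 0:
        # Varints are unsigned, so "N <= 0" can only mean N == 0 here.
        raise ValueError("malformed CID: version 0 must be a bare sha256 multihash")
    if version > 1:
        raise ValueError(f"CID version {version} is reserved")
    codec, pos = read_uvarint(cid, pos)
    return Cid(version=1, codec=codec, multihash=cid[pos:])


def decode_string(s: str) -> Cid:
    """Step 1: decode a string CID, then hand the bytes to step 2."""
    if len(s) == 46 and s.startswith("Qm"):
        return decode_binary(b58decode(s))
    # Sketch limitation: only the 'z' (base58btc) multibase prefix is handled.
    if not s.startswith("z"):
        raise ValueError("multibase prefix not supported by this sketch")
    raw = b58decode(s[1:])
    if raw[:1] == b"\x12":
        # CIDv0 may not be multibase encoded (and CIDv18 will never exist).
        raise ValueError("CIDv0 may not be multibase encoded")
    return decode_binary(raw)
```

For example, `decode_binary(bytes([0x12, 0x20]) + bytes(32))` yields a version-0 CID with the dag-pb codec, while a `z`-prefixed string whose first decoded byte is `0x12` is rejected. The string path deliberately funnels into `decode_binary` so that the CIDv0/CIDv1 distinction lives in one place, mirroring how step 1 now says "continue to step 2".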