Metadata for offline signers #46

Conversation
## Explanation

Detailed description of metadata shortening and digest process is provided in [metadata-shortener](https://github.com/Alzymologist/metadata-shortener) crate (see `cargo doc --open` and examples).
The point of writing an RFC is to describe the design in plain English here.
You can't just link to a third party repository that you modify whenever you want. The fellowship votes to accept specific design details.
Sure, I'll be adding the detailed description soon. This is a first draft, mostly for the rest of the team to view.
### Digest

1. Blake3 hash is computed for each chunk of the modular short metadata registry.
Given that Polkadot and Substrate use blake2 basically everywhere, is there a reason to prefer blake3?
We're not going to migrate from blake2 to blake3 any time soon, so whatever program implements this would most likely need to support both blake2 and blake3.
The two bottleneck tools support blake3 nicely (I understand that Ledger has hardware support or something comparably low-footprint). It is somewhat faster than blake2 even on a single core, it is better defined, and we expect this standard to ideally outlive several metadata revisions (as it has far fewer features than metadata), so it might live long enough to see an ecosystem transition. We've discussed this overhead with Zondax, and both teams see it as an advantage on our cold devices; adding blake3 to Polkadot Vault is trivial, and we have no other tools that need this functionality at the moment. Our best guess is that if Ledger and Kampela can run both blakes, everything released later surely will too.
### Digest

1. Blake3 hash is computed for each chunk of the modular short metadata registry.
2. Hashes are sorted and constructed into a static Merkle tree, as implemented in the `merkle_cbt` crate, using the blake3 digest of concatenated child node values for merging.
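The two steps above can be sketched as follows. This is a minimal illustration, not the normative algorithm: it uses Python's stdlib `hashlib.blake2b` as a stand-in for blake3 (which is not in the standard library), and a simple pairwise fold rather than `merkle_cbt`'s exact complete-binary-tree layout.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in for blake3: the RFC specifies blake3, but Python's stdlib
    # only ships blake2; swap in the `blake3` package for a real implementation.
    return hashlib.blake2b(data, digest_size=32).digest()

def merkle_root(chunks: list[bytes]) -> bytes:
    # Step 1-2: hash every chunk to get the leaves, then sort the hashes.
    level = sorted(h(c) for c in chunks)
    # Merge pairs by hashing concatenated children until one node remains.
    # (merkle_cbt uses a complete-binary-tree layout; this pairwise fold
    # is a simplified illustration, not a byte-exact reproduction of it.)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # pad odd levels (illustrative choice)
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

root = merkle_root([b"chunk-a", b"chunk-b", b"chunk-c"])
print(root.hex())
```

Because the leaf hashes are sorted before tree construction, the root is independent of the order in which chunks are supplied.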
Same remark. We want the format to be clear and precise, we can't say "we'll use whatever format this third party library does". There are implementers that don't even use Rust.
Yeah, I was also expecting that we get the details/spec of how the merkelization works.
Of course, I'll be adding the algorithm to the spec soon.
### Metadata descriptor

Values for metadata shortening protocol version, `ExtrinsicMetadata`, SCALE-encoded `spec_version` and `spec_name` Strings, SCALE-encoded base58 prefix, SCALE-encoded decimals value, SCALE-encoded token unit String, should be prepared and combined as metadata descriptor.
`base58prefix` is encoded as `u16` and `decimals` as `u8` in the reference implementation. It seems that fixed-width uints work better here since their size is known.
`base58prefix` is `u16` according to the spec, and using more than 255 decimals is not rational since the max value for `u256` is about 10^77.
Suggested change:
> Values for metadata shortening protocol version, `ExtrinsicMetadata`, SCALE-encoded `spec_version` and `spec_name` Strings, `u16` base58 prefix, `u8` decimals value, SCALE-encoded token unit String, should be prepared and combined as metadata descriptor.
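Under the suggested fixed-width encoding, assembling the descriptor could look like the sketch below. This is an assumption-laden illustration, not the normative layout: the field values are hypothetical, and only the small compact-integer modes of SCALE are implemented.

```python
def scale_compact(n: int) -> bytes:
    # SCALE compact integer encoding; the single-byte, two-byte and four-byte
    # modes are enough for typical string lengths used here.
    if n < 1 << 6:
        return bytes([n << 2])
    if n < 1 << 14:
        return ((n << 2) | 0b01).to_bytes(2, "little")
    if n < 1 << 30:
        return ((n << 2) | 0b10).to_bytes(4, "little")
    raise ValueError("big-integer mode not needed for this sketch")

def scale_str(s: str) -> bytes:
    # SCALE String: compact length prefix followed by the UTF-8 bytes.
    raw = s.encode()
    return scale_compact(len(raw)) + raw

# Hypothetical descriptor, combined by plain concatenation in the field
# order the quoted paragraph lists (values here are examples only).
descriptor = (
    bytes([1])                   # u8 shortening protocol version (assumed)
    + scale_str("9430")          # spec_version as a SCALE String
    + scale_str("polkadot")      # spec_name
    + (0).to_bytes(2, "little")  # u16 base58 prefix (0 = Polkadot)
    + bytes([10])                # u8 decimals (10 for DOT)
    + scale_str("DOT")           # token unit String
)
print(descriptor.hex())
```

Note how the fixed-width `u16`/`u8` fields need no length prefix, which is the point the reviewer is making.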
### Metadata modularization

1. Types registry is stripped from `docs` fields.
2. Types records are separated into chunks, with enum variants being individual chunks differing by variant index; each chunk consisting of `id` (same as in full metadata registry) and SCALE-encoded `Type` description (reduced to 1-variant enum for enum variants).
Would be great to mention that 0-variant enums are treated as regular types (as 1-variant enums).
Suggested change:
> 2. Types records are separated into chunks, with enum variants being individual chunks differing by variant index; each chunk consisting of `id` (same as in full metadata registry) and SCALE-encoded 'Type' description (reduced to 1-variant enum for enum variants). Enums with 0 variants are treated as regular types.
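The splitting rule being discussed (including the 0-variant case) can be sketched like this. The dict shape is a hypothetical stand-in for `scale-info`'s `PortableRegistry`, which this example only imitates.

```python
def modularize(registry):
    chunks = []
    for record in registry:
        ty = record["type"]
        variants = ty.get("variants")
        if variants:
            # Non-empty enum: one chunk per variant, each reduced to a
            # 1-variant enum that keeps the original `id` and variant index.
            for v in variants:
                chunks.append({"id": record["id"],
                               "type": {**ty, "variants": [v]}})
        else:
            # Regular types AND 0-variant enums stay as a single chunk.
            chunks.append(record)
    return chunks

registry = [
    {"id": 0, "type": {"path": "u32"}},
    {"id": 1, "type": {"path": "RuntimeCall",
                       "variants": [{"index": 0, "name": "System"},
                                    {"index": 1, "name": "Balances"}]}},
]
print(len(modularize(registry)))  # 1 chunk for u32 + 2 variant chunks = 3
```

Each variant chunk shares the parent enum's `id`, so chunks are distinguished by variant index, matching the quoted text.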
Co-authored-by: Carlo Sala <[email protected]>
It'd be great to define how lemmas and indices are sorted when computing
Isn't it outside of RFC scope? This is not related to chain behaviour; it's purely on the cold protocol side. We can spec it out elsewhere.
Yep, it's probably outside of the chain's scope. Let's keep it in mind; it's actually important to have a standardized way of transmitting data.
@Slesarew still missing the changes proposed above!
English isn't my first language, but I can still tell some sentences could be improved to help readers understand them. For example, there are too many very long sentences: I got an out-of-memory exception in my head and needed to retry multiple times to successfully process a sentence.
I don't think I will be able to reimplement this work purely using the details provided by this RFC. Maybe more pseudo code will help.
- Metadata information that could be used in signable extrinsic decoding MUST be included in digest;
- Digest MUST be deterministic with respect to metadata;
- Digest MUST be cryptographically strong against pre-image, both first and second;
- Extra-metadata information necessary for extrinsic decoding and constant within runtime version MUST be included in digest;
I don't 100% understand this. Do you mean something like:
Additional information (on top of the metadata) that's required to decode extrinsics and runtime constants for a given runtime version must be included in the digest?
But what is that information? Shouldn't the metadata itself include everything?
Unfortunately, metadata does not contain all information. A few extra things like the base currency name ("DOT") and the base currency value in plancks are often (though not always) not included there. This is a known issue that required us to separately transfer metadata and "additional network specifications" to cold wallets in the past. Here we just combine those.
If the metadata does not contain all the necessary information, I would suggest just fixing the metadata. I raised a somewhat related issue before: paritytech/substrate#3686
This is a nice solution; however, for things like "units" we'll need more robust automation: those are not arbitrary values but items declared in the runtime. We'll need to make them a properly coded part of metadata, otherwise parachain upgrading will end up in a mess.
I'm all for fixing metadata issues, but I think this should be done one at a time. This particular issue, that units are not mentioned anywhere, was discussed for a long time and no effective solution was proposed. I think we should move on with this project; if a proper way to include units in metadata for all chains appears later, we'll keep wasting a couple of bytes in the transfer protocol on empty values and eventually remove this extra information when an upgrade to shortened metadata happens.
#### Reduce metadata size

Metadata should be stripped from parts that are not necessary to parse a signable extrinsic, then it should be separated into a finite set of self-descriptive chunks. Thus, a subset of chunks necessary for signable extrinsic decoding and rendering could be sent, possibly in small portions (ultimately - one at a time), to cold device together with proof.
Suggested change:
> Metadata should be stripped from parts that are not necessary to parse a signable extrinsic, then it should be separated into a finite set of self-descriptive chunks. Thus, a subset of chunks necessary for signable extrinsic decoding and rendering could be sent, possibly in small portions, to cold device together with proof.

I don't know why "one at a time" is "ultimately"; parallel transmission could be more efficient where applicable.
There should ideally be all varieties of options, to fit growing metadata into systems with as small a footprint as possible. There should also be options to send it in a bulk, data-efficient format where the transmission channel is the limiting factor. This will ensure that we are future-compatible with unpredictably developing hardware (and software) systems. The proposed solution solves this challenge.
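To make "sent to the cold device together with proof" concrete, here is a hedged sketch of verifying a single chunk against a known root. It uses stdlib `hashlib.blake2b` as a stand-in for the blake3 the RFC specifies, and an illustrative lemma layout rather than `merkle_cbt`'s actual proof format.

```python
import hashlib

def h(data: bytes) -> bytes:
    # blake2b as a stdlib stand-in for the blake3 specified by the RFC.
    return hashlib.blake2b(data, digest_size=32).digest()

def verify_chunk(chunk: bytes, lemmas: list[tuple[bytes, str]], root: bytes) -> bool:
    # lemmas: sibling hashes on the path from leaf to root, each tagged with
    # the side ("L" or "R") the sibling sits on. This layout is illustrative;
    # the real lemma/index encoding would be defined by the spec.
    node = h(chunk)
    for sibling, side in lemmas:
        node = h(sibling + node) if side == "L" else h(node + sibling)
    return node == root

# Tiny two-leaf tree: root = h(h(a) + h(b))
a, b = b"chunk-a", b"chunk-b"
root = h(h(a) + h(b))
print(verify_chunk(a, [(h(b), "R")], root))  # True
```

The cold device only ever needs the chunk, its sibling hashes, and the trusted root, which is why chunks can be streamed one at a time.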
To confirm my understanding is correct:

### Metadata descriptor

Values for metadata shortening protocol version, `ExtrinsicMetadata`, SCALE-encoded `spec_version` and `spec_name` Strings, `u16` base58 prefix, `u8` decimals value, SCALE-encoded token unit String, should be prepared and combined as metadata descriptor.
Suggested change:
> Values for metadata shortening protocol version, `ExtrinsicMetadata`, SCALE-encoded `spec_version` and `spec_name` Strings, `u16` base58 prefix, should be prepared and combined as metadata descriptor.

Not every chain has a token. Most chains have multiple tokens. Besides, why do you need them here?
Chains without a token would naturally have empty fields for the token name and zero as the unit value. This is how Polkadot Vault handles this now; nobody has come up with a better solution. These are not contained in metadata, so they must be transferred separately. It is not possible to decode a transaction without this information in the Polkadot network. If you have a better solution that would support:
- chains with units
- chains without units
- chains with multiple units

and be simple enough, let's discuss and integrate it.
I've now re-read this RFC multiple times. It uses a lot of terminology that I have no idea what it means (either generally or in this context), such as "token unit String", "modularization", "merging protocol", or "shortening". All these terms should really be introduced or explained.

In the middle of the RFC there's suddenly the word "leaves", but I don't think it ever introduces the fact that there's a trie. I understand that the metadata is split in a trie, but only because I've followed the development of this feature in the past; I don't think that's even mentioned in the RFC.

It would also have been nice to give a brief overview of what the metadata contains, rather than assume that the reader is familiar with it, because I don't think there is a single document online that describes the metadata in detail. This would make it possible to review what is left and what is stripped.

It seems that this RFC is just a collection of notes written down while the implementation was taking place, rather than an actual presentation and explanation of the design. It assumes that the reader is already familiar with the implementation, and in general makes no effort to be accessible. It's consequently almost impossible for me to properly review this.
I support what @tomaka is saying. IMO it requires much more information, like the actual data structures in pseudocode, etc.
I would say that the actual metadata is out of scope. However, we should explain what it is, as Pierre said, and then explain the structures that are put into the trie, etc.
Co-authored-by: Alexander Gryaznov <[email protected]>
Co-authored-by: Alexander Gryaznov <[email protected]>
Co-authored-by: Alexander Gryaznov <[email protected]>
Co-authored-by: Alexander Gryaznov <[email protected]>
* More details and pseudocode examples
* Minor corrections
This goes in a good direction. However, I still miss a real explanation of the types that are part of the shortened metadata, as they are crucial for the protocol described here.
The root of metadata computed by the cold device MAY be included into Signed Extensions; this way the transaction will pass as valid iff the hash of metadata as seen by the cold storage device is identical to the consensus hash of metadata, ensuring a fair signing protocol.

The Signed Extension representing the metadata digest is a single byte representing both digest value inclusion and shortening protocol version; this MUST be included in the Signed Extensions set. Depending on its value, a digest value is included as `additionalSigned` to signature computation according to the following specification:
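A rough illustration of the mechanism under discussion (constant values, field names, and layout are assumptions for the sketch, not the normative encoding): the one-byte mode travels with the extrinsic as signed-extension data, while the 32-byte digest joins only the `additionalSigned` portion that is covered by the signature.

```python
import hashlib

MODE_DISABLED = 0x00   # assumed value: digest not included
MODE_ENABLED  = 0x01   # assumed value: digest for shortening protocol v1 included

def signing_payload(call: bytes, extra: bytes, mode: int, digest: bytes) -> bytes:
    # The mode byte is part of the transmitted extrinsic; the digest itself is
    # only appended to the additionalSigned data, so it is signed over without
    # ever being sent on chain.
    additional_signed = digest if mode == MODE_ENABLED else b""
    payload = call + extra + bytes([mode]) + additional_signed
    # Substrate convention: payloads longer than 256 bytes are hashed before signing.
    return hashlib.blake2b(payload, digest_size=32).digest() if len(payload) > 256 else payload

digest = bytes(32)  # placeholder 32-byte metadata root
p = signing_payload(b"\x04\x00", b"", MODE_ENABLED, digest)
print(len(p))
```

A signer that disagrees on the metadata root produces a different payload, so the signature fails verification on chain; that is the "fair signing" guarantee.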
Not the entire digest is added to additional signed? Only the hash? The version will already be added to the "signed extension data" that gets signed.
I would also maybe drop `additionalSigned` and just explain that it gets added to the data that gets hashed as part of the extrinsic or not.
We add 1 byte that is the version/inclusion byte, and then we add the 32-byte digest to that part of the data. Not explaining that it is part of `additionalSigned` would just add to the unmeasurable amount of confusion around this obscure entity. I've lost count of the times I've been asked how Signed Extensions work, and I spent way too much time figuring it out myself first (probably massively pestering you as well, I don't remember). I'm pretty solid that this should stay here if we want any hope of more implementations appearing eventually.
I meant just to drop `additionalSigned` and describe it in a more human-friendly way, because outside of Substrate it is maybe not called `additionalSigned`.
How would I explain it differently? It's a field that's called that way in metadata, and at least in the major Rust and JS codebases; if we drop or change the name, nobody will know where this belongs. The things that go into the signature, and their order, are extremely confusing and not yet described anywhere in full. So if we just say that it is added to the data that gets signed, the manner of addition remains unclear, and I'm not describing this step anywhere else in the document.
Ok, I've mentioned that we imitate MetadataV14 and also added the structure.
Co-authored-by: Bastian Köcher <[email protected]>
1. `u8` metadata shortening protocol version,
2. SCALE-encoded `ExtrinsicMetadata`,
3. SCALE-encoded `spec_version` `String`,
Why is this a string, not `u32`?
Because we've seen parachains and standalone chains where this was not `u32`. Those went extinct some time ago, though. This is not forbidden anywhere, and people experiment sometimes; there is no reason to punish them. Or is there? We should certainly change it into `u32` (it would indeed be easier and cleaner) if this is planned to be enforced, but that has to be documented well in the chain developer docs first, which is out of my reach.
7. SCALE-encoded `tokenSymbol` `String` defined on chain to identify the name of currency (available for example through `system.properties()` RPC call) or empty string if no base units are defined,

```
enum MetadataDescriptor {
```
This is SCALE-encoded, right?
```
    spec_version: Vec<u8>, // SCALE form `String`
    spec_name: Vec<u8>, // SCALE form `String`
    base58_prefix: u16,
    decimals: u8,
```
What about chains that do not have a native token, or have multiple native tokens? I guess we can pass 0 and an empty vec for the symbol, but it's best to be explicit.
Values for:

1. `u8` metadata shortening protocol version,
2. SCALE-encoded `ExtrinsicMetadata`,
What is `ExtrinsicMetadata`?
### Metadata modularization

Structure of types in shortened metadata exactly matches structure of types in `scale-info` at MetadataV14 state, but `doc` field is always empty
do we want to use V15?
### Increased transaction size

A 1-byte increase in transaction size due to the signed extension value. The digest is not included in the transferred transaction, only in the signing process.
We already have a version byte in the extrinsic format; maybe we can just use it instead of a new byte?
I already posted this yesterday internally in some chat, but I also want to make it public here. I'm still not convinced that you could implement this RFC by reading the RFC alone. To actually implement it, you will need to look up all the types that are not mentioned here in the scale codec crate. I would personally also lay out a more customized format for the types that would, for example, remove the docs field everywhere. Currently the docs are emptied everywhere, but that still means a byte is used for each encoded type's information.
Co-authored-by: Xiliang Chen <[email protected]>
Co-authored-by: Xiliang Chen <[email protected]>
Co-authored-by: joe petrowski <[email protected]>
Co-authored-by: Xiliang Chen <[email protected]>
Do you suggest we just place whole
and this would exactly break that idea, for what? It's typically 20 chunks, at most 50 for heavy XCM calls, so we'd win 50 bytes out of a few kB of chunks and proof data. That data is not even close to the chain; it is just transferred to the cold device and consumed there, with auxiliary data, erasure overheads, and error correction codes (or flashed momentarily at runtime build time to generate the digest constant).
We've also discussed this privately and will change the format slightly to better adhere to metadata V15 soon. This will remain backward-compatible with V14, as well as with non-shortened metadata itself. I'll commit the update soon.
…ns (#2)

* docs: modified V15-oriented shortened structure
* Include Basti's writings
* Update text/0046-metadata-for-offline-signers.md (Co-authored-by: Carlo Sala <[email protected]>)
* Update text/0046-metadata-for-offline-signers.md (Co-authored-by: Carlo Sala <[email protected]>)
* Update text/0046-metadata-for-offline-signers.md
* add string runtime version argument
* Update text/0046-metadata-for-offline-signers.md (Co-authored-by: Carlo Sala <[email protected]>)
* Update text/0046-metadata-for-offline-signers.md (Co-authored-by: Carlo Sala <[email protected]>)
* Update text/0046-metadata-for-offline-signers.md (Co-authored-by: Carlo Sala <[email protected]>)
* remove examples of what `spec_version` could be in a ideomatic universe
* docs: grammar

Co-authored-by: Carlo Sala <[email protected]>
Co-authored-by: Juan Leni <[email protected]>
No update since Jan 30?
My fellow archer, I've been informed that the Fellowship intends to re-do this work internally based on this proposal; thus this (which is certainly a finished text as far as I can tell) should now be superseded by another PR. The latter is supposed to be voted on and merged quite soon as far as I understand, so that's good news. I think this PR will just be closed in case of success afterwards.
My understanding is that this needs to pass before the Ledger app gets released. Wondering if anyone has a sense of the ETA?
Isn't this superseded by #78?
Yes.
Add a digest of metadata to signed extensions as mentioned here paritytech/polkadot-sdk#2016