Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metadata for offline signers #46

Closed

Conversation

Slesarew
Copy link

@Slesarew Slesarew commented Nov 8, 2023

Add a digest of metadata to signed extensions as mentioned here paritytech/polkadot-sdk#2016


## Explanation

Detailed description of metadata shortening and digest process is provided in [metadata-shortener](https://github.com/Alzymologist/metadata-shortener) crate (see `cargo doc --open` and examples).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point of writing an RFC is to describe the design in plain English here.
You can't just link to a third party repository that you modify whenever you want. The fellowship votes to accept specific design details.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I'll be adding the detailed description soon. This is first draft mostly for the rest of the team to view.


### Digest

1. Blake3 hash is computed for each chunk of modular short metadata registry.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that Polkadot and Substrate use blake2 basically everywhere, is there a reason to prefer blake3?
We're not going to migrate from blake2 to blake3 any time soon, so whatever program implements this would most likely need to support both blake2 and blake3.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two bottleneck tools support blake3 nicely (I understand that Ledger has hardware support or something comparably low-footprint), it is somewhat faster than blake2 even on single core, it is better defined, and we expect this standard to outlive several Metadata revisions ideally (as it has so much fewer features than metadata) thus it might end up living long enough to see ecosystem transition. We've discussed this overhead with Zondax and both teams see this as advantage in our cold devices; adding blake3 to Polkadot Vault is trivial, and we have no other tools that need this functionality at the moment - best guess is that if Ledger and Kampela can run both blakes, everything that's released later sure would be able too.

### Digest

1. Blake3 hash is computed for each chunk of modular short metadata registry.
2. Hashes are sorted and constructed into static Merkle tree as implemented in `merkle_cbt` crate using blake3 digest of concatenated child nodes values for merging.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same remark. We want the format to be clear and precise, we can't say "we'll use whatever format this third party library does". There are implementers that don't even use Rust.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was also expecting that we get the details/spec of how the merkelization works.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, I'll be adding algorithm to spec soon.


### Metadata descriptor

Values for metadata shortening protocol version, `ExtrinsicMetadata`, SCALE-encoded `spec_version` and `spec_name` Strings, SCALE-encoded base58 prefix, SCALE-encoded decimals value, SCALE-encoded token unit String, should be prepared and combined as metadata descriptor.
Copy link
Contributor

@carlosala carlosala Nov 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

base58prefix is encoded as u16 and decimals as u8 in the reference implementation. It seems that fixed-width uints work better here since their size is known.
base58prefix is u16 according to the spec and using more than 255 decimal values is not rational since max value for u256 is 10e78.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Values for metadata shortening protocol version, `ExtrinsicMetadata`, SCALE-encoded `spec_version` and `spec_name` Strings, SCALE-encoded base58 prefix, SCALE-encoded decimals value, SCALE-encoded token unit String, should be prepared and combined as metadata descriptor.
Values for metadata shortening protocol version, `ExtrinsicMetadata`, SCALE-encoded `spec_version` and `spec_name` Strings, `u16` base58 prefix, `u8` decimals value, SCALE-encoded token unit String, should be prepared and combined as metadata descriptor.

### Metadata modularization

1. Types registry is stripped from `docs` fields.
2. Types records are separated into chunks, with enum variants being individual chunks differing by variant index; each chunk consisting of `id` (same as in full metadata registry) and SCALE-encoded 'Type' description (reduced to 1-variant enum for enum variants).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be great to mention that 0-variant enums are treated as regular types (as 1-variant enums).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2. Types records are separated into chunks, with enum variants being individual chunks differing by variant index; each chunk consisting of `id` (same as in full metadata registry) and SCALE-encoded 'Type' description (reduced to 1-variant enum for enum variants).
2. Types records are separated into chunks, with enum variants being individual chunks differing by variant index; each chunk consisting of `id` (same as in full metadata registry) and SCALE-encoded 'Type' description (reduced to 1-variant enum for enum variants). Enums with 0 variants are treated as regular types.

text/0000-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
@carlosala
Copy link
Contributor

It'd be great to define how lemmas and indices are sorted when computing ShortMetadata. Then, we could implement other shorteners (and proof-generator) in any language with the same output.
Then, every signer could modify that and transmit it to the device as needed, not caring about who created the hashes.
With the current implementation, indices are sorted using its associated entry blake3 hash (comparing byte per byte), and lemmas are sorted by their index in descending order. I propose to sort indices in ascending order (matching the registry order).

@Slesarew
Copy link
Author

It'd be great to define how lemmas and indices are sorted when computing ShortMetadata.

Isn't it outside of RFC scope? This is not related to chain behaviour, it's purely on cold protocol side. We can spec it out elsewhere.

@carlosala
Copy link
Contributor

Isn't it outside of RFC scope? This is not related to chain behaviour, it's purely on cold protocol side. We can spec it out elsewhere.

Yep, it's probably outside of the chain's scope. Let's keep it in mind, it's actually important to have a standardized way of transmitting data.

@Slesarew Slesarew marked this pull request as ready for review November 20, 2023 21:10
@carlosala
Copy link
Contributor

@Slesarew still missing the changes proposed above!

Copy link
Contributor

@xlc xlc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

English isn't my first language but I can still tell some sentences could be improved to help readers to understand them. e.g. There are too many very long sentences and I got out of memory exception in my head and need to retry it multiple times to successfully process the sentence.

text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
Copy link
Contributor

@xlc xlc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I will be able to reimplement this work purely using the details provided by this RFC. Maybe more pseudo code will help.

- Metadata information that could be used in signable extrinsic decoding MUST be be included in digest;
- Digest MUST be deterministic with respect to metadata;
- Digest MUST be cryptographically strong against pre-image, both first and second;
- Extra-metadata information necessary for extrinsic decoding and constant within runtime version MUST be included in digest;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't 100% understand this. Do you mean something like:
Additional information (on top of the metadata) that's required to decode extrinsics and runtime constants for a given runtime version must be included in the digest?
But what are those information? Shouldn't metadata itself includes everything?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, metadata does not contain all information. A few extra things like base currency name ("DOT") and base currency value in plancks are often (not always) not included there. This is a known issue that required us to separately transfer metadata and "additional network specifications" to cold wallets in past. Here we just combine those.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If metadata does not contain all the necessary metadata, I will suggest just fix the metadata. I raised a somewhat related issue before paritytech/substrate#3686

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a nice solution; however, for things like "units" we'll need more robust automation, those are not arbitrary values, but items declared in runtime. We'll need to make them properly coded part of metadata, otherwise parachain upgrading will end up in mess.

I'm all for fixing metadata issues, but I think this should be done one at a time. This particular issue that units are not mentioned anywhere was discussed for a long time and no effective solution was proposed. I think we should move on with this project and then, if there is a proper way to include units in metadata for all chains, we'll keep wasting couple byte in transfer protocol for empty values and then eventually remove this extra information if an upgrade to shortened metadata happens.

text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved

#### Reduce metadata size

Metadata should be stripped from parts that are not necessary to parse a signable extrinsic, then it should be separated into a finite set of self-descriptive chunks. Thus, a subset of chunks necessary for signable extrinsic decoding and rendering could be sent, possibly in small portions (ultimately - one at a time), to cold device together with proof.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Metadata should be stripped from parts that are not necessary to parse a signable extrinsic, then it should be separated into a finite set of self-descriptive chunks. Thus, a subset of chunks necessary for signable extrinsic decoding and rendering could be sent, possibly in small portions (ultimately - one at a time), to cold device together with proof.
Metadata should be stripped from parts that are not necessary to parse a signable extrinsic, then it should be separated into a finite set of self-descriptive chunks. Thus, a subset of chunks necessary for signable extrinsic decoding and rendering could be sent, possibly in small portions, to cold device together with proof.

I don't know why "one at a time" is "ultimately" because parallel transmitting could be more efficient when appliable?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be all varieties of options ideally, to fit growing metadata into as small footprint systems as possible. There should also be options to send it in bulk data-efficient format where transmission channel is limiting. This will ensure that we are future-compatible with unpredictably developing hardware (well, and software) systems. Proposed solution solves this challenge.

text/0046-metadata-for-offline-signers.md Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
@xlc
Copy link
Contributor

xlc commented Dec 4, 2023

To confirm my understanding is correct:
An algorithm is described to chunk FRAME metadata and generate a binary merkle tree from it.
This metadata merkle tree root and spec version, spec name, and other information are hashed and used to create a versioned digest.
The extrinsic format is updated to include this versioned digest to be part of the signing payload.
The signer is expected to receive a small metadata chunks to decode extrinsic. It will use the received chunks and merkle proof to reconstruct the versioned digest and use it in the signing payload.
The runtime is expected to verify the signing payload is valid.


### Metadata descriptor

Values for metadata shortening protocol version, `ExtrinsicMetadata`, SCALE-encoded `spec_version` and `spec_name` Strings, `u16` base58 prefix, `u8` decimals value, SCALE-encoded token unit String, should be prepared and combined as metadata descriptor.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Values for metadata shortening protocol version, `ExtrinsicMetadata`, SCALE-encoded `spec_version` and `spec_name` Strings, `u16` base58 prefix, `u8` decimals value, SCALE-encoded token unit String, should be prepared and combined as metadata descriptor.
Values for metadata shortening protocol version, `ExtrinsicMetadata`, SCALE-encoded `spec_version` and `spec_name` Strings, `u16` base58 prefix, should be prepared and combined as metadata descriptor.

Not every chain have token. Most of the chain have multiple tokens. Besides, why do you need them here?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chains without token would naturally have empty fields for token name and zero as unit value. This is how Polkadot Vault handles this now, nobody came up with better solution. These are not contained in metadata so must be transferred separately. It is not possible to decode a transaction without this information in Polkadot network. If you have better solution that would support:

  1. chains with units
  2. chains without units
  3. chains with multiple units
    and be simple enough, let's discuss and integrate it.

@tomaka
Copy link
Contributor

tomaka commented Dec 18, 2023

I've now re-read this RFC multiple times.
I know what a trie is, I know what a Merkle proof is and the details of the format of a Merkle proof in Substrate, and I know what the metadata is and what is found inside.
Yet as I've already commented, I don't see how one could possibly implement either generating or parsing this compact metadata by reading this RFC alone.

This RFC uses a lot of terminology that I have no idea what they mean (either generally or in this context), such as "token unit String", "modularization", "merging protocol", or "shortening". All these terms should really be introduced or explained.

In the middle of the RFC there's suddenly the word "leaves", but I don't think I've seen where it ever introduces the fact that there's a trie. I understand that the metadata is split in a trie, but the reason why I understand that is because I've followed in the past the development of this feature, because I don't think that's even mentioned in the RFC.

It would also have been nice to give a brief overview of what the metadata contains, rather than assume that the reader is familiar with that, because I don't think that there is a single document that can be found online that describes the metadata in details. This would make it possible to review what is left and what is stripped.

It seems that this RFC is just a collection of notes that were written down while the implementation was taking place, rather than an actual presentation and explanation of the design. It seems to assume that the reader is already familiar with the implementation, and in general makes no effort to be accessible.

It's consequently almost impossible for me to properly review this.

text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
@bkchr
Copy link
Contributor

bkchr commented Dec 19, 2023

I support what @tomaka is saying. IMO it requires much more information, like the actual data structures in pseudo code etc.

It would also have been nice to give a brief overview of what the metadata contains, rather than assume that the reader is familiar with that, because I don't think that there is a single document that can be found online that describes the metadata in details. This would make it possible to review what is left and what is stripped.

I would say that the actual metadata is out of scope. However, we should explain what it is as Pierre said and then explain the structures that are put into the trie etc.

Slesarew and others added 5 commits December 21, 2023 11:59
* More details and pseudocode examples

* Minor corrections
Copy link
Contributor

@bkchr bkchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Goes into a good direction. However, I still miss a real explanation of the types that are part of the shortened metadata, as they are crucial for the protocol being described here.

text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved

The root of metadata computed by cold device MAY be included into Signed Extensions; this way the transaction will pass as valid iff hash of metadata as seen by cold storage device is identical to consensus hash of metadata, ensuring fair signing protocol.

The Signed Extension representing metadata digest is a single byte representing both digest vaule inclusion and shortening protocol version; this MUST be included in Signed Extensions set. Depending on its value, a digest value is included as `additionalSigned` to signature computation according to following specification:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not the entire digest is added to additional signed? Only the hash? The version will already be added to the "signed extension data" that gets signed.

I would also maybe drop additionalSigned and just explain that it gets added to the data that gets hashed as part of the extrinsic or not.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we add 1 byte that is version/inclusion byte, and then we add 32-byte digest to that part of data. Not explaining that it is part of additionalSigned would just add to unmeasurable amount of confusion around this obscure entity. I lost count of times I've been asked how Signed Extensions work and I spent way too much time figuring it out myself first, probably massively pestered you as well, I don't remember. I'm pretty solid that this should stay here if we want to have any hope of more implementations appearing eventually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant just to drop additionalSigned and describe it in a more human friendly way. Because outside of Substrate it is maybe not called additionalSigned.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would I explain it differently? It's a field that's called that way in metadata and at least in major Rust and JS codebases, if we drop the name or change it nobody would know where this belongs. Things that go into signature and their order are extremely confusing. It's not described anywhere (yet) in full. So if we just say that it is added to data that gets signed, the manner of addition would remain unclear - and I'm not describing this step anywhere else in the document.

text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
@Slesarew
Copy link
Author

Just realized this was the main question. The types are exactly same as they are defined in frame-metadata, so that we don't have to re-implement every single decoding tool. The only difference is that docs are left blank.

But the types in frame-metadata probably change per version. There should be a fixed representation of the types being used in this RFC.

Ok, I've mentioned that we imitate MetadataV14 and also added the structure.

text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved

1. `u8` metadata shortening protocol version,
2. SCALE-encoded `ExtrinsicMetadata`,
3. SCALE-encoded `spec_version` `String`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this is string, not u32?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because we've seen parachains and standalone chains where this was not u32. Those went extinct some time ago though. This is not forbidden anywhere and people are experimenting sometimes, there is no reason to punish them. Or is there? We certainly should change it into u32 (it would be easier and cleaner indeed) if this is planned to be forced, but that has to be documented well in chain developer docs first, which is out of my reach.

7. SCALE-encoded `tokenSymbol` `String` defined on chain to identify the name of currency (available for example through `system.properties()` RPC call) or empty string if no base units are defined,

```
enum MetadataDescriptor {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is SCALE encoded right?

text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
spec_version: Vec<u8>, // SCALE form `String`
spec_name: Vec<u8>, // SCALE from `String`
base58_prefix: u16,
decimals: u8,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about for the chain that does not have native token or multiple native tokens? I guess we can pass 0 and empty vec for symbol but best to be explicit

Values for:

1. `u8` metadata shortening protocol version,
2. SCALE-encoded `ExtrinsicMetadata`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is ExtrinsicMetadata?


### Metadata modularization

Structure of types in shortened metadata exactly matches structure of types in `scale-info` at MetadataV14 state, but `doc` field is always empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to use V15?

text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved

### Increased transaction size

A 1-byte increase in transaction size due to signed extension value. Digest is not included in transferred transaction, only in signing process.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we do already have a version byte in extrinsic format. maybe we can just use it instead of a new byte?

@bkchr
Copy link
Contributor

bkchr commented Jan 30, 2024

I already posted this yesterday internally in some chat, I also want to make it public here. I'm still not convinced that you can implement this Rfc by reading the Rfc. To actually implementing you will need to lookup all the types that are not mentioned here in the scale codec crate. I would personally also layout a more customized format for the types that would for example remove the docs field everywhere. Currently the docs are emptied everywhere, but it means that there is everytime a byte being used for each encoded type information.

@Slesarew
Copy link
Author

I already posted this yesterday internally in some chat, I also want to make it public here. I'm still not convinced that you can implement this Rfc by reading the Rfc. To actually implementing you will need to lookup all the types that are not mentioned here in the scale codec crate.

Do you suggest we just place whole frame-metadata documentation here? We are trying to reuse the types, as was mentioned many times, as this way we reuse old decoding code and speed up adoption.

I would personally also layout a more customized format for the types that would for example remove the docs field everywhere. Currently the docs are emptied everywhere, but it means that there is everytime a byte being used for each encoded type information.

and this would exactly break that idea for - what? It's typical 20 chunks, at most 50 for heavy XCM calls - so we'll win 50 bytes out of few kb of chunks and proof data, that are not even close to the chain - that are just transferred to cold device and consumed there, with auxiliary data, erasure overheads, and error correction codes (or flash momentarily at build time of runtime to generate digest constant).

@Slesarew
Copy link
Author

We've also discussed this privately and would change the format slightly to better adhere to metadata V15 soon. This would remain backward-compatible with V14, as well as non-shortened metadata itself. I'll commit the update soon.

…ns (#2)

* docs: modified V15-oriented shortened structure

* Include Basti's writings

* Update text/0046-metadata-for-offline-signers.md

Co-authored-by: Carlo Sala <[email protected]>

* Update text/0046-metadata-for-offline-signers.md

Co-authored-by: Carlo Sala <[email protected]>

* Update text/0046-metadata-for-offline-signers.md

* add string runtime version argument

* Update text/0046-metadata-for-offline-signers.md

Co-authored-by: Carlo Sala <[email protected]>

* Update text/0046-metadata-for-offline-signers.md

Co-authored-by: Carlo Sala <[email protected]>

* Update text/0046-metadata-for-offline-signers.md

Co-authored-by: Carlo Sala <[email protected]>

* remove examples of what `spec_version` could be

in a ideomatic universe

* docs: grammar

---------

Co-authored-by: Carlo Sala <[email protected]>
text/0046-metadata-for-offline-signers.md Outdated Show resolved Hide resolved
@archtuxfan
Copy link

No update since Jan 30?

@Slesarew
Copy link
Author

No update since Jan 30?

My fellow archer, I've been informed that the Fellowship intends to re-do this work internally based on this proposal; thus this (which is certainly a finished text as far as I can tell) should be superseded now by another PR. The latter is supposed to be voted on and merged quite soon as far as I understand, so that's good news. I think this PR would just be closed in case of success afterwards.

@archtuxfan
Copy link

No update since Jan 30?

My fellow archer, I've been informed that the Fellowship intends to re-do this work internally based on this proposal; thus this (which is certainly a finished text as far as I can tell) should be superseded now by another PR. The latter is supposed to be voted on and merged quite soon as far as I understand, so that's good news. I think this PR would just be closed in case of success afterwards.

My understanding is that this needs to pass before the ledger app gets released. Wondering if anyone has a sense on the ETA?

@ggwpez
Copy link
Member

ggwpez commented Mar 13, 2024

Isnt this superseded by #78?

@bkchr
Copy link
Contributor

bkchr commented Mar 20, 2024

Yes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants