Metadata for offline signers #46 (Closed)
Commits (37 in total; the diff below shows changes from 3 commits):

- `2fbe803` feat: first draft (Slesarew, Nov 8, 2023)
- `8d05445` docs: merkle tree construction rule (Slesarew, Nov 14, 2023)
- `36eb717` docs: specify minimal sufficient requirements for shortening (Slesarew, Nov 14, 2023)
- `acffee2` Update text/0000-metadata-for-offline-signers.md (Slesarew, Nov 15, 2023)
- `0f800bf` chore: set RFC index (Slesarew, Nov 20, 2023)
- `8870436` docs: apply suggestions from draft discussion (Slesarew, Nov 22, 2023)
- `2c006b4` docs: more detailed description of signed extension and version byte (Slesarew, Dec 5, 2023)
- `9b2beb2` Apply suggestions from code review (Slesarew, Dec 19, 2023)
- `888223a` Update text/0046-metadata-for-offline-signers.md (Slesarew, Dec 21, 2023)
- `0bb74b0` Update text/0046-metadata-for-offline-signers.md (Slesarew, Dec 21, 2023)
- `c2708a8` Update text/0046-metadata-for-offline-signers.md (Slesarew, Dec 21, 2023)
- `4f7d07b` Update text/0046-metadata-for-offline-signers.md (Slesarew, Dec 21, 2023)
- `439e119` More details and pseudocode examples (#1) (Slesarew, Dec 26, 2023)
- `ab1c0d4` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 18, 2024)
- `4c42ca9` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 18, 2024)
- `613d1bb` revert metadatadescriptor representation to enum (Slesarew, Jan 18, 2024)
- `674498d` replace nulls with empty vectors in pseudocode (Slesarew, Jan 19, 2024)
- `5ea8c9b` add pseudocode representation of type from `scale-info` (Slesarew, Jan 19, 2024)
- `9c4ccb0` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 22, 2024)
- `9418178` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 22, 2024)
- `4f317c5` pseudocode for tree construction (Slesarew, Jan 22, 2024)
- `7b472d1` Remove shortening, transmission, and cold verification (Slesarew, Jan 22, 2024)
- `4763f2a` mention that type structure matcher particular metadata version (Slesarew, Jan 22, 2024)
- `b192d19` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 29, 2024)
- `958e934` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 29, 2024)
- `a2d8791` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 29, 2024)
- `6d9603b` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 29, 2024)
- `10bfa00` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 29, 2024)
- `17b046c` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 29, 2024)
- `21dc94a` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 29, 2024)
- `78b8e67` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 30, 2024)
- `ebfa55f` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 30, 2024)
- `a00ecf7` remove old mentions of scale encoding from metadata descriptor (Slesarew, Jan 30, 2024)
- `91e9bdf` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 30, 2024)
- `2de8cd7` Update text/0046-metadata-for-offline-signers.md (Slesarew, Jan 30, 2024)
- `6cb7381` docs: modified V15-oriented shortened structure and Basti's suggesti… (Slesarew, Feb 5, 2024)
- `601c623` Update text/0046-metadata-for-offline-signers.md (Slesarew, Feb 21, 2024)
File: `text/0000-metadata-for-offline-signers.md` (150 additions, 0 deletions)
# RFC-0000: Metadata for offline signers

| | |
| --------------- | ------------------------------------------------------------------------------------------- |
| **Start Date** | 2023-10-31 |
| **Description** | Add SignedExtension to check Metadata Root Hash |
| **Authors** | Alzymologist Oy, Zondax LLC, Parity GmbH |

## Summary

Add a metadata digest value (a 33-byte constant within a fixed `spec_version`) to Signed Extensions to provide the signing party with proof of correct extrinsic interpretation. The digest value is generated once before release and is well-known and deterministic. The digest mechanism is designed to be modular and flexible, to support partial metadata transfer as needed by the signing party's extrinsic decoding mechanism, and to take into account signing devices' potentially limited communication bandwidth and memory capacity.

## Motivation

### Background

While all blockchain systems support (at least in some sense) offline signing used in air-gapped wallets and lightweight embedded devices, only a few allow both complex upgradeable logic and full message decoding on the cold offline signer side. Substrate is one of these heartening few, and therefore we should build on this feature to greatly improve transaction security and, thus, network resilience in general.

As a starting point, it is important to recognise that prudence and due care are naturally required. As we build further reliance on this feature, we should be very careful to make sure it works correctly every time, so as not to create a false sense of security.

To decode transactions, which are small and optimized for chain storage, a metadata entity is used, and that entity is not at all small itself (on the order of half a megabyte for most networks). Metadata is a dynamic data chunk that completely describes chain interfaces and properties; for any given network version it can be made into a portable SCALE-encoded string and passed to an offline device to familiarize it with the latest network updates. Of course, compromising this metadata anywhere along the path could result in differences between what the user sees and what the user signs, so it is essential that we protect it.

Therefore, we have two problems to solve:

1. Metadata is large and takes a long time to pass into a cold storage device, whose memory may be insufficient to store it; metadata SHOULD be shortened and transmission SHOULD be optimized.
2. Metadata authenticity SHOULD be ensured.

As of now, there is no working solution for (1), as the whole metadata has to be passed to the device. On top of this, the solution for (2) relies heavily on a trusted party managing keys and ensuring that metadata is indeed authentic, which creates poorly decentralized points of potential failure.

### Solution requirements

#### Include metadata digest into signature

A cryptographically strong digest of metadata MUST be included in the signable blob. There SHALL NOT be any storage or computational overhead for this blob on the node side; thus, the digest MUST be a constant within a given runtime version, deterministically defined by metadata.

- Metadata information that could be used in signable extrinsic decoding MUST be included in the digest;
- The digest MUST be deterministic with respect to metadata;
- The digest MUST be cryptographically strong against both first and second pre-image attacks;
- Extra-metadata information that is necessary for extrinsic decoding and constant within a runtime version MUST be included in the digest;
- The digest format SHOULD be versioned to allow rapid withdrawal of cold signing devices in case a severe security vulnerability is found in the shortener mechanism;
- The work necessary for proving metadata authenticity MAY be omitted at the discretion of the signer device design (to support automation tools).

#### Reduce metadata size

Metadata should be stripped of parts that are not necessary to parse a signable extrinsic and then separated into a finite set of self-descriptive chunks. Thus, the subset of chunks necessary for decoding and rendering a signable extrinsic could be sent to the cold device, possibly in small portions (ultimately, one at a time), together with a proof.

- A single chunk together with its proof payload SHOULD fit within a few kB;
- The chunk handling mechanism SHOULD support chunks being sent in any order without memory utilization overhead;
- Unused enum variants MUST be stripped (this has great impact on transmitted metadata size; examples: era enum, enum with all calls for call batching).

## Stakeholders

This feature is essential for **all** offline signer tools; many regular signing tools might make use of it. In general, this RFC greatly improves security of any network implementing it, as many governing keys are used with offline signers.

Implementing this RFC would remove the requirement to maintain metadata portals manually, as the task of metadata verification would effectively be moved to the consensus mechanism of the chain.

## Explanation

A detailed description of the metadata shortening and digest process is provided in the [metadata-shortener](https://github.com/Alzymologist/metadata-shortener) crate (see `cargo doc --open` and examples). The algorithms of the process are presented below.

### Metadata descriptor

Values for the metadata shortening protocol version, `ExtrinsicMetadata`, the SCALE-encoded `spec_version` and `spec_name` Strings, the SCALE-encoded base58 prefix, the SCALE-encoded decimals value, and the SCALE-encoded token unit String should be prepared and combined into the metadata descriptor.

**Review comment** (@carlosala, Nov 15, 2023): base58prefix is encoded as `u16` and decimals as `u8` in the reference implementation. It seems that fixed-width uints work better here, since their size is known. base58prefix is `u16` according to the spec, and using more than 255 decimals is not rational, since the maximum value of a `u256` is about 1.16e77.

**Author reply** (suggested change):

Values for the metadata shortening protocol version, `ExtrinsicMetadata`, the SCALE-encoded `spec_version` and `spec_name` Strings, a `u16` base58 prefix, a `u8` decimals value, and the SCALE-encoded token unit String should be prepared and combined into the metadata descriptor.

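A minimal sketch of assembling the descriptor bytes, assuming the fields are concatenated in the order listed above, the protocol version is a single byte, `ExtrinsicMetadata` arrives as already SCALE-encoded bytes, and the fixed-width `u16`/`u8` encodings from the suggested change are used. The function names and the exact layout are illustrative assumptions, not the metadata-shortener implementation. Strings use SCALE's compact length prefix; the big-integer compact mode is omitted.

```python
import struct


def scale_compact(n: int) -> bytes:
    """SCALE compact encoding for unsigned integers below 2**30 (sketch)."""
    if n < 0:
        raise ValueError("compact encoding is defined for unsigned integers")
    if n < 1 << 6:
        return bytes([n << 2])                      # single-byte mode
    if n < 1 << 14:
        return struct.pack("<H", (n << 2) | 0b01)   # two-byte mode, little-endian
    if n < 1 << 30:
        return struct.pack("<I", (n << 2) | 0b10)   # four-byte mode, little-endian
    raise ValueError("big-integer mode is not implemented in this sketch")


def scale_str(s: str) -> bytes:
    """SCALE String: compact length prefix followed by UTF-8 bytes."""
    raw = s.encode("utf-8")
    return scale_compact(len(raw)) + raw


def metadata_descriptor(protocol_version: int,
                        extrinsic_metadata: bytes,
                        spec_version: str,
                        spec_name: str,
                        base58_prefix: int,
                        decimals: int,
                        unit: str) -> bytes:
    """Concatenate descriptor fields (hypothetical layout, see lead-in)."""
    return (
        struct.pack("<B", protocol_version)   # assumed: one version byte
        + extrinsic_metadata                  # assumed: pre-encoded ExtrinsicMetadata
        + scale_str(spec_version)             # spec_version as a SCALE String
        + scale_str(spec_name)
        + struct.pack("<H", base58_prefix)    # u16, little-endian
        + struct.pack("<B", decimals)         # u8
        + scale_str(unit)
    )
```

`struct.pack` raises `struct.error` for out-of-range values, so a prefix above 65535 or more than 255 decimals is rejected rather than silently truncated.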

### Metadata modularization

1. The types registry is stripped of `docs` fields.
2. Type records are separated into chunks, with enum variants being individual chunks differing by variant index; each chunk consists of `id` (same as in the full metadata registry) and the SCALE-encoded `Type` description (reduced to a 1-variant enum for enum variants).

**Review comment** (contributor): It would be great to mention that 0-variant enums are treated as regular types (as 1-variant enums).

**Author reply** (suggested change to item 2): Type records are separated into chunks, with enum variants being individual chunks differing by variant index; each chunk consists of `id` (same as in the full metadata registry) and the SCALE-encoded `Type` description (reduced to a 1-variant enum for enum variants). Enums with 0 variants are treated as regular types.

3. Chunks are sorted by `id` in ascending order; chunks with the same `id` are sorted by enum variant index in ascending order.
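The three modularization steps can be sketched as follows. The `TypeRecord` and `Variant` classes are a hypothetical, heavily simplified stand-in for `scale-info` types, and chunk descriptions are kept as Python tuples instead of being SCALE-encoded; the point is only to show docs stripping, the per-variant split (with 0-variant enums kept whole), and the `(id, variant index)` ordering.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Tuple


@dataclass
class Variant:
    name: str
    index: int
    fields: Tuple[Any, ...] = ()
    docs: List[str] = field(default_factory=list)   # dropped by modularize()


@dataclass
class TypeRecord:
    id: int
    path: Tuple[str, ...]
    variants: Optional[List[Variant]] = None        # None: not an enum
    definition: Any = None                          # placeholder for non-enum definitions
    docs: List[str] = field(default_factory=list)   # dropped by modularize()


# (type id, variant index or None, stripped description)
Chunk = Tuple[int, Optional[int], tuple]


def modularize(registry: List[TypeRecord]) -> List[Chunk]:
    chunks: List[Chunk] = []
    for ty in registry:
        if ty.variants:
            # Step 2: each variant becomes its own chunk, described as a
            # 1-variant enum. Step 1: `docs` are never copied into chunks.
            for v in ty.variants:
                desc = ("enum", ty.path, ((v.name, v.index, v.fields),))
                chunks.append((ty.id, v.index, desc))
        elif ty.variants is not None:
            # A 0-variant enum is treated as a regular type: one chunk.
            chunks.append((ty.id, None, ("enum", ty.path, ())))
        else:
            chunks.append((ty.id, None, ("other", ty.path, ty.definition)))
    # Step 3: ascending by id, then by variant index. None never shares an
    # id with variant chunks, so mapping it to -1 only keeps sorting total.
    chunks.sort(key=lambda c: (c[0], -1 if c[1] is None else c[1]))
    return chunks
```

For a toy registry with an enum `Toggle` (variants `On` = 1, `Off` = 0), a `u32`, and an empty enum, this yields one chunk per type plus one per variant, with the `Toggle` chunks ordered by variant index.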

### Merging protocol

The merge procedure is the `blake3` hash of the concatenated child nodes: `blake3(left + right)`.

### Complete Binary Merkle Tree construction protocol

1. Leaves are numbered in ascending order; each leaf index is associated with the corresponding chunk.
2. The leaf with the highest index (as the right child) is merged with the leaf with the second-highest index (as the left child); the result is pushed to the end of the nodes queue, and both leaves are discarded.
3. Step (2) is repeated until no leaves or just one leaf remains; in the latter case, the last leaf is pushed to the front of the nodes queue.
4. The right node and then the left node are popped from the front of the nodes queue and merged; the result is pushed to the end of the queue.
5. Step (4) is repeated until only one node remains; this is the tree root.
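A sketch of steps 1 to 5, read literally, with the merge function passed in so the tree shape can be inspected with a string-building merge. Because step 4 pops the right node first, an odd leftover leaf (index 0) ends up as a right child; this follows the text as written and has not been checked against the reference implementation. `blake2b_merge` is a labeled stand-in: the RFC specifies `blake3(left + right)`, but blake3 is not in Python's standard library.

```python
import hashlib
from collections import deque
from typing import Callable, List, TypeVar

T = TypeVar("T")


def merkle_root(leaves: List[T], merge: Callable[[T, T], T]) -> T:
    """Follow the construction steps literally; merge(left, right) joins two nodes."""
    if not leaves:
        raise ValueError("at least one leaf is required")
    pending = list(leaves)            # step 1: position i holds leaf i
    nodes: deque = deque()
    # Steps 2-3: pair leaves starting from the highest index.
    while len(pending) > 1:
        right = pending.pop()         # highest remaining index
        left = pending.pop()          # second-highest remaining index
        nodes.append(merge(left, right))
    if pending:                       # one leaf left over: push it to the front
        nodes.appendleft(pending.pop())
    # Steps 4-5: pop right, then left, from the front; push the result to the back.
    while len(nodes) > 1:
        right = nodes.popleft()
        left = nodes.popleft()
        nodes.append(merge(left, right))
    return nodes[0]


def blake2b_merge(left: bytes, right: bytes) -> bytes:
    # Stand-in for blake3(left + right), for illustration only.
    return hashlib.blake2b(left + right, digest_size=32).digest()
```

With `merge = lambda l, r: f"({l}{r})"`, four leaves `a`..`d` give `((ab)(cd))`, while three leaves give `((bc)a)`, which shows where the leftover leaf lands.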

### Digest

1. A blake3 hash is computed for each chunk of the modularized short metadata registry.

**Review comment** (contributor): Given that Polkadot and Substrate use blake2 basically everywhere, is there a reason to prefer blake3? We're not going to migrate from blake2 to blake3 any time soon, so whatever program implements this would most likely need to support both blake2 and blake3.

**Author reply**: The two bottleneck tools support blake3 nicely (I understand that Ledger has hardware support or something with a comparably low footprint); it is somewhat faster than blake2 even on a single core, and it is better defined. We also expect this standard to ideally outlive several metadata revisions (as it has far fewer features than metadata), so it might live long enough to see an ecosystem transition. We've discussed this overhead with Zondax, and both teams see it as an advantage in our cold devices. Adding blake3 to Polkadot Vault is trivial, and no other tools need this functionality at the moment. Our best guess is that if Ledger and Kampela can run both blakes, anything released later will surely be able to as well.

3. A Complete Binary Merkle Tree is constructed as described above.
4. The root hash of this tree (left) is merged with the blake3 hash of the metadata descriptor (right); the result is the metadata digest.

The concatenation of the protocol version number with the resulting metadata digest MUST be included in Signed Extensions.
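Step 4 and the Signed Extensions value can be sketched as below, taking an already computed tree root as input. Assumptions: the protocol version is a single byte, and `stand_in_hash` (blake2b with a 32-byte output) replaces blake3, which the RFC actually requires but Python's standard library lacks. Under these assumptions the value is 1 + 32 = 33 bytes, consistent with the size stated in the Summary.

```python
import hashlib


def stand_in_hash(data: bytes) -> bytes:
    # Stand-in for blake3 with a 32-byte output; NOT the RFC's hash function.
    return hashlib.blake2b(data, digest_size=32).digest()


def metadata_digest(tree_root: bytes, descriptor: bytes) -> bytes:
    # Step 4: merge(tree root as left, descriptor hash as right),
    # where merge(left, right) = hash(left + right).
    return stand_in_hash(tree_root + stand_in_hash(descriptor))


def signed_extension_value(protocol_version: int, digest: bytes) -> bytes:
    # Protocol version number (assumed: one byte) followed by the digest.
    return bytes([protocol_version]) + digest
```

Because every input is fixed for a given runtime version, the node can compute this value once at build time, which matches the requirement that the digest add no per-transaction work on the node side.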

### Shortening

For shortening, the transaction is decoded completely using the provided metadata, with the same algorithm that would be used on the cold side. All chunks are associated with their leaf indices. An example of this protocol is proposed in [metadata-shortener](https://github.com/Alzymologist/metadata-shortener), which is based on the [substrate-parser](https://github.com/Alzymologist/substrate-parser) decoding protocol; any decoding protocol could be used here as long as the cold signer's design finds it appropriate for the given security model.
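A small sketch of the bookkeeping step after decoding: mapping the chunks the decoder actually touched to their leaf indices. It assumes a decoder (not shown) that reports each visited chunk as a `(type id, variant index or None)` key, matching the sorted chunk order from the modularization step; `leaf_indices` is a hypothetical helper, not part of metadata-shortener.

```python
from typing import Dict, Iterable, List, Optional, Tuple

ChunkKey = Tuple[int, Optional[int]]   # (type id, variant index or None)


def leaf_indices(sorted_keys: List[ChunkKey], used: Iterable[ChunkKey]) -> List[int]:
    """Return the sorted, de-duplicated leaf indices of the chunks used in decoding."""
    position: Dict[ChunkKey, int] = {key: i for i, key in enumerate(sorted_keys)}
    indices = set()
    for key in used:
        if key not in position:
            raise KeyError(f"decoder used a chunk missing from the registry: {key}")
        indices.add(position[key])
    return sorted(indices)
```

Only the variants that actually appear in the transaction are selected, which is where stripping unused enum variants saves the most space.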

### Transmission

Shortened metadata chunks MAY be transmitted to the cold device together with the Merkle proof, either in their entirety or in parts, depending on the memory capabilities of the cold device and its ability to reconstruct a larger fraction of the tree. This document does not specify the manner of transmission. The order of metadata chunks MAY be arbitrary; the only requirement is that the indices of the leaf nodes corresponding to the chunks in the Merkle tree MUST be communicated.

### Offline verification

The transmitted metadata chunks are hashed together with the proof lemmas to obtain the root; a root MAY also be transmitted along with the rest of the payload. Verifying that the transmitted root matches the calculated root is optional. The transmitted root SHOULD NOT be used in the signature; the calculated root MUST be used. However, there is no mechanism to enforce this, so it should be checked during cold signer code audits.
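Since neither the proof format nor the transmission manner is specified here, the following is only one possible multiproof scheme, sketched under stated assumptions: the cold device knows the total leaf count, each leaf is the hash of its chunk bytes, and lemmas are the hashes of the maximal subtrees that contain no transmitted chunk, listed left to right. The tree shape is rebuilt by replaying the construction steps on leaf indices. `h` is a blake2b stand-in for blake3, and all function names are hypothetical.

```python
import hashlib
from collections import deque
from typing import Dict, List, Set, Union

Shape = Union[int, tuple]   # a leaf index, or a (left, right) pair of subtrees


def h(data: bytes) -> bytes:
    # Stand-in for blake3 (not in the standard library); NOT the RFC's hash.
    return hashlib.blake2b(data, digest_size=32).digest()


def tree_shape(n: int) -> Shape:
    """Replay the tree construction steps on leaf indices to get the shape."""
    if n < 1:
        raise ValueError("at least one leaf is required")
    pending: List[Shape] = list(range(n))
    nodes: deque = deque()
    while len(pending) > 1:
        right, left = pending.pop(), pending.pop()
        nodes.append((left, right))
    if pending:
        nodes.appendleft(pending.pop())
    while len(nodes) > 1:
        right, left = nodes.popleft(), nodes.popleft()
        nodes.append((left, right))
    return nodes[0]


def leaves_in(shape: Shape) -> Set[int]:
    # Recomputed on every call: quadratic, but acceptable for a sketch.
    if isinstance(shape, int):
        return {shape}
    return leaves_in(shape[0]) | leaves_in(shape[1])


def subtree_hash(shape: Shape, leaf_hashes: List[bytes]) -> bytes:
    if isinstance(shape, int):
        return leaf_hashes[shape]
    return h(subtree_hash(shape[0], leaf_hashes) + subtree_hash(shape[1], leaf_hashes))


def make_proof(leaf_hashes: List[bytes], wanted: Set[int]) -> List[bytes]:
    """Hot side: lemmas for every maximal subtree holding no wanted leaf."""
    lemmas: List[bytes] = []

    def walk(shape: Shape) -> None:
        if not (leaves_in(shape) & wanted):
            lemmas.append(subtree_hash(shape, leaf_hashes))
        elif not isinstance(shape, int):
            walk(shape[0])
            walk(shape[1])

    walk(tree_shape(len(leaf_hashes)))
    return lemmas


def root_from_proof(n: int, chunks: Dict[int, bytes], lemmas: List[bytes]) -> bytes:
    """Cold side: chunks keyed by leaf index, received in any order."""
    wanted = set(chunks)
    if not wanted <= set(range(n)):
        raise ValueError("leaf index out of range")
    it = iter(lemmas)

    def walk(shape: Shape) -> bytes:
        if not (leaves_in(shape) & wanted):
            lemma = next(it, None)
            if lemma is None:
                raise ValueError("proof has too few lemmas")
            return lemma
        if isinstance(shape, int):
            return h(chunks[shape])   # assumed: leaf = hash of the chunk bytes
        return h(walk(shape[0]) + walk(shape[1]))

    root = walk(tree_shape(n))
    if next(it, None) is not None:
        raise ValueError("proof has unused lemmas")
    return root
```

The root returned by `root_from_proof` is the calculated root the text requires to be used; a tampered chunk simply yields a different root, so the resulting transaction would fail chain verification.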

### Chain verification

The metadata root computed by the cold device MUST be included in Signed Extensions; this way, the transaction will pass as valid if and only if the metadata hash as seen by the cold storage device is identical to the consensus metadata hash, ensuring a fair signing protocol.

## Drawbacks

### Increased transaction size

Depending on implementation details, an extra byte may be needed to indicate whether the new version of metadata verification was used; this may be needed during the transition period, or the same byte may store the version of the metadata hashing protocol.

### Transition overhead

Some slightly out-of-spec systems might experience breaking changes as new content is added to signed extensions. It is important to note that there is no real overhead in processing time or complexity, as the metadata checking mechanism is voluntary. The only drawbacks are expected for tools that do not implement MetadataV14 self-describing features.

## Testing, Security, and Privacy

The metadata shortening protocol should be extensively tested on all available examples of metadata before releasing changes to either the metadata or the shortener. Careful code review should be performed on the shortener implementation to ensure security. The main metadata tree would inevitably be constructed during the runtime build, which would also help ensure correctness.

To be able to recall the shortener protocol in case of vulnerabilities, a version byte is included.

## Performance, Ergonomics, and Compatibility

### Performance

This introduces a negligible pessimization at build time on the chain side. Cold wallet performance would improve, mostly because the metadata validity mechanism, which took most of the effort in cold wallet support, would become trivial.

### Ergonomics

The proposal was optimized for cold storage wallet usage, with minimal impact on all other parts of the ecosystem.

### Compatibility

The proposal in this form is not compatible with older tools that do not implement proper MetadataV14 self-descriptive features; those would have to be upgraded to include a new signed extension field.

## Prior Art and References

This project was developed as a Polkadot Treasury grant; relevant development links are located in the [metadata-offline-project](https://github.com/Alzymologist/metadata-offline-project) repository.

## Unresolved Questions

1. Should hash inclusion bit be added to signed extensions?
2. How would polkadot-js handle the transition?
3. Where would non-rust tools like Ledger apps get shortened metadata content?

## Future Directions and Related Material

Changes to the code of all cold signers to implement this mechanism SHOULD be made once this is enabled; non-cold signers may perform an extra metadata check for better security. Ultimately, signing anything without decoding it with verifiable metadata should be discouraged in all situations where a decision-making mechanism is involved (that is, outside of fully automated blind signers like trade bots or staking rewards payout tools).