CHIP-0034: keccak and base64 operators #116

Closed
wants to merge 11 commits into from
137 changes: 137 additions & 0 deletions CHIPs/chip-0034.md
@@ -0,0 +1,137 @@
CHIP Number | 0034
:-------------|:----
Title | keccak256 and base64 CLVM operators
Description | Add CLVM operators to support Ethereum addresses (keccak256) and passkeys (base64).
Author | [Cameron Cooper](https://github.com/cameroncooper), [Arvid Norberg](https://github.com/arvidn), [Dan Perry](https://github.com/danieljperry)
Editor | [Freddie Coleman](https://github.com/freddiecoleman)
Comments-URI | [CHIPs repo, PR #116](https://github.com/Chia-Network/chips/pull/116)
Status | Draft
Category | Standards Track
Sub-Category | Chialisp
Created | 2024-04-26
Requires | None
Replaces | None
Superseded-By | None

## Abstract

This CHIP will add three new operators to the CLVM:
* `keccak256` -- for supporting Ethereum addresses
* `base64_encode` -- for Base64 encoding, to support passkeys
* `base64_decode` -- for Base64 decoding

The operators will be accessible from behind the `softfork` guard. Hypothetically, in the event of a planned hard fork, the operators could be added to the core CLVM.

## Definitions

Throughout this document, we'll use the following terms:
* **Chialisp** - The high-level [programming language](https://chialisp.com/) from which Chia coins are constructed
* **CLVM** - [Chialisp Virtual Machine](https://chialisp.com/clvm), where the bytecode from compiled Chialisp is executed. Also commonly refers to the compiled bytecode itself
* **Keccak-256** - The 256-bit cryptographic hash function of the Keccak family, of which [SHA-3](https://en.wikipedia.org/wiki/SHA-3) is a standardized subset. Ethereum uses the original Keccak padding, so Keccak-256 digests differ from those of NIST's SHA3-256
* **Base64** - A standard for transforming binary data into a sequence of printable characters, drawn from a set of 64 unique characters

## Motivation

CLVM currently includes an atomic operator called [`sha256`](https://chialisp.com/operators/#atoms). This operator calculates and returns the SHA-256 hash of the input atom(s).

This CHIP will add an atomic operator to the CLVM called `keccak256`, which will calculate and return the Keccak-256 hash of the input atom(s). The primary reason to add this operator is to support Ethereum addresses, which are derived using Keccak-256.

This CHIP will also add operators to the CLVM called `base64_encode` and `base64_decode`, which will calculate and return the Base64 encoding/decoding of the input atom(s). The primary reason to add the `base64_encode` operator is to support passkeys that rely on Base64 encoding. `base64_decode` is being added for completeness.
Contributor

Assuming I found the right specification for passkeys, could we have references:
https://passkeys.dev/docs/reference/specs/
https://www.w3.org/TR/webauthn-2/


Note that these operators could have other use cases not covered in this CHIP.

## Backwards Compatibility

A few notes regarding this CHIP's compatibility with the current implementation of CLVM:
* The CLVM operators to be added are backwards compatible -- any calls that succeed after the CHIP has been implemented also would have succeeded beforehand.
* The CLVM operators to be added are not forward compatible -- calling an undefined operator will result in a successful no-op, so some calls that succeed before the CHIP has been implemented will no longer succeed afterward.
Contributor

I'm not quite following what this bullet intends to communicate. all 3 new operators return some value (a hash, a binary blob or a base64 encoded string). This is fundamentally the compatibility-breaking feature. All unknown ops are allowed in consensus mode, but they return NIL. These new operators have return values, therefore it would be a hard fork to add them, had it not been for the softfork operator.

* Because of the forward incompatibility of the operators to be added, this CHIP will require a soft fork of Chia's blockchain.
* The operators to be added are unlikely to be contentious. However, as with all forks, there will be a risk of a chain split.
* The soft fork could also fail to be adopted. This might happen if an insufficient number of nodes have upgraded to include the changes introduced by this CHIP prior to the fork's block height.

The operators will be introduced in multiple phases:
* **Pre-CHIP**: Prior to block `[todo]`, any attempt to call the new operators will result in a successful no-op.
* **Soft fork**: A soft fork will activate at block `[todo]`. From that block forward, the new operators will exhibit the functionality laid out in this CHIP. They will need to be called from inside the `softfork` guard.
* **Hard fork** (hypothetical): In the event that a hard fork is enacted after the code from this CHIP has been added to the codebase, this hypothetical hard fork could include adding the operators from this CHIP to the core CLVM operator set. If this were to happen, the operators could also be called from outside the `softfork` guard. (Note that the operators would still be callable from inside the `softfork` guard if desired.)

## Rationale

This CHIP's design was primarily chosen to support Ethereum addresses and passkeys. It was implemented in a manner consistent with the SHA-3/Keccak-256 and Base64 standards.

While the Keccak-256 and Base64 operators are not related, they each require soft forks in order to be made available from inside the `softfork` guard. Therefore, they were grouped together in this CHIP, to be activated with the same soft fork.

Each of the new operators will incur a CLVM cost, as detailed below. If this CHIP is adopted, the new operators will be optional when designing Chia coins.

The Base64 alphabet used in this CHIP is specified in [Section 4 of the RFC 4648 Standard](https://www.rfc-editor.org/rfc/rfc4648.html#section-4).
Contributor

When following the passkey specification to webauthn-2 ( https://www.w3.org/TR/webauthn-2/ ), it clearly specifies the encoding as base64url, which is not section-4 that you reference above. It's section-5.

"Base 64 Encoding with URL and Filename Safe Alphabet"

The RFC calls this encoding base64url. Perhaps we should consider naming the functions this as well.


Each Base64 character represents six bits of the input, so every four output characters decode to three bytes. Therefore, when the input length is not a multiple of three bytes, the end of the Base64 output string [requires padding](https://en.wikipedia.org/wiki/Base64#Output_padding). The padding used in this CHIP is `=`, which is also specified in the RFC 4648 Standard.
Contributor

The webauthn-2 specification also states that padding characters are omitted.

From the specification:

The term Base64url Encoding refers to the base64 encoding using the URL- and filename-safe character set defined in Section 5 of [RFC4648], with all trailing '=' characters omitted (as permitted by Section 3.2) and without the inclusion of any line breaks, whitespace, or other additional characters.
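
To make the difference between the two candidate encodings concrete, here is a minimal Python sketch using only the standard library. The function names `encode_section4` and `encode_base64url_unpadded` are hypothetical labels for this illustration and are not part of the CHIP; the first follows the Section 4 alphabet with `=` padding as the draft text specifies, the second follows the unpadded Section 5 (base64url) form quoted from the WebAuthn specification.

```python
import base64


def encode_section4(data: bytes) -> bytes:
    """RFC 4648 Section 4: '+' and '/' alphabet, '=' padding kept."""
    return base64.b64encode(data)


def encode_base64url_unpadded(data: bytes) -> bytes:
    """RFC 4648 Section 5 alphabet ('-' and '_') with trailing '=' removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")


# Inputs of 1, 2 and 3 bytes show the three possible padding lengths;
# the last input is chosen so that the two alphabets produce different characters.
for sample in (b"a", b"ab", b"abc", bytes([0xFB, 0xFF])):
    print(sample, encode_section4(sample), encode_base64url_unpadded(sample))
```

With Section 4, one-, two- and three-byte inputs such as `a`, `ab` and `abc` encode to `YQ==`, `YWI=` and `YWJj`. The bytes `0xFB 0xFF` encode to `+/8=` under Section 4 and to `-_8` in unpadded base64url, so the two forms are not interchangeable when compared byte for byte.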


## Specification

### `keccak256`

Opcode: [todo]

Functionality: Calculate and return the Keccak-256 hash of the input atom(s)

Arguments:
* If zero arguments, the Keccak-256 hash of an empty string will be returned
* If one or more arguments:
  1. The arguments (if more than one) will be concatenated
  2. The Keccak-256 hash of the resulting string will be returned

Usage: `(keccak256 A B …)`

CLVM Cost: `[todo]` base, `[todo]` per argument, `[todo]` per byte
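
The argument handling described above can be sketched in Python. This is an illustration only, not the reference implementation (which is still marked todo): Python's standard library has no Keccak-256, so the sketch carries its own small, unoptimized sponge. The helper names `_rol64`, `_keccak_f1600`, `_sponge256` and `matches_sha3_when_resuffixed` are hypothetical. The only difference between Keccak-256 and NIST SHA3-256 is the padding suffix byte (`0x01` versus `0x06`), which lets the permutation be cross-checked against `hashlib.sha3_256`.

```python
import hashlib

MASK64 = (1 << 64) - 1
RATE_BYTES = 136  # 1088-bit rate, used for a 256-bit digest


def _rol64(value: int, shift: int) -> int:
    """Rotate a 64-bit lane left by `shift` bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(state: bytearray) -> bytearray:
    """Apply the 24-round Keccak-f[1600] permutation to a 200-byte state."""
    lanes = [[int.from_bytes(state[8 * (x + 5 * y):8 * (x + 5 * y) + 8], "little")
              for y in range(5)] for x in range(5)]
    lfsr = 1  # generates the round constants on the fly
    for _ in range(24):
        # theta: mix each column parity into the neighbouring columns
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate each lane and move it to its new position
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step applied along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR the round constant into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    out = bytearray(200)
    for x in range(5):
        for y in range(5):
            out[8 * (x + 5 * y):8 * (x + 5 * y) + 8] = lanes[x][y].to_bytes(8, "little")
    return out


def _sponge256(data: bytes, suffix: int) -> bytes:
    """Absorb `data`, pad with `suffix`, and squeeze a 32-byte digest."""
    state = bytearray(200)
    full = len(data) - len(data) % RATE_BYTES
    for start in range(0, full, RATE_BYTES):
        for i in range(RATE_BYTES):
            state[i] ^= data[start + i]
        state = _keccak_f1600(state)
    tail = data[full:]
    for i, byte in enumerate(tail):
        state[i] ^= byte
    state[len(tail)] ^= suffix     # domain-separation bits and first padding bit
    state[RATE_BYTES - 1] ^= 0x80  # final padding bit
    return bytes(_keccak_f1600(state)[:32])


def keccak256(*atoms: bytes) -> bytes:
    """Concatenate the atoms (zero atoms means empty input) and hash the result."""
    return _sponge256(b"".join(atoms), 0x01)


def matches_sha3_when_resuffixed(data: bytes) -> bool:
    """Self-check: with NIST's 0x06 suffix the sponge must equal SHA3-256."""
    return _sponge256(data, 0x06) == hashlib.sha3_256(data).digest()
```

Under this sketch, `keccak256(b"ab", b"c")` equals `keccak256(b"abc")`, and `keccak256().hex()` is `c5d2460186f7233c927e7db2dcc703c0e500b653ca82273b7bfad8045d85a470`, the well-known Keccak-256 digest of empty input.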

### `base64_encode`

Opcode: [todo]

Functionality: Calculate and return the Base64 encoding of the input atom(s)

Arguments:
* If zero arguments, the result will be the Base64 encoding of an empty string, which is itself an empty string (`NIL`)
* If one or more arguments:
  1. The arguments (if more than one) will be concatenated
  2. The Base64 encoding (following [Section 4 of the RFC 4648 Standard](https://www.rfc-editor.org/rfc/rfc4648.html#section-4), and padded with `=` when necessary) of the resulting string will be returned
Contributor

this should refer to section-5 and it should state that no padding is produced


Usage: `(base64_encode A B …)`

CLVM Cost: `[todo]` base, `[todo]` per argument, `[todo]` per byte
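
As a rough illustration of the argument handling for this operator, the following Python sketch models atoms as `bytes` and follows the draft text (Section 4 alphabet, `=` padding). It is an assumption-laden model rather than the CLVM implementation. The point it is meant to show is that the arguments are concatenated before encoding, which is not the same as encoding each argument and joining the results.

```python
import base64


def base64_encode(*atoms: bytes) -> bytes:
    """Concatenate all atoms, then Base64-encode the combined byte string."""
    return base64.b64encode(b"".join(atoms))


# Encoding the concatenation of two atoms ...
combined = base64_encode(b"a", b"b")
# ... differs from joining the individual encodings, because each
# separately encoded atom carries its own padding.
joined = base64.b64encode(b"a") + base64.b64encode(b"b")
print(combined, joined)
```

Here `combined` is `YWI=` while `joined` is `YQ==Yg==`, and `base64_encode()` with no arguments returns an empty byte string.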

### `base64_decode`

Opcode: [todo]

Functionality: Calculate and return the Base64 decoding of the input atom
Contributor

I think it's more straightforward to just say "return the decoded form of the base64 input string".


Arguments:
* If zero arguments, the result will be an empty string (`NIL`)
Contributor

it's not clear to me that we would want to support 0 arguments. Typically we only do that for operators that take a variable number of parameters

* If one argument, the Base64 decoding of the argument will be returned
* If more than one argument, an exception will be raised

Note that the argument must be in Base64 format, encoded according to [Section 4 of the RFC 4648 Standard](https://www.rfc-editor.org/rfc/rfc4648.html#section-4). It must be padded with `=`, also according to the RFC 4648 Standard. If the argument is not encoded in the proper format, or if it is not properly padded, an exception will be raised.
Contributor

Same thing here. section-5, padding is disallowed, any characters not in the encoding alphabet are disallowed. e.g. new lines


Usage: `(base64_decode A)`

CLVM Cost: `[todo]` base, `[todo]` per argument, `[todo]` per byte
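
The strictness rules for this operator, as written in the draft, can be sketched in Python as follows. The sketch assumes the Section 4 alphabet with mandatory `=` padding from the draft text (not the unpadded base64url form raised in review), models a CLVM exception as a Python `ValueError`, and rejects any input that does not re-encode to exactly the same bytes. That last check is an assumption about what "properly padded" should mean, chosen here because it also rules out stray characters such as newlines.

```python
import base64
import binascii


def base64_decode(*atoms: bytes) -> bytes:
    """Decode one strictly formatted, padded Base64 atom."""
    if len(atoms) == 0:
        return b""  # zero arguments: empty string (NIL) per the draft
    if len(atoms) > 1:
        raise ValueError("base64_decode takes at most one argument")
    encoded = atoms[0]
    try:
        # validate=True rejects characters outside the Section 4 alphabet
        decoded = base64.b64decode(encoded, validate=True)
    except binascii.Error as exc:
        raise ValueError("argument is not valid Base64") from exc
    # Round-trip check: only the canonical, padded encoding is accepted
    if base64.b64encode(decoded) != encoded:
        raise ValueError("argument is not canonical padded Base64")
    return decoded
```

For example, `base64_decode(b"YWJj")` returns `b"abc"`, while `b"YQ"` (missing padding), `b"YQ==\n"` (trailing newline) and `b"-_8="` (base64url alphabet) all raise.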
Contributor

What's the motivation for base64 encoding & decoding? I understand it's something to do with ETH. Where can I find more details?

Including base64 data instead of the corresponding binary data on chain seems like a mistake, since reading base64 data would come from a computer before being dumped on chain, and there are plenty of opportunities for the reading client to decode before dumping on chain. Maybe ETH does this, due to some bad design somewhere. But I worry we're proposing changes to clvm to make simpler interoperability with bad design on ETH or elsewhere, which just encourages people to repeat those design errors on chia.

Member

base64 is in support of passkey support.

Contributor

If a signature must be validated against a base64 message, but you need to validate the message matches something within the puzzle, you have two options:

  • Give the puzzle the encoded base64 message and decode it, then check it matches
  • Give the puzzle the original message, check it matches, then base64 encode it before validating the signature

Both of these require base64 opcodes

Contributor

Keccak is for Ethereum address compatibility - it allows an application to construct a Chia puzzle that is only spendable by whoever owns the private key corresponding to the Ethereum address. Which means you can send XCH to an Ethereum address with a little bit of wrapping behind the scenes

Contributor

And base64 allows you to have a puzzle that can only be spent by including a signature created with a passkey (meaning you don't need to have access to the original secret key, just the signer)

Contributor

It doesn’t have anything to do with ETH.

When passkeys are used via the Web Authentication API, they always encode the challenge (message) with base64 before signing it.

That’s why we need to encode the message in the puzzle too, so we can validate the passkey signature.

Contributor

@greimela Appreciate the response. Do you have a reference where I can learn more about this? It seems surprising to me that they couldn't sign binary data directly and I'm wondering what possible motivation there could be.

Contributor

I posted some links here: #116 (comment) and here: #116 (comment)

Contributor

Yes, thanks, those are the ones!

Contributor

I've looked at this more today and I think I understand this better now. WebAuthn APIs sign JSON messages of a certain format, i.e. certain keys must be present and must have specific formats, so that you don't sign arbitrary messages with arbitrary keys. For details, see https://www.w3.org/TR/webauthn-2/#client-data

So the thing that's signed is a JSON blob which includes a challenge: key, which is server-determined. In the prototype, we hack it to be a hash of the target conditions for the coin. Validating requires checking the signature, which is against the entire JSON blob, and then extracting the portion of the JSON blob (via an offset hint) that points to the base64 data, and comparing it against the delegated puzzle.
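
The validation flow described in this thread can be sketched off-chain in Python. This is a hedged illustration, not the prototype's code: the names `challenge_matches`, `signed_bytes` and `CHALLENGE_KEY` are hypothetical, the compact `"challenge":"` serialization is an assumption about how the client writes its JSON, and signature verification itself is left out because it needs a cryptography library. What the sketch does show is the comparison step: the puzzle encodes the expected message as unpadded base64url and checks it against the bytes at the hinted offset inside the signed JSON blob.

```python
import base64
import hashlib
import json

CHALLENGE_KEY = b'"challenge":"'  # assumed compact serialization of the key


def base64url_unpadded(data: bytes) -> bytes:
    """Base64url (RFC 4648 Section 5) with trailing '=' removed, as WebAuthn uses."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def challenge_matches(client_data_json: bytes, offset: int, message: bytes) -> bool:
    """Check that the challenge value starting at `offset` encodes `message`."""
    expected = base64url_unpadded(message)
    key_start = offset - len(CHALLENGE_KEY)
    end = offset + len(expected)
    return (
        key_start >= 0
        and client_data_json[key_start:offset] == CHALLENGE_KEY  # hint points at the key's value
        and client_data_json[offset:end] == expected             # value equals the encoded message
        and client_data_json[end:end + 1] == b'"'                # value ends where expected
    )


def signed_bytes(authenticator_data: bytes, client_data_json: bytes) -> bytes:
    """Bytes a WebAuthn assertion signs: authenticator data plus SHA-256 of the client data."""
    return authenticator_data + hashlib.sha256(client_data_json).digest()


# Hypothetical example: the challenge is a hash of the coin's target conditions.
message = hashlib.sha256(b"hypothetical coin conditions").digest()
client_data_json = json.dumps(
    {
        "type": "webauthn.get",
        "challenge": base64url_unpadded(message).decode(),
        "origin": "https://example.com",
    },
    separators=(",", ":"),
).encode()
offset = client_data_json.index(CHALLENGE_KEY) + len(CHALLENGE_KEY)
print(challenge_matches(client_data_json, offset, message))
```

This corresponds to the second option listed earlier in the thread (encode the original message, then compare), which needs only an encode operation on the puzzle side.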


## Test Cases

[todo]

## Reference Implementation

[todo]

## Security

[todo]

## Additional Assets

None

## Copyright
Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).