-
Notifications
You must be signed in to change notification settings - Fork 638
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add precompiled contracts for alt_bn128 curve #2842
Conversation
runtime/near-vm-logic/src/config.rs
Outdated
@@ -292,6 +304,10 @@ impl Default for ExtCostsConfig { | |||
ExtCostsConfig { | |||
base: SAFETY_MULTIPLIER * 88420586, | |||
read_memory_base: SAFETY_MULTIPLIER * 861350075, | |||
alt_bn128_g1_multiexp_base: SAFETY_MULTIPLIER * 50000000, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How were these numbers computed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These numbers are based on EIP-1108, 1 eth gas = 1e6 near gas.
So, for pairing constant cost is about 45_000_000_000,
Byte cost is 34_000_000_000/(32*6) = 177_083_333.
Multiexponention constant cost is the same as for addition 150_000_000_000
Addition (element) cost is 6_000_000_000 / (254*2) = 11_811_023
Presented at the config values can differ a little.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 eth gas = 1e6 near gas.
I am not sure where this is coming from, I would assume it is coming from our EVM implementation but our EVM relationship of ETH to NEAR gas does not translate to the situation when we are adding a new host function. The way we estimate NEAR gas usage is by running runtime-params-estimator
which calculates the exact number of CPU instructions that were used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated runtime-params-estimator, but there are no cost generators for
validator_stake_base and validator_total_stake_base.
So, measurment process paniced.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@olonho (CC @bowenwang1996 ) Could we update runtime-params-estimator
to use the values that are currently hardcoded for these fees, instead of computing them? I thnk @bowenwang1996 might have forgot to do it. It is currently blocking snjax.
runtime/near-vm-logic/src/logic.rs
Outdated
value_ptr: u64, | ||
register_id: u64, | ||
) -> Result<()> { | ||
fn weight(data: &[u8]) -> usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we have an explanation of what this function computes? Is it hamming weight?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the complexity of single ecmul in terms of elliptic curve additions.
weight := \floor{log2(exp)} + hamming(exp) + 1
1st adding is number of ec doublings.
2nd adding is the number of ec additions.
3rd adding is the cost for the result increment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add this and another explanation into comments. We would like this code to be accessible to people who are as uneducated on crypto as me.
@abacabadabacaba Could you please review this PR? Specifically, could you look into the following aspects:
|
@snjax we also need tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for addressing the comments.
runtime/near-vm-logic/src/logic.rs
Outdated
self.gas_counter.pay_per_byte(alt_bn128_g1_multiexp_element, w_acc as u64)?; | ||
|
||
let value = <Vec<(bool, G1, Fr)>>::try_from_slice(&value_buf) | ||
.map_err(|e| HostError::GuestPanic { panic_msg: format!("{}", e) })?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GuestPanic
is still not a correct error type, it is reserved for the cases when Wasm contract chooses to voluntarily panic. We need to define new error here, e.g. HostError::AltBnU128DeserializationError
and HostError::AltBnU128SerializationError
.
runtime/near-vm-logic/src/logic.rs
Outdated
value_ptr: u64, | ||
register_id: u64, | ||
) -> Result<()> { | ||
fn weight(data: &[u8]) -> usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add this and another explanation into comments. We would like this code to be accessible to people who are as uneducated on crypto as me.
core/crypto/Cargo.toml
Outdated
@@ -23,5 +23,6 @@ serde_json = "1" | |||
sha2 = "0.8" | |||
subtle = "2.2" | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like a unnecessary change here
I looked at the changes and I immediately see a couple of problems: Single-purpose API In general, pairing-friendly curves such as alt-bn128 can be used in multiple different cryptographic schemes: BLS signatures, various zero-knowledge proofs and so on. However, this API only implements a very limited set of operations, which makes it probably unusable for anything except this particular kind of SNARKs. Note: this particular curve has its parameters tuned to optimize the performance of SNARK generation, at the expense of performance of nearly every other operation. Thus, this curve may not be the best choice for other schemes. Also, the API provided by Ethereum has the same problem. Serialization of parameters The API is defined such that the parameters are serialized (using Borsh) by the caller, and deserialized by the implementation. This seems unnecessary. The contract code runs in the same process as the node, and the contract's memory is basically a byte vector in the node's memory. So, the node can easily read the inputs from the contract's memory, and having the contract serialize the data before passing it to the API seems an overkill. Note: in Ethereum, precompiled contracts are limited in the same way as regular contracts regarding the way they can exchange data with the caller. However, even in Ethereum the serialization used is simpler (array length is not passed explicitly). Gas requirements assume inefficient implementation The gas requirement for multiexp operation is computed depending on the Hamming weight of the exponents. This assumes that multiexponentiation is performed in a very inefficient way, by performing each exponentiation separately using a basic double-and-add approach and then adding the results together. Even if we later switch to an efficient implementation, changing the gas formula would need a hard fork. In Ethereum, gas requirement only depends on the number of points, not on the values of the exponents. This seems to be a better approach, especially given that the run time of the more efficient algorithms doesn't depend on the values of the exponents as much. Incoherent gas computation on invalid inputs If the input isn't properly serialized, the functions compute the amount of gas they consume (before failing) in an inconsistent way. They look into the input value with the assumption that it is properly serialized and take parts of it before actually attempting deserialization. As a result, those implementation details have an observable effect on the consensus, which shouldn't happen. Use of uncompressed points These operation use uncompressed point format for input and output. Compressed points take twice less space, but decompressing them takes additional time (which is significantly less than the time for exponentiation and especially pairing). We should consider whether the tradeoffs of using compressed points (space usage vs time usage) would makes us prefer using compressed points. |
@abacabadabacaba, thank you for your review.
There is a number of advantages for the alt-bn128 curve: fast SNARK computations, half-pairing cycles for recursive composition, production-ready Ethereum ecosystem (circom, perpetual powers of tower trusted setup, etc).
I suppose we should not bind the API to the current rust library implementation. bn use 128bit limbs in LE ordering, ff use 64bit limbs inn LE ordering. Maybe, in other language node implementations, it will be easier for developers to use stable audited libraries with some 3rd kind of object representation. In borsh all numbers are represented as byte arrays in LE notation, I used the same serialization for point coordinates.
API is used for in-contract computations, so, it is not necessary to use uncompressed points in transactions call data. Precompiled contracts could be added later if we need them.
Agree, the current implementation is not efficient. I can replace it with Pippenger's algorithm.
Could you explain to me more, how gas metering for failed transactions, depending on some log2 and Hamming weight for deterministic defined chunks of input could break the consensus? Taking into attention @abacabadabacaba's review, I propose 3 precompiles: All field elements will be encoded as a fixed-sized byte array in LE ordering (big integer, not in Montgomery representation). Maybe it will be useful to decide, what kind of protocol do we need, approve the one as NEP and after that, I will implement it. |
@abacabadabacaba Regarding serialization of arguments. We cannot assume any specific layout or structure of the data stored in the contract memory, since we allow any Wasm-enabled language and even a single language can change its format through different versions. Therefore we need some universal interface that would allow any language to communicate with the same host function. Unfortunately, each language would have to add some glue code to convert between the universal interface and language-specific data structures. Serialization plays the role of such glue code, and serialization format is that universal interface. I wouldn't be concerned with the overhead, since we are using binary serialization, so we are only paying for additional allocation, which should be negligible compared to things like: the cost of calling a host function, cost of the crypto primitive. As for Ethereum, it does not support general purpose languages, so they can enforce any conventions they want through EVM and Solidity standards.
I am not sure what input would be inconsistent with when the data is not correctly serialized. This is a standard behavior of our host functions -- users can attempt to provide any kind of mangled data and we will attempt to execute it and charge for this attempt proportionally to how much resources they consumed.
I don't think there is any effect on the consensus. Could you please elaborate?
How much would we gain from this optimization compared to the overall cost of the operation? |
Regarding serialization: I couldn't find any host function that we already have that uses Borsh serialization for its arguments. I would prefer something more similar to what Ethereum precompiles use, i.e. an explicit description of which bytes in the input correspond to which values. See the description of SNARK precompiles in the Ethereum yellow paper (formulas 255‒265). For us, the difference would be that the number of elements would not be encoded in the buffer but instead deduced from its length. Another option that can be considered is to add additional register types that will hold group elements in the internal representation (i.e. in projective coordinates in Montgomery form). Then, there will be separate functions for reading points from memory and converting them to the internal representation, and the functions that perform operations on them will take the IDs of the registers holding the internal representations as arguments. This is not possible in Ethereum, but is possible in NEAR. By the way, what is the cost of merely calling a host function? To what extent should we design the API such that the number of host function calls is as small as possible? Regarding the handling of improper serialization: currently, the code reads the values from the specific positions in the buffer (e.g. this line: The advantage of using compressed points is that we can reduce transaction size. Taking compressed points as input will allow to include compressed points instead of uncompressed points in transactions, which will make them smaller. As a result, blocks will be smaller. If points are stored in the state, the state will also be smaller. The cost of operations will increase slightly. |
@nearmax, @abacabadabacaba Thank you for your review. @abacabadabacaba, decompressed points are used at precompiled contracts, but we do not have to use them in external calls. We could use compressed points in external calls and decompress them before computations. Important to notice it that it is hard to estimate the gas cost for invmod. Complexity differs a lot for different numbers. That's why there are no compressed points in ethereum now. Currently, there are no stable precompiled contracts with serialized math objects inside. We can use borsh, or can use something more optimized. In ethereum, there are two abi encoders: normal encoder, like borsh (keeping length of iterable objects inside the blob) and packed encoder. But the difference between borsh and hypothetical borsh_packed is much lesser than between ethereum abi.encode and abi.encodePacked because borsh is already optimized. I took into attention issues, described by @abacabadabacaba and rewrite precompiles. Now there are 3 precompiles: Gas metering is simplified and now there is no abstraction breaking code. So, complexity for alt_bn128_pairing_check and alt_bn128_g1_sum is linear from number of elements. Complexity for alt_bn128_g1_multiexp is sublinear, n/ln(n), pippenger algorithm is implemented. Also, the metering of sublinear precompiled contracts is one more new abstraction for gas metering inside nearcore. Verifier contact is working with new precompiles. It is available here https://github.com/zeropoolnetwork/NEAR-pool/tree/master/pool-contract |
@evgenykuzyakov Could you please see this PR through? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are too many parts that might break the runtime with some math error by using malicious input. (e.g. overflow or division by 0). We'll have to audit entire bn
library and wrap all unsafe math with safe math.
That's too risky at this point and I don't think we should do this.
Have you tried to compile it straight to Wasm and run it within a contract? How much more expensive it is? Are we going to run out of gas limit of 200 TGas?
runtime/near-vm-logic/src/logic.rs
Outdated
let value_buf = self.get_vec_from_memory_or_register(value_ptr, value_len)?; | ||
self.gas_counter.pay_per_byte(alt_bn128_g1_multiexp_byte, value_buf.len() as u64)?; | ||
|
||
let value = <Vec<(G1, Fr)>>::try_from_slice(&value_buf) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we can do this in the logic. It requires to support fixed format for the fixed borsh
indefinitely. We'll not be able to upgrade borsh
version here, because this host relies on the particular borsh version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could create a special crate near-bn128 or something like this with binary serialized input and output and move all complex logic there. The same approach is used in openethereum for bls12 curves.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed my mind about inclusion of borsh
. Since borsh
doesn't affect the gas cost for the runtime, we will be able to upgrade borsh as long as the serialization of the format doesn't change.
So let's keep borsh
, but we probably can move serialization/deserialization of bn
towards nearcore
. This way it will stay explicit in the nearcore
instead of being part of the zeropool-bn
crate. We may need a wrapper for this, but it's should be fine.
runtime/near-vm-logic/Cargo.toml
Outdated
[dependencies.bn] | ||
package = "zeropool-bn" | ||
version = "0.5.6" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should do this. This creates a very complex dependency for the runtime VM that is not easily replaceable with another library.
VM logic supposed to be as simple as possible. We've added sha2
and sha3
because they are common, but this one is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's move borsh
ser/de logic from crate to nearcore
. And let's depend on the minimum changes fork instead, or use original if possible. @abacabadabacaba
runtime/near-vm-logic/src/logic.rs
Outdated
let gas_estimation = { | ||
let l = value.len(); | ||
if l == 0 { | ||
0 | ||
} else { | ||
let l = l as f64; | ||
(l / (l.ln() + 1.0)).ceil() as u64 | ||
} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't use f64
. What if there is an overflow or division by 0, or some other math errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
x/(ln(x)+1) is contraction mapping for x>=1. So, no overwflows and zero divisions here.
Also, such logarithms are used in go-ethereum for the same case (parameters estimation for sublinear pippenger multiexp) here.
I can implement an integer logarithm, but it seems better to use the common one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, floating-point functions like ln
are not guaranteed to produce identical results on all systems. For this reason, we shouldn't use them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok. I will use a custom implementation of integer log2floor.
Could you comment on the proposal here near/NEPs#98?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may also need to update pippenger
implementation to not use f64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is much cleaner. Thanks!
Please add tests to near-vm-logic/tests
to cover basic calls of the new vm-logic
methods.
Also add tests to alt_bn128.rs
for basic calls to math and some corner case within serialization/deserialization.
Looks good from my side. I'm asking @abacabadabacaba to look again at crypto side and @olonho for fees. If the change looks fine, I can help with the protocol upgrade. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK for me, modulo buffer in test contract question.
7709103
to
32f3aeb
Compare
The branch is rebased to the last commit.
|
@bowenwang1996 . @evgenykuzyakov told me that this PR is blocked on upgradability. Will we be able to merge it using our new upgradability scheme at test it in night protocol version? |
yes if we put the change behind some feature flag |
@nearmax @bowenwang1996 Are there any blockers from my side? New API is optional with |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the previous comments.
There a still some issues with upgradability.
We upgrade node using the following process:
- A binary with the new feature gets released. It has to be able to work with the legacy protocol version and the new protocol version.
- It reports the new protocol version to the network.
- Once 80% on nodes support the new protocol version, the protocol is scheduled to upgrade in 2 epochs.
- They still run 2 epochs with the old protocol.
- The new protocol is activated at a first block of 3rd epoch.
So the current PR has to address the following issues:
- we can't introduce a new feature in the old protocol. So old protocol version should be noop.
- Once the new protocol is live, we should activate new imports and start processing them. Before, they should return method not found error similarly to the old protocol.
@@ -177,6 +177,10 @@ pub enum HostError { | |||
ContractSizeExceeded { size: u64, limit: u64 }, | |||
/// The host function was deprecated. | |||
Deprecated { method_name: String }, | |||
/// Deserialization error for alt_bn128 functions | |||
AltBn128DeserializationError { msg: String }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remove msg
from this error and AltBn128SerializationError
.
Because right now it creates a dependency on error format from borsh
create. And if the error changes due upgraded the dependency, then it may create an unexpected chain split.
AltBn128DeserializationError { msg: String }, | |
AltBn128DeserializationError, |
// Base cost for multiexp | ||
pub alt_bn128_g1_multiexp_base: Gas, | ||
|
||
// byte cost for multiexp | ||
pub alt_bn128_g1_multiexp_byte: Gas, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an issue, since the old protocol doesn't have such fields.
We haven't completed a config upgrade that introduces new fields, so I currently don't know what's the best way to introduce them.
But, check out how we want to add such fields for EVM branch: #3257
We basically added a new config and made it skipped form genesis, but deserialized from default.
#[cfg(feature="protocol_feature_alt_bn128")] alt_bn128_g1_multiexp<[value_len: u64, value_ptr: u64, register_id: u64] -> []>, | ||
#[cfg(feature="protocol_feature_alt_bn128")] alt_bn128_g1_sum<[value_len: u64, value_ptr: u64, register_id: u64] -> []>, | ||
#[cfg(feature="protocol_feature_alt_bn128")] alt_bn128_pairing_check<[value_len: u64, value_ptr: u64] -> [u64]>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's not enough. Since the upgrade has to be done on the live binary, we need to handle 2 different imports. One with current protocol version which doesn't have such methods and once the current protocol version changes we need to use different imports.
@evgenykuzyakov should we revive this to make private transactions possible on NEAR? Would be super cool to have a ZK-based pool where you can generate anonymous linkdrops. |
@evgenykuzyakov As we discussed last week, this is needed by the EVM. What's the next step for unblocking this and moving it forward? |
Merged latest. Based on #2842 by @snjax This feature adds to near following precompiled contracts: - alt_bn128_g1_multiexp - alt_bn128_pairing_check Example of usage is at https://github.com/zeropoolnetwork/near-groth16-verifier TODO: - [x] Update Cargo.toml files to properly expose feature - [x] Update test-contract to use feature - [x] Use latest VMLogic to pass protocol version - [x] Expose nightly feature - [x] Add contract tests - [x] Wrap errors with feature - [x] Update param estimator to use different versions of test contracts - [x] Re-run param estimator. ## Test plan: - CI - Added tests to test-contract-rs in near-vm-runner. - Doc tests for alt_bn128 within near-vm-logic Co-authored-by: Igor Gulamov <[email protected]>
This feature adds to near following precompiled contracts:
Example of usage is at zeropoolnetwork/near-groth16-verifier