Warp Signature Strengthening #71
Replies: 5 comments 9 replies
-
Interesting idea. IMO there are important questions to be answered:
-
Tbh I don't love any of these options on the basis of a few principles.
-
Another option discussed offline is to use HSMs to improve security at the validator level using something like https://cubist.dev/cubesigner-hardware-backed-remote-signing-for-validator-infrastructure. Improving an individual node's BLS key management and access control is mostly an orthogonal direction to the original post, but I'll explain the potential flow outlined in the offline conversation here.

Cubist <> Ethereum Staking

Cubist provides a secure key management solution for Ethereum nodes: it hosts staking keys within a secure enclave and enforces a policy on block signing requests within the secure environment. This provides practical benefits by separating key management onto secure hardware. Most importantly, when Cubist receives a block signature request, it first confirms that it has never signed a conflicting block. This ensures that even if a bug in the consensus/execution client leads to conflicting block requests, the secure hardware will reject the invalid signature request. This can be a significant improvement for Ethereum stakers, who may be slashed if they violate that requirement due to a faulty client.

Avalanche Warp <> vHSM Security Improvements

We can fairly easily move BLS keys into a secure environment like Cubist to improve practical key management security, but it's more difficult to define a policy enforced within the vHSM that gives us the same protection. The benefit of using Cubist for block signatures is that the condition the staker cares most about guaranteeing is that they never sign conflicting blocks, which would get them slashed. Cubist, in this design, is primarily intended to protect against a buggy client, but it would also protect against an attacker taking over the Ethereum client that provides input/signature requests to the vHSM. For Warp, if an attacker takes over a validator's node, then they'd control the inputs to the secure environment.
Since Avalanche Warp Messages are included in the events of a block, an attacker could craft an arbitrary block that builds on a valid last accepted parent block and pack it with any warp messages it wants. If we instead want a security model where the vHSM provides security even when an attacker has taken over the node providing inputs to the vHSM, then we need to enforce an integrity check that catches any such misbehavior.

Potential vHSM Policy Controls

Here, we'll walk through a few potential policy controls (including strawmen). We'll use a model where the attacker wants to generate a signature for a warp message M that is never accepted onchain.
Let's say the vHSM receives full blocks from the node and is asked for a Warp signature over all valid payloads (i.e. WarpMessages). If the attacker controls inputs to the vHSM, then imagine there is some point of compromise, where the vHSM has produced only valid signatures up to block n. When the attacker takes over, it can create an arbitrary block at height n + 1. It cannot create a conflicting signature for any previous block in the range [0, n] because the vHSM will detect it. However, if it wants to create a block that contains an arbitrary Warp message, then it can create a block at height n + 1 that includes such a result (assuming the vHSM does not verify the execution path that leads to those messages). After the attacker has taken over, it can pack arbitrary malicious messages into a block and request signatures for those payloads. It's clear that if the vHSM only enforces the constraint that block requests come in order, this does not offer much added security.
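To make the weakness concrete, here's a toy sketch (not Cubist's actual API; all names are hypothetical) of a vHSM policy that only enforces "sign blocks in order, never sign two blocks at the same height":

```python
# Toy sketch of an "in-order, no-conflict" signing policy. It rejects
# conflicting requests at already-signed heights, but happily signs an
# arbitrary attacker-crafted block at the next height.
class InOrderSigningPolicy:
    def __init__(self):
        self.signed = {}  # height -> block hash we already signed

    def authorize(self, height: int, block_hash: str) -> bool:
        prev = self.signed.get(height)
        if prev is not None:
            # Never sign a conflicting block at a height we already signed.
            return prev == block_hash
        if height != len(self.signed):  # must extend the chain by one
            return False
        self.signed[height] = block_hash
        return True

policy = InOrderSigningPolicy()
assert policy.authorize(0, "a0")
assert policy.authorize(1, "b1")
assert not policy.authorize(1, "b1-conflict")  # conflict at height 1 rejected
# Weakness from the post: after a takeover, an arbitrary block at the next
# height is still accepted, so its Warp payloads would be signed.
assert policy.authorize(2, "attacker-block")
```

The conflict check is the analogue of Cubist's slashing protection for Ethereum blocks; the last assertion shows why it is insufficient on its own for Warp.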
If we re-execute the full block in the vHSM, then we verify that all of the produced warp messages could have been produced by a series of valid transactions that were actually issued. This still leaves a degree of freedom to an attacker who wants to form blocks that arbitrarily re-arrange transactions issued onchain to produce a different set of warp messages than the actual accepted chain on the network. Under this scenario, if a contract guarantees that it will only ever emit a single event with the characteristic a == 1, then an attacker can only ever produce a single warp message with that characteristic as well. This is a pretty strong guarantee and should ensure that a node compromised by an attacker will never produce duplicate warp messages for arbitrary withdrawals from a bridge with more funds than actually exist, since there is no valid state transition that leads to the production of such messages. Of course, this has the downside that we need to execute the full block in the vHSM.
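The re-execution policy above could be sketched roughly as follows. This is a toy model: `execute_block` is a hypothetical stand-in for the full EVM state transition, not real Subnet-EVM code.

```python
# Sketch of a re-execution policy: the vHSM only signs Warp payloads that
# its own re-execution of the block actually emits.
def execute_block(parent_state, txs):
    # Hypothetical deterministic execution returning (new_state, warp_messages).
    messages = [f"warp:{tx}" for tx in txs if tx.startswith("export")]
    return parent_state + txs, messages

def authorize_warp_signatures(parent_state, block_txs, requested_payloads):
    _, emitted = execute_block(parent_state, block_txs)
    # Sign only payloads that the valid state transition actually produced.
    return [p for p in requested_payloads if p in emitted]

ok = authorize_warp_signatures([], ["export:100->B", "transfer"],
                               ["warp:export:100->B", "warp:forged"])
assert ok == ["warp:export:100->B"]  # the forged payload is refused
```

Note that this check binds signatures to *some* valid execution, not to the execution the network actually committed, which is exactly the remaining degree of freedom described above.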
The last component we could try to verify within the vHSM would be consensus itself. Unfortunately, there are two options for how we'd do this, and neither is likely to provide a reasonable path.
-
I think it's pretty clear that the first step to improve security is to improve the infrastructure around BLS key management. With that in mind, adding secure hardware support for BLS key management seems like the lowest hanging fruit. Beyond the practical key management improvements we could make, this post got me thinking about the different tradeoffs Subnets may want to accept.

At the most basic level, cross-chain communication requires a trusted third party to authenticate between two chains (ref). Right now, we use the P-Chain to provide the Public Key Infrastructure (PKI) for signing and verifying warp messages. This requires an honest majority assumption on the source chain at both the consensus level and the generation of an aggregate signature from the validator set (the required weight is parameterizable). If we want cross-chain communication to be resilient when that assumption fails, then we need an added check that verifies or notarizes those results. This could be trusted hardware (a vHSM like Cubist, discussed above), a central observer (Warp observer), or, I'd argue, an L2-like system in the same category: an agreed-upon third party notarizing the results from a lower-trust system. In all of these cases, the notary acts as a circuit breaker, such that both systems need to maintain liveness for cross-chain communication to stay live, and both need to be corrupted to produce any invalid messages. I think there are some interesting tradeoffs here.

Hierarchy of Invalid Messages

To understand the options, here's a simple (probably incomplete) threat model of the types of messages that could be delivered to a destination chain under different types of compromise. The cases range from syntactically/semantically invalid, to there existing a feasible sequence of transactions that could cause the message, to that sequence actually having been created/issued, and finally to that sequence having been irreversibly committed.
Syntactically or Semantically Invalid

If a majority on the source chain is compromised by a malicious party, then the malicious party can generate any arbitrary message, regardless of the syntactic or semantic rules normally enforced by the VM. This can happen if the malicious party controls a sufficient portion of stake to create valid signatures without any participation from correct nodes. In Subnet-EVM, this defaults to 67%. For example, the VM may maintain an invariant that it only signs messages with a specific prefix; with this level of compromise, that invariant no longer holds.

Semantically Valid - No Path to Make It Happen

If there is no secondary verification and a majority of the source chain is compromised, then the malicious party could craft a message that is semantically possible, but for which there is no existing sequence of signed transactions that would lead to it. For example, if Kenny has 100 wrapped tokens on SubnetA, then Kenny can issue a withdrawal transaction that triggers a Warp message to unwrap them on SubnetB. In other words, the resulting message is semantically valid, but it requires a signed input from Kenny. In this model, the malicious party does not need Kenny's input to create that message because it can generate a Warp signature without ever receiving such a transaction from Kenny. The required level of compromise is identical to issuing semantically invalid messages, and this can be seen as semantically invalid as well. However, it's useful to think of this as a different class of invalid messages because it COULD be valid with the addition of a missing input, whereas in the previous case there is no input that would make the message feasible.

Semantically Valid with Conflicts

Take the same scenario as above where Kenny has 100 wrapped tokens on SubnetA, such that Kenny can issue a withdrawal transaction that triggers a Warp message to unwrap them.
Instead of generating a message without Kenny signing the required transaction to issue a withdrawal, let's say Kenny signs and issues two conflicting transactions: (1) a withdrawal to SubnetB or (2) a withdrawal to a different address on SubnetB. Now that Kenny has signed both transactions (or short sequences of transactions), there are two semantically valid paths to generate two different Warp messages. However, a correct node should never sign both of these messages because executing sequence 1 invalidates sequence 2 and vice versa.

This brings up the interesting case where a byzantine party controls enough of the network to trigger a safety failure (causing honest nodes to commit conflicting decisions), but not enough stake to sign a Warp message without any participation from correct nodes. Subnet-EVM defaults to requiring a threshold of 67% of stake to create a valid signature, so a party needs to control 67% of stake to sign messages with zero participation from correct nodes. Assuming a 33% adversary is able to cause a safety failure that forces correct nodes to commit conflicting blocks in an even split (half commit blockA and half commit blockB), then it can split the correct nodes and generate two conflicting signatures by adding its own signature to both blockA and blockB (note: because the threshold is 67%, this requires just over 33% byzantine stake). In this case, the adversary can generate two feasible Warp signatures (i.e. there is a path of issued transactions that could generate them), but one of them is "invalid" because it was not actually committed on-chain.

Semantically Valid and Correctly Committed

Finally, we have the simplest case where the Subnet produces only valid messages. These are the messages resulting from the correct execution of the chain that has been irreversibly committed. This is the normal case where the Subnet maintains an honest majority and performs correctly.
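The split-vote arithmetic in the conflicts case above can be sanity-checked with a couple of lines:

```python
# Back-of-the-envelope check of the split-vote scenario: the threshold is
# 67%, the adversary holds b% of stake, and the honest (100 - b)% split
# evenly between two conflicting blocks. Each side then reaches
# (100 - b)/2 + b since the adversary signs both.
threshold = 67

def side_weight(b):
    return (100 - b) / 2 + b  # honest half plus the double-signing adversary

# Smallest integer stake share that pushes BOTH sides over the threshold:
b_min = next(b for b in range(101) if side_weight(b) >= threshold)
assert b_min == 34  # "just over 33%" byzantine stake suffices
```

So with whole-percent granularity, 34% byzantine stake is the minimum that lets both conflicting blocks reach the 67% threshold.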
Secondary System / Notarization

To defend against a Subnet producing invalid messages, all of the ideas outlined at the beginning are different instantiations of the same idea: leveraging a secondary system to verify the work of the primary system. For the secondary system to work, it needs to (1) verify the full state transition of every block and (2) ensure that even if the primary system equivocates, the secondary system chooses one option and does not notarize conflicts. If the secondary verifies anything less than this, then it's exposed to at least one of the invalid message types. There may be some blockchain applications where there's a meaningful difference between the severity of a bug that allows some, but not all, of the outlined message types through, but for most blockchains, any of the invalid message types is a critical failure. For that reason, I'll only consider the case where the secondary performs the full verification of each state transition and a consensus proof from the primary.

Assuming the secondary system verifies the full state transition and the consensus proof, we can construct a nice tradeoff by requiring a certificate (Warp signature) from both the primary and secondary. If destination chains require both signatures to verify, the secondary acts as an effective circuit breaker:

```mermaid
flowchart TB
    A[Primary] -->|accept block| B[Secondary]
    A -->|accept block| C[Warp message]
    B -->|notarize block| C
```
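The dual-certificate requirement on the destination chain can be sketched as follows, with signature verification stubbed out as set membership (all names hypothetical):

```python
# Sketch of the dual-certificate idea: a destination chain accepts a Warp
# message only if BOTH the primary's and the secondary's (notary's)
# certificates verify. Real verification would check BLS aggregate
# signatures; here it is stubbed as membership in a set of signed messages.
def verify_dual(message, primary_signed, secondary_signed):
    return message in primary_signed and message in secondary_signed

primary = {"msg-valid", "msg-forged-by-primary"}
secondary = {"msg-valid"}  # the notary re-executed and refused the forgery

assert verify_dual("msg-valid", primary, secondary)
# A compromised primary alone cannot get a message through:
assert not verify_dual("msg-forged-by-primary", primary, secondary)
```

The symmetry is the point: either party alone can halt cross-chain messages (liveness coupling), but neither alone can forge one.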
Now there are four different scenarios:

Primary and Secondary Function Correctly

If both work correctly, the system works perfectly. The secondary can aggregate a Warp signature over accepted blocks from the primary, re-execute the block, and produce its own signature. Without the secondary in the loop, someone already needed to aggregate a Warp signature from the primary to send a cross-chain message, so relying on the primary to perform this doesn't add any latency. The most expensive part will most likely be re-executing the block on the secondary. In the happy path, we get added latency based on the amount of time it takes to re-execute a block, so I'd expect this to be less than 1s of added latency.

Compromised Primary

If the primary is compromised, then it can produce an arbitrary stream of invalid blocks, either conflicting or completely invalid. However, it cannot get any of these invalid blocks past the secondary layer! The secondary layer will detect and reject any invalid blocks, and will select one block to accept if the primary produces a set of conflicting blocks.

Compromised or Failed Secondary

If the secondary fails, the system cannot produce cross-chain messages until it comes back online. If the secondary is compromised by a malicious actor, they cannot produce any invalid cross-chain messages without the participation of the primary as well. In this case, the primary acts as a circuit breaker instead of the reverse.

Primary and Secondary Compromised

If both the primary and secondary are compromised, then they control the full system, so they can produce arbitrary cross-chain messages at will.

Tradeoffs

A "notary" creates a nice set of tradeoffs for low economic security PoS Subnets. It drastically limits the worst-case fallout, and selecting a high-security secondary presents a number of different options. The most obvious is to use an L2 design that selects a high-security settlement chain like the C-Chain.
Another option is to deploy a quasi-bicameral system where the primary is a standard PoS blockchain and the secondary is a more limited, elected set of notaries (possibly a single party). The notary provides defense in depth, but is very limited in the harm it can cause independently. As long as there's an onchain fallback mechanism to replace a failed leader (probably requiring some social consensus or a manually triggered process), a notary failure would cause downtime on cross-chain messages, but the primary itself could continue to make progress. We could make use of secure hardware (optionally attesting onchain) to further improve practical security. The drawback of utilizing secure hardware like Intel SGX or AWS KMS is that it introduces a company like Intel or AWS into the critical dependencies of an application, which represents both a centralized point of failure and, in the case of Intel SGX, a long track record of security vulnerabilities. However, that's much less of an issue when it's used purely for defense in depth. In a notary design, using secure hardware is not critical to the safety of the application; it's a pure practical security improvement.
-
How do we want to think about the relationship between the Subnet and the notaries/observers? Currently, all design ideas require the Subnet to cooperate for observation or notarization. I want to bring up the idea of allowing anyone to observe a Subnet and either agree or object to any message coming from that Subnet. This would allow developers to interact with a Subnet even if they don't trust its validator set. They can define that they are willing to accept a message from an insecure Subnet if some observer / some set of observers does not object to it. This has the circuit breaker functionality mentioned earlier, but gives developers more control over what breaks the circuit. Generally, this would bring validators and observers closer together: both execute all state transitions and verify their validity. The validators need to fulfill the staking requirements, can propose new blocks, and receive rewards. Meanwhile, the observers only attest to outgoing messages. Depending on where we allow the developer to configure what breaks the circuit, it will introduce complexity at the precompile or teleporter level. Making this intuitive may be challenging.
-
Context
Avalanche Warp Messaging introduced a Subnet to Subnet communication protocol that is arguably as theoretically secure as possible for a cross-chain communication protocol. It is as secure as the validator sets of the two chains being connected. This is achieved by using a BLS aggregate signature from the source chain's validator set to authenticate the message on the destination chain. There is no trust put in additional parties external to the two chains. Theoretically, a cross-chain communication protocol cannot be more secure than the chains that it is connecting because if one of the chains is corrupted itself, it can make arbitrary updates to its state that affect cross-chain communication.
However, theoretical security does not always translate directly into practical security. On its own (without any cross-chain connections), if a chain's validator set is corrupted and arbitrary updates are made to its state, social consensus can be reached to fork the chain at a state prior to the corruption (as long as one snapshot copy of the state at that point is available). If that chain has connections to other chains, and its validator set is corrupted and cross-chain messages are forged from it, the effects can result in state changes on other chains that are unlikely to fork for the sake of the chain that was corrupted. As a concrete example, if funds are bridged from the C-Chain into a Subnet with only a few validators, and those funds are stolen due to the Subnet validators being corrupted, it is highly unlikely that the C-Chain would perform a fork for the sake of the small Subnet. This is also true of cross-ecosystem bridges between networks such as Avalanche, Ethereum, Solana, etc.
Considering that the initial validator set of many new Avalanche Subnets could feasibly be small, this draft proposal contains a few ideas for how the practical security of cross-chain messages sent from Subnets could be strengthened beyond their validator sets themselves. Of course, having the largest, most diverse, and most decentralized validator set possible remains the primary goal and recommendation for any blockchain.
Option 1: Warp Observers as a Service
This option involves individual validator nodes strengthening the BLS public key that they register on-chain.
Setup
A set of N parties run non-validator nodes for a given chain. They set up and publish a threshold BLS public key such that signatures from K of the N parties are required to assemble a valid signature under the published threshold BLS public key.
When a validator node of the chain is set up, it creates its own BLS key (as would occur currently), and then registers, as its BLS public key, the aggregation of its own public key and the threshold BLS public key from above. When that validator is later queried for its Warp signature of a given message, it queries the configured non-validator nodes to construct a signature of the message under the threshold BLS public key, and aggregates that signature with its own signature generated using its own BLS private key. Thus, in order to forge a Warp signature from this node, one would need to corrupt K of the N configured non-validator nodes as well as the node itself.
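The aggregation property this setup relies on can be illustrated with a toy model. This is NOT real BLS (no pairings, no hash-to-curve, no real verification); it only shows the additive homomorphism in the exponent that makes the registered key indistinguishable from any other key, and makes both key holders necessary to sign:

```python
# Toy model (modular exponentiation over a toy modulus, NOT real BLS) of
# aggregating a validator key with a threshold key: keys and signatures are
# group elements, and aggregation is multiplication (addition of exponents).
p = 2**61 - 1          # toy prime modulus
g = 5                  # toy generator
h_msg = 1234567        # stand-in for hash-to-group of the message

sk_validator, sk_threshold = 111, 222
pk_validator = pow(g, sk_validator, p)
pk_threshold = pow(g, sk_threshold, p)

# The registered key is the aggregation; it looks like any other pubkey.
pk_registered = (pk_validator * pk_threshold) % p
assert pk_registered == pow(g, sk_validator + sk_threshold, p)

# A "signature" is h^sk; multiplying the two partial signatures yields the
# signature under the aggregate key, so forging requires BOTH key holders.
sig_validator = pow(h_msg, sk_validator, p)
sig_threshold = pow(h_msg, sk_threshold, p)
sig_agg = (sig_validator * sig_threshold) % p
assert sig_agg == pow(h_msg, sk_validator + sk_threshold, p)
```

In real BLS the same structure holds over a pairing-friendly curve, which is why other validators cannot distinguish an aggregated registration from a plain one.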
Considerations
A nice property of this approach is that it does not require changes to the Warp protocol itself, or make the verification of aggregate Warp signatures on-chain any more expensive. As far as all other validators know, the BLS public key registered to a given node (generated as the aggregation of its own public key and the threshold public key), is the same as and indistinguishable from any other BLS public key.
This option also does not require any changes for those relaying Warp messages, since validators themselves are still directly queried for their BLS signatures, and those validators leveraging observer nodes assemble those nodes' signatures behind the scenes.
One complication of this approach that @patrick-ogrady called out is that AvalancheGo currently uses the BLS key for more than generating signatures of Warp messages, such as in peer handshakes. These other uses that require the node to have the correct BLS private key associated with its registered BLS public key on hand would need to be accounted for to make this approach possible.
Additionally, it is not known or provable which of a Subnet's validators are or are not using an observer node service. This makes it difficult to properly assess the practical security of the Subnet's validators.
Option 2: Registering BLS "observer keys" on the P-Chain
This option involves allowing Subnets to explicitly register BLS "observer keys" on the P-Chain that are not associated with any validator but are required to participate in an aggregate BLS signature, in addition to a sufficient threshold of the validator stake weight, in order for that aggregate BLS signature to be considered valid for a Warp message.
Setup
An `addObserverPublicKey` and a `removeObserverPublicKey` transaction type would each be added to the P-Chain. These transaction types would allow a Subnet to specify a BLS public key that must always be represented in a valid Warp aggregate signature. Only the Subnet itself would be able to add or remove observer keys for its Subnet. This could be done for permissioned Subnets using the Subnet owner keys, or for an elastic Subnet via Warp message.

Considerations
Registering an observer key for a Subnet on the P-Chain would require a proof-of-possession of that key, similar to what is required for `AddPermissionlessValidatorTx`s today.

The registered observer keys could be computed as more complex configurations than a single private key. For instance, the observer public key registered on-chain could be a threshold BLS key of a K-of-N setup similar to the one described in option 1. This could provide a level of fault tolerance for observer keys, such that a valid Warp signature could still be generated if certain observer keys are offline.
Those relaying Warp messages (such as the `awm-relayer`) would need to be aware of observer key requirements and know how to query for BLS signatures from the observer keys. There may need to be a discovery mechanism for relayers to be able to find the nodes/endpoints willing to provide observer key signatures.

Validating a Warp signature that requires observer keys in addition to sufficient stake weight of the validator set is a more expensive operation, because additional public keys need to be looked up and aggregated to validate the aggregate BLS signature. Virtual machines validating Warp signatures would need to know how many observer keys are registered in order to charge transaction fees proportional to the number of signers participating in an aggregate signature. This could potentially be done by requiring that the `Signers` bit vector of a `BitSetSignature` include 1's for all observer key indexes, which would need to be canonically ordered with the current validator set.

Since the observer keys are registered on-chain, it is easier for an observer to assess the practical security of a Subnet validator set, particularly if the holders of the observer keys make themselves known publicly.
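The bit-vector accounting described above could look roughly like this. This is a hypothetical sketch, not the actual `BitSetSignature` encoding; in particular, the LSB-first bit ordering and the observer indexes are assumptions for illustration:

```python
# Sketch of fee accounting over a Signers bit vector: the vector must set
# the bits for all observer-key indexes, and the fee scales with the total
# number of participating keys. Bit ordering here is LSB-first (assumed).
def count_signers(bitset: bytes) -> int:
    return sum(bin(b).count("1") for b in bitset)

def signers_include_observers(bitset: bytes, observer_indexes) -> bool:
    return all(bitset[i // 8] & (1 << (i % 8)) for i in observer_indexes)

# Hypothetical layout: 5 validators at indexes 0-4, 2 observer keys at 5-6.
bitset = bytes([0b01100111])            # validators 0, 1, 2 + observers 5, 6
assert signers_include_observers(bitset, [5, 6])
assert count_signers(bitset) == 5       # fee charged for 5 signers
```

A VM could reject any signature whose bit vector omits an observer index, and price the verification by `count_signers`.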
Conclusion
Adding additional non-validator BLS keys into the requirements to successfully validate a Warp signature could help improve the practical security of a Subnet's cross-chain messages beyond its validator set alone. This could be attractive for newer Subnets with relatively small initial validator sets.
There are multiple possible places and routes for these requirements to be added, each with their own pros and cons that should be discussed and considered prior to moving forward to a formal proposal.
Thanks to @patrick-ogrady for brainstorming the idea behind option #2 and the term "observer keys".