-
Notifications
You must be signed in to change notification settings - Fork 390
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade Refactors #999
Upgrade Refactors #999
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is still in draft, but I gave the PR a first read and I noticed one thing about the storage of upgrade timeout...
* refactor and fix bugs * fix packet flushing logic * remove unnecessary states
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @AdityaSripal. I left some comments/questions.
There are still some places in the spec where the old states (TRYUPGRADE
, ACKUPGRADE
) are still mention and also FlushStatus
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking excellent! Only reviewed up to the confirm step so far. Will continue later today. Only two issues which I noted offline:
- need to ensure counterparty upgrade timeout is stored even when we are in flushing
- explicitly handle crossing hellos case in try step
abortTransactionUnless(channel !== null) | ||
abortTransactionUnless(channel.state !== CLOSED && channel.flushStatus === NOTINFLUSH) | ||
abortTransactionUnless(channel.state === OPEN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is disabling optimistic sends. Is this intentional? Maybe it should be in a separate pr? The previous behaviour before channel upgradability changes was channel.state !== CLOSED
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, its necessary now but wasn't done before. I suppose i could do it in a separate pr and then merge conflict
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
up to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Only recommendation would be to have startFlushUpgradeHandshake
only abort (no restores)
Oh and see this #999 (comment)
// abort the transaction if the callback returns an error and | ||
// there was no existing upgrade. This will allow the counterparty upgrade |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// abort the transaction if the callback returns an error and | |
// there was no existing upgrade. This will allow the counterparty upgrade | |
// abort the transaction if the callback returns an error | |
// This will allow the counterparty upgrade |
// counterparty channel must be proved to still be in OPEN state or INITUPGRADE state (crossing hellos) | ||
abortTransactionUnless(counterpartyChannel.State === OPEN || counterpartyChannel.State == INITUPGRADE) | ||
// counterparty channel must be proved to not have completed flushing after timeout has passed | ||
abortTransactionUnless(counterpartyChannel.state !== OPEN || counterpartyChannel.state == FLUSHCOMPLETE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this check needs to be modified. This should only state that counterpartyChannel must be in any state except FLUSHCOMPLETE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It could have completed handshake and went to OPEN. added more logic to check if this indeed did happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see latest comment on this #999 (comment)
abortTransactionUnless(counterpartyChannel.State === OPEN || counterpartyChannel.State == INITUPGRADE) | ||
// counterparty channel must be proved to not have completed flushing after timeout has passed | ||
abortTransactionUnless(counterpartyChannel.state !== OPEN || counterpartyChannel.state == FLUSHCOMPLETE) | ||
abortTransactionUnless(counterpartyChannel.sequence === currentChannel.sequence) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just noting, in ibc-go we would need a proof guarentee. That is, don't allow an INIT if a channel been restored/upgraded in the same block
@@ -855,12 +885,12 @@ function timeoutChannelUpgrade( | |||
} | |||
``` | |||
|
|||
Note that the timeout logic only applies to the INIT step. This is to protect an upgrading chain from being stuck in a non-OPEN state if the counterparty cannot execute the TRY successfully. Once the TRY step succeeds, then both sides are guaranteed to have the upgrade feature enabled. Liveness is no longer an issue, because we can wait until liveness is restored to execute the ACK step which will move the channel definitely into an OPEN state (either a successful upgrade or a rollback). | |||
Both parties must not complete the upgrade handshake if the counterparty upgrade timeout has already passed. Even if both sides could have successfully moved to FLUSHCOMPLETE. This will prevent the channel ends from reaching incompatible states. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor nit: if the timeout has passed, but each chain moved to flush complete before the timeout passed. That's fine
|
||
The TRY chain will receive the timeout parameters chosen by the counterparty on INIT, so that it can reject any TRY message that is received after the specified timeout. This prevents the handshake from entering into an invalid state, in which the INIT chain processes a timeout successfully and restores its channel to `OPEN` while the TRY chain at a later point successfully writes a `TRY` state. | ||
Note that a channel upgrade handshake may never complete successfully if the in-flight packets cannot successfully be cleared. This can happen if the timeout value of a packet is too large, or an acknowledgement never arrives, or if there is a bug that makes acknowledging or timing out a packet impossible. In these cases, some out-of-protocol mechanism (e.g. governance) must step in to clear the packets "manually" perhaps by forcefully clearing the packet commitments before restarting the upgrade handshake. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
must step in to clear the packets "manually" perhaps by forcefully clearing the packet commitments before restarting the upgrade handshake.
historical proofs scare me with this statement 😱 Maybe we can leave off any suggestion until the situation arises
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like all the changes! I just have some confusion on how timeouts should work and I think if we write an error receipt for a lower counterparty sequence, the cancellation handler should be updated to only fast forward the sequence (and not delete the upgrade) in that situation
} | ||
|
||
// connectionHops can change in a channelUpgrade, however both sides must still be each other's counterparty. | ||
proposedConnection = provableStore.get(connectionPath(proposedUpgradeFields.connectionHops[0]) | ||
// since connection hops may be provided by relayer, we will abort to avoid changing state based on relayer-provided value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this comment can be removed?
// it will set the channel to desiredChannel state and move to flushing mode if we are not already in flushing mode | ||
// it will store the upgrade timeout in hte upgrade state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be updated
|
||
currentChannel.state = desiredChannelState | ||
currentChannel.flushState = FLUSHING | ||
`startFlushUpgradeHandshake` will set the counterparty last packet send and continue blocking the upgrade from continuing until all in-flight packets have been flushed. When the channel is in blocked mode, any packet receive above the counterparty last packet send will be rejected. It will set the channel state to `FLUSHING` and block `sendPackets`. During this time; `receivePacket`, `acknowledgePacket` and `timeoutPacket` will still be allowed and processed according to the original channel parameters. The state machine will set a timer for how long the other side can take before it completes flushing and moves to `FLUSHCOMPLETE`. The new proposed upgrade will be stored in the public store for counterparty verification. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super nit: feel like this could be called startFlushing
abortTransactionUnless(counterpartyChannel.state !== OPEN && counterpartyChannel.state !== FLUSHCOMPLETE) | ||
// if counterparty channel state is OPEN, we should abort only if the counterparty has successfully completed upgrade | ||
if counterpartyChannel.state === OPEN { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. You wrote this:
if counterpartyChannel.state == OPEN || counterpartyChanne.state == FLUSHCOMPLETE {
return err
}
if counterpartyChannel.state == OPEN {
}
} | ||
abortTransactionUnless(counterpartyChannel.sequence === currentChannel.sequence) | ||
abortTransactionUnless(verifyChannelState(connection, proofHeight, proofChannel, currentChannel.counterpartyPortIdentifier, currentChannel.counterpartyChannelIdentifier, counterpartyChannel)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A little odd you do proofs after the if statement. I feel like proof verification should happen first?
version: upgrade.fields.version, | ||
sequence: currentChannel.sequence, | ||
} | ||
abortTransactionUnless(counterpartyChannel != upgradedChannel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay so the logic here is that there's one specific counterparty channel state we don't want to restore on. Still feels a bit odd to prove the counterparty isn't in the upgraded state rather than proving the counterparty is still in the pre-upgrade state
if counterpartyUpgradeSequence < channel.sequence { | ||
errorReceipt = ErrorReceipt{ | ||
channel.sequence - 1, | ||
"sequence out of sync", // constant string changable by implementation | ||
} | ||
provableStore.set(channelUpgradeErrorPath(portIdentifier, channelIdentifier), errorReceipt) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we are adding this, I think you need to handle this in the cancellation message to not delete the upgrade but just fast forward the sequence
I think the app callbacks need to be more explicitly defined. Apps should ensure they are still processing on existing channel field information until the upgrade succeeds. Init/Try/Ack should probably just be used to validate proposed upgrade fields |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ACK, noting I gave @AdityaSripal my approval for merge offline
This PR makes the following changes: