Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade Refactors #999

Merged
merged 21 commits into from
Jul 27, 2023
Merged

Upgrade Refactors #999

merged 21 commits into from
Jul 27, 2023

Conversation

AdityaSripal
Copy link
Member

@AdityaSripal AdityaSripal commented Jul 6, 2023

This PR makes the following changes:

  • Remove notion of upgrade states and instead only add FLUSHING and FLUSHINGCOMPLETE to state enum
  • Make INITUpgrade authority gated
  • Start timers on TRY and ACK messages
  • Add ChanUpgradeConfirm to ensure both sides are aware of counterparty upgrade timeout before moving to FLUSHCOMPLETE
  • Add packet processing logic

Copy link
Contributor

@crodriguezvega crodriguezvega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this is still in draft, but I gave the PR a first read and I noticed one thing about the storage of upgrade timeout...

@AdityaSripal AdityaSripal marked this pull request as ready for review July 19, 2023 12:44
* refactor and fix bugs

* fix packet flushing logic

* remove unnecessary states
@AdityaSripal AdityaSripal changed the title Extend upgrade timeout Upgrade Refactors Jul 19, 2023
Copy link
Contributor

@crodriguezvega crodriguezvega left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @AdityaSripal. I left some comments/questions.

There are still some places in the spec where the old states (TRYUPGRADE, ACKUPGRADE) are still mention and also FlushStatus.

Copy link
Contributor

@colin-axner colin-axner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking excellent! Only reviewed up to the confirm step so far. Will continue later today. Only two issues which I noted offline:

  • need to ensure counterparty upgrade timeout is stored even when we are in flushing
  • explicitly handle crossing hellos case in try step

abortTransactionUnless(channel !== null)
abortTransactionUnless(channel.state !== CLOSED && channel.flushStatus === NOTINFLUSH)
abortTransactionUnless(channel.state === OPEN)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is disabling optimistic sends. Is this intentional? Maybe it should be in a separate pr? The previous behaviour before channel upgradability changes was channel.state !== CLOSED

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, its necessary now but wasn't done before. I suppose i could do it in a separate pr and then merge conflict

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up to you

spec/core/ics-004-channel-and-packet-semantics/README.md Outdated Show resolved Hide resolved
spec/core/ics-004-channel-and-packet-semantics/UPGRADES.md Outdated Show resolved Hide resolved
spec/core/ics-004-channel-and-packet-semantics/UPGRADES.md Outdated Show resolved Hide resolved
Copy link
Contributor

@colin-axner colin-axner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Only recommendation would be to have startFlushUpgradeHandshake only abort (no restores)

Oh and see this #999 (comment)

spec/core/ics-004-channel-and-packet-semantics/README.md Outdated Show resolved Hide resolved
spec/core/ics-004-channel-and-packet-semantics/README.md Outdated Show resolved Hide resolved
Comment on lines +580 to +581
// abort the transaction if the callback returns an error and
// there was no existing upgrade. This will allow the counterparty upgrade
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// abort the transaction if the callback returns an error and
// there was no existing upgrade. This will allow the counterparty upgrade
// abort the transaction if the callback returns an error
// This will allow the counterparty upgrade

spec/core/ics-004-channel-and-packet-semantics/UPGRADES.md Outdated Show resolved Hide resolved
// counterparty channel must be proved to still be in OPEN state or INITUPGRADE state (crossing hellos)
abortTransactionUnless(counterpartyChannel.State === OPEN || counterpartyChannel.State == INITUPGRADE)
// counterparty channel must be proved to not have completed flushing after timeout has passed
abortTransactionUnless(counterpartyChannel.state !== OPEN || counterpartyChannel.state == FLUSHCOMPLETE)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this check needs to be modified. This should only state that counterpartyChannel must be in any state except FLUSHCOMPLETE?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could have completed handshake and went to OPEN. added more logic to check if this indeed did happen

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see latest comment on this #999 (comment)

abortTransactionUnless(counterpartyChannel.State === OPEN || counterpartyChannel.State == INITUPGRADE)
// counterparty channel must be proved to not have completed flushing after timeout has passed
abortTransactionUnless(counterpartyChannel.state !== OPEN || counterpartyChannel.state == FLUSHCOMPLETE)
abortTransactionUnless(counterpartyChannel.sequence === currentChannel.sequence)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just noting, in ibc-go we would need a proof guarentee. That is, don't allow an INIT if a channel been restored/upgraded in the same block

@@ -855,12 +885,12 @@ function timeoutChannelUpgrade(
}
```

Note that the timeout logic only applies to the INIT step. This is to protect an upgrading chain from being stuck in a non-OPEN state if the counterparty cannot execute the TRY successfully. Once the TRY step succeeds, then both sides are guaranteed to have the upgrade feature enabled. Liveness is no longer an issue, because we can wait until liveness is restored to execute the ACK step which will move the channel definitely into an OPEN state (either a successful upgrade or a rollback).
Both parties must not complete the upgrade handshake if the counterparty upgrade timeout has already passed. Even if both sides could have successfully moved to FLUSHCOMPLETE. This will prevent the channel ends from reaching incompatible states.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor nit: if the timeout has passed, but each chain moved to flush complete before the timeout passed. That's fine


The TRY chain will receive the timeout parameters chosen by the counterparty on INIT, so that it can reject any TRY message that is received after the specified timeout. This prevents the handshake from entering into an invalid state, in which the INIT chain processes a timeout successfully and restores its channel to `OPEN` while the TRY chain at a later point successfully writes a `TRY` state.
Note that a channel upgrade handshake may never complete successfully if the in-flight packets cannot successfully be cleared. This can happen if the timeout value of a packet is too large, or an acknowledgement never arrives, or if there is a bug that makes acknowledging or timing out a packet impossible. In these cases, some out-of-protocol mechanism (e.g. governance) must step in to clear the packets "manually" perhaps by forcefully clearing the packet commitments before restarting the upgrade handshake.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

must step in to clear the packets "manually" perhaps by forcefully clearing the packet commitments before restarting the upgrade handshake.

historical proofs scare me with this statement 😱 Maybe we can leave off any suggestion until the situation arises

spec/core/ics-004-channel-and-packet-semantics/UPGRADES.md Outdated Show resolved Hide resolved
Copy link
Contributor

@colin-axner colin-axner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like all the changes! I just have some confusion on how timeouts should work and I think if we write an error receipt for a lower counterparty sequence, the cancellation handler should be updated to only fast forward the sequence (and not delete the upgrade) in that situation

}

// connectionHops can change in a channelUpgrade, however both sides must still be each other's counterparty.
proposedConnection = provableStore.get(connectionPath(proposedUpgradeFields.connectionHops[0])
// since connection hops may be provided by relayer, we will abort to avoid changing state based on relayer-provided value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment can be removed?

Comment on lines +300 to +301
// it will set the channel to desiredChannel state and move to flushing mode if we are not already in flushing mode
// it will store the upgrade timeout in hte upgrade state
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be updated


currentChannel.state = desiredChannelState
currentChannel.flushState = FLUSHING
`startFlushUpgradeHandshake` will set the counterparty last packet send and continue blocking the upgrade from continuing until all in-flight packets have been flushed. When the channel is in blocked mode, any packet receive above the counterparty last packet send will be rejected. It will set the channel state to `FLUSHING` and block `sendPackets`. During this time; `receivePacket`, `acknowledgePacket` and `timeoutPacket` will still be allowed and processed according to the original channel parameters. The state machine will set a timer for how long the other side can take before it completes flushing and moves to `FLUSHCOMPLETE`. The new proposed upgrade will be stored in the public store for counterparty verification.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: feel like this could be called startFlushing

Comment on lines 875 to 877
abortTransactionUnless(counterpartyChannel.state !== OPEN && counterpartyChannel.state !== FLUSHCOMPLETE)
// if counterparty channel state is OPEN, we should abort only if the counterparty has successfully completed upgrade
if counterpartyChannel.state === OPEN {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. You wrote this:

if counterpartyChannel.state == OPEN || counterpartyChanne.state == FLUSHCOMPLETE {
    return err
}

if counterpartyChannel.state == OPEN {

}

}
abortTransactionUnless(counterpartyChannel.sequence === currentChannel.sequence)
abortTransactionUnless(verifyChannelState(connection, proofHeight, proofChannel, currentChannel.counterpartyPortIdentifier, currentChannel.counterpartyChannelIdentifier, counterpartyChannel))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A little odd you do proofs after the if statement. I feel like proof verification should happen first?

version: upgrade.fields.version,
sequence: currentChannel.sequence,
}
abortTransactionUnless(counterpartyChannel != upgradedChannel)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so the logic here is that there's one specific counterparty channel state we don't want to restore on. Still feels a bit odd to prove the counterparty isn't in the upgraded state rather than proving the counterparty is still in the pre-upgrade state

spec/core/ics-004-channel-and-packet-semantics/UPGRADES.md Outdated Show resolved Hide resolved
Comment on lines +551 to +557
if counterpartyUpgradeSequence < channel.sequence {
errorReceipt = ErrorReceipt{
channel.sequence - 1,
"sequence out of sync", // constant string changable by implementation
}
provableStore.set(channelUpgradeErrorPath(portIdentifier, channelIdentifier), errorReceipt)
return
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we are adding this, I think you need to handle this in the cancellation message to not delete the upgrade but just fast forward the sequence

@colin-axner
Copy link
Contributor

colin-axner commented Jul 27, 2023

I think the app callbacks need to be more explicitly defined. Apps should ensure they are still processing on existing channel field information until the upgrade succeeds. Init/Try/Ack should probably just be used to validate proposed upgrade fields

@AdityaSripal AdityaSripal merged commit 9c842a4 into main Jul 27, 2023
@AdityaSripal AdityaSripal deleted the aditya/extend-upgrade-timeout branch July 27, 2023 16:06
Copy link
Contributor

@colin-axner colin-axner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK, noting I gave @AdityaSripal my approval for merge offline

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: Backlog
Development

Successfully merging this pull request may close these issues.

3 participants