Generalized timeout strategy #86

cwgoes · 2019-05-09T11:07:42Z

In many places in the IBC protocol, we will need timeouts to deal with the case when transactions are not committed on a chain (which could be caused by competitive fee markets, censorship, offline relayers, any number of factors - the protocol cannot know the cause so must be designed independently of it).

At minimum, we will need timeouts in:

Sending packets (handling a timeout on the sending chain if the packet is not committed)
Handshakes for connections & channels (resetting state if the handshake is not finished)

I think we can use the same general strategy in these cases.

General strategy: destination chain is live but datagram is censored

Source chain A, destination chain B
A sends a datagram to B
Assume B is live (making progress in consensus) but the transaction is not confirmed

Construct the protocol as follows:

Choose a field counter which is stored and signed over in each Header of B (example instantiations: BFT timestamp, block height). This field must never decrease from one header to the next, although it need not strictly increase (it may remain constant).
Include a maximumCounter field in the datagram sent from A to B
Construct the IBC handler logic to reject (refuse to execute) the datagram on B if counter > maximumCounter when the transaction is executed.
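The three steps above can be sketched roughly as follows. This is only an illustration in Python, not the IBC handler itself: `Header`, `Datagram`, `ChainB`, `advance`, and `handle_datagram` are hypothetical names, and counter is modelled as a plain integer (a block height, say).

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Header:
    # Hypothetical header of chain B; `counter` never decreases between headers.
    counter: int


@dataclass(frozen=True)
class Datagram:
    # Hypothetical datagram sent from A to B.
    datagram_id: str
    maximum_counter: int  # deadline chosen by the sender on A


@dataclass
class ChainB:
    header: Header
    executed: set = field(default_factory=set)

    def advance(self, new_counter: int) -> None:
        # Enforce the "only increase or remain constant" rule for the counter.
        if new_counter < self.header.counter:
            raise ValueError("counter must not decrease")
        self.header = Header(new_counter)

    def handle_datagram(self, datagram: Datagram) -> bool:
        # Reject (refuse to execute) once counter > maximumCounter.
        if self.header.counter > datagram.maximum_counter:
            return False
        self.executed.add(datagram.datagram_id)
        return True
```

Because the counter cannot decrease, a datagram that has been rejected once for this reason will be rejected on every later attempt, which is what lets A treat the timeout as final.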

Then, when A sends the datagram to B, either:

The datagram is executed and appropriate action taken while counter <= maximumCounter.
The datagram is not executed before counter exceeds maximumCounter. Then a proof can be relayed back to A of the following:
1. counter of B's ConsensusState is greater than maximumCounter
2. The datagram has not been executed (e.g. handshake state unchanged, packet exclusion)

In the first case, A can proceed as normal. In the second case, A can safely conclude that the datagram will never be executed on B and can take appropriate action, e.g. unescrowing tokens, resetting a connection to an earlier state in the handshake, etc.
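A rough sketch of the check A would make before acting on a timeout, under the assumption that A has already verified a view of B's state through its light client. `VerifiedStateOfB`, `PendingDatagram`, `timeout_proven`, and `tokens_to_unescrow` are hypothetical names, and the set of executed IDs stands in for a real inclusion or exclusion proof.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerifiedStateOfB:
    # Hypothetical view of B that A has already verified via its light client.
    counter: int
    executed_ids: frozenset  # stand-in for a real (non-)inclusion proof


@dataclass(frozen=True)
class PendingDatagram:
    datagram_id: str
    maximum_counter: int
    escrowed_tokens: int


def timeout_proven(pending: PendingDatagram, state: VerifiedStateOfB) -> bool:
    # Condition 1: B's counter is greater than maximumCounter.
    past_deadline = state.counter > pending.maximum_counter
    # Condition 2: the datagram has not been executed on B.
    not_executed = pending.datagram_id not in state.executed_ids
    return past_deadline and not_executed


def tokens_to_unescrow(pending: PendingDatagram, state: VerifiedStateOfB) -> int:
    # A releases the escrow only once both conditions hold.
    return pending.escrowed_tokens if timeout_proven(pending, state) else 0
```

Both conditions are needed: a counter past the deadline alone says nothing about whether the datagram was executed in time, and non-execution alone does not rule out a later execution.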

This may be sufficient, but if we need to concern ourselves with the case when B is not live at all, a different strategy will be required, since it will not be possible to relay a proof from B to A that counter has exceeded maximumCounter (no new headers are being produced). In that case, we may need to rely on weak liveness assumptions, an externally tied counter (e.g. timestamp), and a challenge period for headers to be submitted to A - but first we should decide if we want to design for that model.


zmanian · 2019-05-09T22:30:42Z

Here are my thoughts on timeouts.

Correct me if I am wrong, but the handshake should be the only base layer message sequence that should require a timeout.

I think timeouts should be local, i.e. when a blockchain A initiates a handshake, it should state a local height by which it will tear down all the state if the handshake hasn't completed. If the teardown happens, part of the teardown process should effectively send a reset packet/message that tears down intermediate state on the corresponding chain.

cwgoes · 2019-05-09T22:59:10Z

> Correct me if I am wrong, but the handshake should be the only base layer message sequence that should require a timeout.

Yes (there will be several handshakes - for creating & closing connections & channels, at minimum) - and of course packet timeouts (not sure if you consider those base layer messages).

> I think timeouts should be local, i.e. when a blockchain A initiates a handshake, it should state a local height by which it will tear down all the state if the handshake hasn't completed. If the teardown happens, part of the teardown process should effectively send a reset packet/message that tears down intermediate state on the corresponding chain.

Hmm, I'm not sure this will work for the final datagram in a handshake, since there is no reply expected. I can see why it might be advantageous for handshakes, though, since it works if B is not live (no consensus progress).

I'm quite sure this won't work for packets, because it doesn't provide atomicity guarantees (possible that A fires the timeout and B eventually commits the first datagram anyways).
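To make the atomicity concern concrete, here is a deliberately simplified model comparing the two schemes. All names (`Outcome`, `local_timeout`, `destination_timeout`) are hypothetical, heights and counters are plain integers, and the model assumes the datagram is eventually committed on B and that any timeout proof is eventually relayed to A.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    refunded_on_a: bool
    executed_on_b: bool

    @property
    def atomic(self) -> bool:
        # Exactly one of "refund on A" and "execute on B" may happen.
        return self.refunded_on_a != self.executed_on_b


def local_timeout(a_deadline: int, a_height_at_commit: int) -> Outcome:
    # A refunds as soon as its own height passes its own deadline.
    # B has no deadline to check, so it executes whenever the datagram lands.
    return Outcome(refunded_on_a=a_height_at_commit > a_deadline,
                   executed_on_b=True)


def destination_timeout(maximum_counter: int, b_counter_at_commit: int) -> Outcome:
    # B executes only while its counter has not passed maximumCounter.
    executed = b_counter_at_commit <= maximum_counter
    # A refunds only after a proof that B passed the deadline without executing.
    return Outcome(refunded_on_a=not executed, executed_on_b=executed)
```

In this model, `local_timeout(5, 8)` yields a refund on A and an execution on B at the same time, while `destination_timeout` yields exactly one of the two for any pair of inputs.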

jackzampolin · 2019-05-13T20:35:11Z

Maybe we need to have a different timeout scheme for packets then?

ethanfrey · 2019-05-13T20:40:24Z

This is the same timeout scheme (albeit generalized with counter instead of block height) presented in both IBC papers I wrote at Cosmos. Both Zaki and Jae seemed quite happy with the above scheme for packet timeouts. In fact it was actually Jae's idea (written on a napkin) which I formalized. I wonder why this is now being questioned.

As to the case where B is not live (halted indefinitely, hard fork), I think it would be almost impossible to objectively prove anything, and it would open the door for people profiting from network partitions and double-spending over IBC.

Rather than use a timeout there, it would be interesting to handle this in the case where the chain has fully died, such that the IBC connection is torn down with transactions pending and balances transferred. I think that is a hard problem to solve, and it would also take care of the B-not-live case without adding timing assumptions into IBC.

cwgoes · 2019-05-13T20:53:39Z

> Maybe we need to have a different timeout scheme for packets then?

For packets, we definitely need a scheme which is safe under asynchrony (of transaction confirmation).

Even for the handshake, I would favor this property, since it allows us to make strong claims about the possible states of both chains that we cannot (under asynchrony) with local-only timeouts.

> This is the same timeout scheme (albeit generalized with counter instead of block height) presented in both IBC papers I wrote at Cosmos. Both Zaki and Jae seemed quite happy with the above scheme for packet timeouts. In fact it was actually Jae's idea (written on a napkin) which I formalized. I wonder why this is now being questioned.

I don't think it is being questioned - and indeed, the essence of the scheme hasn't changed from your original writeup (or perhaps from the napkin, that predates me) - just writing it down in case anyone had comments or other ideas.

> Rather than use a timeout there, it would be interesting to handle this in the case where the chain has fully died, such that the IBC connection is torn down with transactions pending and balances transferred. I think that is a hard problem to solve, and it would also take care of the B-not-live case without adding timing assumptions into IBC.

Yes - I think this will be covered under #6, but is a bit lower priority.

ethanfrey · 2019-05-14T19:38:53Z

@cwgoes I like the issue write-up, and wanted to just comment on the liveness issue. However, I was a bit surprised by this comment:

> Correct me if I am wrong, but the handshake should be the only base layer message sequence that should require a timeout.

And wanted to challenge that. IBC application packets require timeouts most of all. I think Handshakes have a natural timeout of the validity of the validator set (unbonding period), and I'm not even sure if we need to add a stricter timeout besides those embedded in light-client proofs.

zmanian · 2019-05-15T03:45:18Z

I think we mean different things. I'm saying that a partially complete handshake should time out. The natural timeout on a connection is the unbonding time of the counterparty chain.

My understanding of the protocol is focused on interactions where request and response cycles are needed to transition a state machine at the base layer for a connection. Only connection- and channel-level handshakes should require these timeouts.

Higher-level protocols like token transfers etc. may need to define appropriate timeouts as well, but I view this as a higher protocol layer.

cwgoes · 2019-08-24T19:23:16Z

We do this everywhere now.

cwgoes added the tao (Transport, authentication, & ordering layer) and brainstorming (Open-ended brainstorming) labels May 9, 2019

cwgoes changed the title ~~General timeout strategy~~ Generalized timeout strategy May 9, 2019

cwgoes mentioned this issue May 9, 2019

ICS 3: Connection Semantics #32

Merged

cwgoes self-assigned this May 9, 2019

cwgoes mentioned this issue May 9, 2019

ICS 25: Handler interface #79

Merged

This was referenced May 13, 2019

ICS 2: Consensus Verification #20

Merged

ICS 9: IBC timeouts #4

Closed

cwgoes removed their assignment Jun 8, 2019

cwgoes closed this as completed Aug 24, 2019

tankcdr added this to IBC-GO Eureka Nov 19, 2024

github-project-automation bot moved this to Backlog in IBC-GO Eureka Nov 19, 2024
