Generalized timeout strategy #86

Closed
cwgoes opened this issue May 9, 2019 · 8 comments
Labels
brainstorming Open-ended brainstorming. tao Transport, authentication, & ordering layer.

Comments

@cwgoes
Contributor

cwgoes commented May 9, 2019

In many places in the IBC protocol, we will need timeouts to deal with the case when transactions are not committed on a chain (which could be caused by competitive fee markets, censorship, offline relayers, any number of factors - the protocol cannot know the cause so must be designed independently of it).

At minimum, we will need timeouts in:

  • Sending packets (handling a timeout on the sending chain if the packet is not committed)
  • Handshakes for connections & channels (resetting state if the handshake is not finished)

I think we can use the same general strategy in these cases.

General strategy: destination chain is live but datagram is censored

  • Source chain A, destination chain B
  • A sends a datagram to B
  • Assume B is live (making progress in consensus) but the transaction is not confirmed

Construct the protocol as follows:

  • Choose a field counter which is stored and signed over in each Header of B (example instantiations: BFT timestamp, block height). This field must only increase or remain constant in subsequent headers (it cannot decrease), although it need not be strictly increasing.
  • Include a maximumCounter field in the datagram sent from A to B
  • Construct the IBC handler logic to reject (refuse to execute) the datagram on B if counter > maximumCounter when the transaction is executed.
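
The handler check on B can be sketched as follows. This is a minimal illustration, not the actual IBC handler: the `Header`, `Datagram`, and `DatagramExpired` types and the `handle_datagram` function are hypothetical names, and the `executed` set stands in for whatever state B would really update.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    # Hypothetical header of chain B. `counter` could be a block height
    # or a BFT timestamp; it never decreases between headers.
    counter: int


@dataclass(frozen=True)
class Datagram:
    payload: bytes
    # Last counter value at which B may still execute this datagram.
    maximum_counter: int


class DatagramExpired(Exception):
    """Raised when a datagram reaches B after its timeout."""


def handle_datagram(current: Header, datagram: Datagram, executed: set) -> None:
    # Reject (refuse to execute) once B's counter has passed the deadline.
    if current.counter > datagram.maximum_counter:
        raise DatagramExpired(
            f"counter {current.counter} > maximumCounter {datagram.maximum_counter}"
        )
    # Stand-in for the real state transition on B.
    executed.add(datagram.payload)
```

Because the counter never decreases, a datagram that is rejected once will be rejected at every later header as well, which is what lets A treat the timeout as final.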

Then, when A sends the datagram to B, either:

  1. The datagram is executed and appropriate action taken while counter <= maximumCounter.
  2. The datagram is not executed before counter exceeds maximumCounter. Then a proof can be relayed back to A of the following:
    1. counter of B's ConsensusState is greater than maximumCounter
    2. The datagram has not been executed (e.g. handshake state unchanged, packet exclusion)

In case (1), A can proceed as normal. In case (2), A can safely reason that the datagram will never be executed on B and it can take appropriate action, e.g. unescrowing tokens, resetting a connection to an earlier state in the handshake, etc.
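
A's side of this decision can be sketched as below, assuming A's light client has already verified the relayed state of B. The `VerifiedStateOfB` type, the `Outcome` enum, and `resolve_datagram` are hypothetical names, and a plain set stands in for the real inclusion and exclusion proofs.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    EXECUTED = "executed"    # case (1): A proceeds as normal
    TIMED_OUT = "timed_out"  # case (2): A may unescrow tokens, reset handshake state, etc.
    PENDING = "pending"      # neither case is provable yet, so A keeps waiting


@dataclass(frozen=True)
class VerifiedStateOfB:
    # Assumed to come from a header of B that A has already verified.
    counter: int
    # Stand-in for proofs of execution or non-execution on B.
    executed: frozenset


def resolve_datagram(state: VerifiedStateOfB, datagram_id: str, maximum_counter: int) -> Outcome:
    if datagram_id in state.executed:
        return Outcome.EXECUTED
    # Not executed, and B's counter has passed the deadline: B will
    # reject the datagram from now on, so the timeout is final.
    if state.counter > maximum_counter:
        return Outcome.TIMED_OUT
    return Outcome.PENDING
```

Note that A never acts on its own clock here: it only moves to `TIMED_OUT` once B's own counter is shown to have passed maximumCounter.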

This may be sufficient, but if we need to concern ourselves with the case when B is not live at all, a different strategy will be required, since it will not be possible to relay a proof from B to A that counter has exceeded maximumCounter (no new headers are being produced). In that case, we may need to rely on weak liveness assumptions, an externally tied counter (e.g. timestamp), and a challenge period for headers to be submitted to A - but first we should decide if we want to design for that model.

@cwgoes cwgoes added tao Transport, authentication, & ordering layer. brainstorming Open-ended brainstorming. labels May 9, 2019
@cwgoes cwgoes changed the title General timeout strategy Generalized timeout strategy May 9, 2019
@cwgoes cwgoes self-assigned this May 9, 2019
@zmanian
Member

zmanian commented May 9, 2019

Here are my thoughts on timeouts.

Correct me if I am wrong, but the handshake should be the only base layer message sequence that should require a timeout.

I think timeouts should be local, i.e. when a blockchain A initiates a handshake, it should state a local height by which it will tear down all the state if the handshake hasn't completed. If the teardown happens, part of the teardown process should effectively send a reset packet/message that tears down intermediate state on the corresponding chain.

@cwgoes
Contributor Author

cwgoes commented May 9, 2019

Correct me if I am wrong, but the handshake should be the only base layer message sequence that should require a timeout.

Yes (there will be several handshakes - for creating & closing connections & channels, at minimum) - and of course packet timeouts (not sure if you consider those base layer messages).

I think timeouts should be local, i.e. when a blockchain A initiates a handshake, it should state a local height by which it will tear down all the state if the handshake hasn't completed. If the teardown happens, part of the teardown process should effectively send a reset packet/message that tears down intermediate state on the corresponding chain.

Hmm, I'm not sure this will work for the final datagram in a handshake, since there is no reply expected. I can see why it might be advantageous for handshakes, though, since it works if B is not live (no consensus progress).

I'm quite sure this won't work for packets, because it doesn't provide atomicity guarantees (possible that A fires the timeout and B eventually commits the first datagram anyways).
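
To make the atomicity concern concrete, here is a toy comparison of the two schemes. It is only a sketch under simplifying assumptions: both function names and all parameters are hypothetical, and each chain's behaviour is reduced to a pair of booleans `(refunded_on_a, executed_on_b)`.

```python
def local_timeout_outcome(a_timeout_height: int, a_height_now: int, b_commits_datagram: bool):
    """Local-only scheme: A decides from its own height, and B has no
    deadline to check, so it accepts the datagram whenever it arrives."""
    refunded_on_a = a_height_now >= a_timeout_height
    executed_on_b = b_commits_datagram
    return refunded_on_a, executed_on_b


def counter_timeout_outcome(maximum_counter: int, b_counter_at_arrival: int, b_counter_proven_to_a: int):
    """Counter scheme: B rejects late datagrams, and A refunds only after
    seeing proof that B's counter has passed maximumCounter."""
    executed_on_b = b_counter_at_arrival <= maximum_counter
    refunded_on_a = (not executed_on_b) and b_counter_proven_to_a > maximum_counter
    return refunded_on_a, executed_on_b


# Local-only: A fires its timeout, then B commits the datagram anyway.
print(local_timeout_outcome(100, 100, True))   # both True: refund and execution
# Counter scheme: a late arrival is rejected on B, so only the refund happens.
print(counter_timeout_outcome(50, 60, 61))
```

In the second function the two results can never both be true, because the refund is conditioned on B not having executed the datagram.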

This was referenced May 13, 2019
@jackzampolin
Member

Maybe we need to have a different timeout scheme for packets then?

@ethanfrey
Contributor

ethanfrey commented May 13, 2019

This is the same timeout scheme (albeit generalized with counter instead of block height) presented in both IBC papers I wrote at Cosmos. Both Zaki and Jae seemed quite happy with the above scheme for packet timeouts. In fact, it was actually Jae's idea (written on a napkin), which I formalized. I wonder why this is now being questioned.

As to the case where B is not live (halted indefinitely, hard fork), I think it would be almost impossible to objectively prove anything, and it would open the door for people profiting from network partitions and double spending over IBC.

Rather than use a timeout there, it would be interesting to handle this in the case where the chain has fully died, such that the IBC connection is torn down with transactions pending and balances transferred. I think that is a hard problem to solve, and it would also take care of the B-not-live case without adding timing assumptions into IBC.

@cwgoes
Contributor Author

cwgoes commented May 13, 2019

Maybe we need to have a different timeout scheme for packets then?

For packets, we definitely need a scheme which is safe under asynchrony (of transaction confirmation).

Even for the handshake, I would favor this property, since it allows us to make strong claims about the possible states of both chains that we cannot (under asynchrony) with local-only timeouts.

This is the same timeout scheme (albeit generalized with counter instead of block height) presented in both IBC papers I wrote at Cosmos. Both Zaki and Jae seemed quite happy with the above scheme for packet timeouts. In fact, it was actually Jae's idea (written on a napkin), which I formalized. I wonder why this is now being questioned.

I don't think it is being questioned - and indeed, the essence of the scheme hasn't changed from your original writeup (or perhaps from the napkin, that predates me) - just writing it down in case anyone had comments or other ideas.

Rather than use a timeout there, it would be interesting to handle this in the case where the chain has fully died, such that the IBC connection is torn down with transactions pending and balances transferred. I think that is a hard problem to solve, and it would also take care of the B-not-live case without adding timing assumptions into IBC.

Yes - I think this will be covered under #6, but is a bit lower priority.

@ethanfrey
Contributor

@cwgoes I like the issue write-up, and wanted to just comment on the liveness issue. However, I was a bit surprised by this comment:

Correct me if I am wrong, but the handshake should be the only base layer message sequence that should require a timeout.

And wanted to challenge that. IBC application packets require timeouts most of all. I think Handshakes have a natural timeout of the validity of the validator set (unbonding period), and I'm not even sure if we need to add a stricter timeout besides those embedded in light-client proofs.

@zmanian
Member

zmanian commented May 15, 2019

I think we mean different things. I'm saying that a partially complete handshake should time out. The natural timeout on a connection is the unbonding time of the counterparty chain.

My understanding of the protocol is focused on interactions where request and response cycles are needed to transition a state machine at base layer for a connection. Only connection and channel level handshakes should require these timeouts.

Higher-level protocols like token transfers etc. may need to define appropriate timeouts as well, but I view this as a higher protocol layer.

@cwgoes cwgoes removed their assignment Jun 8, 2019
@cwgoes
Contributor Author

cwgoes commented Aug 24, 2019

We do this everywhere now.
