Add real text on error handling #335

martinthomson · 2017-02-23T00:26:45Z

This covers two things:

the general operation of Public Reset packets and their role in the protocol, including two separate admonishments not to use Public Reset unless you don't have connection state
how to generate errors for the connection and for streams

On the former, I want to add ekr's proposed public reset design in a separate PR.

On the latter, I've cribbed from HTTP/2 a little, but the advice here turns out to be subtly different.

Closes #289, #47.

MikeBishop

More special cases than I like, but a good direction.

MikeBishop · 2017-02-23T00:37:32Z

draft-ietf-quic-transport.md

+
+An error on stream 1 MUST be handled as a connection error.  Stream 1 is
+critical to the functioning of the entire connection.  Furthermore, some
+application protocols place constraints on how streams are used that makes


SVA: "constraints ... that make"

MikeBishop · 2017-02-23T00:39:41Z

draft-ietf-quic-transport.md

+affected stream.
+
+An error on stream 1 MUST be handled as a connection error.  Stream 1 is
+critical to the functioning of the entire connection.  Furthermore, some


Would it make sense to tease this apart and say that if Stream 1 is closed in any way, that's a connection error? You don't necessarily need to special-case elevation of the error that impacted stream 1 to connection-level then, but it's still permitted.

MikeBishop · 2017-02-23T00:42:00Z

draft-ietf-quic-transport.md

+certain streams critical to the functioning of the protocol.  Termination of a
+critical stream could result in a fatal error in an application protocol.
+Errors in such critical streams MUST be changed into a error that closes the
+entire connection.


This shouldn't be a transport requirement. If an application has such constraints, it needs to terminate the connection upon seeing the stream close. Otherwise, the application needs an API to "bless" certain streams. This parallels the above comment about stream 1 -- permit stream errors to be stream errors, but if you depend on a certain stream surviving and it doesn't, it's your job to kill the connection.

To this point, I thought I had text in HTTP saying that if the Connection Control stream closes, you MUST close the connection, but I don't see it in master. Maybe it's buried in a PR?

MikeBishop · 2017-02-23T00:43:12Z

draft-ietf-quic-transport.md

@@ -2022,7 +2071,58 @@ send a WINDOW_UPDATE frame at least two roundtrips before it expects the sender
 to get blocked.


-# Error Codes {#error-handling}


You removed this anchor, which is referenced elsewhere in the text.

ekr · 2017-02-23T01:32:10Z

draft-ietf-quic-transport.md

+It is sent by a middlebox to notify an endpoint that the path is no longer
+functional.  Such a Public Reset does not include a Public Reset Proof, it
+includes only the connection ID, which provides an assurance that the middlebox
+received packets on the flow.


I think I would handle this with a structured proof instead. I.e.,

struct {
ProofType type;
opaque proof[len];
}

Then we can require the proof to contain the conn id or whatever. As I think I mentioned, proofs from non-participants should probably instead contain something derived from the packet, which includes a MAC and so is guaranteed to be high entropy.

Also, it's not clear to me we want to require conn ids to be high enough entropy to make receipt

You have to put something in the field. And while I agree that it's not necessarily true that a connection ID sufficient, it is something that a recipient of a PR will have and can use. The same goes for packet numbers, which I anticipate adding to this.

I hope that there is only ever a need for one type of proof per version. That suggests that switching on type isn't necessary. Can you justify having multiple types of proof within the same version?

Note that this all anticipates merging something like Jana's most recent header proposal to some extent.

As I understand it, it's quite likely that servers will not be sending the connection ID at all for many connections because the host/port is sufficient, so it's not clear that you have to put something in this field.

As far as having multiple kinds of proof, this PR in fact anticipates that, with one for an endpoint and one for middleboxes, it just labels the endpoint one proof and the middlebox one connection ID.

Jana's proposal has a strategy for that. If you don't like it, then we can continue to work on that.

As I think I indicated in my email, I don't like it....

I think there's an issue in the way this is stated. There's no way for a client to distinguish between a PR generated by a server from that generated by a middlebox. I think all we want to say that a valid PR MUST include proof that the PR was generated in response to a client packet. There's no way to say who's generating this because both middleboxes and "new" servers will know exactly the same things about the packet. (A middlebox may know more about the connection.)

ekr · 2017-02-23T01:34:05Z

draft-ietf-quic-transport.md

+After receiving a Public Reset that omits the Public Reset Proof, an endpoint
+SHOULD stop sending on the path on which the Public Reset was received.  Packets
+that are received on the path MAY be discarded.  Without support for multipath,
+this means that the connection could become unusable.


s/could/will/ right?

I'm not sure getting clever here helps.

could recognizes the problem that the next paragraph covers

ekr · 2017-02-23T01:34:19Z

draft-ietf-quic-transport.md

+packets continue to arrive after a Public Reset, this could indicate an attack.
+Delays could mean that a few packets arrive after a Public Reset, but if this
+continues well beyond the observed bounds on jitter, an endpoint MAY consider
+the Public Reset to have been spurious.


I feel like more guidance may be needed here.

Happy to accept that guidance if you have it. I was going to say something about round trips or packets, but it's not especially easy to pick an answer or even general guidance.

MikeBishop · 2017-02-23T01:43:35Z

draft-ietf-quic-transport.md

@@ -2165,6 +2169,9 @@ QUIC_MULTIPLE_TERMINATION_OFFSETS (0x80000005):
 QUIC_STREAM_CANCELLED (0x80000006):
 : The stream was cancelled

+QUIC_CLOSED_CRITICAL_STREAM (0x80000007):
+: A stream that is critical to the protocol was closed or reset.


Minor, but I think you can just say "closed," as "reset" is a type of closure.

ekr · 2017-02-23T03:07:43Z

On Wed, Feb 22, 2017 at 7:06 PM, Martin Thomson ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In draft-ietf-quic-transport.md <#335 (comment)>: > +include proof that the sender participated in the initial connection handshake. +It is sent by a middlebox to notify an endpoint that the path is no longer +functional. Such a Public Reset does not include a Public Reset Proof, it +includes only the connection ID, which provides an assurance that the middlebox +received packets on the flow. + +After receiving a Public Reset that omits the Public Reset Proof, an endpoint +SHOULD stop sending on the path on which the Public Reset was received. Packets +that are received on the path MAY be discarded. Without support for multipath, +this means that the connection could become unusable. + +A middlebox might send a packet in an attempt to attack a connection. If +packets continue to arrive after a Public Reset, this could indicate an attack. +Delays could mean that a few packets arrive after a Public Reset, but if this +continues well beyond the observed bounds on jitter, an endpoint MAY consider +the Public Reset to have been spurious. Happy to accept that guidance if you have it. I was going to say something about round trips or packets, but it's not especially easy to pick an answer or even general guidance.

If that's true, then we just shouldn't standardize this feature at all. I note that we don't seem to have anything like WG consensus so far that middleboxes should be able to do this.

…

— You are receiving this because you commented. Reply to this email directly, view it on GitHub <#335 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/ABD1oQ2s3H3YiuYbmaXVM-5H8kuesWXNks5rfPeagaJpZM4MJYSu> .

martinthomson · 2017-02-23T03:29:06Z

I've added some changes that are stricter on unauthenticated public resets. Firstly, any unauthenticated public reset can be ignored. Secondly, I've used rttvar in detecting spurious resets. If you get a reset then continue to get valid packets after more than 2*rttvar, then carry on and mark the path as a den of scum and middleboxes of the worst moral fiber.

janaiyengar

I think we should require a proof that's always possible to generate for a server/on-path-middlebox, and treat them exactly the same at the client. I don't think there's a reason for the client to do anything with a PR that doesn't include a proof if the proof MUST be generated by the server.

janaiyengar · 2017-02-23T07:07:47Z

draft-ietf-quic-transport.md

+depends on the entity that generates the packet.
+
+A Public Reset packet sent by an endpoint indicates that it does not have the
+state necessary to continue with a connection.  In this case, the endpoint will


Suggestion: "... it does not have the state necessary to respond otherwise to the received packet."

janaiyengar · 2017-02-23T07:14:27Z

draft-ietf-quic-transport.md

+It is sent by a middlebox to notify an endpoint that the path is no longer
+functional.  Such a Public Reset does not include a Public Reset Proof, it
+includes only the connection ID, which provides an assurance that the middlebox
+received packets on the flow.


I think there's an issue in the way this is stated. There's no way for a client to distinguish between a PR generated by a server from that generated by a middlebox. I think all we want to say that a valid PR MUST include proof that the PR was generated in response to a client packet. There's no way to say who's generating this because both middleboxes and "new" servers will know exactly the same things about the packet. (A middlebox may know more about the connection.)

janaiyengar · 2017-02-23T07:15:19Z

draft-ietf-quic-transport.md

+received packets on the flow.
+
+An endpoint MAY ignore a Public Reset that does not include a proof.  A
+middlebox might send a packet in an attempt to attack a connection.  If packets


Earlier text suggests that a middlebox may send a PR, and this suggests that such sending may be an attack. How's a client to know?

janaiyengar · 2017-02-23T07:19:32Z

draft-ietf-quic-transport.md

+middlebox might send a packet in an attempt to attack a connection.  If packets
+continue to arrive after a Public Reset, this could indicate an attack.  Delays
+could mean that a few packets arrive after a Public Reset.  However, if valid
+packets continue to arrive more than twice the RTT variance (see


I don't think this recommendation is appropriate. I'd suggest that we discuss exactly what should happen here in a separate issue. Basically my concern is (i) there's no reason to believe that the server has any packets coming to the client, so waiting for some time may not make any sense, (ii) you could have the client force something by sending a PING frame to the server and then wait an RTO before closing the connection. At this point I think saying something like "a client MAY wait to see if the PR is legitimate" seems adequate.

That said, I don't see why a client needs to handle both a PR with proof and without proof. I'd lean towards making a proof mandatory (proof TBD) and then ignore PRs without proof. In my header proposal, the proof is simply the first 8 bytes of the packet.

martinduke · 2017-02-23T20:23:08Z

Crazy idea alert: what is the value of having a separate CONNECTION_CLOSE frame at all, rather than adding an error code to Public Reset and doing all abrupt termination that way?

If the Reset is lost, the next packet from the peer will be for an unknown connection ID and trigger a Public Reset anyway. So reliablility isn't a big issue.

Pros: - It's simpler
-solves the problem of telling middleboxes when to delete their state.

Cons:

If the peer is idle for a long time after a lost Public Reset, it could maintain connection state for quite a while, whereas a retransmitted CONNECTION_CLOSE would have closed everything out. But with PING et al this is probably rare.
You need some kind of limiter to avoid a window of packets coming from the peer resulting in a storm of Public Resets. But we already have to deal with this in TCP (and will have to in other QUIC scenarios) and it seems to work OK.

MikeBishop · 2017-02-23T20:28:29Z

So the idea would be more like: * GOAWAY => no new streams created – connection is terminal * Service all existing streams * When the last stream closes and all data is ACK’d, discard your state and close silently For abrupt departures, use a Public Reset instead? Or are you suggesting that rather than closing silently, endpoints should emit a Public Reset packet once they’re totally done, and that’s the middleboxes’ signal to tear down state? From: martinduke [mailto:[email protected]] Sent: Thursday, February 23, 2017 12:23 PM To: quicwg/base-drafts <[email protected]> Cc: Mike Bishop <[email protected]>; Comment <[email protected]> Subject: Re: [quicwg/base-drafts] Add real text on error handling (#335) Crazy idea alert: what is the value of having a separate CONNECTION_CLOSE frame at all, rather than adding an error code to Public Reset and doing all abrupt termination that way? If the Reset is lost, the next packet from the peer will be for an unknown connection ID and trigger a Public Reset anyway. So reliablility isn't a big issue. Pros: - It's simpler -solves the problem of telling middleboxes when to delete their state. Cons: * If the peer is idle for a long time after a lost Public Reset, it could maintain connection state for quite a while, whereas a retransmitted CONNECTION_CLOSE would have closed everything out. But with PING et al this is probably rare. * You need some kind of limiter to avoid a window of packets coming from the peer resulting in a storm of Public Resets. But we already have to deal with this in TCP (and will have to in other QUIC scenarios) and it seems to work OK. — You are receiving this because you commented. Reply to this email directly, view it on GitHub<#335 (comment)>, or mute the thread<https://github.com/notifications/unsubscribe-auth/AEE2hcnLQgQegTGDGsURs67kZCLVAtFxks5rfeqtgaJpZM4MJYSu>.

martinduke · 2017-02-23T20:33:00Z

Well this suggestion was focused on the abrupt termination case that Martin T. outlines in the PR. I am not sure at all where we have landed on what a clean close looks like, but it seems like the messages prior to CONNECTION_CLOSE are doing all the work. Sending a public reset when all that stuff is done would certainly help.

…

On Thu, Feb 23, 2017 at 12:28 PM, Mike Bishop ***@***.***> wrote: So the idea would be more like: * GOAWAY => no new streams created – connection is terminal * Service all existing streams * When the last stream closes and all data is ACK’d, discard your state and close silently For abrupt departures, use a Public Reset instead? Or are you suggesting that rather than closing silently, endpoints should emit a Public Reset packet once they’re totally done, and that’s the middleboxes’ signal to tear down state? From: martinduke ***@***.*** Sent: Thursday, February 23, 2017 12:23 PM To: quicwg/base-drafts ***@***.***> Cc: Mike Bishop ***@***.***>; Comment < ***@***.***> Subject: Re: [quicwg/base-drafts] Add real text on error handling (#335) Crazy idea alert: what is the value of having a separate CONNECTION_CLOSE frame at all, rather than adding an error code to Public Reset and doing all abrupt termination that way? If the Reset is lost, the next packet from the peer will be for an unknown connection ID and trigger a Public Reset anyway. So reliablility isn't a big issue. Pros: - It's simpler -solves the problem of telling middleboxes when to delete their state. Cons: * If the peer is idle for a long time after a lost Public Reset, it could maintain connection state for quite a while, whereas a retransmitted CONNECTION_CLOSE would have closed everything out. But with PING et al this is probably rare. * You need some kind of limiter to avoid a window of packets coming from the peer resulting in a storm of Public Resets. But we already have to deal with this in TCP (and will have to in other QUIC scenarios) and it seems to work OK. — You are receiving this because you commented. Reply to this email directly, view it on GitHub<https://github.com/ quicwg/base-drafts#335#issuecomment-282109796>, or mute the thread< https://github.com/notifications/unsubscribe-auth/ AEE2hcnLQgQegTGDGsURs67kZCLVAtFxks5rfeqtgaJpZM4MJYSu>. — You are receiving this because you commented. Reply to this email directly, view it on GitHub <#335 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AXRMEZsTc80As1HRVRkzJFY7pbaoAdBIks5rfevwgaJpZM4MJYSu> .

martinduke · 2017-02-23T21:04:44Z

To expand on the clean-connection case a bit more, I am not sure we have designed a mechanism that assures both sides that both sides have delivered all desired data. (we are hurting for a clean-close connection state machine).

I believe that if we repurposed CONNECTION_CLOSE to say "You have acknowledged everything I intended to send", and made both sides send it, that would solve the problem. This would also require not overloading CONNECTION_CLOSE with abrupt termination purposes, as I describe above.

janaiyengar · 2017-02-24T06:41:11Z

Needs to be rebased, but LGTM. We still need to describe that data needs to be delivered when CONNECTION_CLOSE is received, but that can be added later.

MikeBishop · 2017-02-25T00:02:23Z

Is there actually a difference between a peer that has lost all state and a middlebox that can't deliver packets to its next hop? I'm having trouble envisioning how a PR could be authenticated as coming from an endpoint that used to know you, unless you do something with long-lived key-pairs for which the endpoint persists the keypair and the peers learn about the public key during connection setup.

Short of that, I think it's going to be hard to exclude middleboxes while permitting rebooted peers. (And even harder if your peer is an Anycast IP address and you've suddenly wound up talking to a different instance.)

ekr · 2017-02-25T00:03:54Z

Mike you can do this without public keys by using a single long term key and hash preimages as in PR #20.

MikeBishop · 2017-02-25T00:16:25Z

Ah, so it doesn't have to be an asymmetric keypair, but you do need a persistent key. Which implies some way to do key rollover semi-cleanly. (Generate a new key every boot, use it for new connections, use the previous key for any incoming connections you don't recognize.) I would expect the window after boot that you'd need to be rejecting old packets to be reasonably small -- basically whatever connection idle time you were advertising before.

martinthomson · 2017-02-27T01:08:55Z

Yep, you got it @MikeBishop, though it isn't that simple for a cluster of machines.

…eceiving packets more than rttvar after the Public Reset

martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -transport labels Feb 23, 2017

martinthomson assigned janaiyengar Feb 23, 2017

MikeBishop reviewed Feb 23, 2017

View reviewed changes

ekr reviewed Feb 23, 2017

View reviewed changes

MikeBishop reviewed Feb 23, 2017

View reviewed changes

janaiyengar reviewed Feb 23, 2017

View reviewed changes

janaiyengar approved these changes Feb 24, 2017

View reviewed changes

martinduke mentioned this pull request Feb 24, 2017

CONNECTION_CLOSE semantics #330

Closed

martinthomson added 5 commits February 27, 2017 12:12

Add real text on error handling

2f51cd6

Refining based on @MikeBishop's feedback

656dac0

Closed encompasses reset

5993e2e

Dealing with middleboxen more strictly, MAY discard, continue after r…

00f6a32

…eceiving packets more than rttvar after the Public Reset

No unauthenticated PR just yet

b8ab6b7

martinthomson force-pushed the reset_close branch from e80ae82 to b8ab6b7 Compare February 27, 2017 01:12

martinthomson merged commit 18638d0 into master Feb 27, 2017

martinthomson deleted the reset_close branch February 27, 2017 01:15

martinthomson mentioned this pull request Mar 10, 2017

Expand core protocol error handling section #47

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add real text on error handling #335

Add real text on error handling #335

martinthomson commented Feb 23, 2017

MikeBishop left a comment

MikeBishop Feb 23, 2017

MikeBishop Feb 23, 2017

MikeBishop Feb 23, 2017

MikeBishop Feb 23, 2017

MikeBishop Feb 23, 2017

ekr Feb 23, 2017

martinthomson Feb 23, 2017

ekr Feb 23, 2017

martinthomson Feb 23, 2017

ekr Feb 23, 2017

janaiyengar Feb 23, 2017

ekr Feb 23, 2017

martinthomson Feb 23, 2017

ekr Feb 23, 2017

martinthomson Feb 23, 2017

MikeBishop Feb 23, 2017

ekr commented Feb 23, 2017 via email

martinthomson commented Feb 23, 2017

janaiyengar left a comment

janaiyengar Feb 23, 2017

janaiyengar Feb 23, 2017

janaiyengar Feb 23, 2017

janaiyengar Feb 23, 2017

janaiyengar Feb 23, 2017

martinduke commented Feb 23, 2017

MikeBishop commented Feb 23, 2017 via email

martinduke commented Feb 23, 2017 via email

martinduke commented Feb 23, 2017

janaiyengar commented Feb 24, 2017

MikeBishop commented Feb 25, 2017

ekr commented Feb 25, 2017

MikeBishop commented Feb 25, 2017

martinthomson commented Feb 27, 2017

		@@ -2022,7 +2071,58 @@ send a WINDOW_UPDATE frame at least two roundtrips before it expects the sender
		to get blocked.


		# Error Codes {#error-handling}

Add real text on error handling #335

Add real text on error handling #335

Conversation

martinthomson commented Feb 23, 2017

MikeBishop left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ekr commented Feb 23, 2017 via email

martinthomson commented Feb 23, 2017

janaiyengar left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

martinduke commented Feb 23, 2017

MikeBishop commented Feb 23, 2017 via email

martinduke commented Feb 23, 2017 via email

martinduke commented Feb 23, 2017

janaiyengar commented Feb 24, 2017

MikeBishop commented Feb 25, 2017

ekr commented Feb 25, 2017

MikeBishop commented Feb 25, 2017

martinthomson commented Feb 27, 2017