Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add real text on error handling #335

Merged
merged 5 commits into from
Feb 27, 2017
Merged

Add real text on error handling #335

merged 5 commits into from
Feb 27, 2017

Conversation

martinthomson
Copy link
Member

This covers two things:

  1. the general operation of Public Reset packets and their role in the protocol, including two separate admonishments not to use Public Reset unless you don't have connection state

  2. how to generate errors for the connection and for streams

On the former, I want to add ekr's proposed public reset design in a separate PR.

On the latter, I've cribbed from HTTP/2 a little, but the advice here turns out to be subtly different.

Closes #289, #47.

@martinthomson martinthomson added design An issue that affects the design of the protocol; resolution requires consensus. -transport labels Feb 23, 2017
Copy link
Contributor

@MikeBishop MikeBishop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More special cases than I like, but a good direction.


An error on stream 1 MUST be handled as a connection error. Stream 1 is
critical to the functioning of the entire connection. Furthermore, some
application protocols place constraints on how streams are used that makes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SVA: "constraints ... that make"

affected stream.

An error on stream 1 MUST be handled as a connection error. Stream 1 is
critical to the functioning of the entire connection. Furthermore, some
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to tease this apart and say that if Stream 1 is closed in any way, that's a connection error? You don't necessarily need to special-case elevation of the error that impacted stream 1 to connection-level then, but it's still permitted.

certain streams critical to the functioning of the protocol. Termination of a
critical stream could result in a fatal error in an application protocol.
Errors in such critical streams MUST be changed into a error that closes the
entire connection.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be a transport requirement. If an application has such constraints, it needs to terminate the connection upon seeing the stream close. Otherwise, the application needs an API to "bless" certain streams. This parallels the above comment about stream 1 -- permit stream errors to be stream errors, but if you depend on a certain stream surviving and it doesn't, it's your job to kill the connection.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To this point, I thought I had text in HTTP saying that if the Connection Control stream closes, you MUST close the connection, but I don't see it in master. Maybe it's buried in a PR?

@@ -2022,7 +2071,58 @@ send a WINDOW_UPDATE frame at least two roundtrips before it expects the sender
to get blocked.


# Error Codes {#error-handling}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You removed this anchor, which is referenced elsewhere in the text.

It is sent by a middlebox to notify an endpoint that the path is no longer
functional. Such a Public Reset does not include a Public Reset Proof, it
includes only the connection ID, which provides an assurance that the middlebox
received packets on the flow.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would handle this with a structured proof instead. I.e.,

struct {
ProofType type;
opaque proof[len];
}

Then we can require the proof to contain the conn id or whatever. As I think I mentioned, proofs from non-participants should probably instead contain something derived from the packet, which includes a MAC and so is guaranteed to be high entropy.

Also, it's not clear to me we want to require conn ids to be high enough entropy to make receipt

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have to put something in the field. And while I agree that it's not necessarily true that a connection ID sufficient, it is something that a recipient of a PR will have and can use. The same goes for packet numbers, which I anticipate adding to this.

I hope that there is only ever a need for one type of proof per version. That suggests that switching on type isn't necessary. Can you justify having multiple types of proof within the same version?

Note that this all anticipates merging something like Jana's most recent header proposal to some extent.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I understand it, it's quite likely that servers will not be sending the connection ID at all for many connections because the host/port is sufficient, so it's not clear that you have to put something in this field.

As far as having multiple kinds of proof, this PR in fact anticipates that, with one for an endpoint and one for middleboxes, it just labels the endpoint one proof and the middlebox one connection ID.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jana's proposal has a strategy for that. If you don't like it, then we can continue to work on that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I think I indicated in my email, I don't like it....

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's an issue in the way this is stated. There's no way for a client to distinguish between a PR generated by a server from that generated by a middlebox. I think all we want to say that a valid PR MUST include proof that the PR was generated in response to a client packet. There's no way to say who's generating this because both middleboxes and "new" servers will know exactly the same things about the packet. (A middlebox may know more about the connection.)

After receiving a Public Reset that omits the Public Reset Proof, an endpoint
SHOULD stop sending on the path on which the Public Reset was received. Packets
that are received on the path MAY be discarded. Without support for multipath,
this means that the connection could become unusable.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/could/will/ right?

I'm not sure getting clever here helps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could recognizes the problem that the next paragraph covers

packets continue to arrive after a Public Reset, this could indicate an attack.
Delays could mean that a few packets arrive after a Public Reset, but if this
continues well beyond the observed bounds on jitter, an endpoint MAY consider
the Public Reset to have been spurious.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like more guidance may be needed here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to accept that guidance if you have it. I was going to say something about round trips or packets, but it's not especially easy to pick an answer or even general guidance.

@@ -2165,6 +2169,9 @@ QUIC_MULTIPLE_TERMINATION_OFFSETS (0x80000005):
QUIC_STREAM_CANCELLED (0x80000006):
: The stream was cancelled

QUIC_CLOSED_CRITICAL_STREAM (0x80000007):
: A stream that is critical to the protocol was closed or reset.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor, but I think you can just say "closed," as "reset" is a type of closure.

@ekr
Copy link
Collaborator

ekr commented Feb 23, 2017 via email

@martinthomson
Copy link
Member Author

I've added some changes that are stricter on unauthenticated public resets. Firstly, any unauthenticated public reset can be ignored. Secondly, I've used rttvar in detecting spurious resets. If you get a reset then continue to get valid packets after more than 2*rttvar, then carry on and mark the path as a den of scum and middleboxes of the worst moral fiber.

Copy link
Contributor

@janaiyengar janaiyengar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should require a proof that's always possible to generate for a server/on-path-middlebox, and treat them exactly the same at the client. I don't think there's a reason for the client to do anything with a PR that doesn't include a proof if the proof MUST be generated by the server.

depends on the entity that generates the packet.

A Public Reset packet sent by an endpoint indicates that it does not have the
state necessary to continue with a connection. In this case, the endpoint will
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: "... it does not have the state necessary to respond otherwise to the received packet."

It is sent by a middlebox to notify an endpoint that the path is no longer
functional. Such a Public Reset does not include a Public Reset Proof, it
includes only the connection ID, which provides an assurance that the middlebox
received packets on the flow.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's an issue in the way this is stated. There's no way for a client to distinguish between a PR generated by a server from that generated by a middlebox. I think all we want to say that a valid PR MUST include proof that the PR was generated in response to a client packet. There's no way to say who's generating this because both middleboxes and "new" servers will know exactly the same things about the packet. (A middlebox may know more about the connection.)

received packets on the flow.

An endpoint MAY ignore a Public Reset that does not include a proof. A
middlebox might send a packet in an attempt to attack a connection. If packets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Earlier text suggests that a middlebox may send a PR, and this suggests that such sending may be an attack. How's a client to know?

middlebox might send a packet in an attempt to attack a connection. If packets
continue to arrive after a Public Reset, this could indicate an attack. Delays
could mean that a few packets arrive after a Public Reset. However, if valid
packets continue to arrive more than twice the RTT variance (see
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this recommendation is appropriate. I'd suggest that we discuss exactly what should happen here in a separate issue. Basically my concern is (i) there's no reason to believe that the server has any packets coming to the client, so waiting for some time may not make any sense, (ii) you could have the client force something by sending a PING frame to the server and then wait an RTO before closing the connection. At this point I think saying something like "a client MAY wait to see if the PR is legitimate" seems adequate.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That said, I don't see why a client needs to handle both a PR with proof and without proof. I'd lean towards making a proof mandatory (proof TBD) and then ignore PRs without proof. In my header proposal, the proof is simply the first 8 bytes of the packet.

@martinduke
Copy link
Contributor

Crazy idea alert: what is the value of having a separate CONNECTION_CLOSE frame at all, rather than adding an error code to Public Reset and doing all abrupt termination that way?

If the Reset is lost, the next packet from the peer will be for an unknown connection ID and trigger a Public Reset anyway. So reliablility isn't a big issue.

Pros: - It's simpler
-solves the problem of telling middleboxes when to delete their state.

Cons:

  • If the peer is idle for a long time after a lost Public Reset, it could maintain connection state for quite a while, whereas a retransmitted CONNECTION_CLOSE would have closed everything out. But with PING et al this is probably rare.
  • You need some kind of limiter to avoid a window of packets coming from the peer resulting in a storm of Public Resets. But we already have to deal with this in TCP (and will have to in other QUIC scenarios) and it seems to work OK.

@MikeBishop
Copy link
Contributor

MikeBishop commented Feb 23, 2017 via email

@martinduke
Copy link
Contributor

martinduke commented Feb 23, 2017 via email

@martinduke
Copy link
Contributor

To expand on the clean-connection case a bit more, I am not sure we have designed a mechanism that assures both sides that both sides have delivered all desired data. (we are hurting for a clean-close connection state machine).

I believe that if we repurposed CONNECTION_CLOSE to say "You have acknowledged everything I intended to send", and made both sides send it, that would solve the problem. This would also require not overloading CONNECTION_CLOSE with abrupt termination purposes, as I describe above.

@janaiyengar
Copy link
Contributor

Needs to be rebased, but LGTM. We still need to describe that data needs to be delivered when CONNECTION_CLOSE is received, but that can be added later.

@MikeBishop
Copy link
Contributor

Is there actually a difference between a peer that has lost all state and a middlebox that can't deliver packets to its next hop? I'm having trouble envisioning how a PR could be authenticated as coming from an endpoint that used to know you, unless you do something with long-lived key-pairs for which the endpoint persists the keypair and the peers learn about the public key during connection setup.

Short of that, I think it's going to be hard to exclude middleboxes while permitting rebooted peers. (And even harder if your peer is an Anycast IP address and you've suddenly wound up talking to a different instance.)

@ekr
Copy link
Collaborator

ekr commented Feb 25, 2017

Mike you can do this without public keys by using a single long term key and hash preimages as in PR #20.

@MikeBishop
Copy link
Contributor

Ah, so it doesn't have to be an asymmetric keypair, but you do need a persistent key. Which implies some way to do key rollover semi-cleanly. (Generate a new key every boot, use it for new connections, use the previous key for any incoming connections you don't recognize.) I would expect the window after boot that you'd need to be rejecting old packets to be reasonably small -- basically whatever connection idle time you were advertising before.

@martinthomson
Copy link
Member Author

Yep, you got it @MikeBishop, though it isn't that simple for a cluster of machines.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
-transport design An issue that affects the design of the protocol; resolution requires consensus.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants