-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
clisqlshell: handle interactive query cancellation #76437
Conversation
@rafiss could you download this branch and try it out? I'm seeing the following annoying behavior: the cancel is processed by the server (the client receives the cancel error, as expected), but the server also immediately closes the connection. I think that's either a bug in the handling of the cancel request server-side, or lib/pq getting confused by its own context cancel support? But I'm not sure. any ideas how to troubleshoot this? |
8951c0d
to
9f8582f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you download this branch and try it out?
trying now
the cancel is processed by the server (the client receives the cancel error, as expected), but the server also immediately closes the connection.
do you mean the server immediately closes the original SQL connection, or immediately closes the new connection made to send the cancel request?
I think that's either a bug in the handling of the cancel request server-side, or lib/pq getting confused by its own context cancel support?
I did hit some annoyance with how lib/pq supports cancels when testing. Specifically, this block: https://github.com/lib/pq/blob/8446d16b8935fdf2b5c0fe333538ac395e3e1e4b/conn_go18.go#L177-L181
If I understand correctly, that means lib/pq requires the server to close the connection used to send the cancel request
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @otan)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you mean the server immediately closes the original SQL connection, or immediately closes the new connection made to send the cancel request?
We do not have control over the new connection, that's deep inside lib/pq, and that is essentially invisible here.
It's the original SQL connection that gets closed: after lib/pq returns from running the canceled query, the sql conn object is not available any more.
If I understand correctly, that means lib/pq requires the server to close the connection used to send the cancel request
Yes, that's what psql
does too. We're not concerned about this.
We're focusin on the original SQL conn, and this should remain open. Unfortunately, it does not. I don't know why.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @otan)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I captured the traffic using Wireshark. >
denotes client-to-server traffic and <
denotes server-to-client traffic.
1 0.000000 127.0.0.1 → 127.0.0.1 PGSQL 81 >Q # client sends pg_sleep(5)
2 1.116952 127.0.0.1 → 127.0.0.1 PGSQL 72 > # client sends cancel request
3 1.117599 127.0.0.1 → 127.0.0.1 PGSQL 165 <T/E # server sends "RowDescription" followed by "query execution canceled" ErrorResponse
4 1.117613 127.0.0.1 → 127.0.0.1 PGSQL 62 <Z # server sends ReadyForQuery message
5 1.117684 127.0.0.1 → 127.0.0.1 PGSQL 61 >X # client sends Termination message
6 1.118599 127.0.0.1 → 127.0.0.1 PGSQL 169 > # client sends StartupMessage to begin a new session
So either the CLI or lib/pq itself is sending a Termination message. ... After a bit of searching, I think it probably is lib/pq https://github.com/lib/pq/blob/8446d16b8935fdf2b5c0fe333538ac395e3e1e4b/conn_go18.go#L134
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru and @otan)
That change was made as part of lib/pq#1000 |
Ouch, that's bad right? We really want the cancellation to leave the session open, with all its session state. So this means that lib/pq is hosed for us? No chance but to switch to pgx then? |
Yeah I don't agree with that lib/pq behavior... We could try patching it, but I guess it would be a pretty big compat change for lib/pq users who rely on that. Using pgx should work as we want. Also relevant, it look like pgx does not send a pgwire CancelRequest when the context is done, so we'd have to do that ourselves if we want to start relying on context deadlines. |
I think it does but it's very weird how it does it:
So what I think happens is that if you cancel the context, the code sets a timeout far in the past, expects the go runtime to see that the I/O read (waiting for the query result) will never succeed, fail with a context deadline, then the deferred cancellation runs in the main goroutine. This is a clever way to handle context cancellation without spawning a new goroutine every time, and while avoiding the ordering problem that motivated the (misguided) change in lib/pq. So yeah, we could perhaps use pgx, even though it doesn't fill me with joy (it's a larger refactor!) |
Do you think a patch to lib/pq to expose |
yes, even for the broader community of folk using lib/pq.
Well it'd make it work by changing the code to have us peek inside the lib/pq API, instead of using a context cancel. That somewhat breaks the goal of the |
What about this:
WDYT? |
Your plan sounds good to me! Maybe we could even make the shell expect the current lib/pq behavior, so it doesn't show the |
Nah that's a dark pattern: the state of the session is lost, including any session var customizations. We don't want to hide that. Also we want to fix it anyway, so why add some polish for a behavior we're going to not have in the next PR... |
Now with tests. RFAL |
5ca0e1a
to
19a4e65
Compare
As of this PR, there's a bug in lib/pq which forces the session to terminate when any query gets cancelled. We find this unacceptable and we plan to fix it later. Release note (cli change): The interactive SQL shell (`cockroach sql`, `cockroach demo`) now supports interrupting a currently running query with Ctrl+C, without losing access to the shell.
19a4e65
to
afc0786
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @knz, and @otan)
pkg/cli/clisqlshell/sql.go, line 2055 at r7 (raw file):
// The cancellation function can be used by the Ctrl+C handler // to cancel this query. ctx, cancel := context.WithCancel(context.Background())
[coming more from a lack of understanding, rather than giving feedback]: why do we need a new context each time we run a query here? if we were to use the same cancellable context to run all the queries, then wouldn't we be able to avoid having to set/unset cancelFn
all the time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @otan, and @rafiss)
pkg/cli/clisqlshell/sql.go, line 2055 at r7 (raw file):
Previously, rafiss (Rafi Shamim) wrote…
[coming more from a lack of understanding, rather than giving feedback]: why do we need a new context each time we run a query here? if we were to use the same cancellable context to run all the queries, then wouldn't we be able to avoid having to set/unset
cancelFn
all the time?
a context can only be cancelled once. If we reused the same, after the 1st query cancelled we wouldn't be able to cancel any more query after that (in fact, no further query would ever run since the context would be cancelled already)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @adityamaru, @knz, and @otan)
pkg/cli/clisqlshell/sql.go, line 2055 at r7 (raw file):
Previously, knz (kena) wrote…
a context can only be cancelled once. If we reused the same, after the 1st query cancelled we wouldn't be able to cancel any more query after that (in fact, no further query would ever run since the context would be cancelled already)
right, makes sense!
TFYR! bors r=rafiss |
Build succeeded: |
Fixes #76433.
As of this PR, there's a bug in lib/pq which forces the session to
terminate when any query gets cancelled. We find this unacceptable
and we plan to fix it later.
Release note (cli change): The interactive SQL shell (
cockroach sql
,cockroach demo
) now supports interrupting a currently runningquery with Ctrl+C, without losing access to the shell.