release-20.1: sql: fix portals after exhausting rows #52443
cc @andreimatei @jordanlewis to see whether you have thoughts about backporting this fix.
Huh, I'm very confused why
cc @rafiss did you experience any flakiness working on row description fields? I'm hitting this error every time I run locally on 1a085c0.
dc12d42 to 36a62b0
Hmm, I'm not sure how an existing commit would start failing. We have seen that the OIDs are not easy to work with, since the same server is shared across different pgtest files. That was made easier in #48555, but that was not backported. However, even without that PR, the OIDs should still be stable and deterministic.
@yuzefovich I checked out that commit and the tests passed for me locally. Did you have any additional pgtest files saved but not checked in, perhaps? That could cause the OIDs to differ.
separate from the test issues described earlier, can you make sure the fix from #52940 is included in this backport?
36a62b0 to c827194
@rafiss I don't know why you're not seeing the failure, I can see it both on current release-20.1 (68610d6) and on 1a085c0 running as
and no, I don't have any other files in that folder, so I'm very confused. Which version of Go are you using? I'm on 1.13.15. One possibility is that the creation of some objects has been added in other files, which would increase the IDs; another possibility is that we somehow changed the way we run this
Previously, we would erroneously restart the execution from the very beginning of empty, unclosed portals after they have been fully exhausted when we should be returning no rows or an error in such scenarios. This is now fixed by tracking whether a portal is exhausted or not and intercepting the calls to `execStmt` when the conn executor state machine is in an open state.

Note that the current solution has known deviations from Postgres:
- when attempting to execute portals of statements that don't return row sets, on the second and subsequent attempts PG returns an error while we silently do nothing (meaning we do not run the statement at all and return 0 rows)
- we incorrectly populate the "command tag" field of pgwire messages of some rows-returning statements after the portal suspension (for example, a suspended UPDATE RETURNING in PG will return the total count of "rows affected" while we will return the count since the last suspension).

These deviations are deemed acceptable since this commit fixes a much worse problem: re-executing an exhausted portal (which could be a mutation, meaning that previously we could have executed a mutation multiple times).

The reasons why this commit does not address these deviations are:
- Postgres has a concept of "portal strategy" (see https://github.com/postgres/postgres/blob/2f9661311b83dc481fc19f6e3bda015392010a40/src/include/utils/portal.h#L89).
- Postgres has a concept of "command" type (these are things like SELECTs, UPDATEs, INSERTs, etc, see https://github.com/postgres/postgres/blob/1aac32df89eb19949050f6f27c268122833ad036/src/include/nodes/nodes.h#L672).

CRDB doesn't have these concepts, and without them any attempt to simulate Postgres results in very error-prone and brittle code.

Release note (bug fix): Previously, CockroachDB would erroneously restart the execution of empty, unclosed portals after they have been fully exhausted, and this has been fixed.
PR cockroachdb#48842 added logic to exhaust portals after executing them. This had issues when the portal being executed closes itself, which happens when using DEALLOCATE in a prepared statement. Now we check if the portal still exists before exhausting it. There is no release note as this fixes a bug that only exists in unreleased versions. Release note: None
c827194 to c51de08
I have go1.13.14, but I don't know how that would make a difference.
It looks like it passed on TeamCity too. Maybe you could try with a completely fresh checkout? Or perhaps a gceworker? It's certainly not great that it doesn't pass locally for you, but I don't know how much effort to put into finding out why.
One other unrelated thing I just realized -- for the original fix, did you try running with `make test PKG=./pkg/sql/pgwire TESTS=TestPGTest TESTFLAGS="-addr=localhost:5432"` (to make it run against Postgres)? There is a syntax for the pgtest files to expect different things with PG vs CRDB, and it is useful to see a clean run against both and make the tests document the differences in the wire protocol. It might be worth doing that and getting the test passing against PG on master, if you agree.
any objections to squashing the third commit into the first one?
just saw your comment. sounds good to me, but please consider trying the tests with PG and getting it to pass on master as i described above. (sorry i know that my request is beyond the scope of just backporting an existing change, but i didn't see the original PR where this comment would have made more sense)
What's confusing is that on master I didn't have problems with
There are a bunch of failures against PG (on
So I'll go ahead and merge this with the adjustment of
Right, sorry, I didn't mean to say you should make this backport work against PG. I meant to suggest that you modify the tests on master so that they pass against PG. The differing OIDs are resolved using
The tests that I added in #48842 do pass on master against PG. There are currently 2 failures, however, that are not related to this backport at all:
Thanks for the discussion here!
Backport 1/1 commits from #48842.
Backport 1/1 commits from #52940.
/cc @cockroachdb/release
sql: fix portals after exhausting rows
Previously, we would erroneously restart the execution from the very
beginning of empty, unclosed portals after they have been fully
exhausted when we should be returning no rows or an error in such
scenarios. This is now fixed by tracking whether a portal is exhausted
or not and intercepting the calls to `execStmt` when the conn executor
state machine is in an open state.
Note that the current solution has known deviations from Postgres:
- when attempting to execute portals of statements that don't return row
sets, on the second and subsequent attempts PG returns an error while we
silently do nothing (meaning we do not run the statement at all
and return 0 rows)
- we incorrectly populate the "command tag" field of pgwire messages of some
rows-returning statements after the portal suspension (for example,
a suspended UPDATE RETURNING in PG will return the total count of "rows
affected" while we will return the count since the last suspension).
These deviations are deemed acceptable since this commit fixes a much
worse problem - re-executing an exhausted portal (which could be
a mutation meaning, previously, we could have executed a mutation
multiple times).
The reasons why this commit does not address these deviations are:
- Postgres has a concept of "portal strategy"
(see https://github.com/postgres/postgres/blob/2f9661311b83dc481fc19f6e3bda015392010a40/src/include/utils/portal.h#L89).
- Postgres has a concept of "command" type (these are things like
SELECTs, UPDATEs, INSERTs, etc,
see https://github.com/postgres/postgres/blob/1aac32df89eb19949050f6f27c268122833ad036/src/include/nodes/nodes.h#L672).
CRDB doesn't have these concepts, and without them any attempt to
simulate Postgres results in very error-prone and brittle code.
Fixes: #48448.
Release note (bug fix): Previously, CockroachDB would erroneously
restart the execution of empty, unclosed portals after they have been
fully exhausted, and this has been fixed.
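
For readers who want a concrete picture of the exhaustion tracking described in this commit message, here is a minimal, self-contained sketch. It is not the CockroachDB implementation: the `portal` and `session` types, the `run` callback, and the `execPortal` method are all hypothetical names invented for illustration, and the real change intercepts calls to `execStmt` inside the conn executor rather than using a standalone method like this.

```go
package main

import "fmt"

// portal is a hypothetical, simplified stand-in for a pgwire portal.
type portal struct {
	run       func() []string // executes the underlying statement once
	started   bool            // whether run has been called yet
	pending   []string        // rows produced but not yet returned
	exhausted bool            // set once every row has been returned
}

// session is a hypothetical holder of open portals, keyed by name.
type session struct {
	portals map[string]*portal
}

// execPortal returns up to limit rows from the named portal (limit <= 0
// means "all remaining rows"). Once the portal is exhausted, later calls
// return no rows and never re-run the statement.
func (s *session) execPortal(name string, limit int) ([]string, error) {
	p, ok := s.portals[name]
	if !ok {
		return nil, fmt.Errorf("unknown portal %q", name)
	}
	if p.exhausted {
		// Intercept: an exhausted portal must not restart from the beginning.
		return nil, nil
	}
	if !p.started {
		p.pending = p.run()
		p.started = true
	}
	n := len(p.pending)
	if limit > 0 && limit < n {
		n = limit
	}
	out := p.pending[:n]
	p.pending = p.pending[n:]
	if len(p.pending) == 0 {
		p.exhausted = true
	}
	return out, nil
}

func main() {
	runs := 0
	s := &session{portals: map[string]*portal{
		"p": {run: func() []string { runs++; return []string{"a", "b", "c"} }},
	}}
	for i := 0; i < 3; i++ {
		rows, _ := s.execPortal("p", 2)
		fmt.Println(len(rows), "rows; statement runs so far:", runs)
	}
}
```

In this sketch the third call returns no rows and the run counter stays at 1, which mirrors the goal of never executing a mutation more than once. Like the commit itself, the sketch silently returns zero rows rather than an error for an exhausted portal.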
sql: allow DEALLOCATE ALL with a prepared statement
PR #48842 added logic to exhaust portals after executing them. This had
issues when the portal being executed closes itself, which happens when
using DEALLOCATE in a prepared statement. Now we check if the portal
still exists before exhausting it.
There is no release note as this fixes a bug that only exists in
unreleased versions.
Release note: None
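
The existence check described in the second commit can be illustrated with a similar sketch. Again, this is an assumption-laden illustration and not the actual patch: `portalEntry`, `portalRegistry`, `execAndExhaust`, and `deallocateAll` are hypothetical names, and the statement is modeled as a plain callback that may remove portals from the registry while it runs.

```go
package main

import "fmt"

// portalEntry is a hypothetical record of one open portal.
type portalEntry struct {
	exhausted bool
}

// portalRegistry is a hypothetical set of open portals, keyed by name.
type portalRegistry struct {
	portals map[string]*portalEntry
}

// execAndExhaust runs stmt on behalf of the named portal and then marks the
// portal as exhausted, but only if the portal still exists afterwards.
func (r *portalRegistry) execAndExhaust(name string, stmt func(*portalRegistry)) error {
	if _, ok := r.portals[name]; !ok {
		return fmt.Errorf("unknown portal %q", name)
	}
	stmt(r)
	// The statement may have closed the portal it was running in, so look
	// it up again instead of assuming it is still there.
	if p, ok := r.portals[name]; ok {
		p.exhausted = true
	}
	return nil
}

// deallocateAll models a statement that closes every portal, including the
// one that is executing it.
func deallocateAll(r *portalRegistry) {
	for name := range r.portals {
		delete(r.portals, name)
	}
}

func main() {
	r := &portalRegistry{portals: map[string]*portalEntry{"p": {}}}
	err := r.execAndExhaust("p", deallocateAll)
	fmt.Println("error:", err, "open portals:", len(r.portals))
}
```

The design point is simply to re-check the registry after the statement has run, so that a self-closing portal is skipped instead of being touched after it is gone.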