release-20.1: sql: fix portals after exhausting rows #52443

yuzefovich · 2020-08-06T00:18:51Z

Backport 1/1 commits from #48842.
Backport 1/1 commits from #52940.

/cc @cockroachdb/release

sql: fix portals after exhausting rows

Previously, we would erroneously restart the execution from the very
beginning of empty, unclosed portals after they have been fully
exhausted when we should be returning no rows or an error in such
scenarios. This is now fixed by tracking whether a portal is exhausted
or not and intercepting the calls to execStmt when the conn executor
state machine is in an open state.

Note that the current solution has known deviations from Postgres:

when attempting to execute portals of statements that don't return row
sets, on the second and consequent attempt PG returns an error while we
are silently doing nothing (meaning we do not run the statement at all
and return 0 rows)
we incorrectly populate "command tag" field of pgwire messages of some
rows-returning statements after the portal suspension (for example,
a suspended UPDATE RETURNING in PG will return the total count of "rows
affected" while we will return the count since the last suspension).

These deviations are deemed acceptable since this commit fixes a much
worse problem - re-executing an exhausted portal (which could be
a mutation meaning, previously, we could have executed a mutation
multiple times).

The reasons for why this commit does not address these deviations are:

Postgres has a concept of "portal strategy"
(see https://github.com/postgres/postgres/blob/2f9661311b83dc481fc19f6e3bda015392010a40/src/include/utils/portal.h#L89).
Postgres has a concept of "command" type (these are things like
SELECTs, UPDATEs, INSERTs, etc,
see https://github.com/postgres/postgres/blob/1aac32df89eb19949050f6f27c268122833ad036/src/include/nodes/nodes.h#L672).

CRDB doesn't have these concepts, and without them any attempt to
simulate Postgres results in a very error-prone and brittle code.

Fixes: #48448.

Release note (bug fix): Previously, CockroachDB would erroneously
restart the execution of empty, unclosed portals after they have been
fully exhausted, and this has been fixed.

sql: allow DEALLOCATE ALL with a prepared statement

PR #48842 added logic to exhaust portals after executing them. This had
issues when the portal being executed closes itself, which happens when
using DEALLOCATE in a prepared statement. Now we check if the portal
still exists before exhausting it.

There is no release note as this fixes a bug that only exists in
unreleased versions.

Release note: None

cockroach-teamcity · 2020-08-06T00:18:58Z

This change is

yuzefovich · 2020-08-06T00:19:22Z

cc @andreimatei @jordanlewis to see whether you have thoughts about backporting this fix.

yuzefovich · 2020-08-11T01:27:18Z

Huh, I'm very confused why TestPGTest/row_description started failing on this branch but wasn't failing before. When I ran the test manually on 1a085c0 (the commit that added the test file), I'm still hitting the same error:

--- FAIL: TestPGTest (0.20s)
    --- FAIL: TestPGTest/row_description (0.08s)
        datadriven.go:165: 
            testdata/pgtest/row_description:59: ReadyForQuery
            expected:
            {"Type":"RowDescription","Fields":[{"Name":"a","TableOID":55,"TableAttributeNumber":1,"DataTypeOID":20,"DataTypeSize":8,"TypeModifier":-1,"Format":0}]}
            {"Type":"DataRow","Values":[{"text":"1"}]}
            {"Type":"CommandComplete","CommandTag":"SELECT 1"}
            {"Type":"ReadyForQuery","TxStatus":"I"}
            
            found:
            {"Type":"RowDescription","Fields":[{"Name":"a","TableOID":52,"TableAttributeNumber":1,"DataTypeOID":20,"DataTypeSize":8,"TypeModifier":-1,"Format":0}]}
            {"Type":"DataRow","Values":[{"text":"1"}]}
            {"Type":"CommandComplete","CommandTag":"SELECT 1"}
            {"Type":"ReadyForQuery","TxStatus":"I"}

cc @rafiss did you experience any flakiness working on row description fields? I'm hitting this error every time running locally on 1a085c0.

rafiss · 2020-08-11T04:26:01Z

Hmm I'm not sure how an existing commit would start failing. We have seen that the OIDs are not easy to work with, since the same server is shared across different pgtest files. That was made easier in #48555, but that was not backported. However, even without that PR, the OIDs should still be stable and deterministic.

rafiss · 2020-08-11T15:07:28Z

@yuzefovich I checked out that commit and the tests passed for me locally. Did you have any additional pgtest files saved but not checked in, perhaps? That could cause the OIDs to differ.

❯ git status
HEAD detached at 1a085c0328

❯ make test PKG=./pkg/sql/pgwire TESTS=TestPGTest
Running make with -j12
GOPATH set to /Users/rafiss/go
GOFLAGS= go test   -tags ' make x86_64_apple_darwin19.6.0' -ldflags '-X github.com/cockroachdb/cockroach/pkg/build.typ=development -extldflags "-L/usr/local/opt/openssl/lib" -X "github.com/cockroachdb/cockroach/pkg/build.tag=v20.1.0-154-g1a085c0328-dirty" -X "github.com/cockroachdb/cockroach/pkg/build.rev=1a085c0328595dd3f13761dc8352b05be85308d8" -X "github.com/cockroachdb/cockroach/pkg/build.cgoTargetTriple=x86_64-apple-darwin19.6.0"  ' -run "TestPGTest"  -timeout 30m ./pkg/sql/pgwire
ok  	github.com/cockroachdb/cockroach/pkg/sql/pgwire	3.936s

rafiss

separate from the test issues described earlier, can you make sure the fix from #52940 is included in this backport?

yuzefovich · 2020-08-20T22:13:50Z

@rafiss I don't know why you're not seeing the failure, I can see it both on current release-20.1 (68610d6) and on 1a085c0 running as

make test PKG=./pkg/sql/pgwire TESTS=TestPGTest/row_description

and no, I don't have any other files in that folder, so I'm very confused. Which version of Go are you using? I'm on 1.13.15.

One possibility is that the creation of some objects has been added in another files which would increase the IDs, another possibility is that we somehow changed the way we run this TestPGTest (like reusing the same database between different files). Still, I don't understand why I can see the failure locally and you can't on 1a085c0.

yuzefovich · 2020-08-20T23:00:15Z

Hm, the build is now green. @rafiss @mjibson any objections to squashing the third commit into the first one?

Previously, we would erroneously restart the execution from the very beginning of empty, unclosed portals after they have been fully exhausted when we should be returning no rows or an error in such scenarios. This is now fixed by tracking whether a portal is exhausted or not and intercepting the calls to `execStmt` when the conn executor state machine is in an open state. Note that the current solution has known deviations from Postgres: - when attempting to execute portals of statements that don't return row sets, on the second and consequent attempt PG returns an error while we are silently doing nothing (meaning we do not run the statement at all and return 0 rows) - we incorrectly populate "command tag" field of pgwire messages of some rows-returning statements after the portal suspension (for example, a suspended UPDATE RETURNING in PG will return the total count of "rows affected" while we will return the count since the last suspension). These deviations are deemed acceptable since this commit fixes a much worse problem - re-executing an exhausted portal (which could be a mutation meaning, previously, we could have executed a mutation multiple times). The reasons for why this commit does not address these deviations are: - Postgres has a concept of "portal strategy" (see https://github.com/postgres/postgres/blob/2f9661311b83dc481fc19f6e3bda015392010a40/src/include/utils/portal.h#L89). - Postgres has a concept of "command" type (these are things like SELECTs, UPDATEs, INSERTs, etc, see https://github.com/postgres/postgres/blob/1aac32df89eb19949050f6f27c268122833ad036/src/include/nodes/nodes.h#L672). CRDB doesn't have these concepts, and without them any attempt to simulate Postgres results in a very error-prone and brittle code. Release note (bug fix): Previously, CockroachDB would erroneously restart the execution of empty, unclosed portals after they have been fully exhausted, and this has been fixed.

PR cockroachdb#48842 added logic to exhaust portals after executing them. This had issues when the portal being executed closes itself, which happens when using DEALLOCATE in a prepared statement. Now we check if the portal still exists before exhausting it. There is no release note as this fixes a bug that only exists in unreleased versions. Release note: None

rafiss

I have go1.13.14 but I don't know how that would make a difference.

It looks like it passed on TeamCity too. Maybe you could try with a completely fresh checkout? Or perhaps a gceworker? It's certainly not great that it doesn't pass locally for you, but I don't know how much effort to put into finding out why.

One other unrelated thing I just realized -- for the original fix, did you try running with make test PKG=./pkg/sql/pgwire TESTS=TestPGTest TESTFLAGS="-addr=localhost:5432" (to make it run against Postgres)? There is a syntax for the pgtest files to expect different things with PG vs CRDB and it is useful to see a clean run against both and make the tests document the differences in the wire protocol. It might be worth doing that and getting the test passing against PG on master, if you agree.

rafiss

any objections to squashing the third commit into the first one?

just saw your comment. sounds good to me, but please consider trying the tests with PG and getting it to pass on master as i described above. (sorry i know that my request is beyond the scope of just backporting an existing change, but i didn't see the original PR where this comment would have made more sense)

yuzefovich · 2020-08-20T23:19:00Z

please consider trying the tests with PG and getting it to pass on master as i described above

What's confusing is that on master I didn't have problems with TestPGTest/row_description (nor on release-19.2), only when backporting to 20.1 did I hit this issue. I'm double checking against the Postgres right now.

yuzefovich · 2020-08-20T23:26:32Z

There are a bunch of failures against PG (on int_size, notice, param_status, pgjdbc, and a few others) including row_description:

    --- FAIL: TestPGTest/row_description (0.01s)
        datadriven.go:165: 
            testdata/pgtest/row_description:59: ReadyForQuery
            expected:
            {"Type":"RowDescription","Fields":[{"Name":"a","TableOID":59,"TableAttributeNumber":1,"DataTypeOID":20,"DataTypeSize":8,"TypeModifier":-1,"Format":0}]}
            {"Type":"DataRow","Values":[{"text":"1"}]}
            {"Type":"CommandComplete","CommandTag":"SELECT 1"}
            {"Type":"ReadyForQuery","TxStatus":"I"}
            
            found:
            {"Type":"RowDescription","Fields":[{"Name":"a","TableOID":17639,"TableAttributeNumber":1,"DataTypeOID":23,"DataTypeSize":4,"TypeModifier":-1,"Format":0}]}
            {"Type":"DataRow","Values":[{"text":"1"}]}
            {"Type":"CommandComplete","CommandTag":"SELECT 1"}
            {"Type":"ReadyForQuery","TxStatus":"I"}

So I'll go ahead and merge this with the adjustment of row_description in the first commit once the CI is green. I don't think it's worth spending more time looking into this since our TableOIDs are different from postgres anyway.

rafiss · 2020-08-20T23:45:32Z

Right, sorry, I didn't mean to say you should make this backport work against PG. I meant to suggest that you modify the tests on master so that they pass against PG. The differing OIDs are resolved using noncrdb_only and crdb_only directives. Like in this example from row_description:

send
Query {"String": "SELECT a FROM tab1"}
----

# With postgres we don't control the table OID.
until ignore_table_oids noncrdb_only
RowDescription
----
{"Type":"RowDescription","Fields":[{"Name":"a","TableOID":0,"TableAttributeNumber":1,"DataTypeOID":20,"DataTypeSize":8,"TypeModifier":-1,"Format":0}]}

until crdb_only
RowDescription
----
{"Type":"RowDescription","Fields":[{"Name":"a","TableOID":52,"TableAttributeNumber":1,"DataTypeOID":20,"DataTypeSize":8,"TypeModifier":-1,"Format":0}]}

until
ReadyForQuery
----
{"Type":"DataRow","Values":[{"text":"1"}]}
{"Type":"CommandComplete","CommandTag":"SELECT 1"}
{"Type":"ReadyForQuery","TxStatus":"I"}

yuzefovich · 2020-08-20T23:48:41Z

The tests that I added in #48842 do pass on master against PG, there are currently 2 failures, however, that are not related to this backport at all:

    --- FAIL: TestPGTest/pgjdbc (0.01s)
        datadriven.go:194: 
            testdata/pgtest/pgjdbc:45: ReadyForQuery
            expected:
            {"Type":"ParameterStatus","Name":"application_name","Value":""}
            {"Type":"ParameterStatus","Name":"TimeZone","Value":"UTC"}
            {"Type":"CommandComplete","CommandTag":"DISCARD"}
            {"Type":"ReadyForQuery","TxStatus":"I"}
            
            found:
            {"Type":"ParameterStatus","Name":"is_superuser","Value":"off"}
            {"Type":"ParameterStatus","Name":"session_authorization","Value":"postgres"}
            {"Type":"CommandComplete","CommandTag":"DISCARD ALL"}
            {"Type":"ReadyForQuery","TxStatus":"I"}
    --- FAIL: TestPGTest/row_description (0.02s)
        datadriven.go:119: testdata/pgtest/row_description:307: waiting for *pgproto3.ReadyForQuery, got &pgproto3.ErrorResponse{Severity:"ERROR", Code:"42704", Message:"unrecognized configuration parameter \"enable_experimental_alter_column_type_general\"", Detail:"", Hint:"", Position:0, InternalPosition:0, InternalQuery:"", Where:"", SchemaName:"", TableName:"", ColumnName:"", DataTypeName:"", ConstraintName:"", File:"guc.c", Line:6215, Routine:"set_config_option", UnknownFields:map[uint8]string{0x56:"ERROR"}}
...

yuzefovich · 2020-08-20T23:55:47Z

Thanks for the discussion here!

yuzefovich requested a review from maddyblue August 6, 2020 00:18

yuzefovich force-pushed the backport20.1-48842 branch from dc12d42 to 36a62b0 Compare August 11, 2020 01:28

yuzefovich added the do-not-merge bors won't merge a PR with this label. label Aug 11, 2020

rafiss mentioned this pull request Aug 17, 2020

sql: panic caused by DEALLOCATE ALL / DISCARD ALL inside of a prepared statement #52915

Closed

rafiss requested changes Aug 18, 2020

View reviewed changes

yuzefovich force-pushed the backport20.1-48842 branch from 36a62b0 to c827194 Compare August 20, 2020 22:11

maddyblue approved these changes Aug 20, 2020

View reviewed changes

yuzefovich and others added 2 commits August 20, 2020 16:02

yuzefovich force-pushed the backport20.1-48842 branch from c827194 to c51de08 Compare August 20, 2020 23:07

yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Aug 20, 2020

rafiss reviewed Aug 20, 2020

View reviewed changes

rafiss approved these changes Aug 20, 2020

View reviewed changes

yuzefovich merged commit 4b1cf49 into cockroachdb:release-20.1 Aug 20, 2020

yuzefovich deleted the backport20.1-48842 branch August 20, 2020 23:55

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

release-20.1: sql: fix portals after exhausting rows #52443

release-20.1: sql: fix portals after exhausting rows #52443

yuzefovich commented Aug 6, 2020 •

edited

Loading

cockroach-teamcity commented Aug 6, 2020

yuzefovich commented Aug 6, 2020

yuzefovich commented Aug 11, 2020

rafiss commented Aug 11, 2020

rafiss commented Aug 11, 2020

rafiss left a comment

yuzefovich commented Aug 20, 2020 •

edited

Loading

yuzefovich commented Aug 20, 2020

rafiss left a comment

rafiss left a comment

yuzefovich commented Aug 20, 2020

yuzefovich commented Aug 20, 2020

rafiss commented Aug 20, 2020

yuzefovich commented Aug 20, 2020

yuzefovich commented Aug 20, 2020

release-20.1: sql: fix portals after exhausting rows #52443

release-20.1: sql: fix portals after exhausting rows #52443

Conversation

yuzefovich commented Aug 6, 2020 • edited Loading

cockroach-teamcity commented Aug 6, 2020

yuzefovich commented Aug 6, 2020

yuzefovich commented Aug 11, 2020

rafiss commented Aug 11, 2020

rafiss commented Aug 11, 2020

rafiss left a comment

Choose a reason for hiding this comment

yuzefovich commented Aug 20, 2020 • edited Loading

yuzefovich commented Aug 20, 2020

rafiss left a comment

Choose a reason for hiding this comment

rafiss left a comment

Choose a reason for hiding this comment

yuzefovich commented Aug 20, 2020

yuzefovich commented Aug 20, 2020

rafiss commented Aug 20, 2020

yuzefovich commented Aug 20, 2020

yuzefovich commented Aug 20, 2020

yuzefovich commented Aug 6, 2020 •

edited

Loading

yuzefovich commented Aug 20, 2020 •

edited

Loading