Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

release-20.1: sql: fix portals after exhausting rows #52443

Merged
merged 2 commits into from
Aug 20, 2020

Conversation

yuzefovich
Copy link
Member

@yuzefovich yuzefovich commented Aug 6, 2020

Backport 1/1 commits from #48842.
Backport 1/1 commits from #52940.

/cc @cockroachdb/release


sql: fix portals after exhausting rows

Previously, we would erroneously restart the execution from the very
beginning of empty, unclosed portals after they have been fully
exhausted when we should be returning no rows or an error in such
scenarios. This is now fixed by tracking whether a portal is exhausted
or not and intercepting the calls to execStmt when the conn executor
state machine is in an open state.

Note that the current solution has known deviations from Postgres:

  • when attempting to execute portals of statements that don't return row
    sets, on the second and consequent attempt PG returns an error while we
    are silently doing nothing (meaning we do not run the statement at all
    and return 0 rows)
  • we incorrectly populate "command tag" field of pgwire messages of some
    rows-returning statements after the portal suspension (for example,
    a suspended UPDATE RETURNING in PG will return the total count of "rows
    affected" while we will return the count since the last suspension).

These deviations are deemed acceptable since this commit fixes a much
worse problem - re-executing an exhausted portal (which could be
a mutation meaning, previously, we could have executed a mutation
multiple times).

The reasons for why this commit does not address these deviations are:

CRDB doesn't have these concepts, and without them any attempt to
simulate Postgres results in a very error-prone and brittle code.

Fixes: #48448.

Release note (bug fix): Previously, CockroachDB would erroneously
restart the execution of empty, unclosed portals after they have been
fully exhausted, and this has been fixed.

sql: allow DEALLOCATE ALL with a prepared statement

PR #48842 added logic to exhaust portals after executing them. This had
issues when the portal being executed closes itself, which happens when
using DEALLOCATE in a prepared statement. Now we check if the portal
still exists before exhausting it.

There is no release note as this fixes a bug that only exists in
unreleased versions.

Release note: None

@yuzefovich yuzefovich requested a review from maddyblue August 6, 2020 00:18
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@yuzefovich
Copy link
Member Author

cc @andreimatei @jordanlewis to see whether you have thoughts about backporting this fix.

@yuzefovich
Copy link
Member Author

Huh, I'm very confused why TestPGTest/row_description started failing on this branch but wasn't failing before. When I ran the test manually on 1a085c0 (the commit that added the test file), I'm still hitting the same error:

--- FAIL: TestPGTest (0.20s)
    --- FAIL: TestPGTest/row_description (0.08s)
        datadriven.go:165: 
            testdata/pgtest/row_description:59: ReadyForQuery
            expected:
            {"Type":"RowDescription","Fields":[{"Name":"a","TableOID":55,"TableAttributeNumber":1,"DataTypeOID":20,"DataTypeSize":8,"TypeModifier":-1,"Format":0}]}
            {"Type":"DataRow","Values":[{"text":"1"}]}
            {"Type":"CommandComplete","CommandTag":"SELECT 1"}
            {"Type":"ReadyForQuery","TxStatus":"I"}
            
            found:
            {"Type":"RowDescription","Fields":[{"Name":"a","TableOID":52,"TableAttributeNumber":1,"DataTypeOID":20,"DataTypeSize":8,"TypeModifier":-1,"Format":0}]}
            {"Type":"DataRow","Values":[{"text":"1"}]}
            {"Type":"CommandComplete","CommandTag":"SELECT 1"}
            {"Type":"ReadyForQuery","TxStatus":"I"}

cc @rafiss did you experience any flakiness working on row description fields? I'm hitting this error every time running locally on 1a085c0.

@yuzefovich yuzefovich added the do-not-merge bors won't merge a PR with this label. label Aug 11, 2020
@rafiss
Copy link
Collaborator

rafiss commented Aug 11, 2020

Hmm I'm not sure how an existing commit would start failing. We have seen that the OIDs are not easy to work with, since the same server is shared across different pgtest files. That was made easier in #48555, but that was not backported. However, even without that PR, the OIDs should still be stable and deterministic.

@rafiss
Copy link
Collaborator

rafiss commented Aug 11, 2020

@yuzefovich I checked out that commit and the tests passed for me locally. Did you have any additional pgtest files saved but not checked in, perhaps? That could cause the OIDs to differ.

❯ git status
HEAD detached at 1a085c0328

❯ make test PKG=./pkg/sql/pgwire TESTS=TestPGTest
Running make with -j12
GOPATH set to /Users/rafiss/go
GOFLAGS= go test   -tags ' make x86_64_apple_darwin19.6.0' -ldflags '-X github.com/cockroachdb/cockroach/pkg/build.typ=development -extldflags "-L/usr/local/opt/openssl/lib" -X "github.com/cockroachdb/cockroach/pkg/build.tag=v20.1.0-154-g1a085c0328-dirty" -X "github.com/cockroachdb/cockroach/pkg/build.rev=1a085c0328595dd3f13761dc8352b05be85308d8" -X "github.com/cockroachdb/cockroach/pkg/build.cgoTargetTriple=x86_64-apple-darwin19.6.0"  ' -run "TestPGTest"  -timeout 30m ./pkg/sql/pgwire
ok  	github.com/cockroachdb/cockroach/pkg/sql/pgwire	3.936s

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

separate from the test issues described earlier, can you make sure the fix from #52940 is included in this backport?

@yuzefovich
Copy link
Member Author

yuzefovich commented Aug 20, 2020

@rafiss I don't know why you're not seeing the failure, I can see it both on current release-20.1 (68610d6) and on 1a085c0 running as

make test PKG=./pkg/sql/pgwire TESTS=TestPGTest/row_description

and no, I don't have any other files in that folder, so I'm very confused. Which version of Go are you using? I'm on 1.13.15.

One possibility is that the creation of some objects has been added in another files which would increase the IDs, another possibility is that we somehow changed the way we run this TestPGTest (like reusing the same database between different files). Still, I don't understand why I can see the failure locally and you can't on 1a085c0.

@yuzefovich
Copy link
Member Author

Hm, the build is now green. @rafiss @mjibson any objections to squashing the third commit into the first one?

yuzefovich and others added 2 commits August 20, 2020 16:02
Previously, we would erroneously restart the execution from the very
beginning of empty, unclosed portals after they have been fully
exhausted when we should be returning no rows or an error in such
scenarios. This is now fixed by tracking whether a portal is exhausted
or not and intercepting the calls to `execStmt` when the conn executor
state machine is in an open state.

Note that the current solution has known deviations from Postgres:
- when attempting to execute portals of statements that don't return row
sets, on the second and consequent attempt PG returns an error while we
are silently doing nothing (meaning we do not run the statement at all
and return 0 rows)
- we incorrectly populate "command tag" field of pgwire messages of some
rows-returning statements after the portal suspension (for example,
a suspended UPDATE RETURNING in PG will return the total count of "rows
affected" while we will return the count since the last suspension).

These deviations are deemed acceptable since this commit fixes a much
worse problem - re-executing an exhausted portal (which could be
a mutation meaning, previously, we could have executed a mutation
multiple times).

The reasons for why this commit does not address these deviations are:
- Postgres has a concept of "portal strategy"
(see https://github.com/postgres/postgres/blob/2f9661311b83dc481fc19f6e3bda015392010a40/src/include/utils/portal.h#L89).
- Postgres has a concept of "command" type (these are things like
SELECTs, UPDATEs, INSERTs, etc,
see https://github.com/postgres/postgres/blob/1aac32df89eb19949050f6f27c268122833ad036/src/include/nodes/nodes.h#L672).

CRDB doesn't have these concepts, and without them any attempt to
simulate Postgres results in a very error-prone and brittle code.

Release note (bug fix): Previously, CockroachDB would erroneously
restart the execution of empty, unclosed portals after they have been
fully exhausted, and this has been fixed.
PR cockroachdb#48842 added logic to exhaust portals after executing them. This had
issues when the portal being executed closes itself, which happens when
using DEALLOCATE in a prepared statement. Now we check if the portal
still exists before exhausting it.

There is no release note as this fixes a bug that only exists in
unreleased versions.

Release note: None
@yuzefovich yuzefovich removed the do-not-merge bors won't merge a PR with this label. label Aug 20, 2020
Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have go1.13.14 but I don't know how that would make a difference.

It looks like it passed on TeamCity too. Maybe you could try with a completely fresh checkout? Or perhaps a gceworker? It's certainly not great that it doesn't pass locally for you, but I don't know how much effort to put into finding out why.

One other unrelated thing I just realized -- for the original fix, did you try running with make test PKG=./pkg/sql/pgwire TESTS=TestPGTest TESTFLAGS="-addr=localhost:5432" (to make it run against Postgres)? There is a syntax for the pgtest files to expect different things with PG vs CRDB and it is useful to see a clean run against both and make the tests document the differences in the wire protocol. It might be worth doing that and getting the test passing against PG on master, if you agree.

Copy link
Collaborator

@rafiss rafiss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any objections to squashing the third commit into the first one?

just saw your comment. sounds good to me, but please consider trying the tests with PG and getting it to pass on master as i described above. (sorry i know that my request is beyond the scope of just backporting an existing change, but i didn't see the original PR where this comment would have made more sense)

@yuzefovich
Copy link
Member Author

please consider trying the tests with PG and getting it to pass on master as i described above

What's confusing is that on master I didn't have problems with TestPGTest/row_description (nor on release-19.2), only when backporting to 20.1 did I hit this issue. I'm double checking against the Postgres right now.

@yuzefovich
Copy link
Member Author

There are a bunch of failures against PG (on int_size, notice, param_status, pgjdbc, and a few others) including row_description:

    --- FAIL: TestPGTest/row_description (0.01s)
        datadriven.go:165: 
            testdata/pgtest/row_description:59: ReadyForQuery
            expected:
            {"Type":"RowDescription","Fields":[{"Name":"a","TableOID":59,"TableAttributeNumber":1,"DataTypeOID":20,"DataTypeSize":8,"TypeModifier":-1,"Format":0}]}
            {"Type":"DataRow","Values":[{"text":"1"}]}
            {"Type":"CommandComplete","CommandTag":"SELECT 1"}
            {"Type":"ReadyForQuery","TxStatus":"I"}
            
            found:
            {"Type":"RowDescription","Fields":[{"Name":"a","TableOID":17639,"TableAttributeNumber":1,"DataTypeOID":23,"DataTypeSize":4,"TypeModifier":-1,"Format":0}]}
            {"Type":"DataRow","Values":[{"text":"1"}]}
            {"Type":"CommandComplete","CommandTag":"SELECT 1"}
            {"Type":"ReadyForQuery","TxStatus":"I"}

So I'll go ahead and merge this with the adjustment of row_description in the first commit once the CI is green. I don't think it's worth spending more time looking into this since our TableOIDs are different from postgres anyway.

@rafiss
Copy link
Collaborator

rafiss commented Aug 20, 2020

Right, sorry, I didn't mean to say you should make this backport work against PG. I meant to suggest that you modify the tests on master so that they pass against PG. The differing OIDs are resolved using noncrdb_only and crdb_only directives. Like in this example from row_description:

send
Query {"String": "SELECT a FROM tab1"}
----

# With postgres we don't control the table OID.
until ignore_table_oids noncrdb_only
RowDescription
----
{"Type":"RowDescription","Fields":[{"Name":"a","TableOID":0,"TableAttributeNumber":1,"DataTypeOID":20,"DataTypeSize":8,"TypeModifier":-1,"Format":0}]}

until crdb_only
RowDescription
----
{"Type":"RowDescription","Fields":[{"Name":"a","TableOID":52,"TableAttributeNumber":1,"DataTypeOID":20,"DataTypeSize":8,"TypeModifier":-1,"Format":0}]}

until
ReadyForQuery
----
{"Type":"DataRow","Values":[{"text":"1"}]}
{"Type":"CommandComplete","CommandTag":"SELECT 1"}
{"Type":"ReadyForQuery","TxStatus":"I"}

@yuzefovich
Copy link
Member Author

The tests that I added in #48842 do pass on master against PG, there are currently 2 failures, however, that are not related to this backport at all:

    --- FAIL: TestPGTest/pgjdbc (0.01s)
        datadriven.go:194: 
            testdata/pgtest/pgjdbc:45: ReadyForQuery
            expected:
            {"Type":"ParameterStatus","Name":"application_name","Value":""}
            {"Type":"ParameterStatus","Name":"TimeZone","Value":"UTC"}
            {"Type":"CommandComplete","CommandTag":"DISCARD"}
            {"Type":"ReadyForQuery","TxStatus":"I"}
            
            found:
            {"Type":"ParameterStatus","Name":"is_superuser","Value":"off"}
            {"Type":"ParameterStatus","Name":"session_authorization","Value":"postgres"}
            {"Type":"CommandComplete","CommandTag":"DISCARD ALL"}
            {"Type":"ReadyForQuery","TxStatus":"I"}
    --- FAIL: TestPGTest/row_description (0.02s)
        datadriven.go:119: testdata/pgtest/row_description:307: waiting for *pgproto3.ReadyForQuery, got &pgproto3.ErrorResponse{Severity:"ERROR", Code:"42704", Message:"unrecognized configuration parameter \"enable_experimental_alter_column_type_general\"", Detail:"", Hint:"", Position:0, InternalPosition:0, InternalQuery:"", Where:"", SchemaName:"", TableName:"", ColumnName:"", DataTypeName:"", ConstraintName:"", File:"guc.c", Line:6215, Routine:"set_config_option", UnknownFields:map[uint8]string{0x56:"ERROR"}}
...

@yuzefovich yuzefovich merged commit 4b1cf49 into cockroachdb:release-20.1 Aug 20, 2020
@yuzefovich
Copy link
Member Author

Thanks for the discussion here!

@yuzefovich yuzefovich deleted the backport20.1-48842 branch August 20, 2020 23:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants