Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(spanner/spansql): add support for protobuf column types & Proto bundles #10945

Merged

Conversation

dfinkel
Copy link
Contributor

@dfinkel dfinkel commented Oct 2, 2024

feat(spansql): CREATE/ALTER/DROP PROTO BUNDLE

Add support for parsing and serializing CREATE, ALTER and DROP PROTO
BUNDLE DDL statements.

feat(spanner/spansql): support for protobuf types

Now that Spanner supports protobuf message and enum-typed columns and
casts, add support for parsing those those types.

Since protobuf columns aren't distinguished by a keyword, adjust the
parser to see any unquoted identifier that's not a known type as a
possible protobuf type and loop, consuming .s and identifiers until it
hits a non-ident/. token. (to match the proto namespace components up
through the message or enum names)

To track the fully-qualified message/enum type-name add an additional
field to the Type struct (tentatively) called ProtoRef so we can
recover the message/enum name if canonicalizing everything.

closes: #10944

Now that Spanner supports protobuf message and enum-typed columns and
casts, add support for parsing those those types.

Since protobuf columns aren't distinguished by a keyword, adjust the
parser to see any unquoted identifier that's not a known type as a
possible protobuf type and loop, consuming `.`s and identifiers until it
hits a non-ident/`.` token. (to match the proto namespace components up
through the message or enum names)

To track the fully-qualified message/enum type-name add an additional
field to the `Type` struct (tentatively) called `ProtoRef` so we can
recover the message/enum name if canonicalizing everything.
@dfinkel dfinkel requested review from a team as code owners October 2, 2024 14:17
@product-auto-label product-auto-label bot added the api: spanner Issues related to the Spanner API. label Oct 2, 2024
Add support for parsing and serializing CREATE, ALTER and DROP PROTO
BUNDLE DDL statements.
@dfinkel dfinkel force-pushed the spansql_protobuf_column_support branch from 8e788c3 to 0c841d8 Compare October 2, 2024 14:21
@rahul2393 rahul2393 added kokoro:force-run Add this label to force Kokoro to re-run the tests. automerge Merge the pull request once unit tests and other checks pass. labels Oct 3, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 3, 2024
@rahul2393
Copy link
Contributor

@dfinkel tests failed, please check and fix them
Screenshot 2024-10-03 at 4 19 57 PM

@rahul2393 rahul2393 removed the automerge Merge the pull request once unit tests and other checks pass. label Oct 3, 2024
This else if block got lost while squashing/reordering commits
(resolving a conflict). Bring it back so the tests pass.
@dfinkel
Copy link
Contributor Author

dfinkel commented Oct 3, 2024

Thanks @rahul2393 for the prompt review and approval!

Sorry about the test failures!
It looks like an else if block got lost while I was cleaning up the commits before sending it out for review. (then I absentmindedly ran the tests for the wrong directory 🫤 )

The tests for this package now pass locally.

@rahul2393 rahul2393 added the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 3, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Oct 3, 2024
@dfinkel
Copy link
Contributor Author

dfinkel commented Oct 15, 2024

Friendly ping @rahul2393

(thanks for the prompt approval)

@dfinkel
Copy link
Contributor Author

dfinkel commented Oct 23, 2024

@rahul2393 if you have a few minutes, can you make another pass over this PR?

@rahul2393 rahul2393 added automerge Merge the pull request once unit tests and other checks pass. kokoro:force-run Add this label to force Kokoro to re-run the tests. labels Nov 6, 2024
@kokoro-team kokoro-team removed the kokoro:force-run Add this label to force Kokoro to re-run the tests. label Nov 6, 2024
@gcf-merge-on-green gcf-merge-on-green bot merged commit 91c6f0f into googleapis:main Nov 6, 2024
8 checks passed
@gcf-merge-on-green gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Nov 6, 2024
@dfinkel dfinkel deleted the spansql_protobuf_column_support branch November 6, 2024 13:07
@dfinkel
Copy link
Contributor Author

dfinkel commented Nov 6, 2024

Thanks @rahul2393 !

Comment on lines +447 to +448
{`CAST(Bar AS ENUM)`, Func{Name: "CAST", Args: []Expr{TypedExpr{Expr: ID("Bar"), Type: Type{Base: Enum}}}}},
{`CAST(Bar AS PROTO)`, Func{Name: "CAST", Args: []Expr{TypedExpr{Expr: ID("Bar"), Type: Type{Base: Proto}}}}},
Copy link
Contributor

@apstndb apstndb Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe they are not a valid GoogleSQL query because ENUM and PROTO is keywords.

$ gcloud spanner databases execute-sql ${SPANNER_DATABASE} --sql 'SELECT CAST(Bar AS PROTO)'
ERROR: (gcloud.spanner.databases.execute-sql) INVALID_ARGUMENT: Syntax error: Unexpected keyword PROTO [at 1:20]\nSELECT CAST(Bar AS PROTO)\n                   ^
- '@type': type.googleapis.com/google.rpc.LocalizedMessage
  locale: en-US
  message: |-
    Syntax error: Unexpected keyword PROTO [at 1:20]
    SELECT CAST(Bar AS PROTO)
                       ^
$ gcloud spanner databases execute-sql ${SPANNER_DATABASE} --sql 'SELECT CAST(Bar AS ENUM)'                  
ERROR: (gcloud.spanner.databases.execute-sql) INVALID_ARGUMENT: Syntax error: Unexpected keyword ENUM [at 1:20]\nSELECT CAST(Bar AS ENUM)\n                   ^
- '@type': type.googleapis.com/google.rpc.LocalizedMessage
  locale: en-US
  message: |-
    Syntax error: Unexpected keyword ENUM [at 1:20]
    SELECT CAST(Bar AS ENUM)
                       ^

CAST AS ENUM and CAST AS PROTO needs named type name of PROTO or ENUM, not PROTO or ENUM keywords.

https://cloud.google.com/spanner/docs/reference/standard-sql/data-types#enum_type

You reference an enum type, such as when using CAST, by using its fully qualified name.

https://cloud.google.com/spanner/docs/reference/standard-sql/conversion_functions#cast_as_proto

SELECT
  CAST(
    '''
    year: 2001
    month: 9
    type { award_name: 'Best Artist' category: 'Artist' }
    type { award_name: 'Best Album' category: 'Album' }
    '''
    AS googlesql.examples.music.Award)
  AS award_col

Therefore, they should be escaped named type name:

CAST(Bar AS `PROTO`)
CAST(Bar AS `ENUM`)

or other named type names.

CAST(Bar AS ProtoType)
CAST(Bar AS EnumType)

}},
},
},
`SELECT CAST(7 AS ENUM)`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same

$ gcloud spanner databases execute-sql ${SPANNER_DATABASE} --sql 'SELECT CAST(7 AS ENUM)'  
ERROR: (gcloud.spanner.databases.execute-sql) INVALID_ARGUMENT: Syntax error: Unexpected keyword ENUM [at 1:18]\nSELECT CAST(7 AS ENUM)\n                 ^
- '@type': type.googleapis.com/google.rpc.LocalizedMessage
  locale: en-US
  message: |-
    Syntax error: Unexpected keyword ENUM [at 1:18]
    SELECT CAST(7 AS ENUM)
                     ^

@apstndb
Copy link
Contributor

apstndb commented Nov 13, 2024

I have re-used this tests for another purpose( cloudspannerecosystem/memefish#115 ) , so it is better to contain only valid cases.
Would you fix it? @dfinkel
If not, I will send PR.

@dfinkel
Copy link
Contributor Author

dfinkel commented Nov 13, 2024

Thanks @apstndb, I definitely copied those cases from the documentation. I can put together a PR to switch those to something that's actually valid tomorrow.

(I think I meant to verify that those were valid before sending this PR out but as usual other things intervened and that got lost)

dfinkel added a commit to dfinkel/google-cloud-go that referenced this pull request Dec 12, 2024
A misreading of the spanner docs lead to tests that indicated that
casting `AS ENUM` or `AS PROTO` was valid syntax (despite not specifying
_which_ protobuf enum or message type to cast to). Replace these cases
with ones that validate casting to specific enum/message types.

Thanks to @apstndb for calling this out on googleapis#10945.
@dfinkel
Copy link
Contributor Author

dfinkel commented Dec 12, 2024

I just opened a PR with fixes for some bugs I ran into while using the features in this PR (including fixing the tests that @apstndb pointed out):
#11279

The most problematic bug is the fact that NOT NULL protobuf-typed columns don't get parsed correctly because I had accidentally made the parser too greedy.

Thanks in advance!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: spanner Issues related to the Spanner API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

spanner/spansql: support for Protobuf column-types
4 participants