Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schemadiff: validate views' referenced columns via semantics #12565

Conversation

shlomi-noach
Copy link
Contributor

@shlomi-noach shlomi-noach commented Mar 7, 2023

An altrnative to #12147, that uses go/vt/vtgate/semantics to validate view column references.

Description

Up till now, schemadiff views validation and analysis was limited to:

  • Obviously validate statement
  • Infer tables (or views) reverenced by a view
  • Validate those tables/views exist
  • Validate no dependency loop

As of this PR, schemadiff also validates columns used and returned by views:

  • Validate that columns in SELECT and in WHERE clause do in fact exist in tables/views referenced by the view. Error when an unknown column is encountered.
  • Allow unqualified table name where there is no ambiguity. Error on unqualified name with ambiguity.
  • Analyze view's column names, whether aliased or not.
  • Support cascaded views.
  • Support star expression, either qualified (create view v as select t.* from t, t2 ...) or unqualified (create view v as select * from t1,t2) ; infer list of columns ; support cascaded views based on star expression views.

Plenty tests added to validate behavior.

Joint with @dbussink and based on his preliminary work.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported
  • Tests were added or are not required
  • Documentation was added or is not required

Deployment Notes

Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
…t views reading from DUAL

Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach shlomi-noach added Type: Enhancement Logical improvement (somewhere between a bug and feature) Component: Query Serving labels Mar 7, 2023
@shlomi-noach
Copy link
Contributor Author

are any more reviews required from @vitessio/query-serving ?

Comment on lines 346 to 354
switch {
case e.Column == "":
return fmt.Sprintf("view %s references non-existent table %s", sqlescape.EscapeID(e.View), sqlescape.EscapeID(e.Table))
case e.Table != "":
return fmt.Sprintf("view %s references non-existent column %s.%s", sqlescape.EscapeID(e.View), sqlescape.EscapeID(e.Table), sqlescape.EscapeID(e.Column))
case e.Ambiguous:
return fmt.Sprintf("view %s references unqualified but non unique column %s", sqlescape.EscapeID(e.View), sqlescape.EscapeID(e.Column))
default:
return fmt.Sprintf("view %s references unqualified but non-existent column %s", sqlescape.EscapeID(e.View), sqlescape.EscapeID(e.Column))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error message looks very fragile based on what struct fields are set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also create the message ahead of time, when the struct is created?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that’s actually the wrong pattern. The core is the structured error and the string format is just a detail here.

The structured error removes the issue of string being brittle because anyone wants more details doesn’t look at the string at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also create the message ahead of time, when the struct is created?

Id' like to avoid that right now in the context of this PR, because that would require us to rewrite all error handling.

Also, I'm really not sure, because we want to be able to create errors using various combination of fields. We don't have a single "constructor" ; we'd need different New*() function for any combination of fields.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the error message looks very fragile based on what struct fields are set.

I can rewrite that to be more clear

Copy link
Contributor Author

@shlomi-noach shlomi-noach Mar 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was able to simplify InvalidColumnReferencedInViewError by removing unused Table field and subsequent error message generation cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshit-gangal allocating the string is also wasteful since when using structured errors it is never used. So I don’t think we should do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the MySQL protocol level, error strings are transferred over the wire, so we might be generating them eventually.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@harshit-gangal this is never used for MySQL level errors. But even then, we’d allocate when needed and not always so then it’s imho still an improvement.

It also improved the ergonomics imho since it’s now easier to create a structured error then.

@shlomi-noach
Copy link
Contributor Author

PR seems to suffer a cyclic import path.

Comment on lines 34 to 41
semantics.FakeSI
}

func newDeclarativeSchemaInformation() *declarativeSchemaInformation {
return &declarativeSchemaInformation{
semantics.FakeSI{
Tables: make(map[string]*vindexes.Table),
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FakeSI is for testing, better to take the field from there and create your own full struct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Copy link
Member

@harshit-gangal harshit-gangal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 comments. Overall looks good.

return errs.AggrError(vterrors.Aggregate)
}

// getTableColumnNames returns the names of columns in given table.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// getTableColumnNames returns the names of columns in given table.
// getEntityColumnNames returns the names of columns in given table.

Signed-off-by: Shlomi Noach <[email protected]>
Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach
Copy link
Contributor Author

Depends on #12663 to solve import loop

@shlomi-noach
Copy link
Contributor Author

Merged resolve-semantics-engine-deps, introduced here.

Signed-off-by: Shlomi Noach <[email protected]>
@shlomi-noach
Copy link
Contributor Author

only merge this after #12663 is merged.

@shlomi-noach shlomi-noach merged commit 61b73b6 into vitessio:main Mar 21, 2023
@shlomi-noach shlomi-noach deleted the schemadiff-validate-view-columns-semantics branch March 21, 2023 03:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Query Serving Type: Enhancement Logical improvement (somewhere between a bug and feature)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants